Re: Responding to Requests with Chunks/Streaming



Mikhail Khludnev
Hello Developers,

I just want to ask: don't you think that response streaming could be useful for things like OLAP? E.g. if you have a sharded index presorted and pre-joined the BJQ way, you can calculate counts for many cube cells in parallel.
The essential distributed test for response streaming has just passed:
https://github.com/m-khl/solr-patches/blob/ec4db7c0422a5515392a7019c5bd23ad3f546e4b/solr/core/src/test/org/apache/solr/response/RespStreamDistributedTest.java

The branch is https://github.com/m-khl/solr-patches/tree/streaming

Regards

On Mon, Apr 2, 2012 at 10:55 AM, Mikhail Khludnev <[hidden email]> wrote:

Hello,

Small update - reading the streamed response is now done via a callback, so no SolrDocumentList is held in memory.
https://github.com/m-khl/solr-patches/tree/streaming
Here is the test: https://github.com/m-khl/solr-patches/blob/d028d4fabe0c20cb23f16098637e2961e9e2366e/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java#L138
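
If you want to try the callback pattern without checking out the branch, a rough SolrJ-side sketch is below. It uses the stock StreamingResponseCallback / queryAndStreamResponse API from SOLR-2112 rather than my code; the endpoint and query are just placeholders, and whether these classes are available depends on the Solr version you build against.

import org.apache.solr.client.solrj.StreamingResponseCallback;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CallbackReadSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder endpoint; any SolrServer that supports queryAndStreamResponse will do.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");

    // Each document is handed to the callback as it is read off the wire,
    // so no SolrDocumentList is accumulated in client memory.
    server.queryAndStreamResponse(params, new StreamingResponseCallback() {
      @Override
      public void streamDocListInfo(long numFound, long start, Float maxScore) {
        System.out.println("numFound=" + numFound);
      }

      @Override
      public void streamSolrDocument(SolrDocument doc) {
        System.out.println(doc.getFieldValue("id"));
      }
    });

    server.shutdown();
  }
}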

No progress on distributed search via streaming yet.

Please let me know if you don't want to receive these updates from my playground.

Regards


On Thu, Mar 29, 2012 at 1:02 PM, Mikhail Khludnev <[hidden email]> wrote:
@All
Why does nobody want such a pretty cool feature?

Nicholas,
I have made a tiny bit of progress: I'm able to stream in javabin codec format while searching. It implies sorting by _docid_.

Here is the diff:
https://github.com/m-khl/solr-patches/commit/2f9ff068c379b3008bb983d0df69dff714ddde95
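
To make the constraint concrete, the request on the client side has to be pinned to index order, roughly like this (the query itself is only a placeholder):

import org.apache.solr.client.solrj.SolrQuery;

public class DocIdOrderQuery {
  // Streaming hits while the search runs means emitting them in the order the
  // collector sees them, i.e. index order, hence the sort on _docid_.
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("*:*");             // placeholder query
    q.addSortField("_docid_", SolrQuery.ORDER.asc); // Solr's built-in index-order pseudo-field
    return q;                                       // javabin is SolrJ's default wire format anyway
  }
}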

The current issue is that SolrJ reads the response as a whole; reading via a callback is supported by the embedded server only. Anyway, it should not be a big deal. ResponseStreamingTest.java more or less works.
I'm stuck on introducing response streaming into distributed search, which is actually more challenging - RespStreamDistributedTest fails.

Regards


On Fri, Mar 16, 2012 at 3:51 PM, Nicholas Ball <[hidden email]> wrote:

Mikhail & Ludovic,

Thanks for both your replies, very helpful indeed!

Ludovic, I was actually looking into just that and did some tests with
SolrJ. It does work well, but it needs some changes on the Solr server if we
want to send out individual documents at various times. This could be done
with a write() and flush() to the FastOutputStream (daos) in JavaBinCodec. I
therefore think that a combination of this and Mikhail's solution would
work best!
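
To make that concrete, something along these lines might do it (a hypothetical sketch, not tested; it assumes writeSolrDocument() and the daos field are accessible to a subclass in the version we're on):

import java.io.IOException;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.JavaBinCodec;

// Hypothetical sketch: flush the underlying FastOutputStream (daos) after each
// document so it leaves the server immediately instead of sitting in the
// buffer until the whole response has been marshalled.
public class FlushingJavaBinCodec extends JavaBinCodec {
  @Override
  public void writeSolrDocument(SolrDocument doc) throws IOException {
    super.writeSolrDocument(doc);
    daos.flush(); // push the encoded document onto the wire now
  }
}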

Mikhail, you mention that your solution doesn't currently work and that you're
not sure why this is the case, but could it be that you haven't flushed the
data (os.flush()) you've written in the collect method of DocSetStreamer? I
think placing the output stream into the SolrQueryRequest is the way to go,
so that we can access it and write to it as we intend. However, I think
using the JavaBinCodec would be ideal so that we can work with SolrJ
directly and not mess around with the encoding of the docs/data etc.

At the moment the entry point to JavaBinCodec is through the
BinaryResponseWriter, which calls the top-level marshal() method, which
encodes and sends out the entire SolrQueryResponse (line 49 @
BinaryResponseWriter). What would be ideal is to be able to break up the
response and call the JavaBinCodec for pieces of it, with a flush after each
call. I did a few tests with a simple Thread.sleep and a flush to see if this
would actually work, and it looks like it's working out perfectly. Just trying
to figure out the best way to actually do it now :) any ideas?
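
In case it helps, the sleep/flush experiment I mean is roughly this shape (a throwaway helper I'm sketching here, not the real change; a stock SolrJ client expects a single top-level javabin object, so this only shows that the bytes go out early):

import java.io.IOException;
import java.io.OutputStream;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.JavaBinCodec;

// Throwaway experiment: marshal the response piece by piece, flushing (and
// pausing) between pieces, to check that each chunk really leaves the server
// early rather than being buffered until the end of the request.
public class PiecewiseMarshalExperiment {

  public static void writePieces(Iterable<SolrDocument> docs, OutputStream out)
      throws IOException, InterruptedException {
    for (SolrDocument doc : docs) {
      new JavaBinCodec().marshal(doc, out); // one self-contained javabin blob per piece
      out.flush();                          // force the chunk onto the wire
      Thread.sleep(1000);                   // makes early arrival easy to observe on the client
    }
  }
}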

On another note: for a solution to work with chunked transfer encoding
(and therefore web browsers), a lot more development is going to be needed.
Not sure if it's worth trying yet, but I might look into it later down the
line.

Nick

On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev
<[hidden email]> wrote:
> Ludovic,
>
> I looked through it. First of all, it seems to me you don't amend the regular
> "servlet" Solr server, but only the embedded one.
> Anyway, the difference is that you stream the DocList via a callback, but that
> means that you've instantiated it in memory and keep it there until it is
> completely consumed. Think about a billion numFound. The core idea of my
> approach is to keep almost zero memory for the response.
>
> Regards
>
> On Fri, Mar 16, 2012 at 12:12 AM, lboutros <[hidden email]> wrote:
>
>> Hi,
>>
>> I was looking for something similar.
>>
>> I tried this patch:
>>
>> https://issues.apache.org/jira/browse/SOLR-2112
>>
>> It's working quite well (I've back-ported the code to Solr 3.5.0...).
>>
>> Is it really different from what you are trying to achieve?
>>
>> Ludovic.
>>
>> -----
>> Jouve
>> France.



--
Sincerely yours
Mikhail Khludnev