Hits not serializable.

rnewson

Can Hits be made serializable?

I'm finding that almost all of the time for a remote search is spent
lazily retrieving document objects.

I'd like to create a remote interface with a method like:

Hits search(Query query, Filter filter, int prefetch)

The remote end would call Hits.doc() for the first $prefetch entries.

This will make a huge difference to remote searching performance:

total   fetch   server1   server2   server3
862     699     86        69        96

For now, I'll use Document[] as the return value, but Hits feels more
natural.
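
A minimal sketch of the prefetch idea, for illustration only. It does not use
the Lucene classes: LazyHits, StoredDoc and PrefetchedHits are hypothetical
stand-ins, with LazyHits assumed to expose the same length()/doc(n)/score(n)
shape as Hits. The server copies the first `prefetch` documents into a plain
Serializable container, so the client gets the whole page in one response
instead of one lazy fetch per hit.

```java
import java.io.*;
import java.util.*;

// Hypothetical stand-in for a stored document: field name -> value.
final class StoredDoc implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, String> fields;
    StoredDoc(Map<String, String> fields) { this.fields = new HashMap<>(fields); }
}

// Hypothetical stand-in for the server-side, lazily loading hit list.
interface LazyHits {
    int length();
    StoredDoc doc(int n);
    float score(int n);
}

// Hypothetical serializable result: total hit count plus the prefetched page.
final class PrefetchedHits implements Serializable {
    private static final long serialVersionUID = 1L;
    final int totalHits;
    final List<StoredDoc> docs;
    final float[] scores;
    PrefetchedHits(int totalHits, List<StoredDoc> docs, float[] scores) {
        this.totalHits = totalHits;
        this.docs = docs;
        this.scores = scores;
    }
}

final class PrefetchDemo {
    // Runs on the remote end: eagerly load at most `prefetch` documents.
    static PrefetchedHits prefetch(LazyHits hits, int prefetch) {
        int n = Math.min(prefetch, hits.length());
        List<StoredDoc> docs = new ArrayList<>(n);
        float[] scores = new float[n];
        for (int i = 0; i < n; i++) {
            docs.add(hits.doc(i));
            scores[i] = hits.score(i);
        }
        return new PrefetchedHits(hits.length(), docs, scores);
    }

    // Simulates the trip over the wire with plain Java serialization.
    static PrefetchedHits roundTrip(PrefetchedHits in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin = new ObjectInputStream(
                    new ByteArrayInputStream(buf.toByteArray()))) {
                return (PrefetchedHits) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Fake hit list with `total` hits, used only for the demonstration.
    static LazyHits fakeHits(final int total) {
        return new LazyHits() {
            public int length() { return total; }
            public StoredDoc doc(int n) {
                return new StoredDoc(Collections.singletonMap("id", "msg-" + n));
            }
            public float score(int n) { return 1.0f / (n + 1); }
        };
    }

    public static void main(String[] args) {
        PrefetchedHits page = roundTrip(prefetch(fakeHits(1000), 25));
        System.out.println(page.totalHits + " hits, " + page.docs.size() + " prefetched");
    }
}
```

The container carries the total hit count separately from the prefetched
page, so a client can still show "N results" while holding only the first
page of documents.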

B.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Hits not serializable.

Nrupal Akolkar
Hi,
Try the following:
1. Write a subclass of the class containing the search(...) method you
listed, and declare that subclass serializable.
2. Have the subclass override the search method with the same body as in
the superclass.
3. Rebuild your Lucene 1.** file with ant, and try it the way you want.
I think this should solve your problem.
Nrupal


 On 6/24/05, Robert Newson <[hidden email]> wrote:

>
>
> Can Hits be made serializable?
>
> I'm finding that almost all of the time for a remote search is spent
> lazily retrieving document objects.
>
> I'd like to create a remote interface with a method like:
>
> Hits search(Query query, Filter filter, int prefetch)
>
> The remote end would call Hits.doc() for the first $prefetch entries.
>
> This will make a huge difference to remote searching performance:
>
> total   fetch   server1   server2   server3
> 862     699     86        69        96
>
> For now, I'll use Document[] as the return value, but Hits feels more
> natural.
>
> B.

Re: Hits not serializable. (bulk document retrieval)

rnewson

Thanks for the suggestion. I have solved this problem locally; I'm
wondering whether the fix should be in Lucene core.

I have seven machines in a rack, each holding Lucene indexes of about 30
million messages. I'm trying to search across them with
RemoteSearcher and ParallelMultiSearcher.

Search times are impressive: only hundreds of milliseconds (for
multi-term queries).

Unfortunately, for the search to be useful, I need to pull back
a page's worth of hits. In my case this is the first 25 results.

With the current out-of-the-box API this causes 50 sequential RMI calls,
which seriously degrades the total time that the client must wait for a
response.

ParallelMultiSearcher itself is pretty reasonable, though I have my own
re-implementation using the java.util.concurrent framework. However, the
Lucene API is simply not optimised for retrieving Documents in bulk.

Obviously we can all work around it in different ways, but I feel that
it should be core functionality.

Searchable could have a bulk retrieval method, and ParallelMultiSearcher
should be able to execute it *in parallel* against each underlying searcher.
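
As a rough illustration of that proposal (not the actual local
implementation, and not an existing Lucene API): the sketch below assumes a
hypothetical BulkSearchable interface whose docs(int[]) method returns all
requested documents in one call, and uses java.util.concurrent to issue that
call to every searcher at the same time. Documents are plain strings here to
keep the example self-contained.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical bulk-retrieval interface: one call returns all requested docs.
interface BulkSearchable {
    List<String> docs(int[] ids);
}

final class ParallelBulkFetch {
    // Sends one bulk request per searcher concurrently and returns the
    // results in the same order as the searchers were given.
    static List<List<String>> fetchAll(List<BulkSearchable> searchers,
                                       List<int[]> idsPerSearcher,
                                       ExecutorService pool) {
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < searchers.size(); i++) {
            final BulkSearchable searcher = searchers.get(i);
            final int[] ids = idsPerSearcher.get(i);
            Callable<List<String>> task = () -> searcher.docs(ids);
            futures.add(pool.submit(task));
        }
        List<List<String>> results = new ArrayList<>();
        for (Future<List<String>> f : futures) {
            try {
                results.add(f.get()); // waits for this searcher's response
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return results;
    }

    // Fake searcher for the demonstration; a real one would be a remote stub.
    static BulkSearchable fakeServer(final String name) {
        return ids -> {
            List<String> out = new ArrayList<>();
            for (int id : ids) {
                out.add(name + "/doc" + id);
            }
            return out;
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<List<String>> pages = fetchAll(
                Arrays.asList(fakeServer("server1"), fakeServer("server2"), fakeServer("server3")),
                Arrays.asList(new int[] {0, 1}, new int[] {5}, new int[] {2, 9}),
                pool);
            System.out.println(pages);
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, fetching a page costs one round trip per searcher, and the
round trips overlap rather than running one after another.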

I've implemented it locally. If anyone feels that this addresses a
genuine problem, let me know.

In short, should Lucene provide an efficient document paging facility,
or is it not considered core?

B.

P.S. I'm using a CVS snapshot of Lucene 1.9.
