Performance, yet again

Andre Rubin
Hi all,

Most of our queries are very simple, of the type:

Query query = new PrefixQuery(new Term(LABEL_FIELD, prefix));
Hits hits = searcher.search(query, new Sort(new SortField(LABEL_FIELD)));

These sometimes return 10, 20, or even 40 thousand hits.

I get good performance if hits.length() is 20,000 or less (less than 0.5
seconds). However, if it is 40,000 or more, querying takes over a second,
up to 2.5 seconds. The point is that this solution is not scaling.
Any ideas I can try?

I have already exhausted the ideas from
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

I was reading about TopDocs and TopFieldDocs. Is this search method (using
TopDocs) preferred over Hits? Also, there's no search method that returns them
without taking a Filter; can I just pass null?

Is it possible to pre-sort the index, so I don't have to sort every time I
perform a query?

Any other ideas?


Thanks,


Andre

Re: Performance, yet again

Mark Miller-3
Andre Rubin wrote:
> Hi all,
>
> Most of our queries are very simple, of the type:
>
> Query query = new PrefixQuery(new Term(LABEL_FIELD, prefix));
> Hits hits = searcher.search(query, new Sort(new SortField(LABEL_FIELD)));
>  
You might want to check out Solr's ConstantScorePrefixQuery and compare
performance.

> These sometimes return 10, 20, or even 40 thousand hits.
>
> I get good performance if hits.length() is 20,000 or less (less than 0.5
> seconds). However, if it is 40,000 or more, querying takes over a second,
> up to 2.5 seconds. The point is that this solution is not scaling.
> Any ideas I can try?
>
> I have already exhausted the ideas from
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>
> I was reading about TopDocs and TopFieldDocs. Is this search method (using
> TopDocs) preferred over Hits? Also, there's no search method that returns them
> without taking a Filter; can I just pass null?
>  
It is preferred over Hits. Hits has been deprecated and you should
really migrate away from it.
> Is it possible to pre-sort the index, so I don't have to sort every time I
> perform a query?
>
> Any other ideas?
>  
I think in general, sorting and prefix queries can be slower operations in
Lucene (though sorting is generally pretty fast after the field caches
are loaded). You might try the first couple of suggestions there though,
and others may fill in other steps you can take as well.

- Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Performance, yet again

Andre Rubin
On Tue, Sep 2, 2008 at 10:16 AM, Mark Miller <[hidden email]> wrote:

> Andre Rubin wrote:
>
>> Hi all,
>>
>> Most of our queries are very simple, of the type:
>>
>> Query query = new PrefixQuery(new Term(LABEL_FIELD, prefix));
>> Hits hits = searcher.search(query, new Sort(new SortField(LABEL_FIELD)));
>>
>>
> You might want to check out Solr's ConstantScorePrefixQuery and compare
> performance.


I'm not familiar with Solr. It is not standard Lucene, is it?


>
>> These sometimes return 10, 20, or even 40 thousand hits.
>>
>> I get good performance if hits.length() is 20,000 or less (less than 0.5
>> seconds). However, if it is 40,000 or more, querying takes over a second,
>> up to 2.5 seconds. The point is that this solution is not scaling.
>> Any ideas I can try?
>>
>> I have already exhausted the ideas from
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>
>> I was reading about TopDocs and TopFieldDocs. Is this search method (using
>> TopDocs) preferred over Hits? Also, there's no search method that returns them
>> without taking a Filter; can I just pass null?
>>
>>
> It is preferred over Hits. Hits has been deprecated and you should really
> migrate away from it.


I tried to use it before, but it doesn't seem as straightforward as
Hits. Is there example code somewhere?


>> Is it possible to pre-sort the index, so I don't have to sort every time I
>> perform a query?
>>
>> Any other ideas?
>>
>>
> I think in general, sorting and prefix queries can be slower operations in
> Lucene (though sorting is generally pretty fast after the field caches are
> loaded). You might try the first couple of suggestions there though, and others
> may fill in other steps you can take as well.
>
> - Mark
>


Thanks, Mark.


Andre

Re: Performance, yet again

Mark Miller-3
Andre Rubin wrote:

> On Tue, Sep 2, 2008 at 10:16 AM, Mark Miller <[hidden email]> wrote:
>
>  
>> Andre Rubin wrote:
>>
>>    
>>> Hi all,
>>>
>>> Most of our queries are very simple, of the type:
>>>
>>> Query query = new PrefixQuery(new Term(LABEL_FIELD, prefix));
>>> Hits hits = searcher.search(query, new Sort(new SortField(LABEL_FIELD)));
>>>
>>>
>>>      
>> You might want to check out Solr's ConstantScorePrefixQuery and compare
>> performance.
>>    
>
>
> I'm not familiar with Solr. It is not standard Lucene, is it?
>  
Sorry about that. Solr is a search server that is a subproject of the
Apache Lucene project. You can just copy the query class from Solr's source
code and use it with Lucene. ConstantScorePrefixQuery may be faster for
you than PrefixQuery, and it doesn't run into BooleanQuery's TooManyClauses
exception when your prefix matches too many terms in the index. Please
report back the speed difference if you can.

http://lucene.apache.org/solr/
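
For readers wondering why a constant-score prefix query can sidestep the clause limit, here is a rough plain-Java model of the idea. It is not Solr's source: it assumes a toy term dictionary (a `TreeMap` from term to doc ids, standing in for the index) and ORs the postings of every term that starts with the prefix into one `BitSet`, so no per-term query clause or per-term score is ever built. `PrefixBitSetSketch` and `matchPrefix` are hypothetical names.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of constant-score prefix matching (hypothetical names, not Solr code). */
final class PrefixBitSetSketch {

    /** Returns the set of doc ids containing at least one term that starts with the prefix. */
    static BitSet matchPrefix(TreeMap<String, int[]> termToDocs, String prefix) {
        BitSet matches = new BitSet();
        // Terms are sorted, so all terms sharing the prefix are contiguous,
        // starting at the first term >= prefix.
        SortedMap<String, int[]> tail = termToDocs.tailMap(prefix);
        for (Map.Entry<String, int[]> entry : tail.entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // walked past the prefix range
            }
            for (int doc : entry.getValue()) {
                matches.set(doc); // every match counts the same: a constant score
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put("apple", new int[] {0, 3});
        index.put("apricot", new int[] {1});
        index.put("banana", new int[] {2});
        // Prints the doc ids that hold "apple" or "apricot".
        System.out.println(matchPrefix(index, "ap"));
    }
}
```

The cost here grows with the number of matching postings, not with the number of query clauses, which is consistent with the behaviour described above.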

>
>  
>>> These sometimes return 10, 20, or even 40 thousand hits.
>>>
>>> I get good performance if hits.length() is 20,000 or less (less than 0.5
>>> seconds). However, if it is 40,000 or more, querying takes over a second,
>>> up to 2.5 seconds. The point is that this solution is not scaling.
>>> Any ideas I can try?
>>>
>>> I have already exhausted the ideas from
>>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>>
>>> I was reading about TopDocs and TopFieldDocs. Is this search method (using
>>> TopDocs) preferred over Hits? Also, there's no search method that returns them
>>> without taking a Filter; can I just pass null?
>>>
>>>
>>>      
>> It is preferred over Hits. Hits has been deprecated and you should really
>> migrate away from it.
>>    
>
>
> I tried to use it before, but it doesn't seem as straightforward as
> Hits. Is there example code somewhere?
>  
I think work was done on this when Hits was deprecated. Anyone know?



Re: Performance, yet again

hossman
In reply to this post by Andre Rubin

: I tried to use it before, but it doesn't seem as straightforward as
: Hits. Is there example code somewhere?

"SearchFiles.java" in the Lucene demo was updated to use TopDocCollector
when Hits was deprecated.
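
As a hedged illustration of what top-N collection buys you: in the 2.x-era API a call along the lines of `searcher.search(query, null, n, sort)` (a null Filter is accepted) returns only the best n results. The sketch below is plain Java, not Lucene code. It models the general idea of a top-N collector: every matching document streams past, but only the best n are kept in a bounded heap. `TopNSketch`, `ScoredDoc` and `collectTop` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of top-N hit collection (hypothetical names, not Lucene's classes). */
final class TopNSketch {

    /** A matching document and its score. */
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Keeps the n highest-scoring docs; ties go to the lower doc id. */
    static List<ScoredDoc> collectTop(int[] docs, float[] scores, int n) {
        // "Worse" hits sort first, so the heap root is the weakest hit kept so far.
        Comparator<ScoredDoc> worstFirst = (a, b) -> {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
        };
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(worstFirst);
        for (int i = 0; i < docs.length; i++) {
            heap.add(new ScoredDoc(docs[i], scores[i]));
            if (heap.size() > n) {
                heap.poll(); // evict the weakest hit; memory stays bounded by n
            }
        }
        List<ScoredDoc> top = new ArrayList<>(heap);
        top.sort(worstFirst.reversed()); // best hit first
        return top;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3, 4};
        float[] scores = {0.1f, 0.9f, 0.5f, 0.9f, 0.3f};
        for (ScoredDoc hit : collectTop(docs, scores, 3)) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

In a sorted search the comparison would be on the sort field's value rather than the score, but the bounded-heap idea is the same.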

: >> Is it possible to pre-sort the index, so I don't have to sort every time I
: >> perform a query?

Sorting by document order is very fast, so if you only care about one sort
order and you can add your docs in that order, that's one thing that can
help ... otherwise just reuse the same IndexReader as much as possible so
you don't waste the FieldCache.
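
A minimal sketch of the "add docs in sort order" idea, in plain Java rather than Lucene. It assumes doc ids are handed out in insertion order, which the array position models here, so a scan in doc-id order already yields label-sorted results and no per-query sort is needed. In Lucene itself the matching sort would be `Sort.INDEXORDER`; `IndexOrderSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a pre-sorted index (hypothetical names, not Lucene code). */
final class IndexOrderSketch {

    /** "Indexes" labels in sorted order; the array position plays the role of the doc id. */
    static String[] buildPresortedIndex(List<String> labels) {
        String[] byDocId = labels.toArray(new String[0]);
        Arrays.sort(byDocId); // pay for the sort once, at index time
        return byDocId;
    }

    /** Scans in doc-id order; hits come out label-sorted with no per-query sort. */
    static List<String> prefixSearch(String[] byDocId, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String label : byDocId) {
            if (label.startsWith(prefix)) {
                hits.add(label);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] index = buildPresortedIndex(Arrays.asList("pear", "peach", "apple", "pea"));
        System.out.println(prefixSearch(index, "pea"));
    }
}
```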

(If you have the luxury of adding docs in a set order that you search by,
odds are you aren't updating your index on the fly, so you should be able to
reuse an IndexReader and its FieldCaches for a while.)
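
To make the reuse advice concrete, a rough plain-Java model follows. It is not Lucene's implementation. Sort values are loaded once per reader instance and reused by later searches, so opening a fresh reader for every query means paying the load cost again. `SortCacheSketch`, `sortValues` and the plain `Object` standing in for an IndexReader are all hypothetical.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Toy model of a per-reader sort-value cache (hypothetical names, not Lucene code). */
final class SortCacheSketch {

    // Weak keys: once a reader is no longer referenced, its cached values can be collected.
    private final Map<Object, String[]> cache = new WeakHashMap<>();
    private int loads = 0;

    /** Returns the sort values for a reader, loading them only on first use. */
    synchronized String[] sortValues(Object reader, Function<Object, String[]> loader) {
        String[] values = cache.get(reader);
        if (values == null) {
            values = loader.apply(reader); // the expensive step: read every value once
            cache.put(reader, values);
            loads++;
        }
        return values;
    }

    /** Number of times the expensive load actually ran. */
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        SortCacheSketch cache = new SortCacheSketch();
        Object reader = new Object(); // stand-in for one long-lived IndexReader
        Function<Object, String[]> loader = r -> new String[] {"pea", "apple"};
        cache.sortValues(reader, loader);
        cache.sortValues(reader, loader); // served from the cache
        System.out.println("loads = " + cache.loadCount());
    }
}
```

The first sorted search against a given reader is the slow one; keeping that reader open is what makes the later ones cheap.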




-Hoss



Re: Performance, yet again

Andre Rubin
In reply to this post by Mark Miller-3
I've tested ConstantScorePrefixQuery and it hit the nail right on the head.
It's now mind-bogglingly fast! Even a query that has 200,000 matches took
under 0.5 seconds!

Thanks! :))


Andre

