Inconsistent Search Speed

fangz
Hi,

I am using a simple Java program to test search speed. The index is about 1.93 GB in size. I created an IndexSearcher and built a query with the QueryParser: parser.parse("entity:fail"). The initial run took more than 60 seconds, but subsequent runs took only 1.5 seconds. This happens whether or not I call IndexSearcher.close(). As far as I know, Lucene does not cache results (no filter is involved). So what is causing such a big speed difference?

Thank you in advance!

fangz

Re: Inconsistent Search Speed

Grant Ingersoll-2
The first call loads various data structures into memory.  The second  
takes advantage of those structures being in memory.  What you want to  
do is "warm" the searcher by sending some queries to it before making  
it available.

-Grant
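Grant's advice to warm a searcher before making it available can be sketched as follows. This is a library-agnostic sketch under stated assumptions: `QueryRunner` is a hypothetical interface standing in for whatever wraps your IndexSearcher, and `WarmingSearcherHolder` is likewise hypothetical. A newly opened searcher runs a few representative queries first, and only then is it published through an `AtomicReference` that request threads read from, so the slow first run is paid by the warm-up instead of by a user.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a searcher: runs one query string and returns a hit count.
interface QueryRunner {
    int run(String query);
}

// Holds the searcher that live requests use. A new searcher is warmed
// before it is published, so no user request pays the first-query cost.
class WarmingSearcherHolder {
    private final AtomicReference<QueryRunner> current = new AtomicReference<>();

    // Runs each warm-up query once against the candidate, then makes it live.
    // Returns the number of warm-up queries executed.
    int warmAndPublish(QueryRunner candidate, List<String> warmupQueries) {
        int executed = 0;
        for (String q : warmupQueries) {
            candidate.run(q); // result discarded; only the loaded data structures matter
            executed++;
        }
        current.set(candidate); // publish only after warming has finished
        return executed;
    }

    // Request threads call this to get the currently published searcher (null before the first publish).
    QueryRunner get() {
        return current.get();
    }
}
```

It is probably best if the warm-up queries resemble real traffic (same fields, same sorting), since what gets loaded into memory depends on what a query touches.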

On Feb 26, 2008, at 3:49 PM, fangz wrote:


--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Inconsistent Search Speed

Eric Th
In reply to this post by fangz
Did you use the same keywords in both calls?

2008/2/27, fangz <[hidden email]>:


Re: Inconsistent Search Speed

fangz
In reply to this post by fangz
Thank you for the info.  It makes sense.

My search returns more than 10,000 documents, and I have to loop through all of them to build a list of unique field values. Looping through the hits seems to take most of the time in the initial run, but afterwards it becomes much faster. If the number of hits is relatively small, I do not see the same behavior.

Re: Inconsistent Search Speed

Mark Miller-3
The Lucene prime directive: don't iterate through all of Hits! It's
horribly inefficient. You must use a HitCollector. Even so, fetching
field values for every hit will be slow no matter what, and you don't
want to do that for every hit in a search. But don't loop through Hits.
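A minimal sketch of the HitCollector idea applied to fangz's goal of gathering the unique values of one field. To stay self-contained it does not use Lucene: `SimpleHitCollector` only mirrors the shape of `HitCollector.collect(int doc, float score)`, and the `valuesByDoc` array is an assumption standing in for a per-document value array, such as the one Lucene 2.x's `FieldCache.DEFAULT.getStrings(reader, field)` returns for an untokenized field. Each hit then costs an array lookup instead of loading a stored Document or a term vector.

```java
import java.util.Set;
import java.util.TreeSet;

// Mirrors the shape of Lucene's HitCollector callback; not the real class.
abstract class SimpleHitCollector {
    abstract void collect(int doc, float score);
}

// Gathers the distinct values of a single field across all matching documents.
class UniqueValueCollector extends SimpleHitCollector {
    private final String[] valuesByDoc;                 // field value indexed by document id (assumed input)
    private final Set<String> unique = new TreeSet<>(); // sorted, de-duplicated values

    UniqueValueCollector(String[] valuesByDoc) {
        this.valuesByDoc = valuesByDoc;
    }

    @Override
    void collect(int doc, float score) {
        String value = valuesByDoc[doc]; // constant-time lookup, no document loaded
        if (value != null) {             // documents without the field are skipped
            unique.add(value);
        }
    }

    Set<String> uniqueValues() {
        return unique;
    }
}
```

In real code the collector would extend `org.apache.lucene.search.HitCollector` and be passed to `searcher.search(query, collector)`. Building the FieldCache array is itself a one-time cost per IndexReader, which is another thing a warm-up query can absorb.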


Re: Inconsistent Search Speed

Grant Ingersoll-2
You could also look at using a FieldSelector when getting the Document,
so that you only load the one field you need.

-Grant
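A simplified model of what a FieldSelector does. The enum and interface below are hypothetical stand-ins that only mirror the names of Lucene 2.x's `FieldSelectorResult` values and `FieldSelector.accept(String)`; they are not the Lucene classes. The loader walks a document's stored fields in order, skips the ones the selector rejects, and stops as soon as a field marked `LOAD_AND_BREAK` has been read.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of Lucene's FieldSelectorResult values (not the real class).
enum SelectorResult { LOAD, NO_LOAD, LOAD_AND_BREAK }

// Simplified model of a field selector: decides, per stored field, what to do.
interface SimpleFieldSelector {
    SelectorResult accept(String fieldName);
}

class StoredDocLoader {
    int fieldsRead = 0; // how many stored values were actually materialized

    // storedFields models one document's stored fields in stored order,
    // each entry being {name, value}.
    Map<String, String> load(List<String[]> storedFields, SimpleFieldSelector selector) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (String[] field : storedFields) {
            SelectorResult r = selector.accept(field[0]);
            if (r == SelectorResult.NO_LOAD) {
                continue; // skip this value without materializing it
            }
            doc.put(field[0], field[1]);
            fieldsRead++;
            if (r == SelectorResult.LOAD_AND_BREAK) {
                break; // the wanted field is loaded; stop scanning the rest
            }
        }
        return doc;
    }
}
```

Against the real API the equivalent is roughly `reader.document(doc, new MapFieldSelector(new String[] {"field"}))`; check the javadoc for your Lucene version. This only affects stored fields; it does not change the cost of reading term vectors.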

On Feb 26, 2008, at 10:13 PM, Mark Miller wrote:

> The Lucene prime directive: dont iterate through all of Hits! Its  
> horribly inefficient. You must use a hitcollector. Even still,  
> getting your field values will be slow no matter what if you get for  
> every hit. You don't want to do this for every hit in a search. But  
> don't loop through Hits.
>
> fangz wrote:
>> Thank you for the info.  It makes sense.
>> My search will return more than 10000 documents and I have to loop  
>> through
>> all documents to build a list with unique field values. It seems  
>> that the
>> looping of the hits takes the longest time in the initial run but  
>> afterwards
>> it becomes much faster. If the hits are relatively small, I do not  
>> see the
>> same behavior.
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Inconsistent Search Speed

Erick Erickson
To reinforce Grant's comment, lazy loading improved one situation for me
by roughly 10X. I wrote it up and it's somewhere on the Wiki. Your results
will vary, and unless you have a LOT of stored fields I wouldn't necessarily
expect a similar speedup, but it's sure worth looking at.

And don't iterate through the Hits object for more than 100 or so hits. Like
Mark said. Really. Really don't <G>...

Best
Erick
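Erick's lazy-loading point, sketched as a stand-alone model: the hypothetical `LazyField` class below defers the expensive read of a stored value until the first call to `stringValue()` and then caches it, which is roughly what requesting `FieldSelectorResult.LAZY_LOAD` for a field asks Lucene 2.x to do. The `Supplier` is an assumption representing the actual read from the index.

```java
import java.util.function.Supplier;

// Model of a lazily loaded field: the expensive read happens on first access only.
class LazyField {
    private final Supplier<String> reader; // performs the (expensive) read of the stored value
    private String value;
    private boolean loaded = false;
    int loads = 0; // number of times the underlying read actually ran

    LazyField(Supplier<String> reader) {
        this.reader = reader;
    }

    String stringValue() {
        if (!loaded) {          // first access: do the read and remember the result
            value = reader.get();
            loaded = true;
            loads++;
        }
        return value;           // later accesses reuse the cached value
    }
}
```

This model is not thread-safe; it only illustrates why a field that is never accessed never pays for its read.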

On Wed, Feb 27, 2008 at 7:33 AM, Grant Ingersoll <[hidden email]> wrote:


Re: Inconsistent Search Speed

fangz
In reply to this post by fangz
I implemented a HitCollector as you suggested. It improved the initial run significantly, but it showed only a slight improvement in subsequent runs. I don't know how to use a FieldSelector in my situation. My code looks like this:

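public void collect(int doc, float score) {
    try {
        // Load the term vector stored for "field" in this hit's document
        TermFreqVector vector = searcher.getIndexReader().getTermFreqVector(doc, "field");
        ...
    } catch (IOException e) { // collect() cannot throw checked exceptions
        throw new RuntimeException(e);
    }
}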

Thank you again!

fangz

Re: Inconsistent Search Speed

Grant Ingersoll-2
Ah, you didn't mention term vectors.  What do you need them for?  
Perhaps a bit more background could help here.

-Grant

On Feb 27, 2008, at 1:31 PM, fangz wrote:


Re: Inconsistent Search Speed

Daniel Noll-3-2
In reply to this post by Erick Erickson
On Thursday 28 February 2008 01:52:27 Erick Erickson wrote:
> And don't iterate through the Hits object for more than 100 or so hits.
> Like Mark said. Really. Really don't <G>...

Is there a good trick for avoiding this?

Say you have a situation like this...
  - User searches
  - User sees first N hits, perhaps scrolls
  - User chooses to save results to a file

Clearly, for the first two steps, using Hits is normal. For the third step
you would be iterating over a potentially much larger number of results, so
Hits is not recommended. But implementing a HitCollector from scratch to get
the same results as Hits seems silly, so what is the usual way out of this?
Do you re-execute the query using TopDocs? Or do you call
hitDoc(hits.length()) to force Hits itself to load the remainder, and then go
back to the start and iterate through?

Using TopDocs up front would be desirable, but it turns out that it allocates
the maximum number of results you pass in, up front...

Daniel
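One possible answer to Daniel's allocation concern, sketched without Lucene: a collector that grows its buffers as hits arrive, instead of allocating for a requested maximum, and sorts once at the end into descending-score order with ties broken by ascending document id. `AllHitsCollector` is hypothetical; in Lucene 2.x it would extend `HitCollector` and be passed to `searcher.search(query, collector)`. Memory then scales with the number of actual hits, at the cost of an O(n log n) sort.

```java
import java.util.Arrays;

// Collects every hit into growable arrays (no up-front allocation sized to a
// maximum), then ranks them by descending score, ties by ascending doc id.
class AllHitsCollector {
    private int[] docs = new int[16];
    private float[] scores = new float[16];
    private int size = 0;

    void collect(int doc, float score) {
        if (size == docs.length) { // grow geometrically, like ArrayList does
            docs = Arrays.copyOf(docs, size * 2);
            scores = Arrays.copyOf(scores, size * 2);
        }
        docs[size] = doc;
        scores[size] = score;
        size++;
    }

    int totalHits() {
        return size;
    }

    // Returns document ids in ranked order; positions are boxed only for sorting.
    int[] rankedDocs() {
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> {
            int cmp = Float.compare(scores[b], scores[a]); // higher score first
            return cmp != 0 ? cmp : Integer.compare(docs[a], docs[b]); // then lower doc id
        });
        int[] result = new int[size];
        for (int i = 0; i < size; i++) result[i] = docs[order[i]];
        return result;
    }
}
```

If the export does not need ranking at all, skipping `rankedDocs()` and writing documents in collection order avoids the sort entirely.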
