Search Performance When There Are Many Segments

Search Performance When There Are Many Segments

fireofenigma
Let me start with an example application/scenario.

I have an application that lets users upload documents, which are eventually added to the index. I call commit() after every 10 documents. Do I ever need to call optimize() myself, or does Solr call optimize() by default at some point? Either way, if optimize() is never called after, say, 1000 calls to commit(), each adding 10 documents to the index, does that have an adverse effect on search speed?

Let's assume I'm using a compound-file index.

Any insight is greatly appreciated. Thanks!

Re: Search Performance When There Are Many Segments

Otis Gospodnetic-2
Hi,

If your mergeFactor is "reasonable" (e.g. the default of 10), Lucene will keep the number of segments in the index under control. Your index will not be optimized at all times, but the number of segments will not be astronomical, and not having a single-segment (i.e. optimized) index will not cause you headaches. I have some large indices over at simpy.com and I *never* optimize them. They grow and shrink as new docs are added to them, but never explode.
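
To illustrate why the segment count stays bounded, here is a minimal Python sketch of a simplified level-based merge: every commit flushes one small segment, and whenever mergeFactor segments pile up at the same level they are merged into a single segment at the next level. This is a toy model under those assumptions, not Lucene's actual merge policy (which works on segment sizes and depends on other settings too). The function name simulate_commits and its parameters are made up for illustration.

```python
def simulate_commits(num_commits, docs_per_commit=10, merge_factor=10):
    """Return the doc counts of the live segments after num_commits commits.

    Toy model (an assumption, not Lucene's real policy): each commit adds one
    segment at level 0; once a level holds merge_factor segments, they are
    merged into one segment at the next level, possibly cascading upward.
    """
    levels = [[]]  # levels[i] holds the doc counts of segments at level i
    for _ in range(num_commits):
        levels[0].append(docs_per_commit)
        level = 0
        # Cascade merges upward while a level is full.
        while len(levels[level]) >= merge_factor:
            merged = sum(levels[level])
            levels[level] = []
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1
    return [size for segments in levels for size in segments]


if __name__ == "__main__":
    # The scenario from the question: 1000 commits of 10 documents each.
    peak = max(len(simulate_commits(n)) for n in range(1, 1001))
    final = simulate_commits(1000)
    print("peak segment count:", peak)
    print("segments after 1000 commits:", final)
```

In this model the scenario from the question never has more than 27 segments at once (reached after commit 999), and commit 1000 collapses everything into a single 10,000-document segment. Real numbers will differ, but the growth is logarithmic rather than one segment per commit.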

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----

> From: fireofenigma <[hidden email]>
> To: [hidden email]
> Sent: Tuesday, February 19, 2008 6:30:34 PM
> Subject: Search Performance When There Are Many Segments
>
> --
> View this message in context:
> http://www.nabble.com/Search-Performance-When-There-Are-Many-Segments-tp15578740p15578740.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Search Performance When There Are Many Segments

Mike Klaas
And yet, if you are experiencing performance problems, consider optimizing regularly. If not, why worry?
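
For anyone who does want to optimize on a schedule, one way is to POST an <optimize/> message to Solr's XML update handler, for example from a cron job. The sketch below only builds the request. The base URL http://localhost:8983/solr and the helper name build_update_request are assumptions for illustration, so adjust the host, port, and path (newer Solr versions include a core or collection name) to match your install.

```python
import urllib.request


def build_update_request(command, base_url="http://localhost:8983/solr"):
    """Build a POST request carrying a Solr XML update command.

    base_url is an assumed default; point it at your own Solr instance.
    Only the two commands discussed in this thread are accepted.
    """
    if command not in ("commit", "optimize"):
        raise ValueError("unsupported command: " + command)
    body = f"<{command}/>".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/update",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_update_request("optimize")
    print(request.get_method(), request.full_url, request.data)
```

To actually send it, pass the request to urllib.request.urlopen() while Solr is running. Optimizing rewrites the whole index, so it is usually better run off-peak than after every commit.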

-Mike

In reply to Otis Gospodnetic's message of 19-Feb-08, 4:17 PM.