Control segment size

vivek sar
Hi,

Is there any configuration to control the size of segment files in
Solr? Currently I have an index (70 GB) with 80 segment files, one of
which is 24 GB. We noticed that in some cases a commit takes over 2
hours to complete (committing 50K records), whereas it usually
finishes in 20 seconds. After further investigation it turned out the
system was doing a lot of paging: the file system buffer was trying to
write the big segment back to disk. I have 20 GB of memory on the
system, with 6 GB assigned to the Solr instance (running 2 instances).

It seems that if I can limit the segment size to a maximum of 4-5 GB
I'll be OK. Is there any way to do so?

I have a merge factor of 100; does that affect the size too? Why do
different segments have different sizes?

Thanks,
-vivek
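Taking the figures in this post at face value, a quick memory budget suggests why writing the large segment paged so heavily. A minimal Python sketch, assuming each of the two Solr instances has a full 6 GB heap (the post could also be read as 6 GB in total) and ignoring OS and non-heap JVM memory; `page_cache_headroom_gb` is a hypothetical helper:

```python
# Back-of-the-envelope memory budget for the machine described above.
# Assumptions (not stated exactly in the post): each of the 2 Solr JVMs
# uses a full 6 GB heap, and everything else is ignored, so the remainder
# approximates what the OS can devote to its file-system cache.
TOTAL_RAM_GB = 20
SOLR_INSTANCES = 2
HEAP_PER_INSTANCE_GB = 6
LARGEST_SEGMENT_GB = 24


def page_cache_headroom_gb(total_ram_gb: int, instances: int, heap_gb: int) -> int:
    """Hypothetical helper: RAM left for the OS after subtracting the JVM heaps."""
    return total_ram_gb - instances * heap_gb


headroom = page_cache_headroom_gb(TOTAL_RAM_GB, SOLR_INSTANCES, HEAP_PER_INSTANCE_GB)
print(f"Left for OS and file-system cache: ~{headroom} GB")
print(f"Largest segment ({LARGEST_SEGMENT_GB} GB) fits in that: {LARGEST_SEGMENT_GB <= headroom}")
```

Under these assumptions the 24 GB segment is about three times the memory left for caching, which is consistent with the heavy paging observed when that segment was written.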

Re: Control segment size

Otis Gospodnetic

Hi,

You are looking for maxMergeDocs, I believe.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----

> From: vivek sar <[hidden email]>
> To: [hidden email]
> Sent: Thursday, April 23, 2009 1:08:20 PM
> Subject: Control segment size

Re: Control segment size

vivek sar
Thanks, Otis.

I did set maxMergeDocs to 10M, but I still see a couple of index
files over 30 GB, which doesn't match the maximum number of documents.
Here are some numbers:

1) My total index size = 66 GB
2) Total number of documents = 200M
3) 1M docs = 300 MB
4) 10M docs should be roughly 3-4 GB

In the index directory I see:

-rw-r--r--   1 dssearch  staff  31771545312 May  6 14:15 _2tp.cfs
-rw-r--r--   1 dssearch  staff  31932190573 May  7 08:13 _5ne.cfs
-rw-r--r--   1 dssearch  staff    543118747 May  7 08:32 _5p2.cfs
-rw-r--r--   1 dssearch  staff    543124452 May  7 08:53 _5qr.cfs
-rw-r--r--   1 dssearch  staff    543100201 May  7 09:18 _5sg.cfs
..
..

As you can see, a couple of files are huge. Are those documents or index
files? How can I control the file size so that no single file grows
beyond 10 GB?

Thanks,
-vivek
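The numbers in this message can be checked against the directory listing. A minimal Python sketch that converts each listed file size into an estimated document count, assuming "MB" means 10^6 bytes and that the quoted ratio of 1M docs per 300 MB holds uniformly for every segment (Shalin later points out that it need not):

```python
# Estimate documents per .cfs file from the ratio quoted above
# (1M docs ~= 300 MB). Assumptions: MB means 10**6 bytes, and the ratio
# holds for every segment, which is only approximate.
BYTES_PER_DOC = 300e6 / 1e6   # 300 bytes per document
MAX_MERGE_DOCS = 10_000_000

segment_bytes = {  # sizes copied from the directory listing above
    "_2tp.cfs": 31_771_545_312,
    "_5ne.cfs": 31_932_190_573,
    "_5p2.cfs": 543_118_747,
    "_5qr.cfs": 543_124_452,
    "_5sg.cfs": 543_100_201,
}

print(f"Expected size of a {MAX_MERGE_DOCS:,}-doc segment: "
      f"{MAX_MERGE_DOCS * BYTES_PER_DOC / 1e9:.1f} GB")
for name, size in segment_bytes.items():
    est_docs = size / BYTES_PER_DOC
    status = "exceeds" if est_docs > MAX_MERGE_DOCS else "within"
    print(f"{name}: {size / 1e9:5.1f} GB, ~{est_docs / 1e6:5.1f}M docs "
          f"({status} maxMergeDocs)")
```

With that ratio a 10M-document segment comes to about 3 GB, in line with the estimate above, while each ~32 GB file would correspond to roughly 106M documents. The ratio alone cannot reconcile those files with a 10M cap, which is exactly the mismatch being asked about.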



On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic
<[hidden email]> wrote:

>
> Hi,
>
> You are looking for maxMergeDocs, I believe.
>
> Otis

Re: Control segment size

Shalin Shekhar Mangar
On Fri, May 8, 2009 at 1:30 AM, vivek sar <[hidden email]> wrote:

>
> I did set maxMergeDocs to 10M, but I still see a couple of index
> files over 30 GB, which doesn't match the maximum number of documents.
> Here are some numbers:
>
> 1) My total index size = 66 GB
> 2) Total number of documents = 200M
> 3) 1M docs = 300 MB
> 4) 10M docs should be roughly 3-4 GB
>
> As you can see, a couple of files are huge. Are those documents or index
> files? How can I control the file size so that no single file grows
> beyond 10 GB?
>

No, there is no way to limit an individual file to a specific size.

--
Regards,
Shalin Shekhar Mangar.

Re: Control segment size

vivek sar
Shalin,

Here is what I've read about maxMergeDocs:

"While merging segments, Lucene will ensure that no segment with more
than maxMergeDocs is created."

Wouldn't that mean that no index file should contain more than
maxMergeDocs documents? I guess the index files could also contain
index information that is not limited by any property; is that true?

Is there any workaround to limit the index size, besides limiting the
index itself?

Thanks,
-vivek

On Fri, May 8, 2009 at 10:02 PM, Shalin Shekhar Mangar
<[hidden email]> wrote:

> No, there is no way to limit an individual file to a specific size.

Re: Control segment size

Shalin Shekhar Mangar
On Tue, May 12, 2009 at 2:30 AM, vivek sar <[hidden email]> wrote:

> Here is what I've read about maxMergeDocs:
>
> "While merging segments, Lucene will ensure that no segment with more
> than maxMergeDocs is created."
>
> Wouldn't that mean that no index file should contain more than
> maxMergeDocs documents? I guess the index files could also contain
> index information that is not limited by any property; is that true?
>

Yes, an individual segment will not contain more than maxMergeDocs
documents. However, segment sizes can still vary, because some documents
may have more unique tokens than others.
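The point that maxMergeDocs caps a merged segment's document count but not its byte size can be shown with a toy rule. This is a hypothetical Python simplification, not Lucene's merge policy code; `Segment`, `may_merge` and `merge`, and the per-document sizes (about 300 bytes versus about 3 KB), are invented for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    docs: int         # number of documents in the segment
    size_bytes: int   # total on-disk size of the segment's files


def may_merge(candidates: list[Segment], max_merge_docs: int) -> bool:
    """Hypothetical simplified check: allow the merge only if the result
    would hold at most max_merge_docs documents. Byte size is never consulted."""
    return sum(s.docs for s in candidates) <= max_merge_docs


def merge(candidates: list[Segment]) -> Segment:
    """Combine segments; doc counts and byte sizes simply add up (a simplification)."""
    return Segment(
        docs=sum(s.docs for s in candidates),
        size_bytes=sum(s.size_bytes for s in candidates),
    )


MAX_MERGE_DOCS = 10_000_000

# Two hypothetical batches of ten 1M-doc segments: one with small documents
# (~300 bytes each), one with token-rich documents (~3 KB each).
small_docs = [Segment(1_000_000, 300 * 1_000_000) for _ in range(10)]
large_docs = [Segment(1_000_000, 3_000 * 1_000_000) for _ in range(10)]

for label, batch in [("small docs", small_docs), ("large docs", large_docs)]:
    if may_merge(batch, MAX_MERGE_DOCS):
        seg = merge(batch)
        print(f"{label}: {seg.docs:,} docs, {seg.size_bytes / 1e9:.0f} GB")
```

Both merges pass the same document-count check, yet one produces a segment ten times larger than the other.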

What you saw originally must have been a segment merge, which is normal
and happens in the course of indexing. I don't think there's a way to
avoid that other than setting a ridiculously high mergeFactor (which will
hurt search performance).
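Why an occasional commit is so much slower than the rest, and why only an extreme mergeFactor avoids it, can be illustrated with a toy simulation. This is a hypothetical Python model loosely inspired by mergeFactor, not Lucene's actual merge policy: `simulate_merges` is invented for illustration, every flush is treated as a unit-sized segment, and maxMergeDocs, deletions and real segment sizes are ignored:

```python
from __future__ import annotations


def simulate_merges(num_flushes: int, merge_factor: int) -> tuple[list[int], int]:
    """Toy model of level-based merging (not Lucene's real code).

    Each flush adds one segment of size 1. Whenever merge_factor segments
    sit on the same level, they are merged into one segment on the next
    level, so a level-i segment has size merge_factor ** i.

    Returns (units rewritten by merges on each flush, max live segment count).
    """
    levels = [0]            # levels[i] = number of segments on level i
    work_per_flush = []
    max_segments = 0
    for _ in range(num_flushes):
        levels[0] += 1
        work = 0
        level = 0
        # Cascade: a merge on one level may fill up the next level.
        while levels[level] == merge_factor:
            levels[level] = 0
            work += merge_factor ** (level + 1)  # size of the merged segment
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
        work_per_flush.append(work)
        max_segments = max(max_segments, sum(levels))
    return work_per_flush, max_segments


for mf in (10, 100, 100_000):
    work, max_segs = simulate_merges(10_000, mf)
    merging_flushes = sum(1 for w in work if w > 0)
    print(f"mergeFactor={mf}: {merging_flushes} of {len(work)} flushes merge, "
          f"largest single flush rewrites {max(work)} units, "
          f"up to {max_segs} segments live at once")
```

In this toy model most flushes do no merge work, but an occasional flush triggers a cascade that rewrites most of the index. Raising mergeFactor makes such cascades rarer but leaves more segments live at once, and only a value larger than the number of flushes avoids merging entirely, at the cost of one segment per flush.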

--
Regards,
Shalin Shekhar Mangar.