NativeFSLockFactory, ConcurrentMergeScheduler: why locks?


NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

Fuad Efendi
1.       I always see the files lucene-xxxx-write.lock and
lucene-xxxx-n-write.lock, which I believe shouldn't appear when
NativeFSLockFactory is used.

2.       I use mergeFactor=100 and ramBufferSizeMB=256 with an index a few
GB in size. I have also tried mergeFactor=10 and mergeFactor=1000.

It seems ConcurrentMergeScheduler locks everything instead of using a
separate background thread...

As a result, my system spends half an hour UPDATING a million documents
(probably already existing in the index), then it stops and waits a few
hours for an index merge, which is extremely slow (a lot of deletes?).

With mergeFactor=1000 I had extremely fast index updates (50,000,000 on the
first day), and then I waited more than 2 days for the merge to complete
(and was forced to kill the process).

Why does it lock everything?

 

Thanks,

Fuad  

 


Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

Jason Rutherglen
Fuad,

The lock indicates to external processes that the index is in use; it does
not cause ConcurrentMergeScheduler to block.
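The advisory nature of the lock can be illustrated with the stdlib mechanism NativeFSLockFactory builds on, java.nio file locks. This is a simplified sketch of the idea only, not Lucene's actual code; the class and method names are mine:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

/**
 * Sketch of the OS-level locking idea behind NativeFSLockFactory:
 * an exclusive java.nio FileLock on a lock file tells OTHER processes
 * the index is in use; it does not block threads inside the owning
 * process (such as merge threads).
 */
public class NativeLockSketch {

    /** Try to mark the index as in use; returns false if another
     *  process already holds the lock. */
    public static boolean markInUse(File lockFile) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        try {
            FileChannel channel = raf.getChannel();
            FileLock lock = channel.tryLock(); // null if held elsewhere
            boolean acquired = (lock != null);
            if (acquired) {
                lock.release(); // Lucene releases on IndexWriter.close()
            }
            return acquired;
        } finally {
            raf.close(); // also closes the channel
        }
    }
}
```

Because the lock is inter-process only, a single writer's own merge threads never contend on it.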

ConcurrentMergeScheduler does merge in its own thread; however,
if the merges are large they can spike IO and CPU and make
the machine somewhat unresponsive.

What is the size of your index (in docs and GB)? How many
deletes are you performing? There are a few possible solutions
to these problems if you're able to separate the updating from
the searching onto different servers.

-J

On Tue, Aug 11, 2009 at 10:08 AM, Fuad Efendi<[hidden email]> wrote:


RE: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

Fuad Efendi
Hi Jason,

I am using Master/Slave (two servers);
I monitored it for a few hours today - 1 minute of document updates (about
100,000 documents), and then SOLR stops for at least 5 minutes to do
background jobs like RAM flush, segment merge...

Documents are small; about 10 GB of total index size for 50,000,000
documents.

I suspect "delete" is the main bottleneck for Lucene, since it marks
documents for deletion and then needs to rewrite the inverted indexes (in
effect, to optimize)...


I run "update" queries to update documents. I have a timestamp field, and in
many cases I need to update only the timestamp of an existing document (a
specific process periodically deletes expired documents, once a week) - but
I am still using the out-of-the-box /update instead of implementing a
specific document handler.

I could run it as a batch - for instance, collecting millions of documents
somewhere and removing duplicates before sending them to SOLR - but I prefer
to update a document several times during the day - it's faster (although I
encountered a problem...)


Thanks,
Fuad



-----Original Message-----
From: Jason Rutherglen [mailto:[hidden email]]
Sent: August-11-09 4:45 PM
To: [hidden email]
Subject: Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?


Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

Jason Rutherglen
> 1 minute of document updates (about 100,000 documents) and then SOLR stops

100,000 docs in a minute is a lot. Lucene is probably
automatically flushing to disk and merging, which is tying up the
IO subsystem. You may want to set ConcurrentMergeScheduler
to 1 thread (which currently cannot be done in Solr and requires a
custom class). This will minimize the number of threads
trying to merge at once, and may allow the merges to complete more
quickly: the sequential reads/writes will have longer to run
uninterrupted, whereas otherwise they can be interrupted by other
merges, causing excessive disk head movement.
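The one-merge-at-a-time idea can be sketched in plain Java with a single-threaded executor (illustrative only - ConcurrentMergeScheduler manages its own threads internally; the class name here is mine): merges submitted while one is running queue up instead of competing for the disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of serialized merging: all merge work runs on one background
 * thread, so concurrent merges never interleave their sequential
 * reads/writes and cause extra disk head movement.
 */
public class SingleThreadMerger {
    private final ExecutorService mergeThread = Executors.newSingleThreadExecutor();
    private final List<String> completed = new ArrayList<String>();

    /** Queue a merge; it runs after all previously submitted merges. */
    public void submitMerge(final String segmentName) {
        mergeThread.submit(new Runnable() {
            public void run() {
                // a real merge would do its sequential reads/writes here
                synchronized (completed) {
                    completed.add(segmentName);
                }
            }
        });
    }

    /** Wait for queued merges to finish and return completion order. */
    public List<String> shutdownAndGetOrder() throws InterruptedException {
        mergeThread.shutdown();
        mergeThread.awaitTermination(10, TimeUnit.SECONDS);
        synchronized (completed) {
            return new ArrayList<String>(completed);
        }
    }
}
```

With one worker thread, completion order always matches submission order, which is the "longer uninterrupted sequential IO" property described above.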

I'd look at using SSDs, however I am aware that business folks
typically are not fond of them!

> instead of implementing specific document handler

Implementing a custom handler is probably unnecessary?

> I am suspecting "delete" is main bottleneck for Lucene

How many deletes are you performing in a minute? Or is it
100,000? (Meaning the update above is an update call to Solr,
not an add.) 100,000 deletes is a lot as well.

Based on what you've said, Fuad, I'd add documents, queue up
deletes in a separate file (i.e. not in Solr/Lucene), then later
on send the deletes to Solr just prior to committing. This will allow
Lucene to focus on indexing only and create new segments, and then
apply the deletes only when the segments are somewhat stable
(i.e. not being merged at a rapid pace).
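The queue-the-deletes approach can be sketched in plain Java. This is illustrative only: the class name is mine, and the caller would send the drained batch of ids to Solr's /update handler as delete-by-id requests just before committing.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch: buffer delete requests outside Solr/Lucene while indexing
 * proceeds, then flush them in one batch just before commit.
 * LinkedHashSet collapses duplicate ids and preserves arrival order.
 */
public class DeleteQueue {
    private final Set<String> pendingIds = new LinkedHashSet<String>();

    /** Record a document id to delete later; duplicates are collapsed. */
    public void enqueue(String id) {
        pendingIds.add(id);
    }

    public int size() {
        return pendingIds.size();
    }

    /** Drain the queue; the caller sends these ids to Solr as
     *  deletes, then commits. */
    public List<String> drain() {
        List<String> batch = new ArrayList<String>(pendingIds);
        pendingIds.clear();
        return batch;
    }
}
```

Between commits Lucene then only adds documents and creates segments; the deletes arrive in one burst against segments that are no longer being merged rapidly.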

Feel free to post some more numbers.

On Tue, Aug 11, 2009 at 2:07 PM, Fuad Efendi<[hidden email]> wrote:


RE: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

Fuad Efendi
Thanks a lot, Jason!


I'll dig deeper into MergePolicy.

For now I am using ramBufferSizeMB=8192 and mergeFactor=10, and it looks
like I get a steady few thousand docs per second with very rare merges
(3 hours so far, 8 GB index size, > 30 million docs).
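For reference, the settings discussed in this thread map onto Solr 1.x solrconfig.xml roughly like this (a sketch: only the two parameters from the message are shown, placed in the standard indexDefaults section):

```xml
<!-- solrconfig.xml (Solr 1.x era): index tuning from this thread -->
<indexDefaults>
  <!-- large RAM buffer: fewer flushes, larger initial segments -->
  <ramBufferSizeMB>8192</ramBufferSizeMB>
  <!-- moderate merge factor: merges stay small enough to finish -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```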

I don't do "delete"; I execute SOLR /update and use a unique ID (so SOLR
probably deletes the old version of the doc). I was thinking about moving to
the Nutch architecture (separate processes and schedules for data
preparation and document indexing) and pure Lucene, but SOLR performs well
now...

-Fuad

-----Original Message-----
From: Jason Rutherglen [mailto:[hidden email]]
Sent: August-11-09 8:48 PM
To: [hidden email]
Subject: Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?
