concurrent optimize and update

Jeremy Hinegardner
Hi all,

What happens internally in Solr when an optimize/commit request is submitted by
one process, and some other process starts submitting XML documents to add?  Is
this generally a safe thing to do?

Basically I'm continually adding documents to Solr, and decided that <autoCommit>
would be a good thing for me to use, so I have it set to commit every 25000 docs
or every 15 minutes.  Now I want to run an optimize every 24 hours or so, and I was
going to cron that up, but do I also need to stop the indexing processes from
submitting XML docs to the update handler while the optimize is taking place?

enjoy,

-jeremy

--
========================================================================
 Jeremy Hinegardner                              [hidden email]

Re: concurrent optimize and update

Yonik Seeley-2
On Mon, Aug 11, 2008 at 6:16 PM, Jeremy Hinegardner
<[hidden email]> wrote:
> What happens internally in Solr when an optimize/commit request is submitted by
> one process, and some other process starts submitting XML documents to add?  Is
> this generally a safe thing to do?

It's safe... the adds will block until the commit or optimize has finished.

-Yonik
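
To make the blocking behaviour concrete, here is a minimal Python sketch of a client that posts XML update messages. It is an illustration under assumptions, not anything posted in this thread: the URL, the helper names `add_message` and `post_update`, and the field names in the usage note are all hypothetical. The point is only that each call returns when Solr answers, so an add sent during an optimize waits until the optimize finishes.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed location of the update handler; adjust to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def add_message(doc):
    """Build an <add> update message from a dict of field name -> value."""
    fields = "".join(
        "<field name={}>{}</field>".format(quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>{}</doc></add>".format(fields)


def post_update(body, url=SOLR_UPDATE_URL, timeout=None):
    """POST one XML update message and return the HTTP status code.

    The call is synchronous: it returns only when the server responds.
    timeout=None waits indefinitely; pass a number of seconds to give up
    earlier on the client side.
    """
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With this in place, one process might call `post_update("<optimize/>")` while another calls `post_update(add_message({"id": "42", "title": "example"}))`; the second call would simply take longer to return.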

Re: concurrent optimize and update

Jason Rennie-2
On Mon, Aug 11, 2008 at 6:41 PM, Yonik Seeley <[hidden email]> wrote:

> It's safe... the adds will block until the commit or optimize has finished.
>

By block, do you mean that the update connection(s) will be held open?  Our
optimizes take many minutes to complete.  I'm thinking that this could cause
a large pile of threads to accumulate if we're not careful...

Jason

Re: concurrent optimize and update

Yonik Seeley-2
On Tue, Aug 12, 2008 at 11:19 AM, Jason Rennie <[hidden email]> wrote:
> On Mon, Aug 11, 2008 at 6:41 PM, Yonik Seeley <[hidden email]> wrote:
>
>> It's safe... the adds will block until the commit or optimize has finished.
>>
>
> By block, do you mean that the update connection(s) will be held open?

HTTP calls are synchronous, so yes it will hold a connection open
(unless the container is configured to time out responses after a
while).

> Our
> optimizes take many minutes to complete.  I'm thinking that this could cause
> a large pile of threads to accumulate if we're not careful...

Many HTTP clients block at a certain number of open connections to the
same server, acting as a natural throttling mechanism.

-Yonik
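
If the HTTP client in use does not cap its own connections, the same throttling effect can be approximated in the indexing code. The sketch below is one possible way to do it in Python, assuming a hypothetical `send` callable that performs the actual HTTP POST; the class name `BoundedSubmitter` and the default limit of 4 are illustrative choices, not anything from Solr.

```python
import threading


class BoundedSubmitter:
    """Allow at most max_in_flight concurrent calls to send().

    Callers beyond the limit wait for a free slot instead of opening
    another connection, so blocked adds cannot pile up without bound.
    """

    def __init__(self, send, max_in_flight=4):
        self._send = send  # hypothetical callable that POSTs one message
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def submit(self, body):
        # Take a slot, run the blocking request, then release the slot
        # even if send() raises.
        with self._slots:
            return self._send(body)
```

During a long optimize, at most `max_in_flight` requests would be held open against the server; every other producer thread waits locally on the semaphore.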

Re: concurrent optimize and update

Jeremy Hinegardner
On Tue, Aug 12, 2008 at 11:51:12AM -0400, Yonik Seeley wrote:

> On Tue, Aug 12, 2008 at 11:19 AM, Jason Rennie <[hidden email]> wrote:
> > On Mon, Aug 11, 2008 at 6:41 PM, Yonik Seeley <[hidden email]> wrote:
> >
> >> It's safe... the adds will block until the commit or optimize has finished.
> >>
> >
> > By block, do you mean that the update connection(s) will be held open?
>
> HTTP calls are synchronous, so yes it will hold a connection open
> (unless the container is configured to time out responses after a
> while).
>
> > Our
> > optimizes take many minutes to complete.  I'm thinking that this could cause
> > a large pile of threads to accumulate if we're not careful...
>
> Many HTTP clients block at a certain number of open connections to the
> same server, acting as a natural throttling mechanism.

The route we've taken is to use <autoCommit> set to 25000 docs or 15 minutes in
solrconfig.xml and continually add new data.  Then once a night we stop adding
new data and run an optimize.  For us this takes about 90 minutes across all
our cores.

So far this is working well.

enjoy,

-jeremy

--
========================================================================
 Jeremy Hinegardner                              [hidden email]
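
The nightly routine described above (stop adding, optimize, resume) can be sketched as a simple gate. This is a hypothetical single-process Python illustration, not the setup actually used in this thread: `IndexingGate` and its methods are invented names, and indexers running as separate processes would need a cross-process equivalent such as a lock file.

```python
import threading


class IndexingGate:
    """Gate that indexer threads check before sending each batch."""

    def __init__(self):
        self._open = threading.Event()
        self._open.set()  # indexing is allowed by default

    def wait_until_open(self, timeout=None):
        """Block while maintenance is running; return True once open."""
        return self._open.wait(timeout)

    def run_exclusive(self, task):
        """Close the gate, run task (e.g. the optimize), then reopen it.

        Adds already in flight are not drained here; per the discussion
        above, Solr itself blocks them until the optimize has finished.
        """
        self._open.clear()
        try:
            return task()
        finally:
            self._open.set()
```

An indexer loop would call `gate.wait_until_open()` before each batch, and the nightly job would wrap its optimize request in `gate.run_exclusive(...)`.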