Multiple instances of Lucene IndexWriter

David K-2
We are currently evaluating Lucene for document indexing and a question came up regarding multiple instances of IndexWriter possibly accessing the same index (directory).

This would be a consequence of multiple instances of our application possibly accessing the same index, where multiple instances are used for load balancing and failover of the application.

The index could be on a local drive, when virtualization is used to run multiple instances on a single box. It could also be on a shared drive (Windows file sharing), with multiple server instances trying to update it.

I have been looking around in the forums, and writing to the same index from multiple instances of IndexWriter is always advised against, but I was wondering whether the group has any suggestions for workarounds. Surely there must be other load-balanced applications using Lucene?

Some of the workarounds I can think of off the top of my head:

1. Each instance writes to a local index, and these local indexes are merged periodically into a shared index where searching is performed.

2. Implement our own queuing algorithm by testing for write locks and waiting until the locks are cleared.

thank you,
David

Re: Multiple instances of Lucene IndexWriter

Erik Hatcher
David,

Have a look at Solr, http://lucene.apache.org/solr - it addresses  
this issue and many others that you would likely encounter with using  
pure Lucene.

        Erik


On Oct 12, 2007, at 6:26 AM, David K wrote:

> [...]


Re: Multiple instances of Lucene IndexWriter

David K-2
Thank you for the quick response but at the moment we are interested in our own (small) usage of Lucene. It may be that in the future it turns out that Solr is the solution we need.

At the moment I was hoping for a more descriptive workaround for the issue of using multiple instances of IndexWriter on the same index.




Re: Multiple instances of Lucene IndexWriter

Fredrik Andersson-2-2
What you suggested is generally the most easygoing way to deal with
it, i.e. having a separate index per writer and one serial merging
process. I have dabbled with disabling (file system) locks and
synchronizing the writing processes by different means, but it's
failure-prone unless you're very familiar with the Lucene internals.
So, if it isn't a big hassle to create a serial merger (it depends mostly
on your hardware/communication setup, I guess), I would recommend that.
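
To make the "one serial merging process" idea concrete, here is a minimal Java sketch, not anything from Lucene itself. The class name SerialMerger and the mergeStep callback are hypothetical. The sketch only shows how merge requests from several writers can be funnelled through a single worker so that merges never overlap. In a real setup the callback would presumably merge the given local index into the shared one with the index-merging call of your Lucene version, and the hand-off between separate machines (not shown) would need its own mechanism.

```java
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Funnels merge requests from many writers through one worker thread. */
final class SerialMerger implements AutoCloseable {
    // A single-thread executor runs submitted tasks one at a time, in order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Hypothetical callback: merges one local index into the shared index.
    private final Consumer<Path> mergeStep;

    SerialMerger(Consumer<Path> mergeStep) {
        this.mergeStep = mergeStep;
    }

    /** Queues a local index for merging; the Future completes when it is merged. */
    Future<?> submit(Path localIndex) {
        return worker.submit(() -> mergeStep.accept(localIndex));
    }

    /** Stops accepting new requests; already queued merges still run. */
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Because only the merger ever writes to the shared index, only one writer is ever open on it, and the per-instance local indexes never contend with each other.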


Re: Multiple instances of Lucene IndexWriter

David K-2
I can't really say I'm "very familiar with the Lucene internals" :-)

What method would you recommend for checking for locked indexes? I have mainly seen two methods and would be interested in the faster one with less overhead:

Directory directory = FSDirectory.getDirectory(indexDir);
directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked();

or

Directory directory = FSDirectory.getDirectory(indexDir);
IndexReader.isLocked(directory);

many thanks,
David


Re: Multiple instances of Lucene IndexWriter

Fredrik Andersson-2-2
Hi,

You would probably want to use the Lock.obtain() method to get atomicity,
since IndexReader.isLocked doesn't actually acquire the lock. Another
process can swipe the lock between your IndexReader.isLocked call and your
actual writes. So something like:

Lock lock = directory.makeLock(...);
if (lock.obtain()) {
  try { /* your writing stuff */ }
  finally { lock.release(); }
} else {
  /* wait for the lock */
}

Best to test this; it has been many major versions since I fiddled with
locks, but it should work.
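
The difference between checking a lock and obtaining it can be illustrated without Lucene. The following is a minimal sketch using only the JDK; the class name LockFile is hypothetical and this is not Lucene's implementation. It relies on Files.createFile, which checks for existence and creates the file in one atomic step, so there is no window between "is it locked?" and "take the lock". How well that atomicity holds on a network share depends on the file system, so treat it as an assumption to verify.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Lock-file sketch: the existence check and the acquisition are one step. */
final class LockFile {
    private final Path path;

    LockFile(Path path) {
        this.path = path;
    }

    /** Returns true if this call created the lock file, false if it already existed. */
    boolean tryObtain() throws IOException {
        try {
            Files.createFile(path); // fails if the file is already there
            return true;
        } catch (FileAlreadyExistsException alreadyHeld) {
            return false;
        }
    }

    /** Removes the lock file; call this only if tryObtain() returned true. */
    void release() throws IOException {
        Files.deleteIfExists(path);
    }
}
```

A caller would wrap its writes in try/finally, releasing the lock in the finally block, and sleep and retry when tryObtain() returns false.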


Re: Multiple instances of Lucene IndexWriter

Michael McCandless-2

I think you can just rely on IndexWriter's locking: it will acquire
the write lock, and throw a LockObtainFailedException if it fails to
acquire it.  You can simply catch this exception, wait some amount of
time, and retry.
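
The catch, wait, and retry loop could look roughly like the sketch below. It is written against plain IOException so that it compiles without Lucene on the classpath; LockRetry, IoAction and withRetry are hypothetical names. In real code the action would open the IndexWriter, and the catch clause would be narrowed to LockObtainFailedException (which, as far as I know, is an IOException subclass) so that other I/O errors are not retried.

```java
import java.io.IOException;

/** Retries an I/O action a bounded number of times, sleeping between attempts. */
final class LockRetry {
    /** An action that may fail with IOException, e.g. opening a writer. */
    interface IoAction<T> {
        T run() throws IOException;
    }

    static <T> T withRetry(IoAction<T> action, int maxAttempts, long waitMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) { // assumption: narrow this to the lock failure
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis); // back off before the next attempt
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```

Bounding the number of attempts matters here: if the lock is stale rather than busy, an unbounded loop would wait forever.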

Beware, though, that if a writer dies ungracefully (the JVM crashes, or
the writer is not actually closed before the JVM exits), then the current
default LockFactory (SimpleFSLockFactory) will leave the lock acquired,
and you must manually release it or delete the lock file.  You can
switch to NativeFSLockFactory to avoid that; however, that locking
implementation has issues over NFS (which you won't hit if your app is
all Windows).
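
Why an OS-level lock avoids the stale-lock problem can be sketched with the JDK's own file locking. This is an illustration only, not Lucene's code, and NativeLockDemo is a hypothetical name. A lock taken with FileChannel.tryLock is held by the operating system on behalf of the process, so it goes away when the process dies even though the lock file itself remains on disk. The sketch is about separate processes; how file locks behave between channels inside one JVM, and over network file systems, is platform-dependent.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Holds an OS-level lock on a file until closed (or until the process exits). */
final class NativeLockDemo implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private NativeLockDemo(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a held lock, or null if someone else already holds it. */
    static NativeLockDemo tryObtain(Path lockPath) throws IOException {
        FileChannel ch = FileChannel.open(lockPath,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock l = null;
        try {
            l = ch.tryLock(); // null if another process holds the lock
        } catch (OverlappingFileLockException heldInThisJvm) {
            // this JVM already holds the lock; report it as not obtained
        } finally {
            if (l == null) {
                ch.close();
            }
        }
        return l == null ? null : new NativeLockDemo(ch, l);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

With a plain lock file, by contrast, the file's existence is the lock, which is why a crashed writer leaves it behind.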

Also, for future reference, this kind of question really should be
asked on java-user instead (general is for broader questions that span
all of the Lucene projects).

Mike