Indexing from multiple applications to a central index.

Doug Hughes
 
Hello,
 
I have a situation where I need to have multiple applications, potentially
located on different servers, and which have no knowledge of each other,
indexing into and searching from the same Lucene index.  I anticipate
problems with locks.  
 
Let's say I have two applications and, at any time, either of them may try
to index upwards of 1000 documents (or more!).  If, by luck, these
applications do not attempt to write to the index at the same time then
things are fine.  However, if both of them try to write to the index at the
same time, one of them will fail due to the index being locked.  
 
My first solution to this problem was to have both applications check
whether the index is locked and sleep until it is unlocked.
The problem with this is that if, while indexing, an application is shut
down or killed, the index may not be unlocked.  This will block other
applications from indexing and may cause them to hang.
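A minimal sketch of a safer version of this wait-for-the-lock loop, assuming plain java.nio rather than Lucene's own locking code: it takes an OS-level FileChannel lock, which the operating system releases when the holding process exits, and it gives up after a timeout instead of sleeping forever. TimedIndexLock, the lock-file path and the timeout values are hypothetical. Later Lucene releases offer a NativeFSLockFactory built on the same kind of native lock; such locks can be unreliable on some network file systems, which matters if the servers share the index over NFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class TimedIndexLock {

    // Tries to take an exclusive OS-level lock on lockFile, polling every
    // pollMillis until timeoutMillis has elapsed. Returns the held lock, or
    // null on timeout or interrupt. The OS drops the lock when the holding
    // process exits, so a killed writer cannot leave it stuck.
    // Meant for contention between processes: a second tryLock on the same
    // file from within one JVM throws OverlappingFileLockException.
    static FileLock acquire(Path lockFile, long timeoutMillis, long pollMillis) {
        FileChannel channel;
        try {
            channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (true) {
                FileLock lock = channel.tryLock(); // null if another process holds it
                if (lock != null) {
                    return lock;
                }
                if (System.currentTimeMillis() >= deadline) {
                    closeQuietly(channel);
                    return null; // give up rather than hang
                }
                Thread.sleep(pollMillis);
            }
        } catch (IOException e) {
            closeQuietly(channel);
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            closeQuietly(channel);
            return null;
        } catch (RuntimeException e) {
            closeQuietly(channel);
            throw e;
        }
    }

    // Closing the channel releases the lock.
    static void release(FileLock lock) {
        try {
            lock.channel().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void closeQuietly(FileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do here
        }
    }

    public static void main(String[] args) {
        Path lockFile = Paths.get(System.getProperty("java.io.tmpdir"), "central-index.lock");
        FileLock lock = acquire(lockFile, 30_000, 500);
        if (lock == null) {
            System.out.println("index busy, try again later");
            return;
        }
        try {
            System.out.println("holding lock, safe to write");
            // ... write to the central index here ...
        } finally {
            release(lock);
        }
    }
}
```

The timeout turns a potential hang into an explicit "index busy" result that the caller can retry or report.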
 
Clearly I have a concurrency problem.  I think I may know a solution to this
problem and I would appreciate verification of the solution or suggestions
on other approaches.
 
I am thinking that I can make all of the applications index into their own
index, not the central shared index.  Their own index might be an FSDirectory
or a RAMDirectory.  When done indexing, the applications' indexes would be
merged with the central index for consumption by all applications sharing
the index.
 
From what I understand, merging indexes takes a lot less time than
inserting into or deleting from an index.  This seems to mean that I'm less
"likely" to run into locking issues.  I can more safely have the processes
sleep until the index is unlocked and they can gain access to merge their
index with the central index.  If these applications use their own
FSDirectory, I should be able to continue working with that FS directory
after an unclean shutdown and should still be able to merge it with the
central index.
 
Does anyone have any advice to offer on this?
 
Thank you,
 
Doug Hughes
[hidden email]

Re: Indexing from multiple applications to a central index.

Daniel Naber
On Tuesday 07 June 2005 19:36, Doug Hughes wrote:

> I am thinking that I can make all of the applications index into their
> own index, not the central shared index.  Their own index might be a
> FSDirectory or a RAMDirectory.  When done indexing, the applications'
> indexes would be merged with the central index for consumption by all
> applications sharing the index.

Why not just work on more than one index without merging and search using
MultiSearcher?
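The idea behind MultiSearcher can be sketched with a toy in-memory model in plain Java: each application owns one index, and a query simply fans out over all of them and combines the hits. FanOutSearch, buildIndex and search are hypothetical names, not Lucene API; Lucene's real MultiSearcher also merges relevance-ranked hits, which this sketch does not attempt (it only unions document IDs). In later Lucene versions MultiSearcher was removed, and an IndexSearcher over a MultiReader plays the same role.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FanOutSearch {

    // Builds one toy inverted index (term -> document IDs) for one application.
    static Map<String, Set<String>> buildIndex(Map<String, String> docs) {
        Map<String, Set<String>> postings = new HashMap<>();
        docs.forEach((id, text) -> {
            for (String term : text.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
                }
            }
        });
        return postings;
    }

    // Runs the query against every index and combines the hits. Nothing is
    // merged, so each application keeps writing only to its own index.
    static Set<String> search(List<Map<String, Set<String>>> indexes, String term) {
        Set<String> hits = new LinkedHashSet<>();
        for (Map<String, Set<String>> index : indexes) {
            hits.addAll(index.getOrDefault(term.toLowerCase(), Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> app1 = new HashMap<>();
        app1.put("app1-1", "Lucene locks");
        Map<String, String> app2 = new HashMap<>();
        app2.put("app2-1", "Lucene MultiSearcher");

        List<Map<String, Set<String>>> indexes =
                Arrays.asList(buildIndex(app1), buildIndex(app2));
        System.out.println(search(indexes, "lucene")); // [app1-1, app2-1]
    }
}
```

With this layout no application ever writes to another's index, so the cross-application lock contention disappears; the cost moves to query time, where every search touches every index.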

Regards
 Daniel

--
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Indexing from multiple applications to a central index.

Otis Gospodnetic-2
In reply to this post by Doug Hughes
I think your setup calls for a centralized IndexQueueManager that
subscribes to JMS topics to which your distributed servers push the data
to be indexed.  That way you can easily add more machines to the
cluster, you get persistence of not-yet-indexed data, and you get a
queuing mechanism that takes care of locking issues.
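A minimal in-process sketch of this single-writer queue idea, assuming a java.util.concurrent BlockingQueue as a stand-in for the JMS destination (no JMS API is used here). IndexQueueDemo, IndexRequest and simulate are hypothetical names, and the writer only records document IDs where a real one would add each document to the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

final class IndexQueueDemo {

    // Hypothetical message: one document some server wants indexed.
    static final class IndexRequest {
        final String docId;
        final String text;

        IndexRequest(String docId, String text) {
            this.docId = docId;
            this.text = text;
        }
    }

    // Sentinel that tells the writer to stop; a real JMS listener would keep running.
    private static final IndexRequest STOP = new IndexRequest(null, null);

    // The single writer: only this loop touches the index, so servers never
    // compete for the index lock. Returns the IDs it "indexed".
    static List<String> runWriter(BlockingQueue<IndexRequest> queue) throws InterruptedException {
        List<String> indexed = new ArrayList<>();
        while (true) {
            IndexRequest req = queue.take(); // blocks until a request arrives
            if (req == STOP) {
                return indexed;
            }
            indexed.add(req.docId); // stand-in for adding req to the real index
        }
    }

    // Starts the writer, lets `servers` producer threads enqueue `docsEach`
    // requests apiece, then stops the writer once every producer has finished.
    static List<String> simulate(int servers, int docsEach) {
        BlockingQueue<IndexRequest> queue = new LinkedBlockingQueue<>();
        ExecutorService writer = Executors.newSingleThreadExecutor();
        try {
            Callable<List<String>> task = () -> runWriter(queue);
            Future<List<String>> result = writer.submit(task);

            List<Thread> producers = new ArrayList<>();
            for (int s = 0; s < servers; s++) {
                final int server = s;
                Thread t = new Thread(() -> {
                    for (int i = 0; i < docsEach; i++) {
                        queue.add(new IndexRequest("app" + server + "-doc" + i, "body " + i));
                    }
                });
                producers.add(t);
                t.start();
            }
            for (Thread t : producers) {
                t.join();
            }
            queue.add(STOP); // enqueued after all real requests, so nothing is lost
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            writer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<String> indexed = simulate(3, 1000);
        System.out.println("indexed " + indexed.size() + " documents with one writer");
    }
}
```

Because only the writer thread touches the index, lock contention disappears by construction. A real JMS broker would additionally persist messages that arrive while the writer is down, which this in-memory queue does not.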

Otis

