Multi-threaded IndexWriter


I have a multi-threaded indexing application that indexes documents into a set
of Lucene index databases (I have millions of documents to index, hence the
split DB).  When a thread gets an index request, it determines which index DB
the data belongs in and grabs the IndexWriter for that database.

My question is: what happens if several threads want to index data into the
same DB concurrently, while other threads want to delete documents and still
others are searching?  Does anyone know the benefits and drawbacks of the
following approaches with respect to the performance characteristics of the
Lucene internals?

a) Serialisation of writes, i.e. multiple IndexWriter.close() calls.  Each
thread blocks waiting for the writer and then does its own

new IndexWriter()
close IndexWriter

or

b) Parallelisation of writes with a single IndexWriter.close() call.  Allow all
threads to share the same IndexWriter instance.  Lucene in Action (LIA) says
that an IndexWriter can safely be shared between several threads.  So the first
thread requesting the writer creates a new instance, all subsequent threads add
documents to that same instance, and the last user closes the writer, e.g.

First thread - new IndexWriter()
2..n threads - inc use_count + get existing IndexWriter
all threads - addDocuments()
n..2 threads - dec use_count
Last thread - close IndexWriter

The middle 3 steps will of course be interleaved in an arbitrary order, not in
the sequence listed above.
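
To make approach (b) concrete, here is a minimal sketch of the use-count idea
in plain Java.  It uses only the JDK so that it runs on its own: SharedHandle,
acquire(), release() and useCount() are hypothetical names, not Lucene API, and
the generic AutoCloseable stands in for the IndexWriter.  In the real
application the opener would construct the IndexWriter for the chosen index DB.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Lucene: a reference-counted holder that
// opens a resource for the first user and closes it when the last user leaves.
final class SharedHandle<T extends AutoCloseable> {
    private final Callable<T> opener; // assumption: would create the IndexWriter
    private T resource;               // null while nobody holds the handle
    private int useCount;             // number of threads currently using it

    SharedHandle(Callable<T> opener) {
        this.opener = opener;
    }

    // First caller opens the resource; later callers get the same instance.
    synchronized T acquire() throws Exception {
        if (useCount == 0) {
            resource = opener.call();
        }
        useCount++;
        return resource;
    }

    // The last caller to release closes the resource.
    synchronized void release() throws Exception {
        if (useCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        if (--useCount == 0) {
            T r = resource;
            resource = null;
            r.close();
        }
    }

    synchronized int useCount() {
        return useCount;
    }
}
```

Each indexing thread would call acquire(), add its documents, and call
release() in a finally block.  Only the open and close steps are serialised by
the lock; the document additions themselves run on the shared instance without
holding it.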

