Approach for indexing and querying a good volume of data.
I am planning to use Lucene (not in a cluster) for indexing and querying a good volume of data. The use case is 10-20 documents per second (roughly 40-50 fields each), with queries running in parallel. Below is the approach I am planning to take. Can anyone please let me know, from past experience, whether it sounds plausible and how to handle performance issues? Ideally I would get maximum throughput by indexing in batches, but latency would then be an issue, so the simple approach is to index documents as they come.
-- Index documents as they come (single thread / blocking queue(1))
-- Create document
-- Add document
-- Commit (this would be costly, but I see no other option). I guess I could hold off on the commit; would Lucene autocommit after a certain point in time (can I define any criteria)?
-- Call DirectoryReader.openIfChanged((DirectoryReader) indexReader, indexWriter) to open a new reader before each query and pick up new changes.
-- Also, should I close the indexWriter after every commit, or keep it open for the lifecycle of the application (the application could keep running for months and months)? Should I create an indexReader for every query and close it, or just keep the indexReader open for the lifecycle of the application?
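One way the "hold off on the commit" variant of the steps above could look is sketched below in plain Java, with no Lucene dependency. It is only an illustration under stated assumptions: `IndexSink`, `CommitPolicy`, `SingleThreadIndexer` and the threshold values are hypothetical names, not Lucene API. In a real setup, `add` would presumably wrap `IndexWriter.addDocument` and `commit` would wrap `IndexWriter.commit`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the writer: add() would wrap IndexWriter.addDocument
// and commit() would wrap IndexWriter.commit. Not a Lucene type.
interface IndexSink<D> {
    void add(D doc) throws Exception;
    void commit() throws Exception;
}

// Hypothetical policy: commit once maxDocs documents are pending, or once the
// oldest pending document has waited at least maxAgeMillis.
final class CommitPolicy {
    private final int maxDocs;
    private final long maxAgeMillis;
    private int pending = 0;
    private long firstPendingAt = 0L;

    CommitPolicy(int maxDocs, long maxAgeMillis) {
        this.maxDocs = maxDocs;
        this.maxAgeMillis = maxAgeMillis;
    }

    void onAdd(long nowMillis) {
        if (pending == 0) {
            firstPendingAt = nowMillis; // age clock starts at the first uncommitted doc
        }
        pending++;
    }

    boolean hasPending() {
        return pending > 0;
    }

    boolean shouldCommit(long nowMillis) {
        return pending > 0
                && (pending >= maxDocs || nowMillis - firstPendingAt >= maxAgeMillis);
    }

    void onCommit() {
        pending = 0;
    }
}

// Single consumer thread: take documents off the queue as they come, add each
// one immediately, and commit only when the policy says so.
final class SingleThreadIndexer<D> implements Runnable {
    private final BlockingQueue<D> queue;
    private final IndexSink<D> sink;
    private final CommitPolicy policy;
    private volatile boolean running = true;

    SingleThreadIndexer(BlockingQueue<D> queue, IndexSink<D> sink, CommitPolicy policy) {
        this.queue = queue;
        this.sink = sink;
        this.policy = policy;
    }

    void stop() {
        running = false;
    }

    @Override
    public void run() {
        try {
            while (running || !queue.isEmpty()) {
                // Short timeout so the age-based check still runs when no documents arrive.
                D doc = queue.poll(100, TimeUnit.MILLISECONDS);
                long now = System.currentTimeMillis();
                if (doc != null) {
                    sink.add(doc);
                    policy.onAdd(now);
                }
                if (policy.shouldCommit(now)) {
                    sink.commit();
                    policy.onCommit();
                }
            }
            if (policy.hasPending()) { // commit whatever is left on shutdown
                sink.commit();
                policy.onCommit();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

To match the plan above, the queue could be an `ArrayBlockingQueue` with capacity 1, so producers block until the single indexing thread has taken the previous document. The thresholds (for example 100 documents or 1000 ms) are placeholders and would need tuning against the actual commit cost.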