Caching Filters and docIds when using MultiSearcher/IndexSearcher(MultiReader)...

Up to now I have only needed to search a single index, but now I will have many
index shards to search across.  My existing search maintained cached filters for
the index as well as a cache of my own unique ID fields in the index, keyed by
Lucene DocId.

Now that I need to search multiple indices, I am trying to work out how to
continue to use these caches.

I have one index per month of data (up to 10M docs per month) and users can
search across whichever date range they want, so one search may search Index
1-->12 (e.g. Jan07-Dec07) and another 13-20 (Jan08-Aug08).

It makes no sense to cache a single bitset generated from a MultiReader over
indices 1-12 when the next search could be for indices 2-11, where every bit
offset would be different and the cached bits useless.  To be of any use,
caches (including cached BitSets) should therefore contain the doc ids specific
to each individual index rather than to any particular MultiReader.  My Filter
implementation can then determine the real doc id and delegate to the bitset
for the particular reader instance.
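
A minimal sketch of the per-index cache idea, written in plain Java with no
Lucene types.  The class name PerShardFilterCache, the String shard key (e.g.
"2007-01") and the loader function are all hypothetical; in a real setup the
key would more likely be the reader instance for that index, and the loader
would run the filter against that reader.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: one BitSet of index-local doc ids per shard. */
final class PerShardFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> loader; // builds the bits for one shard
    private int loads = 0;                         // how often the loader actually ran

    PerShardFilterCache(Function<String, BitSet> loader) {
        this.loader = loader;
    }

    /** Returns the cached bits for a shard, building them on first use. */
    BitSet bitsFor(String shard) {
        return cache.computeIfAbsent(shard, s -> {
            loads++;
            return loader.apply(s);
        });
    }

    /** Checks a doc id that is local to the given shard, not a MultiReader id. */
    boolean matches(String shard, int localDocId) {
        return bitsFor(shard).get(localDocId);
    }

    int loadCount() {
        return loads;
    }
}
```

Because the bits are keyed by shard and hold index-local ids, a search over
indices 1-12 followed by one over 2-11 reuses the same entries.  The sketch is
not thread-safe and never evicts anything.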

This means I need to find the original reader/searcher instance and the
particular doc Id from that instance to perform bitset checks or cache lookups.

MultiSearcher has subDoc and subSearcher, but there's no such beast on an
IndexReader for finding the real reader and doc id from the composite ones.
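
For illustration, the mapping from a composite doc id to a sub-index and a
local doc id is just offset arithmetic over the sizes of the sub-indices.  The
sketch below is plain Java, not Lucene code; DocIdResolver and its method names
are hypothetical, and it assumes the composite ids are assigned by
concatenating the sub-indices in order, each occupying maxDoc consecutive ids.

```java
import java.util.Arrays;

/** Hypothetical helper: maps a composite doc id to (sub-index, local doc id). */
final class DocIdResolver {
    // starts[i] is the first composite doc id of sub-index i;
    // the last entry is the total number of doc ids.
    private final int[] starts;

    DocIdResolver(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Index of the sub-index that contains the composite doc id. */
    int subIndex(int docId) {
        if (docId < 0 || docId >= starts[starts.length - 1]) {
            throw new IndexOutOfBoundsException("doc id out of range: " + docId);
        }
        int pos = Arrays.binarySearch(starts, docId);
        if (pos < 0) {
            // Not a boundary: binarySearch returned -(insertionPoint) - 1,
            // and the owning sub-index is the one just before the insertion point.
            return -pos - 2;
        }
        // Exact boundary: skip over empty sub-indices sharing the same start.
        while (starts[pos + 1] == docId) {
            pos++;
        }
        return pos;
    }

    /** Doc id local to the sub-index that contains the composite doc id. */
    int subDoc(int docId) {
        return docId - starts[subIndex(docId)];
    }
}
```

With sub-indices of 10, 5 and 20 documents, composite id 12 resolves to
sub-index 1, local doc id 2, which is the pair needed to consult a per-index
bitset or ID cache.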

This also raises the question of MultiSearcher vs IndexSearcher(MultiReader).
Even after reading the archives, I am unsure which I should use; there seem to
be comments on the dev list advising against MultiSearcher...

Any thoughts or have I spiralled too far into Lucene's depths to see where I am...?

