Jason Rutherglen updated LUCENE-2312:
This is a revised version of the LUCENE-2312 patch. The following are miscellaneous notes on the patch and on what remains before it can be committed.
Feel free to review the approach taken, e.g., we work around non-realtime data structures by making array copies (the arrays can be pooled at some point).
* A copy of FreqProxPostingsArray.termFreqs is made per new reader. That array can be pooled. This is no different from the deleted docs BitVector, which is created anew per segment for any deletes that have occurred.
* The FreqProxPostingsArray arrays freqUptosRT, proxUptosRT, lastDocIDsRT, and lastDocFreqsRT are copied into per new reader, rather than instantiating entirely new arrays for each reader; this is a slight optimization in object allocation.
* For deleting, a DWPT is wrapped in an abstract class that exposes the necessary methods from segment info, so that deletes may be applied to the RT RAM reader. The deleting is still performed in BufferedDeletesStream. BitVectors are cloned as well. There is room for improvement, e.g., pooling the BitVector bytes.
* Documents (FieldsWriter) and term vectors are flushed on each get-reader call, so that reading will be able to load the data. We need to test whether this performs well; because we are not creating new files, this approach may well be efficient.
* We need to measure the cost of the native system array copy. It may well be fast enough.
* Full posting functionality should be working, including payloads.
* Field caching may be implemented as a new field cache that is growable and enables locked replacement of the underlying array.
* String-to-string ordinal comparison caches still need to be worked out. The RAM readers cannot maintain a sorted terms index the way statically sized segments do.
* When a field cache value is first being created, it needs to obtain the indexing lock on the DWPT. Otherwise documents will continue to be indexed and new values created while the array misses those values. The downside is that indexing stops while the array is initially being created. This can probably be solved at some point by locking only during the creation of the field cache array and then notifying the DWPT of the new array; new values would then accumulate into the array starting from the max doc of the reader the values creator is working from.
* The terms dictionary is a ConcurrentSkipListMap. We can periodically convert it into an int[] sorted by term, with an FST on top.
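The per-reader array copy and pooling idea from the first bullets can be sketched roughly as follows. This is illustrative only; the class and method names (PostingsSnapshotPool, snapshot, release) are hypothetical, not the patch's actual API:

```java
import java.util.ArrayDeque;

// Illustrative sketch: copy a postings array (e.g. termFreqs) per new
// reader, reusing destination arrays through a simple pool.
class PostingsSnapshotPool {
    private final ArrayDeque<int[]> pool = new ArrayDeque<>();

    // Snapshot the live array up to the reader's visible length.
    synchronized int[] snapshot(int[] termFreqs, int len) {
        int[] dest = pool.poll();
        if (dest == null || dest.length < len) {
            dest = new int[len];
        }
        System.arraycopy(termFreqs, 0, dest, 0, len);
        return dest;
    }

    // Return the array to the pool once the reader is closed.
    synchronized void release(int[] array) {
        pool.push(array);
    }
}
```

The snapshot is immune to further indexing into the source array, which is the point of copying per reader; pooling just avoids reallocating the destination each time.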
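For measuring the native array copy cost mentioned above, a crude timing sketch is below; a real measurement should use a proper benchmark harness (e.g. JMH) to account for JIT warmup, so treat this only as a starting point:

```java
// Crude timing sketch for System.arraycopy on a postings-sized int[].
// Not a rigorous benchmark; JIT warmup and GC are ignored here.
class ArrayCopyTiming {
    static long timeCopyNanos(int size, int iterations) {
        int[] src = new int[size];
        int[] dest = new int[size];
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            System.arraycopy(src, 0, dest, 0, size);
        }
        return (System.nanoTime() - start) / iterations;
    }
}
```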
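The growable field cache with locked replacement of the underlying array could look something like the sketch below: readers see a volatile array reference lock-free, while writers grow and publish a replacement under a lock, which also matches the later idea of accumulating new values after the initial creation. All names here are hypothetical:

```java
// Illustrative growable field cache: lock-free reads of a volatile
// array; writers grow it under a lock and swap the reference.
class GrowableFieldCache {
    private volatile long[] values = new long[0];
    private final Object growLock = new Object();

    long get(int docID) {
        return values[docID]; // reads the currently published array
    }

    int size() {
        return values.length;
    }

    // Called as new documents are indexed; grows by replacing the array.
    void set(int docID, long value) {
        synchronized (growLock) {
            long[] current = values;
            if (docID >= current.length) {
                long[] grown = new long[Math.max(docID + 1, current.length * 2)];
                System.arraycopy(current, 0, grown, 0, current.length);
                current = grown;
            }
            current[docID] = value;
            values = current; // publish the (possibly new) array
        }
    }
}
```

A reader holding the old reference simply sees values up to its own max doc, which is the intended realtime semantics.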
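The ConcurrentSkipListMap terms dictionary with a periodic term-sorted snapshot can be sketched as follows; the FST built on top of the sorted snapshot is omitted, and the class is illustrative rather than the patch's code:

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative RT terms dictionary: concurrent inserts into a skip
// list, with a periodic snapshot into a term-sorted int[] of term IDs.
// (The FST layered on top of the sorted snapshot is omitted here.)
class RTTermsDict {
    private final ConcurrentSkipListMap<String, Integer> terms =
        new ConcurrentSkipListMap<>();
    private int nextTermID = 0;

    // Assign term IDs in insertion order; returns the existing ID
    // if the term was already seen.
    synchronized int add(String term) {
        return terms.computeIfAbsent(term, t -> nextTermID++);
    }

    // Skip-list iteration is already term-sorted, so the snapshot is
    // just the term IDs read out in key order.
    int[] sortedTermIDs() {
        int[] ids = new int[terms.size()];
        int i = 0;
        for (Integer id : terms.values()) {
            ids[i++] = id;
        }
        return ids;
    }
}
```

The skip list gives lock-free sorted iteration for readers while indexing threads keep inserting, which is why the periodic conversion to a compact sorted structure can happen without stopping the world.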