I have a situation similar to indexing a mailing list, with each mail
indexed as a Document. Mails from the same thread share the same thread ID,
which is indexed in a separate field. Now I want to search through all the
mails using some keywords, and list all the unique thread IDs, which I can
then pass to the database calls.
I tried DuplicateFilter, which didn't work well: it missed some results. I
went through the code and found that all the filters are basically
pre-filters. In other words, they generate bitsets based on the whole index
and filter the duplicates out (in the case of DuplicateFilter) before the
matches reach the result collector. This causes a problem when a mail
contains the search keywords but was already filtered out because its bit
was set to false in the bitset.
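
To illustrate the problem, here is a minimal sketch in plain Java. It does
not call Lucene at all: the Mail class, the sample data, and the method
names preFilter and postFilter are made up for illustration, and preFilter
only mimics what I understand DuplicateFilter to do (keep one document per
thread ID over the whole index, here the first one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ThreadDedupSketch {

    // Hypothetical stand-in for an indexed mail: a thread ID plus body text.
    static final class Mail {
        final String threadId;
        final String body;

        Mail(String threadId, String body) {
            this.threadId = threadId;
            this.body = body;
        }
    }

    // Pre-filter: keep only the first mail of each thread across the WHOLE
    // index, then run the keyword match on the survivors only.
    static Set<String> preFilter(List<Mail> index, String keyword) {
        Set<String> seen = new HashSet<>();
        List<Mail> survivors = new ArrayList<>();
        for (Mail m : index) {
            if (seen.add(m.threadId)) {
                survivors.add(m);
            }
        }
        Set<String> threads = new LinkedHashSet<>();
        for (Mail m : survivors) {
            if (m.body.contains(keyword)) {
                threads.add(m.threadId);
            }
        }
        return threads;
    }

    // Post-filter: run the keyword match first, then collapse the hits by
    // thread ID (the set drops repeated IDs and keeps hit order).
    static Set<String> postFilter(List<Mail> index, String keyword) {
        Set<String> threads = new LinkedHashSet<>();
        for (Mail m : index) {
            if (m.body.contains(keyword)) {
                threads.add(m.threadId);
            }
        }
        return threads;
    }

    public static void main(String[] args) {
        List<Mail> index = Arrays.asList(
            new Mail("t1", "hello everyone"),
            new Mail("t1", "re: lucene question"),
            new Mail("t2", "lucene release notes"));
        System.out.println(preFilter(index, "lucene"));  // prints [t2]
        System.out.println(postFilter(index, "lucene")); // prints [t1, t2]
    }
}
```

Thread t1 is lost by the pre-filter because the one mail kept for that
thread is not the one containing the keyword. The post-filter result is
what I am after.
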
Are there any solutions for this? Does any sort of post-filtering exist,
one that filters records in the search result (which could be slow) rather
than in the whole collection? Thanks.