Most efficient way to find related terms

Martin Bayly
I'm wondering what the most efficient approach is to finding all terms
which are related to a specific term X.


By related I mean terms which appear in a specific document field that
also contains the target term X.


e.g. Document has a keyword field, field1, that can contain multiple
keywords:


document1 - field1 HAS key1, key2, key3

document2 - field1 HAS key2, key4

document3 - field1 HAS key5


If I want to find terms related to key2, I need to return key1, key3,
and key4.


Obviously I can do a search for key2, iterate over all the matching docs
and collect their field1 terms manually.
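
To make the manual approach concrete, here is a rough sketch using plain Java collections as a stand-in for the index. The class and method names (RelatedTermsScan, exampleDocs, relatedTerms) are made up for illustration and are not Lucene API; the map simply models the three example documents above.

```java
import java.util.*;

public class RelatedTermsScan {

    // Hypothetical stand-in for the index: doc id -> keywords stored in field1.
    public static Map<Integer, Set<String>> exampleDocs() {
        Map<Integer, Set<String>> docs = new LinkedHashMap<>();
        docs.put(1, new LinkedHashSet<>(Arrays.asList("key1", "key2", "key3")));
        docs.put(2, new LinkedHashSet<>(Arrays.asList("key2", "key4")));
        docs.put(3, new LinkedHashSet<>(Arrays.asList("key5")));
        return docs;
    }

    // Collect every keyword that shares field1 with the target term.
    public static Set<String> relatedTerms(Map<Integer, Set<String>> docs, String target) {
        Set<String> related = new TreeSet<>();
        for (Set<String> keywords : docs.values()) {
            if (keywords.contains(target)) {
                related.addAll(keywords);
            }
        }
        // The target term itself is not "related" to itself.
        related.remove(target);
        return related;
    }

    public static void main(String[] args) {
        System.out.println(relatedTerms(exampleDocs(), "key2")); // prints [key1, key3, key4]
    }
}
```

This visits every document's keyword set, which is exactly the cost I would like to avoid on a real index.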


But presumably a more efficient way is to use TermDocs:


1. TermDocs termdocs = IndexReader.termDocs(new Term("field1", "key2"));

2. Iterate term docs to get documents containing that term

3. Now this is the bit I'm not sure of:

a. I could call Document doc = IndexReader.document(n), but that will
load all fields and I only want field1

b. Presumably better to call Document doc = IndexReader.document(n,

c. Or would I be better off turning on term vectors for this field
so I can call IndexReader.getTermFreqVector(n, "field1")? I don't
particularly care about the frequencies, as each will always be 1 for a
particular doc.
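
The shape of what I have in mind for steps 1 to 3 can be sketched without Lucene at all. In this toy model the postings map plays the role of TermDocs and the per-document term list plays the role of a term vector for field1. All names here (RelatedTermsPostings, addDocument, relatedTerms) are hypothetical, and this says nothing about how the real calls compare on speed.

```java
import java.util.*;

public class RelatedTermsPostings {

    // term -> ids of docs whose field1 contains it (stand-in for TermDocs)
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // doc id -> terms in field1 (stand-in for a per-document term vector)
    private final Map<Integer, List<String>> termVectors = new HashMap<>();

    public void addDocument(int docId, String... keywords) {
        termVectors.put(docId, Arrays.asList(keywords));
        for (String keyword : keywords) {
            postings.computeIfAbsent(keyword, k -> new ArrayList<>()).add(docId);
        }
    }

    public Set<String> relatedTerms(String target) {
        Set<String> related = new TreeSet<>();
        // Steps 1 and 2: walk only the docs that contain the target term.
        for (int docId : postings.getOrDefault(target, Collections.emptyList())) {
            // Step 3: read just the field1 terms for that doc.
            related.addAll(termVectors.get(docId));
        }
        related.remove(target);
        return related;
    }

    public static void main(String[] args) {
        RelatedTermsPostings index = new RelatedTermsPostings();
        index.addDocument(1, "key1", "key2", "key3");
        index.addDocument(2, "key2", "key4");
        index.addDocument(3, "key5");
        System.out.println(index.relatedTerms("key2")); // prints [key1, key3, key4]
    }
}
```

Unlike a full scan, only the documents in the target term's postings list are touched, so the work is proportional to the number of docs containing key2 rather than the size of the index.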


Other approaches?


I'm going to perf test to see how (b) and (c) compare but would be glad
if anyone has any insights.