Top terms relevance from specific documents?

Yannick Martel
Hi!

I am using Lucene (Java) for indexing data, and I want to produce a kind
of tag cloud for specific data.

I've found HighFreqTerms, which gets a top list of terms from *all
documents* (if I have understood correctly). By the way, I had to override
it to be able to filter on several fields instead of only one.

But it does not really match my need: I'd like to get the most
repeated terms in a single document (or in several specific documents).
For example, considering a document with fields "Title", "Summary",
"Description", I am trying to get the count of each term (excluding the
stop words from the Analyzer).
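
To make the expected result concrete, here is a minimal plain-Java sketch of the per-document count described above. It does not use Lucene at all: the class name DocTermCounter, the stop-word set and the naive tokenization rule are assumptions made for illustration only, standing in for what the Analyzer would do on a real index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: counts the terms of one document.
class DocTermCounter {

    // Assumed stop-word list for illustration; a real Analyzer supplies its own.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "is", "for", "in");

    // Returns term -> number of occurrences across all given field values.
    static Map<String, Integer> countTerms(List<String> fieldValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : fieldValues) {
            // Naive tokenization: lowercase, split on anything that is not a letter or digit.
            for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty() || STOP_WORDS.contains(token)) {
                    continue;
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the n most frequent terms, highest count first, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> topTerms(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // One document, given as the text of its three fields.
        List<String> doc = List.of(
                "Lucene tag cloud",                       // Title
                "Building a tag cloud for the index",     // Summary
                "The cloud is built from Lucene terms");  // Description
        System.out.println(topTerms(countTerms(doc), 3)); // prints [cloud=3, lucene=2, tag=2]
    }
}
```

This is the kind of (term, count) list I would like to obtain from the index for one document or a chosen set of documents, rather than from the whole index.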

I cannot find a way to do that: I searched among TopFieldCollector
and the other collectors, but it seems they just give document scores :/

Finding documentation is not easy, I think, because a lot of
questions/answers either do not correspond to my need or are about old
versions (3.x for example), and I'm feeling lost in all of this...

Hoping someone can guide me.


Yannick Martel
