Top terms by relevance for specific documents?


Yannick Martel
Hi!

I am using Lucene (Java) to index data, and I want to produce a kind
of tag cloud for specific data.

I found HighFreqTerms, which returns a list of the top terms across *all
documents* (if I have understood correctly). By the way, I overrode it to
be able to filter on several fields instead of only one.

But it does not really match my need: I'd like to get the most
frequent terms in a single document (or in several specific documents).
For example, considering a document with the fields "Title", "Summary"
and "Description", I am trying to get the count of each term (excluding
the Analyzer's stop words).
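
To make the goal concrete, here is a minimal sketch in plain Java of the
per-document counting described above. It does not use Lucene at all: the
TermCounts class, its count and top methods, the regex tokenization and the
stop-word set passed in are all hypothetical stand-ins for what the Analyzer
and the index would provide. It only illustrates the expected result, not how
Lucene should compute it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
final class TermCounts {

    // Counts how often each term occurs in the text, skipping stop words.
    static Map<String, Integer> count(String text, Set<String> stopWords) {
        Map<String, Integer> counts = new HashMap<>();
        // Crude tokenization (an assumption): lower-case, then split on
        // anything that is not a letter or a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Returns at most n entries, highest count first; ties are ordered
    // alphabetically so the result is deterministic.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

The text passed to count would be the concatenated content of the "Title",
"Summary" and "Description" fields of one document, and top(counts, 20)
would give the entries for the tag cloud.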

I cannot find a way to do that: I looked at TopFieldCollector and
other collectors, but they seem to give only document scores :/

Finding documentation is not easy, I think, because many questions/answers
either do not correspond to my need or concern old versions (3.x for
example), and I'm feeling lost in all of this...


Hoping someone can point me in the right direction.

Regards,

--
Yannick Martel

