Top N terms of an indexed field

Top N terms of an indexed field

Alex Benjamen
I was wondering if it is possible to retrieve the top 20 terms for a given
field in an index.
 
For example, if we're indexing user profile data and one of the fields
is "interests" - it would be great to get the top 20 terms for interests
found in the index.
 
-Alex
Re: Top N terms of an indexed field

Ryan McKinley
Alex Benjamen wrote:
> I was wondering if it is possible to retrieve the top 20 terms for a given
> field in an index.
>  
> For example, if we're indexing user profile data and one of the fields
> is "interests" - it would be great to get the top 20 terms for interests
> found in the index.
>  

Check out faceting:
http://wiki.apache.org/solr/SimpleFacetParameters

If you want the counts across all documents, use the query *:*
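
As a rough illustration of the faceting approach, here is a minimal Python sketch that builds such a request and reads the counts back. The base URL and the field name "interests" are assumptions for the example, not details from this thread; q, rows, facet, facet.field, facet.limit and wt are standard Solr request parameters. The parser assumes Solr's default JSON layout, where a facet field comes back as a flat [term, count, term, count, ...] list.

```python
from urllib.parse import urlencode


def build_top_terms_url(base_url, field, limit=20):
    """Build a Solr select URL that facets on `field` across all documents."""
    params = {
        "q": "*:*",            # match every document in the index
        "rows": 0,             # skip the documents, we only want facet counts
        "facet": "true",
        "facet.field": field,  # field whose terms should be counted
        "facet.limit": limit,  # return at most `limit` terms
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def parse_top_terms(response, field):
    """Turn the flat [term, count, term, count, ...] list into (term, count) pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Hypothetical local Solr instance; adjust host, port and path as needed.
    print(build_top_terms_url("http://localhost:8983/solr/select", "interests"))
```

Fetching the printed URL (for example with urllib.request.urlopen) and passing the decoded JSON to parse_top_terms should give the terms with their document counts.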

ryan
Re: Top N terms of an indexed field

Otis Gospodnetic-2
Alex,

You can also use the HighFreqTerms class from Lucene contrib/misc (org.apache.lucene.misc.HighFreqTerms). It's a command-line app that lists the highest-frequency terms in an index, which is exactly what you want. It's also good for figuring out whether you should add more terms to your stopword list, for example.
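
To make the idea concrete, here is a small Python sketch of the selection step only: given (term, document frequency) pairs, keep the N most frequent. This is not the Lucene tool itself. The function name and the sample data are made up for illustration, and it assumes the term counts have already been read out of the index by some other means.

```python
import heapq


def top_terms_by_doc_freq(term_doc_freqs, n=20):
    """Return the n (term, doc_freq) pairs with the highest doc_freq.

    term_doc_freqs is any iterable of (term, doc_freq) tuples; nlargest
    keeps only n candidates in memory while scanning it.
    """
    return heapq.nlargest(n, term_doc_freqs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical counts for an "interests" field.
    counts = [("music", 5), ("hiking", 9), ("the", 40), ("chess", 2)]
    for term, freq in top_terms_by_doc_freq(counts, n=2):
        print(term, freq)
```

Terms that rank high here but carry little meaning (such as "the" in the sample data) are the usual candidates for a stopword list.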

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----

> From: Alex Benjamen <[hidden email]>
> To: [hidden email]
> Sent: Thursday, February 28, 2008 10:22:38 PM
> Subject: Top N terms of an indexed field
>
> I was wondering if it is possible to retrieve the top 20 terms for a given
> field in an index.
>  
> For example, if we're indexing user profile data and one of the fields
> is "interests" - it would be great to get the top 20 terms for interests
> found in the index.
>  
> -Alex
>