recall/precision with lucene

Panos Konstantinidis
Hello, I am a new Lucene user. I am trying to calculate the recall/precision of
a query, and I was wondering whether Lucene provides an easy way to do it.

Currently I have a number of documents that match a given query. I then run a
search and get back all the Hits. I divide the number of documents that came
back from Lucene (the Hits size) by the number of documents that I should have
got. This is how I calculate the recall.

For precision I just get the hits.score() of each relevant document. I am not
sure whether I am on the right track or whether there is an easier/better way
to do it. I would appreciate any insight into this.

Regards

Panos



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: recall/precision with lucene

Paul Elschot
On Saturday 09 February 2008 01:59:12, Panos Konstantinidis wrote:
> Hello, I am a new Lucene user. I am trying to calculate the recall/precision of
> a query, and I was wondering whether Lucene provides an easy way to do it.
>
> Currently I have a number of documents that match a given query. I then run a
> search and get back all the Hits. I divide the number of documents that came
> back from Lucene (the Hits size) by the number of documents that I should have
> got. This is how I calculate the recall.

Since you're going to use all hits for the query, it is normally better to avoid
Hits and use a HitCollector or a TopDocs.
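
Once all matching documents have been collected, both measures reduce to set
arithmetic. The sketch below is plain Java with no Lucene dependency and
assumes the usual information-retrieval definitions, which the thread does not
spell out: precision = relevant hits / all hits, recall = relevant hits / all
relevant documents. The class name, the use of String identifiers, and the
choice to return 0.0 for an empty set are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.Set;

class PrecisionRecall {
    /** Fraction of the returned documents that are relevant. */
    static double precision(Set<String> returned, Set<String> relevant) {
        if (returned.isEmpty()) {
            return 0.0; // convention chosen for this sketch
        }
        return (double) intersectionSize(returned, relevant) / returned.size();
    }

    /** Fraction of the relevant documents that were returned. */
    static double recall(Set<String> returned, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0; // convention chosen for this sketch
        }
        return (double) intersectionSize(returned, relevant) / relevant.size();
    }

    /** Number of identifiers present in both sets. */
    private static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }
}
```

For example, with returned = {d1, d2, d3, d4} and relevant = {d2, d4, d5}, two
hits are relevant, so precision is 2/4 and recall is 2/3. Note that dividing
the hit count by the expected count only equals recall when every hit is
relevant.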
 
> For precision I just get the hits.score() of each relevant document. I am not
> sure whether I am on the right track or whether there is an easier/better way
> to do it. I would appreciate any insight into this.

To use the score value for precision, one could define a cutoff value for
the score, but then the calculation for recall would also need to be
adapted to count only the hits above that cutoff. A HitCollector would be
a good fit for this.
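
A minimal sketch of the cutoff idea, again in plain Java without Lucene: only
documents scoring at or above the cutoff count as returned, and both precision
and recall are computed over that reduced set. The map of identifier to score
stands in for whatever a collector would gather; the class and method names
are hypothetical.

```java
import java.util.Map;
import java.util.Set;

class ScoreCutoff {
    /**
     * Returns {precision, recall} computed over the documents whose
     * score is greater than or equal to the cutoff.
     */
    static double[] precisionRecallAt(Map<String, Float> scores,
                                      Set<String> relevant,
                                      float cutoff) {
        int kept = 0;         // documents at or above the cutoff
        int keptRelevant = 0; // of those, the ones judged relevant
        for (Map.Entry<String, Float> entry : scores.entrySet()) {
            if (entry.getValue() >= cutoff) {
                kept++;
                if (relevant.contains(entry.getKey())) {
                    keptRelevant++;
                }
            }
        }
        double precision = kept == 0 ? 0.0 : (double) keptRelevant / kept;
        double recall = relevant.isEmpty()
                ? 0.0
                : (double) keptRelevant / relevant.size();
        return new double[] { precision, recall };
    }
}
```

Raising the cutoff typically trades recall for precision, which is why the
recall figure has to be recomputed for each cutoff rather than taken from the
full hit count.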

If you want the results sorted by decreasing score value, have
a look at the search methods that return TopDocs. From these results
one can make a precision/recall graph for the query by considering
all results scoring higher than a given value.
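
The graph can be sketched as one (precision, recall) point per rank, walking
down the list sorted by decreasing score. The code below is plain Java and
assumes the ranked identifiers have already been extracted from the search
results; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class PrecisionRecallCurve {
    /**
     * rankedIds must be sorted by decreasing score. The returned list holds
     * one {precision, recall} pair per rank k = 1..rankedIds.size().
     */
    static List<double[]> curve(List<String> rankedIds, Set<String> relevant) {
        List<double[]> points = new ArrayList<>();
        int relevantSeen = 0;
        for (int k = 1; k <= rankedIds.size(); k++) {
            if (relevant.contains(rankedIds.get(k - 1))) {
                relevantSeen++;
            }
            double precision = (double) relevantSeen / k;
            double recall = relevant.isEmpty()
                    ? 0.0
                    : (double) relevantSeen / relevant.size();
            points.add(new double[] { precision, recall });
        }
        return points;
    }
}
```

With the ranking d1, d2, d3, d4 and relevant = {d1, d3}, the points are
(1, 0.5), (0.5, 0.5), (2/3, 1) and (0.5, 1).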

When a lot of such computations are needed, you may also want
to cache the values of a unique identifier field for all indexed docs;
have a look at FieldCache for this.
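
The point of such a cache is that each collected hit can be mapped from its
internal document number to its external identifier with an array lookup,
without loading the stored document. The sketch below only simulates that with
a plain String array indexed by document number. In Lucene releases of that
period such an array could, as far as I recall, be obtained with
FieldCache.DEFAULT.getStrings(reader, field); treat that call, the field
choice, and all names in the code as assumptions to check against the javadoc
of your version.

```java
import java.util.Set;

class IdCacheLookup {
    /**
     * Counts the collected hits whose cached identifier is in the relevant
     * set. idByDocNumber[doc] is the unique identifier of document number
     * doc (a stand-in for the array a field cache would provide).
     */
    static int countRelevant(int[] hitDocNumbers,
                             String[] idByDocNumber,
                             Set<String> relevantIds) {
        int count = 0;
        for (int doc : hitDocNumbers) {
            if (relevantIds.contains(idByDocNumber[doc])) {
                count++;
            }
        }
        return count;
    }
}
```

The resulting count is the numerator shared by precision (divide by the number
of hits) and recall (divide by the number of relevant documents).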

Regards,
Paul Elschot


Re: recall/precision with lucene

Doron Cohen-2
In reply to this post by Panos Konstantinidis
Take a look at the quality package under contrib/benchmark.

Regards,
Doron
