Search quality evaluation

5 messages

Search quality evaluation

Andrzej Białecki-2
Hi,

I found this paper, more or less by accident:

"Scaling IR-System Evaluation using Term Relevance Sets"; Einat Amitay,
David Carmel, Ronny Lempel, Aya Soffer

    http://einat.webir.org/SIGIR_2004_Trels_p10-amitay.pdf

It gives an interesting and rather simple framework for evaluating the
quality of search results.
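The paper's core idea, very loosely: instead of collecting per-document relevance judgments, each query gets a small "term relevance set" of on-topic and off-topic terms, and returned results are judged automatically by which of those terms they contain. A minimal sketch of that idea in Java (the class, the majority-vote scoring rule, and all names here are my simplification for illustration, not the paper's exact method):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** A rough sketch of Trels-style scoring: judge each result by term
 *  overlap instead of per-document relevance judgments. Simplified. */
public class TrelsSketch {

  /** A result counts as relevant if its text matches more
   *  on-topic than off-topic terms (simplified rule). */
  static boolean relevant(String text, Set<String> onTopic, Set<String> offTopic) {
    int on = 0, off = 0;
    for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (onTopic.contains(token)) on++;
      if (offTopic.contains(token)) off++;
    }
    return on > off;
  }

  /** Precision at k: the fraction of the top-k results judged relevant. */
  static double precisionAtK(List<String> results, Set<String> onTopic,
                             Set<String> offTopic, int k) {
    int hits = 0;
    for (int i = 0; i < Math.min(k, results.size()); i++) {
      if (relevant(results.get(i), onTopic, offTopic)) hits++;
    }
    return (double) hits / k;
  }

  public static void main(String[] args) {
    Set<String> onTopic = new HashSet<>(Arrays.asList("lucene", "index", "crawler"));
    Set<String> offTopic = new HashSet<>(Arrays.asList("fishing", "boat"));
    List<String> snippets = Arrays.asList(
        "Nutch uses a Lucene index built by its crawler",
        "Boat trips and fishing gear for sale");
    System.out.println(precisionAtK(snippets, onTopic, offTopic, 2)); // 0.5
  }
}
```

The attraction is that the term sets are cheap to write per query, so the same judgments can be reused across engines and index snapshots.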

Anybody interested in hacking together a component for Nutch and e.g.
for Google, to run this evaluation? ;)

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Search quality evaluation

Dawid Weiss

I can help by reusing input components from Carrot2 -- they give access
to Google (via GoogleAPI), Yahoo (via their REST API) and Nutch (via
OpenSearch). Somebody would need to put together the rest of the
evaluation framework though :)

D.


Re: Search quality evaluation

Andrzej Białecki-2
Dawid Weiss wrote:
>
> I can help by reusing input components from Carrot2 -- they give
> access to Google (via GoogleAPI), Yahoo (via their REST API) and Nutch
> (via OpenSearch). Somebody would need to put together the rest of the
> evaluation framework though :)

Ah, yes, that's a good idea; this is the most tedious part, I suppose.
Regarding the rest of the framework - well, I already work 25 hours a
day (you know, I get up one hour earlier than everybody else ... ;).

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Search quality evaluation

Doug Cutting
In reply to this post by Andrzej Białecki-2
FYI, Mike wrote some evaluation stuff for Nutch a long time ago.  I
found it in the Sourceforge Attic:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/quality/Attic/

This worked by querying a set of search engines, those in:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/engines/

The results of each engine are scored by how much they differ from the
combined results of all the other engines; the Kendall tau distance is
used to compare the rankings.  This makes it a good tool for finding
out how close Nutch is to the quality of other engines, but it may not
be a good tool for making Nutch better than other search engines.
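For anyone unfamiliar with it: the Kendall tau distance simply counts the pairwise disagreements between two rankings. A small self-contained sketch (assuming both rankings contain the same set of documents; the class and method names are mine, not Nutch's):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Kendall tau distance: the number of document pairs that the
 *  two rankings order differently. Assumes a and b hold the same items. */
public class KendallTau {

  public static int distance(List<String> a, List<String> b) {
    // Record each document's position in ranking b.
    Map<String, Integer> pos = new HashMap<>();
    for (int i = 0; i < b.size(); i++) {
      pos.put(b.get(i), i);
    }
    int disagreements = 0;
    for (int i = 0; i < a.size(); i++) {
      for (int j = i + 1; j < a.size(); j++) {
        // a ranks a[i] before a[j]; count a disagreement if b reverses them.
        if (pos.get(a.get(i)) > pos.get(a.get(j))) {
          disagreements++;
        }
      }
    }
    return disagreements;
  }

  public static void main(String[] args) {
    List<String> engineA = Arrays.asList("d1", "d2", "d3", "d4");
    List<String> engineB = Arrays.asList("d2", "d1", "d4", "d3");
    System.out.println(distance(engineA, engineB)); // two swapped pairs -> 2
  }
}
```

A distance of 0 means identical rankings; the maximum, n*(n-1)/2, means one ranking is the reverse of the other.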

In any case, it includes a system to scrape search results from other
engines, based on Apple's Sherlock search-engine descriptors.  These
descriptors are also used by Mozilla:

http://mycroft.mozdev.org/deepdocs/quickstart.html

So there's a ready supply of up-to-date descriptions for most major
search engines.  Many engines provide a skin specifically to simplify
parsing by these plugins.
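For context, a Sherlock descriptor is a small HTML-like file that tells the client how to submit the query and which markers delimit the result list in the returned page. A rough, illustrative fragment (the engine name, URL, and marker strings below are invented; the Mycroft quickstart above documents the real format):

```
<search
   name="Example Engine"
   method="GET"
   action="http://www.example.com/search"
>
<input name="q" user>
<interpret
   resultListStart="<!-- begin results -->"
   resultListEnd="<!-- end results -->"
   resultItemStart="<li>"
   resultItemEnd="</li>"
>
</search>
```

Note the granularity: everything between the item markers is a single blob, so the descriptor itself does not separate title, URL, and snippet.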

The code that implemented Sherlock plugins in Nutch is at:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/quality/dynamic/

Doug


Re: Search quality evaluation

Dawid Weiss


> In any case, it includes a system to scrape search results from other
> engines, based on Apple's Sherlock search-engine descriptors.  These
> descriptors are also used by Mozilla:

Just a note: we used to have exactly the same mechanism in Carrot2.
Unfortunately this format does not make a clear distinction between the
title/url/snippet parts and stays at snippet granularity, so we
additionally parsed each snippet with regular expressions...  The
deeper problem is the terms of use, which forbid automatic scraping of
search results with these plugins... That's the main reason why we
switched to public APIs, actually.
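A toy illustration of that post-processing step (the pattern and the markup shape are invented for the example; Carrot2's actual expressions were engine-specific):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of splitting a scraped result blob into title/url/body
 *  with a regex. The pattern is purely illustrative -- real engine
 *  markup differs and needs its own expression. */
public class SnippetSplitter {

  // Assumes results shaped like: <a href="URL">TITLE</a> BODY
  private static final Pattern RESULT =
      Pattern.compile("<a href=\"([^\"]+)\">([^<]+)</a>\\s*(.*)", Pattern.DOTALL);

  /** Returns {title, url, body}, or null if the blob does not match. */
  public static String[] split(String blob) {
    Matcher m = RESULT.matcher(blob);
    if (!m.matches()) {
      return null;
    }
    return new String[] { m.group(2), m.group(1), m.group(3) };
  }

  public static void main(String[] args) {
    String[] parts = split("<a href=\"http://example.com/\">Example</a> A sample snippet.");
    System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
  }
}
```

The fragility of such patterns, on top of the terms-of-use issue, is a good argument for the public APIs.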

D.