Statistical evaluation of modifications to a Lucene query based on search logs


Statistical evaluation of modifications to a Lucene query based on search logs

Daniel Shane-2
Hi!

I'm developing a new type of Query, called a SubPhraseQuery. I sent a
message to the list about this earlier, and Doug was kind enough to put
me on the right track. The query is essentially a PhraseQuery where all
the terms are searched, but if any sub-phrases are found, the result is
boosted more the longer the matching sub-phrase is.

For example, searching for A B C behaves a bit as if I had rewritten
the query as: +A +B +C  "A B"^2  "B C"^2  "A B C"^4
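
To make that concrete, here is a minimal sketch of building the rewritten
query by hand with the Lucene API (the field name "body" is just an
assumption for illustration):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

/* Hand-built version of +A +B +C  "A B"^2  "B C"^2  "A B C"^4 */
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("body", "a")), BooleanClause.Occur.MUST);
bq.add(new TermQuery(new Term("body", "b")), BooleanClause.Occur.MUST);
bq.add(new TermQuery(new Term("body", "c")), BooleanClause.Occur.MUST);

PhraseQuery ab = new PhraseQuery();        // "A B"^2
ab.add(new Term("body", "a"));
ab.add(new Term("body", "b"));
ab.setBoost(2f);
bq.add(ab, BooleanClause.Occur.SHOULD);
// ... add "B C" the same way with boost 2, and the three-term
// phrase "A B C" with boost 4.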

Now the hard part is fine-tuning the weight of the sub-phrases, and I was
wondering if there are any articles that deal with comparing search
engines based on search logs.

I'm trying to find a way to test this new query to see if it improves
most of the queries that people run on our site. The only problem is that
it seems very difficult, based on search logs alone, to know whether a
user was satisfied with a document or not.

The principle behind statistical evaluation of search logs would be to
see whether documents that people found "interesting" are, on average,
ranked higher by the new query than by the previous one. The problem is
in determining how one can deduce that a document is "interesting" from
search logs.

Here are a few approaches :

a) Find queries where the user did not choose any of the documents on
the first page of results, clicked next, and then clicked on a document.
We can assume that if the document had appeared on the first page of
results, the user most probably would have clicked on it there, so if,
on average, the new query ranks this hit higher, then it should be better.

b) Find a search pattern where the user rapidly clicks on different
documents and then seems to spend a longer time on one particular
document. Maybe we can deduce that this document is better and we should
therefore try to rank it higher (a sketch of the resulting rank
comparison follows below).
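
Neither signal is conclusive, but once a heuristic like (a) or (b) yields
judged (query, interesting document) pairs, the rank comparison itself is
simple. A rough sketch; the Judged class and method names are hypothetical
placeholders, not an established API:

import java.io.IOException;
import java.util.List;
import org.apache.lucene.search.*;

// A judged log entry: the old and new rewrites of one logged query,
// plus the document the heuristic flagged as "interesting".
static class Judged {
    Query oldQuery, newQuery;
    int docId;
}

// Mean (oldRank - newRank) over judged pairs; positive means the new
// rewrite ranks the "interesting" documents higher on average.
static double meanRankImprovement(IndexSearcher searcher, List<Judged> judged)
        throws IOException {
    double deltaSum = 0;
    int n = 0;
    for (Judged e : judged) {
        int oldRank = rankOf(searcher, e.oldQuery, e.docId);
        int newRank = rankOf(searcher, e.newQuery, e.docId);
        if (oldRank >= 0 && newRank >= 0) { deltaSum += oldRank - newRank; n++; }
    }
    return n == 0 ? 0.0 : deltaSum / n;
}

// Rank of docId under q, or -1 if it is not retrieved at all
// (Hits is the Lucene 1.9/2.0-era result class).
static int rankOf(IndexSearcher searcher, Query q, int docId) throws IOException {
    Hits hits = searcher.search(q);
    for (int i = 0; i < hits.length(); i++)
        if (hits.id(i) == docId) return i;
    return -1;
}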

There are probably many other ways, and I'm wondering whether any of the
developers on Lucene have tried to analyze search logs in order to
fine-tune a query, and if so, what approach you used. Is there any
literature on the subject that anyone knows about (papers, web references, etc.)?

As usual, thanks in advance for any help,
Daniel Shane



Re: Statistical evaluation of modifications to a Lucene query based on search logs

Otis Gospodnetic-2
Hi,

I haven't done this type of analysis, and my guess is that no commercial search engine would be willing to share this data.
However, I think it's intuitive that longer phrase matches would look more promising.

I think one way you could test this is by showing KWIC snippets (using the Highlighter) and making sure the longest matching phrase is shown and highlighted, so the user sees it clearly.  Then keep track of which documents the person clicks on, on the assumption that the person saw both short and long phrase matches in those KWICs and decided that one looked better than the other.
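
For what it's worth, a starting point for the KWIC part with the contrib
Highlighter might look like the sketch below (the field name "body" and
the surrounding variables are assumptions; note the stock QueryScorer
scores fragments term by term, so guaranteeing that the longest sub-phrase
wins may need a custom fragment scorer):

import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

// Pick and mark up the best fragment of the stored text for this hit
// (throws IOException).
Highlighter highlighter = new Highlighter(new QueryScorer(query));
String snippet = highlighter.getBestFragment(analyzer, "body", docText);
// render 'snippet' next to the hit, then log which snippets get clicked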

Otis




Re: Statistical evaluation of modifications to a Lucene query based on search logs

Robin H. Johnson-3
In reply to this post by Daniel Shane-2
On Thu, May 04, 2006 at 10:52:46AM -0400, Daniel Shane wrote:
> I'm developing a new type of Query, called a SubPhraseQuery. I sent a
> message to the list about this earlier, and Doug was kind enough to put
> me on the right track. The query is essentially a PhraseQuery where all
> the terms are searched, but if any sub-phrases are found, the result is
> boosted more the longer the matching sub-phrase is.
I can't help on the analyzing portion, but I can show you an alternative
implementation.

We use Lucene to power the search behind isohunt.com, and I came up with
a different way of doing what you want. It's got less in the way of
magic constants, and more in the way of using existing Lucene
functionality.

It's got one difference from yours, in that the terms are allowed to
occur in any order in the sub-phrases (so phrase "C B" from your
original example is scored like "B C").

If the query is a boolean query, it's a candidate for transmuting.
Otherwise it's just used as is.

/* A concrete Java sketch of the pseudo-code (Lucene 1.9-era API): */
static Query transmuteBooleanQueryToSpanQuery(BooleanQuery bq) {
    List<SpanQuery> spans = new ArrayList<SpanQuery>();
    int required = 0, optional = 0;
    for (BooleanClause c : bq.getClauses()) {
        if (!(c.getQuery() instanceof TermQuery)) return bq; // only term clauses
        Term t = ((TermQuery) c.getQuery()).getTerm();
        if (c.getOccur() == BooleanClause.Occur.MUST)        { required++; spans.add(new SpanTermQuery(t)); }
        else if (c.getOccur() == BooleanClause.Occur.SHOULD) { optional++; spans.add(new SpanTermQuery(t)); }
    }
    if (spans.size() <= 1) return bq; // safety case
    // This is the bit of magic here: derive the allowed slop from the clause counts.
    int proximity = required * 3 + optional * 2 + 5;
    SpanNearQuery snq = new SpanNearQuery(spans.toArray(new SpanQuery[spans.size()]),
                                          proximity, false /* any order */);
    bq.add(snq, BooleanClause.Occur.SHOULD);
    return bq;
}
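
For example, applied to a user query parsed with QueryParser (the field
name and analyzer are assumptions):

// Hypothetical usage: transmute the parsed query before searching.
Query q = new QueryParser("body", analyzer).parse(userInput);
if (q instanceof BooleanQuery)
    q = transmuteBooleanQueryToSpanQuery((BooleanQuery) q);
Hits hits = searcher.search(q);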

--
Robin Hugh Johnson
E-Mail     : [hidden email]
Home Page  : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ#       : 30269588 or 41961639
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85


Re: Statistical evaluation of modifications to a Lucene query based on search logs

Chris Hostetter-3

: It's got one difference from yours, in that the terms are allowed to
: occur in any order in the sub-phrases (so phrase "C B" from your
: original example is scored like "B C").

There's a much bigger difference, in that your technique won't reward
documents where B and C are "near" each other but A is farther away in the
document than the proximity value you calculate.

Daniel's goal is to make sure that documents matching any sub-phrase of the
original query get an increase in score based on the length of the
sub-phrase.  In his specific example the original query only had three
words, and he wanted all of them to be mandatory, but consider the case
where they are all optional.  If I search for 'A B C Z', and Z is a
nonexistent term, he wants documents matching the phrase "A B C" to get
better scores than documents matching the phrase "A B" or "B C", which in
turn should get better scores than documents that just match the individual
terms with large gaps in between them.

Your approach will still only increase the scores of documents where *all*
of the terms appear within some proximity.
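
For contrast, here is a rough sketch of the rewrite described above, with
every term optional and every contiguous sub-phrase added as a boosted
optional clause (the field name and the doubling boost schedule are
illustrative, chosen to mirror the ^2/^4 example):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// Each contiguous sub-phrase of length >= 2 becomes an optional
// PhraseQuery whose boost doubles with each extra term (2, 4, 8, ...).
static BooleanQuery subPhraseBoosts(String field, String[] terms) {
    BooleanQuery bq = new BooleanQuery();
    for (String t : terms)
        bq.add(new TermQuery(new Term(field, t)), BooleanClause.Occur.SHOULD);
    for (int len = 2; len <= terms.length; len++) {
        for (int start = 0; start + len <= terms.length; start++) {
            PhraseQuery pq = new PhraseQuery();
            for (int i = start; i < start + len; i++)
                pq.add(new Term(field, terms[i]));
            pq.setBoost(1 << (len - 1));   // length 2 -> 2, length 3 -> 4, ...
            bq.add(pq, BooleanClause.Occur.SHOULD);
        }
    }
    return bq;
}

A document containing "A B C" also matches both two-term sub-phrases, so
the boosts compound in its favor, which yields exactly the score ordering
described above.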

-Hoss




Re: Statistical evaluation of modifications to a Lucene query based on search logs

Daniel Shane-2
Chris Hostetter wrote:

> Your approach will still only increase the scores of documents where *all*
> of the terms appear within some proximity.

Thanks Robin for sharing your idea; it's certainly interesting, but as
Hoss says, I really wanted to be strict about what matches and only boost
documents containing either the whole phrase or its sub-phrases, with
longer sub-phrase matches scored higher.

After much thought about testing a search engine, I came to the
conclusion that one of the best ways of doing it would be to deploy it
locally for a few months and have our staff use it. When they do a
search, they are presented with two lists (old version and new version)
and can choose which one is "best". Better still would be to never reveal
which list was generated by the newer version and see whether, in most
cases, the newer version is preferred over the old one.
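
A minimal sketch of that blind tally (all names here are illustrative):
randomize which side the new engine's list appears on, record the judge's
pick, and count how often the pick falls on the new engine.

import java.util.Random;

// Hypothetical blind side-by-side preference test.
class BlindSideBySide {
    private final Random rnd = new Random();
    private int trials = 0, newEngineWins = 0;

    /* Call before rendering: true means show the new engine's list on the left. */
    boolean newGoesLeft() { return rnd.nextBoolean(); }

    /* Call after the judge picks a side. */
    void record(boolean pickedLeft, boolean newWasLeft) {
        trials++;
        if (pickedLeft == newWasLeft) newEngineWins++;
    }

    double newEngineWinRate() {
        return trials == 0 ? 0.0 : newEngineWins / (double) trials;
    }
}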

We just can't seem to find any pattern in a search log that would give us
any certainty about user satisfaction with a result; the bias is always so
large that it makes the whole statistical study almost useless.

Daniel


