intuitive explanation for what seems like odd result?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

intuitive explanation for what seems like odd result?

Donna L Gresh
I have two slightly different queries, and am filtering to return only a
single unique document. The scores are very slightly different, but in the
opposite way from what my (naive) reasoning would have expected.

In the first case the query is
text:"j2ee"^2.0, text:"soa"^2.0, text:webservic, text:db2

Note that there are two boosted terms and two unboosted terms. The
document, in fact, contains only the terms "db2" and "soa", and note that
db2 is an unboosted term.
The score is 0.069.

The second query is
text:"db2"^2.0, text:"j2ee"^2.0, text:"soa"^2.0, text:webservic

In this case there are three boosted terms and one unboosted term. Note
that now both db2 and soa are "boosted".
The score is 0.065, which is slightly smaller, which is the opposite of
what I would expect, since I have two boosted terms now instead of just
one.

Looking at the explanation object, the difference is entirely due to the
queryNorm factor (which explain doesn't really "go into").
My next step is to get the Lucene source and try to step through to
determine why queryNorm is larger in the first case than the second, but I
was wondering if anyone out there can give a simple explanation for why it
would differ for these two queries. I use the DefaultSimilarity class.

Many thanks in advance--
Donna


Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: intuitive explanation for what seems like odd result?

Karl Wettin
Donna L Gresh skrev:
> I have two slightly different queries,

Hi Donna,

I can't help you, but perhaps I would understand everthing better if you
also pasted in the explanations.



      karl

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: intuitive explanation for what seems like odd result?

Donna L Gresh
Sure; here are the two explanations (below). Your question made me go look

at the explanation more carefully again and (no) surprise, I discovered
that I
misspoke (miswrote) earlier; the two "found" terms are j2ee and soa,
which then makes my "concern" much less of one, since in both cases, the
"found" terms *are*
being boosted. So the very slight difference in score is not much to be
concerned about.

So I guess, "never mind". I did step through all the code and now have a
much better
understanding of the steps involved in calcuating the score---
Thanks


First the one for the query with db2 NOT boosted
clauses= [text:"j2ee"^2.0, text:"soa"^2.0, text:webservic, text:db2]
"explanation.toString()"= "0.06920814 = (MATCH) product of:\n
  0.13841628 = (MATCH) sum of:\n
    0.04483024 = (MATCH) weight(text:j2ee^2.0 in 13588), product of:\n
      0.50752115 = queryWeight(text:j2ee^2.0), product of:\n
        2.0 = boost\n
        3.230419 = idf(docFreq=3719, numDocs=34610)\n
        0.07855346 = queryNorm\n
      0.08833177 = (MATCH) fieldWeight(text:j2ee in 13588), product of:\n
        1.0 = tf(termFreq(text:j2ee)=1)\n
        3.230419 = idf(docFreq=3719, numDocs=34610)\n
        0.02734375 = fieldNorm(field=text, doc=13588)\n
    0.093586035 = (MATCH) weight(text:soa^2.0 in 13588), product of:\n
      0.7332873 = queryWeight(text:soa^2.0), product of:\n
        2.0 = boost\n
        4.667441 = idf(docFreq=883, numDocs=34610)\n
        0.07855346 = queryNorm\n
      0.12762533 = (MATCH) fieldWeight(text:soa in 13588), product of:\n
        1.0 = tf(termFreq(text:soa)=1)\n
        4.667441 = idf(docFreq=883, numDocs=34610)\n
        0.02734375 = fieldNorm(field=text, doc=13588)\n
  0.5 = coord(2/4)\n"

Here for the one with db2 boosted:
clauses= [text:"db2"^2.0, text:"j2ee"^2.0, text:"soa"^2.0, text:webservic]

"explanation.toString()"= "0.06465748 = (MATCH) product of:\n
  0.12931496 = (MATCH) sum of:\n
    0.041882508 = (MATCH) weight(text:j2ee^2.0 in 13588), product of:\n
      0.47415 = queryWeight(text:j2ee^2.0), product of:\n
        2.0 = boost\n
        3.230419 = idf(docFreq=3719, numDocs=34610)\n
        0.073388316 = queryNorm\n
      0.08833177 = (MATCH) fieldWeight(text:j2ee in 13588), product of:\n
        1.0 = tf(termFreq(text:j2ee)=1)\n
        3.230419 = idf(docFreq=3719, numDocs=34610)\n
        0.02734375 = fieldNorm(field=text, doc=13588)\n
    0.087432444 = (MATCH) weight(text:soa^2.0 in 13588), product of:\n
      0.68507123 = queryWeight(text:soa^2.0), product of:\n
        2.0 = boost\n
        4.667441 = idf(docFreq=883, numDocs=34610)\n
        0.073388316 = queryNorm\n
      0.12762533 = (MATCH) fieldWeight(text:soa in 13588), product of:\n
        1.0 = tf(termFreq(text:soa)=1)\n
        4.667441 = idf(docFreq=883, numDocs=34610)\n
        0.02734375 = fieldNorm(field=text, doc=13588)\n
  0.5 = coord(2/4)\n"

Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[hidden email]


Karl Wettin <[hidden email]> wrote on 04/01/2008 02:33:42 PM:

> Donna L Gresh skrev:
> > I have two slightly different queries,
>
> Hi Donna,
>
> I can't help you, but perhaps I would understand everthing better if you

> also pasted in the explanations.
>
>
>
>       karl
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>