Scoring differences solr versions

roySolr
Hi,

I have a question about scoring in Solr 4. I run the same query on two versions of Solr (same indexed docs). The debug output for the scoring:

SOLR4:
3.3243241 = (MATCH) sum of: 0.20717455 = (MATCH) max plus 1.0 times others of: 0.19920631 = (MATCH) weight(firstname_search:g^50.0 in 783453) [DefaultSimilarity], result of: 0.19920631 = score(doc=783453,freq=1.0 = termFreq=1.0 ), product of: 0.11625154 = queryWeight, product of: 50.0 = boost 3.4271598 = idf(docFreq=195811, maxDocs=2217897) 6.784133E-4 = queryNorm 1.7135799 = fieldWeight in 783453, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 3.4271598 = idf(docFreq=195811, maxDocs=2217897) 0.5 = fieldNorm(doc=783453) 0.007968252 = (MATCH) weight(name_first_letter:g in 783453) [DefaultSimilarity], result of: 0.007968252 = score(doc=783453,freq=1.0 = termFreq=1.0 ), product of: 0.0023250307 = queryWeight, product of: 3.4271598 = idf(docFreq=195811, maxDocs=2217897) 6.784133E-4 = queryNorm 3.4271598 = fieldWeight in 783453, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 3.4271598 = idf(docFreq=195811, maxDocs=2217897) 1.0 = fieldNorm(doc=783453) 3.1171496 = (MATCH) max plus 1.0 times others of: 3.1171496 = (MATCH) weight(lastname_search:aalbers^50.0 in 783453) [DefaultSimilarity], result of: 3.1171496 = score(doc=783453,freq=1.0 = termFreq=1.0 ), product of: 0.3251704 = queryWeight, product of: 50.0 = boost 9.586204 = idf(docFreq=413, maxDocs=2217897) 6.784133E-4 = queryNorm 9.586204 = fieldWeight in 783453, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 9.586204 = idf(docFreq=413, maxDocs=2217897) 1.0 = fieldNorm(doc=783453)

SOLR3.1:
3.3741257 = (MATCH) sum of: 0.25697616 = (MATCH) max plus 1.0 times others of: 0.2490079 = (MATCH) weight(firstname_search:g^50.0 in 1697008), product of: 0.11625154 = queryWeight(firstname_search:g^50.0), product of: 50.0 = boost 3.4271598 = idf(docFreq=195811, maxDocs=2217897) 6.784133E-4 = queryNorm 2.141975 = (MATCH) fieldWeight(firstname_search:g in 1697008), product of: 1.0 = tf(termFreq(firstname_search:g)=1) 3.4271598 = idf(docFreq=195811, maxDocs=2217897) 0.625 = fieldNorm(field=firstname_search, doc=1697008) 0.007968252 = (MATCH) weight(name_first_letter:g in 1697008), product of: 0.0023250307 = queryWeight(name_first_letter:g), product of: 3.4271598 = idf(docFreq=195811, maxDocs=2217897) 6.784133E-4 = queryNorm 3.4271598 = (MATCH) fieldWeight(name_first_letter:g in 1697008), product of: 1.0 = tf(termFreq(name_first_letter:g)=1) 3.4271598 = idf(docFreq=195811, maxDocs=2217897) 1.0 = fieldNorm(field=name_first_letter, doc=1697008) 3.1171496 = (MATCH) max plus 1.0 times others of: 3.1171496 = (MATCH) weight(lastname_search:aalbers^50.0 in 1697008), product of: 0.3251704 = queryWeight(lastname_search:aalbers^50.0), product of: 50.0 = boost 9.586204 = idf(docFreq=413, maxDocs=2217897) 6.784133E-4 = queryNorm 9.586204 = (MATCH) fieldWeight(lastname_search:aalbers in 1697008), product of: 1.0 = tf(termFreq(lastname_search:aalbers)=1) 9.586204 = idf(docFreq=413, maxDocs=2217897) 1.0 = fieldNorm(field=lastname_search, doc=1697008)


What is the reason for the difference in score? Does Solr 4 calculate scores differently?

Thanks
Roy

RE: Scoring differences solr versions

Markus Jelsma-2
Could you provide the indented format instead? This is hard to debug, but I suspect it's the query norm.
 
-----Original message-----

> From:roySolr <[hidden email]>
> Sent: Mon 21-Jan-2013 17:00
> To: [hidden email]
> Subject: Scoring differences solr versions
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Scoring-differences-solr-versions-tp4035106.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

RE: Scoring differences solr versions

roySolr
Hello Markus,

Thanks for your reply. I read this about queryNorm:

queryNorm is just a normalizing value applied equally to every clause - it
won't change the relative ordering of any docs, it just helps to ensure
that your scores don't sky rocket so high that floating point rounding
loses precision.
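The point of that quote can be illustrated with a small sketch: multiplying every score by the same constant leaves the ranking untouched. The document ids and raw scores are made up for illustration; only the queryNorm value comes from the explain output in this thread.

```python
# queryNorm value reported in the explain output of both Solr versions.
QUERY_NORM = 6.784133e-4


def rank(raw_scores, norm=1.0):
    """Return doc ids ordered by descending score after scaling by `norm`."""
    scaled = {doc: score * norm for doc, score in raw_scores.items()}
    return sorted(scaled, key=scaled.get, reverse=True)


# Hypothetical raw (un-normalized) scores, not taken from the thread.
raw = {"docA": 4900.0, "docB": 4595.0, "docC": 12.5}

print(rank(raw))              # ordering without normalization
print(rank(raw, QUERY_NORM))  # ordering with queryNorm applied
```

Since queryNorm is identical in both explain outputs here, it is unlikely to be what changes the document order.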

The score differs between the Solr versions, which is not a big problem in itself. The main problem is that the order of the returned documents is also different. That's strange, because it's the same config and the same index.

You want that output indented? How/where can I get that?

Thanks!

Re: Scoring differences solr versions

Shawn Heisey-4
On 1/22/2013 2:07 AM, roySolr wrote:
> You want that output indented? How/where can I get that?

If you are using the query UI in Solr 4, just put a check in the box
that says "indent" ... otherwise just add &indent=true to the query URL
you send.
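For anyone building the URL by hand, here is a minimal sketch. The host, port and `/select` path are assumptions about a default local install (a Solr 4 setup may also need a core name in the path), and `build_debug_url` is a hypothetical helper; `indent=true` is the parameter mentioned above and `debugQuery=true` is what requests the score explanation.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query, indent=True):
    """Build a Solr select URL that asks for debug (explain) output."""
    params = {"q": query, "debugQuery": "true"}
    if indent:
        # Ask Solr to pretty-print the response, including the explain tree.
        params["indent"] = "true"
    return f"{base_url}/select?{urlencode(params)}"


# Assumed base URL; adjust to your own host and core.
url = build_debug_url("http://localhost:8983/solr", "lastname_search:aalbers")
print(url)
```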

If you use the Solr 4 UI, the results will be inside the UI, but you'll
get the actual query URL at the top.  If you open that URL in a new tab,
you can then view the page source to see the XML in a format that's
easier to copy/paste.  For 3.x Solr versions, that step should not be
necessary - the query results don't show up within the Solr UI.

On Firefox for Windows, press Ctrl-U to view the page source.  Other
browsers/platforms probably have different methods.

Once you have the XML to paste, using a website like pastie.org where
you can set the format to XML is better than including it directly in
your email.  You can include the entire response, and it will be nicely
colorized.

Thanks,
Shawn


Re: Scoring differences solr versions

roySolr
Hello Shawn,

Thanks for the help:

Indented format:

SOLR4

<str name="1697058">
3.3243241 = (MATCH) sum of:
  0.20717455 = (MATCH) max plus 1.0 times others of:
    0.19920631 = (MATCH) weight(firstname_search:g^50.0 in 783453) [DefaultSimilarity], result of:
      0.19920631 = score(doc=783453,freq=1.0 = termFreq=1.0
), product of:
        0.11625154 = queryWeight, product of:
          50.0 = boost
          3.4271598 = idf(docFreq=195811, maxDocs=2217897)
          6.784133E-4 = queryNorm
        1.7135799 = fieldWeight in 783453, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          3.4271598 = idf(docFreq=195811, maxDocs=2217897)
          0.5 = fieldNorm(doc=783453)
    0.007968252 = (MATCH) weight(name_first_letter:g in 783453) [DefaultSimilarity], result of:
      0.007968252 = score(doc=783453,freq=1.0 = termFreq=1.0
), product of:
        0.0023250307 = queryWeight, product of:
          3.4271598 = idf(docFreq=195811, maxDocs=2217897)
          6.784133E-4 = queryNorm
        3.4271598 = fieldWeight in 783453, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          3.4271598 = idf(docFreq=195811, maxDocs=2217897)
          1.0 = fieldNorm(doc=783453)
  3.1171496 = (MATCH) max plus 1.0 times others of:
    3.1171496 = (MATCH) weight(lastname_search:aalbers^50.0 in 783453) [DefaultSimilarity], result of:
      3.1171496 = score(doc=783453,freq=1.0 = termFreq=1.0
), product of:
        0.3251704 = queryWeight, product of:
          50.0 = boost
          9.586204 = idf(docFreq=413, maxDocs=2217897)
          6.784133E-4 = queryNorm
        9.586204 = fieldWeight in 783453, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.586204 = idf(docFreq=413, maxDocs=2217897)
          1.0 = fieldNorm(doc=783453)
</str>


SOLR 3.1

<str name="1697058">
3.3741257 = (MATCH) sum of:
  0.25697616 = (MATCH) max plus 1.0 times others of:
    0.2490079 = (MATCH) weight(firstname_search:g^50.0 in 1697008), product of:
      0.11625154 = queryWeight(firstname_search:g^50.0), product of:
        50.0 = boost
        3.4271598 = idf(docFreq=195811, maxDocs=2217897)
        6.784133E-4 = queryNorm
      2.141975 = (MATCH) fieldWeight(firstname_search:g in 1697008), product of:
        1.0 = tf(termFreq(firstname_search:g)=1)
        3.4271598 = idf(docFreq=195811, maxDocs=2217897)
        0.625 = fieldNorm(field=firstname_search, doc=1697008)
    0.007968252 = (MATCH) weight(name_first_letter:g in 1697008), product of:
      0.0023250307 = queryWeight(name_first_letter:g), product of:
        3.4271598 = idf(docFreq=195811, maxDocs=2217897)
        6.784133E-4 = queryNorm
      3.4271598 = (MATCH) fieldWeight(name_first_letter:g in 1697008), product of:
        1.0 = tf(termFreq(name_first_letter:g)=1)
        3.4271598 = idf(docFreq=195811, maxDocs=2217897)
        1.0 = fieldNorm(field=name_first_letter, doc=1697008)
  3.1171496 = (MATCH) max plus 1.0 times others of:
    3.1171496 = (MATCH) weight(lastname_search:aalbers^50.0 in 1697008), product of:
      0.3251704 = queryWeight(lastname_search:aalbers^50.0), product of:
        50.0 = boost
        9.586204 = idf(docFreq=413, maxDocs=2217897)
        6.784133E-4 = queryNorm
      9.586204 = (MATCH) fieldWeight(lastname_search:aalbers in 1697008), product of:
        1.0 = tf(termFreq(lastname_search:aalbers)=1)
        9.586204 = idf(docFreq=413, maxDocs=2217897)
        1.0 = fieldNorm(field=lastname_search, doc=1697008)
</str>


Why does this doc score higher in Solr 3.1?

RE: Scoring differences solr versions

Markus Jelsma-2
Ah, your fieldNorm is different. Are you sure firstname_search has exactly the same value and, more importantly, the same length? I can't remember an issue that encodes norms differently between 3.x and 4.x, but I'm likely wrong ;)
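
To check that the fieldNorm alone accounts for the gap, the two explain trees can be recomputed by hand. This is a rough sketch: the idf formula is the classic TF-IDF one, which reproduces the idf values printed in the explain output, every number is copied from that output, and the function names are made up for illustration.

```python
import math

QUERY_NORM = 6.784133e-4  # queryNorm from both explain outputs
MAX_DOCS = 2217897        # maxDocs from both explain outputs


def idf(doc_freq, max_docs):
    """Classic TF-IDF inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, field_norm, tf=1.0):
    """score = queryWeight * fieldWeight, as laid out in the explain tree."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def total_score(firstname_norm):
    """Sum the three clauses; 'max plus 1.0 times others' is a plain sum."""
    firstname = term_score(50.0, 195811, firstname_norm)  # firstname_search:g^50
    first_letter = term_score(1.0, 195811, 1.0)           # name_first_letter:g
    lastname = term_score(50.0, 413, 1.0)                 # lastname_search:aalbers^50
    return firstname + first_letter + lastname


print("Solr 4   (fieldNorm 0.5):  ", total_score(0.5))
print("Solr 3.1 (fieldNorm 0.625):", total_score(0.625))
```

Changing only the firstname_search fieldNorm from 0.5 to 0.625 moves the total from roughly 3.324 to roughly 3.374, which matches the two reported scores; every other factor is the same in both versions.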
 
-----Original message-----

> From:roySolr <[hidden email]>
> Sent: Tue 22-Jan-2013 16:11
> To: [hidden email]
> Subject: Re: Scoring differences solr versions

RE: Scoring differences solr versions

roySolr
Hello Markus,

I analysed the firstname_search field and there is no difference in length or value.

I used the analysis function in the Solr UI.

Any other ideas?

Thanks so far, Markus.

RE: Scoring differences solr versions

roySolr
The problem is the fieldNorm.

In Solr 3 the fieldNorm is 0.625 and in Solr 4 it's 0.5. Can someone explain why they differ? The schema is exactly the same for this field.

Is there a difference in Solr 4 scoring?
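
One way to reason about those two values is a rough sketch of how a default length norm ends up as 0.625 or 0.5. It assumes the norm is 1/sqrt(number of terms in the field) times any index-time boost, stored in a single lossy byte; the encode/decode functions are modeled on my reading of Lucene's SmallFloat "315" scheme and are not the library's code, and `stored_field_norm` is a hypothetical helper.

```python
import math
import struct


def float_to_byte315(f):
    """Lossy one-byte float encoding (assumed to mirror Lucene's SmallFloat)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> (24 - 3)
    zero = (63 - 15) << 3
    if small <= zero:
        return 0 if bits <= 0 else 1  # underflow: zero or smallest value
    if small >= zero + 0x100:
        return 255                    # overflow: largest value
    return small - zero


def byte315_to_float(b):
    """Decode the one-byte norm back to a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_field_norm(num_terms, boost=1.0):
    """fieldNorm as it would come back after the lossy round trip."""
    raw = boost / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(raw))


for n in range(1, 6):
    print(n, "term(s) ->", stored_field_norm(n))
```

If these assumptions hold, 0.625 is what a two-term field stores and 0.5 is what a three- or four-term field stores, so it may be worth checking whether the two versions index a different number of tokens for firstname_search (or count overlapping tokens differently). The thread itself does not confirm the cause.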

Thanks