Debugging/scoring question

Debugging/scoring question

LOPEZ-CORTES Mariano-ext
Hi all,

I have a collection of 20 documents. In the debug output for a query, we have:

"1000000051":"
20.794415 = max of:
  20.794415 = weight(nomUsageE:jean in 1) [SchemaSimilarity], result of:
    20.794415 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
      15.0 = boost
      1.3862944 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        1.0 = docFreq
        5.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        1.0 = avgFieldLength
        1.0 = fieldLength

  "1000000053":"
21.11246 = max of:
  21.11246 = weight(prenomE:jean in 3) [SchemaSimilarity], result of:
    21.11246 = score(doc=3,freq=1.0 = termFreq=1.0
), product of:
      8.0 = boost
      2.6390574 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        1.0 = docFreq
        20.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        1.0 = avgFieldLength
        1.0 = fieldLength

docCount = 5.0 for the document 1000000051. Why? docCount is the total number of documents, isn't it?

Thanks in advance!


Re: Debugging/scoring question

Alessandro Benedetti
Hi Mariano,
From the documentation:

docCount = total number of documents containing this field, in the range [1 .. {@link #maxDoc()}]

In your debug output, the fields involved in the score computation are indeed different (nomUsageE, prenomE).
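
To make the per-field docCount concrete, here is a minimal sketch that recomputes the two scores from the debug output above, using the idf and tfNorm formulas exactly as the explain text prints them. The function names are made up for illustration and nothing here calls Solr. The inference is that only 5 of the 20 documents have a value in nomUsageE, while all 20 have one in prenomE.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # idf as printed in the explain output:
    # log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # tfNorm as printed in the explain output
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )

def bm25_score(boost, doc_freq, doc_count,
               freq=1.0, field_length=1.0, avg_field_length=1.0):
    # The explain output reports the score as boost * idf * tfNorm
    return (boost
            * bm25_idf(doc_freq, doc_count)
            * bm25_tf_norm(freq, field_length, avg_field_length))

# nomUsageE:jean in doc 1000000051, where docCount is 5
score_nom = bm25_score(boost=15.0, doc_freq=1.0, doc_count=5.0)
# prenomE:jean in doc 1000000053, where docCount is 20
score_prenom = bm25_score(boost=8.0, doc_freq=1.0, doc_count=20.0)

print(round(score_nom, 4), round(score_prenom, 4))
```

With docCount = 5 the idf is log(4), and with docCount = 20 it is log(14). That difference is why the field with the lower boost (8 against 15) still ends up with the slightly higher score.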

Does this make sense?

Cheers



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

RE: Debugging/scoring question

LOPEZ-CORTES Mariano-ext
Yes, this makes sense.

I guess you are referring to this doc:

https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

How can I decrease the effect of the IDF component in my query?

Thanks!!

-----Original Message-----
From: Alessandro Benedetti [mailto:[hidden email]]
Sent: Wednesday, May 23, 2018 18:05
To: [hidden email]
Subject: Re: Debugging/scoring question

Hi Mariano,
From the documentation:

docCount = total number of documents containing this field, in the range [1 .. {@link #maxDoc()}]

In your debug output, the fields involved in the score computation are indeed different (nomUsageE, prenomE).

Does this make sense?

Cheers




Re: Debugging/scoring question

Erick Erickson
Well, first you have to be using that similarity ;)

Since Solr 6.0, BM25 has been the default similarity algorithm.

If you insist, you can modify the score with function queries, see the
docfreq method.
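
As a rough illustration of what taking IDF out would change for the two matches in the original question, the sketch below compares the ranking under the BM25 product from the debug output with a ranking that uses the boost alone. It is plain arithmetic on the numbers posted above, not a Solr call, and the variable and function names are hypothetical. In Solr itself, one way to get a boost-only score is the constant-score operator of the standard query parser (for example nomUsageE:jean^=15), assuming your version supports it; the function-query route via docfreq is another.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # idf formula as printed in the explain output
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Values copied from the debug output in the original question.
# tfNorm is 1.0 for both matches, so it is left out of the product.
matches = {
    "1000000051": {"field": "nomUsageE", "boost": 15.0,
                   "doc_freq": 1.0, "doc_count": 5.0},
    "1000000053": {"field": "prenomE", "boost": 8.0,
                   "doc_freq": 1.0, "doc_count": 20.0},
}

def rank(scores):
    # Document ids ordered from highest to lowest score
    return sorted(scores, key=scores.get, reverse=True)

# Default behaviour: boost * idf
with_idf = {doc: m["boost"] * bm25_idf(m["doc_freq"], m["doc_count"])
            for doc, m in matches.items()}

# Boost-only scoring, i.e. what a constant-score clause would produce
without_idf = {doc: m["boost"] for doc, m in matches.items()}

print("with idf:   ", rank(with_idf))
print("without idf:", rank(without_idf))
```

With IDF in play the prenomE match comes first. With boost-only scoring the order flips and the nomUsageE match wins, which is presumably what the 15 versus 8 boosts were meant to express.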

Best,
Erick

On Wed, May 23, 2018 at 12:17 PM, LOPEZ-CORTES Mariano-ext
<[hidden email]> wrote:

> Yes, this makes sense.
>
> I guess you are referring to this doc:
>
> https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> How can I decrease the effect of the IDF component in my query?
>
> Thanks!!