Duplicated tokens in search string

rodio
Hi all,

We are trying to reproduce the behaviour of Solr 3.6 in Solr 8.0, and we have
hit a problem that we cannot solve.

When we have duplicated tokens:

- Solr 8.0 scores the token only once but applies a large boost to it
- Solr 3.6 scores each occurrence individually, and the final score is lower

We are using the ClassicSimilarity algorithm, but we cannot prevent that boost.

Example: table 60 cm 50 cm

Solr 8.0

11.096966 = sum of:
  4.3195267 = sum of:
    4.3195267 = weight(name:table in 138556) [ClassicSimilarity], result of:
      4.3195267 = score(freq=1.0), product of:
        8.639053 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:
          62381 = docFreq, number of documents containing term
          129615816 = docCount, total number of documents with field
        1.0 = tf(freq=1.0), with freq of:
          1.0 = freq, occurrences of term within document
        0.5 = fieldNorm
  2.7624812 = weight(name:60 in 138556) [ClassicSimilarity], result of:
    2.7624812 = score(freq=1.0), product of:
      5.5249624 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:
        1404402 = docFreq, number of documents containing term
        129615816 = docCount, total number of documents with field
      1.0 = tf(freq=1.0), with freq of:
        1.0 = freq, occurrences of term within document
      0.5 = fieldNorm
  4.0149584 = weight(name:cm in 138556) [ClassicSimilarity], result of:
    4.0149584 = score(freq=1.0), product of:
      2.0 = boost
      4.0149584 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:
        6357381 = docFreq, number of documents containing term
        129615816 = docCount, total number of documents with field
      1.0 = tf(freq=1.0), with freq of:
        1.0 = freq, occurrences of term within document
      0.5 = fieldNorm
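
As a sanity check, the Solr 8.0 figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic shown in the explain output (boost * idf * tf * fieldNorm per term, then summed), not Solr's actual code; the names `classic_idf` and `term_score` are made up for illustration, and all numbers are copied from the explain output.

```python
import math

# Values copied from the Solr 8.0 explain output.
DOC_COUNT = 129615816
FIELD_NORM = 0.5
TF = 1.0  # tf(freq=1.0) as reported for every term

# term -> (docFreq, boost); "cm" is the duplicated token and carries boost 2.0
TERMS = {
    "table": (62381, 1.0),
    "60": (1404402, 1.0),
    "cm": (6357381, 2.0),
}


def classic_idf(doc_freq, doc_count):
    """idf as printed in the explain: log((docCount+1)/(docFreq+1)) + 1."""
    return math.log((doc_count + 1) / (doc_freq + 1)) + 1


def term_score(doc_freq, boost):
    """Per-term score as the product listed in the explain output."""
    return boost * classic_idf(doc_freq, DOC_COUNT) * TF * FIELD_NORM


scores = {term: term_score(df, boost) for term, (df, boost) in TERMS.items()}
total = sum(scores.values())

for term, score in scores.items():
    print(f"{term}: {score:.4f}")
print(f"total: {total:.4f}")
```

With the boost of 2.0, `cm` contributes exactly twice what a single occurrence would, so it weighs about as much as `table` in the total.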

Solr 3.6

3.098446 = (MATCH) product of:
  3.8730574 = (MATCH) sum of:
    2.120801 = (MATCH) sum of:
      2.120801 = (MATCH) weight(name:table in 101441), product of:
        0.4913325 = queryWeight(name:table), product of:
          8.632854 = idf(docFreq=135231, maxDocs=279245306)
          0.05691426 = queryNorm
        4.316427 = (MATCH) fieldWeight(name:table in 101441), product of:
          1.0 = tf(termFreq(name:table)=1)
          8.632854 = idf(docFreq=135231, maxDocs=279245306)
          0.5 = fieldNorm(field=name, doc=101441)
    0.8427305 = (MATCH) weight(name:60 in 101441), product of:
      0.30972046 = queryWeight(name:60), product of:
        5.4418783 = idf(docFreq=3287778, maxDocs=279245306)
        0.05691426 = queryNorm
      2.7209392 = (MATCH) fieldWeight(name:60 in 101441), product of:
        1.0 = tf(termFreq(name:60)=1)
        5.4418783 = idf(docFreq=3287778, maxDocs=279245306)
        0.5 = fieldNorm(field=name, doc=101441)
    0.45476305 = (MATCH) weight(name:cm in 101441), product of:
      0.22751924 = queryWeight(name:cm), product of:
        3.9975789 = idf(docFreq=13936507, maxDocs=279245306)
        0.05691426 = queryNorm
      1.9987894 = (MATCH) fieldWeight(name:cm in 101441), product of:
        1.0 = tf(termFreq(name:cm)=1)
        3.9975789 = idf(docFreq=13936507, maxDocs=279245306)
        0.5 = fieldNorm(field=name, doc=101441)
    0.45476305 = (MATCH) weight(name:cm in 101441), product of:
      0.22751924 = queryWeight(name:cm), product of:
        3.9975789 = idf(docFreq=13936507, maxDocs=279245306)
        0.05691426 = queryNorm
      1.9987894 = (MATCH) fieldWeight(name:cm in 101441), product of:
        1.0 = tf(termFreq(name:cm)=1)
        3.9975789 = idf(docFreq=13936507, maxDocs=279245306)
        0.5 = fieldNorm(field=name, doc=101441)
  0.8 = coord(4/5)
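
The Solr 3.6 figures can be reproduced the same way. Again this is only a sketch of the arithmetic in the explain output (queryWeight * fieldWeight per clause, summed, then multiplied by coord); `clause_weight` is a made-up helper name, and queryNorm and the idf values are taken from the explain output rather than recomputed.

```python
# Values copied from the Solr 3.6 explain output.
QUERY_NORM = 0.05691426
FIELD_NORM = 0.5
TF = 1.0
COORD = 4 / 5  # coord(4/5) as reported in the explain output

# One entry per scored clause; "cm" is listed twice because
# Solr 3.6 scores each occurrence of the duplicated token.
CLAUSE_IDFS = [
    ("table", 8.632854),
    ("60", 5.4418783),
    ("cm", 3.9975789),
    ("cm", 3.9975789),
]


def clause_weight(idf):
    """Clause score as queryWeight * fieldWeight, per the explain output."""
    query_weight = idf * QUERY_NORM
    field_weight = TF * idf * FIELD_NORM
    return query_weight * field_weight


clause_sum = sum(clause_weight(idf) for _, idf in CLAUSE_IDFS)
final_score = clause_sum * COORD

print(f"sum: {clause_sum:.4f}, final: {final_score:.4f}")
```

Here the two `cm` clauses together are still worth less than half of `table`, because every clause is scaled by queryNorm and the sum by coord, two factors that do not appear in the Solr 8.0 explain output.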

Is it possible to configure Solr 8.0 to score duplicated tokens the way Solr 3.6 does?

Thanks in advance!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html