If you add the parameter debugQuery=on to the request, you will see what
happens under the hood and understand how the score is assigned.
If you are new to the Lucene Similarity that your Solr version uses (BM25),
you can paste the debug score response here and we can briefly explain it
to you the first time.
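A minimal sketch of such a request, assuming a hypothetical Solr at localhost:8983 with a core named "mycore" and a query on the content field (adjust all three to your setup). Asking for fl=id,score returns each document's score, and debugQuery=on adds the per-document score explanations to the "explain" entries of the response's "debug" section:

```python
from urllib.parse import urlencode

def build_debug_query_url(base_url: str, query: str) -> str:
    """Build a Solr /select URL that asks Solr to explain each score."""
    params = {
        "q": query,
        "fl": "id,score",     # return the score alongside each document id
        "debugQuery": "on",   # include the score explanation in the response
        "wt": "json",
    }
    # urlencode percent-encodes characters such as ':' and ','.
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical host, core and field names.
url = build_debug_query_url("http://localhost:8983/solr/mycore", "content:java")
print(url)
```

Opening that URL in a browser (or with curl) and pasting the "explain" output here is usually enough for someone to walk through the score.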
First of all, we are not even sure whether the content field is actually
used for scoring in your case. If it is, and it is the only field used, the
difference may be related to field length (but that would be suspicious, as
the documents in your example are quite similar in length).
Are you sorting by score for any reason?
It's been a while since I checked, but I doubt you get any benefit from that
over the default (which ranks by score).
So I recommend that you send the debug response here, and then possibly your
select request handler config.
Years of effort have gone into tuning the Lucene code that calculates
scores. It is almost certain that the score is working as designed, but
that the design does not fit your expectations.
Lucene's score calculation (which defaults to the BM25 similarity in
Solr 6.x and later) takes term frequency (TF) into account, but that is
not the whole story. Another part of the calculation is inverse
document frequency (IDF). BM25 is more complicated than just those two
factors, but I think they are large influences on the final score.
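For reference, this is the textbook Okapi BM25 formula that shows how TF and IDF combine. Here f(t,d) is the frequency of term t in document d, |d| is the document (field) length, avgdl is the average length, N is the number of documents and n(t) is the number of documents containing t. Lucene's BM25Similarity follows this shape with defaults k1 = 1.2 and b = 0.75, but its implementation differs in details (for example, field lengths are stored in a compressed form), so treat this as an approximation rather than Lucene's exact code:

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
  \frac{f(t,d)\,(k_1 + 1)}
       {f(t,d) + k_1 \left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(t) = \ln\!\left(1 + \frac{N - n(t) + 0.5}{n(t) + 0.5}\right)
```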
One effect of taking both TF and IDF into account is that the score is
reduced when the document is large, because a term showing up in a short
document probably means that it's more relevant there.
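In BM25 specifically, the length effect comes from the normalization inside the TF part (controlled by b). The sketch below uses the textbook BM25 formula, not Lucene's exact implementation, with hypothetical collection statistics (3 documents, average length 7 terms, the term present in all of them). It scores one occurrence of the same term in a 2-term field and in a 9-term field:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, doc_count, k1=1.2, b=0.75):
    """Textbook Okapi BM25 contribution of one query term to one document."""
    # Rarer terms (small doc_freq) get a larger IDF weight.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average documents inflate the denominator, lowering the score.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Hypothetical statistics; only the document length differs between the two calls.
short_doc = bm25_term_score(tf=1, doc_len=2, avg_doc_len=7, doc_freq=3, doc_count=3)
long_doc = bm25_term_score(tf=1, doc_len=9, avg_doc_len=7, doc_freq=3, doc_count=3)
print(f"short: {short_doc:.4f}  long: {long_doc:.4f}")
```

With everything else equal, the shorter field gets the higher score; setting b=0 would switch the length effect off.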
The actual calculation is certainly a lot more complex than what I'm
going to describe, but the simple idea below illustrates what is happening:
For the doc with id 1, there are two terms, and the search for java
matches one of them; it's half of the document, which makes it pretty
important for that document. For the doc with id 2, the search term
appears three times, but there are nine terms total, so the term makes up
only a third of that document. For id 3, the importance is also
about one third. This means that id 1 probably outscores both id 2 and
id 3 for a search term of "java".
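The simplified "share of the document" idea above can be written out directly. This sketch uses only the counts stated for ids 1 and 2 (the exact counts for id 3 aren't given), and it is the illustration's arithmetic, not a real Lucene score:

```python
from fractions import Fraction

def term_share(term_count: int, total_terms: int) -> Fraction:
    """Fraction of a document's terms that match the query term."""
    return Fraction(term_count, total_terms)

# id 1: 1 match out of 2 terms; id 2: 3 matches out of 9 terms.
shares = {"1": term_share(1, 2), "2": term_share(3, 9)}

# Rank documents by share, highest first.
for doc_id, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(doc_id, share)
```

This prints id 1 with 1/2 first, then id 2 with 1/3, matching the ordering described above.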
Here's a detailed article about TF and IDF. Older versions of Solr
(before 6.x) used this kind of calculation:
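As a rough sketch of that older, pre-BM25 style of scoring, the function below uses the main factors of Lucene's classic TF-IDF similarity as I understand them: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), a length norm of 1/sqrt(numTerms), and idf counted twice (query and document side). Query norm, coord and boosts are left out, the exact idf form varied between Lucene versions, and the collection statistics are hypothetical, so this is an approximation:

```python
import math

def classic_tfidf_term_score(freq: int, num_terms: int, doc_freq: int, num_docs: int) -> float:
    """Approximate per-term score in the style of Lucene's classic TF-IDF similarity."""
    tf = math.sqrt(freq)                             # repeated terms give diminishing returns
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))  # rarer terms weigh more
    length_norm = 1.0 / math.sqrt(num_terms)         # shorter fields weigh more
    return tf * idf * idf * length_norm              # idf appears on query and document side

# Hypothetical statistics: 3 documents, all containing "java"; counts from the example.
doc1 = classic_tfidf_term_score(freq=1, num_terms=2, doc_freq=3, num_docs=3)
doc2 = classic_tfidf_term_score(freq=3, num_terms=9, doc_freq=3, num_docs=3)
print(f"id 1: {doc1:.4f}  id 2: {doc2:.4f}")
```

Under these assumptions the short document (id 1) again scores higher than id 2, even though id 2 contains the term three times.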