Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

SOLR4189
Hey all,
I need to upgrade from SOLR-4.10.3 to SOLR-6.5.1 in a production environment.
When I checked it in the test environment, I noticed that the order of returned docs for each query is different. The scores have changed as well. I use the same similarity algorithm, Okapi BM25, as in the previous version. The number of shards and the number of docs per shard also haven't changed.

Is this normal?
What might cause this behavior?

Regards.

Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Shawn Heisey-2
On 8/4/2017 1:02 AM, SOLR4189 wrote:
> I need to upgrade from SOLR-4.10.3 to SOLR-6.5.1 in a production environment.
> When I checked it in the test environment, I noticed that the order of returned docs for each query is different. The scores have changed as well. I use the same similarity algorithm, Okapi BM25, as in the previous version. The number of shards and the number of docs per shard also haven't changed.

You're comparing versions released more than two years apart, and across
two major version upgrades.

Solr is an application built around Lucene.  The score calculation in
Lucene is frequently tweaked, producing slightly different results even
with identical data.  Over such a large version discrepancy, I would be
very surprised if the order and the scores were the same.
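
To see how far the two orderings actually diverge for a given query, it can help to diff the ranked id lists rather than eyeball them. The following is only a minimal sketch: the function name and the id lists are made up for illustration, and the ids are assumed to have been collected from each Solr version beforehand.

```python
def rank_changes(old_ids, new_ids):
    """Return (doc_id, old_rank, new_rank) for docs whose position differs.

    Ranks are 1-based; None means the doc is absent from that list.
    """
    old_rank = {doc_id: i for i, doc_id in enumerate(old_ids, start=1)}
    new_rank = {doc_id: i for i, doc_id in enumerate(new_ids, start=1)}
    changes = []
    # Walk the old order first, then docs that only appear in the new list.
    for doc_id in list(old_ids) + [d for d in new_ids if d not in old_rank]:
        old_pos, new_pos = old_rank.get(doc_id), new_rank.get(doc_id)
        if old_pos != new_pos:
            changes.append((doc_id, old_pos, new_pos))
    return changes


# Hypothetical top-5 id lists returned for the same query on each version.
solr4_top = ["doc7", "doc2", "doc9", "doc4", "doc1"]
solr6_top = ["doc7", "doc9", "doc2", "doc4", "doc5"]

for doc_id, old_pos, new_pos in rank_changes(solr4_top, solr6_top):
    print(f"{doc_id}: {old_pos} -> {new_pos}")
```

A handful of neighbouring docs swapping places is the kind of change you would expect from scoring tweaks; docs entering or leaving the list entirely points more toward analysis or index differences.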

Is the index identical between the versions?  If the indexes were each
built from scratch by their respective versions, rather than going
through an index upgrade procedure, they are very likely NOT completely
identical.  Text analysis components are also tweaked frequently, to fix
bugs and improve behavior.

If the shard hash ranges are not the same on the old and new versions,
that could contribute to differences in scoring as well.

Are you writing because you're seeing different results, or because you
think the order you're seeing in the newer version is wrong?

Thanks,
Shawn


Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Erick Erickson
In addition to Shawn's comments, deleted-but-not-yet-merged documents
still count toward the statistics used for scoring, so the only hope of
getting comparable scores is on an optimized index. Note that I
would recommend optimizing _only_ for testing; don't use it in a
production system unless the index is static. I.e., if your pattern is
to build once a day and then optimize, optimizing is fine, but not on a
continuously changing index.
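
To illustrate the effect, here is a rough sketch using the textbook BM25 formula. It is an assumption that this matches what either Lucene version computes exactly; the function names and all the numbers are invented for the example, and k1=1.2, b=0.75 are the commonly used defaults.

```python
import math


def bm25_idf(doc_count, doc_freq):
    # Textbook BM25 IDF; assumed here, not copied from any Lucene release.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(tf, doc_len, avg_len, doc_count, doc_freq, k1=1.2, b=0.75):
    # Length normalisation term, then the usual saturating tf component.
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return bm25_idf(doc_count, doc_freq) * tf * (k1 + 1) / (tf + norm)


# Hypothetical optimized index: 1,000 live docs, 50 of them contain the term.
optimized = bm25_score(tf=2, doc_len=100, avg_len=100,
                       doc_count=1000, doc_freq=50)

# Same live docs, but 300 deleted docs are still sitting in the segments and
# 40 of those contain the term, so both statistics are inflated.
unmerged = bm25_score(tf=2, doc_len=100, avg_len=100,
                      doc_count=1300, doc_freq=90)

print(f"optimized index: {optimized:.4f}")
print(f"with deletes:    {unmerged:.4f}")
```

The document and the query are identical in both calls; only the collection statistics differ, and that alone is enough to move the score.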

Best,
Erick


Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

SOLR4189
In reply to this post by Shawn Heisey-2
Yes, only because I'm seeing different results.

For example, can changing WordDelimiterFilterFactory to WordDelimiterGraphFilterFactory change the order of docs? (http://lucene.apache.org/core//6_5_1/analyzers-common/index.html?deprecated-list.html)

To build the index I tried 2 ways: 1) a dataimport from SOLR-4 to SOLR-6 and 2) the IndexUpgrader tool.
In both cases the order of docs is different.

Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

Shawn Heisey-2
On 8/11/2017 2:52 AM, SOLR4189 wrote:
> Yes, only because I'm seeing different results.
>
> For example, can changing WordDelimiterFilterFactory to
> WordDelimiterGraphFilterFactory change the order of docs?
> (http://lucene.apache.org/core//6_5_1/analyzers-common/index.html?deprecated-list.html)

I can't say for sure, but if that difference changes what parts of your
query match or don't match, that is very likely to affect document scores.

> To build the index I tried 2 ways: 1) a dataimport from SOLR-4 to SOLR-6 and
> 2) the IndexUpgrader tool.
> In both cases the order of docs is different.

If you are changing things like WordDelimiterFilterFactory to the graph
version, you'll definitely want to reindex.  The IndexUpgrader tool is
not a reindex.  If the Solr 4 index meets the requirements of having all
relevant fields stored, then doing a dataimport from 4 to 6 would be the
same as a reindex.
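
For illustration only, copying stored documents from the old cluster into the new one over plain HTTP could look roughly like the sketch below. This is not the DataImportHandler route mentioned above, just the same idea done by hand. The host names, collection name and helper names are hypothetical, the unique key is assumed to be `id`, and the sketch assumes every relevant field is stored. Stored copyField targets would also need to be dropped via `drop`, otherwise they get copied a second time on the new side.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoints; replace with the real hosts and collection names.
SOURCE = "http://solr4-host:8983/solr/collection1"
TARGET = "http://solr6-host:8983/solr/collection1"


def select_url(base, cursor_mark, rows=500, unique_key="id"):
    """Build a cursor-paged /select URL (cursorMark needs Solr 4.7+)."""
    params = {
        "q": "*:*",
        "sort": f"{unique_key} asc",  # cursors require a sort on the unique key
        "rows": rows,
        "cursorMark": cursor_mark,
        "wt": "json",
    }
    return f"{base}/select?{urllib.parse.urlencode(params)}"


def clean(doc, drop=("_version_",)):
    """Drop fields that should not be copied verbatim to the target."""
    return {k: v for k, v in doc.items() if k not in drop}


def fetch_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def post_docs(base, docs):
    """POST a JSON array of documents to the target's /update handler."""
    req = urllib.request.Request(
        f"{base}/update",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()


def copy_all(source=SOURCE, target=TARGET):
    cursor = "*"
    while True:
        page = fetch_json(select_url(source, cursor))
        docs = [clean(d) for d in page["response"]["docs"]]
        if docs:
            post_docs(target, docs)
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means we are done
            break
        cursor = next_cursor
    # Commit once at the end rather than after every page.
    urllib.request.urlopen(f"{target}/update?commit=true").close()
```

Because the documents pass through the new version's analysis chain on the way in, this counts as a real reindex, which IndexUpgrader does not give you.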

Thanks,
Shawn


Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

SOLR4189
> If you are changing things like WordDelimiterFilterFactory to the graph
> version, you'll definitely want to reindex

What does "want to reindex" mean? If I change WordDelimiterFilterFactory to the graph version and use IndexUpgrader, is that a mistake? Or will the change simply not take effect?

Re: Different order of docs between SOLR-4.10.4 to SOLR-6.5.1

David Hastings
Rebuild your index. It's just the safest way.
