[jira] [Commented] (SOLR-12974) RandomSort not consistent in SolrCloud Mode

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679921#comment-16679921 ]

Erick Erickson commented on SOLR-12974:

I'd guess that the number is generated when the doc arrives at each replica, and that it would need to be assigned by the leader before the document is sent to the followers.

It wouldn't do to assign it before the document reaches the leader, because that wouldn't handle the case of a document being sent to the leader directly via HTTP.

I don't know how much extra work this would require on the leader's part.
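One reading of this suggestion is that the leader generates the random value once per document and forwards it unchanged, so every replica sorts on identical values regardless of its own index layout. The Java sketch below illustrates only that idea. The `Leader` and `Replica` classes are hypothetical and do not correspond to Solr's actual update-processing code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class LeaderAssignedRandom {
    // Hypothetical replica: stores whatever random value the leader sent for each document.
    static final class Replica {
        final Map<String, Long> randomById = new HashMap<>();

        void index(String id, long randomValue) {
            randomById.put(id, randomValue);
        }

        // Orders ids by their stored random value, breaking (unlikely) ties by id.
        List<String> sortedIds() {
            List<String> ids = new ArrayList<>(randomById.keySet());
            ids.sort((x, y) -> {
                int c = Long.compare(randomById.get(x), randomById.get(y));
                return c != 0 ? c : x.compareTo(y);
            });
            return ids;
        }
    }

    // Hypothetical leader: generates the value exactly once, then forwards it unchanged.
    static final class Leader {
        private final Random rng;
        private final List<Replica> replicas;

        Leader(long seed, List<Replica> replicas) {
            this.rng = new Random(seed);
            this.replicas = replicas;
        }

        void add(String id) {
            long value = rng.nextLong(); // assigned on the leader, not independently on each replica
            for (Replica r : replicas) {
                r.index(id, value);
            }
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();
        Leader leader = new Leader(42L, List.of(a, b));
        for (String id : List.of("doc1", "doc2", "doc3", "doc4")) {
            leader.add(id);
        }
        // Both replicas sort on identical values, so they agree on the order.
        System.out.println(a.sortedIds().equals(b.sortedIds())); // true
    }
}
```

The extra work on the leader in this sketch is one random draw per document. Whether that matches the real cost in Solr is not something the sketch can answer.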


> RandomSort not consistent in SolrCloud Mode
> -------------------------------------------
>                 Key: SOLR-12974
>                 URL: https://issues.apache.org/jira/browse/SOLR-12974
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 6.5.1
>            Reporter: Shrey Shivam
>            Priority: Minor
> The expected behaviour of RandomSort is that, given the same random field name (random_<seed>), which acts as a seed, the sort order remains consistent for the same version of the Solr index.
> From schema.xml:
> {{<!-- The "RandomSortField" is not used to store or search any data. You can declare fields of this type it in your schema to generate pseudo-random orderings of your docs for sorting or function purposes. The ordering is generated based on the field name and the version of the index. As long as the index version remains unchanged, and the same field name is reused, the ordering of the docs will be consistent. If you want different psuedo-random orderings of documents, for the same version of the index, use a dynamicField and change the field name in the request. -->}}
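The schema.xml comment above says the ordering is derived from the field name and the index version. Below is a minimal Java sketch of that idea. It assumes, as a simplification, that the seed is the field name's hash plus the index version, and that each document's sort key is an integer hash of its doc id plus that seed. The class, the `mix` function, and the `order` helper are hypothetical and are not Solr's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RandomOrderSketch {
    // Integer mixing function (an assumption for illustration; any well-mixing int hash would do).
    static int mix(int key) {
        key = ~key + (key << 15);
        key = key ^ (key >>> 12);
        key = key + (key << 2);
        key = key ^ (key >>> 4);
        key = key * 2057;
        key = key ^ (key >>> 16);
        return key >>> 1; // keep the result non-negative
    }

    // Simplified seed: depends only on the field name and the index version.
    static int seed(String fieldName, long indexVersion) {
        return fieldName.hashCode() + (int) indexVersion;
    }

    // Returns doc ids 0..numDocs-1 sorted by their pseudo-random key mix(doc + seed).
    static List<Integer> order(String fieldName, long indexVersion, int numDocs) {
        int s = seed(fieldName, indexVersion);
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < numDocs; d++) {
            docs.add(d);
        }
        docs.sort(Comparator.comparingInt((Integer d) -> mix(d + s)).thenComparingInt(d -> d));
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(order("random_abc", 1001L, 8));
        System.out.println(order("random_abc", 1001L, 8)); // same name and version: identical to the line above
        System.out.println(order("random_abc", 1002L, 8)); // new index version: typically a different order
        System.out.println(order("random_xyz", 1001L, 8)); // new field name: typically a different order
    }
}
```

Because the index version feeds directly into the seed, any change to it reshuffles the order, even when the documents themselves are unchanged.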
> In master-slave mode, replication is driven by the index version: if a slave's index version differs from the master's, the slave replicates from the master and its index version is updated to match the master's.
> However, in SolrCloud mode, we have observed that replicas of the same shard do not always have the same index version, even though their documents are the same and consistent.
> This has also been discussed previously on the [mailing list|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201508.mbox/%3CCAE3uTzMGgPRv-P6juwjWM2yYyxfW893xayq7+2haV7MMobmi5g@...%3E].
> {quote}SolrCloud works very differently than the old master-slave replication. The index is NOT copied from the leader to the other replicas, except in extreme recovery circumstances. Each replica builds its own copy of the index independently from the others. Due to slight timing differences in the indexing operations, and possible actions related to transaction log replay on node restart, each replica may end up with a different index layout. There also could be differences in the number of deleted documents. Unless something goes really wrong, all replicas should contain the same live documents.
> {quote}
> When a query is sent to a shard that has two or more replicas, any one of those replicas may be chosen to respond. If the replicas do not all have the same index version, RandomSortField will generate a different hash seed for the same random_<seed> field name.
> In the source code of the [RandomSortField|https://github.com/apache/lucene-solr/blob/branch_6_5/solr/core/src/java/org/apache/solr/schema/RandomSortField.java] class, line 86 uses the index version (of the shard) to create the random hash seed.
> Hence, when the same query is sent to a Solr collection repeatedly, Solr returns different results depending on the index version mismatch between replicas and on which replica serves each request.
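To illustrate the failure mode described above, here is a hypothetical Java simulation. Two replicas hold the same live documents but report different index versions, and identical requests are spread across them round-robin. The seed formula (field name hash plus index version) and the use of `Collections.shuffle` with a seeded `java.util.Random` are stand-ins for RandomSortField's internal hashing and are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReplicaRandomSortSketch {
    // Hypothetical replica: same live documents as its peers, but its own index version.
    static final class Replica {
        final String name;
        final long indexVersion;
        final List<String> docIds;

        Replica(String name, long indexVersion, List<String> docIds) {
            this.name = name;
            this.indexVersion = indexVersion;
            this.docIds = docIds;
        }

        // Stand-in for RandomSortField: the permutation depends only on field name and index version.
        List<String> randomOrder(String fieldName) {
            long seed = fieldName.hashCode() + indexVersion;
            List<String> shuffled = new ArrayList<>(docIds);
            Collections.shuffle(shuffled, new Random(seed));
            return shuffled;
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "c", "d", "e", "f", "g", "h");
        List<Replica> replicas = List.of(
                new Replica("replica1", 17L, docs),
                new Replica("replica2", 18L, docs)); // same docs, different index version
        String field = "random_SW84gaDAf3RynhOyGQDZlgAAAYc1";

        // Round-robin replica selection: consecutive identical requests hit different replicas.
        for (int request = 0; request < 4; request++) {
            Replica r = replicas.get(request % replicas.size());
            System.out.println("request " + request + " via " + r.name + ": " + r.randomOrder(field));
        }
    }
}
```

Requests 0 and 2 (served by replica1) always agree with each other, as do requests 1 and 3 (served by replica2). The inconsistency comes only from alternating between replicas whose index versions differ.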
> Example of a Solr query that uses the random field:
> {code:java}
> https://solr-stage.mydomain.com:8983/solr/mycollection/select?wt=json&q=*:*&defType=edismax&fl=id&boost=if(query({!v='documentDate:[2018-11-07 TO *]'}),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),1),sub(1,div(1,1))),if(or(exists(query({!v='documentType:sponsored'})),exists(query({!v='documentType:featured'}))),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),4),sub(1,div(1,4))), if(or(exists(query({!v='documentType:listing'})),exists(query({!v='documentType:promotional'}))),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),2),sub(1,div(1,2))),scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1))))
> {code}
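Reading the boost function in this query, each tier has the form sum(div(r,k),sub(1,div(1,k))) with r = scale(random_...,0,1). That equals r/k + (1 - 1/k), which maps r in [0,1] onto [1 - 1/k, 1]. Here k = 1 for documents matching documentDate:[2018-11-07 TO *], k = 4 for sponsored/featured, k = 2 for listing/promotional, and plain r otherwise. The small Java sketch below reproduces that arithmetic, assuming r has already been scaled into [0,1]. The class and method names are hypothetical.

```java
public class TieredRandomBoost {
    // Mirrors sum(div(r,k), sub(1,div(1,k))) = r/k + (1 - 1/k), with r = scale(random_...,0,1) in [0,1].
    static double tier(double r, double k) {
        return r / k + (1.0 - 1.0 / k);
    }

    // Mirrors the nested if(...) chain of the example query's boost parameter.
    static double boost(boolean recent, String documentType, double r) {
        if (recent) {
            return tier(r, 1); // documentDate:[2018-11-07 TO *]
        }
        if (documentType.equals("sponsored") || documentType.equals("featured")) {
            return tier(r, 4);
        }
        if (documentType.equals("listing") || documentType.equals("promotional")) {
            return tier(r, 2);
        }
        return r; // plain scale(random_...,0,1)
    }

    public static void main(String[] args) {
        // Print the range [boost(r=0), boost(r=1)] for each tier.
        String[] types = {"sponsored", "listing", "other"};
        for (String t : types) {
            System.out.println(t + ": [" + boost(false, t, 0.0) + ", " + boost(false, t, 1.0) + "]");
        }
        System.out.println("recent: [" + boost(true, "other", 0.0) + ", " + boost(true, "other", 1.0) + "]");
    }
}
```

Running it prints the ranges [0.75, 1.0] for sponsored, [0.5, 1.0] for listing, and [0.0, 1.0] for both other and recent. The k = 1 tier is therefore numerically identical to plain r. Every tier is driven by the same random field, so a replica-dependent value of r changes the ranking within every tier.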

This message was sent by Atlassian JIRA
