replica's of same shard have different file contents

Nicolas Franck
I noticed what seems to me to be strange behavior in SolrCloud.

I have a collection that has 1 shard and two replicas.

When I look at the directory structure, both replicas have the same file names
in "data/index", BUT the contents of those files are different.

So when I query this collection and sort on "score",
and the score is the same for a lot of documents,
the order differs depending on the node that
was queried. The set of results is the same; only the returned order differs.

I guess the segments are not sent as-is from the leader to the other replicas?
Or could something else be wrong?

Thanks in advance


Re: replica's of same shard have different file contents

Erick Erickson
This is expected for NRT replicas. For NRT, segments are _not_
the unit of update for the replicas, documents are. So the process is:

- leader gets documents to index
- leader indexes them locally and forwards the raw documents to the replicas

The autocommit timer on each replica starts when the first doc hits that replica. Due to
network uncertainties and the like, the autocommit timers do not expire at the
same wall-clock time on all replicas. Plus, docs may have been received/indexed
by one replica but not another at the instant their autocommit timer expires.
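
To make the timing effect concrete, here is a toy simulation. It is only a sketch, not Solr's actual commit logic: the function `cut_segments`, the arrival times, the per-replica delays and the 1000 ms window are all made up for illustration. It assumes a timer that starts when the first uncommitted doc arrives and closes a segment when it expires.

```python
def cut_segments(arrivals, max_time):
    """Group (arrival_ms, doc_id) pairs into commit windows.

    Toy model (an assumption, not Solr code): a timer starts when the first
    uncommitted doc arrives, and every doc received before the timer expires
    ends up in the same segment.
    """
    segments = []
    current = []
    deadline = None
    for t, doc in sorted(arrivals):
        if deadline is not None and t >= deadline:
            # Timer expired before this doc arrived: close the open segment.
            segments.append(current)
            current = []
            deadline = None
        if deadline is None:
            # First uncommitted doc starts a new timer.
            deadline = t + max_time
        current.append(doc)
    if current:
        segments.append(current)
    return segments


# Times (ms) at which the leader forwards each doc; hypothetical values.
sent = [(0, "a"), (400, "b"), (900, "c"), (1100, "d"), (1950, "e")]

# Hypothetical per-doc network delays to each replica.
delays1 = [10, 10, 10, 10, 10]
delays2 = [5, 5, 150, 5, 5]

replica1 = [(t + d, doc) for (t, doc), d in zip(sent, delays1)]
replica2 = [(t + d, doc) for (t, doc), d in zip(sent, delays2)]

segs1 = cut_segments(replica1, max_time=1000)
segs2 = cut_segments(replica2, max_time=1000)

print(segs1)
print(segs2)
```

Both replicas end up with the same five docs, but a single delayed doc is enough to move the segment boundary.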

So segments on different replicas will contain different docs. But wait! There’s more!
Segments are merged, and because the segments may have different
docs in them, each replica will make different decisions about which segments to combine.

That all means that the segments on different replicas of the same shard will have
different docs in them. Additionally, if docs are updated/deleted, the TF/IDF
stats still include the deleted docs until they are merged away, so the same
document can get a different score from different replicas.
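
A small worked example of how leftover deleted docs shift the stats. This is a sketch under assumptions: it uses the BM25-style IDF formula, and the doc counts are invented. The helper name `bm25_idf` is hypothetical.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Replica 1 (hypothetical numbers): deleted docs have already been merged away.
idf_replica1 = bm25_idf(doc_freq=10, doc_count=1000)

# Replica 2 (hypothetical numbers): 5 deleted docs containing the term are
# still sitting in unmerged segments, so they still count in the stats.
idf_replica2 = bm25_idf(doc_freq=15, doc_count=1005)

print(idf_replica1, idf_replica2)
```

The live documents are identical on both replicas, yet the term weight differs, and so does the final score.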

And just for your continued delectation... even if the scores are identical for two
documents on multiple replicas, the final order may be different depending on
the replica. This is because the tiebreaker for identical scores is the _internal_
Lucene document id, which changes when segments are merged; even the relative
order of the same two docs can change.
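
The tiebreak effect can be illustrated with a few made-up hits. The keys, scores and internal doc ids below are hypothetical. The sketch also shows one common way to get a stable order, which is not something suggested in this thread: add an explicit secondary sort on the unique key field (assumed here to be called `id`).

```python
from urllib.parse import urlencode

# (unique key, score, internal doc id) as seen on two replicas.
# All values are hypothetical; only the internal ids differ.
replica1 = [("doc-a", 1.0, 0), ("doc-b", 1.0, 1), ("doc-c", 2.0, 2)]
replica2 = [("doc-a", 1.0, 7), ("doc-b", 1.0, 3), ("doc-c", 2.0, 5)]


def by_score_then_docid(hits):
    """Order by score desc, ties broken by the internal doc id."""
    return [h[0] for h in sorted(hits, key=lambda h: (-h[1], h[2]))]


def by_score_then_key(hits):
    """Order by score desc, ties broken by the unique key."""
    return [h[0] for h in sorted(hits, key=lambda h: (-h[1], h[0]))]


order1 = by_score_then_docid(replica1)
order2 = by_score_then_docid(replica2)
stable1 = by_score_then_key(replica1)
stable2 = by_score_then_key(replica2)

# Request parameters with an explicit tiebreaker; "id" is an assumed field name.
params = urlencode({"q": "*:*", "sort": "score desc, id asc"})

print(order1, order2)
print(stable1, stable2)
print(params)
```

Note that the secondary sort only helps when the scores really are identical; it does nothing about scores that differ because the stats differ.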

So you can try enabling the stats cache; see: https://lucene.apache.org/solr/guide/7_7/distributed-requests.html

None of the above applies to TLOG/PULL setups, because in those situations
segments _are_ the unit of update: they’re copied from the leader as-is. However,
there are still situations where the order will be (temporarily) different. To wit:
followers periodically poll the leader for changed segments. Again, due to network
vagaries, a given segment may or may not have been replicated to the follower at
any given time T, so if you happen to query replica1 and replica2 when a segment
has been copied to one but not the other, the stats used to compute the score
may be slightly different. This should only be the case while documents are being
ingested; once indexing has stopped and all followers have polled the leader and
replicated the segments, things should be identical.
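
If you want to check this on your own setup, one approach is to query each replica's core directly with `distrib=false` and compare the returned ids. The sketch below only builds the URLs and compares two id lists; the host names, core names and helper functions are hypothetical, and fetching the responses is left out.

```python
from urllib.parse import urlencode


def replica_query_url(base_url, core, query, rows=10):
    """Build a select URL that is answered by this one core only."""
    params = urlencode(
        {"q": query, "fl": "id,score", "rows": rows, "distrib": "false"}
    )
    return f"{base_url}/solr/{core}/select?{params}"


def compare_orderings(ids1, ids2):
    """Classify how two lists of returned ids differ."""
    if ids1 == ids2:
        return "identical"
    if sorted(ids1) == sorted(ids2):
        return "same docs, different order"
    return "different docs"


# Hypothetical hosts and core names.
url1 = replica_query_url("http://host1:8983", "mycoll_shard1_replica_n1", "title:solr")
url2 = replica_query_url("http://host2:8983", "mycoll_shard1_replica_n2", "title:solr")

print(url1)
print(url2)
print(compare_orderings(["1", "2", "3"], ["1", "3", "2"]))
```

With a small `rows` value, a different order can also push different docs into the top N, which is why the helper distinguishes the two cases.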

Best,
Erick

> On Jan 14, 2020, at 8:25 AM, Nicolas Franck <[hidden email]> wrote:
>
> I noticed a - in my opinion - strange behavior in Solr Cloud.
>
> I have a collection that has 1 shard and two replica's.
>
> When I look at the directory structure, both have the same file names
> in "data/index" ..
>
> BUT the contents of those files are different.
>
> So when I query this collection, and sort on "score",
> and the score is the same for a lot of documents,
> then the order is different depending on the node that
> was queried. The results are the same, just the returned order.
>
> I guess the segments are not sent as "is" from leaders to the other replica's?
> Or something else could be wrong?
>
> Thanks in advance
>
>