Issue with CDCR bootstrapping in Solr 7.1

Tom Peters
I'm running into an issue with the initial CDCR bootstrapping of an existing index. In short, after turning on CDCR, only the leader replica in the target data center receives the replicated documents; they do not appear in any of the follower replicas in the target data center. All subsequent incremental updates made to the source data center do appear in all replicas in the target data center.

A few more details:

I have two clusters set up, a source cluster and a target cluster. Each cluster has only one shard and three replicas. I used the configuration detailed in the Source and Target sections of the reference guide as-is, with the exception of updating the zkHost (https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html#cdcr-configuration-2).

The source data center has the following nodes:
        solr01-a, solr01-b, and solr01-c

The target data center has the following nodes:
        solr02-a, solr02-b, and solr02-c

Here are the steps that I've done:

1. Create collection in source and target data centers

2. Add a number of documents to the source data center

3. Verify:

    $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
    solr01-a: 81
    solr01-b: 81
    solr01-c: 81
    solr02-a: 0
    solr02-b: 0
    solr02-c: 0

4. Start CDCR:

    $ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START'

5. See if target data center has received the initial index

    $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
    solr01-a: 81
    solr01-b: 81
    solr01-c: 81
    solr02-a: 0
    solr02-b: 0
    solr02-c: 81

    note: only -c has received the index

6. Add another document to the source cluster

7. See how many documents are in each node:

    $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
    solr01-a: 82
    solr01-b: 82
    solr01-c: 82
    solr02-a: 1
    solr02-b: 1
    solr02-c: 82


As you can see, the initial index only made it to one of the replicas in the target data center, but subsequent incremental updates have appeared everywhere I would expect. Any help would be greatly appreciated, thanks.
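
The per-node check used in steps 3, 5, and 7 can be wrapped in a small helper that flags replicas whose count differs from the first node listed. This is only a minimal sketch and not part of the original report: the function names `fetch_count` and `report_sync` are made up for illustration, the host names, port, and collection are the ones from the steps above, and `distrib=false` is added on the assumption that each node should answer from its local core only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of Solr.

# Print the number of documents visible on one node's local core.
# distrib=false asks Solr not to fan the query out to other replicas.
fetch_count() {
  curl -s "http://$1:8080/solr/mycollection/select?q=*:*&rows=0&distrib=false" |
    jq '.response.numFound'
}

# Compare every host against the first host given; print one line per host.
report_sync() {
  local reference="" count host
  for host in "$@"; do
    count=$(fetch_count "$host")
    # The first host's count becomes the reference value.
    if [ -z "$reference" ]; then
      reference=$count
    fi
    if [ "$count" = "$reference" ]; then
      echo "$host: $count"
    else
      echo "$host: $count (differs from $1: $reference)"
    fi
  done
}
```

Calling `report_sync solr02-c solr02-a solr02-b` would use solr02-c, the node that received the index in step 5, as the reference.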



This message and any attachment may contain information that is confidential and/or proprietary. Any use, disclosure, copying, storing, or distribution of this e-mail or any attached file by anyone other than the intended recipient is strictly prohibited. If you have received this message in error, please notify the sender by reply email and delete the message and any attachments. Thank you.

Re: Issue with CDCR bootstrapping in Solr 7.1

Amrit Sarkar
Hi Tom,

I see what you are saying, and I too think this is a bug, but I will confirm
it in the code. Bootstrapping should happen on all the nodes of the
target.

Meanwhile, can you index more than 100 documents in the source and repeat the
exact same experiment? Followers will not copy the entire index from the
leader unless they are behind it by more than "numRecordsToKeep" document
versions; the default is 100 unless you have modified it in
solrconfig.xml.
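
To make the threshold concrete, here is a small sketch of the rule as it is described in this message. It illustrates the description only and is not Solr's actual recovery code; the function name `recovery_mode` and its two output labels are invented here.

```shell
#!/usr/bin/env bash
# Illustration only: encodes the rule as stated in the message, not Solr internals.
# $1 = number of document versions a follower is behind the leader
# $2 = numRecordsToKeep (100 is assumed when omitted, matching the stated default)
recovery_mode() {
  local behind=$1 keep=${2:-100}
  if [ "$behind" -gt "$keep" ]; then
    echo "full-copy"
  else
    echo "no-full-copy"
  fi
}

recovery_mode 81     # prints "no-full-copy": 81 does not exceed 100
recovery_mode 1368   # prints "full-copy": 1368 exceeds 100
```

Under this reading, an experiment with 81 documents would not be expected to trigger a full index copy on the followers, which is why a larger initial index is worth trying.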

Looking forward to your analysis.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


Re: Issue with CDCR bootstrapping in Solr 7.1

Tom Peters
Hi Amrit,

Starting with more documents doesn't appear to have made a difference. This time I tried with >1000 docs. Here are the steps I took:

1. Deleted the collection on both the source and target DCs.

2. Recreated the collections.

3. Indexed >1000 documents on the source data center, then issued a hard commit

  $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
  solr01-a: 1368
  solr01-b: 1368
  solr01-c: 1368
  solr02-a: 0
  solr02-b: 0
  solr02-c: 0

4. Enabled CDCR and checked docs

  $ curl 'solr01-a:8080/solr/mycollection/cdcr?action=START'

  $ for i in solr0{1,2}-{a,b,c}; do echo -n "$i: "; curl -s $i:8080/solr/mycollection/select'?q=*:*' | jq '.response.numFound'; done
  solr01-a: 1368
  solr01-b: 1368
  solr01-c: 1368
  solr02-a: 0
  solr02-b: 0
  solr02-c: 1368
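
Besides counting documents, the CDCR request handler used for START above also accepts monitoring actions such as STATUS, QUEUES, and ERRORS, which may help show whether the source considers replication to be running. The following is a minimal sketch, assuming the same hosts, port, and collection name as the commands above; the helper names `cdcr_url` and `cdcr` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for calling the CDCR request handler.

# Build the URL for a CDCR action on one node: cdcr_url HOST ACTION
cdcr_url() {
  echo "http://$1:8080/solr/mycollection/cdcr?action=$2"
}

# Send the request and pretty-print the JSON response: cdcr HOST ACTION
cdcr() {
  curl -s "$(cdcr_url "$1" "$2")&wt=json" | jq .
}
```

For example, `cdcr solr01-a STATUS` would ask the source node for its CDCR state, and `cdcr solr01-a QUEUES` for its replication queues.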

Some additional notes:

* I do not have numRecordsToKeep defined in my solrconfig.xml, so I assume it will use the default of 100

* I found a way to get the follower replicas to receive the documents from the leader in the target data center: I have to restart the Solr instance running on that server. Not sure if this information helps at all.


Re: Issue with CDCR bootstrapping in Solr 7.1

Amrit Sarkar
Tom,

This is very useful:

> I found a way to get the follower replicas to receive the documents from
> the leader in the target data center, I have to restart the solr instance
> running on that server. Not sure if this information helps at all.


You have to issue a hard commit on the target after the bootstrapping is done.
Reloading makes the core open a new searcher. While an explicit commit is
issued at the target leader after the bootstrap is done, the followers are left
unattended even though the docs are copied over.
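
A sketch of what issuing such a commit could look like from the command line, using the update handler's `commit=true` parameter. The helper names `commit_url` and `hard_commit` are hypothetical, and the host, port, and collection name are taken from Tom's commands as assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; host, port, and collection name follow the thread.

# Build the update URL that asks Solr for a hard commit: commit_url HOST
commit_url() {
  echo "http://$1:8080/solr/mycollection/update?commit=true"
}

# Issue the commit against one node: hard_commit HOST
hard_commit() {
  curl -s "$(commit_url "$1")"
}
```

Running `for h in solr02-a solr02-b solr02-c; do hard_commit "$h"; done` would send the commit to each target node in turn.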


Re: Issue with CDCR bootstrapping in Solr 7.1

Tom Peters
Hi Amrit, I tried issuing hard commits to the various nodes in the target cluster, but that does not appear to cause the follower replicas to receive the initial index. The only way I can get the replicas to see the original index is by restarting those nodes (taking care not to restart the leader node; otherwise it will replicate from one of the replicas that is missing the index).
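
Since this workaround depends on restarting only the followers, it can help to confirm which node is the leader first. The sketch below is an illustration under stated assumptions: it reads the Collections API CLUSTERSTATUS response with jq, assuming the leader replica is marked with `"leader":"true"`, and the helper names `leader_node` and `followers_of` are invented here. The restart itself is left out because the thread does not say how Solr is managed on these hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Solr.

# Print the node_name of each shard leader in the collection, as reported by
# the Collections API: leader_node HOST
# Assumes CLUSTERSTATUS marks the leader replica with "leader":"true".
leader_node() {
  curl -s "http://$1:8080/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection&wt=json" |
    jq -r '.cluster.collections.mycollection.shards[].replicas[]
           | select(.leader == "true") | .node_name'
}

# Print every host except the leader, one per line: followers_of LEADER HOST...
followers_of() {
  local leader=$1 host
  shift
  for host in "$@"; do
    if [ "$host" != "$leader" ]; then
      echo "$host"
    fi
  done
}
```

With solr02-c as the leader, `followers_of solr02-c solr02-a solr02-b solr02-c` prints solr02-a and solr02-b, the two nodes that would be safe to restart. The value printed by `leader_node` may carry more than the bare host name, so it should be checked by eye before being passed to `followers_of`.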



Re: Issue with CDCR bootstrapping in Solr 7.1

Amrit Sarkar
Tom,

> (and take care not to restart the leader node otherwise it will replicate
> from one of the replicas which is missing the index).

How is this possible? OK, I will look into it further. I would appreciate it if
someone else chimes in if they have a similar issue.

> >>>> message in error, please notify the sender by reply email and delete
> the
> >>>> message and any attachments. Thank you.
> >>>>
> >>
> >>
> >>
> >> This message and any attachment may contain information that is
> >> confidential and/or proprietary. Any use, disclosure, copying, storing,
> or
> >> distribution of this e-mail or any attached file by anyone other than
> the
> >> intended recipient is strictly prohibited. If you have received this
> >> message in error, please notify the sender by reply email and delete the
> >> message and any attachments. Thank you.
> >>
>
>
>
> This message and any attachment may contain information that is
> confidential and/or proprietary. Any use, disclosure, copying, storing, or
> distribution of this e-mail or any attached file by anyone other than the
> intended recipient is strictly prohibited. If you have received this
> message in error, please notify the sender by reply email and delete the
> message and any attachments. Thank you.
>

Re: Issue with CDCR bootstrapping in Solr 7.1

Tom Peters
I'm not sure how it's possible either. I also tried the _default config with only the source and target configuration added, to make sure nothing wonky in my custom solrconfig was causing this issue. I can confirm that the follower nodes do not receive the initial index until I restart them.
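
A minimal sketch of how the per-replica check used in this thread could be scripted, assuming the node names and port 8080 mentioned above. The helper name lagging_replicas is hypothetical and not part of Solr; it only parses "node: count" lines like the ones printed by the loop earlier in the thread and lists every node whose count is below the highest one seen. The example in the comments adds distrib=false, a standard Solr request parameter, so that each request is answered by the core that receives it rather than being distributed across the shard.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Solr): reads "node: count" lines on stdin
# and prints the nodes whose document count is below the largest count seen.
lagging_replicas() {
  awk -F': ' '
    { name[NR] = $1; count[NR] = $2 + 0; if (count[NR] > max) max = count[NR] }
    END { for (i = 1; i <= NR; i++) if (count[i] < max) print name[i] }
  '
}

# Example against the target cluster from this thread (hosts and port assumed):
#   for i in solr02-{a,b,c}; do
#     echo -n "$i: "
#     curl -s "$i:8080/solr/mycollection/select?q=*:*&rows=0&distrib=false" \
#       | jq '.response.numFound'
#   done | lagging_replicas
```

Fed the target-node counts from step 5 of the original report (0, 0, 81), the helper would list solr02-a and solr02-b, which are the followers that needed a restart.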

> On Dec 1, 2017, at 12:52 AM, Amrit Sarkar <[hidden email]> wrote:
>
> Tom,
>
> (and take care not to restart the leader node otherwise it will replicate
>> from one of the replicas which is missing the index).
>
> How is this possible? Ok I will look more into it. Appreciate if someone
> else also chimes in if they have similar issue.

Re: Issue with CDCR bootstrapping in Solr 7.1

Amrit Sarkar
Tom,

Thank you for trying out a bunch of things with the CDCR setup. I was able to
reproduce the exact issue on my setup, so this is indeed a problem.

I have opened a JIRA for it:
https://issues.apache.org/jira/browse/SOLR-11724. Feel free to add any
relevant details.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Dec 5, 2017 at 2:23 AM, Tom Peters <[hidden email]> wrote:

> Not sure how it's possible. But I also tried using the _default config and
> just adding in the source and target configuration to make sure I didn't
> have something wonky in my custom solrconfig that was causing this issue. I
> can confirm that until I restart the follower nodes, they will not receive
> the initial index.