Upgrading cluster from 4 to 5. Slow replication detected.

Himanshu Sachdeva
Hi,

We're starting to upgrade our solr cluster to version 5.5. So we removed
one slave node from the cluster and installed solr 5.5.4 on it and started
solr. So it started copying the index from the master. However, we noticed
a drop in the replication speed compared to the other nodes which were
still running solr 4. To do a fair comparison, I removed another slave node
from the cluster and disabled replication on it till the new node has
caught up with it. When both these nodes were at the same index generation,
I turned replication on for both the nodes. Now, it has been over 15 hours
since this exercise and the new node has again started lagging behind.
Currently, the node with solr 5.5 is seven generations behind the other
node.
Is it because the master is running Solr 4 and this node is running Solr
5? Has anyone else faced a similar problem while upgrading?
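
[Editor's sketch] The "generations behind" comparison described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes each node exposes the replication handler at /solr/<core>/replication, that command=details with wt=json returns an object shaped like {"details": {"generation": N}}, and the host and core names in the usage comment are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_generation(body: str) -> int:
    """Pull the index generation out of a replication 'details' response.

    Assumes the JSON looks like {"details": {"generation": N, ...}}.
    """
    return int(json.loads(body)["details"]["generation"])


def fetch_generation(base_url: str, core: str) -> int:
    """Ask one node's replication handler for its current index generation."""
    url = f"{base_url}/solr/{core}/replication?command=details&wt=json"
    with urlopen(url, timeout=10) as resp:
        return parse_generation(resp.read().decode("utf-8"))


def generation_lag(reference: int, candidate: int) -> int:
    """How many generations the candidate node is behind the reference node."""
    return reference - candidate


# Usage (hypothetical hosts and core name):
#   old = fetch_generation("http://solr4-slave:8983", "mycore")
#   new = fetch_generation("http://solr5-slave:8983", "mycore")
#   print("lag:", generation_lag(old, new))
```

Sampling both nodes at a fixed interval and logging the lag would show whether the new node falls behind steadily or only after particular commits.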

--
Himanshu Sachdeva

Re: Upgrading cluster from 4 to 5. Slow replication detected.

Shawn Heisey-2
On 4/14/2017 2:10 AM, Himanshu Sachdeva wrote:

> We're starting to upgrade our solr cluster to version 5.5. So we
> removed one slave node from the cluster and installed solr 5.5.4 on it
> and started solr. So it started copying the index from the master.
> However, we noticed a drop in the replication speed compared to the
> other nodes which were still running solr 4. To do a fair comparison,
> I removed another slave node from the cluster and disabled replication
> on it till the new node has caught up with it. When both these nodes
> were at the same index generation, I turned replication on for both
> the nodes. Now, it has been over 15 hours since this exercise and the
> new node has again started lagging behind. Currently, the node with
> solr 5.5 is seven generations behind the other node.

Version 5 is capable of replication bandwidth throttling, but unless you
actually configure the maxWriteMBPerSec attribute in the replication
handler definition, this should not happen by default.

One problem that I think might be possible is that the heap has been
left at the default 512MB on the new 5.5.4 install and therefore the
machine is doing constant full garbage collections to free up memory for
normal operation, which would make Solr run EXTREMELY slowly.
Eventually a machine in this state would most likely encounter an
OutOfMemoryError.  On non-windows systems, OOME will cause a forced halt
of the entire Solr instance.

The heap might not be the problem ... if it's not, then I do not know
what is going on.  Are there any errors or warnings in solr.log?

Thanks,
Shawn


Re: Upgrading cluster from 4 to 5. Slow replication detected.

Himanshu Sachdeva
Hello Shawn,

Thanks for taking the time to help me. I have assigned 45GB to the heap,
as both the starting and the maximum memory. The logs show the
following two warnings repeatedly:

   - IndexFetcher : Cannot complete replication attempt because file
   already exists.
   - IndexFetcher : Replication attempt was not successful - trying a full
   index replication reloadCore=false.
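
[Editor's sketch] To see how often these two warnings recur, the log could be tallied with something like the script below. It is a sketch only: the two search strings are taken from the messages quoted above, while the label names and the solr.log path in the usage comment are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable

# Substrings copied from the two warnings quoted in this post.
# The dictionary keys are made-up labels for reporting.
PATTERNS = {
    "file_already_exists":
        "Cannot complete replication attempt because file already exists",
    "full_replication_retry":
        "Replication attempt was not successful - trying a full index replication",
}


def count_replication_warnings(lines: Iterable[str]) -> Counter:
    """Count the log lines that contain each known IndexFetcher warning."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


# Usage (the log path is an assumption about the install):
#   with open("solr.log", encoding="utf-8", errors="replace") as fh:
#       print(count_replication_warnings(fh))
```

If the second count tracks the first closely, that would suggest each failed incremental fetch is falling back to a full index copy, which would be consistent with the slow replication reported earlier in the thread.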





--
Himanshu Sachdeva

Re: Upgrading cluster from 4 to 5. Slow replication detected.

Himanshu Sachdeva
I am guessing that the index got corrupted somehow, so I deleted the data
directory on the slave. It has started copying the index again. I'll report
back here once that completes. If you have any other suggestions in the
meantime, please reply. Thanks.



--
Himanshu Sachdeva