Nodes go down but never recover.


Pranaya Behera-2
Hi,
     Through SolrJ I am trying to upload configsets and create
collections in my SolrCloud cluster.

Setup:
1 standalone ZooKeeper (version 3.4.10) listening on port 2181:
-- bin/zkServer.sh start
3 Solr nodes, all running from the same solr.home (tried with versions
6.5.0 and 6.2.1):
-- bin/solr -c -z localhost:2181 -p 8983
-- bin/solr -c -z localhost:2181 -p 8984
-- bin/solr -c -z localhost:2181 -p 8985
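(As a sanity check, assuming no ZooKeeper chroot is in use, the
registered nodes can be listed directly from the ZooKeeper CLI:)
-- bin/zkCli.sh -server localhost:2181 ls /live_nodes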

The first run of my Java application, which uploads the config and
creates the collections in Solr through ZooKeeper, is seamless and
works fine.
Here is the clusterstatus after the first run:
https://gist.github.com/shadow-fox/5874f8b5de93fff0f5bcc8886be81d4d#file-3nodes-json

I then stopped one Solr node via:
-- bin/solr stop -p 8985
The clusterstatus changed to:
https://gist.github.com/shadow-fox/5874f8b5de93fff0f5bcc8886be81d4d#file-3nodes1down-json

Up to this point, everything is as expected.

Here is the part that confuses me.

I brought the downed node back up. The clusterstatus changed from one
node down (and missing from live_nodes) to all three nodes showing
down, including the node that was just brought back up.
https://gist.github.com/shadow-fox/5874f8b5de93fff0f5bcc8886be81d4d#file-3nodes3down-json
The expected result is that the other nodes stay active while this one
goes into recovery mode and then becomes active, since this node had
data before I stopped it using the script.
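(For reference, a minimal SolrJ 6.x sketch of fetching the same
cluster status; the client setup mirrors the creation code linked
further down and is an assumption, not the exact code used here:)

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;

public class ClusterStatusSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client =
             new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
      CollectionAdminResponse status =
          CollectionAdminRequest.getClusterStatus().process(client);
      // The "cluster" entry holds collections, replica states, and live_nodes.
      System.out.println(status.getResponse().get("cluster"));
    }
  }
}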

Next, I added one more node to the cluster via:
-- bin/solr -c -z localhost:2181 -p 8986
The clusterstatus changed to:
https://gist.github.com/shadow-fox/5874f8b5de93fff0f5bcc8886be81d4d#file-4node3down-json
It simply retained the previous state and added the new node to the cluster.


Why does a node that was previously in the cluster, registered with
ZooKeeper, and holding data for the collections not come back as
active when it is brought back up, instead of marking every other
node down? What is the solution to this?

When we add more nodes to an existing cluster, how do we ensure that a
new node also gets the same collections/data, i.e. synchronizes with
the other nodes already present in the cluster, rather than us
manually creating replicas for that specific node (sketched below)?
As you can see from the last-added node's clusterstate, it appears in
live_nodes but never got the collections into its data dir.
Is there any other way to add a node to an existing cluster along with
the cluster data?
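(For what it's worth, the "manual" route mentioned above would look
roughly like this in SolrJ 6.x; the collection, shard, and node names
are placeholders, and client is a CloudSolrClient as in the sketches
above:)

// Place a replica of an existing shard on the newly added node.
CollectionAdminRequest.addReplicaToShard("mycollection", "shard1")
    .setNode("192.168.1.5:8986_solr")  // node name as listed in live_nodes
    .process(client);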

For completeness, here is the code used to upload the config and
create the collection through CloudSolrClient in SolrJ (not the full
code, just the part where the operation happens):
https://gist.github.com/shadow-fox/5874f8b5de93fff0f5bcc8886be81d4d#file-code-java
That is all there is to creating a collection: upload the configset to
ZooKeeper, create the collection, and reload the collection if required.
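(A minimal, self-contained sketch of that flow against SolrJ 6.x; the
configset path, config name, collection name, and shard/replica counts
are placeholders, not the exact values from the gist:)

import java.nio.file.Paths;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateCollectionSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client =
             new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
      client.connect();
      // 1. Upload the configset to ZooKeeper under the name "myconfig".
      client.uploadConfig(Paths.get("/path/to/configset/conf"), "myconfig");
      // 2. Create a collection from that configset: 1 shard, 2 replicas.
      CollectionAdminRequest.createCollection("mycollection", "myconfig", 1, 2)
          .process(client);
      // 3. Reload the collection if required (e.g. after config changes).
      CollectionAdminRequest.reloadCollection("mycollection").process(client);
    }
  }
}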

I have tried this on my local macOS Sierra machine and also in an AWS
environment, with the same effect.



--
Thanks & Regards
Pranaya PR Behera

Re: Nodes go down but never recover.

Pranaya Behera-2
Hi,
     Can someone from the mailing list confirm the same findings? I am
at my wit's end on how to fix this. Please guide me so that I can
create a patch for it.

On Thu, Apr 20, 2017 at 3:13 PM, Pranaya Behera
<[hidden email]> wrote:

--
Thanks & Regards
Pranaya PR Behera

Re: Nodes go down but never recover.

Erick Erickson
Have you looked at the Solr logs on the node you try to bring back up?
There are sometimes much more informative messages in the log files.
The proverbial "smoking gun" would be messages about write locks.

You say they are all using the same solr.home, which is probably the
source of a lot of your issues. Take a look at the directory structure
after you start up the example and you'll see different -s parameters
for each of the instances started on the same machine, so the startup
looks something like:

bin/solr start -c -z localhost:2181 -p 8983 -s example/cloud/node1/solr
bin/solr start -c -z localhost:2181 -p 8984 -s example/cloud/node2/solr

and the like.
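A minimal sketch of doing that by hand instead, assuming you copy the
stock solr.xml into each per-node home directory:

mkdir -p node1/solr node2/solr node3/solr
cp server/solr/solr.xml node1/solr/
cp server/solr/solr.xml node2/solr/
cp server/solr/solr.xml node3/solr/
bin/solr start -c -z localhost:2181 -p 8983 -s node1/solr
bin/solr start -c -z localhost:2181 -p 8984 -s node2/solr
bin/solr start -c -z localhost:2181 -p 8985 -s node3/solr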

Best,
Erick

On Thu, Apr 20, 2017 at 11:01 AM, Pranaya Behera
<[hidden email]> wrote:


Re: Nodes go down but never recover.

Pranaya Behera-2
Hi Erick,
              Even when they use different solr.home directories, which
I have also tested in an AWS environment, the same problem occurs.

Can someone verify the first message on their local setup?

On Fri, Apr 21, 2017 at 2:27 AM, Erick Erickson <[hidden email]> wrote:

--
Thanks & Regards
Pranaya PR Behera

Re: Nodes go down but never recover.

Pranaya Behera-2
Any other solutions for this?

On Fri, Apr 21, 2017 at 9:42 AM, Pranaya Behera
<[hidden email]> wrote:

--
Thanks & Regards
Pranaya PR Behera