SolrCloud installation troubles...

SolrCloud installation troubles...

Scott Prentice
Using Solr 7.2.0 and Zookeeper 3.4.11

In an effort to move to a more robust Solr environment, I'm setting up a
prototype system of 3 Solr servers and 3 Zookeeper servers. For now,
this is all on one machine, but will eventually be 3 machines.

This works fine on an Ubuntu 5.4.0-6 VM on my local system, but when I do
the same setup on the company's network machine (a Red Hat 4.8.5-16 VM),
I'm unable to create a collection. To keep things simple, I'm not using
our custom schema yet, but just creating a collection through the Solr
Admin UI using Collections > Add Collection, using the "_default" config
set. On the Ubuntu system, I can create various collections .. 1 shard
w/ 1 replication .. 2 shards w/ 3 replications .. 3 shards w/ 4
replications .. all seem alive and well.

But when I do the same thing on the Red Hat system it fails. Through the
UI, it'll first time out with this message ..

     Connection to Solr lost

Then after a refresh, the collection appears to have been partially
created, but it's in the "Gone" state, and after some time, is deleted
by an apparent cleanup process. If I try to create one through the
command line ..

     ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

ERROR: Failed to create collection 'test99' due to:
{10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8984/solr,
10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8985/solr,
10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException
occured when talking to server at: http://10.6.208.31:8983/solr}

I've seen other reports of errors like this but no solutions that seem
to apply to my situation. Any thoughts?

Thanks!
...scott


Re: SolrCloud installation troubles...

Shawn Heisey-2
On 1/29/2018 1:13 PM, Scott Prentice wrote:

> But when I do the same thing on the Red Hat system it fails. Through
> the UI, it'll first time out with this message ..
>
>     Connection to Solr lost
>
> Then after a refresh, the collection appears to have been partially
> created, but it's in the "Gone" state, and after some time, is deleted
> by an apparent cleanup process. If I try to create one through the
> command line ..
>
>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>
> I get this response ..
>
> ERROR: Failed to create collection 'test99' due to:
> {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException
> occured when talking to server at: http://10.6.208.31:8984/solr,
> 10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException
> occured when talking to server at: http://10.6.208.31:8985/solr,
> 10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException
> occured when talking to server at: http://10.6.208.31:8983/solr}

This sounds like either network connectivity problems or possibly issues
caused by extreme garbage collection pauses that result in timeouts.

Thanks,
Shawn

RE: SolrCloud installation troubles...

Davis, Daniel (NIH/NLM) [C]
In reply to this post by Scott Prentice
To expand on that answer, you have to wonder what ports are open in the server system's port-based firewall. I have to ask my systems team to open ports for everything I'm using, especially when I move from localhost to outside.

You should be able to "fake it out" if you set up your zookeeper configuration to use localhost ports.
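
One way to test the connectivity theory is to try opening a TCP connection to each Solr port from the machine that reports the error. The following is a minimal Python sketch, not anything that ships with Solr; the function names are hypothetical, and the host and ports used in the note below are simply the ones shown in the error message in the original post.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

For example, calling check_ports("10.6.208.31", [8983, 8984, 8985]) on the Red Hat VM would show whether the nodes can reach each other on the address SolrCloud registered. A False result only says the connection failed; it does not say whether a firewall, a stopped process, or address translation is the cause.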

Re: SolrCloud installation troubles...

Scott Prentice
In reply to this post by Shawn Heisey-2

On 1/29/18 12:44 PM, Shawn Heisey wrote:

> On 1/29/2018 1:13 PM, Scott Prentice wrote:
>> But when I do the same thing on the Red Hat system it fails. Through
>> the UI, it'll first time out with this message ..
>>
>>     Connection to Solr lost
>>
>> Then after a refresh, the collection appears to have been partially
>> created, but it's in the "Gone" state, and after some time, is
>> deleted by an apparent cleanup process. If I try to create one
>> through the command line ..
>>
>>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>>
>> I get this response ..
>>
>> ERROR: Failed to create collection 'test99' due to:
>> {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException
>> occured when talking to server at: http://10.6.208.31:8984/solr,
>> 10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException
>> occured when talking to server at: http://10.6.208.31:8985/solr,
>> 10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException
>> occured when talking to server at: http://10.6.208.31:8983/solr}
>
> This sounds like either network connectivity problems or possibly
> issues caused by extreme garbage collection pauses that result in
> timeouts.
>
> Thanks,
> Shawn
>
Thanks, Shawn. I was wondering if there was something going on with IP
redirection that was causing confusion. Any thoughts on how to debug?
And, what do you mean by "extreme garbage collection pauses"? Is that
Solr garbage collection or the OS itself? There's really nothing
happening on this machine; it's purely for testing, so there shouldn't be
any extra load from other processes.

Thanks!
...scott



Re: SolrCloud installation troubles...

Scott Prentice
In reply to this post by Davis, Daniel (NIH/NLM) [C]
Interesting. I am using "localhost" in the config files (using the IP
caused things to break even worse). But perhaps I should check with IT
to make sure the ports are all open.

Thanks,
...scott


On 1/29/18 12:57 PM, Davis, Daniel (NIH/NLM) [C] wrote:

> To expand on that answer, you have to wonder what ports are open in the server system's port-based firewall.    I have to ask my systems team to open ports for everything I'm using, especially when I move from localhost to outside.
>
> You should be able to "fake it out" if you set up your zookeeper configuration to use localhost ports.

RE: SolrCloud installation troubles...

Davis, Daniel (NIH/NLM) [C]
In reply to this post by Scott Prentice
Trying 127.0.0.1 could help. We tend to assume localhost is always 127.0.0.1, but I've seen localhost resolve to ::1, the IPv6 equivalent of 127.0.0.1.

I guess some environments can be strict enough to restrict communication on localhost; seems hard to imagine, but it does happen.
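
To see what localhost actually resolves to on a given machine, something like the following Python sketch can help. resolved_addresses is a hypothetical helper name; it relies only on the standard socket.getaddrinfo call. Java may order or filter addresses differently than Python does, so treat the output as a hint rather than proof of what Solr or ZooKeeper will use.

```python
import socket


def resolved_addresses(hostname):
    """Return the sorted, de-duplicated (family, address) pairs for a hostname."""
    pairs = set()
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, None):
        if family == socket.AF_INET:
            pairs.add(("IPv4", sockaddr[0]))
        elif family == socket.AF_INET6:
            pairs.add(("IPv6", sockaddr[0]))
    return sorted(pairs)
```

Running resolved_addresses("localhost") on both the Ubuntu and the Red Hat VM and comparing the results would show whether one of them returns ::1 where the other returns 127.0.0.1.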

Re: SolrCloud installation troubles...

Scott Prentice
Looks like 2888 and 2890 are not open. At least, they are not reported
by netstat -plunt, which could be the problem.
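
A caveat before blaming the firewall: assuming 2888 and 2890 are quorum ports from the server.N lines in zoo.cfg (the thread does not say), ZooKeeper normally has only the current leader listening on its quorum port, so followers would not show theirs in netstat. The Python sketch below asks a ZooKeeper server for its role using the srvr four-letter command. The helper names are hypothetical, and the client ports mentioned in the note (2181 to 2183) are assumptions, not values from this thread.

```python
import socket


def four_letter_word(host, port, command="srvr", timeout=2.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

Calling parse_mode(four_letter_word("localhost", 2181)) for each assumed client port should report one leader and two followers in a healthy three-node ensemble. If it does, the missing listeners on 2888 and 2890 may be normal rather than a sign of blocked ports.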

Thanks, all!

...scott


On 1/29/18 1:10 PM, Davis, Daniel (NIH/NLM) [C] wrote:

> Trying 127.0.0.1 could help.   We kind of tend to think localhost is always 127.0.0.1, but I've seen localhost start to resolve to ::1, the IPv6 equivalent of 127.0.0.1.
>
> I guess some environments can be strict enough to restrict communication on localhost; seems hard to imagine, but it does happen.

Re: SolrCloud installation troubles...

Shawn Heisey
In reply to this post by Scott Prentice
On 1/29/2018 2:02 PM, Scott Prentice wrote:
> Thanks, Shawn. I was wondering if there was something going on with IP
> redirection that was causing confusion. Any thoughts on how to debug?
> And, what do you mean by "extreme garbage collection pauses"? Is that
> Solr garbage collection or the OS itself? There's really nothing
> happening on this machine, it's purely for testing so there shouldn't
> be any extra load from other processes.

Garbage collection is one of the primary features of Java's memory
management.  It's not Solr or the OS.

If the Java heap is really enormous, you can end up with long pauses,
but I wouldn't expect them to be frequent unless the index is also
really huge.

A very common issue that can cause even worse pause issues than a large
heap is a heap that's too small, but not quite small enough to cause
Java to completely run out of heap memory.  The default max heap size in
recent Solr versions is 512MB, which is very small.  A Java program
(which Solr is) can never use more heap memory than the maximum it is
configured with, even if the machine has more memory available.
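
To check how close a node is to its configured maximum, Solr's system info endpoint can be queried. What follows is a hedged Python sketch: the helper names are hypothetical, and the JSON layout (jvm, memory, raw, with used and max in bytes) is an assumption about the response that should be confirmed against a real reply from your version.

```python
import json
from urllib.request import urlopen


def fetch_system_info(base_url, timeout=5.0):
    """Fetch system info as a dict; base_url looks like 'http://host:8983/solr'."""
    with urlopen(base_url + "/admin/info/system?wt=json", timeout=timeout) as resp:
        return json.load(resp)


def heap_usage(info):
    """Return (used_bytes, max_bytes, fraction_used) from a system-info dict.

    Assumes the layout info['jvm']['memory']['raw'] with 'used' and 'max' keys.
    """
    raw = info["jvm"]["memory"]["raw"]
    used, maximum = raw["used"], raw["max"]
    return used, maximum, used / maximum
```

For instance, heap_usage(fetch_system_info("http://10.6.208.31:8983/solr")) would return the figures for the first node. A fraction that stays close to 1.0 would point toward the too-small-heap scenario; the heap can then be raised with the SOLR_HEAP setting in solr.in.sh or the -m option of bin/solr start.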

This paragraph is included because you mentioned IP redirection: 
Extreme care must be used when setting up SolrCloud on virtual machines
where accessing the VM has to go through any kind of IP translation. 
SolrCloud keeps track of how to reach each server in the cloud and if it
stores an untranslated address when you need the translated address (or
vice-versa), things are not going to work.  Generally speaking,
translated addresses are going to be problematic for SolrCloud and
should not be used.
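
The addresses SolrCloud has stored are visible in its node names, such as 10.6.208.31:8984_solr in the error message from the original post. The Python sketch below turns node names into the base URLs that other nodes will try to contact and flags any whose host differs from the one you expect. Both function names are hypothetical; the list of live nodes would come from the Admin UI's Cloud view or a CLUSTERSTATUS call.

```python
from urllib.parse import unquote


def node_name_to_base_url(node_name, scheme="http"):
    """Convert a node name like 'host:port_context' to a base URL."""
    host_port, _, context = node_name.partition("_")
    # Nested contexts are URL-encoded inside node names (e.g. '%2F' for '/').
    return f"{scheme}://{host_port}/{unquote(context)}"


def unexpected_nodes(live_nodes, expected_host):
    """Return node names whose host part differs from the expected address."""
    return [n for n in live_nodes if n.split(":", 1)[0] != expected_host]
```

In this thread the configuration reportedly used localhost, yet the nodes registered as 10.6.208.31, so unexpected_nodes(nodes, "localhost") would list all three. That mismatch is the kind of thing to look at when address translation is suspected.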

Thanks,
Shawn

Re: SolrCloud installation troubles...

Scott Prentice

On 1/29/18 1:31 PM, Shawn Heisey wrote:

> On 1/29/2018 2:02 PM, Scott Prentice wrote:
>> Thanks, Shawn. I was wondering if there was something going on with
>> IP redirection that was causing confusion. Any thoughts on how to
>> debug? And, what do you mean by "extreme garbage collection pauses"?
>> Is that Solr garbage collection or the OS itself? There's really
>> nothing happening on this machine, it's purely for testing so there
>> shouldn't be any extra load from other processes.
>
> Garbage collection is one of the primary features of Java's memory
> management.  It's not Solr or the OS.
>
> If the java heap is really enormous, you can end up with long pauses,
> but I wouldn't expect them to be frequent unless the index is also
> really huge.
>
> A very common issue that can cause even worse pause issues than a
> large heap is a heap that's too small, but not quite small enough to
> cause Java to completely run out of heap memory.  The default max heap
> size in recent Solr versions is 512MB, which is very small.  A Java
> program (which Solr is) can never use more heap memory than the
> maximum it is configured with, even if the machine has more memory
> available.
>
> This paragraph is included because you mentioned IP redirection:
> Extreme care must be used when setting up SolrCloud on virtual
> machines where accessing the VM has to go through any kind of IP
> translation.  SolrCloud keeps track of how to reach each server in the
> cloud and if it stores an untranslated address when you need the
> translated address (or vice-versa), things are not going to work. 
> Generally speaking translated addresses are going to be problematic
> for SolrCloud, and should not be used.
>
> Thanks,
> Shawn
>
Thanks for the clarification. Yes, we're just using the default heap
size for Solr, but there's no index (yet) and nothing really going on,
so I'd hope that garbage collection isn't the problem.

I'm putting my money on some IP translation issues (this is on a tightly
controlled corporate network) or the fact that the 2888 and 2890 ports
appear to not be open. I'll dig down the network issue path for now and
see where that gets me.

Thanks,
...scott


Re: SolrCloud installation troubles...

Rick Leir-2
SELinux? Open-file limits? Process-count limits?
--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
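
The file and process limits can be inspected from Python with the standard resource module (Unix only). This is a rough sketch with hypothetical helper names; the threshold of 65000 in the note below is a placeholder chosen for illustration, not a value taken from this thread, and the numbers only reflect the Solr user's limits if the script runs as that user. SELinux status is not covered here.

```python
import resource


def meets(soft, recommended):
    """True if the soft limit is unlimited or at least the recommended value."""
    return soft == resource.RLIM_INFINITY or soft >= recommended


def describe_limit(name, recommended):
    """Return (soft, hard, ok) for a limit name such as 'RLIMIT_NOFILE'."""
    soft, hard = resource.getrlimit(getattr(resource, name))
    return soft, hard, meets(soft, recommended)
```

For example, describe_limit("RLIMIT_NOFILE", 65000) and describe_limit("RLIMIT_NPROC", 65000) report the open-file and process limits along with a flag saying whether the soft limit reaches the placeholder threshold.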