Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Drew Kidder
Hello! I'm new to the list and I have a bit of an issue that I could use
some help with.

I'm in the process of upgrading our Solr installation from legacy to cloud.
I'm new to the idea of Solr Cloud, so I've been wading through the
documentation and trying to get a basic cluster up and running. My
ZooKeeper ensemble is set up, its nodes are talking to each other, and it is
accessible to my network via DNS hostnames. I'm using the official Solr 8.2
Docker image from Docker Hub. More environment information follows the
gigantic stack trace below. I've tried running the Docker image both locally
and in the same Amazon VPC as the ZK ensemble, and in both cases I get this
output every time it tries to start up:

2019-10-17 22:30:03.443 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
2019-10-17 22:30:23.539 WARN  (main-SendThread(zk1:2181)) [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 20095ms for sessionid 0x0
2019-10-17 22:30:43.612 WARN  (main-SendThread(zk3:2181)) [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 20005ms for sessionid 0x0
2019-10-17 22:30:43.724 ERROR (main-EventThread) [   ] o.a.z.ClientCnxn Error while calling watcher  => java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$186/0x0000000100328440@5b1d0665 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@64e89eea[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$186/0x0000000100328440@5b1d0665 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@64e89eea[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355) ~[?:?]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:194) ~[?:?]
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) ~[?:?]
    at org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.process(SolrZkClient.java:843) ~[?:?]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:535) ~[?:?]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) ~[?:?]
2019-10-17 22:30:43.742 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not start Solr. Check solr/home property and the logs
2019-10-17 22:30:43.818 ERROR (main) [   ] o.a.s.c.SolrCore null:org.apache.solr.common.SolrException: Error occurred while loading solr.xml from zookeeper
    at org.apache.solr.servlet.SolrDispatchFilter.loadNodeConfig(SolrDispatchFilter.java:289)
    at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:259)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:181)
    at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:136)
    at org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:750)
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
    at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
    at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:744)
    at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:369)
    at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1497)
    at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1459)
    at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:854)
    at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:278)
    at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:545)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:46)
    at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:192)
    at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:510)
    at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:153)
    at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:172)
    at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:436)
    at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:65)
    at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
    at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
    at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
    at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:145)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:598)
    at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:240)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:167)
    at org.eclipse.jetty.server.Server.start(Server.java:418)
    at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:119)
    at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
    at org.eclipse.jetty.server.Server.doStart(Server.java:382)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
    at org.eclipse.jetty.xml.XmlConfiguration.lambda$main$0(XmlConfiguration.java:1797)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1746)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.eclipse.jetty.start.Main.invokeMain(Main.java:220)
    at org.eclipse.jetty.start.Main.start(Main.java:490)
    at org.eclipse.jetty.start.Main.main(Main.java:77)
Caused by: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper zk1:2181,zk2:2181,zk3:2181 within 30000 ms
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:201)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:125)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:120)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:107)
    at org.apache.solr.servlet.SolrDispatchFilter.loadNodeConfig(SolrDispatchFilter.java:282)
    ... 49 more
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper zk1:2181,zk2:2181,zk3:2181 within 30000 ms
    at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:250)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:192)
    ... 53 more

* zk1, zk2, and zk3 are all resolvable from within my docker container
(running `echo ruok | nc zk1 2181` returns the expected "imok" response
from ZK within the docker container where Solr is located)
* The netcat command mentioned above shows up in the ZK logs, but the Solr
attempts to connect do not (it's like the request isn't even getting to ZK)
* Zookeeper is not set up as secure at this time (no ACLs required)
* I'm using the following command line to start a basic solr cloud instance
as per the documentation: `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181`
with all other parameters being the defaults as specified by the docker
image
* Interestingly, I don't see a connection attempt to zk2 showing up in the
above log trace. Is that a clue?
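
Since the `ruok` check above is only shown for zk1, one way to repeat it for every host in the connection string is sketched below in Python. This is a minimal sketch and not something from the thread: the function names `parse_zk_hosts`, `ruok`, and `check_ensemble` are made up for illustration, it assumes the four-letter-word commands are enabled on the ZK servers (they answered `ruok` over netcat, so that appears to hold here), and it does not handle IPv6 literals.

```python
import socket


def parse_zk_hosts(connect_string, default_port=2181):
    """Split a ZooKeeper connection string into (host, port) pairs.

    An optional chroot suffix such as "/solr" is ignored.
    """
    hosts_part = connect_string.split("/", 1)[0]
    pairs = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        pairs.append((host, int(port) if sep else default_port))
    return pairs


def ruok(host, port, timeout=5.0):
    """Send "ruok" to one server; return the reply text, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:  # server closed the connection after replying
                    break
                chunks.append(data)
            return b"".join(chunks).decode("ascii", "replace")
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return None


def check_ensemble(connect_string):
    """Return {"host:port": reply_or_None} for every host in the string."""
    return {f"{h}:{p}": ruok(h, p) for h, p in parse_zk_hosts(connect_string)}
```

Calling `check_ensemble("zk1:2181,zk2:2181,zk3:2181")` from inside the Solr container would show whether zk2 and zk3 answer as well; a value of None means the connect or the read failed within the timeout. This still tests from outside the JVM, so a pass here does not prove that Solr's own ZooKeeper client can connect.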

Does anyone have any ideas why solr can't connect to ZK? I haven't been
able to find any logs or information as to WHY it can't connect to ZK, and
to my knowledge there's no reason it shouldn't connect to the ZK ensemble
if the netcat command is able to resolve the hostnames. Where can I find
out this information or where can I look?

Any and all suggestions are welcomed and very much appreciated!

--
Drew Kidder ([hidden email])

-- I Drive Way Too Fast To Worry About Cholesterol.

Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Jörn Franke
Could it be that you start the Solr command too early, i.e. before the network is set up in the Docker container?

Normally I would also expect a zkRoot to be specified.
Can the ZK nodes talk to each other?
Have you tried specifying it in the Solr config?
Normally, I would expect the Solr config to be external to the container, especially later when you secure it. E.g., you would not put certificates etc. directly in the container, as that is not a secure practice.

Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Rajeswari Natarajan
In reply to this post by Drew Kidder
Are you running ZooKeeper as a container too? If so, port 2181 needs to be
exposed.

-Rajeswari

Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Martijn Koster
In reply to this post by Drew Kidder


> On 18 Oct 2019, at 00:25, Drew Kidder <[hidden email]> wrote:

> * I'm using the following command line to start a basic solr cloud instance
> as per the documentation: `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181`

I assume you’re just looking to run a single Solr node in a single container, right?

Just set the ZK_HOST environment variable, and remove the command-line arguments.
And you don’t need to specify the port number unless you deviate from the default.
Have a look at this example: https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml#L61

The “start” command starts Solr in the background, which is typically not what you want
when running Solr under docker.


Why your command isn’t working as is, is not clear. When you say you’re using that
command line, how do you actually do that? In a full docker command line,
a compose file, a “docker exec”, or some orchestrator?
Share the exact thing you’re doing; perhaps there is a mistake there.
Also, run `ps -eflww` in the container to see what command-line arguments the JVM actually got started with.
And share the full startup log somewhere (in a GitHub gist perhaps); there might be something of interest earlier on.
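
To follow up on the `ps -eflww` suggestion, here is a hedged Python sketch that pulls the `-D` system properties out of a process command line as stored in `/proc/<pid>/cmdline`. It assumes a Linux container, and it assumes that the Solr start script hands the ZooKeeper settings to the JVM as `-DzkHost=...` and `-DzkClientTimeout=...`; the function names `system_properties` and `read_cmdline` are hypothetical.

```python
def system_properties(cmdline):
    """Parse -Dname=value arguments from a NUL-separated cmdline blob (bytes)."""
    props = {}
    for arg in cmdline.split(b"\0"):
        text = arg.decode("utf-8", "replace")
        if text.startswith("-D"):
            name, _, value = text[2:].partition("=")
            props[name] = value
    return props


def read_cmdline(pid):
    """Read the raw command line of a running process (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as handle:
        return handle.read()
```

With the PID of the Solr JVM, `system_properties(read_cmdline(pid)).get("zkHost")` would show the connection string the JVM actually received, which can then be compared against what the Dockerfile or environment was meant to set. A missing or empty value would point at the invocation rather than the network.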

>> (running `echo ruok | nc zk1 2181` returns the expected "imok" response
>> from ZK within the docker container where Solr is located)
>> * The netcat command mentioned above shows up in the ZK logs, but the Solr
>> attempts to connect do not (it's like the request isn't even getting to ZK)

Then it doesn’t sound like an environmental firewall/security-group/routing issue.
The next step to debug could be to check whether you actually see Solr make TCP connections
to port 2181, in the Solr container, using tcpdump/sysdig/netstat or some such.
If that gives a negative result, then you know it’s an issue in your Solr invocation config, or name resolution.
If that gives a positive result, then it’s environmental after all, and you can dig further.
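
For reference, the `echo ruok | nc zk1 2181` check quoted above can be repeated for every host in one go. The following is only a minimal sketch in Python: the function names `four_letter_word` and `probe` are made up for illustration, and it assumes the ZK servers allow the four-letter command being sent. It shows that the hosts are reachable from wherever it runs; it does not show what Solr's own ZK client is doing.

```python
import socket


def four_letter_word(host, port=2181, word="ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            # The server is expected to close the connection after replying.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def probe(hosts, port=2181, word="ruok"):
    """Return {host: reply or error text} for each host in the list."""
    results = {}
    for host in hosts:
        try:
            results[host] = four_letter_word(host, port, word).strip()
        except OSError as exc:  # covers DNS failures, refusals, and timeouts
            results[host] = f"error: {exc}"
    return results
```

Calling `probe(["zk1", "zk2", "zk3"])` inside the Solr container should give "imok" for every host that answers; an "error: ..." entry points at name resolution or routing for that host.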


But try the ZK_HOST thing first; it may just fix it.

— Martijn

Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Drew Kidder
Thank you all for your suggestions! I appreciate the fast turnaround.

My setup uses Amazon ECS for our Solr Cloud installation. Each ZK is in
its own container, using Route53 Service Discovery to provide the DNS name.
The ZK nodes can all talk to each other, and I can communicate with each one
of those nodes from my local machine and from within the solr container.
Solr is one node per container, as Martijn correctly assumed. I am not
using a zkRoot at present because my intention is to use ZK solely for Solr
Cloud and nothing else.

I have tried removing the "-z" option from the Dockerfile CMD and using the
ZK_HOST environment variable (see below). I have also modified
solr.in.sh and set the ZK_HOST variable there, all to no avail. I have
tried the Dockerfile CMD route, and I have also logged into the solr
container and run the command manually to see if there was a problem
with the way I was using the CMD entry. All of those methods give me the
same output, captured in the gist below.

The gist for my solr.log output is here:
https://gist.github.com/dkidder/2db9a6d393dedb97a39ed32e2be0c087

My Dockerfile for the solr container looks like this:


FROM    solr:8.2

EXPOSE    8983 8999 2181

VOLUME    /app/logs
VOLUME    /app/data
VOLUME    /app/conf

## add our jetty configuration (increased request size!)
COPY   jetty.xml /opt/solr/server/etc/

## SolrCloud configuration
ENV     ZK_HOST zk1:2181,zk2:2181,zk3:2181
ENV     ZK_CLIENT_TIMEOUT 30000

USER   root
RUN    apt-get update
RUN    apt-get install -y netcat net-tools vim procps
USER   solr

# Copy over custom solr plugins
COPY    myplugins/src/resources/* /opt/solr/server/solr/my-resources/
COPY    lib/*.jar /opt/solr/my-lib/

# Copy over my configs
COPY    conf/ /app/conf

#Start solr in cloud mode, connecting to zookeeper
CMD       ["solr","start","-f","-c"]

The docker command I use to run the image built from this Dockerfile is
`docker run -p 8983:8983 -p 2181:2181 --name $(APP_NAME) $(APP_NAME):latest`

Output of `ps -eflww` from within the solr container (as root):

root@fe0ad5b40b42:/opt/solr-8.2.0# ps -eflww
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S solr         1     0  9  80   0 - 1043842 -    14:36 ?        00:00:07
/usr/local/openjdk-11/bin/java -server -Xms512m -Xmx512m -XX:+UseG1GC
-XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
-XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch
-Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.rmi.port=18983 -DzkClientTimeout=30000
-DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.log.dir=/var/solr/logs
-Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
-Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data
-Dsolr.data.home= -Dsolr.install.dir=/opt/solr
-Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
-Dlog4j.configurationFile=file:/var/solr/log4j2.xml -Xss256k
-Dsolr.jetty.https.port=8983 -jar start.jar --module=http
4 S root        90     0  0  80   0 -  4988 -      14:37 pts/0    00:00:00
/bin/bash
0 R root        95    90  0  80   0 -  9595 -      14:37 pts/0    00:00:00
ps -eflww

Output of netstat from within the solr container (as root):

root@fe0ad5b40b42:/opt/solr-8.2.0# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 fe0ad5b40b42:43678      172.20.28.179:2181      TIME_WAIT
tcp        0      0 fe0ad5b40b42:60164      172.20.155.241:2181     TIME_WAIT
tcp        0      0 fe0ad5b40b42:60500      172.20.60.138:2181      TIME_WAIT
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ]         STREAM     CONNECTED     129252
unix  2      [ ]         STREAM     CONNECTED     129270

I'm beginning to think that ZK is not set up correctly. I haven't uploaded
any configuration files to ZK yet; my understanding was that I could start
up a solr cloud node with no collections and upload the configuration from
there. I was under the impression that it would try to connect to ZK and if
it couldn't get config files from there it would use local config files. Do
I need to upload the solr cloud configuration files to ZK before starting
up the cluster?  The netstat output makes it look like the solr container
is indeed connected to the ZK containers, but there's no indication as to
why it cannot connect to Zookeeper that I can see.

--
Drew([hidden email])
http://wyntermute.dyndns.org/blog/

-- I Drive Way Too Fast To Worry About Cholesterol.



Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Shawn Heisey-2
On 10/18/2019 9:28 AM, Drew Kidder wrote:
> I'm beginning to think that ZK is not setup correctly. I haven't uploaded
> any configuration files to ZK yet; my understanding was that I could start
> up a solr cloud node with no collections and upload the configuration from
> there. I was under the impression that it would try to connect to ZK and if
> it couldn't get config files from there it would use local config files.

SolrCloud will always read index configs from ZooKeeper.  It will not
use local config files.  I believe that the only config that will be
read locally is solr.xml, but that can also be placed in ZK.

Solr will run with no collections in the cloud.  When the first
SolrCloud node in a cluster is started, that is the state it will be in.
All of the nodes can run with no collections.

> Do I need to upload the solr cloud configuration files to ZK before starting
> up the cluster?  The netstat output makes it look like the solr container
> is indeed connected to the ZK containers, but there's no indication as to
> why it cannot connect to Zookeeper that I can see.

If Solr finds no information at all in ZK when it starts, then it will
create the required structures within ZK for the cluster.  Index configs
will not normally be uploaded just by starting Solr.  Some methods of
creating collections will also upload the config.  Some will require
that you upload the configuration first, or use one that is already there.

The entries on the netstat output show TIME_WAIT.  If there were active
connections, they would show ESTABLISHED.  When a ZK client is running,
it maintains a continuous connection to one of the servers that it is given.
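
The TIME_WAIT versus ESTABLISHED check can also be scripted. What follows is just a sketch, assuming netstat output in the usual one-line-per-connection layout (protocol, queues, local address, foreign address, state); the function names are invented for this example.

```python
from collections import Counter


def zk_connection_states(netstat_output, zk_port=2181):
    """Count TCP states of connections whose foreign address uses zk_port."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if foreign.rsplit(":", 1)[-1] == str(zk_port):
            states[state] += 1
    return states


def has_live_zk_session(netstat_output, zk_port=2181):
    """True if at least one connection to zk_port is in the ESTABLISHED state."""
    return zk_connection_states(netstat_output, zk_port)["ESTABLISHED"] > 0
```

Fed the three 2181 entries from the netstat output earlier in the thread, `zk_connection_states` counts three TIME_WAIT connections and `has_live_zk_session` returns False.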

All of the work related to making ZK connections is handled by the ZK
client, not Solr itself.  I'm not sure what options are available for
getting that client to provide more information about what went wrong.
Based on the information available, there seems to be some kind of
network problem.  I do not know whether it is something in Java, Docker,
or somewhere else.

Have you tried your "ruok" test as the user that is running Solr, or was
that test done as root?

Thanks,
Shawn

Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

A Adel
In reply to this post by Drew Kidder
This could be because the ZooKeeper ensemble is not properly configured. With
a very similar setup, consisting of a ZK cluster of three hosts and one
Solr Cloud node (all containers), the system came up fine. Each ZK host
has the ZOO_MY_ID and ZOO_SERVERS environment variables set before running ZK.
In this case, the former would be 1, 2, or 3, a different value on each host,
and the latter would be "server.1=z1:2888:3888;2181
server.2=z2:2888:3888;2181 server.3=z3:2888:3888;2181", the same on all
hosts (the double quotes may be needed for proper parsing). This
ZOO_SERVERS syntax is for ZK version 3.5; the syntax for 3.4 is slightly different.
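
To avoid typos when writing these variables by hand, the values can be generated. This is only an illustrative sketch of the 3.5-style string described above; the helper names `zoo_servers` and `zoo_env` are hypothetical, and the ports are the ones used in the example (2888, 3888, 2181).

```python
def zoo_servers(hosts, peer_port=2888, election_port=3888, client_port=2181):
    """Build a ZK 3.5-style ZOO_SERVERS value; server ids start at 1."""
    return " ".join(
        f"server.{i}={host}:{peer_port}:{election_port};{client_port}"
        for i, host in enumerate(hosts, start=1)
    )


def zoo_env(hosts, my_index):
    """Environment for the host at position my_index (0-based) in hosts."""
    if not 0 <= my_index < len(hosts):
        raise ValueError("my_index must refer to an entry in hosts")
    return {
        "ZOO_MY_ID": str(my_index + 1),      # unique per host
        "ZOO_SERVERS": zoo_servers(hosts),   # identical on every host
    }
```

For example, `zoo_env(["z1", "z2", "z3"], 1)` gives ZOO_MY_ID "2" for the second host, with the same ZOO_SERVERS string on all three.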

http://aadel.io


Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Jörn Franke
In reply to this post by Drew Kidder
Even if you do not have a dedicated zkRoot node, you will need to provide / in the connection string.

Also, even if the ZK nodes can connect to each other, that does not mean they form an ensemble. You need to edit the zoo.cfg of every node and list all nodes in it. In addition, each node needs a myid file with a unique id.
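
To make the connection-string layout concrete, here is a small sketch that splits a ZK_HOST value into its server list and chroot path. It is not Solr's or ZooKeeper's actual parser; `parse_zk_host` is a made-up name, IPv6 literals are not handled, and it assumes that a missing suffix and a bare trailing "/" both mean the root.

```python
def parse_zk_host(zk_host, default_port=2181):
    """Split 'h1:p1,h2:p2/chroot' into ([(host, port), ...], chroot)."""
    servers_part, _, path = zk_host.partition("/")
    path = path.strip("/")
    # Assumption: no suffix and a bare "/" both refer to the root.
    chroot = "/" + path if path else "/"
    servers = []
    for entry in servers_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        servers.append((host, int(port) if sep else default_port))
    return servers, chroot
```

Under that assumption, "zk1:2181,zk2:2181,zk3:2181" and "zk1:2181,zk2:2181,zk3:2181/" parse to the same result, while "zk1,zk2,zk3/solr" yields the chroot "/solr" with the default port filled in.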


Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Drew Kidder
In reply to this post by A Adel
Again, thank you all for the suggestions.

My ZK ensemble is talking to each other and the outside world:

solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk1.zookeeper.internal 2181
Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
Latency min/avg/max: 0/0/0
Received: 53
Sent: 33
Connections: 1
Outstanding: 19
Zxid: 0x0
Mode: follower
Node count: 5

solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk2.zookeeper.internal 2181
Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
Latency min/avg/max: 0/0/0
Received: 37
Sent: 17
Connections: 1
Outstanding: 19
Zxid: 0x200000000
Mode: leader
Node count: 5
Proposal sizes last/min/max: 32/32/36

solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk3.zookeeper.internal 2181
Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
Latency min/avg/max: 0/0/0
Received: 7
Sent: 3
Connections: 1
Outstanding: 3
Zxid: 0x200000000
Mode: follower
Node count: 5

All of these commands can be executed on the solr container as either the
root user or the solr user (see the command prompt in each command). Note
that zk2 is the leader and zk1 and zk3 are followers. The configuration
(including the ZOO_MY_ID and ZOO_SERVERS environment variables) is
set up correctly and, for all intents and purposes, ZK appears to be
functioning.
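
The leader/follower reading of the `srvr` replies can be automated as well. The sketch below assumes the "Key: value" layout shown in the output above; `parse_srvr` and `has_one_leader` are names invented for this example, and the check covers only the reported modes, not whether clients can hold a session.

```python
def parse_srvr(output):
    """Parse the 'Key: value' lines of a ZooKeeper srvr reply into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines without a 'Key: value' shape
            info[key.strip()] = value.strip()
    return info


def has_one_leader(replies):
    """True if exactly one reply reports leader and all others report follower.

    replies maps a host name to the raw srvr text returned by that host.
    """
    modes = [parse_srvr(text).get("Mode") for text in replies.values()]
    return modes.count("leader") == 1 and modes.count("follower") == len(modes) - 1
```

With the three replies above (zk2 as leader, zk1 and zk3 as followers), `has_one_leader` returns True.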

Jörn Franke: I tried implementing your suggestion of providing "/" as the
root node by appending "/" to the end of the ZK_HOST connection string and
it still did not work (e.g. ENV ZK_HOST
zk1.zookeeper.internal:2181,zk2.zookeeper.internal:2181,zk3.zookeeper.internal:2181/
in the Dockerfile). Was this what you meant?  Or were you suggesting that I set
ZK_ROOT in the Solr configs/environment instead?

--
Drew([hidden email])
http://wyntermute.dyndns.org/blog/

-- I Drive Way Too Fast To Worry About Cholesterol.


On Fri, Oct 18, 2019 at 12:11 PM Ahmed Adel <[hidden email]> wrote:

> This could be because Zookeeper ensemble is not properly configured. Using
> a very similar setup which consists of ZK cluster of three hosts and one
> Solr Cloud node (all are containers), the system got running. Each ZK host
> has ZOO_MY_ID and ZOO_SERVERS environment variables set before running ZK.
> In this case, the former variable value would be from 1 to 3 on each host
> and the latter would be "server.1=z1:2888:3888;2181
> server.2=z2:2888:3888;2181 server.3=z3:2888:3888;2181" the same on all
> hosts (the double quotes may be needed for proper parsing). This
> ZOO_SERVERS syntax is for ZK version 3.5. 3.4 is slightly different.
>
> http://aadel.io
>
> On Fri, Oct 18, 2019 at 5:28 PM Drew Kidder <[hidden email]> wrote:
>
> > Thank you all for your suggestions! I appreciate the fast turnaround.
> >
> > My setup is using Amazon ECS for our solr cloud installation. Each ZK is
> in
> > its own container, using Route53 Service Discovery to provide the DNS
> name.
> > The ZK nodes can all talk to each other, and I can communicate to each
> one
> > of those nodes from my local machine and from within the solr container.
> > Solr is one node per container, as Martijn correctly assumed. I am not
> > using a zkRoot at present because my intention is to use ZK solely for
> Solr
> > Cloud and nothing else.
> >
> > I have tried removing the "-z" option from the Dockerfile CMD and using
> the
> > ZK_HOST environment variable (see below). I have even also modified the
> > solr.in.sh and set the ZK_HOST variable there, all to no avail. I have
> > tried both the Dockerfile command route, and have logged into the solr
> > container and tried to run the CMD manually to see if there was a problem
> > with the way I was using the CMD entry. All of those methods give me the
> > same result output captured in the gist below.
> >
> > The gist for my solr.log output is here:
> > https://gist.github.com/dkidder/2db9a6d393dedb97a39ed32e2be0c087
> >
> > My Dockerfile for the solr container looks like this:
> >
> >
> > FROM    solr:8.2
> >
> > EXPOSE    8983 8999 2181
> >
> > VOLUME    /app/logs
> > VOLUME    /app/data
> > VOLUME    /app/conf
> >
> > ## add our jetty configuration (increased request size!)
> > COPY   jetty.xml /opt/solr/server/etc/
> >
> > ## SolrCloud configuration
> > ENV     ZK_HOST zk1:2181,zk2:2181,zk3:2181
> > ENV     ZK_CLIENT_TIMEOUT 30000
> >
> > USER   root
> > RUN    apt-get update
> > RUN    apt-get install -y netcat net-tools vim procps
> > USER   solr
> >
> > # Copy over custom solr plugins
> > COPY    myplugins/src/resources/* /opt/solr/server/solr/my-resources/
> > COPY    lib/*.jar /opt/solr/my-lib/
> >
> > # Copy over my configs
> > COPY    conf/ /app/conf
> >
> > #Start solr in cloud mode, connecting to zookeeper
> > CMD       ["solr","start","-f","-c"]
> >
> > The docker command I use to execute this Dockerfile is `docker run -p
> > 8983:8983 -p 2181:2181 --name $(APP_NAME) $(APP_NAME):latest`
> >
> > Output of `ps -eflww` from within the solr container (as root):
> >
> > root@fe0ad5b40b42:/opt/solr-8.2.0# ps -eflww
> > F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY
> TIME
> > CMD
> > 4 S solr         1     0  9  80   0 - 1043842 -    14:36 ?
> 00:00:07
> > /usr/local/openjdk-11/bin/java -server -Xms512m -Xmx512m -XX:+UseG1GC
> > -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
> > -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch
> >
> >
> -Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > -Dcom.sun.management.jmxremote
> > -Dcom.sun.management.jmxremote.local.only=false
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.authenticate=false
> > -Dcom.sun.management.jmxremote.port=18983
> > -Dcom.sun.management.jmxremote.rmi.port=18983 -DzkClientTimeout=30000
> > -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.log.dir=/var/solr/logs
> > -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks
> -Duser.timezone=UTC
> > -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data
> > -Dsolr.data.home= -Dsolr.install.dir=/opt/solr
> > -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
> > -Dlog4j.configurationFile=file:/var/solr/log4j2.xml -Xss256k
> > -Dsolr.jetty.https.port=8983 -jar start.jar --module=http
> > 4 S root        90     0  0  80   0 -  4988 -      14:37 pts/0
> 00:00:00
> > /bin/bash
> > 0 R root        95    90  0  80   0 -  9595 -      14:37 pts/0
> 00:00:00
> > ps -eflww
> >
> > Output of netstat from within the solr container (as root):
> >
> > root@fe0ad5b40b42:/opt/solr-8.2.0# netstat
> > Active Internet connections (w/o servers)
> > Proto Recv-Q Send-Q Local Address           Foreign Address         State
> > tcp        0      0 fe0ad5b40b42:43678      172.20.28.179:2181
> >  TIME_WAIT
> > tcp        0      0 fe0ad5b40b42:60164      172.20.155.241:2181
> > TIME_WAIT
> > tcp        0      0 fe0ad5b40b42:60500      172.20.60.138:2181
> >  TIME_WAIT
> > Active UNIX domain sockets (w/o servers)
> > Proto RefCnt Flags       Type       State         I-Node   Path
> > unix  2      [ ]         STREAM     CONNECTED     129252
> > unix  2      [ ]         STREAM     CONNECTED     129270
> >
> > I'm beginning to think that ZK is not set up correctly. I haven't uploaded
> > any configuration files to ZK yet; my understanding was that I could
> start
> > up a solr cloud node with no collections and upload the configuration
> from
> > there. I was under the impression that it would try to connect to ZK and
> if
> > it couldn't get config files from there it would use local config files.
> Do
> > I need to upload the solr cloud configuration files to ZK before starting
> > up the cluster?  The netstat output makes it look like the solr container
> > is indeed connected to the ZK containers, but there's no indication as to
> > why it cannot connect to Zookeeper that I can see.
> >
> > --
> > Drew([hidden email])
> > http://wyntermute.dyndns.org/blog/
> >
> > -- I Drive Way Too Fast To Worry About Cholesterol.
> >
> >
> > On Fri, Oct 18, 2019 at 3:11 AM Martijn Koster <
> > [hidden email]>
> > wrote:
> >
> > >
> > >
> > > > On 18 Oct 2019, at 00:25, Drew Kidder <[hidden email]> wrote:
> > >
> > > > * I'm using the following command line to start a basic solr cloud
> > > instance
> > > > as per the documentation: `bin/solr start -c -z
> > > zk1:2181,zk2:2181,zk3:2181`
> > >
> > > I assume you’re just looking to run a single Solr node in a single
> > > container, right?
> > >
> > > Just set the ZK_HOST environment variable, and remove the command-line
> > > arguments.
> > > And you don’t need to specify the port number unless you deviate from
> the
> > > default.
> > > Have a look at this example
> > >
> >
> https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml
> > > <
> > >
> >
> https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml#L61with
> > > >
> > >
> > > The “start” command starts Solr in the background, which is typically
> not
> > > what you want
> > > when running Solr under docker.
> > >
> > >
> > > Why your command isn’t working as is, is not clear. When you say you’re
> > > using that
> > > command-line, how do you actually do that? In a full docker command
> line,
> > > or a compose file, or from a “docker exec”, or from some orchestrator.
> > > Share the exact thing you’re doing; perhaps there is a mistake there.
> > > Also, run `ps -eflww` in the container to see what command-line
> arguments
> > > the JVM actually got started with.
> > > And share the full startup log somewhere (in a GitHub gist perhaps),
> > there
> > > might be something of interest earlier on.
> > >
> > > >> (running `echo ruok | nc zk1 2181` returns the expected "imok"
> > response
> > > >> from ZK within the docker container where Solr is located)
> > > >> * The netcat command mentioned above shows up in the ZK logs, but
> the
> > > Solr
> > > >> attempts to connect do not (it's like the request isn't even getting
> > to
> > > ZK)
> > >
> > > Then it doesn’t sound like an environmental
> > firewall/security-group/routing
> > > issue.
> > > Next step to debug then could be to check if you actually see Solr make
> > > tcp connections
> > > to port 2181, in the Solr container, using tcpdump/sysdig/netstat or
> some
> > > such.
> > > If that gives a negative result, then you know it’s an issue in your
> Solr
> > > invocation config, or name resolution.
> > > If that gives a positive result, then it’s environmental after all; and
> > > you can dig further.
> > >
> > >
> > > But try the ZK_HOST thing first; it may just fix it.
> > >
> > > — Martijn
> >
>

Re: Solr 8.2 docker image in cloud mode not connecting to Zookeeper on startup

Drew Kidder
As an additional bit of information, here's the tcpdump of my startup of
solr in the docker container, after logging into the container and running
"bin/solr start -f -c" (which is the same CMD my Dockerfile executes):

root@91e3883fb675:/opt/solr-8.2.0# tcpdump -nvvv -i any -c 100 host 172.20.60.138
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
21:54:49.426019 IP (tos 0x0, ttl 64, id 44803, offset 0, flags [DF], proto TCP (6), length 60)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [S], cksum 0x94e0 (incorrect -> 0x19d3), seq 2175798173, win 29200, options [mss 1460,sackOK,TS val 6792350 ecr 0,nop,wscale 7], length 0
21:54:49.472340 IP (tos 0x0, ttl 37, id 37699, offset 0, flags [none], proto TCP (6), length 48)
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [S.], cksum 0xd892 (correct), seq 452884582, ack 2175798174, win 65535, options [mss 1460,wscale 2,eol], length 0
21:54:49.472428 IP (tos 0x0, ttl 64, id 44804, offset 0, flags [DF], proto TCP (6), length 40)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [.], cksum 0x94cc (incorrect -> 0x0472), seq 1, ack 1, win 229, length 0
21:54:49.472950 IP (tos 0x0, ttl 64, id 44805, offset 0, flags [DF], proto TCP (6), length 89)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [P.], cksum 0x94fd (incorrect -> 0x8ecb), seq 1:50, ack 1, win 229, length 49
21:54:49.473400 IP (tos 0x0, ttl 37, id 33425, offset 0, flags [none], proto TCP (6), length 40)
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [.], cksum 0x0526 (correct), seq 1, ack 50, win 65535, length 0
21:54:59.448636 IP (tos 0x0, ttl 64, id 44806, offset 0, flags [DF], proto TCP (6), length 40)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [F.], cksum 0x94cc (incorrect -> 0x0440), seq 50, ack 1, win 229, length 0
21:54:59.449070 IP (tos 0x0, ttl 37, id 3430, offset 0, flags [none], proto TCP (6), length 40)
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [.], cksum 0x0525 (correct), seq 1, ack 51, win 65535, length 0
21:55:21.518447 IP (tos 0x0, ttl 37, id 2259, offset 0, flags [none], proto TCP (6), length 40)
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [F.], cksum 0x0524 (correct), seq 1, ack 51, win 65535, length 0
21:55:21.518513 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [.], cksum 0x043f (correct), seq 51, ack 2, win 229, length 0

172.17.0.2 is my solr docker container, 172.20.60.138 is my zk1 docker
container residing out in AWS.

From this, it looks like communication is happening but that it's finishing
and closing the connection instead of holding it open. Am I interpreting
this correctly?
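
One way to check that reading is to reduce the capture to "sender, TCP flags, payload length" per packet. The following is only a rough sketch: it assumes the capture text has been saved with each packet's address/flags line on a single line (not wrapped by a mail client), and the `summarize_flags` name is made up for illustration.

```shell
# Print "sender flags payload-length" for each packet in `tcpdump -n` text
# read from stdin. Assumption: each "src > dst: Flags [...]" line is unwrapped.
summarize_flags() {
  awk '
    / > .*: Flags \[/ {
      src = $1                    # sender as ip.port
      flags = $5                  # e.g. "[S]," or "[P.],"
      gsub(/[][,]/, "", flags)    # strip the brackets and trailing comma
      len = $NF                   # last field is the TCP payload length
      print src, flags, len
    }
  '
}

# Example with two lines copied from the capture above.
summarize_flags <<'EOF'
    172.17.0.2.60562 > 172.20.60.138.2181: Flags [P.], cksum 0x94fd (incorrect -> 0x8ecb), seq 1:50, ack 1, win 229, length 49
    172.20.60.138.2181 > 172.17.0.2.60562: Flags [.], cksum 0x0526 (correct), seq 1, ack 50, win 65535, length 0
EOF
```

Read this way, the capture shows a completed handshake ([S], [S.], [.]) and a 49-byte push from the Solr side. Every packet from 172.20.60.138 has length 0, and the Solr side sends [F.] about ten seconds later. That suggests the TCP connection itself is established, but no reply payload comes back before the client closes it.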


--
Drew([hidden email])
http://wyntermute.dyndns.org/blog/

-- I Drive Way Too Fast To Worry About Cholesterol.


On Fri, Oct 18, 2019 at 1:18 PM Drew Kidder <[hidden email]> wrote:

> Again, thank you all for the suggestions.
>
> My ZK ensemble is talking to each other and the outside world:
>
> solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk1.zookeeper.internal 2181
> Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built
> on 05/03/2019 12:07 GMT
> Latency min/avg/max: 0/0/0
> Received: 53
> Sent: 33
> Connections: 1
> Outstanding: 19
> Zxid: 0x0
> Mode: follower
> Node count: 5
>
> solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk2.zookeeper.internal 2181
> Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built
> on 05/03/2019 12:07 GMT
> Latency min/avg/max: 0/0/0
> Received: 37
> Sent: 17
> Connections: 1
> Outstanding: 19
> Zxid: 0x200000000
> Mode: leader
> Node count: 5
> Proposal sizes last/min/max: 32/32/36
>
> solr@fe0ad5b40b42:/etc/default# echo srvr | nc zk3.zookeeper.internal 2181
> Zookeeper version: 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built
> on 05/03/2019 12:07 GMT
> Latency min/avg/max: 0/0/0
> Received: 7
> Sent: 3
> Connections: 1
> Outstanding: 3
> Zxid: 0x200000000
> Mode: follower
> Node count: 5
>
> All of these commands can be executed on the solr container as either the
> root user or the solr user (see the command prompt in each command). Note
> that zk2 is the leader and zk1 and zk3 are followers. The configuration
> files (including the ZOO_MY_ID and ZOO_SERVERS environment variables) are
> all set up correctly and, for all intents and purposes, ZK appears to be set
> up correctly and functioning.
>
> Jorne Franke: I tried implementing your suggestion of providing "/" as the
> root node by appending "/" to the end of the ZK_HOST connection string and
> it still did not work (e.g. ENV ZK_HOST
> zk1.zookeeper.internal:2181,zk2.zookeeper.internal:2181,zk3.zookeeper.internal:2181/
> in the Dockerfile). Was this what you meant?  Or were you suggesting to set
> the ZK_ROOT in the Solr configs/environment instead?
>
> --
> Drew([hidden email])
> http://wyntermute.dyndns.org/blog/
>
> -- I Drive Way Too Fast To Worry About Cholesterol.
>
>
> On Fri, Oct 18, 2019 at 12:11 PM Ahmed Adel <[hidden email]> wrote:
>
>> This could be because the Zookeeper ensemble is not properly configured. Using
>> a very similar setup which consists of ZK cluster of three hosts and one
>> Solr Cloud node (all are containers), the system got running. Each ZK host
>> has ZOO_MY_ID and ZOO_SERVERS environment variables set before running ZK.
>> In this case, the former variable value would be from 1 to 3 on each host
>> and the latter would be "server.1=z1:2888:3888;2181
>> server.2=z2:2888:3888;2181 server.3=z3:2888:3888;2181" the same on all
>> hosts (the double quotes may be needed for proper parsing). This
>> ZOO_SERVERS syntax is for ZK version 3.5. 3.4 is slightly different.
>>
>> http://aadel.io
>>
>> On Fri, Oct 18, 2019 at 5:28 PM Drew Kidder <[hidden email]> wrote:
>>
>> > Thank you all for your suggestions! I appreciate the fast turnaround.
>> >
>> > My setup is using Amazon ECS for our solr cloud installation. Each ZK
>> is in
>> > its own container, using Route53 Service Discovery to provide the DNS
>> name.
>> > The ZK nodes can all talk to each other, and I can communicate to each
>> one
>> > of those nodes from my local machine and from within the solr container.
>> > Solr is one node per container, as Martijn correctly assumed. I am not
>> > using a zkRoot at present because my intention is to use ZK solely for
>> Solr
>> > Cloud and nothing else.
>> >
>> > I have tried removing the "-z" option from the Dockerfile CMD and using
>> the
>> > ZK_HOST environment variable (see below). I have also modified the
>> > solr.in.sh and set the ZK_HOST variable there, all to no avail. I have
>> > tried both the Dockerfile command route, and have logged into the solr
>> > container and tried to run the CMD manually to see if there was a
>> problem
>> > with the way I was using the CMD entry. All of those methods give me the
>> > same result output captured in the gist below.
>> >
>> > The gist for my solr.log output is here:
>> > https://gist.github.com/dkidder/2db9a6d393dedb97a39ed32e2be0c087
>> >
>> > My Dockerfile for the solr container looks like this:
>> >
>> >
>> > FROM    solr:8.2
>> >
>> > EXPOSE    8983 8999 2181
>> >
>> > VOLUME    /app/logs
>> > VOLUME    /app/data
>> > VOLUME    /app/conf
>> >
>> > ## add our jetty configuration (increased request size!)
>> > COPY   jetty.xml /opt/solr/server/etc/
>> >
>> > ## SolrCloud configuration
>> > ENV     ZK_HOST zk1:2181,zk2:2181,zk3:2181
>> > ENV     ZK_CLIENT_TIMEOUT 30000
>> >
>> > USER   root
>> > RUN    apt-get update
>> > RUN    apt-get install -y netcat net-tools vim procps
>> > USER   solr
>> >
>> > # Copy over custom solr plugins
>> > COPY    myplugins/src/resources/* /opt/solr/server/solr/my-resources/
>> > COPY    lib/*.jar /opt/solr/my-lib/
>> >
>> > # Copy over my configs
>> > COPY    conf/ /app/conf
>> >
>> > #Start solr in cloud mode, connecting to zookeeper
>> > CMD       ["solr","start","-f","-c"]
>> >
>> > The docker command I use to execute this Dockerfile is `docker run -p
>> > 8983:8983 -p 2181:2181 --name $(APP_NAME) $(APP_NAME):latest`
>> >
>> > Output of `ps -eflww` from within the solr container (as root):
>> >
>> > root@fe0ad5b40b42:/opt/solr-8.2.0# ps -eflww
>> > F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY
>> TIME
>> > CMD
>> > 4 S solr         1     0  9  80   0 - 1043842 -    14:36 ?
>> 00:00:07
>> > /usr/local/openjdk-11/bin/java -server -Xms512m -Xmx512m -XX:+UseG1GC
>> > -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
>> > -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch
>> >
>> >
>> -Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
>> > -Dcom.sun.management.jmxremote
>> > -Dcom.sun.management.jmxremote.local.only=false
>> > -Dcom.sun.management.jmxremote.ssl=false
>> > -Dcom.sun.management.jmxremote.authenticate=false
>> > -Dcom.sun.management.jmxremote.port=18983
>> > -Dcom.sun.management.jmxremote.rmi.port=18983 -DzkClientTimeout=30000
>> > -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.log.dir=/var/solr/logs
>> > -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks
>> -Duser.timezone=UTC
>> > -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data
>> > -Dsolr.data.home= -Dsolr.install.dir=/opt/solr
>> > -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
>> > -Dlog4j.configurationFile=file:/var/solr/log4j2.xml -Xss256k
>> > -Dsolr.jetty.https.port=8983 -jar start.jar --module=http
>> > 4 S root        90     0  0  80   0 -  4988 -      14:37 pts/0
>> 00:00:00
>> > /bin/bash
>> > 0 R root        95    90  0  80   0 -  9595 -      14:37 pts/0
>> 00:00:00
>> > ps -eflww
>> >
>> > Output of netstat from within the solr container (as root):
>> >
>> > root@fe0ad5b40b42:/opt/solr-8.2.0# netstat
>> > Active Internet connections (w/o servers)
>> > Proto Recv-Q Send-Q Local Address           Foreign Address
>>  State
>> > tcp        0      0 fe0ad5b40b42:43678      172.20.28.179:2181
>> >  TIME_WAIT
>> > tcp        0      0 fe0ad5b40b42:60164      172.20.155.241:2181
>> > TIME_WAIT
>> > tcp        0      0 fe0ad5b40b42:60500      172.20.60.138:2181
>> >  TIME_WAIT
>> > Active UNIX domain sockets (w/o servers)
>> > Proto RefCnt Flags       Type       State         I-Node   Path
>> > unix  2      [ ]         STREAM     CONNECTED     129252
>> > unix  2      [ ]         STREAM     CONNECTED     129270
>> >
>> > I'm beginning to think that ZK is not set up correctly. I haven't
>> uploaded
>> > any configuration files to ZK yet; my understanding was that I could
>> start
>> > up a solr cloud node with no collections and upload the configuration
>> from
>> > there. I was under the impression that it would try to connect to ZK
>> and if
>> > it couldn't get config files from there it would use local config
>> files. Do
>> > I need to upload the solr cloud configuration files to ZK before
>> starting
>> > up the cluster?  The netstat output makes it look like the solr
>> container
>> > is indeed connected to the ZK containers, but there's no indication as
>> to
>> > why it cannot connect to Zookeeper that I can see.
>> >
>> > --
>> > Drew([hidden email])
>> > http://wyntermute.dyndns.org/blog/
>> >
>> > -- I Drive Way Too Fast To Worry About Cholesterol.
>> >
>> >
>> > On Fri, Oct 18, 2019 at 3:11 AM Martijn Koster <
>> > [hidden email]>
>> > wrote:
>> >
>> > >
>> > >
>> > > > On 18 Oct 2019, at 00:25, Drew Kidder <[hidden email]> wrote:
>> > >
>> > > > * I'm using the following command line to start a basic solr cloud
>> > > instance
>> > > > as per the documentation: `bin/solr start -c -z
>> > > zk1:2181,zk2:2181,zk3:2181`
>> > >
>> > > I assume you’re just looking to run a single Solr node in a single
>> > > container, right?
>> > >
>> > > Just set the ZK_HOST environment variable, and remove the command-line
>> > > arguments.
>> > > And you don’t need to specify the port number unless you deviate from
>> the
>> > > default.
>> > > Have a look at this example
>> > >
>> >
>> https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml
>> > > <
>> > >
>> >
>> https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml#L61with
>> > > >
>> > >
>> > > The “start” command starts Solr in the background, which is typically
>> not
>> > > what you want
>> > > when running Solr under docker.
>> > >
>> > >
>> > > Why your command isn’t working as is, is not clear. When you say
>> you’re
>> > > using that
>> > > command-line, how do you actually do that? In a full docker command
>> line,
>> > > or a compose file, or from a “docker exec”, or from some orchestrator.
>> > > Share the exact thing you’re doing; perhaps there is a mistake there.
>> > > Also, run `ps -eflww` in the container to see what command-line
>> arguments
>> > > the JVM actually got started with.
>> > > And share the full startup log somewhere (in a GitHub gist perhaps),
>> > there
>> > > might be something of interest earlier on.
>> > >
>> > > >> (running `echo ruok | nc zk1 2181` returns the expected "imok"
>> > response
>> > > >> from ZK within the docker container where Solr is located)
>> > > >> * The netcat command mentioned above shows up in the ZK logs, but
>> the
>> > > Solr
>> > > >> attempts to connect do not (it's like the request isn't even
>> getting
>> > to
>> > > ZK)
>> > >
>> > > Then it doesn’t sound like an environmental
>> > firewall/security-group/routing
>> > > issue.
>> > > Next step to debug then could be to check if you actually see Solr
>> make
>> > > tcp connections
>> > > to port 2181, in the Solr container, using tcpdump/sysdig/netstat or
>> some
>> > > such.
>> > > If that gives a negative result, then you know it’s an issue in your
>> Solr
>> > > invocation config, or name resolution.
>> > > If that gives a positive result, then it’s environmental after all;
>> and
>> > > you can dig further.
>> > >
>> > >
>> > > But try the ZK_HOST thing first; it may just fix it.
>> > >
>> > > — Martijn
>> >
>>
>