Solr 8.x Startup problems when ZK is partially unavailable

Markus Jelsma
Hello,

I have multiple collections: one on 7.5.0 and the rest on 8.3.1. They all share the same ZK ensemble and use the same ZK connection string. The first ZK address in the connection string is not reachable (it appears to be firewalled); the rest are accessible.
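To confirm which ensemble member is the unreachable one, each address in the connection string can be probed on its own. Below is a minimal Python sketch, assuming the usual `host:port[,host:port...][/chroot]` connect-string format and a plain TCP connect as the reachability test. The function names are hypothetical (not part of Solr or ZooKeeper), and the parser does not handle IPv6 literals. A successful connect only shows that the port accepts connections, not that ZooKeeper is healthy.

```python
import socket
from typing import Dict, List, Optional, Tuple

Server = Tuple[str, int]


def parse_zk_connect_string(connect: str) -> Tuple[List[Server], Optional[str]]:
    """Split 'host1:2181,host2:2181/chroot' into [(host, port), ...] and a chroot (or None).

    Assumption: hostnames or IPv4 addresses only; IPv6 literals are not handled.
    """
    chroot: Optional[str] = None
    slash = connect.find("/")
    if slash != -1:
        # Everything from the first '/' on is the chroot path, e.g. '/solr'.
        connect, chroot = connect[:slash], connect[slash:]
    servers: List[Server] = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No explicit port: fall back to ZooKeeper's default client port.
            host, port = entry, "2181"
        servers.append((host, int(port)))
    return servers, chroot


def probe_zk_servers(servers: List[Server], timeout: float = 2.0) -> Dict[Server, bool]:
    """Try a plain TCP connect to each server; True means the port accepted the connection."""
    results: Dict[Server, bool] = {}
    for host, port in servers:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Covers refused connections, DNS failures and timeouts (e.g. a firewalled host).
            results[(host, port)] = False
    return results
```

For example, `probe_zk_servers(parse_zk_connect_string(zk_host)[0])` should report `False` for the firewalled first address once the timeout expires, and `True` for the reachable ones.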

The 7.5.0 nodes do not appear to have problems with a partially accessible ZK ensemble. They log a simple warning, but the cores on those nodes keep starting up normally.

I have trouble starting up the 8.x nodes because they time out when connecting to ZK. The logs are filled with:

2020-01-10 16:33:33.146 WARN  (qtp1620948294-21) [   ] o.a.s.h.a.ZookeeperStatusHandler Failed talking to zookeeper bad_node1:2181 => org.apache.solr.common.SolrException: Failed talking to Zookeeper 89.188.14.28:2181
        at org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkRawResponse(ZookeeperStatusHandler.java:245)

And I get this one for one of the cores on a restarted node:

2020-01-10 16:31:11.752 ERROR (searcherExecutor-12-thread-1-processing-n:s2.io:8983_solr x:documents_shard2_replica_t19 c:documents s:shard2 r:core_node20) [c:documents s:shard2 r:core_node20 x:documents_shard2_replica_t19] o.a.s.h.RequestHandlerBase java.lang.NullPointerException
        at org.apache.solr.handler.component.SearchHandler.initComponents(SearchHandler.java:183)

This one is probably preventing the core from loading properly. On the same node, however, another shard of the same collection started up normally, as did the other cores on the node.

Is this a known 8.x problem? I can work around it by temporarily removing the bad node's address from the ZK connection string, but that's all.
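The workaround described above amounts to rewriting the connect string by hand (for example the `ZK_HOST` value in `solr.in.sh`, or the `-z` argument to `bin/solr start`). A minimal Python sketch of that edit follows, assuming the same `host:port[,host:port...][/chroot]` format; `drop_zk_hosts` is a hypothetical helper, not part of Solr or ZooKeeper. It keeps the chroot suffix and refuses to return an empty server list.

```python
from typing import Iterable


def drop_zk_hosts(connect: str, unreachable: Iterable[str]) -> str:
    """Return the connect string without entries whose host is in `unreachable`.

    The optional chroot suffix (e.g. '/solr') is preserved on the result.
    """
    bad = set(unreachable)
    chroot = ""
    slash = connect.find("/")
    if slash != -1:
        # Split off the chroot so it can be re-appended unchanged.
        connect, chroot = connect[:slash], connect[slash:]
    kept = []
    for entry in connect.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # 'zk2:2181' -> 'zk2'; an entry without a port is the host itself.
        host = entry.rpartition(":")[0] or entry
        if host not in bad:
            kept.append(entry)
    if not kept:
        raise ValueError("no ZooKeeper servers left in the connect string")
    return ",".join(kept) + chroot
```

Since this only hides the symptom, the removed address would need to be added back once the host is reachable again.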

Thanks,
Markus

Re: Solr 8.x Startup problems when ZK is partially unavailable

Jan Høydahl / Cominvent
I’ve also seen timeouts with zkCli.sh from Solr 8.4 when connecting to 3 ZK servers where the first one is not accessible. Solr 8.4 ships with ZK 3.5.5, while 7.x has ZK 3.4.x.

Jan Høydahl

In reply to Markus Jelsma's message of 10 Jan 2020, 17:44.