How to resolve a single domain name to multiple zookeeper IP in Solr


LEE Ween Jiann
I'm trying to modify the Helm chart for Solr so that it works correctly for Kubernetes (k8s) deployments. For this to happen, the way Solr resolves ZooKeeper hostnames needs to change.

Let me explain...
The standard way to configure Solr is to list all the ZooKeeper hostnames/IPs in one of:

  *   solr.in.sh or solr.in.cmd
  *   zoo.cfg
  *   -z param
For example: ZK_HOST="zk1:2181,zk2:2181,zk3:2181".

However, for cloud deployments, in particular on k8s using a Helm chart, this is not ideal, because the user has to modify ZK_HOST each time they scale the number of ZooKeeper instances up or down.

  *   For example (scale down): ZK_HOST="zk1:2181,zk2:2181".
  *   For example (scale up): ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181".

This cannot be done automatically in Helm/k8s. In k8s, this parameter should remain static, meaning that it should not be changed after the chart is deployed.

  *   For example (k8s): ZK_HOST="zk-headless:2181".

What a chart can do is create a service with a DNS name such as zk-headless that resolves to the IPs of all the ZooKeeper instances; as ZooKeeper scales, the number of IPs resolved from zk-headless changes. Could Solr resolve multiple ZooKeeper IPs from a single name?

Cheers,
Ween Jiann

Re: How to resolve a single domain name to multiple zookeeper IP in Solr

Jörn Franke
The newest ZooKeeper version supports dynamically changing the ZK instances:

https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html

However, for that to work properly across a Solr restart, you always need a minimal set of servers that do not change; only the additional ones should be added or removed.


RE: How to resolve a single domain name to multiple zookeeper IP in Solr

LEE Ween Jiann
Yes, ZooKeeper supports dynamic reconfiguration from 3.5.x onward.
I am referring to Solr here.

You still need to specify the list of ZooKeeper servers in solr.in.sh or solr.in.cmd, or as the -z param.
https://lucene.apache.org/solr/guide/8_1/setting-up-an-external-zookeeper-ensemble.html
But scaling ZooKeeper after a Helm deployment does not change this ZK_HOST list automatically. This is intended: Helm/k8s does not do this for you, and you should not change it manually.

K8s has DNS that allows you to resolve a single domain name to multiple IPs. Let's say this domain is zk-headless. Then ZK_HOST="zk-headless:2181".
Solr should resolve all ZooKeeper instances from that single domain name.

nslookup zk-headless
Server:  xxx
Address:  xxx
Non-authoritative answer:
Name:    zk-headless
Addresses:  
          10.0.0.11
          10.0.0.12
          10.0.0.13

These three addresses would be the ZooKeeper servers.


Re: How to resolve a single domain name to multiple zookeeper IP in Solr

Shawn Heisey-2
In reply to this post by LEE Ween Jiann
On 9/26/2019 4:12 AM, LEE Ween Jiann wrote:
> I'm trying to modify the helm chart for solr such that it works for kubernetes (k8s) deployment correctly. There needs to be a particular change in the way solr resolves zookeepers hostname in order for this to happen.

This is the solr-user mailing list.  Your question is about ZooKeeper.

Solr uses the ZK client without any modifications.  It passes the zkHost
string to ZK and ZK handles it.  Solr does not interpret that string --
it is ZK that is looking up the hosts, not Solr.

You're going to need to ask ZK folks this question.

Thanks,
Shawn

Re: How to resolve a single domain name to multiple zookeeper IP in Solr

LEE Ween Jiann
Thank you, this is what I needed to know.


Re: How to resolve a single domain name to multiple zookeeper IP in Solr

Mikhail Khludnev-2
In reply to this post by LEE Ween Jiann
My understanding is that a node resolves the ZK DNS name every time it reconnects
(after it has lost its older ZK connection). So, unless one has a dumb DNS cache, a new
ZK instance should be picked up after an older one drops. There's a subtle detail:
the DNS name should be repeated for faster rotation, e.g.
ZK_HOST=zk-headless:2181,zk-headless:2181
Happy scaling!

--
Sincerely yours
Mikhail Khludnev

Re: How to resolve a single domain name to multiple zookeeper IP in Solr

LEE Ween Jiann
In reply to this post by LEE Ween Jiann
Hi,

I have looked at the ZooKeeper code and confirmed with the ZK people. What I found is that the ZK client does indeed resolve multiple IPs from a single address; however, Solr only reports one of them.

I have dug deeper and found that, fortunately, this only affects the Solr webapp and not the core.
This is a cosmetic issue in getZkStatus() of ZookeeperStatusHandler.java: it takes the raw ZK_HOST string and splits it to get the ensemble size, then loops through the resulting array to get the info.

This results in:
Status: red
Errors:
"Leader reports 2 followers, but we only found 0. Please check zkHost configuration"; "We do not have a leader" (either one)
ZK connection string: solr-zookeeper-headless:2181
Ensemble size: 1

FYI, solr-zookeeper-headless resolves to 3 different IPs, and the getZkRawResponse() method only connects to 1 of the 3.

My suggestion is the following:
- List<String> zookeepers = Arrays.asList(zkHost.split("/")[0].split(","));
+ List<String> zookeeperHosts = Arrays.asList(zkHost.split("/")[0].split(","));
+ final List<String> zookeepers = new ArrayList<>();
+ for (String host : zookeeperHosts) {
+   // resolve host and add all IP:port to zookeepers array
+ }

Let me know your thoughts.
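
The suggested change above leaves the resolution step as a comment. A minimal sketch of that step, assuming plain JDK DNS resolution through InetAddress.getAllByName, might look like the following. The class name ZkHostExpander and the method expand are hypothetical and are not part of Solr; falling back to port 2181 when no port is given is also an assumption, and IPv6 literals are not handled.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Solr code): expands each host in a zkHost string to all of its IPs. */
final class ZkHostExpander {

    static List<String> expand(String zkHost) {
        // Drop the optional chroot (e.g. "/solr") and split the comma-separated
        // host list, the same way the existing line in the diff does.
        String[] hostPorts = zkHost.split("/")[0].split(",");
        List<String> zookeepers = new ArrayList<>();
        for (String hostPort : hostPorts) {
            String trimmed = hostPort.trim();
            int colon = trimmed.lastIndexOf(':');
            String host = colon < 0 ? trimmed : trimmed.substring(0, colon);
            // Assumption: default to the usual ZooKeeper client port if none is given.
            String port = colon < 0 ? "2181" : trimmed.substring(colon + 1);
            try {
                // One name may resolve to several addresses (e.g. a headless service).
                for (InetAddress address : InetAddress.getAllByName(host)) {
                    zookeepers.add(address.getHostAddress() + ":" + port);
                }
            } catch (UnknownHostException e) {
                // Surface resolution failures without a checked exception.
                throw new UncheckedIOException(e);
            }
        }
        return zookeepers;
    }
}
```

Called with "solr-zookeeper-headless:2181", this would return one IP:port entry per address the name resolves to at that moment, so the ensemble size would no longer be tied to the number of comma-separated names. The result is only a snapshot of DNS at call time.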


Re: How to resolve a single domain name to multiple zookeeper IP in Solr

Shawn Heisey-2
On 9/27/2019 10:39 AM, LEE Ween Jiann wrote:

> FYI, solr-zookeeper-headless resolves to 3 different IPs. And getZkRawResponse() method only connects to 1 of the 3.
>
> My suggestion is the following:
> - List<String> zookeepers = Arrays.asList(zkHost.split("/")[0].split(","));
> + List<String> zookeeperHosts = Arrays.asList(zkHost.split("/")[0].split(","));
> + final List<String> zookeepers = new ArrayList<>();
> + for (String host : zookeeperHosts) {
> +   // resolve host and add all IP:port to zookeepers array
> + }
>
> Let me know your thoughts.

I think that Solr should query the ZK client for information about what
server hosts are active, rather than relying on the connection string,
unless that information cannot be obtained by the client.

That will be particularly important now that Solr contains ZK client
version 3.5.x.  When paired with servers also running 3.5 or later, it
is capable of dynamic reconfiguration.  It is entirely possible that the
active server list will be very different than the connection string.

According to the ZK team, the /zookeeper/config znode has this
information.  I do not know if that is new in version 3.5 or if it also
exists in 3.4.

Thanks,
Shawn
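
Reading the ensemble from /zookeeper/config means parsing the dynamic configuration text stored in that znode. The sketch below assumes the line format described in the ZooKeeper 3.5 reconfiguration guide, server.<id>=<host>:<port1>:<port2>[:role];[<client address>:]<client port>, and extracts the client endpoints from it. ZkDynamicConfig and clientEndpoints are hypothetical names, the sample hostnames in the usage note are made up, and how the znode contents are fetched is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser (not Solr code) for the text stored in /zookeeper/config. */
final class ZkDynamicConfig {

    static List<String> clientEndpoints(String config) {
        List<String> endpoints = new ArrayList<>();
        for (String rawLine : config.split("\\R")) {
            String line = rawLine.trim();
            // Only "server.N=..." lines describe ensemble members; "version=..." is skipped.
            if (!line.startsWith("server.")) continue;
            int eq = line.indexOf('=');
            if (eq < 0) continue;
            String[] serverAndClient = line.substring(eq + 1).trim().split(";", 2);
            // Without a ";<client port>" part there is no client endpoint to report.
            if (serverAndClient.length < 2) continue;
            String serverHost = serverAndClient[0].split(":")[0];
            String client = serverAndClient[1];
            int colon = client.lastIndexOf(':');
            String clientHost = colon < 0 ? "0.0.0.0" : client.substring(0, colon);
            String clientPort = client.substring(colon + 1);
            // A missing or wildcard client address means "listen on all interfaces",
            // so fall back to the server's own host name.
            String host = clientHost.equals("0.0.0.0") ? serverHost : clientHost;
            endpoints.add(host + ":" + clientPort);
        }
        return endpoints;
    }
}
```

For a config containing server.1=zk-0.zk-headless:2888:3888:participant;2181 and a version line, this yields the single endpoint zk-0.zk-headless:2181. A status page built on such a list would follow the live ensemble rather than the connection string.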

Re: How to resolve a single domain name to multiple zookeeper IP in Solr

LEE Ween Jiann
Yes, that is the best way to go, but it is only available for ZK 3.5 and later; I have spun up ZooKeeper and checked /zookeeper/config.

Any idea whether querying the ZK client for this information (for ZK 3.5 and later) will be added in the near future?
Should I raise a JIRA for it?


Re: How to resolve a single domain name to multiple zookeeper IP in Solr

Jörn Franke
In reply to this post by Shawn Heisey-2
Some food for thought: if ZooKeeper can dynamically reconfigure, then Solr must be able to do so as well. Let’s assume you start with an ensemble server1,server2,server3 and store this in the Solr config. During the lifetime of the Solr service, the ensemble is changed to server4,server5,server6. Now the Solr service is restarted and can no longer connect to ZooKeeper, as server1,server2,server3 no longer exist.
I propose having a dynamic config file / Solr service that records those changes in a local file that is available after restart.


Re: How to resolve a single domain name to multiple zookeeper IP in Solr

Mikhail Khludnev-2
>   and store this in the Solr config.
I don't think it's ever possible.

--
Sincerely yours
Mikhail Khludnev

Re: How to resolve a single domain name to multiple zookeeper IP in Solr

Mikhail Khludnev-2
In reply to this post by LEE Ween Jiann
Yes. Please raise a jira ticket.

--
Sincerely yours
Mikhail Khludnev

Re: How to resolve a single domain name to multiple zookeeper IP in Solr

LEE Ween Jiann
Filed. https://issues.apache.org/jira/browse/SOLR-13801

I'll try the repeated DNS name method as well. Thanks!

