When does Solr write in Zookeeper ?


Dominique Bejean
Hi,

I would like to be certain that I understand how Solr uses ZooKeeper and,
more precisely, when Solr writes to ZooKeeper.

Solr stores various kinds of information in ZK:

   - global configuration (autoscaling, security.json)
   - collection configuration (configs)
   - collections state (state.json, leaders, ...)
   - nodes state (live_nodes, overseer)


Writes to ZK occur when

   - a ZooKeeper member starts or stops
   - a Solr node starts or stops
   - a configuration is loaded
   - a collection is created, deleted or updated (nearly all calls to the
   collection, core or config APIs)


Writes do not occur during

   - SolrJ client creation
   - indexing data (SolrJ, HTTP, DIH, ...)
   - searching (SolrJ, HTTP)


In conclusion, if the Solr nodes are stable (no failure, no maintenance) and
no calls are made to the collection, core or config APIs, there are nearly no
writes to ZK.

Is this correct?


Regards

Dominique

Re: When does Solr write in Zookeeper ?

Erick Erickson
Dominique:

In a word, “yes”. You’ve got it. A common misunderstanding is that ZK is actively involved in queries/updates/whatever. Basically, ZK is responsible for maintaining collection-wide resources, i.e. the current state of all the replicas, config files, etc. Your “global configuration” and “collection configuration” should change very rarely and thus rarely generate writes.

The “collection state” (including your “nodes state”) information changes more frequently and generates more writes as nodes come up and down, go into recovery, etc. That said, for a cluster where all the replicas are “active” and don’t go away or go into recovery, etc., there won’t be any writes to ZK.

So the consequence is that when you power up a cluster, there will be a flurry of write operations managed by the Overseer, but after all the replicas are up, write activity should pretty much cease.

As long as the state is steady, i.e. no replicas changing state, each individual Solr node has a copy of the relevant collection’s “state.json” znode and has all the information it needs to query or index without _either_ reading from or writing to ZK.

One rather obscure cause of ZK writes is “schemaless” mode. When a new field is detected, the schema (and thus the collection’s configuration) is changed, which generates writes.
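
One rough way to check the claim that write activity ceases in a steady cluster is to sample ZooKeeper's transaction id (zxid) twice and compare. The sketch below assumes you have already obtained two zxid values as hex strings (for example from the `Zxid:` line reported by ZooKeeper's `srvr` command); the function names and the sample values are made up for illustration. Note that the zxid counts every transaction, including session creation and closure, so the difference is an upper bound on znode writes rather than an exact count.

```python
from typing import Optional, Tuple


def split_zxid(zxid_hex: str) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    The high 32 bits are the leader epoch, the low 32 bits are a counter
    that increases with every committed transaction in that epoch.
    """
    value = int(zxid_hex, 16)  # accepts an optional "0x" prefix
    return value >> 32, value & 0xFFFFFFFF


def transactions_between(first_hex: str, second_hex: str) -> Optional[int]:
    """Number of transactions committed between two zxid samples.

    Returns None when the epoch changed between the samples (a leader
    election happened), because the counter restarts and the plain
    difference is no longer meaningful.
    """
    epoch_a, counter_a = split_zxid(first_hex)
    epoch_b, counter_b = split_zxid(second_hex)
    if epoch_a != epoch_b:
        return None
    return counter_b - counter_a


if __name__ == "__main__":
    # Hypothetical samples taken a few minutes apart on a quiet cluster.
    print(transactions_between("0x100000005", "0x10000000a"))
```

If the difference stays at or near zero while indexing and searching are running, that is consistent with the description above.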

Best,
Erick



Re: When does Solr write in Zookeeper ?

Dominique Bejean
Thank you, Erick, for this fast answer.
Why is it a best practice to set the ZooKeeper connection timeout to 30000 ms
instead of the default value of 15000 ms?

Regards

Dominique


Re: When does Solr write in Zookeeper ?

Vincenzo D'Amore
Hi Dominique,

In my experience with Solr 4.8.1, this setting is related to garbage collection. When a “stop the world” pause lasts more than 15 seconds, the Solr node disconnects from ZooKeeper, the node's replicas go down and sometimes, I don’t know exactly why, you need to restart the node to get the replicas back. As I said, this is my own personal experience, and it relates to an old version of Solr running on Java 8 (CMS) with a collection of 8 to 10 million documents and 4 to 5 million updates per day.

I think that the size of the collection and the number of updates play an important role in this scenario, in terms of memory fragmentation.

I don’t know whether this still happens with newer versions of Solr, partly because I have since worked only with smaller collections, so I never ran into this kind of trouble again.
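
The relationship between pause length and timeout can be sketched as follows. This is a deliberately simplified model, not Solr or ZooKeeper code: it assumes a session is lost whenever the JVM is paused for longer than the negotiated session timeout, and it ignores network latency and the server's expiry granularity. The function names and the pause values are hypothetical. The clamping reflects ZooKeeper's documented defaults, where the server bounds the requested timeout between 2 and 20 times its `tickTime`.

```python
from typing import List, Optional


def negotiated_timeout_ms(requested_ms: int,
                          tick_time_ms: int = 2000,
                          min_ms: Optional[int] = None,
                          max_ms: Optional[int] = None) -> int:
    """Timeout the server would grant for a requested session timeout.

    By default ZooKeeper bounds the value to [2 * tickTime, 20 * tickTime]
    unless minSessionTimeout / maxSessionTimeout are set explicitly.
    """
    low = 2 * tick_time_ms if min_ms is None else min_ms
    high = 20 * tick_time_ms if max_ms is None else max_ms
    return max(low, min(requested_ms, high))


def pauses_that_expire_session(pauses_ms: List[int],
                               session_timeout_ms: int) -> List[int]:
    """Return the pauses long enough to outlast the session timeout."""
    return [p for p in pauses_ms if p > session_timeout_ms]


if __name__ == "__main__":
    gc_pauses = [2000, 18000, 35000]  # hypothetical stop-the-world pauses
    for requested in (15000, 30000):
        granted = negotiated_timeout_ms(requested)
        print(requested, granted, pauses_that_expire_session(gc_pauses, granted))
```

Under these assumptions, raising the timeout from 15000 to 30000 lets the node survive the 18-second pause, while a 35-second pause still costs it the session. Requesting more than 20 times `tickTime` has no effect unless the server's maximum is raised as well.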

Ciao,
Vincenzo


Re: When does Solr write in Zookeeper ?

Dominique Bejean
In reply to this post by Erick Erickson
How do Solr nodes know that something was changed in ZooKeeper by another
node? Is there a notification from ZK, or do Solr nodes systematically
read from ZK (without local caching)?

Dominique




Re: When does Solr write in Zookeeper ?

Shawn Heisey-2
On 11/18/2019 8:39 AM, Dominique Bejean wrote:
> How do Solr nodes know that something was changed in ZooKeeper by another
> node? Is there a notification from ZK, or do Solr nodes systematically
> read from ZK (without local caching)?

This is built-in functionality of ZooKeeper.  The client allows setting
what's called watches, which trigger when the watched node changes.

https://zookeeper.apache.org/doc/r3.5.6/zookeeperProgrammers.html#sc_zkDataMode_watches

This functionality is used extensively in SolrCloud.
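
To make the pattern concrete, here is a toy, in-memory model of "cache locally, get notified on change". It is not the ZooKeeper client API and not Solr's code: `ToyZnodeStore`, `CachingClient` and the path are invented for illustration. It only mimics one property of classic ZooKeeper data watches, namely that a watch fires once and has to be registered again on the next read.

```python
from typing import Callable, Dict, List, Optional

Watch = Callable[[str], None]


class ToyZnodeStore:
    """In-memory stand-in for a ZooKeeper ensemble (NOT the real API)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[Watch]] = {}
        self.reads = 0  # counts how often clients actually read

    def get(self, path: str, watch: Optional[Watch] = None) -> bytes:
        self.reads += 1
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data[path]

    def set(self, path: str, value: bytes) -> None:
        self._data[path] = value
        # One-shot semantics: drop the registered watches, then fire them.
        for callback in self._watches.pop(path, []):
            callback(path)


class CachingClient:
    """Keeps a local copy of one znode and refreshes it only when notified."""

    def __init__(self, store: ToyZnodeStore, path: str) -> None:
        self._store = store
        self._cached = b""
        self._refresh(path)

    def _refresh(self, path: str) -> None:
        # Re-read the data and re-register the watch in the same call.
        self._cached = self._store.get(path, watch=self._refresh)

    def state(self) -> bytes:
        return self._cached  # served locally, no round trip to the store


if __name__ == "__main__":
    path = "/collections/demo/state.json"  # hypothetical collection name
    store = ToyZnodeStore()
    store.set(path, b"v1")
    client = CachingClient(store, path)
    for _ in range(100):
        client.state()
    store.set(path, b"v2")
    print(client.state(), store.reads)
```

In this model the hundred calls to `state()` cost no reads at all. The store is read once at startup and once more after the change, which matches the earlier description of nodes working from a local copy of state.json while the cluster is steady.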

Thanks,
Shawn

Re: When does Solr write in Zookeeper ?

Dominique Bejean
Thank you, Shawn.

