|
|
Hi,
I would like to be certain that I understand how Solr uses ZooKeeper, and more
precisely when Solr writes to ZooKeeper.
Solr stores various information in ZK:
- global configuration (autoscaling, security.json)
- collection configuration (configs)
- collection state (state.json, leaders, ...)
- node state (live_nodes, overseer)
Writes to ZK occur when:
- a ZooKeeper member starts or stops
- a Solr node starts or stops
- a configuration is loaded
- a collection is created, deleted or updated (nearly all calls to the
collection, core or config APIs)
Writes do not occur during:
- SolrJ client creation
- indexing (SolrJ, HTTP, DIH, ...)
- searching (SolrJ, HTTP)
In conclusion, if the Solr nodes are stable (no failures, no maintenance) and no
calls are made to the collection, core or config APIs, there are nearly no
writes to ZK.
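The model in this question can be restated as a small lookup. This is only a sketch of the lists above, not something Solr exposes; the event names are made up for illustration:

```python
# Hypothetical event names restating the lists above; this is not a Solr API.
ZK_WRITE_EVENTS = {
    "zk_member_start_or_stop",
    "solr_node_start_or_stop",
    "config_loaded",
    "collection_api_call",
    "core_api_call",
    "config_api_call",
}
NO_ZK_WRITE_EVENTS = {
    "solrj_client_creation",
    "indexing",
    "searching",
}


def writes_to_zk(event: str) -> bool:
    """Return True if the event is expected to write to ZooKeeper."""
    if event in ZK_WRITE_EVENTS:
        return True
    if event in NO_ZK_WRITE_EVENTS:
        return False
    raise ValueError(f"unknown event: {event}")


# A stable cluster that only indexes and searches generates no ZK writes.
steady_day = ["indexing", "searching", "searching", "indexing"]
print(sum(writes_to_zk(e) for e in steady_day))  # 0

# A day with maintenance does.
maintenance_day = ["solr_node_start_or_stop", "indexing", "collection_api_call"]
print(sum(writes_to_zk(e) for e in maintenance_day))  # 2
```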
Is this correct?
Regards
Dominique
|
|
Dominique:
In a word, “yes”. You’ve got it. A common misunderstanding is that ZK is actively involved in queries/updates/whatever. Basically, what ZK is responsible for is maintaining collection-wide resources, i.e. the current state of all the replicas, config files, etc., your “global configuration” and “collection configuration”, which should change very rarely and thus rarely generate writes.
The “collection state” (including your “nodes state”) information changes more frequently and generates more writes as nodes come up and down, go into recovery, etc. That said, for a cluster where all the replicas are “active” and don’t go away or go into recovery, there won’t be any writes to ZK.
So the consequence is that when you power up a cluster, there will be a flurry of write operations managed by the Overseer, but after all the replicas are up, write activity should pretty much cease.
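A toy simulation of that startup flurry, under simplifying assumptions that are not taken from Solr's code: each replica passes through down, recovering and active at startup, and every state change costs exactly one ZooKeeper write (the real Overseer may batch several updates into one write). Replica names are hypothetical.

```python
# Toy model, not Solr code. Assumptions: each replica goes through the
# simplified sequence down -> recovering -> active at startup, and every state
# change costs one ZooKeeper write (the real Overseer may batch updates).


def publish(states: dict, replica: str, new_state: str) -> int:
    """Record a replica state; return the number of ZK writes it causes (0 or 1)."""
    if states.get(replica) == new_state:
        return 0  # state unchanged: nothing to write
    states[replica] = new_state
    return 1


states: dict = {}
replicas = [f"replica{i}" for i in range(1, 7)]  # hypothetical names

# Power-up: every replica publishes each startup state once.
startup_writes = sum(
    publish(states, r, s) for r in replicas for s in ("down", "recovering", "active")
)

# Steady state: re-announcing "active" changes nothing, so nothing is written.
steady_writes = sum(publish(states, r, "active") for _ in range(1000) for r in replicas)

print(startup_writes, steady_writes)  # 18 0
```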
As long as the state is steady, i.e. no replicas are changing state, each individual Solr node has a copy of the relevant collection’s “state.json” znode and has all the information it needs to query or index without _either_ reading from or writing to ZK.
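A sketch of the steady-state point: a node reads the collection's state once and then answers routing questions from memory. FakeZk and Node are hypothetical classes, and the JSON is a simplified document loosely modeled on state.json, not its exact layout:

```python
import json

# Simplified document loosely modeled on a collection's state.json; the real
# layout has more fields and varies between Solr versions.
STATE_JSON = json.dumps({
    "techproducts": {
        "shards": {
            "shard1": {
                "replicas": {
                    "r1": {"node_name": "host1:8983_solr", "state": "active", "leader": "true"},
                    "r2": {"node_name": "host2:8983_solr", "state": "active"},
                    "r3": {"node_name": "host3:8983_solr", "state": "recovering"},
                }
            }
        }
    }
})


class FakeZk:
    """Hypothetical stand-in for a ZooKeeper client that counts reads."""

    def __init__(self, data: dict):
        self.data = data
        self.reads = 0

    def get_data(self, path: str) -> str:
        self.reads += 1
        return self.data[path]


class Node:
    """A Solr-like node that reads state.json once and routes from memory."""

    def __init__(self, zk: FakeZk, collection: str):
        self.collection = collection
        path = f"/collections/{collection}/state.json"
        self.state = json.loads(zk.get_data(path))  # the only ZK read

    def active_replicas(self, shard: str) -> list:
        replicas = self.state[self.collection]["shards"][shard]["replicas"]
        return [r["node_name"] for r in replicas.values() if r["state"] == "active"]


zk = FakeZk({"/collections/techproducts/state.json": STATE_JSON})
node = Node(zk, "techproducts")
for _ in range(10_000):  # 10,000 "requests" routed from the local copy
    node.active_replicas("shard1")
print(zk.reads)                        # 1
print(node.active_replicas("shard1"))  # ['host1:8983_solr', 'host2:8983_solr']
```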
One rather obscure cause of ZK writes is “schemaless” mode. When a new field is detected, the schema (and thus the collection’s configuration) is changed, which generates writes.
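A toy model of the schemaless case. The class is hypothetical, and it assumes one schema write per document that introduces at least one new field; how Solr actually batches schema updates may differ:

```python
class SchemalessCollection:
    """Toy model: a new field name changes the schema, which is stored in ZK."""

    def __init__(self, known_fields):
        self.fields = set(known_fields)
        self.zk_writes = 0

    def index(self, doc: dict) -> None:
        new_fields = set(doc) - self.fields
        if new_fields:
            # Assumption: one schema write per document that adds fields.
            self.fields |= new_fields
            self.zk_writes += 1


coll = SchemalessCollection(["id", "title"])
coll.index({"id": "1", "title": "a"})                 # known fields only: no write
coll.index({"id": "2", "price": 3.5})                 # "price" is new: one write
coll.index({"id": "3", "price": 1.0})                 # already known now: no write
coll.index({"id": "4", "color": "red", "size": "L"})  # two new fields, one write
print(coll.zk_writes)  # 2
```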
Best,
Erick
> On Nov 15, 2019, at 12:06 PM, Dominique Bejean < [hidden email]> wrote:
|
|
Thank you, Erick, for this fast answer.
Why is it considered a best practice to set the ZooKeeper connection timeout to 30000
instead of the default value of 15000?
Regards
Dominique
On Fri, 15 Nov 2019 at 18:36, Erick Erickson < [hidden email]> wrote:
|
|
Hi Dominique,
In my experience with Solr 4.8.1, this setting is related to garbage collection. When a “stop-the-world” pause lasts more than 15 seconds, the Solr node disconnects from ZooKeeper, the node’s replicas go down and sometimes, for reasons I don’t fully understand, you need to restart the node to get the replicas back. As I said, this is my own personal experience, and it relates to an old version of Solr running on Java 8 (CMS), with a collection of 8 to 10 million documents and 4 to 5 million updates per day.
I think the size of the collection and the number of updates play an important role in this scenario, in terms of memory fragmentation.
With newer versions of Solr I don’t know whether this still happens, also because I have always worked with smaller collections, so I have never had this kind of trouble.
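The GC effect described above comes down to comparing the pause length with the ZooKeeper session timeout. A rough sketch, assuming the setting being discussed is the session timeout Solr requests (zkClientTimeout in solr.xml), that the ZooKeeper server uses its default limits (minSessionTimeout = 2 x tickTime, maxSessionTimeout = 20 x tickTime) and that tickTime is 2000 ms as in ZooKeeper's sample configuration. The function names are hypothetical, and the real expiry point also depends on heartbeat timing. When a session expires, ZooKeeper removes that session's ephemeral znodes, such as the node's entry under live_nodes, which is one way the replicas end up marked down.

```python
def negotiated_timeout_ms(requested_ms: int, tick_time_ms: int = 2000) -> int:
    """Clamp a requested session timeout the way a ZooKeeper server does with its
    default limits: minSessionTimeout = 2 x tickTime, maxSessionTimeout = 20 x tickTime."""
    low, high = 2 * tick_time_ms, 20 * tick_time_ms
    return max(low, min(requested_ms, high))


def session_survives(gc_pause_ms: int, requested_ms: int, tick_time_ms: int = 2000) -> bool:
    """Rough rule of thumb: a pause at least as long as the session timeout
    lets the session expire (real expiry also depends on heartbeat timing)."""
    return gc_pause_ms < negotiated_timeout_ms(requested_ms, tick_time_ms)


for requested in (15000, 30000, 60000):
    print(requested, negotiated_timeout_ms(requested), session_survives(20000, requested))
# 15000 15000 False
# 30000 30000 True
# 60000 40000 True
```

Under these assumptions, raising the timeout from 15000 to 30000 lets a 20-second pause pass without expiring the session, while anything above 40000 is capped by the server unless maxSessionTimeout is raised.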
Ciao,
Vincenzo
--
> On 15 Nov 2019, at 18:49, Dominique Bejean < [hidden email]> wrote:
|
|
How do Solr nodes know that something was changed in ZooKeeper by another
node? Is there some notification from ZK, or do Solr nodes systematically read
from ZK (without local caching)?
Dominique
|
|
On 11/18/2019 8:39 AM, Dominique Bejean wrote:
> How do Solr nodes know that something was changed in ZooKeeper by another
> node? Is there some notification from ZK, or do Solr nodes systematically
> read from ZK (without local caching)?
This is built-in functionality of ZooKeeper. The client allows setting
what's called watches, which trigger when the watched node changes.
https://zookeeper.apache.org/doc/r3.5.6/zookeeperProgrammers.html#sc_zkDataMode_watches
This functionality is used extensively in SolrCloud.
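A minimal sketch of the watch pattern, using an in-memory stand-in rather than the real ZooKeeper client (ToyZk and CachingNode are hypothetical). It follows the one-time-trigger semantics described in the ZooKeeper 3.5 documentation linked above: a watch fires once, so the client re-reads the znode and sets a new watch in the same call. Between changes, queries use the cached copy and cost no ZK reads.

```python
from collections import defaultdict


class ToyZk:
    """In-memory stand-in for ZooKeeper with one-shot data watches (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.watches = defaultdict(list)

    def get_data(self, path, watcher=None):
        if watcher is not None:
            self.watches[path].append(watcher)
        return self.data.get(path)

    def set_data(self, path, value):
        self.data[path] = value
        # One-shot semantics: the registered watches are removed, then fired.
        fired, self.watches[path] = self.watches[path], []
        for watcher in fired:
            watcher(path)


class CachingNode:
    """Keeps a local copy of a znode and re-reads it only when notified."""

    def __init__(self, zk, path):
        self.zk = zk
        self.reads = 0
        self.refresh(path)

    def refresh(self, path):
        self.reads += 1
        # Re-register the watch on every read, because a watch fires only once.
        self.cached = self.zk.get_data(path, watcher=self.refresh)


path = "/collections/c1/state.json"
zk = ToyZk()
zk.data[path] = "v1"
node = CachingNode(zk, path)
for _ in range(1000):
    _ = node.cached          # queries use the local copy: no ZK reads
zk.set_data(path, "v2")      # another node changes the state
zk.set_data(path, "v3")
print(node.cached, node.reads)  # v3 3
```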
Thanks,
Shawn
|
|
Thank you, Shawn.
On Mon, 18 Nov 2019 at 19:28, Shawn Heisey < [hidden email]> wrote:
|
|