Query regarding Solr Cloud Setup

Query regarding Solr Cloud Setup

iporritt

Hi,

I am relatively new to Solr, especially Solr Cloud, and have been using it for only a few days. I think I have set up Solr Cloud correctly, but I would like some guidance to confirm that. Ideally, I want to be able to process 40 million documents in production via Solr Cloud. The number of fields is not fixed, as the documents may differ, but it could be 20 or more.

My current setup is as follows (note that this is all on one machine for now):

A three-node ZooKeeper ensemble (all instances running on different ports), which works as expected.

Three Solr nodes started on separate ports (note: directory path → D:\solr-7.7.1\example\cloud\Node (1/2/3)).

<image001.jpg>

The Solr setup would be similar to the above, except that it is on my local machine. Below is the Graph status in Solr Cloud.

<image002.jpg>

I have a few questions that I cannot seem to find answers to on the web.

We have a schema, which I have managed to upload to ZooKeeper along with solrconfig.xml. How do I get the system to recognise both a lib/.jar extension and a custom core.properties file? I worked around the core.properties issue by setting update.autoCreateFields to false in solrconfig.xml, but I would like to include the file, as a colleague has done on standalone Solr.

Also, from a high-availability perspective, if I lost two of the Solr servers due to an outage, would the system still work as expected? Should I expect any data loss?

Re: Query regarding Solr Cloud Setup

Shawn Heisey-2
On 9/3/2019 7:22 AM, Porritt, Ian wrote:
> We have a schema, which I have managed to upload to ZooKeeper along
> with solrconfig.xml. How do I get the system to recognise both a
> lib/.jar extension and a custom core.properties file? I worked around
> the core.properties issue by setting update.autoCreateFields to false
> in solrconfig.xml, but I would like to include the file, as a colleague
> has done on standalone Solr.

I cannot tell what you are asking here.  The core.properties file lives
on the disk, not in ZK.

I was under the impression that .jar files could not be loaded into ZK
and used in a core config.  Documentation saying otherwise was recently
pointed out to me on the list, but I remain skeptical that this actually
works, and I have not tried to implement it myself.

The best way to handle custom jar loading is to create a "lib" directory
under the solr home, and place all jars there.  Solr will automatically
load them all before any cores are started, and no config commands of
any kind will be needed to make it happen.
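
For illustration only, here is a minimal Python sketch of that step: it creates a "lib" directory under a given Solr home and copies jars into it. The function name and every path are hypothetical; nothing here is part of Solr itself. In a setup with several nodes on one machine, each node presumably has its own Solr home, so the copy would need to be repeated per node, and because the jars are loaded at startup a restart is likely needed afterwards.

```python
import shutil
from pathlib import Path


def install_jars(solr_home, jar_paths):
    """Copy each jar into <solr_home>/lib, creating the directory if needed."""
    lib_dir = Path(solr_home) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for jar in map(Path, jar_paths):
        # Refuse anything that is not a .jar so stray files do not end up in lib.
        if jar.suffix != ".jar":
            raise ValueError(f"not a jar file: {jar}")
        target = lib_dir / jar.name
        shutil.copy2(jar, target)
        installed.append(target)
    return installed
```

A call might look like install_jars(r"D:\path\to\node1\solr", ["my-plugin.jar"]), where both arguments are placeholders for the real Solr home and jar.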

> Also, from a high-availability perspective, if I lost two of the Solr
> servers due to an outage, would the system still work as expected?
> Should I expect any data loss?

If all three Solr servers have a complete copy of all your indexes, then
you should remain fully operational if two of those Solr servers go down.
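
The condition "all three Solr servers have a complete copy" can be checked with a small Python sketch. It is a simplified model, not Solr code: it looks only at replica placement and ignores ZooKeeper, leader election, and recovery. The node and shard names are made up.

```python
def live_shards(replica_nodes_by_shard, down_nodes):
    """Map each shard to True if at least one replica is on a node that is still up."""
    down = set(down_nodes)
    return {
        shard: any(node not in down for node in nodes)
        for shard, nodes in replica_nodes_by_shard.items()
    }


# Hypothetical layout 1: one shard, replicated to all three nodes.
full_copy = {"shard1": ["node1", "node2", "node3"]}
print(live_shards(full_copy, down_nodes=["node2", "node3"]))
# {'shard1': True}

# Hypothetical layout 2: two shards, each on only two of the three nodes.
partial = {"shard1": ["node1", "node2"], "shard2": ["node2", "node3"]}
print(live_shards(partial, down_nodes=["node2", "node3"]))
# {'shard1': True, 'shard2': False}
```

In the second layout, losing two nodes leaves shard2 with no live replica, which is why the complete-copy condition matters.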

Note that if you have three ZK servers and you lose two, that means that
you have lost zookeeper quorum, and in that situation, SolrCloud will
transition to read only -- you will not be able to change any index in
the cloud.  This is how ZK is designed and it cannot be changed.  If you
want a ZK deployment to survive the loss of two servers, you must have
at least five total ZK servers, so more than 50 percent of the total
survives.
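
The arithmetic behind the "more than 50 percent" rule, as a short Python sketch. The function names are illustrative only.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that is a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority still survives."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that four servers tolerate no more failures than three, which is why ensembles are normally sized with odd numbers.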

Thanks,
Shawn

Re: Query regarding Solr Cloud Setup

Erick Erickson
Having custom core.properties files is “fraught”. First of all, that file can be re-written. Second, the collections ADDREPLICA command will create a new core.properties file. Third, any mistakes you make when hand-editing the file can have grave consequences.

What change exactly do you want to make to core.properties and why?

Trying to reproduce “what a colleague has done on standalone” is not something I’d recommend; SolrCloud is a different beast. Reproducing the _behavior_ is another thing, so what is the behavior you want in SolrCloud that causes you to want to customize core.properties?

Best,
Erick

> On Sep 3, 2019, at 10:15 AM, Shawn Heisey <[hidden email]> wrote:
>
> On 9/3/2019 7:22 AM, Porritt, Ian wrote:
>> We have a schema, which I have managed to upload to ZooKeeper along with solrconfig.xml. How do I get the system to recognise both a lib/.jar extension and a custom core.properties file? I worked around the core.properties issue by setting update.autoCreateFields to false in solrconfig.xml, but I would like to include the file, as a colleague has done on standalone Solr.
>
> I cannot tell what you are asking here.  The core.properties file lives on the disk, not in ZK.

Re: Query regarding Solr Cloud Setup

Jörn Franke
In reply to this post by iporritt
If you have a properly secured cluster, e.g. with Kerberos, then you should not update files in ZK directly. Use the corresponding Solr REST interfaces instead; then you are also less likely to mess something up.

If you want to have HA, you should have at least 3 Solr nodes and replicate the collection to all three of them (more is not needed from an HA point of view). This would also allow you to upgrade the cluster without downtime.
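
As a rough sketch of what "use the REST interfaces" and "replicate to all three" could look like, the Python below builds Collections API URLs for CREATE (with replicationFactor=3) and ADDREPLICA. It only constructs the URLs and sends nothing. The host, port, collection name, configset name, and node name are all hypothetical, and on a Kerberos-secured cluster the actual request would additionally need authentication, which is not shown.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, params):
    """Build a Collections API URL; nothing is sent over the network."""
    query = {"action": action}
    query.update(params)
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(query)}"


# Hypothetical host, collection, and configset names.
SOLR = "http://localhost:8983/solr"

create = collections_api_url(SOLR, "CREATE", {
    "name": "documents",
    "numShards": 1,
    "replicationFactor": 3,          # one copy per Solr node
    "collection.configName": "myconfig",
})

add_replica = collections_api_url(SOLR, "ADDREPLICA", {
    "collection": "documents",
    "shard": "shard1",
    "node": "localhost:8984_solr",   # node name as shown under live nodes
})

print(create)
print(add_replica)
```

The ADDREPLICA form is the kind of call that would place a copy on a node added later, rather than editing files on that node by hand.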

> On 3 Sep 2019 at 15:22, Porritt, Ian <[hidden email]> wrote:
>
> We have a schema, which I have managed to upload to ZooKeeper along with solrconfig.xml. How do I get the system to recognise both a lib/.jar extension and a custom core.properties file? I worked around the core.properties issue by setting update.autoCreateFields to false in solrconfig.xml, but I would like to include the file, as a colleague has done on standalone Solr.
>
> Also, from a high-availability perspective, if I lost two of the Solr servers due to an outage, would the system still work as expected? Should I expect any data loss?

RE: Query regarding Solr Cloud Setup

iporritt
Hi Jörn/Erick/Shawn, thanks for your responses.

@Jörn - Much appreciated for the heads-up on Kerberos authentication. It's something we haven't really considered at the moment, though for production this may well be the case. With regard to the Solr nodes, three is what we are looking at as a minimum. When adding a new Solr node to the cluster, will the settings/configuration be applied to the new node by ZooKeeper, or is manual intervention required?
@Erick - With regard to core.properties: on standard Solr, update.autoCreateFields=false is set in the core.properties file, whereas for Cloud I have added it to solrconfig.xml, which gets uploaded to ZooKeeper. I appreciate that standard and Cloud may work entirely differently; I just wanted to ensure that this is the correct way of doing it.
@Shawn - I will try creating the lib directory in Solr home to see if it gets picked up. Having five ZooKeeper servers would more than satisfy high availability.


Regards
Ian

Re: Query regarding Solr Cloud Setup

Erick Erickson
OK, you can set update.autoCreateFields as a system property when starting
Solr. Or you can change your solrconfig.xml to either use the classic schema
(schema.xml) or take out the add-unknown-fields... from the update processor
chain. You can also set a cluster property, IIRC. Better to use one of the
supported options...
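
To show why a system property at startup can switch this off, here is a simplified Python model of the ${name:default} placeholder syntax that Solr config files accept. It assumes the configset still carries the stock default="${update.autoCreateFields:true}" attribute on the schemaless update chain; if that attribute was edited out, the property has nothing to act on. This is a sketch of the idea, not Solr's actual substitution code.

```python
import re

# Matches ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text, properties):
    """Replace each ${name:default} with the property value, else the default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is None:
            raise KeyError(f"no value and no default for {name}")
        return default
    return _PLACEHOLDER.sub(substitute, text)


attr = 'default="${update.autoCreateFields:true}"'

# No property set: the default after the colon applies.
print(resolve(attr, {}))
# default="true"

# Property supplied, e.g. via -Dupdate.autoCreateFields=false at startup.
print(resolve(attr, {"update.autoCreateFields": "false"}))
# default="false"
```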

On Fri, Sep 6, 2019, 05:22 Porritt, Ian <[hidden email]> wrote:

> Hi Jörn/Erick/Shawn, thanks for your responses.
>
> @Jörn - Much appreciated for the heads-up on Kerberos authentication. It's
> something we haven't really considered at the moment, though for production
> this may well be the case. With regard to the Solr nodes, three is what we
> are looking at as a minimum. When adding a new Solr node to the cluster,
> will the settings/configuration be applied to the new node by ZooKeeper, or
> is manual intervention required?
> @Erick - With regard to core.properties: on standard Solr,
> update.autoCreateFields=false is set in the core.properties file, whereas
> for Cloud I have added it to solrconfig.xml, which gets uploaded to
> ZooKeeper. I appreciate that standard and Cloud may work entirely
> differently; I just wanted to ensure that this is the correct way of doing
> it.
> @Shawn - I will try creating the lib directory in Solr home to see if it
> gets picked up. Having five ZooKeeper servers would more than satisfy high
> availability.
>
>
> Regards
> Ian