Apache Solr in High Availability Primary and Secondary node.


Apache Solr in High Availability Primary and Secondary node.

Kaushal Shriyan
Hi,

We are running Apache Solr 8.7.0 search service on CentOS Linux release
7.9.2009 (Core).

Is there a way to set up the Solr search service in high-availability mode
with a primary and a secondary node? For example, if the primary node goes
down, the secondary node takes over the service.

Best Regards,

Kaushal

RE: Apache Solr in High Availability Primary and Secondary node.

DAVID MARTIN NIETO
I believe Solr doesn't have this configuration; you need a load balancer that supports that mode.

Kind regards.


________________________________
From: Kaushal Shriyan <[hidden email]>
Sent: Monday, January 11, 2021 11:32
To: [hidden email] <[hidden email]>
Subject: Apache Solr in High Availability Primary and Secondary node.

Hi,

We are running Apache Solr 8.7.0 search service on CentOS Linux release
7.9.2009 (Core).

Is there a way to set up the Solr search service in high-availability mode
with a primary and a secondary node? For example, if the primary node goes
down, the secondary node takes over the service.

Best Regards,

Kaushal

Re: Apache Solr in High Availability Primary and Secondary node.

Kaushal Shriyan
On Mon, Jan 11, 2021 at 4:11 PM DAVID MARTIN NIETO <[hidden email]>
wrote:

> I believe Solr doesn't have this configuration; you need a load balancer
> that supports that mode.
>
> Kind regards.
>
>
Thanks, David, for the quick response. Is there an example of using HAProxy,
the Nginx web server, or any other application to load balance the Solr
primary and secondary nodes?

Best Regards,

Kaushal

RE: Apache Solr in High Availability Primary and Secondary node.

DAVID MARTIN NIETO
Hi again,

I don't know those products, but with Apache httpd something like this can work:

https://stackoverflow.com/questions/6381749/apache-httpd-mod-proxy-balancer-with-active-passive-setup/11083458
https://httpd.apache.org/docs/2.4/mod/mod_proxy_balancer.html
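
The active/passive behaviour those links describe comes down to "use the primary while it is healthy, otherwise fall back to the next node". A minimal sketch of that selection logic in Python follows; it is not how mod_proxy_balancer is implemented, and the function name, the host names, and the `is_up` callback are hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_backend(backends: Sequence[str],
                 is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first backend, in priority order, that passes the health check.

    `backends` lists the primary first and the standbys after it.
    `is_up` is a caller-supplied health check (hypothetical; for example a
    request to a status URL). Returns None when every backend is down.
    """
    for backend in backends:
        if is_up(backend):
            return backend
    return None


if __name__ == "__main__":
    nodes = ["http://solr-primary:8983", "http://solr-secondary:8983"]
    down = {"http://solr-primary:8983"}  # pretend the primary has failed
    print(pick_backend(nodes, lambda node: node not in down))
```

A real load balancer adds retries, timeouts, and failback rules on top of this, which is why a dedicated product is usually preferable to doing it in client code.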

Kind regards.


________________________________
David Martín Nieto
Functional Analyst
Calle Cabeza Mesada 5
28031, Madrid
T: +34 667 414 432
T: +34 91 779 56 98| Ext. 3198
E-mail: [hidden email] | Web: www.viewnext.com

Re: Apache Solr in High Availability Primary and Secondary node.

Shawn Heisey-2
In reply to this post by Kaushal Shriyan
On 1/11/2021 4:02 AM, Kaushal Shriyan wrote:
> Thanks, David, for the quick response. Is there an example of using HAProxy,
> the Nginx web server, or any other application to load balance the Solr
> primary and secondary nodes?

I had a setup with haproxy and two copies of a Solr index.

Four of the nodes with Solr on them were running a pacemaker setup for
high availability on the haproxy load balancer.  If any single system
were to die, everything kept on working.

My homegrown indexing system kept both copies of the index up to date
independently -- no replication.   I had to abandon replication because
version 3.x and later cannot replicate from 1.x.  I kept that paradigm
even after I was running a version with compatible replication because it
was very flexible.
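
The "update both copies independently" idea can be sketched in Python as below. This is not Shawn's indexing system, only an illustration that assumes each copy accepts a JSON array of documents on Solr's `/solr/<core>/update` handler; the host names, the core name, and both function names are hypothetical.

```python
import json
import urllib.request


def build_update_requests(base_urls, core, docs, commit=True):
    """Build one POST request per Solr copy, each carrying the same documents."""
    body = json.dumps(docs).encode("utf-8")
    suffix = "?commit=true" if commit else ""
    requests = []
    for base in base_urls:
        url = f"{base.rstrip('/')}/solr/{core}/update{suffix}"
        requests.append(urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests


def index_everywhere(base_urls, core, docs, send=urllib.request.urlopen):
    """Send the documents to every copy.

    Returns a dict mapping each update URL to None on success or to an
    error message on failure, so the caller can retry the copies that failed.
    """
    results = {}
    for request in build_update_requests(base_urls, core, docs):
        try:
            with send(request, timeout=10) as response:
                response.read()
            results[request.full_url] = None
        except OSError as exc:  # urllib's URLError and HTTPError derive from OSError
            results[request.full_url] = str(exc)
    return results
```

Because each copy is written separately, a failure on one copy does not block the other, but the caller has to track and replay the failed updates to keep the copies identical.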

I really like haproxy, but going into further detail would be off topic
for this list.

Thanks,
Shawn

Re: Apache Solr in High Availability Primary and Secondary node.

Walter Underwood
In reply to this post by DAVID MARTIN NIETO
There are all sorts of problems with the primary/secondary approach. How do you know
the secondary is working? How do you deal with cold caches on the secondary when it
suddenly gets lots of load?
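
One way to approach the first question is to probe every node continuously instead of finding out at failover time. A minimal Python sketch, assuming each core exposes Solr's ping handler at `/solr/<core>/admin/ping` and that a healthy node answers with `"status": "OK"`; the host names, core name, and helper names are hypothetical.

```python
import json
import urllib.request


def ping_url(base_url, core):
    """URL of the ping handler for one core (JSON response requested)."""
    return f"{base_url.rstrip('/')}/solr/{core}/admin/ping?wt=json"


def http_get_json(url, timeout=2.0):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_healthy(base_url, core, fetch=http_get_json):
    """True if the node answers the ping with status OK, False on any error."""
    try:
        return fetch(ping_url(base_url, core)).get("status") == "OK"
    except (OSError, ValueError):  # network failure or a non-JSON reply
        return False


def healthy_nodes(base_urls, core, fetch=http_get_json):
    """Filter a list of nodes down to those that currently pass the ping."""
    return [base for base in base_urls if is_healthy(base, core, fetch)]
```

A ping only shows that the node responds; it says nothing about cache warmth, which is the second problem raised above.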

Instead, size the cluster with the number of hosts you need, then add one. Send traffic
to all of them. If any of them goes down, you have the capacity to handle the traffic.
This is called “N+1 provisioning”.

This was our rule at Netflix a dozen years ago, running Solr 1.3. I do it the same way
today with large sharded clusters, one extra per shard.
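
The sizing rule can be written as a small calculation. This Python sketch assumes you know the peak query rate and the rate a single host can sustain; the function name and the example figures are made up for illustration and are not measurements from this thread.

```python
import math


def hosts_needed(peak_qps, qps_per_host, shards=1, spares_per_shard=1):
    """Hosts required under N+1 provisioning.

    N is the number of hosts needed to carry peak load. Every shard sees
    every query, so each shard needs its own N hosts plus its own spare.
    """
    if peak_qps < 0 or qps_per_host <= 0:
        raise ValueError("peak_qps must be >= 0 and qps_per_host must be > 0")
    n = math.ceil(peak_qps / qps_per_host)
    return (n + spares_per_shard) * shards


if __name__ == "__main__":
    # Example figures only: 900 queries/s at peak, 250 queries/s per host.
    print(hosts_needed(900, 250))            # unsharded index
    print(hosts_needed(900, 250, shards=4))  # four shards, one spare each
```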

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)



Re: Apache Solr in High Availability Primary and Secondary node.

Dmitri Maziuk
On 1/11/2021 11:25 AM, Walter Underwood wrote:
> There are all sorts of problems with the primary/secondary approach. How do you know
> the secondary is working? How do you deal with cold caches on the secondary when it
> suddenly gets lots of load?
>
> Instead, size the cluster with the number of hosts you need, then add one. Send traffic
> to all of them. If any of them goes down, you have the capacity to handle the traffic.
> This is called “N+1 provisioning”.

Where do you send your solr queries? If you have an http server at an ip
address that answers them, that's a single point of failure unless you
put it on a heartbeat'ed cluster ip. (I tend to prefer ucarp to pacemaker
for that as the latter is bloated and too cumbersome for simple
active/passive setups, but that's OT.)

Dima


Re: Apache Solr in High Availability Primary and Secondary node.

Walter Underwood
Use a load balancer. We’re in AWS, so we use an AWS ALB.

If you don’t have a failure-tolerant load balancer implementation, the site has bigger problems than search.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Jan 11, 2021, at 10:15 AM, Dmitri Maziuk <[hidden email]> wrote:
>
> On 1/11/2021 11:25 AM, Walter Underwood wrote:
>> There are all sorts of problems with the primary/secondary approach. How do you know
>> the secondary is working? How do you deal with cold caches on the secondary when it
>> suddenly gets lots of load?
>> Instead, size the cluster with the number of hosts you need, then add one. Send traffic
>> to all of them. If any of them goes down, you have the capacity to handle the traffic.
>> This is called “N+1 provisioning”.
>
> Where do you send your solr queries? If you have an http server at an ip address that answers them, that's a single point of failure unless you put it on a heartbeat'ed cluster ip. (I tend to prefer ucarp to pacemaker for that as the latter is bloated and too cumbersome for simple active/passive setups, but that's OT.)
>
> Dima


Re: Apache Solr in High Availability Primary and Secondary node.

Dmitri Maziuk
On 1/11/2021 12:30 PM, Walter Underwood wrote:
> Use a load balancer. We’re in AWS, so we use an AWS ALB.
>
> If you don’t have a failure-tolerant load balancer implementation, the site has bigger problems than search.

That is the point: you have Amazon doing that for you, while some of us do it
ourselves, and it wasn't clear (to me anyway) if OP was asking about that.

Dima

Re: Apache Solr in High Availability Primary and Secondary node.

Kaushal Shriyan


On Tue, Jan 12, 2021 at 12:10 AM Dmitri Maziuk <[hidden email]> wrote:

> On 1/11/2021 12:30 PM, Walter Underwood wrote:
> > Use a load balancer. We’re in AWS, so we use an AWS ALB.
> >
> > If you don’t have a failure-tolerant load balancer implementation, the site has bigger problems than search.
>
> That is the point, you have amazon doing that for you, some of us do it
> ourselves, and it wasn't clear (to me anyway) if OP was asking about that.
>
> Dima

Hi,

Thanks for all the suggestions. I am hosting my Solr search service in GCP. I have a follow-up question regarding Solr nodes. Do I need a single master and multiple slaves? I am using the GCP Internal Load Balancer (https://cloud.google.com/load-balancing/docs/l7-internal).

Internal LB -> Master Node1 and Master Node2. Master Node1 will have Slave1 and Master Node2 will have Slave2, as in the diagram below. Please advise, and correct me if the approach is wrong. I am not sure how to replicate indices when I use the Google Cloud Platform Internal LB.
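
Whichever way the indices are kept in sync, a simple first check that the nodes behind the load balancer serve the same data is to compare their document counts. The Python sketch below does not configure replication; it assumes each node answers a standard `/solr/<core>/select` query with `rows=0` and reports `response.numFound`. The host names, core name, and helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def count_url(base_url, core):
    """URL of a match-all query that returns only the hit count."""
    query = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{query}"


def fetch_json(url, timeout=5.0):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def doc_counts(base_urls, core, fetch=fetch_json):
    """Map each node to the number of documents it currently reports."""
    return {base: fetch(count_url(base, core))["response"]["numFound"]
            for base in base_urls}


def lagging_nodes(reference_url, other_urls, core, fetch=fetch_json):
    """Nodes whose document count differs from the reference node's count."""
    counts = doc_counts([reference_url, *other_urls], core, fetch)
    return [url for url in other_urls if counts[url] != counts[reference_url]]
```

Equal counts do not prove the indices are identical, so treat this as a quick sanity check rather than a replication monitor.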


                                  

Thanks in Advance.

Best Regards,

Kaushal

Re: Apache Solr in High Availability Primary and Secondary node.

Kaushal Shriyan
Hi,

Checking in again to see whether someone can weigh in on my earlier post to this mailing list. Thanks in advance.

Best Regards,

On Tue, Jan 12, 2021 at 8:30 AM Kaushal Shriyan <[hidden email]> wrote:

> Hi,
>
> Thanks for all the suggestions. I am hosting my Solr search service in GCP. I have a follow-up question regarding Solr nodes. Do I need a single master and multiple slaves? I am using the GCP Internal Load Balancer (https://cloud.google.com/load-balancing/docs/l7-internal).
>
> Internal LB -> Master Node1 and Master Node2. Master Node1 will have Slave1 and Master Node2 will have Slave2, as in the diagram below. Please advise, and correct me if the approach is wrong. I am not sure how to replicate indices when I use the Google Cloud Platform Internal LB.
>
> Thanks in Advance.
>
> Best Regards,
>
> Kaushal