Load balance writes

Boban Acimovic
I am wondering whether I would get a performance benefit from distributing writes across the Solr nodes, sending each document directly to the master (shard leader) where the document belongs. My idea is that this would save some load between the cluster nodes and improve performance. What is the best way to do writes? Thank you in advance.

Re: Load balance writes

Emir Arnautović
Hi Boban,
If you use the SolrCloud SolrJ client (CloudSolrClient) and initialise it with ZK, it should be aware of the shard leaders and send documents to them in a smart way.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 11 Feb 2019, at 12:18, Boban Acimovic <[hidden email]> wrote:
>
> I am wondering would I get performance benefits if I distribute writes to Solr nodes by sending documents exactly to the master of collection where the document belongs? My idea is that this would save some load between the cluster nodes and improve performances. How to do writes in the best way? Thank you in advance.


Re: Load balance writes

Boban Acimovic
Thank you, Emir, for the quick reply. I use a home-brewed Go client and write to just one of the 12 available nodes. I think I should figure out this smart way of handling it :)




> On 11. Feb 2019, at 15:21, Emir Arnautović <[hidden email]> wrote:
>
> Hi Boban,
> If you use SolrCloud  Solrj client and initialise it with ZK, it should be aware of masters and send documents in a smart way.
>
> HTH,
> Emir

Re: Load balance writes

Emir Arnautović
Hi Boban,
I'm not sure whether there is a SolrJ port to Go, but you can use SolrJ as a model to build your own ZK-aware client that groups updates and sends them to the shard leaders. I see there are a couple of Solr Go clients, so you might first check whether one of them already supports this, or whether it makes sense to contribute that part to one of your choice.

Emir



> On 11 Feb 2019, at 16:09, Boban Acimovic <[hidden email]> wrote:
>
> Thank you Emir for quick reply. I use home brewed Go client and write just to one of 12 available nodes. I believe I should find out this smart way to handle this :)

Re: Load balance writes

Walter Underwood
We send all updates to the load balancer, so they end up on the wrong shard, on a non-leader replica, and so on. Indexing speed is still limited by the CPU available on each leader. I don’t think sending each update to the right leader improves throughput.

On the other hand, the CloudSolrClient ignores errors from Solr, which makes it unacceptable for production use.

I would stay with your current indexing client and worry about something else.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Feb 11, 2019, at 8:15 AM, Emir Arnautović <[hidden email]> wrote:
>
> Hi Boban,
> Not sure if there is Solrj port to Go, but you can take that as model to build your ZK aware client that groups and sends updates to shard leaders. I see that there are couple of Solr Go clients, so you might first check if some already supports it or if it makes sense that you contribute that part to one of your choice.
>
> Emir

Re: Load balance writes

Boban Acimovic
In reply to this post by Emir Arnautović
Thank you again, Emir. I can make my code ZK-aware, that is no problem, but I can’t make it shard-leader-aware. Can you point me to a document describing how Solr shards are created? I already use ZK to get information, but I don’t understand how to tell which shard a document belongs to from the information in the document that has to be indexed.

At the moment I send everything to one node, but I am pretty sure it would help to send data to the collection’s nodes. It would be even better if I could send data directly to the shard leader. If you can’t describe this easily, I will check the SolrJ implementation.

Regards,
Boban




> On 11. Feb 2019, at 17:15, Emir Arnautović <[hidden email]> wrote:
>
> Hi Boban,
> Not sure if there is Solrj port to Go, but you can take that as model to build your ZK aware client that groups and sends updates to shard leaders. I see that there are couple of Solr Go clients, so you might first check if some already supports it or if it makes sense that you contribute that part to one of your choice.
>
> Emir

Re: Load balance writes

Boban Acimovic
In reply to this post by Walter Underwood
I would actually like to write the load balancer myself, but I want it to send the data as efficiently as possible. I know how to read ZK data, but I don’t know how to figure out which shard is responsible for a document, based on the data in the document I want to index.




> On 11. Feb 2019, at 17:23, Walter Underwood <[hidden email]> wrote:
>
> We send all updates to the load balancer, so they’ll end up on the wrong shard, not on the leader, etc. Indexing speed is still limited by the CPU available on each leader. I don’t think that sending the update to the right leader makes any improvement in throughput.
>
> On the other hand, the CloudSolrClient ignores errors from Solr, which makes it unacceptable for production use.
>
> I would stay with your current indexing client and worry about something else.
>
> wunder

Re: Load balance writes

Walter Underwood
Why would you want to write a load balancer when there are so many that are free and very fast?

For update traffic, there is very little benefit in sending updates directly to the shard leader. Forwarding an update to the leader is fast. Indexing is slow. So the bottleneck is always at the leader.

Before you build anything, measure. Collect a large update and send it directly to the leader. Then send the same update to a non-leader replica. Compare the speed. If you are batching and indexing with multiple threads, I doubt you’ll see a meaningful difference. I commonly see a 10% difference between runs of identical load benchmarks, so the speedup has to be much larger than that to be real.

wunder

> On Feb 11, 2019, at 8:38 AM, Boban Acimovic <[hidden email]> wrote:
>
> I would actually like to write a load balancer itself, but I want it to be able to send the data as efficiently as possible. I know how to read ZK data, but I don’t know how can I figure out which shard is responsible upon data that I have in a document that I want to index.

Re: Load balance writes

Boban Acimovic
Can you recommend a Dockerized load balancer? Or, even better, one with a Helm chart?


As I said, at the moment I send all updates to just one of the 12 nodes.




> On 11. Feb 2019, at 17:52, Walter Underwood <[hidden email]> wrote:
>
> Why would you want to write a load balancer when there are so many that are free and very fast?
>
> For update traffic, there is very little benefit in sending updates directly to the shard leader. Forwarding an update to the leader is fast. Indexing is slow. So the bottleneck is always at the leader.
>
> Before you build anything, measure. Collect a large update and send that directly to the leader. Then do the same to a non-leader shard. Compare the speed. If you are batching and indexing with multiple threads, I doubt you’ll see a meaningful difference. I commonly see 10% difference in identical load benchmarks, so the speedup has to be much larger than that to be real.
>
> wunder

Re: Load balance writes

Walter Underwood
nginx

http://nginx.org/en/docs/http/load_balancing.html
https://hub.docker.com/_/nginx

We run in Amazon AWS, so we use their Application Load Balancer (ALB). We do use nginx for other things.

wunder

> On Feb 11, 2019, at 8:57 AM, Boban Acimovic <[hidden email]> wrote:
>
> Can you mention one dockerized load balancer? Or even better one with Helm chart?
>
>
> Like I said, I send all updates at the moment just to one out of 12 nodes.

RE: Load balance writes

Davis, Daniel (NIH/NLM) [C]
In reply to this post by Boban Acimovic
I think the container orchestration framework takes care of that for you, but I am not an expert. In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT. In Kubernetes, a "Load Balancer" appears to be a concept for accessing services from outside the cluster.

I presume you are using Kubernetes because of your reference to Helm, but for what it's worth, here's an official HAProxy image: https://hub.docker.com/_/haproxy

> -----Original Message-----
> From: Boban Acimovic <[hidden email]>
> Sent: Monday, February 11, 2019 11:58 AM
> To: [hidden email]
> Subject: Re: Load balance writes
>
> Can you mention one dockerized load balancer? Or even better one with
> Helm chart?
>
>
> Like I said, I send all updates at the moment just to one out of 12 nodes.

Re: Load balance writes

Walter Underwood
I haven’t used Kubernetes, but a web search for “helm nginx” seems to give some useful pages.

wunder

> On Feb 11, 2019, at 9:13 AM, Davis, Daniel (NIH/NLM) [C] <[hidden email]> wrote:
>
> I think that the container orchestration framework takes care of that for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a "Load Balancer" appears to be a concept for accessing services outside the cluster.
>
> I presume you are using Kubernetes because of your reference to helm, but for what it's worth, here's an official haproxy image - https://hub.docker.com/_/haproxy

Re: Load balance writes

Boban Acimovic
In reply to this post by Walter Underwood
This is naive load balancing because it is not aware of ZK.




> On 11. Feb 2019, at 18:05, Walter Underwood <[hidden email]> wrote:
>
> nginx
>
> http://nginx.org/en/docs/http/load_balancing.html
> https://hub.docker.com/_/nginx
>
> We run in Amazon AWS, so we use their Application Load Balancer (ALB). We do use nginx for other things.
>
> wunder

Re: Load balance writes

Boban Acimovic
In reply to this post by Davis, Daniel (NIH/NLM) [C]
But as I said in my previous message, nginx is not aware of the status of the Solr nodes. I can easily write a load balancer in Go, just not one that takes the shards into account. The only problem I have is figuring out which shard leader is responsible for a document I want to add to the index. How does Solr sharding work? Which values are used to determine the shard?




> On 11. Feb 2019, at 18:13, Davis, Daniel (NIH/NLM) [C] <[hidden email]> wrote:
>
> I think that the container orchestration framework takes care of that for you, but I am not an expert.  In Kubernetes, NGINX is often the Ingress controller, and as long as the services are running within the Kubernetes cluster, it can also serve as a load balancer, AFAICT.   In Kubernetes, a "Load Balancer" appears to be a concept for accessing services outside the cluster.
>
> I presume you are using Kubernetes because of your reference to helm, but for what it's worth, here's an official haproxy image - https://hub.docker.com/_/haproxy
>

Re: Load balance writes

Boban Acimovic
In reply to this post by Walter Underwood
As I said before, nginx is not a load balancer, or at least not a clever one. It does not talk to ZK. Please suggest more advanced solutions.




> On 11. Feb 2019, at 18:32, Walter Underwood <[hidden email]> wrote:
>
> I haven’t used Kubernetes, but a web search for “helm nginx” seems to give some useful pages.
>
> wunder

Re: Load balance writes

Walter Underwood
For the fourth time, ignore the shard leaders until you have measurements that prove the complexity is worth it.

We can index a million documents per minute by sending batched updates to a dumb load balancer.

wunder

> On Feb 11, 2019, at 10:29 AM, Boban Acimovic <[hidden email]> wrote:
>
> Like I said before, nginx is not a load balancer or at least not a clever load balancer. It does not talk to ZK. Please give me advanced solutions.

Re: Load balance writes

lstusr 5u93n4
In reply to this post by Boban Acimovic
Hi Boban,

First of all, I agree with Walter here. Because the bottleneck is indexing
on the leader, a basic round-robin load balancer will perform just as well
as a custom solution, with far less headache. A custom solution will be far
more work than it's worth.

But should you really want to write this yourself, you can get all of the
information you need from ZooKeeper, at the path:

<zkroot>/collections/<collection_name>/state.json

There, for each shard you'll see:
  - the "range" parameter, which tells you which subset of documents this
shard is responsible for (see
https://lucene.apache.org/solr/guide/7_6/shards-and-indexing-data-in-solrcloud.html#document-routing
for details on routing)
  - the list of all replicas. For each replica it will tell you:
      - the base URL of its node (base_url)
      - whether it is the leader (it has the property leader: true)
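A minimal Go sketch of reading that structure, assuming you have already fetched the raw bytes of state.json with your ZK client. It only models the fields listed above plus the shard `state`, so that, for example, an inactive parent shard left over from a split is skipped. As far as I know Solr writes `leader` as the string "true", so the sketch accepts either a string or a JSON boolean. The type and function names are made up for illustration, and the sample JSON is a trimmed, invented example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Only the state.json fields discussed above are modelled.
type replica struct {
	Core    string      `json:"core"`
	BaseURL string      `json:"base_url"`
	Leader  interface{} `json:"leader"` // "true" or true on the leader, absent otherwise
}

type shard struct {
	Range    string             `json:"range"`
	State    string             `json:"state"`
	Replicas map[string]replica `json:"replicas"`
}

type collection struct {
	Shards map[string]shard `json:"shards"`
}

// shardInfo is a hypothetical summary of one active shard.
type shardInfo struct {
	Name      string
	Range     string
	LeaderURL string // base_url + "/" + core, i.e. the leader core's URL
}

func isLeader(v interface{}) bool {
	switch x := v.(type) {
	case string:
		return x == "true"
	case bool:
		return x
	}
	return false
}

// leadersFromState parses a collection's state.json and returns one entry per
// active shard, sorted by shard name.
func leadersFromState(data []byte, collName string) ([]shardInfo, error) {
	var doc map[string]collection
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	coll, ok := doc[collName]
	if !ok {
		return nil, fmt.Errorf("collection %q not found in state", collName)
	}
	var out []shardInfo
	for name, sh := range coll.Shards {
		if sh.State != "" && sh.State != "active" {
			continue // e.g. a parent shard left inactive after a split
		}
		info := shardInfo{Name: name, Range: sh.Range}
		for _, r := range sh.Replicas {
			if isLeader(r.Leader) {
				info.LeaderURL = r.BaseURL + "/" + r.Core
			}
		}
		if info.LeaderURL == "" {
			return nil, fmt.Errorf("shard %s has no leader at the moment", name)
		}
		out = append(out, info)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out, nil
}

func main() {
	// Trimmed, invented example of state.json for a two-shard collection.
	sample := []byte(`{"mycollection":{"shards":{
	  "shard1":{"range":"80000000-ffffffff","state":"active","replicas":{
	    "core_node3":{"core":"mycollection_shard1_replica_n1","base_url":"http://solr-1:8983/solr","leader":"true"},
	    "core_node5":{"core":"mycollection_shard1_replica_n2","base_url":"http://solr-2:8983/solr"}}},
	  "shard2":{"range":"0-7fffffff","state":"active","replicas":{
	    "core_node7":{"core":"mycollection_shard2_replica_n4","base_url":"http://solr-2:8983/solr","leader":"true"}}}}}}`)
	shards, err := leadersFromState(sample, "mycollection")
	if err != nil {
		panic(err)
	}
	for _, s := range shards {
		fmt.Println(s.Name, s.Range, s.LeaderURL)
	}
}
```

In a long-running router you would re-run this whenever a ZK watch on state.json fires, rather than parsing it once at startup.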

So your Go-based solution would watch the state.json file in ZooKeeper
and build a function that, given the routing key for your document (by
default the hash of the id, I think), returns the hostname of the leader
replica.
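A hedged Go sketch of that function, assuming the default compositeId router and plain ids with no `!` prefix. My understanding is that in this case Solr hashes the UTF-8 bytes of the id with MurmurHash3 (x86, 32-bit, seed 0) and picks the shard whose range contains the hash, both compared as signed 32-bit integers. This is a reading of the behaviour, not a port of Solr's `CompositeIdRouter`: composite ids (`tenant!doc`), `router.field` and the implicit router route differently, so verify it against where Solr actually put a few documents before relying on it. `murmur3`, `parseRange` and `shardForID` are illustrative names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// murmur3 computes MurmurHash3_x86_32 of data with the given seed.
func murmur3(data []byte, seed uint32) uint32 {
	const c1, c2 = 0xcc9e2d51, 0x1b873593
	h := seed
	nblocks := len(data) / 4
	for i := 0; i < nblocks; i++ {
		k := binary.LittleEndian.Uint32(data[i*4:])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
		h = bits.RotateLeft32(h, 13)
		h = h*5 + 0xe6546b64
	}
	var k uint32
	tail := data[nblocks*4:]
	switch len(tail) {
	case 3:
		k ^= uint32(tail[2]) << 16
		fallthrough
	case 2:
		k ^= uint32(tail[1]) << 8
		fallthrough
	case 1:
		k ^= uint32(tail[0])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
	}
	// Finalization mix.
	h ^= uint32(len(data))
	h ^= h >> 16
	h *= 0x85ebca6b
	h ^= h >> 13
	h *= 0xc2b2ae35
	h ^= h >> 16
	return h
}

// parseRange turns a state.json range such as "80000000-ffffffff" into
// inclusive signed 32-bit bounds, matching how the hash is compared.
func parseRange(s string) (lo, hi int32, err error) {
	parts := strings.SplitN(s, "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("bad range %q", s)
	}
	a, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	b, err := strconv.ParseUint(parts[1], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	return int32(uint32(a)), int32(uint32(b)), nil
}

// shardForID returns the shard (name -> range map) whose range holds the id's hash.
func shardForID(id string, ranges map[string]string) (string, error) {
	h := int32(murmur3([]byte(id), 0))
	for name, r := range ranges {
		lo, hi, err := parseRange(r)
		if err != nil {
			return "", err
		}
		if lo <= h && h <= hi {
			return name, nil
		}
	}
	return "", fmt.Errorf("no shard range contains hash %d of id %q", h, id)
}

func main() {
	ranges := map[string]string{"shard1": "80000000-ffffffff", "shard2": "0-7fffffff"}
	for _, id := range []string{"doc-1", "doc-2", "doc-3"} {
		shard, err := shardForID(id, ranges)
		if err != nil {
			panic(err)
		}
		fmt.Println(id, "->", shard)
	}
}
```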

Kyle

On Mon, 11 Feb 2019 at 13:30, Boban Acimovic <[hidden email]> wrote:

> Like I said before, nginx is not a load balancer or at least not a clever
> load balancer. It does not talk to ZK. Please give me advanced solutions.

Re: Load balance writes

Walter Underwood
The update router would also need to detect indexing failures at each leader,
then re-read the cluster state to see whether the leader had changed, re-send any
failed updates, and so on.

wunder

> On Feb 11, 2019, at 11:07 AM, lstusr 5u93n4 <[hidden email]> wrote:
>
> Hi Boban,
>
> First of all: I agree with Walter here. Because the bottleneck is during
> indexing on the leader, a basic round robin load balancer will perform just
> as well as a custom solution. With far less headache. A custom solution
> will be far more work than it's worth.
>
> But, should you really want to write this yourself, you can get all of the
> information you need from zookeeper, from the path:
>
> <zkroot>/collections/<collection_name>/state.json
>
> There, for each shard you'll see:
>  - the "range" parameter that tells  you which subset of documents this
> shard is responsible for (see
> https://lucene.apache.org/solr/guide/7_6/shards-and-indexing-data-in-solrcloud.html#document-routing
> for details on routing)
>  - the list of all replicas. On each replica it will tell you:
>      - the host name (base_url)
>      - if it is the leader (has the property leader: true)
>
> So your go-based solution would be to watch the state.json file from
> zookeeper, and build up a function that, given the proper routing structure
> for your document (the hash of the id by default, I think) will return the
> hostname of the replica that's the leader.
>
> Kyle

Re: Load balance writes

Jason Gerlowski
> On the other hand, the CloudSolrClient ignores errors from Solr, which makes it unacceptable for production use.

Did you mean "ConcurrentUpdateSolrClient"? I don't think
CloudSolrClient does this, though I've been surprised before and it's
possible I just missed something. Just wondering.

Jason

On Mon, Feb 11, 2019 at 2:14 PM Walter Underwood <[hidden email]> wrote:

>
> The update router would also need to look for failures indexing at each leader,
> then re-read the cluster state to see if the leader had changed. Also re-send any
> failed updates, and so on.
>
> wunder

Re: Load balance writes

Boban Acimovic
In reply to this post by Walter Underwood
OK, thank you guys :)

Regards,
Boban