Reads only on replicas?

Reads only on replicas?

Stephen Lewis Bianamara
Hi Folks,

Is it possible to configure a SolrCloud cluster to serve reads only from the
followers? I see this page
<https://solr.apache.org/guide/8_8/distributed-requests.html> explains how
to prefer replicas by locality, replica type, or sysprops. But is it possible to
specify that a request should be served by a non-leader whenever possible
(or even to require a non-leader for the request to return at all)? E.g., something like

shards.preference=leader:false


Thanks,
Stephen
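
For reference, the preferences that the linked page does document can be combined in a single request. The following is only a minimal sketch: the host, collection name, and helper name are made up for illustration, and it uses the documented `replica.type` and `replica.location` properties rather than the hypothetical `leader:false` asked about above.

```python
from typing import List
from urllib.parse import urlencode


def select_url(base_url: str, collection: str, query: str,
               preferences: List[str]) -> str:
    """Build a /select URL whose shards.preference lists the given
    preferences in priority order (comma-separated)."""
    params = {
        "q": query,
        "shards.preference": ",".join(preferences),
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and collection; prefer PULL replicas, then local ones.
    print(select_url("http://localhost:8983/solr", "mycollection", "*:*",
                     ["replica.type:PULL", "replica.location:local"]))
```

Note that these are preferences, not requirements: Solr can still fall back to other replicas when no preferred one is available.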

Re: Reads only on replicas?

Stephen Lewis Bianamara
Hi Community,

I checked the source, and reading only from followers doesn't appear to be
supported. I guess this is possible by pulling the state.json
data from ZooKeeper, identifying the leaders, and then passing an explicit
whitelist not including the leaders. Can someone confirm that this is the
only way to accomplish this goal?

Thanks,
Stephen
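
The workaround described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a confirmed approach: it assumes the collection state has already been fetched and parsed into a dict with the usual `shards` / `replicas` layout (with `core`, `base_url`, `state`, and a `leader` flag on the leader), and it emits a value for the `shards` request parameter in the `host:port/solr/core` form, using `|` between alternatives within a shard. The function name is made up for this sketch.

```python
from typing import Any, Dict, List


def non_leader_shards_param(collection_state: Dict[str, Any]) -> str:
    """Build a `shards` parameter value listing only active non-leader
    replicas: one `|`-separated group per shard, groups joined by `,`."""
    groups: List[str] = []
    for shard_name in sorted(collection_state["shards"]):
        replicas = collection_state["shards"][shard_name]["replicas"]
        followers: List[str] = []
        for replica_name in sorted(replicas):
            replica = replicas[replica_name]
            # Assumption: the leader carries leader="true"; others omit it.
            is_leader = str(replica.get("leader", "false")).lower() == "true"
            if replica.get("state") == "active" and not is_leader:
                # Drop the scheme so entries look like host:port/solr/core.
                base = replica["base_url"].split("://", 1)[-1]
                followers.append(f"{base}/{replica['core']}")
        if not followers:
            # No follower to read from; the caller decides how to fall back.
            raise ValueError(f"no active non-leader replica for {shard_name}")
        groups.append("|".join(followers))
    return ",".join(groups)
```

Leadership can change between reading the state and sending the query, so a client using something like this would need to refresh the state regularly.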

On Tue, Jun 8, 2021 at 1:55 PM Stephen Lewis Bianamara <
[hidden email]> wrote:

> Hi Folks,
>
> Is it possible to configure a SolrCloud cluster to serve reads only from the
> followers? I see this page
> <https://solr.apache.org/guide/8_8/distributed-requests.html> explains
> how to prefer replicas by locality, replica type, or sysprops. But is it possible to
> specify that a request should be served by a non-leader whenever possible
> (or even to require a non-leader for the request to return at all)? E.g., something like
>
> shards.preference=leader:false
>
>
> Thanks,
> Stephen
>

Re: Reads only on replicas?

Stephen Lewis Bianamara
I've filed the Jira item to implement this here:

SOLR-15472 <https://issues.apache.org/jira/browse/SOLR-15472> shards.preference
should support leader=false

On Thu, Jun 10, 2021 at 9:41 AM Stephen Lewis Bianamara <
[hidden email]> wrote:

> Hi Community,
>
> I checked the source, and reading only from followers doesn't appear to be
> supported. I guess this is possible by pulling the
> state.json data from ZooKeeper, identifying the leaders, and then passing
> an explicit whitelist not including the leaders. Can someone confirm that
> this is the only way to accomplish this goal?
>
> Thanks,
> Stephen

Re: Reads only on replicas?

Walter Underwood
In reply to this post by Stephen Lewis Bianamara
What problem are you trying to solve with this?

Are you trying to send queries to less loaded machines? If so, this won’t do that.
Leaders only do a little bit more work than followers. All indexing processing is local
and that is most of the CPU usage.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


Re: Reads only on replicas?

David Hastings
I would think it would be as simple as deleting the update handler from your solrconfig on the search servers.

> On Jun 10, 2021, at 6:28 PM, Walter Underwood <[hidden email]> wrote:
>
> What problem are you trying to solve with this?
>
> Are you trying to send queries to less loaded machines? If so, this won’t do that.
> Leaders only do a little bit more work than followers. All indexing processing is local
> and that is most of the CPU usage.
>
> wunder
> Walter Underwood
> [hidden email]
> http://observer.wunderwood.org/  (my blog)
>

Re: Reads only on replicas?

Shawn Heisey-2
On 6/10/2021 5:08 PM, Dave wrote:
> I would think it would be as simple as deleting the update handler from your solrconfig on the search servers

In SolrCloud, all cores for a collection use the same solrconfig.xml
file, and it's in ZooKeeper.  Any solrconfig.xml file on the disk is
ignored.

Your suggestion would be great for non-cloud deployments.

Thanks,
Shawn

Re: Reads only on replicas?

Bram Van Dam
In reply to this post by Walter Underwood
On 11/06/2021 00.28, Walter Underwood wrote:
> Are you trying to send queries to less loaded machines? If so, this won’t do that.
> Leaders only do a little bit more work than followers. All indexing processing is local
> and that is most of the CPU usage.

I suspect that depends on the type of replica.

Reducing the load on leaders seems like a valuable feature. We've
observed cases where a high query load on leaders caused them to become
unresponsive, resulting in a cascade of failures, eventually rendering
an entire cluster unusable.

In fact, it would also be useful to be able to direct certain queries
*only* to leaders when you know that replicas are lagging behind.

  - Bram

Re: Reads only on replicas?

weiwang19
We did some exploration of excluding reads from the leader in a TLOG +
PULL cloud. When updates are heavy, we do observe query throughput and
latency improvements.  I have added the patch we have been testing:
https://issues.apache.org/jira/secure/attachment/13026792/SOLR-15472.patch
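
To illustrate the kind of TLOG + PULL layout mentioned above, here is a minimal sketch that builds a Collections API CREATE request with explicit replica-type counts. The host, collection name, counts, and helper name are assumptions for illustration only and are not taken from the patch or from the cluster described in this message.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          tlog_replicas: int, pull_replicas: int) -> str:
    """Build a Collections API CREATE URL requesting TLOG and PULL
    replicas for every shard."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "tlogReplicas": tlog_replicas,
        "pullReplicas": pull_replicas,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical values: 2 shards, each with 2 TLOG and 2 PULL replicas.
    print(create_collection_url("http://localhost:8983/solr",
                                "mycollection", 2, 2, 2))
```

PULL replicas are never eligible to become leaders, so in such a layout the leader of each shard is always one of the TLOG replicas.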


On Fri, Jun 11, 2021 at 11:43 AM Bram Van Dam <[hidden email]> wrote:

> On 11/06/2021 00.28, Walter Underwood wrote:
> > Are you trying to send queries to less loaded machines? If so, this won’t do that.
> > Leaders only do a little bit more work than followers. All indexing processing is local
> > and that is most of the CPU usage.
>
> I suspect that depends on the type of replica.
>
> Reducing the load on leaders seems like a valuable feature. We've
> observed cases where a high query load on leaders caused them to become
> unresponsive, resulting in a cascade of failures, eventually rendering
> an entire cluster unusable.
>
> In fact, it would also be useful to be able to direct certain queries
> *only* to leaders when you know that replicas are lagging behind.
>
>   - Bram
>

Re: Reads only on replicas?

Stephen Lewis Bianamara
This is all great info. Thanks for the patch Wei! It looks reasonable to me
and it's exciting to hear about your results. I agree that your patch +
tlog looks like a good solution at a design level.

Now onto the question of the problem to solve. Bram and Wei cover it well.
The goal at a high level is to better divide the work, as well as decouple
operational factors, between read and write replicas.

Right now, I'm interested in an architecture with a shared collection with
3 replicas per shard, where any one of them may become the leader for fault
tolerance, which I believe tlog plus Wei's patch fits perfectly. Dave's
suggestion wouldn't work for this setup, though it could be useful for slightly
different setups.

What are the next steps for integrating Wei's patch into mainline for an
official release?

Best,
Stephen

On Sun, Jun 13, 2021, 4:46 PM Wei <[hidden email]> wrote:

> We did some exploration of excluding reads from the leader in a TLOG +
> PULL cloud. When updates are heavy, we do observe query throughput and
> latency improvements.  I have added the patch we have been testing:
> https://issues.apache.org/jira/secure/attachment/13026792/SOLR-15472.patch