Request routing / load-balancing TLOG & PULL replica types


Greg Roodt
Hi

I have a question around how queries are routed and load-balanced in a
cluster of mixed TLOG and PULL replicas.

I thought that I might have to put a load-balancer in front of the PULL
replicas and direct queries at them manually as nodes are added and removed
as PULL replicas. However, it seems that SolrCloud handles this
automatically?

If I add a new PULL replica node, it goes into state="recovering" while it
pulls the core. As expected. What happens if queries are directed at this
node while in this state? From what I am observing, the query gets directed
to another node?

If SolrCloud is handling the routing of requests to active nodes, will it
automatically favour PULL replicas for read queries and TLOG replicas for
writes?

Thanks
Greg

Re: Request routing / load-balancing TLOG & PULL replica types

Erick Erickson
Talking a little out of my depth here as I haven't been in that code,
so if there are corrections they're welcome.

In general, there's nothing special between TLOG and PULL replicas in
terms of query routing. For that matter, there's nothing special about
either of these vs. NRT replicas for _queries_.

SolrCloud has internal load balancers that route queries across the
network. SolrCloud automatically bypasses all non-active replicas, so
there's no particular need to use a fronting load balancer. That said,
it depends on how you are accessing SolrCloud. By that I mean if you
provide a single HTTP endpoint to a single node, you have a single
point of failure. Even this is irrelevant if you use SolrJ because it
has (you guessed it) an internal load balancer and is ZooKeeper-aware,
so it can handle nodes coming and going.

So look at it this way. Each Solr node has a list of active replicas
and directs queries (or sub-queries) to those replicas. It doesn't
matter _where_ the replica is or, indeed, whether any of them are on
the same node. As long as Solr is running, it sends queries to the
right place.

"will it automatically favour PULL replicas"
No. There is so little extra work in a TLOG vs. a PULL replica that
this is minimally useful. The extra work a TLOG replica does is just to
flush out the incoming documents to the tlog, a bit of I/O. Much more
interesting is that the new metrics are in place to allow much more
intelligent use of resources. A heavy request on a replica will put
_much_ more load on that machine than handling the TLOG. The metrics
will allow SolrCloud to say "replicas 1, 3, 5 are pretty busy, let's
not use them for more work until they get less busy", and it won't
matter whether they're TLOG, PULL or NRT replicas.

Best,
Erick
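
The "list of active replicas" idea can be illustrated with a small sketch. This is not Solr's internal code, only an approximation of the selection rule described above: keep replicas that are in state=active on a live node, then pick one without regard to replica type. The dictionary layout is assumed to follow the Collections API CLUSTERSTATUS response; the collection, node and replica names in the sample are made up.

```python
import random


def active_replicas(cluster_status, collection):
    """Return replicas that are state=active and hosted on a live node."""
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    usable = []
    shards = cluster["collections"][collection]["shards"]
    for shard_name, shard in shards.items():
        for replica_name, replica in shard["replicas"].items():
            is_active = replica.get("state") == "active"
            is_live = replica.get("node_name") in live_nodes
            if is_active and is_live:
                usable.append({"shard": shard_name, "name": replica_name, **replica})
    return usable


def pick_replica(cluster_status, collection, rng=random):
    """Pick any usable replica; the replica type (TLOG/PULL/NRT) is ignored."""
    candidates = active_replicas(cluster_status, collection)
    if not candidates:
        raise RuntimeError("no active replicas available")
    return rng.choice(candidates)


# Hypothetical cluster: one shard, one TLOG replica and three PULL replicas.
SAMPLE_STATUS = {
    "cluster": {
        "live_nodes": ["solr1:8983_solr", "solr2:8983_solr", "solr3:8983_solr"],
        "collections": {
            "products": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node2": {"node_name": "solr1:8983_solr", "state": "active", "type": "TLOG"},
                            "core_node4": {"node_name": "solr2:8983_solr", "state": "active", "type": "PULL"},
                            "core_node6": {"node_name": "solr3:8983_solr", "state": "recovering", "type": "PULL"},
                            # Marked active, but its node is not in live_nodes.
                            "core_node8": {"node_name": "solr4:8983_solr", "state": "active", "type": "PULL"},
                        }
                    }
                }
            }
        },
    }
}

if __name__ == "__main__":
    for replica in active_replicas(SAMPLE_STATUS, "products"):
        print(replica["name"], replica["type"])
```

In the sample, the recovering replica and the replica on a node that is not live are both skipped, which matches the behaviour Greg observed when querying a recovering PULL replica.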


Re: Request routing / load-balancing TLOG & PULL replica types

Tomas Fernandez Lobbe-2
In reply to this post by Greg Roodt
On the last question:
For writes: Yes. Writes are sent to the shard leader, and since PULL replicas can’t be leaders, it’s going to be a TLOG replica. If you are using CloudSolrClient, this routing is done directly from the client (since it sends the update to the leader). If you are using some other HTTP client, then yes, the PULL replica will forward the update, the same way any non-leader node would.

For reads: this won’t happen today, and any replica can respond to queries. I do believe there is value in this kind of routing logic; sometimes you simply don’t want the leader to handle any queries, especially when queries can be expensive. You could do this today if you want by putting a load balancer in front and directing your queries only to the nodes you know are PULL. Keep in mind that this only works in the single-shard scenario, and only if you hit an active replica (otherwise, as you said, the query will be routed to any other node of the shard, regardless of type). If you have multiple shards, you need to use the “shards” parameter and tell Solr exactly which nodes you want to hit for each shard. (The “shards” approach also works in the single-shard case, although I believe you would be adding an extra hop.)

Tomás
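
For the multi-shard case described above, a rough sketch of building the “shards” value might look like the following. It assumes replica metadata in the shape returned by the Collections API CLUSTERSTATUS call (core, base_url, state, type). The host and core names are made up, and falling back to any active replica when a shard has no active PULL replica is a choice made for this sketch, not something Solr does for you.

```python
from urllib.parse import urlencode


def replica_address(replica):
    """Return host:port/solr/core without the URL scheme."""
    base = replica["base_url"].split("://", 1)[-1].rstrip("/")
    return f"{base}/{replica['core']}"


def shards_param(shards, preferred_type="PULL"):
    """Build a shards value: shards joined by ',', alternatives by '|'."""
    groups = []
    for shard_name in sorted(shards):
        replicas = shards[shard_name]["replicas"].values()
        active = [r for r in replicas if r.get("state") == "active"]
        preferred = [r for r in active if r.get("type") == preferred_type]
        chosen = preferred or active  # sketch-only fallback to any active replica
        if not chosen:
            raise RuntimeError(f"shard {shard_name} has no active replica")
        groups.append("|".join(replica_address(r) for r in chosen))
    return ",".join(groups)


def query_url(node_url, collection, q, shards):
    """Build a /select URL that pins the request to the chosen replicas."""
    params = {"q": q, "shards": shards_param(shards)}
    return f"{node_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical two-shard collection.
SAMPLE_SHARDS = {
    "shard1": {"replicas": {
        "core_node2": {"core": "products_shard1_replica_t1", "base_url": "http://solr1:8983/solr",
                       "state": "active", "type": "TLOG"},
        "core_node4": {"core": "products_shard1_replica_p3", "base_url": "http://solr2:8983/solr",
                       "state": "active", "type": "PULL"},
        "core_node6": {"core": "products_shard1_replica_p5", "base_url": "http://solr3:8983/solr",
                       "state": "recovering", "type": "PULL"},
    }},
    "shard2": {"replicas": {
        "core_node8": {"core": "products_shard2_replica_t7", "base_url": "http://solr4:8983/solr",
                       "state": "active", "type": "TLOG"},
    }},
}

if __name__ == "__main__":
    print(shards_param(SAMPLE_SHARDS))
    print(query_url("http://solr2:8983/solr", "products", "*:*", SAMPLE_SHARDS))
```

The node that receives the request still acts as the aggregator, which is where the extra hop in the single-shard case comes from.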

Re: Request routing / load-balancing TLOG & PULL replica types

Greg Roodt
Thank you both for your very detailed answers.

This is great to know. I knew that SolrJ had cluster-aware knowledge
(via ZooKeeper), but I was wondering what something like curl would do.
Great to know that internally the cluster will proxy queries to the
appropriate place regardless.

I am running the single shard scenario. I'm thinking of using a dedicated
HTTP load-balancer in front of the PULL replicas only, with read-only
queries sent straight to the load-balancer. In this situation, the
healthy PULL replicas *should* handle the queries on the node itself
without a proxy hop (assuming state=active). New PULL replicas added to the
load-balancer will internally proxy queries to the other PULL or TLOG
replicas while in state=recovering until the switch to state=active.

Is my understanding correct?

Is this sensible to do, or is it not worth it due to the smart proxying
that SolrCloud can do anyway?

If the TLOG and PULL replicas are so similar, is there any real advantage
to having a mixed cluster? I assume a bit less work is required across the
cluster to propagate writes if you have only 3 TLOG nodes alongside 10+
PULL nodes? Or would it be better to just have 13 TLOG nodes?






Re: Request routing / load-balancing TLOG & PULL replica types

Ere Maijala
Your question about directing queries to PULL replicas only has been
discussed on the list. Look for topic "Limit search queries only to pull
replicas". What I'd like to see is something similar to the
preferLocalShards parameter. It could be something like
"preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
SOLR-10880 could be used as a base for such functionality, and I'm
considering taking a stab at implementing it.

--Ere
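
To illustrate the semantics such a parameter could have, here is a small sketch. The name "preferReplicaTypes" is only the proposal from this message, not an existing Solr parameter, and the function below is hypothetical: it orders candidate replicas by their position in the preference list and puts unlisted types last, so they remain available as a fallback.

```python
def order_by_preferred_types(replicas, preferred_types):
    """Stable-sort replicas by the position of their type in preferred_types."""
    rank = {t.strip().upper(): i for i, t in enumerate(preferred_types)}
    # Types that are not listed sort after every listed type.
    return sorted(replicas, key=lambda r: rank.get(r["type"].upper(), len(rank)))


# Hypothetical candidates for one shard.
CANDIDATES = [
    {"name": "a", "type": "NRT"},
    {"name": "b", "type": "PULL"},
    {"name": "c", "type": "TLOG"},
    {"name": "d", "type": "PULL"},
]

if __name__ == "__main__":
    ordered = order_by_preferred_types(CANDIDATES, "TLOG,PULL".split(","))
    print([r["name"] for r in ordered])
```

Because the sort is stable, replicas of the same type keep their original relative order, so any existing load-balancing order within a type is preserved.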

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Request routing / load-balancing TLOG & PULL replica types

Greg Roodt
Thanks Ere. I've taken a look at the discussion here:
http://lucene.472066.n3.nabble.com/Limit-search-queries-only-to-pull-replicas-td4367323.html
This is how I was imagining TLOG & PULL replicas would work, so if this
functionality does get developed, it would be useful to me.

I still have 2 questions at the moment:
1. I am running the single shard scenario. I'm thinking of using a
dedicated HTTP load-balancer in front of the PULL replicas only with
read-only queries directed directly at the load-balancer. In this
situation, the healthy PULL replicas *should* handle the queries on the
node itself without a proxy hop (assuming state=active). New PULL replicas
added to the load-balancer will internally proxy queries to the other PULL
or TLOG replicas while in state=recovering until the switch to
state=active. Is my understanding correct?

2. Is it all worth it? Is there any advantage to running a cluster of 3
TLOGs + 10 PULL replicas vs running 13 TLOG replicas?





Re: Request routing / load-balancing TLOG & PULL replica types

Tomas Fernandez Lobbe-2


> On Feb 12, 2018, at 12:06 PM, Greg Roodt <[hidden email]> wrote:
>
> I still have 2 questions at the moment:
> 1. I am running the single shard scenario. I'm thinking of using a
> dedicated HTTP load-balancer in front of the PULL replicas only with
> read-only queries directed directly at the load-balancer. In this
> situation, the healthy PULL replicas *should* handle the queries on the
> node itself without a proxy hop (assuming state=active). New PULL replicas
> added to the load-balancer will internally proxy queries to the other PULL
> or TLOG replicas while in state=recovering until the switch to
> state=active. Is my understanding correct?

Yes

>
> 2. Is it all worth it? Is there any advantage to running a cluster of 3
> TLOGs + 10 PULL replicas vs running 13 TLOG replicas?
>

I don’t have a definitive answer; this will depend on your specific use case. As Erick said, there is very little work that non-leader TLOG replicas do for each update, and having all TLOG replicas means that you could in theory handle updates with a single active replica. It’s sometimes nice to separate query traffic from update traffic, but this can still be done if you have all TLOG replicas and you just make sure you don’t query the leader…

One nice characteristic of PULL replicas is that they can’t go into Leader Initiated Recovery (LIR) state. Even if there is some sort of network partition, they’ll remain in active state when they can’t talk to the leader, as long as they can reach ZooKeeper. (Note that this means they may respond with outdated data for an undetermined amount of time, until they can replicate from the leader again.) Also, since updates are sent only to the TLOG replicas rather than to all replicas, updates should be faster with 3 TLOG replicas than with 13.


Tomás
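
As a rough illustration of the two layouts being compared, the sketch below builds Collections API CREATE requests using the tlogReplicas and pullReplicas parameters, and counts how many replicas the leader has to forward each update to. The base URL and collection name are placeholders, and the fan-out count is a simplification of the point above (updates go only to TLOG replicas); it is not a performance model.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, tlog_replicas, pull_replicas=0, num_shards=1):
    """Build a Collections API CREATE URL for a TLOG/PULL layout."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "tlogReplicas": tlog_replicas,
    }
    if pull_replicas:
        params["pullReplicas"] = pull_replicas
    # nrtReplicas is left unset here; check how your Solr version defaults it.
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def forwards_per_update(tlog_replicas):
    """Replicas the leader forwards each update to (PULL replicas get none)."""
    return max(tlog_replicas - 1, 0)


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # placeholder node address
    print(create_collection_url(base, "products", tlog_replicas=3, pull_replicas=10))
    print(create_collection_url(base, "products", tlog_replicas=13))
    print("3 TLOG + 10 PULL:", forwards_per_update(3), "forwards per update")
    print("13 TLOG:", forwards_per_update(13), "forwards per update")
```

With 3 TLOG replicas the leader forwards each update to 2 replicas; with 13 it forwards to 12, while the 10 PULL replicas in the mixed layout only copy index segments from the leader.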


Re: Request routing / load-balancing TLOG & PULL replica types

Greg Roodt
Thanks so much again Tomas! You've answered my questions and I clearly
understand now. Great work!

On 13 February 2018 at 09:13, Tomas Fernandez Lobbe <[hidden email]>
wrote:

>
>
> > On Feb 12, 2018, at 12:06 PM, Greg Roodt <[hidden email]> wrote:
> >
> > Thanks Ere. I've taken a look at the discussion here:
> > http://lucene.472066.n3.nabble.com/Limit-search-
> queries-only-to-pull-replicas-td4367323.html
> > This is how I was imagining TLOG & PULL replicas would wor, so if this
> > functionality does get developed, it would be useful to me.
> >
> > I still have 2 questions at the moment:
> > 1. I am running the single shard scenario. I'm thinking of using a
> > dedicated HTTP load-balancer in front of the PULL replicas only with
> > read-only queries directed directly at the load-balancer. In this
> > situation, the healthy PULL replicas *should* handle the queries on the
> > node itself without a proxy hop (assuming state=active). New PULL
> replicas
> > added to the load-balancer will internally proxy queries to the other
> PULL
> > or TLOG replicas while in state=recovering until the switch to
> > state=active. Is my understanding correct?
>
> Yes
>
> >
> > 2. Is it all worth it? Is there any advantage to running a cluster of 3
> > TLOGs + 10 PULL replicas vs running 13 TLOG replicas?
> >
>
> I don’t have a definitive answer, this will depend on your specific use
> case. As Erick said, there is very little work that non-leader TLOG
> replicas do for each update, and having all TLOG replicas means that with a
> single active replica you could in theory handle updates. It’s sometimes
> nice to separate query traffic from update traffic, but this can still be
> done if you have all TLOG replicas and you just make sure you don’t query
> the leader…
> One nice characteristic that PULL replicas have is that they can’t go into
> Leader Initiated Recovery (LIR) state, even if there is some sort of
> network partition, they’ll remain in active state even if they can’t talk
> with the leader as long as they can reach ZooKeeper (note that this means
> they may be responding with outdated data for an undetermined amount of
> time, until replicas can replicate from the leader again). Also, since
> updates are not sent to all the replicas (only the TLOG replicas), updates
> should be faster with 3 TLOG vs 13 TLOG replicas.
>
>
> Tomás
>
> >
> >
> >
> > On 12 February 2018 at 19:25, Ere Maijala <[hidden email]>
> wrote:
> >
> >> Your question about directing queries to PULL replicas only has been
> >> discussed on the list. Look for topic "Limit search queries only to pull
> >> replicas". What I'd like to see is something similar to the
> >> preferLocalShards parameter. It could be something like
> >> "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
> >> SOLR-10880 could be used as a base for such funtionality, and I'm
> >> considering taking a stab at implementing it.
> >>
> >> --Ere
> >>
> >>
> >> Greg Roodt kirjoitti 12.2.2018 klo 6.55:
> >>
> >>> Thank you both for your very detailed answers.
> >>>
> >>> This is great to know. I knew that SolrJ had the cluster aware
> knowledge
> >>> (via zookeeper), but I was wondering what something like curl would do.
> >>> Great to know that internally the cluster will proxy queries to the
> >>> appropriate place regardless.
> >>>
> >>> I am running the single shard scenario. I'm thinking of using a
> dedicated
> >>> HTTP load-balancer in front of the PULL replicas only with read-only
> >>> queries directed directly at the load-balancer. In this situation, the
> >>> healthy PULL replicas *should* handle the queries on the node itself
> >>> without a proxy hop (assuming state=active). New PULL replicas added to
> >>> the
> >>> load-balancer will internally proxy queries to the other PULL or TLOG
> >>> replicas while in state=recovering until the switch to state=active.
> >>>
> >>> Is my understanding correct?
> >>>
> >>> Is this sensible to do, or is it not worth it due to the smart proxying
> >>> that SolrCloud can do anyway?
> >>>
> >>> If the TLOG and PULL replicas are so similar, is there any real
> advantage
> >>> to having a mixed cluster? I assume a bit less work is required across
> the
> >>> cluster to propagate writes if you only have 3 TLOG nodes vs 10+ PULL
> >>> nodes? Or would it be better to just have 13 TLOG nodes?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 12 February 2018 at 15:24, Tomas Fernandez Lobbe <[hidden email]
> >
> >>> wrote:
> >>>
> >>>> On the last question:
> >>>> For Writes: Yes. Writes are going to be sent to the shard leader, and
> >>>> since PULL replicas can’t be leaders, it’s going to be a TLOG replica.
> >>>> If you are using CloudSolrClient, then this routing will be done
> >>>> directly from the client (since it will send the update to the leader),
> >>>> and if you are using some other HTTP client, then yes, the PULL replica
> >>>> will forward the update, the same way any non-leader node would.
> >>>>
> >>>> For reads: this won’t happen today, and any replica can respond to
> >>>> queries. I do believe there is value in this kind of routing logic;
> >>>> sometimes you simply don’t want the leader to handle any queries,
> >>>> especially when queries can be expensive. You could do this today if you
> >>>> want, by putting some load balancer in front and just directing your
> >>>> queries to the nodes you know are PULL, but keep in mind that this would
> >>>> only work in the single shard scenario, and only if you hit an active
> >>>> replica (otherwise, as you said, the query will be routed to any other
> >>>> node of the shard, regardless of the type). If you have multiple shards,
> >>>> then you need to use the “shards” parameter and tell Solr exactly which
> >>>> nodes you want to hit for each shard (the “shards” approach can also be
> >>>> done in the single shard case, although you would be adding an extra
> >>>> hop, I believe).
> >>>>
> >>>> Tomás
> >>>> Sent from my iPhone
> >>>>
> >>>> On Feb 11, 2018, at 6:35 PM, Greg Roodt <[hidden email]> wrote:
> >>>>>
> >>>>> Hi
> >>>>>
> >>>>> I have a question around how queries are routed and load-balanced in a
> >>>>> cluster of mixed TLOG and PULL replicas.
> >>>>>
> >>>>> I thought that I might have to put a load-balancer in front of the PULL
> >>>>> replicas and direct queries at them manually as nodes are added and
> >>>>> removed as PULL replicas. However, it seems that SolrCloud handles this
> >>>>> automatically?
> >>>>>
> >>>>> If I add a new PULL replica node, it goes into state="recovering" while
> >>>>> it pulls the core. As expected. What happens if queries are directed at
> >>>>> this node while in this state? From what I am observing, the query gets
> >>>>> directed to another node?
> >>>>>
> >>>>> If SolrCloud is handling the routing of requests to active nodes, will
> >>>>> it automatically favour PULL replicas for read queries and TLOG replicas
> >>>>> for writes?
> >>>>>
> >>>>> Thanks
> >>>>> Greg
> >>>>>
> >>>>
> >>>>
> >>>
> >> --
> >> Ere Maijala
> >> Kansalliskirjasto / The National Library of Finland
> >>
>
>

Re: Request routing / load-balancing TLOG & PULL replica types

Ere Maijala
In reply to this post by Greg Roodt
2. In my experience, using PULL replicas can have a significant positive
effect on server load. It depends, of course, on your analysis chain,
but we do some fairly expensive analysis, and not having to do the same
work X times is a real benefit. Unfortunately, we need multiple shards,
so we can't currently isolate the query traffic from the indexing work.

I took a quick look at the shard selection code yesterday, and it seems
it might be quite simple to add replica selection in the same place
where the preferLocalShards parameter is handled.

--Ere
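
To illustrate the replica selection idea above, here is a minimal Python sketch (not the actual Solr code) of ordering a shard's candidate replicas by preferred type. The function name, the dictionary layout and the preference list are assumptions made for illustration, modelled on the "preferReplicaTypes" idea proposed in this thread.

```python
from typing import Dict, List


def order_by_replica_type(
    replicas: List[Dict[str, str]], preferred_types: List[str]
) -> List[Dict[str, str]]:
    """Return replicas sorted so that preferred types come first.

    Hypothetical helper: types not listed in preferred_types are kept,
    but placed after all preferred ones.
    """
    rank = {t.upper(): i for i, t in enumerate(preferred_types)}
    # sorted() is stable, so replicas of the same type keep their input order.
    return sorted(
        replicas,
        key=lambda r: rank.get(r.get("type", "").upper(), len(rank)),
    )


# Made-up replica list for a single shard.
replicas = [
    {"name": "core_node1", "type": "TLOG"},
    {"name": "core_node2", "type": "PULL"},
    {"name": "core_node3", "type": "NRT"},
    {"name": "core_node4", "type": "PULL"},
]
ordered = order_by_replica_type(replicas, ["PULL", "TLOG"])
print([r["name"] for r in ordered])
```

A preference ordering like this only changes which replica is tried first; it does not exclude the other types, so queries can still fall back to a TLOG replica when no PULL replica is active.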

Greg Roodt wrote on 12.2.2018 at 22.06:

> Thanks Ere. I've taken a look at the discussion here:
> http://lucene.472066.n3.nabble.com/Limit-search-queries-only-to-pull-replicas-td4367323.html
> This is how I was imagining TLOG & PULL replicas would work, so if this
> functionality does get developed, it would be useful to me.
>
> I still have 2 questions at the moment:
> 1. I am running the single shard scenario. I'm thinking of using a
> dedicated HTTP load-balancer in front of the PULL replicas only with
> read-only queries directed directly at the load-balancer. In this
> situation, the healthy PULL replicas *should* handle the queries on the
> node itself without a proxy hop (assuming state=active). New PULL replicas
> added to the load-balancer will internally proxy queries to the other PULL
> or TLOG replicas while in state=recovering until the switch to
> state=active. Is my understanding correct?
>
> 2. Is it all worth it? Is there any advantage to running a cluster of 3
> TLOGs + 10 PULL replicas vs running 13 TLOG replicas?
>
>
>
>
> On 12 February 2018 at 19:25, Ere Maijala <[hidden email]> wrote:
>
>> Your question about directing queries to PULL replicas only has been
>> discussed on the list. Look for topic "Limit search queries only to pull
>> replicas". What I'd like to see is something similar to the
>> preferLocalShards parameter. It could be something like
>> "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
>> SOLR-10880 could be used as a base for such functionality, and I'm
>> considering taking a stab at implementing it.
>>
>> --Ere
>>
>>
>> Greg Roodt wrote on 12.2.2018 at 6.55:
>>
>>> Thank you both for your very detailed answers.
>>>
>>> This is great to know. I knew that SolrJ had the cluster aware knowledge
>>> (via zookeeper), but I was wondering what something like curl would do.
>>> Great to know that internally the cluster will proxy queries to the
>>> appropriate place regardless.
>>>
>>> I am running the single shard scenario. I'm thinking of using a dedicated
>>> HTTP load-balancer in front of the PULL replicas only with read-only
>>> queries directed directly at the load-balancer. In this situation, the
>>> healthy PULL replicas *should* handle the queries on the node itself
>>> without a proxy hop (assuming state=active). New PULL replicas added to
>>> the load-balancer will internally proxy queries to the other PULL or TLOG
>>> replicas while in state=recovering until the switch to state=active.
>>>
>>> Is my understanding correct?
>>>
>>> Is this sensible to do, or is it not worth it due to the smart proxying
>>> that SolrCloud can do anyway?
>>>
>>> If the TLOG and PULL replicas are so similar, is there any real advantage
>>> to having a mixed cluster? I assume a bit less work is required across the
>>> cluster to propagate writes if you only have 3 TLOG nodes vs 10+ PULL
>>> nodes? Or would it be better to just have 13 TLOG nodes?
>>>
>>>
>>>
>>>
>>>
>>> On 12 February 2018 at 15:24, Tomas Fernandez Lobbe <[hidden email]>
>>> wrote:
>>>
>>>> On the last question:
>>>> For Writes: Yes. Writes are going to be sent to the shard leader, and
>>>> since PULL replicas can’t be leaders, it’s going to be a TLOG replica.
>>>> If you are using CloudSolrClient, then this routing will be done directly
>>>> from the client (since it will send the update to the leader), and if you
>>>> are using some other HTTP client, then yes, the PULL replica will forward
>>>> the update, the same way any non-leader node would.
>>>>
>>>> For reads: this won’t happen today, and any replica can respond to
>>>> queries. I do believe there is value in this kind of routing logic;
>>>> sometimes you simply don’t want the leader to handle any queries,
>>>> especially when queries can be expensive. You could do this today if you
>>>> want, by putting some load balancer in front and just directing your
>>>> queries to the nodes you know are PULL, but keep in mind that this would
>>>> only work in the single shard scenario, and only if you hit an active
>>>> replica (otherwise, as you said, the query will be routed to any other
>>>> node of the shard, regardless of the type). If you have multiple shards,
>>>> then you need to use the “shards” parameter and tell Solr exactly which
>>>> nodes you want to hit for each shard (the “shards” approach can also be
>>>> done in the single shard case, although you would be adding an extra hop,
>>>> I believe).
>>>>
>>>> Tomás
>>>> Sent from my iPhone
>>>>
>>>> On Feb 11, 2018, at 6:35 PM, Greg Roodt <[hidden email]> wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>> I have a question around how queries are routed and load-balanced in a
>>>>> cluster of mixed TLOG and PULL replicas.
>>>>>
>>>>> I thought that I might have to put a load-balancer in front of the PULL
>>>>> replicas and direct queries at them manually as nodes are added and
>>>>> removed as PULL replicas. However, it seems that SolrCloud handles this
>>>>> automatically?
>>>>>
>>>>> If I add a new PULL replica node, it goes into state="recovering" while
>>>>> it pulls the core. As expected. What happens if queries are directed at
>>>>> this node while in this state? From what I am observing, the query gets
>>>>> directed to another node?
>>>>>
>>>>> If SolrCloud is handling the routing of requests to active nodes, will
>>>>> it automatically favour PULL replicas for read queries and TLOG replicas
>>>>> for writes?
>>>>>
>>>>> Thanks
>>>>> Greg
>>>>>
>>>>
>>>>
>>>
>> --
>> Ere Maijala
>> Kansalliskirjasto / The National Library of Finland
>>
>

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
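
For reference, the "shards" approach Tomás describes in the quoted text can be sketched in Python as below. This is only a sketch under stated assumptions: the host names, the collection name and the core names are hypothetical placeholders, and the helper function is not part of any Solr client. The value format follows the documented convention for the "shards" parameter: shards are separated by commas, and alternative replicas of the same shard are separated by "|".

```python
from typing import List
from urllib.parse import urlencode


def build_pinned_query_url(
    entry_node: str, collection: str, query: str, shard_replicas: List[List[str]]
) -> str:
    """Build a select URL that pins each shard to an explicit replica list.

    shard_replicas holds one inner list per shard; each entry is a
    "host:port/solr/core" address of a replica allowed to serve that shard.
    """
    shards_value = ",".join("|".join(alts) for alts in shard_replicas)
    params = urlencode({"q": query, "shards": shards_value})
    return f"{entry_node}/solr/{collection}/select?{params}"


# Hypothetical two-shard collection, listing only the PULL replicas.
url = build_pinned_query_url(
    "http://pull1:8983",
    "mycoll",
    "*:*",
    [
        ["pull1:8983/solr/mycoll_shard1_replica_p1",
         "pull2:8983/solr/mycoll_shard1_replica_p2"],
        ["pull3:8983/solr/mycoll_shard2_replica_p1"],
    ],
)
print(url)
```

The replica lists have to be kept up to date by the caller as replicas are added or removed, which is the main drawback of this approach compared with a server-side preference.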

Re: Request routing / load-balancing TLOG & PULL replica types

Ere Maijala
In reply to this post by Greg Roodt
A patch is now available: https://issues.apache.org/jira/browse/SOLR-11982

--Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
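
For the single-shard setup discussed in this thread (an external load balancer in front of the PULL replicas only), the list of backends could be derived from the cluster state. The Python sketch below assumes the response of the Collections API CLUSTERSTATUS action has already been fetched and parsed as JSON; the sample data, host names and function name are made up for illustration. It keeps only replicas of type PULL that are in state "active" and whose node is listed in live_nodes.

```python
from typing import Any, Dict, List


def active_pull_replica_urls(cluster_status: Dict[str, Any], collection: str) -> List[str]:
    """Return base URLs of nodes hosting active PULL replicas of a collection."""
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    urls = set()
    for shard in cluster["collections"][collection]["shards"].values():
        for replica in shard["replicas"].values():
            if (
                replica.get("type") == "PULL"
                and replica.get("state") == "active"
                # The recorded state can be stale if a node has gone away,
                # so also require the node to be live.
                and replica.get("node_name") in live_nodes
            ):
                urls.add(replica["base_url"])
    return sorted(urls)


def _replica(rtype: str, state: str, host: str) -> Dict[str, str]:
    # Helper for building made-up sample data only.
    return {
        "type": rtype,
        "state": state,
        "node_name": f"{host}:8983_solr",
        "base_url": f"http://{host}:8983/solr",
    }


sample_status = {
    "cluster": {
        "live_nodes": ["host1:8983_solr", "host2:8983_solr", "host3:8983_solr"],
        "collections": {
            "mycoll": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": _replica("TLOG", "active", "host1"),
                            "core_node2": _replica("PULL", "active", "host2"),
                            "core_node3": _replica("PULL", "recovering", "host3"),
                            "core_node4": _replica("PULL", "active", "host4"),
                        }
                    }
                }
            }
        },
    }
}
backends = active_pull_replica_urls(sample_status, "mycoll")
print(backends)
```

In this sample, host3 is skipped because it is still recovering and host4 because it is not in live_nodes, so only host2 would be registered with the load balancer.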