Solr Cloud not routing to PULL replicas

Ashwin Ramesh
Hi again,

We are currently using Solr 7.3.1 and have an 8-shard collection. All our
TLOG replicas are on separate machines from the PULL replicas. Since not all
shards are on the same machine, requests are distributed. However, we are
seeing that most of the distributed parts of the requests are being
routed to the TLOG machines. This is evident because the TLOG machines are
saturated at 80%+ CPU while the PULL machines are sitting at 25%, even though
the load balancer only routes to the PULL machines. I know we can use
'preferLocalShards', but that still doesn't solve the problem.

Is there something we have configured incorrectly? We are currently rushing
to upgrade to 7.4.0 so we can take advantage of the
'shards.preference=replica.location:local,replica.type:PULL' parameter. In
the meantime, we would like to know if there is a reason for this behavior
and if there is anything we can do to avoid it.
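
For illustration, a minimal sketch of how a request carrying that parameter might be built once on 7.4.0. The host name, collection name and query below are hypothetical placeholders; only the shards.preference value is taken from the message above.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical values for illustration only; substitute your own node and collection.
SOLR_BASE = "http://pull-node-1:8983/solr"
COLLECTION = "mycollection"


def build_select_url(query, prefer="replica.location:local,replica.type:PULL"):
    """Build a /select URL that asks Solr (7.4+) to prefer local, then PULL, replicas."""
    params = {
        "q": query,
        # Preference list is evaluated in order: local replicas first, then PULL type.
        "shards.preference": prefer,
    }
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("*:*"))
```

The parameter is a preference rather than a hard filter, so it would be worth checking the logs afterwards to confirm where the shard requests actually land.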

Thank you & regards,

Ash

Re: Solr Cloud not routing to PULL replicas

Tomás Fernández Löbbe
Hi Ash,
Do you see all shard queries going to the TLOG replicas, or just “most” (i.e.,
are some going to the PULL replicas)? You can confirm this by looking in
the logs for queries with the “isShard=true” parameter. Are the PULL replicas
active? (Since you are using a load balancer, I’m guessing you are not using
CloudSolrClient for queries.)
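
A rough sketch of the log check, to be run against the log of each node. It assumes request log lines contain "path=/select" followed by a "params={...}" section, which is the usual shape but may differ with your logging configuration; the function name and the default handler path are illustrative.

```python
import re
import sys
from collections import Counter

# Matches isShard=true as a whole parameter inside params={...} (assumed log format).
SHARD_RE = re.compile(r"[{&]isShard=true[&}]")


def count_shard_requests(lines, handler_path="/select"):
    """Count shard-level (isShard=true) vs. top-level requests for one handler."""
    counts = Counter()
    marker = f"path={handler_path} "
    for line in lines:
        if marker not in line:
            continue  # not a request to the handler we care about
        kind = "shard" if SHARD_RE.search(line) else "top_level"
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_shard_requests.py /path/to/solr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        print(dict(count_shard_requests(log_file)))
```

Comparing the "shard" count on a TLOG node with the count on a PULL node over the same time window should show whether the shard requests really are skewed.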
Did you look at metrics other than CPU utilization? For example, do the
“/select” request metrics (or those for whatever handler path you are using)
confirm the issue (high on the TLOG replicas and low on the PULL
replicas)?
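
One way to compare those request metrics is to read them from the Metrics API on each node and sum them per replica type. The sketch below is built on assumptions: that the JSON response has a top-level "metrics" map keyed by core registry name, that the key is "QUERY./select.requests", and that core names end in replica_n, replica_t or replica_p followed by a number for NRT, TLOG and PULL. Please verify these against your own 7.3.1 output; the function names are illustrative.

```python
import json
import re
from urllib.request import urlopen

REPLICA_TYPES = {"n": "NRT", "t": "TLOG", "p": "PULL"}
# Assumed core naming convention: ..._replica_t1, ..._replica_p3, etc.
CORE_SUFFIX_RE = re.compile(r"replica_([ntp])\d+$")


def requests_by_replica_type(metrics_response, key="QUERY./select.requests"):
    """Sum the request counts in a parsed Metrics API response per replica type."""
    totals = {}
    for registry, values in metrics_response.get("metrics", {}).items():
        match = CORE_SUFFIX_RE.search(registry)
        if match is None or key not in values:
            continue
        value = values[key]
        # The counter may be a bare number or a {"count": N} map depending on format.
        count = value["count"] if isinstance(value, dict) else value
        replica_type = REPLICA_TYPES[match.group(1)]
        totals[replica_type] = totals.get(replica_type, 0) + count
    return totals


def fetch_metrics(node_url):
    """Fetch core-level /select request metrics from one node, e.g. http://host:8983/solr."""
    url = f"{node_url}/admin/metrics?group=core&prefix=QUERY./select.requests&wt=json"
    with urlopen(url) as response:
        return json.load(response)
```

Calling requests_by_replica_type(fetch_metrics(node_url)) for each node and adding up the results would give a per-type total to set against the CPU numbers.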

Can you share a query from your logs (the main query and the shard queries,
if possible)?

Tomás


On Tue, Aug 28, 2018 at 6:22 AM Ash Ramesh <[hidden email]> wrote:

> Hi again,
>
> We are currently using Solr 7.3.1 and have an 8-shard collection. All our
> TLOG replicas are on separate machines from the PULL replicas. Since not all
> shards are on the same machine, requests are distributed. However, we are
> seeing that most of the distributed parts of the requests are being
> routed to the TLOG machines. This is evident because the TLOG machines are
> saturated at 80%+ CPU while the PULL machines are sitting at 25%, even though
> the load balancer only routes to the PULL machines. I know we can use
> 'preferLocalShards', but that still doesn't solve the problem.
>
> Is there something we have configured incorrectly? We are currently rushing
> to upgrade to 7.4.0 so we can take advantage of the
> 'shards.preference=replica.location:local,replica.type:PULL' parameter. In
> the meantime, we would like to know if there is a reason for this behavior
> and if there is anything we can do to avoid it.
>
> Thank you & regards,
>
> Ash