Limit search queries only to pull replicas

Stanislav Sandalnikov
Hi,

We have a Solr 7.1 SolrCloud setup with multiple shards on one server (used for indexing), and each shard has a pull replica on other servers.

What are the possible ways to limit search requests to pull-type replicas only?
At the moment the only solution I have found is to append the shards parameter to each query, but if new shards are added later, this requires changing solrconfig. Is this the only way to do it?

Thank you

Regards
Stanislav

Re: Limit search queries only to pull replicas

Emir Arnautović
Hi Stanislav,
I don’t think there is a built-in feature for this, but it sounds like a nice feature for SolrJ, so maybe you should check whether it is available. You can also implement it outside of SolrJ: check the cluster state to see which shards are available and send queries only to the pull replicas.
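
As a rough illustration of the cluster-state approach, here is a minimal Python sketch that builds a `shards` value from a cluster status document. It assumes the JSON shape returned by the Collections API `CLUSTERSTATUS` action, with `type`, `state`, `base_url` and `core` fields on each replica; please verify those field names against your Solr version. The function name `pull_shards_param` is made up for this example.

```python
from urllib.parse import urlsplit


def pull_shards_param(cluster_status, collection):
    """Build a `shards` value that lists only active PULL replicas.

    `cluster_status` is assumed to be the parsed JSON of a CLUSTERSTATUS
    response (an assumption, check it against your Solr version).
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    groups = []
    for shard_name in sorted(shards):
        urls = []
        for replica in shards[shard_name]["replicas"].values():
            if replica.get("type") == "PULL" and replica.get("state") == "active":
                parts = urlsplit(replica["base_url"])
                # host:port/solr/core, written without the URL scheme
                urls.append(f"{parts.netloc}{parts.path}/{replica['core']}")
        if not urls:
            # Leaving a shard out would silently return partial results.
            raise ValueError(f"no active PULL replica for {shard_name}")
        # '|' separates alternative replicas of the same shard
        groups.append("|".join(sorted(urls)))
    # ',' separates the shards themselves
    return ",".join(groups)
```

The result can be sent as the `shards` parameter of each query. Because it is recomputed from the cluster state, new shards are picked up without touching solrconfig.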

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

In reply to the post by Stanislav Sandalnikov, 14 Dec 2017, 09:58

Re: Limit search queries only to pull replicas

Ere Maijala
Hi,

It would be really nice to have a server-side option, though. Not
everyone uses SolrJ, and a typical, fairly dumb client just queries the
server without any understanding of shards etc. Solr could be clever
enough not to forward the query to NRT replicas when it is configured to
prefer PULL replicas and they're available. Maybe it could be something
similar to the preferLocalShards parameter, like "preferShardTypes=TLOG,PULL".
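
To make the idea concrete, here is a small Python sketch of how such a preference could order the replicas of one shard. Everything in it is hypothetical: the parameter does not exist in Solr 7.1, and the replica dictionaries (`type`, `state`) and the function name are only assumptions for illustration.

```python
def order_by_preferred_types(replicas, preferred_types):
    """Order one shard's active replicas by preferred type.

    Replicas whose type is not listed are kept at the end as a
    fallback, so a query can still be served when no preferred
    replica is available.
    """
    rank = {rtype: i for i, rtype in enumerate(preferred_types)}
    active = [r for r in replicas if r["state"] == "active"]
    # sorted() is stable, so replicas of equal rank keep their order
    return sorted(active, key=lambda r: rank.get(r["type"], len(rank)))
```

With `preferred_types=["TLOG", "PULL"]`, an NRT leader would only be tried after every active TLOG and PULL replica.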

--Ere

In reply to the post by Emir Arnautović, 14 Dec 2017, 11:41

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Limit search queries only to pull replicas

Emir Arnautović
It is interesting that ES had a similar feature to prefer primary/replica, but it is deprecating it and will remove it. I could not find an explanation why.

Emir

In reply to the post by Ere Maijala, 5 Jan 2018, 15:22

Re: Limit search queries only to pull replicas

Erick Erickson
Actually, I think a much better option is to route queries based on server load.

The theory of preferring pull replicas to leaders would be that the leader
will be doing the indexing work and the pull replicas would be doing less
work and therefore serving queries faster. But that's a fragile assumption.
Let's say indexing stops totally. Now your leader is sitting there idle
when it could be serving queries.

The autoscaling work will allow for more intelligent routing: you can
monitor the CPU load on your servers and, if the leader has some spare
cycles, use them rather than crudely routing all queries to pull replicas
(or tlog replicas for that matter). NOTE: I don't know whether this is being
actively worked on or not, but it seems a logical extension of the increased
monitoring capabilities being put in place for autoscaling. I'd rather
see effort put in there than into supporting routing based solely on a
node's type.
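
A minimal Python sketch of what I mean by load-based routing, under the assumption that some monitoring component provides a recent load figure per node. The `load_by_node` mapping, the replica fields and the function name are hypothetical; nothing like this is an existing Solr API.

```python
def pick_least_loaded(replicas, load_by_node):
    """Pick the active replica on the least loaded node, whatever its type.

    `load_by_node` maps a node name to a load figure (for example a
    CPU load average). Nodes without a figure are tried last.
    """
    active = [r for r in replicas if r["state"] == "active"]
    if not active:
        return None
    return min(active, key=lambda r: load_by_node.get(r["node_name"], float("inf")))
```

With this rule an idle leader is picked ahead of a busy pull replica, which is the case that routing by type alone would miss.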

Best,
Erick

In reply to the post by Emir Arnautović, 5 Jan 2018, 7:51 AM

Re: Limit search queries only to pull replicas

Ere Maijala
Interesting indeed, but maybe in line with the idea that ES knows what
to do best without the user interfering.

My example parameter name was bad; it should have been something like
"preferReplicaTypes=TLOG,PULL". I can't see what would be bad about
that, but then to me it seems Solr has always been much more about
giving control to the administrator or developer instead of
automatically just working. This may be daunting in the beginning, but
in the long run I always seem to start looking for more control over how
things are done.

--Ere

In reply to the post by Emir Arnautović, 5 Jan 2018, 17:51

Re: Limit search queries only to pull replicas

Ere Maijala
Server load alone doesn't always indicate the server's ability to serve
queries. Memory and cache state are important too, and they're not as
easy to monitor. Additionally, server load at any single point in time
or a short term average is not indicative of the server's ability to
handle search requests if indexing happens in short but intense bursts.

It can also complicate things if there is more than one Solr instance
running on a single server.

I'm definitely not against intelligent routing. In many cases it makes
perfect sense, and I'd still like to use it, just limited to the pull
replicas.
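
Combining the two ideas could look roughly like the Python sketch below: first restrict the candidates by replica type, then let load decide among them. As before, the replica fields, the `load_by_node` mapping and the function name are assumptions for illustration only and not an existing Solr feature.

```python
def pick_by_type_then_load(replicas, load_by_node, allowed_types=("PULL",)):
    """Limit routing to the allowed replica types, then pick by load.

    Returns None when no active replica of an allowed type exists,
    so the caller decides whether to fail or to widen the types.
    """
    candidates = [
        r for r in replicas
        if r["state"] == "active" and r["type"] in allowed_types
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: load_by_node.get(r["node_name"], float("inf")))
```

Here an idle leader is never chosen unless "NRT" is added to `allowed_types`, which keeps query load away from the indexing server even during short indexing bursts.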

--Ere

In reply to the post by Erick Erickson, 5 Jan 2018, 19:03

Re: Limit search queries only to pull replicas

Tomas Fernandez Lobbe-2
This feature is not currently supported. I was thinking of implementing it by extending the work done in SOLR-10880, but I haven't had time to work on it yet. There is a patch for SOLR-10880 that doesn't implement support for replica types but could be used as a base.

Tomás

In reply to the post by Ere Maijala, 8 Jan 2018, 12:04 AM