How to block expensive solr queries

weiwang19
Hi,

Recently we ran into a problem where SolrCloud query latency suddenly
increased and many simple queries with small result sets timed out. After
digging a bit, I found that the root cause was some stats queries running at
the same time, such as

/solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true



unique_ids is a high-cardinality field, so this query is quite
expensive. But why does a small volume of such queries block other queries
and make simple queries time out?  I checked the Solr thread pool and saw
plenty of idle threads available.  We are using Solr 7.6.2 with a 10-shard
cloud setup.

Is there a way to block certain Solr queries based on URL pattern, i.e. to
ignore the stats.calcdistinct request in this case?


Thanks,

Wei

Re: How to block expensive solr queries

Mikhail Khludnev-2
Hello, Wei.

Have you tried abandoning heavy queries with the timeAllowed parameter?
https://lucene.apache.org/solr/guide/8_1/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
It may or may not be able to stop stats. This test can clarify it:
https://github.com/apache/lucene-solr/blob/25eda17c66f0091dbf6570121e38012749c07d72/solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java#L223

--
Sincerely yours
Mikhail Khludnev

Re: How to block expensive solr queries

weiwang19
Hi Mikhail,

Yes, I have the timeAllowed parameter configured, but in this case it
doesn't seem to prevent the stats request from blocking other normal
queries.  Is it possible to drop the request before Solr executes it, maybe
in a Jetty request filter?

Thanks,
Wei


Re: How to block expensive solr queries

Mikhail Khludnev-2
It's worth raising an issue for supporting timeAllowed for stats. Until
that's done, something like a Jetty filter is the only option.

--
Sincerely yours
Mikhail Khludnev
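
The request-filter idea discussed above could be sketched roughly as follows. This is only the decision logic, assuming it would be called from a servlet filter placed in front of Solr, which would then answer with an error status instead of passing the request on. The class name QueryParamBlocker and the blocked-parameter set are hypothetical, not part of Solr or Jetty, and the check covers query-string parameters only; parameters sent in a POST body or expressed through other syntaxes would need separate handling.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical helper for illustration; not part of Solr or Jetty.
final class QueryParamBlocker {
    // Assumption: parameter names to reject, taken from the request in this thread.
    private static final Set<String> BLOCKED_PARAMS = Set.of("stats.calcdistinct");

    /** Returns true if the raw query string sets any blocked parameter to "true". */
    static boolean isBlocked(String rawQuery) {
        if (rawQuery == null || rawQuery.isEmpty()) {
            return false;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            try {
                String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
                String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
                if (BLOCKED_PARAMS.contains(name) && "true".equalsIgnoreCase(value)) {
                    return true;
                }
            } catch (IllegalArgumentException malformedEscape) {
                // Design choice for this sketch: reject requests that cannot be decoded.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String expensive = "stats=true&stats.field=unique_ids&stats.calcdistinct=true";
        System.out.println(isBlocked(expensive));
        System.out.println(isBlocked("q=*:*&rows=10"));
    }
}
```

A denylist like this only stops the one pattern it names, so it is a stopgap rather than general protection.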

Re: How to block expensive solr queries

Toke Eskildsen-2
On Mon, 2019-10-07 at 10:18 -0700, Wei wrote:
> /solr/mycollection/select?stats=true&stats.field=unique_ids&stats.calcdistinct=true
...
> Is there a way to block certain solr queries based on url pattern?
> i.e. ignore the stats.calcdistinct request in this case.

It sounds like users can issue arbitrary queries against your Solr
installation. As you have noticed, that makes it easy to cause a
denial of service (intentional or not). Filtering out
stats.calcdistinct won't help with the next request for
group.ngroups=true, facet.field=unique_id&facet.limit=100000000,
rows=100000000, or something else entirely.

I recommend you flip your logic: allow only specific types of
requests and put limits on those. To my knowledge, that is not a
built-in feature of Solr.

- Toke Eskildsen, Royal Danish Library
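
The "flip your logic" approach above might look something like the sketch below: accept a request only if every parameter is on an allowlist, and cap the values that can get expensive. Everything here is an assumption made for illustration, including the class name QueryAllowlist, the allowed parameter names, and the MAX_ROWS limit. Solr parameters can be multi-valued; a single-valued map is used only to keep the example short.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical allowlist check for illustration; not a Solr feature.
final class QueryAllowlist {
    // Assumption: the only parameters clients are permitted to send.
    private static final Set<String> ALLOWED_PARAMS =
            Set.of("q", "fq", "fl", "sort", "start", "rows", "wt");
    // Assumption: an illustrative upper bound on the page size.
    private static final int MAX_ROWS = 100;

    /** Returns true only if all parameters are allowed and "rows" is within the limit. */
    static boolean isAllowed(Map<String, String> params) {
        for (String name : params.keySet()) {
            if (!ALLOWED_PARAMS.contains(name)) {
                return false; // unknown parameter, e.g. stats.calcdistinct or group.ngroups
            }
        }
        String rows = params.get("rows");
        if (rows != null) {
            try {
                int n = Integer.parseInt(rows);
                if (n < 0 || n > MAX_ROWS) {
                    return false;
                }
            } catch (NumberFormatException notANumber) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Map.of("q", "*:*", "rows", "10")));
        System.out.println(isAllowed(Map.of("q", "*:*", "rows", "100000000")));
    }
}
```

Unlike a denylist, this rejects any parameter it has not been told about, so new expensive features stay blocked by default.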



Re: How to block expensive solr queries

weiwang19
On Wed, Oct 9, 2019 at 9:59 AM Wei <[hidden email]> wrote:

> Thanks all. I debugged a bit and saw that timeAllowed does not limit stats
> calls. I also think it would be useful for Solr to support a white list or
> black list of operations, as Toke suggested. I will create a Jira issue for it.
> Currently the only option to explore seems to be adding a filter to Solr's
> embedded Jetty.  Does anyone have experience doing that? Do I also need to
> change SolrDispatchFilter?