Query of death? Collapsing Query Parser - Solr 7.5

IZaBEE_Keeper
Hi..

I'm wondering if I've found a query of death or just a really expensive
query.. It's killing my Solr with OOM..

Collapsing query parser using:
fq={!collapse field=domain nullPolicy=expand}

Everything works fine using words & phrases.. However, as soon as there are
numbers involved it crashes out with the OOM killer..

The server has nowhere near enough RAM for the index of 800GB & 150M docs..

A dismax query like '1 2 s 2 s 3 e d 4 r f 3 e s 7 2 1 4 6 7 8 2 9 0 3'
will make it crash..

fq={!collapse field=domain nullPolicy=expand}
PhraseFields( 'content^0.05 description^0.03 keywords^0.03 title^0.05 url^0.06' )
BoostQuery( 'host:"' . $q . '"^0.6 host:"twitter.com"^0.35 domain:"' . $q . '"^0.6' )

Without the fq it works just fine and only times out on extreme queries..
eventually it finds them..

Do I just need more RAM or is there another way to prevent Solr from
crashing?

Solr 7.5, 24GB RAM, 16GB heap, with SSD LV..



-----
Bee Keeper at IZaBEE.com

Re: Query of death? Collapsing Query Parser - Solr 7.5

Michael Gibney
Would you be willing to share your query-time analysis chain config, and
perhaps the "debug=true" (or "debug=query") output for successful queries
of a similar nature to the problematic ones? Also, re: "only times out on
extreme queries" -- what do you consider to be an "extreme query", in this
context?

On Mon, Mar 25, 2019 at 10:06 PM IZaBEE_Keeper wrote:

> Without the fq it works just fine and only times out on extreme queries..
> eventually it finds them..

Re: Query of death? Collapsing Query Parser - Solr 7.5

IZaBEE_Keeper
OK..

The intent is to collapse on the field domain..

Here's a query that works fine and the way I want with the Collapsing query
parser..

/select?defType=dismax&fl=score,content,description,keywords,title&fq={!collapse%20field=domain%20nullPolicy=expand}&pf=content^0.05%20description^0.03%20keywords^0.03%20title^0.05%20url^0.06&q=bernie+sanders&qf=title%20description%20keywords%20content%20url

This is a complex query with about 20 single-character terms, mixed alpha &
numeric..

/select?defType=dismax&fl=score,content,description,keywords,title&fq={!collapse%20field=domain%20nullPolicy=expand}&pf=content^0.05%20description^0.03%20keywords^0.03%20title^0.05%20url^0.06&q=1+2+e+3+s+a+d+f+r+4+5+t+g+6+7+8+7+1+2+3+6&qf=title%20description%20keywords%20content%20url

This query crashes Solr via the OOM process killer..
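
For readability, the parameters of the failing request above can be spelled out and encoded with Python's standard library. This is only a sketch to make the request easier to read and vary; the values are copied from the URL above, and the helper name build_query_string is made up for illustration..

```python
from urllib.parse import urlencode

def build_query_string(q):
    """Return the query string for the dismax + collapse request shown above."""
    params = {
        "defType": "dismax",
        "fl": "score,content,description,keywords,title",
        # The collapse filter that triggers the crash when combined with
        # many single-character terms.
        "fq": "{!collapse field=domain nullPolicy=expand}",
        "pf": "content^0.05 description^0.03 keywords^0.03 title^0.05 url^0.06",
        "q": q,
        "qf": "title description keywords content url",
    }
    # urlencode percent-encodes the local-params syntax and the boosts.
    return urlencode(params)

failing_q = "1 2 e 3 s a d f r 4 5 t g 6 7 8 7 1 2 3 6"
query_string = build_query_string(failing_q)

print("/select?" + query_string)
print(len(failing_q.split()), "terms in q")
```

Swapping failing_q for "bernie sanders" gives the working request, so the two differ only in the q parameter..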

Removing the collapsing query parser {!collapse field=domain
nullPolicy=expand} eliminates the problem; without it Solr never crashes on
any query in my testing.. A search of 20 alpha & numeric characters with
spaces is very slow though..

With the collapsing query parser the single numeric terms cause Solr to
crash.. using whole words works but is slow if there are too many terms..

The debug output on all successful queries shows no errors.. the default is
10 rows.. a cold search (not cached) on a 2 word phrase takes 2-4 seconds.
Adding more than 3-4 numbers with spaces to the search kills it..

There is no debug output for the failed queries as Solr is killed by the
process killer..

Extreme queries are long multi-term queries or long queries of single numbers
& letters with spaces in between.  Something like '1 3 s 2 c 4 5 t s 5 6 3 a
s 4 e 6 1 4 3 2 4 5 6 ' will cause it to search for all those individual
terms, which are likely to be very frequent.. This type of query seems to
make Solr work really hard..

While it's not likely that users would make such searches, I need to prevent
Solr from crashing with the collapsing query parser.. This type of query can
cause a heavy load on various types of search systems and can be used in DoS
attacks targeting search systems.. You can try a 20 term query made of
numbers & letters with spaces between to see what I mean if you have a 100M
doc index handy..

I can try to prevent these types of queries through the search API by
rewriting the user input.. However, if there is a way to make Solr time out
instead of being killed that would be preferable.. Otherwise I'll have to
find a different way to limit the number of results per domain..

I have some more RAM to put in the server tomorrow, that might help..  I
don't mind if the complex searches are slow.. but crashing out is not good..
especially with the process killer killing Solr completely..

Currently this is on a master/slave setup, 150M docs, 800GB, 24GB RAM, 16GB
heap..
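
Some back-of-the-envelope arithmetic on the numbers above, as a sketch only. The RAM, heap, index size and doc count come from this post; the per-document structures (one bit or one 4-byte value per document) are generic illustrations of how memory scales with 150M docs, and it is an assumption, not something confirmed here, that the collapse filter allocates structures of that shape..

```python
GB = 10**9

ram_bytes = 24 * GB
heap_bytes = 16 * GB
index_bytes = 800 * GB
num_docs = 150_000_000

# Memory left outside the JVM heap, shared by the OS, other JVM overhead
# and the page cache for the index files.
left_outside_heap = ram_bytes - heap_bytes
cache_fraction = left_outside_heap / index_bytes

# Illustrative per-query costs of structures sized by the document count.
bitset_bytes = num_docs // 8       # one bit per document
int_array_bytes = num_docs * 4     # one 4-byte value per document

print(f"RAM outside heap: {left_outside_heap / GB:.0f} GB")          # 8 GB
print(f"Share of index that fits there: {cache_fraction:.1%}")       # 1.0%
print(f"One bit per doc: {bitset_bytes / 10**6:.2f} MB")             # 18.75 MB
print(f"One 4-byte value per doc: {int_array_bytes / 10**6:.0f} MB") # 600 MB
```

So at most about 1% of the index can sit in memory outside the heap, and any structure holding a 4-byte value per document costs around 600 MB per query that allocates it..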


