Query of Death Lucene/Solr 7.6

Markus Jelsma-2
Hello (apologies for cross-posting),

While working on SOLR-12743, running 7.6 on two nodes and 7.2.1 on the remaining four, we stumbled upon a situation where the 7.6 nodes quickly succumb when a 'Query-of-Death' is issued. Versions 7.2.1 up to 7.5 are all unaffected (tested and confirmed).

Following Smiley's suggestion, I used Eclipse MAT to find the problem in the heap dump I obtained. This fantastic tool revealed within minutes that a single query thread consumed 65 % of all resources. In its class variables I could find the query and reproduce the problem.

The problematic query is 'dubbele dijk/rijke dijkproject in het dijktracé eemshaven-delfzijl'. On 7.6 this input produces a 40+ MB toString() output in edismax's newFieldQuery. If the node survives, the query takes 2+ seconds to run (150 ms otherwise). If I disable all query-time SynonymGraphFilters, it still takes a second and produces a 9 MB toString() for the query.

I could not find anything like this in Jira. I did think of LUCENE-8479 and LUCENE-8531, but they were about graphs; this problem looked related, though.

I think I tracked it further down to LUCENE-8589 or SOLR-12243. When I leave Solr's edismax pf parameter empty, everything runs fast. When all fields are configured for pf, the node dies.

I am now unsure whether this is a Solr or a Lucene issue.

Please let me know.

Many thanks,
Markus

PS: in Solr I even got an 'Impossible Exception', my first!
Re: Query of Death Lucene/Solr 7.6

Michael Gibney
Hi Markus,
As of 7.6, LUCENE-8531 <https://issues.apache.org/jira/browse/LUCENE-8531>
reverted a graph/Spans-based phrase query implementation (introduced in 6.5
-- LUCENE-7699 <https://issues.apache.org/jira/browse/LUCENE-7699>) to an
implementation that builds a separate phrase query for each possible
enumerated path through the graph described by a parsed query.
The potential for combinatorial explosion of the enumerated approach was (as
far as I can tell) one of the main motivations for introducing the
Spans-based implementation. Some real-world use cases would be good to
explore. Markus, could you send (as an attachment) the debug toString() for
the queries with/without synonyms enabled? I'm also guessing you may have
WordDelimiterGraphFilter on the query analyzer?
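
To illustrate why enumerating every path can blow up, here is a minimal Python sketch. It is not Lucene's implementation: it assumes a simplified token graph in which each position has a flat list of alternatives (real graphs can contain multi-token synonyms that span positions), and the alternatives and the function names are made up for illustration.

```python
from itertools import product
from math import prod


def count_paths(alternatives_per_position):
    """Number of distinct paths: the product of alternatives at each position."""
    return prod(len(alts) for alts in alternatives_per_position)


def enumerate_phrases(alternatives_per_position):
    """Build one phrase per path, as an enumerating approach would have to."""
    return [" ".join(path) for path in product(*alternatives_per_position)]


def total_phrase_queries(alternatives_per_position, num_pf_fields):
    """Each pf field gets its own set of enumerated phrases."""
    return count_paths(alternatives_per_position) * num_pf_fields


# Hypothetical alternatives per position (synonyms, word-delimiter splits).
graph = [
    ["dubbele", "dubbel"],
    ["dijk", "dam", "waterkering"],
    ["project", "plan"],
]

phrases = enumerate_phrases(graph)
print(count_paths(graph), "paths for one field")
print(total_phrase_queries(graph, num_pf_fields=10), "phrase queries for 10 pf fields")
```

Three positions with 2, 3 and 2 alternatives already give 12 phrases per field. The count multiplies with every further position that has alternatives, and again with every field listed in pf.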
As an alternative to disabling pf, LUCENE-8531 only reverts to the
enumerated approach for phrase queries where slop>0, so setting ps=0 would
probably also help.
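
For comparing the two workarounds, a small Python sketch that only builds the request parameters may help. The parameters defType, q, qf, pf, ps and debug are standard Solr request parameters; the field names (title, content), the host and the collection name are placeholders and not taken from Markus' setup.

```python
from urllib.parse import urlencode

QUERY = "dubbele dijk/rijke dijkproject in het dijktracé eemshaven-delfzijl"


def edismax_params(pf=None, ps=None):
    """Build edismax parameters; pf and ps are only sent when given."""
    params = {
        "defType": "edismax",
        "q": QUERY,
        "qf": "title content",  # placeholder field names
        "debug": "query",       # ask Solr to return the parsed query
    }
    if pf:
        params["pf"] = pf
    if ps is not None:
        params["ps"] = str(ps)
    return params


# Variant 1: no phrase fields at all (the workaround from the original post).
no_pf = edismax_params()

# Variant 2: keep phrase fields but set phrase slop to 0.
zero_slop = edismax_params(pf="title content", ps=0)

# Placeholder host and collection name.
base = "http://localhost:8983/solr/mycollection/select?"
for name, params in (("no pf", no_pf), ("ps=0", zero_slop)):
    print(name, base + urlencode(params))
```

Sending both URLs to a 7.6 node and comparing the size of the parsed query in the debug section should show whether ps=0 avoids the blow-up as well as leaving pf empty does.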
Michael

On Fri, Feb 8, 2019 at 5:57 AM Markus Jelsma <[hidden email]>
wrote:

> [original message, quoted in full above]