QueryParser changes query by itself

Bernd Fehling
We just noticed a very strange problem with the Solr 6.4.2 QueryParser.
From time to time the QueryParser changes the query by itself.
This happens when a search request is reloaded several times at a high rate.

Good example:
...
<str name="q">textth:waffenhandel</str>
  <result name="response" numFound="85" start="0">
...
<str name="rawquerystring">textth:waffenhandel</str>
<str name="querystring">textth:waffenhandel</str>
  <str name="parsedquery">+SynonymQuery(Synonym(textth:"arms sales" textth:"arms trade"...
  <str name="parsedquery_toString">+Synonym(textth:"arms sales" textth:"arms trade"...


Bad example:
...
<str name="q">textth:waffenhandel</str>
  <result name="response" numFound="20459" start="0">
...
<str name="rawquerystring">textth:waffenhandel</str>
<str name="querystring">textth:waffenhandel</str>
  <str name="parsedquery">+textth:rss</str>
  <str name="parsedquery_toString">+textth:rss</str>

As you can see in the bad example, after several reloads the parsedquery changed to the term "rss".
But the original query string contains no "rss" substring at all. That is really strange.

Anyone seen this before?

Single index, Solr 6.4.2.

Regards
Bernd

Re: QueryParser changes query by itself

Ahmet Arslan
Hi Bernd,

LUCENE-3758 added a new member field to the ComplexPhraseQuery class, but we didn't change its hashCode method accordingly. This caused anomalies in Solr; Yonik found the bug and fixed hashCode. Your e-mail reminded me of this.
Could it be the QueryCache and the hashCode implementation of the Query subclasses?
Maybe your good and bad examples produce the same hashCode, and this confuses the query cache in Solr?
Can you disable the query cache to test it?
By the way, which query parser are you using? I believe SynonymQuery is produced by BM25 similarity, right?
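
To make the suspicion concrete, here is a minimal sketch of how a cache keyed on query objects can return the wrong entry when a newly added field is left out of equals/hashCode. The class DemoPhraseQuery and the field name inOrder are made up for illustration; this is not the Lucene code, and Solr's caches are more involved than a HashMap. Note that a matching hashCode alone is not enough for a wrong hit: equals must also report the two queries as equal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical query class for illustration only; this is not Lucene's ComplexPhraseQuery.
final class DemoPhraseQuery {
    final String field;
    final String phrase;
    final boolean inOrder; // member added later; name chosen for illustration

    DemoPhraseQuery(String field, String phrase, boolean inOrder) {
        this.field = field;
        this.phrase = phrase;
        this.inOrder = inOrder;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DemoPhraseQuery)) return false;
        DemoPhraseQuery other = (DemoPhraseQuery) o;
        // BUG: inOrder is not compared
        return field.equals(other.field) && phrase.equals(other.phrase);
    }

    @Override
    public int hashCode() {
        // BUG: inOrder is not hashed
        return Objects.hash(field, phrase);
    }
}

final class StaleCacheDemo {
    // Returns what a HashMap-based cache hands back for a query that was never cached.
    static String lookupDifferentQuery() {
        Map<DemoPhraseQuery, String> cache = new HashMap<>();
        cache.put(new DemoPhraseQuery("textth", "arms trade", true), "hits for inOrder=true");
        // Differs only in inOrder, so the cache treats it as the same key.
        return cache.get(new DemoPhraseQuery("textth", "arms trade", false));
    }

    public static void main(String[] args) {
        System.out.println(lookupDifferentQuery()); // prints "hits for inOrder=true"
    }
}
```

If this were the cause, the parsed query shown in the debug output would still be correct and only the results would be wrong, which is one way to tell it apart from a parsing problem.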

Ahmet


On Friday, August 11, 2017, 2:48:07 PM GMT+3, Bernd Fehling <[hidden email]> wrote:

> We just noticed a very strange problem with Solr 6.4.2 QueryParser.
> [...]

Re: QueryParser changes query by itself

Bernd Fehling
Hi Ahmet,

thank you for your reply. I was also looking at the QueryCache, but
with your hint about LUCENE-3758 I have a better place to start.

When the system is under high load and the QueryCache is filled, I see
a higher rate of changed queries.
In debug mode the "timing-->process-->query" of changed queries is always "0" (zero).

The query parser "SynonymQParser" is self-developed and uses QParserPlugin.
It has no caching inside and has worked for years.
It was only recompiled against recent Lucene/Solr, with some modifications such as
using Builder with newer Lucene versions.

I will test without the query cache.
Which one should be disabled, the Query Result Cache?

Regards
Bernd


On 15.08.2017 at 19:07, Ahmet Arslan wrote:

> Hi Bernd,
> [...]

Re: QueryParser changes query by itself

Bernd Fehling
My class SynonymQParser, which calls SolrQueryParserBase.parse:

class SynonymQParser extends QParser {
    protected SolrQueryParser sqparser;
    ...
    @Override
    public Query parse() throws SyntaxError {
        ...
        sqparser = new SolrQueryParser(this, defaultField);
        sqparser.setEnableGraphQueries(false);
        sqparser.setEnablePositionIncrements(false);
        ...
        Query synquery = sqparser.parse(qstr);
        ...

And this is SolrQueryParserBase with method parse:

public abstract class SolrQueryParserBase extends QueryBuilder {
    ...
    public Query parse(String query) throws SyntaxError {
        ReInit(new FastCharStream(new StringReader(query)));
        try {
          // TopLevelQuery is a Query followed by the end-of-input (EOF)
          Query res = TopLevelQuery(null);  // pass null so we can tell later if an explicit field was provided or not
          return res!=null ? res : newBooleanQuery().build();
        }
        ...


The String variable "query" going into the parse method is always "textth:waffenhandel"!
With a breakpoint at "return", the Query variable "res" sometimes changes to a
TermQuery with term "textth:rss" instead of being a SynonymQuery.

This is strange!

What does ReInit, right before the try block, do? Is that a cache lookup?

Or is the problem in TopLevelQuery?

Regards
Bernd


On 16.08.2017 at 09:06, Bernd Fehling wrote:

> Hi Ahmet,
> [...]

Re: QueryParser changes query by itself

Yonik Seeley
The queryCache shouldn't be involved; this is somehow an issue in
parsing (and Solr doesn't currently cache parsing).
Perhaps there is something shared in your SynonymQParser instances
that isn't quite thread safe?
It could also be something in the text analysis in Lucene
(related to the new graph stuff?)

-Yonik


On Wed, Aug 16, 2017 at 7:32 AM, Bernd Fehling <[hidden email]> wrote:

> My class SynonymQParser which calls SolrQueryParserBase.parse :
> [...]

Re: QueryParser changes query by itself [solved]

Bernd Fehling
Finally I solved the problem :-)

I don't know whether it's a bug or a feature in org.apache.lucene.util.QueryBuilder,
but I solved it in my Filter code, which feels like a dirty hack.

The TokenStream API docs say:
https://lucene.apache.org/core/6_4_2/core/org/apache/lucene/analysis/TokenStream.html
...
The workflow of the new TokenStream API is as follows:
 1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
 2. The consumer calls reset().
 3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
 4. The consumer calls incrementToken() until it returns false consuming the attributes after each call.
 5. The consumer calls end() so that any end-of-stream operations can be performed.
 6. The consumer calls close() to release any resource when finished using the TokenStream.


But the QueryBuilder only calls "stream.reset()"; it never calls "stream.end()", so Filters
in the Analyzer chain can't do any cleanup (as my Filter wanted to do).
I moved my cleanup into reset(), which feels like a dirty hack.
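
The Filter itself is not shown in this thread, so the following is only a guess at the kind of failure involved, written in plain Java without any Lucene classes. DemoExpandingFilter, its pending buffer and the "_alt" suffix are all hypothetical. The sketch assumes a reused filter instance that buffers extra tokens and relies on end() to clear them; if a consumer stops early, the leftover token shows up in the next request unless reset() clears it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Plain-Java stand-in for a reusable, stateful filter. It only mimics the
// reset()/incrementToken()/end() life cycle; it is not Lucene's TokenFilter.
class DemoExpandingFilter {
    private final boolean clearInReset;
    private final Deque<String> pending = new ArrayDeque<>(); // buffered extra tokens
    private Iterator<String> input = Collections.emptyIterator();
    String term; // current token, valid after incrementToken() returned true

    DemoExpandingFilter(boolean clearInReset) {
        this.clearInReset = clearInReset;
    }

    // The same instance is handed new input for every request.
    void setInput(List<String> tokens) {
        input = tokens.iterator();
    }

    void reset() {
        if (clearInReset) pending.clear(); // cleanup moved into reset()
    }

    boolean incrementToken() {
        if (!pending.isEmpty()) {          // emit buffered tokens first
            term = pending.poll();
            return true;
        }
        if (!input.hasNext()) return false;
        term = input.next();
        pending.add(term + "_alt");        // queue a made-up variant of the token
        return true;
    }

    void end() {
        pending.clear();                   // runs only if the consumer calls end()
    }
}

class StaleStateDemo {
    // The first request stops after one token and never calls end();
    // returns the first token the second request sees.
    static String firstTokenOfSecondRequest(boolean clearInReset) {
        DemoExpandingFilter filter = new DemoExpandingFilter(clearInReset);
        filter.setInput(Arrays.asList("rss"));
        filter.reset();
        filter.incrementToken();

        filter.setInput(Arrays.asList("waffenhandel"));
        filter.reset();
        filter.incrementToken();
        return filter.term;
    }

    public static void main(String[] args) {
        System.out.println(firstTokenOfSecondRequest(false)); // prints "rss_alt"
        System.out.println(firstTokenOfSecondRequest(true));  // prints "waffenhandel"
    }
}
```

As far as I read the TokenStream Javadoc, reset() is described as the method that returns a stream to a clean state and that stateful implementations must implement in order to be reusable, so clearing per-request state there may be less of a hack than it feels.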


In my opinion, Lucene's QueryBuilder should call "stream.end()" after consuming the stream:
...
   stream.reset();
   while (stream.incrementToken()) {
       numTokens++;
       ...
   }
   stream.end();
...


Regards
Bernd


On 16.08.2017 at 15:26, Yonik Seeley wrote:

> The queryCache shouldn't be involved, this is somehow an issue in
> parsing (and Solr doesn't currently cache parsing).
> [...]

Re: QueryParser changes query by itself [solved]

sarowe
Hi Bernd,

> On Aug 22, 2017, at 4:31 AM, Bernd Fehling <[hidden email]> wrote:
>
> But the QueryBuilder only calls "stream.reset()", it never calls "stream.end()" so that Filters
> in the Analyzer chain can't do any cleanup (like my Filter wanted to do).
> I moved my "cleanup" into reset() which feels like a dirty hack.
>
>
> My opinion, in lucene QueryBuilder there should be a "stream.end()" after consuming the stream:
> ...
>   stream.reset();
>   while (stream.incrementToken()) {
>       numTokens++;
>       ...
>   }
>   stream.end();
> ...

The stream here is a CachingTokenFilter wrapping the passed-in TokenStream. On the first call to cache.incrementToken(), CachingTokenFilter's cache is populated by exhausting the wrapped stream and then calling its end() method, so the wrapped stream's end() does get called.
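
A simplified model of that behavior, in plain Java rather than the actual Lucene classes, might look like the sketch below. MiniStream, ListStream and CachingStream are names invented for this example; only the order of operations (exhaust the wrapped stream, call its end(), then replay from the cache) is meant to mirror what is described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Minimal stream interface for this sketch; not Lucene's TokenStream.
interface MiniStream {
    boolean incrementToken();
    String term();
    void end();
}

// Wrapped stream that records whether end() was called on it.
class ListStream implements MiniStream {
    private final Iterator<String> tokens;
    private String term;
    boolean endCalled = false;

    ListStream(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    public boolean incrementToken() {
        if (!tokens.hasNext()) return false;
        term = tokens.next();
        return true;
    }

    public String term() { return term; }

    public void end() { endCalled = true; }
}

// Caching wrapper: the first incrementToken() drains the input and ends it.
class CachingStream implements MiniStream {
    private final MiniStream input;
    private List<String> cache;        // null until the first incrementToken()
    private Iterator<String> replay;
    private String term;

    CachingStream(MiniStream input) {
        this.input = input;
    }

    public boolean incrementToken() {
        if (cache == null) {
            cache = new ArrayList<>();
            while (input.incrementToken()) {
                cache.add(input.term());
            }
            input.end();               // wrapped stream is ended here
            replay = cache.iterator();
        }
        if (!replay.hasNext()) return false;
        term = replay.next();
        return true;
    }

    public String term() { return term; }

    public void end() {
        // Nothing to forward in this sketch: the input was already ended above.
    }
}

class CachingDemo {
    // True if the wrapped stream has been ended after a single incrementToken() call.
    static boolean innerEndedAfterFirstToken() {
        ListStream inner = new ListStream(Arrays.asList("arms", "trade"));
        CachingStream cached = new CachingStream(inner);
        cached.incrementToken();
        return inner.endCalled;
    }

    public static void main(String[] args) {
        System.out.println(innerEndedAfterFirstToken()); // prints "true"
    }
}
```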

--
Steve
www.lucidworks.com