Query in quotes cannot find results

Query in quotes cannot find results

Permakoff, Vadim
Hi,
This might be a known issue, but I cannot find a reference for this specific case: searching for an exact phrase query with synonyms and stopwords.

I have a simple configuration for a catch-all field:

    <fieldType name="text_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

The synonyms.txt file has only one line:
expand,blow up

The stopwords.txt file has only one line:
the

There is only one document:
{
   "id":"1",
    "title":"to expand the methods for mailing cancellation"
}

Everything else is the default basic configuration. Tested with Solr 6.5.1 and Solr 8.5.2.

The basic query q=expand the methods   <<< finds the document,
the query (in quotes) q="expand the methods"   <<< cannot find the document

Am I doing something wrong, or is it a known bug (I saw similar issues discussed in the past, but not for an exact-match query)? If it is, what is the Jira for it?

Best Regards,
Vadim Permakoff


________________________________

This email is intended solely for the recipient. It may contain privileged, proprietary or confidential information or material. If you are not the intended recipient, please delete this email and any attachments and notify the sender of the error.

Re: Query in quotes cannot find results

Shawn Heisey-2
On 6/29/2020 3:34 PM, Permakoff, Vadim wrote:
> The basic query q=expand the methods   <<< finds the document,
> the query (in quotes) q="expand the methods"   <<< cannot find the document
>
> Am I doing something wrong, or is it a known bug (I saw similar issues discussed in the past, but not for an exact-match query)? If it is, what is the Jira for it?

The most helpful information will come from running both queries with
debug enabled, so you can see how the query is parsed.  If you add a
parameter "debugQuery=true" to the URL, then the response should include
the parsed query.  Compare those, and see if you can tell what the
differences are.
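
As a rough illustration of the comparison described above, the two request URLs could be built like this. This is only a sketch: the host, port, and collection name (`mycollection`) are placeholders that do not come from this thread, and `debug_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port and collection name.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def debug_url(query: str) -> str:
    """Build a /select URL that asks Solr to include debug output."""
    params = {"q": query, "debugQuery": "true"}
    # urlencode escapes the spaces and the double quotes in the query.
    return SOLR_SELECT + "?" + urlencode(params)


unquoted_url = debug_url("expand the methods")
quoted_url = debug_url('"expand the methods"')

print(unquoted_url)
print(quoted_url)
```

Fetching both URLs and comparing the "parsedquery" entries in the debug section of each response should show how the two queries differ.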

One of the most common problems for queries like this is that you're not
searching the field that you THINK you're searching.  I don't know
whether this is the problem, I just mention it because it is a common error.

Thanks,
Shawn

RE: Query in quotes cannot find results

Permakoff, Vadim
Hi Shawn,
Many thanks for the response, I checked the field and it is correct. Let's call it _text_ to make it easier.
I believe the parsing is also correct, please see below:
 - Query without quotes (works):
    "querystring":"expand the methods",
    "parsedquery":"(PhraseQuery(_text_:\"blow up\") _text_:expand) _text_:methods",

 - Query with quotes (does not work):
    "querystring":"\"expand the methods\"",
    "parsedquery":"SpanNearQuery(spanNear([spanOr([spanNear([_text_:blow, _text_:up], 0, true), _text_:expand]), _text_:methods], 0, true))",

The document has text:
"to expand the methods for mailing cancellation"

The analysis on this field shows that all words are present in both the index and the query, and the order is also correct, but the word "methods" is moved one position. I guess that's why the result is not found.
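
The position shift can be illustrated with a toy model. This is not Lucene's code: it assumes that removing a stopword at index time leaves a gap in the token positions, and that the span query shown above (slop 0, in order) requires "methods" to sit immediately after "expand". The helper names are made up for the sketch.

```python
STOPWORDS = {"the"}  # mirrors the one-line stopwords.txt


def index_positions(text):
    """Toy index analyzer: split on whitespace, lowercase, drop stopwords,
    but keep the original positions so a removed stopword leaves a gap."""
    positions = {}
    for pos, token in enumerate(text.lower().split()):
        if token not in STOPWORDS:
            positions.setdefault(token, []).append(pos)
    return positions


def ordered_near(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


doc = index_positions("to expand the methods for mailing cancellation")
print(doc["expand"], doc["methods"])
print(ordered_near(doc, "expand", "methods", slop=0))
print(ordered_near(doc, "expand", "methods", slop=1))
```

In this model "expand" and "methods" end up two positions apart, so a slop-0 ordered match fails while a slop of 1 succeeds.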

Best Regards,
Vadim Permakoff





Re: Query in quotes cannot find results

Erick Erickson
Looks like you’re removing stopwords. Stopwords cause issues like this with the positions being off.

It’s becoming more and more common to _NOT_ remove stopwords, is that an option?



Best,
Erick


RE: Query in quotes cannot find results

Permakoff, Vadim
Hi Erick,
That's what I did in the past, but this is an enterprise search and I have a requirement to remove the stopwords.
To have both features I can add synonyms in the front-end application. I know it will work, but I need a justification for doing it in the application, as it is additional effort.
I thought there might be a bug filed for this case that I could refer to, because according to the documentation it should work, right?
Anyway, there is more to it. If I add the same synonym processing to the index analyzer, i.e. the configuration looks like this:

    <fieldType name="text_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

The analysis now shows matching output for the index and query paths, but the exact-match result still cannot be found! This is weird.
Any thoughts?

Best Regards,
Vadim Permakoff



Re: Query in quotes cannot find results

Erick Erickson
Well, the first thing is that you haven’t included FlattenGraphFilterFactory in the index analysis chain, see: https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#synonym-graph-filter. I don’t know whether that actually pertains, but I’d reindex with that included before pursuing this further.

Second, “I have a requirement to remove the stopwords”. Why? Who thinks it’s necessary? Is there any evidence for this or any use-case that shows it _is_ necessary? Removing stopwords became common in the long-ago days when memory and disk capacity were vastly more constrained than now. At this point, I require proof that it’s _necessary_ to remove them before accepting this kind of requirement.

There are situations where removing stopwords is worth the difficulty it causes. But I’ve seen far too many unnecessary requirements to let that one pass without pushing back ;).

You can also hack around this by adding slop to the phrase. Perhaps you can get “good enough” results by adding one unit of slop for every stopword: if the input is "expand the methods", detect that there’s one stopword and change it to "expand the methods"~1. That’ll introduce other problems, of course.
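
A minimal sketch of that slop hack, done in the client before the query is sent to Solr. The function name `add_slop_for_stopwords` is made up for illustration, the stopword set simply mirrors the one-line stopwords.txt from the original post, and the whitespace tokenization is a simplification of what the real analyzer does.

```python
STOPWORDS = {"the"}  # mirrors the one-line stopwords.txt


def add_slop_for_stopwords(phrase: str) -> str:
    """Quote the phrase and append ~N, where N is the number of stopwords in it."""
    words = phrase.lower().split()
    removed = sum(1 for word in words if word in STOPWORDS)
    quoted = '"' + phrase + '"'
    # Leave the phrase exact when no stopword would be removed.
    return f"{quoted}~{removed}" if removed else quoted


print(add_slop_for_stopwords("expand the methods"))
print(add_slop_for_stopwords("expand methods"))
```

As noted above, slop loosens the match in general, so a phrase rewritten this way may also match documents where some other word sits between the terms.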

Best,
Erick


Re: Query in quotes cannot find results

Walter Underwood
In reply to this post by Permakoff, Vadim
Removing stopwords is a dumb requirement. “Doctor, it hurts when I shove hedgehogs up my arse.”

Part of our job as search engineers is to solve the real problem, not implement a pile of requirements from people who don’t understand how search works.

Here is an article I wrote 13 years ago about why we didn’t remove stopwords at Netflix.

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


RE: Query in quotes cannot find results

Permakoff, Vadim
In reply to this post by Erick Erickson
Hi Erick,
Thank you for the suggestion, I should have added it. Actually, before asking this question here, I tried adding and removing the FlattenGraphFilterFactory, plus other variations, like expand / not expand and autoGeneratePhraseQueries / not autoGeneratePhraseQueries. It just does not work with this particular example. You can try it yourself.

Regarding removing the stopwords, I agree, there are many cases when you don't want to remove the stopwords, but there is one very compelling case when you want them to be removed.

Imagine you have one document with the following text:
1. "to expand the methods for mailing cancellation"
And another document with the text:
2. "to expand methods for mailing cancellation"

The user query is (without quotes): q=expand the methods for mailing cancellation
I don't want to return every document that matches under q.op=OR, because that finds too many unrelated documents, so I want to search with q.op=AND. Unfortunately, if stopwords are kept, document 2 will not be found because it does not contain the stopword "the".
What should I do now?
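
To make the scenario concrete, here is a toy model of AND matching over the two documents. It only mimics q.op=AND with whitespace tokenization; it is not how Solr evaluates the query, and the helper names are made up for the sketch.

```python
STOPWORDS = {"the"}


def terms(text, remove_stopwords):
    """Lowercased whitespace tokens, optionally without stopwords."""
    tokens = text.lower().split()
    return {t for t in tokens if not (remove_stopwords and t in STOPWORDS)}


def matches_all(query, doc, remove_stopwords):
    """q.op=AND in miniature: every query term must occur in the document."""
    return terms(query, remove_stopwords) <= terms(doc, remove_stopwords)


docs = {
    1: "to expand the methods for mailing cancellation",
    2: "to expand methods for mailing cancellation",
}
query = "expand the methods for mailing cancellation"

with_stopwords_kept = [i for i, d in docs.items() if matches_all(query, d, False)]
with_stopwords_removed = [i for i, d in docs.items() if matches_all(query, d, True)]

print(with_stopwords_kept)
print(with_stopwords_removed)
```

With stopwords kept, only document 1 satisfies every term; with "the" removed on both sides, both documents match.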

Best Regards,
Vadim Permakoff



RE: Query in quotes cannot find results

Permakoff, Vadim
In reply to this post by Walter Underwood
Hi Walter,
I'm with you; sometimes the stopwords are very important. A few years back, just for fun, I built a Solr demo for Wikipedia search, and as you can see, nothing is removed:
http://www.softcorporation.com/lab/solr/wiki/?sq=to+be+or+not+to+be

But with enterprise search, sometimes you are better off removing the stopwords; I explained why in my reply to Erick.
My question is not "Should we remove the stopwords?", my question is: "Apparently the synonyms with spaces are not working if we are removing the stopwords. Is there a way to fix it, or is there a Jira for it?"

Best Regards,
Vadim Permakoff


-----Original Message-----
From: Walter Underwood <[hidden email]>
Sent: Tuesday, June 30, 2020 12:50 PM
To: [hidden email]
Subject: Re: Query in quotes cannot find results

Removing stopwords is a dumb requirement. “Doctor, it hurts when I shove hedgehogs up my arse.”

Part of our job as search engineers is to solve the real problem, not implement a pile of requirements from people who don’t understand how search works.

Here is an article I wrote 13 years ago about why we didn’t remove stopwords at Netflix.

https://urldefense.proofpoint.com/v2/url?u=https-3A__observer.wunderwood.org_2007_05_31_do-2Dall-2Dstopword-2Dqueries-2Dmatter_&d=DwIFaQ&c=birp9sjcGzT9DCP3EIAtLA&r=T7Y0P9fY-fUzzabuVL6cMrBieBBqDIpnUbUy8vL_a1g&m=kjHjId_IfQN_w0ISSEAUWfFIrgqEl2H7YiZSx22eRys&s=RhKQkdqdNNyweNUackNjcCPnj-0ahUz7oHjupG4v9yM&e= 

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)



Re: Query in quotes cannot find results

Walter Underwood
In reply to this post by Permakoff, Vadim
This is exactly why the “mm” (minimum match) parameter exists, to reduce the number of hits with fewer matches. Think of it as a sliding scale between OR and AND.

On the other hand, I don’t usually worry about hits with fewer matches. Those are not on the first page, so I don’t care.

In general, you can optimize either for more related hits or for fewer unrelated hits. Everything you do to reduce the unrelated hits will cause some related hits to not match.

Also, do all of your tuning with real user queries from logs. Making up queries for testing will lead to fixing problems that never occur in production and to missing problems that do occur.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)
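
As a rough illustration of the sliding scale between OR and AND, here is a small sketch. It assumes a percentage value for "mm" with the computed count rounded down, uses naive whitespace matching instead of a real analysis chain, and the host and collection name in the URL are hypothetical. `required_matches` and `matches` are illustrative helpers, not Solr APIs.

```python
from urllib.parse import urlencode


def required_matches(num_clauses, mm_percent):
    """Number of clauses that must match for a percentage mm (rounded down)."""
    return num_clauses * mm_percent // 100


def matches(doc_text, query_text, mm_percent):
    """Naive check: does the document contain enough of the query terms?"""
    doc_terms = set(doc_text.lower().split())
    query_terms = query_text.lower().split()
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= required_matches(len(query_terms), mm_percent)


query = "expand the methods for mailing cancellation"
doc1 = "to expand the methods for mailing cancellation"
doc2 = "to expand methods for mailing cancellation"
unrelated = "mailing address update"

for mm in (100, 80):
    print(mm, matches(doc1, query, mm), matches(doc2, query, mm), matches(unrelated, query, mm))

# Request parameters for the same idea; host and collection are assumptions.
params = {"defType": "edismax", "q": query, "mm": "80%"}
url = "http://localhost:8983/solr/mycollection/select?" + urlencode(params)
print(url)
```

With mm at 100% the second document is dropped because it lacks "the"; at 80% both documents match, while the unrelated one still does not.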

> On Jun 30, 2020, at 11:07 AM, Permakoff, Vadim <[hidden email]> wrote:
>
> Hi Erick,
> Thank you for the suggestion, I should have added it. Actually, before asking this question here, I tried adding and removing the FlattenGraphFilterFactory, plus other variations, like expand / not expand, autoGeneratePhraseQueries / not autoGeneratePhraseQueries - it just does not work with this particular example. You can try it yourself.
>
> Regarding removing the stopwords, I agree, there are many cases when you don't want to remove the stopwords, but there is one very compelling case when you want them to be removed.
>
> Imagine, you have one document with the following text:
> 1. "to expand the methods for mailing cancellation"
> And another document with the text:
> 2. "to expand methods for mailing cancellation"
>
> The user query is (without quotes): q=expand the methods for mailing cancellation
> I don't want to return every document that matches with q.op=OR, because that finds too many unrelated documents, so I want to search with q.op=AND. Unfortunately, document 2 will then not be found, because it does not contain the stopword "the".
> What should I do now?
>
> Best Regards,
> Vadim Permakoff


RE: Query in quotes cannot find results

Permakoff, Vadim
Thank you, Walter. I'll look into the “mm” (minimum match) parameter.

Best Regards,
Vadim Permakoff

