Search phrase not parsed properly

Search phrase not parsed properly

chester-3
I'm using Solr 6.3 and am having an issue with a certain search phrase.

When I search for the phrase "Perkins AND Will", the parsed query does not
include "Will". See debug info below.

select?q=firmname:(Perkins%20AND%20Will)

"debug":{
    "rawquerystring":"firmname:(Perkins AND Will)",
    "querystring":"firmname:(Perkins AND Will)",
    "parsedquery":"firmname:perkin",
    "parsedquery_toString":"firmname:perkin",
    "QParser":"LuceneQParser",

But if I search for "Johnson AND Perkins", the phrase is parsed
correctly.

select?q=firmname:(Johnson%20AND%20Perkins)

"debug":{
    "rawquerystring":"firmname:(Johnson AND Perkins)",
    "querystring":"firmname:(Johnson AND Perkins)",
    "parsedquery":"+firmname:johnson +firmname:perkin",
    "parsedquery_toString":"+firmname:johnson +firmname:perkin",
    "QParser":"LuceneQParser",
       
Can someone explain why this is and how to fix it?

Thanks.




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Search phrase not parsed properly

Shawn Heisey-2
On 1/10/2020 5:30 PM, chester wrote:

> I'm using solr 6.3 and am having an issue with a certain search phrase.
>
> When I search for the phrase "Perkins AND Will", the parsed query does not
> include "Will". See debug info below.
>
> select?q=firmname:(Perkins%20AND%20Will)
>
> "debug":{
>      "rawquerystring":"firmname:(Perkins AND Will)",
>      "querystring":"firmname:(Perkins AND Will)",
>      "parsedquery":"firmname:perkin",
>      "parsedquery_toString":"firmname:perkin",
>      "QParser":"LuceneQParser",

Best guess is that you have an analysis step that removes stopwords, and
that "will" is one of them.  That word is found in many stopword lists
that are available.
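To make the stopword explanation concrete, here is a toy Python model that reproduces both parsed queries from the debug output. It is a sketch under stated assumptions, not Solr's actual code: it assumes the field's query analyzer lowercases, removes English stopwords, and then stems. The "stemmer" is a crude stand-in that strips one trailing "s" (Solr's real Porter/English stemmers are far more involved), and `and_query` is a hypothetical helper that only imitates how the standard parser renders `field:(A AND B)`.

```python
# Toy model of a field's query-time analysis chain (assumptions, not the
# poster's real schema): lowercase -> stop filter -> crude stemmer.

# English stop list as shipped in Solr's lang/stopwords_en.txt
# (verify against your own copy of the file).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def analyze(text, stopwords):
    """Lowercase, drop stopwords, then strip one trailing 's' (toy stemmer)."""
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t not in stopwords]
    return [t[:-1] if t.endswith("s") and not t.endswith("ss") else t
            for t in tokens]


def and_query(field, terms, stopwords):
    """Hypothetical imitation of the parsed form of field:(T1 AND T2 ...).

    Each surviving token becomes a required (+) clause. A term whose analysis
    yields no tokens contributes no clause, and a lone remaining clause is
    shown without the '+', as in the debug output in the thread.
    """
    clauses = [f"{field}:{tok}"
               for term in terms
               for tok in analyze(term, stopwords)]
    if len(clauses) == 1:
        return clauses[0]
    return " ".join("+" + c for c in clauses)


print(and_query("firmname", ["Perkins", "Will"], STOPWORDS))
# -> firmname:perkin
print(and_query("firmname", ["Johnson", "Perkins"], STOPWORDS))
# -> +firmname:johnson +firmname:perkin
```

The point of the model is that "Will" is gone before the query is ever built: the stop filter emits no token for it, so there is no clause left to require.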

It is my opinion, shared by many here, that stopwords should not be
removed.  Removing them made sense in the distant past, when system
capacities were a lot smaller than they are today; the speedup was enough
to make it worth dealing with the downsides.  These days, system capacities
are much larger and there is usually no need to remove common stopwords.

Thanks,
Shawn

Re: Search phrase not parsed properly

Edward Ribeiro
In reply to this post by chester-3
Hi,

It looks like you are using the stop filter and 'will' is a stopword,
so it is removed by the analysis chain of the field. Please test the
analysis chain in the Solr Admin UI to see if this is the case.

Best,
Edward

On Fri, Jan 10, 2020 at 21:30, chester <[hidden email]>
wrote:

> I'm using solr 6.3 and am having an issue with a certain search phrase.
>
> When I search for the phrase "Perkins AND Will", the parsed query does not
> include "Will". See debug info below.
>
> select?q=firmname:(Perkins%20AND%20Will)
>
> "debug":{
>     "rawquerystring":"firmname:(Perkins AND Will)",
>     "querystring":"firmname:(Perkins AND Will)",
>     "parsedquery":"firmname:perkin",
>     "parsedquery_toString":"firmname:perkin",
>     "QParser":"LuceneQParser",
>
> But, if I search for "Johnson AND Perkins", then the phrase is parsed
> correctly.
>
> select?q=firmname:(Johnson%20AND%20Perkins)
>
> "debug":{
>     "rawquerystring":"firmname:(Johnson AND Perkins)",
>     "querystring":"firmname:(Johnson AND Perkins)",
>     "parsedquery":"+firmname:johnson +firmname:perkin",
>     "parsedquery_toString":"+firmname:johnson +firmname:perkin",
>     "QParser":"LuceneQParser",
>
> Can someone explain why this is and how to fix it?
>
> Thanks.

Re: Search phrase not parsed properly

chester-3
I checked the stopwords.txt file and it is empty. That means "will" is not a
stop word, correct?




Re: Search phrase not parsed properly

Edward Ribeiro
You have to check your managed-schema to see whether the field type defines a
stop filter and which stopword file it points to.

There's a folder named 'lang' with many files, one for each language. If
your field is configured for English, the filter will typically point to
lang/stopwords_en.txt. The stopwords.txt file is empty by default. You can
also test this with the Analysis screen in the Admin UI.
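The check described above can also be scripted. The following is a minimal sketch, assuming a schema shaped like the hypothetical excerpt below (the field name, field type name, and filter order are illustrative; the factory class names are standard Solr ones). It finds every `solr.StopFilterFactory` on the field's type and reports which `words` file it loads, then parses a stopword file in the default one-word-per-line format, where lines starting with "#" are comments. Lowercasing the entries assumes the filter has `ignoreCase="true"`.

```python
import xml.etree.ElementTree as ET

# Hypothetical managed-schema excerpt (names and filter order are assumptions).
SCHEMA = """
<schema name="example" version="1.6">
  <field name="firmname" type="text_en" indexed="true" stored="true"/>
  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>
</schema>
"""

# Hypothetical excerpt of a stopword file: one word per line, '#' comments.
STOPWORD_FILE = """
# Standard english stop words
and
will
with
"""


def stop_files_for_field(schema_xml, field_name):
    """Return (analyzer type, words file) for each stop filter on the field's type."""
    root = ET.fromstring(schema_xml.strip())
    field = root.find(f".//field[@name='{field_name}']")
    if field is None:
        raise KeyError(f"no field named {field_name!r}")
    ftype = root.find(f".//fieldType[@name='{field.get('type')}']")
    found = []
    for analyzer in ftype.iter("analyzer"):
        for flt in analyzer.iter("filter"):
            if flt.get("class") == "solr.StopFilterFactory":
                # An <analyzer> without a type attribute applies to both phases.
                found.append((analyzer.get("type", "both"), flt.get("words")))
    return found


def load_wordset(text):
    """Parse one-word-per-line stopwords, skipping blank lines and '#' comments."""
    return {line.strip().lower() for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")}


print(stop_files_for_field(SCHEMA, "firmname"))
print("will" in load_wordset(STOPWORD_FILE))
```

Note that this only follows the `words` attribute; an empty stopwords.txt says nothing about a filter that loads a different file.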

Best,
Edward


On Fri, Jan 10, 2020 at 22:26, chester <[hidden email]>
wrote:

> I checked the stopwords.txt file and it is empty. That means "will" is not a
> stop word, correct?

Re: Search phrase not parsed properly

Edward Ribeiro
Here is the stopwords file for English:
https://github.com/apache/lucene-solr/blob/branch_8_3/solr/server/solr/configsets/_default/conf/lang/stopwords_en.txt
<https://github.com/apache/lucene-solr/blob/master/solr/server/solr/configsets/_default/conf/lang/stopwords_en.txt>

Here is an example of how the stop filter is set up in the managed-schema
file:
https://github.com/apache/lucene-solr/blob/branch_8_3/solr/server/solr/configsets/_default/conf/managed-schema#L724


Here is how to use the Solr Admin UI to test the analysis chain of your fields:
https://lucene.apache.org/solr/guide/8_3/analysis-screen.html#analysis-screen

Edward


On Fri, Jan 10, 2020 at 22:36, Edward Ribeiro <[hidden email]>
wrote:

> You have to check your managed-schema to see if the field type defines a
> stopwordfilter and which one it points to.
>
> There's a folder named 'lang' with many files, one for each language. If
> your field is configured to english the filter will point to
> lang/stopword_en.txt. The stopwords.txt file is empty by default. Also,
> you can test this by using the Analysis option in Admin UI.
>
> Best,
> Edward
>
>
> On Fri, Jan 10, 2020 at 22:26, chester <[hidden email]>
> wrote:
>
>> I checked the stopwords.txt file and it is empty. That means "will" is not a
>> stop word, correct?

Re: Search phrase not parsed properly

Erick Erickson
In reply to this post by Edward Ribeiro
The easiest way to verify this is to use the Admin UI. Select a core (or replica) and go to the Analysis page. Select the field in question there and type the words into the text box. I’d also uncheck the “verbose” box; the detailed information it produces is unnecessary for this.

Do note that this shows you what happens to your input _after_ parsing, so if you type in something like “Perkins AND Will”, it’ll treat “AND” as just a word, not an operator. But it’ll tell you whether the schema you actually use (rather than the one you think you’re using) is dropping “will”, and if so, which filter does it.
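The same check can be run outside the browser against Solr's Field Analysis request handler. This sketch only builds the request URL; it assumes the handler is reachable at `/analysis/field` (it is registered implicitly in recent Solr versions, but check your solrconfig.xml on 6.3) and that it accepts the `analysis.fieldname`, `analysis.fieldvalue` (index-time analysis) and `analysis.query` (query-time analysis) parameters. The base URL and the core name `mycore` are placeholders.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field, value):
    """Build a Field Analysis URL that runs `value` through both the
    index-time and the query-time analyzer of `field`."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": value,  # analyzed with the index analyzer
        "analysis.query": value,       # analyzed with the query analyzer
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{core}/analysis/field?{urlencode(params)}"


# Placeholder host and core name.
print(field_analysis_url("http://localhost:8983/solr", "mycore",
                         "firmname", "Perkins AND Will"))
```

As with the Analysis page, "AND" here is just text handed to the analyzer, not a query operator.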

Best,
Erick

> On Jan 10, 2020, at 8:36 PM, Edward Ribeiro <[hidden email]> wrote:
>
> You have to check your managed-schema to see if the field type defines a
> stopwordfilter and which one it points to.
>
> There's a folder named 'lang' with many files, one for each language. If
> your field is configured to english the filter will point to
> lang/stopword_en.txt. The stopwords.txt file is empty by default. Also, you
> can test this by using the Analysis option in Admin UI.
>
> Best,
> Edward
>
>
> On Fri, Jan 10, 2020 at 22:26, chester <[hidden email]>
> wrote:
>
>> I checked the stopwords.txt file and it is empty. That means "will" is not a
>> stop word, correct?

Re: Search phrase not parsed properly

chester-3
Thanks, everyone. I found the stopwords_en.txt file and saw that "will" was
included in there. I removed it and have re-indexed the core. Hopefully,
that will fix the issue.
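The re-index matters because, in a typical text field type, the same stop filter also runs at index time, so documents indexed earlier never stored a "will" token. The toy inverted index below illustrates that reasoning under stated assumptions: the documents are hypothetical, stemming is omitted, and a Python dict stands in for Lucene's real postings.

```python
# Toy inverted index: why editing a stop list requires a re-index.
# Documents are hypothetical; stemming is omitted for brevity.


def analyze(text, stopwords):
    """Lowercase and drop stopwords."""
    return [t for t in text.lower().split() if t not in stopwords]


def build_index(docs, stopwords):
    """Map each surviving token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for tok in analyze(text, stopwords):
            index.setdefault(tok, set()).add(doc_id)
    return index


def search_all(index, tokens):
    """AND semantics: ids of documents containing every query token."""
    postings = [index.get(tok, set()) for tok in tokens]
    return set.intersection(*postings) if postings else set()


docs = {1: "Perkins and Will", 2: "Johnson Perkins"}
old_stop = {"and", "will"}  # stop list before the edit
new_stop = {"and"}          # stop list after removing "will"

stale = build_index(docs, old_stop)        # built before the change
fresh = build_index(docs, new_stop)        # rebuilt after the change
query = analyze("Perkins Will", new_stop)  # query side uses the new list

print(search_all(stale, query))  # set(): "will" was never indexed
print(search_all(fresh, query))  # {1}
```

With only the query side updated, the query now requires "will", which the stale index cannot supply, so the search matches nothing until the documents are re-indexed.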




Re: Search phrase not parsed properly

Edward Ribeiro
I would follow Shawn's advice: remove the stop filter from your field
types' analysis chains. Up to you, as usual.

Best,
Edward

On Fri, Jan 10, 2020 at 23:12, chester <[hidden email]>
wrote:

> Thanks, everyone. I found the stopword_en.txt file and saw that "will" was
> included in there. I removed it and have re-indexed the core. Hopefully,
> that will fix the issue.

Re: Search phrase not parsed properly

Walter Underwood
In reply to this post by chester-3
Remove ALL the stopwords. Remove the stopword filter.

This will happen again and again with different words until you do that.

Stopwords were necessary with 16-bit CPUs. I stopped using them in 1996.
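Why removing a single word only postpones the problem can be shown by running a few names through a stop list. This is a sketch: the firm names are hypothetical, and the stop set is a subset of the English list in lang/stopwords_en.txt, chosen for illustration.

```python
# Subset of the English stop list (assumption: taken from lang/stopwords_en.txt).
STOPWORDS = {"a", "and", "is", "it", "the", "this", "to", "will", "with"}


def kept_words(name, stopwords):
    """Tokens that survive the stop filter (lowercased, whitespace-split)."""
    return [w for w in name.lower().split() if w not in stopwords]


def dropped_words(name, stopwords):
    """Tokens the stop filter would silently remove."""
    return [w for w in name.lower().split() if w in stopwords]


# Hypothetical firm names.
for name in ["Perkins and Will", "Johnson Perkins", "This Is It"]:
    print(f"{name!r}: kept={kept_words(name, STOPWORDS)} "
          f"dropped={dropped_words(name, STOPWORDS)}")
```

A name made up entirely of stopwords, such as the hypothetical "This Is It", leaves no tokens at all, so there is nothing left to search for; each such case would need its own exception to the list.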

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Jan 10, 2020, at 6:12 PM, chester <[hidden email]> wrote:
>
> Thanks, everyone. I found the stopword_en.txt file and saw that "will" was
> included in there. I removed it and have re-indexed the core. Hopefully,
> that will fix the issue.

Re: Search phrase not parsed properly

Erick Erickson
I would _definitely_ follow Walter’s advice (take the stopword filter out) as my
starting point. It’ll make your life a lot easier.

I’ll add that I have seen clients with very large clusters that were willing to deal
with the issues inherent in stopwords (and there are more than just this one) in
exchange for smaller indexes and the associated hardware savings. When you’re
talking about a 1,000-node cluster of decent-sized servers, a 10% reduction in
hardware (100 servers) can be worth the aggravation.
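One way to check for a demonstrated need before turning stopwords back on is to measure how much of your actual text they make up. The sketch below computes the fraction of token occurrences that are stopwords in a sample. It is only a rough signal, not a prediction of index savings: real index size also depends on compression, stored fields, docValues, and more. The sample names and the stop set are hypothetical.

```python
from collections import Counter


def stopword_share(texts, stopwords):
    """Fraction of token occurrences (lowercased, whitespace-split) that
    are stopwords. A rough proxy only, not an index-size estimate."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    stop = sum(n for tok, n in counts.items() if tok in stopwords)
    return stop / total if total else 0.0


# Hypothetical sample; in practice, sample values from your own field.
sample = ["Perkins and Will", "The Johnson Group", "Smith and Sons"]
stop_list = {"a", "and", "the", "will"}
print(f"{stopword_share(sample, stop_list):.0%}")  # 44%
```

Short fields such as firm names can show a high share, as here, while longer text may behave quite differently, which is why measuring on your own data matters more than any rule of thumb.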

But that’s something I’d save until there was a demonstrated need and avoid
premature optimization...

Best,
Erick

> On Jan 10, 2020, at 9:24 PM, Walter Underwood <[hidden email]> wrote:
>
> Remove ALL the stopwords. Remove the stopword filter.
>
> This will happen again and again with different words until you do that.
>
> Stopwords were necessary with 16-bit CPUs. I stopped using them in 1996.
>
> wunder
> Walter Underwood
> [hidden email]
> http://observer.wunderwood.org/  (my blog)
>
>> On Jan 10, 2020, at 6:12 PM, chester <[hidden email]> wrote:
>>
>> Thanks, everyone. I found the stopword_en.txt file and saw that "will" was
>> included in there. I removed it and have re-indexed the core. Hopefully,
>> that will fix the issue.