Stopwords param of edismax parser not working

Ashish Bisht
Hi,

We are trying to remove the stopword filter from query analysis using the
edismax parser's stopwords parameter. The documentation says:

stopwords
A Boolean parameter indicating if the StopFilterFactory configured in the
query analyzer should be respected when parsing the query. If this is set to
false, then the StopFilterFactory in the query analyzer is ignored.


https://lucene.apache.org/solr/guide/7_3/the-extended-dismax-query-parser.html


But it seems like it's not working.

http://Box-1:8983/solr/Collection/select?q=internet of things&rows=0&defType=edismax&qf=search_field content&stopwords=false&debug=true
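
Editor's sketch: the request above carries literal spaces in q and qf, which some clients may mangle. A minimal way to build the same request with every parameter URL-encoded, using only the Python standard library. The host and collection name come from the post; the helper name build_select_url is hypothetical.

```python
from urllib.parse import urlencode


def build_select_url(base, collection, params):
    """Return a Solr /select URL with all parameters URL-encoded."""
    return f"{base}/solr/{collection}/select?{urlencode(params)}"


# Same parameters as the request in the post.
params = {
    "q": "internet of things",
    "rows": 0,
    "defType": "edismax",
    "qf": "search_field content",
    "stopwords": "false",  # the edismax parameter under discussion
    "debug": "true",
}

url = build_select_url("http://Box-1:8983", "Collection", params)
print(url)
```

Encoding only rules out a malformed request as the cause; it does not by itself change how edismax treats the stopwords parameter.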


"parsedquery":"+(DisjunctionMaxQuery((content:internet |
search_field:internet)) DisjunctionMaxQuery((content:thing |
search_field:thing)))",
  "parsedquery_toString":"+((content:internet | search_field:internet)
(content:thing | search_field:thing))",
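
Editor's sketch: in the debug output above, "of" does not appear in any clause, which suggests the stop filter still ran even with stopwords=false. A small check like the following makes that visible by listing the terms that survived parsing. The helper name terms_in_parsed_query is hypothetical, and the regular expression assumes simple field:term clauses like the ones shown.

```python
import re


def terms_in_parsed_query(parsed):
    """Return the set of terms that appear as field:term clauses."""
    return set(re.findall(r"\w+:(\w+)", parsed))


# parsedquery_toString value copied from the debug output in the post.
parsed = ("+((content:internet | search_field:internet) "
          "(content:thing | search_field:thing))")

terms = terms_in_parsed_query(parsed)
print(sorted(terms))   # ['internet', 'thing']
print("of" in terms)   # False
```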



Are we missing something here?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Reply | Threaded
Open this post in threaded view
|

Re: Stopwords param of edismax parser not working

Walter Underwood
Why are you removing stopwords? That hack made sense in the 1950s, but I haven’t removed stopwords for the last twenty years.

wunder
Walter Underwood
http://observer.wunderwood.org/  (my blog)


Re: Stopwords param of edismax parser not working

Erick Erickson
And to say anything about your particular situation, we need to see the schema's field definitions for the fields you expect stopwords to be removed from, and the stopwords file for those fields.
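
Editor's sketch: one way to gather that information is to fetch the field type through Solr's Schema API (/solr/<collection>/schema/fieldtypes/<name>) and look for a StopFilterFactory in the query analyzer. The function below works on an already-decoded JSON dictionary. The sample field type is hypothetical and is NOT the poster's schema, and the key names (queryAnalyzer, analyzer, filters) are an assumption about the response shape that should be checked against a real response.

```python
def stop_filters(field_type, analyzer_key="queryAnalyzer"):
    """List StopFilterFactory entries configured in the given analyzer."""
    # Fall back to a shared "analyzer" block when no query-specific one exists.
    analyzer = field_type.get(analyzer_key) or field_type.get("analyzer") or {}
    return [f for f in analyzer.get("filters", [])
            if f.get("class", "").endswith("StopFilterFactory")]


# Hypothetical field type for illustration only.
sample_field_type = {
    "name": "text_en",
    "class": "solr.TextField",
    "queryAnalyzer": {
        "tokenizer": {"class": "solr.StandardTokenizerFactory"},
        "filters": [
            {"class": "solr.StopFilterFactory",
             "words": "stopwords.txt", "ignoreCase": "true"},
            {"class": "solr.LowerCaseFilterFactory"},
        ],
    },
}

found = stop_filters(sample_field_type)
print([f["words"] for f in found])
```

The "words" value names the stopwords file whose contents would also be worth posting.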

But Walter’s comment is germane. Removing stopwords leads to a number of incongruities, so they are best just left in.

Best,
Erick


Re: Stopwords param of edismax parser not working

Branham, Jeremy (Experis)
Hi Ashish –
Are you using v7.3?
If so, I think this is the spot in code that should be executing:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.3.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java#L310

I haven’t dug into the logic, but I tested on my server (v7.7.0), and the debug output doesn’t show whether the stopword filter was removed.
I don’t know your use-case, but maybe you could use the field analysis tool in Solr Admin to get more insight.
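
Editor's sketch: the Admin UI analysis screen is backed by the field analysis request handler, so the same information can be requested directly. The snippet below only builds the request URLs. It assumes the handler is reachable at /analysis/field on this collection; the helper name build_field_analysis_url is hypothetical, and the host, collection and field names are taken from the original post.

```python
from urllib.parse import urlencode


def build_field_analysis_url(base, collection, fieldname, query_text):
    """Return a URL asking Solr to show query-time analysis for one field."""
    params = {
        "analysis.fieldname": fieldname,
        "analysis.query": query_text,
        "wt": "json",
    }
    return f"{base}/solr/{collection}/analysis/field?{urlencode(params)}"


# One request per field listed in qf.
for field in ("content", "search_field"):
    print(build_field_analysis_url(
        "http://Box-1:8983", "Collection", field, "internet of things"))
```

This shows what each field's own query analyzer does to the text; it may not reflect whatever edismax changes when stopwords=false is set.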
 
Jeremy Branham