Edismax query returning the same number of results using AND as it does with OR

Nicky Mastin

I'm seeing an oddity with edismax and queries involving boolean operators. Here is the "parsedquery_toString" output from two different queries:

input: "dog AND kiwi":
https://apaste.info/gaQl
input: "dog OR kiwi":
https://apaste.info/sBwa

Both queries return the same number of results (389). I expected the query with OR to have a much higher numFound. Those pastes have a one-week lifetime.

The two parsed queries are almost identical. The AND query has a couple of extra plus signs compared to the OR query, and the OR query has a ~2 after a right paren that the AND query doesn't have. I'm at a loss as to what this all means, except to say that it didn't turn out as expected.

Should the two queries have returned different numbers of results? If not, why is that the case?

Here is the output from echoParams=all on the OR query:
<str name="spellcheck.collateExtendedResults">true</str>
<str name="df">text</str>
<str name="hl">true</str>
<str name="hl.bs.type">LINE</str>
<str name="f.sourceid.facet.method">enum</str>
<str name="spellcheck.maxCollations">3</str>
<str name="tie"> 0.4</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="qf">
title^100 kw1ranked^100 kw1^100 keywordsranked_bm25_no_norms^50
keywords_bm25_no_norms^50 authors text description species
</str>
<arr name="f.year.facet.range.other">
<str>before</str>
<str>after</str>
</arr>
<str name="hl.fl">subdocuments,keywords,authors</str>
<str name="mm">3&lt;-1 6&lt;-3 9&lt;30%</str>
<str name="f.year.facet.range.hardend">true</str>
<str name="hl.formatter">html</str>
<str name="spellcheck">on</str>
<arr name="boost">
<str>
max(recip(ms(NOW/DAY+1YEAR,dateint),3.16E-11,10,6),documentdatefix)
</str>
<str>rank</str>
</arr>
<str name="debugQuery">true</str>
<str name="f.sourceid.facet.limit">1000</str>
<str name="hl.boundaryScanner">breakIterator</str>
<str name="spellcheck.collate">true</str>
<str name="facet.range">year</str>
<str name="f.year.facet.range.end">2015</str>
<str name="spellcheck.dictionary">spell_file</str>
<str name="indent">true</str>
<str name="echoParams">all</str>
<str name="fl">
id,title,description,url,objecttypeid,contexturl,defaultsourceid,sourceid,score
</str>
<str name="hl.requireFieldMatch">false</str>
<str name="hl.fragsize">100</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="f.year.facet.range.gap">5</str>
<str name="hl.simple.pre">&lt;strong&gt;</str>
<arr name="facet.query">
<str>
{!ex=dt key="Last10yr"}dateint:[NOW/YEAR-10YEARS TO *]
</str>
<str>
{!ex=dt key="Last5yr"}dateint:[NOW/YEAR-5YEARS TO *]
</str>
<str>
{!ex=dt key="Last3yr"}dateint:[NOW/YEAR-3YEARS TO *]
</str>
<str>
{!ex=dt key="Last1yr"}dateint:[NOW/YEAR-1YEAR TO *]
</str>
</arr>
<str name="defType">edismax</str>
<str name="hl.mergeContiguous">false</str>
<str name="f.folderid.facet.method">enum</str>
<str name="wt">xml</str>
<str name="hl.highlightMultiTerm">true</str>
<str name="q.alt">*:*</str>
<arr name="facet.field">
<str>folderid</str>
<str>sourceid</str>
<str>speciesid</str>
<str>admin</str>
</arr>
<str name="f.speciesid.facet.method">enum</str>
<str name="json.nl">map</str>
<str name="start">0</str>
<str name="hl.usePhraseHightligher">true</str>
<str name="rows">25</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.extendedResults">true</str>
<str name="q">dog OR kiwi</str>
<str name="f.year.facet.range.start">1970</str>
<str name="hl.simple.post">&lt;/strong&gt;</str>
<str name="pf">
title~20^5000 keywordsranked_bm25_no_norms~20^5000 kw1ranked~10^5000
keywords_bm25_no_norms~20^1500 kw1~10^500 authors^250 text~20^1000
text~100^500 description^1
</str>
<str name="facet.mincount">1</str>
<str name="hl.method">unified</str>
<str name="spellcheck.count">10</str>
<str name="pf3">
title~22^1000 keywordsranked_bm25_no_norms~22^1000
keywords_bm25_no_norms~12^500 kw1ranked~12^100 kw1~12^100 text~22^100
</str>
<str name="pf2">authors~11 species~11</str>
<str name="facet">on</str>
If anyone has any ideas about whether this behavior is expected or
unexpected, I'd appreciate hearing them.  It is Solr 7.1.0 with a patch for
SOLR-12243 applied.
There might be information that would be helpful that isn't provided.  If
there is something else needed, please let me know, so I can provide it.

Re: Edismax query returning the same number of results using AND as it does with OR

Zheng Lin Edwin Yeo
Hi,

What is the full query path or URL that you pass for the query?
And what do your edismax settings look like in your solrconfig.xml?

Regards,
Edwin

On Fri, 26 Oct 2018 at 06:24, Nicky Mastin <[hidden email]> wrote:


Re: Edismax query returning the same number of results using AND as it does with OR

Shawn Heisey-2
Followup:

I had a theory that Nicky tested, and I think what was observed confirms the theory.

TL;DR:

In previous versions, I think there was a bug where the presence of boolean operators caused edismax to ignore the mm parameter and rely only on the boolean operator(s).

After that bug got fixed, mm applies to any SHOULD clauses in the query. A query of "a OR b" has two SHOULD clauses, and the mm value in this request (3<-1 6<-3 9<30%) requires all clauses to match when there are three or fewer of them, so it is effectively the same as "a AND b". That is also what the ~2 in the parsed OR query means: at least two of the two SHOULD clauses must match.
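
To make the mm arithmetic concrete, here is a rough Python sketch of how a conditional mm spec resolves for a given number of SHOULD clauses. It follows the documented meaning of the mm syntax as I understand it and is not Solr's actual code; the function name min_should_match is made up for this illustration, and it assumes the spec has no spaces around the "<" characters.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Approximate how an mm spec resolves for a number of optional (SHOULD) clauses.

    Illustration only: a hypothetical helper, not part of Solr.
    """
    spec = spec.strip()
    result = optional_clauses  # default: every optional clause is required
    if "<" in spec:
        # Conditional form such as "3<-1 6<-3 9<30%": each "N<value" part
        # applies only when there are more than N optional clauses.
        for part in spec.split():
            upper, value = part.split("<", 1)
            if optional_clauses <= int(upper):
                return result
            result = min_should_match(optional_clauses, value)
        return result
    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero.
        calc = int(optional_clauses * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # A negative value means "all but this many".
    result = optional_clauses + calc if calc < 0 else calc
    # Clamp to the valid range 0..optional_clauses.
    return max(0, min(optional_clauses, result))


MM = "3<-1 6<-3 9<30%"  # the mm value from the echoParams output
for n in (1, 2, 4, 7, 10):
    print(n, "optional clauses ->", min_should_match(n, MM), "must match")
```

With two optional clauses, as in "dog OR kiwi", the first condition (more than 3 clauses) never kicks in, so both clauses stay required.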

A potential workaround that appears to work: detect when the query contains a boolean operator, and in that situation, send mm=0 with the query. Alternatively, do that only when the query contains "OR". Things work right with AND and NOT because those don't produce SHOULD clauses.
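
On the client side, that workaround could look something like the Python sketch below. The helper names contains_or_operator and build_params are hypothetical, the parameter dict is trimmed down to the essentials, and it assumes only an uppercase OR counts as an operator (that is, lowercaseOperators is not enabled). An OR inside a quoted phrase is ignored.

```python
import re

# Uppercase OR as a standalone, whitespace-delimited token.
_OR_OPERATOR = re.compile(r"(?<!\S)OR(?!\S)")
# Double-quoted phrases, where OR is just a word and not an operator.
_QUOTED_PHRASE = re.compile(r'"[^"]*"')


def contains_or_operator(user_query: str) -> bool:
    """Return True if the query uses OR outside of a quoted phrase."""
    unquoted = _QUOTED_PHRASE.sub(" ", user_query)
    return bool(_OR_OPERATOR.search(unquoted))


def build_params(user_query: str) -> dict:
    """Build request parameters, overriding mm only for queries that use OR."""
    params = {"q": user_query, "defType": "edismax"}
    if contains_or_operator(user_query):
        # Let the OR operator decide matching instead of the configured mm.
        params["mm"] = "0"
    return params


print(build_params("dog OR kiwi"))
print(build_params("dog AND kiwi"))
```

Queries without OR are sent unchanged, so the mm value configured in the request handler still applies to them.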

Thanks,
Shawn



