Dismax and StandardTokenizer: OR queries despite mm=100%


ahubold
Hi,

We're using Solr 4.10.4 and the dismax query parser to search across
multiple fields. One of the fields is configured with a
StandardTokenizer (type "text_general"). I set mm=100% to only get hits
that match all terms.

This does not seem to work for queries that are split into multiple
tokens. For example, a query for "CC-WAV-001" (tokenized to "cc", "wav",
"001") returns documents that only have "cc" in them. I need a result with
documents that contain all tokens, as returned by the /select handler.

Is there a way to force AND semantics for such dismax queries? I also
tried to set q.op=AND but it did not help.

The query is parsed as:

(+DisjunctionMaxQuery(((textbody:cc textbody:wav textbody:001) |
productCode:CC-WAV-001)~0.1) DisjunctionMaxQuery((textbody:"cc wav
001")~0.1))/no_coord
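
To make the parsed query easier to read, here is a toy model of it in plain Python. It is not Solr or Lucene code, and every function name in it is hypothetical. It assumes that mm is counted over whitespace-separated chunks of the input (so "CC-WAV-001" is a single clause) and that productCode is matched as an exact string; both are assumptions drawn from the parsed query, not confirmed behavior.

```python
import re


def tokenize(text):
    """Rough stand-in for StandardTokenizer plus lowercasing."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def textbody_clause(doc, chunk):
    # (textbody:cc textbody:wav textbody:001): every term is optional,
    # so a single matching token is enough for this clause.
    doc_tokens = set(tokenize(doc.get("textbody", "")))
    return any(tok in doc_tokens for tok in tokenize(chunk))


def product_code_clause(doc, chunk):
    # productCode:CC-WAV-001, assumed here to be an exact string match.
    return doc.get("productCode") == chunk


def dismax_clause(doc, chunk):
    # DisjunctionMaxQuery(a | b): matches if either field clause matches.
    return textbody_clause(doc, chunk) or product_code_clause(doc, chunk)


def required_clauses(num_clauses, mm_percent):
    # Number of top-level clauses that mm demands (assumed rounding down).
    return num_clauses * mm_percent // 100


query = "CC-WAV-001"
chunks = query.split()  # one chunk, hence one top-level clause
doc = {"textbody": "please cc the list", "productCode": "OTHER-1"}

matched = sum(dismax_clause(doc, chunk) for chunk in chunks)
is_hit = matched >= required_clauses(len(chunks), 100)
print(is_hit)
```

Under these assumptions, mm=100% only requires the one top-level clause to match, and that clause is already satisfied by the single token "cc".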

Thanks in advance!

Regards,
Andreas

Re: Dismax and StandardTokenizer: OR queries despite mm=100%

Ahmet Arslan
Hi Andreas,

That's weird. It looks like the mm calculation is done before tokenization takes place.

You can try setting autoGeneratePhraseQueries to true,
or replacing dashes with whitespace on the client side.
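
A minimal sketch of the client-side option, in Python. The helper name `split_dashes` is hypothetical, and the `qf` value simply reuses the two field names visible in the parsed query from the original post; adjust both to the real setup.

```python
from urllib.parse import urlencode


def split_dashes(user_query):
    """Replace dashes with spaces so each part becomes its own query term."""
    return " ".join(user_query.replace("-", " ").split())


# Request parameters for a dismax query; the qf field list is an assumption.
params = {
    "defType": "dismax",
    "q": split_dashes("CC-WAV-001"),
    "qf": "textbody productCode",
    "mm": "100%",
}
query_string = urlencode(params)
print(query_string)
```

One trade-off to keep in mind: once the dashes are gone, the full code is no longer sent as a single term, so an exact match on a field like productCode would presumably stop working for such input.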

Ahmet




Re: Dismax and StandardTokenizer: OR queries despite mm=100%

Billnbell
In reply to this post by ahubold
Use fq

Bill Bell



Re: Dismax and StandardTokenizer: OR queries despite mm=100%

ahubold
In reply to this post by Ahmet Arslan
Thank you, autoGeneratePhraseQueries did the job.

I assume that this setting just affects query generation and I don't
need to reindex after changing the field type accordingly. Is this correct?

BTW, I just found SOLR-3589, where the same issue was reported. It seems
it was fixed for edismax but not for dismax.

Andreas


Re: Dismax and StandardTokenizer: OR queries despite mm=100%

Ahmet Arslan
Hi Andreas,

You are correct: no re-indexing is required for autoGeneratePhraseQueries.
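
To illustrate why the index can stay as it is, here is a toy model in plain Python (not Solr code; all names are hypothetical). The token lists stand in for the existing index and are built only once. What changes is how the query side combines the tokens of "CC-WAV-001": as optional terms before, and as a phrase of adjacent terms with autoGeneratePhraseQueries enabled.

```python
import re


def tokenize(text):
    """Rough stand-in for StandardTokenizer plus lowercasing."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def any_token_match(indexed_tokens, query_tokens):
    # Old behavior: the tokens are optional terms, one hit is enough.
    return any(tok in indexed_tokens for tok in query_tokens)


def phrase_match(indexed_tokens, query_tokens):
    # Phrase behavior: the tokens must appear adjacent and in order.
    n = len(query_tokens)
    return any(
        indexed_tokens[i:i + n] == query_tokens
        for i in range(len(indexed_tokens) - n + 1)
    )


# Stand-in for the existing index; never rebuilt below.
indexed = {
    "doc1": tokenize("Manual for CC-WAV-001"),
    "doc2": tokenize("cc everyone on the list"),
}

query_tokens = tokenize("CC-WAV-001")
before = sorted(d for d, toks in indexed.items() if any_token_match(toks, query_tokens))
after = sorted(d for d, toks in indexed.items() if phrase_match(toks, query_tokens))
print(before, after)
```

Both result lists are computed from the same `indexed` data; only the matching function applied at query time differs.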

Ahmet


