KeywordTokenizerFactory and Standard Query Parser

Chris Ulicny
Hi all,

We have a multivalued field whose values each begin with an integer followed
by a space, and the index analyzer chain extracts that integer to search on:

<analyzer type="index">
  <tokenizer class="solr.PatternTokenizerFactory" pattern="^\d+" group="0"/>
</analyzer>


testField:[
34 blah blah blah
27 blah blah blah
...
]
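
As a rough illustration of the index-time behavior described above, here is a minimal Python sketch that mimics a pattern tokenizer with pattern="^\d+" and group="0". It is not Solr code: the function name index_tokens and the sample list are made up for the example, and Python's regex engine stands in for Java's.

```python
import re

# Same pattern as the index analyzer: one or more digits at the start of the value.
LEADING_INT = re.compile(r"^\d+")


def index_tokens(value: str) -> list[str]:
    """Emit each full regex match as a token, which is what group="0" asks for."""
    return [match.group(0) for match in LEADING_INT.finditer(value)]


# Hypothetical stand-in for the multivalued testField of one document.
field_values = ["34 blah blah blah", "27 blah blah blah"]

# Each value contributes only its leading integer to the index.
indexed_terms = {token for value in field_values for token in index_tokens(value)}
print(sorted(indexed_terms))
```

Because the pattern is anchored at the start of the value, each value yields at most one token, and a value that does not start with a digit yields none.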

The query analyzer chain is just a keyword tokenizer factory, since the
clients search only for the number on that field. One process sends the
following query:

<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>


q=testField:(34 27)

However, in version 7.4.0 this does not pick up the document with the example
testField value above. Passing it as an fq parameter has the same result.

My understanding was that the query parser should split the (34 27) into
search terms "34" and "27" before the query analyzer chain is even entered.
Is that not correct anymore?

Thanks,
Chris

Re: KeywordTokenizerFactory and Standard Query Parser

Chris Ulicny
Actually, never mind. I found the part of the upgrade notes for 7 that we missed:

"The sow (split-on-whitespace) request param now defaults to false (true
in previous versions). This affects the edismax and standard/"lucene" query
parsers: if the sow param is not specified, query text will not be split on
whitespace before analysis. See
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/ ."
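
To make the effect of that default concrete, here is a small Python sketch of the two behaviors. It is a simplified model, not the actual parser: the helper names are invented, the indexed terms are the ones from the example document, and it assumes the default OR operator between terms. The last lines build a request with sow=true, which restores the pre-7 splitting for that query.

```python
from urllib.parse import urlencode

# Terms the index analyzer produced for the example document (assumed from the thread).
INDEXED_TERMS = {"34", "27"}


def keyword_analyze(text: str) -> list[str]:
    """A keyword tokenizer emits its whole input as one token."""
    return [text]


def query_terms(clause_text: str, sow: bool) -> list[str]:
    """With sow=true the text is split on whitespace before analysis;
    with sow=false the whole text reaches the analyzer in one piece."""
    chunks = clause_text.split() if sow else [clause_text]
    return [token for chunk in chunks for token in keyword_analyze(chunk)]


def matches(clause_text: str, sow: bool) -> bool:
    """Assumes OR semantics: any query term present in the index is a hit."""
    return any(term in INDEXED_TERMS for term in query_terms(clause_text, sow))


print(query_terms("34 27", sow=False), matches("34 27", sow=False))
print(query_terms("34 27", sow=True), matches("34 27", sow=True))

# Sending sow=true explicitly with the original query.
params = urlencode({"q": "testField:(34 27)", "sow": "true"})
print(params)
```

With sow=false the keyword tokenizer sees the single string "34 27", which is not an indexed term, so the example document is not found. With sow=true each number is analyzed separately and both match.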

On Tue, Apr 2, 2019 at 8:11 AM Chris Ulicny <[hidden email]> wrote:

> Hi all,
>
> We have a multivalued field that has an integer at the beginning followed
> by a space, and the index analyzer chain extracts that value to search on
>
> <analyzer type="index">
>   <tokenizer class="solr.PatternTokenizerFactory" pattern="^\d+" group="0"/> </analyzer>
>
>
> testField:[
> 34 blah blah blah
> 27 blah blah blah
> ...
> ]
>
> The query analyzer chain is just a keyword tokenizer factory since the
> clients are searching only for the number on that field. So one process
> will attempt to send in the following query
>
> <analyzer type="query">
>   <tokenizer class="solr.KeywordTokenizerFactory"/></analyzer>
>
>
> q=testField:(34 27)
>
> However, this will not pickup the document with the example testField
> value above in version 7.4.0. Passing it as an fq parameter has the same
> result.
>
> My understanding was that the query parser should split the (34 27) into
> search terms "34" and "27" before the query analyzer chain is even entered.
> Is that not correct anymore?
>
> Thanks,
> Chris