Using Synonym Graph Filter does not tokenize the query string if it has multi-word synonym

atin janki

Hello everyone,

I am using Solr 8.3.

After I added the Synonym Graph Filter to my managed-schema file, I noticed that if the query string contains a multi-word synonym, Solr treats that multi-word synonym as a single term and does not split it into individual words, which suppresses the default search behaviour.

Here "soap powder" is the search query, which is also a multi-word synonym in the synonym file:

s(104254535,1,'soap powder',n,1,1).
s(104254535,2,'built-soap powder',n,1,0).
s(104254535,3,'washing powder',n,1,0).


I am sharing some screenshots to illustrate the problem:

without Synonym Graph Filter (2 docs returned)


with Synonym Graph Filter (2 docs expected, only 1 returned)

Has anyone experienced this before? If so, is there a workaround?
Or is this expected behaviour?

Regards,
Atin Janki

Re: Using Synonym Graph Filter does not tokenize the query string if it has multi-word synonym

Paras Lehana
Hi Atin,

Please host your images on some other site, as attachments won't reach the
mailing list. I researched synonym support for a week before enabling it in
Auto-Suggest. Why do you want multi-term synonyms to be split? I guess it is
only for matching documents and not for tokenized synonyms.

I think you should try setting sow (split on whitespace) to true. Read more
here:
https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html#TheExtendedDisMaxQueryParser-ThesowParameter

A good article about this:
https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/

This is a very naive guess. I would need the screenshots and some more
details. :)

On Mon, 2 Mar 2020 at 01:17, atin janki <[hidden email]> wrote:

> After I included Synonym Graph Filter in my managed-schema file, I have
> noticed that if the query string contains a multi-word synonym, it
> considers that multi-word synonym as a single term and does not break it,
> further suppressing the default search behaviour.

--
Regards,
Paras Lehana

Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

atin janki
In reply to this post by atin janki
Hello everyone,

I am using Solr 8.3.

After I added the Synonym Graph Filter to my managed-schema file, I
noticed that if the query string contains a multi-word synonym, Solr
treats that multi-word synonym as a single term and does not split it
into individual words, which suppresses the default search behaviour.

I am using StandardTokenizer.

Below is a snippet from the managed-schema file:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


Here "soap powder" is the search query, which is also a multi-word
synonym in the synonym file:

s(104254535,1,'soap powder',n,1,1).
s(104254535,2,'built-soap powder',n,1,0).
s(104254535,3,'washing powder',n,1,0).


I am sharing some screenshots to illustrate the problem.

Without the Synonym Graph Filter, 2 docs are returned (screenshot at the
URL below):

https://ibb.co/zQXx7mV

With the Synonym Graph Filter, 2 docs are expected but only 1 is returned
(screenshot at the URL below):

https://ibb.co/tp04Rzw


Has anyone experienced this before? If so, is there a workaround?
Or is this expected behaviour?

Regards,
Atin Janki
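
A side note on the synonym file, offered as a sketch rather than a confirmed fix: the s(...) lines quoted above are in WordNet prolog format, while the filter in the schema snippet sets no "format" attribute, so synonyms.txt is parsed in the default Solr format (comma-separated terms, optionally with "=>"). If the file on disk really contains the s(...) lines verbatim (an assumption; they may have been converted before loading), the filter would need format="wordnet":

```xml
<!-- Sketch only: the same query-time filter as in the snippet above, with the
     parser switched to WordNet prolog format. Assumes synonyms.txt holds the
     s(...) lines exactly as quoted in the message. -->
<filter class="solr.SynonymGraphFilterFactory"
        synonyms="synonyms.txt"
        format="wordnet"
        expand="true"
        ignoreCase="true"/>
```

In the default Solr format, the same synonym group would instead be written as a single comma-separated line: soap powder, built-soap powder, washing powder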

Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
Have you set sow=true in your search handler? I know that we have it set to false (sow = split on whitespace) because we WANT multi-token synonyms retained as multiple tokens.

On 3/16/20, 10:49 AM, "atin janki" <[hidden email]> wrote:

    After I included Synonym Graph Filter in my managed-schema file, I
    have noticed that if the query string contains a multi-word synonym,
    it considers that multi-word synonym as a single term and does not
    break it, further suppressing the default search behaviour.
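
For reference, sow can be passed per request (&sow=true) or fixed in the search handler's defaults in solrconfig.xml; in Solr 7 and later it defaults to false. A minimal sketch, assuming the stock /select handler, the edismax parser and a field named "text" of type text_general (the handler and field names are placeholders, not taken from this thread):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Placeholder field name; point this at a field of type text_general. -->
    <str name="qf">text</str>
    <!-- sow=false: whitespace-separated text is handed to the field's query
         analyzer as a whole, so multi-word synonyms can be matched.
         sow=true: each whitespace-separated word is analysed on its own,
         so multi-word synonyms are never seen by the synonym filter. -->
    <str name="sow">false</str>
  </lst>
</requestHandler>
```

A value sent with the request overrides the handler default, which makes it easy to compare both settings with debugQuery=true.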
   


Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

atin janki
Using sow=true does split the query on whitespace, but then it no longer
looks for synonyms of "soap powder"; instead it expands synonyms for
"soap" and "powder" separately.



Best Regards,
Atin Janki


On Mon, Mar 16, 2020 at 4:59 PM Audrey Lorberfeld -
[hidden email] <[hidden email]> wrote:

> Have you set sow=true in your search handler? I know that we have it set
> to false (sow = split on whitespace) because we WANT multi-token synonyms
> retained as multiple tokens.

Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
To confirm, you want a synonym like "soap powder" to map onto synonyms like "hand soap," "hygiene products," etc? As in, more of a cognitive synonym mapping where you feed synonyms that only apply to the multi-token phrase as a whole?

On 3/16/20, 12:17 PM, "atin janki" <[hidden email]> wrote:

    Using sow=true, does split the word on whitespaces but it will not look for
    synonyms of "soap powder" anymore, rather it expands separate synonyms for
    "soap" and "powder".

Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

atin janki
I aim to achieve an expansion like this:

Synonym(soap powder) + Synonym(soap) + Synonym(powder)


which is not happening because of the way synonym expansion is done at
the moment.

At the moment, using the Synonym Graph Filter with StandardTokenizer and
sow=false, the query expands as:

Synonym(soap powder)

because "soap powder" is a multi-word synonym present in the synonym file.

Using sow=true with the above setting gives:

Synonym(soap) + Synonym(powder)



Best Regards,
Atin Janki


On Mon, Mar 16, 2020 at 5:27 PM Audrey Lorberfeld -
[hidden email] <[hidden email]> wrote:

> To confirm, you want a synonym like "soap powder" to map onto synonyms
> like "hand soap," "hygiene products," etc? As in, more of a cognitive
> synonym mapping where you feed synonyms that only apply to the multi-token
> phrase as a whole?

Re: Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
I don't think you can synonym-ize both the multi-token phrase and each individual token in the multi-token phrase at the same time. But anyone else feel free to chime in!

Best,
Audrey Lorberfeld

On 3/16/20, 12:40 PM, "atin janki" <[hidden email]> wrote:

    I aim to achieve an expansion like -

    Synonym(soap powder) + Synonym(soap) + Synonym (powder)

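One possible way to approximate the expansion asked for in this thread, Synonym(soap powder) + Synonym(soap) + Synonym(powder), is to run the same user input through the parser twice, once with sow=false and once with sow=true, and OR the two results together. This is an untested sketch and not a solution confirmed by anyone in the thread; the handler name /select-both, the parameter name qq and the field name "text" are made up for illustration:

```xml
<requestHandler name="/select-both" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- The outer query is parsed by the standard (lucene) parser, which
         supports nested queries through the _query_ pseudo-field. -->
    <str name="defType">lucene</str>
    <!-- Placeholder field name; point this at a field of type text_general. -->
    <str name="df">text</str>
    <!-- The client sends the user's text in qq (for example qq=soap powder)
         and does not send q.
         First clause, sow=false: "soap powder" can match as a multi-word synonym.
         Second clause, sow=true: "soap" and "powder" are analysed one by one. -->
    <str name="q">_query_:"{!edismax qf=text sow=false v=$qq}" OR _query_:"{!edismax qf=text sow=true v=$qq}"</str>
  </lst>
</requestHandler>
```

Running the request with debugQuery=true should show whether the parsed query really contains both expansions, and whether the combined scoring is acceptable for the use case.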