PF, PF2, PF3 clauses missing in solr7 with query-time synonyms?


eaph
I'm seeing pf and pf3 clauses fail to generate in long queries containing
synonyms.  Wondering if anyone else has run into this, or if it needs to be
submitted as a bug in Jira.   It is a showstopper problem for the current
project, as the pf and pf3 were pretty heavily tuned.

Using Solr 7.1; all fields are using the following type:

With query-time synonyms:
<fieldType name="my_text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1"
            protected="protwords_wdff.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nostem.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1"
            protected="protwords_wdff.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" managed="synonyms_all"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nostem.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <similarity class="solr.ClassicSimilarityFactory"/>
</fieldType>

Without query-time synonyms (synonyms applied at index time instead):
<fieldType name="my_text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1"
            protected="protwords_wdff.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" managed="synonyms_all"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nostem.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1"
            protected="protwords_wdff.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nostem.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <similarity class="solr.ClassicSimilarityFactory"/>
</fieldType>

The synonyms file is pretty long, so I'll just include the relevant bits for
an example:

allergic, hypersensitive
aspirin, acetylsalicylic acid
dog, canine, canis familiaris, k 9
rat, rattus


The problem seems to occur when part of the query has a synonym, but the
whole phrase does not.  I added whitespace to the output below to separate
the pieces; any parenthesis errors are probably due to my editing.  Beyond
that, the output is as it came from Solr.  I set the slop values so the
clauses can be told apart: pf fields have a slop ending in 0, pf2 fields a
slop ending in 1, and pf3 fields a slop ending in 2 (e.g. ~10, ~11, ~12).
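
The slop convention above can also be checked mechanically. What follows is a rough Python sketch, not something taken from Solr: the function name and the regular expression are illustrative assumptions. It assumes the parsedquery text has been pasted into a string, pulls out each field:"phrase"~slop clause, and labels it by the last digit of its slop. Clauses with no explicit slop are reported separately.

```python
import re
from collections import defaultdict

# Matches field:"phrase"~slop, with the slop optional. The debug output escapes
# quotes as \" so an optional backslash is allowed before each quote.
PHRASE_CLAUSE = re.compile(r'(\w+):\\?"([^"\\]*)\\?"(?:~(\d+))?')

# Convention used in this post: the last digit of the slop marks the parameter.
LAST_DIGIT_TO_PARAM = {"0": "pf", "1": "pf2", "2": "pf3"}


def classify_phrase_clauses(parsed_query):
    """Group (field, phrase, slop) tuples by the parameter the slop implies."""
    groups = defaultdict(list)
    for field, phrase, slop in PHRASE_CLAUSE.findall(parsed_query):
        if slop:
            label = LAST_DIGIT_TO_PARAM.get(slop[-1], "unknown")
            slop_value = int(slop)
        else:
            label = "no slop"
            slop_value = None
        # Collapse whitespace so clauses wrapped across lines compare equal.
        groups[label].append((field, " ".join(phrase.split()), slop_value))
    return dict(groups)


# A shortened sample in the same shape as the output quoted in this post.
sample = (
    r'((title:\"aspirin dose ? rats\"~20)^5000.0)~0.4 '
    r'(authors:\"aspirin dose\"~11 | species:\"aspirin dose\"~11)~0.4 '
    r'((title:\"dose ? rats\"~22)^1000.0)~0.4'
)

for param, clauses in classify_phrase_clauses(sample).items():
    print(param, clauses)
```

Running it over the full parsedquery string for each configuration makes it easier to see which of pf, pf2, and pf3 produced no clauses at all.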

=============
Example 1:  "aspirin dose in rats"
==============

With query-time synonyms:
===============
/// Q terms generate as expected ///
+((((kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 |
(species:\"acetylsalicylic acid\" species:aspirin) |
(keywords_bm25_no_norms:\"acetylsalicylic acid\"
keywords_bm25_no_norms:aspirin)^50.0 | (description:\"acetylsalicylic
acid\" description:aspirin) | (kw1ranked:\"acetylsalicylic acid\"
kw1ranked:aspirin)^100.0 | (text:\"acetylsalicylic acid\" text:aspirin) |
(title:\"acetylsalicylic acid\" title:aspirin)^100.0 |
(keywordsranked_bm25_no_norms:\"acetylsalicylic acid\"
keywordsranked_bm25_no_norms:aspirin)^50.0 | (authors:\"acetylsalicylic
acid\" authors:aspirin))~0.4 ((Synonym(kw1:dosage kw1:dose kw1:dose
kw1:dose))^100.0 | Synonym(species:dosage species:dose species:dose
species:dose) | (Synonym(keywords_bm25_no_norms:dosage
keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose
keywords_bm25_no_norms:dose))^50.0 | Synonym(description:dosage
description:dose description:dose description:dose) |
(Synonym(kw1ranked:dosage kw1ranked:dose kw1ranked:dose
kw1ranked:dose))^100.0 | Synonym(text:dosage text:dose text:dose text:dose)
| (Synonym(title:dosage title:dose title:dose title:dose))^100.0 |
(Synonym(keywordsranked_bm25_no_norms:dosage
keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose
keywordsranked_bm25_no_norms:dose))^50.0 | Synonym(authors:dosage
authors:dose authors:dose authors:dose))~0.4 ((Synonym(kw1:rat
kw1:rattu))^100.0 | Synonym(species:rat species:rattu) |
(Synonym(keywords_bm25_no_norms:rat keywords_bm25_no_norms:rattu))^50.0 |
Synonym(description:rat description:rattu) | (Synonym(kw1ranked:rat
kw1ranked:rattu))^100.0 | Synonym(text:rat text:rattu) | (Synonym(title:rat
title:rattu))^100.0 | (Synonym(keywordsranked_bm25_no_norms:rat
keywordsranked_bm25_no_norms:rattu))^50.0 | Synonym(authors:rat
authors:rattu))~0.4)~3)

/// PF and PF2 are missing. ///
 () () () () ()

/// This is actually PF3, with a missing ? where the stopword 'in' belonged. ///
 ((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"(dosage dose dose dose) (rattu
rat)\"~22)^1000.0 | (text:\"(dosage dose dose dose) (rattu
rat)\"~22)^100.0)~0.4 ((keywords_bm25_no_norms:\"(dosage dose dose dose)
(rattu rat)\"~12)^500.0 | (kw1ranked:\"(dosage dose dose dose) (rattu
rat)\"~12)^100.0 | (kw1:\"(dosage dose dose dose) (rattu
rat)\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:
===============

/// Q ///
 "boost(+((((kw1:aspirin)^100.0 | species:aspirin |
(keywords_bm25_no_norms:aspirin)^50.0 | description:aspirin |
(kw1ranked:aspirin)^100.0 | text:aspirin | (title:aspirin)^100.0 |
(keywordsranked_bm25_no_norms:aspirin)^50.0 | authors:aspirin)~0.4
((kw1:dose)^100.0 | species:dose | (keywords_bm25_no_norms:dose)^50.0 |
description:dose | (kw1ranked:dose)^100.0 | text:dose | (title:dose)^100.0
| (keywordsranked_bm25_no_norms:dose)^50.0 | authors:dose)~0.4
((kw1:rats)^100.0 | species:rats | (keywords_bm25_no_norms:rats)^50.0 |
description:rats | (kw1ranked:rats)^100.0 | text:rats | (title:rats)^100.0
| (keywordsranked_bm25_no_norms:rats)^50.0 | authors:rats)~0.4)~3)
/// PF  ///
  ((title:\"aspirin dose ? rats\"~20)^5000.0 |
(keywordsranked_bm25_no_norms:\"aspirin dose ? rats\"~20)^5000.0 |
(keywords_bm25_no_norms:\"aspirin dose ? rats\"~20)^1500.0 |
(text:\"aspirin dose ? rats\"~20)^1000.0)~0.4 ((kw1ranked:\"aspirin dose ?
rats\"~10)^5000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)~0.4
((authors:\"aspirin dose ? rats\")^250.0 | description:\"aspirin dose ?
rats\")~0.4 ((text:\"aspirin dose ? rats\"~100)^500.0)~0.4

/// PF2 ///
  (authors:\"aspirin dose\"~11 | species:\"aspirin dose\"~11)~0.4

/// PF3 ///
(((title:\"aspirin dose\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"aspirin dose\"~22)^1000.0 | (text:\"aspirin
dose\"~22)^100.0)~0.4 ((title:\"dose ? rats\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"dose ? rats\"~22)^1000.0 | (text:\"dose ?
rats\"~22)^100.0)~0.4) (((keywords_bm25_no_norms:\"aspirin dose\"~12)^500.0
| (kw1ranked:\"aspirin dose\"~12)^100.0 | (kw1:\"aspirin
dose\"~12)^100.0)~0.4 ((keywords_bm25_no_norms:\"dose ? rats\"~12)^500.0 |
(kw1ranked:\"dose ? rats\"~12)^100.0 | (kw1:\"dose ?
rats\"~12)^100.0)~0.4),product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",


===============
Example 2: "allergic reaction dogs"
The underlying issue isn't specific to any one of PF, PF2, or PF3. The
following example generates PF2, but not PF or PF3.
===============

With query-time synonyms:
///  Q ///
parsedquery_toString":"boost(
+((((Synonym(kw1:allergic kw1:allergy kw1:hypersensitive
kw1:hypersensitive))^100.0 | Synonym(species:allergic species:allergy
species:hypersensitive species:hypersensitive) |
(Synonym(keywords_bm25_no_norms:allergic keywords_bm25_no_norms:allergy
keywords_bm25_no_norms:hypersensitive
keywords_bm25_no_norms:hypersensitive))^50.0 | Synonym(description:allergic
description:allergy description:hypersensitive description:hypersensitive)
| (Synonym(kw1ranked:allergic kw1ranked:allergy kw1ranked:hypersensitive
kw1ranked:hypersensitive))^100.0 | Synonym(text:allergic text:allergy
text:hypersensitive text:hypersensitive) | (Synonym(title:allergic
title:allergy title:hypersensitive title:hypersensitive))^100.0 |
(Synonym(keywordsranked_bm25_no_norms:allergic
keywordsranked_bm25_no_norms:allergy
keywordsranked_bm25_no_norms:hypersensitive
keywordsranked_bm25_no_norms:hypersensitive))^50.0 |
Synonym(authors:allergic authors:allergy authors:hypersensitive
authors:hypersensitive))~0.4 ((kw1:reaction)^100.0 | species:reaction |
(keywords_bm25_no_norms:reaction)^50.0 | description:reaction |
(kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 |
(keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
((kw1:\"cani familiari\" kw1:canine kw1:\"k 9\" kw1:\"cani lupu familiari\"
kw1:dog)^100.0 | (species:\"cani familiari\" species:canine species:\"k 9\"
species:\"cani lupu familiari\" species:dog) |
(keywords_bm25_no_norms:\"cani familiari\" keywords_bm25_no_norms:canine
keywords_bm25_no_norms:\"k 9\" keywords_bm25_no_norms:\"cani lupu
familiari\" keywords_bm25_no_norms:dog)^50.0 | (description:\"cani
familiari\" description:canine description:\"k 9\" description:\"cani lupu
familiari\" description:dog) | (kw1ranked:\"cani familiari\"
kw1ranked:canine kw1ranked:\"k 9\" kw1ranked:\"cani lupu familiari\"
kw1ranked:dog)^100.0 | (text:\"cani familiari\" text:canine text:\"k 9\"
text:\"cani lupu familiari\" text:dog) | (title:\"cani familiari\"
title:canine title:\"k 9\" title:\"cani lupu familiari\" title:dog)^100.0 |
(keywordsranked_bm25_no_norms:\"cani familiari\"
keywordsranked_bm25_no_norms:canine keywordsranked_bm25_no_norms:\"k 9\"
keywordsranked_bm25_no_norms:\"cani lupu familiari\"
keywordsranked_bm25_no_norms:dog)^50.0 | (authors:\"cani familiari\"
authors:canine authors:\"k 9\" authors:\"cani lupu familiari\"
authors:dog))~0.4)~3)

/// PF ///
() () () ()

/// PF2 ///
(authors:\"(hypersensitive allergy hypersensitive allergic) reaction\"~11 |
species:\"(hypersensitive allergy hypersensitive allergic)
reaction\"~11)~0.4

/// PF3 ///
() (),
product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:
/// Q ///
+((((kw1:allergic)^100.0 | species:allergic |
(keywords_bm25_no_norms:allergic)^50.0 | description:allergic |
(kw1ranked:allergic)^100.0 | text:allergic | (title:allergic)^100.0 |
(keywordsranked_bm25_no_norms:allergic)^50.0 | authors:allergic)~0.4
((kw1:reaction)^100.0 | species:reaction |
(keywords_bm25_no_norms:reaction)^50.0 | description:reaction |
(kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 |
(keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
((kw1:dog)^100.0 | species:dog | (keywords_bm25_no_norms:dog)^50.0 |
description:dog | (kw1ranked:dog)^100.0 | text:dog | (title:dog)^100.0 |
(keywordsranked_bm25_no_norms:dog)^50.0 | authors:dog)~0.4)~3)

/// PF ///
((title:\"allergic reaction dog\"~20)^5000.0 |
(keywordsranked_bm25_no_norms:\"allergic reaction dog\"~20)^5000.0 |
(keywords_bm25_no_norms:\"allergic reaction dog\"~20)^1500.0 |
(text:\"allergic reaction dog\"~20)^1000.0)~0.4 ((kw1ranked:\"allergic
reaction dog\"~10)^5000.0 | (kw1:\"allergic reaction dog\"~10)^500.0)~0.4
((authors:\"allergic reaction dog\")^250.0 | description:\"allergic
reaction dog\")~0.4 ((text:\"allergic reaction dog\"~100)^500.0)~0.4

/// PF2 ///
((authors:\"allergic reaction\"~11 | species:\"allergic reaction\"~11)~0.4
(authors:\"reaction dog\"~11 | species:\"reaction dog\"~11)~0.4)

/// PF3 ///
((title:\"allergic reaction dog\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"allergic reaction dog\"~22)^1000.0 |
(text:\"allergic reaction dog\"~22)^100.0)~0.4
((keywords_bm25_no_norms:\"allergic reaction dog\"~12)^500.0 |
(kw1ranked:\"allergic reaction dog\"~12)^100.0 | (kw1:\"allergic reaction
dog\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",


Working on getting this rigged up in the debugger, but would appreciate any
feedback.
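
For anyone who wants to try reproducing this, here is a minimal Python sketch that builds an edismax debug request. It is a sketch under stated assumptions: the host, core name, and the shortened qf/pf/pf2/pf3 values are guesses reconstructed from the parsed queries above, not the actual request parameters, and the boost function is left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host and core name; replace with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_debug_url(user_query):
    """Build a select URL that asks edismax to show its parsed query."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # Field lists are shortened guesses based on the debug output above.
        "qf": "kw1^100 species text title^100 authors",
        # Per-field slop uses the field~slop^boost form; the last digit of
        # each slop follows the pf=0, pf2=1, pf3=2 convention from this post.
        "pf": "title~20^5000 text~20^1000 kw1~10^500",
        "pf2": "authors~11 species~11",
        "pf3": "title~22^1000 text~22^100 kw1~12^100",
        "tie": "0.4",
        "debugQuery": "true",
        "rows": "0",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_debug_url("aspirin dose in rats")
print(url)
# Round-trip the query string to confirm the parameters survive URL encoding.
print(parse_qs(urlsplit(url).query)["pf2"])
```

Comparing the parsedquery_toString of the response between the two field type variants should show whether the phrase clauses come back empty.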

Thank you,
Elizabeth
Re: PF, PF2, PF3 clauses missing in solr7 with query-time synonyms?

eaph
An update on this:

The problem occurs with phrase queries under edismax when a term in the
nested query has a multi-word synonym.
In the examples above, dog has the multi-term synonym "canis familiaris", and
aspirin has "acetylsalicylic acid".
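
To find which terms are likely to be affected, the synonym list can be scanned for entries that share a line with a multi-word synonym. The following Python sketch is illustrative only: the function name is made up, and it assumes plain comma-separated synonym lines like the excerpt in the original post rather than Solr's managed synonym storage.

```python
def multiword_synonym_terms(synonym_lines):
    """Map each single-word entry to the multi-word synonyms on its line."""
    affected = {}
    for line in synonym_lines:
        entries = [entry.strip() for entry in line.split(",") if entry.strip()]
        multi_word = [entry for entry in entries if len(entry.split()) > 1]
        if not multi_word:
            continue  # every synonym on this line is a single token
        for entry in entries:
            if len(entry.split()) == 1:
                affected[entry] = multi_word
    return affected


# The excerpt from the original post.
synonyms = [
    "allergic, hypersensitive",
    "aspirin, acetylsalicylic acid",
    "dog, canine, canis familiaris, k 9",
    "rat, rattus",
]

for term, expansions in multiword_synonym_terms(synonyms).items():
    print(term, "->", expansions)
```

On the excerpt, this flags aspirin, dog, and canine, but not allergic or rat, which lines up with the phrase clauses that went missing in the two examples.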

Creating a JIRA ticket.

Thank you,
Elizabeth


On Wed, Apr 18, 2018 at 12:38 PM, Elizabeth Haubert <
[hidden email]> wrote:

> I'm seeing pf and pf3 clauses fail to generate in long queries containing
> synonyms.  Wondering if anyone else has run into this, or if it needs to be
> submitted as a bug in Jira.   It is a showstopper problem for the current
> project, as the pf and pf3 were pretty heavily tuned.