question about synonyms

nick19701
Hi,
I put this line in my synonyms.txt

bestbuy,bb,best buy

I expected that when bb is searched, all results
containing "bestbuy", "bb" or "best buy" would be returned.
But in my test I only got back the results containing "bestbuy"
or "best buy". The results containing "bb" are not returned.

What am I missing here?
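A line like this in synonyms.txt is read as an equivalence list: with expand=true, each entry is rewritten to all of the entries, including the two-word "best buy". Here is a minimal Python sketch of that reading. It assumes simple whitespace splitting of each entry. `parse_synonym_line` and `expansion_rules` are hypothetical helpers, not Solr's parser, and options such as ignoreCase are ignored.

```python
# Toy model (not Solr's parser): an expand=true synonym line is an
# equivalence list, and every entry is rewritten to all entries.

def parse_synonym_line(line):
    """Split a comma-separated line into alternatives, each a tuple of words."""
    return [tuple(entry.split()) for entry in line.split(",") if entry.strip()]


def expansion_rules(line):
    """Map each alternative to the full list of alternatives (expand=true)."""
    alternatives = parse_synonym_line(line)
    return {alt: alternatives for alt in alternatives}


rules = expansion_rules("bestbuy,bb,best buy")
for lhs, rhs in rules.items():
    print(" ".join(lhs), "=>", " | ".join(" ".join(alt) for alt in rhs))
# bestbuy => bestbuy | bb | best buy
# bb => bestbuy | bb | best buy
# best buy => bestbuy | bb | best buy
```

The single-word entries expand to a two-word alternative. That mismatch in length is what the rest of the thread turns on.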

Re: question about synonyms

Yonik Seeley-2
On 2/13/07, nick19701 <[hidden email]> wrote:

>
> Hi,
> I put this line in my synonyms.txt
>
> bestbuy,bb,best buy
>
> I expect that when bb is searched, all results
> including "bestbuy", "bb" or "best buy" will be returned.
> But in my test I only got back the results which include "bestbuy"
> or "best buy". The results which include "bb" are not returned.

Are you using the synonyms at index time, query time, or both?
Did you reindex if you made changes to an "index" analyzer?
It would help if you post the fieldtype for the field you are searching.

-Yonik

Re: question about synonyms

nick19701
Yonik Seeley wrote:
> Are you using the synonyms at index time, query time, or both? Did you reindex if you made changes to an "index" analyzer? It would help if you post the fieldtype for the field you are searching.

I am using the synonyms only at query time. Below is the field analysis. It seems like the culprit is the space in the phrase "best buy" in synonyms.txt. What should I do about it? Put quotes around it? BTW, the default operator is "AND": <solrQueryParser defaultOperator="AND"/>

Index Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}

term position 1
term text bb
term type word
source start,end 0,2

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}

term position 1
term text bb
term type word
source start,end 0,2

org.apache.solr.analysis.WordDelimiterFilterFactory {catenateWords=1, catenateNumbers=1, catenateAll=0, generateNumberParts=1, generateWordParts=1}

term position 1
term text bb
term type word
source start,end 0,2

org.apache.solr.analysis.LowerCaseFilterFactory {}

term position 1
term text bb
term type word
source start,end 0,2

org.apache.solr.analysis.EnglishPorterFilterFactory {protected=protwords.txt}

term position 1
term text bb
term type word
source start,end 0,2

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

term position 1
term text bb
term type word
source start,end 0,2

Query Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}

term position 1
term text bb
term type word
source start,end 0,2

org.apache.solr.analysis.SynonymFilterFactory {expand=true, ignoreCase=true, synonyms=synonyms.txt}

term position     1        2
term text         bestbuy  buy
                  bb
                  best
term type         word     word
                  word
                  word
source start,end  0,2      0,2
                  0,2
                  0,2

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true}

term position     1        2
term text         bestbuy  buy
                  bb
                  best
term type         word     word
                  word
                  word
source start,end  0,2      0,2
                  0,2
                  0,2

org.apache.solr.analysis.WordDelimiterFilterFactory {catenateWords=0, catenateNumbers=0, catenateAll=0, generateNumberParts=1, generateWordParts=1}

term position     1        2
term text         bestbuy  buy
                  bb
                  best
term type         word     word
                  word
                  word
source start,end  0,2      0,2
                  0,2
                  0,2

org.apache.solr.analysis.LowerCaseFilterFactory {}

term position     1        2
term text         bestbuy  buy
                  bb
                  best
term type         word     word
                  word
                  word
source start,end  0,2      0,2
                  0,2
                  0,2

org.apache.solr.analysis.EnglishPorterFilterFactory {protected=protwords.txt}

term position     1        2
term text         bestbuy  buy
                  bb
                  best
term type         word     word
                  word
                  word
source start,end  0,2      0,2
                  0,2
                  0,2

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

term position     1        2
term text         bestbuy  buy
                  bb
                  best
term type         word     word
                  word
                  word
source start,end  0,2      0,2
                  0,2
                  0,2
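Read column by column, the query analyzer output above is a stacked token stream. Position 1 holds bestbuy, bb and best, position 2 holds buy, and every token reuses the 0,2 offsets of the original "bb". Here is a toy Python model of that expansion. It assumes the first word of each alternative is stacked on the input position and later words go to the following positions. `Token` and `expand` are illustrative names, not Solr classes.

```python
from dataclasses import dataclass

# Assumed alternatives, taken from the synonyms.txt line in this thread.
ALTERNATIVES = [("bestbuy",), ("bb",), ("best", "buy")]


@dataclass(frozen=True)
class Token:
    text: str
    position: int
    start: int
    end: int


def expand(term, start, end, position=1):
    """Toy query-time expansion of one input token (expand=true).

    The first word of every alternative is stacked on the input position,
    the k-th word of a multi-word alternative goes to position + k, and all
    generated tokens reuse the input token's start/end offsets.
    """
    if (term,) not in ALTERNATIVES:
        return [Token(term, position, start, end)]
    return [
        Token(word, position + k, start, end)
        for alt in ALTERNATIVES
        for k, word in enumerate(alt)
    ]


stream = expand("bb", start=0, end=2)
by_position = {}
for tok in stream:
    by_position.setdefault(tok.position, []).append(tok.text)
print(by_position)  # {1: ['bestbuy', 'bb', 'best'], 2: ['buy']}
```

The one-word query "bb" thus becomes a stream that covers two positions, even though the user typed a single term.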

Re: question about synonyms

Chris Hostetter-3

: I am using the synonyms only at query time.
: Below is the field analysis.

FYI: I think what Yonik meant was the section of your schema.xml that
defines the fieldtype.

: It seems like the culprit is the space in the phrase "best buy" in
: synonyms.txt.

Because of some limitations in the way Analyzers can indicate that
multiple tokens occupy the same position, multi-word synonyms are inherently
tricky. There is extensive discussion of this on the wiki:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46

In a nutshell: there is no clean way to do query-time multi-word
synonyms.


-Hoss
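This makes the "no clean way" concrete for the case in this thread. The query-time analysis of bb posted earlier spans two positions. Assume the query parser turns such a stream into a phrase-style query. Lucene's classic QueryParser builds a MultiPhraseQuery when tokens share a position but the stream covers more than one position. In that case a document containing only "bb" cannot match, because nothing fills the second position with "buy". The Python sketch below is a toy model of that matching rule, not Lucene code, and the tokenized documents are made up for illustration.

```python
# Toy model (assumption): a query whose analyzed tokens span more than one
# position behaves like a MultiPhraseQuery, i.e. each position must be
# filled, in order, by one of the terms stacked at that position.

def multi_phrase_matches(query_positions, doc_tokens):
    """True if some consecutive run of doc_tokens satisfies every position."""
    n = len(query_positions)
    for i in range(len(doc_tokens) - n + 1):
        if all(doc_tokens[i + j] in allowed
               for j, allowed in enumerate(query_positions)):
            return True
    return False


# Query-time expansion of "bb": three stacked terms at position 1, "buy" at position 2.
query = [{"bestbuy", "bb", "best"}, {"buy"}]

docs = {
    "best buy doc": ["shop", "at", "best", "buy"],
    "bb doc": ["shop", "at", "bb"],
}
for name, tokens in docs.items():
    print(name, multi_phrase_matches(query, tokens))
# best buy doc True
# bb doc False
```

Under this model, expanding the query effectively asks for "(bestbuy|bb|best) buy", so the document that only says "bb" drops out.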


Re: question about synonyms

Yonik Seeley-2
On 2/13/07, Chris Hostetter <[hidden email]> wrote:

>
> : I am using the synonyms only at query time.
> : Below is the field analysis.
>
> FYI: I think what Yonik meant was the section of your schema.xml that
> defines the fieldtype.
>
> : It seems like the culprit is the space in the phrase "best buy" in
> : synonyms.txt.
>
> Because of some limitations in the way Analyzers can indicate that
> multiple tokens occupy the same position, multi-word synonyms are inherently
> tricky. There is extensive discussion of this on the wiki:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
>
> In a nutshell: there is no clean way to do query-time multi-word
> synonyms.

To be clear, no clean way to do *expansion* as opposed to reduction at
query time, when the alternatives are of different lengths.

You could use index-time expansion, a combination of index-time and
query-time reduction with the same synonym dictionary, or index-time
expansion for only the multi-token alternatives combined with query-time
synonym expansion of the remaining alternatives.

-Yonik
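The first option, index-time expansion, sidesteps the problem because the stacked tokens end up in the index, and a query for bb stays a single term. Here is a toy Python model using the same three alternatives and a simple positional index. `index_time_expand` and `phrase_match` are hypothetical helpers, and for brevity only single-token inputs are expanded.

```python
# Assumed alternatives, taken from the synonyms.txt line in this thread.
ALTERNATIVES = [("bestbuy",), ("bb",), ("best", "buy")]


def index_time_expand(tokens):
    """Return (term, position) postings; a matching single token gets every
    alternative stacked on its position (later words at position + k)."""
    postings = []
    for pos, tok in enumerate(tokens):
        if (tok,) in ALTERNATIVES:
            for alt in ALTERNATIVES:
                for k, word in enumerate(alt):
                    postings.append((word, pos + k))
        else:
            postings.append((tok, pos))
    return postings


def phrase_match(postings, phrase):
    """True if the words of phrase occur at consecutive positions."""
    positions = {}
    for term, pos in postings:
        positions.setdefault(term, set()).add(pos)
    return any(
        all(start + k in positions.get(word, set()) for k, word in enumerate(phrase))
        for start in positions.get(phrase[0], set())
    )


doc = index_time_expand(["shop", "at", "bb"])
for query in (["bb"], ["bestbuy"], ["best", "buy"]):
    print(" ".join(query), phrase_match(doc, query))
# bb True
# bestbuy True
# best buy True
```

In this model the injected "buy" sits at position 3, so in a longer document it would share a position with whatever word follows "bb". A document that spells out "best buy" as two words would also need multi-token matching on the index side, which this sketch leaves out.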

Re: question about synonyms

Chris Hostetter-3

: To be clear, no clean way to do *expansion* as opposed to reduction at
: query time, when the alternatives are of different lengths.

Reduction at query time doesn't work either ... when the query parser sees the
string:
        my best buy
...it analyzes each whitespace-separated string separately, so a synonym
reduction of "best buy"=>bestbuy will never be triggered.  As I said, this
is all covered on the wiki (it's probably the wiki topic with the most
complete coverage: multi-word synonyms kicked my ass up and down the
street about a year ago).



-Hoss


Re: question about synonyms

Yonik Seeley-2
On 2/13/07, Chris Hostetter <[hidden email]> wrote:
>
> : To be clear, no clean way to do *expansion* as opposed to reduction at
> : query time, when the alternatives are of different lengths.
>
> Reduction at query time doesn't work either ... when the query parser sees the
> string:
>         my best buy
> ...it analyzes each whitespace-separated string separately

Unless you put it in a phrase query.  But yes, that's not as flexible
and would probably cause pain with the dismax handler.

-Yonik
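Here is a toy Python model of the last two messages. It assumes the parser behaves as described: bare words are analyzed one at a time, while a quoted phrase is analyzed as one string. It also assumes the synonym line is used for reduction on both the index and the query side, with every alternative mapped to the first entry (roughly what expand=false does). `reduce_synonyms` and `parse` are hypothetical helpers, not Solr or Lucene code.

```python
# Assumed alternatives and reduction target for this thread's synonym line.
ALTERNATIVES = [("bestbuy",), ("bb",), ("best", "buy")]
CANONICAL = "bestbuy"


def reduce_synonyms(tokens):
    """Greedy longest-match reduction of any alternative to CANONICAL."""
    by_length = sorted(ALTERNATIVES, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for alt in by_length:
            if tuple(tokens[i:i + len(alt)]) == alt:
                out.append(CANONICAL)
                i += len(alt)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def parse(query):
    """Assumed parser model: a quoted phrase is analyzed as one string,
    while bare words are split on whitespace and analyzed one at a time."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return reduce_synonyms(query[1:-1].split())
    terms = []
    for chunk in query.split():
        terms.extend(reduce_synonyms([chunk]))
    return terms


indexed = reduce_synonyms(["shop", "at", "bb"])  # same reduction at index time
print(indexed)                 # ['shop', 'at', 'bestbuy']
print(parse("my best buy"))    # ['my', 'best', 'buy'] (rule never sees both words)
print(parse('"my best buy"'))  # ['my', 'bestbuy']
```

Only the quoted form lets the two-word rule fire. The costs Yonik mentions (less flexibility, trouble with the dismax handler) are not modeled here.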