SynonymFilter docs

Yonik Seeley-2
I ran across the following text in
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

'''
1) The Lucene QueryParser tokenizes on white space before giving any
text to the Analyzer, so if a person searches for the words sea biscit
the analyzer will be given the words "sea" and "biscit" separately,
and will not know that they match a synonym.
'''

But the SynonymFilter Solr has is able to recognize multi-token
synonyms... it happens the same way as the analysis side. Am I
misinterpreting the statement?

The conclusion is still correct though... index time works better in general.

-Yonik

Re: SynonymFilter docs

Chris Hostetter-3

: '''
: 1) The Lucene QueryParser tokenizes on white space before giving any
: text to the Analyzer, so if a person searches for the words sea biscit
: the analyzer will be given the words "sea" and "biscit" separately,
: and will not know that they match a synonym.
: '''
:
: But the SynonymFilter Solr has is able to recognize multi-token
: synonyms... it happens the same way as the analysis side. Am I
: misinterpreting the statement?

the SynonymFilter can spot the multi-token synonym only if it's given the
multiple tokens as a single stream -- the query parser won't do that if you
give it the words...

        sea biscit

...only if you give it...

        "sea biscit"

...but that leads to point #2 in that page.


-Hoss
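
For anyone reading this thread later, the difference between the unquoted and quoted query can be sketched with a toy model. This is not Lucene or Solr code: synonym_filter, analyze, parse_query and the SYNONYMS table (including the "seabiscuit" replacement) are hypothetical stand-ins, assuming only the behavior described above, i.e. the parser splits on whitespace before analysis unless the text is a quoted phrase.

```python
# Toy model only: these helpers are hypothetical and are not Lucene or Solr APIs.

# Hypothetical synonym table: a multi-token phrase mapped to its replacement tokens.
SYNONYMS = {("sea", "biscit"): ["seabiscuit"]}


def synonym_filter(tokens):
    """Replace any run of tokens that matches a synonym phrase (longest match first)."""
    out = []
    i = 0
    max_len = max(len(phrase) for phrase in SYNONYMS)
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in SYNONYMS:
                out.extend(SYNONYMS[phrase])
                i += n
                break
        else:
            # No synonym phrase starts at this position; pass the token through.
            out.append(tokens[i])
            i += 1
    return out


def analyze(text):
    """Stand-in for an analyzer chain: whitespace tokenizer, lowercase, synonym filter."""
    return synonym_filter(text.lower().split())


def parse_query(query):
    """Split on whitespace first and analyze each chunk alone,
    except that a double-quoted phrase is handed to the analyzer whole."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return [analyze(query[1:-1])]
    return [analyze(word) for word in query.split()]


unquoted = parse_query('sea biscit')
quoted = parse_query('"sea biscit"')
print(unquoted)  # each word was analyzed on its own, so no synonym match
print(quoted)    # the phrase reached the filter as one token stream
```

In this sketch the filter itself handles multi-token synonyms fine; what changes the outcome is only whether both tokens arrive in the same stream.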