Multi word synonym problem

Multi word synonym problem

Nair, Manas
Hi Experts,

I would like help with multi-word synonyms. The scenario is as follows:

I have a name, Micheal Jackson (the misspelled term), which has the synonym Michael Jackson, i.e.

Micheal Jackson => Michael Jackson

When I search for Micheal Jackson (not a phrase search), Solr searches for text:Micheal, text:Jackson and not for Michael Jackson.
But when I search for "Micheal Jackson" (a phrase search), Solr searches for "Michael Jackson" (the correct term).

The schema.xml for this core contains the SynonymFilterFactory in the text analyzer, enabled at both index and query time. In both cases the SynonymFilterFactory has the parameter expand=true.

Please help me understand how a multi-word synonym can be made effective, i.e. I want a search for
Micheal Jackson (not a phrase search) to return the results for Michael Jackson.

What should be done so that Micheal Jackson is treated as one search term instead of being split?

Any help is greatly appreciated.

Thank you,
Manas Nair

Re: Multi word synonym problem

iorixxx
It is recommended [1] to use synonyms at index time only, for various reasons, especially with multi-word synonyms.

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

At index time only, use expand=true and ignoreCase=true, with this entry in synonyms.txt:

micheal, michael

OR:

micheal jackson, michael jackson

Note that it matters which filters come before the synonym filter.
Be sure to restart Tomcat and re-index.

The query Micheal Jackson (not a phrase search) should then return the results
for Michael Jackson.
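
A minimal sketch of what such an index-time-only setup might look like in schema.xml. It assumes a field type named "text" and a synonyms.txt holding one of the comma-separated entries above. The short filter chain (no stop-word or word-delimiter filters) is a simplification for illustration, not a tested configuration:

```xml
<!-- Sketch only: synonyms are expanded when documents are indexed,
     and the query analyzer has no synonym filter at all. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true" with a comma-separated entry indexes every listed variant -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup along these lines, a document containing either spelling would be indexed under both, so the unquoted query does not depend on any query-time synonym handling. Documents indexed before the change still need to be re-indexed.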

Hope this helps.

--- On Thu, 11/12/09, Nair, Manas <[hidden email]> wrote:

> From: Nair, Manas <[hidden email]>
> Subject: Multi word synonym problem
> To: [hidden email]
> Cc: "Arumugam, Senthil Kumar" <[hidden email]>
> Date: Thursday, November 12, 2009, 3:43 PM


RE: Multi word synonym problem

Nair, Manas
Hi,

I tried the recommended approach, but to no benefit. The multi-word synonyms still do not appear in the results.

My schema.xml has the following fieldType:
 
 
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
<!--        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
<!--        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

This "text" field is also the defaultSearchField.

If I give Michael Jackson as the synonym for Micheal Jackson, i.e. the entry in my synonyms.txt file is:
Micheal Jackson => Michael Jackson

the query does not search for Michael Jackson. Instead it searches for (text:Micheal and text:Jackson). To inspect the parsed query I turned on debugQuery, and in this case the parsed query string searched for Micheal and Jackson separately.

I was able to get the correct response by modifying the synonyms.txt file. I changed the entry to:
Micheal Jackson , Michael Jackson  (replaced '=>' with ',').

Is there something that needs to be changed in the schema shown above? I would like the synonyms to work when I map them using =>.

Kindly help.

Thank you,
Manas
________________________________

From: AHMET ARSLAN [mailto:[hidden email]]
Sent: Thu 11/12/2009 1:18 PM
To: [hidden email]
Subject: Re: Multi word synonym problem






RE: Multi word synonym problem

hossman

: The response is not searching for Michael Jackson. Instead it is
: searching for (text:Micheal and text: Jackson).To monitor the parsed
: query, i turned on debugQuery, but in the present case, the parsed query
: string was searching Micheal and Jackson separately.

Using index-time synonyms isn't going to have any effect on how your
query is parsed.  The Lucene/Solr query parser uses whitespace as
"markup" and will still analyze each of the "words" in your input
separately and build up a boolean query containing each of your words
individually (the only way to change that is to use quotes to force
"phrase query" behavior, where everything in quotes is analyzed as one
chunk, or to pick a different query parser, like the "field" parser).

...but none of that changes the point of *why* you can/should use index-time
synonyms for situations like this.  The point of doing that is that
at index time the alternate versions of the multi-word sequences can all
be expanded and all variants are put in the index ... so it doesn't matter
whether you use a phrase query or term queries: all of the synonyms are in the
indexed document.



-Hoss