Solr Synonyms, Escape space in case of multi words

David Philip
Hi All,

   I remember using multi-word synonyms in Solr 3.x. For multi-word entries,
I escaped the space with a backslash [\] and it worked as intended.
Ex: ride\ makers, riders, rider\ guards.  Each entry mapped to the others,
so when I searched for ride makers, I got the search results for all of
them. The field type was the same as below. I have the same setup in
Solr 4.10, but now the escaped space in multi-word entries is being ignored.
It is tokenizing on spaces.

 synonyms.txt
    ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care


Analysis page:

ridemakersrideridemakerzrideridemarkridemakersmakerzcare

Field Type

    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>



Could you please tell me what could be the issue? How do I handle
multi-word cases?






Thanks - David
Re: Solr Synonyms, Escape space in case of multi words

David Philip
Continued:

My expectation was that "ride care" would not be split into two tokens.

It should have been as below. Please correct me or point out where I am wrong.


Input : ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care

o/p

ridemakersrideridemakerzrideridemarkridemakersmakerz

*ride care*




On Wed, Oct 15, 2014 at 7:16 PM, David Philip <[hidden email]>
wrote:

> [...]
Re: Solr Synonyms, Escape space in case of multi words

David Philip
Sorry, the analysis page clip got trimmed and its formatting was lost.

Here it is:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | care

Expected:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | *ride care*



On Wed, Oct 15, 2014 at 7:21 PM, David Philip <[hidden email]>
wrote:

> [...]
Re: Solr Synonyms, Escape space in case of multi words

Rajinimaski
Hi David,

  I think you also need to specify the tokenizer on the filter itself, using
the tokenizerFactory attribute (as shown below):

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"
          tokenizerFactory="solr.KeywordTokenizerFactory"/>



So your field type should be as shown below:

    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"
                tokenizerFactory="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>
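
As far as I understand it, the reason this should help is that the synonym filter parses each entry of synonyms.txt with its own tokenizer, which is whitespace-based unless tokenizerFactory says otherwise. The backslash only protects the space while the line is being split into entries. Once the entry is unescaped, a whitespace tokenizer still breaks it into two words. Below is a rough Python sketch of that idea. It is only a simulation under those assumptions, not Solr's actual code, and the function names (split_entries, unescape, parse and the two tokenizers) are made up for illustration.

```python
import re


def split_entries(line):
    """Split one synonyms.txt line on commas that are not backslash-escaped."""
    return re.split(r"(?<!\\),", line)


def unescape(entry):
    """Drop the backslash in front of any escaped character ('\\ ' becomes ' ')."""
    return re.sub(r"\\(.)", r"\1", entry)


def whitespace_tokenize(text):
    """Stand-in for a whitespace tokenizer: one token per word."""
    return text.split()


def keyword_tokenize(text):
    """Stand-in for a keyword tokenizer: the whole entry is one token."""
    return [text]


def parse(line, tokenize):
    """Return the token list produced for each synonym entry on the line."""
    return [tokenize(unescape(entry).strip()) for entry in split_entries(line)]


line = r"ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care"

# Whitespace parsing: "ride\ care" still ends up as two tokens.
print(parse(line, whitespace_tokenize))

# Keyword parsing: every entry stays a single token, including "ride care".
print(parse(line, keyword_tokenize))
```

With the whitespace stand-in, the last entry comes out as the two tokens ride and care, which matches what the analysis page showed. With the keyword stand-in it stays as the single token "ride care". Note also that ride\mark becomes ridemark in both cases, since the backslash is simply dropped.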


On Wed, Oct 15, 2014 at 7:25 PM, David Philip <[hidden email]>
wrote:

> [...]