Token "states" not getting lemmatized by Solr?

7 messages

OTH
Hello,

It seems to me that the token "states" is not getting lemmatized to
"state" by Solr.

E.g., I have a document with the value "united states of america".
This document is not returned when the following query is issued:
q=name:state^1+name:america^1+name:united^1
However, all documents that contain the token "state" are indeed returned
by the above query.
The "united states of america" document is returned if I change "state" in
the query to "states"; so:
q=name:states^1+name:america^1+name:united^1

At first I thought maybe the lemmatization isn't working for some reason.
However, when I changed "united" in the query to "unite", it still
returned the "united states of america" document:
q=name:states^1+name:america^1+name:unite^1
This suggests that lemmatization is working for the token "united", but
not for the token "states".

The "name" field above is defined as "text_general".

So it seems to me that perhaps the default Solr lemmatizer does not
lemmatize "states" to "state"?
Can anyone confirm whether this is indeed the expected behaviour?
And what can I do to change it?
If I need to plug in a custom lemmatizer, what would be the (best)
way to do that?

Much thanks
Omer

Re: Token "states" not getting lemmatized by Solr?

Erick Erickson
Saying the field is "text_general" is not sufficient; please post the
analysis chain defined in your schema.

Also, the Admin UI >> Analysis page will help you figure out exactly what
each part of the analysis chain does.

Best,
Erick

On Thu, Aug 10, 2017 at 8:37 AM, OTH <[hidden email]> wrote:

Re: Token "states" not getting lemmatized by Solr?

OTH
Hi,

Regarding 'analysis chain':

I'm using Solr 6.4.1, and in the managed-schema file, I find the following:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Regarding the Admin UI >> Analysis page:  I just tried that, and to be
honest, I can't seem to get much useful info out of it, especially in terms
of lemmatization.

For example, for any text I enter to "analyse", all it seems to do is tell
me which analysers (if that's the right term?) are being used for the
selected field / fieldtype, and for each of these it gives some very basic
info, like text, raw_bytes, etc. E.g., for the input "united" in the
"field value (index)" box, with "text_general" selected as the fieldtype,
all I get is this:

       text    raw_bytes            start  end  positionLength  type        position
ST     united  [75 6e 69 74 65 64]  0      6    1               <ALPHANUM>  1
SF     united  [75 6e 69 74 65 64]  0      6    1               <ALPHANUM>  1
LCF    united  [75 6e 69 74 65 64]  0      6    1               <ALPHANUM>  1

Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
"org.apache.lucene.analysis.standard.StandardTokenizer",
"org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.


So, is 'states' not supposed to be lemmatized to 'state' with these settings?
(If it isn't, then I would need to figure out how to use a different lemmatizer.)

Thanks

On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson <[hidden email]>
wrote:

Re: Token "states" not getting lemmatized by Solr?

Ahmet Arslan
Hi Omer,
Your analysis chain does not include a stem filter (lemmatizer).
Assuming you are dealing with English text, you can use KStemFilterFactory or SnowballPorterFilterFactory.
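A minimal sketch of what that could look like, assuming the rest of the chain stays as in the text_general definition OTH posted. The type name text_en_kstem and the attributes on the name field are hypothetical. KStemFilterFactory is placed after LowerCaseFilterFactory because KStem expects lowercased input, and it appears in both analyzers so that query terms are reduced the same way as indexed terms. Existing documents would need to be reindexed before their terms are stemmed, and because managed-schema is normally maintained by Solr itself, the Schema API is generally a safer way to apply the change than hand-editing the file of a running core. Whether KStem actually changes a particular word (such as "states") is best checked on the Analysis page.

```xml
<!-- Hypothetical type name: a copy of text_general with a stemmer added to each analyzer -->
<fieldType name="text_en_kstem" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- KStem expects lowercased tokens, so it runs after LowerCaseFilterFactory -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same stemmer at query time, so query terms are reduced exactly like indexed terms -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Re-point the existing "name" field at the new type; indexed/stored values are assumptions -->
<field name="name" type="text_en_kstem" indexed="true" stored="true"/>
```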
Ahmet


On Thursday, August 10, 2017, 9:33:08 PM GMT+3, OTH <[hidden email]> wrote:


Re: Token "states" not getting lemmatized by Solr?

Erick Erickson
In reply to this post by OTH
First, if you turn off the "verbose" checkbox, it'll reduce a lot of
the clutter. The important point is that when you hover over those
abbreviations, it tells you exactly which class performed the associated
transformation on the tokens in the analysis chain. You'll note that
StandardTokenizer breaks the input up into tokens. "united" doesn't do
anything very exciting; make some letters uppercase and you'll see the
obvious effect of LowerCaseFilter.

Why do you suppose lemmatization will be done for text_general?
There's nothing in the analysis chain that would perform any
lemmatization.
StandardTokenizerFactory will break the input up into tokens. Each
token is then sent through the filters, where:
StopFilterFactory will remove stopwords defined in stopwords.txt
LowerCaseFilterFactory will lowercase the token

That's all you've told Solr to do with the input at index time. At
query time, SynonymFilterFactory will also substitute synonyms. There's
nothing here that has anything to do with lemmatization.

Here's a partial list of available filters that you can choose from:
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions
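For contrast, here is a sketch of a chain that does reduce words to a stem, using one of the filters on that list, SnowballPorterFilterFactory with language="English". It is an illustration only: the type name is hypothetical and the synonym filter is left out for brevity. A single <analyzer> element without a type attribute is used for both indexing and querying, which keeps the two sides consistent; a stemmer on only one side could leave indexed and query tokens in different forms, so they would no longer match.

```xml
<!-- Hypothetical type: one untyped <analyzer> is applied at both index and query time -->
<fieldType name="text_en_snowball" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Snowball English stemmer; it expects lowercased input, hence its position last -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```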

Best,
Erick


On Thu, Aug 10, 2017 at 11:33 AM, OTH <[hidden email]> wrote:

Re: Token "states" not getting lemmatized by Solr?

OTH
In reply to this post by OTH
Hello - Sorry, I obviously made a mistake here.

I said earlier that it seemed to me that the word 'united' was being
lemmatized (to 'unite'). But it seems that's not the case; there isn't
any lemmatization or stemming being done at all. I had previously
assumed that the default 'text_general' fieldtype in Solr probably handles
this, but it seems that's not the case.
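If the configset in use already defines an English field type with stemming, pointing the field at it may be simpler than writing a new type. Solr's example configsets typically include a text_en type whose chain ends in PorterStemFilterFactory, but whether it exists in a given core, and what it contains, should be checked in that core's managed-schema first. The attributes below are assumptions, and documents would need to be reindexed after the change.

```xml
<!-- Assumes managed-schema already defines a "text_en" fieldType with a stemming filter -->
<!-- multiValued="true" is carried over from text_general, which set it on the type;
     indexed/stored values are illustrative, keep whatever the existing field uses -->
<field name="name" type="text_en" indexed="true" stored="true" multiValued="true"/>
```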

I realize that what is actually going on in my case is something else. I will start
another email thread for that.

Thanks.


On Thu, Aug 10, 2017 at 11:33 PM, OTH <[hidden email]> wrote:

RE: Token "states" not getting lemmatized by Solr?

Markus Jelsma
In reply to this post by OTH
I checked our English analyzer, which uses KStemFilter. To my surprise, neither "united" nor "states" is affected by the filter.
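If KStem leaves "states" alone, a lighter stemmer aimed at plurals is another option to try. EnglishMinimalStemFilterFactory implements a minimal plural stemmer, so it should reduce "states" to "state" while leaving "united" unchanged; that is worth confirming on the Analysis page before reindexing. The type name below is hypothetical.

```xml
<!-- Hypothetical type name; one untyped <analyzer> is used at both index and query time -->
<fieldType name="text_en_plural" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Strips common English plural endings only; it runs after lowercasing -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```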

Regards,
Markus

-----Original message-----

> From: Ahmet Arslan <[hidden email]>
> Sent: Thursday 10th August 2017 21:57
> To: [hidden email]
> Subject: Re: Token "states" not getting lemmatized by Solr?