howto replace fieldType string with text lowercase


Bernd Fehling
Dear list,

For one field I want to change the fieldType from string to something
equivalent to string, but lowercased.

currently:
<field name="firstname" type="string" indexed="true" stored="true" multiValued="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

new:
<field name="firstname" type="text_lc" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Is this the right replacement for "string"?
Are the attributes for solr.TextField ok?

Regards
Bernd
Re: howto replace fieldType string with text lowercase

MUNENDRA S N
Instead of StandardTokenizerFactory, use KeywordTokenizerFactory, which emits
the whole text as a single token. Once you make this change, a full reindex
is required. After changing the field type, some functionality on the field,
such as faceting and sorting, might no longer perform well.
I'm not sure whether there are any out-of-the-box update processors that convert
the value to lowercase, but implementing one should be easy. Another approach
is to convert the value in a preprocessing phase before sending it to Solr.
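
A minimal sketch of the suggested change, assuming the type name text_lc and the attributes from the original post are kept and only the tokenizer is swapped. It is an illustration, not a tested schema:

```xml
<!-- Sketch only: same name and attributes as in the original post, tokenizer swapped. -->
<fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- KeywordTokenizerFactory emits the entire field value as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- LowerCaseFilterFactory then lowercases that single token -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With StandardTokenizerFactory a value containing spaces would be split into several tokens, which is what makes it behave differently from a string field.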

Regards,
Munendra S N



On Fri, Dec 6, 2019 at 2:45 PM Bernd Fehling <[hidden email]>
wrote:

> Is this the right replacement for "string"?
> Are the attributes for solr.TextField ok?
Re: howto replace fieldType string with text lowercase

Bernd Fehling
Hi Munendra S N,

thanks for the hint about the tokenizer.
Could I omit the tokenizer altogether, or is it needed by LowerCaseFilterFactory?

The field "firstname" is not used for faceting or sorting.
Also, I want to keep the raw content as is, with capital letters and so on.
I think update processors and preprocessing before loading would not help here.
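
To illustrate the point about keeping the raw content: in Solr, analysis applies to the indexed terms, while the stored value is returned as it was submitted. A hedged sketch, assuming a lowercasing type named text_lc as in the original post and a hypothetical input value "Bernd":

```xml
<!-- stored="true": results return the value as submitted, e.g. "Bernd" (hypothetical input) -->
<!-- indexed="true": the index holds the analyzed term, e.g. "bernd" when text_lc lowercases -->
<field name="firstname" type="text_lc" indexed="true" stored="true" multiValued="true"/>
```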

Regards
Bernd


On 06.12.19 at 11:31, Munendra S N wrote:

> Instead of StandardTokenizerFactory use KeywordTokenizerFactory which emits
> whole text as a single token.
Re: howto replace fieldType string with text lowercase

MUNENDRA S N
A tokenizer is required: filters such as LowerCaseFilterFactory operate on the
token stream that the tokenizer produces.

Regards,
Munendra S N



On Fri, Dec 6, 2019 at 4:14 PM Bernd Fehling <[hidden email]>
wrote:

> Could I omit Tokenizer at all or is it needed by LowerCaseFilterFactory?