Make search on the particular field to be case sensitive

Karan Saini
Hi guys,

Solr version: 6.6.1

<field name="NameLine1" type="string" indexed="true" stored="true" />

I have around 10 fields in my core, and I want searches on this specific
field to be case sensitive. Please advise how to introduce case
sensitivity at the field level. What changes do I need to make for this
field?

Thanks,
Karan

Re: Make search on the particular field to be case sensitive

Amrit Sarkar
The behavior of a field's values is defined by the analyzer declaration of
its fieldType.

If you look at the managed-schema, you will find fieldType declarations
like:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


In your case the fieldType is "string". You need to write an analyzer chain
for that fieldType and leave out:
 <filter class="solr.LowerCaseFilterFactory"/>

LowerCaseFilterFactory is responsible for lowercasing tokens, both in the
query and while indexing.

Something like this will work for you:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

I chose KeywordTokenizerFactory because this is a string field, not text.

More details on: https://lucene.apache.org/solr/guide/6_6/analyzers.html

Amrit Sarkar
Search Engineer
Lucidworks, Inc.

Re: Make search on the particular field to be case sensitive

Erick Erickson
This won't quite work. "string" types are totally un-analyzed, so you
cannot add filters to a solr.StrField; you must use solr.TextField
rather than solr.StrField.


<fieldType name="string" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Of course, you will need to start over and re-index from scratch in a
new collection.

You also need to make sure you really want to search on the whole
field. The KeywordTokenizerFactory doesn't split the incoming text up
_at all_. So if the input is "my dog has fleas", you can't search for
just "dog" unless you use the extremely inefficient *dog* form. If you
want to search for words, use a tokenizer that breaks up the input,
WhitespaceTokenizer for instance.

Best,
Erick
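
To make the tokenizer distinction concrete, here is a minimal sketch of two
field types for Solr 6.x. The type names name_exact and name_words are
hypothetical and do not come from the thread; the tokenizer and filter
classes are the ones already named in it. Both types leave out
LowerCaseFilterFactory, so matching stays case sensitive.

```xml
<!-- Hypothetical type name. KeywordTokenizerFactory emits the whole value as a
     single token: "my dog has fleas" is indexed as one term, so only the
     complete value (or a wildcard such as *dog*) matches. -->
<fieldType name="name_exact" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical type name. WhitespaceTokenizerFactory splits on whitespace:
     "my dog has fleas" is indexed as the terms my, dog, has, fleas, so a
     query for dog matches but a query for Dog does not. -->
<fieldType name="name_words" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: add <filter class="solr.LowerCaseFilterFactory"/> here
         only if matching should ignore case. -->
  </analyzer>
</fieldType>
```

Either way, the change only applies to documents indexed after the schema
is updated, which is why a full re-index is needed.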

Re: Make search on the particular field to be case sensitive

Amrit Sarkar
Ah, OK.

I posted that without testing it. Thank you, Erick, for correcting me.

Re: Make search on the particular field to be case sensitive

Karan Saini
In reply to this post by Erick Erickson
Hi Erick,

Thanks for the help. It is working fine with the KeywordTokenizerFactory.
As you mentioned, I also want to search for "dog" or "*dog*" alone.
Case sensitivity is working fine, but I also want wildcard-based search.

So I tried this changed configuration, but no luck:

<fieldType name="NameLine1" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!--<tokenizer class="solr.KeywordTokenizerFactory"/>-->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Please suggest where I am making the mistake.

Kind regards,
Karan

Re: Make search on the particular field to be case sensitive

Karan Saini
Hi Erick,

Please ignore my earlier mail. I got it working! I had missed the rule
attribute.

<tokenizer class="solr.WhitespaceTokenizerFactory" rule="java"/>

Now it is working.

Thanks,
Karan
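
For readers following along, here is a sketch of how the pieces might fit
together in the schema. The type name name_ws is hypothetical; the point is
that the field's type attribute must match the name of the fieldType. The
Solr reference guide lists "java" as the default value of rule, so writing
it out should behave the same as omitting it.

```xml
<!-- Hypothetical type name. LowerCaseFilterFactory is kept as in the config
     quoted above, which makes matching ignore case. Remove the filter if
     matching should be case sensitive. -->
<fieldType name="name_ws" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" rule="java"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The field from the original question, now pointing at the new type. -->
<field name="NameLine1" type="name_ws" indexed="true" stored="true"/>
```

Giving the type its own name, rather than redefining "string", leaves the
other fields in the core untouched.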


