Is it possible to add stemming in a text_exact field

dhaneshsr
Hello,
I'm facing an issue with stemming.
My search query "restaurant dubai" returns results,
but if I search for "restaurants dubai" it returns no data.

How can I get stemming to make "restaurants dubai" match "restaurant dubai"?

I'm using a text_exact field for search.

<field name="business_locality" type="text_exact" required="true"
       multiValued="true" omitNorms="false" omitTermFreqAndPositions="false"/>

Here is the field type definition:

    <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <!-- KeywordTokenizerFactory emits the whole field value as a single token -->
            <tokenizer class="solr.KeywordTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
            <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
            <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
    </fieldType>

Is there any solution that doesn't require changing the tokenizer class?




Dhanesh S.R

--
IMPORTANT: This is an e-mail from HiFX IT Media Services Pvt. Ltd. Its
content are confidential to the intended recipient. If you are not the
intended recipient, be advised that you have received this e-mail in error
and that any use, dissemination, forwarding, printing or copying of this
e-mail is strictly prohibited. It may not be disclosed to or used by anyone
other than its intended recipient, nor may it be copied in any way. If
received in error, please email a reply to the sender, then delete it from
your system.

Although this e-mail has been scanned for viruses, HiFX
cannot ultimately accept any responsibility for viruses and it is your
responsibility to scan attachments (if any).

Before you print this email
or attachments, please consider the negative environmental impacts
associated with printing.

Re: Is it possible to add stemming in a text_exact field

Edward Ribeiro
Hi,

One possible solution would be to create a second field (e.g., of type
text_general) that uses StandardTokenizer, or another tokenizer that breaks
the string into tokens, and use a copyField to copy the content from the
text_exact field into the new field. Then you can use the edismax parser to
search both fields, giving the text_exact field a higher boost (e.g.,
qf=text_exact^5 text_general, with the actual field names in place of these
type names). In this case, both fields need to be indexed, but only one
needs to be stored.
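
A minimal schema sketch of this approach, assuming the business_locality field from the original post. The type name text_stemmed and the field name business_locality_stemmed are hypothetical, chosen only for illustration; a single analyzer element applies to both indexing and querying:

```xml
<!-- Hypothetical tokenized, stemmed field type (name "text_stemmed" is an assumption) -->
<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split the value into words so the stemmer sees each word separately -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stem each word, so "restaurants" and "restaurant" reduce to the same stem -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical companion field: indexed for search, not stored -->
<field name="business_locality_stemmed" type="text_stemmed"
       indexed="true" stored="false" multiValued="true"/>

<!-- At index time, copy the raw input of business_locality into the stemmed field -->
<copyField source="business_locality" dest="business_locality_stemmed"/>
```

At query time the two fields could then be searched together, for example with defType=edismax and qf=business_locality^5 business_locality_stemmed. The stemmed field matches any value containing both word stems, so it is looser than the exact field; the boost keeps whole-value matches ranked first.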

Edward

On Wed, Jan 22, 2020 at 10:34 AM Dhanesh Radhakrishnan <[hidden email]>
wrote:

Re: Is it possible to add stemming in a text_exact field

Alessandro Benedetti
Edward is correct. Furthermore, using a stemmer in an analysis chain that
doesn't tokenise is only going to work for single-term queries and single-term
field values...
I'm not sure that was intended...

Cheers


--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io


On Wed, 22 Jan 2020 at 16:26, Edward Ribeiro <[hidden email]>
wrote:

Re: Is it possible to add stemming in a text_exact field

Lucky Sharma
Hi Dhanesh,
I ran into a similar problem a while back: we had 'skimmed milk' and needed
it to match a search for 'skim milk'. To handle that, we wrote our own filter
so that we could customise the behaviour: the chain applies KStemmer and then
our custom ConcatPhraseFilterFactory.

You can review it at the link below:
https://github.com/MighTguY/solr-extensions
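
The filter from the linked repository is not reproduced here. As a rough approximation using only filters that ship with Solr (a swapped-in technique, not ConcatPhraseFilterFactory), one could split the value into words, stem each word with KStem, and then rejoin the stems into a single token with FingerprintFilterFactory. The type name text_exact_stemmed is hypothetical. Note that this does change the tokenizer, and that FingerprintFilterFactory also sorts and de-duplicates the words before joining them:

```xml
<!-- Sketch only: stock Solr filters, not the ConcatPhraseFilterFactory from the linked repo.
     The type name "text_exact_stemmed" is hypothetical. -->
<fieldType name="text_exact_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Break the value into words so the stemmer can work on each one -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- KStem: a less aggressive English stemmer than Porter -->
    <filter class="solr.KStemFilterFactory"/>
    <!-- Rejoin the stems into one token (space-separated by default) so matching
         stays whole-value; the words are sorted and de-duplicated first -->
    <filter class="solr.FingerprintFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, "restaurant dubai" and "restaurants dubai" are expected to reduce to the same single token, so whole-value matching still applies, but word order no longer matters.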

Regards,
Lucky Sharma

On Thu, 23 Jan, 2020, 8:58 pm Alessandro Benedetti, <[hidden email]>
wrote: