Strange behavior

Sergey Polzunov-2
Hi all

Please take a look at this strange behavior (connected with stemming I
suppose):


type:

<fieldtype name="customTextField" class="solr.TextField" indexed="true" stored="false">
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldtype>

field:

<field name="name"  type="customTextField" indexed="true"  stored="false"/>



I'm adding a document:

<add><doc><field name="id">999999</field><field name="name">Apple</field></doc></add>

<commit/>


Querying "name:apple" returns 0 results. Searching "name:Apple" returns 1 result. But
"name:appl*" also returns 1 result.


Adding the next document:

<add><doc><field name="id">88888</field><field name="name">Somenamele</field></doc></add>

<commit/>


Searching for "name:somenamele" - 1 result, for "name:Somenamele" - 1 result


What is the problem with "Apple"? Maybe StandardTokenizer treats it as a
trademark :) ?


Thank you in advance


--
Best regards,
Traut
Re: Strange behavior

Yonik Seeley-2
Try putting the stemmer after the lowercase filter.
-Yonik
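
A minimal sketch of that reordering, assuming the rest of the fieldtype stays exactly as posted (same factories, same stopwords.txt and protwords.txt). The only change is that LowerCaseFilterFactory now runs before EnglishPorterFilterFactory in both analyzers; the comments are illustrative and not part of the original config:

```xml
<fieldtype name="customTextField" class="solr.TextField" indexed="true" stored="false">
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- lowercase first, so the stemmer sees "apple" rather than "Apple" -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- same order as the query analyzer, so both sides produce the same term -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
</fieldtype>
```

With this order, "Apple" and "apple" should reach the stemmer as the same lowercased token, so the indexed term and the query term agree. Documents indexed under the old order need to be re-added after the change, since the index analyzer only runs at indexing time.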

On Feb 12, 2008 9:15 AM, Traut <[hidden email]> wrote:

> Querying "name:apple" - 0 results. Searching "name:Apple" - 1 result. But
> "name:appl*" - 1 result
Re: Strange behavior

Sergey Polzunov-2
Thank you, it works. Does the stemming filter only work with lowercased words?

On Feb 12, 2008 4:29 PM, Yonik Seeley <[hidden email]> wrote:

> Try putting the stemmer after the lowercase filter.
> -Yonik



--
Best regards,
Traut
Re: Strange behavior

Yonik Seeley-2
On Feb 12, 2008 9:50 AM, Traut <[hidden email]> wrote:
> Thank you, it works. Stemming filter works only with lowercased words?

I've never tried it in the order you have it.
You could try the analysis admin page and report back what happens...

-Yonik

