Index with ItalianStemmer


Tommaso Teofili
Hi all,
I am seeing strange behavior while indexing Italian text (in a text field
that is indexed but not stored) when stemming with the Italian language:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"
            protected="protwords.txt"/>
  </analyzer>


If I index the text field with the value
"mi voglio documentare su Solr e sulla sua storia" (which means "I want to
study Solr and its history"),
my searches for "q=text:documentare" and "q=text:documento" both return
nothing.
The biggest issue is that the first query, which I expected to match whether
or not stemming was enabled, does not match any document.

If I change the stemmer language to English and then reindex, the first of
the queries above succeeds as expected because no stemming is applied.

Does anyone know what could be the root cause or if I am missing something?
Thanks in advance for any help,
Tommaso

Re: Index with ItalianStemmer

Robert Muir
On Fri, Sep 3, 2010 at 8:04 AM, Tommaso Teofili <[hidden email]> wrote:

> Does anyone know what could be the root cause or if I am missing something?
> Thanks in advance for any help,
> Tommaso
>

I didn't see a definition of your 'query' analyzer, only the 'index' one. Can
you make sure the Italian stemmer is specified at 'query' time too?

For debugging, you can use analysis.jsp to check that the index-time and
query-time analysis are consistent.


--
Robert Muir
[hidden email]

Re: Index with ItalianStemmer

Tommaso Teofili
Thanks, Robert, for the hint. The problem was exactly that I needed to define
the right stemmer at query time too.
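
For anyone hitting the same issue, here is a minimal sketch of a query-time
analyzer that would sit inside the same fieldType, next to the index analyzer
posted above. It simply mirrors the index-time chain, which is an assumption
on my part rather than the exact configuration that was used; the
WordDelimiterFilterFactory settings in particular are often tuned differently
for queries. The important part is that SnowballPorterFilterFactory uses
language="Italian" on both sides, so that query terms are reduced to the same
stems that were written to the index.

```xml
<!-- Sketch only: mirrors the index analyzer from the original post.
     Assumes the same stopwords.txt and protwords.txt files are used
     at query time. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Must match the stemmer language used at index time. -->
  <filter class="solr.SnowballPorterFilterFactory" language="Italian"
          protected="protwords.txt"/>
</analyzer>
```

After changing the schema, pasting a sample value and a sample query into
analysis.jsp should show whether both sides now produce the same tokens.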
Best regards,
Tommaso
