How to configure SnowballAnalyzer for Spanish


Iris Soto
Hello,

I am trying to configure Solr to index a Spanish site and I am running
into some problems.
I have a basic install running on Tomcat.

In my schema.xml file I have the following:

<fieldtype name="text_es" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    </analyzer>
</fieldtype>

The Solr wiki mentions the class
org.apache.lucene.analysis.snowball.SnowballAnalyzer. How can I specify
which language it should use?
<analyzer class="org.apache.lucene.analysis.snowball.SnowballAnalyzer">

I want ISOLatin1AccentFilterFactory to strip the accents from characters
such as á, é, ñ... However, this doesn't work for queries, because a
search should also match words that contain those accented forms.
Is this configuration correct? How can I configure the analyzer for Spanish?

Thanks & Regards,


--
Iris Soto

Re: How to configure SnowballAnalyzer for Spanish

Yonik Seeley-2
Hi Iris,

An "Analyzer" is just a tokenizer followed by a series of token filters.
Stick with the TextField that you defined and you should be fine.
I'm not sure how the Spanish stemmer works, or whether it expects to see
accented characters. If it does, you may want to move
ISOLatin1AccentFilterFactory after the stemmer.
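
For illustration, here is a rough sketch of what that reordering might look
like. It reuses the field type from the original message and only moves the
accent filter to the end of the chain. It assumes the Spanish stemmer does
want accented input, which I have not verified, so treat it as something to
experiment with rather than a known-good configuration.

```xml
<fieldtype name="text_es" class="solr.TextField" positionIncrementGap="100">
    <!-- A single <analyzer> with no type attribute is used for both
         indexing and querying, so both sides see the same chain. -->
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Assumption: the stemmer runs on text that still has its accents. -->
        <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
        <!-- Accents are stripped last, after stemming. -->
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
    </analyzer>
</fieldtype>
```

Because the same chain runs at index time and at query time, a query term
with accents and the indexed word should end up as the same token. One thing
to watch for: with this order the stop filter sees accented text, so any
stop word list would need to contain the accented spellings.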

-Yonik

On 11/27/06, Iris Soto <[hidden email]> wrote:

> [...]