new spanish analyzer

new spanish analyzer

José Ramón Pérez Agüera
I have developed a Spanish analyzer with a Spanish stemmer based on the Porter algorithm. It is under the GNU license and free to use. I hope it will be useful for Spanish Lucene users. You can download the stemmer here:

http://multidoc.rediris.es/joseramon/index.php?option=com_docman&task=view_category&Itemid=25&subcat=1&catid=11&limitstart=0&limit=5

If anybody has any suggestions, I will be happy to improve my implementation.

Sorry for my English :-)

jose

José Ramón Pérez Agüera
Despacho 411 tlf. 913947599
Dept. de Sistemas Informáticos y Programación
Facultad de Informática
Universidad Complutense de Madrid


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: new spanish analyzer

steve_rowe
Hola José,

Did you know that Java Lucene already has a contributed Snowball-based
stemmer/analyzer, very similar to yours?

http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/snowball/

It looks to me as though your Spanish stopword list is the only
significant difference.  Would you agree that this is true?

Also, your stoplist loader (SpanishAnalyzer.loadStopWords()) is not
respecting the '|' comment-to-end-of-line character in your stoplist
(stopwords-spanish.txt).
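
To illustrate the comment handling described above, here is a minimal sketch of a loader that drops everything from '|' to the end of each line. It is not José's code: the class name StopWordLoader, the Reader-based signature, and the sample input are assumptions made for illustration, and splitting each line on whitespace is a guess at the file format rather than something confirmed in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene or of the analyzer discussed here.
public class StopWordLoader {

    /** Reads a stopword list, ignoring text from '|' to the end of each line. */
    public static Set<String> loadStopWords(Reader source) throws IOException {
        Set<String> stopWords = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Cut the line at the first '|' so the comment is never indexed.
                int comment = line.indexOf('|');
                if (comment >= 0) {
                    line = line.substring(0, comment);
                }
                // Assumption: a line may hold more than one word, so split on whitespace.
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        stopWords.add(word);
                    }
                }
            }
        }
        return stopWords;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: a comment-only line, a blank line, and two commented entries.
        String sample = " | A Spanish stop word list\n\nde   | from, of\nla   | the, her\n| que\n";
        System.out.println(loadStopWords(new StringReader(sample))); // prints [de, la]
    }
}
```

Lines that contain only a comment, such as "| que" in the sample, contribute no stopword at all, which is the behaviour the '|' convention asks for.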

Steve


Re: new spanish analyzer

Ben van Klinken
Also, the Snowball stemmer is available in the CLucene contributions.

ben

On 1/10/06, Steven Rowe <[hidden email]> wrote:

> Hola José,
>
> Did you know that Java Lucene already has a contributed Snowball-based
> stemmer/analyzer, very similar to yours?
>
> http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/snowball/
>
> It looks to me as though your Spanish stopword list is the only
> significant difference.  Would you agree that this is true?
>
> Also, your stoplist loader (SpanishAnalyzer.loadStopWords()) is not
> respecting the '|' comment-to-end-of-line character in your stoplist
> (stopwords-spanish.txt).
>
> Steve