stemming in Lucene

wojtek hury
Hi all,

Snowball stemmers are part of Lucene, but only for a few languages. We
have documents in various languages, so we need stemmers for many
languages (in particular Polish). One idea is to use ispell
dictionaries. There are ispell dictionaries for many languages, so this
solution suits a multilingual environment. Maybe this is not the
perfect place to ask, but does anyone know of a Java stemmer that uses
ispell dictionaries?
There is an aspell-like Java spell checker (Jazzy), but I could not see
how to use it for stemming. We are considering porting part of the
PostgreSQL tsearch module to Java, because tsearch uses ispell
dictionaries for stemming.
But maybe there is a better way, or there are people already working on
something like that?

Thanks and regards,
wojtek

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: stemming in Lucene

Karl Wettin
Wojtek H wrote:
> Snowball stemmers are part of Lucene, but only for a few languages. We

The org.apache.lucene.analysis packages contain a few more stemmers.

> have documents in various languages, so we need stemmers for many
> languages (in particular Polish).

Have you seen Stempel?

http://www.getopt.org/stempel/



       karl

Re: stemming in Lucene

Mathieu Lecarme
In reply to this post by wojtek hury
Wojtek H wrote:

> Hi all,
>
> Snowball stemmers are part of Lucene, but only for a few languages. We
> have documents in various languages, so we need stemmers for many
> languages (in particular Polish). One idea is to use ispell
> dictionaries. There are ispell dictionaries for many languages, so this
> solution suits a multilingual environment. Maybe this is not the
> perfect place to ask, but does anyone know of a Java stemmer that uses
> ispell dictionaries?
> There is an aspell-like Java spell checker (Jazzy), but I could not see
> how to use it for stemming. We are considering porting part of the
> PostgreSQL tsearch module to Java, because tsearch uses ispell
> dictionaries for stemming.
> But maybe there is a better way, or there are people already working on
> something like that?
>
ispell data is nice for phonetics, and for enumerating a huge list of
words. An ispell dictionary works one way: pseudo root => word. Building
the inverse function looks hard, because a lemma is split across
multiple affixes. But the data can be used to find rules, just like
http://www.getopt.org/stempel/ does.
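
To make the one-way property concrete, here is a minimal Java sketch. It
assumes a much simplified model of ispell data: entries of the form
"root/FLAGS" and suffix-only rules with a condition, a strip string and an
append string. The class IspellInverseSketch, the SuffixRule type and the
sample rules are all hypothetical; this is not a parser for real .aff
files, and it is not how Stempel works. It expands every root forward and
indexes the results by surface form, which only gives an inverse when the
full word list fits in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Sketch: expand ispell-style "root/FLAGS" entries, then index the words by surface form. */
final class IspellInverseSketch {

    /** Simplified suffix rule (hypothetical model, not the real .aff syntax). */
    static final class SuffixRule {
        final char flag;          // flag letter that enables the rule on a root
        final Pattern condition;  // the whole root must match this pattern
        final String strip;       // characters removed from the end of the root
        final String append;      // characters added after stripping

        SuffixRule(char flag, String conditionRegex, String strip, String append) {
            this.flag = flag;
            this.condition = Pattern.compile(conditionRegex);
            this.strip = strip;
            this.append = append;
        }

        /** Returns the inflected form, or null if the rule does not apply to this root. */
        String apply(String root) {
            if (!root.endsWith(strip) || !condition.matcher(root).matches()) {
                return null;
            }
            return root.substring(0, root.length() - strip.length()) + append;
        }
    }

    /** Forward direction (root => words): what the dictionary gives directly. */
    static List<String> expand(String entry, List<SuffixRule> rules) {
        int slash = entry.indexOf('/');
        String root = slash < 0 ? entry : entry.substring(0, slash);
        String flags = slash < 0 ? "" : entry.substring(slash + 1);
        List<String> words = new ArrayList<>();
        words.add(root); // the root itself is always listed first
        for (SuffixRule rule : rules) {
            if (flags.indexOf(rule.flag) >= 0) {
                String word = rule.apply(root);
                if (word != null) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    /** Inverse direction (word => roots), obtained by enumerating every expansion. */
    static Map<String, Set<String>> buildInverseIndex(List<String> entries, List<SuffixRule> rules) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (String entry : entries) {
            List<String> words = expand(entry, rules);
            String root = words.get(0);
            for (String word : words) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(root);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up English rules, for illustration only.
        List<SuffixRule> rules = Arrays.asList(
            new SuffixRule('D', ".*e", "", "d"),        // create -> created
            new SuffixRule('D', ".*[^e]", "", "ed"),    // walk -> walked
            new SuffixRule('G', ".*e", "e", "ing"),     // create -> creating
            new SuffixRule('G', ".*[^e]", "", "ing"));  // walk -> walking
        List<String> entries = Arrays.asList("create/DG", "walk/DG", "wall");
        Map<String, Set<String>> index = buildInverseIndex(entries, rules);
        System.out.println(index.get("creating")); // [create]
        System.out.println(index.get("walked"));   // [walk]
    }
}
```

The index maps each word to a set of roots because, with real data,
several roots can produce the same surface form. For a highly inflected
language the enumerated list can get very large, which is one reason to
derive rules from the data instead.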

M.
