Snowball Java EnglishStemmer: Porter or Porter2?

Steve Legrand
Does the Java version of Snowball use the Porter or the Porter2 stemming
algorithm in the EnglishStemmer available from the Lucene Sandbox? If it is
Porter2, the word "his" should be indexed as "his", not as "hi" as it is
at the moment.

Regards,
Steve Legrand

Re: Snowball Java EnglishStemmer: Porter or Porter2?

Erik Hatcher

On May 22, 2005, at 1:53 PM, Steve Legrand wrote:

> Does the java-version of Snowball employ Porter or Porter2 stemming  
> algorithm in its EnglishStemmer available from the Lucene Sandbox?  
> If it is Porter2, I should get the word "his" indexed as "his" not  
> as "hi" as it does at the moment.

I don't know the specifics of which algorithm it uses, but there are three
different SnowballAnalyzer stemmers for English: "English", "Lovins",
and "Porter". I just ran each of them with AnalyzerDemo and got this
output for the string "his hiss history":

   SnowballAnalyzer:  // English
     [his] [hiss] [histori]

   SnowballAnalyzer:  // Lovins
     [his] [his] [history]

   SnowballAnalyzer:  // Porter
     [hi] [hiss] [histori]

Both the "English" and "Lovins" stemmers do what seems to be the right
thing with "his", though "Lovins" does a bad job with words like "country"
and "countries".

     Erik
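On the original question: the Snowball "English" stemmer is generally documented as Porter2, while "Porter" is Martin Porter's original algorithm, which fits the "English" and "Porter" lines above. The two diverge on "his" in step 1a, the rule for a trailing "s". Below is a minimal Java sketch of that one step in both versions, written from the published rule descriptions rather than taken from the net.sf.snowball classes. It assumes lowercase input and skips every other step, Porter2's exception list, and its marking of some "y"s as consonants; the class and method names are hypothetical.

```java
public class Step1aDemo {
    // Porter2 treats a, e, i, o, u and y as vowels (this sketch ignores
    // its rule that marks some y's as consonants).
    static boolean isVowel(char c) {
        return "aeiouy".indexOf(c) >= 0;
    }

    // Original Porter step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (deleted).
    static String porterStep1a(String w) {
        if (w.endsWith("sses") || w.endsWith("ies")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ss")) return w;
        if (w.endsWith("s")) return w.substring(0, w.length() - 1); // his -> hi
        return w;
    }

    // Porter2 (Snowball "English") step 1a, simplified as described above.
    static String porter2Step1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ied") || w.endsWith("ies")) {
            String stem = w.substring(0, w.length() - 3);
            // "i" if more than one letter precedes the suffix (cries -> cri), else "ie" (ties -> tie)
            return stem.length() > 1 ? stem + "i" : stem + "ie";
        }
        if (w.endsWith("us") || w.endsWith("ss")) return w;
        if (w.endsWith("s")) {
            String before = w.substring(0, w.length() - 1);
            // Delete the "s" only if a vowel occurs somewhere before the letter
            // immediately preceding it: gaps -> gap, but gas, this and his are kept.
            for (int i = 0; i < before.length() - 1; i++) {
                if (isVowel(before.charAt(i))) return before;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"his", "hiss", "gas", "gaps"}) {
            System.out.println(w + ": Porter=" + porterStep1a(w) + ", Porter2=" + porter2Step1a(w));
        }
    }
}
```

Running main prints:

   his: Porter=hi, Porter2=his
   hiss: Porter=hiss, Porter2=hiss
   gas: Porter=ga, Porter2=gas
   gaps: Porter=gap, Porter2=gap

Later steps do the rest of the work; for example, both algorithms turn the final "y" of "history" into "i", which is where "histori" comes from.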

Re: Snowball Java EnglishStemmer: Porter or Porter2?

Steve Legrand
Thanks, Erik

I debugged my code and found that I had indexed one set of my files with
the older PorterAnalyzer but searched them with the SnowballAnalyzer. Now I
use Snowball's Porter algorithm (net.sf.snowball) for both indexing and
searching across all the file sets, and everything works fine.

Cheerio, Steve

Steve Legrand
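The bug Steve found is a general one: index-time and query-time analysis must produce the same terms, or a query term can never match what was stored. The toy inverted index below is plain Java, not Lucene's API, and its two "analyzers" are hypothetical stand-ins (one strips a trailing "s" roughly like Porter step 1a, the other leaves words unchanged). It is only a sketch showing that a query for "his" finds the document when it goes through the same analyzer that built the index, and misses it otherwise.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

public class AnalyzerMismatchDemo {
    // Hypothetical "analyzer" that drops a trailing "s" unless the word ends in "ss"
    // (a crude stand-in for Porter step 1a, not a real Lucene analyzer).
    static final UnaryOperator<String> STRIP_S =
            w -> (w.endsWith("s") && !w.endsWith("ss")) ? w.substring(0, w.length() - 1) : w;

    // Hypothetical "analyzer" that leaves every token unchanged.
    static final UnaryOperator<String> KEEP = w -> w;

    // Builds term -> document-id postings, running every token through the analyzer.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, UnaryOperator<String> analyzer) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).toLowerCase().split("\\s+")) {
                index.computeIfAbsent(analyzer.apply(token), k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Looks up a single query term after running it through the given analyzer.
    static Set<Integer> search(Map<String, Set<Integer>> index, String term,
                               UnaryOperator<String> analyzer) {
        return index.getOrDefault(analyzer.apply(term.toLowerCase()), Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = List.of("his history", "hiss of steam");
        Map<String, Set<Integer>> index = buildIndex(docs, STRIP_S); // stores "hi", not "his"
        System.out.println("same analyzer:       " + search(index, "his", STRIP_S));
        System.out.println("mismatched analyzer: " + search(index, "his", KEEP));
    }
}
```

Running main prints "[0]" for the matching analyzer and "[]" for the mismatched one, which is the same symptom as searching a PorterAnalyzer-built index through a different analysis chain.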
