Limit Porter stemmer to plural stemming only?

Limit Porter stemmer to plural stemming only?

climbingrose
Hi all,
The Porter stemmer generally works really well. However, there are some cases
where it stems too aggressively. For example, "accountant" matches "Accountant" as
well as "Account Manager", which isn't desirable, because Porter reduces both
"accountant" and "account" to the stem "account". Is it possible to use this
analyzer to stem plural words only? For example:
+Accountant -> accountant
+Accountants -> accountant
+Account -> account
+Accounts -> account

Thanks.

--
Regards,

Cuong Hoang

Re: Limit Porter stemmer to plural stemming only?

climbingrose
OK, it looks like step 1a of the Porter algorithm does what I need.
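For reference, step 1a of the original Porter (1980) algorithm handles nothing but plural "s" endings, using four rules tried longest suffix first: SSES -> SS, IES -> I, SS -> SS, S -> (nothing). The following is a minimal Java sketch of that step alone. It is not the Snowball or Lucene implementation, the class name PorterStep1a is hypothetical, and it lowercases its input only for convenience (lowercasing is not part of Porter).

```java
import java.util.Locale;

/** Hypothetical class: only step 1a of the original Porter (1980) algorithm. */
public class PorterStep1a {

    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // Rules are checked longest suffix first; only the first match applies
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2); // SSES -> SS: caresses -> caress
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2); // IES -> I: ponies -> poni
        }
        if (w.endsWith("ss")) {
            return w;                               // SS -> SS: caress -> caress
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1); // S -> (nothing): cats -> cat
        }
        return w;                                   // no plural ending: unchanged
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Accountant", "Accountants", "Account", "Accounts"}) {
            System.out.println(s + " -> " + stem(s));
        }
    }
}
```

For the four example words from the first post this gives accountant, accountant, account and account. One caveat: full Porter relies on step 1c (Y -> I) to turn "pony" into "poni" as well, so with step 1a alone "ponies" (-> "poni") and "pony" no longer match.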
On Mon, Jun 30, 2008 at 6:39 PM, climbingrose <[hidden email]>
wrote:

> Hi all,
> Porter stemmer in general is really good. However, there are some cases
> where it doesn't work. For example, "accountant" matches "Accountant" as
> well as "Account Manager" which isn't desirable. Is it possible to use this
> analyser for plural words only? For example:
> +Accountant -> accountant
> +Accountants -> accountant
> +Account -> Account
> +Accounts -> account
>
> Thanks.
>
> --
> Regards,
>
> Cuong Hoang
>



--
Regards,

Cuong Hoang

Re: Limit Porter stemmer to plural stemming only?

Mike Klaas
If you find a solution that works well, I encourage you to contribute
it back to Solr. Plural-only stemming is probably a common need (I've
definitely wanted to use it before).

cheers,
-Mike

On 30-Jun-08, at 2:25 AM, climbingrose wrote:

> Ok, it looks like step 1a in Porter algo does what I need.

Re: Limit Porter stemmer to plural stemming only?

climbingrose
I modified the original English stemmer, written in the Snowball language, and
regenerated the Java implementation using the Snowball compiler. It's been
working for me so far. I'm happy to share the modified Snowball English
stemmer if anyone wants to use it.
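The Snowball English stemmer (often called Porter2) has a somewhat different step 1a from the original Porter algorithm. The exact changes in the modified file aren't shown here, so the sketch below only illustrates, in plain Java rather than Snowball, the published Porter2 step 1a rules: sses -> ss; ied/ies -> i if preceded by more than one letter, otherwise ie; us and ss unchanged; a final s deleted only if a vowel occurs before the letter directly preceding it. It assumes lowercased input is acceptable, skips Porter2's apostrophe handling, step 0 and exception list, and the class name Porter2Step1a is hypothetical.

```java
import java.util.Locale;

/** Hypothetical class: plain-Java sketch of Porter2 (Snowball English) step 1a only. */
public class Porter2Step1a {

    /** Vowel test; per Porter2, an initial y, or a y right after a vowel, counts as a consonant. */
    static boolean isVowelAt(String w, int i) {
        char c = w.charAt(i);
        if ("aeiou".indexOf(c) >= 0) {
            return true;
        }
        return c == 'y' && i > 0 && !isVowelAt(w, i - 1);
    }

    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() <= 2) {
            return w;                                        // Porter2 leaves very short words alone
        }
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2);           // caresses -> caress
        }
        if (w.endsWith("ied") || w.endsWith("ies")) {
            String base = w.substring(0, w.length() - 3);
            return base.length() > 1 ? base + "i" : base + "ie"; // cries -> cri, ties -> tie
        }
        if (w.endsWith("us") || w.endsWith("ss")) {
            return w;                                        // bus, class: unchanged
        }
        if (w.endsWith("s")) {
            String before = w.substring(0, w.length() - 1);
            // Delete the s only if a vowel appears before the letter that directly precedes it
            for (int i = 0; i < before.length() - 1; i++) {
                if (isVowelAt(before, i)) {
                    return before;                           // gaps -> gap, kiwis -> kiwi
                }
            }
        }
        return w;                                            // gas, this: unchanged
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Accountants", "Accounts", "ties", "cries", "gas", "gaps"}) {
            System.out.println(s + " -> " + stem(s));
        }
    }
}
```

The vowel-position test on the final "s" is what keeps words like "gas" and "this" intact, which the original step 1a would wrongly cut to "ga" and "thi".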

Cheers,
Cuong

On Tue, Jul 1, 2008 at 4:12 AM, Mike Klaas <[hidden email]> wrote:

> If you find a solution that works well, I encourage you to contribute it
> back to Solr.  Plural-only stemming is probably a common need (I've
> definitely wanted to use it before).
>
> cheers,
> -Mike


--
Regards,

Cuong Hoang

Re: Limit Porter stemmer to plural stemming only?

Guillaume Smet
Hi Cuong,

On Tue, Jul 1, 2008 at 4:45 AM, climbingrose <[hidden email]> wrote:
> I modified the original English Stemmer written in Snowball language and
> regenerate the Java implementation using Snowball compiler. It's been
> working for me  so far. I certainly can share the modified Snowball English
> Stemmer if anyone wants to use it.

Yes, that would be nice. A step-by-step explanation of how to
regenerate the Java files would be helpful too (or a pointer to such
documentation, if you found one).

Thanks,

--
Guillaume

Re: Limit Porter stemmer to plural stemming only?

climbingrose
Attached is the modified Snowball source code for the plural-only English stemmer. You need to compile it to Java using the instructions here: http://snowball.tartarus.org/runtime/use.html. Essentially, you need to:

1) Download the "Snowball, algorithms, and libstemmer library" package and compile the Snowball compiler itself with this command: gcc -O -o snowball compiler/*.c
2) Compile the attached file to Java:
./snowball stem_ISO_8859_1.sbl -java -o EnglishStemmer -name EnglishStemmer

You can change EnglishStemmer to whatever you like, for example, PluralEnglishStemmer. After that, you need to modify the generated Java class so that it references the corresponding classes in the net.sf.snowball.* package instead of the ones from the Snowball website. I think the only two classes you need to import are Among and SnowballProgram.

Once you have the new stemmer ready, write something similar to EnglishPorterFilterFactory to use it within Solr.
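Writing the actual factory depends on the Solr and Lucene versions in use, so the sketch below deliberately avoids their classes. It only illustrates, in plain Java with hypothetical names (PluralOnlyFilterSketch, filter), the per-token work such a filter performs: words from a protected list pass through unchanged (similar in spirit to the protected-words file EnglishPorterFilterFactory accepts), and every other token goes through the plural-only stemmer. It assumes tokens were already lowercased by an earlier filter, and the one-line stemmer in main is a toy stand-in, not the generated Snowball class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical plain-Java sketch of a stemming filter's per-token loop (no Lucene/Solr classes). */
public class PluralOnlyFilterSketch {
    private final UnaryOperator<String> stemmer;
    private final Set<String> protectedWords;

    public PluralOnlyFilterSketch(UnaryOperator<String> stemmer, Set<String> protectedWords) {
        this.stemmer = stemmer;
        this.protectedWords = protectedWords;
    }

    /** Stems each (already lowercased) token unless it is in the protected set. */
    public List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(protectedWords.contains(token) ? token : stemmer.apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy stand-in stemmer: drop a trailing "s" unless the word ends in "ss"
        UnaryOperator<String> stripS =
            w -> w.endsWith("s") && !w.endsWith("ss") ? w.substring(0, w.length() - 1) : w;
        PluralOnlyFilterSketch f = new PluralOnlyFilterSketch(stripS, Set.of("news"));
        System.out.println(f.filter(List.of("accountants", "news", "account", "manager")));
    }
}
```

Passing the stemmer in as a function keeps the loop independent of whichever generated class (for example, a PluralEnglishStemmer) ends up doing the real work; main prints [accountant, news, account, manager].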

Hope this helps.

Cheers,
Cuong


On Tue, Jul 1, 2008 at 6:07 PM, Guillaume Smet <[hidden email]> wrote:
> Yeah, it would be nice. A step by step explanation of how to
> regenerate the Java files would be nice too (or a pointer to such a
> documentation if you found one).

Re: Limit Porter stemmer to plural stemming only?

jerry.jacob@gmail.com
Hi,

Would you mind attaching the plural-only stemmer? I can't find it in this post.

Thanks
Jerry