stemEnglishPossessive and contractions

stemEnglishPossessive and contractions

Herman Kiefus
We utilize a comprehensive dictionary of English words, place names, surnames, male and female first names, ... you get the point.  As such, the possessive plural forms of these words are recognized as 'misspelled'.

I simply thought that 'turning on' this option for the WordDelimiterFilterFactory would address my concerns; however, I also got an unintended consequence: Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be affected.  Is this intended behavior?  When I read 'English possessive' I hear 'apostrophe s' and not 'apostrophe anything'.  Is there something I'm missing here?

Re: stemEnglishPossessive and contractions

Robert Muir
The word delimiter filter also does other things: it treats ' as
punctuation by default. So it normally splits on ', except when the
apostrophe is part of 's. In that case, with stemEnglishPossessive
enabled, it removes the 's completely.

There are a couple of approaches you can use:
1. You can keep WordDelimiterFilter with this option on, but disable
splitting on ' by customizing its type table. In this case specify
types=mycustomtypes.txt, and in that file specify ' to be treated as
ALPHANUM or similar. See
https://issues.apache.org/jira/browse/SOLR-2059 for some examples of
this. I would only do this if you want WordDelimiterFilter for other
purposes. If you just want to remove possessives and don't need
WordDelimiterFilter's other features, see option 2.
2. You can instead use EnglishPossessiveFilterFactory, which does
only this (removes 's) and nothing else.
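
The two options above might look roughly like this in schema.xml. This is only a sketch: the field type names are made up, the tokenizers and the lowercase filter are assumptions rather than anything from the original question, and it has not been checked against a particular Solr version.

```xml
<!-- Sketch only: field type names are hypothetical placeholders. -->

<!-- Option 1: keep WordDelimiterFilterFactory with stemEnglishPossessive on,
     but point it at a custom type table so that ' is no longer treated
     as a delimiter (contractions such as isn't should then stay whole). -->
<fieldType name="text_wdf_possessive" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Tokenizer choice is an assumption; use whatever the field already has. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="1"
            generateWordParts="1"
            types="mycustomtypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- mycustomtypes.txt lives next to schema.xml in the conf directory and
     holds one mapping per line. For this case a single line should do:

       ' => ALPHANUM
-->

<!-- Option 2: skip the word delimiter filter and strip only the
     trailing 's with EnglishPossessiveFilterFactory. -->
<fieldType name="text_possessive_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- StandardTokenizer is assumed here because it keeps patrick's
         as a single token for the possessive filter to see. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Either way, running a few contractions and possessives through the Analysis page of the admin UI is a quick way to confirm what each filter actually emits.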

On Wed, Oct 19, 2011 at 5:30 PM, Herman Kiefus <[hidden email]> wrote:
> We utilize a comprehensive dictionary of English words, place names, surnames, male and female first names, ... you get the point.  As such, the possessive plural forms of these words are recognized as 'misspelled'.
>
> I simply thought that 'turning on' this option for the WordDelimiterFilterFactory would address my concerns; however, I also got an unintended consequence: Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be affected.  Is this intended behavior?  When I read 'English possessive' I hear 'apostrophe s' and not 'apostrophe anything'.  Is there something I'm missing here?
>




RE: stemEnglishPossessive and contractions

Herman Kiefus
Thanks Robert, exactly what I was looking for.


RE: stemEnglishPossessive and contractions

donato
Hi Herman,

I just noticed your post on possessives and I am having the same problem. With St. Patrick's Day coming up, people are searching our site for "patrick" and "patrick's", yet the two queries yield different results. If we search for "patrick" and "patricks", they yield the same results. I want all three to yield the same results.

Here is my schema file CLICK HERE. Am I missing something? Do I have the order wrong? Are they in the wrong place?

Thank you in advance. I am not too familiar with this stuff as of yet...

Cheers.