Snowball Analyzer and apostrophes

Max Metral
So I'm using the Snowball Analyzer on a field for business titles. The
value "Charlie's Sandwich Shoppe" becomes "charli sandwich shopp". This
happens partly because the StandardAnalyzer strips off the apostrophe-s
entirely, and then the Snowballer takes off the e. The problem is that
when someone searches for Charlies, without the apostrophe, they get no
match, because in THAT case the Snowballer produces "charl" as the term.
Thoughts on the best approach for solving this? Do I expand it to become
"{charl,charli} sandwich shop"? Should I strip apostrophes before
feeding the beast?

Thanks

--Max

Re: Snowball Analyzer and apostrophes

Erick Erickson
This is tricky...

If you strip the apostrophe, you'd get interesting results from O'brien,
depending on how you stripped it: whether you "closed up" the word to
Obrien or substituted a space to get O brien. We've generally had the
fewest surprises by closing up apostrophes (e.g. Obrien, Charlies).
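To make the two options concrete, here is a minimal sketch in plain Java. It deliberately avoids Lucene classes, since the analyzer APIs differ between Lucene versions. `ApostropheStripping`, `closeUp` and `splitOnApostrophe` are hypothetical names; the assumption is that one of them runs over the raw field value before it reaches the SnowballAnalyzer. Treating the typographic apostrophe (U+2019) like the ASCII one is also an assumption about the data.

```java
import java.util.regex.Pattern;

/** Hypothetical helpers showing two ways to strip apostrophes before analysis. */
public final class ApostropheStripping {
    // Matches an ASCII apostrophe or a right single quotation mark (U+2019).
    private static final Pattern APOSTROPHE = Pattern.compile("['\u2019]");

    /** "Closes up" the word: O'brien -> Obrien, Charlie's -> Charlies. */
    public static String closeUp(String text) {
        return APOSTROPHE.matcher(text).replaceAll("");
    }

    /** Substitutes a space: O'brien -> O brien, Charlie's -> Charlie s. */
    public static String splitOnApostrophe(String text) {
        return APOSTROPHE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"O'brien", "Charlie's Sandwich Shoppe"}) {
            System.out.println(closeUp(s) + " | " + splitOnApostrophe(s));
        }
    }
}
```

Running `main` prints `Obrien | O brien` and `Charlies Sandwich Shoppe | Charlie s Sandwich Shoppe`. The space variant leaves stray one-letter tokens (`O`, `s`) for the analyzer to index.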

Unfortunately, anything you do will be wrong in some case. You can either
do something simple like the above or, say, build a dictionary: basically,
keep a record of all the exceptions to your simple rule and use it to
transform the input before feeding it to the analyzer.
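A rough sketch of the dictionary approach, again in plain Java with hypothetical names. The assumption is that exceptions are keyed by the lower-cased word and map to whatever text you want the analyzer to see instead; every other word falls back to the close-up rule. The `rock'n'roll` entry is only an illustrative example, not something from this thread.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical pre-analysis step: look each whitespace-separated word up in an
 * exception dictionary first, and fall back to closing up apostrophes otherwise.
 */
public final class ExceptionDictionaryNormalizer {
    private final Map<String, String> exceptions;

    public ExceptionDictionaryNormalizer(Map<String, String> exceptions) {
        this.exceptions = exceptions;
    }

    public String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Dictionary keys are stored lower-cased, so look up case-insensitively.
            String replacement = exceptions.get(word.toLowerCase(Locale.ROOT));
            String normalized = (replacement != null)
                    ? replacement
                    : word.replaceAll("['\u2019]", ""); // default rule: close up
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(normalized);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative entry only; a real list would come from observed exceptions.
        ExceptionDictionaryNormalizer n = new ExceptionDictionaryNormalizer(
                Map.of("rock'n'roll", "rock and roll"));
        System.out.println(n.normalize("Charlie's Rock'n'Roll Diner"));
        // prints "Charlies rock and roll Diner"
    }
}
```

Splitting on whitespace and re-joining with single spaces changes the spacing, which should not matter because the analyzer tokenizes the result anyway.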

Personally, though, I'd close up the apostrophe and then feed the result
to the analyzer. Don't forget to do the same for the query.
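The point about the query can be shown with a toy stand-in for the index: a plain `Set` of lower-cased words instead of Lucene, with no stemming at all. `SymmetricNormalizationDemo` is a hypothetical name. Because the same normalizer runs over both the indexed value and the query, "Charlies" and "Charlie's" end up as the same term.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Toy stand-in for an index: it only records lower-cased words so the point is
 * visible without Lucene. The important part is that the SAME normalizer runs
 * on both the indexed text and the query text.
 */
public final class SymmetricNormalizationDemo {
    private final UnaryOperator<String> normalizer;
    private final Set<String> indexedTerms = new HashSet<>();

    public SymmetricNormalizationDemo(UnaryOperator<String> normalizer) {
        this.normalizer = normalizer;
    }

    private static String[] words(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    /** Index time: normalize, then record each word. */
    public void index(String value) {
        for (String w : words(normalizer.apply(value))) {
            indexedTerms.add(w);
        }
    }

    /** Query time: true if every query word, after the same normalization, was indexed. */
    public boolean matches(String query) {
        for (String w : words(normalizer.apply(query))) {
            if (!indexedTerms.contains(w)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        UnaryOperator<String> closeUp = s -> s.replaceAll("['\u2019]", "");
        SymmetricNormalizationDemo demo = new SymmetricNormalizationDemo(closeUp);
        demo.index("Charlie's Sandwich Shoppe");
        System.out.println(demo.matches("Charlies"));   // true
        System.out.println(demo.matches("Charlie's"));  // true
    }
}
```

Passing `s -> s` instead of `closeUp` leaves `charlie's` in the index, so a query for `Charlies` no longer matches.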

Best
Erick

You know, my job would be a lot easier if English were regularized. Sign my
petition now!

(In reply to Max Metral's message of Tue, Jun 17, 2008 at 5:16 PM.)