Any support for DoubleMetaphone ever putting out secondary tokens?

mbennett
As I understand Wikipedia, Double Metaphone improves over Metaphone in 2 areas:
1: Better linguistic matching
2: Can output a secondary token for words like Schmidt

From a quick look at Apache Commons Codec and the Lucene filter, it doesn't seem like that secondary token is supported.  There is "save" code that depends on whether inject is true or false, but that's not the same thing, and it doesn't seem to have been extended.

Am I reading it wrong?  Or does it somehow produce a compound token in those cases?

Looking on the web, one author claims that only 10% of names need a second token anyway, so not a big deal, but still good to know.

Thanks

--
Mark Bennett / New Idea Engineering, Inc. / [hidden email]
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
Re: Any support for DoubleMetaphone ever putting out secondary tokens?

Walter Underwood
Double Metaphone is a good idea, but not that useful. Searchers just don't type in full phonetic versions of their query. Nobody types "ratatooie"; they type "rata" and then stop rather than make a mistake.

So, not that important.

wunder

(In reply to Mark Bennett's post of Apr 27, 2013, 5:57 PM.)

Re: Any support for DoubleMetaphone ever putting out secondary tokens?

mbennett
In reply to this post by mbennett
Doh!

Turns out there are TWO ways to invoke Double Metaphone:

lucene/analysis/phonetic/src/java/org/apache/lucene/analysis/phonetic/PhoneticFilterFactory.java  (and Factory)  - use a setting
lucene/analysis/phonetic/src/java/org/apache/lucene/analysis/phonetic/DoubleMetaphoneFilter.java  (and Factory) - only D.M.

And the second, more specific one has this in its comments:
"... DoubleMetaphone (supporting secondary codes)..."
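
For anyone else wondering about the "secondary token vs. compound token" question, here is a minimal sketch of one way a filter can expose a secondary code: as an extra token stacked at the same position as the primary code (position increment 0), with the original word optionally kept when inject is true. This is not the Lucene source. The class name, the `expand` helper, and the hard-coded code table are all hypothetical and only there for illustration; a real setup would get its codes from an actual Double Metaphone encoder, and the real filter's behavior should be checked against its source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SecondaryTokenSketch {

    /** One emitted token: its text and its position increment relative to the previous token. */
    public static final class Token {
        public final String text;
        public final int positionIncrement;

        public Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    // Hypothetical stand-in for a real encoder: {primary, secondary} pairs,
    // hard-coded for illustration and not computed here.
    static final Map<String, String[]> CODES = Map.of(
            "Schmidt", new String[] {"XMT", "SMT"},
            "Bennett", new String[] {"PNT", "PNT"});

    /** Expands one word into the tokens a secondary-code-aware filter might emit. */
    public static List<Token> expand(String word, boolean inject) {
        List<Token> out = new ArrayList<>();
        String[] codes = CODES.get(word);
        if (codes == null) {
            // No code available: pass the word through unchanged.
            out.add(new Token(word, 1));
            return out;
        }
        int increment = 1; // the first token emitted for a word advances the position
        if (inject) {
            out.add(new Token(word, increment));
            increment = 0; // everything after it is stacked on the same position
        }
        out.add(new Token(codes[0], increment));
        if (!codes[1].equals(codes[0])) {
            // Secondary code differs, so emit it as a separate stacked token.
            out.add(new Token(codes[1], 0));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("Schmidt", true));
        System.out.println(expand("Schmidt", false));
        System.out.println(expand("Bennett", false));
    }
}
```

In this sketch the secondary code is a separate token rather than a compound one, so a query that encodes to either code can match at that position. A word whose two codes are identical gets just one code token.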

In my defense, it wasn't in the wiki ;-)  TODO to add it

Hi Walter!

Thanks for the reply.  In my case it's a special app that deals with surnames already in a database.  Not everybody is interactively searching for movie rentals y'know ;-)


