spellcheck: substitutions, but no inserts or deletes

Jason Rennie
I've been testing the SpellCheckComponent for use on StyleFeeder.  It seems
to do a great job of suggesting character substitutions, but I haven't seen
any deletion/insertion suggestions.  I've tried decreasing the "accuracy"
parameter to 0.5.  Some queries I've tried are:

bluea: suggests "blues" (should be "blue")
yello: no suggestions (should be "yellow")
candyz: suggests "candyĆ¢" (should be "candy")
chane: no suggestions (should be "chanel")

It looks to me like it is only willing to make character substitutions and
is unwilling to insert/delete characters.  Does anyone know why it might be
behaving this way?  I'm certain that the "should be" words appear fairly
frequently in the field I used for spellcheck indexing, and I reindexed
the documents after setting up the spellchecker.
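As a quick sanity check on the accuracy setting, here is a bash sketch that computes the Levenshtein edit distance between each query and its expected word (an insertion, a deletion or a substitution each costs 1) and turns it into a score of 1 - distance / max(length). That scoring formula is an assumption about how the spellchecker compares candidates against "accuracy" (it is roughly what Lucene's LevensteinDistance class does, but check your version). `levenshtein` and `similarity` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# levenshtein A B: print the edit distance between A and B, where an
# insertion, a deletion or a substitution each costs 1.
levenshtein() {
  local a=$1 b=$2
  local la=${#a} lb=${#b} i j cost best
  local -a prev cur
  # Row 0: turning "" into the first j characters of B takes j insertions.
  for ((j = 0; j <= lb; j++)); do prev[j]=$j; done
  for ((i = 1; i <= la; i++)); do
    cur=([0]=$i)   # column 0: deleting the first i characters of A
    for ((j = 1; j <= lb; j++)); do
      if [[ ${a:i-1:1} == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
      best=$(( prev[j] + 1 ))                                                  # delete from A
      if (( cur[j-1] + 1 < best )); then best=$(( cur[j-1] + 1 )); fi          # insert into A
      if (( prev[j-1] + cost < best )); then best=$(( prev[j-1] + cost )); fi  # substitute or match
      cur[j]=$best
    done
    prev=("${cur[@]}")
  done
  echo "${prev[lb]}"
}

# similarity A B: 1 - distance / max(len A, len B), printed to 3 decimals.
# ASSUMPTION: this approximates the score the spellchecker compares
# against its "accuracy" parameter; verify against your Lucene/Solr version.
similarity() {
  local d max
  d=$(levenshtein "$1" "$2")
  max=$(( ${#1} > ${#2} ? ${#1} : ${#2} ))
  if (( max == 0 )); then echo "1.000"; return; fi   # two empty strings
  awk -v d="$d" -v m="$max" 'BEGIN { printf "%.3f\n", 1 - d / m }'
}

# The query / expected-word pairs from the post.
for pair in bluea:blue yello:yellow candyz:candy chane:chanel; do
  q=${pair%%:*} w=${pair#*:}
  printf '%-7s -> %-7s distance=%s similarity=%s\n' "$q" "$w" \
    "$(levenshtein "$q" "$w")" "$(similarity "$q" "$w")"
done
```

Under that assumed scoring, every pair is exactly one insertion or deletion apart, giving 0.800 for bluea/blue and 0.833 for the other three, all above 0.5.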

I'm not sure whether this helps with debugging, but I noticed that words
appear with different frequencies in the spellcheck index file (the .cfs
file in the spellcheck directory).  For example, here's what I get for a
few variants of "blue":

[jason@database spellchecker]$ strings _2y.cfs | grep ^blue$|wc
     46      46     230
[jason@database spellchecker]$ strings _2y.cfs | grep ^bluea$|wc
      0       0       0
[jason@database spellchecker]$ strings _2y.cfs | grep ^blues$|wc
      3       3      18
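The three checks above can be folded into one loop. This is a bash sketch: `count_terms` is a hypothetical helper name, and `_2y.cfs` is simply the file name from the session above. Like the original pipelines, it only counts how many times each exact string shows up in the `strings` output, so it is a rough view of the index rather than a true term-frequency readout.

```shell
#!/usr/bin/env bash
# count_terms FILE TERM...: for each TERM, print how many lines of
# `strings FILE` equal TERM exactly (the same check as `grep ^TERM$ | wc`).
count_terms() {
  local file=$1 term
  shift
  for term in "$@"; do
    # -x: whole-line match; -F: treat TERM literally; -c: print the count
    printf '%s\t%s\n' "$term" "$(strings "$file" | grep -cxF -- "$term")"
  done
}
```

Usage: `count_terms _2y.cfs blue bluea blues yello yellow` prints one tab-separated term/count line per term.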

All the "should be" words appear 10+ times.  The misspellings appear 0 or 1
times.

Any help is appreciated.  Thanks,

Jason