Hello, I've been exploring the spellcheck feature in Solr 1.3. I
have it working, but there are some issues I'm seeing that make it less
useful than it could be. Response on the solr-user mailing list has been
limited. I'm guessing the reason may be that I'm asking about issues which
are most relevant to the Lucene codebase. So, I hope you don't mind my
raising them here.
I've noticed a few issues with spellcheck as I've been testing it out for
use on our site...
1. Rebuild breaks requests - I'm using rebuildOnCommit ATM. If a commit
is going on and files are being rebuilt in the spellcheck data dir,
spellcheck requests yield bogus answers. I.e. I can issue identical
requests and get drastically different answers. The first time, I get
suggestions and "correctlySpelled" is false. The second time (during the
commit), I get no suggestions and "correctlySpelled" is true. Shouldn't
spellcheck use the old index until the new one is ready for use, like Solr
does with optimizes?
2. Inconsistent ordering - The first suggestion changes depending on the
spellcheck.count that I specify. If my query is "chanl" and I ask for one
result, the suggestion is "chant" (freq. 16). If I ask for 5 results, the
first suggestion is also "chant"; the other 4 suggestions are less frequent
(#2 is "chang", freq. 11). However, if I ask for 10 results, the first
suggestion is "chanel" (freq. 1296); #2 and #3 are "chant" and "chang"; #9
is "chan" (freq. 174). Shouldn't spellcheck always return the best
suggestion first? In my case, shouldn't "chanel" always top "chant" and
"chang" since they all have the same edit distance yet "chanel" is two
orders of magnitude more popular?
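To make the ordering I'd expect in item 2 concrete, here is a minimal Java sketch. It is not Lucene's SpellChecker code: the names SuggestionOrdering, rank and editDistance are made up for illustration, and it assumes plain Levenshtein distance as the similarity measure. It sorts candidates by edit distance first and by frequency second, using the four words and frequencies quoted above:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene or Solr.
class SuggestionOrdering {

    // Plain Levenshtein distance: insert, delete and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Rank ALL candidates first, then truncate to `count`, so the top
    // suggestion cannot depend on how many results were requested.
    static List<String> rank(String query, Map<String, Integer> freqs, int count) {
        List<String> words = new ArrayList<>(freqs.keySet());
        words.sort(Comparator
                .comparingInt((String w) -> editDistance(query, w))
                .thenComparing(
                        Comparator.comparingInt((String w) -> freqs.get(w)).reversed()));
        return new ArrayList<>(words.subList(0, Math.min(count, words.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        freqs.put("chant", 16);
        freqs.put("chang", 11);
        freqs.put("chanel", 1296);
        freqs.put("chan", 174);

        System.out.println(rank("chanl", freqs, 1));  // [chanel]
        System.out.println(rank("chanl", freqs, 10)); // [chanel, chan, chant, chang]
    }
}
```

All four words are one edit away from "chanl", so frequency alone decides the order, and "chanel" comes first whether I ask for 1 result or 10.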
Is there anything I could be doing wrong to create these problems? If not,
are these known issues? If they aren't, should I create JIRA issues for them?
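For item 1, the behavior I'd expect is roughly this: build the new dictionary off to the side and publish it in a single step, so that a request sees either the old data or the new data, never a half-built state. The following is only a sketch of that pattern in plain Java, not how Solr or Lucene actually manage the spellcheck index. SwappableDictionary and its methods are hypothetical names, and an in-memory set stands in for the on-disk index:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not part of Lucene or Solr.
class SwappableDictionary {

    // Readers always go through this reference, so they see one complete
    // snapshot or another, never a partially built one.
    private final AtomicReference<Set<String>> live;

    SwappableDictionary(Set<String> initialWords) {
        live = new AtomicReference<>(snapshot(initialWords));
    }

    private static Set<String> snapshot(Set<String> words) {
        return Collections.unmodifiableSet(new HashSet<>(words));
    }

    // Lookup against whichever snapshot is currently published.
    boolean exists(String word) {
        return live.get().contains(word);
    }

    // Build the replacement completely, then publish it with one atomic write.
    // Requests arriving during the build keep using the old snapshot.
    void rebuild(Set<String> newWords) {
        Set<String> fresh = snapshot(newWords);
        live.set(fresh);
    }
}
```

With something like this, identical requests issued before and during a rebuild would get the same answer, and the answer would only change once the new data is complete.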