Spellchecker delivers far too few suggestions

Martin Dietze-2
I recently upgraded to Solr 4.10.1 and then set up the spellchecker,
which I use to return suggestions after searches with few or no
results.
When the spellchecker is active, this request handler is used (most of
it is taken from examples I found on the net):

  <requestHandler name="standardWithSpell" class="solr.SearchHandler" default="false">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <str name="spellcheck">true</str>
       <str name="spellcheck.onlyMorePopular">false</str>
       <str name="spellcheck.count">10</str>
       <str name="spellcheck.collate">false</str>
       <str name="q.alt">*:*</str>
       <str name="echoParams">explicit</str>
       <int name="rows">50</int>
       <str name="fl">*,score</str>
     </lst>
     <arr name="last-components">
       <str>spellcheck</str>
     </arr>
  </requestHandler>

The search component is configured as follows (again, most of it is
copied from examples on the net):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">text</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.3</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">0.01</float>
      <float name="maxQueryFrequency">.01</float>
    </lst>
  </searchComponent>

With this setup I can get suggestions for misspelled words. The
results on my developer machine were mostly fine, but on the test
system (much larger database, much larger search index) I found it
very hard to get suggestions at all. If for instance I misspell “bank”
as “bnak” I’d expect to get a suggestion for “bank” (since that word
can be found in the index very often).

I’ve played around with maxQueryFrequency and maxQueryFrequency with
no success.

Does anyone see any obvious misconfiguration? Anything that I could try?

Is there any way I can debug this? (The problem is that my application
uses the core API, so trying out requests through the web interface
does not work.)

Any help would be greatly appreciated!

Cheers,

Martin


--
---------- [hidden email] --/-- [hidden email] ----
------------- / http://herbert.the-little-red-haired-girl.org / -------------

Re: Spellchecker delivers far too few suggestions

Erick Erickson
First, I'd look in your corpus for "bnak". The problem with index-based
suggestions is that if your index contains garbage, they're "correctly
spelled" since they're in the index. TermsComponent is very useful for this.

You can also loosen up the match criteria, and as I remember the collations
parameter does some permutations of the word (but my memory of how that
works is shaky).
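
As a rough sketch of what loosening the match criteria could look like
for the DirectSolrSpellChecker from the original post (the changed
values are illustrative assumptions, not tested recommendations):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">text</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <!-- lower values accept candidates that are less similar to the input -->
      <float name="accuracy">0.3</float>
      <int name="maxEdits">2</int>
      <!-- assumed value: 0 means a candidate need not share a leading character -->
      <int name="minPrefix">0</int>
      <!-- assumed value: inspect more candidate terms per suggestion -->
      <int name="maxInspections">20</int>
      <!-- assumed value: also check shorter query terms -->
      <int name="minQueryLength">3</int>
      <float name="maxQueryFrequency">0.01</float>
    </lst>
  </searchComponent>

Looser settings generally cost more per request on a large index, so it
is probably worth changing one value at a time.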

Best,
Erick


Re: Spellchecker delivers far too few suggestions

Dan Davis-2
What about the frequency comparison - I haven't used the spellchecker
heavily, but it seems that if "bnak" is in the database, but "bank" is much
more frequent, then "bank" should be a suggestion anyway...


Re: Spellchecker delivers far too few suggestions

Martin Dietze-2
In reply to this post by Erick Erickson
On 17 December 2014 at 16:41, Erick Erickson <[hidden email]> wrote:
> First, I'd look in your corpus for "bnak". The problem with index-based
> suggestions is that if your index contains garbage, they're "correctly
> spelled" since they're in the index. TermsComponent is very useful for this.
>
> You can also loosen up the match criteria, and as I remember the collations
> parameter does some permutations of the word (but my memory of how that
> works is shaky).

Thank you for your response. I now set up a TermsComponent for this
case as follows:

  <searchComponent name="termsComponent" class="solr.TermsComponent"/>

  <requestHandler name="terms" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="terms">true</bool>
      <str name="terms.fl">text</str>
    </lst>
    <arr name="components">
      <str>termsComponent</str>
    </arr>
  </requestHandler>

… constructed a MapSolrParams, from which I created my
SolrQueryRequestBase object, with these params (“text” is the name of
my catch-all field):

{{params(terms.prefix="bnak"),defaults(terms.fl=text&terms=true)}}

… and call my core with it, yielding the following:

{responseHeader={status=0,QTime=5416},terms={text={}}}

That seems to imply that indeed the term “bnak” is not in my index, or
am I using the TermsComponent the wrong way?

Cheers,

Martin


Re: Spellchecker delivers far too few suggestions

Erick Erickson
That seems fine. What happens if your prefix is just "b"? Just to verify that
you're getting something back....

I usually just enable the terms component and specify the field and
everything else on the URL, but what you're doing should work fine.
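
For reference, a sketch of the URL-based variant. It assumes the handler
is registered under the path "/terms", and the component and handler
names here are placeholders rather than anything from the setup above:

  <searchComponent name="terms" class="solr.TermsComponent"/>

  <!-- a name starting with "/" makes the handler reachable as a URL path -->
  <requestHandler name="/terms" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="terms">true</bool>
    </lst>
    <arr name="components">
      <str>terms</str>
    </arr>
  </requestHandler>

With that in place, a request along the lines of
http://localhost:8983/solr/<core>/terms?terms.fl=text&terms.prefix=b
(host, port and core name are assumptions) should list the terms in the
"text" field that start with "b".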

This is seeming like a puzzler...

Erick


Re: Spellchecker delivers far too few suggestions

Martin Dietze-2
On 17 December 2014 at 18:08, Erick Erickson <[hidden email]> wrote:
> This is seeming like a puzzler...

I’ve now got to the point where I do get suggestions when a search
finds no documents at all. The problem was apparently caused by the way
I quoted my search queries.

Still I don’t get suggestions for terms that are in the index. For
instance, if I create a document that contains the term “bnak”, I
would like to display a result like: “found one occurrence of ‘bnak’,
but did you mean: <list of suggestions>”.

Is there a setting I’ve missed?



RE: Spellchecker delivers far too few suggestions

Dyer, James-2
Martin,

If you would like to get suggestions even for terms that occur in the index, set "spellcheck.alternativeTermCount" to a value greater than 0. You can use the same value as for "spellcheck.count", or a lower value if you want fewer suggestions than for terms that are not in the index.

See https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.alternativeTermCount}}Parameter

With this, you might also want to set "spellcheck.maxResultsForSuggest" to a value greater than 0. This prevents the spellchecker from doing any work when the query already returns enough results that you wouldn't want to suggest anything to the user.

See https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.maxResultsForSuggest}}Parameter

Used together with the "spellcheck.maxCollationTries" parameter, these settings should give you fairly good "did-you-mean"-style suggestions.

See https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.maxCollationTries}}Parameter
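
A minimal sketch of how these parameters might be added to the defaults of the "standardWithSpell" request handler from the original post. All values are illustrative assumptions. Note that collations are only generated when "spellcheck.collate" is true, whereas the original configuration sets it to false.

  <requestHandler name="standardWithSpell" class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.count">10</str>
      <!-- assumed value: also suggest alternatives for terms that exist in the index -->
      <str name="spellcheck.alternativeTermCount">5</str>
      <!-- assumed value: only suggest when the query returns at most this many hits -->
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <!-- collation settings: maxCollationTries only matters when collate is true -->
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">3</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

The same parameters can also be passed per request instead of as handler defaults, which may be easier while experimenting.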

James Dyer
Ingram Content Group
(615) 213-4311

