Spellcheck returning suggestions for words that exist in the dictionary

Sanjana Sridhar-2
Spellcheck works perfectly when I misspell a word, but if a word already
exists in the dictionary, Solr still returns suggestions for it, e.g. "bike"
gets corrected to "bake".

I unfortunately cannot use the *maxResultsForSuggest* parameter, as I need to
return the correct spelling regardless of whether results exist.

*Is there a way to prevent Solr from suggesting a spelling if the word
already exists in the dictionary?*

I'm using both the IndexBasedSpellChecker and the FileBasedSpellChecker.


Relevant snippets from solrconfig.xml and managed-schema:

*REQUEST HANDLER*

 <requestHandler name="/query" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="uf">*</str>
      <str name="rows">10</str>
      <str name="echoParams">explicit</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.dictionary">file_spellcheck</str>
      <str name="spellcheck.dictionary">index_spellcheck</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


*SPELLCHECK COMPONENT*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_general</str>
      <!-- a spellchecker built from a field of the main index -->
      <lst name="spellchecker">
        <str name="name">index_spellcheck</str>
        <str name="field">content</str>
        <str name="classname">solr.IndexBasedSpellChecker</str>
        <str name="spellcheckIndexDir">spellchecker</str>
        <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
        <str name="accuracy">0.75</str>
        <int name="maxEdits">1</int>
        <int name="minPrefix">0</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">4</int>
        <!-- <float name="maxQueryFrequency">0.01</float> -->
        <!-- <float name="thresholdTokenFrequency">.01</float> -->
      </lst>

      <!-- A spellchecker that reads the list of words from a file -->
      <lst name="spellchecker">
        <str name="classname">solr.FileBasedSpellChecker</str>
        <str name="name">file_spellcheck</str>
        <str name="field">content</str>
        <str name="accuracy">0.75</str>
        <str name="sourceLocation">spellings.txt</str>
        <str name="characterEncoding">UTF-8</str>
        <str name="spellcheckIndexDir">spellcheckerFile</str>
      </lst>
    </searchComponent>


*FIELD IN MANAGED-SCHEMA*

    <field name="content" type="text_spell" indexed="true" stored="false" multiValued="true"/>
    <fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="'" replacement="" replace="all"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                stemEnglishPossessive="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <!-- English spell check fields-->
    <copyField source="name_en" dest="content"/>
    <copyField source="desc_en" dest="content"/>
    <copyField source="keywords_en" dest="content"/>
    <copyField source="brand_name_en" dest="content"/>


Any help would be greatly appreciated.


Thank you,

Sanjana Sridhar

--
IMPORTANT NOTICE:  This message, including any attachments (hereinafter
collectively referred to as "Communication"), is intended only for the addressee(s)
named above.  This Communication may include information that is
privileged, confidential and exempt from disclosure under applicable law.
 If the recipient of this Communication is not the intended recipient, or
the employee or agent responsible for delivering this Communication to the
intended recipient, you are notified that any dissemination, distribution
or copying of this Communication is strictly prohibited.  If you have
received this Communication in error, please notify the sender immediately
by phone or email and permanently delete this Communication from your
computer without making a copy. Thank you.

Re: Spellcheck returning suggestions for words that exist in the dictionary

alessandro.benedetti
Which Solr version are you using?

From the documentation:
"Only query words, which are absent in index or too rare ones (below
maxQueryFrequency ) are considered as misspelled and used for finding
suggestions.
...
These parameters (maxQueryFrequency and thresholdTokenFrequency) can be a
percentage (such as .01, or 1%) or an absolute value (such as 4)."

Checking the latest source code [1]: public static final float
DEFAULT_MAXQUERYFREQUENCY = 0.01f;

This means that with the DirectSolrSpellChecker you should not get a
suggestion if the term has a document frequency >= 0.01 (i.e. if the term is
in the index).
Can you show us a snippet of the result you got?
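To make the quoted rule concrete, here is a minimal Python sketch of it, assuming the documented semantics: a value of 1 or more is an absolute document count, a value below 1 is a fraction of all documents. The function name is hypothetical and this is an illustration, not Solr's or Lucene's actual code; counting a term that sits exactly at the threshold as rare is also an assumption.

```python
import math


def is_treated_as_misspelled(doc_freq: int, num_docs: int,
                             max_query_frequency: float = 0.01) -> bool:
    """Hypothetical helper: would a query term qualify for spelling suggestions?

    Sketch of the documented maxQueryFrequency rule: only terms that are absent
    from the index or rarer than the threshold are treated as misspelled.
    """
    if max_query_frequency >= 1:
        # Absolute value: e.g. 4 means "appears in at most 4 documents".
        threshold = max_query_frequency
    else:
        # Relative value: e.g. 0.01 means "at most 1% of all documents", rounded up.
        threshold = math.ceil(max_query_frequency * num_docs)
    # Assumption: a term exactly at the threshold still counts as rare.
    return doc_freq <= threshold


print(is_treated_as_misspelled(0, 10_000))    # True: absent from the index
print(is_treated_as_misspelled(50, 10_000))   # True: in 0.5% of documents
print(is_treated_as_misspelled(500, 10_000))  # False: in 5% of documents
print(is_treated_as_misspelled(3, 10_000, max_query_frequency=4))  # True
```

With 10,000 documents and the default of 0.01, the cut-off works out to ceil(0.01 * 10,000) = 100 documents.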

---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Spellcheck returning suggestions for words that exist in the dictionary

Sanjana Sridhar-2
Hi Alessandro,

I'm currently on Solr 6.2.1, but will soon be moving to 6.6. I'm not using the
DirectSolrSpellChecker; I'm using the IndexBasedSpellChecker and the
FileBasedSpellChecker. The words I was testing against are definitely in the
file, and possibly in the index as well.

What I found is that if I don't set maxResultsForSuggest, Solr always tries to
spell-correct. For example, searching for "nike" gets corrected to "bike":

{
  "responseHeader": {
    "status": 0,
    "QTime": 2167,
    "params": {
      "spellcheck.q": "nike",
      "spellcheck": "true",
      "wt": "json",
      "spellcheck.build": "true",
      "spellcheck.extendedResults": "true"
    }
  },
  "command": "build",
  "response": {"numFound": 0, "start": 0, "docs": []},
  "spellcheck": {
    "suggestions": [
      "nike",
      {"numFound": 1, "startOffset": 0, "endOffset": 4, "origFreq": 0,
       "suggestion": [{"word": "bike", "freq": -1}]}
    ],
    "correctlySpelled": false,
    "collations": ["collation", "bike"]
  }
}

But searching for "bike" gets corrected to "bake":

{
  "responseHeader": {
    "status": 0,
    "QTime": 2048,
    "params": {
      "spellcheck.q": "bike",
      "spellcheck": "true",
      "wt": "json",
      "spellcheck.build": "true",
      "spellcheck.extendedResults": "true"
    }
  },
  "command": "build",
  "response": {"numFound": 0, "start": 0, "docs": []},
  "spellcheck": {
    "suggestions": [
      "bike",
      {"numFound": 1, "startOffset": 0, "endOffset": 4, "origFreq": 0,
       "suggestion": [{"word": "bake", "freq": -1}]}
    ],
    "correctlySpelled": false,
    "collations": ["collation", "bake"]
  }
}

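For comparing responses like these, here is a minimal Python sketch that pulls out, for each flagged word, the origFreq that extendedResults reports (the original term's frequency in the index) together with the suggested words. It assumes the JSON layout shown above, where "suggestions" alternates between a word and its details; the helper name is hypothetical.

```python
import json
from typing import List, Tuple

# The "spellcheck" section of the "bike" response quoted above.
RESPONSE = """
{"spellcheck": {"suggestions": ["bike", {"numFound": 1, "startOffset": 0,
  "endOffset": 4, "origFreq": 0, "suggestion": [{"word": "bake", "freq": -1}]}],
  "correctlySpelled": false, "collations": ["collation", "bake"]}}
"""


def summarize_spellcheck(response: dict) -> List[Tuple[str, int, List[str]]]:
    """Hypothetical helper: (original word, origFreq, suggested words) per flagged word.

    Assumes the flat JSON layout shown above, where "suggestions" alternates
    between a word and its details, and extendedResults=true, so each
    suggestion is an object with "word" and "freq".
    """
    flat = response["spellcheck"]["suggestions"]
    summary = []
    for word, details in zip(flat[0::2], flat[1::2]):
        suggested = [s["word"] for s in details.get("suggestion", [])]
        # -1 is a placeholder of this sketch for a missing origFreq.
        summary.append((word, details.get("origFreq", -1), suggested))
    return summary


data = json.loads(RESPONSE)
print(data["spellcheck"]["correctlySpelled"])  # False
print(summarize_spellcheck(data))              # [('bike', 0, ['bake'])]
```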

On Mon, Nov 13, 2017 at 10:43 AM, alessandro.benedetti <[hidden email]> wrote:

--

<http://corp.flipp.com>

Sanjana Sridhar
Flipp Corporation

p: 226-600-2281
e: [hidden email]