[jira] Created: (SOLR-321) misleading comment about spellchecker's termSourceField in solrconfig.xml


Prajeeth Emanuel (Jira)
misleading comment about spellchecker's termSourceField in solrconfig.xml
-------------------------------------------------------------------------

                 Key: SOLR-321
                 URL: https://issues.apache.org/jira/browse/SOLR-321
             Project: Solr
          Issue Type: Bug
          Components: documentation
            Reporter: Daniel Naber


The config file comment says this about "termSourceField":

"the field in your schema that you want to be able to build
your spell index on. This should be a field that uses a very
simple FieldType without a lot of Analysis (ie: string)"

I think this is wrong or at least misleading: the Lucene spellchecker uses a TermEnum to access the terms of this field, so the only requirement is that the field is indexed. Isn't the common use case of the spellchecker to use all of your terms in e.g. "body" as candidates for spellchecking? Then the field given for termSourceField should be e.g. "body", which is usually indexed and tokenized.

Of course, if you want "new yorc" to be corrected to "new york" this won't work with a tokenized field. I suggest this text for the comment:

The field in your schema that you want to be able to build your spell index on. This must be a field that is indexed. If it is of type "text", all the terms in that field will be used as separate candidates for spellchecking; if it is of type "string", the complete content of that field is considered a single term. This might be useful if you have a field whose only content is something like 'new york' and the text you want to have spell checked is 'new yrok'.

(Besides that, spellchecking more than one term doesn't seem to be supported. I'll see if I can add a comment about that to the wiki.)
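
The difference between a tokenized "text" field and an untokenized "string" field can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not Lucene's or Solr's implementation: the helper names (terms_for_field, edit_distance, suggest) are hypothetical, the "text" analysis is approximated by lowercasing and splitting on non-word characters, and candidates are ranked by plain Levenshtein distance rather than by the n-gram index the Lucene spellchecker builds.

```python
import re


def terms_for_field(values, field_type):
    """Approximate the index terms a field contributes (hypothetical helper).

    "string": each complete value is one term.
    "text":   each value is lowercased and split into separate word terms.
    """
    if field_type == "string":
        return set(values)
    if field_type == "text":
        terms = set()
        for value in values:
            terms.update(t for t in re.split(r"\W+", value.lower()) if t)
        return terms
    raise ValueError("unknown field type: %r" % field_type)


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, terms, max_distance=2):
    """Return the closest term within max_distance, or None if there is none."""
    scored = [(edit_distance(word, t), t) for t in terms]
    scored = [s for s in scored if s[0] <= max_distance]
    return min(scored)[1] if scored else None


values = ["new york", "new jersey"]
string_terms = terms_for_field(values, "string")  # whole values as terms
text_terms = terms_for_field(values, "text")      # individual words as terms

print(suggest("new yrok", string_terms))  # the phrase is a candidate here
print(suggest("new yrok", text_terms))    # no single word is close enough
print(suggest("yrok", text_terms))        # a single word can still be corrected
```

Under these assumptions, the whole phrase 'new yrok' finds a match only when the source field keeps 'new york' as one term, while a tokenized source field can correct only individual words such as 'yrok'.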



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-321) misleading comment about spellchecker's termSourceField in solrconfig.xml

Prajeeth Emanuel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516219 ]

Daniel Naber commented on SOLR-321:
-----------------------------------

Ah, I now understand that the "simple FieldType" comment refers mostly to word stemming, not to analysis in general. I still think the text should be changed so that people don't get the impression that the spell checker can only be built on "string" fields. Maybe one could also add the URL of the wiki page to solrconfig.xml.




[jira] Resolved: (SOLR-321) misleading comment about spellchecker's termSourceField in solrconfig.xml

Prajeeth Emanuel (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-321.
-----------------------------------

    Resolution: Won't Fix

Looks like nobody else got confused for a while, resolving.

