[jira] Created: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

[jira] Created: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
Improvements to SpellCheckComponent Collate functionality
---------------------------------------------------------

                 Key: SOLR-2010
                 URL: https://issues.apache.org/jira/browse/SOLR-2010
             Project: Solr
          Issue Type: New Feature
          Components: clients - java, spellchecker
    Affects Versions: 1.4.1
         Environment: Tested against trunk revision 966633
            Reporter: James Dyer
            Priority: Minor


Improvements to SpellCheckComponent Collate functionality

Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.

1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
2. Provide the option to get multiple collation suggestions.
3. Provide extended collation results, including the number of hits that re-querying will return and a breakdown of each misspelled word and its correction.

This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.
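The verification idea in items 1 and 2 can be sketched roughly as follows. This is only an illustration and not the code in the attached patch: the class and method names are hypothetical, the hit counter is a stand-in for re-running the query (with the original fq params) against the index, misspellings are swapped in by plain string replacement rather than by token offsets, and the combinations are tried in simple odometer order rather than by any ranking.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical sketch of hit-verified collation; not the patch's SpellCheckCollator. */
class CollationSketch {

    /** A collation that was re-queried and found to return hits. */
    static final class Collation {
        final String query;
        final int hits;
        final Map<String, String> corrections;

        Collation(String query, int hits, Map<String, String> corrections) {
            this.query = query;
            this.hits = hits;
            this.corrections = corrections;
        }
    }

    /**
     * Tries combinations of suggestions until maxTries candidates were checked
     * or maxCollations verified collations were found.
     * Assumes every misspelled word has at least one suggestion and that the
     * map iterates in a stable order (e.g. a LinkedHashMap).
     */
    static List<Collation> collate(String original,
                                   Map<String, List<String>> suggestions,
                                   int maxTries,
                                   int maxCollations,
                                   ToIntFunction<String> hitCounter) {
        List<Collation> verified = new ArrayList<>();
        List<String> words = new ArrayList<>(suggestions.keySet());
        int[] pick = new int[words.size()]; // chosen suggestion index per word
        int tries = 0;
        boolean more = !words.isEmpty();

        while (more && tries < maxTries && verified.size() < maxCollations) {
            String candidate = original;
            Map<String, String> corrections = new LinkedHashMap<>();
            for (int i = 0; i < words.size(); i++) {
                String word = words.get(i);
                String fix = suggestions.get(word).get(pick[i]);
                candidate = candidate.replace(word, fix);
                corrections.put(word, fix);
            }
            tries++;
            int hits = hitCounter.applyAsInt(candidate); // stand-in for re-querying
            if (hits > 0) {
                verified.add(new Collation(candidate, hits, corrections));
            }
            more = advance(pick, words, suggestions);
        }
        return verified;
    }

    /** Moves to the next combination; returns false once all were visited. */
    private static boolean advance(int[] pick, List<String> words,
                                   Map<String, List<String>> suggestions) {
        for (int i = pick.length - 1; i >= 0; i--) {
            if (++pick[i] < suggestions.get(words.get(i)).size()) {
                return true;
            }
            pick[i] = 0;
        }
        return false;
    }
}
```

With a try limit of 0 the loop never runs, which mirrors the idea that no collations are checked unless the caller opts in.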

This patch adds the following spellcheck parameters:

1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).

2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, which maintains backwards-compatible behavior.

3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing the collations found.  Default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):

<lst name="spellcheck">
        <lst name="suggestions">
                <lst name="hopq">
                        <int name="numFound">94</int>
                        <int name="startOffset">7</int>
                        <int name="endOffset">11</int>
                        <arr name="suggestion">
                                <str>hope</str>
                                <str>how</str>
                                <str>hope</str>
                                <str>chops</str>
                                <str>hoped</str>
                                etc
                        </arr>
                </lst>
                <lst name="faill">
                        <int name="numFound">100</int>
                        <int name="startOffset">16</int>
                        <int name="endOffset">21</int>
                        <arr name="suggestion">
                                <str>fall</str>
                                <str>fails</str>
                                <str>fail</str>
                                <str>fill</str>
                                <str>faith</str>
                                <str>all</str>
                                etc
                        </arr>
                </lst>
                <lst name="collation">
                        <str name="collationQuery">Title:(how AND fails)</str>
                        <int name="hits">2</int>
                        <lst name="misspellingsAndCorrections">
                                <str name="hopq">how</str>
                                <str name="faill">fails</str>
                        </lst>
                </lst>
                <lst name="collation">
                        <str name="collationQuery">Title:(hope AND faith)</str>
                        <int name="hits">2</int>
                        <lst name="misspellingsAndCorrections">
                                <str name="hopq">hope</str>
                                <str name="faill">faith</str>
                        </lst>
                </lst>
                <lst name="collation">
                        <str name="collationQuery">Title:(chops AND all)</str>
                        <int name="hits">1</int>
                        <lst name="misspellingsAndCorrections">
                                <str name="hopq">chops</str>
                                <str name="faill">all</str>
                        </lst>
                </lst>
        </lst>
</lst>
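A request that turns on the three parameters described above might be assembled as in the sketch below. The class and method names are hypothetical, and the sketch only builds the query string; the host and handler path are left out because they depend on the deployment. The spellcheck and spellcheck.collate switches are the existing 1.4 parameters that enable spell checking and collation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles the spellcheck request parameters. */
class SpellcheckRequestSketch {

    static String buildQueryString(String q, int maxCollationTries,
                                   int maxCollations, boolean extendedResult) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("spellcheck", "true");
        params.put("spellcheck.collate", "true");
        params.put("spellcheck.maxCollationTries", Integer.toString(maxCollationTries));
        params.put("spellcheck.maxCollations", Integer.toString(maxCollations));
        params.put("spellcheck.collateExtendedResult", Boolean.toString(extendedResult));

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            // Form-encode both sides so the query syntax survives the URL.
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

Calling buildQueryString("Title:(hopq AND faill)", 10, 3, true) asks for up to three verified collations while checking at most ten candidates.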

In addition, SolrJ is updated to include SpellCheckResponse.getCollatedResults(), which returns the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backward compatibility.  Other APIs were not changed and will continue to work provided that spellcheck.collateExtendedResult is false.
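For clients that do not go through SolrJ, the extended format shown earlier can also be read with the JDK's own XML parser. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, and it relies only on the element and attribute names that appear in the sample response (lst name="collation", str name="collationQuery", int name="hits").

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the extended collation response format. */
class CollationXmlSketch {

    /** Maps each collationQuery to its hit count, in response order. */
    static Map<String, Integer> readCollations(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("response is not parseable XML", e);
        }

        Map<String, Integer> collations = new LinkedHashMap<>();
        NodeList lists = doc.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"collation".equals(lst.getAttribute("name"))) {
                continue; // skip per-word suggestion lists and wrappers
            }
            String query = namedText(lst, "str", "collationQuery");
            String hits = namedText(lst, "int", "hits");
            if (query != null && hits != null) {
                collations.put(query, Integer.parseInt(hits.trim()));
            }
        }
        return collations;
    }

    /** Text of the first descendant with the given tag and name attribute, or null. */
    private static String namedText(Element parent, String tag, String name) {
        NodeList nodes = parent.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (name.equals(e.getAttribute("name"))) {
                return e.getTextContent();
            }
        }
        return null;
    }
}
```

A parser like this requires well-formed input, so every per-word lst element in the response has to be closed before the next one opens.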

This patch will likely not return valid results when using shards.  Supporting them would require a more robust interaction with the index than what currently exists in SpellCheckCollator.collate().

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.patch

Tested against trunk revision 966633.

[jira] Assigned: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned SOLR-2010:
-------------------------------------

    Assignee: Grant Ingersoll

[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896585#action_12896585 ]

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

James, thanks for the patch.  At first glance this looks great and I would like to see it incorporated.

bq. This likely will not return valid results if using Shards. Rather, a more robust interaction with the index would be necessary than what exists in SpellCheckCollator.collate().

Perhaps we should just have a simple SearchHandler that uses only the QueryComponent; alternatively, we need an easy way to turn off all components but the query component.  That way, we could take advantage of the existing sharding capabilities.



[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-2010:
----------------------------------

    Attachment: SOLR-2010.patch

Added license headers

[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896898#action_12896898 ]

Tom Phethean commented on SOLR-2010:
------------------------------------

This sounds like a really useful patch. I would definitely like to see it go further, as it would be useful for a project I'm currently working on. I have just tried to apply it against 1.4.1 (downloaded today) and got the following errors:

patching file solr/src/test/org/apache/solr/spelling/SpellPossibilityIteratorTest.java
patching file solr/src/test/org/apache/solr/spelling/SpellCheckCollatorTest.java
patching file solr/src/test/org/apache/solr/client/solrj/response/TestSpellCheckResponse.java
Hunk #1 FAILED at 20.
Hunk #2 FAILED at 103.
2 out of 2 hunks FAILED -- saving rejects to file solr/src/test/org/apache/solr/client/solrj/response/TestSpellCheckResponse.java.rej
patching file solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
Hunk #1 FAILED at 132.
Hunk #2 FAILED at 361.
Hunk #3 FAILED at 405.
Hunk #4 FAILED at 452.
Hunk #5 FAILED at 466.
5 out of 5 hunks FAILED -- saving rejects to file solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java.rej
patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollation.java
patching file solr/src/java/org/apache/solr/spelling/PossibilityIterator.java
patching file solr/src/java/org/apache/solr/spelling/SpellCheckCorrection.java
patching file solr/src/java/org/apache/solr/spelling/SpellCheckCollator.java
patching file solr/src/common/org/apache/solr/common/params/SpellingParams.java
Hunk #1 FAILED at 78.
1 out of 1 hunk FAILED -- saving rejects to file solr/src/common/org/apache/solr/common/params/SpellingParams.java.rej
patching file solr/src/solrj/org/apache/solr/client/solrj/response/SpellCheckResponse.java
Hunk #1 FAILED at 31.
Hunk #2 FAILED at 46.
Hunk #3 FAILED at 77.
Hunk #4 FAILED at 162.
4 out of 4 hunks FAILED -- saving rejects to file solr/src/solrj/org/apache/solr/client/solrj/response/SpellCheckResponse.java.rej


> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions
> 3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing collations found.  default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):
> <lst name="spellcheck">
> <lst name="suggestions">
> <lst name="hopq">
> <int name="numFound">94</int>
> <int name="startOffset">7</int>
> <int name="endOffset">11</int>
> <arr name="suggestion">
> <str>hope</str>
> <str>how</str>
> <str>hope</str>
> <str>chops</str>
> <str>hoped</str>
> etc
> </arr>
> <lst name="faill">
> <int name="numFound">100</int>
> <int name="startOffset">16</int>
> <int name="endOffset">21</int>
> <arr name="suggestion">
> <str>fall</str>
> <str>fails</str>
> <str>fail</str>
> <str>fill</str>
> <str>faith</str>
> <str>all</str>
> etc
> </arr>
> </lst>
> <lst name="collation">
> <str name="collationQuery">Title:(how AND fails)</str>
> <int name="hits">2</int>
> <lst name="misspellingsAndCorrections">
> <str name="hopq">how</str>
> <str name="faill">fails</str>
> </lst>
> </lst>
> <lst name="collation">
> <str name="collationQuery">Title:(hope AND faith)</str>
> <int name="hits">2</int>
> <lst name="misspellingsAndCorrections">
> <str name="hopq">hope</str>
> <str name="faill">faith</str>
> </lst>
> </lst>
> <lst name="collation">
> <str name="collationQuery">Title:(chops AND all)</str>
> <int name="hits">1</int>
> <lst name="misspellingsAndCorrections">
> <str name="hopq">chops</str>
> <str name="faill">all</str>
> </lst>
> </lst>
> </lst>
> </lst>
> In addition, SolrJ is updated to include SpellCheckResponse.getCollatedResults(), which returns the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backwards-compatibility.  Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false.
> This likely will not return valid results when using shards; that would require a more robust interaction with the index than what exists in SpellCheckCollator.collate().
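
For readers who want to consume the extended format quoted above without SolrJ, here is a minimal sketch in Python using only the standard library. It is not part of the patch. The XML is an abbreviated copy of the sample output (two of the three collations, with the per-word suggestion lists left out), and the helper name parse_collations is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the extended collation output quoted in the issue:
# two of the three collations, per-word suggestion lists omitted.
RESPONSE = """
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="collation">
      <str name="collationQuery">Title:(how AND fails)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">how</str>
        <str name="faill">fails</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(chops AND all)</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">chops</str>
        <str name="faill">all</str>
      </lst>
    </lst>
  </lst>
</lst>
"""


def parse_collations(xml_text):
    """Return one dict per <lst name="collation"> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    collations = []
    for node in root.iter("lst"):
        if node.get("name") != "collation":
            continue
        # Each child <str name="misspelled">correction</str> becomes a dict entry.
        pairs = node.find("lst[@name='misspellingsAndCorrections']")
        corrections = {s.get("name"): s.text for s in pairs}
        collations.append({
            "query": node.find("str[@name='collationQuery']").text,
            "hits": int(node.find("int[@name='hits']").text),
            "corrections": corrections,
        })
    return collations


collations = parse_collations(RESPONSE)
for c in collations:
    print(c["query"], c["hits"], c["corrections"])
```

Each entry carries the same three pieces the issue lists for the extended result: the collation query, the hit count, and the misspelling-to-correction breakdown.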

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896899#action_12896899 ]

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

The patch is currently for trunk.  I think it will likely be the case that we work it out for trunk and then backport.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896903#action_12896903 ]

Tom Phethean commented on SOLR-2010:
------------------------------------

Ok, thanks. Do you know if there is a rough timescale on that?



RE: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Dyer, James
Tom,

I'm going to also need this to work with 1.4.1 within the next month or two so if someone else doesn't back-port it to 1.4.1 then I probably will.  I also would like to see this working with shards.  The PossibilityIterator class likely can be made a lot simpler.  If nobody else takes care of these items I will try to find time to do so myself prior to making it work with 1.4.1.

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: Tom Phethean (JIRA) [mailto:[hidden email]]
Sent: Tuesday, August 10, 2010 10:01 AM
To: [hidden email]
Subject: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality


    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896903#action_12896903 ]

Tom Phethean commented on SOLR-2010:
------------------------------------

Ok, thanks. Do you know if there is a rough timescale on that?



Re: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Grant Ingersoll-2
Hi James,

Did you see my comments on the issue?  

On Aug 11, 2010, at 12:28 AM, Dyer, James wrote:

> Tom,
>
> I'm going to also need this to work with 1.4.1 within the next month or two so if someone else doesn't back-port it to 1.4.1 then I probably will.  I also would like to see this working with shards.  The PossibilityIterator class likely can be made a lot simpler.  If nobody else takes care of these items I will try to find time to do so myself prior to making it work with 1.4.1.
>
> James Dyer
> E-Commerce Systems
> Ingram Book Company
> (615) 213-4311

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search




RE: [jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

Dyer, James
Grant,

I saw your comment and I agree it's probably best to somehow re-query
through a Search Handler, either the existing one with all other
components turned off, or through a new one just for this purpose.  If
you (or someone else) are not able to work on implementing it this way
then I can probably get a little time in a few weeks.  

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-----Original Message-----
From: Grant Ingersoll [mailto:[hidden email]]
Sent: Friday, August 13, 2010 7:34 AM
To: [hidden email]
Subject: Re: [jira] Commented: (SOLR-2010) Improvements to
SpellCheckComponent Collate functionality

Hi James,

Did you see my comments on the issue?  



[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.txt

Second version of patch.  Updated to trunk rev #986945.

Adds support for shards.  I originally implemented this by passing the SearchHandler to the SpellCheckComponent and then using an overloaded version of SearchHandler.handleRequestBody() to do the re-queries.  I found this was unnecessary as we get the same results by calling the QueryComponent directly.  

I added some test scenarios to "DistributedSpellCheckComponentTest" and all pass.  However, I am a bit disturbed to find that the test fails if I uncomment the constructor (added with this patch).  The constructor simply tells it to test only with 4 shards rather than trying 1 shard, then 2, etc.  I found either way the 4-shard test results in the same docs going to the same shards.  Yet the results are different.  Specifically the ranking/ordering of the collations returned and the # of hits reported are sometimes wrong when the constructor is called before the test.  Unfortunately I am at a loss as to why I get inconsistent results here and anyone's assistance on this would be most helpful.

I also added an additional unit test method to verify this works when multiple request handlers are configured with different "qf" params.  I also added a unit test method that verifies this works when "fq" is set.
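
To make the re-query check concrete for anyone following the thread, here is a rough sketch in Python of how spellcheck.maxCollationTries and spellcheck.maxCollations interact, as described in the issue. It is not the patch's code. The function name, the count_hits callback, and the toy hit table are all hypothetical, and candidates are enumerated in plain itertools.product order, which may well differ from the order PossibilityIterator uses. It only covers the checking path (maxCollationTries > 0), not the backwards-compatible unchecked behavior at 0.

```python
from itertools import islice, product


def verified_collations(suggestions, count_hits, max_tries, max_collations=1):
    """Try candidate collations and keep only those that return hits.

    suggestions: dict of misspelled word -> list of corrections, best first.
    count_hits:  hypothetical callback that re-runs the corrected query
                 (with the original fq params) and returns the hit count.
    """
    words = list(suggestions)
    candidates = product(*(suggestions[w] for w in words))
    found = []
    # islice stops after max_tries candidates, whether or not any verified.
    for combo in islice(candidates, max_tries):
        corrections = dict(zip(words, combo))
        hits = count_hits(corrections)
        if hits > 0:
            found.append((corrections, hits))
            if len(found) >= max_collations:
                break
    return found


# Toy data: suggestion lists shortened from the sample output, and a made-up
# hit table that mirrors the hit counts shown there.
suggestions = {
    "hopq": ["hope", "how", "chops"],
    "faill": ["fall", "fails", "faith", "all"],
}
HITS = {("how", "fails"): 2, ("hope", "faith"): 2, ("chops", "all"): 1}


def count_hits(corrections):
    return HITS.get((corrections["hopq"], corrections["faill"]), 0)


result = verified_collations(suggestions, count_hits, max_tries=10, max_collations=3)
for corrections, hits in result:
    print(corrections, hits)
```

With these toy inputs, a budget of 10 tries verifies two collations, while the third ("chops" / "all") is only reached on the 12th candidate. That is the trade-off the issue describes: lower values perform better, higher values may be needed to find a collation that returns results.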



> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900246#action_12900246 ]

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

{quote}Adds support for shards. I originally implemented this by passing the SearchHandler to the SpellCheckComponent and then using an overloaded version of SearchHandler.handleRequestBody() to do the re-queries. I found this was unnecessary as we get the same results by calling the QueryComponent directly.
{quote}

I haven't taken a look at the patch yet, but by the sounds of it, I still think the cleaner way to go is to make Solr have an option to specifically pass in which component to run and turn off all others.  This would be useful for other things, too.  Then you could just use the existing mechanisms.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.patch

Third version (with ".patch" extension; I had used ".txt" extension with the 2nd version).  Works with trunk rev#986945.

This time SpellCheckCollator calls the SearchHandler instead of calling the QueryComponent.  This required exposing a reference to the SearchHandler on the ResponseBuilder.  Also, a new overload of SearchHandler.handleRequestBody() lets you override the list of components to run.  In this case we just have it run QueryComponent.
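
The idea of an overload that takes the list of components to run can be illustrated with a minimal sketch.  The types below (Component, SketchHandler, ComponentOverrideDemo) are hypothetical stand-ins invented for illustration; they are not Solr's classes, and the signatures in the patch may well differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a search component; not Solr's SearchComponent.
interface Component {
    void process(List<String> log);
}

// Hypothetical stand-in for a request handler; not Solr's SearchHandler.
class SketchHandler {
    private final List<Component> configured;

    SketchHandler(List<Component> configured) {
        this.configured = configured;
    }

    // Normal entry point: run every configured component.
    List<String> handle() {
        return handle(configured);
    }

    // Overload: the caller supplies the components to run instead.
    List<String> handle(List<Component> override) {
        List<String> log = new ArrayList<>();
        for (Component c : override) {
            c.process(log);
        }
        return log;
    }
}

class ComponentOverrideDemo {
    // Builds a component that only records its own name when run.
    static Component named(String name) {
        return log -> log.add(name);
    }

    static SketchHandler handler() {
        return new SketchHandler(
            Arrays.asList(named("query"), named("facet"), named("spellcheck")));
    }

    // Runs all three configured components.
    static List<String> runAll() {
        return handler().handle();
    }

    // Runs only a query component, as a collation test query would.
    static List<String> runQueryOnly() {
        return handler().handle(Arrays.asList(named("query")));
    }
}
```

Here runQueryOnly() skips the facet and spellcheck steps even though the handler is configured with them, which is the behavior a collation test query needs.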

This revision has 2 potential benefits:
 
(1) the overloaded method in SearchHandler may prove useful to other components in the future.  

(2) there may be a way to get SearchHandler to requery all the shards at once and then there would be no need to reintegrate the Collations in SpellCheckComponent.finishStage().  However, see my comment in SpellCheckCollator lines 56-57.  Likely I am calling SpellCheckCollator during the wrong "stage" of the distributed request, but I need to find out more specifically how shards work to determine how to further improve this here.  As time allows I will do my own investigating but anyone's advice would be greatly appreciated.

Finally, this version corrects a bug that would have caused one of the test scenarios in DistributedSpellCheckComponentTest to fail.  Unfortunately in the 2nd version, I had left some scenarios commented-out and did not catch this until now.


> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.patch

New Patch Version with Shard Support.  Grant, I hope I'm getting closer to what you have in mind this time around.

I think I've figured out how to send the collation test queries back to SearchHandler and have it take care of querying the shards individually.  Then the collation logic is no different for distributed / non-distributed.
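
A rough sketch of why the collation logic can stay the same in both setups: if the hit count comes from a callback, the loop does not care whether that callback queries one index or fans out to shards.  The class, method and parameter names below are hypothetical; maxTries and maxCollations only mirror the meaning of spellcheck.maxCollationTries and spellcheck.maxCollations, and the special case of maxCollationTries=0 (return collations unchecked) is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

class CollationCheckSketch {
    // candidates: collation queries in ranked order, best first.
    // hitCounter: hypothetical callback that re-runs a query (with the
    // original fq params applied) and returns its hit count.  In a
    // distributed setup it would be backed by whatever queries the shards.
    static List<String> verify(List<String> candidates, int maxTries,
                               int maxCollations,
                               ToLongFunction<String> hitCounter) {
        List<String> verified = new ArrayList<>();
        int tries = 0;
        for (String candidate : candidates) {
            // Stop once the try budget is spent or enough collations are found.
            if (tries >= maxTries || verified.size() >= maxCollations) {
                break;
            }
            tries++;
            // Keep only collations that would actually return hits.
            if (hitCounter.applyAsLong(candidate) > 0) {
                verified.add(candidate);
            }
        }
        return verified;
    }
}
```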

As I would like to eventually use this in production here, any comments as to how to further make this a "production-quality" feature are much appreciated.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>


[jira] Commented: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904264#action_12904264 ]

Grant Ingersoll commented on SOLR-2010:
---------------------------------------

Hi James,

First off, good work.  I like the overall design, etc.

Second, this patch no longer applies cleanly to trunk.  The issue is in the SearchHandler.

Third, in thinking some more about the whole distributed case, perhaps we are approaching this wrong.  I was originally thinking that we would have to go off and re-query all the shards (as in send another message) but we really shouldn't have to do that, right?  Why can't we just pass the collation request through to the shards as part of the get suggestions, so that each shard, if collation is asked for, returns its collation suggestions?  Then, the question becomes how to merge the suggestions and pick the best one.  This should save a round trip at the cost of doing some extra collations, but since most people aren't going to ask for more than 5 or 10, it shouldn't be an issue.

-Grant

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt
>
>


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010_shardSearchHandler_993538.patch
                SOLR-2010_shardRecombineCollations_993538.patch

Two new versions of the patch:

1. SOLR-2010_shardSearchHandler_993538.patch is the same as the 8/23/2010 version except it applies cleanly to trunk revision #993538.  In a Distributed setup, this version calls an overloaded method on SearchHandler to use its logic for combining results from the collation test queries.  This is simpler code but requires many more round-trips between shards.  We also can guarantee that a Distributed setup will always return the exact same collations, in the same order, as a non-Distributed setup.

2. SOLR-2010_shardRecombineCollations_993538.patch is similar to the 8/19/2010 version, with improvements.  This version also applies cleanly to trunk revision #993538.  In a Distributed setup, each shard calls QueryComponent individually and generates its own list of Collations.  The SpellCheckComponent then combines and sorts the resulting collations, returning the best ones, up to the client-specified maximum.  This requires more complicated logic in SpellCheckComponent.finishStage(), although it does not necessitate changes to SearchHandler or ResponseBuilder.  It may be possible to find cases where a Distributed setup may return different collations--or the same collations in a different order--than a non-distributed setup.  I do not believe this potential disparity would ever be very significant.
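
To make the combine-and-sort step in version 2 concrete, here is a minimal sketch under stated assumptions.  The merge rule is assumed, not taken from the patch: hits for the same collation query are summed across shards (valid only if shards hold disjoint documents), collations are ranked by total hits, and ties keep first-seen order.  The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShardCollationMergeSketch {
    // perShard: for each shard, collationQuery -> hits on that shard.
    // Returns at most maxCollations entries of collationQuery -> total hits.
    static List<Map.Entry<String, Long>> merge(
            List<Map<String, Long>> perShard, int maxCollations) {
        // LinkedHashMap remembers the order in which collations were first seen.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map<String, Long> shard : perShard) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(totals.entrySet());
        // List.sort is stable, so equal totals keep first-seen order.
        ranked.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        int limit = Math.max(0, Math.min(maxCollations, ranked.size()));
        return new ArrayList<>(ranked.subList(0, limit));
    }
}
```

Because each shard only verifies collations against its own documents, a ranking built this way can differ from the single-index ranking, which is the disparity described above.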

Grant, I believe version 1 is something like what you were thinking of on 8/9 and 8/19.  Version 2 is more like what you describe in your comment from 8/30.  Let me know if you think this needs any more tweaking.  ALSO, if you're thinking of possibly committing this someday, you may want to look at SOLR-2049.  Based on my understanding, the distributed SpellCheckComponent as it currently exists in Trunk is broken.  If I'm right, we may want to fix it before adding on more functionality.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardSearchHandler_993538.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
> Our project requires a better Spell Check Collator.  I'm contributing this as a patch to get suggestions for improvements and in case there is a broader need for these features.
> 1. Only return collations that are guaranteed to result in hits if re-queried (applying original fq params also).  This is especially helpful when there is more than one correction per query.  The 1.4 behavior does not verify that a particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions
> 3. Provide extended collation results including the # of hits re-querying will return and a breakdown of each misspelled word and its correction.
> This patch is similar to what is described in SOLR-507 item #1.  Also, this patch provides a viable workaround for the problem discussed in SOLR-1074.  A dictionary could be created that combines the terms from the multiple fields.  The collator then would prune out any spurious suggestions this would cause.
> This patch adds the following spellcheck parameters:
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try before giving up.  Lower values ensure better performance.  Higher values may be necessary to find a collation that can return results.  Default is 0, which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return.  Default is 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response format detailing collations found.  default is false, which maintains backwards-compatible behavior.  When true, output is like this (in context):
> <lst name="spellcheck">
> <lst name="suggestions">
> <lst name="hopq">
> <int name="numFound">94</int>
> <int name="startOffset">7</int>
> <int name="endOffset">11</int>
> <arr name="suggestion">
> <str>hope</str>
> <str>how</str>
> <str>hope</str>
> <str>chops</str>
> <str>hoped</str>
> etc
> </arr>
> <lst name="faill">
> <int name="numFound">100</int>
> <int name="startOffset">16</int>
> <int name="endOffset">21</int>
> <arr name="suggestion">
> <str>fall</str>
> <str>fails</str>
> <str>fail</str>
> <str>fill</str>
> <str>faith</str>
> <str>all</str>
> etc
> </arr>
> </lst>
> <lst name="collation">
> <str name="collationQuery">Title:(how AND fails)</str>
> <int name="hits">2</int>
> <lst name="misspellingsAndCorrections">
> <str name="hopq">how</str>
> <str name="faill">fails</str>
> </lst>
> </lst>
> <lst name="collation">
> <str name="collationQuery">Title:(hope AND faith)</str>
> <int name="hits">2</int>
> <lst name="misspellingsAndCorrections">
> <str name="hopq">hope</str>
> <str name="faill">faith</str>
> </lst>
> </lst>
> <lst name="collation">
> <str name="collationQuery">Title:(chops AND all)</str>
> <int name="hits">1</int>
> <lst name="misspellingsAndCorrections">
> <str name="hopq">chops</str>
> <str name="faill">all</str>
> </lst>
> </lst>
> </lst>
> </lst>
> In addition, SolrJ is updated to include SpellCheckResponse.getCollatedResults(), which returns the expanded Collation format.  getCollatedResult(), which returns a single String, is retained for backwards compatibility.  Other APIs were not changed but will still work provided that spellcheck.collateExtendedResult is false.
> This will likely not return valid results when using shards.  Supporting shards would require a more robust interaction with the index than what exists in SpellCheckCollator.collate().
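
A client-side view of the three parameters and the extended response described above can be sketched as follows. This is only an illustration, not part of the patch: the helper names collation_params and parse_collations are hypothetical, the sample XML is a trimmed, well-formed stand-in modeled on the response in the issue description, and the existing spellcheck and spellcheck.collate parameters are assumed to be switched on alongside the new ones.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Trimmed sample modeled on the extended collation output in the issue
# description; the per-word suggestion lists are omitted for brevity.
SAMPLE = """\
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="collation">
      <str name="collationQuery">Title:(how AND fails)</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">how</str>
        <str name="faill">fails</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">Title:(chops AND all)</str>
      <int name="hits">1</int>
      <lst name="misspellingsAndCorrections">
        <str name="hopq">chops</str>
        <str name="faill">all</str>
      </lst>
    </lst>
  </lst>
</lst>
"""


def collation_params(max_tries, max_collations, extended=True):
    """Build a query-string fragment using the parameters named in the issue."""
    params = {
        "spellcheck": "true",
        "spellcheck.collate": "true",
        "spellcheck.maxCollationTries": max_tries,
        "spellcheck.maxCollations": max_collations,
        "spellcheck.collateExtendedResult": "true" if extended else "false",
    }
    return urlencode(params)


def parse_collations(xml_text):
    """Return one dict per <lst name="collation"> element in the response."""
    root = ET.fromstring(xml_text)
    collations = []
    for node in root.iter("lst"):
        if node.get("name") != "collation":
            continue
        corrections = {
            s.get("name"): s.text
            for s in node.findall("lst[@name='misspellingsAndCorrections']/str")
        }
        collations.append({
            "query": node.find("str[@name='collationQuery']").text,
            "hits": int(node.find("int[@name='hits']").text),
            "corrections": corrections,
        })
    return collations


if __name__ == "__main__":
    print(collation_params(max_tries=10, max_collations=3))
    for c in parse_collations(SAMPLE):
        print(c["query"], c["hits"], c["corrections"])
```

Each parsed entry carries the re-query string, its hit count, and the word-by-word corrections, which is the same information the extended format exposes.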


[jira] Issue Comment Edited: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907304#action_12907304 ]

James Dyer edited comment on SOLR-2010 at 9/8/10 12:50 PM:
-----------------------------------------------------------

Two new versions of the patch:

1. SOLR-2010_shardSearchHandler_993538.patch is the same as the 8/23/2010 version except that it applies cleanly to trunk revision #993538.  In a distributed setup, this version calls an overloaded method on SearchHandler to reuse its logic for combining results from the collation test queries.  This is simpler code but requires many more round-trips between shards.  We can also guarantee that a distributed setup will always return exactly the same collations, in the same order, as a non-distributed setup.

2. SOLR-2010_shardRecombineCollations_993538.patch is similar to the 8/19/2010 version, with improvements.  This version also applies cleanly to trunk revision #993538.  In a distributed setup, each shard calls QueryComponent individually and generates its own list of collations.  The SpellCheckComponent then combines and sorts the resulting collations, returning the best ones, up to the client-specified maximum.  This requires more complicated logic in SpellCheckComponent.finishStage(), although it does not necessitate changes to SearchHandler or ResponseBuilder.  There may be cases where a distributed setup returns different collations, or the same collations in a different order, compared with a non-distributed setup.  I do not believe this potential disparity would ever be very significant.

Grant, I believe version 1 is something like what you were thinking of on 8/9 and 8/19.  Version 2 is more like what you describe in your comment from 8/30.  Let me know if you think this needs any more tweaking.  Also, if you're thinking of committing this someday, you may want to look at SOLR-2083 as well.  Based on my understanding, the distributed SpellCheckComponent as it currently exists in trunk is broken.  If I'm right, we may want to fix it before adding more functionality.
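
To make the "recombine" idea in version 2 concrete, here is a rough sketch of merging per-shard collation lists and keeping the best ones up to a client-specified maximum. It is not the patch's code: the function name merge_shard_collations is hypothetical, and ranking by summed hit count is purely an assumption, since the comment above does not say how the patch orders collations.

```python
def merge_shard_collations(shard_results, max_collations):
    """Combine per-shard collation lists into one ranked list.

    shard_results: one list per shard, each holding dicts with
    'query' (the collation string) and 'hits' (that shard's hit count).
    ASSUMPTION: collations are ranked by total hits across shards,
    with ties broken alphabetically; the real patch may rank differently.
    """
    totals = {}
    for shard in shard_results:
        for collation in shard:
            query = collation["query"]
            totals[query] = totals.get(query, 0) + collation["hits"]
    # Highest total first; alphabetical order keeps ties deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"query": q, "hits": h} for q, h in ranked[:max_collations]]


if __name__ == "__main__":
    shard_a = [
        {"query": "Title:(how AND fails)", "hits": 2},
        {"query": "Title:(hope AND faith)", "hits": 1},
    ]
    shard_b = [
        {"query": "Title:(hope AND faith)", "hits": 3},
        {"query": "Title:(chops AND all)", "hits": 1},
    ]
    for c in merge_shard_collations([shard_a, shard_b], max_collations=2):
        print(c["query"], c["hits"])
```

Because each shard only tries its own candidate collations, a merge like this can surface a different set or order than a single-index run, which is the disparity mentioned in item 2.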

      was (Author: jdyer):
    Two new versions of the patch:

1. SOLR-2010_shardSearchHandler_993538.patch is the same as the 8/23/2010 version except it applies cleanly to trunk revision #993538.  In a Distributed setup, this version calls an overloaded method on SearchHandler to use its logic for combining results from the collation test queries.  This is simpler code but requires many more round-trips between shards.  We also can guarantee that a Distributed setup will always return the exact same collations in order as a non-Distributed setup.  

2. SOLR-2010_shardRecombineCollations_993538.patch is similar to the 8/19/2010 version, with improvements.  This version also applies cleanly to trunk revision #993538.  In a Distributed setup, each shard calls QueryComponent individually and generates its own list of Collations.  The SpellCheckComponent then combines and sorts the resulting collations, returning the best ones, up to the client-specified maximum.  This requires more complicated logic in SpellCheckComponent.finishStage(), although it does not necessitate changes to SearchHandler or ResponseBuilder.  It may be possible to find cases where a Distributed setup may return different collations--or the same collations in a different order--than a non-distributed setup.  I do not believe this potential disparity would ever be very significant.

Grant, I believe version 1 is something like what you were thinking of on 8/9 and 8/19.  Version 2 is more like what you describe in your comment from 8/30.  Let me know if you think this needs any more tweaking.  ALSO, if you're thinking of possibly committing this someday, you may want to look at SOLR-2049 also.  Based on my understanding, distributed SpellCheckComponent as exists currently in Trunk is broken.  (If I'm right), we may want to fix it before adding on more functionality.
 

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardSearchHandler_993538.patch


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010_shardSearchHandler_999521.patch
                SOLR-2010_shardRecombineCollations_999521.patch

Both patch versions synced to trunk revision 999521.  (Sorry about the many filename variants.)

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch


[jira] Updated: (SOLR-2010) Improvements to SpellCheckComponent Collate functionality

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010_141.patch

This version is for v1.4.1.  It has no shard support, as SpellCheckComponent does not have any distributed support in 1.4.  All tests pass.

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.patch, SOLR-2010.txt, SOLR-2010_141.patch, SOLR-2010_shardRecombineCollations_993538.patch, SOLR-2010_shardRecombineCollations_999521.patch, SOLR-2010_shardSearchHandler_993538.patch, SOLR-2010_shardSearchHandler_999521.patch
