[jira] Created: (LUCENE-1297) Allow other string distance measures in spellchecker


[jira] Created: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
Allow other string distance measures in spellchecker
----------------------------------------------------

                 Key: LUCENE-1297
                 URL: https://issues.apache.org/jira/browse/LUCENE-1297
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/spellchecker
    Affects Versions: 2.4
         Environment: n/a
            Reporter: Thomas Morton
            Priority: Minor
             Fix For: 2.4


Updated spelling code to allow for other string distance measures to be used.

Created StringDistance interface.
Modified existing Levenshtein distance measure to implement interface (and renamed class).
Verified that change to Levenshtein distance didn't impact runtime performance.
Implemented Jaro/Winkler distance metric.
Modified SpellChecker to take the distance measure in its constructor or in a set method, and to use the interface when calling.
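
As a rough illustration of the design described above (this is not the contents of the patch), the sketch below shows a distance interface and a caller that only depends on it. The method name getDistance comes from the existing TRStringDistance class; the float return type and the "higher is more similar" convention are assumptions, and PositionalMatchDistance and SuggestionRanker are hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Assumed shape of the interface: a higher score means more similar, 1.0f means identical. */
interface StringDistance {
    float getDistance(String s1, String s2);
}

/** Hypothetical toy measure for illustration only: fraction of positions holding the same character. */
class PositionalMatchDistance implements StringDistance {
    @Override
    public float getDistance(String s1, String s2) {
        int max = Math.max(s1.length(), s2.length());
        if (max == 0) {
            return 1.0f;
        }
        int min = Math.min(s1.length(), s2.length());
        int same = 0;
        for (int i = 0; i < min; i++) {
            if (s1.charAt(i) == s2.charAt(i)) {
                same++;
            }
        }
        return (float) same / max;
    }
}

/** Hypothetical stand-in for SpellChecker: it talks to the measure only through the interface. */
class SuggestionRanker {
    private StringDistance distance;

    SuggestionRanker(StringDistance distance) {
        this.distance = distance;
    }

    void setStringDistance(StringDistance distance) {
        this.distance = distance;
    }

    /** Returns the candidates ordered from most to least similar to the given word. */
    List<String> rank(String word, List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((String c) -> -distance.getDistance(word, c)));
        return sorted;
    }
}
```

Swapping in a different measure is then a one-line change at construction time or through the setter, without touching the ranking code.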


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment: string_distance.patch


[jira] Assigned: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic reassigned LUCENE-1297:
----------------------------------------

    Assignee: Otis Gospodnetic


[jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601335#action_12601335 ]

Otis Gospodnetic commented on LUCENE-1297:
------------------------------------------

You read my mind, Thomas.
Would it be appropriate to add and try Jaccard index and Dice coefficient, too, then?



[jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601397#action_12601397 ]

Thomas Morton commented on LUCENE-1297:
---------------------------------------

I think the Dice coefficient would be nice to have.  I'm not sure the Jaccard index makes sense in the context of spelling correction, since order isn't captured.  I implemented JaroWinkler since I'm suggesting proper names and it does a good job with those.
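
To make the ordering point concrete, here is a minimal sketch under two assumptions that are not stated in the thread: the Dice coefficient is computed over character bigrams, and the Jaccard index over the set of characters. BigramDice is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BigramDice {
    /** Splits a string into overlapping two-character pieces, e.g. "abc" gives [ab, bc]. */
    static List<String> bigrams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    /** Dice coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b), counting duplicates. */
    static float similarity(String a, String b) {
        if (a.equals(b)) {
            return 1.0f;
        }
        List<String> x = bigrams(a);
        List<String> y = bigrams(b);
        if (x.isEmpty() || y.isEmpty()) {
            return 0.0f;
        }
        int total = x.size() + y.size();
        int shared = 0;
        for (String g : x) {
            // Removing the match ensures each bigram of b is paired at most once.
            if (y.remove(g)) {
                shared++;
            }
        }
        return 2.0f * shared / total;
    }

    /** Jaccard index over character sets (assumed variant): ignores character order entirely. */
    static float charJaccard(String a, String b) {
        Set<Character> sa = new HashSet<>();
        Set<Character> sb = new HashSet<>();
        for (char c : a.toCharArray()) sa.add(c);
        for (char c : b.toCharArray()) sb.add(c);
        if (sa.isEmpty() && sb.isEmpty()) {
            return 1.0f;
        }
        Set<Character> inter = new HashSet<>(sa);
        inter.retainAll(sb);
        Set<Character> union = new HashSet<>(sa);
        union.addAll(sb);
        return (float) inter.size() / union.size();
    }
}
```

With these definitions, "abc" and "cba" share every character, so the character-set Jaccard index rates them as identical, while they share no bigram and the Dice score is zero.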

With the StringDistance interface defined, anyone can implement the distance measure however they want.  What I think would be very useful is a weighted version of edit distance with the weights tuned to your target language/domain.  Also, with support in Solr for specifying this parameter in the SpellCheckRequestHandler, changing this just becomes a config change.
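
A weighted edit distance along those lines might look like the sketch below. Everything here is hypothetical: the class name, the SubstitutionCost hook, and the idea of passing weights through the constructor are illustrative choices, not part of the patch. It returns a raw cost (lower means closer), so an adapter would still be needed to fit whatever score convention the StringDistance interface uses.

```java
class WeightedEditDistance {
    /** Hypothetical hook: cost of replacing one character with another. */
    interface SubstitutionCost {
        float cost(char from, char to);
    }

    private final float insertCost;
    private final float deleteCost;
    private final SubstitutionCost substitution;

    WeightedEditDistance(float insertCost, float deleteCost, SubstitutionCost substitution) {
        this.insertCost = insertCost;
        this.deleteCost = deleteCost;
        this.substitution = substitution;
    }

    /** Minimum total cost of turning source into target using weighted edits. */
    float cost(String source, String target) {
        int n = source.length();
        int m = target.length();
        // Two rolling rows of the dynamic-programming table, both local to this call.
        float[] prev = new float[m + 1];
        float[] curr = new float[m + 1];
        for (int j = 1; j <= m; j++) {
            prev[j] = prev[j - 1] + insertCost;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = prev[0] + deleteCost;
            for (int j = 1; j <= m; j++) {
                char s = source.charAt(i - 1);
                char t = target.charAt(j - 1);
                float sub = prev[j - 1] + (s == t ? 0f : substitution.cost(s, t));
                float del = prev[j] + deleteCost;
                float ins = curr[j - 1] + insertCost;
                curr[j] = Math.min(sub, Math.min(del, ins));
            }
            float[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```

For example, a domain where vowel confusions are common could charge 0.5 for vowel-to-vowel substitutions and 1.0 for everything else, which makes "colar" closer to "color" than to "colon".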





[jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602151#action_12602151 ]

Otis Gospodnetic commented on LUCENE-1297:
------------------------------------------

Thomas - any chance you can write a simple unit test that exercises JaroWinkler?



[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment:     (was: string_distance.patch)


[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment: string_distance.patch

Updated to include additional unit tests.


[jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604102#action_12604102 ]

Grant Ingersoll commented on LUCENE-1297:
-----------------------------------------

Hi Thomas,

This patch doesn't apply for me from the contrib/spellchecker directory.  




[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment:     (was: string_distance.patch)


[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment: string_distance.patch2

Looks like there was a minor change to the testing code since I created the patch.  Updated and re-created patch.


[jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604121#action_12604121 ]

Otis Gospodnetic commented on LUCENE-1297:
------------------------------------------

Tom, note the bit about naming patches and reusing patch names on the HowToContribute wiki page.

I see JaroWinklerDistance.java doesn't have ASL on top.

Oh, there is something funky about this patch.  You created a new class (LevenshteinDistance), but your patch shows it as an edit of TRStringDistance.  It should show it as a brand new file.  Could you please provide a clean patch?  This is why the patch fails to apply.

Thanks.



[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment: string_distance3.patch

I didn't see anything about re-using patch names on the wiki.  Please advise.

In svn the LevenshteinDistance class is a rename and edit of the TRStringDistance class.  Perhaps the patch doesn't know how to deal with that.  I'll change the name back, though I think that, given there is now going to be more than one of these, a more descriptive name makes sense.

Added ASL to Jaro class.


[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment:     (was: string_distance.patch2)


[jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604231#action_12604231 ]

Grant Ingersoll commented on LUCENE-1297:
-----------------------------------------

{quote}
I didn't see anything about re-using patch names on the wiki. please advise.
{qoute}

I think Otis is just referring to naming patches as something like LUCENE-1297.patch and then you just always keep that name, then  JIRA takes care of the versioning and it is always clear which patch is the latest.  As for the Wiki, I think it is on the Solr wiki, but should be added to the Lucene one, too.


[jira] Issue Comment Edited: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604231#action_12604231 ]

gsingers edited comment on LUCENE-1297 at 6/12/08 5:48 AM:
------------------------------------------------------------------

{quote}
I didn't see anything about re-using patch names on the wiki. please advise.
{quote}

I think Otis is just referring to naming patches as something like LUCENE-1297.patch and then you just always keep that name, then  JIRA takes care of the versioning and it is always clear which patch is the latest.  As for the Wiki, I think it is on the Solr wiki, but should be added to the Lucene one, too.

      was (Author: gsingers):
    {quote}
I didn't see anything about re-using patch names on the wiki. please advise.
{qoute}

I think Otis is just referring to naming patches as something like LUCENE-1297.patch and then you just always keep that name, then  JIRA takes care of the versioning and it is always clear which patch is the latest.  As for the Wiki, I think it is on the Solr wiki, but should be added to the Lucene one, too.
 


[jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604515#action_12604515 ]

Grant Ingersoll commented on LUCENE-1297:
-----------------------------------------

Patch applies cleanly and the tests pass.

Ideally, there would be standalone tests for each of the distance measures that test them outside the context of spell checking.

I think the Jaro-Winkler threshold should be configurable via a setter/constructor.  A getter would make sense too, so that one can see what the threshold is.
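
A minimal sketch of what a configurable threshold could look like, assuming the textbook Jaro-Winkler definition (prefix of at most 4 characters, scaling factor 0.1) and a default threshold of 0.7. Those constants and the class name JaroWinklerSketch are assumptions; the code in the patch may differ.

```java
class JaroWinklerSketch {
    // Jaro scores below this value get no prefix boost (0.7 is an assumed default).
    private float threshold;

    JaroWinklerSketch() {
        this(0.7f);
    }

    JaroWinklerSketch(float threshold) {
        this.threshold = threshold;
    }

    float getThreshold() {
        return threshold;
    }

    void setThreshold(float threshold) {
        this.threshold = threshold;
    }

    /** Plain Jaro similarity in [0, 1]. */
    static float jaro(String a, String b) {
        int la = a.length();
        int lb = b.length();
        if (la == 0 && lb == 0) return 1f;
        if (la == 0 || lb == 0) return 0f;
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] usedA = new boolean[la];
        boolean[] usedB = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!usedB[j] && a.charAt(i) == b.charAt(j)) {
                    usedA[i] = true;
                    usedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0f;
        // Count matched characters that appear in a different order in the two strings.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < la; i++) {
            if (!usedA[i]) continue;
            while (!usedB[k]) k++;
            if (a.charAt(i) != b.charAt(k)) outOfOrder++;
            k++;
        }
        float m = matches;
        float t = outOfOrder / 2f;
        return (m / la + m / lb + (m - t) / m) / 3f;
    }

    /** Jaro score, boosted for a shared prefix when it reaches the threshold. */
    float getDistance(String a, String b) {
        float jaro = jaro(a, b);
        if (jaro < threshold) return jaro;
        int limit = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return jaro + 0.1f * prefix * (1f - jaro);
    }
}
```

Raising the threshold with setThreshold makes the prefix boost apply to fewer pairs; at a threshold above the pair's Jaro score, getDistance simply returns the plain Jaro value.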

Also, the TRStringDistance explicitly states that it is not thread safe.  I believe it is now being used in a non thread-safe manner.  FWIW, I see no reason why it can't be made thread-safe.  All of those member variables are being allocated in the getDistance method, so no reason not to just make them local variables, I think.
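
To illustrate the local-variable suggestion, here is a hedged sketch of a Levenshtein measure whose scratch arrays live inside getDistance, so one instance can be shared across threads. The class name is hypothetical, and the normalization to a similarity in [0, 1] is an assumption rather than the behavior of TRStringDistance.

```java
class LocalStateLevenshtein {
    /** Returns 1 - (edit distance / longer length); 1.0f means the strings are identical. */
    float getDistance(String source, String target) {
        int n = source.length();
        int m = target.length();
        if (n == 0 && m == 0) {
            return 1f;
        }
        // Scratch rows are locals, so concurrent callers never share mutable state.
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return 1f - (float) prev[m] / Math.max(n, m);
    }
}
```

The trade-off is two small array allocations per call instead of reusing member arrays; whether that matters for runtime would need to be measured.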


[jira] Commented: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604538#action_12604538 ]

Otis Gospodnetic commented on LUCENE-1297:
------------------------------------------

Tom, I agree with Grant and I'll assume you'll update the patch.

As for that TRStringDistance -> LevensteinDistance, I'll just commit it as is once the patch is fully ready, and then I'll rename classes in a separate commit.



[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment: LUCENE-1297.patch

Added tests for JaroWinkler and Levenshtein distances directly.
Added getter/setter for JaroWinkler threshold and javadoc.
Moved class variables in Levenshtein into method to make it thread-safe.
Named patch appropriately.


[jira] Updated: (LUCENE-1297) Allow other string distance measures in spellchecker

Mihir Sharma (Jira)
In reply to this post by Mihir Sharma (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Morton updated LUCENE-1297:
----------------------------------

    Attachment:     (was: string_distance3.patch)
