Alternative Spellchecker (spelt)

Alternative Spellchecker (spelt)

Toby Cole-2
Hi,
        We've been working on integrating the spelt spellchecker from XTF
(http://groups.google.com/group/spelt/) into Solr, and were wondering if
anyone else would find it useful.
It is currently implemented as several components:
        SpeltComponent - Adds the best suggestion to the Solr response as an
array
        SpeltFilter & SpeltFilterFactory - Create a filter which queues
tokens into the spelt index
        SpeltHandler - Allows you to query the spellchecker directly
        SpeltIndexCreationHandler - Forces the creation of the spellchecking
index (typically used if you are not using the post-commit listener,
below)
        SpeltPostCommitListener - Adds the queued words to the spellchecking
index after each commit
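
The queue-then-flush split between the filter and the post-commit listener could be sketched roughly as below. This is only an illustration of the pattern, not the actual spelt or Solr code: the class QueuedSpellIndexSketch, its methods, and the word-frequency map standing in for the spellchecking index are all hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the pattern only; NOT the spelt or Solr API. */
public class QueuedSpellIndexSketch {
    // Words recorded since the last flush (the role played by the filter's queue).
    private final List<String> pending = new ArrayList<>();
    // Stand-in for the spellchecking index: word -> number of times seen.
    private final Map<String, Integer> index = new TreeMap<>();

    /** Role of the filter: record a token; nothing reaches the index yet. */
    public synchronized void queue(String token) {
        pending.add(token);
    }

    /**
     * Role of the post-commit listener (or of a manual index-creation
     * request): move every queued word into the index and empty the queue.
     * Returns how many queued words were processed.
     */
    public synchronized int flush() {
        int processed = pending.size();
        for (String word : pending) {
            index.merge(word, 1, Integer::sum);
        }
        pending.clear();
        return processed;
    }

    /** How often a word has been flushed into the index so far. */
    public synchronized int frequency(String word) {
        return index.getOrDefault(word, 0);
    }

    /** Number of words still waiting to be flushed. */
    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

In this sketch, calling queue() during analysis is cheap, and the cost of updating the index is paid only when flush() runs, which is what lets the caller decide when index creation happens.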

The approach we have taken is pretty flexible, as it allows you to  
decide when to query the spellchecker, and when to create the index.  
It has been tested with a corpus of 8 million bibliographic records,  
with a total of around 2.3 gig of words queued into it.

Regards, T

PS. Martin Haye
http://markmail.org/message/cqt4qtzzwyceltqu

Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: [hidden email]
W: www.semantico.com

Re: Alternative Spellchecker (spelt)

Toby Cole-2
Sorry, must have removed part of my PS by accident. I meant to say  
that Martin Haye posted a good description of the spelt library on the  
solr list: http://markmail.org/message/cqt4qtzzwyceltqu

T
On 9 Jul 2008, at 11:40, Toby Cole wrote:

> PS. Martin Haye

Re: Alternative Spellchecker (spelt)

Grant Ingersoll-2
Have you seen SOLR-572?  I'm curious how easy it would be to plug
Spelt into that framework.  Otherwise, yes, I am interested in
alternatives, but I am not sure we need separate components for every
spell checker version out there.  I do like the sound of what you have.

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Alternative Spellchecker (spelt)

Toby Cole-2
On 10 Jul 2008, at 14:37, Grant Ingersoll wrote:

> Have you seen SOLR-572?  I'm curious how easy it would be to plug  
> Spelt into that framework.

I had seen SOLR-572 but had not looked into it closely, because our
project required the multi-word correction from XTF.
One thing I had not thought about is the possibility of multiple
dictionaries for different fields. However, I reckon this is achievable,
as you can just create a new instance of the spellchecker with a
different data directory.
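
One way to picture the "one instance per data directory" idea is a small registry keyed by field name. The sketch below is an assumption-laden illustration, not the spelt API: PerFieldDictionariesSketch, its nested Dictionary class, and the choice of deriving each directory from a base path plus the field name are all hypothetical.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; the real spellchecker class and constructor may differ. */
public class PerFieldDictionariesSketch {

    /** Stand-in for a spellchecker instance bound to one data directory. */
    public static final class Dictionary {
        private final Path dataDir;

        Dictionary(Path dataDir) {
            this.dataDir = dataDir;
        }

        public Path dataDir() {
            return dataDir;
        }
    }

    private final Path baseDir;
    private final Map<String, Dictionary> byField = new HashMap<>();

    public PerFieldDictionariesSketch(Path baseDir) {
        this.baseDir = baseDir;
    }

    /**
     * Returns the dictionary for a field, creating it on first use with
     * its own sub-directory (assumed layout: baseDir/fieldName).
     */
    public synchronized Dictionary forField(String field) {
        return byField.computeIfAbsent(field, f -> new Dictionary(baseDir.resolve(f)));
    }
}
```

With this layout, asking for the "title" and "author" fields would give two independent instances whose indexes never share files, while repeated requests for the same field reuse the existing instance.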

> Otherwise, yes, I am interested in alternatives, but I am not sure
> we need separate components for every spell checker version out
> there.  I do like the sound of what you have.

I'll have a look at the SOLR-572 implementation over the weekend and  
see if our handlers could fit into that framework somehow.
Cheers, Toby.
