Spellchecker in Solr?

Michael Imbeault
Hello everyone,

Has anybody successfully implemented a Lucene spellchecker within Solr?
If so, could you give details on how one would achieve this?

If not, are there plans to make it standard within Solr? It's a feature
almost every Solr application would want to use, so I think it would be
a nice idea. Sadly, I'm no Java developer, so I fear I won't be the one
coding that :(

Thanks,

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212

Re: Spellchecker in Solr?

Kevin Lewandowski
I have not done one but have been planning to do it based on this article:
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html

With Solr it would be much simpler than the Java examples they give.

Re: Spellchecker in Solr?

Michael Imbeault
I had the very same article in mind - how would it be simpler in Solr
than in Lucene? A spellchecker is pretty much standard in every major
search engine nowadays - with one, Solr would be the best, hands down
(even if it already is :P).

Are your plans to build this anything concrete, or is it just at the 'I
might do this in the future' stage?
Thanks,
--

Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



Re: Spellchecker in Solr?

Otis Gospodnetic-2
In reply to this post by Michael Imbeault
I just wrote this for Technorati the other day for our internal hackathon. It was pretty simple. It doesn't use Solr, but it could. It uses Jetty's HTTP handler, which I highly recommend. It acts as a web service that responds to HTTP GET requests that contain a query, and it returns text/plain with suggestions, one per line.
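
To give a rough idea of the shape of such a service, here is a minimal sketch in Python rather than Jetty. It is not the Technorati code: the word list, the `q` parameter name, and the port are all assumptions, and `difflib` similarity matching stands in for a real Lucene-backed lookup.

```python
import difflib
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory dictionary; a real service would load known-good
# words from an index or a file.
DICTIONARY = ["lettuce", "letter", "tomato", "carrot"]


def suggest(query, limit=5):
    """Return up to `limit` dictionary words that resemble `query`, best first."""
    return difflib.get_close_matches(query.lower(), DICTIONARY, n=limit, cutoff=0.6)


class SuggestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "q" is an assumed parameter name, e.g. GET /?q=lettuse
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        # One suggestion per line, served as text/plain.
        body = "\n".join(suggest(query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the suggestion service on localhost (blocks until interrupted)."""
    HTTPServer(("localhost", port), SuggestHandler).serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8080/?q=lettuse` should return the closest dictionary words, one per line.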

Otis

Re: Spellchecker in Solr?

Greg Ludington
In reply to this post by Michael Imbeault
I have created one in a sample application (i.e. it works, but I have
not used it in any heavy load or production sense) using the
SpellChecker package from the Lucene 2.1 branch.  Solr provides an
excellent foundation for extension, so it is not too difficult to get
going, but it does take some Java coding.  Another project at my
company has bumped the Solr-related one at least 6 months, so I have
not been able to polish off my spell checker.  As fast as the Solr
community is moving, however, I would be surprised if no other
solution showed up before then.

-Greg

Re: Spellchecker in Solr?

Kevin Lewandowski
In reply to this post by Michael Imbeault
> I had the very same article in mind - how would it be simpler in Solr
> than in Lucene? A spellchecker is pretty much standard in every major

I meant it would be a simpler implementation in Solr because you don't
have to deal with Java or any Lucene APIs. You just create a document
for each "correct" word. For example the word "lettuce" would have a
document:

<doc>
<field name="word">lettuce</field>
<field name="start3">let</field>
<field name="gram3">let ett ttu tuc uce</field>
<field name="end3">uce</field>
<field name="start4">lett</field>
<field name="gram4">lett ettu ttuc tuce</field>
<field name="end4">tuce</field>
</doc>

Then you query Solr using the same syntax they describe in the article.

Anyway, I haven't done this or tested it, but when reading that article
I thought it would be much easier to implement using Solr, at least
for me since I already have a database of correct words in Solr.
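
For anyone who wants to generate such documents from a word list, a small client-side sketch in Python might look like the following. The field names follow the "lettuce" example; the function names are made up for illustration, and this is untested against a real Solr schema.

```python
from xml.sax.saxutils import escape


def ngrams(word, n):
    """All contiguous character n-grams of `word`, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def spell_fields(word, sizes=(3, 4)):
    """Build (field name, value) pairs for one correctly spelled word."""
    fields = [("word", word)]
    for n in sizes:
        grams = ngrams(word, n)
        if not grams:  # word is shorter than n, so no grams of this size
            continue
        fields.append((f"start{n}", grams[0]))
        fields.append((f"gram{n}", " ".join(grams)))
        fields.append((f"end{n}", grams[-1]))
    return fields


def to_doc_xml(word):
    """Render the fields as a <doc> element in Solr's XML update format."""
    lines = ["<doc>"]
    for name, value in spell_fields(word):
        lines.append(f'<field name="{name}">{escape(value)}</field>')
    lines.append("</doc>")
    return "\n".join(lines)


print(to_doc_xml("lettuce"))
```

Words shorter than a given gram size simply get no fields of that size, which seems like the least surprising choice here.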

Kevin
Re: Spellchecker in Solr?

Chris Hostetter-3
In reply to this post by Michael Imbeault

: Has anybody successfully implemented a Lucene spellchecker within Solr?
: If so, could you give details on how one would achieve this?

There are really two ways to interpret that question ...
  1) build a spell correction suggestion application powered by Solr,
     where you manually feed it the data as documents and the mainIndex is
     the source of suggestion data.
  2) embed spell correction suggestion in Solr, so that request handlers
     can return suggested alternatives along with the results from your
     mainIndex.

#1 would probably be pretty easy as people have mentioned.

#2 would be a lot trickier...

Request handlers can certainly keep state, and could even write to files
if they wanted to preserve state across JVM instances to maintain a
permanent dictionary store ... and I suppose you could use a newSearcher
Listener to know when documents have been added so you can scan them for
new words to update your dictionary ... but off the top of my head it
sounds like it would get pretty complicated.



-Hoss

Re: Spellchecker in Solr?

Chris Hostetter-3
In reply to this post by Kevin Lewandowski
: I meant it would be a simpler implementation in Solr because you don't
: have to deal with java or any Lucene API's. You just create a document
: for each "correct" word. For example the word "lettuce" would have a
: document:
:
: <doc>
: <field name="word">lettuce</field>
: <field name="start3">let</field>
: <field name="gram3">let ett ttu tuc uce</field>
: <field name="end3">uce</field>

With copyField, a good character-based NGram analyzer, and a substring
analyzer (for the start and end fields), you wouldn't even need to do all
that splitting on the client side ... just send each suggestion as a
single doc/field...

  <doc><field name="word">lettuce</field></doc>


-Hoss

Re: Spellchecker in Solr?

Michael Imbeault
In reply to this post by Chris Hostetter-3
I had #1 in mind. Everything in my mainIndex is supposed to be correctly
spelled, so I just want to use that as a source for spelling
suggestions. I'd check for suggestions on low numbers of results (no
results, or very few for a one word query).

#2 would be even better but, as you said, it's a lot trickier. For my
needs, just a spelling suggester would be perfect. Would it require Java
programming, or could I get away with it with the current Solr (adding
n-gram fields and querying on them)?

Thanks,

Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



Re: Spellchecker in Solr?

Chris Hostetter-3

: #2 would be even better but as you said, its a lot trickier. For my
: needs, just a spelling suggester would be perfect. Would it require java
: programming, or could I get away with it with the current Solr (adding
: n-gram fields and querying on them)?

If you build the ngrams yourself external to Solr, put them in a
field, and query on that field using grams you build from the user's
input, then yeah -- Solr out of the box should work fine.
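
As a rough sketch of the query side, the client could split the user's input into grams and OR them together in standard Lucene query syntax. Everything specific here is an assumption: the field names are borrowed from the "lettuce" example earlier in the thread, the boost on the start gram is an arbitrary value, and the base URL is just a typical local Solr address.

```python
from urllib.parse import urlencode


def ngrams(word, n):
    """All contiguous character n-grams of `word`, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_gram_query(user_input, sizes=(3, 4), start_boost=2.0):
    """OR together gram clauses for the (possibly misspelled) input word."""
    # Keep only letters and digits so no query-syntax characters leak through.
    word = "".join(ch for ch in user_input.lower() if ch.isalnum())
    clauses = []
    for n in sizes:
        grams = ngrams(word, n)
        if not grams:  # input shorter than n
            continue
        clauses.append(f"start{n}:{grams[0]}^{start_boost}")  # assumed boost
        clauses.extend(f"gram{n}:{g}" for g in grams)
        clauses.append(f"end{n}:{grams[-1]}")
    return " OR ".join(clauses)


def select_url(user_input, base="http://localhost:8983/solr/select", rows=5):
    """URL for a select request; `base` is a hypothetical local Solr instance."""
    return base + "?" + urlencode({"q": build_gram_query(user_input), "rows": rows})


print(build_gram_query("letuce", sizes=(3,)))
print(select_url("letuce"))
```

The top hits' "word" field would then be the candidate suggestions; how well the ranking works in practice depends on the boosts and would need tuning against real data.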




-Hoss