Stemming using automata

karl.wright

Folks,

I had an interesting conversation with Simon a few weeks back. It occurred to me that it might be possible to build an automaton that handles stemming and pluralization on searches. Just a thought…

Karl

 

Re: Stemming using automata

Robert Muir
Karl, you are right.

This is one of the ways I originally used this thing.

I've done some relevance experiments along these lines (some summary
results here: http://www.slideshare.net/otisg/finite-state-queries-in-lucene).

In these experiments I compared three setups: index-time Porter stemming,
index-time plural stemming, and query-time plural stemming (with an
automaton).
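
To make the query-time variant concrete, here is a minimal sketch of the idea, not the code used in the experiments. It assumes a deliberately naive set of English plural rules and a plain Python list standing in for the index's term dictionary; the function names are hypothetical. The index stays unstemmed, and the automaton built from the query word decides which indexed terms match.

```python
def plural_variants(word):
    """Naive singular/plural forms of `word` (illustrative rules only)."""
    variants = {word, word + "s", word + "es"}
    if word.endswith("y"):
        variants.add(word[:-1] + "ies")
    return variants


def build_automaton(strings):
    """Build a trie-shaped DFA: one transition dict per state, plus accepting states."""
    transitions = [{}]  # state 0 is the start state
    accepting = set()
    for s in strings:
        state = 0
        for ch in s:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def accepts(automaton, term):
    """Run `term` through the DFA; True only if it ends in an accepting state."""
    transitions, accepting = automaton
    state = 0
    for ch in term:
        state = transitions[state].get(ch)
        if state is None:
            return False
    return state in accepting


def matching_terms(automaton, term_dictionary):
    """Filter the (unstemmed) term dictionary through the automaton."""
    return [t for t in term_dictionary if accepts(automaton, t)]


terms = ["box", "boxer", "boxes", "pony", "ponies", "ponytail"]
print(matching_terms(build_automaton(plural_variants("box")), terms))
# ['box', 'boxes']
print(matching_terms(build_automaton(plural_variants("pony")), terms))
# ['pony', 'ponies']
```

The sketch scans every term for simplicity. A real engine would instead intersect the automaton with its sorted term dictionary, which is the job Lucene's AutomatonQuery does.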

In general you get similar relevance results and slower queries, but more
flexibility. For instance, you could have a query parser that
implements a stem() operator without indexing everything twice.
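
As a rough illustration of what such an operator might look like on the parser side, the sketch below rewrites a hypothetical `stem(word)` syntax into a group of variants at query time. The syntax, the function names, and the naive plural rules are all assumptions for the example. A real parser would more likely hand the variants to an automaton-based query than expand them into OR clauses.

```python
import re

# Hypothetical operator syntax: stem(word)
STEM_CALL = re.compile(r"\bstem\((\w+)\)")


def plural_variants(word):
    """Naive singular/plural forms of `word` (illustrative rules only)."""
    variants = [word, word + "s", word + "es"]
    if word.endswith("y"):
        variants.append(word[:-1] + "ies")
    return variants


def expand_stem_operators(query):
    """Rewrite each stem(word) into an OR group; leave the rest of the query alone."""
    def expand(match):
        return "(" + " OR ".join(plural_variants(match.group(1))) + ")"
    return STEM_CALL.sub(expand, query)


print(expand_stem_operators("stem(box) AND red"))
# (box OR boxs OR boxes) AND red
```

Non-words such as "boxs" are harmless here because they simply match nothing in the index. Only terms wrapped in stem() pay the extra query cost, and the index needs just one unstemmed copy of each field.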

Probably pretty boring for most people, but in some cases (e.g. lots
of languages) query-time stemming starts to become more attractive...

On Wed, Nov 17, 2010 at 3:18 PM,  <[hidden email]> wrote:
> Folks,
>
> I had an interesting conversation with Simon a few weeks back. It occurred
> to me that it might be possible to build an automaton that handles stemming
> and pluralization on searches. Just a thought…
>
> Karl
>
>
