stemming the index

stemming the index

sarfaraz masood
My index contains data in two languages, English and German. Which analyzer and stemmer should be applied to this data before it is fed into the index?

-Sarfaraz



Re: stemming the index

Erick Erickson
The short answer is "there isn't a single analyzer and stemmer that
really work well for mixed-language indexing and searching".

Take a look through the mail archive and try searching for "multilanguage",
"multi-language", or "multiple languages". There's a wealth of info there
because this topic has been discussed many times.

Best
Erick

On Wed, Jul 7, 2010 at 3:51 PM, sarfaraz masood <
[hidden email]> wrote:

> My index contains data of 2 different languages, English & German. Now
> which analyzer & stemmer should be applied on this data before feeding to
> index
>
> -Sarfaraz
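
A rough illustration of why one analyzer rarely fits both languages, and of one arrangement that often comes up in the multi-language discussions mentioned above: keep each language in its own field and analyze it with that language's rules. The sketch below is plain Python with toy suffix lists; it is not Lucene's API and not a real stemmer, and the names `analyze`, `index_document`, `body_en`, and `body_de` are made up for the example. It also assumes each document's language is known at index time.

```python
# Illustrative only: toy suffix rules, NOT a real stemmer and NOT Lucene's API.
# Assumption: the language of every document is known when it is indexed.
ENGLISH_SUFFIXES = ("ing", "ed", "s")
GERMAN_SUFFIXES = ("en", "er", "e")

SUFFIXES_BY_LANGUAGE = {"en": ENGLISH_SUFFIXES, "de": GERMAN_SUFFIXES}


def strip_suffix(token, suffixes, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in suffixes:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def analyze(text, language):
    """Lowercase, split on whitespace, and apply the language's suffix rules."""
    suffixes = SUFFIXES_BY_LANGUAGE[language]
    return [strip_suffix(token, suffixes) for token in text.lower().split()]


def index_document(doc):
    """Return {field name: tokens}, using one field per language."""
    language = doc["language"]
    return {f"body_{language}": analyze(doc["body"], language)}


if __name__ == "__main__":
    print(index_document({"language": "en", "body": "indexed documents"}))
    print(index_document({"language": "de", "body": "suchen"}))
    # The same German word run through the English rules is left unstemmed.
    print(analyze("suchen", "en"))
```

With these toy rules, `analyze("suchen", "de")` yields `['such']`, while the English rules leave the word as `['suchen']`, so a query stemmed with the wrong language's rules would not match the indexed term. At search time the query would need to be analyzed with the same rules as the field it targets.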

Re: stemming the index

sarfaraz masood
Thanks, Erick
:-)

--- On Thu, 8/7/10, Erick Erickson <[hidden email]> wrote:

From: Erick Erickson <[hidden email]>
Subject: Re: stemming the index
To: [hidden email]
Date: Thursday, 8 July, 2010, 1:33 AM

The short answer is "there isn't a single analyzer and stemmer that
really work well for mixed-language indexing and searching".

Take a look through the mail archive and try searching for "multilanguage",
"multi-language", or "multiple languages". There's a wealth of info there
because this topic has been discussed many times.

Best
Erick


Re: stemming the index

Jaran Nilsen
Although I have not tested it myself yet, the Lucene-Hunspell project might
be worth a look: http://code.google.com/p/lucene-hunspell/

Jaran

On Wed, Jul 7, 2010 at 10:15 PM, sarfaraz masood <
[hidden email]> wrote:

> Thanx Erick
> :-)


--
Jaran Nilsen
MSN/GTalk: [hidden email] / [hidden email]
Tel.: +47 97 19 33 69
jarannilsen.com || codemunchies.com || ita.sourceforge.net
twitter.com/jarannilsen // www.linkedin.com/in/jarannilsen //
facebook.com/jaran.nilsen

Re: stemming the index

Jan Høydahl / Cominvent
In reply to this post by Erick Erickson
Check out slides 36-38 in this presentation for some hints on a possible solution:
http://www.slideshare.net/janhoy/migrating-fast-to-solr-jan-hydahl-cominvent-as-euro-con

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 7 July 2010, at 22:03, Erick Erickson wrote:

> The short answer is "there isn't a single analyzer and stemmer that
> really work well for mixed-language indexing and searching".
>
> Take a look through the mail archive and try searching for "multilanguage",
> "multi-language", or "multiple languages". There's a wealth of info there
> because this topic has been discussed many times.
>
> Best
> Erick