How to use protwords.txt

Shuai Weng

Hey,

We have indexed some biological full-text files. I was wondering how to
configure schema.xml so that gene names (e.g., 'met1', 'met2', 'met3') won't
be stemmed into the same word ('met'). I added these gene names to the protwords.txt
file, but it doesn't seem to work.

Am I missing anything?

Thanks for any info you may provide!

Shuai


Re: How to use protwords.txt

tomasflobbe
Shuai, are you using a WordDelimiterFilterFactory in the analysis chain? That
filter, rather than the stemmer, may be what turns "met1" into "met" and "1".
Check the Analysis page in the Solr admin.
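
For illustration, a field type along these lines would keep "met1" as a single token. This is a minimal sketch, not the poster's actual schema: the field type name text_genes is hypothetical, the filter chain is assumed, and it presumes a Solr version whose WordDelimiterFilterFactory supports the splitOnNumerics attribute. Setting splitOnNumerics="0" stops the filter from splitting at letter-to-digit transitions, and the stemmer reads protwords.txt through its protected attribute.

```xml
<!-- Hypothetical field type; the name and filter chain are assumptions. -->
<fieldType name="text_genes" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnNumerics="0": do not split "met1" into "met" and "1" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1" splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- protwords.txt only affects filters that reference it, as here -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

The protected list is compared against the token as it reaches the stemmer, so with a lowercase filter earlier in the chain the entries in protwords.txt should be lowercase as well.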




________________________________
From: Shuai Weng <[hidden email]>
To: [hidden email]
Sent: Monday, August 30, 2010 20:00:41
Subject: How to use protwords.txt


Re: How to use protwords.txt

Erick Erickson
In addition to Tomas' question, be aware that if you've already indexed
your data, the stemming has already been applied; you'll have to re-index
to get the "right" tokens in there.

Best
Erick

On Tue, Aug 31, 2010 at 6:08 AM, Tomas <[hidden email]> wrote:

> Shuai, are you using a WordDelimiterFilterFactory in the analysis chain? That
> filter, rather than the stemmer, may be what turns "met1" into "met" and "1".
> Check the Analysis page in the Solr admin.