How to customize the delimiters used by the WordDelimiterFilter in Lucene?


phauly
Hi,

I am trying to index words like 'e-mail' as 'email', 'e mail' and 'e-mail' with Lucene 4.4.0.

Lucene's WordDelimiterFilter should be ideal for this. However, it treats every(?) non-alphanumeric character as a delimiter. So, terms like 'C++' are transformed to 'C', which is not what I want.

Apparently, Solr allows custom delimiters to be specified. But how can I do this in plain Lucene?

I have looked into the documentation, and the 'byte[] charTypeTable' parameter of the constructor looked promising. However, specifying some delimiters in a charTypeTable seems to have no effect.

Thank you!

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: How to customize the delimiters used by the WordDelimiterFilter in Lucene?

Ahmet Arslan
Hi,

Maybe look at the factory class to see how the 'types' argument is handled?

Ahmet
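
For illustration, here is a rough sketch of the kind of table the factory appears to build from its 'types' argument. It is not the factory's actual code. The class name CharTypeTableSketch, the helper names, and the simplified defaultType logic are all hypothetical, and the constant values are assumed to mirror the type constants on WordDelimiterFilter. The point is that the table is indexed by character code and holds a type per character. So rather than listing delimiters, you would mark '+' as ALPHA to stop it from being treated as one.

```java
import java.util.Map;
import java.util.TreeMap;

class CharTypeTableSketch {
    // Assumption: these mirror the type constants on WordDelimiterFilter.
    // In real code, reference the library's constants instead of redefining them.
    static final byte LOWER = 0x01;
    static final byte UPPER = 0x02;
    static final byte ALPHA = 0x03;         // LOWER | UPPER
    static final byte DIGIT = 0x04;
    static final byte SUBWORD_DELIM = 0x08;

    // Simplified stand-in for the default classification: letters and digits
    // keep their natural type, everything else counts as a delimiter.
    static byte defaultType(int ch) {
        if (Character.isUpperCase(ch)) return UPPER;
        if (Character.isLowerCase(ch)) return LOWER;
        if (Character.isLetter(ch)) return ALPHA;
        if (Character.isDigit(ch)) return DIGIT;
        return SUBWORD_DELIM;
    }

    // Builds a table indexed by character code: defaults first, then overrides.
    static byte[] buildTable(Map<Character, Byte> overrides) {
        int size = 256; // cover at least the Latin-1 range
        for (char c : overrides.keySet()) {
            size = Math.max(size, c + 1);
        }
        byte[] table = new byte[size];
        for (int i = 0; i < size; i++) {
            table[i] = defaultType(i);
        }
        for (Map.Entry<Character, Byte> e : overrides.entrySet()) {
            char c = e.getKey();
            table[c] = e.getValue();
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Character, Byte> overrides = new TreeMap<>();
        overrides.put('+', ALPHA); // treat '+' as a letter so 'C++' stays whole
        byte[] table = buildTable(overrides);
        System.out.println("type of '+': " + table['+']);
        System.out.println("type of '-': " + table['-']);
    }
}
```

The resulting array would then be passed as the 'byte[] charTypeTable' constructor argument. The '-' entry keeps its default delimiter type, so 'e-mail' should still be split, while '+' is now classified as a letter.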


On Friday, March 17, 2017 11:05 PM, "[hidden email]" <[hidden email]> wrote:

Hi,

I am trying to index words like 'e-mail' as 'email', 'e mail' and 'e-mail' with Lucene 4.4.0.

Lucene's WordDelimiterFilter should be ideal for this. However, it treats every(?) non-alphanumeric character as a delimiter. So, terms like 'C++' are transformed to 'C', which is not what I want.

Apparently, Solr allows to specify custom delimiters. But how can I do it in Lucene?

I have looked into the documentation and the 'byte[] charTypeTable' parameter in the Constructor looked promising. But it seems to have no effect if I specify some delimiters in a charTypeTable.

Thank you!

