How to index Chinese text?



Zsolt Koppany
Our application works with the stable Lucene 1.4.3 release, even for German
text, but we have problems with Chinese text. Which analyzer should we use to
index Chinese text?

Zsolt



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: How to index Chinese text?

Erik Hatcher

On Jun 13, 2005, at 12:01 PM, Zsolt Koppany wrote:

> Our application works with lucene-1.4.3 stable even for German text but we
> have problems with Chinese text. Which analyzer should we use to index
> Chinese text?

This question is best posted to java-user, not java-dev, but I'll  
reply here for now.

The answer is that "it depends" on what you want to do.
StandardAnalyzer will tokenize CJK characters individually, producing one
token per character. In the contrib area of the Subversion repository,
under "analyzers", there are two alternatives: ChineseAnalyzer and
CJKAnalyzer.
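To make the "it depends" concrete, here is a minimal plain-JDK sketch (not Lucene code; the class CjkTokenSketch and its methods are hypothetical) contrasting two ways of turning CJK text into index terms: one token per character, as described above for StandardAnalyzer, and overlapping two-character tokens (bigrams). Treating bigrams as the CJK-oriented strategy is an assumption made for illustration; check the contrib sources for exactly what ChineseAnalyzer and CJKAnalyzer emit. The CJK test below only covers the basic CJK Unified Ideographs block.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of unigram vs. bigram tokenization of CJK text (plain JDK, not Lucene). */
public class CjkTokenSketch {

    // Simplification: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF) counts as CJK.
    static boolean isCjk(int cp) {
        return Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    // One token per CJK character; runs of other letters/digits become a single token.
    public static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isCjk(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp);
            } else {
                flush(word, tokens); // whitespace/punctuation ends a word
            }
            i += Character.charCount(cp);
        }
        flush(word, tokens);
        return tokens;
    }

    // Overlapping two-character tokens inside each CJK run; a lone CJK character is kept as-is.
    public static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        List<String> run = new ArrayList<>(); // current run of consecutive CJK characters
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isCjk(cp)) {
                flush(word, tokens);
                run.add(new String(Character.toChars(cp)));
            } else {
                emitBigrams(run, tokens); // a non-CJK character closes the CJK run
                if (Character.isLetterOrDigit(cp)) {
                    word.appendCodePoint(cp);
                } else {
                    flush(word, tokens);
                }
            }
            i += Character.charCount(cp);
        }
        emitBigrams(run, tokens);
        flush(word, tokens);
        return tokens;
    }

    static void emitBigrams(List<String> run, List<String> tokens) {
        if (run.size() == 1) {
            tokens.add(run.get(0));
        } else {
            for (int k = 0; k + 1 < run.size(); k++) {
                tokens.add(run.get(k) + run.get(k + 1));
            }
        }
        run.clear();
    }

    static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        String text = "中文分析 Lucene";
        System.out.println(unigrams(text)); // [中, 文, 分, 析, Lucene]
        System.out.println(bigrams(text));  // [中文, 文分, 分析, Lucene]
    }
}
```

Roughly, the trade-off is that per-character tokens keep the vocabulary small and match any substring, but a query has to rely on phrase matching to avoid hits on scattered characters; bigrams make multi-character words match more precisely at the cost of a larger term dictionary.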

     Erik

