[jira] Updated: (SOLR-1336) Add support for lucene's SmartChineseAnalyzer

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-1336:

    Fix Version/s: 3.1

I don't think jar file size should prevent us from adding support for all the analyzers we have.

This comes with the territory for CJK. Individuals interested in "optimizing" size can help
with LUCENE-2510, but I don't think that should block integrating all our analyzers, nor should
they all have to wait until 4.0.

> Add support for lucene's SmartChineseAnalyzer
> ---------------------------------------------
>                 Key: SOLR-1336
>                 URL: https://issues.apache.org/jira/browse/SOLR-1336
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.1, 4.0
>         Attachments: SOLR-1336.patch, SOLR-1336.patch, SOLR-1336.patch
>
> SmartChineseAnalyzer was contributed to Lucene; it indexes Simplified Chinese text as words.
> If the factories for the tokenizer and word token filter are added to Solr, it can be used, although there should be a sample config or wiki entry showing how to apply the built-in stopwords list.
> This is because the list doesn't contain actual stopwords, but must be used to prevent indexing punctuation...
> Note: we did some refactoring/cleanup on this analyzer recently, so it would be much easier to do this after the next Lucene update.
> It has also been moved out of -analyzers.jar due to size and now builds in its own smartcn jar file, so that jar would need to be added if this feature is desired.
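
To illustrate the point about the stopwords list, here is a minimal standalone sketch of why a "stop" filter is still needed when the list holds only punctuation. It does not use the Lucene or Solr APIs: the class name, the method, the contents of the stop set, and the hand-written segmentation are all assumptions made for illustration, not the analyzer's actual data or output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PunctuationStopFilterSketch {

    // Hypothetical stand-in for the built-in "stopwords" list:
    // it holds punctuation tokens only, no real words.
    static final Set<String> PUNCTUATION_STOP_SET = new HashSet<>(Arrays.asList(
            ",", ".", "!", "?", "，", "。", "！", "？", "、"));

    // Keep every token that is not in the stop set, preserving order.
    static List<String> removeStopTokens(List<String> tokens, Set<String> stopSet) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopSet.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-written (assumed) word segmentation of a short sentence,
        // including the trailing full stop as its own token.
        List<String> segmented = Arrays.asList("我", "是", "中国", "人", "。");

        // Without the filter the punctuation token would be indexed as a term.
        System.out.println(removeStopTokens(segmented, PUNCTUATION_STOP_SET));
    }
}
```

In this sketch the word tokens pass through unchanged and only the punctuation token is dropped, which is the behaviour the issue description says a sample config should make easy to set up.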

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.