Adding stemmer option to EnglishAnalyzer


Cameron M VandenBerg



My name is Cameron VandenBerg, and I am a research programmer at Carnegie Mellon University. We use Lucene for projects and classwork here. One feature we have always added in our own code, which extends EnglishAnalyzer, is a setStemmer method; createComponents then applies whichever stemmer was selected.


Is it possible to add this feature to the EnglishAnalyzer?  If so, what steps can we take?


Code Snippets:


  /**
   * Control whether and how stemming is done. See StemmerType.
   */
  public void setStemmer(StemmerType s) {
    this.stemmer = s;
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    final Tokenizer source = new StandardTokenizer();
    TokenStream result = new EnglishPossessiveFilter(source);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopwords);
    // Protect excluded terms from stemming, as the stock EnglishAnalyzer does.
    if (!stemExclusionSet.isEmpty())
      result = new SetKeywordMarkerFilter(result, stemExclusionSet);
    if (this.stemmer == StemmerType.KSTEM)
      result = new KStemFilter(result);
    else if (this.stemmer == StemmerType.PORTER)  // assumes StemmerType also defines PORTER
      result = new PorterStemFilter(result);
    return new TokenStreamComponents(source, result);
  }
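
StemmerType itself is not shown in the snippets above. As a rough sketch only, it could be a simple enum along the following lines. The constants NONE and PORTER, the class name StemmerChoiceSketch, and the buildChain helper are assumptions made for illustration and are not part of Lucene. The filter chain is modeled as a plain string so that the example runs without Lucene on the classpath.

```java
public class StemmerChoiceSketch {

    // Hypothetical enum: only KSTEM appears in the snippets above;
    // NONE and PORTER are assumed here.
    public enum StemmerType { NONE, PORTER, KSTEM }

    // Stand-in for createComponents: returns a description of the filter
    // chain that would be built for the given stemmer choice.
    public static String buildChain(StemmerType stemmer) {
        String chain = "StandardTokenizer > EnglishPossessiveFilter"
                + " > LowerCaseFilter > StopFilter";
        switch (stemmer) {
            case KSTEM:
                return chain + " > KStemFilter";
            case PORTER:
                return chain + " > PorterStemFilter";
            default:
                // NONE: leave the tokens unstemmed.
                return chain;
        }
    }

    public static void main(String[] args) {
        // Print the chain that each stemmer choice would produce.
        for (StemmerType t : StemmerType.values()) {
            System.out.println(t + ": " + buildChain(t));
        }
    }
}
```

With an enum like this, a NONE value would let callers switch stemming off entirely, which is one way to read "whether and how" in the comment on setStemmer.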



Thank you,

Cameron VandenBerg