Adding stemmer option to EnglishAnalyzer

Cameron M VandenBerg

Hello!

My name is Cameron VandenBerg, and I am a research programmer at Carnegie Mellon University. We use Lucene for projects and classwork here. One feature we have always added in our own code, which extends EnglishAnalyzer, is a setStemmer method; createComponents then uses the selected stemmer.

Is it possible to add this feature to the EnglishAnalyzer? If so, what steps can we take?

Code Snippets:

  /**
   * Control whether and how stemming is done. See StemmerType.
   */
  public void setStemmer(StemmerType s) {
    this.stemmer = s;
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    final Tokenizer source = new StandardTokenizer();
    TokenStream result = new EnglishPossessiveFilter(source);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopwords);
    if (!stemExclusionSet.isEmpty()) {
      result = new SetKeywordMarkerFilter(result, stemExclusionSet);
    }
    if (this.stemmer == StemmerType.KSTEM) {
      result = new KStemFilter(result);
    } else {
      result = new PorterStemFilter(result);
    }
    return new TokenStreamComponents(source, result);
  }
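
StemmerType itself is not shown in the snippets above. A minimal sketch of what it might look like, assuming it is a plain enum: only KSTEM appears in the code above, so the PORTER constant (matching the else branch) and the fromName helper are hypothetical names used for illustration, not part of Lucene.

```java
import java.util.Locale;

/**
 * Hypothetical stemmer selector. Only KSTEM is referenced in the snippet
 * above; PORTER and fromName are assumptions made for this sketch.
 */
enum StemmerType {
    PORTER, // assumed default, mirroring the else branch in createComponents
    KSTEM;

    /**
     * Parses a case-insensitive name such as "kstem".
     * Falls back to PORTER when the name is null or not recognized.
     */
    static StemmerType fromName(String name) {
        if (name == null) {
            return PORTER;
        }
        try {
            return valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return PORTER;
        }
    }
}
```

With an enum like this, the KSTEM check in createComponents works unchanged, and a caller could select the stemmer from a configuration string, for example setStemmer(StemmerType.fromName("kstem")).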

 

Thank you,

Cameron VandenBerg