Where to place a filter...

Christian Aschoff
Hello,

as a preface: I have no problems, and everything works the way I want :-)

I am more interested in a tip on whether I am using the right approach
or pattern. I want to strip accents before data goes into my index, so
I wrote the code below. I could not find an example of where to place
a filter (for indexing) with Google, so this is my guess at how to do
it.

My question is: Is this the correct pattern for using a filter, or
where should it be placed?

Thank you in advance for any comments,
Christian

---------------------------------------------------------------
/*
  * RetroBibAnalyzer.java
  *
  * Created on 22. November 2007, 12:42
  *
  */

package de.retrobib.lucene;

import java.io.Reader;
import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

/**
  * Analyzer for the Lucene index. Currently just a
  * wrapper to make later extensions easier.
  *
  * @author caschoff
  * @version 1.0
  */
public class RetroBibAnalyzer extends Analyzer {

     /**
      * <b>Every</b> class has its own logger.
      */
     private static final Logger logger =
             Logger.getLogger(RetroBibAnalyzer.class);

     /** The wrapped analyzer: German Snowball stemming plus stop words. */
     private static final SnowballAnalyzer analyzer =
             new SnowballAnalyzer("German", GermanAnalyzer.GERMAN_STOP_WORDS);

     /** Creates a new instance of RetroBibAnalyzer */
     public RetroBibAnalyzer() {
         super();
     }

     /**
      * Processes the token stream.
      *
      * @param fieldName The name of the field.
      * @param reader The reader.
      * @return The TokenStream.
      */
     public TokenStream tokenStream(String fieldName, Reader reader) {
         // The accent filter wraps the Snowball analyzer's output, so it
         // runs after stop-word removal and stemming.
         return new UTF8AccentFilter(
                 analyzer.tokenStream(fieldName, reader));
     }

}
---------------------------------------------------------------
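
The UTF8AccentFilter class itself is not shown in this post. As a rough
illustration of what an accent-stripping step might do to each token,
here is a minimal sketch based on java.text.Normalizer (Java 6 or
later). The class name AccentStripSketch and the method stripAccents
are hypothetical and are not the poster's code.

```java
import java.text.Normalizer;

// Hypothetical helper, not the UTF8AccentFilter from the post.
class AccentStripSketch {

    // Decomposes each character into base letter plus combining marks
    // (NFD), then removes the combining marks.
    static String stripAccents(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "\u00e4" is a-umlaut, so the input is "Universität".
        System.out.println(stripAccents("Universit\u00e4t")); // prints Universitat
    }
}
```

Characters without a canonical decomposition, such as the German sharp
s, pass through this sketch unchanged, so a real filter may need extra
mappings for them.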

---
Dipl. Ing. (FH) Christian Aschoff

Office:
Universität Ulm
Kommunikations- und Informationszentrum
Abt. Informationssysteme
Room O26/5403
Albert-Einstein-Allee 11
89081 Ulm

Tel. 0731 50-22432
Fax. 0731 50-22471
[hidden email]

Private:
Fabristr. 13
89075 Ulm
Germany/Old Europe

Tel. 0731 602 803 60
Fax. 0731 602 803 61
Mob. 0171 272 03 04
[hidden email]

Help out: www.retrobibliothek.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Where to place a filter...

Doron Cohen
It seems you are asking whether to remove accents before or after
stemming. Here is a discussion of a similar question (for Spanish):

http://www.nabble.com/Snowball-and-accents-filter...--tf3653720.html#a10207399

Hope this helps,
Doron
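
To make the before-or-after question concrete, here is a small sketch
in plain Java, with no Lucene involved. The method toyStem is a
hypothetical stand-in that only drops one accented Spanish suffix; it
is not the Snowball stemmer, and stripAccents is an assumed helper
built on java.text.Normalizer. The only point is that the two orders
can produce different index terms.

```java
import java.text.Normalizer;

// Toy illustration of filter ordering; all names here are hypothetical.
class FilterOrderSketch {

    // Removes combining marks after canonical decomposition (NFD).
    static String stripAccents(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
    }

    // Toy stand-in for a stemmer (NOT Snowball): drops the suffix
    // "ción" only when the accent is still present.
    static String toyStem(String term) {
        String suffix = "ci\u00f3n";
        return term.endsWith(suffix)
                ? term.substring(0, term.length() - suffix.length())
                : term;
    }

    // Accent filter placed after the stemmer, as in the posted analyzer.
    static String stemThenStrip(String term) {
        return stripAccents(toyStem(term));
    }

    // Accent filter placed before the stemmer.
    static String stripThenStem(String term) {
        return toyStem(stripAccents(term));
    }

    public static void main(String[] args) {
        String term = "canci\u00f3n"; // "canción"
        System.out.println(stemThenStrip(term)); // prints can
        System.out.println(stripThenStem(term)); // prints cancion
    }
}
```

Whichever order is chosen, the same chain should be applied at index
time and at query time so that both sides produce the same terms.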

Christian Aschoff <[hidden email]> wrote on 22/11/2007 21:27:20:

> Hello,
>
> as a preface: I have no problems, and everything works the way I want :-)
>
> I am more interested in a tip on whether I am using the right approach
> or pattern. I want to strip accents before data goes into my index, so
> I wrote the code below. I could not find an example of where to place
> a filter (for indexing) with Google, so this is my guess at how to do
> it.
>
> My question is: Is this the correct pattern for using a filter, or
> where should it be placed?
>
> Thank you in advance for any comments,
> Christian
...
>      private static final SnowballAnalyzer analyzer =
>              new SnowballAnalyzer("German", GermanAnalyzer.GERMAN_STOP_WORDS);
...
>      public TokenStream tokenStream(String fieldName, Reader reader) {
>          return new UTF8AccentFilter(analyzer.tokenStream(fieldName,

