Basic character-cleanups easily possible?

Stefan Neufeind
Hi,

I wonder whether there is an easy way to do basic character "cleanups".
For example, most people would expect a search for "cafe" to find both
"cafe" and "café" (the latter with an accent).

Does this fall under "stemming", or is it a more general normalisation
of words, independent of language-specific stemming? At which stage could
it be done, and through which plugin? Has somebody already solved this?


Regards,
  Stefan

Re: Basic character-cleanups easily possible?

Otis Gospodnetic-2-2
Have a look at Lucene's contrib/:

$ ff \*ISO\*java
./src/test/org/apache/lucene/analysis/TestISOLatin1AccentFilter.java
./src/java/org/apache/lucene/analysis/ISOLatin1AccentFilter.java
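
To illustrate the kind of accent folding being asked about, here is a minimal sketch that uses only the Java standard library (java.text.Normalizer). It is not the code of the filter named above; the class name AccentFolding and the method foldAccents are hypothetical, chosen for this example. The idea is to decompose each accented character into a base letter plus combining marks and then drop the marks.

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only; not part of Lucene.
class AccentFolding {

    // Decompose to NFD (base letter + combining marks), then strip
    // every Unicode combining mark (category M).
    static String foldAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café" written with a Unicode escape.
        System.out.println(foldAccents("caf\u00e9")); // prints "cafe"
    }
}
```

For "cafe" to match "café", the same folding would have to be applied both when indexing and when parsing the query. Characters without a canonical decomposition (for example "ß" or "æ") pass through this sketch unchanged.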

Otis

----- Original Message ----
From: Stefan Neufeind <[hidden email]>
To: [hidden email]
Sent: Wednesday, July 12, 2006 6:23:36 PM
Subject: Basic character-cleanups easily possible?
