Q: Wildcard searching with german umlauts (ä, ö, ß, ...)

Stephan Spat
Hello again!

I use the following Analyzer to analyze my documents:

public TokenStream tokenStream(String fieldName, Reader reader) {
    return new SnowballFilter(
        new LowerCaseFilter(
            new StandardFilter(
                new StandardTokenizer(reader))), "German");
}

It replaces German umlauts, e.g. ä -> a, ü -> u, ..., so there are no
umlauts in the index. For searching I use the same Analyzer. When I do a
simple search for a word with umlauts there is no problem. But if I
additionally use wildcards, I suppose the word is not analyzed, and so a
word with umlauts and wildcards (for example: grö*) is not found in the
index. Is this assumption correct?

Is the only way to use wildcards and umlauts to leave out the
StandardFilter (I suppose the replacement is done there)? Or is there a
"trick" for using umlauts together with wildcards? Or is it necessary to
write a new Filter to replace the StandardFilter?

Thanks a lot

Stephan Spat


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Q: Wildcard searching with german umlauts (ä, ö, ß, ...)

adb
Stephan Spat wrote:
> Hello again!
>
> It replaces German umlauts, e.g. ä -> a, ü -> u, ..., so there are no
> umlauts in the index. For searching I use the same Analyzer. When I do a
> simple search for a word with umlauts there is no problem. But if I
> additionally use wildcards, I suppose the word is not analyzed, and so a
> word with umlauts and wildcards (for example: grö*) is not found in the
> index. Is this assumption correct?

I came across this class this morning:

AnalyzingQueryParser - Overrides Lucene's default QueryParser so that Fuzzy-,
Prefix-, Range-, and WildcardQuerys are also passed through the given analyzer,
but ? and * don't get removed from the search terms.

Read its warning regarding German, though.
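
To illustrate the idea of analyzing a wildcard term while keeping ? and *
in place, here is a minimal plain-Java sketch. It does not use Lucene at
all: the class WildcardTermNormalizer and its methods foldChunk and
normalizeWildcardTerm are hypothetical names made up for this example, and
the folding only mimics the lowercasing and umlaut replacement described in
the question, not the stemming done by the SnowballFilter.

```java
import java.util.Locale;

public class WildcardTermNormalizer {

    // Hypothetical stand-in for the analyzer's character handling:
    // lowercases the text and folds German umlauts and sharp s.
    // Unicode escapes are used so the source stays ASCII-only.
    public static String foldChunk(String chunk) {
        String lower = chunk.toLowerCase(Locale.GERMAN);
        StringBuilder out = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            switch (c) {
                case '\u00e4': out.append('a'); break;   // a-umlaut
                case '\u00f6': out.append('o'); break;   // o-umlaut
                case '\u00fc': out.append('u'); break;   // u-umlaut
                case '\u00df': out.append("ss"); break;  // sharp s
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    // Folds only the literal parts of a wildcard term; the wildcard
    // characters * and ? are copied through unchanged.
    public static String normalizeWildcardTerm(String term) {
        StringBuilder result = new StringBuilder();
        StringBuilder chunk = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                result.append(foldChunk(chunk.toString()));
                chunk.setLength(0);
                result.append(c);
            } else {
                chunk.append(c);
            }
        }
        result.append(foldChunk(chunk.toString()));
        return result.toString();
    }

    public static void main(String[] args) {
        // "gr\u00f6*" is the example term from the question (gr, o-umlaut, *)
        System.out.println(normalizeWildcardTerm("gr\u00f6*"));  // prints gro*
        System.out.println(normalizeWildcardTerm("Stra?e*"));    // prints stra?e*
    }
}
```

The normalized string could then be handed to the query parser in place of
the raw user input. Note that this only covers character folding: because
the index terms are also stemmed, an indexed term may be shorter than the
literal part of the pattern, and then the wildcard query still will not
match.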

Antony

