Normalizing multiple Chars with MappingCharFilter possible?


Andreas Kahl
Hello everyone,

Is it possible to normalize strings like '`e' (2 chars) => 'e' (in contrast to 'é' (1 char) => 'e') with org.apache.lucene.analysis.MappingCharFilter?

I am asking because I am considering indexing multilingual, multi-alphabet data with Solr, and that data uses such strings as substitutes for 'real' Unicode characters.

Thanks for your advice.

Andreas

Re: Normalizing multiple Chars with MappingCharFilter possible?

Koji Sekiguchi
Yes, it should work. MappingCharFilter supports all of these mappings:

* char-to-char
* string-to-char
* char-to-string
* string-to-string

It does so without misaligning the original offsets, so the highlighter
works correctly with MappingCharFilter.
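
A minimal sketch of how the two mappings from the question might be configured in Solr. The field type name `text_normalized` and the file name `mapping-diacritics.txt` are hypothetical, and the tokenizer is an arbitrary choice for illustration; only `solr.MappingCharFilterFactory` and its `mapping` attribute come from Solr itself. The mapping file holds one rule per line in the form "source" => "target", shown here inside a comment:

```xml
<!-- conf/mapping-diacritics.txt (hypothetical file name), one rule per line:

       "`e"     => "e"
       "\u00E9" => "e"

     First rule: string-to-char (a grave accent followed by e, 2 chars).
     Second rule: char-to-char (the precomposed é, 1 char). -->
<fieldType name="text_normalized" class="solr.TextField">
  <analyzer>
    <!-- Char filters run on the raw character stream, before the tokenizer. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-diacritics.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With rules like these, both '`e' and 'é' should reach the tokenizer as 'e', while the token offsets still refer to positions in the original text.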

Koji

--
http://www.rondhuit.com/en/

Re: Normalizing multiple Chars with MappingCharFilter possible?

Andreas Kahl


On 24.11.09 12:30, Koji Sekiguchi wrote:

> Yes. It should work.
> MappingCharFilter supports:
>
> * char-to-char
> * string-to-char
> * char-to-string
> * string-to-string
>
> without misalignment of original offsets (i.e. highlighter works
> correctly with MappingCharFilters).
>
> Koji
>
Thanks Koji. That was all I needed to know.

Andreas

