Re: Lucene Arabic Internationalization Question


Nader Henein
Dear Rasha,

Sorry for the delay. I've indexed Arabic and English side by side in
Lucene without trouble; the only thing you have to watch out for is
stemming. As for indexing PDFs, I have not used that part of the API,
but in my experience this comes down to using, or in some cases forcing,
the correct encoding. Debug this by reducing your setup to the lowest
common denominator: for example, if you're doing this from a web
service, try it first from the command prompt, so that you only have to
contend with the OS encoding (UTF-8 is highly recommended) and not the
browser / server encodings.
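
To make the "force the correct encoding" point concrete, here is a
minimal sketch you can run from the prompt. It is plain Java and does
not touch Lucene or any PDF library; the class name EncodingCheck and
its two helper methods are made up for illustration, and the sample word
is only an assumption about the kind of text in your PDFs. It decodes
the same bytes once as UTF-8 and once as ISO-8859-1, and prints code
points so the result is readable even if the console cannot display
Persian script.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Lucene or any PDF library.
class EncodingCheck {

    // Decode raw bytes with an explicitly chosen charset instead of the platform default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Render each code point as U+XXXX so the text can be inspected on any console.
    static String toCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Persian word for "book", written with escapes so the source file encoding does not matter.
        String word = "\u06A9\u062A\u0627\u0628";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        // What the JVM picked up from the OS; this is the only encoding in play at the prompt.
        System.out.println("Platform default charset: " + Charset.defaultCharset());
        // Correct decoding yields the original four code points.
        System.out.println("Decoded as UTF-8:      " + toCodePoints(decode(utf8, StandardCharsets.UTF_8)));
        // Wrong decoding yields one Latin-1 character per byte, i.e. unusable tokens.
        System.out.println("Decoded as ISO-8859-1: " + toCodePoints(decode(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

The UTF-8 line shows the four original code points, while the
ISO-8859-1 line shows eight unrelated ones. If the text extracted from
your PDFs looks like the second case before it ever reaches the index,
the problem is in the decoding step rather than in Lucene.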

A more detailed example of the problem you're facing would help me
understand it better.

Nader

Rasha wrote:

>Dear Nader,
>
>I have a big problem indexing PDFs that contain Persian words.
>
>lucenePDFIndexer cannot index them, and the words it indexes from the PDFs are unusable.
>
>
>Is there a way to index them correctly?
>
>
>regards,
>rasha malek

--

Nader S. Henein
Senior Applications Architect

Bayt.com




