Nutch OutPut in which UTF format

Nutch OutPut in which UTF format

Saurabh Suman
Hi,

I understand that Nutch converts text to Unicode for all subsequent processing. As far as I know, though, Unicode can be represented in several UTF encodings.
(1) Which UTF encoding does Nutch use for its output?
(2) Can we change the UTF encoding of the text? If yes, how?

Re: Nutch OutPut in which UTF format

Doğacan Güney-3
On Mon, Jul 13, 2009 at 11:06, Saurabh Suman<[hidden email]> wrote:
>
> Hi,
>
> I understand that Nutch converts text to Unicode for all subsequent
> processing. As far as I know, though, Unicode can be represented in
> several UTF encodings.
> (1) Which UTF encoding does Nutch use for its output?
> (2) Can we change the UTF encoding of the text? If yes, how?
>

We normally use Text.{read,write}String to read and write Strings.
These methods always use UTF-8. Text is a Hadoop class; you may try
examining Text's source code, but I don't think you can change it to
anything other than UTF-8.
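
If you need a different encoding downstream, one option is to convert after reading rather than changing what Text writes. Below is a minimal sketch using only the JDK, not Nutch or Hadoop code; the class and method names (ReencodeExample, reencode) are made up for illustration. It assumes you already have the UTF-8 bytes, or a String obtained from them, in hand.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch or Hadoop.
class ReencodeExample {

    // Decode UTF-8 bytes into a Java String, then encode that String
    // in whatever target charset the consumer expects.
    static byte[] reencode(byte[] utf8Bytes, Charset target) {
        String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
        return decoded.getBytes(target);
    }

    public static void main(String[] args) {
        // "\u011F" is the letter g with breve; it takes two bytes in UTF-8.
        byte[] utf8 = "Do\u011Facan".getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = reencode(utf8, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length + " bytes as UTF-8, "
                + utf16.length + " bytes as UTF-16BE");
    }
}
```

The stored data stays UTF-8 this way; only the copy you hand to the other system is re-encoded.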

--
Doğacan Güney