RE: Bayan Group Extractor plugin for Nutch-Spanish Accent Character Issue
I don't have any experience with this specific plugin, but I have run into similar problems, which had two possible causes:
1. The site may not declare its encoding properly (neither in the HTTP `Content-Type` header nor in a `<meta>` tag), so the browser guesses the correct encoding while the crawler falls back to a different default.
2. You may have run across https://issues.apache.org/jira/browse/NUTCH-1807. I solved a similar problem by setting the environment variable LC_ALL to en_US.UTF-8 for all Hadoop processes (more specifically, adding `export LC_ALL=en_US.UTF-8` in ~hadoop/.bashrc on all Hadoop machines solved the problem for me).
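Both causes come down to the page's bytes being decoded with the wrong charset. The Java sketch below uses hypothetical names (`EncodingCheck`, `detect`, `misdecode`, `repair`) and is not Nutch's or the plugin's actual code. It shows three things. First, how `InfecciÃ³n` arises when UTF-8 bytes are decoded as ISO-8859-1. Second, how a crawler might pick a charset from the `Content-Type` header, then from a `<meta>` tag, and only then from a fallback. Third, how to print the JVM's default charset. JVMs before Java 18 derive that default from the locale, which is why `LC_ALL` matters. From Java 18 on, the default is UTF-8 regardless of locale (JEP 400).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not Nutch's or the Bayan Group plugin's code.
public class EncodingCheck {
    // Matches charset=... in a Content-Type header or in a <meta> tag (legal charset-name characters only).
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9._:+-]*)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in text, or null if none is declared or it is unsupported. */
    static Charset declared(String text) {
        if (text == null) return null;
        Matcher m = CHARSET.matcher(text);
        if (!m.find()) return null;
        String name = m.group(1);
        return Charset.isSupported(name) ? Charset.forName(name) : null;
    }

    /** Tries the header first, then a <meta> tag in the first 1024 bytes, then the fallback. */
    static Charset detect(String contentTypeHeader, byte[] body, Charset fallback) {
        Charset cs = declared(contentTypeHeader);
        if (cs != null) return cs;
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives for ASCII-compatible encodings.
        String head = new String(body, 0, Math.min(body.length, 1024), StandardCharsets.ISO_8859_1);
        cs = declared(head);
        return cs != null ? cs : fallback;
    }

    /** Reproduces the bug: UTF-8 bytes decoded as ISO-8859-1. */
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Undoes misdecode by recovering the original bytes and decoding them as UTF-8. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Before Java 18 this follows the locale (e.g. LC_ALL); from Java 18 it defaults to UTF-8.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String word = "Infecci\u00f3n";                   // the correct Spanish word
        String garbled = misdecode(word);                 // same characters as the wrong output in the report
        System.out.println(garbled.equals("Infecci\u00c3\u00b3n")); // true
        System.out.println(repair(garbled).equals(word));            // true

        // No charset in the header, so the <meta> declaration is used instead of the fallback.
        byte[] page = "<html><head><meta charset=\"utf-8\"></head>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect("text/html", page, StandardCharsets.ISO_8859_1)); // UTF-8
    }
}
```

On an older JVM, running this under the same user and environment as the Hadoop processes is a quick way to check whether the `LC_ALL` setting is actually being picked up.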
> -----Original Message-----
> From: Rushi [mailto:[hidden email]]
> Sent: 25 January 2018 16:32
> To: [hidden email]; Mark Vega <[hidden email]>
> Subject: Bayan Group Extractor plugin for Nutch-Spanish Accent Character Issue
> Hello Everyone,
> I am having an issue while crawling a Spanish website: some of the accented
> characters are not converted properly.
> Here is an example: InfecciÃ³n (wrong) should be Infección (correct).
> Note: this is with the *Bayan Group Extractor plugin.* Is there any change I
> need to make so that these characters are converted correctly?
> Rushikesh M
> .Net Developer