I am using the Nutch 0.9 parsing framework on its own. I create a Content with a
contentType text/plain; charset="windows-1251". However, Content does not
preserve the charset part of the content type, so when the TextParser looks up
the character encoding it always gets null, because the contentType no longer
contains the charset string.
I see from the trunk that all this has changed quite a lot and I read about the
changes, but I'm not sure if I'm doing something wrong or if it ever worked.
Can anyone confirm whether this is a known problem and whether there is a simple
known solution? I could simply store the full contentType and add a new method
to get it, which would then be used in TextParser, but is there a more elegant
solution?
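
To make the workaround concrete, here is a minimal standalone sketch of what I have in mind: keep the full content type string and pull the charset parameter out of it when needed. The class ContentTypeCharset and the method charsetOf are hypothetical names of my own, not part of Nutch, and this only handles the simple "name=value" parameter form.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: extracts the charset parameter
// from a full content type string such as
//   text/plain; charset="windows-1251"
public class ContentTypeCharset {

    /** Returns the charset parameter value, or null if none is present. */
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        // The first segment is the media type itself; parameters follow.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value pair
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ENGLISH);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.length() == 0 ? null : value;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=\"windows-1251\""));
    }
}
```

Run as is, this prints windows-1251 for my content type, and charsetOf returns null when there is no charset parameter. The new getter on Content would then just return the stored full string for the parser to pass through something like this.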