Problem with trunk HtmlParser.java


Problem with trunk HtmlParser.java

Ned Rockson
I tried to compile the trunk (version 579849) and it complained about
HtmlParser.  Basically, the 4th argument to the String constructor on
line 84 should have been a string, not a Charset.  Anyway, I made the
change but I can't check it back in so here is the diff:

Index: src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
===================================================================
--- src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java (revision 579846)
+++ src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java (working copy)
@@ -81,7 +81,12 @@
    // to just inflate each byte to a 16-bit value by padding.
    // For instance, the sequence {0x41, 0x82, 0xb7} will be turned into
    // {U+0041, U+0082, U+00B7}.
-    String str = new String(content, 0, length, Charset.forName("ASCII"));
+    String str = "";
+    try {
+       str = new String(content, 0, length, Charset.forName("ASCII").toString());
+    } catch (UnsupportedEncodingException e) {
+       e.printStackTrace();
+    }

    Matcher metaMatcher = metaPattern.matcher(str);
    String encoding = null;


Thanks,
Ned

Re: Problem with trunk HtmlParser.java

Doğacan Güney-3
Hi,

On 9/27/07, Ned Rockson <[hidden email]> wrote:
> I tried to compile the trunk (version 579849) and it complained about
> HtmlParser.  Basically, the 4th argument to the String constructor on
> line 84 should have been a string, not a Charset.  Anyway, I made the
> change but I can't check it back in so here is the diff:

I have done it yet again... I am using Java 6 in my development
environment (for various reasons) and even though I configured Eclipse
to be Java 5 compatible, Eclipse misses these (since this is not a
syntax change but a new method). Anyway, thanks for noticing and
sending a patch. Your patch is now committed (rev. 579922) and I will
try to be more careful not to make more commits that break Nutch for
Java 5 users.

>
> Index: src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
> ===================================================================
> --- src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java (revision 579846)
> +++ src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java (working copy)
> @@ -81,7 +81,12 @@
>     // to just inflate each byte to a 16-bit value by padding.
>     // For instance, the sequence {0x41, 0x82, 0xb7} will be turned into
>     // {U+0041, U+0082, U+00B7}.
> -    String str = new String(content, 0, length, Charset.forName("ASCII"));
> +    String str = "";
> +    try {
> +       str = new String(content, 0, length, Charset.forName("ASCII").toString());
> +    } catch (UnsupportedEncodingException e) {
> +       e.printStackTrace();
> +    }
>
>     Matcher metaMatcher = metaPattern.matcher(str);
>     String encoding = null;
>
>
> Thanks,
> Ned
>


--
Doğacan Güney

Re: Problem with trunk HtmlParser.java

Sami Siren-2
Doğacan Güney wrote:

> I have done it yet again... I am using Java 6 in my development
> environment (for various reasons) and even though I configured Eclipse
> to be Java 5 compatible, Eclipse misses these (since this is not a
> syntax change but a new method).

You can set the project's JRE System Library to version 1.5 (from a 1.5 JRE).
That way you'll notice when you try to code against anything other than
the Java 5 API.

--
 Sami Siren
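
For anyone reading this thread later, here is a minimal sketch of the incompatibility discussed above. It assumes only what the thread states: the String(byte[], int, int, Charset) constructor is a Java 6 addition, while Java 5 offers the overload that takes a charset name and throws UnsupportedEncodingException. The class AsciiDecodeSketch and the helper decodeAscii are hypothetical names for illustration, not part of Nutch or of the committed patch.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical illustration class; not part of Nutch.
class AsciiDecodeSketch {

    // Decodes the first `length` bytes of `content` as US-ASCII using only
    // API that is available on Java 5.
    static String decodeAscii(byte[] content, int length) {
        try {
            // The (byte[], int, int, String) overload takes a charset *name*.
            // The (byte[], int, int, Charset) overload compiles only on Java 6+.
            return new String(content, 0, length, "US-ASCII");
        } catch (UnsupportedEncodingException e) {
            // US-ASCII is a charset every Java platform is required to
            // support, so this branch is not expected to run.
            throw new IllegalStateException("US-ASCII not supported", e);
        }
    }

    public static void main(String[] args) {
        byte[] content = {0x41, 0x42, 0x43};
        System.out.println(decodeAscii(content, content.length)); // prints "ABC"
    }
}
```

Passing the charset name directly avoids the detour through Charset.forName(...).toString() in the patch; whether that is preferable for Nutch is a judgment call for the maintainers.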