[jira] Created: (NUTCH-487) Neko HTML parser goes on default settings.

Clark Perkins (Jira)
Neko HTML parser goes on default settings.
------------------------------------------

                 Key: NUTCH-487
                 URL: https://issues.apache.org/jira/browse/NUTCH-487
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.9.0
         Environment: Linux, Java 1.5.0.
            Reporter: Marcin Okraszewski
         Attachments: neko_setup.patch

The Neko HTML parser setup is done inside a silent try/catch block (Nutch 0.9: HtmlParser.java:248-259). The problem is that the first feature being set throws an exception, so the whole setup block is skipped and the parser runs with its default settings. The catch block does nothing, which is probably why nobody noticed this.

I have attached a patch that fixes this. It was made against Nutch 0.9, but SVN trunk contains the same code.

The patch does the following:
1. Fixes the augmentations feature.
2. Removes the include-comments feature, because I couldn't find anything similar at http://people.apache.org/~andyc/neko/doc/html/settings.html
3. Prints a warning message when an exception is caught.
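
To illustrate the failure mode, here is a minimal sketch. It is not the code from neko_setup.patch or from HtmlParser.java. It uses the JDK's own SAX XMLReader instead of Neko so that it runs without extra jars. The class name, the helper methods, and the bogus feature URI are all made up for the example. The first method mirrors the silent pattern: one try block around every setFeature call with an empty catch, so a single unrecognized feature skips everything after it. The second method records a warning per failure and still applies the remaining features.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class FeatureSetupSketch {

    // Hypothetical feature URI that no parser recognizes; it stands in for
    // the feature that fails in the report.
    static final String BOGUS = "http://example.invalid/features/no-such-feature";
    // Standard SAX2 feature that every conforming XMLReader recognizes.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Creates the JDK's default SAX reader, wrapping checked exceptions. */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    /** Silent pattern: one try block around all calls, empty catch. */
    static List<String> configureSilently(XMLReader reader, String... features) {
        List<String> applied = new ArrayList<>();
        try {
            for (String feature : features) {
                reader.setFeature(feature, true); // first failure aborts the loop
                applied.add(feature);
            }
        } catch (SAXException e) {
            // Swallowed: the caller cannot tell that the setup was skipped.
        }
        return applied;
    }

    /** Reporting pattern: each failure is recorded, later features still apply. */
    static List<String> configureWithWarnings(XMLReader reader, List<String> warnings,
                                              String... features) {
        List<String> applied = new ArrayList<>();
        for (String feature : features) {
            try {
                reader.setFeature(feature, true);
                applied.add(feature);
            } catch (SAXException e) {
                warnings.add("Could not set " + feature + ": " + e);
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        System.out.println("silent:  " + configureSilently(newReader(), BOGUS, NAMESPACES));
        List<String> warnings = new ArrayList<>();
        System.out.println("warning: "
                + configureWithWarnings(newReader(), warnings, BOGUS, NAMESPACES));
        // In Nutch these would go to the logger rather than stderr.
        warnings.forEach(System.err::println);
    }
}
```

Wrapping each call separately is a design choice of this sketch; simply logging inside a single catch block would also make the problem visible, but the features after the failing one would still be skipped.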

Please note that a lot of messages now go to the console (not the log4j log), because the "report-errors" feature is being set. Shouldn't it be removed?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (NUTCH-487) Neko HTML parser goes on default settings.

Clark Perkins (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-487:
-------------------------------------

    Attachment: neko_setup.patch

Patch for Nutch 0.9, which fixes the problem.

[jira] Closed: (NUTCH-487) Neko HTML parser goes on default settings.

Clark Perkins (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney closed NUTCH-487.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0

Fixed as part of NUTCH-25. (Note: NUTCH-25's patch uses code from this patch)

[jira] Commented: (NUTCH-487) Neko HTML parser goes on default settings.

Clark Perkins (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530797 ]

Hudson commented on NUTCH-487:
------------------------------

Integrated in Nutch-Nightly #219 (See [http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/219/])
