[jira] Commented: (NUTCH-359) extraction of links will fail for whole page if one single link cannot be parsed

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633358#action_12633358 ]

Andrzej Bialecki  commented on NUTCH-359:
-----------------------------------------

Fixed as a part of another commit.

> extraction of links will fail for whole page if one single link cannot be parsed
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-359
>                 URL: https://issues.apache.org/jira/browse/NUTCH-359
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8
>         Environment: Ubuntu Dapper
>            Reporter: Renaud Richardet
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: outlink.diff
>
>
> When Nutch parses the outlinks of a fetched page, the process will fail if a single link cannot be parsed (e.g. java.net.MalformedURLException: unknown protocol). The attached patch will keep indexing the remaining links on that page even if one fails.
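
The behaviour described in the report can be illustrated with a minimal Java sketch. This is not the attached outlink.diff and does not use Nutch's own classes; the class name OutlinkSketch and the method parseOutlinks are hypothetical names chosen for illustration. It assumes only that each link is turned into a java.net.URL, and it shows the general idea of catching MalformedURLException per link so that one bad link does not discard the rest.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Nutch code and not the attached patch.
class OutlinkSketch {

    // Parses each raw link independently. A link that cannot be parsed
    // (for example, one with an unknown protocol) is logged and skipped,
    // so the remaining links on the page are still returned.
    static List<URL> parseOutlinks(List<String> rawLinks) {
        List<URL> outlinks = new ArrayList<>();
        for (String raw : rawLinks) {
            try {
                outlinks.add(new URL(raw));
            } catch (MalformedURLException e) {
                // The try/catch sits inside the loop, so the failure is
                // confined to this one link.
                System.err.println("Skipping unparseable link: " + raw
                        + " (" + e.getMessage() + ")");
            }
        }
        return outlinks;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "http://example.com/a",
                "foo://bad-protocol/x",   // triggers MalformedURLException
                "http://example.com/b");
        System.out.println(parseOutlinks(raw).size() + " of "
                + raw.size() + " links kept");
    }
}
```

If the try/catch wrapped the whole loop instead of the single URL construction, the first malformed link would abort extraction for the entire page, which is the failure mode the report describes.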

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.