[jira] [Updated] (NUTCH-2509) Inconsistent behavior in SitemapProcessor


JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yossi Tamari updated NUTCH-2509:
--------------------------------
    Attachment: SitemapProcessor.patch

> Inconsistent behavior in SitemapProcessor
> -----------------------------------------
>
>                 Key: NUTCH-2509
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2509
>             Project: Nutch
>          Issue Type: Bug
>          Components: sitemap
>    Affects Versions: 1.14
>            Reporter: Yossi Tamari
>            Priority: Minor
>         Attachments: SitemapProcessor.patch
>
>
> There are two inconsistent behaviors in SitemapProcessor:
>  # The member variable maxRedir is supposed to limit the number of redirections on sitemap URLs and is initialized from the config property sitemap.redir.max. However, it is ignored in the code, because the relevant method defines a local variable with the same name that is always set to 3.
>  # When a sitemap URL is reached through a redirect, it is filtered and normalized. A sitemap URL that comes from a sitemapindex is not. This seems inconsistent, since in both cases the URL comes from an outside source.
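The first point is a case of a local variable shadowing a field. The following is a minimal, self-contained sketch of that pattern, not the actual SitemapProcessor code: the class ShadowingSketch, its methods, and the use of a plain Map in place of the Nutch configuration object are all assumptions made for illustration. Only the names maxRedir and sitemap.redir.max come from the issue text.

```java
import java.util.Map;

// Hypothetical stand-in for illustration; not the real SitemapProcessor.
public class ShadowingSketch {

    // Field meant to hold the configured redirect limit.
    private final int maxRedir;

    // A plain Map stands in for the real configuration object (assumption).
    public ShadowingSketch(Map<String, Integer> conf) {
        this.maxRedir = conf.getOrDefault("sitemap.redir.max", 3);
    }

    // Shape of the reported bug: the local variable shadows the field,
    // so the configured value is never consulted.
    public int effectiveLimitShadowed() {
        int maxRedir = 3; // shadows this.maxRedir
        return maxRedir;
    }

    // Shape of a possible fix: drop the local and read the field.
    public int effectiveLimitFixed() {
        return maxRedir;
    }

    public static void main(String[] args) {
        ShadowingSketch s = new ShadowingSketch(Map.of("sitemap.redir.max", 5));
        System.out.println(s.effectiveLimitShadowed()); // prints 3
        System.out.println(s.effectiveLimitFixed());    // prints 5
    }
}
```

With the limit configured as 5, the shadowed variant still reports 3, which matches the behavior described in the issue. Whether the attached patch takes this exact approach is not stated in this message.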



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)