[jira] [Created] (NUTCH-2509) Inconsistent behavior in SitemapProcessor

JIRA jira@apache.org
Yossi Tamari created NUTCH-2509:

             Summary: Inconsistent behavior in SitemapProcessor
                 Key: NUTCH-2509
                 URL: https://issues.apache.org/jira/browse/NUTCH-2509
             Project: Nutch
          Issue Type: Bug
          Components: sitemap
    Affects Versions: 1.14
            Reporter: Yossi Tamari

There are two inconsistent behaviors in SitemapProcessor:
 # The member variable maxRedir is supposed to limit the number of redirects followed for sitemap URLs, and it is initialized from the config property sitemap.redir.max. However, it is ignored in the code because the relevant method defines a local variable with the same name, which is always set to 3.
 # When a sitemap URL is reached through a redirect, it is filtered and normalized. However, a sitemap URL that comes from a sitemapindex is not. This seems inconsistent, since in both cases the URL comes from an outside source.
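
To illustrate both points, here is a minimal, self-contained sketch. It is not the actual SitemapProcessor code: the class SitemapSketch, its method names, and the toy normalizeAndFilter helper are all hypothetical, and the "normalization" (trimming whitespace) and "filter" (accepting only http/https) are stand-ins for whatever the configured normalizers and filters do. It only shows how a local variable can shadow a configured field, and one possible way to route URLs from both sources through the same check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the real SitemapProcessor.
class SitemapSketch {
    // Stand-in for the value read from the sitemap.redir.max property.
    private final int maxRedir;

    SitemapSketch(int configuredMaxRedir) {
        this.maxRedir = configuredMaxRedir;
    }

    // Pattern described in point 1: the local variable shadows the field,
    // so the configured value never takes effect.
    int limitWithShadowing() {
        int maxRedir = 3; // shadows this.maxRedir
        return maxRedir;
    }

    // Possible fix: drop the local variable and use the field.
    int limitFromConfig() {
        return this.maxRedir;
    }

    // Toy normalize-then-filter step (assumption): trims the URL and
    // returns null unless it uses http or https.
    static String normalizeAndFilter(String url) {
        String normalized = url.trim();
        boolean accepted = normalized.startsWith("http://")
                || normalized.startsWith("https://");
        return accepted ? normalized : null;
    }

    // Point 2: URLs from redirects and URLs from a sitemapindex both come
    // from an outside source, so both go through the same helper here.
    static List<String> acceptSitemapUrls(List<String> fromRedirects,
                                          List<String> fromIndex) {
        List<String> candidates = new ArrayList<>(fromRedirects);
        candidates.addAll(fromIndex);
        List<String> accepted = new ArrayList<>();
        for (String url : candidates) {
            String checked = normalizeAndFilter(url);
            if (checked != null) {
                accepted.add(checked);
            }
        }
        return accepted;
    }
}
```

In this sketch, new SitemapSketch(10).limitWithShadowing() still returns 3, while limitFromConfig() returns the configured 10.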
