[jira] [Resolved] (NUTCH-2398) Fetcher saving redirected robots.txt under redirect target URL



JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2398.
------------------------------------
    Resolution: Fixed
      Assignee: Sebastian Nagel

Committed to 1.x/master ([2dc7472|https://github.com/apache/nutch/commit/2dc7472499a8518adbd72cec53617fad842665a5]).

> Fetcher saving redirected robots.txt under redirect target URL
> --------------------------------------------------------------
>
>                 Key: NUTCH-2398
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2398
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.13
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.14
>
>
> NUTCH-2300 lets the Fetcher optionally store the robots.txt response (content and HTTP status). If '.../robots.txt' is redirected, the content of the redirect target is also stored, but under the redirect source URL as key. It should be stored under the redirect target URL instead. Otherwise, one of the two responses is overwritten in the segment's map file.
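A minimal Java sketch of the keying problem described above. It does not use Nutch's Fetcher or segment classes: `RobotsRedirectKeying`, `Response`, `storeBySourceUrl` and `storeByFetchedUrl` are hypothetical names, and a `TreeMap` merely stands in for the sorted, URL-keyed segment map file. It contrasts keying every response of a redirect chain by the original robots.txt URL (the reported behavior) with keying each response by the URL it was actually fetched from (the behavior the issue asks for).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model (not Nutch code) of storing robots.txt responses keyed by URL. */
public class RobotsRedirectKeying {

    /** A fetched response: the URL actually fetched, its HTTP status and its body. */
    static final class Response {
        final String url;
        final int status;
        final String content;

        Response(String url, int status, String content) {
            this.url = url;
            this.status = status;
            this.content = content;
        }
    }

    /** Reported behavior: every response in the chain is keyed by the redirect source URL. */
    static Map<String, Response> storeBySourceUrl(String sourceUrl, Response... chain) {
        Map<String, Response> segment = new TreeMap<>();
        for (Response r : chain) {
            // The redirect target's content reuses the source key and replaces the 3xx entry.
            segment.put(sourceUrl, r);
        }
        return segment;
    }

    /** Expected behavior: each response is keyed by the URL it was fetched from. */
    static Map<String, Response> storeByFetchedUrl(Response... chain) {
        Map<String, Response> segment = new TreeMap<>();
        for (Response r : chain) {
            segment.put(r.url, r);
        }
        return segment;
    }

    public static void main(String[] args) {
        Response redirect = new Response("http://example.com/robots.txt", 301, "");
        Response target = new Response("https://example.com/robots.txt", 200,
                "User-agent: *\nDisallow: /private/");
        // Source-URL keying keeps only one entry; the 301 response is lost.
        System.out.println(storeBySourceUrl(redirect.url, redirect, target).size()); // 1
        // Target-URL keying keeps both the redirect and the robots.txt content.
        System.out.println(storeByFetchedUrl(redirect, target).size()); // 2
    }
}
```

With source-URL keying, the second `put` silently replaces the first, which is the overwrite the issue describes; keying by the fetched URL keeps both the redirect status and the target's robots.txt content retrievable.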



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)