[jira] [Updated] (NUTCH-2331) REST API Fetch fails to retrieve HDFS path on distributed mode


Jorge Spinsanti (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-2331:
-----------------------------------
    Affects Version/s: 1.15

> REST API Fetch fails to retrieve HDFS path on distributed mode
> --------------------------------------------------------------
>
>                 Key: NUTCH-2331
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2331
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, REST_api
>    Affects Versions: 1.15
>            Reporter: Sujen Shah
>            Assignee: Sujen Shah
>            Priority: Major
>
> Currently, in the REST API, if the user specifies only the crawlId and not the absolute path of the segment to fetch, the fetcher finds the most recently generated segment and uses that.
> However, this functionality currently works only in local mode; see https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/fetcher/Fetcher.java#L562-L573.
> These lines need to be updated so that the fetcher can read the segments directory and list its files on an HDFS file system.
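
The "find the latest segment" step described above can be sketched independently of the file system. The following is a minimal illustration, not the actual Fetcher code: the class and method names are hypothetical, and it assumes segment directories are named with a 14-digit yyyyMMddHHmmss timestamp, so that lexicographic order matches creation order. In distributed mode the list of names would presumably come from Hadoop's FileSystem.listStatus(Path) on the segments directory rather than from local file APIs; that lookup is left out here so the sketch stays self-contained.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Nutch: picks the newest segment by name.
final class LatestSegment {
    private LatestSegment() {}

    /**
     * Returns the most recent segment name, or empty if none qualifies.
     * Assumption: segment names are 14-digit timestamps (yyyyMMddHHmmss),
     * so the lexicographically largest name is also the newest.
     */
    static Optional<String> latest(List<String> segmentNames) {
        return segmentNames.stream()
            // Skip entries that do not look like segment directories.
            .filter(name -> name.matches("\\d{14}"))
            .max(Comparator.naturalOrder());
    }
}
```

Keeping the selection logic separate from the directory listing means the same code works whether the names were read from a local directory or from HDFS.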


