cp command in webhdfs (and Filesystem Java Object)

cp command in webhdfs (and Filesystem Java Object)

Jérôme BAROTIN
Hello,

I'm writing because I spent an hour looking for a cp command in the WebHDFS API (in fact I'm using HttpFS, but I think it is the same).

This command is implemented in the "hdfs dfs" command-line client (and I'm using it), but I can't find it in the WebHDFS REST API. I thought that WebHDFS was an implementation of the FileSystem object (https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html). I checked the Java API and didn't find any cp command there either. The only Java copy method is on the FileUtil class (https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileUtil.html), and I'm not sure that it works identically to the "hdfs dfs -cp" command.

I also checked the Hadoop JIRA and found nothing: https://issues.apache.org/jira/browse/HADOOP-9417?jql=project%20%3D%20HADOOP%20AND%20(text%20~%20%22webhdfs%20copy%22%20OR%20text%20~%20%22webhdfs%20cp%22)

Is there a way to execute a cp command through a REST API?

All my best,


Jérôme

Re: cp command in webhdfs (and Filesystem Java Object)

Rohan Rajeevan

If you are interested in intra-cluster copy, maybe look at DistCp?

On Tue, Jun 28, 2016 at 9:36 AM, Jérôme BAROTIN <[hidden email]> wrote:

Re: cp command in webhdfs (and Filesystem Java Object)

Jérôme BAROTIN
I don't think that is the same:
- CREATE is for a local file: in my case, I just want to copy one HDFS path to another on the same cluster.
- DistCp is for copying files between two different clusters.

I'm using the HttpFS/WebHDFS REST API to access my cluster, and I need to execute a "cp" command. How can I do that?

Do I need to develop this service?

Jérôme

2016-06-29 8:17 GMT+02:00 Rohan Rajeevan <[hidden email]>:

Re: cp command in webhdfs (and Filesystem Java Object)

Chris Nauroth
Hello Jérôme,

WebHDFS provides an HTTP binding to the FileSystem API, which defines the primitive operations offered by the file system.  The FileSystem Shell builds on top of the FileSystem API to provide higher-level workflows, implemented using the FileSystem primitives.  In the case of "cp", copy is not a primitive operation defined by the FileSystem API.  Instead, the FileSystem Shell implements it by composing a few different FileSystem API primitives: open, create and rename.

Due to this separation, you won't find a "cp" operation directly in the WebHDFS REST API (or HTTPFS).  However, it is possible for the FileSystem shell to reference paths as URIs using the "webhdfs" scheme.  For example:

> hadoop fs -cp webhdfs://localhost:9870/hello1 webhdfs://localhost:9870/hello2

> hadoop fs -cat webhdfs://localhost:9870/hello2
hello

--Chris Nauroth
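
Editorial sketch: the composition described in the message above (open, create, rename) can be illustrated with a small in-memory model. This is not the Hadoop implementation. InMemoryFS, cp and the "._COPYING_" temporary suffix are hypothetical names chosen for illustration; the suffix is assumed to resemble what the FileSystem Shell uses and should be checked against the Hadoop source.

```python
import io


class InMemoryFS:
    """Hypothetical stand-in for a file system that exposes only primitives."""

    def __init__(self):
        self.files = {}  # path -> file contents as bytes

    def open(self, path):
        # Primitive 1: read an existing file as a byte stream.
        return io.BytesIO(self.files[path])

    def create(self, path, stream, chunk_size=4096):
        # Primitive 2: write a new file from a byte stream, chunk by chunk.
        buf = bytearray()
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            buf.extend(chunk)
        self.files[path] = bytes(buf)

    def rename(self, src, dst):
        # Primitive 3: move a file to its final name.
        self.files[dst] = self.files.pop(src)


def cp(fs, src, dst, tmp_suffix="._COPYING_"):
    """Copy src to dst using only open, create and rename."""
    tmp = dst + tmp_suffix  # assumed temporary name, see the note above
    with fs.open(src) as stream:
        fs.create(tmp, stream)  # the data lands under a temporary name first
    fs.rename(tmp, dst)  # the finished copy then appears under dst


fs = InMemoryFS()
fs.files["/hello1"] = b"hello\n"
cp(fs, "/hello1", "/hello2")
print(fs.files["/hello2"].decode(), end="")
```

Writing to a temporary name and renaming at the end means a reader of the destination path should not see a half-written file, which is the usual reason for including rename in the composition.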

From: Jérôme BAROTIN <[hidden email]>
Date: Wednesday, June 29, 2016 at 12:44 AM
To: Rohan Rajeevan <[hidden email]>

Re: cp command in webhdfs (and Filesystem Java Object)

Jérôme BAROTIN
Thanks for your response, Chris. So I understand that there is no standard implementation of cp as a REST API?

You mention that cp is a combination of "open, create and rename"; all of these methods are available through WebHDFS. Do you think we could reproduce it remotely by executing several REST calls (I mean, without transferring the data to the client side)?

Otherwise, if I want to build my own HDFS cp REST API (in Java), do you think I should use the copy method of the FileUtil class (https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileUtil.html)?

Best regards,

Jérôme
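
Editorial sketch: to make the "several REST calls" idea concrete, the following builds the sequence of WebHDFS requests that a client-side copy would issue. It only constructs URLs and contacts no cluster. OPEN, CREATE and RENAME are documented WebHDFS operations; webhdfs_url, copy_plan and the "._COPYING_" suffix are hypothetical, the host and port are borrowed from the example earlier in the thread, and authentication parameters are left out. Note that with this approach the file contents still pass through the client: OPEN returns the bytes to the caller, and CREATE expects the caller to upload them.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base, path, op, **params):
    """Build a WebHDFS REST URL of the form <base>/webhdfs/v1<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"{base}/webhdfs/v1{quote(path)}?{query}"


def copy_plan(base, src, dst, tmp_suffix="._COPYING_"):
    """Return the (HTTP method, URL) pairs for a client-side copy."""
    tmp = dst + tmp_suffix  # assumed temporary name, not taken from Hadoop docs
    return [
        # 1. Read the source; the response body is the file content.
        ("GET", webhdfs_url(base, src, "OPEN")),
        # 2. Write that content to a temporary path (the client uploads it).
        ("PUT", webhdfs_url(base, tmp, "CREATE", overwrite="false")),
        # 3. Move the temporary file to the final destination.
        ("PUT", webhdfs_url(base, tmp, "RENAME", destination=dst)),
    ]


for method, url in copy_plan("http://localhost:9870", "/hello1", "/hello2"):
    print(method, url)
```

A copy that never moves data through the client would need something running next to the cluster, for example a small service that calls the Java FileSystem API itself.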

2016-06-29 17:36 GMT+02:00 Chris Nauroth <[hidden email]>: