[jira] [Created] (HADOOP-16083) DistCp shouldn't always overwrite the target file when checksums match

JIRA jira@apache.org
Siyao Meng created HADOOP-16083:
-----------------------------------

             Summary: DistCp shouldn't always overwrite the target file when checksums match
                 Key: HADOOP-16083
                 URL: https://issues.apache.org/jira/browse/HADOOP-16083
             Project: Hadoop Common
          Issue Type: Improvement
          Components: tools/distcp
    Affects Versions: 3.1.1, 3.2.0, 3.3.0
            Reporter: Siyao Meng
            Assignee: Siyao Meng


{code:java|title=CopyMapper#setup}
...
    try {
      overWrite = overWrite || targetFS.getFileStatus(targetFinalPath).isFile();
    } catch (FileNotFoundException ignored) {
    }
...
{code}

The code above forces the "overWrite" flag to true whenever the target path is an existing file, regardless of what the user configured. As a result, DistCp transfers the file again even when the source and target files have the same checksum, which is unnecessary.

My suggestion is to remove the code above. If the user does want to overwrite the target, they can pass -overwrite explicitly in the options:
{code:bash|title=DistCp command with -overwrite option}
hadoop distcp -overwrite hdfs://localhost:64464/source/5/6.txt hdfs://localhost:64464/target/5/6.txt
{code}
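
To illustrate the difference, here is a minimal, self-contained sketch of the decision logic. It is a simplified model and not the actual DistCp code: the class OverwriteDecisionSketch, the FileInfo type, and all method names are hypothetical, and the checksum is modelled as a plain string rather than a Hadoop FileChecksum.

```java
import java.util.Objects;

public class OverwriteDecisionSketch {

    /** Hypothetical stand-in for the file metadata a copy task would compare. */
    static final class FileInfo {
        final long length;
        final String checksum;

        FileInfo(long length, String checksum) {
            this.length = length;
            this.checksum = checksum;
        }
    }

    /** Behaviour described in this issue: an existing target file forces overwrite. */
    static boolean effectiveOverwriteCurrent(boolean userOverwrite, boolean targetIsFile) {
        return userOverwrite || targetIsFile;
    }

    /** Proposed behaviour: only the user's -overwrite option decides. */
    static boolean effectiveOverwriteProposed(boolean userOverwrite, boolean targetIsFile) {
        return userOverwrite;
    }

    /**
     * Simplified copy decision (an assumption for illustration only).
     * A null target means the target file does not exist.
     */
    static boolean shouldCopy(FileInfo source, FileInfo target, boolean overwrite) {
        if (overwrite || target == null) {
            return true;
        }
        boolean identical = source.length == target.length
                && Objects.equals(source.checksum, target.checksum);
        return !identical;
    }

    public static void main(String[] args) {
        // Source and target have the same length and checksum; user did not pass -overwrite.
        FileInfo source = new FileInfo(1024L, "abc123");
        FileInfo target = new FileInfo(1024L, "abc123");

        boolean current = shouldCopy(source, target, effectiveOverwriteCurrent(false, true));
        boolean proposed = shouldCopy(source, target, effectiveOverwriteProposed(false, true));

        System.out.println("current behaviour copies:  " + current);
        System.out.println("proposed behaviour copies: " + proposed);
    }
}
```

In this model, the current behaviour copies the file although nothing has changed, while the proposed behaviour skips it unless the user asks for an overwrite.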


