[jira] Created: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class


Kenneth William Krugler (Jira)
A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class
-------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1513
                 URL: https://issues.apache.org/jira/browse/HADOOP-1513
             Project: Hadoop
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.14.0
            Reporter: Devaraj Das
            Priority: Critical
             Fix For: 0.14.0


I got this exception during a job run. The problem looks like a race condition between checking whether a directory exists and creating it. Specifically, the line
if (!dir.exists() && !dir.mkdirs())
does not seem safe when multiple processes invoke it at the same time.

2007-06-21 07:55:33,583 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2007-06-21 07:55:33,818 WARN org.apache.hadoop.fs.AllocatorPerContext: org.apache.hadoop.util.DiskChecker$DiskErrorException: can not create directory: /export/crawlspace/kryptonite/ddas/dfs/data/tmp
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:26)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:211)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:248)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:276)
        at org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:155)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.newBackupFile(DFSClient.java:1171)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1136)
        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:342)
        at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.create(DistributedFileSystem.java:145)
        at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.<init>(ChecksumFileSystem.java:368)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:254)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:675)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:165)
        at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:137)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:189)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1740)

2007-06-21 07:55:33,821 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
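
The check-then-create pattern quoted above can be replayed in a single thread to show why the second caller fails. This is only a sketch, assuming Java 8 or later: the class and method names are hypothetical, the temporary directory stands in for the real data directory, and the "other process" is simulated by a callback that runs between the exists() check and the mkdirs() call. It relies on the documented behaviour that File.mkdirs() returns false when it did not create the directory itself.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class CheckThenCreateReplay {

    // The condition from the report, with the unlucky interleaving made explicit.
    // Returns true if this caller would throw "can not create directory".
    static boolean wouldThrow(File dir, Runnable otherProcess) {
        boolean missing = !dir.exists();   // step 1: this caller sees no directory
        otherProcess.run();                // step 2: another caller creates it
        return missing && !dir.mkdirs();   // step 3: mkdirs() reports false
    }

    // Runs the replay in a fresh temporary directory (hypothetical layout).
    static boolean replay() {
        try {
            File base = Files.createTempDirectory("hadoop-1513").toFile();
            File dir = new File(base, "dfs/data/tmp");
            File sameDir = new File(base, "dfs/data/tmp");
            boolean threw = wouldThrow(dir, () -> sameDir.mkdirs());
            // The directory exists, yet the caller decided to fail.
            return threw && dir.isDirectory();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("spurious failure reproduced: " + replay());
    }
}
```

In a real run the interleaving depends on scheduling, which would explain why the exception shows up only occasionally.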

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das reassigned HADOOP-1513:
-----------------------------------

    Assignee: Devaraj Das


[jira] Updated: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1513:
--------------------------------

    Attachment: 1513.patch

The patch addresses the problem. It breaks up the expression evaluated in the if clause into two parts. There is no check done for the mkdirs() call's return value. The subsequent exists() check decides whether to throw an exception or not.


[jira] Updated: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1513:
--------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506855 ]

Hadoop QA commented on HADOOP-1513:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12360263/1513.patch applied and successfully tested against trunk revision r549284.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/316/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/316/console


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506970 ]

dhruba borthakur commented on HADOOP-1513:
------------------------------------------

I think this patch might not fix the real problem, which is the race condition between exists() and mkdirs(). See HADOOP-1502. This patch is probably changing the timing in such a way that the problem gets hidden.


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506989 ]

Devaraj Das commented on HADOOP-1513:
-------------------------------------

Dhruba,  I was looking at the 'if' clause in conjunction with the exception that is thrown (if the expression returns true).

The idea behind the if clause,
if (!dir.exists() && !dir.mkdirs()),
is to first check whether the directory exists and, if not, create it. If the creation fails, an exception is thrown.

I think breaking the if clause into two parts solves the problem in the context of its usage. If a race condition ever occurs, it will go this way: the first process creates the directory successfully, and the second process fails to create it (inside the OS kernel, directory creation is atomic). In the context of the DiskChecker.checkDir method, things still work: we throw an exception only when we don't see the directory, and we don't need to care who created it. So, yes, the reason for throwing the exception is different, but IMO it is consistent overall. There cannot be a race condition in the exists() check, since the kernel provides atomicity for the directory creation.

BTW, there are some more checks done afterwards in the method (readable/writable checks). Those will rule out permission issues to do with processes from different users competing with each other to create the dir (we will bail out if we discover the dir is not writable/readable).
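
The atomicity argument for a single mkdir() can be checked with a small experiment. The sketch below is an illustration under assumptions, not part of the patch: the class name, the thread count, and the temporary directory are hypothetical, and it uses threads in one JVM rather than separate processes. Several callers race on one File.mkdir() call, and the method reports how many of them were told they created the directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class SingleMkdirRace {

    // Races `threads` callers on one mkdir(). Returns how many saw mkdir() == true,
    // or -1 if any caller did not see the directory after its own mkdir() attempt.
    static int winners(int threads) {
        try {
            File dir = new File(Files.createTempDirectory("hadoop-1513").toFile(), "tmp");
            CountDownLatch start = new CountDownLatch(1);
            AtomicInteger created = new AtomicInteger();
            AtomicInteger missing = new AtomicInteger();
            Thread[] pool = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                pool[i] = new Thread(() -> {
                    try {
                        start.await();              // release all callers together
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    if (dir.mkdir()) created.incrementAndGet();
                    if (!dir.exists()) missing.incrementAndGet();
                });
                pool[i].start();
            }
            start.countDown();
            for (Thread t : pool) t.join();
            return missing.get() == 0 ? created.get() : -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("callers that created the directory: " + winners(8));
    }
}
```

On a local filesystem one would expect exactly one winner, with every caller seeing the directory afterwards, which is the single-component case this comment describes.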


[jira] Updated: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1513:
--------------------------------

    Status: Open  (was: Patch Available)

OK, I realized that everything I said in my last comment holds only for a single mkdir() call, but we are calling mkdirs(), which internally makes a chain of mkdir() calls, one for each component in the path. mkdirs() returns false if any of those mkdir() calls returns false. So here is a case where breaking up the expression evaluated within the 'if' statement does not solve the problem:
{noformat}
    dir.mkdirs();
    if (!dir.exists()) {
        throw new DiskErrorException("can not create directory: "
                                    + dir.toString());
    }
{noformat}

Two threads/processes (t1 and t2) enter the mkdirs() call. t1 makes the first few (successful) mkdir() calls, and then t2 gets to run. t2 immediately returns an error, since the first component in the path already exists. t2 then goes to the exists() call, which might return false because t1 might not yet have created the entire directory tree. Thus an exception is thrown, and that is not right.

We have to make the above exists() check for each component in the path whenever mkdir() for that component fails.

So we could have a custom implementation of mkdirs(), called mkdirsExists(), that returns false as soon as the following condition is true for some component:
{noformat}
   boolean mkdirsExists(String path) {
   ...........
       if (!component.mkdir( ) && !component.exists( ) ) {
          return false;
       }
  ..........
  }
{noformat}

Makes sense ?
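
One way the mkdirsExists() idea could be filled in is sketched below. It is an illustration only, not the attached patch: the class name and the temporary directory layout are hypothetical, it takes a File rather than a String, and it deviates from the outline above by accepting a component only when isDirectory() is true, so that a regular file in the way is not mistaken for success.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class RaceTolerantMkdirs {

    // Creates dir and any missing ancestors. A failed mkdir() on a component
    // counts as success when that component is already a directory, whoever made it.
    static boolean mkdirsExists(File dir) {
        File abs = dir.getAbsoluteFile();
        File parent = abs.getParentFile();
        if (parent != null && !parent.isDirectory() && !mkdirsExists(parent)) {
            return false;                        // an ancestor could not be created
        }
        return abs.mkdir() || abs.isDirectory(); // per-component mkdir-or-exists check
    }

    // Simulates the t1/t2 case: part of the chain exists before we start.
    static boolean demo() {
        try {
            File base = Files.createTempDirectory("hadoop-1513").toFile();
            File target = new File(base, "dfs/data/tmp");
            new File(base, "dfs").mkdir();            // another caller got this far
            boolean first = mkdirsExists(target);     // finishes the rest of the chain
            boolean second = mkdirsExists(target);    // everything already exists
            return first && second && target.isDirectory();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("race-tolerant creation succeeded: " + demo());
    }
}
```

Because each component is checked right after its own mkdir() attempt, a caller that loses the race on one component simply carries on with the next one instead of reporting failure.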


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507133 ]

Raghu Angadi commented on HADOOP-1513:
--------------------------------------

mkdirs() should not require that every component of the path be absent; e.g., /user/ almost always exists in the case of Hadoop. mkdirs() should behave like the 'mkdir -p' command, which creates parent directories if they don't exist. At a minimum, mkdirs() should do what {{mkdirsExists()}} does above. I haven't looked at the source code for mkdirs().


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507135 ]

dhruba borthakur commented on HADOOP-1513:
------------------------------------------

Two mkdirs() calls cannot interleave. If you look at FSNamesystem.mkdirsInternal(), it is synchronized on the global FSNamesystem lock. I believe that the scenario Devaraj explained "cannot happen".



[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507143 ]

Devaraj Das commented on HADOOP-1513:
-------------------------------------

I guess I was not clear enough in my comments. I am referring to File.mkdirs(), not FSNamesystem's equivalent of it.


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507147 ]

Devaraj Das commented on HADOOP-1513:
-------------------------------------

My bad that I mentioned "for each component". The call will return after creating the necessary parents.
However, the problem is that if it is invoked recursively for a parent that does not exist, and that invocation cannot complete successfully because another thread was just scheduled and created the same directory, it signals an error (returns false).
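
The failure mode described here can be reproduced deterministically with a small sketch. This is an illustration only, not the JDK's File.mkdirs() source: `naiveMkdir` and its `interloper` hook are hypothetical names, and the hook stands in for another thread or process being scheduled between the existence check and the create call.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class MkdirRaceSketch {
    // Hypothetical check-then-create helper. The Runnable stands in for
    // whatever a competing thread or process does between the two steps.
    static boolean naiveMkdir(File dir, Runnable interloper) {
        if (dir.exists()) {
            return true;        // already present, nothing to do
        }
        interloper.run();       // simulated context switch
        return dir.mkdir();     // false if the directory appeared meanwhile
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("mkdir-race").toFile();
        File tmp = new File(base, "tmp");

        // The competitor creates the same directory inside the window.
        boolean reported = naiveMkdir(tmp, () -> tmp.mkdir());

        System.out.println("reported success: " + reported);
        System.out.println("directory exists: " + tmp.isDirectory());
    }
}
```

In this sketch the helper reports failure even though the directory is present afterwards, because File.mkdir() returns true only when the call itself created the directory.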


[jira] Issue Comment Edited: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507147 ]

Devaraj Das edited comment on HADOOP-1513 at 6/22/07 12:06 AM:
---------------------------------------------------------------

Raghu: My bad that I mentioned "for each component". The call will return after creating the necessary parents.
However, the problem is that if it is invoked recursively for a parent that does not exist, and that invocation cannot complete successfully because another thread was just scheduled and created the same directory, it signals an error (returns false).


 was:
My bad, that I mentioned "for each component". The call will return after creating the necessary parents.
However, the problem is that if it got recursively invoked for a parent which was not existent, and that invocation is not able to complete successfully since another thread just got scheduled and created that same dir., it signals an error (returns false).


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507424 ]

Raghu Angadi commented on HADOOP-1513:
--------------------------------------

Are you saying File.mkdirs() has this problem based on its implementation, its contract, or a reproducible error? I still don't see how mkdirs()'s documentation alone implies this problem. The patch you have seems correct (except that you don't need to call exists() when mkdirs() returns true.. but mkdirs() would return false most of the time in DFSClient's case).

I think I understand the case you are describing with two threads, but why do you think mkdirs() first invokes {{exists()}} and then {{mkdir()}}?


[jira] Issue Comment Edited: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507424 ]

Raghu Angadi edited comment on HADOOP-1513 at 6/22/07 8:54 AM:
---------------------------------------------------------------

Are you saying File.mkdirs() has this problem based on its implementation, its contract, or a reproducible error? I still don't see how mkdirs()'s documentation alone implies this problem. The patch you have seems correct (except that you don't need to call exists() when mkdirs() returns true.. but mkdirs() would return false most of the time in DFSClient's case).

I think I understand the case you are describing with two threads, but why do you think mkdirs() first invokes {{exists()}} and then {{mkdir()}}?


 was:
Are saying File.mkdirs() has this problem based on its implementation or  contract or reproduceable error? I still don't see how just mkdirs()'s documentation implies this problem. The patch you have seems correct (except that you don't need to call exist() when mkdirs() returns true.. but mkdirs() would return false most of the time in this case  DFSClient).

I think I understand case  you are describing with two thread but why do you think mkdirs() first invokes {{exists()}} and then {{mkdir()}} ?


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507467 ]

Raghu Angadi commented on HADOOP-1513:
--------------------------------------

Devaraj: yes, mkdirs()'s implementation seems to be making the same mistake we did (before your patch). You should probably report the bug to Sun/Java.


[jira] Updated: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)
In reply to this post by Kenneth William Krugler (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1513:
--------------------------------

    Attachment: 1513.patch

This patch defines a new method that will ignore the return value of mkdir() if it is false. It will invoke exists() and if that returns true, things are assumed to be fine.
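
A minimal sketch of the approach described above, assuming only the behaviour stated in this comment. It is not the contents of 1513.patch: the class name, `mkdirsToleratingRace`, and this `checkDir` are hypothetical names chosen for illustration. The sketch follows the comment in using exists(); a stricter variant could use isDirectory() instead.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TolerantMkdirs {
    // Hypothetical helper: create dir and any missing parents, treating
    // "mkdir() returned false but the path exists" as success.
    static boolean mkdirsToleratingRace(File dir) {
        if (dir.mkdir() || dir.exists()) {
            return true;
        }
        File parent = dir.getAbsoluteFile().getParentFile();
        if (parent == null || !mkdirsToleratingRace(parent)) {
            return false;
        }
        // The parent is now present; retry, again tolerating a lost race.
        return dir.mkdir() || dir.exists();
    }

    // Hypothetical caller: fail only if the directory is really missing.
    static void checkDir(File dir) throws IOException {
        if (!mkdirsToleratingRace(dir)) {
            throw new IOException("can not create directory: " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("tolerant").toFile();
        File nested = new File(base, "a/b/c");
        checkDir(nested);   // creates a, a/b and a/b/c
        checkDir(nested);   // already present: still succeeds
        System.out.println("directory exists: " + nested.isDirectory());
    }
}
```

Because `||` short-circuits, exists() is only consulted when mkdir() returns false, so the common case of a successful create costs a single file-system call.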


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508236 ]

Raghu Angadi commented on HADOOP-1513:
--------------------------------------

+1. You don't always need the exists() check, i.e. we can have 'if (!mkdirsWithExistsCheck() && !exists()) { throw ioe }'.


[jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508243 ]

Raghu Angadi commented on HADOOP-1513:
--------------------------------------

Actually, it should just be 'if (!mkdirsWithExistsCheck()) { throw ... }'. Also, could you explicitly note in the comment that this implementation is a modification of Sun's File.mkdirs() implementation?
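
To illustrate why the single check is enough, here is a small sketch under stated assumptions. The names CheckDirSketch, LostRaceDir, checkDirOld and checkDirNew are hypothetical, the helper is a simplified stand-in rather than the one in the patch, and IOException stands in for DiskErrorException. LostRaceDir simulates losing the race deterministically: the directory appears during the mkdirs() call, which therefore reports false.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration; not code from the patch.
class CheckDirSketch {

    /** Simulates a competing process creating the directory first. */
    static class LostRaceDir extends File {
        LostRaceDir(File parent, String child) {
            super(parent, child);
        }

        @Override
        public boolean mkdirs() {
            super.mkdirs();  // stands in for the competing process
            return false;    // what the loser of the race observes
        }
    }

    /** Simplified stand-in for the helper discussed in this thread. */
    static boolean mkdirsWithExistsCheck(File dir) {
        return dir.mkdirs() || dir.exists();
    }

    /** The check quoted in the issue description. */
    static void checkDirOld(File dir) throws IOException {
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IOException("can not create directory: " + dir);
        }
    }

    /** The simplified call site: one check, no separate exists() call. */
    static void checkDirNew(File dir) throws IOException {
        if (!mkdirsWithExistsCheck(dir)) {
            throw new IOException("can not create directory: " + dir);
        }
    }
}
```

With a LostRaceDir argument, checkDirOld throws even though the directory exists once the call returns, while checkDirNew accepts it because the exists() test runs after the creation attempt rather than before it.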


[jira] Updated: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class

Kenneth William Krugler (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1513:
--------------------------------

    Attachment: 1513.patch

Yes, I also realized that the exists() check was redundant. I removed it and added a comment noting that mkdirsWithExistsCheck is semantically different from Sun's java.io.File.mkdirs().
