[jira] Created: (HADOOP-931) Make writes to S3FileSystem world visible only on completion

JIRA jira@apache.org
Make writes to S3FileSystem world visible only on completion
------------------------------------------------------------

                 Key: HADOOP-931
                 URL: https://issues.apache.org/jira/browse/HADOOP-931
             Project: Hadoop
          Issue Type: Bug
          Components: fs
            Reporter: Tom White


Currently, files written to S3 are visible to other processes as soon as the first block has been written. This differs from DFS, which makes a file world-visible only after the stream writing to it has been closed (see FSNamesystem.completeFile).

We could implement this by having a piece of inode metadata that indicates the visibility of the file.
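
A minimal sketch of the inode-flag idea, assuming an in-memory metadata store. The names `SketchInode`, `SketchMetadataStore`, `complete`, `isVisible`, and `listVisible` are hypothetical and are not taken from the S3FileSystem code; the sketch only illustrates hiding a file until its writer closes the stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical inode: the block list plus a flag saying whether the writer has closed the stream.
final class SketchInode {
    final List<Long> blockIds = new ArrayList<Long>();
    boolean complete = false;
}

// Hypothetical in-memory stand-in for the metadata store; not the real S3FileSystem code.
final class SketchMetadataStore {
    private final Map<String, SketchInode> inodes = new TreeMap<String, SketchInode>();

    // Registers the file when writing starts; it stays hidden until complete() is called.
    synchronized void create(String path) {
        inodes.put(path, new SketchInode());
    }

    // Records a written block; only allowed while the file is still open for writing.
    synchronized void addBlock(String path, long blockId) {
        SketchInode inode = inodes.get(path);
        if (inode == null || inode.complete) {
            throw new IllegalStateException("Not open for writing: " + path);
        }
        inode.blockIds.add(blockId);
    }

    // Would be called when the output stream is closed: flips the visibility flag.
    synchronized void complete(String path) {
        SketchInode inode = inodes.get(path);
        if (inode == null) {
            throw new IllegalStateException("No such file: " + path);
        }
        inode.complete = true;
    }

    // A file is visible to readers only once its inode is marked complete.
    synchronized boolean isVisible(String path) {
        SketchInode inode = inodes.get(path);
        return inode != null && inode.complete;
    }

    // Lists only the paths whose inodes are marked complete.
    synchronized List<String> listVisible() {
        List<String> visible = new ArrayList<String>();
        for (Map.Entry<String, SketchInode> e : inodes.entrySet()) {
            if (e.getValue().complete) {
                visible.add(e.getKey());
            }
        }
        return visible;
    }
}
```

In this sketch, a reader that checks `isVisible` or `listVisible` between `create` and `complete` does not see the file, even though blocks have already been recorded for it.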

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-931) Make writes to S3FileSystem world visible only on completion

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467511 ]

Doug Cutting commented on HADOOP-931:
-------------------------------------

Hmm.  Seeing partial files could be considered a feature.  LocalFileSystem also makes partial files visible, no?  Is this breaking something?  If not, I'd leave things as-is.

[jira] Commented: (HADOOP-931) Make writes to S3FileSystem world visible only on completion

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467519 ]

Tom White commented on HADOOP-931:
----------------------------------

No, it's not breaking anything that I know of. I wanted to make S3FileSystem consistent with DFS, but as you rightly point out LocalFileSystem makes partial files visible.

It would be nice to improve the documentation of FileSystem to make it clearer what the contract permits; this could be combined with creating a set of common unit tests for the different implementations. However, this feels like a longer-term goal, so I won't pursue it further at the moment.

I'll close this issue.

[jira] Commented: (HADOOP-931) Make writes to S3FileSystem world visible only on completion

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467521 ]

Milind Bhandarkar commented on HADOOP-931:
------------------------------------------

I would prefer the DFS doing the right thing, i.e. listing the file being created in listPaths, but not allowing it to be opened for reading while it is being written.
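
A minimal sketch of that behaviour, assuming a simple in-memory namespace. `SketchNamespace`, `beginWrite`, `finishWrite`, and `openForRead` are hypothetical names, and the `listPaths` method here only borrows the name mentioned above; none of this is the actual DFS code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical namespace: every file is listed, but a file that is still being written cannot be opened.
final class SketchNamespace {
    // path -> TRUE while a writer still holds the file open, FALSE once it is closed
    private final Map<String, Boolean> beingWritten = new LinkedHashMap<String, Boolean>();

    synchronized void beginWrite(String path) {
        beingWritten.put(path, Boolean.TRUE);
    }

    synchronized void finishWrite(String path) {
        if (!beingWritten.containsKey(path)) {
            throw new IllegalStateException("No such file: " + path);
        }
        beingWritten.put(path, Boolean.FALSE);
    }

    // Listing includes files under construction.
    synchronized List<String> listPaths() {
        return new ArrayList<String>(beingWritten.keySet());
    }

    // Opening for read is refused until the writer has finished.
    synchronized void openForRead(String path) {
        Boolean writing = beingWritten.get(path);
        if (writing == null) {
            throw new IllegalStateException("No such file: " + path);
        }
        if (writing.booleanValue()) {
            throw new IllegalStateException("File is still being written: " + path);
        }
        // A real implementation would return an input stream here.
    }
}
```

The design choice is to separate existence from readability: a directory listing stays truthful about what is being created, while readers are kept away from incomplete data.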

[jira] Commented: (HADOOP-931) Make writes to S3FileSystem world visible only on completion

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467527 ]

Andrzej Bialecki commented on HADOOP-931:
-----------------------------------------

While we're at it, Nutch users have often requested that DFS automatically close a partial file if the process writing it exits abruptly. Currently partial files are deleted, even in cases where they would still be usable.
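
A minimal sketch of the two behaviours, assuming a hypothetical `SketchWriterRecovery` class that is told when a writer has been lost. The `keepPartial` flag and all method names are illustrative assumptions, not DFS APIs; how DFS actually detects a dead writer is outside this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping for files whose writer may disappear before closing them.
final class SketchWriterRecovery {
    private final Map<String, List<Long>> pending = new HashMap<String, List<Long>>();
    private final Map<String, List<Long>> closed = new HashMap<String, List<Long>>();

    synchronized void startFile(String path) {
        pending.put(path, new ArrayList<Long>());
    }

    synchronized void addBlock(String path, long blockId) {
        List<Long> blocks = pending.get(path);
        if (blocks == null) {
            throw new IllegalStateException("Not open for writing: " + path);
        }
        blocks.add(blockId);
    }

    // Called when the writing process is known to have exited without closing the file.
    // keepPartial == false: delete the partial file (the behaviour described as current).
    // keepPartial == true:  close the file with whatever blocks were written (the requested behaviour).
    synchronized void writerLost(String path, boolean keepPartial) {
        List<Long> blocks = pending.remove(path);
        if (blocks == null) {
            return; // nothing was open under this path
        }
        if (keepPartial) {
            closed.put(path, blocks);
        }
    }

    synchronized boolean isReadable(String path) {
        return closed.containsKey(path);
    }

    // Number of blocks in a closed file, or -1 if the file does not exist.
    synchronized int blockCount(String path) {
        List<Long> blocks = closed.get(path);
        return blocks == null ? -1 : blocks.size();
    }
}
```

With `keepPartial` set, a file that received two blocks before its writer died would remain readable with those two blocks; without it, the file disappears.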

[jira] Resolved: (HADOOP-931) Make writes to S3FileSystem world visible only on completion

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White resolved HADOOP-931.
------------------------------

    Resolution: Won't Fix

Closing since nothing is broken. Any DFS changes should go in a new issue.
