[jira] Created: (HADOOP-1377) Creation time and modification time for hadoop files and directories

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507531 ]

Konstantin Shvachko commented on HADOOP-1377:
---------------------------------------------

I agree there is no reason to measure the time during which a file was created, and therefore we need just one timestamp per file.
But I would rather go with creation time rather than with modification. Modification time gives the impression that files can be
modified, which is not true. So we should introduce mod-time later, when appends are implemented.

Yes, in your implementation the different FileStatus implementations share only getLength() and getReplication().
We can introduce a base class later if it makes sense.


> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.14.0
>
>         Attachments: 1377-noctime.patch, 1377.patch, CreationModificationTime.html, CreationTime8.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is an 8-byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occurred to the file/directory. It is also an 8-byte integer stored in the FSDirectory.INode. These two fields are stored in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be the same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occurred in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients through the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime() and getModificationTime().
> The datanodes are completely transparent to this design and implementation and require no change.
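
The timestamp rules in the proposal (a file's modification time equals its creation time; a directory's modification time is its creation time or the time of the most recent file-create or file-delete inside it; attribute changes such as replication leave both timestamps alone) can be sketched as a small in-memory model. This is a hypothetical illustration under those stated rules, not the HDFS code: `TimestampSketch`, `Node`, `createFile`, `delete` and `setReplication` are invented names, and the caller passes `now` explicitly instead of reading a clock so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the HADOOP-1377 timestamp rules; not the real FSDirectory/INode code.
public class TimestampSketch {

    // Hypothetical stand-in for FSDirectory.INode: two 8-byte (long) timestamps per node.
    static class Node {
        final String name;
        final boolean isDirectory;
        final long creationTime;   // set once, when the node is created
        long modificationTime;     // starts equal to creationTime
        short replication;         // an attribute unrelated to the timestamps
        final Map<String, Node> children = new HashMap<String, Node>();

        Node(String name, boolean isDirectory, long now) {
            this.name = name;
            this.isDirectory = isDirectory;
            this.creationTime = now;
            this.modificationTime = now;
        }
    }

    // File-create: the new file gets ctime == mtime == now, and the parent directory's mtime moves to now.
    static Node createFile(Node dir, String name, long now) {
        if (!dir.isDirectory) {
            throw new IllegalArgumentException(dir.name + " is not a directory");
        }
        Node file = new Node(name, false, now);
        dir.children.put(name, file);
        dir.modificationTime = now;
        return file;
    }

    // File-delete: the parent directory's mtime moves to now; returns false if nothing was removed.
    static boolean delete(Node dir, String name, long now) {
        if (dir.children.remove(name) == null) {
            return false;
        }
        dir.modificationTime = now;
        return true;
    }

    // Attribute change: neither timestamp is touched.
    static void setReplication(Node file, short replication) {
        file.replication = replication;
    }

    public static void main(String[] args) {
        Node root = new Node("/", true, 1000L);
        Node file = createFile(root, "a.txt", 2000L);
        setReplication(file, (short) 2);
        System.out.println("file mtime = " + file.modificationTime);
        delete(root, "a.txt", 3000L);
        System.out.println("dir ctime = " + root.creationTime + ", dir mtime = " + root.modificationTime);
    }
}
```

Running `main` prints `file mtime = 2000` and then `dir ctime = 1000, dir mtime = 3000`: the replication change left the file's timestamps untouched, while the create and the delete each moved only the directory's modification time.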

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507534 ]

Hadoop QA commented on HADOOP-1377:
-----------------------------------

+0, new Findbugs warnings

http://issues.apache.org/jira/secure/attachment/12360382/1377-noctime.patch
applied and successfully tested against trunk revision r549933,
but there appear to be new Findbugs warnings introduced by this patch.

New Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/324/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/324/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/324/console

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507539 ]

Doug Cutting commented on HADOOP-1377:
--------------------------------------

> I would rather go with creation time rather than with modification.

Except that java.io.File only implements modification time, and our creation time doesn't match the semantics of any POSIX concept. So I think modification time is the one to support.
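
For comparison, the local-file API referred to here exposes exactly one timestamp, `File.lastModified()`, with `File.setLastModified(long)` as its setter; `java.io.File` has no creation-time getter. A minimal demo, where `LocalMtimeDemo` and `roundTripMtime` are illustrative names and the temp file is assumed to live on a local filesystem that accepts `setLastModified`:

```java
import java.io.File;
import java.io.IOException;

// Shows the single timestamp that java.io.File exposes for a local file.
public class LocalMtimeDemo {

    // Sets a file's modification time and reads it back through java.io.File.
    static long roundTripMtime(File f, long millis) {
        if (!f.setLastModified(millis)) {
            throw new IllegalStateException("setLastModified failed for " + f);
        }
        return f.lastModified(); // milliseconds since the epoch; 0L if the file is missing
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("hadoop-1377", ".txt");
        tmp.deleteOnExit();
        // A whole, even number of seconds, so filesystems with 1- or 2-second granularity store it exactly.
        long target = 1000000000000L;
        System.out.println(roundTripMtime(tmp, target));
        // Note: there is no java.io.File method that returns a creation time.
    }
}
```

On a typical local filesystem this prints `1000000000000`. The target is chosen as a whole number of seconds because some filesystems store modification times at coarser than millisecond granularity.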

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1377:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  I fixed the single FindBugs warning.  Thanks Dhruba!

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507597 ]

Hudson commented on HADOOP-1377:
--------------------------------

Integrated in Hadoop-Nightly #133 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/133/])
