[jira] Created: (HADOOP-1377) Creation time and modification time for hadoop files and directories


[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1377:
---------------------------------

    Fix Version/s: 0.14.0
           Status: Open  (was: Patch Available)

Unfortunately this patch conflicts with HADOOP-1283, which I just committed.

Also, in LocalFileSystem, RawLocalFileSystem and InMemoryFileSystem, the getLength(), isDirectory(), getBlockSize(), and getReplication() methods can be eliminated, with these method bodies copied to each class's status constructor.  The implementations in FilterFileSystem.java can also be removed.  So then the only implementations of these methods will be in FileSystem.java, for back-compatibility.  Does that make sense?
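The refactoring suggested above might look roughly like the sketch below. All class names here are hypothetical, simplified stand-ins rather than the real Hadoop classes: the per-file attributes live in a status object filled in by its constructor, and the older per-path accessors exist only once, in the base class, where they delegate to the status object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a file status object; not the real Hadoop class.
final class SketchFileStatus {
    private final long length;
    private final boolean dir;
    private final short replication;
    private final long blockSize;

    // Every attribute is computed once, when the status object is built.
    SketchFileStatus(long length, boolean dir, short replication, long blockSize) {
        this.length = length;
        this.dir = dir;
        this.replication = replication;
        this.blockSize = blockSize;
    }

    long getLen() { return length; }
    boolean isDir() { return dir; }
    short getReplication() { return replication; }
    long getBlockSize() { return blockSize; }
}

// Hypothetical base class: the only place the legacy accessors are implemented.
abstract class SketchFileSystem {
    // The single method a concrete filesystem has to provide.
    abstract SketchFileStatus getFileStatus(String path);

    // Kept for back-compatibility; each one delegates to the status object.
    long getLength(String path) { return getFileStatus(path).getLen(); }
    boolean isDirectory(String path) { return getFileStatus(path).isDir(); }
    short getReplication(String path) { return getFileStatus(path).getReplication(); }
    long getBlockSize(String path) { return getFileStatus(path).getBlockSize(); }
}

// Hypothetical in-memory filesystem: it defines no accessors of its own.
final class SketchInMemoryFileSystem extends SketchFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();

    void put(String path, byte[] data) { files.put(path, data); }

    @Override
    SketchFileStatus getFileStatus(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        // Replication 1 and a 4096-byte block size are arbitrary illustrative values.
        return new SketchFileStatus(data.length, false, (short) 1, 4096L);
    }
}
```

With this shape, adding a new attribute means touching the status class and its constructors, not every filesystem's accessor methods.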



> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.14.0
>
>         Attachments: CreationModificationTime.html, CreationTime6.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is an 8-byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occurred to the file/directory. It is an 8-byte integer stored in the FSDirectory.INode. These two fields are stored in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be the same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occurred in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients through the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime() and getModificationTime().
> This design and implementation is completely transparent to the datanodes, which require no change.
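The timestamp rules in the description can be modelled with a small sketch. SketchINode below is a hypothetical class written for illustration, not the real FSDirectory.INode, and the caller passes the clock value in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the proposed semantics; not the real FSDirectory.INode.
final class SketchINode {
    final boolean isDir;
    final long creationTime;   // 8-byte value, set once at creation
    long modificationTime;     // 8-byte value, starts equal to creationTime
    short replication = 3;     // arbitrary illustrative default
    final Map<String, SketchINode> children = new HashMap<>();

    SketchINode(boolean isDir, long now) {
        this.isDir = isDir;
        this.creationTime = now;
        this.modificationTime = now;
    }

    // Creating an entry updates the parent directory's modification time.
    SketchINode createChild(String name, boolean dir, long now) {
        if (!isDir) {
            throw new IllegalStateException("not a directory");
        }
        SketchINode child = new SketchINode(dir, now);
        children.put(name, child);
        modificationTime = now;
        return child;
    }

    // Deleting an entry also updates the parent's modification time.
    boolean deleteChild(String name, long now) {
        if (children.remove(name) == null) {
            return false;
        }
        modificationTime = now;
        return true;
    }

    // Changing an attribute leaves the modification time untouched.
    void setReplication(short newReplication) {
        replication = newReplication;
    }
}
```

For example, a directory created at time 100 that gains a file at time 200 reports a modification time of 200, while the file itself reports 200 for both timestamps even after its replication factor is changed.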

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment:     (was: CreationTime6.patch)


[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment: CreationTime8.patch

Merged patch with latest trunk and incorporated Doug's comments.


[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507478 ]

Hadoop QA commented on HADOOP-1377:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12360377/CreationTime8.patch applied and successfully tested against trunk revision r549624.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/323/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/323/console


[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1377:
---------------------------------

    Status: Open  (was: Patch Available)

The S3, in-memory and local status implementations can be further improved.  I'll attach a patch in a minute.


[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1377:
---------------------------------

    Attachment: 1377.patch

Improves status implementations for S3, in-memory and local filesystems.  Also changes the format of FsShell listings to more closely match unix 'ls -l'.  Dhruba, do these changes look reasonable to you?


[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507495 ]

Doug Cutting commented on HADOOP-1377:
--------------------------------------

One more thought on this: none of the filesystems will implement creation date, so I propose that we remove this feature from the API, supporting only modification date.  That's all that java.io.File supports, and all that HDFS will support for some time.  We can always add creation date later if we need it, but right now it's just unusable baggage.  Any objections?  If not, I'll attach a new version of the patch without creation date support shortly.
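As a point of reference for the java.io.File comparison, the class exposes lastModified() and setLastModified() but no accessor for a creation time. A minimal sketch follows; LastModifiedDemo is a hypothetical class name used only for this illustration.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class; only java.io.File itself is a real API here.
final class LastModifiedDemo {
    // Modification time in milliseconds since the epoch.
    // File.lastModified() returns 0L when the file does not exist.
    static long modificationTime(File f) {
        return f.lastModified();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("mtime-demo", ".tmp");
        tmp.deleteOnExit();
        System.out.println("modified at (ms since epoch): " + modificationTime(tmp));
    }
}
```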


[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507498 ]

Konstantin Shvachko commented on HADOOP-1377:
---------------------------------------------

- TestCreateModTime.java:  Redundant imports, method, and variable declarations:
import java.util.Collection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Date;
63:  private void checkFile(FileSystem fileSys, Path name, int repl)
100:    DistributedFileSystem dfs = (DistributedFileSystem) fileSys;
132:     long ctime2 = stat.getCreationTime();
133:     long mtime2 = stat.getModificationTime();

- Classes RawLocalFileStatus, InMemoryFileStatus, and S3FileStatus have almost identical implementations.
It makes sense to have one base class that provides a default FileStatus implementation and make those three
subclasses if the default is not good enough. The base class can be an inner class of FileSystem or the FileStatus
itself can be declared as an abstract class instead of being an interface.

- FSConstants: The comment describing changes related to the new layout version should be updated.
  // Current version:
...................

- FSEditLog:
fromLogTimeStamp(UTF8) is not used anywhere

- DistributedFileSystem: Unused variable in getFileStatus()
      FileStatus stat = null;

- FSDirectory:
    long modTime = namesystem.now();
Should be accessed in a static way NameSystem.now()
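The base-class suggestion in the second point could be sketched as follows. The class names are hypothetical and the default values are arbitrary; this shows only the shape of the idea, not the classes in the patch.

```java
// Hypothetical base class holding the shared, default status behaviour.
abstract class BaseStatusSketch {
    private final long length;
    private final boolean dir;
    private final long modificationTime;

    BaseStatusSketch(long length, boolean dir, long modificationTime) {
        this.length = length;
        this.dir = dir;
        this.modificationTime = modificationTime;
    }

    long getLen() { return length; }
    boolean isDir() { return dir; }
    long getModificationTime() { return modificationTime; }

    // Defaults for filesystems with no real notion of replication or blocks.
    // Both values are arbitrary and chosen only for illustration.
    short getReplication() { return 1; }
    long getBlockSize() { return 32L * 1024 * 1024; }
}

// A filesystem for which every default is good enough.
final class LocalStatusSketch extends BaseStatusSketch {
    LocalStatusSketch(long length, boolean dir, long modificationTime) {
        super(length, dir, modificationTime);
    }
}

// A filesystem that overrides only the attribute that differs.
final class BlockStoreStatusSketch extends BaseStatusSketch {
    private final long blockSize;

    BlockStoreStatusSketch(long length, boolean dir, long modificationTime, long blockSize) {
        super(length, dir, modificationTime);
        this.blockSize = blockSize;
    }

    @Override
    long getBlockSize() { return blockSize; }
}
```

The three near-identical status classes would then shrink to a constructor each, plus overrides where a default does not fit.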


[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507505 ]

dhruba borthakur commented on HADOOP-1377:
------------------------------------------

Hi Doug, thanks for your changes. They look good. +1.

DFS implements CreationTime. In fact, since files are typically large in HDFS and it takes a while before all the data is written to a file (e.g. output of Reduce), the creation time and modification time of a file will not be the same. I think it is helpful to keep the implementation of CreationTime in the generic FileSystem API. As of now, only DistributedFileSystem implements CreationTime.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.14.0
>
>         Attachments: 1377.patch, CreationModificationTime.html, CreationTime8.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be the same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occurred in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients through the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime() and getModificationTime().
> The datanodes are unaffected by this design and implementation and require no change.
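
As a rough illustration of the timestamp rules in the description above, here is a minimal Java sketch. The class INodeSketch and its methods are hypothetical names made up for this example; this is not the actual FSDirectory.INode code or anything from the attached patches. It only models the stated behaviour: a directory's modification time advances on file-create and file-delete, and changing the replication factor leaves a file's modification time alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a namespace entry (not the real FSDirectory.INode).
final class INodeSketch {
    private final boolean directory;
    private long modificationTime;      // 8-byte integer, e.g. milliseconds since the epoch
    private short replication = 3;      // assumed default, for illustration only
    private final Map<String, INodeSketch> children = new HashMap<>();

    INodeSketch(boolean directory, long now) {
        this.directory = directory;
        this.modificationTime = now;    // initially the same as the time of creation
    }

    long getModificationTime() {
        return modificationTime;
    }

    // Creating a file counts as a modification of the parent directory.
    INodeSketch createFile(String name, long now) {
        requireDirectory();
        INodeSketch file = new INodeSketch(false, now);
        children.put(name, file);
        modificationTime = now;
        return file;
    }

    // Deleting a file also counts as a modification of the parent directory.
    // Returns false, and leaves the timestamp alone, if no such child exists.
    boolean deleteFile(String name, long now) {
        requireDirectory();
        if (children.remove(name) == null) {
            return false;
        }
        modificationTime = now;
        return true;
    }

    // Attribute changes do not touch the modification time.
    void setReplication(short newReplication) {
        replication = newReplication;
    }

    short getReplication() {
        return replication;
    }

    private void requireDirectory() {
        if (!directory) {
            throw new IllegalStateException("not a directory");
        }
    }

    public static void main(String[] args) {
        INodeSketch dir = new INodeSketch(true, 100L);
        INodeSketch file = dir.createFile("part-00000", 200L);
        file.setReplication((short) 5);
        System.out.println("dir mtime after create:  " + dir.getModificationTime());
        System.out.println("file mtime after setrep: " + file.getModificationTime());
        dir.deleteFile("part-00000", 300L);
        System.out.println("dir mtime after delete:  " + dir.getModificationTime());
    }
}
```

The timestamps are passed in as arguments rather than read from the clock so that the behaviour is easy to follow and to check.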

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507505 ]

dhruba borthakur edited comment on HADOOP-1377 at 6/22/07 1:56 PM:
-------------------------------------------------------------------

Hi Doug, thanks for your changes. They look good. +1.

DFS implements CreationTime. In fact, since files are typically large in HDFS and it takes a while before all the data is written to a file (e.g. output of Reduce), the creation time and modification time of a file will not be the same. I think it is helpful to keep the implementation of CreationTime in the generic FileSystem API. As of now, only DistributedFileSystem implements CreationTime.


 was:
Hi Doug, thanks for your changes. They look good. +1.

DFS implements CreationTime. In fact, since files are typically large in HDFS and it takes a while before all the data is written to a file (e.g. output of Reduce), the creation time and modification time of a file qill not be the same. I think it is helpful to keep the implementation of CreationTime in the generic FileSystem API. As of now, only DistributedFileSystem implements CreationTime.

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507506 ]

Doug Cutting commented on HADOOP-1377:
--------------------------------------

But applications cannot rely on creation time as meaningful, since they cannot get it for the local filesystem.  And for HDFS, it really only allows you to see how long it took to write a file.  Is there an important use case where an application needs creation time, distinct from modified time?

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1377:
---------------------------------

    Attachment: 1377-noctime.patch

Here's a version that removes support for creation time.

This also addresses all of Konstantin's issues save one: I didn't create a base class for FileStatus.  Only the length field is common to all of these, and its implementation is simple enough that I don't think sharing code buys much--there's no real logic that's shared.  I wouldn't oppose the use of a base class, but I don't think we'll suffer much without it in this case.

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507510 ]

Sameer Paranjpye commented on HADOOP-1377:
------------------------------------------

> One more thought on this: none of the filesystem will implement creation date, so I propose that we remove this feature from the API, supporting only modification date

Creation date falls in the same category as things like block size and replication, which are also mostly unsupported, so maybe we could treat it the same way. Can we make it available in DFSFileStatus but not in the FileStatus interface?

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507511 ]

Doug Cutting commented on HADOOP-1377:
--------------------------------------

> Creation date falls in the same category as things like block size and replication [ ... ]

We have code that uses block size and replication for important optimizations, so even though they're not universal, they have important use cases.  But what is the use case for creation time as presently implemented?  What will it enable that's difficult or impossible without it?  Knowing the amount of time it took to write a file seems like trivia, not critical functionality.

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507521 ]

dhruba borthakur commented on HADOOP-1377:
------------------------------------------

When HDFS supports "append" and "truncate", the difference between creation time and modification time might become more apparent. But you are right, I do not have a very strong case for implementing Creation Time.

> We have code that uses block size and replication for important optimizations

Can you pl point me to some piece of code that uses FileSystem.getReplication()? I thought that it was mostly for display purposes.

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1377:
---------------------------------

    Status: Patch Available  (was: Open)

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507523 ]

Doug Cutting commented on HADOOP-1377:
--------------------------------------

> When HDFS supports "append" and "truncate", the difference between creation time and modification time might become more apparent.

Yes, and that might be a good time to add support for creation time.  Until then, it's pretty useless, so why bother?

> pl point me to some piece of code that uses FileSystem.getReplication()?

We increase the replication of job.xml and job.jar in JobClient.java so that the datanodes that contain these files are not overwhelmed when a job first starts.

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507526 ]

Sameer Paranjpye commented on HADOOP-1377:
------------------------------------------

No, I don't have a particularly compelling use case for creation date. It can be dispensed with. Creation time isn't even POSIX. From 'man 2 stat':

#     The time-related fields of struct stat are as follows:
#
#     st_atime     Time when file data last accessed.  Changed by the mknod(2),
#                  utimes(2) and read(2) system calls.
#
#     st_mtime     Time when file data last modified.  Changed by the mknod(2),
#                  utimes(2) and write(2) system calls.
#
#     st_ctime     Time when file status was last changed (inode data
#                  modification).  Changed by the chmod(2), chown(2), link(2),
#                  mknod(2), rename(2), unlink(2), utimes(2) and write(2)
#                  system calls.
#

We can implement something like st_ctime later. It might be useful to have for accounting when we have users and permissions.
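
For comparison, here is a small sketch of how a present-day Java program could read these timestamps from the local filesystem through java.nio.file (an API that postdates this thread). FileTimesSketch, readTimes and timesOfFreshTempFile are names invented for the example. Note that BasicFileAttributes.creationTime() is documented to return an implementation-specific default, typically the last-modified time or the epoch, when the filesystem does not record a creation time, which is in line with the point that applications cannot rely on it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Illustrative helper only; the names here are not part of Hadoop.
final class FileTimesSketch {

    // Returns { lastModified, lastAccess, creation } in milliseconds since the epoch.
    // The creation value may be a fallback if the filesystem does not track it.
    static long[] readTimes(Path path) {
        try {
            BasicFileAttributes attrs =
                Files.readAttributes(path, BasicFileAttributes.class);
            return new long[] {
                attrs.lastModifiedTime().toMillis(),
                attrs.lastAccessTime().toMillis(),
                attrs.creationTime().toMillis()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temporary file, reads its timestamps, then removes the file.
    static long[] timesOfFreshTempFile() {
        try {
            Path tmp = Files.createTempFile("times-sketch", ".tmp");
            try {
                return readTimes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] t = timesOfFreshTempFile();
        System.out.println("modified=" + t[0] + " accessed=" + t[1] + " created=" + t[2]);
    }
}
```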


> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.14.0
>
>         Attachments: 1377-noctime.patch, 1377.patch, CreationModificationTime.html, CreationTime8.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients through the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime() and getModificationTime().
> The datanodes are completely transparent to this design and implementation and require no change.
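
The modification-time rules quoted above can be illustrated with a minimal sketch. This is not the code in the attached patches: the `MtimeSketch` class, its nested `INode`, and the `createFile`/`deleteFile` helpers are hypothetical stand-ins for FSDirectory.INode, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

public class MtimeSketch {
    /** Hypothetical stand-in for FSDirectory.INode; not the real Hadoop class. */
    static class INode {
        final boolean isDir;
        long modificationTime;  // 8-byte integer, as in the proposal
        final Map<String, INode> children = new HashMap<>();

        INode(boolean isDir, long now) {
            this.isDir = isDir;
            // A new file or directory starts with mtime equal to its creation time.
            this.modificationTime = now;
        }

        /** Creating a file updates the parent directory's modification time. */
        INode createFile(String name, long now) {
            INode file = new INode(false, now);
            children.put(name, file);
            modificationTime = now;
            return file;
        }

        /** Deleting a file also updates the parent directory's modification time. */
        boolean deleteFile(String name, long now) {
            if (children.remove(name) == null) {
                return false;  // nothing deleted, so mtime is left alone
            }
            modificationTime = now;
            return true;
        }
    }

    public static void main(String[] args) {
        INode dir = new INode(true, 1000L);
        INode file = dir.createFile("a", 2000L);
        dir.deleteFile("a", 3000L);
        // The directory reflects the last create/delete; the file keeps its creation time.
        System.out.println(dir.modificationTime + " " + file.modificationTime);
    }
}
```

Because HDFS files are unmodifiable under this proposal, the sketch never touches a file's `modificationTime` after construction; only directories change.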


[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Chris Mattmann (Jira)
In reply to this post by Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507527 ]

dhruba borthakur commented on HADOOP-1377:
------------------------------------------

OK, I agree. Let's submit this patch with modification time only (no creation time). It also saves us 8 bytes per file on the NameNode!

+1.
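
For a rough sense of scale, the saving from dropping the 8-byte creation time can be estimated as below. The file count is a made-up figure for illustration only, not a measurement from any cluster, and `CtimeSavings` is a hypothetical helper name.

```java
public class CtimeSavings {
    // One creation-time field is an 8-byte integer per file.
    static final long BYTES_PER_TIMESTAMP = 8L;

    /** NameNode bytes saved by not storing a creation time for each file. */
    static long bytesSaved(long fileCount) {
        return fileCount * BYTES_PER_TIMESTAMP;
    }

    public static void main(String[] args) {
        long files = 10_000_000L;  // hypothetical number of files in the namespace
        System.out.println(bytesSaved(files) / 1_000_000L + " MB");
    }
}
```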

