[jira] Created: (HADOOP-1377) Creation time and modification time for hadoop files and directories

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
Creation time and modification time for hadoop files and directories
--------------------------------------------------------------------

                 Key: HADOOP-1377
                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
             Project: Hadoop
          Issue Type: New Feature
            Reporter: dhruba borthakur
         Assigned To: dhruba borthakur


This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.

My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.

My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.

In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.

A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.

The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().

The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment: CreationModificationTime.html

A design document for implementing creation time and modification time for files and directories.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500482 ]

Doug Cutting commented on HADOOP-1377:
--------------------------------------

As I mentioned above, I'd prefer we use a FileStatus object that contains all file metadata, rather than adding a new FileSystem method per new metadata field added.  If an application needs more than one metadata field from a file it should not invoke more than one RPC, nor should FileSystem implementations be forced to implement a cache.  Currently HDFS does cache status internally, but that is fragile and will break when, e.g., modification times and lengths are permitted to change.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500579 ]

dhruba borthakur commented on HADOOP-1377:
------------------------------------------

I like the idea of having an explicit call getFileStatus() that returns an object of type FileStatus as implemented in HADOOP-1298. Once that patch is submitted, I will make the corresponding changes to this implementation.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Component/s: dfs

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment: CreationTime.patch

A first version of the path available for review. It introduces a new FileSystem call of the form:

 public FileStatus getFileStatus(Path f) throws IOException

This API retrives the creation time and modification time of files (along with other file attributes).

Would appreciate some review comments, especially on the API enhancement.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504421 ]

Konstantin Shvachko commented on HADOOP-1377:
---------------------------------------------

DFSClient.java
- Please avoid introducing methods with deprecated UTF8. String would be a better in this case.
  public DFSFileInfo getFileInfo(UTF8 src) throws IOException {

FileStatus
- all imports are redundant
- I think that in terms of different file system compatibility FileStatus should
be an interface. For the hdfs we should have a class HDFSFileStatus, which should be a part
of DFSFileInfo combining all status fields. That way we will not need to change
protocols and internal name-node interfaces when we add/modify status fields.
- In the FileStatus constructor you are assigning blockSize to itself.
    this.blockSize = blockSize;
I guess a parameter is missing.

DfsPath
- I am not sure whether HADOOP-1377 should be built on top of HADOOP-1298 or
vise versa, but I agree with Doug that public api should use FileStatus.
That is why DfsPath should introduce getFileStatus() rather than getters for each new field.

FSDirectory
- Rather than multiplying parameters for each method related to meta-data modification
I would just add FileStatus as a parameter once.
-  getFileInfo should not be public and should not have UTF8 as a parameter
public DFSFileInfo getFileInfo(UTF8 src) throws IOException {...}
Same for FSNamesystem.getFileInfo().

FSEdiLlog
- loadFSEdits() has a lot of code replication, which deserves to be wrapped in
separate method(s). I'd serialize entire HDFSFileStatus, which is Writable anyway.
Same for FSImage, I'd serialize the entire FileStatus.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment: CreationTime3.patch

1. Implemented Konstantin suggestion of making FileStatus an Interface (instead of an object).

2. Dates are now printed as MMM dd yyyy HH:mm:ss


> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime3.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment:     (was: CreationTime.patch)

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime3.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment: CreationTime4.patch

Removed deprecated UTF8 parameter to some methods. I left the minor duplication in code in FSEditLog as it was earlier.


> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime3.patch, CreationTime4.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment:     (was: CreationTime3.patch)

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime4.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Status: Patch Available  (was: Open)

Implemented most of Konstantin's suggestions. merged patch with latest trunk.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime4.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506306 ]

Hadoop QA commented on HADOOP-1377:
-----------------------------------

-1, could not apply patch.

The patch command could not apply the latest attachment http://issues.apache.org/jira/secure/attachment/12360062/CreationTime4.patch as a patch to trunk revision r548794.

Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/304/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime4.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment:     (was: CreationTime4.patch)

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime5.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment: CreationTime5.patch

yet another patch that is merged with *latest* trunk.

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime5.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1377:
---------------------------------

    Status: Open  (was: Patch Available)

1. FsShell.java needs to import java.text.SimpleDateFormat.

2. We should deprecate the FileSystem methods getReplication, getBlockSize, getLength and isDirectory.  And these should no longer be abstract methods, but should be implemented in terms of getStatus().

3. This issue should implement getStatus() for all FileSystem implementations, doing more than throwing an exception.  These can be implemented to call the FileSystem getReplication, getBlockSize, isDirectory and getLength implementations.  So only getCreationTime and getModificationTime cannot trivially be properly implemented.  I'm not certain whether in these cases it is better to throw an exception or return zero.


> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime5.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment:     (was: CreationTime5.patch)

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime6.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Attachment: CreationTime6.patch

Incorporated Doug's review comments. InMemoryFileSystem, RawLocalFileSystem and S3FileSystem returns 0 for getCreationTime(filename) and getModificationTime(filename).

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime6.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1377:
-------------------------------------

    Status: Patch Available  (was: Open)

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime6.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506786 ]

Hadoop QA commented on HADOOP-1377:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12360252/CreationTime6.patch applied and successfully tested against trunk revision r549284.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/315/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/315/console

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime6.patch
>
>
> This issue will document the requirements, design and implementation of creation times and modification times of hadoop files and directories.
> My proposal is to have support two additional attributes for each file and directory in HDFS. The "creation time" is the time when the file/directory was created. It is a 8 byte integer stored in each FSDirectory.INode. The "modification time" is the time when the last modification occured to the file/directory. It is an 8 byte integer stored in the FSDirectory.INode. These two fields are stored in in the FSEdits and FSImage as part of the transaction that created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be same as its creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g. setting the replication factor) of a file does not modify the "modification time" of that file. The "modification time" for a directory is either its creation time or the time when the most recent file-delete or file-create occured in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification time of the files/directories that it lists. The output of the existing command "hadoop dfs -ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the creation time and modification time of the file that it represents. This information can be retrieved by clients thorugh the ClientProtocol.getListings() method. The FileSystem public API will have two additional methods: getCreationTime and getModificationTime().
> The datanodes are completely transparent to this design and implementation and requires no change.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

123