[jira] Created: (HADOOP-894) dfs client protocol should allow asking for parts of the block map


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494575 ]

Sameer Paranjpye commented on HADOOP-894:
-----------------------------------------

Adding 'start' and 'length' parameters to the Namenode's 'open' RPC doesn't seem to add a lot of value. It won't be used unless we expose it through fs.FileSystem or dfs.DistributedFileSystem, and adding an 'open and seek' kind of call just seems like API bloat.

On the other hand, having the locations of the first few blocks of a file is useful in many cases, in particular when a client is working with small files or wants to read the file's header before seeking (as MR tasks processing sequence files do). Why not just have open default to returning the first few block locations?


> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>         Assigned To: Konstantin Shvachko
>         Attachments: partialBlockList.patch, partialBlockList2.patch
>
>
> I think that the HDFS client protocol should change like:
> /** The meta-data about a file that was opened. */
> class OpenFileInfo {
>   /** the info for the first block */
>   public LocatedBlockInfo getBlockInfo();
>   public long getBlockSize();
>   public long getLength();
> }
> interface ClientProtocol extends VersionedProtocol {
>   public OpenFileInfo open(String name) throws IOException;
>   /** get block info for any range of blocks */
>   public LocatedBlockInfo[] getBlockInfo(String name, int blockOffset, int blockLength) throws IOException;
> }
> so that the client can decide how much block info to request and when. Currently, when the file is opened or an error occurs, the entire block list is requested and sent.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494580 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

Summarizing:
- LocatedBlocks open(String src);
- LocatedBlocks getBlockLocations(String file, long start, long length);
- open() always returns the first 10 blocks, as decided by the name-node.

Does that work for everybody?
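
A minimal sketch of how the two summarized calls could relate, assuming fixed-size blocks and an in-memory file model. The classes Block and LocatedBlocks below, the helper fileOf and the constant BLOCKS_ON_OPEN are hypothetical stand-ins for illustration, not Hadoop's implementation:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the summarized protocol; these are not Hadoop's real classes. */
class BlockRangeSketch {
    /** Assumed number of blocks returned by open(), taken from the summary above. */
    static final int BLOCKS_ON_OPEN = 10;

    /** One block of a file: its starting byte offset and its size in bytes. */
    static final class Block {
        final long offset;
        final long size;
        Block(long offset, long size) { this.offset = offset; this.size = size; }
    }

    /** Stand-in for LocatedBlocks: the file length plus the blocks that were asked for. */
    static final class LocatedBlocks {
        final long fileLength;
        final List<Block> blocks;
        LocatedBlocks(long fileLength, List<Block> blocks) {
            this.fileLength = fileLength;
            this.blocks = blocks;
        }
    }

    /** Returns every block overlapping the byte range [start, start + length). */
    static LocatedBlocks getBlockLocations(List<Block> file, long start, long length) {
        long end = start + length;
        long fileLength = 0;
        List<Block> hits = new ArrayList<>();
        for (Block b : file) {
            fileLength += b.size;
            if (b.offset < end && b.offset + b.size > start) {
                hits.add(b);
            }
        }
        return new LocatedBlocks(fileLength, hits);
    }

    /** open() as a range query from offset 0 covering the first BLOCKS_ON_OPEN blocks. */
    static LocatedBlocks open(List<Block> file, long blockSize) {
        return getBlockLocations(file, 0, BLOCKS_ON_OPEN * blockSize);
    }

    /** Builds a file made of blockCount full blocks of blockSize bytes each. */
    static List<Block> fileOf(int blockCount, long blockSize) {
        List<Block> file = new ArrayList<>();
        for (int i = 0; i < blockCount; i++) {
            file.add(new Block(i * blockSize, blockSize));
        }
        return file;
    }

    public static void main(String[] args) {
        List<Block> file = fileOf(25, 64);
        LocatedBlocks opened = open(file, 64);
        System.out.println(opened.blocks.size() + " blocks on open, file length " + opened.fileLength);
        System.out.println(getBlockLocations(file, 640, 128).blocks.size() + " blocks for [640, 768)");
    }
}
```

In this model a file shorter than ten blocks gets its whole block list back from open(), so a later getBlockLocations() call is only needed for larger files.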


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494581 ]

Doug Cutting commented on HADOOP-894:
-------------------------------------

Sameer: you're right, our current public API would not take advantage of an open with start and length, so it may be overkill.  And in many cases we also read a file header from the first block before we seek anyway.  Long-term, this might be a good optimization, to be able to open a file directly at a position, without touching the first block, and to be able to disable the reading of headers.  It would be convenient if this did not require changes to both the protocol and to the server, but instead only on the client.  To me, open(start,length) is a more general API that's no harder to implement than open(length), one that's future compatible.  The client would, for now, always pass zero for 'start'.  But I wouldn't veto open(length).  That's also a fine API and is more minimal, a good thing.


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494587 ]

Doug Cutting commented on HADOOP-894:
-------------------------------------

Konstantin: Does LocatedBlocks contain the length of the file?  We need that too, don't we?  Also, why have an open() method at all, rather than just using open(start,length), letting the client pass start=0 and length=${hdfs.initial.bytes}?


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494608 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

Yes. LocatedBlocks contains the file length and a List of block locations.
I initially implemented open(src, length) because it is more general, and deprecated the old open(src).
Dhruba finds it "not very intuitive" and Sameer says it does not "add a lot of value".

I cannot implement open(start,length) with start > 0 right now, because to do that I will
need to write an offset-to-block map for cached blocks in the client. I was planning to do it in the next
iteration, but it was supposed to be used mostly in pread(), that is, for getBlockLocations(), not in open().

I don't see how we can benefit from introducing the start parameter, but I definitely support adding length.
So currently it's a tie 2:2. We need more votes to resolve the issue.
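
For illustration, one way an offset-to-block map for cached blocks might look is a sorted map keyed by each block's starting offset. This is only a sketch under that assumption; the class name, the long[] value layout and the -1 "not cached" result are hypothetical and are not taken from any of the patches:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side cache that maps a byte offset to the cached block holding it. */
class OffsetToBlockMapSketch {
    /** Key: starting offset of a cached block. Value: {block size, block id}. */
    private final TreeMap<Long, long[]> byStartOffset = new TreeMap<>();

    /** Remembers a block that the name-node has already told us about. */
    void put(long startOffset, long size, long blockId) {
        byStartOffset.put(startOffset, new long[] {size, blockId});
    }

    /** Returns the id of the cached block containing offset, or -1 if it is not cached. */
    long lookup(long offset) {
        // floorEntry finds the cached block with the greatest start <= offset.
        Map.Entry<Long, long[]> entry = byStartOffset.floorEntry(offset);
        if (entry == null) {
            return -1;
        }
        long start = entry.getKey();
        long size = entry.getValue()[0];
        // The candidate only matches if the offset falls before its end.
        return offset < start + size ? entry.getValue()[1] : -1;
    }

    public static void main(String[] args) {
        OffsetToBlockMapSketch cache = new OffsetToBlockMapSketch();
        cache.put(0, 64, 100);
        cache.put(64, 64, 101);
        cache.put(640, 64, 110);
        System.out.println(cache.lookup(70));
        System.out.println(cache.lookup(300));
    }
}
```

A sorted map tolerates gaps in the cached ranges, which matters once start > 0 is allowed and the cached blocks no longer form a prefix of the file.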


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494613 ]

Sameer Paranjpye commented on HADOOP-894:
-----------------------------------------

> It would be convenient if this did not require changes to both the protocol and to the server, but instead only on the client. To me, open(start,length) is a more general
> API that's no harder to implement than open(length), one that's future compatible. The client would, for now, always pass zero for 'start'.

Fair enough, open(start, length) is more general and future compatible and we should implement it. The public APIs don't change for now and the client always passes 0 for start and ${hdfs.initial.bytes} for length. Maybe we use a default of 256MB and get the first 2-8 blocks depending on which of the common block sizes (32, 64 or 128MB) applies to the file.
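
The 2-8 figure follows from dividing the 256MB range by the block size. A small worked sketch of that arithmetic (the class and method names are made up for the example):

```java
/** Worked arithmetic: how many blocks a fixed initial byte range covers. */
class InitialRangeSketch {
    static final long MB = 1024L * 1024L;

    /** Number of blocks needed to cover rangeBytes, rounding up. */
    static long blocksCovered(long rangeBytes, long blockSizeBytes) {
        return (rangeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long range = 256 * MB;
        for (long blockSizeMb : new long[] {32, 64, 128}) {
            System.out.println(blockSizeMb + "MB blocks: "
                + blocksCovered(range, blockSizeMb * MB) + " blocks");
        }
    }
}
```

So a 256MB range covers 8 blocks at 32MB, 4 at 64MB and 2 at 128MB.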




[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494616 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

hdfs.initial.bytes - is it a configuration parameter?



[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494621 ]

Doug Cutting commented on HADOOP-894:
-------------------------------------

> hdfs.initial.bytes - is it a configuration parameter?

Yes, and it probably needs a better name.


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494845 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

Do we really want it configurable? I was trying to avoid that. In my view the parameter is not significant enough
to include in the configuration. I currently use a constant instead.


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494912 ]

Sameer Paranjpye commented on HADOOP-894:
-----------------------------------------

Not necessarily. We might want to start out with a reasonable default and introduce a configuration variable when it appears to be needed.


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495111 ]

Doug Cutting commented on HADOOP-894:
-------------------------------------

> start out with a reasonable default and introduce a configuration variable when it appears to be needed

Two other options:

  1. Make it configurable but don't document it in hadoop-default.xml.

  2. Make it configurable but document it as an "expert" parameter.  (We should really go through hadoop-default.xml and mark things that most folks should leave alone as expert.)



[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495118 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

More precisely, the length that I pass to open() is 10 * {dfs.block.size}, that is 10 default block sizes.
So it is in a sense configurable, but not as a separate parameter.



[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495132 ]

Doug Cutting commented on HADOOP-894:
-------------------------------------

> the length that I pass to open() is 10 * {dfs.block.size}

It's too bad we don't support expressions in config files. In the meantime, we could add it as a config variable with no value in hadoop-default.xml, or with a commented-out value. Perhaps we should change Configuration so that if the value for a numeric field is "" then the default is used...

Also, 2 would be a better default for mapreduce inputs.
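
A rough sketch of the "empty value means use the default" idea, written against java.util.Properties rather than Hadoop's Configuration class so that no Hadoop API is implied. The key hdfs.initial.bytes is the placeholder name used earlier in this thread, the 64MB fallback for dfs.block.size is an assumption, and the helper names are hypothetical:

```java
import java.util.Properties;

/** Hypothetical lookup where a missing or empty numeric value falls back to a computed default. */
class EmptyMeansDefaultSketch {
    /** Assumed fallback block size for this sketch only. */
    static final long ASSUMED_BLOCK_SIZE = 64L * 1024L * 1024L;

    /** Returns the parsed value, or defaultValue when the key is absent or set to "". */
    static long getLong(Properties conf, String key, long defaultValue) {
        String raw = conf.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        return Long.parseLong(raw.trim());
    }

    /** The default is derived from another setting, which a static config file cannot express. */
    static long initialBytes(Properties conf) {
        long blockSize = getLong(conf, "dfs.block.size", ASSUMED_BLOCK_SIZE);
        return getLong(conf, "hdfs.initial.bytes", 10 * blockSize);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.block.size", "1000");
        conf.setProperty("hdfs.initial.bytes", "");
        System.out.println(initialBytes(conf));
    }
}
```

Treating "" as unset lets the parameter appear in the defaults file for discoverability while the effective value is still computed in code.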



[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Attachment: partialBlockList3.patch

In this patch:
- open takes three parameters: open(src, offset, length)
- there is an undocumented config parameter "dfs.read.prefetch.size" that defines the range within which
we want all block locations to be fetched from the name-node during the open call.
- I kept 10*defaultBlockSize as the default, because choosing 2 rather than 10 does not much improve communication
or name-node performance, while for the majority of files 10 blocks will be ALL of the blocks.
- Implemented block location caching for reads and preads.
- Included more test cases in TestPread
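
To illustrate the prefetch-then-cache behaviour described in this list, here is a small simulation. It is not the code from partialBlockList3.patch: the class, its fields and the simulated name-node call are hypothetical, and fixed-size blocks are assumed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical client that prefetches block locations on open and refetches on a cache miss. */
class PrefetchingClientSketch {
    private final long fileLength;
    private final long blockSize;
    private final long prefetchSize;
    /** Start offsets of the blocks whose locations are cached. */
    private final Set<Long> cachedBlockStarts = new HashSet<>();
    /** Counts simulated round trips to the name-node. */
    int nameNodeCalls = 0;

    /** Models the open call: fetches locations for the range [0, prefetchSize). */
    PrefetchingClientSketch(long fileLength, long blockSize, long prefetchSize) {
        this.fileLength = fileLength;
        this.blockSize = blockSize;
        this.prefetchSize = prefetchSize;
        fetchRange(0, prefetchSize);
    }

    /** Simulated name-node request: caches every block overlapping [start, start + length). */
    private void fetchRange(long start, long length) {
        nameNodeCalls++;
        long end = Math.min(start + length, fileLength);
        for (long b = (start / blockSize) * blockSize; b < end; b += blockSize) {
            cachedBlockStarts.add(b);
        }
    }

    /** Returns the start of the block holding offset, asking the name-node only on a miss. */
    long locate(long offset) {
        if (offset < 0 || offset >= fileLength) {
            throw new IllegalArgumentException("offset outside file: " + offset);
        }
        long blockStart = (offset / blockSize) * blockSize;
        if (!cachedBlockStarts.contains(blockStart)) {
            fetchRange(offset, prefetchSize);
        }
        return blockStart;
    }

    public static void main(String[] args) {
        // 25 blocks of 64 bytes, prefetching 10 blocks' worth of locations at a time.
        PrefetchingClientSketch client = new PrefetchingClientSketch(1600, 64, 640);
        client.locate(100);
        client.locate(700);
        System.out.println("name-node calls: " + client.nameNodeCalls);
    }
}
```

For a file that fits inside the prefetch range, every read and pread in this model is served from the cache after the single call made on open.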



[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

        Fix Version/s: 0.13.0
    Affects Version/s: 0.12.0
               Status: Patch Available  (was: Open)

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>         Assigned To: Konstantin Shvachko
>             Fix For: 0.13.0
>
>         Attachments: partialBlockList.patch, partialBlockList2.patch, partialBlockList3.patch
>
>

[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496159 ]

Hadoop QA commented on HADOOP-894:
----------------------------------

-1, could not apply patch.

The patch command could not apply the latest attachment http://issues.apache.org/jira/secure/attachment/12357430/partialBlockList3.patch as a patch to trunk revision r538318.

Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/145/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>         Assigned To: Konstantin Shvachko
>             Fix For: 0.13.0
>
>         Attachments: partialBlockList.patch, partialBlockList2.patch, partialBlockList3.patch
>
>

[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Status: Open  (was: Patch Available)

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>         Assigned To: Konstantin Shvachko
>             Fix For: 0.13.0
>
>         Attachments: partialBlockList.patch, partialBlockList2.patch, partialBlockList3.patch
>
>

[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Attachment: partialBlockList4.patch

Synchronized with the trunk.

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>         Assigned To: Konstantin Shvachko
>             Fix For: 0.13.0
>
>         Attachments: partialBlockList2.patch, partialBlockList3.patch, partialBlockList4.patch
>
>

[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Attachment:     (was: partialBlockList.patch)

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>         Assigned To: Konstantin Shvachko
>             Fix For: 0.13.0
>
>         Attachments: partialBlockList2.patch, partialBlockList3.patch, partialBlockList4.patch
>
>

[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

David Eric Pugh (Jira)
In reply to this post by David Eric Pugh (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Status: Patch Available  (was: Open)

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>         Assigned To: Konstantin Shvachko
>             Fix For: 0.13.0
>
>         Attachments: partialBlockList2.patch, partialBlockList3.patch, partialBlockList4.patch
>
>
