[jira] Created: (HADOOP-894) dfs client protocol should allow asking for parts of the block map


[jira] Created: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
dfs client protocol should allow asking for parts of the block map
------------------------------------------------------------------

                 Key: HADOOP-894
                 URL: https://issues.apache.org/jira/browse/HADOOP-894
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
            Reporter: Owen O'Malley
         Assigned To: Sameer Paranjpye


I think that the HDFS client protocol should change like:

/** The meta-data about a file that was opened. */
class OpenFileInfo {
  /** the info for the first block */
  public LocatedBlockInfo getBlockInfo();
  public long getBlockSize();
  public long getLength();
}

interface ClientProtocol extends VersionedProtocol {
  public OpenFileInfo open(String name) throws IOException;
  /** get block info for any range of blocks */
  public LocatedBlockInfo[] getBlockInfo(String name, int blockOffset, int blockLength) throws IOException;
}

so that the client can decide how much block info to request and when. Currently, when the file is opened or an error occurs, the entire block list is requested and sent.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] Assigned: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wendy Chien reassigned HADOOP-894:
----------------------------------

    Assignee: Wendy Chien  (was: Sameer Paranjpye)


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482631 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

As I understand the problem, many clients open the same file and read its first block,
e.g. in streaming, and then each reads a specific part of the file. So each client does not need
a block map for the whole file, only the block locations in a specified range.

I propose to modify ClientProtocol.open() to
OpenFileInfo open( String src, int numBlocks )
where
src - is the path;
numBlocks - is the number of blocks whose locations the client wants open() to return
@returns
OpenFileInfo : extends DFSFileInfo {
    LocatedBlock[ numBlocks ];
}
DFSFileInfo contains file information including file length and replication.

ClientProtocol should also contain
public LocatedBlock[] getBlockLocations(String src, int offset, int length) throws IOException;
offset - is the starting offset in the file
length - is the number of bytes the client is supposed to read

class LocatedBlock should include an additional field
+ long startFrom;  which gives the offset within the block at which the desired region of bytes starts.

Then we will need to reimplement seeks and reads for DFSInputStream using that API.
What would be a good default for the number of blocks that getBlockLocations()
would fetch per call if the file is read from start to finish?
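
The byte-range lookup proposed above can be sketched as follows. This is only an illustration, not code from the patch: BlockRange and BlockRangeSketch are hypothetical names, and the sketch assumes that every block except possibly the last holds exactly blockSize bytes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for LocatedBlock (not the real Hadoop class). */
final class BlockRange {
    final long blockIndex; // index of the block within the file
    final long startFrom;  // offset inside the block where the requested bytes begin
    final long numBytes;   // number of requested bytes that fall inside this block

    BlockRange(long blockIndex, long startFrom, long numBytes) {
        this.blockIndex = blockIndex;
        this.startFrom = startFrom;
        this.numBytes = numBytes;
    }
}

final class BlockRangeSketch {
    /**
     * Returns the blocks overlapping the byte range [offset, offset + length),
     * assuming every block except possibly the last holds exactly blockSize bytes.
     */
    static List<BlockRange> blocksFor(long fileLength, long blockSize, long offset, long length) {
        List<BlockRange> result = new ArrayList<>();
        if (blockSize <= 0 || offset < 0 || length <= 0 || offset >= fileLength) {
            return result; // nothing to locate
        }
        // Clamp the end of the range to the end of the file without overflowing.
        long end = (length > fileLength - offset) ? fileLength : offset + length;
        long pos = offset;
        while (pos < end) {
            long index = pos / blockSize;
            long blockEnd = Math.min((index + 1) * blockSize, end);
            // pos % blockSize is non-zero only for the first block of the range.
            result.add(new BlockRange(index, pos % blockSize, blockEnd - pos));
            pos = blockEnd;
        }
        return result;
    }
}
```

For a 250-byte file with 100-byte blocks, a request for 120 bytes at offset 50 yields two entries: block 0 with startFrom 50 and block 1 with startFrom 0.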


[jira] Assigned: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-894:
------------------------------------

    Assignee: dhruba borthakur  (was: Wendy Chien)


[jira] Assigned: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko reassigned HADOOP-894:
------------------------------------------

    Assignee: Konstantin Shvachko  (was: dhruba borthakur)


[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Attachment: partialBlockList.patch

In this patch:
- I included the list of LocatedBlock directly in DFSFileInfo, rather than overloading the class.
- Removed redundant members in DFSFileInfo.
- ClientProtocol.open(src, length) now takes 2 parameters: the file name and the length of the starting segment
of the file for which block locations must be returned.
- The old open(src) is deprecated. I've seen many servlets use it directly. I replaced those calls with
getBlockLocations() in the hadoop servlets, but there might be others.
- A new ClientProtocol.getBlockLocations() method is introduced.
- DFSInputStream fetches only 10 blocks during initialization; subsequent blocks are requested and
cached during the regular read().
- pread first tries to use already cached blocks, then requests block locations from the name-node.
- DFSClient.getHints() now calls getBlockLocations(); I removed the redundant getHints() from ClientProtocol and NameNode.
- Many existing tests verify the new functionality. I added one more case to TestPread, which ensures pread correctly
reads both cached and uncached blocks.
- Checked style and checked JavaDoc.
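
The caching behaviour described in the list above (use cached block locations first, ask the name-node only on a miss) might look roughly like the sketch below. It is not taken from partialBlockList.patch: CachedBlock, BlockLocationSource, BlockLocationCache and FixedBlockSource are hypothetical names, and the prefetch window is expressed in bytes here rather than as a block count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for LocatedBlock. */
final class CachedBlock {
    final long start;  // file offset of the first byte of the block
    final long length; // number of bytes in the block
    CachedBlock(long start, long length) { this.start = start; this.length = length; }
    boolean contains(long pos) { return pos >= start && pos < start + length; }
}

/** Hypothetical stand-in for the name-node side of getBlockLocations(). */
interface BlockLocationSource {
    List<CachedBlock> getBlockLocations(long offset, long length);
}

final class BlockLocationCache {
    private final TreeMap<Long, CachedBlock> cache = new TreeMap<>();
    private final BlockLocationSource source;
    private final long prefetchBytes;
    int rpcCount = 0; // number of calls made to the source, for illustration

    BlockLocationCache(BlockLocationSource source, long prefetchBytes) {
        this.source = source;
        this.prefetchBytes = prefetchBytes;
    }

    /** Returns the block containing pos, or null if pos is outside the file. */
    CachedBlock blockAt(long pos) {
        Map.Entry<Long, CachedBlock> e = cache.floorEntry(pos);
        if (e != null && e.getValue().contains(pos)) {
            return e.getValue(); // cache hit: no call to the source
        }
        rpcCount++; // cache miss: fetch a window of locations starting at pos
        for (CachedBlock b : source.getBlockLocations(pos, prefetchBytes)) {
            cache.put(b.start, b);
        }
        e = cache.floorEntry(pos);
        return (e != null && e.getValue().contains(pos)) ? e.getValue() : null;
    }
}

/** In-memory source for one file with fixed-size blocks, used only to exercise the cache. */
final class FixedBlockSource implements BlockLocationSource {
    private final long fileLength;
    private final long blockSize;
    FixedBlockSource(long fileLength, long blockSize) {
        this.fileLength = fileLength;
        this.blockSize = blockSize;
    }
    public List<CachedBlock> getBlockLocations(long offset, long length) {
        List<CachedBlock> out = new ArrayList<>();
        if (offset < 0 || offset >= fileLength || length <= 0) return out;
        long end = Math.min(fileLength, offset + length);
        for (long s = (offset / blockSize) * blockSize; s < end; s += blockSize) {
            out.add(new CachedBlock(s, Math.min(blockSize, fileLength - s)));
        }
        return out;
    }
}
```

With a 1000-byte file, 100-byte blocks and a 300-byte window, a lookup at position 0 costs one call and caches three blocks, so a later lookup at position 250 is served from the cache.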


[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Fix Version/s: 0.13.0
           Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493433 ]

Hadoop QA commented on HADOOP-894:
----------------------------------

-1, new javadoc warnings

The javadoc tool appears to have generated warning messages when testing the latest attachment http://issues.apache.org/jira/secure/attachment/12356609/partialBlockList.patch against trunk revision r534624.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/111/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/111/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493447 ]

dhruba borthakur commented on HADOOP-894:
-----------------------------------------

One issue we discussed earlier: The ClientProtocol open method used to take a path name. It was of the form:

public LocatedBlock[] open(String src)

This patch changes it to

public DFSFileInfo open(String src, long length)

The modified "open" API is not very intuitive because it takes a "length" parameter. If we want to keep the ClientProtocol elegant and simple, we might want to remove the "length" parameter from the call. The server is free to send back as many block locations as it deems fit. Typically, the server will send one or two block locations.


[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Attachment: partialBlockList2.patch

Removed JavaDoc warning. Applied to the current trunk.


[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493453 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

Yes, I can change the prototype to public DFSFileInfo open(String src) as Dhruba proposes.
But then open() will always return 10 blocks, and if we decide to implement something that requires
only one block or all blocks on open, we will not be able to optimize that.
So there is a trade-off here: functionality/flexibility vs. simplicity.
I vote for flexibility in this case.


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493476 ]

Hadoop QA commented on HADOOP-894:
----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12356728/partialBlockList2.patch applied and successfully tested against trunk revision r534624.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/112/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/112/console


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493508 ]

Doug Cutting commented on HADOOP-894:
-------------------------------------

I think it's strange to put LocatedBlocks in DFSFileInfo.  You're trying to optimize the protocol, so that a separate call isn't required to get the length, right?  So let's make that explicit by returning the file length along with the list of blocks, rather than hacking DFSFileInfo.

public class LocatedBlocks {
  private LocatedBlock[] blocks;
  private long fileLength;
}

public LocatedBlocks getBlockLocations(String file, long start, long length);

Then we don't need the open() method at all.  getBlockLocations() replaces it altogether.  This also has the benefit that someone can open a file in the middle with a single RPC.
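
To illustrate the single-RPC idea above, here is a minimal sketch in which one call returns both the block list for a byte range and the total file length. It is an assumption-laden illustration, not the committed API: SimpleBlock, LocatedBlocksSketch and FakeNameNode are hypothetical names, the fake name-node knows only one file with fixed-size blocks, and its file argument is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical block descriptor; a real LocatedBlock would also carry datanode locations. */
final class SimpleBlock {
    final long startOffset; // file offset of the block's first byte
    final long size;        // number of bytes in the block
    SimpleBlock(long startOffset, long size) {
        this.startOffset = startOffset;
        this.size = size;
    }
}

/** Mirrors the shape proposed above: a list of blocks plus the total file length. */
final class LocatedBlocksSketch {
    private final SimpleBlock[] blocks;
    private final long fileLength;

    LocatedBlocksSketch(SimpleBlock[] blocks, long fileLength) {
        this.blocks = blocks;
        this.fileLength = fileLength;
    }

    long getFileLength() { return fileLength; }
    int blockCount() { return blocks.length; }
    SimpleBlock get(int i) { return blocks[i]; }

    /** Index of the returned block containing pos, or -1 if pos lies outside the returned range. */
    int findBlock(long pos) {
        for (int i = 0; i < blocks.length; i++) {
            SimpleBlock b = blocks[i];
            if (pos >= b.startOffset && pos < b.startOffset + b.size) return i;
        }
        return -1;
    }
}

/** In-memory stand-in for the name-node, for a single file with fixed-size blocks. */
final class FakeNameNode {
    private final long fileLength;
    private final long blockSize;

    FakeNameNode(long fileLength, long blockSize) {
        this.fileLength = fileLength;
        this.blockSize = blockSize;
    }

    /** One call returns the blocks overlapping [start, start + length) and the file length. */
    LocatedBlocksSketch getBlockLocations(String file, long start, long length) {
        List<SimpleBlock> found = new ArrayList<>();
        if (start >= 0 && start < fileLength && length > 0) {
            long end = Math.min(fileLength, start + length);
            for (long off = (start / blockSize) * blockSize; off < end; off += blockSize) {
                found.add(new SimpleBlock(off, Math.min(blockSize, fileLength - off)));
            }
        }
        return new LocatedBlocksSketch(found.toArray(new SimpleBlock[0]), fileLength);
    }
}
```

A reader that starts at offset 450 of a 1000-byte file with 100-byte blocks and asks for 100 bytes gets back the blocks starting at 400 and 500 together with the file length, without a separate call for the length.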



[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493558 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

I was just about to comment that open(...) is a convenience call that combines two calls:
DFSFileInfo getListing(src) and getBlockLocations(src, 0, length).
DFSFileInfo.fileLength is the only field that is widely used in the current implementation.
So if folks can live without the other fields, such as blockSize and blockReplication, I am removing open().




[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493580 ]

Sameer Paranjpye commented on HADOOP-894:
-----------------------------------------

I don't think we should remove open() just yet.

Long term, it would be nice to have the POSIX semantics in which a file's blocks are not removed while a client holds the file open, even if the namespace entry for the file is removed. In this situation, a client calling open() on a file sets the expectation that it will need the file's data until it either calls close() or loses its lease. We'd need the open() call to track open files. I don't think getBlockLocations() alone is sufficient, although it is fine to call getBlockLocations() to get placement information for scheduling without opening the file.
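A minimal sketch of that bookkeeping, in Java, might look like the following. `OpenFileTracker` and all of its methods are hypothetical names for illustration only. The sketch keys everything by path, treats lease expiry as an implicit close(), and ignores the case where a new file is created at a deleted path.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical name-node-side bookkeeping; not the actual HDFS implementation. */
final class OpenFileTracker {
  private final Set<String> namespace = new HashSet<>();      // paths visible to clients
  private final Set<String> retainedBlocks = new HashSet<>(); // paths whose blocks still exist
  private final Map<String, Integer> openCounts = new HashMap<>();

  void create(String path) {
    namespace.add(path);
    retainedBlocks.add(path);
  }

  /** Records that a client holds the file open. */
  void open(String path) {
    if (!namespace.contains(path)) {
      throw new IllegalStateException("no such file: " + path);
    }
    openCounts.merge(path, 1, Integer::sum);
  }

  /** Called on an explicit close() or, in this sketch, when a lease expires. */
  void close(String path) {
    Integer count = openCounts.get(path);
    if (count == null) {
      throw new IllegalStateException("not open: " + path);
    }
    if (count > 1) {
      openCounts.put(path, count - 1);
      return;
    }
    openCounts.remove(path);
    // Last holder is gone: reclaim the blocks if the file was already deleted.
    if (!namespace.contains(path)) {
      retainedBlocks.remove(path);
    }
  }

  /** Removes the namespace entry; blocks survive while any client holds the file open. */
  void delete(String path) {
    namespace.remove(path);
    if (!openCounts.containsKey(path)) {
      retainedBlocks.remove(path);
    }
  }

  boolean exists(String path) { return namespace.contains(path); }

  boolean blocksRetained(String path) { return retainedBlocks.contains(path); }
}
```

The tracker can only defer block removal if it hears about every open, which is the argument for keeping a distinct open() call. A plain getBlockLocations() lookup for scheduling would not touch `openCounts`.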




[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493817 ]

Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------

I looked at HADOOP-1298. It sounds like open() will need to return more metadata than it does now.
I am planning to have DFSFileInfo open(src), with one parameter, and to remove open(src, length), as Dhruba described.
I am also planning to keep the LocatedBlock list inside DFSFileInfo.



[jira] Updated: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-894:
---------------------------------------

    Status: Open  (was: Patch Available)


[jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494094 ]

Doug Cutting commented on HADOOP-894:
-------------------------------------

When we open a file we don't need anything in return except the length, since we can call getBlockLocations() afterwards.  If we want some block locations returned from open(), as an optimization, then we should pass a start and length, giving the range of the file whose blocks we'd initially like, and return those with the length.  HADOOP-1298 will add more fields to DFSFileInfo, things we don't need when opening.  So HADOOP-1298 argues that we should not return a DFSFileInfo at open.  Also, other users of DFSFileInfo don't need a LocatedBlockList, so I really don't think it belongs there.
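As a rough illustration of this shape, the Java sketch below has open() take a start and length and return only the file length plus the blocks for that range, with later ranges fetched through getBlockLocations(). `OpenResult` and `RangedOpenSketch` are hypothetical names. Blocks are reduced to their starting offsets and are assumed to have a fixed size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result of open(): the file length plus block offsets for the requested range. */
final class OpenResult {
  final long fileLength;
  final List<Long> blockOffsets;

  OpenResult(long fileLength, List<Long> blockOffsets) {
    this.fileLength = fileLength;
    this.blockOffsets = blockOffsets;
  }
}

/** Sketch of a single file's metadata as the name node might serve it. */
final class RangedOpenSketch {
  private final long fileLength;
  private final long blockSize;

  RangedOpenSketch(long fileLength, long blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    this.fileLength = fileLength;
    this.blockSize = blockSize;
  }

  /** Returns the length and, as an optimization, the blocks for the initial range. */
  OpenResult open(long start, long length) {
    return new OpenResult(fileLength, getBlockLocations(start, length));
  }

  /** Returns the starting offsets of the blocks overlapping [start, start + length). */
  List<Long> getBlockLocations(long start, long length) {
    List<Long> offsets = new ArrayList<>();
    if (length <= 0 || start < 0 || start >= fileLength) {
      return offsets;
    }
    // Clamp the end of the range to the file length without overflowing.
    long end = (length > fileLength - start) ? fileLength : start + length;
    for (long off = (start / blockSize) * blockSize; off < end; off += blockSize) {
      offsets.add(off);
    }
    return offsets;
  }
}
```

Here the result type carries nothing but the length and the requested blocks, so fields added to DFSFileInfo later would not travel with every open().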

