[jira] Created: (HADOOP-855) HDFS should repair corrupted files

[jira] Created: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org
HDFS should repair corrupted files
----------------------------------

                 Key: HADOOP-855
                 URL: https://issues.apache.org/jira/browse/HADOOP-855
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: Wendy Chien
         Assigned To: Wendy Chien


While reading, if we discover a mismatch between a block and its checksum, we want to report this back to the namenode so that it can delete the corrupted block or crc.

To implement this, we need to do the following:
DFSInputStream
1. move DFSInputStream out of DFSClient
2. add a member variable to keep track of the current datanode (the chosen node)

DistributedFileSystem
1. change the reportChecksumFailure parameter crc from int to FSInputStream (needed to be able to delete it).
2. determine the specific block and datanode from the DFSInputStream passed to reportChecksumFailure
3. call the namenode to delete the block/crc via DFSClient

ClientProtocol
1. add a method to ask the namenode to delete certain blocks on a specific datanode.

Namenode
1. add the ability to delete certain blocks on a specific datanode
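
A rough sketch of the ClientProtocol addition might look like the following (the method name and parameter shape are assumptions for illustration; the discussion below settles on a different name before the patch is committed):

    // Hypothetical shape of the new client-to-namenode call: the client
    // names the corrupt blocks and the datanode they were read from.
    // Illustrative only, not the committed signature.
    void deleteBlocks(Block[] blocks, DatanodeInfo datanode) throws IOException;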

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462257 ]

Doug Cutting commented on HADOOP-855:
-------------------------------------

Mostly this sounds good to me.

> 1. change reportChecksumFailure parameter crc from int to FSInputStream

I'm confused by this one.  There's already an FSInputStream parameter.  In the DistributedFileSystem implementation of this method, one can cast this to DFSInputStream and then access whatever implementation-specific state is needed (like the datanode where the block in question resides).  So I see no need to alter the reportChecksumFailure signature.
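
For illustration, the cast might look roughly like this inside DistributedFileSystem's reportChecksumFailure (a sketch; getCurrentDatanode() is a hypothetical accessor standing in for whatever state the patch exposes, and DFSInputStream is still an inner class of DFSClient at this point):

    // Sketch only: recover the DFS-specific stream type to reach
    // implementation state such as the datanode currently being read from.
    DFSClient.DFSInputStream dfsIn = (DFSClient.DFSInputStream) in;
    DatanodeInfo badNode = dfsIn.getCurrentDatanode();  // hypothetical accessor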


[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462301 ]

Wendy Chien commented on HADOOP-855:
------------------------------------

What I meant by that change is to pass in two FSInputStreams (and range information). We have an FSInputStream for the data file, but we also need one for the checksum file so that we can delete the corrupted checksum block as well.

In my first pass, when the checksum doesn't match, both the data and checksum blocks will be deleted.  Should we also try to figure out which of the two was corrupt?   Will the extra effort be worth the gain in not rewriting a block that was not actually corrupt?
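
For concreteness, the widened hook might look like this, with one stream and position for the data file and one for the checksum file (a sketch; parameter names are assumptions, and the base FileSystem implementation would presumably attempt no repair):

    // Sketch of the two-stream reportChecksumFailure variant.
    // in/inPos locate the corrupt data block; sums/sumsPos the crc block.
    public boolean reportChecksumFailure(Path f,
                                         FSInputStream in, long inPos,
                                         FSInputStream sums, long sumsPos) {
      return false;  // default: nothing repaired; DFS overrides this
    }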

 


[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462306 ]

Doug Cutting commented on HADOOP-855:
-------------------------------------

> we also need one for the checksum file

Ah, I get it.  That sounds reasonable.  Thanks for clarifying!

> Should we also try to figure out which of the two was corrupt?

I don't think it's worth it, at least not for the first pass.

What should we do if the replication level is one for the corrupt block?  Keep it, I think.

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462310 ]

Wendy Chien commented on HADOOP-855:
------------------------------------

If the replication level is one, then we will keep the corrupt block and report the error.  

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462350 ]

Hairong Kuang commented on HADOOP-855:
--------------------------------------

DFSInputStream also needs a method to return the current block. We could either keep a member variable that tracks the current block or calculate it from the member variables pos and blocks.
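
For illustration, the "calculate it" option might look roughly like this inside DFSInputStream, assuming a blocks array of LocatedBlock and the current read position pos (a sketch only; as the next comment notes, the patch ends up tracking the block in a member variable instead):

    // Sketch: walk the per-file block list, accumulating block lengths,
    // until we pass the current read position.
    private LocatedBlock currentBlock() {
      long offset = 0;
      for (int i = 0; i < blocks.length; i++) {
        offset += blocks[i].getBlock().getNumBytes();
        if (pos < offset) {
          return blocks[i];
        }
      }
      return blocks[blocks.length - 1];  // pos is at or past EOF
    }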

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463446 ]

Wendy Chien commented on HADOOP-855:
------------------------------------

I've added a member variable to keep track of the current block.  

[jira] Updated: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wendy Chien updated HADOOP-855:
-------------------------------

    Attachment: hadoop-855-5.patch

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463635 ]

Doug Cutting commented on HADOOP-855:
-------------------------------------

We should probably call the protocol method something like invalidateBlock() rather than deleteBlock(), since it only deletes the block if it has replicas.

Also, does this issue include HADOOP-731 or not?  If so, then it needs more work.  If not, we should re-open HADOOP-731.

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463645 ]

Sameer Paranjpye commented on HADOOP-855:
-----------------------------------------

This patch does not include HADOOP-731; I've re-opened it.

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463737 ]

Hairong Kuang commented on HADOOP-855:
--------------------------------------

1. src/java/org/apache/hadoop/dfs/NameNode.java
     In deleteBlocks, I would not enforce one location per block.
2. src/java/org/apache/hadoop/dfs/FSNamesystem.java
     In deleteBlocks, should we make sure that the remaining datanodes containing the block include at least one datanode that is neither decommissioned nor decommissioning when deciding whether a block should be deleted? We also need to check whether the block is under-replicated before adding it to neededReplications, and whether the block should be taken out of excessReplicateMap.
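
A sketch of the liveness check suggested in point 2, assuming the namesystem can enumerate the datanodes holding a block (containingNodes() is illustrative bookkeeping, and the decommission accessors follow DatanodeInfo's naming):

    // Sketch: only invalidate the reported replica if at least one other
    // replica sits on a node that is neither decommissioned nor being
    // decommissioned.
    private boolean okToInvalidate(Block blk, DatanodeDescriptor reported) {
      for (Iterator it = containingNodes(blk); it.hasNext();) {
        DatanodeDescriptor node = (DatanodeDescriptor) it.next();
        if (node != reported
            && !node.isDecommissioned()
            && !node.isDecommissionInProgress()) {
          return true;   // a healthy replica remains elsewhere
        }
      }
      return false;      // keep the corrupt copy and report the error
    }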

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463770 ]

Wendy Chien commented on HADOOP-855:
------------------------------------

Thanks Doug and Hairong for looking over the patch.

a. I'll change the name from deleteBlocks to invalidateBlocks.
b. I'll remove the one-location-per-block enforcement in NameNode.
c. I'm going to call removeStoredBlock to update the data structures in invalidateBlock, since we need to do everything it does (it updates blocksMap, neededReplications, and excessReplicateMap).
d. I agree with Hairong that we should not delete the corrupt copy if it is the only one on a live node, in case the decommissioned nodes are taken down. I will implement it this way unless people disagree.

[jira] Updated: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wendy Chien updated HADOOP-855:
-------------------------------

    Attachment: hadoop-855-7.patch

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464382 ]

Wendy Chien commented on HADOOP-855:
------------------------------------

I've attached another patch which includes the points from my previous comment, with one change: instead of invalidateBlocks, I called it reportBadBlocks, because others pointed out that the namenode can choose to do whatever it wants with this information from the client, not necessarily delete or invalidate the blocks.
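
The renamed hook might look like this on ClientProtocol (a sketch; the parameter shape is an assumption, with each LocatedBlock pairing a bad block with the datanode it was read from):

    // Sketch: the client only reports the failure; the namenode decides
    // whether to invalidate, re-replicate, or ignore.
    void reportBadBlocks(LocatedBlock[] blocks) throws IOException;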


[jira] Updated: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wendy Chien updated HADOOP-855:
-------------------------------

    Status: Patch Available  (was: Open)

[jira] Commented: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464389 ]

Hadoop QA commented on HADOOP-855:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12348878/hadoop-855-7.patch applied and successfully tested against trunk revision r495045.

[jira] Updated: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-855:
--------------------------------

    Fix Version/s: 0.11.0
           Status: Open  (was: Patch Available)

Wendy,

This patch has fallen out of date, since HADOOP-803 and HADOOP-842 were committed.  Can you please update it?  Thanks!

Doug

[jira] Updated: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wendy Chien updated HADOOP-855:
-------------------------------

    Attachment: hadoop-855-9.patch

[jira] Updated: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wendy Chien updated HADOOP-855:
-------------------------------

    Status: Patch Available  (was: Open)

Updated the patch.

[jira] Updated: (HADOOP-855) HDFS should repair corrupted files

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-855:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Wendy!
