File Checksum Error


File Checksum Error

Oussama Jilal

Hello,

This is originally an HBase issue, but please allow me to forward it to the Hadoop mailing list, since the error involves the checksum of a file in HDFS...

We are facing a serious issue with our production system. We are on a Windows Azure infrastructure, and yesterday, unexpectedly, all of our VMs restarted... This caused our HBase cluster (1 Master + 2 Region Servers) to crash...

Now, whenever I try to start our HBase cluster, it goes down again. I checked the HBase logs, and this is what I found in the master logs:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2016-07-28 09:33:10,052 WARN  [main-EventThread] coordination.SplitLogManagerCoordination: Error splitting /hbase/splitWAL/WALs%2Fdatanode-2%2C16020%2C1466263181091-splitting%2Fdatanode-2%252C16020%252C1466263181
091.default.1469654596681
2016-07-28 09:33:10,052 WARN  [MASTER_SERVER_OPERATIONS-NameNode:16000-3] master.SplitLogManager: error while splitting logs in [hdfs://namenode/hbase/WALs/datanode-2,16020,1466263181091-splitting] installed = 1
but only 0 done
2016-07-28 09:33:10,053 ERROR [MASTER_SERVER_OPERATIONS-NameNode:16000-3] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for datanode-2,16020,1466263181091, will retry
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:357)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:220)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://namenode/hbase/WALs/datanode-2,16020,1466263181091-splitting] Task = installed = 1 done = 0 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:391)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:364)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:286)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:213)
        ... 4 more
2016-07-28 09:33:10,055 FATAL [MASTER_SERVER_OPERATIONS-NameNode:16000-3] master.HMaster: Master server abort: loaded coprocessors are: []
2016-07-28 09:33:10,055 FATAL [MASTER_SERVER_OPERATIONS-NameNode:16000-3] master.HMaster: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for datanode-2,16020,1466263181091, will retry
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:357)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:220)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://namenode/hbase/WALs/datanode-2,16020,1466263181091-splitting] Task = installed = 1 done = 0 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:391)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:364)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:286)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:213)
        ... 4 more
2016-07-28 09:33:10,055 INFO  [MASTER_SERVER_OPERATIONS-NameNode:16000-3] regionserver.HRegionServer: STOPPED: Caught throwable while processing event M_SERVER_SHUTDOWN
2016-07-28 09:33:10,055 ERROR [MASTER_SERVER_OPERATIONS-NameNode:16000-2] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: Server is stopped
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:194)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

And this is what I found in one of the Region Server logs:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2016-07-28 09:33:10,033 WARN  [RS_LOG_REPLAY_OPS-datanode-2:16020-0] regionserver.SplitLogWorker: log splitting of WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681 failed, returning error
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 file=/hbase/WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:882)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:563)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:793)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:298)
        at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267)
        at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:839)
        at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:763)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:235)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
        at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Now, the file "/hbase/WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681" does exist on HDFS, and I have run "hdfs fsck /", which reports "The filesystem under path '/' is HEALTHY". But when I try to access the file, bad things happen:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
16/07/28 09:57:30 WARN hdfs.DFSClient: Found Checksum error for BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] at 5120
16/07/28 09:57:30 WARN hdfs.DFSClient: Found Checksum error for BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK] at 5120
16/07/28 09:57:30 INFO hdfs.DFSClient: Could not obtain BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from any node: java.io.IOException: No live nodes contain block BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 after checking nodes = [DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK], DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK] Dead nodes:  DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK]. Will get new block locations from namenode and retry...
16/07/28 09:57:30 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2671.5631767728437 msec.
16/07/28 09:57:33 WARN hdfs.DFSClient: Found Checksum error for BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] at 5120
16/07/28 09:57:33 WARN hdfs.DFSClient: Found Checksum error for BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK] at 5120
16/07/28 09:57:33 INFO hdfs.DFSClient: Could not obtain BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from any node: java.io.IOException: No live nodes contain block BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 after checking nodes = [DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK], DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK] Dead nodes:  DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK]. Will get new block locations from namenode and retry...
16/07/28 09:57:33 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 8977.03196466135 msec.
16/07/28 09:57:42 WARN hdfs.DFSClient: Found Checksum error for BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] at 5120
16/07/28 09:57:42 WARN hdfs.DFSClient: Found Checksum error for BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK] at 5120
16/07/28 09:57:42 INFO hdfs.DFSClient: Could not obtain BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 from any node: java.io.IOException: No live nodes contain block BP-304127416-10.0.0.7-1465905487911:blk_1073770935_30121 after checking nodes = [DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK], DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK] Dead nodes:  DatanodeInfoWithStorage[10.0.0.9:50010,DS-be379543-51da-481b-ad25-2259d8fbe479,DISK] DatanodeInfoWithStorage[10.0.0.8:50010,DS-b5d3244e-3ccb-4c5f-831d-4c300e59ee74,DISK]. Will get new block locations from namenode and retry...
16/07/28 09:57:42 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 13504.266287461855 msec.
cat: Checksum error: /hbase/WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681 at 5120 exp: 123018583 got: 48747596
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
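
As a side note, the "HEALTHY" result is perhaps not surprising: as far as I understand, fsck only checks the block reports held by the NameNode and does not verify the on-disk checksums of each replica. One way to inspect the blocks and replica locations of that particular file would be something like:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
hdfs fsck /hbase/WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681 -files -blocks -locations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------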

Any help in recovering our HBase cluster would be appreciated...

Re: File Checksum Error

Oussama Jilal
In case anyone faces the same issue... I managed to fix it by running:

hadoop fs -get -ignoreCrc hdfs://namenode/hbase/WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681 datanode-2,16020,1466263181091.default.1469654596681

then:

hadoop fs -rm /hbase/WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681

and then:

hadoop fs -put datanode-2,16020,1466263181091.default.1469654596681 /hbase/WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681

And finally starting HBase again... which basically means copying the problematic file out of HDFS (ignoring the checksum) and then putting it back! Phew! (It might be risky... but it worked...)
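
For reference, here is the whole workaround as a rough shell sketch (the variable names are only for readability, and the final re-read is just a sanity check). Note that this only makes HDFS compute fresh checksums when the file is written back; it does not repair any corrupted bytes in the data itself:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
WAL=/hbase/WALs/datanode-2,16020,1466263181091-splitting/datanode-2%2C16020%2C1466263181091.default.1469654596681
LOCAL=datanode-2,16020,1466263181091.default.1469654596681

# 1. Copy the file out of HDFS, skipping client-side checksum verification
hadoop fs -get -ignoreCrc "hdfs://namenode$WAL" "$LOCAL"

# 2. Remove the corrupt copy from HDFS
hadoop fs -rm "$WAL"

# 3. Write the local copy back; checksums are regenerated on write
hadoop fs -put "$LOCAL" "$WAL"

# 4. Sanity check: reading the file should no longer report a checksum error
hadoop fs -cat "$WAL" > /dev/null && echo "re-read OK"
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------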

Best Regards.
