When is DataNode 'bad'?

Dejan Menges

From time to time I see some reduce tasks failing with this:

Error: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
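For reference, hdfs-default.xml documents the DEFAULT policy this way (it applies only when dfs.client.block.write.replace-datanode-on-failure.enable is true). Let r be the replication number and n the number of existing datanodes. A new datanode is added only if r >= 3 and either floor(r/2) >= n, or r > n and the block is hflushed/appended. Below is a small sketch that restates that documented condition. The class and method names are hypothetical and are not Hadoop's own implementation.

```java
/**
 * Hypothetical model of the DEFAULT datanode-replacement rule as documented for
 * dfs.client.block.write.replace-datanode-on-failure.policy in hdfs-default.xml.
 * Not Hadoop's own class; it only restates the documented condition.
 */
public class DefaultReplacePolicySketch {

    /**
     * @param replication        r, the file's replication factor
     * @param existingDatanodes  n, datanodes still in the pipeline after the failure
     * @param hflushedOrAppended whether the stream has been hflushed or is an append
     * @return true if the DEFAULT policy would try to add a replacement datanode
     */
    static boolean shouldAddDatanode(int replication, int existingDatanodes,
                                     boolean hflushedOrAppended) {
        if (replication < 3) {
            return false; // DEFAULT never adds a datanode when r < 3
        }
        // Integer division gives floor(r/2) for non-negative r
        boolean halfOrFewerLeft = replication / 2 >= existingDatanodes;
        boolean hflushOrAppendCase = replication > existingDatanodes && hflushedOrAppended;
        return halfOrFewerLeft || hflushOrAppendCase;
    }

    public static void main(String[] args) {
        // r = 3, one datanode lost (n = 2), plain write: no replacement attempted
        System.out.println(shouldAddDatanode(3, 2, false)); // false
        // r = 3, one datanode lost (n = 2), stream was hflushed: replacement attempted
        System.out.println(shouldAddDatanode(3, 2, true));  // true
        // r = 3, two datanodes lost (n = 1): replacement attempted
        System.out.println(shouldAddDatanode(3, 1, false)); // true
    }
}
```

Under this rule, with replication three and a single failed datanode, a replacement is only attempted if the stream was hflushed or appended. Otherwise the write simply continues on the two remaining datanodes.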

I don't see any issues in HDFS during this period. For example, I checked the logs on the specific node where this happened, and the only thing going on at that point was pipeline recovery.

So I'm not quite sure how there can be no more good datanodes in a cluster of 15 nodes with a replication factor of three.

Also, regarding http://blog.cloudera.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/: there is a parameter called dfs.client.block.write.replace-datanode-on-failure.best-effort that I currently can't find. From which Hadoop version can this parameter be used, and how much sense does it make to use it to avoid issues like the one above?
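As described in hdfs-default.xml, best-effort means the client still tries to replace a failed datanode in the write pipeline when the policy calls for it, but continues the write if that replacement also fails. The rough model below shows how the flag changes the outcome. The enum and method names are hypothetical illustrations and are not Hadoop APIs.

```java
/**
 * Hypothetical model of how the documented best-effort flag
 * (dfs.client.block.write.replace-datanode-on-failure.best-effort)
 * changes the outcome after a pipeline failure. Names are illustrative only.
 */
public class BestEffortSketch {

    enum Outcome { CONTINUE_WITH_REPLACEMENT, CONTINUE_WITHOUT_REPLACEMENT, FAIL_WRITE }

    static Outcome onPipelineFailure(boolean policySaysReplace,
                                     boolean replacementFound,
                                     boolean bestEffort) {
        if (!policySaysReplace) {
            // Policy does not ask for a new datanode: keep writing to the survivors
            return Outcome.CONTINUE_WITHOUT_REPLACEMENT;
        }
        if (replacementFound) {
            return Outcome.CONTINUE_WITH_REPLACEMENT;
        }
        // Replacement was required but failed: best-effort keeps the write going,
        // otherwise the client raises an IOException like the one quoted above
        return bestEffort ? Outcome.CONTINUE_WITHOUT_REPLACEMENT : Outcome.FAIL_WRITE;
    }

    public static void main(String[] args) {
        System.out.println(onPipelineFailure(true, false, false)); // FAIL_WRITE
        System.out.println(onPipelineFailure(true, false, true));  // CONTINUE_WITHOUT_REPLACEMENT
    }
}
```

The trade-off is that a best-effort write can finish with fewer live replicas than requested and rely on the NameNode to re-replicate later.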

This is on Hadoop 2.4 (Hortonworks 2.1), and we are currently preparing an upgrade to 2.2. I'm not sure whether this is a known issue or something I'm not getting.

Thanks a lot,