I think HDFS already makes every effort to fill up all Datanodes uniformly.
An imbalance arises when a large set of Datanodes is added to an existing
cluster. In this case one possible approach would be to write a tool that
does the following:
1. increase the replication factor of each file. This will automatically
create a new replica on nodes that have more free disk space, and
2. then decrease the replication factor of the file back to its original
value. The HDFS code will automatically select the replica on the most-full
node for deletion (see HADOOP-1300).
The tool could take a set of HDFS directories as input and then perform the
above two steps on all files (recursively) under those directories.
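As a rough, untested sketch of what such a tool might look like (it assumes
a recent org.apache.hadoop.fs.FileSystem API -- older releases use isDir()
instead of isDirectory() -- and the class name is just illustrative; a real
tool would also wait for the extra replica to be fully created before
shrinking again):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical rebalancing helper: bump each file's replication factor by
 * one, then restore the original factor so that HDFS drops the replica on
 * the most-full node (per HADOOP-1300).
 */
public class ReplicationRebalancer {

  public static void rebalance(FileSystem fs, Path path) throws IOException {
    FileStatus status = fs.getFileStatus(path);
    if (status.isDirectory()) {
      // Recurse into subdirectories and files.
      for (FileStatus child : fs.listStatus(path)) {
        rebalance(fs, child.getPath());
      }
    } else {
      short original = status.getReplication();
      // Step 1: request one extra replica; HDFS should place it on a
      // node with more free disk space.
      fs.setReplication(path, (short) (original + 1));
      // NOTE: setReplication returns immediately; in practice you would
      // poll until the extra replica actually exists. Omitted here.
      // Step 2: restore the original factor; HDFS selects the replica
      // on the most-full node for deletion.
      fs.setReplication(path, original);
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    for (String dir : args) {
      rebalance(fs, new Path(dir));
    }
  }
}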
Will this approach address your issue?
From: Dennis Kubes [mailto:[hidden email]]
Sent: Wednesday, May 16, 2007 9:11 AM
To: [hidden email]
Subject: Redistribute blocks evenly across DFS
Is there a way to redistribute blocks evenly across all DFS nodes? If
not, I would be happy to program a tool to do so, but I would need a
little guidance on how to do it.