I was looking at the JMX metrics from one of my DataNodes and noticed that dividing BytesWritten by TotalWriteTime gives about 1000 MB/sec. However, dd can only write to the disk that stores HDFS data on that node at about 95 MB/sec.
To figure out what's going on, I uploaded some data to HDFS and checked the value of BytesWritten before and after. It changed by the expected amount (roughly the size of the data uploaded), so that number seems fine. I only have 3 DataNodes and a replication factor (RF) of 3, so all of the data is written to every DataNode during an upload.
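For reference, a minimal sketch of the before/after calculation described above. It assumes BytesWritten is in bytes and TotalWriteTime is in milliseconds, and that "MB" means 10^6 bytes. The function name and the sample snapshot values are hypothetical, chosen only to illustrate the arithmetic; they are not readings from the cluster.

```python
def write_throughput_mb_per_sec(before, after):
    """Apparent write throughput between two JMX metric snapshots.

    Assumes BytesWritten is in bytes and TotalWriteTime is in milliseconds.
    """
    delta_bytes = after["BytesWritten"] - before["BytesWritten"]
    delta_ms = after["TotalWriteTime"] - before["TotalWriteTime"]
    if delta_ms <= 0:
        raise ValueError("TotalWriteTime did not advance between snapshots")
    # Convert bytes to MB (10^6 bytes) and milliseconds to seconds.
    return (delta_bytes / 1_000_000) / (delta_ms / 1000)


# Illustrative values only: 1 GB written while TotalWriteTime advanced by 1 s.
before = {"BytesWritten": 5_000_000_000, "TotalWriteTime": 4_000}
after = {"BytesWritten": 6_000_000_000, "TotalWriteTime": 5_000}
print(write_throughput_mb_per_sec(before, after))
```

Using deltas between two snapshots, rather than the cumulative counters, limits the result to the upload being measured.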
So either the value of TotalWriteTime is too low, or I'm misunderstanding what it means. My understanding of BlockReceiver#receivePacket is that TotalWriteTime measures the cumulative time the DataNode spends writing to its local disk. Is this correct? If so, what could explain its value being so low that the apparent write throughput is roughly 10x what the disk is actually capable of?