HDFS Short-Circuit Local Reads


Dejan Menges
Hi,

We have been using HDP 2.1 for quite some time now (still, until Monday), with short-circuit (SC) local reads enabled the whole time. In the beginning we followed the Hortonworks recommendations and set the SC cache size to 256, with the default 5 minutes to invalidate entries, and that's where the problems started.
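(For reference, these are the two client-side settings I mean, as they appear in hdfs-site.xml; you can check the effective values on a node with hdfs getconf, in case anyone wants to compare:)

    # defaults are 256 entries and 300000 ms (5 minutes)
    hdfs getconf -confKey dfs.client.read.shortcircuit.streams.cache.size
    hdfs getconf -confKey dfs.client.read.shortcircuit.streams.cache.expiry.ms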

At some point we started using multigets, and after a very short time they started timing out on our side. We played with different timeouts, and Graphite (metric hbase.regionserver.RegionServer.get_mean) showed that the load on three nodes, out of all the others, had increased drastically. Looking through the logs, googling, and going through the documentation over and over again, we found a discussion saying the SC cache should be no lower than 4096. After setting it to 4096, our problem was solved. For some time.

At some point our data usage patterns changed, and since we already had monitoring for this, we saw the multigets start timing out again; the monitoring showed the timeouts were on two nodes where the number of open sockets was ~3-4k per node, while on all the others it was 400-500. Narrowing this down a bit, we found some strangely oversized regions, did some splitting and some manual merges, and HBase distributed the data around, but the issue was still there. And then I found the next three things (here's where the questions come in):

- With a cache size of 4096 and a 300000 ms cache expiry timeout, we saw this error in the logs exactly every ten minutes:

2015-06-18 14:26:07,093 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error creating DomainSocket
2015-06-18 14:26:07,093 WARN org.apache.hadoop.hdfs.client.ShortCircuitCache: ShortCircuitCache(0x3d1dc8c9): failed to load 1109699858_BP-1988583858-172.22.5.40-1424448407690
--
2015-06-18 14:36:07,135 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error creating DomainSocket
2015-06-18 14:36:07,136 WARN org.apache.hadoop.hdfs.client.ShortCircuitCache: ShortCircuitCache(0x3d1dc8c9): failed to load 1109704764_BP-1988583858-172.22.5.40-1424448407690
--
2015-06-18 14:46:07,137 WARN org.apache.hadoop.hdfs.BlockReaderLocal: error creating DomainSocket
2015-06-18 14:46:07,138 WARN org.apache.hadoop.hdfs.client.ShortCircuitCache: ShortCircuitCache(0x3d1dc8c9): failed to load 1105787899_BP-1988583858-172.22.5.40-1424448407690

- After increasing the SC cache to 8192 (since on those couple of nodes that were getting up to 5-7k, 4096 obviously wasn't enough):
    - Our multigets no longer take 20-30 seconds, but again complete within 5 seconds, which is our client timeout.
    - netstat -tanlp | grep -c 50010 now shows ~2800 open local SC connections on every node.
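(That count is per node; to compare across the cluster we run roughly the loop below from an admin box, hostnames being placeholders:)

    # count connections to the DataNode port (50010) on each node
    for h in node1 node2 node3; do
      printf '%s: ' "$h"
      ssh "$h" "netstat -tanlp 2>/dev/null | grep -c 50010"
    done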

Why would those errors be logged exactly every 10 minutes with a 4096 cache size and a 5-minute expiry timeout?

Why would increasing the SC cache also 'balance' the number of open SC connections across all nodes?

Am I right that hbase.regionserver.RegionServer.get_mean shows the mean number of gets per unit of time, not the time needed to perform a get? If I'm right, then the increase we saw means gets got faster in our case. If I'm wrong, it means gets got slower, yet our multigets sped up, which is what's been twisting my brain after narrowing this down for a week.
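(For anyone who wants to double-check that metric, it should also be visible on the RegionServer's JMX servlet, where counters and latencies show up as separate attributes, e.g. Get_num_ops vs. Get_mean, if I read it right; the host below is a placeholder and 60030 is the default info port on 0.98:)

    curl -s 'http://regionserver-host:60030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Server'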

How should the cache size and the expiry timeout correlate with each other?

Thanks a lot!