Network Topology in Hadoop

Stu Hood-2
Hey guys,

I've been reading up on DHTs quite a bit recently, and have come across the Vivaldi "topology-aware structured overlay" in a few places. Vivaldi assigns each node a coordinate in a 2- or 3-dimensional space. Every time a node communicates with another node, it uses the round-trip time to adjust its own coordinate. By looking at the distance between two nodes' coordinates, you can estimate the relative connection quality between them. (I'm probably explaining it all wrong. See this paper instead: http://portal.acm.org/citation.cfm?id=1272980.1272985&coll=GUIDE&dl=ACM&CFID=15151515&CFTOKEN=6184618 )
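
To make the update step concrete, here is a rough sketch of a single coordinate adjustment. It is a simplification with a constant step size, not the full algorithm from the paper, and the function name, the `delta` parameter, and the sample values are all made up for illustration.

```python
import math
import random


def vivaldi_update(own, other, rtt, delta=0.25):
    """Return a new coordinate for `own` after one RTT sample to `other`.

    Simplified, constant-step sketch: the node moves along the line through
    the two coordinates so that their distance gets closer to the measured RTT.
    """
    diff = [a - b for a, b in zip(own, other)]
    predicted = math.sqrt(sum(d * d for d in diff))
    if predicted == 0.0:
        # Coincident coordinates have no direction, so pick a random one.
        diff = [random.uniform(-1.0, 1.0) for _ in own]
    norm = math.sqrt(sum(d * d for d in diff)) or 1.0
    unit = [d / norm for d in diff]
    # Positive error: the RTT is larger than predicted, so move away.
    # Negative error: the RTT is smaller than predicted, so move closer.
    error = rtt - predicted
    return [a + delta * error * u for a, u in zip(own, unit)]


def distance(a, b):
    """Euclidean distance between two coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    me, peer = [0.0, 0.0], [3.0, 4.0]
    print("predicted before:", distance(me, peer))
    me = vivaldi_update(me, peer, rtt=10.0)
    print("predicted after: ", distance(me, peer))
```

Repeating this for every measured round trip lets the coordinates drift toward positions whose distances approximate the observed latencies.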

In Hadoop's case, namenodes and jobtrackers could use the Vivaldi coordinates of datanodes to either minimize or maximize physical proximity, depending on the goal (for example, keeping work close to its data versus spreading replicas apart). I know some work has been going on to integrate rack-awareness, so I just thought I'd point out the possibility of a self-managing solution in case you guys weren't aware of it.

Stu Hood
Webmail.us
"You manage your business. We'll manage your email."®

RE: Network Topology in Hadoop

Devaraj Das
See
https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
for an overview of the network topology in Hadoop. The distance between two
nodes is calculated by summing their distances to their closest common
ancestor.
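
As a minimal sketch of that calculation (not the actual Hadoop code), assume
each node's location is written as a slash-separated path from the root of
the tree. The function name and the paths such as /d1/r1/n1 are hypothetical.

```python
def topology_distance(path_a, path_b):
    """Sum of each node's hop count up to the closest common ancestor.

    Assumes locations are slash-separated paths from the root of the
    topology tree, e.g. "/datacenter/rack/node" (an illustrative layout).
    """
    a = [part for part in path_a.split("/") if part]
    b = [part for part in path_b.split("/") if part]
    # Count how many leading path components the two nodes share.
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Hops from each node up to the shared ancestor, added together.
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    print(topology_distance("/d1/r1/n1", "/d1/r1/n1"))  # same node
    print(topology_distance("/d1/r1/n1", "/d1/r1/n2"))  # same rack
    print(topology_distance("/d1/r1/n1", "/d1/r2/n3"))  # same data center
    print(topology_distance("/d1/r1/n1", "/d2/r3/n4"))  # different data centers
```

With three-level paths like these, the result grows by two for each level the
nodes have to climb before they meet.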

For the management of the location of a node, we are moving to the approach
outlined in HADOOP-1985. So it is not self-managing at this point, but it is
quite precise.

> -----Original Message-----
> From: Stu Hood [mailto:[hidden email]]
> Sent: Thursday, November 15, 2007 6:25 AM
> To: [hidden email]
> Subject: Network Topology in Hadoop