Monday, May 25, 2015

Interesting stuff from Facebook DC network talks

Facebook's Data Center Fabric is like a 5-star movie: plain at first sight, lovelier every time you re-watch it.


16 servers per rack, 48 racks per pod, and 48 pods altogether, so a single Clos fabric holds 16 x 48 x 48 = 36,864 servers. Each rack gets 1 Wedge as its top-of-rack switch (presumably) [16 ports x 40 Gbps], and there is a truckload of 6-packs [128 ports x 40 Gbps, 1.92 Tbps non-blocking throughput].
For inter-pod traffic in a fully populated network, there are 4 ECMP paths (of at most 4 hops each) from any rack to any other rack. For intra-pod traffic (traffic between racks belonging to the same pod), there are 4 ECMP paths with 2 hops each.
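To double-check the numbers above, here is a minimal Python sketch of the sizing arithmetic and the intra-pod path count. It only encodes what I read off the slides. The names (`fabric_size`, `intra_pod_paths`, `fabric-switch-N`) are made up for illustration, and the idea that the 4 ECMP paths correspond to 4 uplinks per rack, one per pod-level switch, is my assumption rather than something the talk spells out here.

```python
# Sizing figures as quoted in the post.
SERVERS_PER_RACK = 16
RACKS_PER_POD = 48
PODS_PER_FABRIC = 48

# Assumption: each rack switch has one uplink per ECMP path (4 in total).
UPLINKS_PER_RACK = 4


def fabric_size(servers_per_rack=SERVERS_PER_RACK,
                racks_per_pod=RACKS_PER_POD,
                pods=PODS_PER_FABRIC):
    """Return (total racks, total servers) for one fabric."""
    racks = racks_per_pod * pods
    servers = racks * servers_per_rack
    return racks, servers


def intra_pod_paths(src_rack, dst_rack, uplinks=UPLINKS_PER_RACK):
    """List the equal-cost paths between two racks of the same pod.

    Each path is rack -> pod-level switch -> rack, i.e. 2 hops.
    The switch names are hypothetical labels, not real device names.
    """
    return [(src_rack, f"fabric-switch-{i}", dst_rack)
            for i in range(1, uplinks + 1)]


if __name__ == "__main__":
    racks, servers = fabric_size()
    print(f"racks: {racks}, servers: {servers}")
    for path in intra_pod_paths("rack-1", "rack-2"):
        hops = len(path) - 1
        print(" -> ".join(path), f"({hops} hops)")
```

Changing the constants at the top is an easy way to play with "what if" variations, such as more servers per rack or more uplinks per rack switch.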


From the look of it,
1. Each server seems to connect to the network via a single NIC only (i.e. no network connection redundancy). (I think we can pretty much presume commodity-grade server hardware is used as well.) Therefore, redundancy is definitely built in at the software layer, making do entirely without server-level redundancy. (I do not have hard evidence here, but these guys may not even use RAID on the servers.)
2. One single protocol, BGP. (Hah! Say it again!) At first, I thought the detail was skimmed over, but a later slide shows that the 6-pack has only minimal routing functionality. And I was like, ummm, how the hell do they do that? And I am curious, which is probably a good thing. Is this actually possible? What is the convergence time? Does it really matter given the design? A good source I turned to only suggested that I should think about whether an IGP element (OSPF/IS-IS) is absolutely required.

Assuming we want to improve on this design, what can we improve, and why? (Wow, that makes it even more interesting, doesn't it? :P)
References:
- Attached screenshot taken from Facebook's Data Center Fabric, https://www.youtube.com/watch?v=kcI3fGEait0
- See also, Introducing 6-pack, the First Open Hardware Modular Switch 

- See also a walk-through of the iterations of the design, Datacenter Networking @ Facebook, https://www.youtube.com/watch?v=xC461XfmI0E (This one explicitly states that BGP is the only protocol used in the fabric)

See Also:

- Use of BGP for routing in large-scale data centers, draft-ietf-rtgwg-bgp-routing-large-dc-02, https://tools.ietf.org/html/draft-ietf-rtgwg-bgp-routing-large-dc-02