Today we discuss networking in the container world, primarily in the context of K8s. We are not covering policies and isolation, only how L2 and L3 play a role in packet flows.
Flannel is an overlay network mechanism, whereas Calico is basically a pure L3 play.
Flannel works by using a vxlan device in conjunction with a software switch such as a Linux bridge or OVS.
When container A tries to reach container B on a different host, the traffic is pushed to the bridge on host A via the veth pair. The bridge then uses ARP to try to resolve the MAC of container B. Since container B is not on this host, the bridge forwards the traffic to the vxlan device, where each frame is wrapped in an L3 packet and carried over the physical network using UDP. (With flannel's older UDP backend the device is a software TUN device and the flannel daemon does the wrapping in user space; with the vxlan backend the kernel does it.) A VXLAN network identifier (VNI) is also added to each packet, which is what allows the traffic of different tenants to be kept apart.
Figure: Flannel packet flow.
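To make the wrapping step a bit more concrete, here is a minimal Python sketch of VXLAN framing as laid out in RFC 7348: an 8-byte header carrying a 24-bit VNI is put in front of the original L2 frame, and the result becomes the UDP payload. The function names, the VNI value and the frame bytes are made up for illustration; this is not flannel's code, and the outer UDP/IP headers are left out.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Layout: flags (8 bits), reserved (24 bits), VNI (24 bits), reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Return the UDP payload: VXLAN header followed by the original L2 frame."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes) -> Tuple[int, bytes]:
    """Split a UDP payload back into (vni, inner_frame)."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, payload[8:]


# Hypothetical inner frame: dst MAC, src MAC, EtherType (IPv4), then payload bytes.
frame = bytes.fromhex("aabbccddeeff" "112233445566" "0800") + b"container payload"
udp_payload = encapsulate(frame, vni=1)
```

The receiving host does the reverse: it strips the header, reads the VNI, and hands the inner frame to the matching vxlan device.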
In the case of Calico, the approach is a little different. Calico works at Layer 3 and depends on Linux routing to move the packets.
Calico injects a default route inside the container that points to a gateway at the IP 169.254.1.1:
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
What this means is that any traffic from the container first tries to go to the default gateway IP. Since the gateway IP is reachable directly on eth0, an ARP request is sent out of eth0 to determine the MAC address of the gateway IP.
The trick here is proxy ARP, which is configured on the host side of the veth pair. The host-side veth device answers the ARP request for 169.254.1.1 with its own MAC.
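The container-side part of this can be sketched in a few lines of Python. The sketch below models the two routes from the listing above, picks the next hop by longest-prefix match, and then models the proxy ARP answer in a deliberately simplified way (the real kernel decides whether to answer based on its own routing table). The function names, the destination address and the MAC value are assumptions for illustration only.

```python
import ipaddress

# The two routes Calico installs inside the container (from the listing above).
CONTAINER_ROUTES = [
    {"dst": ipaddress.ip_network("0.0.0.0/0"),
     "via": ipaddress.ip_address("169.254.1.1"), "dev": "eth0"},
    {"dst": ipaddress.ip_network("169.254.1.1/32"),
     "via": None, "dev": "eth0"},  # "scope link": directly reachable on eth0
]


def next_hop(dst, routes):
    """Longest-prefix match; returns (address to ARP for, outgoing device)."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r["dst"]]
    best = max(matches, key=lambda r: r["dst"].prefixlen)
    # On-link routes have no gateway, so the destination itself is ARPed for.
    return (best["via"] or addr, best["dev"])


def proxy_arp_reply(target_ip, veth_mac, proxy_arp_enabled=True):
    """Simplified model: with proxy ARP on, the host-side veth answers
    the request for target_ip with its own MAC; otherwise nobody answers."""
    return veth_mac if proxy_arp_enabled else None


# Hypothetical remote container address and host-side veth MAC.
gateway, dev = next_hop("192.168.20.7", CONTAINER_ROUTES)
mac = proxy_arp_reply(gateway, veth_mac="ee:ee:ee:ee:ee:ee")
```

Whatever the destination, the container ends up ARPing for 169.254.1.1 on eth0, and the frame is addressed to the MAC of the host-side veth.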
After this resolution, the packets are sent to the veth device with the source IP of the container and the destination IP of the target container. From here on, the L3 routing of the host takes over, and the host knows how to route to the destination container IP.
The routes are synchronized among the hosts via BGP. A BGP client (BIRD) runs on each host and makes sure each host has up-to-date routes.
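As a rough illustration of what the host does once it has those routes, here is a small Python model of a host routing table with longest-prefix-match lookup. All addresses, the block size, the interface names and the `Route`/`lookup` helpers are hypothetical; a real table is maintained by the kernel and populated by the BGP client.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    dst: ipaddress.IPv4Network
    via: Optional[str]  # next-hop host IP, None when directly attached
    dev: str


# Hypothetical routing table on host A.
HOST_A_ROUTES = [
    # Local container, reachable through its host-side veth.
    Route(ipaddress.ip_network("192.168.10.2/32"), None, "cali0123"),
    # Block of container IPs living on host B, learned via BGP.
    Route(ipaddress.ip_network("192.168.20.0/26"), "10.0.0.12", "eth0"),
    # Everything else goes to the normal default gateway.
    Route(ipaddress.ip_network("0.0.0.0/0"), "10.0.0.1", "eth0"),
]


def lookup(dst: str, routes: List[Route]) -> Route:
    """Pick the most specific route that contains the destination."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in r.dst]
    if not candidates:
        raise LookupError(f"no route to {dst}")
    return max(candidates, key=lambda r: r.dst.prefixlen)


remote = lookup("192.168.20.7", HOST_A_ROUTES)  # container on host B
local = lookup("192.168.10.2", HOST_A_ROUTES)   # container on this host
```

A packet for a container on host B is simply forwarded to host B's IP, with no encapsulation and no rewriting of the source address.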
So you can see that with the Calico solution we got rid of software bridges and also preserved the source IP.
Figure: Calico packet flow.
The overlay complexity is also out of the picture; it is a pure L3 solution based on the same principles by which the internet works. Since we make use of routing rather than L2 broadcast domains, the need for VLANs is eliminated. Instead, for tenant-specific network flows Calico resorts to an iptables-based mechanism.
If we compare bridge-based communication with pure L3 communication, the difference is that with a bridge, the bridge device's IP acts as the gateway for the containers, so any traffic that is not within the same broadcast domain is directed to the bridge device as its next hop. From there, either the L3 stack of the Linux kernel on the host applies routing (the routing rules are configured to forward the packets to the VM on which the destination container resides), or the packets are forwarded to a tap device so that they can be tunneled via GRE/VXLAN.
The Calico approach, by contrast, relies on proxy ARP to get the packet to the host side of the veth pair, and then applies routing to take the traffic out. So if we analyse this carefully, the bridge is technically replaced with proxy ARP, and route synchronization happens over BGP.
For more information on Calico, take a look at https://www.projectcalico.org/
In essence, packets from VMs or containers can use one of the following mechanisms to communicate with containers/VMs on other hosts:
1. Use an overlay such as GRE/VXLAN.
2. Use NAT to send packets to the remote host.
3. Use a Calico-like mechanism with pure L3 routing, without any NAT or bridges. This preserves the source IP, so ingress security policies can be applied properly based on source IPs.