koko - Connect Containers together with virtual ethernet connections

05 Apr 2017

Let’s dig into koko created by Tomofumi Hayashi. koko (the project’s namesake comes from “COntainer COnnector”) is a utility written in Go that gives us a way to connect containers together with “veth” (virtual ethernet) devices – a feature available in the Linux kernel. This allows us to specify interfaces that the containers use and link them together – all without using Linux bridges. koko has become a cornerstone of the zebra-pen project, an effort I’m involved in to analyze gaps in containerized NFV workloads, specifically it routes traffic using Quagga, and we setup all the interfaces using koko. The project really took a turn for the better when Tomo came up with koko and we implemented it in zebra-pen. Ready to see koko in action? Let’s jump in the pool!

Quick update note: This article was written before “koko” was named…. “koko”! It was previously named “vethcon” as it dealt primarily with “veth connections for containers” and that’s what we focus on in this article. Now, “vethcon” does more than just use “veth” interfaces, and henceforth it was renamed. Now, it can also do some cool work with vxlan interfaces to do what we’ll do here – but also across hosts! This article still focuses on using veth interfaces. I did a wholesale find-and-replace of “vethcon” with “koko” and everything should “just work”, but, just so you can be forewarned.

We’ll talk about the back-story for what veth interfaces are, and talk a little bit about Linux network namespaces. Then we’ll dig into the koko source itself and briefly step through what it’s doing.

Last but not least – what fun would it be if we didn’t fire up koko and get it working? If you’re less interested in the back story, just scroll down to the “Ready, set, compile!” section. From there you can get your hands on the keyboard and dive into the fun stuff. Our goal will be to compile koko, connect two containers with one another, look at those interfaces and get a ping to come across them.

We’ll just connect a couple containers together, but, using koko you can also connect network namespaces to containers, and network namespaces to network namespaces, too.

Another note before we kick this off – koko’s life has really just begun, it’s useful and functional as it is. But, Tomo has bigger and better ideas for it – there’s some potential in the future for creating vxlan interfaces (and given that the rename happened, those are in there at least as a prototype), and getting it working with CNI – but, there’s still experimentation to be done there, and I don’t want to spoil it by saying too much. So, as I’ve heard said before “That’s another story for another bourbon.”

Requirements

If you want to sing along – the way that I’m going through this is using a fresh install of CentOS 7. In my case I’m using the generic cloud image. Chances are this will be very similar with a RHEL or Fedora install. But if you want to play along the same exact way, spin yourself up a fresh CentOS 7 VM.

You’re also going to need a spankin’ fresh version of Docker. So we’ll install from the official Docker RPM repos and install a really fresh one.

The back-story

koko leverages “veth” – as evidenced by its name. veth interfaces aren’t exactly new, veth devices were proposed way back in ‘07. The original authors describe veth as:

Veth stands for Virtual ETHernet. It is a simple tunnel driver that works at the link layer and looks like a pair of ethernet devices interconnected with each other.

veth interfaces come in pairs, and that’s what we’ll do in a bit, we’ll pair them up together with two containers. If you’d like to see some diagrams of veth pairs in action – I’ll point you to this article from opencloudblog which has does a nice job illustrating it.

Another concept that’s important to the functioning of koko is “network namespaces”. Linux namespaces is the general concept here that allows network namespaces – in short they give us a view of resources that are limited to a “namespace”. Linux namespaces are a fundamental part of how containers function under Linux, it provides the over-arching functionality that’s necessary to segregate processes and users, etc. This isn’t new either – apparently it begun in 2002 with mount-type namespaces.

Without network namespaces, in Linux all of your interfaces and routing tables are all mashed together and available to one another. With network namespaces, you can isolate these from one another, so they can work independently from one-another. This will give processes a specific view of these interfaces.

Let’s look at the koko go code.

So, what’s under the hood? In essence, koko uses a few modules and then provides some handling for us to pick out the container namespace and assign veth links to the containers. Its simplicity is its elegance, and quite a good idea.

It’s worth noting that koko may change after I write this article, so if you’re following along with a clone of koko – you might want to know what point in the git history it exists, so I’ll point you towards browsing the code at commitish 35c4c58 if you’d like.

Let’s first look at the modules, then, I’ll point you through the code just a touch, in case you wanted to get in there and look a little deeper.

The libraries

vishvananda/netlink: does cool things like you’d do as a user like ip link add
containernetworking/cni/pkg/ns: The namespace package from the CNI project. Used to access network namespaces.
docker/docker/client: The docker client (to access, y’know, docker containers, specifically to inspect the container and get its network settings)

And other things that are more utilitarian, such as package context, c-style getopts, and internal built-ins like os,fmt,net, etc.

Application flow

Note: some of this naming may have changed a bit with the koko upgrade

At it’s core, koko defines a data object called vEth, which gives us a structure to store some information about the connections that we’ll make.

It’s a struct and is defined as so:

// ---------------------------------------------------- -
// ------------------------------ vEth data object.  - -
// -------------------------------------------------- -
// -- defines a data object to describe interfaces
// -------------------------------------------------- -

type vEth struct {
    // What's the network namespace?
    nsName string
    // And what will we call the link.
    linkName string
    // Is there an ip address?
    withIPAddr bool
    // What is that ip address.
    ipAddr net.IPNet
}

In some fairly terse diagramming using asciiflow, the general application flow goes as follows… (It’s high level, I’m missing a step or two, but, it’d help you dig through the code a bit if you were to step through it)

main()
  +
  |
  +------> parseDOption()  (parse -d options from cli)
  |
  +------> parseNOption()  (parse -n options from cli)
  |
  +------> makeVeth(veth1, veth2) with vEth data objects
               +
               |
               +------>  getVethPair(link names)
               |             +
               |             |
               |             +------>  makeVethPair(link)
               |                          +
               |                          |
               |                          +----> netlink.Veth()
               |
               +------>  setVethLink(link) for link 1 & 2

Ready, set, compile!

Ok, first let’s get ready and install the dependencies that we need. Go makes it really easy on us – it handles its own deps and we basically will just need golang, git and Docker.

# Enable the docker ce repo
[centos@koko ~]$ sudo yum-config-manager     --add-repo     https://download.docker.com/linux/centos/docker-ce.repo

# Install the deps.
[centos@koko ~]$ sudo yum install -y golang git docker-ce

# Start and enable docker
[centos@koko ~]$ sudo systemctl start docker && sudo systemctl enable docker

# Check that docker is working
[centos@koko ~]$ sudo docker ps

Now let’s set the gopath and git clone the code.

# Set the go path
[centos@koko ~]$ rm -Rf gocode/
[centos@koko ~]$ mkdir -p gocode/src
[centos@koko ~]$ export GOPATH=/home/centos/gocode/

# Clone koko
[centos@koko ~]$ git clone https://github.com/redhat-nfvpe/koko.git /home/centos/gocode/src/koko

Finally, we’ll grab the dependencies and compile koko.

# Fetch the dependencies for koko
[centos@koko ~]$ cd gocode/
[centos@koko gocode]$ go get koko

# Now, let's compile it
[centos@koko ~]$ go build koko

Now you can go ahead and run the help if you’d like.

[centos@koko gocode]$ ./koko -h

Usage:
./koko -d centos1:link1:192.168.1.1/24 -d centos2:link2:192.168.1.2/24 #with IP addr
./koko -d centos1:link1 -d centos2:link2  #without IP addr
./koko -n /var/run/netns/test1:link1:192.168.1.1/24 <other>  

Make a handy-dandy little Docker image

Let’s make ourselves a handy Docker image that we can use – we’ll base it on CentOS and just add a couple utilities for inspecting what’s going on.

Make a Dockerfile like so:

FROM centos:centos7
RUN yum install -y iproute tcpdump

I just hucked my Dockerfile into tmp and built from there.

[centos@koko gocode]$ cd /tmp/
[centos@koko tmp]$ vi Dockerfile
[centos@koko tmp]$ sudo docker build -t dougbtv/inspect-centos .

Run your containers

Now you can spin up a couple containers based on those images…

Note that we’re going to run these with --network none as a demonstration.

Let’s do that now…

[centos@koko gocode]$ sudo docker run --network none -dt --name centos1 dougbtv/inspect-centos /bin/bash
[centos@koko gocode]$ sudo docker run --network none -dt --name centos2 dougbtv/inspect-centos /bin/bash

If you exec ip link on either of the containers you’ll see they only have a local loopback interfaces.

[centos@koko gocode]$ sudo docker exec -it centos1 ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

That’s perfect for us for now.

Let’s give koko a run!

Ok, cool, so at this point you should have a couple containers named centos1 & centos2 running, and both of those with --network none so they only have the local loopback as mentioned.

[centos@koko gocode]$ sudo docker ps --format 'table     '
NAMES               IMAGE
centos2    dougbtv/inspect-centos
centos1    dougbtv/inspect-centos

Cool – now, let’s get some network between the two of these containers using vetcon… What we’re going to do is put the containers on a network, the /24 we’re going to choose is 10.200.0.0/24 and we’ll make network interfaces named net1 and net2.

You pass these into koko with colon delimited fields which is like -d {container-name}:{interface-name}:{ip-address/netmask}. As we mentioned earlier, since veths are pairs – you pass in the -d {stuff} twice for the pain, one for each container.

Note that the container name can either be the name (as we gave it a --name in our docker run or it can be the container id [the big fat hash]). The interface name must be unique – it can’t match another one on your system, and it must be different

So that means we’re going to execute koko like this. (Psst, make sure you’re in the ~/gocode/ directory we created earlier, unless you moved the koko binary somewhere else that’s handy.)

Drum roll please…

[centos@koko gocode]$ sudo ./koko -d centos1:net1:10.200.0.1/24 -d centos2:net2:10.200.0.2/24
Create veth...done

Alright! Now we should have some interfaces called net1 and net2 in the centos1 & centos2 containers respectively, let’s take a look by running ip addr on each container. (I took the liberty of grepping for some specifics)

[centos@koko gocode]$ sudo docker exec -it centos1 ip addr | grep -P "^\d|inet "
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet 127.0.0.1/8 scope host lo
28: net1@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    inet 10.200.0.1/24 scope global net1

[centos@koko gocode]$ sudo docker exec -it centos2 ip addr | grep -P "^\d|inet "
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet 127.0.0.1/8 scope host lo
27: net2@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    inet 10.200.0.2/24 scope global net2

As you can see there’s an interface called net1 for the centos1 container, and it’s assigned the address 10.200.0.1. It’s companion, centos2 has the net2 address, assigned 10.200.0.2.

That being said, let’s exec a ping from centos1 to centos2 to prove that it’s in good shape.

Here we go!

[centos@koko gocode]$ sudo docker exec -it centos1 ping -c5 10.200.0.2
PING 10.200.0.2 (10.200.0.2) 56(84) bytes of data.
64 bytes from 10.200.0.2: icmp_seq=1 ttl=64 time=0.063 ms
64 bytes from 10.200.0.2: icmp_seq=2 ttl=64 time=0.068 ms
64 bytes from 10.200.0.2: icmp_seq=3 ttl=64 time=0.055 ms
64 bytes from 10.200.0.2: icmp_seq=4 ttl=64 time=0.054 ms
64 bytes from 10.200.0.2: icmp_seq=5 ttl=64 time=0.052 ms

--- 10.200.0.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.052/0.058/0.068/0.009 ms

Alright, looking good with a ping, just to couple check, let’s also see that we can see it with a tcpdump on centos2. So, bring up 2 ssh sessions to this host (or, if it’s local to you, two terminals will do well, or however you’d like to do this).

And we’ll start a TCP dump on centos2 and we’ll exec the same ping command as above on centos1

And running that, we can see the pings going to-and-fro!

[centos@koko ~]$ sudo docker exec -it centos2 tcpdump -nn -i net2 'icmp'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net2, link-type EN10MB (Ethernet), capture size 65535 bytes
21:39.426020 IP 10.200.0.1 > 10.200.0.2: ICMP echo request, id 43, seq 1, length 64
21:39.426050 IP 10.200.0.2 > 10.200.0.1: ICMP echo reply, id 43, seq 1, length 64
21:40.425953 IP 10.200.0.1 > 10.200.0.2: ICMP echo request, id 43, seq 2, length 64
21:40.425983 IP 10.200.0.2 > 10.200.0.1: ICMP echo reply, id 43, seq 2, length 64
21:41.425898 IP 10.200.0.1 > 10.200.0.2: ICMP echo request, id 43, seq 3, length 64
21:41.425925 IP 10.200.0.2 > 10.200.0.1: ICMP echo reply, id 43, seq 3, length 64
21:42.425922 IP 10.200.0.1 > 10.200.0.2: ICMP echo request, id 43, seq 4, length 64
21:42.425949 IP 10.200.0.2 > 10.200.0.1: ICMP echo reply, id 43, seq 4, length 64
21:43.425870 IP 10.200.0.1 > 10.200.0.2: ICMP echo request, id 43, seq 5, length 64
21:43.425891 IP 10.200.0.2 > 10.200.0.1: ICMP echo reply, id 43, seq 5, length 64

(BTW, hit ctrl+c when you’re done with that tcpdump.)

Cool!!! …Man, sometimes when you’re working on networking goodies, the satisfaction of a successful ping is like no other. Ahhhh, feels so good.

Thank you, Tomo!

A big thanks goes out to Tomo for coming up with this idea, and then implementing it quite nicely in Go. It’s a well made utility built from an impressive idea. Really cool, I’ve enjoyed getting to utilitize it, and I hope it comes in handy to others in the future too.

dougbtv