Are you exhausted? IPv4 almost is -- let's set up an IPv6 lab for Kubernetes

It’s no secret that IPv4 address exhaustion is inevitable. And it’s not just tired (ba-dum-ching!). Since we’re a bunch of Kubernetes fans, and we’re networking fans, we really want to check out what we can do with IPv6 in Kubernetes. My colleague Feng Pan contributed some slinky automation to kube-ansible that builds on creative work by leblancd. In this simple setup today, we’re going to deploy Kubernetes with custom binaries from leblancd, have two pods (ideally on different nodes) ping one another with ping6, and declare victory! In the future, let’s hope to iterate on what’s necessary to get IPv6 functionality in Kubernetes.

There’s an ever-growing interest in IPv6 for Kubernetes, and there’s a solid effort by the good folks from the Kubernetes SIG-Network. You’ll find in the SIG-Network features spreadsheet that IPv6 is slated for the next release. Additionally, you can find more information in the issues tagged for IPv6 up on the k/k GitHub, too.

There’s also a README for creating an IPv6 lab with kube-ansible on GitHub.


Our goal here with this setup is to make it possible to ping6 one pod from another. I’m looking forward to using this laboratory to explore the other possibilities and scenarios, however this pod-to-pod ping6 is the baseline functionality from which to start adventuring into further territory.


TL;DR: A host that can run VMs (or choose your own adventure and bring your baremetal or some other cloud), an editor (anything but Emacs, just kidding), git and Ansible.

To run these playbooks, we assume you have already adventured warily so far that you have:

  • A machine for running Ansible (like your workstation) and have Ansible installed.
  • Ansible 2.4 or later (necessary to support get_url with IPv6 enabled machines)
  • A host capable of running virtual machines that is running CentOS 7.
  • Git. If you don’t have git, get git. Don’t be a git. We’ll clone up in a minute here.

We also disable the “bridged networking” feature we often use and instead use NAT’ed libvirt virtual machines.

You may have to disable GRO (generic receive offload) for the NICs on the virtualization host (if you’re using one).

An example of doing so is:

ethtool -K em3 gro off
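If you’d like to see where GRO stands before (or after) flipping it, here’s a minimal sketch. It assumes the output of ethtool -k contains a generic-receive-offload: line, and gro_state is a hypothetical helper name made up for illustration, not part of ethtool.

```shell
# gro_state: read `ethtool -k <nic>` output on stdin and print "on" or "off".
# Assumption: the feature is listed as "generic-receive-offload: <state>",
# optionally followed by a marker such as "[fixed]".
gro_state() {
  awk -F': *' '$1 == "generic-receive-offload" {
    split($2, parts, " ")   # drop a trailing "[fixed]" marker, if any
    print parts[1]
    exit
  }'
}
```

You would call it as ethtool -k em3 | gro_state, swapping em3 for your own NIC.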

Fire up your terminal, and let’s clone this repo! We’ll clone at the latest tag that supports this functionality.

$ git clone --branch v0.1.6

Cool, enter the dir and surf around if you wish. Next, we’ll set up our inventory and necessary variables.

If you clone master instead of that tag, don’t forget to install the galaxy roles!

There are likely some Ansible Galaxy roles to install. If find . | grep -i require shows any files, run ansible-galaxy install -r requirements.yml.

Inventory and variable setup

Let’s look at an inventory and variable overrides to use. Make sure you have a host setup you can run VMs on, that’s running CentOS 7, and ensure you can SSH to it.

Here’s the initially used inventory, which only really cares about the virthost. Here I’m placing this inventory file @ inventory/my.virthost.inventory. You’ll need to modify the location of the host to match your environment.

the_virthost ansible_host= ansible_ssh_user=root


And the overrides which are based on the examples @ ./inventory/examples/virthost/virthost-ipv6.inventory.yml. I’m creating this set of extra variables @ ./inventory/extravars.yml :

bridge_networking: false
virtual_machines:
  - name: kube-master
    node_type: master
  - name: kube-node-1
    node_type: nodes
  - name: kube-node-2
    node_type: nodes
  - name: kube-nat64-dns64
    node_type: other
ipv6_enabled: true

Spinning up and accessing virtual machines

Perform a run of the virthost-setup.yml playbook, using the previously mentioned extra variables for override, and an inventory which references the virthost.

ansible-playbook -i inventory/my.virthost.inventory -e "@./inventory/extravars.yml" virthost-setup.yml

This will produce an inventory file in the local clone of this repo @ ./inventory/vms.local.generated. And it will also create some SSH keys for you which you’ll find in the .ssh folder of the user you ran the Ansible playbooks as.

In the case that you’re running Ansible from your workstation, and your virthost is another machine, you may need to use the virthost as an SSH jump host to reach the virtual machines.

If that is the case, you may add to the bottom of ./inventory/vms.local.generated a line similar to this (replacing root@ with the method you use to access the virtualization host):

cat << EOF >> ./inventory/vms.local.generated
ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p root@"'
EOF

Optional: Handy-dandy “ssh to your virtual machines script”

You may wish to log in to the machines in order to debug, or even more likely – to access the Kubernetes master after an install.

You may wish to create a script for this. In this example the script lives in your home directory (~/), and you should change the host after root@ to the hostname or IP address of your virthost.

#!/bin/bash
ssh -i ~/.ssh/the_virthost/id_vm_rsa -o ProxyCommand="ssh root@ nc $1 22" centos@$1

You would use this script by calling it with ~/ yourhost.local where the first parameter to the script is the hostname or IP address of the virtual machine you wish to access.

Here’s an example of using it to access the kubernetes master by pulling the IP address from the generated inventory:

$ ~/ $(cat inventory/vms.local.generated | grep "kube-master.ansible" | cut -d"=" -f 2)
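The grep and cut pipeline above does the job. If you’d rather match the host name exactly, here’s a small sketch of an alternative. It assumes the generated inventory has lines shaped like "kube-master ansible_host=<address>", and vm_ip is a hypothetical helper name, not something kube-ansible ships.

```shell
# vm_ip NAME: read an Ansible inventory on stdin and print the ansible_host
# value for the host called NAME (first matching line wins).
# Assumption: host lines look like "NAME ansible_host=ADDRESS [other=vars]".
vm_ip() {
  awk -v host="$1" '$1 == host {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^ansible_host=/) {
        sub(/^ansible_host=/, "", $i)
        print $i
        exit
      }
  }'
}
```

For example, vm_ip kube-master < inventory/vms.local.generated prints the address you can hand to your SSH script.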

Deploy a Kubernetes cluster

With the above in place, we can now perform a kube install, and use the locally generated inventory.

ansible-playbook -i inventory/vms.local.generated -e "@./inventory/extravars.yml" kube-install.yml

SSH into the master; if you created the handy SSH script above, use it.

Just double check things are coming up Milhouse. Check out the status of the cluster with kubectl get nodes and/or kubectl cluster-info.
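If you want something scriptable rather than eyeballing the list, here’s a minimal sketch that names any node which isn’t Ready yet. It assumes the plain kubectl get nodes table with NAME in the first column and STATUS in the second; not_ready_nodes is a hypothetical helper name.

```shell
# not_ready_nodes: read `kubectl get nodes` output on stdin and print the
# name of every node whose STATUS does not start with "Ready".
# Assumption: column 1 is NAME and column 2 is STATUS, with one header row.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}
```

Running kubectl get nodes | not_ready_nodes should print nothing once the cluster has settled.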

We’ll now create a couple pods via a ReplicationController. Create a YAML resource definition like so:

[centos@kube-master ~]$ cat debug.yaml 
apiVersion: v1
kind: ReplicationController
metadata:
  name: debugging
spec:
  replicas: 2
  selector:
    app: debugging
  template:
    metadata:
      name: debugging
      labels:
        app: debugging
    spec:
      containers:
      - name: debugging
        command: ["/bin/bash", "-c", "sleep 2000000000000"]
        image: dougbtv/centos-network-advanced
        ports:
        - containerPort: 80

Create the pods with kubectl by issuing:

$ kubectl create -f debug.yaml

Watch ‘em come up:

[centos@kube-master ~]$ watch -n1 kubectl get pods -o wide

Try it out!

Once those pods are fully running, list them, and take a look at the IP addresses, like so:

[centos@kube-master ~]$ kubectl get pods -o wide
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
debugging-cvbb2   1/1       Running   0          4m        fd00:101::2   kube-node-1
debugging-gw8xt   1/1       Running   0          4m        fd00:102::2   kube-node-2

Now you can exec commands in one of them, to ping the other (note that your pod names and IPv6 addresses are likely to differ):

[centos@kube-master ~]$ kubectl exec -it debugging-cvbb2 -- /bin/bash -c 'ping6 -c5 fd00:102::2'
PING fd00:102::2(fd00:102::2) 56 data bytes
64 bytes from fd00:102::2: icmp_seq=1 ttl=62 time=0.845 ms
64 bytes from fd00:102::2: icmp_seq=2 ttl=62 time=0.508 ms
64 bytes from fd00:102::2: icmp_seq=3 ttl=62 time=0.562 ms
64 bytes from fd00:102::2: icmp_seq=4 ttl=62 time=0.357 ms
64 bytes from fd00:102::2: icmp_seq=5 ttl=62 time=0.555 ms
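Since your pod names and addresses will differ, you may not want to copy the IPv6 address by hand. Here’s a small sketch that pulls it out of the kubectl get pods -o wide table. It assumes the table has an IP header column and no spaces inside any cell; pod_ip is a hypothetical helper name.

```shell
# pod_ip NAME: read `kubectl get pods -o wide` output on stdin and print the
# value of the IP column for the pod called NAME.
# Assumption: the header row contains an "IP" column and cells have no spaces.
pod_ip() {
  awk -v pod="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "IP") col = i; next }
    $1 == pod && col { print $col; exit }
  '
}
```

For example, kubectl get pods -o wide | pod_ip debugging-gw8xt gives you the address to feed to ping6.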

Finally pat yourself on the back and enjoy some IPv6 goodness.

Ghost Riding The Whip -- A complete Kubernetes workflow without Docker, using CRI-O, Buildah & kpod

It is my decree that whenever you are using Kubernetes without using Docker you are officially “ghost riding the whip”, maybe even “ghost riding the kube”. (Well, I’m from Vermont, so I’m more like “ghost riding the combine”). And again, we’re running Kubernetes without Docker, but this time? We’ve got an entire workflow without Docker. From image build, to running container, to inspecting the running containers. Thanks to the good folks from the OCI project and Project Atomic, we’ve got kpod for working with running containers, and we’ve got buildah for building our images. And of course, don’t leave out CRI-O which makes the magic happen to get it all running in Kube without Docker. Fire up your terminals, because you’re about to ghost ride the kube.

I happened to see that there is a first release candidate of CRI-O which has a bunch of great improvements that work towards really getting CRI-O production ready for Kubernetes. And I have to say – my experience with using it has been nearly flawless. It’s been working like a champ, and I can tell they’re doing an excellent job with the polish. Of course that’s awesome, but, I was most excited to hear about kpod – “the missing tool”. When I wrote my first article about using CRI-O, I was missing a few portions – especially a half decent tool for checking out what’s going on with containers. This tool isn’t quite as mature as CRI-O itself, but, the presence of this tool at all is just a straight-up boon.

To get this all going, I have these tools (CRI-O, kpod & buildah) integrated into my vanilla Kubernetes lab playbooks, kube-ansible. This playbook has it so we can compile CRI-O (which includes kpod) and buildah, and get Kubernetes up and running (using kubeadm to initialize the cluster and join the nodes). I made some upgrades to kube-ansible in the process, fixing up issues with kube 1.7, and also improving it so that kube-ansible can also use Fedora. CRI-O itself works wonderfully with CentOS, but Buildah needs some kernel functionality that just isn’t available in CentOS yet, so… kube-centos-ansible now also supports Fedora, oddly or not-so-oddly enough.


This walk-through assumes that you have at least 2 machines with Fedora installed (and generally up-to-date). That’s where we’ll install Kubernetes with CRI-O (and kpod!). You might notice that we use kube-ansible, the name of which is… Not so apropos. But! It’s recently been updated to support Fedora. And we need Fedora to get a spankin’ fresh kernel, so we can use… Drum roll please… Buildah – an image building tool that is not Docker (wink, wink!).

Those machines need to have over 2 gigs of RAM. Compilation of CRI-O, specifically during a step with GCC was bombing out on me with GCC complaining it couldn’t allocate memory when I had just 2 gigs of RAM. Therefore, I recommend at least 4 gigs of RAM.
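To check a machine before you kick off a long compile, here’s a minimal sketch that reads the total memory from /proc/meminfo. It assumes the usual Linux "MemTotal: <n> kB" line; mem_total_mb is a hypothetical helper name.

```shell
# mem_total_mb: read /proc/meminfo-style text on stdin and print total RAM
# in whole megabytes (the MemTotal value is reported in kB).
mem_total_mb() {
  awk '$1 == "MemTotal:" { printf "%d\n", $2 / 1024; exit }'
}
```

You would run it as mem_total_mb < /proc/meminfo on each node and look for something around 4000 or more.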

In addition to that, you’ll need git & Ansible installed on “some machine” (likely your workstation). And your handy-dandy editor. Cause… How do you live without an editor? Unless you’re feeding the input in on punch cards, in which case… You have my respect.

TL;DR, you need:

  • 2 or more Fedora machines with 4 gigs of RAM or more (and maybe 5 gigs free on disk)
  • Git & Ansible on a client machine (like your workstation)

Spinning up a Kubernetes cluster with CRI-O (and kpod included!)

First off, go ahead and clone up the kube-ansible project…

git clone --branch v0.1.3

This article glosses over the fact that the kube-ansible has the ability to spin-up virtual machines to mock-up Kubernetes clusters. However, if you’re familiar with it, you can use it as well. I won’t go into depth here, but this is the technique that I use:

$ ansible-playbook -i inventory/your.inventory -e "vm_parameters_ram_mb=4096" virt-host-setup.yml 

Now we’ll run a playbook to bootstrap the nodes with Python (as the Fedora cloud images don’t come packaged with Python).

$ ansible-playbook -i inventory/your.inventory fedora-python-bootstrapper.yml

For your reference here’s the inventory I used. This inventory can also be found in the ./inventory/examples/crio/crio.inventory in the clone. Mostly this is here to show you how to set the variables in order to get this puppy (that is, kube-ansible) to properly use Fedora, when it comes down to it.

kube-master ansible_host=
kube-node-1 ansible_host=
kubehost ansible_host= ansible_ssh_user=root

# Using Fedora


# Using Fedora

Start the Kubernetes install

Then you can go ahead and get your kube install rolling!

$ ansible-playbook -i inventory/vms.inventory -e 'container_runtime=crio' kube-install.yml 

That, my good friend… is a coffee-worthy step. It is now time to fuel up while that runs. (We’re compiling some big-idea kind of stuff, like CRI-O [and more]).

Verify that things are hunky dory

(Still unsure what the genesis of the phrase “hunky dory” is. But it means “satisfactory” or “just fine”)

Log yourself into the master. And, first of all things… Make sure you DON’T have Docker. And grin during this step. Cause I sure did.

[fedora@kube-crio-master ~]$ docker
-bash: docker: command not found
[fedora@kube-master ~]$ echo $?
127

YES. We want that to exit 127!
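If you’d like to make that check part of a script rather than a victory lap, here’s a minimal sketch using the shell’s own command -v lookup. The helper name is_absent is hypothetical.

```shell
# is_absent CMD: succeed (exit 0) when CMD cannot be found by the shell,
# fail (non-zero) when it is installed.
is_absent() {
  ! command -v "$1" >/dev/null 2>&1
}
```

Something like is_absent docker && echo "ghost riding confirmed" would then print the message only on a Docker-free machine.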

Make sure to see that the nodes are healthy…

$ kubectl get nodes

You want every node to report a Ready status.

Optionally, spin up a pod. In my case I did a…

[fedora@kube-crio-master ~]$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
[fedora@kube-crio-master ~]$ watch -n1 kubectl get pods

They should come up! And once they are, you should be able to query nginx.

[fedora@kube-crio-master ~]$ curl -s $(kubectl describe pod $(kubectl get pods | grep nginx | head -n 1 | awk '{print $1}') | grep "^IP" | awk '{print $2}') | grep -i thank
<p><em>Thank you for using nginx.</em></p>

Cool! That means that you have CRI-O up and poppin’. You are officially ghost riding the whip.

Clean that up if you want, with:

[fedora@kube-master ~]$ kubectl delete rc nginx

Wait – didn’t I promise you a complete workflow that omits Docker entirely? That’s right, I did. So let’s go ahead and start up a from-scratch workflow here… with… buildah.


Awesome. Now, let’s go ahead and log into the node. For ease, for now, we’ll also sudo su -. In the future, you might wanna set this up to work for a specific user, but, I’ll leave that as a journey for the reader.

Check out the help for buildah, if you wish. That’s how I learned how to do this myself.

[root@kube-node-1 ~]# buildah --help

Now, let’s create a “Dockerfile”. We’ll use the Dockerfile syntax, as I’m familiar with it, and if you have existing Dockerfiles – buildah supports that!

So go ahead and make yourself a Dockerfile like so.

[root@kube-node-1 ~]# cat Dockerfile 
FROM fedora:26
RUN dnf install -y cowsay-beefymiracle cowsay
ENTRYPOINT ["cowsay","-s","Shoutout from Vermont!"]

This image is just a couple RPMs, really. Mostly cowsay (and then an extra “cowsay file” to add the beefy miracle art). According to Wikipedia:

cowsay is a program that generates ASCII pictures of a cow with a message.

And you think that machine learning is high tech? Obviously you haven’t seen cow ASCII art insult a co-worker before. The pinnacle of technology.

BONUS: To insult your co-workers using cowsay, install the package with dnf install cowsay and use wall to broadcast a message to all terminals logged into a machine.

[fedora@kube-node-1 ~]$ cowsay -s "your mother wears army boots" | wall
Broadcast message from fedora@kube-node-1 (pts/0) (Wed Sep 20 13:32:41 2017):
< your mother wears army boots >                                               
        \   ^__^                                                               
         \  (**)\_______                                                       
            (__)\       )\/\                                                   
             U  ||----w |                                                      
                ||     ||                                                      

Now that you have sufficiently made enemies with your co-workers, back to getting this workflow going.

Go ahead and kick off the build. And on the subject of ASCII – enjoy progress bars that are nicer than Docker’s, too.

[root@kube-node-1 ~]# buildah  --storage-driver overlay2 bud -t dougbtv/beefy .

The command we’re using there is buildah bud; bud is “build using dockerfile”. Very nice feature.

Note that we’re setting --storage-driver overlay2 (as a global option) which will store the images in the proper locations for runc (and therefore CRI-O) to see where these images are.
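Because the option has to come along on every buildah call, it’s easy to forget. Here’s a tiny sketch of a wrapper that always passes it. The name bld is hypothetical, and it assumes buildah is on your PATH.

```shell
# bld: run buildah with the overlay2 storage driver set as a global option,
# passing every other argument through untouched.
bld() {
  buildah --storage-driver overlay2 "$@"
}
```

With that in your shell, bld bud -t dougbtv/beefy . and bld images behave like the longer commands in this walk-through.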

Also, for what it’s worth – I didn’t have great luck with the build cache on subsequent runs of buildah. I’m unsure what the progress on that functionality in buildah itself is. Likely, it may be something I did wrong in the compilation or installation of buildah, so if you see it and shoot me a note on twitter or place a github issue, that’d be awesome.

You can go ahead and list what you just built. Note that we’re including the storage driver option, again.

[root@kube-node-1 ~]# buildah  --storage-driver overlay2 images | grep -P "(IMAGE|beefy)"
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
95c3725439f6                           Sep 15, 2017 23:28     1.983 KB

Great! You’ve got an image.

Now, let’s run that image!

We’ll do this with Kubernetes itself today. Log into your master, and first thing, let’s specify a label that we’ll use for a node selector (which will specify on which node we’ll run this particular pod). In this case we’re doing this because we don’t have a registry to pull the images from, so we’ve got to tell Kube to run the pod in a particular place, because that’s where we built the image.

Here’s the (admittedly zany) label that I added. (You can make a lot less insane node selector constraint if you’re sound of mind, too.)

$ kubectl label nodes kube-node-1 beefylevel=expert

And you can see what’s been labeled with:

[fedora@kube-master ~]$ kubectl get nodes --show-labels

Create yourself a beefy.yaml. Here you’ll see a few things that are fairly important, but the one to pay attention to in this context is imagePullPolicy: Never. Since this image isn’t available on a registry, we want to tell Kubernetes “don’t even try to pull this”. By default it will try, fail to pull the image, and then refuse to run the container.

Here’s the beefy.yaml I created.

[fedora@kube-master ~]$ cat beefy.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: beefy
  name: beefy
spec:
  containers:
   - command:
       - "/bin/bash"
       - "-c"
       - "cowsay -f /usr/share/cowsay/beefymiracle.cow -s 'shouts from Vermont' && sleep 2000000"
     image: dougbtv/beefy
     name: beefy
     imagePullPolicy: Never
  nodeSelector:
    beefylevel: expert
Go ahead and create that…

[fedora@kube-master ~]$ kubectl create -f beefy.yaml 
pod "beefy" created

And watch it come up.

[fedora@kube-master ~]$ watch -n1 kubectl get pods -o wide

(Note that it should be saying it’s coming up on kube-node-1)

Now for the pay day… Let’s see it rollin’.

[fedora@kube-master ~]$ kubectl logs beefy
< shouts from Vermont >
                      .---. __
           ,         /     \   \    ||||
          \\\\      |O___O |    | \\||||
          \   //    | \_/  |    |  \   /
           '--/----/|     /     |   |-'
                  // //  /     -----'
                 //  \\ /      /
                //  // /      /
               //  \\ /      /
              //  // /      /
             /|   ' /      /
             //\___/      /
            //   ||\     /
            \\_  || '---'
            /' /  \\_.-
           /  /    --| |
           '-'      |  |

Huzzah! We’ve got Beefy. It’s a gosh darned miracle. Dang heckin’ good job.

Awesome! That’s a whole workflow without Docker. Aww yisss. Now, let’s put a cherry on top…

Let’s try out kpod!

Enter kpod! That’s the missing tool from my last CRI-O article. We only had some really rudimentary stuff in runc that could do this for us. But, the Atomic guys are really tearing it up, and now we’ve got kpod which can do a whole lot more for us.

Now that we have a running container – We can check it out with kpod. There’s a lot more features on the way for kpod, but, for now it gives a nice way to work with your containers (and some container image utilities). I wanted to run it directly with this, but, that’s in the works at the tag at which I have CRI-O/kpod pinned.

So go ahead and log into the node… And we’ll sudo su - for now (as above). And let’s list the container processes…

[root@kube-node-1 ~]# kpod ps

Awesome! You should see dougbtv/beefy:latest in there.

And you can list the images with this tool, too.

[root@kube-node-1 ~]# kpod images

Say you want to see what’s in the ephemeral storage of an image, we can use kpod for this, too. So let’s pick up the id of our running container.

[root@kube-node-1 ~]# beefyid=$(kpod ps | grep -i beef | awk '{print $1}')
[root@kube-node-1 ~]# echo $beefyid

Now we can use that in order to look at what’s in the container. Let’s just cat the definition for the beefy miracle in cowsay.

[root@kube-node-1 ~]# cat $(kpod mount $beefyid)/usr/share/cowsay/beefymiracle.cow

That should show you a heavily escaped ASCII hotdog. Alright! Nice work Project Atomic folks! Quite a feat.

Ratchet CNI -- Using VXLAN for network isolation for pods in Kubernetes

In today’s episode we’re looking at Ratchet CNI, an implementation of Koko – but in CNI, the container networking interface that is used by Kubernetes for creating network interfaces. The idea being that the network interface creation can be performed by Kubernetes via CNI. Specifically we’re going to create some network isolation of network links between containers to demonstrate a series of “cloud routers”. We can use the capabilities of Koko to both create vEth connections between containers when they’re local to the same host, and then VXLAN tunnels to containers when they’re across hosts. Our goal today will be to install & configure Ratchet CNI on an existing cluster, we’ll verify it’s working, and then we’ll install a cloud router setup based on zebra pen (a cloud router demo).

Here’s what the setup will look like when we’re done:


The gist is that the green boxes are Kubernetes minions which run pods, and the blue boxes are pods running on those hosts, and the yellow boxes are the network interfaces that will be created by Ratchet (and therefore Koko). In this scenario, just one VXLAN tunnel is created when going between the hosts.

So that means we’ll route traffic from “CentOS A” container, across 2 routers (which use OSPF) to finally land at “CentOS B”, and have a ping come back across the links.

Note that Ratchet is still a prototype, and some of the constraints of it are limited to the static way in which interfaces and addressing is specified. This is indeed a limitation, but is intended to illustrate how you might specify the links between these containers.



  • A Kube cluster, spin one up my way if you wish.
  • Two nodes where you can schedule pods (and have the ability to modify the CNI configuration on those nodes)


  • An operational Flannel plugin in CNI running on the cluster beforehand.

It’s worth it for you to note that I use an all CentOS 7 based cluster, which while it isn’t required, definitely colors how I use the ancillary tools and approach things.

Installing the Ratchet binaries

The first thing we’re going to do is install the Ratchet binaries on each of the nodes we’re going to use here. In my case that’s just the two minion nodes which can schedule pods; I don’t bother putting it on my master.

Here’s how I download and put the binaries into place:

$ curl -L > ratchet.tar.gz
$ tar -xzvf ratchet.tar.gz 
$ sudo mv ratchet-cni-v0.1.0/* /opt/cni/bin/

That’s it – you’ve got ratchet! (Again, man, Go makes it easy, right.)

Spin up etcd

You’ll need to have an etcd instance – if you have a running instance you want to use for this, go ahead. I’ll include my scheme here where I run my own.

From wherever you have kubectl available, go ahead and work on these steps.

Firstly, I create a new namespace to run these etcd pods in…

$ tee ratchet-namespace.yaml <<'EOF'
{
  "kind": "Namespace",
  "apiVersion": "v1",
  "metadata": {
    "name": "ratchet",
    "labels": {
      "name": "ratchet"
    }
  }
}
EOF
$ kubectl create -f ratchet-namespace.yaml 
$ kubectl get namespaces

I have an example etcd pod spec in this gist, I download that…

[centos@kube-master ~]$ curl -L > etcd.yaml

And then create it in the ratchet namespace we just created, and watch it come up.

$ kubectl create -f etcd.yaml --namespace=ratchet
$ watch -n1 kubectl get pods --namespace=ratchet

This has also created a service for us.

[centos@kube-master ~]$ kubectl get svc --namespace=ratchet | head -n2
NAME          CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
etcd-client   <none>        2379/TCP            56s

This service is important to the Ratchet configuration. So note how you can access this service – you can use the IP if all else fails, at least for testing that’s just fine. You don’t want to rely on that full-time, however.

If your nodes don’t resolve etcd-client.ratchet.svc.cluster.local, pay special attention, as this is the DNS name for etcd I’ll use in the following configs.

Configuring Ratchet

Now we need to put configurations into place. Firstly, you’re going to want to clear out whatever’s in /etc/cni/net.d/. I recommend having Flannel working before you get to this point, because it lets us do something cool: Ratchet can pass ineligible pods along to Flannel (or any other plugin). I’ll include configs that have Flannel here. If appropriate, replace them with another plugin’s configuration.

Here I am moving my configs to a backup directory, do this on both hosts that will run Ratchet…

[centos@kube-minion-1 ~]$ mkdir cni-configs
[centos@kube-minion-1 ~]$ sudo mv /etc/cni/net.d/* ./cni-configs/

Let’s look at my current configuration…

[centos@kube-minion-2 ~]$ cat cni-configs/10-flannel.conf 
{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}

It’s a Flannel config, I’m gonna keep this around for a minute, cause I’ll use it in my upcoming configs.

Next, let’s assess what you have available for networking. Mine is pretty simple. Each of my nodes have a single nic – eth0, and it’s on the network, and that network is essentially flat – it can access the WAN over that NIC, and also the other nodes on the network. Naturally, in real life – your network will be more complex. But, in this step… Choose the proper NIC and IP address for your setup.

So, I pick out my NIC and IP address, what’s it look like on my nodes…

[centos@kube-minion-1 ~]$ ip a | grep -Pi "eth0|inet 192"
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet brd scope global dynamic eth0

Ok, cool, so I have eth0 and it’s – these are both going into my Ratchet config.

Now, here’s my Ratchet config I’ve created on this node, as /etc/cni/net.d/10-ratchet.conf:

[centos@kube-minion-1 ~]$ cat /etc/cni/net.d/10-ratchet.conf
{
  "name": "ratchet-demo",
  "type": "ratchet",
  "etcd_host": "etcd-client.ratchet.svc.cluster.local",
  "etcd_port": "2379",
  "child_path": "/opt/cni/bin/ratchet-child",
  "parent_interface": "eth0",
  "parent_address": "",
  "use_labels": true,
  "delegate": {
    "name": "cbr0",
    "type": "flannel",
    "delegate": {
      "isDefaultGateway": true
    }
  },
  "boot_network": {
    "type": "loopback"
  }
}

Some things to note:

  • type: ratchet is required
  • etcd
    • etcd_host generally should point to the service we created in the previous step
    • etcd_port is the port on which etcd will respond.
    • You can test if curl etcd-client.ratchet.svc.cluster.local:2379 works and that will let you know if etcd is responding (it’ll respond with a 404)
  • child_path points to where the secondary binary for ratchet lives. If you followed these instructions, this is the proper path.
    • parent_interface is the interface on which the VXLAN tunnels will reside
    • parent_address is the IP address remote VXLANs will use to create a tunnel to this machine.
  • use_labels should generally be true.
  • Alternate CNI plugin
    • delegate is a special field. In this we pack in an entire CNI config for another plugin. You’ll note that this is set to the exact entry that we have earlier when I show the current config for CNI on one of the minions. When pods are not labeled to use ratchet, they will use this CNI plugin (more on the labeling later).
  • boot_network – similar to delegate, but when pods are eligible to be processed by Ratchet, they will have an extra interface created with the CNI config packed into this property. In this case I just set a loopback device, using the loopback CNI plugin.

Great! You’ve got one all set. But, you need two. So setup another one on a second host.

On my second host I have the same config @ /etc/cni/net.d/10-ratchet.conf, except for one line which differs, and that is the parent_address (the parent_interface would differ too if the NICs were named differently on each host). So, for example, on the second minion I have…

[centos@kube-minion-2 ~]$ ip a | grep -iP "(eth0|192)"
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet brd scope global dynamic eth0

[centos@kube-minion-2 ~]$ cat /etc/cni/net.d/10-ratchet.conf | grep parent
  "parent_interface": "eth0",
  "parent_address": "",

Note that the IP address in parent_address matches that of the address on eth0.
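A mismatch here is an easy mistake to make when copying configs between hosts, so here’s a minimal sketch for comparing the two values. It assumes the one-line format of ip -4 -o addr show and a config laid out like the one above; both helper names are hypothetical.

```shell
# iface_ipv4: read `ip -4 -o addr show dev <nic>` output on stdin and print
# the first IPv4 address without its prefix length.
iface_ipv4() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# conf_parent_address: read a Ratchet CNI config on stdin and print the
# value of its "parent_address" field.
conf_parent_address() {
  sed -n 's/.*"parent_address": *"\([^"]*\)".*/\1/p'
}
```

On a node, compare ip -4 -o addr show dev eth0 | iface_ipv4 with conf_parent_address < /etc/cni/net.d/10-ratchet.conf; the two outputs should be identical.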

Labeling the nodes

Alright, something we’re going to want to do is to specify which pods run where for demonstrative purposes. For this we’re going to use nodeSelector to tell Kube where to run these pods.

That being said, we will assign a label to each one…

[centos@kube-master ~]$ kubectl label nodes kube-minion-1 ratchetside=left
[centos@kube-master ~]$ kubectl label nodes kube-minion-2 ratchetside=right

And you can check those labels out if you need to…

[centos@kube-master ~]$ kubectl get nodes --show-labels

Running two pods as a baseline test

We are now all configured and ready to rumble with Ratchet. Let’s first create a couple pods to make sure everything is running.

Let’s create these pods using this yaml:

apiVersion: v1
kind: Pod
metadata:
  name: primary-pod
  labels:
    app: primary-pod
    ratchet: "true"
    ratchet.pod_name: "primary-pod"
    ratchet.target_pod: "primary-pod"
    ratchet.target_container: "primary-pod"
    ratchet.public_ip: ""
    ratchet.local_ip: ""
    ratchet.local_ifname: "in1"
    ratchet.pair_name: "pair-pod"
    ratchet.pair_ip: ""
    ratchet.pair_ifname: "in2"
    ratchet.primary: "true"
spec:
  containers:
    - name: primary-pod
      image: dougbtv/centos-network
      command: ["/bin/bash"]
      args: ["-c", "while true; do sleep 10; done"]
  nodeSelector:
    ratchetside: left
---
apiVersion: v1
kind: Pod
metadata:
  name: pair-pod
  labels:
    app: pair-pod
    ratchet: "true"
    ratchet.pod_name: pair-pod
    ratchet.primary: "false"
spec:
  containers:
    - name: pair-pod
      image: dougbtv/centos-network
      command: ["/bin/bash"]
      args: ["-c", "while true; do sleep 10; done"]
  nodeSelector:
    ratchetside: right

Likely the most important things to look at are these labels:

ratchet: "true"
ratchet.pod_name: "primary-pod"
ratchet.target_pod: "primary-pod"
ratchet.target_container: "primary-pod"
ratchet.local_ip: ""
ratchet.local_ifname: "in1"
ratchet.pair_name: "pair-pod"
ratchet.pair_ip: ""
ratchet.pair_ifname: "in2"
ratchet.primary: "true"

These labels are how ratchet knows how to set up the interfaces on the pods. You set up pods in pairs, with a “primary” and a “pair”. You need to (as of now) know the name of the pod that’s going to be the pair. Then you can set the names of the interfaces, and which IPs are assigned. In this case we’re going to have an interface called in1 on the primary side, and an interface named in2 on the pair side. The primary will be assigned the IP address and the pair will have the IP address

Of all of the parameters, the keystone is the ratchet: "true" parameter, which tells us that ratchet should process this pod – otherwise, it will pass through the pod to another CNI plugin given the delegate parameter in the ratchet configuration.

I put that into a file example.yaml and created it as such:

[centos@kube-master ~]$ kubectl create -f example.yaml 

And then watched it come up with watch -n1 kubectl get pods. Once it’s up, we can check out some stuff.

But – you should also check out which nodes they’re running on to make sure you got the labelling and the nodeSelectors correct. You can do this by checking out the description of the pods, and looking for the node values.

$ kubectl describe pod primary-pod | grep "^Node"
$ kubectl describe pod pair-pod | grep "^Node"
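
If you’d rather get a yes/no answer than eyeball the describe output, a small helper along these lines could compare the node names directly. This is only a sketch: the function name pods_on_different_nodes is made up for illustration, and it assumes kubectl is already pointed at the lab cluster.

```shell
# Hypothetical helper (not part of Ratchet or kube-ansible): succeed only if
# the two named pods have been scheduled onto different nodes.
# Usage: pods_on_different_nodes <pod-a> <pod-b>
pods_on_different_nodes() {
  local a b
  # .spec.nodeName holds the node a pod was scheduled to.
  a=$(kubectl get pod "$1" -o jsonpath='{.spec.nodeName}') || return 2
  b=$(kubectl get pod "$2" -o jsonpath='{.spec.nodeName}') || return 2
  echo "$1 -> $a"
  echo "$2 -> $b"
  [ -n "$a" ] && [ -n "$b" ] && [ "$a" != "$b" ]
}

# Example:
#   pods_on_different_nodes primary-pod pair-pod && echo "good to go"
```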

Now that you know they’re on different nodes, let’s enter the primary pod.

[centos@kube-master ~]$ kubectl exec -it primary-pod -- /bin/bash

Now we can take a look at the interfaces…

[root@primary-pod /]# ip a | grep -P "(^\d|inet\s)"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet scope host lo
7: in1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
    inet brd scope global in1

Note that there are two interfaces:

  • lo which is a loopback created by the boot_network CNI pass through parameter in our configuration.
  • in1 which is a vxlan, assigned the IP address as we defined in the pod labels.

Let’s look at the vxlan properties like so:

[root@primary-pod /]# ip -d link show in1
7: in1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1000
    link/ether 9e:f4:ab:a0:86:7a brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 
    vxlan id 11 remote dev 2 srcport 0 0 dstport 4789 l2miss l3miss ageing 300 addrgenmode eui64 

You can see that it’s a vxlan with an id of 11, and the remote side is the IP address of the second minion node. That’s looking correct.
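
If you wind up checking a bunch of these interfaces, pulling the vxlan id out of that output by hand gets old. Here’s a minimal sketch of a filter that does it; the name vxlan_id is just something I made up, and it only assumes the output contains a “vxlan id N” fragment like the one shown above.

```shell
# Hypothetical helper: read `ip -d link show <ifname>` output on stdin and
# print the VXLAN id, if the interface has one. Prints nothing otherwise.
vxlan_id() {
  sed -n 's/.*vxlan id \([0-9][0-9]*\).*/\1/p'
}

# Example, run inside the pod:
#   ip -d link show in1 | vxlan_id
```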

That being said, we can now ping the other side, using the IP address assigned to the pair.

[root@primary-pod /]# ping -c1
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.546 ms

Excellent! All is well and good, let’s destroy this pod, and shortly we’ll move onto the more interesting setup.

[centos@kube-master ~]$ kubectl delete -f example.yaml 

Quick clean-up procedure

Ratchet is in need of some clean-up routines of its own, and since they’re not implemented yet, we have to clean up the etcd data ourselves. So let’s do that right now.

We’re going to create a kubernetes job to delete, with this yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: etcd-delete
spec:
  template:
    metadata:
      name: etcd-delete
    spec:
      containers:
      - name: etcd-delete
        image: centos:centos7
        command: ["/bin/bash"]
        args:
          - "-c"
          - >
            curl -s -L -X DELETE http://$ETCD_HOST:2379/v2/keys/ratchet\?recursive=true;
      restartPolicy: Never

I created this file as job-delete-etcd.yaml, and then executed it as such:

[centos@kube-master ~]$ kubectl create -f job-delete-etcd.yaml 

And I want to watch it come to completion with:

[centos@kube-master ~]$ watch -n1 kubectl get pods --show-all
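
If you’d like to script the waiting instead of watching, something like the following could poll the job until it reports success. Treat it as a sketch under assumptions: wait_for_job is a made-up name, and it relies on the job’s .status.succeeded field being populated once the pod finishes.

```shell
# Hypothetical helper: poll a Job until it reports a successful completion.
# Usage: wait_for_job <job-name> [max-tries] [seconds-between-tries]
wait_for_job() {
  local job="$1" tries="${2:-60}" delay="${3:-2}" i=0 done_count
  while [ "$i" -lt "$tries" ]; do
    # .status.succeeded is empty until at least one pod has completed.
    done_count=$(kubectl get job "$job" -o jsonpath='{.status.succeeded}' 2>/dev/null)
    if [ "${done_count:-0}" -ge 1 ] 2>/dev/null; then
      echo "job $job completed"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "job $job did not complete after $tries tries" >&2
  return 1
}

# Example:
#   wait_for_job etcd-delete && kubectl delete -f job-delete-etcd.yaml
```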

You can now remove the job if you wish:

[centos@kube-master ~]$ kubectl delete -f job-delete-etcd.yaml 

Running the whole cloud router

Next, we’re going to run a more interesting setup. I’ve got the YAML resource definitions stored in this gist, so you can peruse them more deeply.

A current limitation is that there are 2 parts: you have to run the first part, wait for the pods to come up, and then run the second part. This is because the current VXLAN implementation of Ratchet is a sketch, and doesn’t take into account a few different use cases. One of those is that there is sometimes more than “just a pair”, and in this case there are 3 pairs and some overlap. So we create them in an ordered fashion to let Ratchet think of them just as pairs, because otherwise, if we create them all at once, we get a race condition, and usually the vEth wins, so… We’re working around that here ;)

Let’s download those yaml files.

$ curl -L > cloud-router-part1.yaml
$ curl -L > cloud-router-part2.yaml

Now, create the first part, and let the pods come up.

[centos@kube-master ~]$ kubectl create -f cloud-router-part1.yaml 
[centos@kube-master ~]$ watch -n1 kubectl get pods --show-all

Then you can create the second part, and watch the last single pod come up.

[centos@kube-master ~]$ kubectl create -f cloud-router-part2.yaml 
[centos@kube-master ~]$ watch -n1 kubectl get pods --show-all
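
The two-part ordering above could also be scripted, so you don’t have to sit on the watch. Here’s a rough sketch of that idea: both function names are hypothetical, and it assumes every pod in the current namespace belongs to this lab (it treats anything not in the Running state as “not ready yet”).

```shell
# Hypothetical helper: succeed only when at least one pod exists and every
# pod listed has "Running" in the STATUS column (column 3).
all_pods_running() {
  kubectl get pods --no-headers 2>/dev/null |
    awk '{ n++; if ($3 != "Running") bad++ } END { if (n == 0 || bad > 0) exit 1 }'
}

# Create part 1, wait (up to roughly 5 minutes) for it to settle, then
# create part 2. This sidesteps the race described above.
deploy_cloud_router() {
  local i=0
  kubectl create -f cloud-router-part1.yaml || return 1
  until all_pods_running; do
    i=$((i + 1))
    [ "$i" -ge 150 ] && return 1
    sleep 2
  done
  kubectl create -f cloud-router-part2.yaml
}
```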

Using the diagram up at the top of the post, we can figure out that the “Centos A” box routes through both quagga-a and quagga-b before reaching Centos B – so that means if we ping Centos B from Centos A – that’s an end-to-end test. So let’s run that ping:

[centos@kube-master ~]$ kubectl exec -it centosa -- /bin/bash
[root@centosa /]# ping -c5
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=62 time=0.399 ms
[... snip ...]

Hurray! Feel free to go and dig through the rest of the pods and check out ip a and ip -d link show etc. Also feel free to enter the quagga pods and run vtysh and see what’s going on in the routers, too.

Debugging Ratchet issues

This is the very short version, but there are basically two places you want to look to see what’s going on.

  • journalctl -u kubelet -f will give you the output from ratchet when it’s run by CNI proper, this is how it’s initially run.
  • tail -f /tmp/ratchet-child.log – this is the log from the child process, and likely will give you the most information. Note that this method of logging to temp is an ulllllltra hack. And I mean it’s a super hack. It’s just a work-around to get some output while debugging for me.

Be a hyper spaz about a hyperconverged GlusterFS setup with dynamically provisioned Kubernetes persistent volumes

I’d recently brought up my GlusterFS for persistent volumes in Kubernetes setup and I was noticing something errant. I had to REALLY baby the persistent volumes. That didn’t sit right with me, so I refactored the setup to use gluster-kubernetes to hook up a hyperconverged setup. This setup improves on the previous one by having the Gluster daemon running in Kubernetes pods, which is just feeling so fresh and so clean. Difference being that OutKast is like smooth and cool – and I’m an excited spaz about technology with this. Gluster-Kubernetes also implements heketi, which is an API for GlusterFS volume management that Kube can also use to allow us dynamic provisioning. Our goal today is to spin up Kube (using kube-ansible) with gluster-kubernetes for dynamic provisioning, and then we’ll validate it with master-slave replication in MySQL, to one-up our simple MySQL from the last article.

If you’re not familiar with persistent volumes in Kubernetes, or some of the basics of why GlusterFS is pretty darn cool – give my previous article a read for those basics. But, come back here for the setup.

The bulk of the work I was able to do here was thanks to the gluster-kubernetes setup guide, which helps you use the tool embedded in that project called gk-deploy. This article (and the playbook) leans on gk-deploy quite a bit. I’d also like to thank @jarrpa for some help he gave when I ran into some documentation snags bringing up gluster-kubernetes.


In short, I recommend my usual setup which is a single CentOS 7 machine you can run VMs on. That’s what I typically use with kube-ansible. You’re going to need approximately 100 gigs of disk. You’ll run 4 virtual machines (one master and 3 minions). I personally use 2 vCPUs per VM, you’d likely get away with one.

Otherwise, you can also use this on baremetal, just skip the VM portion of kube-ansible. The tricky part is that kube-ansible currently only supports a single disk for this setup, and it’d need to be the same name on all baremetal hosts. If you do give it a go, just change the name of the disk in the spare_disk_dev in the ./group_vars/all.yml in your kube-ansible clone. And, you’ll need some disks that are free-and-clear of data, and not mounted on your machines. kube-ansible can set this up for you in VMs. I’m also happy to take some pull requests to improve how this works against baremetal!

Also, as per usual, I assume a CentOS 7 distro on all nodes. And while you might be able to do this with other distros, know that it colors how I approach this and what ancillary tools I select.

Lastly, you need a client machine you can run Ansible on, and must have Ansible installed.

But, why? Isn’t the previous article’s method just fine?

First and foremost – the original article I wrote didn’t have heketi – the API that we’re going to have Kube use to dynamically provision Gluster volumes. That’s not as good.

The other thing was cleanliness. It was kind of two ways of managing applications – one running on the host operating system, and the others in containers. Just not nearly as clean.

Lastly, it required that you baby some of the volumes. For example, you’d have to specify new persistent volumes, and then make claims against them. Now we can have claims against a new Kubernetes storageclass, and that storage class will specify that we talk to Heketi, like in this example.
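
To give a feel for what “a claim against a storage class” looks like, here’s a small sketch that prints a PersistentVolumeClaim manifest. It’s illustrative only: make_pvc is a made-up helper, and the claim name and storage class name in the example are placeholders, not the names the playbook creates.

```shell
# Hypothetical helper: print a PersistentVolumeClaim manifest that asks a
# storage class for a dynamically provisioned volume.
# Usage: make_pvc <claim-name> <size> <storage-class>
make_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  storageClassName: $3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $2
EOF
}

# Example (both names are placeholders):
#   make_pvc demo-claim 1Gi my-gluster-class | kubectl create -f -
```

The point is that there’s no PersistentVolume definition anywhere in there. The claim names a class, and the provisioner behind that class creates the volume for you.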

Also, we use the gk-deploy tool from gluster-kubernetes here, and it can do a number of things that we just don’t have to maintain anymore – such as “peer probe” all the gluster nodes; which gets them all connected to one another and cooperating.

This begs the question – is there an advantage to running it on the host? I don’t think there is. This has all the pieces that the on-host setup has, it just happens to have them running in containers on the host. Since you’re running Kubernetes – I think that’s an advantage.

It should be noted however that the gk-deploy tool also supports using an existing GlusterFS cluster, and it can just run heketi for us. (However, my playbook doesn’t intend to support that mode, for now.)

Kubernetes Installation (the hard part)

I’ll give a quick review of kube-ansible. If you want a more thorough tutorial check out my article on using it. The most difficult part is just modifying the inventory, and that’s not even that tough. Remember the gist here is that we have a single host that can run virtual machines (which we call the “virthost”, and this playbook has the setup for running those), and then we run virtual machines on which we run Kubernetes (generally for laboratory analysis, in my own case).

Clone up the kube-ansible repo (at a particular tag that has the kube-glusterfs):

$ git clone --branch v0.1.0 && cd kube-ansible

Now go and do the hardest part. Modify the inventories. Modify ./inventory/virthost.inventory to your main CentOS machine to run virtual machines on. Add a vars section to the bottom of it:


And set the eth0 to whatever your primary NIC is named (e.g. if you have multiple NICs, it’s likely in your lab this would be the NIC that can access the internet). And set the CIDR for it too. Of course, at the top set the IP address of this host.

Now we’ll run the virthost setup:

$ ansible-playbook -i inventory/virthost.inventory virt-host-setup.yml

Two things you need to do from here:

  • Pay attention to the list of IPs for the VMs that come up in a play described as: Here are the IPs of the VMs
  • Next, go ahead and get the contents of /root/.ssh/id_vm_rsa (the SSH private key) on the virt host. Put those somewhere on your client machine (workstation or what have you).

Modify the ./inventory/vms.inventory. In the first four lines, put the IP addresses you got from the last step. Then, in the last line, point the ansible_ssh_private_key_file variable at the path to the SSH private key you got from the previous step. And lastly – comment out the ansible_ssh_common_args line, you don’t need that now.

Now you can install Kubernetes.

$ ansible-playbook -i inventory/vms.inventory kube-install.yml

To verify it, on the virt host you can ssh to the kube master, like so, and get the list of nodes:

$ ssh -i .ssh/id_vm_rsa centos@kube-master 'kubectl get nodes'

Cool – now you have Kube. We’re going to attach some spare disks to those VMs which will show up as /dev/vdb on each of them. By default they’re 10 gigs (and you can change that in the spare_disk_size_megs variable in ./group_vars/all.yml or put it in your inventory)

$ ansible-playbook -i inventory/virthost.inventory vm-attach-disk.yml

Alright, you’re good to go – now onto the good stuff.

GlusterFS on Kube (the easy part)

Here’s the easy part – just one more playbook to run. Then we can go from there.

$ ansible-playbook -i inventory/vms.inventory gluster-install.yml

This is going to do everything you need to have glusterfs running on each of the minion nodes.

The (at least mock) hyperconverged storage situation is coming now. If you’re not familiar with that terminology – the shortest explanation is that your storage resides on the same hosts as where you run your computational workloads. Awesome.

Great – that’s a whole bunch of magic, what the heck did that playbook actually do!? If you want to see it in stark detail, check out the ./roles/glusterfs-kube-config/tasks/main.yml file which has all of what it does.

Here’s the run-down:

  • Installs some required packages (glusterfs-fuse is required on all nodes)
  • Templates a gk-deploy topology file, from ./roles/glusterfs-kube-config/templates/glusterfs-topology.json.j2
    • You can also check out an example, if you’d like.
  • Clones gluster-kubernetes
  • Installs the heketi CLI application on the kube master.
  • Runs the gk-deploy script
    • Using the topology file we templated
    • Specifying that we’ll run GlusterFS daemon in Kubernetes
  • Creates a storageclass from a template in ./roles/glusterfs-kube-config/templates/glusterfs-storageclass.yaml.j2

It’s actually a LOT less steps than before. Primarily because we don’t have to worry about such things as:

  • Formatting disks and creating volume groups, etc.
  • Configuring GlusterFS more deeply and manually peering the endpoints.
  • …and more.

Let’s use it!

Alright cool, well, you just hung out for a while waiting for that GlusterFS playbook to run (not to mention, an entire Kubernetes install). Which makes me believe that you’re sufficiently coffee-i-fied at this point. Because of that, we’re going to pick something a little bit more ambitious this time for an example usage of these persistent volumes. Last time we used MariaDB, this time, we’re going to use MySQL with replication.

Setting up MySQL replication in Kubernetes

If you’re interested more deeply in how to do this, check out the k8s docs on running replicated mysql using stateful sets. That’s the origin of my example, but, I have some modified resource definitions here that are specific to what we just spun up so you don’t have to read through every line. However, it is actually fairly interesting to check out, so I do encourage it.

Firstly, let’s curl down those resource definitions. I also have them in a GitHub Gist.

Ok, let’s get the files.

$ curl -s -L > mysql-configmap.yaml
$ curl -s -L > mysql-services.yaml
$ curl -s -L > mysql-statefulset.yaml

Create from all of those.

$ kubectl create -f mysql-configmap.yaml
$ kubectl create -f mysql-services.yaml
$ kubectl create -f mysql-statefulset.yaml

(One time I had to recreate the stateful set because MySQL complained that I couldn’t connect from an arbitrary IP address. Unsure what caused that, but if it happens to you, just kubectl delete -f each of the mysql-*.yaml files and try again.)

It takes a bit to spin up. Since it’s a stateful set, the pods come up in order for us, which is nice for a replicated setup. So make sure to do a watch -n1 kubectl get pods (or even a kubectl get pods --watch).
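
If you want a scripted check rather than a watch, a sketch like this compares how many replicas the stateful set asked for against how many report ready. The helper name is hypothetical, and the example assumes the stateful set is named mysql (going by the mysql-0 pod name used below).

```shell
# Hypothetical helper: succeed once a StatefulSet reports as many ready
# replicas as it asked for.
# Usage: statefulset_ready <name>
statefulset_ready() {
  local want ready
  want=$(kubectl get statefulset "$1" -o jsonpath='{.spec.replicas}') || return 2
  # .status.readyReplicas is omitted while it is zero, so default it to 0.
  ready=$(kubectl get statefulset "$1" -o jsonpath='{.status.readyReplicas}') || return 2
  echo "$1: ${ready:-0}/${want:-?} ready"
  [ -n "$want" ] && [ "${ready:-0}" = "$want" ]
}

# Example:
#   until statefulset_ready mysql; do sleep 5; done
```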

Verifying the MySQL setup.

Now, we can do cool stuff with it. Let’s create a table based on… Honey bees (I keep bees but these numbers aren’t representative of anything scientific, just FYI). Feel free to use whatever data you’d like.

[centos@kube-master ~]$ kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- mysql -h mysql-0.mysql
mysql> CREATE DATABASE beekeeping;
mysql> USE beekeeping;
mysql> CREATE TABLE hive (id INT AUTO_INCREMENT, role VARCHAR(255), counted BIGINT, PRIMARY KEY (id));
mysql> INSERT INTO hive VALUES (NULL,'queen',1);
mysql> INSERT INTO hive VALUES (NULL,'worker',20000);
mysql> INSERT INTO hive VALUES (NULL,'drone',800);
mysql> SELECT * FROM hive;

Ok, that’s all well and good, now, let’s check that the replicated members have data.

[centos@kube-master ~]$ kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- mysql -h mysql-read --execute "SELECT * FROM beekeeping.hive"
| id | role   | counted |
|  1 | queen  |       1 |
|  2 | worker |   20000 |
|  3 | drone  |     800 |

Now, let’s have fun and tear it down, and see if we still have data rollin’.

[centos@kube-master ~]$ kubectl delete -f mysql-statefulset.yaml 
[centos@kube-master ~]$ kubectl create -f mysql-statefulset.yaml 

And then exec the select again, and bammo…

[centos@kube-master ~]$ kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- mysql -h mysql-read --execute "SELECT * FROM beekeeping.hive"
| id | role   | counted |
|  1 | queen  |       1 |
|  2 | worker |   20000 |
|  3 | drone  |     800 |

You’re cookin’ with oil!

Chainmail of NFV (+1 Dexterity) -- Service Chaining in Containers using Koko & Koro

In this episode – we’re going to do some “service chaining” in containers, with some work facilitated by Tomofumi Hayashi in his creation of koko and koro.

Koko (the “container connector”) gives us the ability to connect a network between containers (with veth, vxlan or vlan interfaces) in an isolated way. It also creates multiple interfaces for our containers, which will allow us to chain them. Then we can use Koro (the “container routing” tool) to manipulate those network interfaces, and specifically their routing, in order to chain them together, and to further manipulate routing and IP addressing when the chain needs to change.

Our goal today will be to connect four containers in a chain of services going from a http client, to a firewall, through a router, and terminating at a web server. Once we have that chain together, we’ll intentionally cause a failure of a service and then repair it using koro.

(The title joke is… fairly lame. Since when aren’t the other ones lame? But! It’s supposed to be a reference to magic items in Dungeons & Dragons)

I’d like to point out that this is not exactly “service function chaining” (SFC) – we can let sdxcentral define that for you. From what I understand, pure SFC uses a “network service header” (which you can see here from IETF) to help perform dynamic routing. This doesn’t use those headers, so I will refer to it as simply “service chaining”. You can think of it as maybe some related tools and ideas to build on to achieve something more like a proper SFC.

In fact… We’re going to perform a series of steps here that are quite manual, but, to demonstrate what you may be able to automate in the future – and my associate Tomofumi has some machinations in the works to do such things. We’ll cover those later.

Now that we’ve established we’re going to chain some services together – let’s go ahead and actually chain ‘em up!

What are we building today?

We’re going to spin up 4 containers, and chain the services in them. All the network connections are veth created by koko.

service chain overview

Here you can see we’ll have 4 services chained together, in essence an HTTP request is made by the client, passes the firewall, gets routed by the router, and then lands at an HTTP server. All of these services run in containers, and the network connections are veth, so all of the containers are on the same host.

The firewall is just iptables, and the router is simply kernel routing and allowing ip forwarding in the container. These are shortcuts to help simplify those services, allowing us to look at the pieces that we use to deploy and manage their networking. I tried to put in an example with DPI, and I realized quickly it was too big of a piece to chew, and that it’d detract from the other core functionality to explore in this article.


Note that this article assumes you have the setup left over from this previous how-to blog showing koko+vpp. If you’re not interested in the VPP part (we don’t use it in this article) you can skip those sections, but you will need koko & koro installed, and Docker.

Limitations and what’s next

This setup could be further extended and made cooler by making all vxlan (or maybe even vlan) connections to the containers and backing them with the VPP host we created in the last article. However, it’s a further number of steps, and between these articles I believe one could make a portmanteau of the two and give that a whirl, too!

Tomo has other cool goodies in the works, and without spoiling the surprise of how cool what he’s been designing is, the gist is that they further the automation of what we’re doing here. In a more realistic scenario, that’s the real use-case: to have these types of operations happen very quickly and automatically, instead of babying them at each step. However, this helps to expose you to the pieces at work for something like that to happen.

A warm-up using iptables (optional)

Ok, let’s have a warm-up quick. We can go through the most basic steps, and we’ll operate a firewall. So here we’ll create two endpoints with a firewall between them. This part is optional and you can skip down to the next header.

But, I encourage you to run through this quick, it won’t take extra time and you can see stepwise how koro is used after, say, not using it.

I’m going to use someone’s dockerhub iptables, and here’s the Dockerfile should you need it.

$ docker pull vimagick/iptables

Now run that image, and two more.

$ docker run --name=iptables -dt --privileged -e 'TCP_PORTS=80,443' -e 'UDP_PORTS=53' -e 'RATE=4mbit' -e 'BURST=4kb' vimagick/iptables:latest
$ docker run --name test1 --privileged --net=none -dt dougbtv/centos-network sleep 2000000
$ docker run --name test2 --privileged --net=none -dt dougbtv/centos-network sleep 2000000

We can use koko to connect them together with veth connections.

$ ./gocode/bin/koko -d test1,link1, -d iptables,link2,
$ ./gocode/bin/koko -d iptables,link3, -d test2,link4,

Then, you need default routes on both test1 and test2, like:

$ docker exec -it test1 /bin/bash -c 'ip route add default via dev link1'
$ docker exec -it test2 /bin/bash -c 'ip route add default via dev link4'

And the iptables container needs to have ip forwarding…

[root@koko1 centos]# docker exec -it iptables /bin/sh
/ # echo 1 > /proc/sys/net/ipv4/ip_forward

Then you should be able to ping from test1.

Now let’s block icmp to make sure iptables is working. The rule needs to go into the FORWARD chain.

/ # iptables -A FORWARD -p icmp  -j DROP

And you can remove that too…

/ # iptables -D FORWARD 1
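
Deleting by rule number works fine when you’ve only got the one rule, but it’s easy to delete the wrong thing once there are more. As a sketch of a more repeatable approach, these two made-up helpers use iptables -C to check for the rule first, so you can run either one as many times as you like. They’re meant to be run inside the iptables container.

```shell
# Hypothetical helpers for toggling the ICMP block in the FORWARD chain.

# Add the DROP rule only if it is not already there.
block_icmp() {
  iptables -C FORWARD -p icmp -j DROP 2>/dev/null ||
    iptables -A FORWARD -p icmp -j DROP
}

# Remove every copy of the DROP rule, matching on the rule itself rather
# than on its position in the chain.
unblock_icmp() {
  while iptables -C FORWARD -p icmp -j DROP 2>/dev/null; do
    iptables -D FORWARD -p icmp -j DROP || return 1
  done
}
```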

Cool, those are the working bits, minus koro. So let’s bring in koro.

First, delete those containers (this removes ALL the containers on the host).

$ docker kill $(docker ps -aq)
$ docker rm $(docker ps -aq)
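
If your host is running other containers you’d like to keep, you might prefer something more targeted. Here’s a minimal sketch (remove_containers is a made-up name) that force-removes only the containers you name and skips any that don’t exist.

```shell
# Hypothetical helper: force-remove only the named containers.
# `docker rm -f` stops a running container before removing it.
# Usage: remove_containers <name> [<name> ...]
remove_containers() {
  local name
  for name in "$@"; do
    if docker inspect "$name" >/dev/null 2>&1; then
      docker rm -f "$name" >/dev/null && echo "removed $name"
    else
      echo "skipped $name (not found)"
    fi
  done
}

# Example for the warm-up containers:
#   remove_containers iptables test1 test2
```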

Run those containers again, and now use koko but without assigning IP addresses.

$ docker run --name=iptables -dt --privileged -e 'TCP_PORTS=80,443' -e 'UDP_PORTS=53' -e 'RATE=4mbit' -e 'BURST=4kb' vimagick/iptables:latest
$ docker run --name test1 --privileged --net=none -dt dougbtv/centos-network sleep 2000000
$ docker run --name test2 --privileged --net=none -dt dougbtv/centos-network sleep 2000000
$ ./gocode/bin/koko -d test1,link1 -d iptables,link2
$ ./gocode/bin/koko -d iptables,link3 -d test2,link4

Alright, now, you’ve gotta still set ip forwarding on the iptables container.

[root@koko1 centos]# docker exec -it iptables /bin/sh
/ # echo 1 > /proc/sys/net/ipv4/ip_forward

We’ve got links now, but, no ip addressing. Koro should be able to fix this up for us.

This adds the addresses…

$ ./gocode/bin/koro docker test1 address add dev link1
$ ./gocode/bin/koro docker iptables address add dev link2
$ ./gocode/bin/koro docker iptables address add dev link3
$ ./gocode/bin/koro docker test2 address add dev link4

Let’s add a default route to test1 & 2.

$ ./gocode/bin/koro docker test1 route add default via dev link1
$ ./gocode/bin/koro docker test2 route add default via dev link4

With those in place, we can now ping across the containers.

$ docker exec -it test1 ping -c 5

Alright, and now… we’ll take those down. (This kills all containers running on your host, btw.)

$ docker kill $(docker ps -aq)
$ docker rm $(docker ps -aq)

Creating a service chain with koro

Let’s get to the good stuff – time to go ahead and make a service chain, it’ll look like…

service chain

Note that those are all containers, and the interfaces created in them are veth pairs.

With that in hand – let’s spin up all the pieces that we need. Pull my dougbtv/pickle-nginx, we’ll use that.

$ docker pull dougbtv/pickle-nginx

Now, let’s run all the containers.

$ docker run --name client --privileged --net=none -dt dougbtv/centos-network sleep 2000000
$ docker run --name=firewall -dt --privileged -e 'TCP_PORTS=80,443' -e 'UDP_PORTS=53' -e 'RATE=4mbit' -e 'BURST=4kb' vimagick/iptables:latest
$ docker run --name router --privileged --net=none -dt dougbtv/centos-network sleep 2000000
$ docker run -dt --net=none --name webserver dougbtv/pickle-nginx

And run a docker ps to make sure they’re all running.

Ok, these need a bit of grooming. Firstly, we need IP forwarding on the firewall and router.

$ docker exec -it firewall /bin/sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
$ docker exec -it router /bin/sh -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'

Great. Now we can create koko links between all the containers. That’s three veth pairs…

$ ./gocode/bin/koko -d client,link1 -d firewall,link2
$ ./gocode/bin/koko -d firewall,link3 -d router,link4
$ ./gocode/bin/koko -d router,link5 -d webserver,link6
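
There’s a pattern in those three commands: each neighboring pair of containers gets one veth pair, and the link numbers just count up. As a sketch of how you might generate them for a chain of any length, here’s a made-up helper that prints the koko commands rather than running them. The KOKO variable is my own invention for overriding the binary path.

```shell
# Hypothetical helper: given containers in chain order, print one koko
# command per neighboring pair, numbering the interfaces link1, link2, ...
# Usage: chain_links <container> <container> [<container> ...]
chain_links() {
  local koko="${KOKO:-./gocode/bin/koko}" n=1 prev="" c
  for c in "$@"; do
    if [ -n "$prev" ]; then
      echo "$koko -d $prev,link$n -d $c,link$((n + 1))"
      n=$((n + 2))
    fi
    prev="$c"
  done
}

# Example (review the output, then pipe it to sh if it looks right):
#   chain_links client firewall router webserver
```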

And now we’ll add addresses to them all.

$ ./gocode/bin/koro docker client address add dev link1
$ ./gocode/bin/koro docker firewall address add dev link2
$ ./gocode/bin/koro docker firewall address add dev link3
$ ./gocode/bin/koro docker router address add dev link4
$ ./gocode/bin/koro docker router address add dev link5
$ ./gocode/bin/koro docker webserver address add dev link6

And we’re going to need some more routing.

[root@koko1 centos]# ./gocode/bin/koro docker client route add default via dev link1
[root@koko1 centos]# ./gocode/bin/koro docker webserver route add default via dev link6
[root@koko1 centos]# ./gocode/bin/koro docker firewall route add via dev link3
[root@koko1 centos]# ./gocode/bin/koro docker router route add via dev link4

Check all the routing.

[root@koko1 centos]# docker exec -it client ip route
default via dev link1
dev link1  proto kernel  scope link  src

[root@koko1 centos]# docker exec -it firewall ip route
default via dev eth0
dev link2 proto kernel scope link src
dev link3 proto kernel scope link src
via dev link3
dev eth0 proto kernel scope link src

[root@koko1 centos]# docker exec -it router ip route
via dev link4
dev link4  proto kernel  scope link  src
dev link5  proto kernel  scope link  src

[root@koko1 centos]# docker exec -it webserver ip route
default via dev link6
dev link6  proto kernel  scope link  src

Now we have a service chain! Huzzah! You can curl the nginx.

[root@koko1 centos]# docker exec -it client /bin/bash -c 'curl -s | grep -i pickle'
<title>This is pickle-nginx</title>

Let’s cause some chaos, some mass confusion. It’s all well and good we have these four pieces all set up together.

However, the reality is… Something is going to happen. In the real world – everything is broken. To emulate that let’s create this scenario – the firewall goes down. In a more realistic scenario, this pod will be recreated. For this demonstration we’re just going to let it be gone, and we’ll just create new links with koko directly to the router, and then re-route.

Here’s what we’ll do…

service chain failure mode

Note that the firewall winds up failing and is gone, and we’ll fix the routing and ip addressing surrounding it to patch it up.

[root@koko1 centos]# docker kill firewall

That should do it. Alright now we can’t run our same curl, it fails.

[root@koko1 centos]# docker exec -it client /bin/bash -c 'curl'
curl: (7) Failed to connect to Network is unreachable

We can use koko & koro to fix this up for us. Let’s create some new interfaces with koko. We’ll also just use a new subnet for this connection (we could finesse the existing, but, this is a couple steps less).

Go ahead and create that veth pair.

$ ./gocode/bin/koko -d client,link7 -d router,link8

Now, we’ll need some IP addresses, too.

$ ./gocode/bin/koro docker client address add dev link7
$ ./gocode/bin/koro docker router address add dev link8

And we have to fix the client container’s default route. We don’t have to delete the existing default route, because it went down with the interface, since a veth is a pair. (In a vxlan setup, we’d otherwise have to detect the failure and provide some cleanup.) So all we have to do is add a route.

$ ./gocode/bin/koro docker client route add default via dev link7
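
The repair steps above boil down to four commands, and they’d look the same for any failed hop you wanted to bypass. Here’s a sketch that prints that plan for a given pair of containers. Everything about it is illustrative: bypass_plan and KOKO_BIN_DIR are made-up names, and the addresses in the example are placeholders from the 192.0.2.0/24 documentation range, not the ones used in this lab.

```shell
# Hypothetical helper: print (not run) the koko/koro commands that bypass a
# failed hop by wiring two containers directly together.
# Usage: bypass_plan <left> <right> <left-cidr> <right-cidr> <gateway> <left-if> <right-if>
bypass_plan() {
  local bin="${KOKO_BIN_DIR:-./gocode/bin}"
  # 1. New veth pair between the two surviving containers.
  echo "$bin/koko -d $1,$6 -d $2,$7"
  # 2. Addresses on both ends of the new pair.
  echo "$bin/koro docker $1 address add $3 dev $6"
  echo "$bin/koro docker $2 address add $4 dev $7"
  # 3. Point the left side's default route at the new link.
  echo "$bin/koro docker $1 route add default via $5 dev $6"
}

# Example with placeholder addresses:
#   bypass_plan client router 192.0.2.1/24 192.0.2.2/24 192.0.2.2 link7 link8
```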

And – we’re back in business, you can curl the pickle-nginx again.

[root@koko1 centos]# docker exec -it client /bin/bash -c 'curl -s | grep -i pickle'
<title>This is pickle-nginx</title>

In closing.

Using the basics from this technique for a failed service in a container, you could build a number of other operations, e.g. handling other failure modes (a container that has died is replaced with a new one), or extending the service chain by, say, adding a DPI container somewhere in the chain.

The purpose of this is to show manually the steps that could be taken automatically, by, say, a CNI plugin. That could make these changes automatically and much more quickly than us lowly humans can make them by punching commands in a terminal.