Chainsaw CNI -- Modify container networking at runtime

Introducing: Chainsaw CNI

The gist of Chainsaw CNI (brum-brum-brum-brum-brrrrrrrrr) is that it's a CNI plugin that runs in a CNI chain (more on that soon), and it allows you to run arbitrary ip commands against your Kubernetes pods to either manipulate or inspect networking. You can do this at runtime by annotating a pod with the commands you want to run.

For example, you can annotate a pod with:

k8s.v1.cni.cncf.io/chainsaw: >
      ["ip route","ip addr"]

And then get the output of ip route and ip addr for your pod.

I named it Chainsaw because:

  • It works using CNI Chains.
  • It’s powerful, but kind of dangerous.

Today, we’re going to:

  • Talk about why I made it.
  • Look at what CNI chains are.
  • See what the architecture is comprised of.
  • And of course, engage the choke, pull the rope start and fire up this chainsaw.

We'll be using it with network attachment definitions – that is, the custom resource type that's used by Multus CNI.

Why do you say it's dangerous? Well, like a chainsaw, you can do permanent harm to something. You could totally turn off networking for a pod. Or you could potentially open up a way for some user of your system to do something more privileged than you thought. I'm still thinking about how to better address this part, but for now… I'd advise that you use it carefully, and in lab situations rather than production, until these aspects are more fully considered.

Also, as an aside… I am a physical chainsaw user. I have one and, I use it. But I’m appropriately afraid of it. I take a long long time to think about it before I use it. I’ve watched a bunch of videos about it, but I really want to take a Game Of Logging course so I can really operate it safely. Typically, I’m just using my Silky Katanaboy (awesome Japanese pull saw!) for trail work and what not.

Last but not least, a quick disclaimer: This is… a really new project. So it’s missing all kinds of stuff you might take for granted: unit tests, automatic builds, all that. Just a proof of concept, really.

Why, though?

I was originally inspired by hearing this particular discussion:

Person: “Hey I want to manipulate a route on a particular pod”

Me: “Cool, that’s totally possible, use the route override CNI” (it’s another chained plugin!)

Person: “But I don’t want to manipulate the net-attach-def, there’s tons of pods using them, and I only want to manipulate for a specific site, so I want to do it at runtime, adding more net-attach-defs makes life harder”.

Well, this kinda bothered me! I talked to a co-worker who said “Sure, next they’re going to want to change EVERYTHING at runtime!”

I was thinking: “hey, what if you COULD change whatever you wanted at runtime?”

And I figured, it could be a really handy tool, even if just for CNI developers, or network tinkerers as it may be.

CNI Chains


   ┌──────────────────┐                   ┌────────────────┐
   │                  │                   │                │
   │                  │   ┌───────────┐   │                │
   │   CNI Plugin A   │   │           │   │  CNI Plugin B  │
   │                  ├───► cni result├───►                │
   │                  │   │           │   │                │
   │                  │   └───────────┘   │                │
   └──────────────────┘                   └────────────────┘

CNI chains are… sometimes confusing to people. But they don't need to be: it's basically as simple as saying, "You can chain as many CNI plugins together as you want, and each CNI plugin gets all the CNI results of the plugin before it."

This functionality was introduced in CNI 0.3.0 and is available in all later versions of CNI, naturally.

You can tell if you have a CNI plugin chain by looking at your CNI configuration: if the top-level JSON has the "type" field – then it's not a chain.

If it has the "plugins": [] array – then it’s a chain of plugins, and will run in the order within the array. As of CNI 1.0, you’ll always be using the plugins field, and always have chains, even if a “chain of one”.

Why do you use chained plugins? The best example I can usually think of is the Tuning Plugin. Which allows you to set network sysctls, or manipulate other parameters of networks – such as setting an interface into promiscuous mode. This is done typically after the work of your main plugin, which is going to do the plumbing to setup the networking for you (e.g. say, a vxlan tunnel, or a macvlan interface, etc etc).
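As a rough sketch of what that looks like (the values here are purely illustrative, not a config you'd necessarily use as-is), a chain with the tuning plugin might be expressed like this – macvlan plumbs the interface, then tuning runs against the result:

{
  "cniVersion": "0.4.0",
  "name": "mymacvlan-chain",
  "plugins": [
    {
      "type": "macvlan",
      "master": "eth0",
      "ipam": {
        "type": "host-local",
        "subnet": "192.0.2.0/24"
      }
    },
    {
      "type": "tuning",
      "sysctl": {
        "net.core.somaxconn": "500"
      }
    }
  ]
}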

The architecture

Not a whole lot to say, but it's a "sort of thick plugin" – thick CNI plugins are those that have a resident daemon, as opposed to "thin CNI plugins", which run as a one-shot (all of the reference CNI plugins are one-shots). But in this case, we just use the resident daemonset for looking at the log output, for inspecting our results.

Other than that, it's similar to Multus CNI in that it knows how to talk to the k8s API and get the annotations, and it uses a generated kubeconfig to authenticate itself against the k8s API.

Let’s get to using it!

Requirements:

  • A k8s cluster, the newer the better.
  • Multus CNI must be installed

That’s about it. Don’t use a production cluster ;)

So go ahead and clone dougbtv/chainsaw-cni.

Then create the daemonset with:

kubectl create -f deployments/daemonset.yaml

NOTE: Are you an openshift user? Use the deployments/daemonset_openshift.yaml deployment instead :thumbsup:

Now, let’s create a net-attach-def which implements chainsaw in a chain – note the plugins array!

Also note the use of the special token CURRENT_INTERFACE which will use the current interface name as opposed to you having to know it in advance.

---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: test-chainsaw
spec:
  config: '{
    "cniVersion": "0.4.0",
    "name": "test-chainsaw-chain",
    "plugins": [{
      "type": "bridge",
      "name": "mybridge",
      "bridge": "chainsawbr0",
      "ipam": {
        "type": "host-local",
        "subnet": "192.0.2.0/24"
      }
    }, {
      "type": "chainsaw",
      "foo": "bar"
    }]
  }'
---
apiVersion: v1
kind: Pod
metadata:
  name: chainsawtestpod
  annotations:
    k8s.v1.cni.cncf.io/networks: test-chainsaw
    k8s.v1.cni.cncf.io/chainsaw: >
      ["ip route add 192.0.3.0/24 dev CURRENT_INTERFACE", "ip route"]
spec:
  containers:
  - name: chainsawtestpod
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine

Next, check what node the pod is running with:

kubectl get pods -o wide

You can then find the output of the ip commands in the logs of the chainsaw daemonset pod that is running on that node. First, find the daemonset pods, e.g.

kubectl get pods -n kube-system -o wide | grep -iP "status|chainsaw"

Then look at the logs for the daemonset pod that corresponds to the node on which your pod resides, for example:

kubectl logs kube-chainsaw-cni-ds-kgx69 -n kube-system

You’ll see that we have added a route to 192.0.3.0/24 and then show the IP route output!

So my results look like:

Detected commands: [route add 192.0.3.0/24 dev CURRENT_INTERFACE route]
Running ip netns exec 901afa16-48e7-4f22-b2b1-7678fa3e9f5e ip route add 192.0.3.0/24 dev net1 ===============


Running ip netns exec 901afa16-48e7-4f22-b2b1-7678fa3e9f5e ip route ===============
default via 10.129.2.1 dev eth0 
10.128.0.0/14 dev eth0 
10.129.2.0/23 dev eth0 proto kernel scope link src 10.129.2.64 
172.30.0.0/16 via 10.129.2.1 dev eth0 
192.0.2.0/24 dev net1 proto kernel scope link src 192.0.2.51 
192.0.3.0/24 dev net1 scope link 
224.0.0.0/4 dev eth0 
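
If you'd like to double check from the pod's perspective too, you can exec into the test pod and look at its routing table directly (assuming the ip command is available in the alpine image, which it typically is via busybox):

kubectl exec -it chainsawtestpod -- ip route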

cnitool -- your CNI Swiss Army knife

If you're looking at developing (or debugging!) CNI plugins, you're going to need a workflow – something that really lets you get in there, and see exactly what a CNI plugin is doing. You're going to need a bit of a Swiss Army knife, or something that slices, dices, and makes julienne fries. cnitool is just the thing to do the job. Today we'll walk through setting up cnitool, and then we'll make a "dummy" CNI plugin to use it with, and we'll run a reference CNI plugin.

We’ll also cover some of the basics of the information that’s passed to and from the CNI plugins and CNI itself, and how you might interact with that information, and how you might inspect a container that’s been plumbed with interfaces as created by a CNI plugin.

In this article, we'll do this entirely without interacting with Kubernetes (and save that for another time!). And we actually do it without a container runtime at all – no docker, no crio. We just create the network namespace by hand. But the same kind of principles apply with both a container runtime (docker, crio) and a container orchestration engine (e.g. k8s).

You might remember my blog article about a workflow for developing CNI plugins. That article uses the docker-run.sh, which is still totally valid. You might look at it for a reference, but cnitool gives a bit more granularity.

Prerequisites

  • Golang installed and configured on your system.
  • I used a Fedora environment, these steps probably work elsewhere.

Setting up cnitool and the reference CNI plugins.

Basically, all the steps necessary to install cnitool are available in the cnitool README. I’ll summarize them here, but, it may be worth a reference.

Install cnitool…

go get github.com/containernetworking/cni
go install github.com/containernetworking/cni/cnitool

You can test if it’s in your path and operational with:

cnitool --help

Next, we'll compile the "reference CNI plugins" – these are a series of plugins that are offered by the CNI maintainers that create network interfaces for pods (as well as provide a number of "meta" type plugins that alter the properties, attributes, and what not of a particular container's network). We also set our CNI_PATH variable (which is used by cnitool to know where these plugin executables are).

git clone https://github.com/containernetworking/plugins.git
cd plugins
./build_linux.sh
export CNI_PATH=$(pwd)/bin
echo $CNI_PATH

Alright, you’re basically all setup at this point.

Creating a netns and running cnitool against it

We'll need to create a CNI configuration. For testing purposes, we're going to create a configuration for the ptp CNI plugin.

Create a directory and file at /tmp/cniconfig/10-myptp.conf with these contents:

{
  "cniVersion": "0.4.0",
  "name": "myptp",
  "type": "ptp",
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "172.16.29.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }]
  }
}

And then set your CNI configuration directory by exporting this variable as:

export NETCONFPATH=/tmp/cniconfig/

First we create a netns – a network namespace. This is kind of a privately sorta-jailed space in which network components live, and is the basis of networking in containers: "here's your private namespace in which to do your network-y things". This, from a CNI point of view, is equivalent to the "sandbox", which is the base container of pods that run in Kubernetes. In k8s we'd have one or more containers running inside this sandbox, and they'd share the networks as in this network namespace.

sudo ip netns add myplayground

You can go and list them to see that it’s there…

sudo ip netns list | grep myplayground

Now we're going to run cnitool with sudo so it has the appropriate permissions, and we're going to need to pass along our environment variables and our path to cnitool (in case your root user doesn't have a go environment, or isn't configured that way). For me it looks like:

sudo NETCONFPATH=$(echo $NETCONFPATH) CNI_PATH=$(echo $CNI_PATH) $(which cnitool) add myptp /var/run/netns/myplayground

Let's break down what this is doing, more or less…

  • NETCONFPATH=$(echo $NETCONFPATH) CNI_PATH=$(echo $CNI_PATH) sets our environment variables to tell cnitool where the CNI configurations and the plugin binaries live
  • $(which cnitool) figures out the path of cnitool so that inside your sudo environment, you don’t need your GOPATH (you’re rad if you have that setup, though)
  • add myptp /var/run/netns/myplayground says that add is the CNI method which is being invoked, myptp is our configuration, and the /var/run/... is the path to the netns that we created.

You should get some output that looks like:

{
    "cniVersion": "0.4.0",
    "interfaces": [
        {
            "name": "veth20b2acac",
            "mac": "62:22:15:72:b2:29"
        },
        {
            "name": "eth0",
            "mac": "42:48:16:0b:e9:98",
            "sandbox": "/var/run/netns/myplayground"
        }
    ],
    "ips": [
        {
            "version": "4",
            "interface": 1,
            "address": "172.16.29.3/24",
            "gateway": "172.16.29.1"
        }
    ],
    "routes": [
        {
            "dst": "0.0.0.0/0"
        }
    ],
    "dns": {}
}

You can then actually do a ping out that interface, with:

sudo ip -n myplayground addr
sudo ip netns exec myplayground ping -c 1 4.2.2.2

And you can use nsenter to more interactively play with it, too…

sudo nsenter --net=/var/run/netns/myplayground /bin/bash
[root@host dir]# ip a
[root@host dir]# ip route
[root@host dir]# ping -c 5 4.2.2.2
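
When you're done poking around, cnitool can also clean up after itself by invoking the CNI DEL method against the same configuration and netns:

sudo NETCONFPATH=$(echo $NETCONFPATH) CNI_PATH=$(echo $CNI_PATH) $(which cnitool) del myptp /var/run/netns/myplayground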

Let’s interactively look at a CNI plugin running with cnitool.

What we’re going to do is create a shell script that is a CNI plugin. You see, CNI plugins can be executables of any variety – they just need to be able to read from stdin, and write to stdout and stderr.

This is kind of a blank slate for a CNI plugin that's made with bash. You could use this approach, but, in reality – you'll probably write these applications with go. Why? Well, especially because there's the CNI libraries (especially libcni) which you would use to be able to express some of these ideas about CNI in a more elegant fashion. Take a look at how Multus uses CNI's skel (skeletal components, for the framework of your CNI plugin) in its main routine to call the methods as CNI has called them. Just read through Multus' main.go and look at how it imports skel and then uses skel to call its add method when CNI ADD is used.

First, let’s make a cni configuration for our dummy plugin. I made mine at /tmp/cniconfig/05-dummy.conf.

{
  "cniVersion": "0.4.0",
  "name": "mydummy",
  "type": "dummy"
}

There’s not a lot to pay attention to here, the most important things are:

  • the type field, which must have the same name as our executable on disk – both of which are going to be dummy
  • the name field is the name we’ll reference in our cnitool command, which will be mydummy.

Now, in the path where we have our reference CNI plugins, let's add another file, name it dummy, and then make sure it's executable. In my case I did a:

vi ./bin/dummy
chmod 0755 ./bin/dummy

I made mine with the contents from this gist.

The first thing to note is that the majority of this file is actually just setting up some logging for looking at the CNI parameters, and all the magic happens in the last 3-4 lines.

Mainly, we want to output 3 environment variables using these three lines. These are environment variables that are sent to us from CNI and that a CNI plugin can use to figure out the netns, the container ID, and the CNI command.

Importantly – since we have this DEBUG variable turned on, we're outputting via stderr… and if there's any stderr output during a CNI plugin run, it's considered a failure, since outputting to stderr is what a plugin is supposed to do when it errors out.

And last but not least, we output a CNI result at the bottom line, which calls this function which outputs a (sorta kinda realistic) CNI result.

You can turn that off, but we have it on for demonstrative purposes so you can easily see what those variables are.
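
If you can't pull up the gist, here's a stripped-down sketch of what a dummy plugin along these lines can look like – this is a simplified, hypothetical version, not the exact contents of the gist:

#!/bin/bash
# Read the network configuration that CNI passes to us on stdin.
config=$(cat)

# Echo a few of the environment variables that CNI sets for every plugin
# invocation (sent to stderr here purely for demonstration purposes).
echo "CNI method: $CNI_COMMAND" >&2
echo "CNI container id: $CNI_CONTAINERID" >&2
echo "CNI netns: $CNI_NETNS" >&2

# Emit a (sorta kinda realistic) CNI result on stdout.
cat <<EOF
{
    "cniVersion": "0.4.0",
    "interfaces": [
        {
            "name": "dummy"
        }
    ],
    "dns": {}
}
EOF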

So, let’s run it!
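One quick note first: this example references a netns named dummyplayground, so if you haven't created it yet, add it the same way we did earlier:

sudo ip netns add dummyplayground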

sudo NETCONFPATH=$(echo $NETCONFPATH) CNI_PATH=$(echo $CNI_PATH) $(which cnitool) add mydummy /var/run/netns/dummyplayground

And you can see output that looks like:

CNI method: ADD
CNI container id: cnitool-06764c511c35893f831e
CNI netns: /var/run/netns/dummyplayground
{
    "cniVersion": "0.4.0",
    "interfaces": [
        {
            "name": "dummy"
        }
    ],
    "dns": {}
}

Here we'll see that there's a lot of information that we as humans already know, since we're executing cnitool ourselves, but it demonstrates how a CNI plugin interacts with this information. It's telling us that it:

  • Knows that we're doing a CNI ADD operation.
  • Knows that we're using a netns that's called dummyplayground.
  • Outputs a CNI result.

These are the general basics of what a CNI plugin needs in order to operate. And then… from there, the sky's the limit. A more realistic plugin might go on to actually create interfaces inside that netns, assign IP addresses to them, and return a result describing what it created.

And to learn a bit more, you might think about looking at some of the reference CNI plugins, and see what they do to create interfaces inside these network namespaces.

But what if my CNI plugins interacts with Kubernetes!?

…And that’s for next time! You’ll need a Kubernetes environment of some sort.

Whereabouts -- A cluster-wide CNI IP Address Management (IPAM) plugin

Something that’s a real challenge when you’re trying to attach multiple networks to pods in Kubernetes is trying to get the right IP addresses assigned to those interfaces. Sure, you’d think, “Oh, give it an IP address, no big deal” – but, turns out… It’s less than trivial. That’s why I came up with the IP Address Management (IPAM) plugin that I call “Whereabouts” – you can think of it like a DHCP replacement, it assigns IP addresses dynamically to interfaces created by CNI plugins in Kubernetes. Today, we’ll walk through how to use Whereabouts, and highlight some of the issues that it overcomes. First – a little background.

The “multi-networking problem” in Kubernetes is something that’s been near and dear to me. Basically what it boils down to is the question “How do you access multiple networks from networking-based workloads in Kube?” As a member of the Network Plumbing Working Group, I’ve helped to write a specification for how to express your intent to attach to multiple networks, and I’ve contributed to Multus CNI in the process. Multus CNI is a reference implementation of that spec and it gives you the ability to create additional interfaces in pods, each one of those interfaces created by CNI plugins. This kind of functionality is critical for creating network topologies that provide control and data plane isolation (for example). If you’re a follower of my blog – you’ll know that I’m apt to use telephony examples (especially with Asterisk!) usually to show how you might isolate signal, media and control.

I'll admit to being somewhat biased (being a Multus maintainer), but typically I see community members pick up Multus and have some nice success with it rather quickly. However, sometimes they get tripped up when it comes to getting IP addresses assigned on their additional interfaces. Usually they start by using the quick-start guide. The examples for Multus CNI are focused on a quick start in a lab, and for IP address assignment, we use the host-local reference plugin from the CNI maintainers. It works flawlessly for a single node.

host-local with a single node

But… Once they get through the quickstart guide in a lab, they're like "Great! Ok, now let's expand the scale a little bit…" and once that happens, they're using more than one node, and… It all comes crumbling down.

host-local with multiple nodes

See – the reason why host-local doesn't work across multiple nodes is actually right in the name "host-local" – the storage for the IP allocations is local to each node. That is, it stores which IPs have been allocated in a flat file on the node, and it doesn't know if IPs in the same range have been allocated on a different node. This is… Frustrating, and really the core reasoning behind why I originally created Whereabouts. That's not to say there's anything inherently wrong with host-local; it works great for the purpose for which it's designed, and its purview (from my view) is local configurations for each node (which isn't necessarily the paradigm that's used with a technology like Multus CNI, where CNI configurations aren't local to each node).
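
You can see this for yourself on any node where host-local has been used – by default it keeps one flat file per allocated IP (named after the IP itself) under a path like this, though your data directory may differ:

# default host-local state directory; one file per allocated IP
ls /var/lib/cni/networks/<your-network-name>/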

Of course, the next thing you might ask is "Why not just DHCP?" and actually that's what people typically try next. They'll try to use the DHCP CNI plugin. And you know, the DHCP CNI plugin is actually pretty great (and aside from the README, these rkt docs kind of explain it pretty well in the IP Address management section). But, some of it is less than intuitive. Firstly, it requires two parts – one of which is to run the DHCP CNI plugin in "daemon mode". You've gotta have this running on each node, so you'll need a recipe to do just that. But… It's a "DHCP CNI Plugin in Daemon Mode", it's not a "DHCP Server". Soooo – if you don't already have a DHCP server you can use, you'll also need to set up a DHCP server itself. The "DHCP CNI Plugin in Daemon Mode" just gives you a way to listen for DHCP messages.

And personally – I think managing a DHCP server is a pain in the gluteus maximus. And it's the beginning of ski season, and I'm a telemark skier, so I have enough of those pains.

I’d also like to give some BIG THANKS! I’d like to point out that Christopher Randles has made some monstrous contributions to Whereabouts – especially but not limited to the engine which provides the Kubernetes-backed data store (Thanks Christopher!). Additionally, I’d also like to thank Tomofumi Hayashi who is the author of the static IPAM CNI plugin. I originally based Whereabouts on the structure of the static IPAM CNI plugin as it had all the basics, and also I could leverage what was built there to allow Whereabouts users to also use the static features alongside Whereabouts.

How Whereabouts works

How Whereabouts Works

From a user perspective, it’s pretty easy – basically, you add a section to your CNI configuration(s). The CNI specification has a construct for “ipam” – IP Address management.

Here’s an example of what a Whereabouts configuration looks like:

"ipam": {
    "type": "whereabouts",
    "datastore": "kubernetes",
    "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
    "range": "192.168.2.0/24"
  }

Here, we’re essentially saying:

  • We choose whereabouts as a value for type which defines which IPAM plugin we’re calling.
  • We’d like to use kubernetes for our datastore (where we’ll store the IP addresses we’ve allocated) (and we’ll provide a kubeconfig for it, so Whereabouts can access the kube API)
  • And we'd like an IP address range that's a /24 – we're asking Whereabouts to assign us IP addresses in the range of 192.168.2.1 to 192.168.2.254.

Behind the scenes, honestly… It's not much more complex than what you might assume from the exposed knobs from the user perspective. Essentially – it's storing the IP address allocations in a data store. It can use the Kubernetes API natively to do so, or, it can use an etcd instance. This provides a method to access what's been allocated across the cluster – so you can assign IP addresses across nodes in the cluster (unlike being limited to a single host, with host-local). Otherwise, regarding internals – I have to admit it was kind of satisfying to program the logic to scan through IP address ranges with bitwise operations, ok I'm downplaying it… Let's be honest, it was super satisfying.
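
If you're curious what that looks like with the Kubernetes datastore, the allocations land in IPPool custom resources (the CRD we'll install in a moment), so once it's running you can poke at them with kubectl – the group name here is taken from the CRD file we'll apply below:

kubectl get ippools.whereabouts.cni.k8s.io --all-namespaces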

Requirements

  • A Kubernetes Cluster v1.16 or later
  • You need a default network CNI plugin installed (like Flannel [or Weave, or Calico, etc, etc])
  • Multus CNI
    • I’ll cover a basic installation here, so you don’t need to have it right now. But, if you already have it installed, you’ll save a step.
    • If you’re using OpenShift – you already have all of the above out of the box, so you’re all set.

Essentially, all of the commands will be run from wherever you have access to kubectl.

Let’s install Multus CNI

You can always refer to the quick start guide if you’d like more information about it, but, I’ll provide the cheat sheet here.

Basically we just clone the Multus repo and then apply the daemonset for it…

git clone https://github.com/intel/multus-cni.git && cd multus-cni
cat ./images/multus-daemonset.yml | kubectl apply -f -

You can check to see that it’s been installed by watching the pods for it come up, with watch -n1 kubectl get pods --all-namespaces. When you see the kube-multus-ds-* pods in a Running state you’re good. If you’re a curious type you can check out the contents (on any or all nodes) of /etc/cni/net.d/00-multus.conf to see how Multus was configured.

Let’s fire up Whereabouts!

The installation for it is easy, it’s basically the same as Multus, we clone it and apply the daemonset. This is copied directly from the Whereabouts README.

git clone https://github.com/dougbtv/whereabouts && cd whereabouts
kubectl apply -f ./doc/daemonset-install.yaml -f ./doc/whereabouts.cni.k8s.io_ippools.yaml

Same drill as above, just wait for the pods to come up with watch -n1 kubectl get pods --all-namespaces, they’re named whereabouts-* (usually in the kube-system namespace).

Time for a test drive

The goal here is to create a configuration to add an extra interface on a pod, add a Whereabouts configuration to that, spin up two pods, have those pods on different nodes, and show that they've been assigned IP addresses as we've specified.

Alright, what I’m going to do next is to give my nodes some labels so I can be assured that pods wind up on different nodes – this is mostly just used to illustrate that Whereabouts works with multiple nodes (as opposed to how host-local works).

$ kubectl get nodes
$ kubectl label node kube-whereabouts-demo-node-1 side=left
$ kubectl label node kube-whereabouts-demo-node-2 side=right
$ kubectl get nodes --show-labels

Now what we're going to do is create a NetworkAttachmentDefinition – this is a custom resource that we'll create to express that we'd like to attach an additional interface to a pod. Basically what we do is pack a CNI configuration inside our NetworkAttachmentDefinition. In this CNI configuration we'll also include our whereabouts config.

Here’s how I created mine:

cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.0",
      "name": "whereaboutsexample",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "datastore": "kubernetes",
        "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
        "range": "192.168.2.225/28",
        "log_file" : "/tmp/whereabouts.log",
        "log_level" : "debug"
      }
    }'
EOF

What we’re doing here is creating a NetworkAttachmentDefinition for a macvlan-type interface (using the macvlan CNI plugin).

NOTE: If you’re copying and pasting the above configuration (and I hope you are!) make sure you set the master parameter to match the name of a real interface name as available on your nodes.

Then we specify an ipam section, and we say that we want to use whereabouts as our type of IPAM plugin. We specify where the kubeconfig lives (this gives whereabouts access to the Kube API).

And maybe most important to us as users – we specify the range we’d like to have IP addresses assigned in. You can use CIDR notation here, and… If you need to use other options to exclude ranges, or other range formats – check out the README’s guide to the core parameters.
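
As one example along those lines, the README documents an exclude option, which looks roughly like this (double-check the README for the authoritative syntax):

"ipam": {
    "type": "whereabouts",
    "range": "192.168.2.225/28",
    "exclude": [
        "192.168.2.229/30",
        "192.168.2.236/32"
    ]
}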

After we’ve created this configuration, we can list it too – in case we need to remove or change it later, such as:

$ kubectl get network-attachment-definitions.k8s.cni.cncf.io

Alright, we have all our basic setup together, now let’s finally spin up some pods…

Note that we have annotations here that include k8s.v1.cni.cncf.io/networks: macvlan-conf – that value of macvlan-conf matches the name of the NetworkAttachmentDefinition that we created above.

Let’s create the first pod for our “left side” label:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: samplepod-left
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
  - name: samplepod-left
    command: ["/bin/bash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: dougbtv/centos-network
  nodeSelector:
    side: left
EOF

And again for the right side:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: samplepod-right
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
  - name: samplepod-right
    command: ["/bin/bash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: dougbtv/centos-network
  nodeSelector:
    side: right
EOF

I then wait for the pods to come up with watch -n1 kubectl get pods --all-namespaces or I look at the details of one pod with watch -n1 'kubectl describe pod samplepod-left | tail -n 50'

Also – you’ll note if you kubectl get pods -o wide the pods are indeed running on different nodes.

Once the pods are up and in a Running state, we can interact with them.

The first thing I do is check out that the IPs have been assigned:

$ kubectl exec -it samplepod-left -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    link/ether 3e:f7:4b:a1:16:4b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.2.4/24 scope global eth0
       valid_lft forever preferred_lft forever
4: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether b6:42:18:70:12:6e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.2.225/28 scope global net1
       valid_lft forever preferred_lft forever

You'll note there are three interfaces: a local loopback, an eth0 that's for our "default network" (where we have pod-to-pod connectivity by default), and an additional interface – net1. This is our macvlan connection AND it's got an IP address assigned dynamically by Whereabouts – in this case 192.168.2.225.

Let’s check out the right side, too:

$ kubectl exec -it samplepod-right -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    link/ether 96:28:58:b9:a4:4c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.1.3/24 scope global eth0
       valid_lft forever preferred_lft forever
4: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 7a:31:a7:57:82:1f brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.2.226/28 scope global net1
       valid_lft forever preferred_lft forever

Great, we’ve got another dynamically assigned address that does not collide with our already reserved IP address from the left side! Our address on the right side here is 192.168.2.226.

And while connectivity is kind of outside the scope of this article – in most cases it should generally work right out the box, and you should be able to ping from one pod to the next!

[centos@kube-whereabouts-demo-master whereabouts]$ kubectl exec -it samplepod-right -- ping -c5 192.168.2.225
PING 192.168.2.225 (192.168.2.225) 56(84) bytes of data.
64 bytes from 192.168.2.225: icmp_seq=1 ttl=64 time=0.438 ms
64 bytes from 192.168.2.225: icmp_seq=2 ttl=64 time=0.217 ms
64 bytes from 192.168.2.225: icmp_seq=3 ttl=64 time=0.316 ms
64 bytes from 192.168.2.225: icmp_seq=4 ttl=64 time=0.269 ms
64 bytes from 192.168.2.225: icmp_seq=5 ttl=64 time=0.226 ms

And that's how you can determine your pod's Whereabouts (by assigning it a dynamic address without the pain of running DHCP!).

High Performance Networking with KubeVirt - SR-IOV device plugin to the rescue!

If you've got workloads that live in VMs, and you want to get them into your Kubernetes environment (because I wouldn't wish maintaining two platforms on even the worst of the supervillains!) – you might also have networking workloads that require you to really push some performance… KubeVirt with the SR-IOV device plugin might be just the hero you need to save the day. Not all heroes wear capes; sometimes those heroes just wear a t-shirt with a KubeVirt logo that they got at KubeCon. Today we'll spin up KubeVirt with the SR-IOV device plugin and we'll run a VoIP workload on it, so jump into a phone booth, change into your KubeVirt t-shirt and fire up a terminal!

I'll be giving a talk at KubeCon EU 2019 in Barcelona titled High Performance Networking with KubeVirt. Presenting with me is the guy with the best Yoda drawing on all of GitHub, Abdul Halim from Intel. I'll give a demonstration of what's going on here in this article, and this material will be provided to attendees too, so they can follow the bouncing ball and get the same demo working in their environment.

Part of the talk is this recorded demo on YouTube. It’ll give you a preview of all that we’re about to do here in this article. Granted this recorded demo does skip over some of the most interesting configuration, but, shows the results. We’ll cover all the details herein to get you to the same point.

We’ll look at spinning up KubeVirt, with SR-IOV capabilities. We’ll walk through what the physical installation and driver setup looks like, we’ll fire up KubeVirt, spin up VMs running in Kube, and then we’ll put our VoIP workload (using Asterisk) in those pods – which isn’t complete until we terminate a phone call over a SIP trunk! The only thing that’s on you is to install Kubernetes (but, I’ll have pointers to get you started there, too). Just a quick note that I’m just using Asterisk as an example of a VoIP workload, it’s definitely NOT limited to running in a VM, it also works well in a container, even as a containerized VNF. You might be getting the point that I love Asterisk! (Shameless plugin, it’s a great open source telephony solution!)

So – why VMs? The thing is, maybe you’re stuck with them. Maybe it’s how your vendor shipped the software you bought and deploy. Maybe the management of the application is steeped in the history of it being virtualized. Maybe your software has legacies that simply just can’t be easily re-written into something that’s containerized. Maybe you like having pets (I don’t always love pets in my production deployments – but, I do love my cats Juniper & Otto, who I trained using know-how from The Trainable Cat! …Mostly I just trained them to come inside on command as they’re indoor-outdoor cats.)

Something really cool about the KubeVirt ecosystem is that it REALLY leverages some other heroes in the open source community. A good hero works well in a team, for sure. In this case KubeVirt leverages Multus CNI, which enables us to connect multiple network interfaces to pods (which also means VMs in the case of KubeVirt!), and we also use the SR-IOV Device Plugin – this plugin gives the Kubernetes scheduler awareness of which limited resources on our worker nodes have been exhausted – specifically which SR-IOV virtual functions (VFs) have been used up – so we schedule workloads on machines that have sufficient resources.

I’d like to send a HUGE thanks to Booxter – Ihar from the KubeVirt team at Red Hat helped me get all of this going, and I could not have gotten nearly as far as I did without his help. Also thanks to SchSeba & Phoracek, too!

Requirements

Not a ton of requirements; I think the heaviest two here are that you're going to need:

  • Some experience with Kubernetes (you know how to use kubectl for some basic stuff, at least), and a way to install Kubernetes.
  • SR-IOV capable devices on bare metal machines (and make them part of the Kubernetes cluster that you create)

I’m not going to cover the Kubernetes install here, I have some other material I will share with you on how to do so, though.

In my case, I spun up a cluster with kubeadm. Additionally, I also used my kube-ansible playbooks. If you’d like to use those playbooks, I also have another blog article on how to use kube-ansible.

Install a “default network”

Once you have Kubernetes installed – you’re going to need to have some CNI plugin installed to act as the default network for your cluster. This will provide network connectivity between pods in the regular old fashioned way that you’re used to. Why am I calling it the “default network”, you ask? Because we’re going to add additional network interfaces and attachments to other networks on top of this.

I used Flannel, and installed it like so:

$ curl https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml > flannel.yml
$ kubectl apply -f flannel.yml 

When it’s installed you should see all nodes in a “ready” state when you issue kubectl get nodes.

SR-IOV Setup

Primarily, I followed the KubeVirt docs for SR-IOV setup. In my opinion, this is maybe the biggest adventure in this whole process – mostly because depending on what SR-IOV hardware you have, and what mobo & CPU you have, etc… It might require you to have to dig deeply into your BIOS and figure out what to enable.

Mostly – I will leave this adventure to you, but, I will give you a quick overview of how it went on my equipment.

It's a little like making a witch's brew, "Less eye of newt, more hair of frog… nope. Ok let's try that again, blackcat_iommu=no ravensbreath_pci=on"

Or as my co-worker Anton Ivanov said:

It’s just like that old joke about SCSI. How many places do you terminate a SCSI cable? Three. Once on each end and a black goat with a silver knife at full moon in the middle

Mostly, I first had to modify my kernel parameters, so, I added an extra menuentry in my /etc/grub2.cfg, and set it as the default with grubby --set-default-index=0, and made sure my linux line included:

amd_iommu=on pci=realloc

Make sure to do this on each node in your cluster that has SR-IOV hardware.

Note that I was using an AMD based motherboard and CPU, so you might have intel_iommu=on if you’re using Intel, and the KubeVirt docs suggest a couple other parameters you can try.

If you need more help with Grub configurations, the Fedora docs on working with the GRUB2 bootloader are very helpful.

Then, in my BIOS I had to enable a number of things, I had to make sure SR-IOV support was on, as well as enabling IOMMU, and PCIe ARI Support.

After I had that up, I was able to find the VFs like so:

$ find /sys -name "*vfs*"

And then check sriov_totalvfs, and echo the number of VFs you want (up to that total) into sriov_numvfs:

$ cat /sys/devices/pci0000:00/0000:00:03.2/0000:2f:00.2/sriov_totalvfs
$ echo 32 > /sys/devices/pci0000:00/0000:00:03.2/0000:2f:00.2/sriov_numvfs
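
If that worked, you should now be able to see the virtual functions show up on the PCI bus, for example with:

$ lspci | grep -i "virtual function"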

If it errors out, you might get a hint from following your journal, that is with journalctl -f and see if it gives you any hints. I almost thought I was going to have to modify my BIOS (gulp!), I had found this Reddit thread, but, luckily it never got that far for me. It took me a few iterations at fixing my Kernel parameters and finding all the hidden bits in my BIOS, but… With patience I got there.

…Last but not least, make sure your physical ports on your SR-IOV card are connected to something. I had forgotten to connect mine initially and I couldn’t get SR-IOV capable interfaces in my VMs to come up. So, back to our roots – check layer 1!

Make sure to modprobe vfio-pci

Make sure you have the vfio-pci kernel module loaded…

I did:

# modprobe vfio-pci

And then verified it with:

# lsmod | grep -i vfio

And then I added vfio-pci to /etc/modules

KubeVirt installation

First we install the cluster-network-addons; this will install Multus CNI and the SR-IOV device plugin.

Before we get any further, let’s open the SR-IOV feature gate. So, on your machine where you use kubectl, issue:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-config
  namespace: kubevirt
  labels:
    kubevirt.io: ""
data:
  feature-gates: "SRIOV"
EOF

It’s assumed you’d generally do this on the master, or, wherever you run kubectl from.

Let’s follow the add-on operator deployment

kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.7.0/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.7.0/network-addons-config.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.7.0/operator.yaml

And we make an example custom resource…

kubectl apply -f https://raw.githubusercontent.com/kubevirt/cluster-network-addons-operator/master/manifests/cluster-network-addons/0.7.0/network-addons-config-example.cr.yaml

Watch for it all to come up…

$ watch -n1 kubectl get pods --all-namespaces -o wide

You can also use this wait condition…

$ kubectl wait networkaddonsconfig cluster --for condition=Ready

Install the KubeVirt operator

Next we’ll follow instructions from the KubeVirt docs for installing the KubeVirt operator. In this case we’ll follow the “#2” instructions here for the “Alternative flow (aka Operator flow)”.

It was suggested to me to use the latest version, as of this writing on the KubeVirt releases it’s shown to be v0.17.0.

$ export VERSION=v0.17.0
$ kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/$VERSION/kubevirt-operator.yaml
$ kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/$VERSION/kubevirt-cr.yaml

Watch the pods to be ready, kubectl get pods and all that good stuff.

Then we wait for this to be readied up…

$ kubectl wait kv kubevirt --for condition=Ready

(Mine never became ready?)

[centos@kube-nonetwork-master ~]$ kubectl wait kv kubevirt --for condition=Ready
Error from server (NotFound): kubevirts.kubevirt.io "kubevirt" not found

Install virtctl

$ wget https://github.com/kubevirt/kubevirt/releases/download/v0.17.0/virtctl-v0.17.0-linux-amd64
$ chmod +x virtctl-v0.17.0-linux-amd64
$ sudo mv virtctl-v0.17.0-linux-amd64 /usr/bin/virtctl

Alright cool, at this point you’ve got KubeVirt installed up!

Setup SR-IOV on-disk configuration file /etc/pcidp/config.json

For this step, we’re going to use a helper script. I took this from an existing (and open at the time of writing this article) pull request, and I put it into this gist.

I went ahead and did this as root on each node that has SR-IOV devices (in my case, just one machine)

# curl -s https://gist.githubusercontent.com/dougbtv/1d83c233975e3444957e318f39949d14/raw/ef0bcad7e4a318b3791934ff60a87cc40c4233a9/sriov-helper.sh > sriov-helper.sh
# chmod +x sriov-helper.sh
# ./sriov-helper.sh

Now we can inspect the contents of the file…

# cat /etc/pcidp/config.json

On my machine I can see that the rootDevices matches what I initialized in my SR-IOV setup way above in this article, specifically 2f:00.2.
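
For reference, the generated file on my machine looked roughly along these lines – treat this as illustrative, since the exact fields depend on the version of the SR-IOV device plugin (and the helper script) you're using:

{
    "resourceList": [
        {
            "resourceName": "sriov",
            "rootDevices": ["2f:00.2"],
            "sriovMode": true,
            "deviceType": "netdevice"
        }
    ]
}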

Restart the SR-IOV device plugin pods…

Now that this is setup, you have to delete the SR-IOV pods… Back to the master (or wherever your kubectl command is run from).

Give this a try…

$ kubectl get pods --namespace=sriov | grep device-plugin | awk '{print $1}' | xargs -L 1 -i kubectl delete pod {} --namespace=sriov

If it stalls out (full disclosure, mine did), you can just list them and delete one-by-one.

$ kubectl get pods --namespace=sriov -o wide | grep device-plugin

and then with each one:

$ kubectl delete pod $each_pod_name_here --namespace=sriov

And then just to make sure, I took the one pod running on my host with SR-IOV devices and looked at the logs…

$ kubectl logs kube-sriov-device-plugin-nblww --namespace=sriov

In this case, I could see the last line was a ListAndWatch(sriov) log and it had content about my device, looked something like this:

&ListAndWatchResponse{Devices:[&Device{ID:0000:2f:0a.0,Health:Healthy,}

Let’s start a (vanilla) Virtual Machine!

Move back to your master (or wherever you run kubectl from), and we're going to spin up a vanilla VM just to get the commands down and make sure everything's looking hunky dory.

First we’ll clone the kubevirt repo (word to the wise, it’s pretty big, maybe 400 meg clone).

$ git clone https://github.com/kubevirt/kubevirt.git --depth 50 && cd kubevirt

Let’s move into the example VMs section…

$ cd cluster/examples/

And edit a file in there, let’s edit the vm-cirros.yaml – a classic test VM image. Bring it up in your editor first, but, we’ll edit in place like so:

$ sed -ie "s|registry:5000/kubevirt/cirros-container-disk-demo:devel|kubevirt/cirros-container-disk-demo:latest|" vm-cirros.yaml

Kubectl create from that file…

$ kubectl create -f vm-cirros.yaml

And let’s look at the vms custom resources, and we’ll see that it’s created, but, not yet running.

$ kubectl get vms
NAME        AGE     RUNNING   VOLUME
vm-cirros   2m13s   false     

Yep, it’s not started yet, let’s start it…

$ virtctl start vm-cirros
VM vm-cirros was scheduled to start
$ kubectl get vms
NAME        AGE     RUNNING   VOLUME
vm-cirros   3m17s   true      

Wait for it to come up (watch the pods…), and then we’ll console in (you can see that the password is listed right there in the MOTD, gocubsgo). You might have to hit <enter> to see the prompt.

[centos@kube-nonetwork-master examples]$ virtctl console vm-cirros
Successfully connected to vm-cirros console. The escape sequence is ^]

login as 'cirros' user. default password: 'gocubsgo'. use 'sudo' for root.
vm-cirros login: cirros
Password: 
$ echo "foo"
foo

(You can hit ctrl+] to get back to your command line, btw.)

Presenting… a VM with an SR-IOV interface!

Ok, back into your master, and still in the examples directory… Let’s create the SR-IOV example. First we change the image location again…

sed -ie "s|registry:5000/kubevirt/fedora-cloud-container-disk-demo:devel|kubevirt/fedora-cloud-container-disk-demo:latest|" vmi-sriov.yaml

Create a network configuration, a NetworkAttachmentDefinition for this one…

cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov
spec: 
  config: '{
    "type": "sriov",
    "name": "sriov-net",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.100.0/24",
      "rangeStart": "192.168.100.171",
      "rangeEnd": "192.168.100.181",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "192.168.100.1"
    }
  }'
EOF

(Side note: The IPAM section here isn’t actually doing a lot for us, in theory you can have "ipam": {}, instead of this setup with the host-local plugin – I struggled with that a little bit, so, I included here an otherwise dummy IPAM section)
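
Then create the SR-IOV VMI itself from the example file we edited with sed above (assuming you kept the vmi-sriov.yaml name), and wait for its virt-launcher pod to come up:

$ kubectl create -f vmi-sriov.yaml
$ watch -n1 kubectl get pods -o wide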

Console in with:

virtctl console vmi-sriov

Login as fedora (with password fedora), become root (sudo su -), and create an ifcfg-eth1 script:

[root@vmi-sriov2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE="eth1"
BOOTPROTO="none"
IPADDR=192.168.100.2
NETMASK=255.255.255.0
ONBOOT="yes"
TYPE="Ethernet"

And:

# ifup eth1

You can now check out what the configs look like with: ip a.

Now – repeat this for a second VM. I copied the vmi-sriov.yaml to another file and changed the metadata->name to vmi-sriov2.

I then also created a /etc/sysconfig/network-scripts/ifcfg-eth1 in that VM and assigned it a static IP address of 192.168.100.3 (so the two VMs don't collide).

We’ll reference that IP address later when we create our VoIP workload.

Once you have those two together – you can probably make a ping between the two workloads, and… You can put your own workload in!

Or, if you like, you can also create a VoIP workload using Asterisk as I did.

Asterisk configuration

Install asterisk from RPM, in both VMs, install like so:

yum install -y asterisk-pjsip asterisk asterisk-sounds-core-en-ulaw

Next, we’re going to setup our /etc/asterisk/pjsip.conf file on both VMs. This creates a SIP trunk between each machine.

[transport-udp]
type=transport
protocol=udp
bind=0.0.0.0

[alice]
type=endpoint
transport=transport-udp
context=endpoints
disallow=all
allow=ulaw
aors=alice

[alice]
type=identify
endpoint=alice
match=192.168.100.3/255.255.255.255

[alice]
type=aor
contact=sip:anyuser@192.168.100.3:5060

[bob]
type=endpoint
transport=transport-udp
context=endpoints
disallow=all
allow=ulaw
aors=bob

[bob]
type=identify
endpoint=bob
match=192.168.100.2/255.255.255.255

[bob]
type=aor
contact=sip:anyuser@192.168.100.2:5060

Once you’ve loaded that, console into the VM and issue:

# asterisk -rx 'pjsip reload'

Next we’re going to create a file /etc/asterisk/extensions.conf which is our “dialplan” – this tells Asterisk how to behave when a call comes in our trunk. In our case, we’re going to have it answer the call, play a sound file, and then hangup.

Create the file as so:

[endpoints]
exten => _X.,1,NoOp()
  same => n,Answer()
  same => n,SayDigits(1)
  same => n,Hangup()

Next, you’re going to tell asterisk to reload this with:

# asterisk -rx 'dialplan reload'

Now, from the first VM with the 192.168.100.2 address, go ahead and console into the VM and run asterisk -rvvv to get an Asterisk console, and we’ll set some debugging output on, and then we’ll originate a phone call:

vmi-sriov*CLI> pjsip set logger on
vmi-sriov*CLI> rtp set debug on
vmi-sriov*CLI> channel originate PJSIP/333@alice application saydigits 1

You should see a ton of output now! You’ll see the SIP messages to initiate the phone call, and then you’ll see information about the RTP (real-time protocol) packets that include the voice media going between the machines!

Awesome! Thanks for sticking with it, now… For your workload to the rescue!

A Kubernetes Operator Tutorial? You got it, with the Operator-SDK and an Asterisk Operator!

So you need a Kubernetes Operator Tutorial, right? I sure did when I started. So guess what? I got that b-roll! In this tutorial, we’re going to use the Operator SDK, and I definitely got myself up-and-running by following the Operator Framework User Guide. Once we have all that setup – oh yeah! We’re going to run a custom Operator. One that’s designed for Asterisk, it can spin up Asterisk instances, discover them as services and dynamically create SIP trunks between n-number-of-instances of Asterisk so they can all reach one another to make calls between them. Fire up your terminals, it’s time to get moving with Operators.

What exactly are Kubernetes Operators? In my own description – Operators are applications that manage other applications, specifically with tight integration with the Kubernetes API. They allow you to build your own "operational knowledge" into them, and perform automated actions when managing those applications. You might also want to see what CoreOS has to say on the topic, read their blog article where they introduced operators.

Sidenote: Man, what an overloaded term, Operators! In the telephony world, well, we have operators, like… a switchboard operator (I guess that one’s at least a little obsolete). Then we have platform operators, like… sysops. And we have how things operate, and the operations they perform… Oh my.

A guy on my team said (paraphrased): “Well if they’re applications that manage applications, then… Why write them in Go? Why not just write them in bash?”. He was… Likely kidding. However, it always kind of stuck with me and got me to think about it a lot. One of the main reasons why you’ll see these written in Go is because it’s going to be the default choice for interacting with the Kubernetes API. There’s likely other ways to do it – but, all of the popular tools for interacting with it are written in Go, just like Kubernetes itself. The thing here is – you probably care about managing your application running in Kubernetes with an operator because you care about integrating with the Kubernetes API.

One more thing to keep in mind here as we continue along – the idea of CRDs – Custom Resource Definitions. These are the lingua franca of Kubernetes Operators. We often watch what these are doing and take actions based on them. What's a CRD? It's often described as "a way to extend the Kubernetes API", which is true. The thing is – that sounds SO BIG. It sounds daunting. It's not really. CRDs, in the end, are just a way for you to store some of your own custom data, and then access it through the Kubernetes API. Think of it as some metadata you can push into the Kube API and then access – so if you're interacting with the Kube API, it's simple to store some of your own data, without having to roll your own way of otherwise storing it (and otherwise reading & writing that data).
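
To make that concrete – a custom resource is really just a little chunk of structured data that you store in, and read back from, the Kube API. For example, a hypothetical instance of the Memcached kind that we'll scaffold later in this article might look like:

apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: example-memcached
spec:
  size: 3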

Today we have a big agenda for this blog article… Here’s what we’re going to do:

  • Create a development environment where we can use the operator-sdk
  • Create our own application as scaffolded by the Operator SDK itself.
  • Spin up the asterisk-operator, dissect it a little bit, and then we’ll run it and see it in action.
  • Lastly, we'll introduce the Helm Operator, a way to kind of lower the barrier of entry that allows you to create a Kubernetes Operator using Helm, and it might solve some of the problems that you'd use an Operator for without having to sling any golang.


Requirements

  • A CentOS 7 machine to use for development
    • These commands all reference CentOS, if you use Fedora (or something else), then it might take some conversion to get all the deps.
  • Access to Kubernetes version 1.9 or later cluster
    • Need a tute for that? Check out my latest Kubernetes install tutorial.
    • We will also cover a quick minikube installation
  • Your favorite text editor.
  • A rubber duck for debugging.

Basic development environment setup

Alright, we’ve got some deps to work through. Including, ahem, dep. I didn’t include “root or your regular user” but in short, generally, just the yum & systemctl lines here require su, otherwise they should be your regular user.

Make sure you have git, and this is a good time to install whatever usual goodies you use.

$ yum install -y git
$ git config --global user.email "you@example.com"
$ git config --global user.name "Your Name"

Firstly, install Docker.

$ yum install -y yum-utils   device-mapper-persistent-data   lvm2
$ yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
$ yum install docker-ce -y
$ systemctl enable docker
$ systemctl start docker

Install kubectl.

$ cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

$ yum install -y kubectl

Double check that you’ve got bridge-nf-call-iptables all good.

$ sudo /bin/bash -c 'echo "1" > /proc/sys/net/bridge/bridge-nf-call-iptables'

Install minikube (optional: if this is part of a cluster or otherwise have access to another cluster). I’m not generally a huge minikube fan, however, in this case we’re working on a development environment (seeing that we’re looking into building an operator), so it’s actually appropriate here.

$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/v0.28.2/minikube-linux-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
$ sudo /usr/local/bin/minikube start --vm-driver=none

It’ll take a few minutes while it downloads a few container images from which it runs Kubernetes.

If something went wrong and you need to restart minikube from scratch you can do so with:

$ sudo /usr/local/bin/minikube stop; cd /etc/kubernetes/; sudo rm -f *.conf; /usr/local/bin/minikube delete; cd -

Follow the instructions from minikube for setting up your .kube folder. I didn’t have great luck with it, so I performed a sudo su - in order to run say, kubectl get nodes to see that the cluster was OK. In my case, this also meant that I had to bring the cluster up as root as well.

You can test that your minikube is operational with:

kubectl get nodes

It should list just a single node.

Install a nice-and-up-to-date-golang.

$ rpm --import https://mirror.go-repo.io/centos/RPM-GPG-KEY-GO-REPO
$ curl -s https://mirror.go-repo.io/centos/go-repo.repo | tee /etc/yum.repos.d/go-repo.repo
$ yum install -y golang

I changed root’s ~/.bash_profile path (given my above Minikube situation) to:

export GOPATH=/home/centos/go
PATH=$PATH:$HOME/bin:$(go env GOPATH)/bin
export PATH

If you do the same thing, you might want to be mindful of the /home/user portion of that path.

Set up your Go environment a little; the goal here is being able to run binaries that are in your GOPATH’s bin directory.

$ mkdir -p ~/go/bin
$ export GOPATH=~/go
$ export PATH=$PATH:$(go env GOPATH)/bin

Ensure that directory exists…

mkdir -p $GOPATH/bin

Install dep.

$ curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh

Install the operator-sdk.

$ mkdir -p $GOPATH/src/github.com/operator-framework
$ cd $GOPATH/src/github.com/operator-framework
$ git clone https://github.com/operator-framework/operator-sdk
$ cd operator-sdk
$ git checkout master
$ export PATH=$PATH:$GOPATH/bin && make dep && make install

Create your new project

We’re going to create a sample project using the operator-sdk CLI tool. Note – I used my own GitHub namespace here; feel free to replace it with yours. If not, cool, you can also get a Halloween costume of me (and scare kids and neighbors!)

$ mkdir -p $GOPATH/src/github.com/dougbtv
$ cd $GOPATH/src/github.com/dougbtv
$ operator-sdk new hello-operator --kind=Memcached
$ operator-sdk add api  --api-version=cache.example.com/v1alpha1 --kind=Memcached
$ cd hello-operator

Sidenote: For what it’s worth, at some point I had tried a few versions of the operator-sdk tools to fix another issue. During this, I got a complaint (when running operator-sdk new ...) that something didn’t meet constraints (No versions of k8s.io/gengo met constraints), and it turned out to be a stale dep package cache. You can clear it as such:

[centos@operator-box github.com]$ rm -Rf $GOPATH/pkg/dep/sources

Also, ignore it if it complains that it can’t complete the git actions; they’re simple enough that you can just manage it as a git repo however you please.

Inspecting the scaffolded project

Let’s modify the types package to define what our CRD looks like…

Modify ./pkg/apis/cache/v1alpha1/types.go and replace the two structs at the bottom (the ones that say // Fill me) like so:

type MemcachedSpec struct {
    // Size is the size of the memcached deployment
    Size int32 `json:"size"`
}
type MemcachedStatus struct {
    // Nodes are the names of the memcached pods
    Nodes []string `json:"nodes"`
}
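
For context, these two structs hang off the top-level Memcached type that the scaffold already generated in the same file. That type looks roughly like this (shown only for orientation – don’t paste it in, and the exact tags in your generated file may differ slightly):

package v1alpha1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Rough sketch of the generated top-level type; Spec and Status are the
// structs we just filled in above.
type Memcached struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata"`
    Spec              MemcachedSpec   `json:"spec"`
    Status            MemcachedStatus `json:"status,omitempty"`
}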

And then update the generated code for the custom resources…

operator-sdk generate k8s

Then let’s update the handler; it’s at ./pkg/stub/handler.go

We’ll replace that file in its entirety with this example memcached deployment code from github. Just copy-pasta it, or curl it down, whatever you like.

You’ll also need to change the github namespace in that file, replace it with your namespace + the project name you used during operator-sdk new $name_here. I changed mine like so:

$ sed -i -e 's|example-inc/memcached-operator|dougbtv/hello-operator|' pkg/stub/handler.go
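
If you’re curious what you just pasted in, that handler boils down to something shaped roughly like the following. This is a paraphrased sketch, not the actual file – the real one contains the full deployment-creation and status-update logic:

package stub

import (
    "context"

    "github.com/dougbtv/hello-operator/pkg/apis/cache/v1alpha1"
    "github.com/operator-framework/operator-sdk/pkg/sdk"
)

type Handler struct{}

// Handle is invoked by the SDK for every event on the watched resources.
func (h *Handler) Handle(ctx context.Context, event sdk.Event) error {
    switch memcached := event.Object.(type) {
    case *v1alpha1.Memcached:
        // 1. Ensure a Deployment exists for this Memcached and that it has
        //    memcached.Spec.Size replicas, creating or updating it as needed.
        // 2. List the pods belonging to that Deployment and record their
        //    names in memcached.Status.Nodes.
        _ = memcached
    }
    return nil
}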

Now, let’s create the CRD. First, let’s just cat it and take a look (I’m a cat person – like, seriously, I love cats; if you’re a dog person you can stop reading this article right now… or, you probably use less as a pager too, dog people, seriously!)…

$ cat deploy/crd.yaml

Now you can create it…

$ kubectl create -f deploy/crd.yaml

Once it has been created, you can see it’s listed, but, there’s no CRD objects yet…

$ kubectl get memcacheds.cache.example.com

In the Operator-SDK user guide they list two options for running your operator. Of course, the production way to do it is to create a docker image and push it up to a registry, but… we haven’t even compiled this yet, so let’s go one step at a time and run it in our local cluster.

$ operator-sdk up local

Cool, you’ll see it initialize, and you might get an error you can ignore for now:

ERRO[0000] failed to initialize service object for operator metrics: OPERATOR_NAME must be set 
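
(If that message bothers you, exporting an operator name in your shell before running it – e.g. OPERATOR_NAME=hello-operator – should quiet it, but it’s harmless to ignore for this walkthrough.)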

Alright, so what has it done? Ummm, nothing yet! Let’s create a custom resource and we’ll watch what it does… Create a custom resource yaml file like so:

$ cat deploy/my.crd.yaml 
apiVersion: "cache.example.com/v1alpha1"
kind: "Memcached"
metadata:
  name: "example-memcached"
spec:
  size: 3

Now let’s apply it:

$ kubectl apply -f deploy/my.crd.yaml 

And we can go and watch what’s happening here…

$ watch -n1 kubectl get deployment

You’ll see that it’s creating a bunch of memcached pods from a deployment! Hurray! Now we can modify that…

Let’s edit the ./deploy/my.crd.yaml to have a size: 4, like so:

$ cat deploy/my.crd.yaml 
apiVersion: "cache.example.com/v1alpha1"
kind: "Memcached"
metadata:
  name: "example-memcached"
spec:
  size: 4

We can apply that, and then we’ll take another look…

$ kubectl apply -f deploy/my.crd.yaml 
$ watch -n1 kubectl get deployment

Awesome, 4 instances going. Alright cool, we’ve got an operator running! So… Can we create our own?

Creating our own operator!

Well, almost! What we’re going to do now is use Doug’s asterisk-operator. Hopefully there are some portions here that you can use as a springboard for your own Operator.

How the operator was created

Some of the things that I modified after I had the scaffold were…

  • Updated the types.go to include the fields I needed (a rough guess at those fields follows this list).
  • Moved the /pkg/apis/cache/ to /pkg/apis/voip/
    • And changed references to memcached to asterisk
  • Created a scheme to discover all the IPs of the Asterisk pods
  • Created REST API calls to Asterisk to push the configuration
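
I haven’t reproduced the real types file here, but based on the custom resource we apply later (it has spec.size and spec.config fields), the updated spec presumably looks something along these lines – treat this as an illustration, not a copy of the repo:

package v1alpha1

// Illustrative guess at the spec fields, inferred from the CR used later in
// this article (spec.size and spec.config); check the asterisk-operator repo
// for the real definition.
type AsteriskSpec struct {
    Size   int32  `json:"size"`
    Config string `json:"config"`
}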

Some things to check out in the code…

Aside from what we reviewed earlier when we were scaffolding the application – which is arguably the most interesting part from the standpoint of “How do I create any operator that I want?” – the second most interesting (or, potentially the most interesting if you’re interested in Asterisk) is how we handle the service discovery and dynamically push configuration to Asterisk.

You can find the bulk of this in the handler.go. Give it a skim through, and you’ll find where it performs the actions of:

  1. Creating the deployment and giving it a proper size based on the CRDs
  2. Figuring out the IP addresses of each pod, and then cycling through all the instances to create SIP trunks to all of the other Asterisk instances.

But… What about making it better? This Operator is mostly provided as an example, and to “do a cool thing with Asterisk & Operators”, so some of the things here are clearly in the proof-of-concept realm. A few of the things that could use improvement are…

  1. It’s not very graceful with how it handles waiting for the Asterisk instances to become ready. There are some timing issues between when the pod is created and when the IP address is assigned. It’s not the cleanest in that regard.
  2. There’s a complete “brute force” method by which it creates all the SIP trunks. If you start with, say, 2 instances and change to 3 instances – well… it creates all of the SIP trunks all over again instead of just creating the couple of new ones it needs. I went along with the idea of “don’t prematurely optimize”, but this could really justify optimizing (a rough sketch of the idea follows this list).
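
To make that concrete: the optimization would amount to diffing the trunks that already exist against the trunks the current pod list calls for, and only creating the missing ones. A minimal sketch of that idea (illustrative only – these aren’t functions from the repo):

// trunksToCreate returns only the trunk names we don't already have, so a
// scale-up from 2 to 3 instances would only touch the new pairs.
func trunksToCreate(existing, desired []string) []string {
    have := make(map[string]bool, len(existing))
    for _, t := range existing {
        have[t] = true
    }
    var missing []string
    for _, t := range desired {
        if !have[t] {
            missing = append(missing, t)
        }
    }
    return missing
}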

What’s the application doing?

(Figure: Asterisk Operator diagram)

In short, the application really just does three things (sketched in Go just after this list):

  1. Watches a CRD to see how many Asterisk instances to create
  2. Figures out the IP addresses of all the Asterisk instances, using the Kube API
  3. Creates SIP trunks from each Asterisk instance to each other Asterisk instance, using ARI push configuration, allowing us to make calls from any Asterisk instance to any other Asterisk instance.
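
Strung together, that flow looks roughly like the following. It’s a hand-wavy, runnable illustration to orient you in the code – the stand-in names here are not the repo’s actual function names, and the real handler talks to the Kube API and ARI rather than printing:

package main

import "fmt"

// createTrunk stands in for the ARI push-configuration call that would create
// a PJSIP trunk on the pod at fromIP pointing at the peer named toName.
func createTrunk(fromIP, toName, toIP string) {
    fmt.Printf("trunk on %s -> %s (%s)\n", fromIP, toName, toIP)
}

func main() {
    // 1. The handler ensures a Deployment exists with Spec.Size replicas (not shown here).
    // 2. It then discovers the pod IPs via the Kube API; hard-coded here for illustration.
    podIPs := map[string]string{
        "asterisk-a": "10.0.0.10",
        "asterisk-b": "10.0.0.11",
    }
    // 3. For every ordered pair of instances, push a trunk so each can dial every other.
    for name, ip := range podIPs {
        for peerName, peerIP := range podIPs {
            if name == peerName {
                continue
            }
            createTrunk(ip, peerName, peerIP)
        }
    }
}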

Let’s give the Asterisk Operator a spin!

This assumes that you’ve completed creating the development environment above, and have it all running – you know, with golang and GOPATH all set, minikube running and the operator-sdk binaries available.

First things first – make sure you pull the image we’ll use in advance; this will make for a lot less confusing waiting when you first start the operator itself.

docker pull dougbtv/asterisk-example-operator

Then, clone the asterisk-operator git repo:

mkdir -p $GOPATH/src/github.com/dougbtv && cd $GOPATH/src/github.com/dougbtv
git clone https://github.com/dougbtv/asterisk-operator.git && cd asterisk-operator

We’ll need to create the CRD for it:

kubectl create -f deploy/crd.yaml

Next… We’ll just start the operator itself!

operator-sdk up local

Ok, cool – now we’ll create a custom resource so that the operator sees it and spins up Asterisk instances. Open up a new terminal window for this.

cat <<EOF | kubectl apply -f -
apiVersion: "voip.example.com/v1alpha1"
kind: "Asterisk"
metadata:
  name: "example-asterisk"
spec:
  size: 2
  config: "an unused field."
EOF

Take a look at the output from the operator – you’ll see it logging a number of things. It waits for each Asterisk pod’s IP to be found and for the Asterisk instances to boot – and then it’ll log that it’s creating some trunks for us.

Check out the deployment to see that all of the instances are up:

watch -n1 kubectl get deployment

You should see that it desires to have 2 instances, and that those instances have been fulfilled. It does this by creating a deployment.

Let’s go ahead and exec into one of the Asterisk pods, and we’ll run the Asterisk console…

kubectl exec -it $(kubectl get pods -o wide | grep asterisk | head -n1 | awk '{print $1}') -- asterisk -rvvv

Let’s show the AORs (addresses of record):

example-asterisk-6c6dff544-2wfwg*CLI> pjsip show aors

      Aor:  <Aor..............................................>  <MaxContact>
    Contact:  <Aor/ContactUri............................> <Hash....> <Status> <RTT(ms)..>
==========================================================================================

      Aor:  example-asterisk-6c6dff544-wnkpx                     0
    Contact:  example-asterisk-6c6dff544-wnkpx/sip:anyuser 1a830a6772 Unknown         nan

Ok, cool, this has a trunk set up for us; the trunk name in the Aor field is example-asterisk-6c6dff544-wnkpx. Go ahead and copy that value from your own terminal (yours will be different – if it’s not different, leave your keyboard right now and go buy a lotto ticket).

We can use that to originate a call; I do so with:

example-asterisk-6c6dff544-2wfwg*CLI> channel originate PJSIP/333@example-asterisk-6c6dff544-wnkpx application wait 2
    -- Called 333@example-asterisk-6c6dff544-wnkpx
    -- PJSIP/example-asterisk-6c6dff544-wnkpx-00000000 answered

And we can see that there’s a call that’s been originated, and it has been answered by the other end! Go ahead and quit for now.

Ok – but here comes the cool stuff. Let’s increase the size of our cluster; we requested 2 instances of Asterisk earlier, and now we’ll bump it up to 3.

cat <<EOF | kubectl apply -f -
apiVersion: "voip.example.com/v1alpha1"
kind: "Asterisk"
metadata:
  name: "example-asterisk"
spec:
  size: 3
  config: "an unused field."
EOF

Now our kubectl get deployment will show us that we have three, but! Better yet, we have all the SIP trunks created for us. Let’s exec in and look at the AORs again.

kubectl exec -it $(kubectl get pods -o wide | grep asterisk | head -n1 | awk '{print $1}') -- asterisk -rvvv

Then we’ll do the same and show the AORs:

example-asterisk-6c6dff544-2wfwg*CLI> pjsip show aors

      Aor:  <Aor..............................................>  <MaxContact>
    Contact:  <Aor/ContactUri............................> <Hash....> <Status> <RTT(ms)..>
==========================================================================================

      Aor:  example-asterisk-6c6dff544-k2m7z                     0
    Contact:  example-asterisk-6c6dff544-k2m7z/sip:anyuser 0d391d57b2 Unknown         nan

      Aor:  example-asterisk-6c6dff544-wnkpx                     0
    Contact:  example-asterisk-6c6dff544-wnkpx/sip:anyuser 1a830a6772 Unknown         nan

Ah ha! Now there are 2 trunks available; the operator went and created a new one for us, pointing at the new Asterisk instance.

And we can originate a call to it, too!

example-asterisk-6c6dff544-2wfwg*CLI> channel originate PJSIP/333@example-asterisk-6c6dff544-wnkpx application wait 2
    -- Called 333@example-asterisk-6c6dff544-wnkpx
    -- PJSIP/example-asterisk-6c6dff544-wnkpx-00000001 answered

And there you have it – you can do it for n-number of instances. I tested it out with 33 instances, which works out to 1056 trunks (counting both sides) and… While it took like 15ish minutes, which felt like forever… It takes me longer than that to create 2 or 3 by hand! So… Not a terrible trade off.

Bonus: Helm Operator!

Let’s follow the 15 minute operator with Helm tutorial. See how far we can get. This uses the helm operator kit.

Clone the operator kit; we’ll use their example.

$ git clone https://github.com/operator-framework/helm-app-operator-kit.git
$ cd helm-app-operator-kit/

Now, build a Docker image. Note: You’ll probably want to change the name (from -t dougbtv/... to your name, or someone else’s name if that’s how you roll).

docker build \
  --build-arg HELM_CHART=https://storage.googleapis.com/kubernetes-charts/tomcat-0.1.0.tgz \
  --build-arg API_VERSION=apache.org/v1alpha1 \
  --build-arg KIND=Tomcat \
  -t dougbtv/tomcat-operator:latest .

Docker login and then push the image.

$ docker login
$ docker push dougbtv/tomcat-operator:latest

Alright, now there’s a series of things we’ve got to customize. There are more instructions on what needs to be customized, too, if you need them.

# this one can just stay as "tomcat"
$ sed -i -e 's/<chart>/tomcat/' helm-app-operator/deploy/operator.yaml 

# this you should change to your docker namespace
$ sed -i -e 's|quay.io/<namespace>|dougbtv|' helm-app-operator/deploy/operator.yaml

# Change the group & kind to match what we had in the docker build.
$ sed -i -e 's/group: example.com/group: apache.org/' helm-app-operator/deploy/crd.yaml 
$ sed -i -e 's/kind: ExampleApp/kind: Tomcat/' helm-app-operator/deploy/crd.yaml 

# And the name has to match that, too
$ sed -i -e 's/name: exampleapps.example.com/name: exampleapps.apache.org/' helm-app-operator/deploy/crd.yaml

# Finally update the Custom Resource to be what we like.
$ sed -i -e 's|apiVersion: example.com/v1alpha1|apiVersion: apache.org/v1alpha1|' helm-app-operator/deploy/cr.yaml
$ sed -i -e 's/kind: ExampleApp/kind: Tomcat/' helm-app-operator/deploy/cr.yaml

Now let’s deploy all that stuff we created!

$ kubectl create -f helm-app-operator/deploy/crd.yaml
$ kubectl create -n default -f helm-app-operator/deploy/rbac.yaml
$ kubectl create -n default -f helm-app-operator/deploy/operator.yaml
$ kubectl create -n default -f helm-app-operator/deploy/cr.yaml
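
If all went well – and assuming the image name you pushed matches what operator.yaml now references – a plain kubectl get pods -n default should eventually show the helm operator pod come up, followed by the Tomcat pods from the chart it’s driving.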