Any time in your schedule? Try using a custom scheduler in Kubernetes

I’ve recently been interested in the idea of extending the scheduler in Kubernetes. There are a number of reasons why, but at the top of my list is re-scheduling failed pods based on custom metrics, specifically for high performance and high availability, like we need in telecom. In my search to learn more about it, I discovered the Kube docs for configuring multiple schedulers, and even better, a practical application: a toy scheduler created by the one-and-only-kube-hero Kelsey Hightower. It’s about a year old and Hightower is on his game, so he was using alpha functionality at the time of authoring. In this article I modernize at least one component to get it to run today. Our goal is to run through the toy scheduler and have it schedule a pod for us. We’ll also dig into Kelsey’s Go code for the scheduler a little bit to get an intro to what he’s doing.

Fire up your terminals, and let’s get ready to schedule some pods, with NOT the default scheduler.

What, what’s a scheduler? crond?

Well, not crond, but, part of what makes Kubernetes be Kubernetes is its scheduler. A scheduler, according to Wikipedia, generically speaking is:

[A] method by which work specified by some means is assigned to resources that complete the work. The work may be virtual computation elements such as threads, processes or data flows, which are in turn scheduled onto hardware resources such as processors, network links or expansion cards

So in this case, the “work specified by some means” is our containers (usually Docker containers), and the resources they’re assigned to are our nodes. That’s a big thing that Kube does for us: it assigns our containers to nodes, and makes sure that they’re running.

If you want to read more about exactly what the default scheduler in Kubernetes does, check out this readme file from the kube repos.


Simply have a Kubernetes 1.7 up and running for you. 1.6 might work, too. If you don’t have Kube running, may I suggest that you use my kube-centos-ansible playbooks, and follow my article about installing a kube cluster on centos (ignore that it says kube 1.5 – same steps will produce a 1.7 cluster).

Also, I use an all-CentOS 7 lab environment, and while it might not be required, note that it colors the ancillary tools and viewpoint from which I create this tutorial.

We’ll install a few deps, I wound up with a Go version 1.6.3, which appears to work fine, for your reference.

Install our deps

I’m performing these steps on my kube master; feel free to run them wherever is appropriate for you. You’ll need to install some packages, and you’ll need to be able to use the kubectl utility in order to perform these.

Now, let’s go and install the deps we need:

[centos@kube-master ~]$ sudo yum install -y git golang tmux

Now, make yourself a dir for your go source.

[centos@kube-master ~]$ mkdir -p gocode/src

Clone and build the scheduler

Now let’s clone up Hightower’s code into there.

[centos@kube-master ~]$ cd gocode/src/
[centos@kube-master src]$ git clone
[centos@kube-master src]$ cd scheduler/
[centos@kube-master scheduler]$ pwd

Alright now that we’re there, first thing we’ll do is build the annotator.

[centos@kube-master scheduler]$ cd annotator/
[centos@kube-master annotator]$ go build
[centos@kube-master annotator]$ ls annotator -lh
-rwxrwxr-x. 1 centos centos 7.8M Jul 21 15:23 annotator

Which will produce a binary for us.

Now, go and build the scheduler proper.

[centos@kube-master annotator]$ cd ../
[centos@kube-master scheduler]$ go build
[centos@kube-master scheduler]$ ls scheduler -lh
-rwxrwxr-x. 1 centos centos 7.7M Jul 21 15:24 scheduler

Go makes it easy, right!?

Start your kubectl proxy

We need to run a kubectl proxy, which is an HTTP proxy to access the kube API. Our scheduler here will rely on it.

Run tmux:

[centos@kube-master ~]$ tmux 

This will give you a new screen, in that screen run:

[centos@kube-master ~]$ kubectl proxy

You can exit this screen and let it keep running by hitting ctrl+b then d. To return to the screen execute tmux a.

Run the annotation

Alright, we’re going to create some “prices” for each of our nodes. The scheduler will use this and then start the pods on the node with the lowest price.

[centos@kube-master scheduler]$ cd annotator/
[centos@kube-master annotator]$ ./annotator 
kube-master 0.20
kube-minion-1 0.20
kube-minion-2 0.05
kube-minion-3 1.60

Each time you run the annotator, it’ll generate new prices for you. If you just want to list the prices, list them like so:

[centos@kube-master annotator]$ ./annotator -l
kube-master 0.20
kube-minion-1 0.20
kube-minion-2 0.05
kube-minion-3 1.60
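
To get a feel for what the annotator is doing conceptually, here’s a minimal Go sketch that picks a price for each node from a fixed list. The price list and the function name are my own placeholders (assumptions, not lifted from the annotator), and it skips the part where the price gets written onto each node through the Kube API.

```go
package main

import (
	"fmt"
	"math/rand"
)

// prices is a hypothetical list of price points; the real annotator's
// values may differ.
var prices = []string{"0.05", "0.10", "0.20", "0.40", "0.80", "1.60"}

// assignPrices gives every node one price, chosen by the pick function.
// pick receives the number of available prices and returns an index.
func assignPrices(nodes []string, pick func(n int) int) map[string]string {
	assigned := make(map[string]string, len(nodes))
	for _, node := range nodes {
		assigned[node] = prices[pick(len(prices))]
	}
	return assigned
}

func main() {
	nodes := []string{"kube-master", "kube-minion-1", "kube-minion-2", "kube-minion-3"}
	// rand.Intn picks a random index, so every run prints different prices.
	assigned := assignPrices(nodes, rand.Intn)
	for _, node := range nodes {
		fmt.Println(node, assigned[node])
	}
}
```

Passing the picker in as a function keeps the random part swappable, which makes the sketch easy to poke at with a fixed choice.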

Kick up a pod…

Alright, now create a resource definition yaml file with these contents:

[centos@kube-master scheduler]$ cat ~/nginx.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  template:
    metadata:
      # annotations:
      #  "": hightower
      labels:
        app: nginx
      name: nginx
    spec:
      schedulerName: hightower
      containers:
        - name: nginx
          image: "nginx:1.11.1-alpine"
          resources:
            requests:
              cpu: "500m"
              memory: "128M"

Hightower had been using an annotation to pick the scheduler, but this is now core functionality, so what I’ve done differently is use the schedulerName property under the spec in the resource definition. As you can see it’s schedulerName: hightower (and hightower is set as a constant for the scheduler name in the Go code, more on that later).
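
Here’s a minimal Go sketch of how a custom scheduler might decide whether a pod is its business, using that schedulerName property. The type and function names are illustrative (they’re my assumptions, not the toy scheduler’s own); spec.schedulerName and spec.nodeName are the pod fields being read.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schedulerName mirrors the constant mentioned above.
const schedulerName = "hightower"

// pod holds only the fields this sketch cares about.
type pod struct {
	Spec struct {
		SchedulerName string `json:"schedulerName"`
		NodeName      string `json:"nodeName"`
	} `json:"spec"`
}

// isOurs reports whether a pod asked for our scheduler and has not been
// bound to a node yet.
func isOurs(raw []byte) (bool, error) {
	var p pod
	if err := json.Unmarshal(raw, &p); err != nil {
		return false, err
	}
	return p.Spec.SchedulerName == schedulerName && p.Spec.NodeName == "", nil
}

func main() {
	raw := []byte(`{"metadata":{"name":"nginx-1"},"spec":{"schedulerName":"hightower"}}`)
	ours, err := isOurs(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("pick up this pod:", ours)
}
```

Pods that name a different scheduler, or that already have a node, are simply left alone.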

Now, let’s create this pod:

[centos@kube-master annotator]$ kubectl create -f ~/nginx.yaml 
deployment "nginx" created

We can check and see that this pod won’t schedule, which is what we want for now:

[centos@kube-master annotator]$ watch -n1 kubectl get pods

And you might wanna describe it, too…

[centos@kube-master annotator]$ watch -n1 kubectl describe pod nginx-881608959-gwnll

Cool, good it shouldn’t have started yet.

Start the scheduler

Feel free to run this in a tmux screen, but I ran it in its own window.

Fire it up!

[centos@kube-master scheduler]$ ./scheduler 
2017/07/21 15:32:36 Starting custom scheduler...
2017/07/21 15:32:38 Successfully assigned nginx-881608959-vk6t3 to kube-minion-2

Hurray! It scheduled it to kube-minion-2. If you look at our pricing output, you’ll see that it was the lowest priced node when we generated prices. Run a kubectl get pods to double check, and you can pick up the IP address with a kubectl describe pod $the_pod_name and curl it to your heart’s content.

If you want, destroy the pod with a:

[centos@kube-master scheduler]$ kubectl delete -f ~/nginx.yaml 

And generate new prices with ./annotator/annotator and run the scheduler again, and see it schedule it to another place when you kubectl create -f it.

Let’s inspect the toy scheduler go code.

So let’s take a look at the code in the toy scheduler. This is really a gloss-over, but maybe can help point you (and later me!) in the right direction to figure out more about how to use these concepts to our own advantages.

The files we’re interested in are:

  • main.go: The main app which starts a couple handler goroutines
  • processor.go: Where our goroutines live.
  • kubernetes.go: The Kube API meat-and-potatoes
  • bestprice.go: Our metric for scheduling.

(There’s also the ./annotator/annotator.go, which is a small util, feel free to poke at that too)

Generally, we have a main.go which is our handler; it starts up some goroutines that run two methods, both found in the processor.go file:

  • monitorUnscheduledPods()
  • reconcileUnscheduledPods()

These handle the goroutine logic (e.g. working with the wait group), perform a wait operation (I assume for polling for the rest of the logic), and then call the schedulePod() method also in processor.go.

The monitorUnscheduledPods() method also calls watchUnscheduledPods() from kubernetes.go, which looks for those unscheduled pods for us (it looks to be polling, but there are some things named “event” which make me wonder if it has a watch on those events; I’m unsure and I didn’t dig further for now). The watchUnscheduledPods() method returns a channel of the pods it discovers.
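
The general shape of that (a function that hands back a channel, and a goroutine that drains it while a wait group keeps main alive) can be sketched like so. This is a stand-alone illustration with names of my own choosing; it replays a fixed list of pod names rather than talking to the Kube API, so take it as the pattern and not Hightower’s implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// watchPods stands in for a function that returns a channel of pod names.
// Here it just replays a fixed list and then closes the channel.
func watchPods(names []string) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for _, name := range names {
			ch <- name
		}
	}()
	return ch
}

// monitor drains the channel, hands each pod to schedule, and signals the
// wait group once the channel is closed.
func monitor(pods <-chan string, schedule func(string), wg *sync.WaitGroup) {
	defer wg.Done()
	for p := range pods {
		schedule(p)
	}
}

// scheduleAll wires the pieces together and returns the pods it handled.
func scheduleAll(names []string) []string {
	var (
		wg      sync.WaitGroup
		handled []string
	)
	wg.Add(1)
	go monitor(watchPods(names), func(p string) { handled = append(handled, p) }, &wg)
	wg.Wait() // handled is only read after the goroutine has finished
	return handled
}

func main() {
	for _, p := range scheduleAll([]string{"nginx-a", "nginx-b"}) {
		fmt.Println("would schedule", p)
	}
}
```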

When there’s a pod to be scheduled, finally a bind() method is called from kubernetes.go. This calls the binding endpoint in the core Kubernetes API, which binds a pod to a node.
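
For reference, a binding is a small JSON object posted to the pod’s binding subresource. Here’s a minimal sketch of building that body and path in Go; the struct and function names are mine (assumptions), the toy scheduler’s own types may well look different, and the actual POST through the kubectl proxy is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// objectMeta carries the name of the pod being bound.
type objectMeta struct {
	Name string `json:"name"`
}

// objectRef points at the node the pod should land on.
type objectRef struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
}

// binding is the body sent to the pod's binding subresource.
type binding struct {
	APIVersion string     `json:"apiVersion"`
	Kind       string     `json:"kind"`
	Metadata   objectMeta `json:"metadata"`
	Target     objectRef  `json:"target"`
}

// bindingFor builds the JSON body that binds pod to node.
func bindingFor(pod, node string) ([]byte, error) {
	return json.Marshal(binding{
		APIVersion: "v1",
		Kind:       "Binding",
		Metadata:   objectMeta{Name: pod},
		Target:     objectRef{APIVersion: "v1", Kind: "Node", Name: node},
	})
}

// bindingPath is the API path the body would be POSTed to.
func bindingPath(namespace, pod string) string {
	return "/api/v1/namespaces/" + namespace + "/pods/" + pod + "/binding"
}

func main() {
	body, err := bindingFor("nginx-1", "kube-minion-2")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println("POST", bindingPath("default", "nginx-1"))
	fmt.Println(string(body))
}
```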

The processor also looks at the bestPrice() method, which is in bestprice.go. This looks at the “prices” for each node and picks the lowest priced one; this is how we determine which node a pod is going to land on.
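
The idea of picking the lowest price is small enough to sketch in a few lines of Go. This is my own stand-alone version (the function name and the tie-break by node name are assumptions for the sake of a deterministic example), fed with the prices from the annotator output earlier.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// lowestPrice returns the node with the lowest price. Prices arrive as
// strings; entries that do not parse as numbers are skipped. Ties go to the
// alphabetically first node so the result does not depend on map order.
func lowestPrice(prices map[string]string) (string, error) {
	var (
		best    string
		bestVal float64
		found   bool
	)
	for node, raw := range prices {
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			continue
		}
		if !found || v < bestVal || (v == bestVal && node < best) {
			best, bestVal, found = node, v, true
		}
	}
	if !found {
		return "", errors.New("no node with a usable price")
	}
	return best, nil
}

func main() {
	node, err := lowestPrice(map[string]string{
		"kube-master":   "0.20",
		"kube-minion-1": "0.20",
		"kube-minion-2": "0.05",
		"kube-minion-3": "1.60",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cheapest node:", node)
}
```

Swap this one function for anything else (load, latency, a custom metric) and you have a different scheduling policy with the rest of the plumbing untouched.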

BYOB - Bring your own boxen to an OpenShift Origin lab!

Let’s spin up an OpenShift Origin lab today; we’ll be using openshift-ansible with a “BYO” (bring your own) inventory. Or I’d rather say “BYOB” for “Bring your own boxen”. OpenShift Origin is the upstream OpenShift. In short, OpenShift is a PaaS (platform-as-a-service), but one that is built with a distribution of Kubernetes, and in my opinion it is so valuable because of its strong opinions, which guide you towards some best practices for using Kubernetes for the enterprise. In addition, we’ll use my openshift-ansible-bootstrap, which we can use to A. spin up some VMs to use in the lab, and/or B. set up some basics on the hosts to make sure we can properly install OpenShift Origin. Our goal today will be to set up an OpenShift Origin cluster with a master and two compute nodes; we’ll verify that it’s healthy, and we’ll deploy a very basic pod.

If you’re itching to get your hands on the keyboard, skip down to “Clone Doug’s openshift-ansible-bootstrap” to omit the intro.

What, exactly, are we going to deploy?

The gist is we’re going to use Ansible from “some device” (in my case, my workstation, and I’d guess yours, too). We’ll then provision a machine to be a “virt-host” – a host for running virtual machines. Then we’ll spin up 3 virtual machines (with libvirt) to run OpenShift on. Those virtual machines are connected to a br0 bridge which will allow these virtual machines to have IP addressing on your LAN. (As opposed to say, a NAT’ed IP address)

architecture diagram


In this setup we use a CentOS 7 virtual machine host, and you’ll need a decent sized one. You might be able to trim down some of these, but what I’m using is a baremetal node with 16 cores (4 cores per VM), 96 gigs of RAM, and a 1TB spinning disk.

You’ll need at least:

  • 48 gigs of RAM (16 per VM)
  • ~240 gigs of HDD (~80 gigs per VM)
  • 6-8 cores (2 cores per VM, I recommend 4 per VM)

This walk-through assumes that you have a host (like that) with CentOS 7.3 up and running (and hopefully you have some updated packages and a recent kernel, too).

You’ll need a host from which to run Ansible, and you’ll need Ansible installed. Additionally, we’re going to be using OpenShift-Ansible which requires Ansible or greater. This could be the same as your virtual host. Make sure you have SSH keys to your target box.

Additionally – while I use a VM lab, you could definitely spin up baremetal, or some VMs on “the cloud platform of your choosing” (and I hope for your sake, you don’t use one that has vendor lock-in). Just read through and skip the VM provisioning portion.


Really – you’ll want a DNS server for your cluster if you’re doing anything bigger than this, and even this setup could benefit from a DNS implementation. I don’t really go there in this implementation.

There are no HA components herein. Those may be extended to this lab environment when the right use-case for the lab comes along.

Additionally, since we’re using a single master node, there won’t be an official load balancer. The load balancer conflicts with some master service, and requires a node dedicated to it. (Although, in theory you can probably schedule pods on that node, too.)

Docker storage driver

One of the bumps in the road I ran into while I was working on this was the Docker storage driver.

OpenShift does some great things for us that OpenShift-Ansible honors, one of those being that it discourages you from using a loopback storage driver.

I followed the instructions for configuring direct-lvm storage for Docker from the Docker documentation.

Mostly though, these are covered in the playbooks, so, if you want, dig into those to see how I sorted it out. It’s worth noting that the most recent Docker versions (the version used here at the time of writing is 1.12.x) make setting up the direct-lvm volumes much easier, and do all the volume actions automagically. In short, what I do is dedicate a disk to each VM and then tell Docker to use it.

Clone Doug’s openshift-ansible-bootstrap

I’ll assume now that you’ve got a machine to use that we can spin up virtual machines on, and that you have SSH keys from whatever box you’re going to run ansible on to that host.

I’ve got a few playbooks put together in a repo that’ll help you get some basics on a few hosts to use for spinning up OpenShift Origin with a BYO inventory. I call it, boringly, openshift-ansible-bootstrap.

Go ahead and clone that.

$ git clone

Setup the virtual machine host.

Alright, first thing, let’s open up the ./inventory/inventory file in the clone. Modify the virt_host line (in the first few lines) to have an ansible_host that has the IP (or hostname) of the machine we’re going to provision.

# Setup this host first, and put the IP here.
virt_host ansible_host= ansible_ssh_user=root

You’ll need to specify the NIC that you use to access the LAN/WAN on that host with:


(e.g. replace enp1s0f1 with eth0 if that’s what you have.)

Additionally (in order for the playbook to discover the IP address of the VMs it creates), you’ll need to specify the CIDR for the network on which that NIC operates…
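
If CIDR notation is new to you, here’s a small Go sketch of what “this address falls inside that CIDR” means. The network and the addresses are made-up examples, and this isn’t how the playbook is implemented; it’s only an illustration of the idea behind the setting.

```go
package main

import (
	"fmt"
	"net"
)

// inCIDR reports whether ip falls inside the network described by cidr.
func inCIDR(cidr, ip string) (bool, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	addr := net.ParseIP(ip)
	if addr == nil {
		return false, fmt.Errorf("invalid IP address %q", ip)
	}
	return network.Contains(addr), nil
}

func main() {
	// Hypothetical LAN and candidate addresses, purely for illustration.
	const lan = "192.168.1.0/24"
	for _, ip := range []string{"192.168.1.42", "10.0.0.7"} {
		ok, err := inCIDR(lan, ip)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s in %s: %v\n", ip, lan, ok)
	}
}
```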


Now that you have that setup, we can run the virt-host-setup.yml, like so:

$ ansible-playbook -i inventory/inventory virt-host-setup.yml

Oh is it coffee time? IT IS COFFEE TIME. Fill up a big mug, and I recommend stocking up on Vermont Coffee Company’s Tres. It’s legit.

In this process we have:

  • Installed dependencies to run VMs with libvirt
  • Spun up 3 VMs (and picked up their IP addresses)

Setup the inventory for the virtual machines (and grab the ssh keys)

Look in the output from the playbook and look for a section called: “Here are the IPs of the VMs”, grab those IPs and add them into the ./inventory/inventory file in this section:

# After running the virt-host-setup, then change these to match.
openshift-master ansible_host=
openshift-minion-1 ansible_host=
openshift-minion-2 ansible_host=

Ok, but, that’s no good without grabbing the SSH key to access these. You’ll find the key to them on the virt host, in root’s directory, the file should be here:

$ cat /root/.ssh/id_vm_rsa

Take that file and put it on your ansible machine, and we’ll also add that into the inventory.

Find this section in the inventory, and modify it to match where you put the file (keep the ansible_ssh_user the same, in most cases)


Modify the virtual machine hosts to get ready for an OpenShift Ansible run.

Cool – now go ahead and run the bootstrap.yml playbook which will setup these VMs to be readied for an openshift Ansible install.

$ ansible-playbook -i inventory/inventory bootstrap.yml

There are a few things this does that really help us out so that openshift-ansible can do the magic we need it to do.

  • It installs the correct docker version, and sets direct-lvm storage for Docker
  • It sets up the host files on the machines so that we don’t need DNS
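
As a rough picture of that hosts-file trick, here’s a Go sketch that renders /etc/hosts style lines from a list of names and IPs. The IPs are documentation placeholders, the example.local domain is just what shows up in the node names later on, and the exact line format the playbook writes is an assumption on my part.

```go
package main

import "fmt"

// host pairs a short hostname with its IP address.
type host struct {
	Name string
	IP   string
}

// hostsLines renders one /etc/hosts style line per host:
// the IP, the fully qualified name, then the short name.
func hostsLines(domain string, hosts []host) []string {
	lines := make([]string, 0, len(hosts))
	for _, h := range hosts {
		lines = append(lines, fmt.Sprintf("%s %s.%s %s", h.IP, h.Name, domain, h.Name))
	}
	return lines
}

func main() {
	// Placeholder addresses; use the IPs of your own VMs.
	hosts := []host{
		{Name: "openshift-master", IP: "192.0.2.10"},
		{Name: "openshift-minion-1", IP: "192.0.2.11"},
		{Name: "openshift-minion-2", IP: "192.0.2.12"},
	}
	for _, line := range hostsLines("example.local", hosts) {
		fmt.Println(line)
	}
}
```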

That one should finish in a pretty reasonable amount of time.

Start the OpenShift Ansible run.

In the openshift-ansible-bootstrap clone’s root, you’ll find a file final.inventory which is the inventory we’re going to use for openshift-ansible – except again, we’ll have to replace the IPs in the first three lines of that file. (These will match what you created in the last step for the bootstrap.yml)

Here’s the whole thing in case you need it:

openshift-master ansible_host=
openshift-minion-1 ansible_host=
openshift-minion-2 ansible_host=

# lb
# nfs

# openshift_release=v3.6
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]



# [lb]
# openshift-master

# make them unschedulable by adding openshift_schedulable=False any node that's also a master.
openshift-master openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true
openshift-minion-[1:2] openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

Alright, now, let’s ssh into the virtual machine host, and we’ll find that it’s cloned the openshift-ansible repo.

So move into that directory…

$ cd /root/openshift-ansible/

And put the contents of that final inventory into ./my.inventory

Drum roll please, begin the openshift-ansible run…

Now you can run the openshift ansible playbook like so:

$ ansible-playbook -i my.inventory ./playbooks/byo/config.yml

Now, make 10 coffees – and/or wait for your Vermont Coffee Company order to complete and then brew that coffee. This takes a bit.

Verifying the setup.

So, we’ll assume that openshift-ansible completed without a hitch (and if it didn’t? Give a read-through of the error, and give a shot at fixing it, and with that info in hand open up an issue or PR on my bootstrap playbooks). Now, we can look at the node status.

SSH into the master, and run:

[centos@openshift-master ~]$ oc status
[centos@openshift-master ~]$ oc get nodes
NAME                               STATUS    AGE
openshift-master.example.local     Ready     52m
openshift-minion-1.example.local   Ready     52m
openshift-minion-2.example.local   Ready     52m

You should have 3 nodes, and you might have noticed something in the ./final.inventory – I’ve told OpenShift that it’s OK to schedule pods on the master. We’re using a lot of resources for this lab, so, might as well make use of the master, too.

Optional: Configure the Dashboard.

If you want to, set a hosts file on your workstation to point openshift-master.example.local at the IP we’ve been using as the inventory IP address. And then point a browser @ https://openshift-master.example.local:8443/ and accept the certs to kick up the dashboard.

You’ll then need to configure the access to the dashboard. You can get a gist of the defaults from the /etc/origin/master/master-config.yaml file on the master:

[root@openshift-master centos]# grep -A12 "oauthConfig" /etc/origin/master/master-config.yaml 
oauthConfig:
  assetPublicURL: https://openshift-master.example.local:8443/console/
  grantConfig:
    method: auto
  identityProviders:
  - challenge: true
    login: true
    mappingMethod: claim
    name: htpasswd_auth
    provider:
      apiVersion: v1
      file: /etc/origin/master/htpasswd
      kind: HTPasswdPasswordIdentityProvider

This lets us know that we’re using htpasswd_auth and that the htpasswd file is @ /etc/origin/master/htpasswd. There’s more info in the official docs.

With this in hand, we can create a user.

[centos@openshift-master ~]$ oc create user dougbtv
user "dougbtv" created

And now let’s add a password for that user.

[centos@openshift-master ~]$ sudo htpasswd -c /etc/origin/master/htpasswd dougbtv
New password: 
Re-type new password: 
Adding password for user dougbtv

Great, now you should be able to login with the user dougbtv (in this example) with the password you set there.

Let’s kick off a pod.

Alright, why don’t we use my all time handy favorite nginx pod!

First, let’s create a new project.

[centos@openshift-master ~]$ oc new-project sample

We’re going to use a public nginx container image, and this one assumes it can run as the user it chooses, so… We’re going to allow this. In your own production setup, you’ll likely massage the users and SCCs to fit a cleaner mold.

So in this case, we’ll add the anyuid SCC to the default service account.

[centos@openshift-master ~]$ oc adm policy add-scc-to-user anyuid -z default

Then, create a nginx.yaml with these contents:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

Create the replication controller we’re defining with:

[centos@openshift-master ~]$ oc create -f nginx.yaml 

Watch the pods come up…

[centos@openshift-master ~]$ watch -n1 oc get pods

Should the pod fail to come up, do an oc describe pod nginx-A1B2C3 (replacing the pod name with the one from oc get pods).

Then… We can curl something from it. Here’s a shortcut to get you one of the pod’s IP addresses and curl it.

[centos@openshift-master ~]$ curl -s $(oc describe pod $(oc get pods | tail -n1 | awk '{print $1}') | grep -P "^IP" | awk '{print $2}') | grep -i thank
<p><em>Thank you for using nginx.</em></p>
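
If grepping the describe output feels brittle, another way to get at the IP is to ask for JSON (oc get pod with -o json) and read status.podIP from it. Here’s a minimal Go sketch of that parsing step; the function name and the sample IP are my own placeholders, and it assumes you’ve already captured the JSON for a single pod.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// podIP pulls status.podIP out of the JSON for a single pod.
func podIP(raw []byte) (string, error) {
	var p struct {
		Status struct {
			PodIP string `json:"podIP"`
		} `json:"status"`
	}
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", err
	}
	if p.Status.PodIP == "" {
		return "", errors.New("pod has no IP yet")
	}
	return p.Status.PodIP, nil
}

func main() {
	// A trimmed-down, made-up pod document for illustration.
	raw := []byte(`{"metadata":{"name":"nginx-a1b2c"},"status":{"phase":"Running","podIP":"10.128.0.5"}}`)
	ip, err := podIP(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("curl http://" + ip)
}
```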

And there you have it!

Look ma, No Docker! Kubernetes with CRI-O, and no Docker at all!

This isn’t just a stunt like riding a bike with no hands; it’s probably the future of how we’ll use Kubernetes. Today, we’re going to spin up Kubernetes using cri-o, which uses the Kubernetes container runtime interface with OCI (Open Container Initiative) compatible runtimes. That’s a mouthful, but the gist is: it’s a way to use Kubernetes without Docker! That’s what we’ll do today. And to add a cherry on top, we’re also going to build a container image without Docker, too. We won’t go in depth on images today. Our goal will be to get a Kubernetes up without Docker, with cri-o, and we’ll run a pod on it to prove it out.

We’re not going to have much luck with building and managing images. In a coming episode we’ll add Buildah into the mix, a project out of Project Atomic which can build OCI images. Then we can expand to having a whole workflow without Docker. But today, I promise that you won’t do a single docker {run,build,ps}, not a one.

I saw this tweet from @soltysh on Twitter which linked me to the cri-o ansible playbook which inspired me to implement the same concept in my kube-centos-ansible playbooks. Inspired is the wrong word – more like made me ultra giddy to give it a try.

Here’s the thing, editorially – I love Docker^hMoby, and I am a firm believer that what Docker did was change the landscape for how we manage and deploy applications. But, it’s not wise to have a majority rule of the technology we use. So, I’m really excited for CRI-O. This is a game changer for the whole landscape, and I think the open governance model of CRI-O will be a huge boon for all parties involved (including Docker, too).

You might also enjoy the infamous Kelsey Hightower’s cri-o-tutorial.


We’re going to use kube-centos-ansible – and this will spin up virtual machines for you if you want. If you don’t want – you could setup physical machines with CentOS 7, and skip on to the part where you modify the inventory for that. We’ll basically start from square one here and setup a virtual machine host for you, but, it’s up to you if you want that. Should you go with the virt-host method, you’ll need to strap that machine with CentOS 7, and give yourself some SSH keys.

So in short… The main consideration here is to have a machine you can deploy to (which could in theory, be your local machine, it might work with Fedora, and will certainly work with CentOS) – and you’ll need to have Ansible installed on a machine that can access the machine(s) with SSH.

What’s the hard part?

Honestly, most of this is really easy. The hardest part is managing your inventories and running my playbooks if you’re unfamiliar with them. I’ll give a recap here of how to do that.

We’re using my kube-centos-ansible playbooks, and if you aren’t familiar with them, I recommend you check out my intro blog article on how to install Kubernetes which goes in depth on these playbooks – I take them for granted sometimes and that will be useful as a reference if I miss something that I took as obvious.

Virtual machine host & spinning up the virtual machines

As I mentioned previously, skip this section if you already have machines provisioned. Otherwise, get yourself a fresh (or existing should likely be ok) CentOS 7 install where we can run VMs, so physical is preferable unless, yo dawg, I heard you like nested virtualization.

Alright, first things first, let’s clone the kube-centos-ansible playbooks.

$ git clone && cd kube-centos-ansible

In there I’m going to have an inventory you should modify, so go ahead and modify this and put in the proper hostname/ip.

cat ./inventory/virthost.inventory 
kubehost ansible_host= ansible_ssh_user=root


Now that you have that, you should be able to run

$ ansible-playbook -i inventory/virthost.inventory virt-host-setup.yml

Importantly, this will create some ssh keys on that target virtual machine host that you’ll want to put on the machine where you’re running Ansible.

[root@your-virt-host ~]# ls ~/.ssh/id_vm_rsa

Also! It will show you a list of IP addresses for the machines you created. We use those in the next step.

You’ll also note at this point there are virtual machines running, you can see them with virsh list --all.

Readying the inventory of your virtual machines

Alright, now let’s modify the VM inventory. So go ahead and modify the ./inventory/vms.inventory

Main things here are:

  1. Modify the hosts at the top to match the IPs of the machines you just provisioned
  2. Modify the jump host information, e.g. for the virtual machine host. (skip this step if you brought your own hosts)

These are the two lines you really care about for step 2.

ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p root@"'

Change the IP to the IP of your virtual machine host, and set the private key location to where you are keeping the private key on your local machine – e.g. the one that was created for you on the virtual machine host (that is… scp it to your local machine and then reference it here)

Let’s run this playbook!

So there’s been a bit more setup than meat and potatoes so far… We’re about to get to that now.

$ ansible-playbook -i inventory/vms.inventory kube-install.yml -e 'container_runtime=crio'

Verify the installation

Ok cool… So let’s do the first bit of verification… That there’s no, and I mean NO DOCKER. Aww. Yes.

Log yourself into the master (and minions, I know you’re incredulous, so go for it).

[centos@kube-master ~]$ sudo docker -v
sudo: docker: command not found

Just how we like it!

Now… List that you have some connected nodes.

[centos@kube-master ~]$ kubectl get nodes
NAME            STATUS    AGE       VERSION
kube-master     Ready     4m        v1.6.6
kube-minion-1   Ready     3m        v1.6.6

Ok, that’s all well and good… but, is anything running?

Should be!

[centos@kube-master ~]$ kubectl get pods --all-namespaces


Running a pod

Let’s use my favorite little nginx example.. Go ahead and put this yaml into a file named nginx.yaml:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

Now go ahead and create using that, a la:

[centos@kube-master ~]$ kubectl create -f nginx.yaml 

And watch the two pods come up…

[centos@kube-master ~]$ watch -n1 kubectl get pods

Cool, now let’s see if we can reach an nginx…

[centos@kube-master ~]$ curl -s $(kubectl describe pod $(kubectl get pods | grep nginx | head -n 1 | awk '{print $1}') | grep "^IP" | awk '{print $2}') | grep -i thank
<p><em>Thank you for using nginx.</em></p>

And there it is! Mission complete.

Some commands to get you around

So – you don’t have docker, and there’s some regular ole things you’d like to do.

So how about the running processes? You can use runc for this, such as:

[centos@kube-master ~]$ sudo runc list

And get some help for it, to see some other things running:

[centos@kube-master ~]$ sudo runc --help

Some of my show stoppers.

One of the first things I ran into was that kubeadm was complaining I didn’t have docker. Well, I know that, kubeadm ;) So, I tried to skip preflight checks…

kubeadm init --skip-preflight-checks --pod-network-cidr

And that appeared to have worked. I think I saw something zip by on the kubernetes slack channels about this, maybe even in the kubeadm channel.

I talked with the awesome folks in the #cri-o channel on freenode, and they noted that this is a known issue with kubeadm and they’ve got PRs open so that kubeadm knows it’s OK to use another runtime. Awesome!

Let's create a workflow for writing CNI plugins (including writing your first CNI plugin!)

In this tutorial, we’re going to write a CNI plugin, that is a “container network interface” plugin, that in this case we’ll specifically use in Kubernetes. A CNI plugin executes on start & stop of a container, and you use it to, generally, modify the infra container’s network namespace in order to configure networking for the pod. We can use this to customize how we setup networking. Today, we’ll both write a simple Go application to say “Hello, world!” to CNI to inspect how it works a little bit, and we’ll follow that up by looking at my CNI plugin Ratchet CNI (an implementation of koko in CNI) a little bit to grok the development workflow.

Our goal today is to:

  • Run a “dummy” CNI plugin of our own build, to show some of the moving parts
  • And run Ratchet CNI – to introduce some of the work-flow that I used to build it.

A lot of what’s here borrows heavily from the running the plugins section of the CNI readme. We’ll add on to here by introducing some key concepts, and get you started in writing your own plugin.


While it’s not required – you probably want to have a Kubernetes environment setup for yourself where you can experiment with deploying the plugins in Kubernetes proper. In my case, I used a Kubernetes master to check out my stuff “in development” and then also used a simple cluster with a master and single minion. If you need a Kubernetes lab environment, maybe I could tempt you to try using my lab playbooks. I also tend to assume a CentOS environment. I don’t use Kubernetes itself during this tutorial, but, you’ll certainly level up faster if you take the steps here and implement some of these ideas on Kube as a DIY exercise.

You can get away without golang if you just go up to the point where we create a dummy plugin. If you want to go further, you’ll need golang, and preferably Ansible to go ahead with running and inspecting Ratchet CNI.

On whatever box you use (I use my master), you’re going to need to install golang, e.g. on CentOS yum install -y golang, and you’ll need Docker (unless you’re cool enough to have another container runtime, in which case I salute you and you can go ahead with adapting towards that).

Lastly, you might see some mix here between a prompt as an unprivileged user, and root. The best case scenario is that you setup a regular user to use Docker… or you can just use root.

Some basics behind CNI.

When Kubernetes starts up your pod (a logical group of containers), something it will do is create an “infra container”. This is a container that has a shared network namespace (among other namespaces) that all the containers in the pod share.

This means that any networking elements you create in that infra container will be available to all the containers in the pod. This also means that as containers come and go within that pod, the networking stays stable.

If you have a running Kubernetes (which has some pods running), you can perform a docker ps and see containers that are often running a pause image, with a command that looks like /pause. If you’re running OpenShift, the same concept applies, but it may be a different image and command. In theory this is a lightweight enough container that it “shouldn’t really die” and should be available to all containers within that pod.

As Kubernetes creates this infra container, it will also call an executable as specified in the /etc/cni/net.d/*conf files. Kubernetes passes the contents of this config to that executable on standard input.

Kubernetes then uses the same config and calls the same binary when the pod is destroyed, too.
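To make that call a little more concrete, here’s a rough sketch in shell of what a runtime does when it invokes a plugin. The invoke_plugin helper is a name I made up (it isn’t part of CNI or Kubernetes), and the CNI_NETNS and CNI_IFNAME values are placeholders. The environment variable names themselves come from the CNI spec.

```shell
# Hypothetical helper (not part of CNI or Kubernetes): run a plugin
# executable roughly the way a container runtime would.
# Usage: invoke_plugin <ADD|DEL> <container-id> <plugin-binary> <config-file>
invoke_plugin() {
  # The variables are set only for this one command, and the network
  # config file is fed to the plugin on stdin.
  # CNI_NETNS and CNI_IFNAME below are placeholder values.
  CNI_COMMAND="$1" \
  CNI_CONTAINERID="$2" \
  CNI_NETNS="/proc/self/ns/net" \
  CNI_IFNAME="eth0" \
  CNI_PATH="${CNI_PATH:-/opt/cni/bin}" \
    "$3" < "$4"
}
```

Calling it once with ADD and once with DEL against the same config mirrors the create and destroy steps. Since the namespace path is only a placeholder, try it against a throwaway script rather than a real plugin.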

If you want even more detail, you can check out the CNI specification itself.

Setting up your environment.

First thing we’ll do is clone the CNI repo proper, e.g.:

git clone

If you’re not running in a Kubernetes environment, you’ll also need to build some plugins. You can do so with a recipe like:

git clone
cd plugins
cp ./bin/* /opt/cni/bin

Then you can copy those binaries out to wherever you need. From here on, since I already have a running Kubernetes environment, I’m assuming the plugin binaries live in /opt/cni/bin.

Last but not least, you’re going to need jq, as the scripts we’re about to use require it.

[centos@cni ~]$ sudo curl -Ls -o /usr/bin/jq -w %{url_effective}
[centos@cni ~]$ sudo chmod +x /usr/bin/jq
[centos@cni ~]$ /usr/bin/jq  --version

Using the handy-dandy

In the clone of containernetworking/cni, you’ll find a ./scripts directory which has a wrapper around the docker run command. It invokes docker in such a way as to have the CNI plugins called for the container it starts.

Before we run those, we’re going to want to set the path of our CNI executables, and additionally where our configs live.

[root@cni scripts]# export CNI_PATH=/opt/cni/bin/
[root@cni scripts]# export NETCONFPATH=/etc/cni/net.d

Now that we have those, we’re going to create a simple CNI configuration, and we’ll run one of the default plugins.

We’ll shamelessly borrow the two configs from the official CNI readme, which include using the bridge type plugin, and a loopback. You’ll notice that these configs are “just JSON”.

$ mkdir -p /etc/cni/net.d
$ cat >/etc/cni/net.d/10-mynet.conf <<EOF
{
    "cniVersion": "0.2.0",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "",
        "routes": [
            { "dst": "" }
        ]
    }
}
EOF
$ cat >/etc/cni/net.d/99-loopback.conf <<EOF
{
    "cniVersion": "0.2.0",
    "type": "loopback"
}
EOF

With those in place, we can now run a container. Let’s go for it.

[root@kube-mult-master scripts]# ./ --rm busybox ifconfig | grep -Pi "(eth0|lo|inet addr)"
eth0      Link encap:Ethernet  HWaddr 0A:58:0A:16:00:03  
          inet addr:  Bcast:  Mask:
lo        Link encap:Local Loopback  
          inet addr:  Mask:
          UP LOOPBACK RUNNING  MTU:65536  Metric:1

You can see that we have the two pieces we specified: a loopback, and an eth0 interface on the cni0 bridge with an address from the subnet we configured.

Now that you’re done with that, let’s delete those two configs.

[root@kube-mult-master scripts]# rm /etc/cni/net.d/*conf

Let’s make our own dummy plugin!

Cool, so now that we have that… We’re going to make a new config, and we’ll create a “dumb” bash script that we’ll have it execute.

cat >/etc/cni/net.d/10-mynet.conf <<EOF
{
    "cniVersion": "0.2.0",
    "name": "my_dummy_network",
    "type": "dummy"
}
EOF

Now we can create our dummy script.

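cat >/opt/cni/bin/dummy <<EOF
#!/bin/bash
# Write a message to stderr so it shows up while debugging.
logit () {
 >&2 echo \$1
}

# Both of these come in as environment variables.
logit "CNI method: \$CNI_COMMAND"
logit "CNI container id: \$CNI_CONTAINERID"
# The network config arrives on stdin; echo it line by line.
logit "-------------- Begin config"
while read line
do
  logit "\$line"
done < /dev/stdin
logit "-------------- End config"
EOF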

And then give it proper permissions, to make it executable:

[root@kube-mult-master scripts]# chmod 0755 /opt/cni/bin/dummy

Now that it’s in place, let’s look at a few things in this script, as it’s going to tell us a few key bits of information we’re going to find helpful as we go along to create real CNI plugins.

Firstly: Anything that’s written to stderr is going to appear when we use the utility. That’s why we have the logit() function that does something like >&2 echo "foo", as that writes to stderr. This is really handy for debugging. Note that when you use it in Kubernetes, it won’t show you anything, so if you need to debug there you’ll have to create some other facility for logging.
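One simple option for that “other facility” is to also append each message to a file. This is only a sketch: CNI_LOG_FILE is a variable name I made up (it’s not part of CNI), and the default path is an arbitrary choice.

```shell
# Variant of logit that also appends to a log file.
# CNI_LOG_FILE is a made-up variable name, not something CNI defines.
logit() {
  # stderr is still handy when running outside of Kubernetes
  echo "$1" >&2
  # timestamped copy for the cases where stderr never reaches you
  echo "$(date '+%Y-%m-%dT%H:%M:%S') $1" >> "${CNI_LOG_FILE:-/tmp/cni-dummy.log}"
}
```

You could then tail -f that file on the node while a pod starts up to watch the plugin being called.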

Next – you’ll notice there are two ways that information is passed to your CNI plugin.

  • Environment variables.
  • Config file via stdin.

The list of environment variables is available in the CNI spec in the Overview section (down towards the bottom of that section).

You’ll notice that there’s a part of the script that reads:

logit "CNI method: $CNI_COMMAND"

This tells us if it’s on creation or deletion of the pod, and will come up as either ADD or DEL.
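A plugin that actually did something would branch on that value. Here’s a minimal sketch of that branching in shell. The function name and the messages are mine, and the “would set up” lines are stand-ins for real work.

```shell
# Sketch of a dispatcher for a shell-based plugin.
# cni_dispatch is a hypothetical name; the echo lines stand in for real work.
cni_dispatch() {
  case "$CNI_COMMAND" in
    ADD) echo "would set up networking for $CNI_CONTAINERID" >&2 ;;
    DEL) echo "would tear down networking for $CNI_CONTAINERID" >&2 ;;
    *)   echo "unhandled CNI_COMMAND: $CNI_COMMAND" >&2; return 1 ;;
  esac
}
```

Returning non-zero for anything unexpected is a design choice here: it makes a misconfigured call fail loudly rather than silently doing nothing.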

Then there’s a section where we read from stdin.

logit "-------------- Begin config"
while read line
do
  logit "$line"
done < /dev/stdin
logit "-------------- End config"

The whole config file is then passed in here. So, Kubernetes (or this handy script) has already read this and knows what plugin to run, and then it sends it all to us.

In your plugin itself, you’ll then read this in to pick up any options that you want to add.
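As a rough illustration of picking an option out of that stdin config, here’s a crude shell sketch. conf_field is a hypothetical helper of mine, and it only copes with simple "key": "value" string pairs on their own lines. Since we installed jq earlier, something like jq -r .name would be the sturdier choice in practice.

```shell
# Crude sketch: print the first string value for a given key from the
# JSON config on stdin. Only handles simple  "key": "value"  pairs.
# conf_field is a hypothetical helper name.
conf_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

Inside a plugin you’d capture stdin once (for example config=$(cat)) and then run echo "$config" | conf_field name for each option you care about, because stdin can only be read one time.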

If you want some more information that’s purely in Go code, take a look at the skel go modules in the CNI repo. This shows you exactly what CNI is doing to pass some information around.

Alright, enough jibber jabber – I want to run this dummy plugin already! Here you go:

[root@kube-mult-master scripts]# ./ --rm busybox ifconfig 
CNI method: ADD
CNI container id: d416ce8dc911a91b080530e1d18e033637a736c0affc707af5f219c59e919672
-------------- Begin config
{
"cniVersion": "0.2.0",
"name": "my_dummy_network",
"type": "dummy"
}
-------------- End config
lo        Link encap:Local Loopback  
          inet addr:  Mask:
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

CNI method: DEL
CNI container id: d416ce8dc911a91b080530e1d18e033637a736c0affc707af5f219c59e919672
-------------- Begin config
{
"cniVersion": "0.2.0",
"name": "my_dummy_network",
"type": "dummy"
}
-------------- End config

In this case, you’ll see the output from the dummy plugin both before and after the ifconfig output, as we run the container with the script, and it invokes our plugin both on ADD and on DEL.

Congratulations – you have officially written a CNI plugin now. It’s not much (seeing that it, well, doesn’t create any networking), but it demonstrates what the moving pieces are to get a plugin to run.

Let’s inspect a “more real” plugin.

So – chances are, you’re not actually going to write a plugin in bash. Or, I hope you don’t; I wouldn’t wish that on my worst enemy. You’re probably going to use Go. Not because it’s better or worse than anything else, but because you’re entering into a world of Gophers. And there are lots of utilities out there for interacting with CNI itself, and with the containers – something we’ll likely do a lot as we write CNI plugins.

So why the quotes on “more real” plugin? Because we’re looking at Ratchet CNI, and it’s primarily an experiment, which leverages a more powerful technology – koko, a way of connecting containers with veth or vxlan to provide some network isolation for containers (and maybe some service function chaining, later on).

The Ratchet CNI is primarily a wrapper that can invoke koko. It does do some interesting things, but the most interesting part of CNI, well… Is the networking! So, maybe it’s a fair case.

Looking at some important bits in Ratchet

Let’s look at some of the important bits in Ratchet CNI, starting with the dependencies. The primary file we’re going to look at is ./ratchet/ratchet.go, which is what we compile down into the binary that gets run by CNI. There’s more to how Ratchet is designed, but for today, since we’re looking at building your own first CNI plugin, we’ll stick to the most interesting stuff there.

The Ratchet dependencies

Some of the most important dependencies are in these lines in ratchet.go, which includes:

  • Skeleton for CNI to read stdin & environment variables.
  • Allows us to use the DelegateAdd method which we use to call other plugins.
  • Some common types that are used by the CNI packages, including the NetConf type which defines our config JSON that we read from stdin.

There’s also a Docker client that we use to pick some additional metadata from the pod.

The main method

The main() method of the application is really just calling skel, which looks like:

skel.PluginMain(cmdAdd, cmdDel)

So we let skel do some work for us: it will call one of these methods (which are local to the Ratchet application), either cmdAdd or cmdDel, depending on whether the pod is being created or deleted. In those methods, skel hands us the JSON config, which we can then parse to get some custom properties out of it.

Running the Ratchet CNI playbooks

You might not actually care about what Ratchet itself is doing, but what you may care about is how I set up my development environment and how I manage that so I can hack on Ratchet, and then run it.

I do all of the editing of the application in an IDE (Sublime Text, for me) on my workstation. Then I keep my workstation clean of any of the dependencies of this application because, in my opinion, I should have a place where I can store how to create all of those dependencies, which is why I chose an Ansible playbook to do that for me. I then use an Ansible playbook to create the environment where these will run (which is a small Kubernetes cluster), and then I can both run against the quick-to-debug-against script from the CNI repo and deploy it to Kubernetes for a final test.

While we’re talking about it, you might also like taking a look at the .travis.yml file, which shows you the exact steps that are taken in order to validate that the plugin is working, and should in theory give you all the steps you need to get it working yourself.

Using the Ansible playbooks

In the ./utils directory there are some rudimentary Ansible playbooks. If you’re going to use them, they do assume you have a Kubernetes master, and a Kubernetes minion (at least one).

Go ahead and edit the ./utils/remote.inventory file and change out the bits you want, especially the location of your boxen, and you might not need my ansible_ssh_common_args in the host variables (unless you’re using my ansible playbooks for labs, in which case – that might be handy)

After you’ve got that, there are two playbooks you’re going to run…

  • ./utils/sync-and-build.yml: rsyncs code from the local machine to the remote master and compiles it on the master, then copies the binary to all the minions and templates the configs. It should be generally ready to use in Kubernetes at this point.
  • ./utils/docker-run.yml: Sets up everything to run with the script from the CNI repo.

So you’d run the two commands in series like so:

$ ansible-playbook -i remote.inventory sync-and-build.yml
$ ansible-playbook -i remote.inventory docker-run.yml

With those two playbooks run, you have now run the Ratchet CNI plugin! The real usefulness in the context of this article is to

You can verify it by doing a docker ps, and then validate the functionality of it (which is to provide some network isolation between containers using koko) by doing something like…

[root@kube-mult-master scripts]# docker exec -it primary ifconfig | grep in1
in1       Link encap:Ethernet  HWaddr 1A:73:4A:78:B7:21  
[root@kube-mult-master scripts]# docker exec -it primary ping -c 1
PING ( 56 data bytes
64 bytes from seq=0 ttl=64 time=0.086 ms

--- ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.086/0.086/0.086 ms

An Istio Blue-Green Deploy -- Anthropomorphized ASCII Hotdogs included.

Let’s check out performing a blue-green deploy using Istio. We’re going to leverage how Istio provides routing to services through its ingress controls and we’ll use that to deploy an application – upgrade to version 2 of that same application, and then… We’ll decide “Uh oh!” we want to change back to version 1, and we can do it very quickly because we’ll still have version 1 running. Our version one includes cow ASCII art, which is then upgraded to anthropomorphized hot dog ASCII art, because this article wouldn’t be complete without it. Are you ready for an Istio style b/g deploy?

If you notice the cow & hotdog are saying “Hello OPNFV” – it’s because I’m planning on demonstrating this method of a blue-green deploy @ OPNFV summit in Beijing in the coming weeks! So hello to any OPNFV folks who came here through that avenue.


You’ll need Kubernetes and Istio, and we’ve got that part of the lab all set up in the article about installing and using Istio. That should be enough to get your feet wet!

This also assumes you know your $GATEWAY_URL, which the above referenced article has instructions on how to figure out, too.

Blue-green deploys

If you’re uninitiated, a blue-green deploy is basically where we have two versions of our application running, and then we put a load balancer / proxy / etc in front of them… We swing traffic from the current release to the new release.

But! We leave the old release running. So, in case something goes wrong, we can swing back.

blue-green release

If you’re ultra high tech (and I know you are) you can probably integrate your task-runners & monitoring solutions to do the swinging back for you. Here, we’re going to do it all manually.
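If you did want to automate the swing back, the core of it is just a decision based on what your monitoring sees. Here’s a deliberately tiny sketch of that decision in shell. choose_release is a hypothetical helper of mine, and using the HTTP status code as the health signal is an assumption on my part; your monitoring may give you something better.

```shell
# Hypothetical decision helper: given the HTTP status code seen on the
# new release, print which release should receive traffic.
choose_release() {
  case "$1" in
    2??) echo "new" ;;   # any 2xx: keep traffic on the new release
    *)   echo "old" ;;   # anything else (or nothing): swing back
  esac
}
```

You could feed it from something like curl -s -o /dev/null -w '%{http_code}' "$GATEWAY_URL", and have your task runner act on the answer.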

Using istioctl

Got your lab all set with the Helm style deploy? Great! If you came about it another way istioctl get route-rules -o yaml might already work for you, and if it does, skip down to the next section. If you’re starting with my lab, continue here.

It’s not so simple to use istioctl when you don’t have the default names – e.g. when you’re using Helm, so… You’re going to need to figure out the --managerAPIService to specify for istioctl.

You can figure out the nickname of the deployment from helm with a helm list. Mine is zooming-jaguar, which I found like:

[centos@kube-master ~]$ helm list
NAME            REVISION    UPDATED                     STATUS      CHART       NAMESPACE
zooming-jaguar  1           Mon Jun  5 19:59:07 2017    DEPLOYED    istio-0.1.4 default  

Then you can test if your istioctl is working by replacing your name into a command like so:

[centos@kube-master ~]$ istioctl --managerAPIService=zooming-jaguar-istio-manager:8081 get route-rules -o yaml
No resources found.
[centos@kube-master ~]$ echo $?

If it doesn’t exit zero, something is up. You can put a -v=10 to bump up the verbosity if you like.

That’s a mouthful, so I went and created a script to do all that dirty work for me.

[centos@kube-master ~]$ cat 
istioctl --managerAPIService=zooming-jaguar-istio-manager:8081 "$@"

That passes all the arguments, so… You can do something like…

[centos@kube-master ~]$ ./ get route-rules

Setup for a version upgrade

Alright, firstly, this is fairly similar to the pickle.yaml we had before, but… we’re now in both the dairy and hotdog industry. We’re going to use an nginx image I built; the relevant dockerfiles are in this nginx-cowsay gist if you’d like to see. The idea is, there are two versions here, dougbtv/nginx-cowsay:v1 and dougbtv/nginx-cowsay:v2, like… Two releases of an application with tagged docker images. The v1 is plain old cowsay output, the v2 cowsay includes the beefy miracle (a hotdog).

We’re about to create two files, and you’ll note there are a few important parts. Firstly, the aforementioned image. Then, note that they share the same service that’s defined. Last but not least, check out the metadata: there’s a version label there, and we’ll specify that in the routing rules we create in a bit.

First create a cowsay.yaml with these contents:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cowsay-nginx
spec:
  replicas: 1
  template:
    metadata:
      labels:
        version: v1
        service: cowsay-nginx
    spec:
      containers:
      - name: cowsay-nginx
        image: dougbtv/nginx-cowsay:v1
        imagePullPolicy: IfNotPresent
        env:
        - name: PICKLE_TYPE
          value: pickle
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: cowsay-nginx
  labels:
    service: cowsay-nginx
spec:
  ports:
  - port: 9080
    name: "http"
    targetPort: 80
  selector:
    service: cowsay-nginx
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: gateway
  annotations:
    kubernetes.io/ingress.class: "istio"
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: cowsay-nginx
          servicePort: 9080

Now, create a second one, cowsay-v2.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: hotdogsay-nginx
spec:
  replicas: 1
  template:
    metadata:
      labels:
        version: v2
        service: cowsay-nginx
    spec:
      containers:
      - name: pickleman-nginx
        image: dougbtv/nginx-cowsay:v2
        imagePullPolicy: IfNotPresent
        env:
        - name: PICKLE_TYPE
          value: cowsay-man
        ports:
        - containerPort: 80

Deploy Version 1

Go ahead and deploy version 1…

[centos@kube-master ~]$ kubectl create -f <(istioctl kube-inject -f cowsay.yaml)

Perform a watch -n1 kubectl get pods and wait until it’s ready to rumble.

And then from whatever machine you want do a curl -s $GATEWAY_URL. You should see some cow ASCII art.

[root@droctagon2 ~]# curl -s
< Hello OPNFV, from Vermont >
        \   ^__^
         \  (**)\_______
            (__)\       )\/\
             U  ||----w |
                ||     ||

Alright, you’re in pretty good shape.

Now if we were to just apply the second cowsay-v2.yaml right now, we’d get a round-robin between v1 and v2. Which is interesting on its own.

But, that’s not what we want.

Setup a default route to v1

And indeed it works, does a round-robin between the two.

So, now let’s see about being able to control those a little better.

So create a file… routerules.yaml

type: route-rule
name: cowsay-default
spec:
  destination: cowsay-nginx.default.svc.cluster.local
  precedence: 1
  route:
  - tags:
      version: v1

Now create some rules…

[centos@kube-master ~]$ ./ create -f routerules.yaml 
Created config: route-rule cowsay-default

Check that you can still curl the url.

Now, you can list what you’ve got.

[centos@kube-master ~]$ ./ get route-rules -o yaml

Alright, that’s great, so…. Now it’s time to roll-out version 2.

Deploy version 2

That being done, it’s time to do your deployment. So go ahead and create version 2.

[centos@kube-master ~]$ kubectl create -f <(istioctl kube-inject -f cowsay-v2.yaml)

Wait until it’s up and ready. You should have a hotdogsay-nginx-* pod.

Check your curl, and make sure that it’s still just a cow. It’s now up and running, but we’re not routing to it yet. Did I mention that I do a watch -n1 curl -s $GATEWAY_URL during this so I can just watch and see what it is? I recommend that.
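If squinting at ASCII art gets old, you can have the shell tell you which release answered. This is just a sketch: which_version is a made-up helper, and it assumes the v1 page is the only one containing the cow’s ^__^ ears, so anything else gets reported as v2.

```shell
# Made-up helper: guess which release answered, based on the ASCII art.
# Assumes only the v1 cow has ears drawn as ^__^ ; anything else counts as v2.
which_version() {
  if grep -qF '^__^'; then
    echo "v1"
  else
    echo "v2"
  fi
}
```

Something like curl -s "$GATEWAY_URL" | which_version inside your watch loop then prints a plain v1 or v2 on each tick.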

Now let’s go officially live. We’re going to modify routerules.yaml to perform the upgrade. Yours should now look like:

[centos@kube-master ~]$ cat routerules.yaml 
type: route-rule
name: cowsay-default
spec:
  destination: cowsay-nginx.default.svc.cluster.local
  precedence: 1
  route:
  - tags:
      version: v2

And we’re going to replace the config, a la:

[centos@kube-master ~]$ ./ replace -f routerules.yaml 
Updated config: route-rule cowsay-default

Check your curl command, now… You’ve got a hot dog!

< Hello OPNFV, from Vermont >
                      .---. __
           ,         /     \   \    ||||
          \\\\      |O___O |    | \\||||
          \   //    | \_/  |    |  \   /
           '--/----/|     /     |   |-'
                  // //  /     -----'
                 //  \\ /      /
                //  // /      /
               //  \\ /      /
              //  // /      /
             /|   ' /      /
             //\___/      /
            //   ||\     /
            \\_  || '---'
            /' /  \\_.-
           /  /    --| |
           '-'      |  |

Fall back to v1

Ahhh, now you’re running v2. And all is well. You already know this version works perfectly. It’s amazing, and you already have all the CI behind it to know it works. So there can’t possibly be a technical failure.

But, the call comes in from a marketing VP and an ops VP: “HOLY GUACAMOLE, our hotdogs aren’t READY FOR SALE. Go back to the old version IMMEDIATELY. Also, the hot dog only has 4 fingers, that’s not anatomically correct for hotdog men!”

No big deal. We can do that easily. Just change the v2 to a v1 in routerules.yaml and then replace the config.

[centos@kube-master ~]$ cat routerules.yaml | grep -P "v\d"
      version: v1
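If you’d rather not hand-edit the file each time, that flip can be scripted. Here’s a small sketch: set_route_version is a hypothetical helper name, and it assumes the file has a version: vN line like the one above.

```shell
# Hypothetical helper: rewrite the version tag in a route rule file in place.
# Usage: set_route_version <file> <version>
#   e.g. set_route_version routerules.yaml v1
set_route_version() {
  # Replace the vN after "version:" while keeping the line's indentation,
  # writing to a temp file first so a failed sed leaves the original alone.
  sed "s/^\([[:space:]]*version:[[:space:]]*\)v[0-9][0-9]*[[:space:]]*\$/\1$2/" "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}
```

It only edits the file; you still need to replace the config afterwards for the change to take effect.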

And replace it again…

[centos@kube-master ~]$ ./ replace -f routerules.yaml 
Updated config: route-rule cowsay-default

Call it a day!

Now you can call marketing & ops back and tell them to get their requests in when they’ve verified the results in staging next time ;)