Spin up a Kubernetes cluster on CentOS, a choose-your-own-adventure

So you want to install Kubernetes on CentOS? Awesome, I’ve got a little choose-your-own-adventure here for you. If you choose to continue installing Kubernetes, keep reading. If you choose not to install Kubernetes, skip to the very bottom of the article. I’ve got just the recipe for you to brew it up. It’s been a year since my last article on installing Kubernetes on CentOS, and while it’s still probably useful – some of the Ansible playbooks we were using have changed significantly. Today we’ll use kube-ansible, a playbook developed by my team and me to spin up Kubernetes clusters for development purposes. Our goal will be to get Kubernetes up (and we’ll use Flannel as the CNI plugin), and then spin up a test pod to make sure everything’s working swimmingly.

What’s inside?

Our goal here is to spin up a development cluster of Kubernetes machines to experiment with. If you’re looking for something a little more production grade, you might want to consider using OpenShift – the bottom line is that it’s a lot more opinionated, and will guide you to make some good decisions for production, especially in terms of reliability and maintenance. What we’ll spin up here is more-or-less the bleeding edge of Kubernetes. This project is more appropriate for infrastructure experimentation, and is generally a bit more fragile.

We’ll be using Ansible – but you don’t have to be an Ansible expert. If you can get it installed (which should be as easy as a pip install or dnf install) – you’re well on your way. I’ll give you the command-by-command rundown here, and I’ll provide example inventories (which tell Ansible which machines to operate on). We use kube-ansible extensively here to do the job for us.

Generally – what these playbooks do is bootstrap some hosts for you so they’re readied for a Kubernetes install. They then use kubeadm. If you have more interest in this, follow that previous link to the official docs, or check out my (now likely a bit dated) article on manually installing Kubernetes on CentOS.

Then, post install, the playbooks can install some CNI plugins – the plugins that Kubernetes uses to configure the networking on the cluster. By default we spin up the cluster with Flannel.
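
For reference, the Flannel CNI config that lands on each node looks roughly like this – a sketch based on the stock Flannel deployment for this era of Kubernetes; the exact filename (e.g. /etc/cni/net.d/10-flannel.conf) and contents can differ by version:

```json
{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}
```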

Brief overview of the adventure.

So what exactly are we going to do?

  • You’ll clone a repo to help install Kube on CentOS.
  • You’ll make a choice:
    • Provision a CentOS host to use as a virtualization host, which hosts the virtual guests that will comprise your cluster, or
    • Install CentOS on any number of machines (2+ recommended), which will become the nodes that comprise your cluster.
  • Install Kubernetes
  • Verify the installation by running a couple pods.

Requirements

Overall you’re required to have:

  • Some box with Ansible installed – you don’t need to be an Ansible expert.
  • Git.
  • You guessed it, a coffee in hand. Beans must have been ground at approximately the time of brewing, and your coffee was poured from 12” or higher into your drinking vessel to help aerate the coffee. Seeing it’s a choose-your-own-adventure – you may also choose tea. You’ll just be suffering a little. But, grab some Smith Teamaker’s Rooibos, it’s pretty fine.

Secondarily, there’s a choose-your-own-adventure part. Basically, you can choose to either:

  1. Provision a host that can run virtual machines, or
  2. Spin up whatever CentOS hosts yourself.

Generally – I’d suggest #2. Hopefully you have a way to spin up hosts in your own environment. You could use anything from spacewalk, to bifrost, or… If you’re hipster cool, maybe you’re even using matchbox.

Mostly the playbooks used to spin up virtual machines for you herein are for my own quick iteration when I’m quickly building (and destroying) clusters, and trying different setups, configurations, new features, CNI plugins, etc. Feel free to use it, but, it could just slow you down if you otherwise have a workflow for spinning up boxen. Sidenote: For years I called a virtualization host I was using in a development environment “deathstar” because the rebels kept destroying the damn thing. Side-sidenote: I was a rebel.

If you’ve chosen “1. Provision a host that can run virtual machines.” – then you’re just required to have a host that can run virtual machines. I assume there’s already a CentOS operating system on it. You should have approximately 60-120+ gigs of disk space free, and maybe 16-32 gigs of RAM. That should be more than enough.

If you chose the adventure “2. Spin up whatever CentOS hosts yourself.” – then go ahead and spin those CentOS machines up yourself, and I’d recommend 3 of them. 2 is fine too. 1 will just not be nearly as much fun. Generally, I’d recommend 4 gigs of RAM apiece, and maybe 20+ gigs free for each node.

I admit that the box sizing recommendations are fairly arbitrary. You’d likely size them according to your workloads, but, these are essentially “medium range guesses” to make sure it works.

Clone the kube-ansible repo.

Should be fairly simple, just clone ‘er right up:

$ git clone -b v0.5.0 https://github.com/redhat-nfvpe/kube-ansible.git && cd kube-ansible

You’ll note that we’re cloning at a particular tag – v0.5.0. If you want, omit the -b v0.5.0, which will make it so you’re on the master branch. In theory, it should be fine. I chose a particular tag for this article so it’ll still be relevant in the case that we (inevitably) make changes to the kube-ansible repo.

That copy-and-pasted command also changes into the cloned directory, and from there you can install the included roles…

$ ansible-galaxy install -r requirements.yml

As noted above, we’re cloning at a particular tag so that things don’t change out from under this documentation. If you’re feeling particularly, ahem, adventurous – you can choose the adventure to remove the -b v0.5.0 parameter, and clone at master HEAD. I’m hopeful that there’s some maturity in these playbooks and that it shouldn’t matter much, but, at least at this tag it’ll match your experience with this article. Granted – we’ll be installing the latest and greatest Kubernetes, so, that will change.

So, what exactly do these playbooks do?

  1. Configures a machine to use as a virtual machine host (which is optional, you’ll get to choose this later on) on which the nodes run.
  2. Installs all the deps necessary on the hosts
  3. Runs kubeadm init to bootstrap the cluster (kubeadm docs)
  4. Installs a CNI plugin for pod networking (by default, it’s flannel.)
  5. Joins the hosts to a cluster.

You chose the adventure: Provision a host that can run virtual machines

If you chose the adventure “2. Spin up whatever CentOS hosts yourself.”, head down to the next header topic – you’ve just saved yourself some work. (Unless you had to manually install CentOS like, twice – then you didn’t, but I’m hopeful you have a good way to spin up nodes in your environment.)

If you chose “1. Provision a host that can run virtual machines.”, continue reading from here.

I recommended adventure #2, to spin them up yourself. I’m only going to glance over this part; I think it’s handy for iterating on Kubernetes setups, but, there’s really a bunch of options here. For the time being – I’m going to only cover a setup that uses NAT for the VMs. IMO it’s less convenient, but, it’s easier to document in a general way. So that’s what we’ll get today.

Alright – so you’ve got CentOS all setup on this new host, and you can SSH to it, and at least sudo root from there. That’s necessary for our Ansible playbook.

Let’s create a small inventory, and we’ll use that.

We can copy out a sample inventory, and we’ll go from there.

$ cp inventory/examples/virthost/virthost.inventory inventory/your_virthost.inventory

All edited, mine looks like:

vmhost ansible_host=192.168.1.119 ansible_ssh_user=root

[virthost]
vmhost

This assumes you can SSH as root to that ansible_host specified there.

If you’ve got that all set – it shouldn’t be hard to spin up some VMs, now.

Just go ahead and run the virthost-setup playbook, such as:

$ ansible-playbook -i inventory/your_virthost.inventory -e "ssh_proxy_enabled=true" playbooks/virthost-setup.yml

By default this will spin up 4 hosts for us to use. If you’d like to use different hosts, you can specify them – you’ll find the default list in the variable called virtual_machines in the ./playbooks/ka-init/group_vars/all.yml file, which you’re intended to override (instead of edit). You can specify the memory & CPU requirements for those VMs, too.
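
For example, a minimal override file might look like the sketch below. The name/node_type layout matches the format these playbooks use elsewhere; I’m not listing the memory/CPU variable names here, so check ./playbooks/ka-init/group_vars/all.yml for the exact keys:

```yaml
---
# Hypothetical override file, e.g. ./inventory/my-overrides.yml
# (pass it to the playbook with: -e "@./inventory/my-overrides.yml")
virtual_machines:
  - name: kube-master
    node_type: master
  - name: kube-node-1
    node_type: nodes
  - name: kube-node-2
    node_type: nodes
```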

Let that puppy run, and you’ll find that it creates a file for you with a new inventory – ./inventory/vms.local.generated.

It has also created a private key to SSH to these VMs. So if you want to SSH to one, you can do something like:

$ ssh -i ~/.ssh/vmhost/id_vm_rsa -o ProxyCommand="ssh -W %h:%p root@192.168.1.119" centos@192.168.122.58

Where:

  • ~/.ssh/vmhost/id_vm_rsa is the private key, and vmhost is the name of the host from the first inventory we used.
  • 192.168.1.119 is the IP address of the virtualization host.
  • and 192.168.122.58 is the IP address of the VM (which you discovered from looking at the vms.local.generated file)
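
If you’d rather not eyeball the generated inventory, a little shell does the trick. Here’s a sketch against a hypothetical sample file in the same name ansible_host=IP format as vms.local.generated:

```shell
# Create a hypothetical sample in the same format as vms.local.generated
cat > /tmp/vms.sample <<'EOF'
kube-master ansible_host=192.168.122.58
kube-node-1 ansible_host=192.168.122.59
EOF

# Pull out the IP for a given VM name (the same trick works on the real file)
vm_ip=$(grep -m1 "kube-master" /tmp/vms.sample | cut -d"=" -f2)
echo "$vm_ip"   # 192.168.122.58
```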

Check that out – we’re going to use it in the “Installing Kubernetes” step (which you can skip to, now).

You chose the adventure: Spin up whatever CentOS hosts yourself

If you chose “1. Provision a host that can run virtual machines.”, continue to the next header.

Go ahead and spin up N+1 boxes. I recommend at least 2, 3 makes it more interesting. And even more for the brave. You need at least a master, and I recommend another as a node.

Make sure that you can SSH to these boxes, and let’s create a sample inventory.

Create yourself an inventory, which you can base on this inventory:

kube-master ansible_host=192.168.122.216
kube-node-1 ansible_host=192.168.122.179
kube-node-2 ansible_host=192.168.122.32

[master]
kube-master

[nodes]
kube-node-1
kube-node-2

[all:vars]
ansible_user=centos
ansible_ssh_private_key_file=/home/me/.ssh/my_id_of_some_sort

Go ahead and put that inventory file in the ./inventory directory at whatever name you choose, I’d choose ./inventory/myname.inventory – you can replace myname with your name, your dogs name, your favorite cheese – actually that’s the official suggested name of the inventory now… manchego.inventory.

So place that file at ./inventory/manchego.inventory.

(sidenote, I actually prefer a sharp cheddar, or a brie-style cheese like Jasper Hill’s Moses Sleeper)

Installing Kubernetes

Alright – you’ve gotten this far, you’re on the path to success. Let’s kick off an install.

Replace ./inventory/your.inventory with:

  • ./inventory/vms.local.generated if you chose #1, build a virtualization host
  • ./inventory/manchego.inventory if you chose #2, provision your own machines.
$ ansible-playbook -i ./inventory/your.inventory playbooks/kube-install.yml

Wait! Did you already run that? If you didn’t, there’s another mini-adventure you can choose – go to the next header, “Run the kube-install with Multus for networking”.

And you’re on the way to success! And if you’ve finished your coffee now… It’s time to skip down to “Verify your Kubernetes setup!”

(Optional) Run the kube-install with Multus for networking

If you aren’t going to use Multus, skip down to “Verify your Kubernetes setup!”, otherwise, continue here.

Alright, so this is an optional one, some of my audience for this blog gets here because they’re looking for a way to use Multus CNI. I’m a big fan of Multus, it allows us to attach multiple network interfaces to pods. If you’re following Multus, I urge you to check out what’s happening with the Network Plumbing Working Group (NPWG) – an offshoot of Kubernetes SIG-Network (the special interest group for networking). Up in the NPWG, we’re working on standardizing how multiple network attachments for pods work, and I’m excited to be trying Multus.

Ok, so you want to use Multus! Great. Let’s create an extra vars file that we can use.

$ cat inventory/multus-extravars.yml 
---
pod_network_type: "multus"
multus_use_crd: false
optional_packages:
  - tcpdump
  - bind-utils
multus_ipam_subnet: "192.168.122.0/24"
multus_ipam_rangeStart: "192.168.122.200"
multus_ipam_rangeEnd: "192.168.122.216"
multus_ipam_gateway: "192.168.122.1"

Our Multus demo uses macvlan – so you’ll want to change the multus_ipam_* variables to match your network. This one matches the default NAT’ed setup for libvirt VMs in CentOS.

Now that we have that file in place, we can kick off the install like so:

$ ansible-playbook -i ./inventory/vms.local.generated -e "@./inventory/multus-extravars.yml" playbooks/kube-install.yml

If you created your own inventory, replace ./inventory/vms.local.generated with ./inventory/manchego.inventory (or whatever you called yours if you didn’t pick my cheesy inventory name).

Verify your Kubernetes setup!

Go ahead and SSH to the master node, and you can view which nodes have registered, if everything is good, it should look something like:

[centos@kube-master ~]$ kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
kube-master   Ready     master    30m       v1.9.3
kube-node-1   Ready     <none>    22m       v1.9.3
kube-node-2   Ready     <none>    22m       v1.9.3
kube-node-3   Ready     <none>    22m       v1.9.3

Let’s create a pod to make sure things are working a-ok.

Create a yaml file that looks like so:

[centos@kube-master ~]$ cat nginx_pod.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

And tell kube to create the pods with:

[centos@kube-master ~]$ kubectl create -f nginx_pod.yaml 

Watch them come up with:

[centos@kube-master ~]$ watch -n1 kubectl get pods -o wide

Assuming you have multiple nodes, these should be coming up on separate nodes, once they’re up, go ahead and find the IP of one of them…

[centos@kube-master ~]$ IP=$(kubectl describe pod $(kubectl get pods | grep nginx | head -n1 | awk '{print $1}') | grep -P "^IP" | awk '{print $2}')
[centos@kube-master ~]$ echo $IP
10.244.3.2
[centos@kube-master ~]$ curl -s $IP | grep -i thank
<p><em>Thank you for using nginx.</em></p>

And there you have it, an instance of nginx running on Kube!

For Multus verification…

(If you haven’t installed with Multus, skip down to the “Some other adventures you can choose” section.)

You could exec ip a in a pod to inspect its interfaces – but the nginx pods that we spun up don’t have the right tools for that. So let’s kick off a pod with some better tools.

Create a yaml file like so:

[centos@kube-master ~]$ cat check_network.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: debugging
spec:
  containers:
    - name: debugging
      command: ["/bin/bash", "-c", "sleep 2000000000000"]
      image: dougbtv/centos-network
      ports:
      - containerPort: 80

Then have Kubernetes create that pod for you…

[centos@kube-master ~]$ kubectl create -f check_network.yaml 

You can watch it come up with watch -n1 kubectl get pods -o wide, then you can verify that it has multiple interfaces…

[centos@kube-master ~]$ kubectl exec -it debugging -- ip a | grep -Pi "^\d|^\s*inet\s"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet 127.0.0.1/8 scope host lo
3: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    inet 10.244.3.2/24 scope global eth0
4: net0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    inet 192.168.122.200/24 scope global net0

Hurray! There’s your Kubernetes install up and running showing multiple network attachments per pod using Multus.

Some other adventures you can choose…

This is just the tip of the iceberg for more advanced scenarios you can spin up…

If you made the first decision in this article to install Kube, congrats! THE END.

You have chosen: Do not install Kubernetes

It is pitch black. You are likely to be eaten by a grue. You have been eaten by a grue. THE END.

Kubernetes multiple network interfaces -- but! With different configs per pod; Multus CNI has your back.

You need multiple network interfaces in each pod – because you, like me, have some more serious networking requirements for Kubernetes than your average bear. The thing is – if you have different specifications for each pod, and for which network interfaces each pod should have based on its role, well… previously you were fairly limited. At least using my previous (and somewhat dated) method of using Multus CNI (a CNI plugin that enables you to have multiple interfaces per pod), one configuration applied to all pods (or at best, with multiple CNI configs per box, one configuration per box).

Thanks to Kural and crew, Multus now includes the functionality to use Kubernetes Custom Resources (also known as “CRDs”). These “custom resource definitions” are a way to extend the Kubernetes API, and today we’ll take advantage of that functionality. The CRD implementation in Multus allows us to specify exactly which network interfaces each pod gets, based on annotations attached to each pod.

Our goal here will be to spin up a Kubernetes cluster complete with Multus CNI (including the CRD functionality), then spin up pods – some with a single interface, some with multiple interfaces – and inspect them.

Not familiar with Multus CNI? The short version is that it’s (in my own words) a “meta plugin” – one that lets you call multiple CNI plugins, and assign an interface in a pod to each of those plugins. This allows us to create multiple interfaces.
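
To make the “meta plugin” idea concrete, a (non-CRD) Multus config is essentially a list of delegate CNI configs. Something roughly like the sketch below – treat the field names as illustrative and check the Multus CNI readme for the authoritative format for your version:

```json
{
  "name": "multus-demo",
  "type": "multus",
  "delegates": [
    {
      "type": "flannel",
      "masterplugin": true,
      "delegate": { "isDefaultGateway": true }
    },
    {
      "type": "macvlan",
      "master": "eth0",
      "ipam": { "type": "host-local", "subnet": "192.168.122.0/24" }
    }
  ]
}
```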

Have an older Kubernetes? At the time of writing, Kubernetes 1.9.0 was hot off the presses. So CRDs are well established, but if you have an older edition, Multus also supports “TPRs” – third party resources, which were an earlier incarnation of what is now CRDs. You’ll have to modify for those to work, but, this might be a fair reference point.

A lot of what I learned here is directly from the Multus CNI readme. Mostly I have just automated it with kube-ansible, and then documented up my way of doing it. Make sure to check out what’s in the official readme to further extend your knowledge of what you can do with Multus.

In short, what’s cool about this?

  • Multus CNI can give us multiple interfaces per each Kubernetes pod
  • The CRD functionality for Multus allows us to specify which pods get which interfaces, with different interfaces depending on the use case.

I originally really wanted to do something neat with a realistic use-case. Like separate networks like I used to do frequently for telephony use cases. In those cases I’d have different network segments for management, signalling and media. I was going to setup a neat VoIP configuration here, but, alas… I kept yak shaving to get there. So instead, we’ll just get to the point and today we’re just going to spin up some example pods, and maybe next time around I’ll have a more realistic use-case rather than just saying “There it is, it works!”. But, today, it’s just “there it is!”

Requirements

TL;DR:

  • A CentOS 7 box capable of running some virtual machines.
  • Ansible installed on a workstation.
  • Git.
  • Your favorite text editor.
  • Some really good coffee.
    • Tea is also a fair substitute, but, if herbal – it must be a rooibos.

This tutorial uses kube-ansible, an Ansible playbook that I reference often in this blog. It spins up a vanilla Kubernetes cluster on CentOS so you can have a development cluster for yourself quickly, and it includes some extra scenarios.

In this case we’re going to spin up a couple virtual machines and deploy to those. You don’t need a high powered machine for this, just enough to get a couple light VMs to use for our experiment.

Get your clone on.

Go ahead and clone kube-ansible, and move into its directory.

$ git clone -b v0.1.8 git@github.com:redhat-nfvpe/kube-ansible.git && cd kube-ansible/

Install the required galaxy roles for the project.

$ ansible-galaxy install -r requirements.yml

Setup your inventory and extra vars.

Make sure you can SSH to the CentOS 7 machine we’ll use as a virtualization host (referred to heavily as “virthost” in the Ansible playbooks, and docs, and probably here in this article). Then create yourself an inventory for that host. For a reference, here’s what mine looks like:

$ cat inventory/virthost.inventory 
the_virthost ansible_host=192.168.1.119 ansible_ssh_user=root

[virthost]
the_virthost

We’re also going to create some extra variables to use. So let’s define those.

Pay attention to these parts:

  • bridge_ variables define how we’ll bridge to the network of your virthost. In this case I want to bridge to the device called enp1s0f1 on that host, which I specify as bridge_physical_nic. I then specify a bridge_network_cidr which matches the DHCP range on that network (in this example case I have a SOHO type setup with a 192.168.1.0/24 subnet.)
  • multus_ipam_ variables define how we’re going to use some networking with a plugin (it’ll be macvlan, a little more on that later) that this playbook automatically sets up for us. Generally this should match what your network looks like, so in my SOHO type example, we have a gateway on 192.168.1.1 and then we match that.

The rest of the variables can likely stay the same.

$ cat inventory/multus-extravars.yml 
---
bridge_networking: true
bridge_name: br0
bridge_physical_nic: "enp1s0f1"
bridge_network_name: "br0"
bridge_network_cidr: 192.168.1.0/24
pod_network_type: "multus"
virtual_machines:
  - name: kube-master
    node_type: master
  - name: kube-node-1
    node_type: nodes
optional_packages:
  - tcpdump
  - bind-utils
multus_use_crd: true
multus_ipam_subnet: "192.168.1.0/24"
multus_ipam_rangeStart: "192.168.1.200"
multus_ipam_rangeEnd: "192.168.1.216"
multus_ipam_gateway: "192.168.1.1"

Initial setup of the virtualization host

Cool, with those in place, we can now begin our initial virthost setup. Let’s run that with the inventory and extra vars we just created.

$ ansible-playbook -i inventory/virthost.inventory -e "@./inventory/multus-extravars.yml" virthost-setup.yml

This has done a few things for us: it has spun up some virtual machines, created a local inventory of those virtual machines, and also put an SSH key in ~/.ssh/the_virthost/id_vm_rsa – which we can use if we want to SSH to one of those hosts. (Which we’ll do here in a minute.)

Now, let’s kick off a deployment of Kubernetes – this will also set up Multus for us. This is the part of the tute where you’ll need that coffee I mentioned earlier.

$ ansible-playbook -i inventory/vms.local.generated -e "@./inventory/multus-extravars.yml" kube-install.yml 

Finished your coffee yet? Ok, heat it up, we’re going to enter a machine and take a look around.

Overview of what’s happened.

I highly suggest you take a peek around the Ansible playbooks if you want some details of what has happened for you. Sure, they’re pretty big, but, you don’t need to be an Ansible genius to figure out what’s going on.

As a quick recap, here’s some of the things the playbook has done for us:

  • Installed the basic packages we need for Kubernetes
  • Initialized a Kubernetes cluster using kubeadm
  • Compiled Multus CNI
  • Configured some RBAC so that our nodes can query the Kubernetes API (which Multus needs in order to use CRDs)
  • Added some CRDs to our setup that Multus can use to figure out which pods get which treatments for their network configuration.

Inspecting the setup.

Here’s one way that you can use to ssh to the master…

$ ssh -i ~/.ssh/the_virthost/id_vm_rsa centos@$(grep -m1 "kube-master" inventory/vms.local.generated | cut -d"=" -f 2)

You might first want to check out the health of the cluster with a kubectl get nodes and make sure that it’s generally functioning OK. In this case we’re building a cluster with a single master, and a single node.
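
If all’s well, that should show something like the following (illustrative – your ages and patch version will differ):

```
[centos@kube-master ~]$ kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
kube-master   Ready     master    10m       v1.9.0
kube-node-1   Ready     <none>    8m        v1.9.0
```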

Let’s peek around at a few things that the playbook has setup for us… Before anything else – the CNI config.

[centos@kube-master centos]$ sudo cat /etc/cni/net.d/10-multus.conf 
{
  "name": "multus-cni-network",
  "type": "multus",
  "kubeconfig": "/etc/kubernetes/kubelet.conf"
}

You’ll see that it just has a skeleton for Multus. The real configs actually live in the CRD objects.

The Custom Resource Definitions (CRDs)

Check this out – we have a CRD, networks.kubernetes.com:

[centos@kube-master ~]$ kubectl get crd
NAME                      AGE
networks.kubernetes.com   46m

We can kubectl that, too.

[centos@kube-master ~]$ kubectl get networks
NAME           AGE
flannel-conf   46m
macvlan-conf   46m

Great, now let’s describe one of the networks…

[centos@kube-master ~]$ kubectl describe networks flannel-conf
Name:         flannel-conf
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  kubernetes.com/v1
Args:         [ { "delegate": { "isDefaultGateway": true } } ]
[...snip ...]

You can also describe the macvlan-conf, too. With kubectl describe networks macvlan-conf.

So check this out, there’s a really really simple CNI configuration there in the Args:. It’s just a simple config that points to flannel. That’s it.
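
For reference, creating one of these network objects by hand is a small spec – something along these lines, with the field names inferred from the describe output above (double-check against the Multus CNI readme before relying on them):

```yaml
apiVersion: "kubernetes.com/v1"
kind: Network
metadata:
  name: flannel-conf
plugin: flannel
args: '[ { "delegate": { "isDefaultGateway": true } } ]'
```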

Spin up a pod!

That being the case, let’s setup a pod from this spec.

[centos@kube-master ~]$ cat flannel.pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: flannelpod
  annotations:
    networks: '[  
        { "name": "flannel-conf" }
    ]'
spec:
  containers:
  - name: flannelpod
    command: ["/bin/bash", "-c", "sleep 2000000000000"]
    image: dougbtv/centos-network
    ports:
    - containerPort: 80

Create that pod spec YAML however you’d like, and then we’ll create from it.

[centos@kube-master ~]$ kubectl create -f flannel.pod.yaml 
pod "flannelpod" created

Watch it come up if you wish, with watch -n1 kubectl get pods -o wide. Or even get some detail with watch -n1 kubectl describe pod flannelpod

Now, let’s look at the interfaces therein… In this case, we have a vanilla flannel setup for this pod. There’s a lo loopback interface, and then eth0, which has an IP address (10.244.1.2) assigned from the CIDR range the playbooks set up for us.

[centos@kube-master ~]$ kubectl exec -it flannelpod -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    link/ether 0a:58:0a:f4:01:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.1.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a8a0:b3ff:febd:4e0a/64 scope link 
       valid_lft forever preferred_lft forever

How about… another pod!

Well naturally, this wouldn’t be a very good demonstration if we didn’t show you how you could create yet another pod – but with a different set of networks using CRD. So, let’s get on with it and create another!

This time, you’ll note that the annotation is different here, instead of flannel-conf in the networks in annotations we have macvlan-conf which you’ll notice correlates with the object we have created (via the playbooks) in the CRDs.

Here’s my example pod spec…

[centos@kube-master ~]$ cat macvlan.pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: macvlanpod
  annotations:
    networks: '[  
        { "name": "macvlan-conf" }
    ]'
spec:
  containers:
  - name: macvlanpod
    command: ["/bin/bash", "-c", "sleep 2000000000000"]
    image: dougbtv/centos-network
    ports:
    - containerPort: 80

And I create that…

kubectl create -f macvlan.pod.yaml 

And then I watch that come up too (much quicker this time as it in theory should’ve pulled the image already to the same node)

$ watch -n1 kubectl describe pod macvlanpod

Now let’s check out the ip a on that pod, too.

[centos@kube-master ~]$ kubectl exec -it macvlanpod -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 96:ea:41:2b:38:23 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.1.200/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::94ea:41ff:fe2b:3823/64 scope link 
       valid_lft forever preferred_lft forever

Cool! It’s got an address on the 192.168.1.0/24 network. In theory, you could ping this pod from elsewhere on that network. In my case, I’m going to open up a ping stream to this pod on my workstation (which is VPN’d in and presents as 192.168.1.199) and then I’m going to sniff some packets with tcpdump while I’m at it.

On my workstation…

$ ping -c 100 192.168.1.200

And then from the pod…

[centos@kube-master ~]$ kubectl exec -it macvlanpod -- /bin/bash
[root@macvlanpod /]# tcpdump -i any icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
18:30:21.195765 IP 192.168.1.199 > macvlanpod: ICMP echo request, id 695, seq 43, length 64
18:30:21.195814 IP macvlanpod > 192.168.1.199: ICMP echo reply, id 695, seq 43, length 64
18:30:22.197676 IP 192.168.1.199 > macvlanpod: ICMP echo request, id 695, seq 44, length 64
18:30:22.197721 IP macvlanpod > 192.168.1.199: ICMP echo reply, id 695, seq 44, length 64

Hey, did you notice anything yet? That’s not truly multi-interface!

Hey you duped me, this isn’t multi-interface!

Ah ha! Now this is the part where we’ll bring it all together my good friend. Let’s create a pod that has BOTH macvlan, and flannel… All we have to do is create a list in the annotations – the astute eye may have noticed that the JSON already had the brackets for a list.

$ cat both.pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: bothpod
  annotations:
    networks: '[  
        { "name": "macvlan-conf" },
        { "name": "flannel-conf" }
    ]'
spec:
  containers:
  - name: bothpod
    command: ["/bin/bash", "-c", "sleep 2000000000000"]
    image: dougbtv/centos-network
    ports:
    - containerPort: 80

And create with that…

kubectl create -f both.pod.yaml

Of course, I watch it come up with watch -n1 kubectl describe pod bothpod.

And I can see that there’s now multiple interfaces – loopback, flannel, and macvlan!

[centos@kube-master multus-resources]$ kubectl exec -it bothpod -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether c6:bc:74:df:80:7b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.1.201/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::c4bc:74ff:fedf:807b/64 scope link 
       valid_lft forever preferred_lft forever
4: net0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP 
    link/ether 0a:58:0a:f4:01:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.1.3/24 scope global net0
       valid_lft forever preferred_lft forever
    inet6 fe80::6c4e:c5ff:fe5d:64f8/64 scope link 
       valid_lft forever preferred_lft forever

Here you can see that it shows both the 10.244.0.0/16 flannel network (net0) and the 192.168.1.0/24 network for the macvlan plugin (eth0).
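If you want to double-check which interface carries what kind of traffic, you can ask the pod’s routing table directly. Here’s a sketch, assuming the bothpod pod and the addresses from the example above (the route_dev helper is just illustrative):

```shell
#!/bin/bash
# Pull the outgoing device name out of `ip route get` output.
# (Hypothetical helper, just for this demo.)
route_dev() {
  # e.g. "192.168.1.1 dev eth0 src 192.168.1.201" -> "eth0"
  echo "$1" | sed -n 's/.* dev \([^ ]*\).*/\1/p'
}

# LAN traffic should leave via the macvlan interface (eth0)...
lan_route=$(kubectl exec bothpod -- ip route get 192.168.1.1)
echo "LAN egress: $(route_dev "$lan_route")"

# ...while cluster traffic should leave via the flannel interface (net0).
cluster_route=$(kubectl exec bothpod -- ip route get 10.244.0.1)
echo "Cluster egress: $(route_dev "$cluster_route")"
```
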

Thanks for giving it a try! If you run into any issues, make sure to post ’em on the kube-ansible GitHub issues, or, if they’re Multus-specific (and not setup-specific), on the Multus CNI repo.

Are you exhausted? IPv4 almost is -- let's set up an IPv6 lab for Kubernetes

It’s no secret that IPv4 address exhaustion is inevitable. And it’s not just tired (ba-dum-ching!). Since we’re a bunch of Kubernetes fans, and we’re networking fans – we really want to check out what we can do with IPv6 in Kubernetes. Thanks to some slinky automation contributed to kube-ansible by my colleague Feng Pan, who implemented some creative work by leblancd, we can spin this lab up readily. In this simple setup today, we’re going to deploy Kubernetes with custom binaries from leblancd, have two pods (ideally on different nodes) ping one another with ping6 – and declare victory! In the future, let’s hope to iterate on what’s necessary to get full IPv6 functionality in Kubernetes.

There’s an ever-growing interest in IPv6 for Kubernetes, and a solid effort by the good folks from the Kubernetes SIG-Network. You’ll find in the SIG-Network features spreadsheet that IPv6 is slated for the next release – and there’s probably more to it than that. Additionally, you can find some more information about the issues tagged for IPv6 up on the k/k GitHub, too.

There’s also a README for creating an IPv6 lab with kube-ansible on GitHub.

Limitations

Our goal here with this setup is to make it possible to ping6 one pod from another. I’m looking forward to using this laboratory to explore the other possibilities and scenarios, however this pod-to-pod ping6 is the baseline functionality from which to start adventuring into further territory.

Requirements

TL;DR: A host that can run VMs (or choose your own adventure and bring your baremetal or some other cloud), an editor (anything but Emacs, just kidding), git and Ansible.

To run these playbooks, we assume you have already adventured far enough that you have:

  • A machine for running Ansible (like your workstation), with Ansible installed.
  • Ansible 2.4 or later (necessary to support get_url with IPv6-enabled machines)
  • A host capable of running virtual machines, running CentOS 7.
  • Git. If you don’t have git, get git. Don’t be a git. We’ll clone up in a minute here.

We also disable the “bridged networking” feature we often use, and instead use NAT’ed libvirt virtual machines.

You may have to disable GRO (generic receive offload) for the NICs on the virtualization host (if you’re using one).

An example of doing so is:

ethtool -K em3 gro off
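In my case em3 was the NIC in question. If you’d rather sweep every physical NIC on the virthost rather than naming them one by one, here’s a sketch (the physical_nics helper is my own invention – adjust the filtering to taste):

```shell
#!/bin/bash
# Disable GRO on every physical NIC on the virtualization host.
# Enumerate interfaces from sysfs so we don't hardcode names like em3.
physical_nics() {
  local base="$1"
  for path in "$base"/*; do
    local nic
    nic=$(basename "$path")
    # Skip loopback and virtual devices (bridges, veths, etc.) --
    # only devices backed by real hardware have a "device" entry.
    [ "$nic" = "lo" ] && continue
    [ -e "$base/$nic/device" ] || continue
    echo "$nic"
  done
}

for nic in $(physical_nics /sys/class/net); do
  echo "Disabling GRO on $nic"
  ethtool -K "$nic" gro off
done
```
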

Fire up your terminal, and let’s clone this repo!

You’re going to need to clone this repo; let’s clone at the latest tag that supports this functionality.

$ git clone --branch v0.1.6 https://github.com/redhat-nfvpe/kube-ansible.git

Cool. Enter the dir and surf around if you wish; next we’ll set up our inventory and the necessary variables.

If you clone master instead of that tag, don’t forget to install the galaxy roles!

There are likely some Ansible Galaxy roles to install; if find . | grep -i require shows any files, do an ansible-galaxy install -r requirements.yml.

Inventory and variable setup

Let’s look at an inventory and variable overrides to use. Make sure you have a host setup you can run VMs on, that’s running CentOS 7, and ensure you can SSH to it.

Here’s the inventory used initially, which only really cares about the virthost. I’m placing this inventory file @ inventory/my.virthost.inventory. You’ll need to modify the address of the host to match your environment.

the_virthost ansible_host=192.168.1.119 ansible_ssh_user=root

[virthost]
the_virthost

And the overrides which are based on the examples @ ./inventory/examples/virthost/virthost-ipv6.inventory.yml. I’m creating this set of extra variables @ ./inventory/extravars.yml :

bridge_networking: false
virtual_machines:
  - name: kube-master
    node_type: master
  - name: kube-node-1
    node_type: nodes
  - name: kube-node-2
    node_type: nodes
  - name: kube-nat64-dns64
    node_type: other
ipv6_enabled: true

Spinning up and accessing virtual machines

Perform a run of the virthost-setup.yml playbook, using the previously mentioned extra variables for override, and an inventory which references the virthost.

ansible-playbook -i inventory/my.virthost.inventory -e "@./inventory/extravars.yml" virthost-setup.yml

This will produce an inventory file in the local clone of this repo @ ./inventory/vms.local.generated. And it will also create some SSH keys for you which you’ll find in the .ssh folder of the user you ran the Ansible playbooks as.

In the case that you’re running Ansible from your workstation, and your virthost is another machine, you may need to SSH jump host from the virthost to the virtual machines.

If that is the case, you may add to the bottom of ./inventory/vms.local.generated a line similar to this (replacing root@192.168.1.119 with the method you use to access the virtualization host):

cat << EOF >> ./inventory/vms.local.generated
ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p root@192.168.1.119"'
EOF

Optional: Handy-dandy “ssh to your virtual machines script”

You may wish to log in to the machines in order to debug, or even more likely – to access the Kubernetes master after an install.

You may wish to create a script for this. In this example, the script is located at ~/jumphost.sh, and you should change 192.168.1.119 to the hostname or IP address of your virthost.

#!/bin/bash
ssh -i ~/.ssh/the_virthost/id_vm_rsa -o ProxyCommand="ssh root@192.168.1.119 nc $1 22" centos@$1

You would use this script by calling it with ~/jumphost.sh yourhost.local, where the first parameter to the script is the hostname or IP address of the virtual machine you wish to access.

Here’s an example of using it to access the Kubernetes master by pulling the IP address from the generated inventory:

$ ~/jumphost.sh $(cat inventory/vms.local.generated | grep "kube-master.ansible" | cut -d"=" -f 2)
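Since the generated inventory has lines along the lines of kube-master ansible_host=&lt;ip&gt;, you could generalize that into a tiny helper for jumping to any of the VMs. A sketch (the vm_ip function name is my own invention):

```shell
#!/bin/bash
# Pull a named VM's IP out of a generated inventory file.
# Usage: vm_ip <vm-name> <inventory-file>
vm_ip() {
  grep "^$1 " "$2" | sed -n 's/.*ansible_host=\([^ ]*\).*/\1/p'
}

# Jump to a node instead of the master:
~/jumphost.sh "$(vm_ip kube-node-1 inventory/vms.local.generated)"
```
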

Deploy a Kubernetes cluster

With the above in place, we can now perform a kube install, and use the locally generated inventory.

ansible-playbook -i inventory/vms.local.generated -e "@./inventory/extravars.yml" kube-install.yml

SSH into the master; if you created the jumphost.sh script above, use that handy script.

Just double check things are coming up Milhouse. Check out the status of the cluster with kubectl get nodes and/or kubectl cluster-info.
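If you’d like to script that wait instead of eyeballing it, something like this (run on the master) does the trick:

```shell
#!/bin/bash
# Block until every node in `kubectl get nodes` reports Ready.
# The second column of the --no-headers output is the node STATUS.
until [ -z "$(kubectl get nodes --no-headers | awk '$2 != "Ready"')" ]; do
  echo "Waiting for all nodes to be Ready..."
  sleep 5
done
kubectl get nodes
```
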

We’ll now create a couple pods via a ReplicationController. Create a YAML resource definition like so:

[centos@kube-master ~]$ cat debug.yaml 
apiVersion: v1
kind: ReplicationController
metadata:
  name: debugging
spec:
  replicas: 2
  selector:
    app: debugging
  template:
    metadata:
      name: debugging
      labels:
        app: debugging
    spec:
      containers:
      - name: debugging
        command: ["/bin/bash", "-c", "sleep 2000000000000"]
        image: dougbtv/centos-network-advanced
        ports:
        - containerPort: 80

Create the pods with kubectl by issuing:

$ kubectl create -f debug.yaml

Watch ‘em come up:

[centos@kube-master ~]$ watch -n1 kubectl get pods -o wide

Try it out!

Once those pods are fully running, list them, and take a look at the IP addresses, like so:

[centos@kube-master ~]$ kubectl get pods -o wide
NAME              READY     STATUS    RESTARTS   AGE       IP            NODE
debugging-cvbb2   1/1       Running   0          4m        fd00:101::2   kube-node-1
debugging-gw8xt   1/1       Running   0          4m        fd00:102::2   kube-node-2

Now you can exec commands in one of them, to ping the other (note that your pod names and IPv6 addresses are likely to differ):

[centos@kube-master ~]$ kubectl exec -it debugging-cvbb2 -- /bin/bash -c 'ping6 -c5 fd00:102::2'
PING fd00:102::2(fd00:102::2) 56 data bytes
64 bytes from fd00:102::2: icmp_seq=1 ttl=62 time=0.845 ms
64 bytes from fd00:102::2: icmp_seq=2 ttl=62 time=0.508 ms
64 bytes from fd00:102::2: icmp_seq=3 ttl=62 time=0.562 ms
64 bytes from fd00:102::2: icmp_seq=4 ttl=62 time=0.357 ms
64 bytes from fd00:102::2: icmp_seq=5 ttl=62 time=0.555 ms
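If you’d rather not copy and paste pod names and addresses, here’s a sketch that pulls them out of kubectl programmatically (assuming the two debugging pods from above):

```shell
#!/bin/bash
# Ping the last debugging pod's IPv6 address from the first debugging pod.
pods=$(kubectl get pods -o wide)
# NAME is column 1; IP is column 6 in `-o wide` output.
first_pod=$(echo "$pods" | awk '/^debugging/ {print $1; exit}')
target_ip=$(echo "$pods" | awk '/^debugging/ {ip=$6} END {print ip}')
kubectl exec -it "$first_pod" -- ping6 -c5 "$target_ip"
```
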

Finally, pat yourself on the back and enjoy some IPv6 goodness.

Ghost Riding The Whip -- A complete Kubernetes workflow without Docker, using CRI-O, Buildah & kpod

It is my decree that whenever you are using Kubernetes without using Docker you are officially “ghost riding the whip”, maybe even “ghost riding the kube”. (Well, I’m from Vermont, so I’m more like “ghost riding the combine”). And again, we’re running Kubernetes without Docker, but this time? We’ve got an entire workflow without Docker. From image build, to running container, to inspecting the running containers. Thanks to the good folks from the OCI project and Project Atomic, we’ve got kpod for working with running containers, and we’ve got buildah for building our images. And of course, don’t leave out CRI-O which makes the magic happen to get it all running in Kube without Docker. Fire up your terminals, because you’re about to ghost ride the kube.

I happened to see that there is a first release candidate of CRI-O which has a bunch of great improvements that work towards really getting CRI-O production ready for Kubernetes. And I have to say – my experience with using it has been nearly flawless. It’s been working like a champ, and I can tell they’re doing an excellent job with the polish. Of course that’s awesome, but, I was most excited to hear about kpod – “the missing tool”. When I wrote my first article about using CRI-O, I was missing a few portions – especially a half decent tool for checking out what’s going on with containers. This tool isn’t quite as mature as CRI-O itself, but, the presence of this tool at all is just a straight-up boon.

To get this all going, I have these tools (CRI-O, kpod & buildah) integrated into my vanilla Kubernetes lab playbooks, kube-ansible. These playbooks compile CRI-O (which includes kpod) and buildah, and get Kubernetes up and running (using kubeadm to initialize the cluster and join the nodes). I made some upgrades to kube-ansible in the process, fixing up issues with kube 1.7, and also improving it so that kube-ansible can also use Fedora. CRI-O itself works wonderfully with CentOS, but Buildah needs some kernel functionality that just isn’t available in CentOS yet, so… kube-ansible now also supports Fedora, oddly or not-so-oddly enough.

Requirements

This walk-through assumes that you have at least 2 machines with Fedora installed (and generally up-to-date). That’s where we’ll install Kubernetes with CRI-O (and kpod!). You might notice that we use kube-ansible, the name of which is… Not so apropos. But! It’s recently been updated to support Fedora. And we need Fedora to get a spankin’ fresh kernel, so we can use… Drum roll please… Buildah – an image building tool that is not Docker (wink, wink!).

Those machines need more than 2 gigs of RAM. Compilation of CRI-O – specifically during a step using GCC – was bombing out on me, with GCC complaining it couldn’t allocate memory when I had just 2 gigs. Therefore, I recommend at least 4 gigs of RAM.

In addition to that, you’ll need git & Ansible installed on “some machine” (likely your workstation). And your handy-dandy editor. Cause… How do you live without an editor? Unless you’re feeding the input in on punch cards, in which case… You have my respect.

TL;DR, you need:

  • 2 or more Fedora machines with 4 gigs of RAM or more (and maybe 5 gigs free on disk)
  • git & Ansible installed on a client machine (like your workstation)

Spinning up a Kubernetes cluster with CRI-O (and kpod included!)

First off, go ahead and clone up the kube-ansible project…

git clone --branch v0.1.3 https://github.com/redhat-nfvpe/kube-ansible.git

This article glosses over the fact that kube-ansible has the ability to spin up virtual machines to mock up Kubernetes clusters. However, if you’re familiar with it, you can use that as well. I won’t go into depth here, but this is the technique that I use:

$ ansible-playbook -i inventory/your.inventory -e "vm_parameters_ram_mb=4096" virt-host-setup.yml 

Now we’ll run a playbook to bootstrap the nodes with Python (as the Fedora cloud images don’t come packaged with Python).

$ ansible-playbook -i inventory/your.inventory fedora-python-bootstrapper.yml

For your reference, here’s the inventory I used. This inventory can also be found at ./inventory/examples/crio/crio.inventory in the clone. Mostly this is here to show you how to set the variables in order to get this puppy (that is, kube-ansible) to properly use Fedora, when it comes down to it.

kube-master ansible_host=192.168.1.149
kube-node-1 ansible_host=192.168.1.87
kubehost ansible_host=192.168.1.119 ansible_ssh_user=root

[kubehost]
kubehost
[kubehost:vars]
# Using Fedora
centos_genericcloud_url=https://download.fedoraproject.org/pub/fedora/linux/releases/26/CloudImages/x86_64/images/Fedora-Cloud-Base-26-1.5.x86_64.qcow2
image_destination_name=Fedora-Cloud-Base-26-1.5.x86_64.qcow2
increase_root_size_gigs=10

[master]
kube-master
[nodes]
kube-node-1
[all_vms]
kube-master
kube-node-1

[all_vms:vars]
# Using Fedora
ansible_ssh_user=fedora
ansible_ssh_private_key_file=/home/doug/.ssh/id_testvms
kubectl_home=/home/fedora
kubectl_user=fedora
kubectl_group=fedora

Start the Kubernetes install

Then you can go ahead and get your kube install rolling!

$ ansible-playbook -i inventory/vms.inventory -e 'container_runtime=crio' kube-install.yml 

That, my good friend… Is a coffee-worthy step. It is now time to fuel up while that runs. (We’re compiling some big-idea kind of stuff, like CRI-O [and more].)

Verify that things are hunky dory

(Still unsure what the genesis of the phrase “hunky dory” is. But it means “satisfactory” or “just fine”)

Log yourself into the master. And, first of all things… Make sure you DON’T have Docker. And grin during this step. Cause I sure did.

[fedora@kube-crio-master ~]$ docker
-bash: docker: command not found
[fedora@kube-master ~]$ echo $?
127

YES. We want that to exit 127!
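If you want that sanity check in script form, here’s a sketch (the crio systemd unit name is an assumption about how your install laid things out):

```shell
#!/bin/bash
# Assert that Docker is absent, and that CRI-O is the active runtime.
if command -v docker >/dev/null 2>&1; then
  echo "Docker is installed -- that's not ghost riding the whip!"
  exit 1
fi
echo "No Docker here. Grin away."
# Assumes the CRI-O systemd unit is named "crio".
systemctl is-active crio
```
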

Make sure to see that the nodes are healthy…

$ kubectl get nodes

And making sure the nodes are in a ready state.

Optionally, spin up a pod. In my case I did a…

[fedora@kube-crio-master ~]$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF
[fedora@kube-crio-master ~]$ watch -n1 kubectl get pods

They should come up! And if they do, you should be able to query nginx.

[fedora@kube-crio-master ~]$ curl -s $(kubectl describe pod $(kubectl get pods | grep nginx | head -n 1 | awk '{print $1}') | grep "^IP" | awk '{print $2}') | grep -i thank
<p><em>Thank you for using nginx.</em></p>

Cool! That means that you have CRI-O up and poppin’. You are officially ghost riding the whip.

Clean that up if you want, with:

[fedora@kube-master ~]$ kubectl delete rc nginx

Wait – didn’t I promise you a complete workflow that omits Docker entirely? That’s right, I did. So let’s go ahead and start up a from-scratch workflow here… with…

Buildah!

Awesome. Now, let’s go ahead and log into the node. For ease, for now, we’ll also sudo su -. In the future, you might wanna set this up to work for a specific user, but, I’ll leave that as a journey for the reader.

Check out the help for buildah, if you wish. That’s how I learned how to do this myself.

[root@kube-node-1 ~]# buildah --help

Now, let’s create a “Dockerfile”. We’ll use the Dockerfile syntax, as I’m familiar with it, and if you have existing Dockerfiles – buildah supports that!

So go ahead and make yourself a Dockerfile like so.

[root@kube-node-1 ~]# cat Dockerfile 
FROM fedora:26
RUN dnf install -y cowsay-beefymiracle cowsay
ENTRYPOINT ["cowsay","-s","Shoutout from Vermont!"]

This image is just a couple RPMs, really. Mostly cowsay (and then an extra “cowsay file” to add the beefy miracle art). According to Wikipedia:

cowsay is a program that generates ASCII pictures of a cow with a message.

And you think that machine learning is high tech? Obviously you haven’t seen cow ASCII art insult a co-worker before. The pinnacle of technology.

BONUS: To insult your co-workers using cowsay, install the package with dnf install cowsay and use wall to broadcast a message to all terminals logged into a machine.

[fedora@kube-node-1 ~]$ cowsay -s "your mother wears army boots" | wall
                                                                               
Broadcast message from fedora@kube-node-1 (pts/0) (Wed Sep 20 13:32:41 2017):
                                                                               
 ______________________________                                                
< your mother wears army boots >                                               
 ------------------------------                                                
        \   ^__^                                                               
         \  (**)\_______                                                       
            (__)\       )\/\                                                   
             U  ||----w |                                                      
                ||     ||                                                      

Now that you have sufficiently made enemies with your co-workers, back to getting this workflow going.

Go ahead and kick off the build. And on the subject of ASCII – enjoy yourself the nicer ASCII progress bars than Docker, too.

[root@kube-node-1 ~]# buildah  --storage-driver overlay2 bud -t dougbtv/beefy .

The command we’re using there is buildah bud – bud is “build using dockerfile”. Very nice feature.

Note that we’re setting --storage-driver overlay2 (as a global option) which will store the images in the proper locations for runc (and therefore CRI-O) to see where these images are.
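Since you’ll be repeating that flag on every buildah invocation, one convenience option is a tiny wrapper function (just a sketch; the b name is my own):

```shell
#!/bin/bash
# A wrapper so every buildah call gets the same storage driver flag.
b() {
  buildah --storage-driver overlay2 "$@"
}

b bud -t dougbtv/beefy .
b images
```
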

Also, for what it’s worth – I didn’t have great luck with the build cache on subsequent runs of buildah. I’m unsure what the progress on that functionality in buildah itself is. Likely, it may be something I did wrong in the compilation or installation of buildah – so if you see it too, shoot me a note on Twitter or file a GitHub issue; that’d be awesome.

You can go ahead and list what you just built. Note that we’re including the storage driver option, again.

[root@kube-node-1 ~]# buildah  --storage-driver overlay2 images | grep -P "(IMAGE|beefy)"
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
95c3725439f6         docker.io/dougbtv/beefy:latest                           Sep 15, 2017 23:28     1.983 KB

Great! You’ve got an image.

Now, let’s run that image!

We’ll do this with Kubernetes itself today. Log into your master, and first thing, let’s specify a label that we’ll use for a node selector (which will specify on which node we’ll run this particular pod). In this case we’re doing this because we don’t have a registry to pull the images from, so we’ve got to tell Kube to run the pod in a particular place – because that’s where we built the image.

Here’s the (admittedly zany) label that I added. (You can make a lot less insane node selector constraint if you’re sound of mind, too.)

$ kubectl label nodes kube-node-1 beefylevel=expert

And you can see what’s been labeled with:

[fedora@kube-master ~]$ kubectl get nodes --show-labels

Create yourself a beefy.yaml. Here you’ll see a few things that are fairly important, but the one to pay attention to in this context is imagePullPolicy: Never. Since this image isn’t available on a registry, we want to tell Kubernetes “don’t even try to pull this” – by default it will try to pull it, fail, and then refuse to run the container.

Here’s the beefy.yaml I created.

[fedora@kube-master ~]$ cat beefy.yaml 
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: beefy
  name: beefy
spec:
  containers:
   - command:
       - "/bin/bash"
       - "-c"
       - "cowsay -f /usr/share/cowsay/beefymiracle.cow -s 'shouts from Vermont' && sleep 2000000"
     image: dougbtv/beefy
     name: beefy
     imagePullPolicy: Never
  nodeSelector:
    beefylevel: expert

Go ahead and create that…

[fedora@kube-master ~]$ kubectl create -f beefy.yaml 
pod "beefy" created

And watch it come up.

[fedora@kube-master ~]$ watch -n1 kubectl get pods -o wide

(Note that it should be saying it’s coming up on kube-node-1)

Now for the pay day… Let’s see it rollin’.

[fedora@kube-master ~]$ kubectl logs beefy
 _____________________
< shouts from Vermont >
 ---------------------
              \
                      .---. __
           ,         /     \   \    ||||
          \\\\      |O___O |    | \\||||
          \   //    | \_/  |    |  \   /
           '--/----/|     /     |   |-'
                  // //  /     -----'
                 //  \\ /      /
                //  // /      /
               //  \\ /      /
              //  // /      /
             /|   ' /      /
             //\___/      /
            //   ||\     /
            \\_  || '---'
            /' /  \\_.-
           /  /    --| |
           '-'      |  |
                     '-'

Huzzah! We’ve got Beefy. It’s a gosh darned miracle. Dang heckin’ good job.

Awesome! That’s a whole workflow without Docker. Aww yisss. Now, let’s put a cherry on top…

Let’s try out kpod!

Enter kpod! That’s the missing tool from my last CRI-O article. We only had some really rudimentary stuff in runc that could do this for us. But, the Atomic guys are really tearing it up, and now we’ve got kpod which can do a whole lot more for us.

Now that we have a running container – we can check it out with kpod. There are a lot more features on the way for kpod, but for now it gives a nice way to work with your containers (and some container image utilities). I wanted to use it to run containers directly as well, but that’s still in the works at the tag at which I have CRI-O/kpod pinned.

So go ahead and log into the node… And we’ll sudo su - for now (as above). And let’s list the container processes…

[root@kube-node-1 ~]# kpod ps

Awesome! You should see dougbtv/beefy:latest in there.

And you can list the images with this tool, too.

[root@kube-node-1 ~]# kpod images

Say you want to see what’s in the ephemeral storage of an image, we can use kpod for this, too. So let’s pick up the id of our running container.

[root@kube-node-1 ~]# beefyid=$(kpod ps | grep -i beef | awk '{print $1}')
[root@kube-node-1 ~]# echo $beefyid
74caa091b27b7

Now we can use that in order to look at what’s in the container. Let’s just cat the definition for the beefy miracle in cowsay.

[root@kube-node-1 ~]# cat $(kpod mount $beefyid)/usr/share/cowsay/beefymiracle.cow

That should show you a heavily escaped ASCII hotdog. Alright! Nice work Project Atomic folks! Quite a feat.

Ratchet CNI -- Using VXLAN for network isolation for pods in Kubernetes

In today’s episode we’re looking at Ratchet CNI, an implementation of Koko – but in CNI, the container networking interface that is used by Kubernetes for creating network interfaces. The idea is that the network interface creation can be performed by Kubernetes via CNI. Specifically, we’re going to create some network isolation of links between containers to demonstrate a series of “cloud routers”. We can use the capabilities of Koko to create vEth connections between containers when they’re local to the same host, and VXLAN tunnels between containers when they’re across hosts. Our goal today will be to install & configure Ratchet CNI on an existing cluster; we’ll verify it’s working, and then we’ll install a cloud router setup based on zebra pen (a cloud router demo).

Here’s what the setup will look like when we’re done:

diagram

The gist is that the green boxes are Kubernetes minions which run pods, the blue boxes are pods running on those hosts, and the yellow boxes are the network interfaces that will be created by Ratchet (and therefore Koko). In this scenario, just one VXLAN tunnel is created when going between the hosts.

So that means we’ll route traffic from “CentOS A” container, across 2 routers (which use OSPF) to finally land at “CentOS B”, and have a ping come back across the links.

Note that Ratchet is still a prototype, and some of its constraints stem from the static way in which interfaces and addressing are specified. This is indeed a limitation, but it’s intended to illustrate how you might specify the links between these containers.

Requirements

Required:

  • A Kube cluster, spin one up my way if you wish.
  • Two nodes where you can schedule pods (and the ability to modify the CNI configuration on those nodes)

Optional:

  • An operational Flannel plugin in CNI running on the cluster beforehand.

It’s worth noting that I use an all-CentOS-7 cluster; while that isn’t required, it definitely colors how I use the ancillary tools and approach things.

Installing the Ratchet binaries

First thing we’re going to do is install the Ratchet binaries on each of the nodes we’re going to use here. In my case that’s just the two minion nodes which can schedule pods – I don’t bother putting it on my master.

Here’s how I download and put the binaries into place:

$ curl -L https://github.com/dougbtv/ratchet-cni/releases/download/v0.1.0/ratchet-cni-v0.1.0.tar.gz > ratchet.tar.gz
$ tar -xzvf ratchet.tar.gz 
$ sudo mv ratchet-cni-v0.1.0/* /opt/cni/bin/

That’s it – you’ve got ratchet! (Again, man, Go makes it easy, right.)

Spin up etcd

You’ll need to have an etcd instance – if you have a running instance you want to use for this, go ahead. I’ll include my scheme here where I run my own.

From wherever you have kubectl available, go ahead and work on these steps.

Firstly, I create a new namespace to run these etcd pods in…

$ tee ratchet-namespace.yaml <<'EOF'
{
  "kind": "Namespace",
  "apiVersion": "v1",
  "metadata": {
    "name": "ratchet",
    "labels": {
      "name": "ratchet"
    }
  }
}
EOF
$ kubectl create -f ratchet-namespace.yaml 
$ kubectl get namespaces

I have an example etcd pod spec in this gist, I download that…

[centos@kube-master ~]$ curl -L https://goo.gl/eMnsh9 > etcd.yaml

And then create it in the ratchet namespace we just created, and watch it come up.

$ kubectl create -f etcd.yaml --namespace=ratchet
$ watch -n1 kubectl get pods --namespace=ratchet

This has also created a service for us.

[centos@kube-master ~]$ kubectl get svc --namespace=ratchet | head -n2
NAME          CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
etcd-client   10.102.72.174   <none>        2379/TCP            56s

This service is important to the Ratchet configuration, so note how you can access it – you can use the IP if all else fails; at least for testing, that’s just fine. You don’t want to rely on that full-time, however.

If your nodes don’t resolve etcd-client.ratchet.svc.cluster.local – pay special attention, as this is the DNS name for etcd I’ll use in the following configs.
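To check that from one of the nodes, you can lean on the fact that etcd answers with a 404 at the root path (as noted below). A sketch – swap in the cluster IP for the hostname if DNS isn’t working for you:

```shell
#!/bin/bash
# Probe the etcd service; a 404 (or 200) from the root path means it's alive.
check_etcd() {
  case "$1" in
    200|404) return 0 ;;
    *) return 1 ;;
  esac
}

status=$(curl -s -o /dev/null -w "%{http_code}" http://etcd-client.ratchet.svc.cluster.local:2379/)
if check_etcd "$status"; then
  echo "etcd is responding (HTTP $status)"
else
  echo "etcd unreachable (HTTP $status) -- check DNS or use the cluster IP"
fi
```
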

Configuring Ratchet

Now we need to put the configurations into place. Firstly, you’re going to want to clear out whatever’s in /etc/cni/net.d/. I recommend that you have Flannel working before getting to this point, because we can do something cool with this plugin available – we can bypass Ratchet and pass ineligible pods along to Flannel (or any other plugin). I’ll include configs that use Flannel here; if appropriate, replace them with another plugin configuration.

Here I am moving my configs to a backup directory, do this on both hosts that will run Ratchet…

[centos@kube-minion-1 ~]$ mkdir cni-configs
[centos@kube-minion-1 ~]$ sudo mv /etc/cni/net.d/* ./cni-configs/

Let’s look at my current configuration…

[centos@kube-minion-2 ~]$ cat cni-configs/10-flannel.conf 
{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}

It’s a Flannel config, I’m gonna keep this around for a minute, cause I’ll use it in my upcoming configs.

Next, let’s assess what you have available for networking. Mine is pretty simple. Each of my nodes has a single NIC – eth0 – on the 192.168.1.0/24 network, and that network is essentially flat: it can reach the WAN over that NIC, and also the other nodes on the network. Naturally, in real life your network will be more complex. But in this step… choose the proper NIC and IP address for your setup.

So, I pick out my NIC and IP address, what’s it look like on my nodes…

[centos@kube-minion-1 ~]$ ip a | grep -Pi "eth0|inet 192"
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.1.73/24 brd 192.168.1.255 scope global dynamic eth0

Ok, cool, so I have eth0 and it’s 192.168.1.73 – these are both going into my Ratchet config.

Now, here’s my Ratchet config I’ve created on this node, as /etc/cni/net.d/10-ratchet.conf:

[centos@kube-minion-1 ~]$ cat /etc/cni/net.d/10-ratchet.conf
{
  "name": "ratchet-demo",
  "type": "ratchet",
  "etcd_host": "etcd-client.ratchet.svc.cluster.local",
  "etcd_port": "2379",
  "child_path": "/opt/cni/bin/ratchet-child",
  "parent_interface": "eth0",
  "parent_address": "192.168.1.73",
  "use_labels": true,
  "delegate": {
    "name": "cbr0",
    "type": "flannel",
    "delegate": {
      "isDefaultGateway": true
    }
  },
  "boot_network": {
    "type": "loopback"
  }
}

Some things to note:

  • type: ratchet is required
  • etcd
    • etcd_host generally should point to the service we created in the previous step
    • etcd_port is the port on which etcd will respond.
    • You can test if curl etcd-client.ratchet.svc.cluster.local:2379 works and that will let you know if etcd is responding (it’ll respond with a 404)
  • child_path points where the secondary binary for ratchet lives, following these instructions this is the proper path.
  • VXLAN
    • parent_interface is the interface on which the VXLAN tunnels will reside
    • parent_address is the IP address remote VXLANs will use to create a tunnel to this machine.
  • use_labels should generally be true.
  • Alternate CNI plugin
    • delegate is a special field. In this we pack in an entire CNI config for another plugin. You’ll note that this is set to the exact entry that we have earlier when I show the current config for CNI on one of the minions. When pods are not labeled to use ratchet, they will use this CNI plugin (more on the labeling later).
  • boot_network – similar to delegate, but when pods are eligible to be processed by Ratchet, they will have an extra interface created with the CNI config packed into this property. In this case I just set a loopback device, using the loopback CNI plugin.

Great! You’ve got one all set. But, you need two. So setup another one on a second host.

On my second host I have the same config @ /etc/cni/net.d/10-ratchet.conf – minus one line which differs, and that is the parent_address (the parent_interface would also differ if the NICs were named differently on each host). So, for example, on the second minion I have…

[centos@kube-minion-2 ~]$ ip a | grep -iP "(eth0|192)"
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.1.33/24 brd 192.168.1.255 scope global dynamic eth0

[centos@kube-minion-2 ~]$ cat /etc/cni/net.d/10-ratchet.conf | grep parent
  "parent_interface": "eth0",
  "parent_address": "192.168.1.33",

Note that the IP address in parent_address matches that of the address on eth0.
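Since the only thing that changes between hosts is the NIC and its address, you could also generate the config on each minion rather than hand-editing it. Here’s a sketch – it assumes eth0 as the parent interface and reuses the same flannel delegate shown above:

```shell
#!/bin/bash
# Generate /etc/cni/net.d/10-ratchet.conf, auto-detecting this host's IP.
nic=eth0
# e.g. "2: eth0    inet 192.168.1.33/24 brd ..." -> "192.168.1.33"
addr=$(ip -4 -o addr show dev "$nic" | awk '{split($4, a, "/"); print a[1]; exit}')
echo "Using $nic / $addr for this host's ratchet config"

sudo tee /etc/cni/net.d/10-ratchet.conf > /dev/null <<EOF
{
  "name": "ratchet-demo",
  "type": "ratchet",
  "etcd_host": "etcd-client.ratchet.svc.cluster.local",
  "etcd_port": "2379",
  "child_path": "/opt/cni/bin/ratchet-child",
  "parent_interface": "$nic",
  "parent_address": "$addr",
  "use_labels": true,
  "delegate": {
    "name": "cbr0",
    "type": "flannel",
    "delegate": { "isDefaultGateway": true }
  },
  "boot_network": { "type": "loopback" }
}
EOF
```
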

Labeling the nodes

Alright, something we’re going to want to do is specify which pods run where, for demonstrative purposes. For this we’re going to use a nodeSelector to tell Kube where to run these pods.

To that end, we’ll assign a label to each node…

[centos@kube-master ~]$ kubectl label nodes kube-minion-1 ratchetside=left
[centos@kube-master ~]$ kubectl label nodes kube-minion-2 ratchetside=right

And you can check those labels out if you need to…

[centos@kube-master ~]$ kubectl get nodes --show-labels

Running two pods as a baseline test

We are now all configured and ready to rumble with Ratchet. First, let’s create a couple of pods to make sure everything is working.

Let’s create these pods using this yaml:

---
apiVersion: v1
kind: Pod
metadata:
  name: primary-pod
  labels:
    app: primary-pod
    ratchet: "true"
    ratchet.pod_name: "primary-pod"
    ratchet.target_pod: "primary-pod"
    ratchet.target_container: "primary-pod"
    ratchet.public_ip: "1.1.1.1"
    ratchet.local_ip: "192.168.2.100"
    ratchet.local_ifname: "in1"
    ratchet.pair_name: "pair-pod"
    ratchet.pair_ip: "192.168.2.101"
    ratchet.pair_ifname: "in2"
    ratchet.primary: "true"
spec:
  containers:
    - name: primary-pod
      image: dougbtv/centos-network
      command: ["/bin/bash"]
      args: ["-c", "while true; do sleep 10; done"]
  nodeSelector:
    ratchetside: left
---
apiVersion: v1
kind: Pod
metadata:
  name: pair-pod
  labels:
    app: pair-pod
    ratchet: "true"
    ratchet.pod_name: pair-pod
    ratchet.primary: "false"
spec:
  containers:
    - name: pair-pod
      image: dougbtv/centos-network
      command: ["/bin/bash"]
      args: ["-c", "while true; do sleep 10; done"]
  nodeSelector:
    ratchetside: right

Likely the most important things to look at are these labels:

ratchet: "true"
ratchet.pod_name: "primary-pod"
ratchet.target_pod: "primary-pod"
ratchet.target_container: "primary-pod"
ratchet.local_ip: "192.168.2.100"
ratchet.local_ifname: "in1"
ratchet.pair_name: "pair-pod"
ratchet.pair_ip: "192.168.2.101"
ratchet.pair_ifname: "in2"
ratchet.primary: "true"

These labels are how Ratchet knows how to set up the interfaces on the pods. Pods are set up in pairs: there’s a “primary” and a “pair”. You need to (as of now) know the name of the pod that’s going to be the pair. Then you can set the names of the interfaces, and which IPs are assigned. In this case we’re going to have an interface called in1 on the primary side, and an interface named in2 on the pair side. The primary will be assigned the IP address 192.168.2.100 and the pair will have the IP address 192.168.2.101.

Of all of the parameters, the keystone is the ratchet: "true" parameter, which tells Ratchet that it should process this pod – otherwise, the pod is passed through to the other CNI plugin given in the delegate parameter of the Ratchet configuration.
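In pseudo-shell, that gating decision looks something like the following. This is a sketch of the logic only, not Ratchet’s actual code – I’m assuming the labels arrive as KEY=VALUE lines, which is roughly how kubectl can emit them:

```shell
# Sketch of Ratchet's go/no-go decision based on pod labels.
# The label set here mirrors the primary-pod example above.
labels='ratchet=true
ratchet.primary=true'

# -x matches the whole line, so "ratchet.pod_name=true" could never
# accidentally satisfy the check.
if printf '%s\n' "$labels" | grep -qx 'ratchet=true'; then
  echo "ratchet: processing pod"
else
  echo "ratchet: passing through to delegate CNI plugin"
fi
# prints: ratchet: processing pod
```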

I put that into a file example.yaml and created it as such:

[centos@kube-master ~]$ kubectl create -f example.yaml 

And then watched it come up with watch -n1 kubectl get pods. Once it’s up, we can check out some stuff.

But – you should also check which nodes they’re running on, to make sure you got the labeling and the nodeSelectors correct. You can do this by checking the description of the pods, and looking for the Node values.

$ kubectl describe pod primary-pod | grep "^Node"
$ kubectl describe pod pair-pod | grep "^Node"

Now that you know they’re on different nodes, let’s enter the primary pod.

[centos@kube-master ~]$ kubectl exec -it primary-pod -- /bin/bash

Now we can take a look at the interfaces…

[root@primary-pod /]# ip a | grep -P "(^\d|inet\s)"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    inet 127.0.0.1/8 scope host lo
7: in1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN qlen 1000
    inet 192.168.2.100/24 brd 192.168.2.255 scope global in1

Note that there are two interfaces:

  • lo which is a loopback created by the boot_network CNI pass through parameter in our configuration.
  • in1 which is a vxlan, assigned the 192.168.2.100 IP address as we defined in the pod labels.

Let’s look at the vxlan properties like so:

[root@primary-pod /]# ip -d link show in1
7: in1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1000
    link/ether 9e:f4:ab:a0:86:7a brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 
    vxlan id 11 remote 192.168.1.33 dev 2 srcport 0 0 dstport 4789 l2miss l3miss ageing 300 addrgenmode eui64 

You can see that it’s a vxlan with an id of 11, and the remote side is @ 192.168.1.33 which is the IP address of the second minion node. That’s looking correct.
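If you’re checking a number of pods, you can pull the id and remote out of that output programmatically. Here’s a sketch run against the sample line above:

```shell
# Extract the vxlan id and remote endpoint from `ip -d link show` output.
# The vxlan detail line is key/value pairs, so we scan for the keys we want.
line='    vxlan id 11 remote 192.168.1.33 dev 2 srcport 0 0 dstport 4789 l2miss l3miss ageing 300 addrgenmode eui64'
echo "$line" | awk '{
  for (i = 1; i <= NF; i++) {
    if ($i == "id")     id  = $(i + 1)
    if ($i == "remote") rem = $(i + 1)
  }
  print "id=" id, "remote=" rem
}'
# prints: id=11 remote=192.168.1.33
```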

That being said, we can now ping the other side, which we know is @ IP address 192.168.2.101

[root@primary-pod /]# ping -c1 192.168.2.101
PING 192.168.2.101 (192.168.2.101) 56(84) bytes of data.
64 bytes from 192.168.2.101: icmp_seq=1 ttl=64 time=0.546 ms

Excellent! All is well and good, let’s destroy this pod, and shortly we’ll move onto the more interesting setup.

[centos@kube-master ~]$ kubectl delete -f example.yaml 

Quick clean-up procedure

Ratchet is in need of some clean-up routines of its own, and since they’re not implemented yet, we have to clean up the etcd data ourselves. So let’s do that right now.

We’re going to create a Kubernetes job to do the deletion, with this YAML:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: etcd-delete
spec:
  template:
    metadata:
      name: etcd-delete
    spec:
      containers:
      - name: etcd-delete
        image: centos:centos7
        command: ["/bin/bash"]
        args:
          - "-c"
          - >
            ETCD_HOST=etcd-client.ratchet.svc.cluster.local;
            curl -s -L -X DELETE http://$ETCD_HOST:2379/v2/keys/ratchet\?recursive=true;
      restartPolicy: Never

I created this file as job-delete-etcd.yaml, and then executed it as such:

[centos@kube-master ~]$ kubectl create -f job-delete-etcd.yaml 

And watched it come to completion with:

[centos@kube-master ~]$ watch -n1 kubectl get pods --show-all

You can now remove the job if you wish:

[centos@kube-master ~]$ kubectl delete -f job-delete-etcd.yaml 

Running the whole cloud router

Next, we’re going to run a more interesting setup. I’ve got the YAML resource definitions stored in this gist, so you can peruse them more deeply.

A current limitation is that there are 2 parts: you have to run the first part, wait for the pods to come up, and then you can run the second part. This is because the current VXLAN implementation of Ratchet is a sketch, and doesn’t take into account a few different use cases – one of which being that there’s sometimes more than “just a pair” – and in this case, there are 3 pairs with some overlap. So we create them in an ordered fashion to let Ratchet treat them simply as pairs – if we created them all at once, we’d hit a race condition, and usually the vEth wins. So… we’re working around that here ;)

Let’s download those yaml files.

$ curl -L https://goo.gl/QLGB2C > cloud-router-part1.yaml
$ curl -L https://goo.gl/aQotzQ > cloud-router-part2.yaml

Now, create the first part, and let the pods come up.

[centos@kube-master ~]$ kubectl create -f cloud-router-part1.yaml 
[centos@kube-master ~]$ watch -n1 kubectl get pods --show-all

Then you can create the second part, and watch the last single pod come up.

[centos@kube-master ~]$ kubectl create -f cloud-router-part2.yaml 
[centos@kube-master ~]$ watch -n1 kubectl get pods --show-all

Using the diagram up at the top of the post, we can see that the “Centos A” box routes through both quagga-a and quagga-b before reaching Centos B – so if we ping Centos B from Centos A, that’s an end-to-end test. So let’s run that ping:

[centos@kube-master ~]$ kubectl exec -it centosa -- /bin/bash
[root@centosa /]# ping -c5 192.168.4.101
PING 192.168.4.101 (192.168.4.101) 56(84) bytes of data.
64 bytes from 192.168.4.101: icmp_seq=1 ttl=62 time=0.399 ms
[... snip ...]

Hurray! Feel free to go and dig through the rest of the pods and check out ip a and ip -d link show etc. Also feel free to enter the quagga pods and run vtysh and see what’s going on in the routers, too.

Debugging Ratchet issues

This is the very short version, but there are basically two places you’ll want to look to see what’s going on.

  • journalctl -u kubelet -f will give you the output from Ratchet when it’s run by CNI proper – this is how it’s initially invoked.
  • tail -f /tmp/ratchet-child.log – this is the log from the child process, and will likely give you the most information. Note that this method of logging to a temp file is an ulllllltra hack. And I mean it’s a super hack. It’s just a work-around I use to get some output while debugging.