Using Robocniconfig to generate CNI configs with an LLM

Time to automate some secondary networks automagically by having configurations produced using a large language model (LLM)!

I saw ollama at Kubecon and I had to play with it :D It’s really fun and I had good luck. I love the idea of having a containerized LLM and then hooking up my own toys to it! …I’ve also played with doing that myself – maybe a little of it I’ve mentioned on this blog, or you’ve seen a Dockerfile or two from me. But! It’s really nice to have an opinionated installation.

Here’s the concept of Robo CNI:

  • You (the user) give an instruction as a short blurb about what kind of CNI configuration you want, like “I’ll take a macvlan CNI with an eth0 master, and IP addresses in the 192.0.2.0/24 range”
  • RoboCNI runs that through a large language model, with some added context to help the LLM figure out how to produce a CNI configuration
  • Then, we test that a bunch of times to see how effectively it works.

I’m doing something I heard data scientists tell me not to do with it (paraphrased): “Don’t go have this thing automatically configure stuff for you”. Well… I won’t be doing it in production. I’ve had enough pager calls at midnight in my life without some robot making it worse, but I will do it in a lab like… whoa!

So I wrote an application: Robo CNI Config! It basically generates CNI configurations based on your prompts, so you’d use the core application like:

$ ./robocni --json "dude hook me up with a macvlan mastered to eth0 with whereabouts on a 10.10.0.0/16"
{
    "cniVersion": "0.3.1",
    "name": "macvlan-whereabouts",
    "type": "macvlan",
    "master": "eth0",
    "mode": "bridge",
    "ipam": {
        "type": "whereabouts",
        "range": "10.10.0.225/28"
    }
}

I basically accomplished this by writing some quick and dirty tooling to interface with it, and then a context for the LLM that has some CNI configuration examples and some of my own instructions on how to configure CNI. You can see the context itself here on GitHub.

It has the kind of stuff you’d imagine something like ChatGPT is “pre-prompted” with, for example:

Under no circumstance should you reply with anything but a CNI configuration. I repeat, reply ONLY with a CNI configuration.
Put the JSON between 3 backticks like: ```{"json":"here"}```
Respond only with valid JSON. Respond with pretty JSON.
You do not provide any context or reasoning, only CNI configurations.

And stuff like that.

I also wrote a utility to automatically spin up pods given these configs and test connectivity between them. I then kicked it off for 5000 runs over a weekend. Naturally my kube cluster died from something else (VMs decided to choke), so I had to spin up a new cluster, and then start it again, but… It did make it through 5000 runs.
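(For reference, the kind of check each iteration boils down to is roughly this – the pod names, and the net1 interface name, are just placeholders for illustration:)

kubectl exec robocni-test-pod-b -- ip -o addr show dev net1   # grab pod B's secondary IP
kubectl exec robocni-test-pod-a -- ping -c 3 <pod-b-net1-ip>  # ping it from pod A across the generated network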

Amazingly: it was able to get a ping across the automatically created pods approximately 95% of the time. Way better than I expected, and I can even see some mistakes that could be corrected, too.

It’s kinda biased towards macvlan, bridge and ipvlan CNI plugins, but… You gotta start somewhere.

So, let’s do it!

Pre-reqs…

There’s a lot, but I’m hopeful you can find enough pointers from other stuff in my blog, or… even a google search.

  • A working Kubernetes install (and access to the kubectl command, of course)
  • A machine with a GPU where you can run ollama
  • Multus CNI installed, as well as the Whereabouts IPAM CNI plugin (rough install pointers just below)
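If you need those last two, the upstream quickstarts boil down to roughly the following (the manifest paths may have moved since this was written, so double-check the Multus and Whereabouts READMEs):

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/whereabouts/master/doc/crds/daemonset-install.yaml
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/whereabouts/master/doc/crds/whereabouts.cni.cncf.io_ippools.yaml
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/whereabouts/master/doc/crds/whereabouts.cni.cncf.io_overlappingrangeipreservations.yaml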

I used my own home lab. I’ve got a Fedora box with an NVIDIA GeForce 3090, and it seems to be fairly decent for running LLMs and Stable Diffusion training (which is the main thing I use it for, honestly!)

Installing and configuring ollama

Honestly, it’s really as easy as following the quickstart.

Granted – the main thing I did need to change was making it accept queries from outside the local machine, so I did go and mess with the systemd units. You can find most of what you need in the installation on Linux section, and then also the port info here in the FAQ.
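For reference, on a systemd-based install the gist of that change is an override telling ollama to listen on all interfaces – something like this, per the FAQ at the time (check the current docs for your version):

sudo systemctl edit ollama.service

# then, in the override:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# and restart:
sudo systemctl restart ollama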

I opted to use the LLaMA 2 model with 13B params. Use whatever you like (read: Use the biggest one you can!)

Using RoboCNI Config

First clone the repo @ https://github.com/dougbtv/robocniconfig/

Then kick off ./hack/build-go.sh to build the binaries.

Generally, I think this is going to work best on uniform environments (e.g. same interface names across all the hosts). I ran it on a master in my environment, which is a cluster with 1 master and 2 workers.

You can test it out by running something like…

export OLLAMA_HOST=192.168.2.199
./robocni "give me a macvlan CNI configuration mastered to eth0 using whereabouts ipam ranged on 192.0.2.0/24"

That last part in the quotes is the “hint” which is basically the user prompt for the LLM that gets added to my larger context of what the LLM is supposed to do.
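Since what comes back is just a CNI config, one way you might wire the output into the cluster by hand (this part is my own sketch, not something robocni does for you) is to wrap it in a NetworkAttachmentDefinition:

CONFIG=$(./robocni --json "give me a macvlan CNI configuration mastered to eth0 using whereabouts ipam ranged on 192.0.2.0/24" | jq -c .)
cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: robocni-generated
spec:
  config: '$CONFIG'
EOF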

Then, you can run it in a loop.

But first! Make sure robocni is in your path.
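(For example – adjust for wherever the build script dropped the binary:)

export PATH=$PATH:$(pwd)
# or copy it somewhere that's already in your PATH:
sudo cp ./robocni /usr/local/bin/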

./looprobocni --runs 5000

My first run of 5000 was…. Better than I expected!

Run number: 5000
Total Errors: 481 (9.62%)
Generation Errors: 254 (5.08%)
Failed Pod Creations: 226 (4.52%)
Ping Errors: 0 (0.00%)
Stats Array:
  Hint 1: Runs: 786, Successes: 772
  Hint 2: Runs: 819, Successes: 763
  Hint 3: Runs: 777, Successes: 768
  Hint 4: Runs: 703, Successes: 685
  Hint 5: Runs: 879, Successes: 758
  Hint 6: Runs: 782, Successes: 773

In this case, there are about 5% generation errors, but… those can be trapped. So discounting those… 95% of the runs were able to spin up pods, and… amazingly – whenever those pods came up, a ping between them worked.
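(“Trapped” here can be as simple as catching bad generations before they ever touch the cluster – a quick hand-rolled sketch using jq, for instance:)

OUT=$(./robocni --json "macvlan mastered to eth0 with whereabouts on 10.10.0.0/16")
if echo "$OUT" | jq empty 2>/dev/null; then
  echo "parseable JSON – safe to hand to the next step"
else
  echo "generation error – skip or retry"
fi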

Insert:<tim-and-eric-mind-blown.jpg>

I had more fun doing this than I care to admit :D

Installing Oobabooga LLM text webui on Fedora 38 (with podman!)

Today we’re going to run Oobabooga – the text generation web UI for running large language models (LLMs) on your local machine. We’ll run it containerized, so everything else on your box can keep sitting pretty right where it is.

Requirements

Here’s what we’ll need – including podman-compose, if you don’t have it…

  • Fedora 38
  • An NVIDIA GPU
  • Podman (typically included by default)
  • podman-compose (optional)
  • The NVIDIA drivers

If you want podman compose, pick up:

pip3 install --user podman-compose

Driver install

You’re also going to need to install the NVIDIA driver, and the NVIDIA container tools.

Before you install CUDA, do a dnf update (otherwise I wound up with mismatched deps), then install CUDA Toolkit (link is for F37 RPM, but it worked fine on F38)

And the container tools:

curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install nvidia-container-toolkit nvidia-docker2

(nvidia-docker2 might not be required.)
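Depending on your podman and nvidia-container-toolkit versions, you may also need to generate a CDI spec so podman can actually hand the GPU to containers – roughly:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list   # should show the nvidia.com/gpu devices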

If you need more of a reference for GPUs on Red Hat flavored Linuxes, this article from the Red Hat blog is very good.

Let’s get started

In my experience, you’ve gotta use podman for GPU support in Fedora 38 (and probably a few versions earlier, is my guess).

Go ahead and clone oobabooga/text-generation-webui.

From their README, you’ve gotta set this up to do the container build…

ln -s docker/{Dockerfile,docker-compose.yml,.dockerignore} .
cp docker/.env.example .env
# Edit .env and set TORCH_CUDA_ARCH_LIST based on your GPU model
docker compose up --build

Importantly – you’ve got to set TORCH_CUDA_ARCH_LIST. You can check that you’ve got the right one from this grid on Wikipedia. DOUBLE CHECK everything, but especially that you’re using the right .env file – I really made this take longer than it should have when I got that wrong.

TORCH_CUDA_ARCH_LIST=8.6+PTX
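If you’re not sure which value matches your card, a new-enough driver will report the compute capability directly (and the Wikipedia grid maps that to this value):

nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# a GeForce RTX 3090 reports 8.6, hence the 8.6+PTX above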

First, try building it with podman – it worked for me on the second attempt. Unsure what went wrong, but I built with…

podman build -t dougbtv/oobabooga .

WARNING: These are some BIG images. I think mine came out to ~16 gigs.

And then I loaded that image into podman…

I needed to make a few mods before I could run it… Copy the .env file also to the docker folder (we could probably improve this with a symlink in an earlier step). And while we’re here, we’ll need to copy the template prompts and presets, too.

cp .env docker/.env
cp prompts/* docker/prompts/
cp presets/* docker/presets/

Now you’ll need at least a model, so to download one leveraging the container image…

podman-compose run --entrypoint "/bin/bash -c 'source venv/bin/activate; python download-model.py TheBloke/stable-vicuna-13B-GPTQ'" text-generation-webui

Naturally, change TheBloke/stable-vicuna-13B-GPTQ to whatever model you want.

You’ll find the model in…

ls ./docker/models/

I also modify the docker/.env to change this line to…

CLI_ARGS=--model TheBloke_stable-vicuna-13B-GPTQ --chat --model_type=Llama --wbits 4 --groupsize 128 --listen

However, I run it by hand with:

podman run \
--env-file /home/doug/ai-ml/text-generation-webui/docker/.env \
-v /home/doug/ai-ml/text-generation-webui/characters:/app/characters \
-v /home/doug/ai-ml/text-generation-webui/extensions:/app/extensions \
-v /home/doug/ai-ml/text-generation-webui/loras:/app/loras \
-v /home/doug/ai-ml/text-generation-webui/models:/app/models \
-v /home/doug/ai-ml/text-generation-webui/presets:/app/presets \
-v /home/doug/ai-ml/text-generation-webui/prompts:/app/prompts \
-v /home/doug/ai-ml/text-generation-webui/softprompts:/app/softprompts \
-v /home/doug/ai-ml/text-generation-webui/docker/training:/app/training \
-p 7860:7860 \
-p 5000:5000 \
--gpus all \
-i \
--tty \
--shm-size=512m \
localhost/dougbtv/oobabooga:latest

(If you’re smarter than me, you can get it running with podman-compose at this point)
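Presumably that would look roughly like this from the repo root (a guess on my part – treat it as a sketch):

podman-compose up --build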

At this point, you should be done, grats!

It should give you a web address, fire it up and get on generating!

Mount your models somewhere

I wound up bind mounting some directories…

sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/models/ docker/models/
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/presets/ docker/presets/
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/characters/ docker/characters/
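If you want those bind mounts to survive a reboot, the usual trick is an fstab entry per mount – a sketch using the same source path (the absolute target path here is my assumption about where the clone lives):

/home/doug/ai-ml/oobabooga_linux/text-generation-webui/models  /home/doug/ai-ml/text-generation-webui/docker/models  none  bind  0 0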

Bonus note: I also wound up changing my Dockerfile to install a torch+cu118 build, in case that helps you.

So I changed out two lines that looked like this diff:

-    pip3 install torch torchvision torchaudio && \
+    pip3 install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 -f https://download.pytorch.org/whl/cu118/torch_stable.html && \

I’m not sure how much it helped, but, I kept this change after I made it.

I’m hoping to submit a patch to https://github.com/RedTopper/Text-Generation-Webui-Podman (which isn’t building for me right now) that integrates what I learned from this – and then have the whole thing in podman, later.

Don’t make my stupid mistakes

I ran into an issue where I got:

RuntimeError: CUDA error: no kernel image is available for execution on the device

I tried messing with the TORCH_CUDA_ARCH_LIST in the .env file, changing it to 8.6+PTX, 8.0, etc., the whole list, commented out – no luck.

I created an issue in the meanwhile: https://github.com/oobabooga/text-generation-webui/issues/2002

I also found this podman image repo – https://github.com/RedTopper/Text-Generation-Webui-Podman – and I forked it. It looks like it could possibly need updates; I’ll try to contribute my work back to it at some point.

Installing Stable Diffusion on Fedora 38

In today’s tutorial, we’re going to install Stable Diffusion on Fedora 38.

I’m putting together a lab machine for GPU workloads. And the first thing I wanted to do was get Stable Diffusion running, and I’m also hopeful to start using it for training LoRA’s, embeddings, maybe even a fine tuning checkpoint (we’ll see).

Fedora is my default home server setup, and I didn’t find a direct guide on how to do it – although it’s not terribly different from other distros.

…Oddly enough I actually fired this up with Fedora Workstation.

Requirements

  • An install of Fedora 38
  • An NVIDIA GPU (if someone has insight on AMD GPUs, and wants to provide instructions, hit me up and I’ll update the article)

Installing Automatic Stable Diffusion WebUI on Fedora 38

I’m going to be using vladmandic’s fork of the AUTOMATIC1111 SD webui: https://github.com/vladmandic/automatic

Clone it.

Fedora 38 ships with Python 3.11, but some dependency for Stable Diffusion requires Python 3.10, which will require a few extra steps.

Install python 3.10

dnf install python3.10

Also, before you install CUDA, do a dnf update (otherwise I wound up with mismatched deps for NetworkManager and couldn’t boot off a new kernel, and I had to wheel up a crash cart – just kidding, I don’t have a crash cart or a KVM for my Linux lab, so it’s much more annoying: I have to move my server over to my workstation area. Luckily I just have a desktop server lab.)

Install CUDA Toolkit (link is for F37 RPM, but it worked fine on F38)

And – follow the instructions there. You might need to reboot now.

Make a handler script to export the correct python version… I named mine user-webui.sh

#!/bin/bash
export python_cmd=python3.10
screen -S webui ./webui.sh --listen

NOTE: I fire it up in screen. If you don’t have Stockholm Syndrome for screen you can decide to not be a luddite and modify it to use tmux. And if you need a cheat sheet for screen, there you go. I also use the --listen flag because I’m going to connect to this from other machines on my network.
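(The short-short version of that cheat sheet:)

screen -ls        # list sessions
screen -r webui   # re-attach to the session the script starts
# Ctrl-a then d detaches and leaves it running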

Then run ./user-webui.sh once to get the venv created; it will likely fail at this point. Or, if you’re a smarter python user, create the venv yourself.

Then enter the venv.

 . venv/bin/activate

Then ensurepip…

python3.10 -m ensurepip

And now you can fire up the script!

./user-webui.sh

Running LLMs locally and interacting with an API (Using Oobabooga Web UI)

Have you played with ChatGPT yet? Ummm, yeah, who hasn’t!? I have pirate-styled rap battles to make! So let’s get right to the point so we can get back to generating rap-battles as soon as possible!

Today we’re going to run an LLM (Large Language Model) locally on one of our own machines, and we’re going to set it up so that we can interface with it via API, and we’ll even write a small program to test it out.

I have some ideas where I want to take some software I’m building and hook it up to one of these, later I’d like to train it on custom data and then query it. Maybe even have some like real-ish-time data fed to it and then be able to query it in near-real-time too. I also have some ideas for populating it with logs and then being like “yo, tell me what’s up with this machine?”

But yeah, instead of relying on a GPT service, I want to run the LLM myself, using open source tools.

Pre-requisites

Ok, so I’m not going to go deep into the details of the installation – I’m just going to give some pointers. It’s not necessarily rocket science.

First up, we’re going to install a webUI, OobaBooga: https://github.com/oobabooga/text-generation-webui

This is one of the few times I’m going to say the word “windows” on this blog, but I actually installed mine on windows, because it’s a windows box that’s an art and music workstation where I’ve got my decent GPU (for gaming and also for stable diffusion and the associated windoze-y art tools). I followed this YouTube video by @TroubleChute. I even used his opinionated script to automatically install Vicuna!

But you can also just install it with the instructions on the README, which appears to be really straight forward.

The model we’re going to use is Vicuna – you can find it @ https://vicuna.lmsys.org/ – the thing that’s interesting about Vicuna is that it’s trained on crowd-sourced GPT output, and it claims to be 90% as good as GPT, which seems like a lofty statement. But so far it does seem like it’s pretty decent, even if it does parrot a lot of the kind of “walled garden” stuff that ChatGPT says (“As an AI language model, I can’t tell you what you asked me for” kind of stuff.)

Quick tip: After you install it and you’re playing with it on the WebUI, assuming you’ve installed Vicuna… Go to the bottom of the chat and find the radio button for “mode” and switch it to “instruct” and then in the “instruction template” drop down, select “Vicuna” – this will parse the output from the LLM so it makes more sense in context with your queries/conversation/prompts to it.

Configure the start up script to launch the API.

Well, first, in your text-generation-webui git clone – go and check out the ./extensions/api folder and make sure there’s a script there; there should be by default.

Next, we’re going to enable that by adding a flag to our startup script.

In your startup script, add an --extensions api flag, and also a --listen flag (if you want to listen on all addresses and not just your loopback, so you can access the API from your LAN/WAN).

So now, for example, mine looks like:

python server.py --extensions api  --listen --auto-devices --chat --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128

Now your api should be ready to query.

Calling the API

Then I whipped together a node.js script (with ChatGPT’s help!) to query the API.

I’ve got the script here in this github gist.

Lucky you, the script asks for suggestions for cheese. If you ask me, I’d tell you you have to have a Bayley Hazen Blue, or maybe the Cabot Clothbound, both from Jasper Hill Farms (they’ve got the raddest cheese cave, super good cheese).

And you can just run that with

node oobabooga_api_query.js

And away you go. Or if you don’t like Node, just have ChatGPT re-write it for you in assembly or something!

(The script still needs a little help parsing, but it’s close! I think it’s kind of missing the kind of “mode” + “template” from the UI, so that’s next to figure out)

B-b-b-bonus hip hop battle

Just for you, I asked it to:

write a rap battle about api clients written in node.js vs. assembly

And it gave me:

Verse 1 (Node.js):

I'm Node, I'm fast and sleek,
When it comes to APIs, I can't be beat!
With JavaScript at my core,
I make building apps a breeze, no need to feel bore-ot.

Chorus:

API clients, we rule the land,
Node.js or Assembly, which one takes the stand?
We may not be as old school,
But when it comes to speed and coolness, we'll never lack cool.

Verse 2 (Assembly):

I may be older than you think,
But don't underestimate me, I still have some tricks.
With raw power and precision,
My code runs smoothly without any friction.

Chorus:

API clients, we rule the land,
Node.js or Assembly, which one takes the stand?
We may not be as old school,
But when it comes to speed and coolness, we'll never lack cool.

Outro:

In the end, it doesn't matter who wins this battle,
Because both Node.js and Assembly are great for different purposes, so let's celebrate!

Thanks for your “everyone gets an award!” style indifference LLM!

…But not too shabby for something running on my local!

Chainsaw CNI -- Modify container networking at runtime

Introducing: Chainsaw CNI

The gist of Chainsaw CNI (brum-brum-brum-brum-brrrrrrrrr) is that it’s a CNI plugin that runs in a CNI chain (more on that soon), and it allows you to run arbitrary ip commands against your Kubernetes pods to either manipulate or inspect networking. You can do this at run-time by annotating a pod with the commands you want to run.

For example, you can annotate a pod with:

k8s.v1.cni.cncf.io/chainsaw: >
      ["ip route","ip addr"]

And then get the output of ip route and ip addr for your pod.

I named it Chainsaw because:

  • It works using CNI Chains.
  • It’s powerful, but kind of dangerous.

Today, we’re going to:

  • Talk about why I made it.
  • Look at what CNI chains are.
  • See what the architecture is comprised of.
  • And of course, engage the choke, pull the rope start and fire up this chainsaw.

We’ll be using it with network attachment definitions – that is, the custom resource type that’s used by Multus CNI.

Why do you say it’s dangerous? Well, like a chainsaw, you can do permanent harm to something. You could totally turn off networking for a pod. Or, potentially, you open up a way for some user of your system to do something more privileged than you thought. I’m still thinking about how to better address this part, but for now… I’d advise that you use it carefully, and in lab situations rather than production, before these aspects are more fully considered.

Also, as an aside… I am a physical chainsaw user. I have one and, I use it. But I’m appropriately afraid of it. I take a long long time to think about it before I use it. I’ve watched a bunch of videos about it, but I really want to take a Game Of Logging course so I can really operate it safely. Typically, I’m just using my Silky Katanaboy (awesome Japanese pull saw!) for trail work and what not.

Last but not least, a quick disclaimer: This is… a really new project. So it’s missing all kinds of stuff you might take for granted: unit tests, automatic builds, all that. Just a proof of concept, really.

Why, though?

I was originally inspired by hearing this particular discussion:

Person: “Hey I want to manipulate a route on a particular pod”

Me: “Cool, that’s totally possible, use the route override CNI” (it’s another chained plugin!)

Person: “But I don’t want to manipulate the net-attach-def, there’s tons of pods using them, and I only want to manipulate for a specific site, so I want to do it at runtime, adding more net-attach-defs makes life harder”.

Well, this kinda bothered me! I talked to a co-worker who said “Sure, next they’re going to want to change EVERYTHING at runtime!”

I was thinking: “hey, what if you COULD change whatever you wanted at runtime?”

And I figured, it could be a really handy tool, even if just for CNI developers, or network tinkerers as it may be.

CNI Chains


   ┌──────────────────┐                   ┌────────────────┐
   │                  │                   │                │
   │                  │   ┌───────────┐   │                │
   │   CNI Plugin A   │   │           │   │  CNI Plugin B  │
   │                  ├───► cni result├───►                │
   │                  │   │           │   │                │
   │                  │   └───────────┘   │                │
   └──────────────────┘                   └────────────────┘

CNI chains are… sometimes confusing to people. But, they don’t need to be, it’s basically as simple as saying, “You can chain as many CNI plugins together as you want, and each CNI plugin gets all the CNI results of the plugin before it”

This functionality was introduced in CNI 0.3.0 and is available in all later versions of CNI, naturally.

You can tell if you have a CNI plugin chain by looking at your CNI configuration, if the top level JSON has the "type" field – then it’s not a chain.

If it has the "plugins": [] array – then it’s a chain of plugins, and will run in the order within the array. As of CNI 1.0, you’ll always be using the plugins field, and always have chains, even if a “chain of one”.

Why do you use chained plugins? The best example I can usually think of is the Tuning Plugin. Which allows you to set network sysctls, or manipulate other parameters of networks – such as setting an interface into promiscuous mode. This is done typically after the work of your main plugin, which is going to do the plumbing to setup the networking for you (e.g. say, a vxlan tunnel, or a macvlan interface, etc etc).
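As a concrete sketch (a hypothetical config, not one from the project), a macvlan plugin chained with the tuning plugin to set a sysctl might look like this – the first entry plumbs the interface, the second adjusts it:

{
  "cniVersion": "0.4.0",
  "name": "macvlan-tuned-chain",
  "plugins": [
    {
      "type": "macvlan",
      "master": "eth0",
      "ipam": {
        "type": "host-local",
        "subnet": "192.0.2.0/24"
      }
    },
    {
      "type": "tuning",
      "sysctl": {
        "net.core.somaxconn": "500"
      }
    }
  ]
}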

The architecture

Not a whole lot to say, but it’s a “sort of thick plugin” – thick CNI plugins are those that have a resident daemon, as opposed to “thin CNI plugins”, which run as a one-shot (all of the reference CNI plugins are one-shots). But in this case, we just use the resident daemonset for looking at the log output – for inspecting our results.

Other than that, it’s similar to Multus CNI in that it knows how to talk to the k8s API and get the annotations, and it uses a generated kubeconfig to authorize itself against the k8s API.

Let’s get to using it!

Requirements:

  • A k8s cluster, the newer the better.
  • Multus CNI must be installed

That’s about it. Don’t use a production cluster ;)

So go ahead and clone dougbtv/chainsaw-cni.

Then create the daemonset with:

kubectl create -f deployments/daemonset.yaml

NOTE: Are you an openshift user? Use the deployments/daemonset_openshift.yaml deployment instead :thumbsup:

Now, let’s create a net-attach-def which implements chainsaw in a chain – note the plugins array!

Also note the use of the special token CURRENT_INTERFACE which will use the current interface name as opposed to you having to know it in advance.

---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: test-chainsaw
spec:
  config: '{
    "cniVersion": "0.4.0",
    "name": "test-chainsaw-chain",
    "plugins": [{
      "type": "bridge",
      "name": "mybridge",
      "bridge": "chainsawbr0",
      "ipam": {
        "type": "host-local",
        "subnet": "192.0.2.0/24"
      }
    }, {
      "type": "chainsaw",
      "foo": "bar"
    }]
  }'
---
apiVersion: v1
kind: Pod
metadata:
  name: chainsawtestpod
  annotations:
    k8s.v1.cni.cncf.io/networks: test-chainsaw
    k8s.v1.cni.cncf.io/chainsaw: >
      ["ip route add 192.0.3.0/24 dev CURRENT_INTERFACE", "ip route"]
spec:
  containers:
  - name: chainsawtestpod
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine

Next, check which node the pod is running on with:

kubectl get pods -o wide

You can then find the output of the ip commands from the chainsaw daemonset pod that is running on that node, e.g.

kubectl get pods -n kube-system -o wide | grep -iP "status|chainsaw"

Then look at the logs for the daemonset pod that correlates to the node on which your pod resides, for example:

kubectl logs kube-chainsaw-cni-ds-kgx69 -n kube-system

You’ll see that we have added a route to 192.0.3.0/24 and then show the IP route output!

So my results look like:

Detected commands: [route add 192.0.3.0/24 dev CURRENT_INTERFACE route]
Running ip netns exec 901afa16-48e7-4f22-b2b1-7678fa3e9f5e ip route add 192.0.3.0/24 dev net1 ===============


Running ip netns exec 901afa16-48e7-4f22-b2b1-7678fa3e9f5e ip route ===============
default via 10.129.2.1 dev eth0 
10.128.0.0/14 dev eth0 
10.129.2.0/23 dev eth0 proto kernel scope link src 10.129.2.64 
172.30.0.0/16 via 10.129.2.1 dev eth0 
192.0.2.0/24 dev net1 proto kernel scope link src 192.0.2.51 
192.0.3.0/24 dev net1 scope link 
224.0.0.0/4 dev eth0