02 Apr 2025
A couple of summers ago, my dad told me a story about a customer in the showroom of his sign shop. The customer’s kid, probably prompted by the CRT monitor and yellowed plastic of a late-1990s eMachines computer sitting on a desk in the showroom, announced…
Hey look Mom! It’s an antique computer!

Which I found absolutely hilarious, but… I didn’t realize that PC was still alive – it’s the workstation that operates a CNC router that’s a core part of my dad’s business, Briar Hill Signworks (don’t worry, his website has a 90s feel to it too! But the signs are awesome). He specializes in what I’d consider New England-style carved, gold-leafed wooden signs. They’re gorgeous, and he’s really perfected the craft – as has his partner, a painter and (quite masterful!) gilder – though he’s quite modest about it overall.
So – my next thought was “OH MY. You are still running that same PC to operate the CNC router?” It’s a miracle, a damn miracle, that the workstation PC is still running. That eMachines computer has been there for a solid 25 years, dutifully running Windows 98 in order to run some proprietary software that, I later found out, was actually built for Windows 95. I knew it was time to replace that thing, and I told my dad I’d come try to help him when I found some time. Two summers and almost a whole winter passed before I got a fateful call from my dad:
I can’t get files onto the antique computer over the network.
Uh oh – that means production was DOWN at my dad’s shop. Which is why I bring you a tale of this retro computing challenge – quite a departure from my typical tales of open source software, complete with decades-old software and hardware, hardware license-key dongles, and all kinds of other gory stuff.
The eMachines situation.
It’s got an Intel 266MHz processor running Windows 98 SE; I believe it was originally a sub-$500 machine in maybe 1999. It’s got a floppy disk drive and a CD-ROM drive, both of which have since failed and are inoperable. Amazingly, it’s got a NIC attached to ethernet, and my dad’s been sending it files (from another machine, one where he uses more-or-less modern software to design the signs) using Windows File Sharing.
The router is connected over a DB-25 cable which is converted to DB-9 and works over a serial COM port (or that’s how it was last configured, so I kept it that way!). My dad primarily uses two pieces of software on this machine, CASMate and Enroute – CASMate is like a CAD for sign design, and Enroute is used for creating the tool paths (the instructions to send to the CNC router); it also has the drivers to actually communicate with the router itself. But you see – this was rather expensive software in its time, and it’s protected with a software protection dongle. I think specifically a HASP (which stands for “Hardware Against Software Piracy”) dongle, one that works over a DB-25 LPT (parallel printer) port.

This is the HASP (attached to the new PC workstation), what a relic of a different era.
My dad’s router is a Vytek Rebel 3D. He bought it in 1996, and it was spiffy for its time – and honestly still is to this day in a lot of ways (minus the outdated software stack). It’s got a 50x50” addressable table with a bit more table outside of that, and the thing is a tank – hell, it’s nearly 30 now. See – it’s a 3D router. Not 2.5D. With 2.5D, the tool can move in X & Y at the same time, but Z only independently – with a 3D machine, you can move in all three axes at once. That’s critical for making the beautiful v-groove lettering of a carved sign. I thought this might be a software limitation, but nope, it’s apparently a hardware limitation too – there are mechanisms for true Z-axis movement, whereas a 2.5D machine can use a solenoid to step along the Z axis.
But, that company doesn’t produce CNC routers anymore, they seem to do CNC laser stuff these days, and… It seems non-trivial to replace the head unit on it.
Growing up, my father started this business in a woodshop that was an outbuilding to our home, and he ran the CNC router in our garage before he moved the business to a (rather scenic) red barn in Sutton, New Hampshire (Google street view). As a budding young tech person, I was STOKED for the CNC router to arrive. My next-door neighbor was also pretty excited, and he RAN to the house to let us know the flatbed delivering the router was pulling up. The CNC router seemed like pure magic – fascinating to watch it carve a sign. When my dad bought it, it came with a service where someone came to your location to train you how to use it. I begged my dad to let me stay home from school so that I could also learn how to use it. He said no – if he could go back in time knowing I’d be there to help now, I’ll bet he’d say yes, but… seriously, who needs a know-it-all 15-year-old (who should probably be at school) bugging you when you’re trying to learn something critical for building your business? I still learned a lot along the way (and even worked at the shop for a year, too).

My original game plan was to take the HDD out of the eMachines computer, hook it up with an IDE adapter, `dd` it, and capture an image so that I could run it as a VM. But I explained the risks to my dad – this computer is 25, and in computer years that’s 175, so any operation – including taking the hood off to pop out an HDD – might totally turn this thing to dust.
He didn’t want to do that. Understandably. So I did it the hard way – a stare-and-compare – and I was also lucky enough to get some files off of it over the network. My dad, in desperation, had jiggled the ethernet cable attached to it and got the machine running for a few more jobs before I could make it to the barn to help out. Maybe it was just a dodgy network cable that was the problem all along – who knows, maybe that thing would’ve gone another 25 years? But, regardless.
My dad was losing sleep over this. I get it. It’s production. It’s how you make money. I really looked at it as a fun retro computing challenge, and a good way to help family. But, it’s family and it’s business, and it’s production. Guy was freaked, and I don’t blame him. This kind of stuff also BOTHERS me. I don’t like it when production is down, that bothers my most basic instincts.
In preparation and the first day on site…
I spec’d out a new-school workstation – I was going to have him buy a laptop, but a buddy of mine warned that a parallel- or serial-to-USB converter might be too touchy with the precision instruments, so… I looked for a kind of middle-of-the-road workstation and had him get a Lenovo desktop with a serial and a parallel port – and I also had him buy a smattering of other things at the same time, including PCIe serial and parallel cards, a handful of gender changers for the parallel ports, and some peripherals.
The lamest part of the thing is that I opted to have him use VMware Workstation (which luckily is free to license these days). Without getting into the details, the gist was that my dad is a Windows user, running an old Windows for this machine, on a new Windows – and we needed parallel port support, which is no longer supported in the latest versions, so I may have advised borrowing an old version from archive.org (I didn’t learn that until after the first attempt with the latest version; we settled on a 16.x version).
I figured I’d build out the VM, install the apps, get the dongle working, then try to talk to the router. Easy, right?
Famous last words.
I spent the first day mostly stubbing my toe on the VMware portion, and then guess what I did… Installed old software via CD-ROM drive. Click, bzzt, whirrrrrrr! My dad had saved all the old software and had the CD-ROMs, so I went through installing it all. I even did it twice, because I tried to mess with some Windows settings after the first pass, and… I corrupted the O/S, haha – yeah. I had forgotten to take a snapshot, so I made sure to take snapshots after that.
I used a bunch of help from ChatGPT, it’s great at generating some instructions for what to look for in an old Windoze that I have long LONG forgotten about. Got it hooked up to the file sharing network again, stuff like that. I also ran some troubleshooting scenarios with it.
And… the tour de force of the first day – I got it to talk to the parallel port dongle and got the proprietary software to load reading whatever auth info from the HASP. I knew a lot of things were going just right to have that happen.

Remember that fun visual error? It’s like… a failure to clear the video buffer or something? Looks like a win in Windows Solitaire. I tried to Google the name of this error – apparently, there isn’t one.
My dad’s partner also had brought some really nice healthy lunch, looked sort of like a bento box with lox and veggies, amazing.
I also got it to talk to the router… kind of. Sort of. The router would move to the first position, but when it went to drive the tool down the Z-axis, it would go… excruciatingly slowly. I tried a bunch of combinations of different ports – with the DB-9 adapter and without, onboard parallel and PCIe parallel, that kind of stuff. But I wasn’t getting much further.
I don’t often get to shout “MOVE!” while physically at a workstation, so my dad was subject to many Nick Burns, Your Company’s Computer Guy jokes throughout the process. I’d try a configuration, then, not remembering all the pieces of what to do, my dad would jump in and start generating toolpaths or sending the job to the router. I noticed one weird thing with the job manager: the process we were using seemed different from what my dad normally did. I knew something was different – but what? It would haunt me for a week.
Have to admit, I was bummed to have to put it down at the end of the day, and felt like I was getting close, but… Until it would route a job – it wasn’t over. And as we know, anything can go wrong, at any time.
I explained the problem to a few friends throughout the course of the project, and a good friend mentioned (paraphrased) “the slow Z movement sure smells like a driver issue.” I would keep stewing on that.
But there were things that were retro-tech cool that I wound up doing that were successes in their own right, just for the fun of it – I brought up HyperTerminal, which I hadn’t seen in, like, probably 25 years too. I opened up `regedit`, and boy oh boy – did I hit Windows key + Pause/Break dozens of times to look at the Device Manager.
The second day, the next week…
I made it down to the shop for a second day. 90 minute drive through the Green Mountains, down into the Connecticut River Valley on a March day, complete with slushy roads and bad coffee from a gas station. Side note: How can those like “ground coffee on demand single serve machines”, even Green Mountain Coffee, taste so mediocre? Oh well, I still drank it all morning.
Because one idea that’s just a science experiment isn’t enough, I also advised him to try the latest edition of Enroute – which you can buy on a subscription (which is nice compared to the lump sum with no upgrades, as we know from the rest of this story, huh). So we started the day with that. It wasn’t a cakewalk; in fact, it was more of a stepping-into-dog-piles kind of walk. Granted, their support was really nice (thanks, Jerry!) and we even got a laugh out of support when I showed him Enroute 2.1 running on Windows 98. However, they didn’t have an immediate solution regarding drivers for our machine, so by noontime they’d hit a wall and sent a request to an engineer to get back to us.
An additional benefit of being on the phone with support: a rare opportunity to call my dad “Chief”. No one likes calling their dad “Dad” during a business call, and it’s even more awkward to call him by his, you know, actual name. Let’s just be honest, it seems weird on both fronts. My own father had worked many years with my grandfather, his father, and he encountered this as well – so he called his dad “Chief”. Incidentally, that’s also my dad’s grandfather name – Chief. So, I got to call my dad Chief during the calls.
In parallel, while waiting on support, I was also working through the setup of the virtualized old eMachines box.
It was time to get to basics: Stare and compare. What’s different? Spot the difference between these two pictures. I went step by step through all the stuff I needed to look at. One by one.

This wasn’t exactly what I was comparing, but to give you an idea, it’s actually how the “plate” is defined, how you define where the substrate is on the table.
There was one thing that I knew of and was seeing: there was a dialogue box that showed the path of the drivers, and it was different on the old machine and on the new VM. One was using `C:\enroute\ODrivers` and the other was using `C:\enroute\NDrivers` – I had seen nomenclature during the install for “new drivers” and “old drivers” – guess what? The old machine was using the `ODrivers` directory, so even for the old software, we were using the old drivers.
The thing was… I couldn’t change that path. No way to set it. What was I to do?
Well, a file-contents search for `NDrivers` did the trick – there was a `C:\windows\enroute.ini` – and guess what was there? A driver path, along with a bunch of other options, including coordinates for where your toolbars go and your whole setup. So I tested changing the driver directory.
My dad sees me editing the `.ini` file in Notepad, and he asks:
Dad: So, you type that, and… it does something?
Me: I hope so.
It worked to change the displayed driver directory, so I copied the whole `.ini` file over from the old computer. There was actually a parameter named `UseOldStyle=true` on the old computer.
Things started changing QUICKLY. I called my dad over to run a test.
I knew it was good when the job manager came up to send the job to the CNC router – this time the process was EXACTLY the way it was when my dad would send a job. He knew the interface immediately – must’ve been the `UseOldStyle`; that’s what he knew.
We’d be calling back and forth from the show room where the PC workstation is, and the production floor where the router is in the next room over, I’d be yelling “Ready?” and my dad would respond “READY.” and we’d send the job. This time though, my dad’s instincts kicked in and he removed a checkbox to send the job, he already knew the router was ready.
I watched the progress percentage in the job manager as it sent the job to the router, something like 200KB going over virtualized COM port to the router, in all its 1996 glory.
The router sprung to life. BZZZZT, BRNNNNNNNNNNN, BZZT, BRRRN. Holy smokes…. We got the job to run.
Not gonna lie – I had to hold back a tear. Tears of relief, tears of joy. We got production back online, and at least in a somewhat sustainable fashion that isn’t teetering on a budget eMachines PC. My dad even said “I think I could cry.” It was an extremely incredible moment with my dad. Seriously a cherished moment. I’ve had some moments in my tech career where I was so excited that I did cartwheels in an office (apparently I was younger then, I had gotten Asterisk to talk to a Lucent 5E switch!), and even one time I ran out onto the street and yelled with joy! (I found a problem with a backup that was causing a daily outage for weeks in my days as a telco switch tech). But, I can’t remember another that quite touched me like that.
Then reality hits: MAKE A VM SNAPSHOT NOW. We had success.
There were a few rough edges: some files would cause a BSOD (Wikipedia) when we’d try to import them into Enroute; after watching a Windoze VM bomb out with a BSOD a hundred times, I finally figured out a potion that was “good enough”. He also had a way to set specific `0,0` X,Y coordinates for a home position using CASMate, but I couldn’t get CASMate to talk to the router yet, and there was no obvious way to do it in Enroute, so we wound up working around it by figuring out a different method for defining “the plate” (as in the above photo).
I kind of figured I’d have to give my dad a whole walkthrough later after getting it all together, but I think he got the gist of using the VM – and the rest of the software running in the VM, that he knew. He didn’t need a walkthrough; I just got a video of him carving a sign. I’ve watched it more times than I’m willing to admit.
31 Jan 2025
So, are you familiar with DRA – Dynamic Resource Allocation (k8s docs linked)? It’s for “requesting and sharing resources between pods” – like, if you’ve got a hardware resource you want to use in your pod, say a GPU, or maybe you’re, like, cooler than that and you wanna connect a smart toaster to your pods… Well, you could use DRA to help Kubernetes schedule your pod on a node that has a connected toaster, or, well, yeah, a GPU. Popular for AI/ML, or so I hear – have you heard of it? :)
Today, we’re going to chat about using DRA for networking. I’ve got a pointer and an opinionated tutorial on running through an example DRA driver so you can get your hands on it – so if you’d rather do that, you can skip my rundown: fire up your scroll wheel on turbo or start spamming page down!
DRA vs. Device Plugins
It’s starting to become a popular idea to use DRA for networking. This isn’t necessarily groundbreaking for networking in Kubernetes – you know, we do use hardware devices for networking, and indeed people have been using Kubernetes device plugins to achieve this. In fact, the SR-IOV network operator, which is maintained by folks in the Network Plumbing Working Group (NPWG), uses device plugins to allocate high-performance NICs to pods in Kubernetes. You won’t be surprised to hear that there was a community get-together at the Nvidia HQ in California, back in 2018, to form a working group which came up with the device plugin spec. I attended because, well, I’m a Kubernetes networking guy, and I was also interested in making sure my pods wind up on nodes with available hardware resources for specialized and high-performance networking.
So – why is DRA becoming popular now? Well, device plugins did “get ‘er done” – but they are limited. At the end of the day, device plugins basically require that you have a daemon running on each node advertising the availability of devices to the kubelet, basically as a simple counter – like `chartek.io/toaster=5` if you’ve got 5 smart toasters connected to your node. Your daemon would realize one has been used and decrement the counter. With DRA – we can have a much richer expression of those toasters.
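To make that concrete, with a device plugin the pod just asks for N of an opaque, counted resource – here’s a minimal (hypothetical) pod fragment using the toaster example:

```yaml
# hypothetical: device plugins expose devices as counted extended resources,
# requested just like CPU or memory -- no way to say *which* toaster you want
spec:
  containers:
  - name: breakfast
    image: busybox
    resources:
      limits:
        chartek.io/toaster: 1
```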
Now imagine, you’ve got a variety of toasters. So, it’s like… Just saying 5 toasters isn’t enough. Now you want to be able to use a toaster that has a bagel function, or one with computer vision to analyze your bread type, and actually, there’s even toasters that will toast a gosh dang hot dog.

Now here’s the thing – if you need to divvy up those resources, and share that technology between multiple pods… You can do that with DRA. So if you’ve got a need to have a pod using the hot dog slot and one using the bun slot – you can do that. So, you could setup a resource class to divide the methods of your device, like so:
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClass
metadata:
  name: toaster-bun-slot
driverName: burntek.io/bun-toaster
---
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClass
metadata:
  name: toaster-hotdog-slot
driverName: burntek.io/hotdog-warmer
Whereas with device plugins – you kind of have to allocate the whole toaster.
This might seem ridiculous (because it is), but there’s an entire effort dedicated to ensuring that hot dog toasters work with Kubernetes – I mean, there’s a whole effort dedicated to “Structured parameters for DRA”, which I believe is related to the idea of allocating multiple workloads to a GPU. Say you have a GPU with 48 gigs of VRAM: if one workload claims it but only uses 12 gigs, the rest of the VRAM is kind of going to waste – so you could allocate multiple workloads to it.
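For flavor: in the structured-parameters API (resource.k8s.io/v1beta1, where ResourceClass evolved into DeviceClass), you can select devices with CEL expressions over their published attributes – a sketch with illustrative names, not a definitive manifest:

```yaml
# sketch, assuming the v1beta1 structured-parameters API; driver name is illustrative
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: hotdog-capable
spec:
  selectors:
  - cel:
      expression: device.driver == "burntek.io"
```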
Think of it this way…
Device plugins: Ask for a toaster, you get a toaster, that’s it.
DRA: Ask for a toaster, and the toasting technologies, and you get the precise toaster you need, allocated at scheduling time, and shared with the right pods.
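A claim for just the hot dog slot might look roughly like this (the toaster names are, of course, fictional; the shape follows the v1beta1 ResourceClaim API):

```yaml
# hypothetical ResourceClaim: ask for the hot dog slot specifically
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: lunch-rush
spec:
  devices:
    requests:
    - name: hotdog-slot
      deviceClassName: toaster-hotdog-slot
```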
Why DRA for networking?
See – this is what’s cool about Kubernetes: it’s an awesome scheduler – it gets your workloads running in the right places. But it doesn’t know a whole lot about networking. For the most part it just knows the IP addresses of your pods, and indeed it can proxy traffic to them with kube-proxy, but… for the most part, Kubernetes doesn’t want to mess with infrastructure-specific know-how. Which is why we’ve got CNI – the Container Network Interface. With CNI, we can kind of free Kubernetes from having infrastructure-specific know-how, and let your CNI plugin do all the heavy lifting.
Device plugins, while they could do the job generally, have only that counter. This was also a big challenge with SR-IOV. You see, device plugins kind of weren’t really designed for networking either, and here’s the thing…
The design of device plugins was weighted towards allocating compute devices, like GPUs and FPGAs. SR-IOV network devices don’t need just allocation, they also need configuration.
So, the Network Plumbing Working Group put together “The Device Info Spec” [github].
This allowed us to store some configuration, laid down on disk on the nodes, so that the terminal CNI plugin could pick up that configuration information – which we needed to do at the time. But… it’s got a lot of moving parts, including the use of Multus CNI as an intermediary to pass that information along. I’m a Multus CNI maintainer, and I sort of wince whenever I have to go near that code, to be frank with you. And when I say “frank” – I don’t mean frankfurters (actually, yes I do).
Using DRA for these kinds of cases represents a way to potentially simplify this: we can have customized parameters that go all the way down, without needing some kind of intermediary.
But, there’s some potential downsides…
Here’s the other part… This might just totally bypass CNI. Which, basically every Kubernetes community user/operator/administrator really relies on today. It’s THE way that we plumb network interfaces into pods. It’s kind of the elephant in the room to me, there’s sort of two sides to it…
CNI was designed to be “container orchestration agnostic”—not Kubernetes-specific. That was a smart, community-friendly move at the time, but it also means CNI doesn’t have deep Kubernetes integrations today. If you’re an “all day every day” kind of Kubernetes developer, CNI looks like a total non-sequitur which can only be described as “a total pain in the buns” (not hot dog buns, but, pain in the butt). If you want to hear more about this, check out my lightning talk about this at FOSDEM in 2023.
The other side is… CNI is also extremely modular, and allows components to interoperate. It allows you to customize your clusters and integrate say, your own solutions, with a primary CNI that you choose, and even integrate vendor solutions.
What I’m afraid of is that if we ignore CNI: We’re going to blackbox things from people who do manage their own clusters. We might also blackbox it from people who support those clusters – even in a proprietary public cloud provider context.
CNI provides a lingua franca for this kind of work with networking.
On the bright side…
We might have a better world coming! And there’s a lot of people working VERY hard to make it happen.
I think the CNI DRA driver is probably the thing that will help us iron out a bunch of these kinks.
The primary author and legendary rad developer, Lionel Jouin has been doing A LOT in this space, and has been putting together some excellent PoCs for quite a while to really explore the space.
Lionel was also pivotal in getting a “resource claim status” enabled – see: [kubernetes/enhancements#4817](https://github.com/kubernetes/enhancements/issues/4817) & [kubernetes/kubernetes#128240](https://github.com/kubernetes/kubernetes/pull/128240)
Keep your eyes on the DRA CNI driver repo and track it – and, best yet – think about contributing to it.
Also, you should check out this KEP to support dynamic device provisioning from Sunyanan Choochotkaew which also helps to address some challenges found when working through the DRA CNI driver.
Other resources to check out…
Let’s get onto the tutorial!
Pre-reqs…
- Fedora (but you could adapt it to your own system)
Get docker installed…
sudo dnf -y install dnf-plugins-core
sudo dnf-3 config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl start docker
sudo systemctl enable docker
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker ps
Install kind:
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.24.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
kind version
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
kubectl version
And you’ll need `make`.
Install helm (dangerously!)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
And then we’re going to want to spin up a registry in kind. I use the instructions from kind – which include a script; I ran it without modification. Push an image to localhost:5000/foo/bar:quux to see if it works.
Install go.
wget https://go.dev/dl/go1.23.4.linux-amd64.tar.gz
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.23.4.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
Spinning up the DRA example driver.
This is basically right from the DRA example driver README, but with my own experience and install steps for the tools above, of course… Let’s check it out to see the basics.
Then I checked out…
git clone https://github.com/kubernetes-sigs/dra-example-driver.git
cd dra-example-driver
From there, I spun up a 1.32 cluster…
Build and install the example driver image…
./demo/build-driver.sh
# You could load or reload it...
kind load docker-image --name dra-example-driver-cluster registry.example.com/dra-example-driver:v0.1.0
# and where its used from...
cat deployments/helm/dra-example-driver/values.yaml | grep registry
And helm install it:
helm upgrade -i \
--create-namespace \
--namespace dra-example-driver \
dra-example-driver \
deployments/helm/dra-example-driver
And check that it works:
kubectl get pod -n dra-example-driver
Now, you’ll see they created resource slices…
kubectl get resourceslice -o yaml
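The output is long; abridged, a slice from the example driver looks something like this – the device names and attributes below are illustrative, and your output will differ:

```yaml
# abridged, illustrative ResourceSlice -- the example driver advertises a pool
# of mock GPUs per node; exact fields in your output will differ
apiVersion: resource.k8s.io/v1beta1
kind: ResourceSlice
spec:
  driver: gpu.example.com
  nodeName: dra-example-driver-cluster-worker
  pool:
    name: dra-example-driver-cluster-worker
  devices:
  - name: gpu-0
    basic:
      attributes:
        uuid:
          string: gpu-18db0e85-99e9-c746-8531-ffeb86328b39
```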
You could also use their spinup, and config, which I borrowed from.
./demo/create-cluster.sh
# You can check out what's in their config with:
cat ./demo/scripts/kind-cluster-config.yaml
Run their example…
kubectl apply --filename=demo/gpu-test{1,2,3,4,5}.yaml
And use their… bashy script to see the results…
#!/bin/bash
for example in $(seq 1 5); do
  echo "gpu-test${example}:"
  for pod in $(kubectl get pod -n gpu-test${example} --output=jsonpath='{.items[*].metadata.name}'); do
    for ctr in $(kubectl get pod -n gpu-test${example} ${pod} -o jsonpath='{.spec.containers[*].name}'); do
      echo "${pod} ${ctr}:"
      if [ "${example}" -lt 3 ]; then
        kubectl logs -n gpu-test${example} ${pod} -c ${ctr} | grep -E "GPU_DEVICE_[0-9]+=" | grep -v "RESOURCE_CLAIM"
      else
        kubectl logs -n gpu-test${example} ${pod} -c ${ctr} | grep -E "GPU_DEVICE_[0-9]+" | grep -v "RESOURCE_CLAIM"
      fi
    done
  done
  echo ""
done
And it’s a bunch of data – stare-and-compare with their repo and results, but… you might have to dig deeper into what actually happened. Like, where do those ResourceSlices come from?
Now you can delete their stuff:
kubectl delete --wait=false --filename=demo/gpu-test{1,2,3,4,5}.yaml
And, let’s remove their cluster:
kind delete cluster --name dra-example-driver-cluster
Network DRA! Lionel’s POC
So! This part is incomplete… I wish I’d gotten further, but I got stuck with mismatched resource versions. DRA went to v1beta1 in K8s 1.32, and… I think that caused me a bunch of heartburn. But I’m keeping this here for posterity.
Now let’s look at Lionel’s project: https://github.com/LionelJouin/network-dra
git clone https://github.com/LionelJouin/network-dra
cd network-dra/
Following his directions I tried `make generate`, which exited with errors. No bigs, let’s carry on.
make REGISTRY=localhost:5001/network-dra
Then, build the customized k8s:
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
git remote add LionelJouin https://github.com/LionelJouin/kubernetes.git
git fetch LionelJouin
git checkout LionelJouin/KEP-4817
git checkout -b lionel-custom
And then you can build it with:
kind build node-image . --image kindest/node:kep-4817
I had:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  DynamicResourceAllocation: true
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri"]
    enable_cdi = true
nodes:
- role: control-plane
  image: kindest/node:v1.32.0
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        runtime-config: "resource.k8s.io/v1beta1=true"
        v: "1"
    scheduler:
      extraArgs:
        v: "1"
    controllerManager:
      extraArgs:
        v: "1"
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "1"
- role: worker
  image: kindest/node:v1.32.0
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "1"
Their cluster…
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  DynamicResourceAllocation: true
containerdConfigPatches:
# Enable CDI as described in
# https://tags.cncf.io/container-device-interface#containerd-configuration
- |-
  [plugins."io.containerd.grpc.v1.cri"]
    enable_cdi = true
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        runtime-config: "resource.k8s.io/v1beta1=true"
    scheduler:
      extraArgs:
        v: "1"
    controllerManager:
      extraArgs:
        v: "1"
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "1"
- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "1"
Lionel’s:
---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: dual
  kubeProxyMode: ipvs
featureGates:
  "DynamicResourceAllocation": true
  "DRAResourceClaimDeviceStatus": true
runtimeConfig:
  "networking.k8s.io/v1alpha1": true
  "resource.k8s.io/v1alpha3": true
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri"]
    enable_cdi = true
  [plugins.'io.containerd.grpc.v1.cri'.cni]
    disable_cni = true
  [plugins."io.containerd.nri.v1.nri"]
    disable = false
nodes:
- role: control-plane
  image: kindest/node:kep-4817
- role: worker
  image: kindest/node:kep-4817
My portmanteau…
---
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: dual
  kubeProxyMode: ipvs
featureGates:
  "DynamicResourceAllocation": true
  "DRAResourceClaimDeviceStatus": true
runtimeConfig:
  "networking.k8s.io/v1alpha1": true
  "resource.k8s.io/v1alpha3": true
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri"]
    enable_cdi = true
  [plugins.'io.containerd.grpc.v1.cri'.cni]
    disable_cni = true
  [plugins."io.containerd.nri.v1.nri"]
    disable = false
  [plugins."io.containerd.grpc.v1.cri".registry]
    config_path = "/etc/containerd/certs.d"
nodes:
- role: control-plane
  image: kindest/node:kep-4817
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        runtime-config: "resource.k8s.io/v1beta1=true"
    scheduler:
      extraArgs:
        v: "1"
    controllerManager:
      extraArgs:
        v: "1"
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "1"
- role: worker
  image: kindest/node:kep-4817
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "1"
- role: worker
  image: kindest/node:kep-4817
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        v: "1"
And:
kind create cluster --name network-dra --config my.cluster.yaml
And I have running pods…
Load the image…
kind load docker-image localhost:5001/network-dra/network-nri-plugin:latest
Install plugins…
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/e2e/templates/cni-install.yml.j2
I had to push to a public registry…
docker tag network-nri-plugin dougbtv/network-nri-plugin:latest
docker push dougbtv/network-nri-plugin:latest
helm install network-dra deployments/network-DRA --set registry=docker.io/dougbtv
You can uninstall with:
helm uninstall network-dra
Now let’s look at the demo…
Most interesting, we want to look at:
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: macvlan-eth0-attachment
spec:
  devices:
    requests:
    - name: macvlan-eth0
      deviceClassName: network-interface
    config:
    - requests:
      - macvlan-eth0
      opaque:
        driver: poc.dra.networking
        parameters:
          interface: "net1"
          config:
            cniVersion: 1.0.0
            name: macvlan-eth0
            plugins:
            - type: macvlan
              master: eth0
              mode: bridge
              ipam:
                type: host-local
                ranges:
                - - subnet: 10.10.1.0/24
Well – look at that, it looks like a YAML-ized version of the macvlan CNI config!
That is, because it is.
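For comparison, here's roughly what that would look like in native CNI JSON form. This is a hand translation of the YAML above, so treat it as a sketch rather than the demo's actual on-disk config:

```json
{
  "cniVersion": "1.0.0",
  "name": "macvlan-eth0",
  "plugins": [
    {
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "ranges": [
          [ { "subnet": "10.10.1.0/24" } ]
        ]
      }
    }
  ]
}
```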
And, the pod has a little something to look at:
resourceClaims:
- name: macvlan-eth0-attachment
  resourceClaimName: macvlan-eth0-attachment
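For context, here's a minimal sketch of where that fragment lives in a full Pod spec. The pod name, container, and image are my own placeholders; only the resourceClaims entries come from the demo:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: macvlan-eth0-demo   # placeholder name
spec:
  containers:
  - name: demo              # placeholder container
    image: alpine
    command: ["sleep", "infinity"]
  resourceClaims:
  - name: macvlan-eth0-attachment
    resourceClaimName: macvlan-eth0-attachment
```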
And then I got stuck – can you get further?
19 Jul 2024
In this article, we're going to use LLaVa (running under ollama) to caption images for a Stable Diffusion training dataset (well, fine tuning in my case; I've usually been baking LoRAs with the Kohya SS GUI).
Something I've been hearing about is that people are using LLaVa to caption their datasets for training Stable Diffusion LoRAs (low-rank adaptations, a kind of fine tuning of a model). And I was like – this would be great: I have a few big datasets, and I have my own ways of adding some info from metadata I might have, but I'd love to get it more detailed, too.
From the page:
LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.
What that really means, in plainer terms, is: you can prompt it with text and images and get text output generated by the LLM, relevant to the imagery you provided.
Previously, I've been using BLIP (huggingface) from within the captioning tools of Kohya SS (my favorite training app of the moment), and then I'd sometimes munge those captions with sed and call it a day.
However, this method using LLaVa really intrigues me, so I wanted to get it set up.
Granted – I did think about whipping up a script to do this using GPT-4 multimodal, and I will also probably try that at some point. But! It's not nearly as fun, nor as rad, as having your own local multimodal setup. I also had put this project in my notes for that purpose too: github.com/jiayev/GPT4V-Image-Captioner. Not to mention, since you own it on your own gear – you get to make your own rules. Naturally, I love cloud computing, but I'm often reminded of Stallman's "can you trust your computer?" essay in the age of centralized applications.
I will also provide my own script to make API queries against ollama, but I'll provide a few links to other tools if you'd rather start with something more full fledged – in this case I just wanted to be able to do my own prompt engineering and have my own method to keep munging my captions. It's not a lot of code, so it's easy to get wrangled in yourself.
Prerequisites
- A machine where you can run ollama, preferably, with a GPU
- I’ll be using Fedora linux, you can adapt to other distros.
Getting LLaVa running with ollama
First, get ollama installed – it's a nicely packaged, opinionated way to get LLMs running in containers, which I quite like. If you wanna learn more about LLaVa on ollama, you can also check out this great youtube video by Learn Data with Mark.
Then I went and pulled this model (you can pull it manually, or it should pull when you do ollama run ...):
https://ollama.com/library/llava:13b
Ok, I go ahead and start ollama serve…
$ screen -S ollamaserve
ollama serve
I’m starting with the 13b param model @ https://ollama.com/library/llava:13b
Then I kick it off with…
ollama run llava:13b
And let’s give it a test drive, I use an example image from a dataset I have going for Adirondack guideboats
>>> please describe this image /path-to/dataset/guideboat/.../example.jpg
Added image '/path-to/dataset/guideboat/.../example.jpg'
The image you've provided appears to be an old black and white photograph. It shows a group of people in boats on calm waters, likely a river or lake. There are several individuals visible, with one person who seems to be actively
rowing or maneuvering the boat. In the background, there is land with trees and vegetation, indicating a natural setting. The image has a vintage appearance due to its monochrome color scheme and grainy texture, which suggests it
could be from an earlier time period, possibly mid-20th century or earlier, judging by the style of clothing and equipment.
Looks like a pure win! I didn’t even look at the image, I just know it’s right, that matches most of the images.
I saw on reddit that taggui now supports LLaVa captioning, so you might want to check it out @ https://github.com/jhc13/taggui/
Captioning images
So it seems like there are two ways to do this…
- From the CLI I can provide a path to an image
- Via the API I can provide a base64 encoded image
We're going to use this library…
https://pypi.org/project/ollama/
And take a look at the API docs…
https://github.com/ollama/ollama/blob/main/docs/api.md
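As a quick sketch of the API path: here's how you might build a payload for ollama's /api/generate endpoint with a base64-encoded image. The function name is my own; the model, prompt, images, and stream fields follow the ollama API docs:

```python
import base64


def build_caption_request(image_path, prompt, model="llava:13b"):
    """Build an ollama /api/generate payload with a base64-encoded image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # ollama expects a list of base64 strings
        "stream": False,        # one complete response instead of chunks
    }
```

You'd POST that dict as JSON to http://localhost:11434/api/generate (ollama's default port), or let the ollama Python library do the equivalent call for you.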
Later, I think we’re going to create a modelfile – for example… https://github.com/ollama/ollama/blob/main/docs/modelfile.md – there we can better define an overall system prompt.
In the meanwhile – we will just confidently emulate it.
For now though, I think I only need a little bit of context for my system prompt, so we’ll just jam it right in.
I used GPT-4 to feed in some info about the library and the API and having it whip me up something quick – this script doesn’t need to be intense, just an intro prompt and then a way to cycle through images and output a caption.
I put my quick scripts up on github @ dougbtv/ollama-llava-captioner
First, you've got to make sure you pip install ollama, it's the only dep…
Then, you just run it with a path to your folder with images…
doug@stimsonmt:~/ai-ml/llava-captioner$ python llava-caption-ollama.py /path/to/dataset/guideboat/images/15_guideboat/
[...]
Processing 005C3BDB-6751-4E98-9CEE-352236552770.jpg (1/1260)...
Generated Caption: Vintage black and white photograph of two people in a small boat on a wavy lake. They appear to be engaged in fishing or enjoying the tranquil water. The background shows a hazy, serene skyline with houses along the shoreline under a cloudy sky.
Completed 0.08% in 2.12s, estimated 2663.50s left.
Processing 00764EBA-A1C4-4687-B45A-226973315006.jpg (2/1260)...
Generated Caption: An old, vintage black and white photograph depicting an early aviation scene. A seaplane is gliding over a serene lake nestled among pine trees and hills in the background, capturing a moment of adventure and exploration amidst nature's tranquility.
Completed 0.16% in 4.16s, estimated 2615.48s left.
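Those progress lines are a simple rate extrapolation; a minimal sketch of the arithmetic (the function name is mine, not from the script):

```python
def progress(done, total, elapsed_s):
    """Return (percent complete, estimated seconds remaining),
    assuming captioning proceeds at a steady per-image rate."""
    pct = done / total * 100.0
    remaining = (elapsed_s / done) * (total - done)
    return pct, remaining
```

With the numbers above, 1 of 1260 images in 2.12s works out to about 0.08% done and roughly 2669 seconds left – the same ballpark as the script's 2663.50s estimate.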
It uses a kind of "pre prompt" to set up the instructions for captioning. I think you should probably tune this: start with the prompt.txt in the root dir of the repo and modify it yourself, and then either change it in place or run the script with a path to where you saved your modified (or new!) prompt.
$ python llava-caption-ollama.py /path/to/dataset/guideboat/images/15_guideboat/ /path/to/my.prompt.txt
In both cases it will save a .txt file in the same folder as your images, with the same base file name (e.g. before the extension).
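That "same base file name" rule is just an extension swap; a tiny sketch (the helper name is my own):

```python
import os


def caption_path(image_path):
    """Derive the sidecar .txt path: same folder, same base name."""
    base, _ext = os.path.splitext(image_path)
    return base + ".txt"
```

For example, caption_path("/dataset/photo.jpg") gives "/dataset/photo.txt".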
In conclusion…
This seems to be more accurate, at least reading as a human, than BLIP captioning. I haven't put together an organized "before/after" on the datasets I've tried this with, but my intuition says it works quite a bit better. I'll try to come back with some results in the future, but until then…
Happy captioning!
18 Jul 2024
We’re going to try to get some AI/ML workloads running using Podman Desktop’s new AI Lab Feature!
Podman Desktop, in the project’s words, is:
an open source graphical tool enabling you to seamlessly work with containers and Kubernetes from your local environment.
If you just want to get on to reference my steps in how I got it rolling, just skip down to the pre-reqs, otherwise…
I was inspired by this post on LinkedIn talking about an AI lab using podman, and since I already run my workloads in podman, I was like "let's try this new podman ai lab thing!" I wanted to see if I could get my current containerized workloads (especially for Stable Diffusion), which are running managed by podman desktop, working with its new AI features. That post especially pumped up this repo about running podman ai lab on openshift AI, and I was really hopeful for managing my workloads, running in Kubernetes, from a GUI that integrated with podman.
However, I found a few limitations for myself and my own use case… I don't currently have access to a lab running OpenShift AI – which requires an OpenShift install to begin with. I'd really rather use my local lab, because I'm trying to get hands on with my environment, and… An OpenShift environment is going to be a little like sipping water from a firehose. So, what about another way to run Kubernetes?
And then I ran into a number of other limitations…
I was originally hoping that I could get my workloads running in K8s, and maybe even in kind (kubernetes in docker) because it’s often used for local dev, but… I’m not sure we’re going to see GPU functionality in kind at the moment (even if someone put together a sweet hack [that’s now a little outdated]).
RECORD SCRATCH: Wait up! It might be possible in KIND! I found an article about K8s Kind with GPUs. I gave it a whirl, and it might be a path forward, but I didn’t have great success right out of the box, I think part of it is that we need a more feature rich runtime for kind, like Antonio’s gist about using CRI-O + KIND.
And, I had trouble getting it to cooperate with GPU allocations for my workloads in general. I know, GUIs can be limited in terms of options, but, at almost every turn I couldn’t easily spin up a workload with GPU access.
Unfortunately, this is kind of a disappointment compared to, say, ollama, which does spin up LLMs using GPU inference quickly, and containerized on my local machine. But! I'll keep tracking it to see how it progresses.
Also, I should note, some of my main gripes are actually with kind + GPU, and only somewhat with Podman Desktop, but, I feel like experiences like ollama are a lot more turnkey in part.
However! Podman Desktop is still really neat, and it’s beautifully made, and overall, mostly intuitive. A few things tripped me up, but maybe you’ll learn from what I did…
Prerequisites
I’m on fedora 39, I happened to install a workstation variant because it’s handy to have the GUI for other ai/ml software I use (notably, kohya_ss, which I apparently need xwindows for)
I also wound up installing a number of other things to get podman + my nvidia 3090 working originally, but! I did document the steps I took in this blog article about using ooba on Fedora so that could be a good reference if something isn’t adding up.
Installing podman desktop
I used: https://podman-desktop.io/downloads to start my install.
So, I’m not a huge flatpak user, I have used it on my steam deck, but, let’s try it on fedora…
flatpak install flathub io.podman_desktop.PodmanDesktop
…You might want to run it as sudo because I had to auth a buuuunch from the GUI.
Simple enough, I hit the windoze key and typed in “podman” from Gnome and there it is “Podman Desktop”
I ran through the setup with all the defaults, just basically me hitting next and then typing in my r00t password a bunch of times.
Then I hit the containers window to see what’s running – ah ha! Indeed, my kohya ss that I have running in podman is showing up in the list, hurray!
Installing kind
So, apparently there’s a bunch of Kubernetes-type features with podman desktop, and it must use KIND – kubernetes-in-docker.
I don’t have this installed on this machine, so, there was a warning icon along the bottom of my podman desktop with a note about kind, I clicked it, and… yep, installed kind.
I ran a kind get clusters from the CLI, but nothing showed, so I checked the docs.
So, let’s try creating a pod…
From the “pod” section of podman I go to create a pod, which, gives me a CLI item to paste:
podman pod create --label myFirstPod
However, this doesn’t appear to have created a kind cluster behind the scenes, weird, so I try using the “Deploy generated pod to Kubernetes” option… Nothing!
And I notice a message Cluster Not Reachable as a status from my Pod section of podman desktop.
So, let’s dig into this…
Let’s use the podman desktop kind docs.
Since I have the kind CLI installed, now I have to create a kind cluster myself.
We can do this through the Settings -> Resources part of desktop.
In the podman box, there's a "create kind cluster" button, and I just used the defaults (which even includes Contour for ingress), and there's a "show logs" portion, which I followed along on.
Now when I run kind get clusters I can see one! Cool.
And lastly, we’re going to need a Kubernetes context
However, without doing anything, I can see at the CLI:
$ kubectl config current-context
kind-kind-cluster
Seems good, and where I used to see Cluster not reachable I now see Connected! Let's test drive it…
Also note that there’s a dark purple bar at the bottom of the podman desktop UI that has a k8s icon and the name of your cluster, you could switch to another cluster from there (even a remote openshift one, too)
Getting a pod running on kind from Podman Desktop
I didn't have good luck at first. I'd run the podman pod create copied from the UI again, then using the ... context menu on my pod named hungry_ramanujan, I'd run Deploy to Kubernetes and use the default options… And I wait, without output… and it's just hanging there for a half hour or better! So I tried running the flatpak interactively with flatpak run io.podman_desktop.PodmanDesktop to see if I got any actionable log output – I didn't.
So I tried another approach…
First, I pulled an image to test with podman pull quay.io/fedora/fedora (I actually used the UI for fun). Then from the images tab, I used the ... context menu to push to kind.
Then, I hit the play button for the fedora image, kept all the defaults, and after I accept it sends me to the container details, where you'll have tabs for summary/logs/inspect/kube/terminal/tty.
The Kube tab has conveniently generated a pod spec for me, which is:
# Save the output of this file and use kubectl create -f to import
# it into Kubernetes.
#
# Created with podman-4.9.4
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-07-19T12:44:38Z"
  labels:
    app: romanticmcclintock-pod
  name: romanticmcclintock-pod
spec:
  containers:
  - env:
    - name: TERM
      value: xterm
    image: quay.io/fedora/fedora:latest
    name: romanticmcclintock
    stdin: true
    tty: true
(you can also do this at the CLI with podman generate kube my-silly-container-name)
Great, now we should be able to hit the rocket ship icon to deploy our pod to kind… you'll get a few options, I kept the defaults and hit deploy…
doug@stimsonmt:~/ai-ml/kohya_ss$ kubectl get pods
NAME READY STATUS RESTARTS AGE
romanticmcclintock-pod 1/1 Running 0 23s
Looks good! Awesome.
Let’s see what it can do with my current workloads!
So I already have the Kohya SS GUI running in a podman container, here’s the bash script I use to kick mine off…
#!/bin/bash
# -e DISPLAY=0:0 \
PORT=7860
screen -S kohya podman run \
-e SAFETENSORS_FAST_GPU=1 \
-e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix \
-v /home/doug/ai-ml/kohya_ss/dataset:/dataset:rw \
-v /home/doug/ai-ml/automatic/models/Stable-diffusion/:/models:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/user:/app/appuser/.cache:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/triton:/app/appuser/.triton:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/config:/app/appuser/.config:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/nv:/app/appuser/.nv:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/keras:/app/appuser/.keras:rw \
-p $PORT:7860 \
--security-opt=label=disable \
--device=nvidia.com/gpu=all \
-i \
--tty \
--shm-size=512m \
--user root \
ghcr.io/bmaltais/kohya-ss-gui
Pay close attention to --device=nvidia.com/gpu=all, it's important because it's using CDI, the container device interface (also check out this nvidia article). It's the way that we say this workload should use a GPU, at the container level.
And then if I inspect this in the podman desktop Kube view, I get this yaml (that I snipped a little):
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: affectionategould-pod
  name: affectionategould-pod
spec:
  containers:
  - args:
    - python3
    - kohya_gui.py
    - --listen
    - 0.0.0.0
    - --server_port
    - "7860"
    - --headless
    env:
    - name: TERM
      value: xterm
    - name: SAFETENSORS_FAST_GPU
      value: "1"
    - name: DISPLAY
      value: :1
    image: ghcr.io/bmaltais/kohya-ss-gui:latest
    name: affectionategould
    ports:
    - containerPort: 7860
      hostPort: 7860
    securityContext:
      runAsGroup: 0
      runAsNonRoot: true
      runAsUser: 0
      seLinuxOptions:
        type: spc_t
    stdin: true
    tty: true
    volumeMounts:
    [..snip..]
  volumes:
  [..snip..]
While this is very very convenient, I’m not sure it’s actually going to be allocated a GPU in my kind cluster – which, might not even be possible. I did find someone proposed a quick hack to add GPU support to kind, and they have a PoC, but it’s on an older Kubernetes. But there’d have to be something there that indicates a request for a GPU, right?
That GPU request seems to be notably missing.
RECORD SCRATCH! Wait, that might be wrong – I found another approach for it that uses the nvidia GPU operator.
Giving a shot at kind + GPU support (which didn’t quite work!)
I followed the steps @ https://www.substratus.ai/blog/kind-with-gpus and…. I didn’t succeed, but! I think there might be hope in this method in the future.
The first step is to use nvidia-ctk to set the runtime to docker, but that's not a choice for us, and I checked the help and found that…
--runtime value the target runtime engine; one of [containerd, crio, docker] (default: "docker")
That might be the death knell for this approach, but I’m going to carry on with skipping the step…
I also set:
accept-nvidia-visible-devices-as-volume-mounts = true
Now, time to create a new kind cluster…
kind create cluster --name kind-gpu-enabled --config - <<EOF
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  image: docker.io/kindest/node@sha256:047357ac0cfea04663786a612ba1eaba9702bef25227a794b52890dd8bcd692e
  # required for GPU workaround
  extraMounts:
  - hostPath: /dev/null
    containerPath: /var/run/nvidia-container-devices/all
EOF
Which only worked for me as root.
And I fire it up… The cluster comes up, but there's a workaround listed in the article; I skip it (it makes a symlink for /sbin/ldconfig.real).
And we're going to need helm, with dnf install helm…
Then, let’s get the operator going…
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || true
helm repo update
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator --set driver.enabled=false
Aaaaand, all my pods hung. I did finally find this k8s event:
root@stimsonmt:~# kubectl describe pod gpu-feature-discovery-58k7c -n gpu-operator | grep -P -A5 "^Events:"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m39s default-scheduler Successfully assigned gpu-operator/gpu-feature-discovery-58k7c to kind-gpu-enabled-control-plane
Warning FailedCreatePodSandBox 3s (x17 over 3m39s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configured
So I tried to hack in…
root@stimsonmt:/home/doug/ai-ml/helm# nvidia-ctk runtime configure --runtime=crio --set-as-default
But it had no effect, we’re not really using crio (and it resulted in the same message, probably because crio hasn’t kicked in, you know, at all).
I did note that SIG-Net dude aojea has a cool gist on CRIO + KIND, maybe for another day.
Installing podman AI lab
So, for this, I went and followed this article from Red Hat developer’s blog about Getting started with Podman AI Lab
Honestly, super simple: there's a "puzzle piece" icon/UI tab that's for Extensions – just do a search for "lab" in the "catalog" tab, and… click install. That's it!
Now, let’s give it a test drive…
Following the steps from the "run LLMs locally" podman desktop page, I go to download a model from the Catalog page, and… they are listed as… CPU models! Nooooo! We want to use GPUs.
I download ibm/merlinite-7b-GGUF only because I recognize it from Instruct Lab. So let's give it a try and see what happens.
I find that there’s a “Services” section of Podman Desktop AI Lab, and it says that:
A model service offers a configurable endpoint via an OpenAI-compatible web server,
Sounds like that should work, so I choose to start the model service with the "New Model Service" button, which shows the model I downloaded in a drop down. I get a warning that I don't have enough free memory, but I check free -m and I actually have 26 gigs free, while it reports I have 346 megs.
It pulls the ghcr.io/containers/podman-desktop-extension-ai-lab-playground-images/ai-lab-playground-chat:0.3.2 image for me.
It shows I have an endpoint @ http://localhost:35127/v1 and it gives me some "client code" (a curl command), and boo hoo… it lists it with a CPU icon.
curl --location 'http://localhost:35127/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"content": "You are an artist from the latter half of the 1800s, skilled at explaining programming concepts using vocabulary associated with art from the Academism, Realism and Impressionist periods.",
"role": "system"
},
{
"content": "Describe what a pointer is",
"role": "user"
}
]
}'
I give this a run, and… unfortunately, it’s…. verrrrry slow.
Ah, dear interlocutor! A pointer in the realm of programming is akin to a skilled craftsman's chisel, guiding and shaping the very essence of data. It is a variable that stores the memory address of another variable, allowing us to indirectly access and manipulate the value it points to. In the spirit of the Impressionist period, one could liken it to a painter's brush, subtly yet powerfully influencing the overall composition of our code. [...more drivel snipped!...]
Typical LLM yammering, but, it works!!
12 Jan 2024
Today we’re going to get a demo for Kubernetes Networking Interface (KNI) rolling! If you wanna skip the preamble and get to the terminal, skip down to the requirements.
What is KNI? It’s a “Foundational gRPC Network API specific for Kubernetes”, and it “revisit[s] past decisions for location of pod network setup/teardown”, and in my words, it’s an amazing way for us to think about deeper integration for network plugins in the context of Kubernetes, a kind of next level for what we can only solve today using CNI (the container networking interface).
Mike Zappa, sig-network chair, CNI maintainer, containerd contributor and sport climber/alpinist and trail runner, has been spearheading an effort to bridge the gap for network implementations for Kubernetes, and if you want to see some more of what Zappa is thinking, check out this presentation: “KNI [K8s Network Interface]”.
And, maybe if I'm lucky, and Mike likes crack climbing, someday I can get him to climb Upper West in the "tough schist" of my neighborhood. I'll just hike to the base though; I'm just your average granola crunching telemark hippy, but I love alpine travel myself.
Remember when I gave a talk on “CNI 2.0: Vive la revolution!”, I wrote that:
Did you know CNI is container orchestration agnostic? It’s not Kubernetes specific. Should it stay that way? People are looking for translation layers between Kubernetes and CNI itself.
What I'm talking about is that the Container Networking Interface (CNI) (which I know, and love!) is not purpose built for Kubernetes. It's orchestration engine agnostic – remember when people talked about different orchestration engines for containers? Like Mesos, or, wait? I can't think of more for the life of me… It's for a good reason I can't think of another right now: Kubernetes is the container orchestration engine. CNI predates the meteoric rise of Kubernetes, and CNI has lots of great things going for it – it's modular, it has an ecosystem, and it's got a specification that I think is simple to use and to understand. I love this. But, I'm both a network plugin developer as well as a Kubernetes developer; I want to write tools that both do the networking jobs I need to do, but also integrate with Kubernetes. I need a layer that enables this, and… KNI sure looks like just the thing to bring the community forward. I think there's a lot of potential here for how we think about extensibility for networking in Kubernetes with KNI, and it might be a great place to do a lot of integrations for Kubernetes, such as Kubernetes Native Multi-networking [KEP], dynamic resource allocation, and maybe even gateway API. My gears are turning on how to use it to further the technology created by the Network Plumbing Working Group community.
As a maintainer of Multus CNI, which can provide multiple interfaces to pods in Kubernetes by allowing users the ability to specify CNI configurations in Kubernetes custom resources, we have a project which does both of these things:
- It can execute CNI plugins
- It can operate within Kubernetes
Creative people that are looking to couple richer Kubernetes interaction with their CNI plugins look at Multus as a way to potentially act as a Kubernetes runtime. I love this creative usage, and I encourage it as much as it makes sense. But, it's not really what Multus is designed for; Multus is designed for multi-networking specifically (e.g. giving multiple interfaces to a pod). It just happens to do both of these things well. What we really need is something that's lofted up another layer, with deeper Kubernetes integration – and that something… Is KNI! And this is just the tip of the iceberg.
But on to today: Let’s get the KNI demo rocking and rolling.
Disclaimer! This does use code and branches that could see significant change. But, hopefully it’s enough to get you started.
Requirements
- A machine running Fedora 38 (should be easy enough to pick another distro, though)
- A basic ability to surf around Kubernetes.
- A rooibos latte (honestly you don't need coffee for this one, it's smooth sailing)
What we’re gonna do…
For this demo, we actually replace a good few core components with modified versions…
- The Kubelet as part of Kubernetes
- We’ll replace containerd with a modified one.
- And we’ll install a “network runtime”
Under KNI, a “network runtime” is your implementation where you do the fun stuff that you want to do. In this case, we just have a basic runtime that Zappa came up with that calls CNI. So, it essentially exercises stuff that you should already have, but we’ll get to see where it’s hooked in when we’ve got it all together.
Ready? Let’s roll.
System basics setup
First I installed Fedora 38. And then, you might want to install some things you might need…
dnf install -y wget make task
Then, install go 1.21.6 (grab the tarball from go.dev/dl if you don't have it already)
sudo tar -C /usr/local -xzf go1.21.6.linux-amd64.tar.gz
Setup your path, etc.
Install kind
go install sigs.k8s.io/kind@v0.20.0
Install docker
…From their steps.
And add yourself as a user from the post install docs
Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo mv kubectl /usr/bin/
sudo chmod +x /usr/bin/kubectl
Make sure kind can run with:
kind create cluster
kubectl cluster-info --context kind-kind
kind delete cluster
Now let’s spin up Zappa’s Awesomeness
Alright, next we’re going to install and build from a bunch of repositories.
Demo repo
Mike’s got a repo together which spins up the demo for us… So, let’s get that first.
Update! There’s a sweet new way to run this that saves a bunch of manual steps, so we’ll use that now.
Thanks Tomo for making this much easier!!
git clone https://github.com/MikeZappa87/kni-demo.git
cd kni-demo
task 01-init 02-build 03-setup
Let it rip and watch the KNI runtime go!
Then create a pod…
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
spec:
  containers:
  - name: samplepod
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine
EOF
Let it come up, and you can see the last log item from kni…
$ docker exec -it test1-worker systemctl status kni
● kni.service
[...]
Jan 12 18:20:16 test1-worker network-runtime[576]: {"level":"info","msg":"ipconfigs received for id: e42ffb53c0021a8d6223bc324e7771d31910e6973c7fea708ee3f673baac9a1f ip: map[cni0:mac:\"36:e0:1f:e6:21:bf\" eth0:ip:\"10.244.1.3\" mac:\"a2:19:92:bc:f1:e9\" lo:ip:\"127.0.0.1\" ip:\"::1\" mac:\"00:00:00:00:00:00\" vetha61196e4:mac:\"62:ac:54:83:31:31\"]","time":"2024-01-12T18:20:16Z"}
Voila! We can see that KNI processed, it has all that information about the pod networking which it’s showing us!
But, this is only the tip of the iceberg! While we’re not doing a lot here other than saying “Cool, you can run Flannel”, for the next episode… We’ll look at creating a Hello World for a KNI runtime!