Using LLaVA for captioning Stable Diffusion fine tuning datasets

In this article, we’re going to use LLaVa (running under ollama) to caption images for a Stable Diffusion training dataset, well fine tuning in my case, I’ve usually been baking LoRAs with the Kohya SS GUI.

Something I’ve been hearing about is that people are using LLaVa to caption their datasets for training Stable Diffusion LoRAs (low rank adapations, a kind of fine tuning of a model). And I was like – this would be great, I have a few big datasets, and I have my own ways of adding some info from metadata I might have – but, I’d love to get it more detailed, too.

From the page:

LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.

What the really means in plainer terms is – you can prompt it with text and images and get text output generated by the LLM, relevant to the imagery you provided.

Previously, I’ve been using BLIP (huggingface) from within the captioning tools of Kohya SS (my favorite training app of the moment), and then I’d sometimes munge those captions with sed and call it a day.

However, this really method using LLaVa really intruiges me, so I wanted to get it setup.

Granted – I did think about whipping up a script to do this using GPT-4 multimodal, and I will also probably try that at some point. But! It’s not nearly as fun, nor as rad, as having your own local multimodal setup. I also had put this project in my notes for that purpose too: github.com/jiayev/GPT4V-Image-Captioner. No tto mention, since you own it on your own gear – you get to make your own rules. Naturally, I love cloud computing, but I’m often reminded of Stallman’s “can you trust your computer?” essay in the age of centralized applications.

I will also provide my own script to make API queries against ollama, but, I’ll provide a few links to other tools if you’d rather start with something more full fleged – in this case I just wanted to be able to do my own prompt engineering and having my own method to keep munging my captions. It’s not a lot of code, so it’s easy to get wrangled in yourself.

Prerequisites

  • A machine where you can run ollama, preferably, with a GPU
  • I’ll be using Fedora linux, you can adapt to other distros.

Getting LLaVa running with ollama

First get ollama installed, it’s a nicely packaged opionated way to get LLMs running in containers, which I quite like. If you wanna learn more about LLaVa on ollama, you can also check out this great youtube video by Learn Data with Mark.

Then I went and pulled this model (you can pull it manually, or it should pull when you do ollama run ...):

https://ollama.com/library/llava:13b

Ok, I go ahead and start ollama serve…

$ screen -S ollamaserve
ollama serve

I’m starting with the 13b param model @ https://ollama.com/library/llava:13b

Then I kick it off with…

$ ollama run llava:13b

And let’s give it a test drive, I use an example image from a dataset I have going for Adirondack guideboats

>>> please describe this image /path-to/dataset/guideboat/.../example.jpg
Added image '/path-to/dataset/guideboat/.../example.jpg'
 The image you've provided appears to be an old black and white photograph. It shows a group of people in boats on calm waters, likely a river or lake. There are several individuals visible, with one person who seems to be actively
rowing or maneuvering the boat. In the background, there is land with trees and vegetation, indicating a natural setting. The image has a vintage appearance due to its monochrome color scheme and grainy texture, which suggests it 
could be from an earlier time period, possibly mid-20th century or earlier, judging by the style of clothing and equipment.

Looks like a pure win! I didn’t even look at the image, I just know it’s right, that matches most of the images.

Captioning images: Some existing tools

I saw on reddit that taggui now supports LLaVa captioning, so you might want to check it out @ https://github.com/jhc13/taggui/

Captioning images

So seems like there’s two ways to do this…

  1. From the CLI I can provide a path to an image
  2. Via the API I can provide a base64 encoded image

we’re going to use this library…

https://pypi.org/project/ollama/

And take a look at the API docs…

https://github.com/ollama/ollama/blob/main/docs/api.md

Later, I think we’re going to create a modelfile – for example… https://github.com/ollama/ollama/blob/main/docs/modelfile.md – there we can better define an overall system prompt.

In the meanwhile – we will just confidently emulate it.

For now though, I think I only need a little bit of context for my system prompt, so we’ll just jam it right in.

I used GPT-4 to feed in some info about the library and the API and having it whip me up something quick – this script doesn’t need to be intense, just an intro prompt and then a way to cycle through images and output a caption.

Using my dougbtv/ollama-llava-captioner script

I put my quick scripts up on github @ dougbtv/ollama-llava-captioner

First, you’ve got to make sure you pip install ollama, it’s the only dep…

Then, you just run it with a path to your folder with images…

doug@stimsonmt:~/ai-ml/llava-captioner$ python llava-caption-ollama.py /path/to/dataset/guideboat/images/15_guideboat/
[...]
Processing 005C3BDB-6751-4E98-9CEE-352236552770.jpg (1/1260)...
Generated Caption:  Vintage black and white photograph of two people in a small boat on a wavy lake. They appear to be engaged in fishing or enjoying the tranquil water. The background shows a hazy, serene skyline with houses along the shoreline under a cloudy sky.
Completed 0.08% in 2.12s, estimated 2663.50s left.
Processing 00764EBA-A1C4-4687-B45A-226973315006.jpg (2/1260)...
Generated Caption:  An old, vintage black and white photograph depicting an early aviation scene. A seaplane is gliding over a serene lake nestled among pine trees and hills in the background, capturing a moment of adventure and exploration amidst nature's tranquility.
Completed 0.16% in 4.16s, estimated 2615.48s left.

It uses a kind of “pre prompt” to set up the instructions for captioning. I think you should probably tune this, so you can start with the prompt.txt in the root dir of the repo, and then modify it yourself, and you either just change it in place or run the script with a path to where your saved your modified (or new!) prompt.

$ python llava-caption-ollama.py /path/to/dataset/guideboat/images/15_guideboat/ /path/to/my.prompt.txt

In both cases it will save a .txt file in the same folder as your images, with the same base file name (e.g. before the extension).

In conclusion…

This seems to be more accurate, at least reading as a human, than BLIP captioning is. I haven’t put together an organized “before/after” on the datasets I’ve tried this with, but my intuition says it does work quite a bit better, I’ll try to come back with some results in the future, but until then…

Happy captioning!

Let's try Podman Desktop's AI Lab

We’re going to try to get some AI/ML workloads running using Podman Desktop’s new AI Lab Feature!

Podman Desktop, in the project’s words, is:

an open source graphical tool enabling you to seamlessly work with containers and Kubernetes from your local environment.

If you just want to get on to reference my steps in how I got it rolling, just skip down to the pre-reqs, otherwise…

I was inspired by this post on linked in talking about an AI lab using podman, and since I already , and I was like “let’s try this new podman ai lab thing!”, and I wanted to see if I could get my current containerized workloads (especially for Stable Diffusions) which are running managed by podman desktop and it’s new AI features. That post especially pumped up this repo about running podman ai lab on openshift AI, and I was really hopeful for managing my workloads, running in Kubernetes, from a GUI that integrated with podman.

However, I found a few limitations for myself and my own use case… I don’t currently have access to a lab running OpenShift AI – which requires an OpenShift install to begin with. I’d really rather use my local lab, because I’m trying to get hands on with my environment, and… An OpenShift environment is going to be little sipping water from a firehose. So, what about another way to run Kubernetes?

And then I ran into a number of other limitations…

I was originally hoping that I could get my workloads running in K8s, and maybe even in kind (kubernetes in docker) because it’s often used for local dev, but… I’m not sure we’re going to see GPU functionality in kind at the moment (even if someone put together a sweet hack [that’s now a little outdated]).

RECORD SCRATCH: Wait up! It might be possible in KIND! I found an article about K8s Kind with GPUs. I gave it a whirl, and it might be a path forward, but I didn’t have great success right out of the box, I think part of it is that we need a more feature rich runtime for kind, like Antonio’s gist about using CRI-O + KIND.

And, I had trouble getting it to cooperate with GPU allocations for my workloads in general. I know, GUIs can be limited in terms of options, but, at almost every turn I couldn’t easily spin up a workload with GPU access.

Unfortunately, this is kind of a disappointment compared to say ollama, which does spin up LLMs using GPU inferrence quickly, and containerized on my local machine. But! I’ll keep tracking it to see how it progresses.

Also, I should note, some of my main gripes are actually with kind + GPU, and only somewhat with Podman Desktop, but, I feel like experiences like ollama are a lot more turnkey in part.

However! Podman Desktop is still really neat, and it’s beautifully made, and overall, mostly intuitive. A few things tripped me up, but maybe you’ll learn from what I did…

Pre-requisities

I’m on fedora 39, I happened to install a workstation variant because it’s handy to have the GUI for other ai/ml software I use (notably, kohya_ss, which I apparently need xwindows for)

I also wound up installing a number of other things to get podman + my nvidia 3090 working originally, but! I did document the steps I took in this blog article about using ooba on Fedora so that could be a good reference if something isn’t adding up.

Installing podman desktop

I used: https://podman-desktop.io/downloads to start my install.

So, I’m not a huge flatpak user, I have used it on my steam deck, but, let’s try it on fedora…

flatpak install flathub io.podman_desktop.PodmanDesktop

…You might want to run it as sudo because I had to auth a buuuunch from the GUI.

Simple enough, I hit the windoze key and typed in “podman” from Gnome and there it is “Podman Desktop”

I ran through the setup with all the defaults, just basically me hitting next and then typing in my r00t password a bunch of times.

Then I hit the containers window to see what’s running – ah ha! Indeed, my kohya ss that I have running in podman is showing up in the list, hurray!

Installing kind

So, apparently there’s a bunch of Kubernetes-type features with podman desktop, and it must use KIND – kubernetes-in-docker.

I don’t have this installed on this machine, so, there was a warning icon along the bottom of my podman desktop with a note about kind, I clicked it, and… yep, installed kind.

I ran a:

kind get clusters

from the CLI, but, nothing shows, so I checked the docs.

So, let’s try creating a pod…

From the “pod” section of podman I go to create a pod, which, gives me a CLI item to paste:

podman pod create --label myFirstPod

However, this doesn’t appear to have created a kind cluster behind the scenes, weird, so I try using the “Deploy generated pod to Kubernetes” option… Nothing!

And I notice a message Cluster Not Reachable as a status from my Pod section of podman desktop.

So, let’s dig into this…

Let’s use the podman desktop kind docs.

Since I have the kind CLI installed, now I have to create a kind cluster myself.

We can do this through the Settings -> Resources part of desktop.

In the podman box, there’s a “create kind cluster” button, and I just used the defaults (which even includes countour for ingress), and there’s a “show logs” portion, which I followed along on.

Now when I run kind get clusters I can see one! Cool.

And lastly, we’re going to need a Kubernetes context

However, without doing anything, I can see at the CLI:

$ kubectl config current-context
kind-kind-cluster

Seems good, and where I used to Cluster not reachable I now see Connected! Let’s test drive it…

Also note that there’s a dark purple bar at the bottom of the podman desktop UI that has a k8s icon and the name of your cluster, you could switch to another cluster from there (even a remote openshift one, too)

Getting a pod running on kind from Podman Desktop

I didn’t have good luck at first. I’d run the podman pod create copied from the UI again, Using the ... context menu on my pod named hungry_ramanujan, I run Deploy the Kubernetes and use the default options… And I wait, without output… And it’s just hanging there for a half hour or better! So I tried running the flatpak interactively with flatpak run io.podman_desktop.PodmanDesktop to see if I got any actionable log output – I didn’t.

So I tried another approach…

First, I pulled an image to test with podman pull quay.io/fedora/fedora (I actually used the UI for fun). Then from the images tab, I used the ... context menu to push to kind.

Then, I hit the play button for the fedora image, kept all the defaults, and after I acccept it sends me to the container details, and you’ll have tabs for summary/logs/inspect/kube/terminal/tty

The Kube tab has conveniently generated a pod spec for me, which is:

# Save the output of this file and use kubectl create -f to import
# it into Kubernetes.
#
# Created with podman-4.9.4
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-07-19T12:44:38Z"
  labels:
    app: romanticmcclintock-pod
  name: romanticmcclintock-pod
spec:
  containers:
  - env:
    - name: TERM
      value: xterm
    image: quay.io/fedora/fedora:latest
    name: romanticmcclintock
    stdin: true
    tty: true

(you can also do this at the CLI with podman generate kube my-silly-container-name)

Great, now we should be able to hit the rocket ship icon to deploy to our pod to kind… you’ll get a few options, I kept the defaults and hit deploy…

doug@stimsonmt:~/ai-ml/kohya_ss$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
romanticmcclintock-pod   1/1     Running   0          23s

Looks good! Awesome.

Let’s see what it can do with my current workloads!

So I already have the Kohya SS GUI running in a podman container, here’s the bash script I use to kick mine off…

#!/bin/bash
# -e DISPLAY=0:0 \
PORT=7860
screen -S kohya podman run \
-e SAFETENSORS_FAST_GPU=1 \
-e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix \
-v /home/doug/ai-ml/kohya_ss/dataset:/dataset:rw \
-v /home/doug/ai-ml/automatic/models/Stable-diffusion/:/models:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/user:/app/appuser/.cache:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/triton:/app/appuser/.triton:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/config:/app/appuser/.config:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/nv:/app/appuser/.nv:rw \
-v /home/doug/ai-ml/kohya_ss/.cache/keras:/app/appuser/.keras:rw  \
-p $PORT:7860 \
--security-opt=label=disable \
--device=nvidia.com/gpu=all \
-i \
--tty \
--shm-size=512m \
--user root \
ghcr.io/bmaltais/kohya-ss-gui

Pay close attention to --device=nvidia.com/gpu=all, it’s important because it’s using CDI, the container device interface, also check out this nvidia article. And it’s the way that we say this workload should use a GPU, at the container level.

And then if I inspect this in podman desktop Kube view, I get this yaml (that I snipped a little):

---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: affectionategould-pod
  name: affectionategould-pod
spec:
  containers:
  - args:
    - python3
    - kohya_gui.py
    - --listen
    - 0.0.0.0
    - --server_port
    - "7860"
    - --headless
    env:
    - name: TERM
      value: xterm
    - name: SAFETENSORS_FAST_GPU
      value: "1"
    - name: DISPLAY
      value: :1
    image: ghcr.io/bmaltais/kohya-ss-gui:latest
    name: affectionategould
    ports:
    - containerPort: 7860
      hostPort: 7860
    securityContext:
      runAsGroup: 0
      runAsNonRoot: true
      runAsUser: 0
      seLinuxOptions:
        type: spc_t
    stdin: true
    tty: true
    volumeMounts:
      [..snip..]
  volumes:
    [..snip..]

While this is very very convenient, I’m not sure it’s actually going to be allocated a GPU in my kind cluster – which, might not even be possible. I did find someone proposed a quick hack to add GPU support to kind, and they have a PoC, but it’s on an older Kubernetes. But there’d have to be something there that indicates a request for a GPU, right?

RECORD SCRATCH! Wait that might be wrong, I found another approach for it – that uses the nvidia GPU operator.

That seems to be notably missing.

Giving a shot at kind + GPU support (which didn’t quite work!)

I followed the steps @ https://www.substratus.ai/blog/kind-with-gpus and…. I didn’t succeed, but! I think there might be hope in this method in the future.

The first step is to use nvidia-ctk to set the runtime to docker, but, that’s not a choice for us, and I checked the help and found that…

--runtime value           the target runtime engine; one of [containerd, crio, docker] (default: "docker")

That might be the death knell for this approach, but I’m going to carry on with skipping the step…

I also set:

accept-nvidia-visible-devices-as-volume-mounts = true 

Now, time to create a new kind cluster…

kind create cluster --name kind-gpu-enabled --config - <<EOF
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  image: docker.io/kindest/node@sha256:047357ac0cfea04663786a612ba1eaba9702bef25227a794b52890dd8bcd692e
  # required for GPU workaround
  extraMounts:
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/all
EOF

Which only worked for me as root.

And I fire it up… Cluster comes up, but there’s a workaround listed in the article, I skip it (makes a symlink for /sbin/ldconfig.real)

And we’re going to need helm with dnf install helm

Then, let’s get the operator going…

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || true
helm repo update
helm install --wait --generate-name \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator --set driver.enabled=false

Aaaaand, all my pods hung. I did finally find this k8s event:

root@stimsonmt:~# kubectl describe pod gpu-feature-discovery-58k7c -n gpu-operator | grep -P -A5 "^Events:"
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               3m39s                default-scheduler  Successfully assigned gpu-operator/gpu-feature-discovery-58k7c to kind-gpu-enabled-control-plane
  Warning  FailedCreatePodSandBox  3s (x17 over 3m39s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configured

So I tried to hack in…

root@stimsonmt:/home/doug/ai-ml/helm# nvidia-ctk runtime configure --runtime=crio --set-as-default

But it had no effect, we’re not really using crio (and it resulted in the same message, probably because crio hasn’t kicked in, you know, at all).

I did note that the cool SIG-Net dude aojea has a cool gist on CRIO + KIND, maybe for another day.

Installing podman AI lab

So, for this, I went and followed this article from Red Hat developer’s blog about Getting started with Podman AI Lab

Honestly, super simple, there’s a “puzzle piece” icon/UI tab that’s for Extensions – just do a search for “lab” in the “catalog” tab, and… click install. that’s it!

Now, let’s give it a test drive…

Following the steps from the “run LLMs locally” podman desktop page, I go to download a model from the Catalog page, and… they are listed as… to CPU models! Nooooo! We want to use GPUs.

I download ibm/merlinite-7b-GGUF only because I recognize it from Instruct Lab. So let’s give it a try and see what happens.

I find that there’s a “Services” section of Podman Desktop AI Lab, and it says that:

A model service offers a configurable endpoint via an OpenAI-compatible web server,

Sounds like that should work, so I choose to start the model service with the “New Model Service” button, which shows the model I downloaded in a drop down. I get a warning that I don’t have enough free memory, but I check free -m and I actually have 26gigs free, while it reports I have 346 megs.

It pulls the ghcr.io/containers/podman-desktop-extension-ai-lab-playground-images/ai-lab-playground-chat:0.3.2. image for me.

It shows I have an endpoint @ http://localhost:35127/v1 and it gives me some “client code” (a curl command”), and boo hoo…. Lists it with a CPU icon.

curl --location 'http://localhost:35127/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "messages": [
    {
      "content": "You are an artist from the latter half of the 1800s, skilled at explaining programming concepts using vocabulary associated with art from the Academism, Realism and Impressionist periods.",
      "role": "system"
    },
    {
      "content": "Describe what a pointer is",
      "role": "user"
    }
  ]
}'

I give this a run, and… unfortunately, it’s…. verrrrry slow.

Ah, dear interlocutor! A pointer in the realm of programming is akin to a skilled craftsman's chisel, guiding and shaping the very essence of data. It is a variable that stores the memory address of another variable, allowing us to indirectly access and manipulate the value it points to. In the spirit of the Impressionist period, one could liken it to a painter's brush, subtly yet powerfully influencing the overall composition of our code. [...more drivel snipped!...]

Typical LLM yammering, but, it works!!

Reimagine Kubernetes networking, let's get a KNI (Kubernetes Networking Interface/reImagined) demo going!

Today we’re going to get a demo for Kubernetes Networking Interface (KNI) rolling! If you wanna skip the preamble and get to the terminal, skip down to the requirements.

What is KNI? It’s a “Foundational gRPC Network API specific for Kubernetes”, and it “revisit[s] past decisions for location of pod network setup/teardown”, and in my words, it’s an amazing way for us to think about deeper integration for network plugins in the context of Kubernetes, a kind of next level for what we can only solve today using CNI (the container networking interface).

Mike Zappa, sig-network chair, CNI maintainer, containerd contributor and sport climber/alpinist and trail runner, has been spearheading an effort to bridge the gap for network implementations for Kubernetes, and if you want to see some more of what Zappa is thinking, check out this presentation: “KNI [K8s Network Interface]”.

And, maybe if I’m lucky, and Mike likes crack climbing someday I can get Mike to climb Upper West in the “tough schist” my neighborhood, I’ll just hike to the base though, I’m just your average granola crunching telemark hippy, but I love alpine travel myself.

Remember when I gave a talk on “CNI 2.0: Vive la revolution!”, I wrote that:

Did you know CNI is container orchestration agnostic? It’s not Kubernetes specific. Should it stay that way? People are looking for translation layers between Kubernetes and CNI itself.

What I’m talking about is that Container Networking Interface (CNI) (which I know, and love!), it’s not purpose built for Kubernetes. It’s orchestration engine agnostic – remember when people talked about different orchestration engines for containers? Like Mesos, or, wait? I can’t think of more for the life of me… It’s for a good reason I can’t think of another right now: Kubernetes is the container orchestration engine. CNI predates the meteoric rise of Kubernetes, and CNI has lots of great things going for it – it’s modular, it has an ecosystem, and it’s got a specification that I think is simple to use and to understand. I love this. But, I’m both a network plugin developer as well as a Kubernetes developer, I want to write tools that both do the networking jobs I need to do, but also integrate with Kubernetes. I need a layer that enables this, and… KNI sure looks like just the thing to bring the community forward. I think there’s a lot of potential here for how we think about extensibility for networking in Kubernetes with KNI, and it might be a great place to do a lot of integrations for Kubernetes, such as Kubernetes Native Multi-networking [KEP], dynamic resource allocation, and maybe even gateway API, and my gears are turning on how to use to further the technology created by the Network Plumbing Working Group community.

As a maintainer of Multus CNI, which can provide multiple interfaces to pods in Kubernetes by allowing users the ability to specify CNI configurations in Kubernetes custom resources, we have a project which does both of these things:

  • It can execute CNI plugins
  • It can operate within Kubernetes

Creative people that are looking to couple richer Kubernetes interaction with their CNI plugins look at Multus as a way to potentially act as a Kubernetes runtime. I love this creative usage, and I encourage it as much as it makes sense. But, it’s not really what Multus is designed for, Multus is designed for multi-networking specifically (e.g. giving multiple interfaces for a pod). It just happens to do both of these things well. What we really need is something that’s lofted up another layer with deeper Kubernetes intergration – and that something… Is KNI! And this is just the tip of the iceberg.

But on to today: Let’s get the KNI demo rocking and rolling.

Disclaimer! This does use code and branches that could see significant change. But, hopefully it’s enough to get you started.

Requirements

  • A machine running Fedora 38 (should be easy enough to pick another distro, though)
  • A basic ability to surf around Kubernetes.
  • A roobois latte (honestly you don’t need coffee for this one, it’s smooth sailing)

What we’re gonna do…

For this demo, we actually replace a good few core components with modified versions…

  • The Kubelet as part of Kubernetes
  • We’ll replace containerd with a modified one.
  • And we’ll install a “network runtime”

Under KNI, a “network runtime” is your implementation where you do the fun stuff that you want to do. In this case, we just have a basic runtime that Zappa came up with that calls CNI. So, it essentially exercises stuff that you should already have, but we’ll get to see where it’s hooked in when we’ve got it all together.

Ready? Let’s roll.

System basics setup

First I installed Fedora 38. And then, you might want to install some things you might need…

dnf install -y wget make task

Then, install go 1.21.6

sudo tar -C /usr/local -xzf go1.21.6.linux-amd64.tar.gz

Setup your path, etc.

Install kind

go install sigs.k8s.io/kind@v0.20.0

Install docker

…From their steps.

And add yourself as a user from the post install docs

Install kubectl

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo mv kubectl /usr/bin/
sudo chmod +x /usr/bin/kubectl

Make sure kind can run with:

kind create cluster
kubectl cluster-info --context kind-kind
kind delete cluster

Now let’s spin up Zappa’s Awesomeness

Alright, next we’re going to install and build from a bunch of repositories.

Demo repo

Mike’s got a repo together which spins up the demo for us… So, let’s get that first.

Update! There’s a sweet new way to run this that saves a bunch of manual steps, so we’ll use that now.

Thanks Tomo for making this much easier!!

git clone https://github.com/MikeZappa87/kni-demo.git
cd kni-demo
task 01-init 02-build 03-setup

Let it rip and watch the KNI runtime go!

Then create a pod…

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
spec:
  containers:
  - name: samplepod
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine
EOF

Let it come up, and you can see the last log item from kni…

$ docker exec -it test1-worker systemctl status kni
● kni.service
[...]
Jan 12 18:20:16 test1-worker network-runtime[576]: {"level":"info","msg":"ipconfigs received for id: e42ffb53c0021a8d6223bc324e7771d31910e6973c7fea708ee3f673baac9a1f ip: map[cni0:mac:\"36:e0:1f:e6:21:bf\" eth0:ip:\"10.244.1.3\" mac:\"a2:19:92:bc:f1:e9\" lo:ip:\"127.0.0.1\" ip:\"::1\" mac:\"00:00:00:00:00:00\" vetha61196e4:mac:\"62:ac:54:83:31:31\"]","time":"2024-01-12T18:20:16Z"}

Voila! We can see that KNI processed, it has all that information about the pod networking which it’s showing us!

But, this is only the tip of the iceberg! While we’re not doing a lot here other than saying “Cool, you can run Flannel”, for the next episode… We’ll look at creating a Hello World for a KNI runtime!

Using Robocniconfig to generate CNI configs with an LLM

Time to automate some secondary networks automagically by having configurations produced using a large language model (LLM)!

I saw ollama at Kubecon and I had to play with it :D It’s really fun and I had good luck. I love the idea of having a containerized LLM and then hooking up my own toys to it! …I’ve also played with doing such myself, maybe a little bit that I’ve mentioned on this blog or you’ve seen a Dockerfile or two from me. But! It’s really nice to have an opinionated installation.

Here’s the concept of Robo CNI

  • You (the user) give an instruction as a short blurb about what kind of CNI configuration you want, like “I’ll take a macvlan CNI with an eth0 master, and IP addresses in the 192.0.2.0/24 range”
  • RoboCNI runs that through a large language model, with some added context to help the LLM figure out how to produce a CNI configuration
  • Then, we test that a bunch of times to see how effectively it works.

I’m doing something I heard data scientists tell me not do with it (paraphrased): “Don’t go have this thing automatically configure stuff for you”. Well… I won’t be doing in production. I’ve had enough pager calls at midnight in my life without some robot making it worse, but, I will do it in a lab like… whoa!

So I wrote an application Robo CNI Config! It basically generates CNI configurations based on your prompts, so you’d use the core application like:

$ ./robocni --json "dude hook me up with a macvlan mastered to eth0 with whereabouts on a 10.10.0.0/16"
{
    "cniVersion": "0.3.1",
    "name": "macvlan-whereabouts",
    "type": "macvlan",
    "master": "eth0",
    "mode": "bridge",
    "ipam": {
        "type": "whereabouts",
        "range": "10.10.0.225/28"
    }
}

I basically accomplished this by writing some quick and dirty tooling to interface with it, and then a context for the LLM that has some CNI configuration examples and some of my own instructions on how to configure CNI. You can see the context itself here in github.

It has stuff like you’d imagine something like ChatGPT is kind of “pre-prompted” with, like:

Under no circumstance should you reply with anything but a CNI configuration. I repeat, reply ONLY with a CNI configuration.
Put the JSON between 3 backticks like: ```{"json":"here"}```
Respond only with valid JSON. Respond with pretty JSON.
You do not provide any context or reasoning, only CNI configurations.

And stuff like that.

I also wrote a utility to automatically spin up pods given these configs and test connectivity between them. I then kicked it off for 5000 runs over a weekend. Naturally my kube cluster died from something else (VMs decided to choke), so I had to spin up a new cluster, and then start it again, but… It did make it through 5000 runs.

Amazingly: It was able to get a ping across pods that were automatically approximately 95% of the time. Way better than I expected, and I even see some mistakes that could be corrected, too.

It’s kinda biased towards macvlan, bridge and ipvlan CNI plugins, but… You gotta start somewhere.

So, let’s do it!

Pre-reqs…

There’s a lot, but I’m hopeful you can find enough pointers from other stuff in my blog, or… even a google search.

  • A working Kubernetes install (and access to the kubectl command, of course)
  • A machine with a GPU where you can run ollama
  • Multus CNI is installed, as well as Whereabouts IPAM CNI.

I used my own home lab. I’ve got a Fedora box with a Nvidia Geforce 3090, and it seems to be fairly decent for running LLMs and StableDiffusion training (which is the main thing I use it for, honestly!)

Installing and configuring ollama

Honestly, it’s really as easy as following the quickstart.

Granted – the main thing I did need to change was that it would accept queries from outside the network, so I did go and mess with the systemd units, you can find most of what you need in the installation on linux section, and then also the port info here in the FAQ.

I opted to use the LLaMA 2 model with 13B params. Use whatever you like (read: Use the biggest one you can!)

Using RoboCNI Config

First clone the repo @ https://github.com/dougbtv/robocniconfig/

Then kick off ./hack/build-go.sh to build the binaries.

Generally, I think this is going to work best on uniform environments (e.g. same interface names across all the hosts). I ran it on a master in my environment, which is a cluster with one 1 master and 2 workers.

You can test it out by running something like…

export OLLAMA_HOST=192.168.2.199
./robocni "give me a macvlan CNI configuration mastered to eth0 using whereabouts ipam ranged on 192.0.2.0/24"

That last part in the quotes is the “hint” which is basically the user prompt for the LLM that gets added to my larger context of what the LLM is supposed to do.

Then, you can run it in a loop.

But first! Make sure robocni is in your path.

./looprobocni --runs 5000

My first run of 5000 was…. Better than I expected!

Run number: 5000
Total Errors: 481 (9.62%)
Generation Errors: 254 (5.08%)
Failed Pod Creations: 226 (4.52%)
Ping Errors: 0 (0.00%)
Stats Array:
  Hint 1: Runs: 786, Successes: 772
  Hint 2: Runs: 819, Successes: 763
  Hint 3: Runs: 777, Successes: 768
  Hint 4: Runs: 703, Successes: 685
  Hint 5: Runs: 879, Successes: 758
  Hint 6: Runs: 782, Successes: 773

In this case, there’s 5% generation errors, but… Those can be trapped. So discounting those… 95% of the runs were able to spin up pods, and… Amazingly – whenever those pods came up, a ping between them worked.

Insert:<tim-and-eric-mind-blown.jpg>

I had more fun doing this than I care to admit :D

Installing Oobabooga LLM text webui on Fedora 38 (with podman!)

Today we’re going to run Ooobabooga – the text generation UI to run large language models (LLMs) on your local machine. We’ll make it containerized so that you can keep everything sitting pretty right where it is, otherwise.

Requirements

Looks like we’ll need podman compose if you don’t have it…

  • Fedora 38
  • A nVidia GPU
  • Podman (typically included by default)
  • podman-compose (optional)
  • The nVidia drivers

If you want podman compose, pick up:

pip3 install --user podman-compose

Driver install

You’re also going to need to install the nVidia driver, and the nVidia container tools

Before you install CUDA, do a dnf update (otherwise I wound up with mismatched deps), then install CUDA Toolkit (link is for F37 RPM, but it worked fine on F38)

And the container tools:

curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install nvidia-container-toolkit nvidia-docker2

(nvidia docker 2 might not be required.)

If you need more of a reference for GPUs on Red Hat flavored linuxes, this article from the red hat blog is very good

Let’s get started

In my experience, you’ve gotta use podman for GPU support in Fedora 38 (and probably a few versions earlier, is my guess).

Go ahead and clone this oobabooga/text-generation-webui

From their README, you’ve gotta set this up to do the container build…

ln -s docker/{Dockerfile,docker-compose.yml,.dockerignore} .
cp docker/.env.example .env
# Edit .env and set TORCH_CUDA_ARCH_LIST based on your GPU model
docker compose up --build

Importantly – you’ve got to set the TORCH_CUDA_ARCH_LIST. You can check that you’ve got the right one from this grid on wikipedia DOUBLE CHECK – everything, but especially that you’re using the right .env file. Because I really made that take longer than it should when I got that wrong.

TORCH_CUDA_ARCH_LIST=8.6+PTX

First, try building ti with podman – it worked for me on the second attempt. Unsure what went wrong, but I built with…

podman build -t dougbtv/oobabooga .

WARNING: These are some BIG images. I think mine came out to ~16 gigs.

And then I loaded that image it into podman…

I need make a few mods before I can run it… Copy the .env file also to the docker folder (we could probably improve this with a symlink in an earlier step). And while we’re here we’ll need to copy the template prompts, presets, too.

cp .env docker/.env
cp prompts/* docker/prompts/
cp presets/* docker/presets/

Now you’ll need at least a model, so to download one leveraging the container image…

podman-compose run --entrypoint "/bin/bash -c 'source venv/bin/activate; python download-model.py TheBloke/stable-vicuna-13B-GPTQ'" text-generation-webui

Naturally, change TheBloke/stable-vicuna-13B-GPTQ to whatever model you want.

You’ll find the model in…

ls ./docker/models/

I also modify the docker/.env to change this line to…

CLI_ARGS=--model TheBloke_stable-vicuna-13B-GPTQ --chat --model_type=Llama --wbits 4 --groupsize 128 --listen

However, I run it by hand with:

podman run \
--env-file /home/doug/ai-ml/text-generation-webui/docker/.env \
-v /home/doug/ai-ml/text-generation-webui/characters:/app/characters \
-v /home/doug/ai-ml/text-generation-webui/extensions:/app/extensions \
-v /home/doug/ai-ml/text-generation-webui/loras:/app/loras \
-v /home/doug/ai-ml/text-generation-webui/models:/app/models \
-v /home/doug/ai-ml/text-generation-webui/presets:/app/presets \
-v /home/doug/ai-ml/text-generation-webui/prompts:/app/prompts \
-v /home/doug/ai-ml/text-generation-webui/softprompts:/app/softprompts \
-v /home/doug/ai-ml/text-generation-webui/docker/training:/app/training \
-p 7860:7860 \
-p 5000:5000 \
--gpus all \
-i \
--tty \
--shm-size=512m \
localhost/dougbtv/oobabooga:latest

(If you’re smarter than me, you can get it running with podman-compose at this point)

At this point, you should be done, grats!

It should give you a web address, fire it up and get on generating!

Mount your models somewhere

I wound up bind mounting some directories…

sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/models/ docker/models/
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/presets/ docker/presets/
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/characters/ docker/characters/

Bonus note: I also wound up changing my dockerfile to install a torch+cu118, in case that helps you.

So I changed out two lines that looked like this diff:

-    pip3 install torch torchvision torchaudio && \
+    pip3 install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 -f https://download.pytorch.org/whl/cu118/torch_stable.html && \

I’m not sure how much it helped, but, I kept this change after I made it.

I’m hopeful to submit a patch for https://github.com/RedTopper/Text-Generation-Webui-Podman which isn’t building for me right now hopefully integrating what I learned from this. And then have the whole thing in podman, later.

Don’t make my stupid mistakes

I ran into an issue where, I got:

RuntimeError: CUDA error: no kernel image is available for execution on the device

I tried messing with the TORCH_CUDA_ARCH_LIST in the .env file and change it to 8.6+PTX, 8.0, etc, the whole list, commented out, no luck.

I created an issue in the meanwhile: https://github.com/oobabooga/text-generation-webui/issues/2002

I also found this podman image repo!

https://github.com/RedTopper/Text-Generation-Webui-Podman

and I forked it.

It looks like it could possibly need updates.

I’ll try to contribute my work back to this repo at some point.