Foreword about LLM Culture

I’ve been intrigued by the AI and LLM mindvirus that seems to have infected the whole world. I won’t lament the moral, ethical, or environmental impacts that come along with commercialized Artificial Intelligence (although I could). The reality is that Pandora’s Box has been opened, and the curse upon mankind has percolated through every facet of human existence.

Anyways, here’s how I deployed Ollama on k0s using an Nvidia RTX 3080. It should work for other k8s distributions too, with a few tweaks.

Installation

Quick note about Consumer vs Enterprise hardware

If you’re going to go through with running a consumer GPU in your cluster, just know that much of Nvidia’s upstream documentation and Helm charts simply don’t support it. For example, their gpu-operator handles probably 90% of this stuff for you: installing the drivers, the container toolkit, and so on. None of that worked for my consumer GPU, though. I tried to go the official route, even going so far as to hack away at the primitives in the Helm chart to get things kickstarted. Every time, the gpu-operator just prints:

{"level":"info","ts":1758250136.8297,"logger":"controllers.ClusterPolicy","msg":"No GPU node in the cluster, do not create DaemonSets ","DaemonSet":"nvidia-mig-manager","Namespace":"kube-system"}

I’d suggest not deploying it at all unless you have a datacenter GPU. There may yet be a way to get it working with a fair bit of hacking, but I gave up because Nvidia just refuses to acknowledge these use cases. Which is fine. It’s precisely why I opted to buy a 9070 XT for my gaming desktop.

Context of my environment

  • OS: Mix of Debian 12 and Debian 13
  • k8s Distro: k0s
  • 3 dedicated control plane nodes
  • 4 total worker nodes
  • 1 RTX 3080

Setting up your GPU Node

Install the Nvidia GPU Driver

This one is pretty straightforward. Navigate to the Nvidia driver download site and use the filters to find the driver for your GPU. I’d suggest the latest, as that’s what Ollama recommends in their troubleshooting guide.

Once you search for and find the list of drivers, you can right-click the “Download” button and copy the link. It should look something like this:

https://us.download.nvidia.com/XFree86/Linux-x86_64/580.82.09/NVIDIA-Linux-x86_64-580.82.09.run

SSH to the node that has the GPU, and curl the download:

curl -O -L https://us.download.nvidia.com/XFree86/Linux-x86_64/580.82.09/NVIDIA-Linux-x86_64-580.82.09.run

Next, you simply run the installer with sh (as per the documentation):

sh ./NVIDIA-Linux-x86_64-580.82.09.run

This launches a nice little interactive TUI. Just follow the prompts. If you aren’t running X11/Wayland (you’re running this on a headless server, right?), don’t bother installing or configuring any of the X-related bits.
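
Once the installer finishes, it’s worth sanity-checking that the kernel module loaded and the driver can see the card. nvidia-smi ships with the driver, so it should work immediately:

nvidia-smi

You should get the familiar table showing the RTX 3080, the driver version, and the CUDA version it supports. If this fails, fix it before moving on; nothing downstream will work without it.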

Install the Nvidia Container Toolkit

You also need to install the NVIDIA Container Toolkit. As always, check the upstream docs first, but I’ll recap what I did here in case it saves you a click.

I created this bash script that condenses all their steps into one:

NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update

sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}

After you run all of that, you should be able to run:

root@fulgrim:~# nvidia-ctk --help
NAME:
   NVIDIA Container Toolkit CLI - Tools to configure the NVIDIA Container Toolkit

USAGE:
   NVIDIA Container Toolkit CLI [global options] command [command options]

VERSION:
   1.17.8
commit: f202b80a9b9d0db00d9b1d73c0128c8962c55f4d

COMMANDS:
   hook     A collection of hooks that may be injected into an OCI spec
   runtime  A collection of runtime-related utilities for the NVIDIA Container Toolkit
   info     Provide information about the system
   cdi      Provide tools for interacting with Container Device Interface specifications
   system   A collection of system-related utilities for the NVIDIA Container Toolkit
   config   Interact with the NVIDIA Container Toolkit configuration
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug, -d    Enable debug-level logging (default: false) [$NVIDIA_CTK_DEBUG]
   --quiet        Suppress all output except for errors; overrides --debug (default: false) [$NVIDIA_CTK_QUIET]
   --help, -h     show help
   --version, -v  print the version
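
Optionally, as one more sanity check, libnvidia-container-tools (installed above) ships nvidia-container-cli, which should be able to see the card directly:

nvidia-container-cli info

If that errors out, the toolkit won’t be able to hand the GPU to containers either, so it’s worth investigating now.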

Configuring k0s’ containerd Runtime

Info Gathering

As you may or may not know, k0s ships its own bundled containerd runtime. This means that configuring containerd may be slightly different than normal. According to the Nvidia documentation you could run a single command, sudo nvidia-ctk runtime configure --runtime=containerd. However, I don’t find that very helpful because I don’t know exactly what it is doing, so in my case I opted to do things a little more manually.

If you look into the k0s documentation regarding configuring containerd, they state:

In order to make changes to containerd configuration first you need to generate a default containerd configuration by running: containerd config default > /etc/k0s/containerd.toml

However, in my case this file already existed:

root@fulgrim:~# cat /etc/k0s/containerd.toml
# k0s_managed=true
# This is a placeholder configuration for k0s managed containerd.
# If you wish to override the config, remove the first line and replace this file with your custom configuration.
# For reference see https://github.com/containerd/containerd/blob/main/docs/man/containerd-config.toml.5.md
version = 2
imports = [
	"/run/k0s/containerd-cri.toml",
]

Sweet, so we have a native way to extend the k0s containerd config. Nvidia also gives us a way to preview exactly which containerd config changes it wants to make:

root@fulgrim:~# nvidia-ctk --quiet runtime configure --runtime=containerd --dry-run
version = 2

[plugins]

  [plugins."io.containerd.grpc.v1.cri"]

    [plugins."io.containerd.grpc.v1.cri".containerd]

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"

Implementing the Changes

There’s probably another more creative way to do this, but this is what I settled on. We can copy the contents of /run/k0s/containerd-cri.toml to a new file in /etc/k0s/, then apply the config changes using nvidia-ctk. After that we just point /etc/k0s/containerd.toml at our new file.

First we copy the running config to a new file:

cat /run/k0s/containerd-cri.toml > /etc/k0s/containerd-cri.toml

Let nvidia-ctk do its thing:

nvidia-ctk runtime configure --runtime=containerd --config=/etc/k0s/containerd-cri.toml

Following the directions in the aforementioned /etc/k0s/containerd.toml, we can point it at our newly created /etc/k0s/containerd-cri.toml instead of the runtime-generated one. Make sure you remove the top line that has k0s_managed=true so k0s doesn’t overwrite our changes:

# This is a placeholder configuration for k0s managed containerd.
# If you wish to override the config, remove the first line and replace this file with your custom configuration.
# For reference see https://github.com/containerd/containerd/blob/main/docs/man/containerd-config.toml.5.md
version = 2
imports = [
	#"/run/k0s/containerd-cri.toml",
    "/etc/k0s/containerd-cri.toml",
]

Now we just restart k0sworker.service:

systemctl restart k0sworker.service
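
Before calling it done, it doesn’t hurt to confirm that the nvidia runtime stanza actually landed in the new file and that the worker came back up cleanly:

grep -n 'runtimes.nvidia' /etc/k0s/containerd-cri.toml
systemctl status k0sworker.service --no-pager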

Configuring kubernetes

Pre-Reqs

Now we can deploy the nvidia-device-plugin. Essentially, this little guy informs your kubelet that there is a GPU resource on one (or more) of your nodes that can be allocated/reserved by a running pod.

Supporting Resources Stolen from gpu-operator

Although I mentioned before that the Nvidia gpu-operator is basically useless for consumer GPUs, there happen to be a few small things that we need from it. Fret not, as we don’t have to deploy the entire helm chart to get them.

RuntimeClass

We need a RuntimeClass that maps to the nvidia containerd runtime we configured above. You can go ahead and apply this to your cluster (RuntimeClass is cluster-scoped):

apiVersion: node.k8s.io/v1
handler: nvidia
kind: RuntimeClass
metadata:
  labels:
    app.kubernetes.io/component: gpu-operator
  name: nvidia

The gpu-operator label is clearly a remnant from my failed tinkering. I’m not sure it’s required, but I left it for completeness.
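
Assuming you save that manifest as runtimeclass-nvidia.yaml (the filename is arbitrary), applying and verifying it is quick:

kubectl apply -f runtimeclass-nvidia.yaml
kubectl get runtimeclass nvidia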

Labeling Nodes

Additionally, if you are already running node-feature-discovery, you can deploy a NodeFeatureRule or two to help with nvidia-device-plugin scheduling. These essentially add labels to your node(s) based on their attached PCIe devices, in this case a GPU. In my case that’s the 3080, and we can find its PCIe vendor and device IDs using lspci:

root@fulgrim:~# lspci -nn | grep -i nvidia | grep VGA
c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)

The values we’re looking for here are 10de:2206, i.e. the vendor ID 10de and the device ID 2206. Using those we can create our own NFD NodeFeatureRule. This is kind of an exercise for the reader, but I’ll provide my NodeFeatureRule as an example:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: nvidia-nfd-nodefeaturerules
spec:
  rules:
  - labels:
      nvidia.com/gpu.RTX3080.pcie: "true"
      nvidia.com/gpu.family: "ampere"
      feature.node.kubernetes.io/nvidia-gpu: "3080"
      nvidia.com/gpu.present: "true"
    matchFeatures:
    - feature: pci.device
      matchExpressions:
        device:
          op: In
          value:
          - "2206"
        vendor:
          op: In
          value:
          - 10de
    name: NVIDIA RTX3080

The labels you apply aren’t required to be in that exact format; I simply emulated what the gpu-operator would apply. If you aren’t running node-feature-discovery, you can label your GPU-laden node the old-fashioned way with nvidia.com/gpu.present (which, as you’ll see below, is the label the device plugin’s affinity checks for):

kubectl label node <your-nodes-name> nvidia.com/gpu.present=true
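
Either way, you can double-check which nvidia.com labels actually landed on your node before moving on:

kubectl describe node <your-nodes-name> | grep nvidia.com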

nvidia-device-plugin

You can essentially deploy the nvidia-device-plugin wholesale; however, there is one gotcha that you should pay attention to. Within the Helm chart values there is some pre-defined node affinity:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # On discrete-GPU based systems NFD adds the following label where 10de is the NVIDIA PCI vendor ID
        - key: feature.node.kubernetes.io/pci-10de.present
          operator: In
          values:
          - "true"
      - matchExpressions:
        # On some Tegra-based systems NFD detects the CPU vendor ID as NVIDIA
        - key: feature.node.kubernetes.io/cpu-model.vendor_id
          operator: In
          values:
          - "NVIDIA"
      - matchExpressions:
        # We allow a GPU deployment to be forced by setting the following label to "true"
        - key: "nvidia.com/gpu.present"
          operator: In
          values:
          - "true"

You’ll want to make sure that at least one of those affinity terms matches a label that actually exists on your node, whether it came from your NodeFeatureRule or was applied by hand (nvidia.com/gpu.present=true is the intended escape hatch, per the comment above). You could probably also skip all the labeling nonsense and just lock it to a nodeSelector, but I haven’t tested that personally; a rough sketch follows.
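
For reference, here’s roughly what that untested nodeSelector route could look like as a values override for the nvidia-device-plugin chart. This is just a sketch, assuming the chart exposes a standard nodeSelector value (fulgrim is my GPU node’s hostname):

# Untested sketch: pin the device plugin to the node that physically has the GPU,
# assuming the chart passes a top-level nodeSelector through to the DaemonSet.
nodeSelector:
  kubernetes.io/hostname: fulgrim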

Once you’ve deployed nvidia-device-plugin via helm or otherwise, we can check the logs to make sure things are working:

kubectl logs -l app.kubernetes.io/instance=nvidia-device-plugin


I0918 03:24:53.991797       1 main.go:235] "Starting NVIDIA Device Plugin" version=<
	3c378193
	commit: 3c378193fcebf6e955f0d65bd6f2aeed099ad8ea
 >
I0918 03:24:53.991843       1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
I0918 03:24:53.991884       1 main.go:245] Starting OS watcher.
I0918 03:24:53.992196       1 main.go:260] Starting Plugins.
I0918 03:24:53.992231       1 main.go:317] Loading configuration.
I0918 03:24:53.993192       1 main.go:342] Updating config with default resource matching patterns.
I0918 03:24:53.993401       1 main.go:353]
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": true,
    "mpsRoot": "/run/nvidia/mps",
    "nvidiaDriverRoot": "/",
    "nvidiaDevRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "useNodeFeatureAPI": null,
    "deviceDiscoveryStrategy": "auto",
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": [
        "envvar"
      ],
      "deviceIDStrategy": "uuid",
      "cdiAnnotationPrefix": "cdi.k8s.io/",
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
      "containerDriverRoot": "/driver-root"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {}
  },
  "imex": {}
}
I0918 03:24:53.993408       1 main.go:356] Retrieving plugins.
I0918 03:24:55.884566       1 server.go:195] Starting GRPC server for 'nvidia.com/gpu'
I0918 03:24:55.885835       1 server.go:139] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
I0918 03:24:55.888285       1 server.go:146] Registered device plugin for 'nvidia.com/gpu' with Kubelet

This is the desired output, specifically the line Registered device plugin for 'nvidia.com/gpu' with Kubelet. It means we can now request the GPU to be attached to a pod.
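
You can also confirm that the kubelet is now advertising the GPU as an allocatable resource on the node (substitute your node name):

kubectl get node <your-nodes-name> -o jsonpath='{.status.allocatable}'

You should see "nvidia.com/gpu":"1" somewhere in that output.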

Ollama

This won’t be a comprehensive guide on how to deploy Ollama. Using Helm, we just need to make sure the following values are present:

ollama:
  gpu:
    enabled: true
    number: 1
    nvidiaResource: nvidia.com/gpu
    type: nvidia
runtimeClassName: "nvidia"

More generally speaking, you can request a GPU by adding nvidia.com/gpu: "1" in the resources block of any Deployment or Pod, and setting the runtimeClassName. This example is an excerpt from my ollama deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  ...
  template:
    spec:
      ...
      containers:
        ...
        resources:
          limits:
            cpu: "4"
            memory: 8Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: "2"
            memory: 4Gi
      ...
      runtimeClassName: nvidia

Further Reading