Container Runtime and CRI: Installing containerd on the Workers

Kai · 10 min read

The control plane section closed in Article 9: three api-servers behind a load balancer, kubectl driving the cluster from the laptop, and the RBAC that lets the api-server call down to the kubelet in place. But the cluster still can't run a single pod: kubectl get nodes returns nothing, because no worker has joined. The rest of the series builds the workers, and the first thing on a worker isn't the kubelet; it's the thing the kubelet relies on to run containers: a container runtime.

This article stops at that foundational layer. We look at the interface the kubelet uses to talk to the runtime, who does what in the stack, then install containerd and runc on the two workers and verify with crictl, with no kubelet needed yet.

The kubelet doesn't run containers, it tells the runtime to

A common misconception is that the kubelet creates containers itself. It doesn't. The kubelet watches the api-server, learns which pods need to run on its node, then asks a runtime to create and manage those containers. Creating namespaces, cgroups, mounting the rootfs, running the process — that's all the runtime's part.

The interface between the two is CRI — the Container Runtime Interface. It's a gRPC API defined by Kubernetes, with two services:

  • RuntimeService — the lifecycle of pod sandboxes and containers: create, start, stop, delete, list.
  • ImageService — image management: pull, list, delete.

The kubelet is the client, the runtime is the server. As long as a runtime speaks the CRI protocol correctly, the kubelet doesn't care what's underneath. That's why dockershim was removed from the kubelet (1.24) without affecting functionality: Docker was just one option, while containerd speaks CRI directly.

containerd implements CRI via a built-in plugin, listening on a Unix socket: /run/containerd/containerd.sock. That's the socket the kubelet will point at in Article 11. For now, we use crictl (a CLI that speaks CRI straight to that socket) to test the runtime without a kubelet.

Who does what: containerd, runc, OCI

containerd isn't the thing that ultimately runs the container. It orchestrates: pulls images, manages rootfs snapshots, persists state, and calls a lower-level runtime to actually create the container. That lower level is runc — an OCI runtime.

OCI (the Open Container Initiative) standardizes, among other things, the image format and the runtime specification. runc reads an OCI bundle (a rootfs directory plus a config.json describing the container) and uses Linux primitives (namespaces, cgroups, capabilities) to build the container and start the process inside. Once the container is created and started, runc exits; the container lives on.
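
To make the bundle idea concrete, here is a minimal sketch that only checks whether a directory has the shape described above: a config.json next to a rootfs directory. The helper name is_oci_bundle is hypothetical, and the config.json written in the demo is a placeholder, not a spec runc could actually run.

```shell
# Hypothetical helper: does DIR look like an OCI bundle?
# Assumes the conventional layout: DIR/config.json plus DIR/rootfs/.
is_oci_bundle() {
  dir=$1
  [ -f "$dir/config.json" ] && [ -d "$dir/rootfs" ]
}

# Demo: build an empty bundle skeleton in a temp dir and check it.
bundle=$(mktemp -d)
mkdir "$bundle/rootfs"
echo '{"ociVersion": "1.0.2"}' > "$bundle/config.json"   # placeholder only
if is_oci_bundle "$bundle"; then echo "bundle layout ok"; fi
rm -rf "$bundle"
```

Once runc is installed in Step 2, running runc spec in an empty directory writes a complete default config.json, which is a quick way to see what a real one contains.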

Between containerd and runc there's one more piece: the shim (containerd-shim-runc-v2). containerd starts the shim, the shim invokes runc, and after runc exits the shim stays behind as the parent of the container process. Under CRI, the containers of one pod share a single shim. Because the shim, not containerd, holds the container, you can restart or upgrade containerd without killing the running containers.

Put together as a delegation chain, each layer does exactly one thing:

   api-server
       │  (kubelet sees which pods belong to its node)
       ▼
   kubelet ──CRI (gRPC over unix socket)──► containerd
                                              │  pull images, build rootfs,
                                              │  hold state
                                              ▼
                                   containerd-shim-runc-v2   (1 shim / pod)
                                              │
                                              ▼
                                            runc   (OCI runtime)
                                              │  create namespaces + cgroups,
                                              │  run the process then exit
                                              ▼
                                       [ running container ]

We'll install from the bottom up: runc (the OCI runtime), containerd (which ships the shim in its release), crictl (the inspection tool), and the CNI plugin set (containerd needs them when building pod networking; the network config is deferred to Article 14).

Step 1 — Download the binaries onto worker-0

Pin the versions to a mid-2026 baseline so the article doesn't drift over time: containerd v2.3.1, runc v1.4.2, CNI plugins v1.9.1, crictl v1.36.0 (matching the minor of Kubernetes v1.36). All are static binaries, downloaded straight from GitHub releases.

# on worker-0
cd /tmp
CONTAINERD_VER=2.3.1
RUNC_VER=1.4.2
CNI_VER=1.9.1
CRICTL_VER=1.36.0

curl -fsSL -O https://github.com/containerd/containerd/releases/download/v${CONTAINERD_VER}/containerd-${CONTAINERD_VER}-linux-amd64.tar.gz
curl -fsSL -O https://github.com/opencontainers/runc/releases/download/v${RUNC_VER}/runc.amd64
curl -fsSL -O https://github.com/containernetworking/plugins/releases/download/v${CNI_VER}/cni-plugins-linux-amd64-v${CNI_VER}.tgz
curl -fsSL -O https://github.com/kubernetes-sigs/cri-tools/releases/download/v${CRICTL_VER}/crictl-v${CRICTL_VER}-linux-amd64.tar.gz

ls -la *.tar.gz *.tgz runc.amd64
-rw-rw-r-- 1 ubuntu ubuntu 55418181 cni-plugins-linux-amd64-v1.9.1.tgz
-rw-rw-r-- 1 ubuntu ubuntu 34541786 containerd-2.3.1-linux-amd64.tar.gz
-rw-rw-r-- 1 ubuntu ubuntu 19263420 crictl-v1.36.0-linux-amd64.tar.gz
-rw-rw-r-- 1 ubuntu ubuntu 12233104 runc.amd64

The -f in curl -fsSL makes the command return an error code when the server returns a 404, instead of silently saving an HTML error page as a .tar.gz file. In the control plane articles we got bitten a few times by truncated downloads that reported nothing; on the workers, just glancing at the file sizes after the download (ls -la) is the cheapest way to catch the error early.
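
File sizes catch gross truncation; a stricter check is sketched below. It assumes sha256sum and gzip are available on the worker, and that you look up the expected hash yourself from the checksum file published next to each release asset (the file name and layout differ per project, so the hash is simply a parameter here). verify_download is a hypothetical helper name.

```shell
# Hypothetical helper: verify FILE against an expected SHA-256 and,
# for gzip archives, also test that the compressed stream is complete.
verify_download() {
  file=$1
  expected=$2
  actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file" >&2
    return 1
  fi
  case $file in
    *.tar.gz|*.tgz)
      # A truncated archive fails the gzip integrity test.
      gzip -t "$file" 2>/dev/null || { echo "corrupt archive: $file" >&2; return 1; }
      ;;
  esac
  echo "ok: $file"
}

# Usage (the second argument is the hash you copied from the release page):
#   verify_download containerd-2.3.1-linux-amd64.tar.gz <expected-sha256>
```

The hash comparison is the real safeguard; the gzip -t pass is a cheap extra that works even when you have no published hash at hand.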

Step 2 — Install into the right places

The containerd release extracts straight into /usr/local (the binaries land in bin/). runc is a single file, placed in /usr/local/sbin. The CNI plugins go to /opt/cni/bin, the default path containerd looks in. crictl goes to /usr/local/bin.

# containerd -> /usr/local/bin/{containerd,containerd-shim-runc-v2,ctr,...}
sudo tar Cxzvf /usr/local containerd-2.3.1-linux-amd64.tar.gz

# runc -> OCI runtime
sudo install -m 755 runc.amd64 /usr/local/sbin/runc

# CNI plugins -> /opt/cni/bin
sudo mkdir -p /opt/cni/bin
sudo tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.9.1.tgz

# crictl -> /usr/local/bin
sudo tar Cxzvf /usr/local/bin crictl-v1.36.0-linux-amd64.tar.gz

Verify each one. Running every binary once also catches a truncated download, since a cut-off executable won't start:

containerd --version
runc --version | head -1
crictl --version
ls /opt/cni/bin
containerd github.com/containerd/containerd/v2 v2.3.1 64b425cf570b3b8dd1d4cc46da7c1fce65c6651a
runc version 1.4.2
crictl version v1.36.0

bandwidth  bridge  dhcp  dummy  firewall  host-device  host-local  ipvlan
loopback   macvlan  portmap  ptp  sbr  static  tap  tuning  vlan  vrf

Of that pile of CNI plugins, the two we'll actually configure in Article 14 are bridge (creates the bridge and assigns IPs to pods) and loopback (the lo interface inside each pod). The rest are options for other networking models.
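
If you'd rather have one pass/fail answer than read four outputs, the same check can be sketched as a loop over the install paths from this step. check_executables is a hypothetical helper; the path list is taken from the commands above, with bridge and loopback standing in for the CNI set.

```shell
# Hypothetical helper: report which of the given paths are executable files.
# Returns non-zero if at least one is missing or not executable.
check_executables() {
  missing=0
  for path in "$@"; do
    if [ -f "$path" ] && [ -x "$path" ]; then
      echo "ok       $path"
    else
      echo "MISSING  $path"
      missing=1
    fi
  done
  return "$missing"
}

check_executables \
  /usr/local/bin/containerd \
  /usr/local/bin/containerd-shim-runc-v2 \
  /usr/local/bin/ctr \
  /usr/local/sbin/runc \
  /usr/local/bin/crictl \
  /opt/cni/bin/bridge \
  /opt/cni/bin/loopback || echo "some binaries are missing" >&2
```

This only proves the files are in place and executable; the --version calls above remain the check that they actually run.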

Step 3 — Generate and edit the containerd config

containerd runs fine without a config file, but for Kubernetes the CRI plugin needs one setting confirmed and one changed. The cleanest approach is to have containerd itself print the default config, then review two spots:

sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
grep -E '^version' /etc/containerd/config.toml
version = 4

containerd 2.x generates a version 4 config, with the plugin keys in the new style: io.containerd.cri.v1.images and io.containerd.cri.v1.runtime (different from the io.containerd.grpc.v1.cri of the 1.x line). If you copy a config from an old guide, watch for this, because a wrong key gets silently ignored by containerd with no error.
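
Since a misplaced key fails silently, a quick guard is to confirm that the file you ended up with contains both new-style tables. A minimal sketch, assuming the two key names quoted above; has_new_cri_keys is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if both 2.x-style CRI tables appear in FILE.
has_new_cri_keys() {
  grep -q 'io\.containerd\.cri\.v1\.images' "$1" &&
    grep -q 'io\.containerd\.cri\.v1\.runtime' "$1"
}

cfg=/etc/containerd/config.toml
if [ -f "$cfg" ]; then
  if has_new_cri_keys "$cfg"; then
    echo "$cfg uses the 2.x key style"
  else
    echo "warning: $cfg lacks the 2.x CRI tables" >&2
  fi
fi
```

Once the service is running, containerd config dump prints the configuration containerd actually resolved, which is another way to confirm that an edit was picked up.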

First edit, the pause image (already correct). Each pod has a hidden container called pause: it's created first, holds the pod's network/IPC namespaces for the real containers to share, and does nothing but sleep. Its image is "pinned" in the config:

    [plugins.'io.containerd.cri.v1.images'.pinned_images]
      sandbox = 'registry.k8s.io/pause:3.10.2'

containerd 2.3.1 already pins pause:3.10.2, exactly the version that goes with Kubernetes 1.36, so there's nothing to change. The point to remember: the pause image is declared here, in the runtime, not in the kubelet. If a pod later gets stuck unable to create a sandbox, this is one of the first places to check.
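
When that debugging day comes, it helps to be able to read the pinned image straight out of the config. A small sketch, assuming the sandbox = '...' form shown above (double quotes are accepted too); pinned_sandbox_image is a hypothetical helper.

```shell
# Hypothetical helper: print the pinned sandbox (pause) image from FILE.
# Matches the first line of the form: sandbox = 'image:tag'
pinned_sandbox_image() {
  sed -n "s/^[[:space:]]*sandbox[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1" |
    head -n 1
}

cfg=/etc/containerd/config.toml
if [ -f "$cfg" ]; then
  echo "pinned pause image: $(pinned_sandbox_image "$cfg")"
fi
```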

Second edit, the cgroup driver. This one is mandatory. By default containerd uses the cgroupfs cgroup driver:

grep -n 'SystemdCgroup' /etc/containerd/config.toml
88:            SystemdCgroup = false

Ubuntu boots with systemd, and systemd wants to be the only component managing the cgroup tree. If the runtime writes into cgroups in the cgroupfs style while systemd also manages the same tree, the two step on each other, and under resource pressure the node can become unstable. The rule: the kubelet and the container runtime must use the same cgroup driver, and on a systemd host both should choose systemd. We enable it for containerd now; the kubelet side will set cgroupDriver: systemd in Article 11.

sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
grep -n 'SystemdCgroup' /etc/containerd/config.toml
88:            SystemdCgroup = true

This flag sits in the options block of the runc runtime, the default runtime containerd uses to run containers:

          [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
            ...
            SystemdCgroup = true
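
The sed above succeeds even when it matches nothing, so a scripted install may want the edit and the check in one step. A minimal sketch, assuming GNU sed as on Ubuntu; enable_systemd_cgroup is a hypothetical helper, and on the worker it would have to run in a root shell because the config file is root-owned.

```shell
# Hypothetical helper: flip SystemdCgroup to true in FILE and confirm the result.
# Fails if the key is absent, so a non-matching sed does not go unnoticed.
# Safe to run twice: the second run finds the value already set.
enable_systemd_cgroup() {
  file=$1
  sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$file" || return 1
  if grep -q 'SystemdCgroup = true' "$file"; then
    echo "SystemdCgroup enabled in $file"
  else
    echo "SystemdCgroup key not found in $file" >&2
    return 1
  fi
}

# Usage, from a root shell on the worker:
#   enable_systemd_cgroup /etc/containerd/config.toml
```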

Step 4 — systemd unit and startup

The containerd release doesn't include a systemd service file; grab it from the repo, pinned to the right tag to match the binary:

curl -fsSL https://raw.githubusercontent.com/containerd/containerd/v2.3.1/containerd.service \
  | sudo tee /etc/systemd/system/containerd.service >/dev/null

sudo systemctl daemon-reload
sudo systemctl enable --now containerd
systemctl is-active containerd
active
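
Scripts that continue straight into the crictl checks sometimes want an explicit test that the CRI socket exists. A minimal sketch, assuming the socket path used throughout this article; wait_for_socket is a hypothetical helper, and the 1-second timeout in the demo is only there to keep it short.

```shell
# Hypothetical helper: poll until SOCK is a Unix socket or TIMEOUT seconds pass.
wait_for_socket() {
  sock=$1
  timeout=${2:-10}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ -S "$sock" ]; then
      echo "socket ready: $sock"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out waiting for $sock" >&2
  return 1
}

wait_for_socket /run/containerd/containerd.sock 1 ||
  echo "containerd socket not present" >&2
```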

Step 5 — Verify with crictl

Without configuration, crictl falls back to probing a list of well-known sockets and warns about it, so declare the endpoint once in /etc/crictl.yaml:

sudo tee /etc/crictl.yaml >/dev/null <<'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
EOF

Now test along exactly the path the kubelet will take, through CRI to containerd:

sudo crictl version
Version:  0.1.0
RuntimeName:  containerd
RuntimeVersion:  v2.3.1
RuntimeApiVersion:  v1

crictl version doesn't read the containerd binary; it makes a gRPC call to the socket and asks the runtime to describe itself. Getting back RuntimeApiVersion: v1 means the CRI plugin is serving exactly the API version that kubelet 1.36 needs. Try pulling an image — this is the CRI ImageService at work:

sudo crictl pull registry.k8s.io/pause:3.10.2
sudo crictl images
Image is up to date for sha256:4a83b15d3ecfe0d916b2d0a7991bc2854a629b8097017c2ee1ff65b30ae4c07c

IMAGE                   TAG                 IMAGE ID            SIZE
registry.k8s.io/pause   3.10.2              4a83b15d3ecfe       321kB

Both CRI services respond. One thing is not ready, and it's right that it isn't:

sudo crictl info | grep -iE 'lastCNILoadStatus|SystemdCgroup'
            "SystemdCgroup": true
  "lastCNILoadStatus": "cni config load failed: no network config found
    in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

SystemdCgroup: true confirms the tweak in Step 3 took. The lastCNILoadStatus line reports an error because /etc/cni/net.d is empty: we installed the CNI binaries into /opt/cni/bin but haven't written any network config yet. The runtime runs, can pull images, can build a standalone container; but for a pod to get an IP and talk to the outside it needs CNI config, and that's the job of Article 14. Splitting the layers this way reflects the architecture accurately: the runtime and the network are two independent concerns, joined through CNI.
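
The same "no network config yet" state can be seen from the filesystem side. A minimal sketch that counts candidate config files in the CNI directory, assuming the usual extensions (.conf, .conflist, .json); cni_config_count is a hypothetical helper.

```shell
# Hypothetical helper: count network config files directly inside a CNI conf dir.
# Prints 0 when the directory does not exist.
cni_config_count() {
  dir=${1:-/etc/cni/net.d}
  [ -d "$dir" ] || { echo 0; return 0; }
  find "$dir" -maxdepth 1 -type f \
    \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) |
    wc -l | tr -d ' '
}

echo "CNI config files: $(cni_config_count /etc/cni/net.d)"
```

At this point in the series the count should be 0; it changes only after Article 14 writes the network config.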

Step 6 — Repeat on worker-1

All the steps above apply identically to worker-1. Here they are bundled into one pass so you don't have to type each command by hand:

# on worker-1
cd /tmp
CONTAINERD_VER=2.3.1; RUNC_VER=1.4.2; CNI_VER=1.9.1; CRICTL_VER=1.36.0
for u in \
  https://github.com/containerd/containerd/releases/download/v${CONTAINERD_VER}/containerd-${CONTAINERD_VER}-linux-amd64.tar.gz \
  https://github.com/opencontainers/runc/releases/download/v${RUNC_VER}/runc.amd64 \
  https://github.com/containernetworking/plugins/releases/download/v${CNI_VER}/cni-plugins-linux-amd64-v${CNI_VER}.tgz \
  https://github.com/kubernetes-sigs/cri-tools/releases/download/v${CRICTL_VER}/crictl-v${CRICTL_VER}-linux-amd64.tar.gz ; do
  curl -fsSL -O "$u"
done

sudo tar Cxzf /usr/local "containerd-${CONTAINERD_VER}-linux-amd64.tar.gz"
sudo install -m 755 runc.amd64 /usr/local/sbin/runc
sudo mkdir -p /opt/cni/bin && sudo tar Cxzf /opt/cni/bin "cni-plugins-linux-amd64-v${CNI_VER}.tgz"
sudo tar Cxzf /usr/local/bin "crictl-v${CRICTL_VER}-linux-amd64.tar.gz"

sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
curl -fsSL "https://raw.githubusercontent.com/containerd/containerd/v${CONTAINERD_VER}/containerd.service" \
  | sudo tee /etc/systemd/system/containerd.service >/dev/null
sudo tee /etc/crictl.yaml >/dev/null <<'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now containerd

Verify:

containerd --version
systemctl is-active containerd
sudo crictl version | grep RuntimeVersion
containerd github.com/containerd/containerd/v2 v2.3.1 64b425cf570b3b8dd1d4cc46da7c1fce65c6651a
active
RuntimeVersion:  v2.3.1

Both workers now have a runtime ready to take CRI commands.
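
With two machines it's easy for one to end up on a different release than the pin. A small sketch that compares the version field of containerd --version against the pinned value, assuming the four-field output layout shown above; containerd_version_field is a hypothetical helper, and the sample line is the output from this article.

```shell
# Hypothetical helper: extract the version field from `containerd --version` output.
# Assumes the layout: name, module path, version, commit.
containerd_version_field() {
  echo "$1" | awk '{print $3}'
}

expected="v2.3.1"
line="containerd github.com/containerd/containerd/v2 v2.3.1 64b425cf570b3b8dd1d4cc46da7c1fce65c6651a"
if [ "$(containerd_version_field "$line")" = "$expected" ]; then
  echo "version matches pin"
else
  echo "version drift: expected $expected" >&2
fi
```

On a worker, replace the sample line with "$(containerd --version)".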

🧹 Cleanup

containerd is a permanent service on the workers; leave it as is. Just clean up the downloaded files in /tmp:

# on each worker
rm -f /tmp/containerd-*.tar.gz /tmp/runc.amd64 \
      /tmp/cni-plugins-*.tgz /tmp/crictl-*.tar.gz

Keep the pause image you pulled: the kubelet will need it for every pod, and it's only 321kB. If you stop the EC2 instances to save money, containerd is enabled in systemd, so it starts again on its own at boot.

The full script for both workers is at github.com/nghiadaulau/kubernetes-from-scratch, in the 10-container-runtime directory.

Wrap-up

The workers now have their foundational layer: containerd v2.3 speaking CRI over /run/containerd/containerd.sock, runc as the OCI runtime underneath, the cgroup driver set to systemd to match the kubelet, the pause image pinned to the right version. What's worth taking away isn't the tar commands, but being able to picture the delegation chain kubelet → containerd → shim → runc: when a pod gets stuck later, knowing which layer is responsible for what shortens the debugging a great deal.

crictl shows the runtime is ready, but it just sits there waiting: nobody has told it which pod to run. The one giving that order is the kubelet. Article 11 installs the kubelet on the two workers: distributes the certificate for each node, writes the KubeletConfiguration, points it at the containerd socket we just built, and watches the two workers appear for the first time in kubectl get nodes — though not yet Ready, because they're still missing exactly the CNI piece we deliberately set aside.