Kubernetes From Scratch
Build a complete Kubernetes cluster by hand (no kubeadm, no scripts), from the first certificate to a real HA cluster, then use that cluster as a lab to deep-dive into every Kubernetes concept. Part one covers PKI/TLS, etcd, the control plane, workers, pod networking, and CoreDNS. Part two covers Pods, workload controllers, scheduling, storage, advanced networking (Cilium eBPF), security, extending the API, and operations. Each component is explained from the inside and then stood up and configured by hand. Everything is tested for real on AWS EC2 with Kubernetes v1.36; manifests and scripts are at github.com/nghiadaulau/kubernetes-from-scratch. Grounded in the official docs at kubernetes.io.
Deployment: rollout and rollback
So far we've only created bare Pods. Nobody runs production that way: pods are handed to a Deployment, which manages them through a middle layer, the ReplicaSet. This article opens Part IV by digging into that mechanism: changing the image spawns a new ReplicaSet, a rolling update scales it up while scaling the old one down, and the old ReplicaSet is kept at 0 replicas so a rollback is a single command. Tested step by step on a real cluster, tracing Pod → ReplicaSet → Deployment.
StatefulSet: stable identity and order
A Deployment treats its pods like fish in a school: any one is interchangeable with any other. A database, message queue, or etcd cluster is different: each member needs a fixed identity and its own data. The StatefulSet exists for exactly that need. This article digs into four of its guarantees (stable names, per-pod DNS through a headless Service, ordered creation, and ordered deletion), verifies each on a real cluster, and leaves the per-pod volumes for the Storage section.
DaemonSet: one pod per node
A Deployment manages N replicas placed anywhere; a StatefulSet manages N pods with identity. The DaemonSet is the third model: instead of counting replicas, it guarantees exactly one pod per node. Add a node and a pod appears; remove the node and the pod goes with it. It is the pattern behind log agents, CNI plugins, and node exporters. This article digs into how it pins a pod to each node, why its pods run even on a NotReady node, and how to restrict it to a subset of nodes, tested on two real workers.
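The "one pod per eligible node" rule can be pictured as a set difference. This is a simplified sketch, not the DaemonSet controller's code: the hypothetical `reconcile` helper sees only node labels, the set of nodes already running a daemon pod, and a nodeSelector, and it ignores taints, tolerations, and scheduling entirely.

```python
from typing import Dict, List, Set, Tuple


def reconcile(node_labels: Dict[str, Dict[str, str]],
              pods_on: Set[str],
              node_selector: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (nodes that still need a daemon pod, nodes whose pod should go)."""
    # A node is eligible when it carries every key=value pair of the selector;
    # an empty selector makes every node eligible.
    eligible = {
        name for name, labels in node_labels.items()
        if all(labels.get(k) == v for k, v in node_selector.items())
    }
    to_create = sorted(eligible - pods_on)  # new or newly matching nodes
    to_delete = sorted(pods_on - eligible)  # removed or no-longer-matching nodes
    return to_create, to_delete


# Hypothetical state: w2 just joined, w3 was removed from the cluster.
nodes = {"w1": {}, "w2": {"disk": "ssd"}}
print(reconcile(nodes, {"w1", "w3"}, {}))
print(reconcile(nodes, {"w1"}, {"disk": "ssd"}))
```

Adding a label selector shrinks the eligible set, which is the same effect a `nodeSelector` in the DaemonSet's pod template has.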
Job, CronJob and TTL
Every controller so far runs indefinitely: Deployment, StatefulSet, and DaemonSet keep their pods alive forever. The Job inverts this: it runs a task to completion and then stops, which suits migrations, backups, and batch work. This article closes Part IV with the Job (completions, parallelism, backoffLimit), the CronJob that creates Jobs on a cron schedule, and the TTL mechanism that cleans up finished Jobs automatically, testing each on a real cluster, including catching a CronJob firing exactly on the minute boundary.
Labels, selectors, namespaces and annotations
We've typed -l app=web dozens of times without stopping to ask how it works. This article opens Part V with the toolkit for organizing and querying objects: labels to tag and select (with equality-based and set-based selectors), annotations to attach non-identifying metadata, namespaces to isolate, and field selectors to filter by built-in fields. Each kind of selector is tested for real on a set of labeled pods.
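A minimal sketch of how a label selector is evaluated against one object's labels. Assumptions: the hypothetical `matches` function takes requirements as pre-parsed tuples instead of parsing the `-l` string, and it names the bare-key and `!key` forms `exists` and `!exists` for readability. The semantics follow the Kubernetes docs: all requirements are ANDed, and `!=` and `notin` also match objects that lack the key.

```python
from typing import Dict, Iterable, Tuple

# (key, operator, values); operators cover equality-based (=, ==, !=)
# and set-based (in, notin, exists, !exists) requirements.
Requirement = Tuple[str, str, Tuple[str, ...]]


def matches(labels: Dict[str, str], requirements: Iterable[Requirement]) -> bool:
    """Return True only if the labels satisfy every requirement."""
    for key, op, values in requirements:
        present = key in labels
        value = labels.get(key)
        if op in ("=", "=="):
            ok = present and value == values[0]
        elif op == "!=":
            ok = not present or value != values[0]  # a missing key also matches
        elif op == "in":
            ok = present and value in values
        elif op == "notin":
            ok = not present or value not in values  # a missing key also matches
        elif op == "exists":
            ok = present
        elif op == "!exists":
            ok = not present
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


pods = {"web-1": {"app": "web", "tier": "frontend"}, "db-1": {"app": "db"}}
# Roughly: -l 'app in (web,db),tier!=frontend'
sel = [("app", "in", ("web", "db")), ("tier", "!=", ("frontend",))]
print([name for name, labels in pods.items() if matches(labels, sel)])
```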
Finalizers, ownerReferences and garbage collection
Every time we deleted a Deployment, its ReplicaSet and pods vanished with it; we called that garbage collection without dissecting it. This article digs into the mechanism: ownerReferences link parent and child, the garbage collector cleans up children once the parent is gone (with background, foreground, or orphan propagation), and finalizers block deletion until cleanup is done. All three are tested for real, including an object stuck in Terminating because of a finalizer.
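A toy model of cascading deletion over ownerReferences, assuming a hypothetical `delete` function and a plain map from each object to the names of its owners. It only captures which objects end up deleted: Background and Foreground differ in ordering, not in the final result, so they are treated alike here, and finalizers are not modeled. A dependent that still has a live owner survives, which mirrors how the real garbage collector handles multiple owners.

```python
from typing import Dict, List, Set, Tuple


def delete(owner_refs: Dict[str, List[str]], target: str,
           policy: str = "Background") -> Tuple[Set[str], Dict[str, List[str]]]:
    """Return (deleted objects, remaining ownerReferences) after deleting target."""
    refs = {obj: list(owners) for obj, owners in owner_refs.items()}  # work on a copy
    deleted: Set[str] = set()
    queue = [target]
    while queue:
        obj = queue.pop()
        if obj in deleted:
            continue
        deleted.add(obj)
        refs.pop(obj, None)
        for dep, owners in refs.items():
            if obj in owners:
                owners.remove(obj)  # this reference now dangles, so drop it
                # Orphan keeps dependents; otherwise delete once no owner is left.
                if policy != "Orphan" and not owners:
                    queue.append(dep)
    return deleted, refs


owner_refs = {
    "deploy/web": [],
    "rs/web-1": ["deploy/web"],
    "pod/web-1-a": ["rs/web-1"],
    "pod/web-1-b": ["rs/web-1"],
}
print(sorted(delete(owner_refs, "deploy/web")[0]))            # cascade
print(delete(owner_refs, "deploy/web", policy="Orphan")[1])   # ReplicaSet orphaned
```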
Object management, recommended labels, and storage version
The same Deployment can be created and edited in three ways: typing a command directly, running create -f on a file, or running apply on a whole directory, and mixing them invites bugs. This article closes Part V with those three object-management techniques (plus why apply differs from create -f), the recommended app.kubernetes.io/* label set that lets tools speak the same language, and the storage version, digging into etcd to see which API version an object is actually stored in.
ConfigMap and Secret
Don't bake configuration into the image: move it into a ConfigMap for ordinary data or a Secret for sensitive data, then inject it through environment variables or files. This article opens Part VI with both: four ways to consume them; one key difference (mounted files update automatically after an edit, environment variables don't); and the harsh truth that a Secret is only base64-encoded, not encrypted, unless encryption at rest is turned on, which our cluster did in Article 5. Everything is tested for real, including a look inside etcd.
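Why base64 is not protection, in a few lines of Python: encoding needs no key, so decoding doesn't either. The password below is made up; the encoded string is what would sit under `data:` in a Secret manifest.

```python
import base64

# Hypothetical secret value, encoded the way a Secret's data field stores it.
encoded = base64.b64encode(b"s3cr3t-password").decode("ascii")
print(encoded)

# Anyone who can read the Secret object reverses it without any key.
decoded = base64.b64decode(encoded).decode("ascii")
print(decoded)
```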
Node Allocatable: the resources a pod actually gets
Article 22 looked at requests and limits from the pod side. This one flips to the node side: a 2-vCPU machine doesn't hand both vCPUs to pods. Kubernetes carves off a slice for system daemons, another for Kubernetes daemons, and an eviction threshold as a buffer against running out of memory; what's left is Allocatable, the part the scheduler divides up. We dig into the formula, read Capacity vs Allocatable on a real node, then add a reservation by hand and watch Allocatable drop by exactly that many Ki.
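The formula from the Kubernetes docs is Allocatable = Capacity − kube-reserved − system-reserved − hard eviction threshold. The sketch below applies it to memory in Ki. The node capacity is a hypothetical number; 100Mi is the kubelet's default `memory.available` hard eviction threshold on Linux; the 256Mi `system-reserved` value is an illustrative reservation.

```python
MI = 1024  # 1 Mi = 1024 Ki


def allocatable_ki(capacity_ki: int, kube_reserved_ki: int = 0,
                   system_reserved_ki: int = 0, hard_eviction_ki: int = 0) -> int:
    """Allocatable = Capacity - kube-reserved - system-reserved - hard eviction threshold."""
    return capacity_ki - kube_reserved_ki - system_reserved_ki - hard_eviction_ki


capacity = 8_000_000  # hypothetical node memory capacity, in Ki

before = allocatable_ki(capacity, hard_eviction_ki=100 * MI)
after = allocatable_ki(capacity, system_reserved_ki=256 * MI,
                       hard_eviction_ki=100 * MI)

# The drop equals the reservation exactly: 256 * 1024 Ki.
print(before, after, before - after)
```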
LimitRange and ResourceQuota
When many teams share one cluster, nothing stops team A from creating 10,000 pods or requesting 64Gi of RAM for a single container unless you set rules. LimitRange sets defaults and min/max values per container or pod in a namespace; ResourceQuota caps the total resources and object count of the whole namespace. This article closes Part VI with both, testing for real the injected defaults, the 403 Forbidden for a request over the max, and a fourth pod rejected by the quota.
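A simplified model of the quota check: a new object is admitted only if, for every resource the quota tracks, current usage plus the new request stays within the hard cap. The hypothetical `admit` helper and the quota values (3 pods, 1500 millicores of CPU requests) are assumptions chosen so that the fourth pod is rejected; LimitRange defaulting is not modeled.

```python
from typing import Dict, List


def admit(hard: Dict[str, int], used: Dict[str, int], request: Dict[str, int]) -> bool:
    """Accept only if no tracked resource would exceed its hard cap."""
    for resource, cap in hard.items():
        if used.get(resource, 0) + request.get(resource, 0) > cap:
            return False
    return True


hard = {"pods": 3, "requests.cpu": 1500}  # hypothetical ResourceQuota
used = {"pods": 0, "requests.cpu": 0}
results: List[bool] = []
for _ in range(4):
    req = {"pods": 1, "requests.cpu": 200}  # each pod requests 200m CPU
    ok = admit(hard, used, req)
    results.append(ok)
    if ok:  # only admitted pods count toward usage
        for resource, qty in req.items():
            used[resource] += qty
print(results)  # [True, True, True, False]
```

Here the pod count trips the cap first; with a smaller CPU cap, `requests.cpu` would be the resource that blocks.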
The scheduler and the scheduling framework
For every pod we create, something quietly picks a node: kube-scheduler, which we stood up in Article 8 but never examined closely. This article opens Part VII by digging into exactly how it picks: filter out nodes that don't fit, score the remaining ones, then bind. We test for real a pod stuck Pending because no node has room, a pod that gets a node, and scoring that piles pods onto the less-loaded node rather than spreading them round-robin.
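The filter, score, bind pipeline can be sketched for a single resource. This hypothetical `schedule` function considers CPU requests only and uses a least-allocated style score (more CPU left free scores higher), loosely modeled on the default NodeResourcesFit behavior; the real scheduler runs many plugins and weights their scores. The node sizes are made up.

```python
from typing import Dict, List, Optional

Node = Dict[str, int]  # {"allocatable": millicores, "requested": millicores}


def schedule(pod_cpu_m: int, nodes: Dict[str, Node]) -> Optional[str]:
    """Return the chosen node name, or None if the pod must stay Pending."""
    # 1. Filter: keep only nodes with enough unrequested CPU for the pod.
    feasible = {name: n for name, n in nodes.items()
                if n["allocatable"] - n["requested"] >= pod_cpu_m}
    if not feasible:
        return None

    # 2. Score: 0-100, higher when more CPU stays free after placement.
    def score(n: Node) -> int:
        return (n["allocatable"] - n["requested"] - pod_cpu_m) * 100 // n["allocatable"]

    # 3. Bind: take the best score. Ties go to the alphabetically first node
    #    here; the real scheduler picks randomly among tied nodes.
    return max(sorted(feasible), key=lambda name: score(feasible[name]))


nodes = {"w1": {"allocatable": 2000, "requested": 1500},
         "w2": {"allocatable": 2000, "requested": 200}}
placements: List[Optional[str]] = []
for _ in range(3):
    chosen = schedule(300, nodes)
    if chosen:
        nodes[chosen]["requested"] += 300  # "bind" updates the node's usage
    placements.append(chosen)
print(placements)
```

All three pods land on `w2` because it still scores higher after each placement, which is why the result is not round-robin.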
Affinity, taints, and tolerations
The scheduler picks a node on its own, but often you need to intervene: this pod must run on an SSD node, two replicas shouldn't share a machine, that node is reserved for one team. This article digs into three tools for steering the scheduler: nodeAffinity (pulls a pod toward labeled nodes), podAntiAffinity (pushes pods apart), and taints with tolerations (a node repels pods that don't tolerate its taint). Tested for real: a pod stuck Pending on affinity, a third replica with nowhere to go, and a pod evicted by a NoExecute taint.
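The matching rule between a toleration and a taint is small enough to write out. A minimal sketch following the rules in the Kubernetes taints and tolerations docs: effects must match unless the toleration's effect is empty, `Exists` ignores the value (and an empty key with `Exists` tolerates everything), and `Equal`, the default operator, needs key and value to match. The `Taint`/`Toleration` classes and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""  # empty matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key  # empty key: tolerate all
    return tol.key == taint.key and tol.value == taint.value


def blocking_taints(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """Hard taints (not PreferNoSchedule) that no toleration covers."""
    return [t for t in taints
            if t.effect != "PreferNoSchedule"
            and not any(tolerates(tol, t) for tol in tolerations)]


taint = Taint("dedicated", "team-a", "NoSchedule")
print(tolerates(Toleration("dedicated", "Equal", "team-a", "NoSchedule"), taint))
print(blocking_taints([taint], [Toleration("dedicated", "Equal", "team-b")]))
```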