73-part series

Kubernetes From Scratch

Build a complete Kubernetes cluster by hand — no kubeadm, no scripts — from the first certificate to a real HA cluster, then use that cluster as a lab to deep-dive every Kubernetes concept. Part one: PKI/TLS, etcd, the control plane, workers, pod networking, CoreDNS. Part two: Pods, workload controllers, scheduling, storage, advanced networking (Cilium eBPF), security, extending the API, operations. Each component is both explained from the inside and stood up/configured by hand. Tested for real on AWS EC2 with Kubernetes v1.36; manifests/scripts at github.com/nghiadaulau/kubernetes-from-scratch. Grounded in the official docs at kubernetes.io.

61

API Aggregation: Bolting On a Second API Server

A CRD adds a new kind that the main API server itself stores in etcd. API aggregation goes further: it bolts a second API server onto the main one. The main server proxies every request for that API group to it, and the second server stores or computes the data on its own terms. Our cluster has been running one example since Article 39: metrics-server. This article examines it as an aggregated API: how the APIService registers it, where requests get proxied, and why the CPU/memory figures it returns never sit in etcd.

Kai · 4 min read · DevOps, Kubernetes
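The routing decision at the heart of aggregation can be modeled in a few lines. This is a simplified, hypothetical sketch, not kube-aggregator's code: each entry maps a group/version either to a backing Service (proxied) or to nothing (served locally by kube-apiserver). `APIServiceEntry`, `REGISTRY` and `route` are illustrative names, and metrics-server is assumed to run as kube-system/metrics-server.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class APIServiceEntry:
    """Hypothetical stand-in for an APIService object (group/version -> backend)."""
    group: str
    version: str
    # (namespace, service name) of the backing server; None means "served locally".
    service: Optional[Tuple[str, str]]


# Illustrative registry: metrics-server assumed to run as kube-system/metrics-server.
REGISTRY: List[APIServiceEntry] = [
    APIServiceEntry("metrics.k8s.io", "v1beta1", ("kube-system", "metrics-server")),
    APIServiceEntry("apps", "v1", None),
]


def route(path: str, registry: List[APIServiceEntry]) -> str:
    """Return 'local' or 'proxy:<namespace>/<name>' for an API request path."""
    parts = path.strip("/").split("/")
    # Named API groups live under /apis/<group>/<version>/...
    if len(parts) >= 3 and parts[0] == "apis":
        group, version = parts[1], parts[2]
        for entry in registry:
            if entry.group == group and entry.version == version and entry.service:
                namespace, name = entry.service
                return f"proxy:{namespace}/{name}"
    # Core API (/api/v1), local groups, and unknown paths are handled by kube-apiserver itself.
    return "local"


print(route("/apis/metrics.k8s.io/v1beta1/nodes", REGISTRY))
```

Because the proxied server answers the request itself, nothing it returns has to pass through etcd, which is why metrics-server's figures are computed on demand rather than stored.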
62

Device Plugins and Extended Resources

Pods can request CPU and memory, but what about a GPU, a high-speed NIC, or an FPGA? A device plugin is how a node advertises hardware beyond CPU/memory as an extended resource, which pods can request and the scheduler can divide up. This article stands up a real device plugin, captures the full gRPC flow it uses to register with kubelet and advertise devices, watches kubelet call Allocate when a pod runs, then drives the underlying extended-resource mechanism directly to watch the scheduler account for it the same way it accounts for CPU.

Kai · 7 min read · DevOps, Kubernetes
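The scheduler-side accounting for an extended resource can be sketched as a simple fit check. This is a hypothetical model, not kube-scheduler's code: quantities are whole numbers, and because extended resources cannot be overcommitted, a pod fits only if existing requests plus its own stay within the node's allocatable count. The resource name `example.com/dongle` is illustrative.

```python
from typing import Dict, List

Resources = Dict[str, int]  # resource name -> integer quantity


def fits(allocatable: Resources, pods_on_node: List[Resources], request: Resources) -> bool:
    """Sum of existing requests plus the new one must not exceed allocatable."""
    for name, wanted in request.items():
        if not isinstance(wanted, int) or wanted < 0:
            raise ValueError(f"{name}: extended resources are whole, non-negative counts")
        used = sum(pod.get(name, 0) for pod in pods_on_node)
        # No overcommit: a node that does not advertise the resource has allocatable 0.
        if used + wanted > allocatable.get(name, 0):
            return False
    return True


# Hypothetical node advertising 4 units of example.com/dongle via a device plugin.
node = {"example.com/dongle": 4}
running = [{"example.com/dongle": 2}, {"example.com/dongle": 1}]
print(fits(node, running, {"example.com/dongle": 1}))  # 3 + 1 <= 4
print(fits(node, running, {"example.com/dongle": 2}))  # 3 + 2 > 4
```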
63

Backing Up etcd and Rotating Certificates

etcd holds all of the cluster's state; lose it and you lose the cluster. Part XIII opens with exactly that worst case. We take an etcd snapshot, verify that it's valid, and restore it into a fresh data directory to prove the snapshot is usable, all without touching the running etcd. Then we inspect the expiry dates of the certificate set built in Article 4 and discuss rotating those certificates before they expire.

Kai · 5 min read · DevOps, Kubernetes
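Checking expiry across many certificates is easy to script. A minimal sketch, assuming you feed it the line printed by `openssl x509 -noout -enddate -in <cert>` (for example `notAfter=Jun 10 12:00:00 2027 GMT`); `days_until_expiry`, `needs_rotation` and the 30-day threshold are hypothetical choices, not part of the series' tooling.

```python
import ssl
import time


def days_until_expiry(enddate_line: str, now: float) -> float:
    """Days left on a certificate, given `openssl x509 -noout -enddate` output."""
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {line!r}")
    # ssl.cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" form openssl prints.
    expires_at = ssl.cert_time_to_seconds(line[len(prefix):])
    return (expires_at - now) / 86400


def needs_rotation(enddate_line: str, threshold_days: float = 30.0) -> bool:
    """Hypothetical policy: flag a cert for rotation within its last `threshold_days`."""
    return days_until_expiry(enddate_line, time.time()) < threshold_days
```

Passing `now` explicitly keeps `days_until_expiry` deterministic and testable; `needs_rotation` is the convenience wrapper that uses the current clock.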
64

Upgrades and Version Skew

Upgrading Kubernetes isn't a single sweep; it follows an order, because components may run different versions only within strict limits: kubelet may be up to three minor versions older than the apiserver, but never newer. This article inspects the cluster's version skew, explains why the apiserver must be upgraded first, then rehearses the hardest part of upgrading a node (cordon, drain, uncordon) on the real worker-0, in a fully reversible way.

Kai · 5 min read · DevOps, Kubernetes
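The kubelet skew rule, and why it forces the apiserver to move first, fits in a few lines. A minimal sketch that compares only major/minor versions; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def minor_of(version: str) -> Tuple[int, int]:
    """'v1.36.2' -> (1, 36)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def kubelet_skew_ok(apiserver: str, kubelet: str) -> bool:
    """kubelet may trail kube-apiserver by up to 3 minor versions, but never lead it."""
    a_major, a_minor = minor_of(apiserver)
    k_major, k_minor = minor_of(kubelet)
    return a_major == k_major and 0 <= a_minor - k_minor <= 3


def apiserver_upgrade_ok(target: str, kubelets: Iterable[str]) -> bool:
    """Can kube-apiserver move to `target` without leaving any kubelet out of skew?"""
    return all(kubelet_skew_ok(target, k) for k in kubelets)
```

Upgrading a kubelet before the apiserver would make `a_minor - k_minor` negative, which the rule rejects; upgrading the apiserver first keeps every kubelet within the allowed window as long as none is already three minors behind.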
65

GC, cgroup v2, Swap, and Graceful Node Shutdown

Kubelet does a lot of node-level work that we rarely look at while things run fine. This article inspects four of those jobs on a real worker: garbage-collecting old images when the disk fills up, placing each pod in the right cgroup v2 branch and enforcing its limits via memory.max/cpu.max, why swap is blocked by default, and graceful node shutdown, which decides whether pods get yanked or shut down cleanly when a node powers off.

Kai · 4 min read · DevOps, Kubernetes
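The arithmetic that turns a container's limits into cgroup v2 files is worth seeing once. A minimal sketch, assuming the default 100 ms CFS period and a 1 ms minimum quota for tiny limits (our understanding of kubelet's behavior); the actual writes are done by the runtime, and this only reproduces the values you would read back from cpu.max and memory.max on the node.

```python
from typing import Optional

CFS_PERIOD_US = 100_000  # default CFS period: 100 ms
MIN_QUOTA_US = 1_000     # assumed floor for very small limits (1 ms)


def cpu_max(limit_millicores: Optional[int]) -> str:
    """Render a CPU limit as a cgroup v2 cpu.max value: '<quota> <period>'."""
    if limit_millicores is None:
        return f"max {CFS_PERIOD_US}"  # no limit: unlimited quota
    quota = limit_millicores * CFS_PERIOD_US // 1000
    return f"{max(quota, MIN_QUOTA_US)} {CFS_PERIOD_US}"


def memory_max(limit_bytes: Optional[int]) -> str:
    """Render a memory limit as a cgroup v2 memory.max value (bytes, or 'max')."""
    return "max" if limit_bytes is None else str(limit_bytes)


print(cpu_max(500), "|", memory_max(256 * 1024 * 1024))
```

A 500m limit therefore means the container may run for 50 ms out of every 100 ms period before being throttled.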
66

Logging Architecture

kubectl logs sounds simple, but behind it is a chain: the container runtime writes the container's stdout/stderr to a file on the node, kubelet reads that file back, and kubelet rotates it once it grows past a size limit. This article traces a real log line from kubectl down to the file on the worker's disk, examines the CRI log format and the symlink, then separates the two kinds of logs in a self-built cluster (container logs, and system-component logs in journald) and explains why the cluster needs an agent to collect them.

Kai · 4 min read · DevOps, Kubernetes
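Each line in a CRI log file has the shape `<timestamp> <stream> <tag> <message>`, where the tag `F` marks a full line and `P` a partial chunk of a longer line that the runtime split. A minimal parser sketch, assuming a single-letter tag; `parse_cri_line` and `reassemble` are hypothetical helpers, roughly what a log agent has to do before shipping lines.

```python
from typing import Dict, Iterable, List, NamedTuple


class CRILogLine(NamedTuple):
    timestamp: str  # RFC 3339 with nanoseconds
    stream: str     # "stdout" or "stderr"
    tag: str        # "F" = full line, "P" = partial chunk of a longer line
    message: str


def parse_cri_line(raw: str) -> CRILogLine:
    """Split one CRI log line: '<timestamp> <stream> <tag> <message>'."""
    timestamp, stream, tag, message = raw.rstrip("\n").split(" ", 3)
    return CRILogLine(timestamp, stream, tag, message)


def reassemble(raw_lines: Iterable[str]) -> List[str]:
    """Join partial ('P') chunks per stream until a full ('F') chunk ends the line."""
    pending: Dict[str, str] = {}
    complete: List[str] = []
    for raw in raw_lines:
        rec = parse_cri_line(raw)
        pending[rec.stream] = pending.get(rec.stream, "") + rec.message
        if rec.tag == "F":
            complete.append(pending.pop(rec.stream))
    return complete
```

Buffering per stream matters because stdout and stderr chunks can interleave in the same file.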
67

Metrics, Traces, and API Priority and Fairness

Logs are for discrete events; metrics are for continuous measurements. This article inspects the Prometheus-format /metrics endpoints that the apiserver and kubelet expose, then digs into API Priority and Fairness, the apiserver's mechanism for dividing its request concurrency among priority levels so that one misbehaving client can't starve the rest. We look at the built-in FlowSchemas and PriorityLevelConfigurations, and at each level's live state via the debug endpoint.

Kai · 5 min read · DevOps, Kubernetes
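How APF turns shares into concurrency can be shown with the nominal-limit formula: each non-exempt level gets ceil(server limit × its shares / sum of all shares). A minimal sketch that ignores borrowing and lending between levels; the 600-seat server limit assumes the default --max-requests-inflight (400) plus --max-mutating-requests-inflight (200), and the level names and share values are made up.

```python
import math
from typing import Dict


def nominal_concurrency(server_limit: int, shares: Dict[str, int]) -> Dict[str, int]:
    """NominalCL(level) = ceil(server_limit * shares(level) / sum of all shares)."""
    total = sum(shares.values())
    return {level: math.ceil(server_limit * s / total) for level, s in shares.items()}


# Hypothetical share values; real ones come from each PriorityLevelConfiguration.
print(nominal_concurrency(400 + 200, {"high": 30, "default": 20, "low": 10}))
```

Because each level has its own seat budget, a flood of requests classified into one level queues or is rejected there instead of consuming the seats of the others.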
68

Leader Election, Addons, and Node Autoscaling

The cluster runs three control-plane nodes, but controller-manager and scheduler keep only one instance active at a time; if all three acted, they'd step on each other. This article looks at the mechanism that prevents that, leader election via a Lease, and proves real failover by taking the leader down and watching another instance win the lock. It then closes Part XIII with addons and node autoscaling: adding and removing machines according to load.

Kai · 5 min read · DevOps, Autoscaling
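The core decision each candidate makes against the Lease can be sketched as "take it if it's free, already mine, or the holder stopped renewing". This is a simplified model, not client-go's implementation: the real elector writes the Lease with an optimistic-concurrency update so two candidates can't both win, and roughly speaking measures expiry from when it last observed the record change. `Lease` and `try_acquire_or_renew` are hypothetical; 15 s is kube-controller-manager's default lease duration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Minimal stand-in for a coordination.k8s.io Lease's election fields."""
    holder: Optional[str]
    renew_time: float  # seconds, when the holder last renewed
    duration_s: float = 15.0  # kube-controller-manager's default lease duration


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Take the lock if it is free, already ours, or the holder stopped renewing."""
    expired = now > lease.renew_time + lease.duration_s
    if lease.holder is None or lease.holder == candidate or expired:
        lease.holder = candidate
        lease.renew_time = now
        return True
    return False
```

Taking the leader down simply stops its renewals; once the lease duration passes without one, the next candidate's attempt succeeds.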
69

Admission Policy with CEL

Article 58 built an admission webhook: a separate HTTPS service, with a certificate and a server to keep alive. From v1.36, most of that need can be met without any server: ValidatingAdmissionPolicy and MutatingAdmissionPolicy express rules in CEL that the API server evaluates itself. This article opens Part XIV (features that just graduated in v1.36) by blocking :latest images and auto-injecting a pod label entirely with policy objects, without a single line of server code.

Kai · 5 min read · DevOps, Security
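To make the rule's semantics concrete, here is a Python mirror of a CEL validation such as `object.spec.containers.all(c, !c.image.endsWith(':latest'))`. Both the expression and the helper names are illustrative, not the article's exact policy; like the expression, the sketch only catches an explicit `:latest` tag, not untagged images that the runtime would resolve to latest.

```python
from typing import Any, Dict, List

# Python mirror of an illustrative CEL validation:
#   object.spec.containers.all(c, !c.image.endsWith(':latest'))


def latest_tag_violations(pod: Dict[str, Any]) -> List[str]:
    """Names of spec.containers whose image ends in ':latest'."""
    containers = pod.get("spec", {}).get("containers", [])
    return [c["name"] for c in containers if c.get("image", "").endswith(":latest")]


def admit(pod: Dict[str, Any]) -> bool:
    """Allow the pod only if no container uses an explicit ':latest' tag."""
    return not latest_tag_violations(pod)
```

In the real policy the API server evaluates the CEL expression in-process, so there is no webhook round trip and no certificate to manage.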
70

In-place Pod Resize

Throughout the series, changing a container's resources meant recreating the pod. In-place pod resize removes that requirement: you adjust a running pod's CPU/memory through the resize subresource, without a restart. This article resizes a real pod, then inspects cgroup v2 on the node changing in place while restartCount stays at 0 (the no-disruption counterpart to Article 40's vertical scaling), and explains two constraints: the QoS class can't change, and memory needs its own resizePolicy.

Kai · 4 min read · DevOps, Kubernetes
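The QoS constraint is easiest to see by computing the class before and after a resize. A simplified sketch over app containers only, with CPU in millicores and memory in bytes (or any consistent unit); `qos_class` and `resize_keeps_qos` are hypothetical helpers, not kubelet code.

```python
from typing import Dict, List

Container = Dict[str, Dict[str, int]]  # {"requests": {...}, "limits": {...}}
TRACKED = ("cpu", "memory")


def qos_class(containers: List[Container]) -> str:
    """Simplified QoS rules over app containers."""
    any_set = False
    guaranteed = True
    for c in containers:
        req, lim = c.get("requests", {}), c.get("limits", {})
        for r in TRACKED:
            if r in req or r in lim:
                any_set = True
            # Guaranteed needs a limit for both, and the request (defaulting to the limit) must equal it.
            if r not in lim or req.get(r, lim[r]) != lim[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


def resize_keeps_qos(old: List[Container], new: List[Container]) -> bool:
    """An in-place resize is only acceptable if the pod's QoS class stays the same."""
    return qos_class(old) == qos_class(new)
```

Raising a Guaranteed pod's CPU request and limit together keeps it Guaranteed; raising only the limit would turn it Burstable, which is the kind of resize the API refuses.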
71

The New Storage of v1.36

Part IX set up PV/PVC, StorageClass, the EBS CSI driver, and snapshots. v1.36 adds three storage features that just went stable, and a from-scratch cluster can try two of them right away: mounting an OCI image's content as a volume, and changing the IOPS/throughput of an in-use EBS volume without recreating it, watching the change apply straight to AWS. The third, VolumeGroupSnapshot, teaches a different lesson: a feature being GA in Kubernetes doesn't mean every CSI driver supports it.

Kai · 5 min read · DevOps, Storage
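The two features a from-scratch cluster can try map to small API objects. A sketch that builds them as Python dicts and prints JSON (kubectl accepts JSON manifests): a Pod with an `image` volume, a VolumeAttributesClass for the EBS CSI driver, and a merge patch that points a PVC at it. The pod name, image reference and class name are hypothetical, and the `iops`/`throughput` parameter keys are our assumption about what the EBS driver accepts; check the driver's docs before applying.

```python
import json
from typing import Any, Dict


def image_volume_pod(name: str, image_ref: str) -> Dict[str, Any]:
    """Pod that mounts an OCI image's content (read-only) at /data via an `image` volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": "busybox:1.36",
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "content", "mountPath": "/data"}],
            }],
            "volumes": [{"name": "content",
                         "image": {"reference": image_ref, "pullPolicy": "IfNotPresent"}}],
        },
    }


def ebs_volume_attributes_class(name: str, iops: int, throughput_mib: int) -> Dict[str, Any]:
    """VolumeAttributesClass for the EBS CSI driver; parameter keys are assumed, values are strings."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "VolumeAttributesClass",
        "metadata": {"name": name},
        "driverName": "ebs.csi.aws.com",
        "parameters": {"iops": str(iops), "throughput": str(throughput_mib)},
    }


def pvc_patch(vac_name: str) -> Dict[str, Any]:
    """Merge patch that points an existing PVC at a different VolumeAttributesClass."""
    return {"spec": {"volumeAttributesClassName": vac_name}}


print(json.dumps(ebs_volume_attributes_class("gp3-fast", 6000, 250), indent=2))
```

The patch would be applied with something like `kubectl patch pvc <name> --type merge -p '<json>'`; the CSI driver then modifies the existing EBS volume rather than provisioning a new one.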
72

Node Log Query and Fine-grained Kubelet Authorization

Article 66 viewed system logs by SSHing into each node to run journalctl. v1.36 lets you query those logs straight through the kubelet API, with no SSH. It comes with a security change: kubelet API access, previously lumped under nodes/proxy, is now split per endpoint, so you can grant a monitoring agent exactly nodes/metrics without handing over logs or exec. This is the final article of Part XIV, and both features touch components we built in Part I.

Kai · 4 min read · DevOps, Security
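A node log query is just an HTTP GET that the apiserver proxies to the node's kubelet, for example `kubectl get --raw "/api/v1/nodes/<node>/proxy/logs/?query=kubelet"`. A minimal sketch of building that path; `node_log_query_path` is a hypothetical helper, the `tailLines` and `pattern` parameters follow the upstream docs as we understand them, and the kubelet must have the feature enabled (per those docs, via `enableSystemLogHandler` and `enableSystemLogQuery`).

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def node_log_query_path(node: str, query: str, tail_lines: Optional[int] = None,
                        pattern: Optional[str] = None) -> str:
    """Path for `kubectl get --raw` that asks a node's kubelet for system logs."""
    params: Dict[str, str] = {"query": query}
    if tail_lines is not None:
        params["tailLines"] = str(tail_lines)
    if pattern is not None:
        params["pattern"] = pattern  # regular expression filter
    # The apiserver proxies /api/v1/nodes/<node>/proxy/... to that node's kubelet.
    return f"/api/v1/nodes/{quote(node, safe='')}/proxy/logs/?{urlencode(params)}"


print(node_log_query_path("worker-0", "kubelet", tail_lines=20))
```

URL-encoding the parameters matters once `pattern` holds a regular expression, since characters like `*` and `|` would otherwise corrupt the query string.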