eBPF From Scratch
Learn eBPF from the ground up, all the way to writing real programs: the eBPF virtual machine, the verifier, maps, and the hooks (XDP/tc/kprobe/tracepoint/LSM); tracing with bpftrace; writing programs in C with libbpf + CO-RE, then loading them from Go (cilium/ebpf); networking, observability, and security. A real Kubernetes cluster (kernel 6.17, Cilium 1.19 running kube-proxy-less on eBPF, with hundreds of BPF programs loaded) is the lab throughout. Everything is tested on real hardware and grounded in official docs (ebpf.io, kernel.org, libbpf, cilium). Source at github.com/nghiadaulau/ebpf-from-scratch.
tc/sched_cls and Dissecting a Live Cilium Datapath
After XDP comes tc — the hook where the packet already has an sk_buff, where both ingress and egress are visible, and where Cilium puts almost its entire datapath. This article dissects the 74 sched_cls programs actually running on a cluster node: where they attach (NIC, each pod), how they call each other via tail calls, and which BPF maps they look up to load-balance a Service or apply a NetworkPolicy. kube-proxy-less load balancing is one map lookup.
Writing a tc Program Yourself: __sk_buff and the tcx Chain
Article 12 read Cilium's tc datapath from the outside. This article writes a tc program of our own, counting egress packets by protocol, to understand __sk_buff from the inside. The core difference from XDP: tc sees the sk_buff with metadata already filled in (skb->protocol, skb->len), not the raw packet. We attach it with tcx on a real interface and get correct counts, then hit a lesson: attached after Cilium on the NIC, it never runs, because of how the tcx chain terminates.
LSM BPF: Enforcing Security Right Inside the Kernel
So far our eBPF has only observed. LSM BPF enforces: it attaches to the kernel's security hooks (Linux Security Modules) that SELinux and AppArmor use, and a program returns 0 to allow or -EPERM to block. This article writes an LSM program that blocks opening a file, and hits a lesson: it loaded and attached but blocked nothing — because bpf wasn't an active LSM. After enabling bpf via a boot parameter and rebooting, it blocks for real — both cat and python get Operation not permitted.
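The "loaded and attached but blocked nothing" trap can be checked before writing any BPF code. The kernel lists its active LSMs, comma-separated, in /sys/kernel/security/lsm, and the list is normally set with the lsm= boot parameter. The sketch below is a plain userspace C helper, not the article's code; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `name` is a whole entry in a comma-separated LSM list
 * such as "lockdown,capability,landlock,yama,apparmor,bpf".
 * (Hypothetical helper name, for illustration only.) */
bool lsm_list_contains(const char *list, const char *name)
{
    size_t want = strlen(name);
    const char *p = list;

    while (*p) {
        /* Length of the current entry: up to the next ',' or newline. */
        size_t len = strcspn(p, ",\n");
        if (len == want && strncmp(p, name, len) == 0)
            return true;
        p += len;
        if (*p)          /* step over the separator */
            p++;
    }
    return false;
}

/* Read the kernel's active LSM list and report whether "bpf" is in it.
 * Returns 1 if active, 0 if not, -1 if the file cannot be read
 * (for example when securityfs is not mounted). */
int bpf_lsm_active(void)
{
    char buf[256] = {0};
    FILE *f = fopen("/sys/kernel/security/lsm", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return lsm_list_contains(buf, "bpf") ? 1 : 0;
}
```

If bpf_lsm_active() returns 0, an LSM program can still load and attach without ever being called, which matches the symptom described above.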
seccomp-bpf: Classic BPF Filtering Syscalls in Every Container
Before eBPF there was cBPF — classic BPF, the thing tcpdump uses. And it's still running: seccomp-bpf filters syscalls with cBPF, the foundational sandbox layer of containers. This article distinguishes cBPF from eBPF, inspects real seccomp on the cluster (pause containers and CSI sidecars restricted, privileged pods not), then writes a cBPF filter that blocks mkdir with EPERM — eight instructions on struct seccomp_data, installed with prctl, blocking for real while printf still runs.
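As a rough illustration of what such a filter looks like, here is a minimal sketch that happens to come out at eight cBPF instructions. It is not necessarily the article's filter. It assumes x86-64, denies both mkdir and mkdirat (an assumption, since libc may use either), and lets foreign-ABI calls through for brevity, where a production filter would usually kill instead. The function names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* cBPF program evaluated against struct seccomp_data on every syscall.
 * Jump offsets are relative to the NEXT instruction. */
static struct sock_filter mkdir_filter[] = {
    /* [0] A = seccomp_data.arch */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] x86-64? yes: go to [3]; no: fall through to [2] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    /* [2] other ABI: allow (simplification for this sketch) */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* [3] A = seccomp_data.nr (the syscall number) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [4] mkdir? yes: go to [7]; no: fall through to [5] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdir, 2, 0),
    /* [5] mkdirat? yes: go to [7]; no: fall through to [6] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mkdirat, 1, 0),
    /* [6] everything else runs normally */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* [7] fail the syscall with errno = EPERM */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
};

/* Number of instructions in the filter above. */
unsigned short mkdir_filter_len(void)
{
    return (unsigned short)(sizeof(mkdir_filter) / sizeof(mkdir_filter[0]));
}

/* Install the filter on the calling thread. Returns 0 on success, -1 on
 * error. The filter is inherited across fork/exec and cannot be removed. */
int install_mkdir_filter(void)
{
    struct sock_fprog prog = {
        .len = mkdir_filter_len(),
        .filter = mkdir_filter,
    };

    /* Required for unprivileged processes before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After install_mkdir_filter() succeeds, a later mkdir("d", 0755) in the same process should return -1 with errno set to EPERM, while calls such as printf, which end in write, are unaffected.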
The Tetragon Way: From Observe to Enforce with bpf_send_signal
Tetragon is the Cilium ecosystem's runtime security tool: it observes with kprobe/tracepoint (the very hooks Part II used) and then enforces inside the kernel. Its enforcement uses two helpers — bpf_send_signal sends SIGKILL to kill a process, and bpf_override_return overrides a syscall's return value. This article rebuilds that: an exec tracepoint calls bpf_send_signal(SIGKILL) the moment a process runs — a forbidden binary gets exit 137, a normal binary still runs. No LSM, no reboot.
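The exit code 137 is simply 128 + 9: the shell's convention for a process terminated by signal 9 (SIGKILL). The sketch below reproduces that observable result from userspace, with an ordinary kill() standing in for bpf_send_signal. It is an analogy for reading the exit status, not the BPF program itself, and the function names are hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a wait status to the code a shell reports in $?:
 * normal exit -> the exit code; killed by signal N -> 128 + N. */
int shell_exit_code(int wstatus)
{
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);
    return -1;
}

/* Fork a child that just blocks, SIGKILL it from the parent, and return
 * the exit code a shell would have shown for it. Returns -1 on error. */
int kill_child_and_report(void)
{
    int wstatus = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child: wait until a signal arrives */
    }
    if (kill(pid, SIGKILL) != 0)    /* stand-in for bpf_send_signal(SIGKILL) */
        return -1;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return shell_exit_code(wstatus);
}
```

One difference from the kernel-side version is worth keeping in mind: bpf_send_signal targets the task that triggered the hook, so no PID lookup is involved.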
CPU Profiling with perf_event: Sampling Stacks, the Foundation of Flame Graphs
To know what the CPU is busy doing, we sample: a few dozen times per second, freeze each CPU and record the stack that's running. eBPF does this through the perf_event program type: attached to a kernel sampling counter, the program captures the running stack each time the counter fires and aggregates the results in the kernel. This article profiles a real node at 99Hz, sees dd eating CPU reading /dev/zero while idle cores sit in the idle loop, and aggregates by process to get dd at 479 samples, the data a flame graph draws.
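The sampling counter itself is an ordinary perf event. As a hedged sketch of the userspace side, the code below fills in a perf_event_attr for frequency-based sampling and opens one event per CPU with the perf_event_open syscall. Choosing the software CPU clock as the source is an assumption (a hardware cycle counter is another common choice), and the function names are hypothetical. Attaching the BPF program to the returned file descriptor is left out.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Describe a sampling event that fires `hz` times per second. */
void init_cpu_sampler(struct perf_event_attr *attr, unsigned long long hz)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->freq = 1;             /* treat sample_freq as Hz, not a period */
    attr->sample_freq = hz;
}

/* Open the sampler on one CPU for all processes (pid = -1, cpu = N).
 * Returns a perf event fd, or -1 with errno set; this usually needs
 * elevated privileges or a permissive perf_event_paranoid setting. */
int open_cpu_sampler(int cpu, unsigned long long hz)
{
    struct perf_event_attr attr;

    init_cpu_sampler(&attr, hz);
    return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0UL);
}
```

A profiler would call open_cpu_sampler(cpu, 99) once per online CPU. An odd rate such as 99 is a common choice because it avoids sampling in lockstep with periodic kernel activity.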
Off-CPU and Scheduler Latency: Measuring the Time a Process Is NOT Running
On-CPU profiling (Article 17) only sees the CPU when busy. But most latency an app feels is time it is NOT running: waiting for disk, a lock, or its CPU turn. eBPF measures that off-CPU interval via scheduler tracepoints. On a real node we measure two things: run-queue latency — from wakeup to actually running, exposing the 16-32ms tail under CPU contention; and off-CPU time — how long a task stays off the CPU each time, with a tail reaching several seconds for blocked tasks.
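A range such as 16-32ms is what a power-of-two histogram produces: each latency is a timestamp difference, and bucket i covers [2^i, 2^(i+1)). The sketch below shows that arithmetic in plain C. It assumes nanosecond timestamps taken at wakeup and at switch-in (for example from the sched_wakeup and sched_switch tracepoints, which is an assumption about which tracepoints are used), and the function names are hypothetical.

```c
#include <stdint.h>

/* Run-queue latency in milliseconds: time between the task being woken
 * and the task actually getting the CPU. Inputs are nanoseconds. */
uint64_t runq_latency_ms(uint64_t wakeup_ns, uint64_t switch_in_ns)
{
    return (switch_in_ns - wakeup_ns) / 1000000ULL;
}

/* Index of the power-of-two bucket holding v, i.e. floor(log2(v)).
 * Values 0 and 1 both land in bucket 0. */
unsigned log2_bucket(uint64_t v)
{
    unsigned i = 0;

    while (v > 1) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Inclusive lower bound of bucket i. */
uint64_t bucket_low(unsigned i)
{
    return (uint64_t)1 << i;
}

/* Exclusive upper bound of bucket i (valid for i < 63). */
uint64_t bucket_high(unsigned i)
{
    return (uint64_t)1 << (i + 1);
}
```

A task woken at t = 1.000 s that starts running at t = 1.020 s has a latency of 20 ms, which falls in bucket 4, covering 16 ms up to but not including 32 ms.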
Inside Hubble: From eBPF Events to Cluster-Wide Network Flows
Hubble lets us see every connection in a Kubernetes cluster by pod name, service, and policy verdict — without a sidecar in any pod. This article dissects the mechanism: Cilium's eBPF datapath (the 74 sched_cls programs from Article 12) calls bpf_perf_event_output to push events into a perf ring buffer named cilium_events; cilium-agent reads them out with numeric identities; then Hubble enriches them — turning identity 18203 into kube-system/coredns — via the identity-to-label store.
Case Study: A Packet Through Cilium's eBPF Datapath
Nineteen articles dissected each piece: verifier, maps, XDP, tc, tail call, perf ring, identity. This article assembles them into one seamless story — following a single packet as a pod calls the cluster's DNS Service, from leaving the source pod to reaching the CoreDNS pod, through every eBPF program and every BPF map it touches. No new concepts; just seeing the whole machine run as one unified thing, with real data from the same cluster used throughout the series.
Capstone: Writing connmon — A Node-Wide TCP Connection Monitor
The final article: assembling everything learned into a real tool. connmon attaches a kprobe to tcp_connect in the kernel, pushes every new TCP connection through a ring buffer, and a Go loader prints them in real time — pid, process, destination IP:port. Just over a hundred lines, a single static binary, run it on the cluster and immediately see coredns, kubelet, curl connecting out. Includes a real kprobe build trap. Then a look back over the whole eBPF journey from scratch.