eBPF From Scratch: Running Programs Inside the Linux Kernel

Kai · 7 min read

In the "Kubernetes From Zero" series we used Cilium — it replaced kube-proxy, routed Services, applied NetworkPolicy — and we kept saying "Cilium does that with eBPF." But what eBPF is, and how an eBPF program actually runs in the kernel, was left open. This series fills that gap: learn eBPF from the foundations up to writing your own programs, using the very Kubernetes cluster we built (kernel 6.17, Cilium 1.19) as the lab.

Start with a concrete fact. On worker-0 of the cluster, right now:

ssh worker-0 'sudo bpftool prog show | grep -c "^[0-9]*:"'
140      # 140 eBPF programs loaded in the kernel
ssh worker-0 'sudo bpftool map show | grep -c "^[0-9]*:"'
56       # 56 BPF maps holding state

140 programs written by someone else (Cilium for the most part; the cgroup device filters typically come from the container runtime or systemd) are running inside the Linux kernel on that machine: handling every packet in and out of pods, controlling cgroup device access, collecting metrics. They were loaded into the kernel at runtime, without anyone recompiling the kernel or loading a module. That is eBPF.

What eBPF is

eBPF lets you run sandboxed programs inside the operating system — specifically inside the Linux kernel — "without changing kernel source code or loading kernel modules" (ebpf.io). People often compare it to the role of JavaScript on the web: the browser is a runtime, JavaScript lets us inject behavior into it without rewriting the browser; eBPF is a runtime inside the kernel, letting us inject behavior into the kernel without rewriting the kernel.

Why this matters is clearest against the two old ways of extending the kernel:

  • Change the kernel source: you have to convince the Linux community to accept the change, then wait years for that kernel version to become widespread.
  • Write a kernel module: it loads immediately, but the module runs with full privileges inside the kernel — one bad pointer is a kernel panic that takes the whole machine down; and modules break easily across kernel versions.

eBPF decouples the pace of innovation from the kernel release cycle: a program is loaded at runtime, yet it is safe (the kernel checks it before allowing it to run) and fast (it is translated to native machine code). An eBPF program that fails those checks is refused at load time instead of crashing the kernel, which is completely unlike a module.

How an eBPF program goes from code to running

This is the core part, and it is what distinguishes eBPF from a vague "script running in the kernel." The lifecycle:

   C code  ──clang/LLVM──►  eBPF bytecode
                                  │  bpf() syscall (load)
                                  ▼
                          ┌──────────────┐  rejected if not safe
                          │   VERIFIER   │ ──────────────────────────►
                          └──────────────┘  (terminates? valid memory reads? bounded?)
                                  │ pass
                                  ▼
                          ┌──────────────┐
                          │     JIT      │  bytecode → native machine code (x86/arm64)
                          └──────────────┘
                                  │
                                  ▼  attach to HOOK
              XDP · tc · kprobe · tracepoint · LSM · socket ...
                                  │  event happens (packet arrives, syscall called...)
                                  ▼  program runs
                          ┌──────────────┐   read/write
                          │     MAPS     │ ◄────────►  userspace process
                          └──────────────┘   (via bpf() syscall)

Four pieces to grasp:

  • Verifier: the safety checker that runs before the program is allowed to run. It guarantees the "program always runs to completion" (no infinite loops), "no use of uninitialized variables or out-of-bounds memory reads", and "bounded complexity". The verifier is a safety tool (can the program crash the kernel?), not a security tool (does the program have malicious intent?), and this distinction matters. These load-time checks are why eBPF code can run in the kernel without the risks of a module.

  • JIT: after passing the verifier, the generic bytecode is "translated into a specific machine instruction set" (x86-64, arm64). So eBPF is not slowly interpreted; it runs as native code.

  • Maps: data structures (hash, array, LRU, ring buffer...) that let a program hold state between runs and let userspace read and write that state via the bpf() syscall. Maps are how an eBPF tool reports results to the outside, and how userspace configures the program.

  • Helper: an eBPF program "cannot call arbitrary kernel functions"; it can only call a set of helpers that the kernel provides (get a timestamp, manipulate maps, modify packets...), which form a stable API. This too is a safety constraint: the program cannot reach into arbitrary parts of the kernel.

And the hook is where the program attaches to run: when a packet arrives at the network driver (XDP) or passes through the traffic-control layer (tc), when a syscall is called or a kernel function runs (kprobe, tracepoint), when a security operation happens (LSM), when a socket sends or receives. Each kind of hook corresponds to a program type, which determines what the program may do and what data it can see.
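To make the verifier's job more concrete, here is a toy model in Python. It is not the kernel's algorithm: the real verifier tracks register types and value ranges and can accept loops it proves bounded, while this sketch simply rejects any backward jump. The instruction format, the opcode names, and `toy_verify` itself are all made up for illustration; only the three kinds of checks (termination, no uninitialized reads, bounded size) come from the description above.

```python
MAX_INSNS = 16   # toy budget standing in for "bounded complexity"
NUM_REGS = 4     # toy register file: r0..r3


def toy_verify(prog):
    """Return (accepted, reason) for a list of (op, a, b) tuples.

    Toy instruction set (hypothetical, not the real eBPF encoding):
      ("mov", dst, imm)   dst = imm
      ("add", dst, src)   dst += src
      ("jeq", reg, off)   if reg == 0: skip the next `off` instructions
      ("exit", 0, 0)      return r0
    """
    if not prog or len(prog) > MAX_INSNS:
        return False, "program is empty or exceeds the instruction budget"

    # Explore every path, tracking which registers already hold a value.
    todo = [(0, frozenset())]
    seen = set()
    while todo:
        pc, init = todo.pop()
        if (pc, init) in seen:
            continue
        seen.add((pc, init))
        if pc >= len(prog):
            return False, "a path runs past the last instruction without exit"
        op, a, b = prog[pc]
        if op == "exit":
            if 0 not in init:
                return False, "r0 is uninitialized at exit"
        elif op == "mov":
            if not 0 <= a < NUM_REGS:
                return False, f"no such register r{a}"
            todo.append((pc + 1, init | {a}))
        elif op == "add":
            if a not in init or b not in init:
                return False, f"insn {pc} reads an uninitialized register"
            todo.append((pc + 1, init))
        elif op == "jeq":
            if a not in init:
                return False, f"insn {pc} reads an uninitialized register"
            if b < 0:
                return False, f"insn {pc} jumps backward (possible loop)"
            todo.append((pc + 1, init))        # branch not taken
            todo.append((pc + 1 + b, init))    # branch taken
        else:
            return False, f"unknown opcode {op!r}"
    return True, "all paths reach exit safely"


straight = [("mov", 0, 1), ("mov", 1, 2), ("add", 0, 1), ("exit", 0, 0)]
branchy = [("mov", 1, 0), ("jeq", 1, 1), ("mov", 0, 7), ("exit", 0, 0)]
looping = [("mov", 0, 0), ("jeq", 0, -2), ("exit", 0, 0)]

for name, prog in [("straight", straight), ("branchy", branchy), ("looping", looping)]:
    print(name, toy_verify(prog))
```

The `branchy` program is the interesting one: it looks fine when read top to bottom, but the taken branch skips the only instruction that sets r0, so one path would return garbage. Checking every path before the program runs, rather than hoping the bad path is never hit, is the idea the real verifier is built on.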

Not theory: dissecting 140 running programs

bpftool (already installed on the node) lets you inspect eBPF programs in the kernel directly. Count them by type:

sudo bpftool prog show | grep -oE '^[0-9]+: [a-z_]+' | awk '{print $2}' | sort | uniq -c | sort -rn
  74 sched_cls       # tc — Cilium's network datapath (handles pod packets)
  47 cgroup_device   # control device access per cgroup
   8 cgroup_sock_addr
   6 cgroup_skb
   3 cgroup_sock
   2 tracing

The 74 sched_cls programs are attached at the tc hook, which is exactly the network datapath that Cilium uses in place of kube-proxy. Inspect one:

sudo bpftool prog show id 2871
2871: sched_cls  name tail_no_service_ipv4  tag fe7bcb57c001d434  gpl
    loaded_at 2026-05-23T23:04:17+0000  uid 0
    xlated 4920B  jited 2778B  memlock 8192B  map_ids 171,631
    btf_id 758

Every field here is one of the concepts just mentioned, showing up for real:

  • xlated 4920B: the eBPF bytecode as the kernel holds it after verification (4920 bytes). Having this number means the program was accepted by the verifier.
  • jited 2778B: native machine code after JIT (2778 bytes). Having this number means the program runs as native code, not interpreted.
  • map_ids 171,631: this program uses two maps (id 171 is cilium_metrics). That is how it holds and reports state.
  • btf_id 758: the program carries BTF (BPF Type Format) type information. BTF is the foundation for CO-RE, which we will use when writing our own programs.
  • gpl: the program declares a GPL-compatible license (required to call certain helpers).
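If you want these fields as data rather than text, a small parser is enough. The sketch below works on the exact record shown above and assumes bpftool's plain-text layout stays the same; `parse_prog` and the dictionary keys are names chosen here, not anything bpftool defines.

```python
import re

# The record printed by `sudo bpftool prog show id 2871` in this article.
RECORD = """2871: sched_cls  name tail_no_service_ipv4  tag fe7bcb57c001d434  gpl
    loaded_at 2026-05-23T23:04:17+0000  uid 0
    xlated 4920B  jited 2778B  memlock 8192B  map_ids 171,631
    btf_id 758"""


def parse_prog(text):
    """Pull the fields discussed above out of one plain-text program record."""
    head = re.match(r"(\d+): (\S+)\s+name (\S+)", text)
    if head is None:
        raise ValueError("not a named bpftool program record")
    info = {
        "id": int(head.group(1)),
        "type": head.group(2),
        "name": head.group(3),
        # The license marker, when present, is the last word of the first line.
        "gpl": text.splitlines()[0].split()[-1] == "gpl",
    }
    # Sizes are printed as "<field> <n>B".
    for key in ("xlated", "jited", "memlock"):
        m = re.search(rf"\b{key} (\d+)B", text)
        info[key] = int(m.group(1)) if m else None
    m = re.search(r"\bmap_ids ([\d,]+)", text)
    info["map_ids"] = [int(x) for x in m.group(1).split(",")] if m else []
    m = re.search(r"\bbtf_id (\d+)", text)
    info["btf_id"] = int(m.group(1)) if m else None
    return info


info = parse_prog(RECORD)
print(info["name"], info["xlated"], info["jited"], info["map_ids"])
# prints: tail_no_service_ipv4 4920 2778 [171, 631]
```

A `jited` value of `None` here would correspond to a record with no jited size. For real tooling, bpftool's JSON output mode (`-j`) is probably a sturdier thing to parse than this text layout.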

JIT is enabled system-wide:

cat /proc/sys/net/core/bpf_jit_enable
1
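The value in that file is a small mode number rather than a plain on/off flag. As a rough sketch, based on the kernel's sysctl documentation as I understand it, the three documented values can be decoded like this; `describe_jit` and the wording of the descriptions are my own.

```python
# Documented values of net.core.bpf_jit_enable (descriptions paraphrased).
JIT_MODES = {
    0: "JIT disabled, programs are interpreted",
    1: "JIT enabled",
    2: "JIT enabled, with debug output of the compiled code in the kernel log",
}


def describe_jit(raw):
    """Translate the raw content of /proc/sys/net/core/bpf_jit_enable."""
    value = int(raw.strip())
    return JIT_MODES.get(value, f"unknown value {value}")


print(describe_jit("1\n"))  # prints: JIT enabled
```

On the node you would pass it the file's content, for example `describe_jit(open("/proc/sys/net/core/bpf_jit_enable").read())`.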

And 56 maps are holding state for that crowd of programs:

sudo bpftool map show | grep -iE 'cilium'
169: perf_event_array  name cilium_events
171: percpu_hash       name cilium_metrics
172: hash              name cilium_ratelimi
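Note the type of cilium_metrics: percpu_hash. In a per-CPU map every key has one value slot per CPU, so a program running on one CPU can update its own slot without contending with the others, and userspace gets all the slots back and usually sums them. The toy model below only illustrates that behaviour; the class, its methods, and the key name "forwarded" are invented for this sketch and say nothing about the real layout of cilium_metrics.

```python
class ToyPerCPUHash:
    """Toy stand-in for a per-CPU hash map: one value slot per CPU per key."""

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        self.slots = {}  # key -> list with one value per CPU

    def add(self, cpu, key, delta=1):
        """Kernel side: a program only touches the slot of the CPU it runs on."""
        values = self.slots.setdefault(key, [0] * self.num_cpus)
        values[cpu] += delta

    def lookup(self, key):
        """Userspace side: a lookup returns every CPU's value for the key."""
        values = self.slots.get(key)
        return None if values is None else list(values)


metrics = ToyPerCPUHash(num_cpus=4)
metrics.add(cpu=0, key="forwarded")
metrics.add(cpu=0, key="forwarded")
metrics.add(cpu=3, key="forwarded")

per_cpu = metrics.lookup("forwarded")
print(per_cpu, sum(per_cpu))  # prints: [2, 0, 0, 1] 3
```

The design trade-off is memory for speed: each key costs one value per CPU, but the hot path in the kernel never has to synchronize a shared counter.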

The entire anatomy of eBPF — verified bytecode, JIT code, maps, BTF, hooks — is not a diagram on paper but something running and inspectable on this machine. The whole series will come back to dissect these very programs, and to write similar ones ourselves.

The lab and the roadmap

The Kubernetes cluster from the previous series is an ideal eBPF environment: kernel 6.17 (recent enough for the modern eBPF features this series relies on), BTF enabled (/sys/kernel/btf/vmlinux), bpftool and bpftrace pre-installed, and a production eBPF system (Cilium, which accounts for most of the 140 loaded programs) to dissect as a case study, something a hand-built lab rarely has.

The series roadmap:

  • Part I — Foundations (we are here): the eBPF virtual machine, verifier, JIT, maps, program types, the hooks, BTF + CO-RE.
  • Part II — Tracing: observe the kernel with bpftrace.
  • Part III — Writing your own programs: libbpf + CO-RE in C, then loading from Go (cilium/ebpf).
  • Part IV — Networking: XDP, tc, dissecting the Cilium datapath, writing your own XDP firewall.
  • Part V — Security: LSM BPF, seccomp-bpf, runtime enforcement.
  • Part VI — Observability: profiling, latency histograms, Hubble internals.
  • Part VII — End-to-end Cilium case study and wrap-up.

🧹 Cleanup

This article only reads kernel state with bpftool (loads/changes nothing), so there is nothing to clean up. The source code and commands for the series are at github.com/nghiadaulau/ebpf-from-scratch, directory 00-intro.

Wrap-up

eBPF runs sandboxed programs inside the Linux kernel without changing kernel source or loading a module. It is safe (the verifier checks a program before it runs) and fast (JIT to native code), and it decouples innovation from the kernel release cycle. A program goes through this lifecycle: write C → clang compiles to bytecode → bpf() syscall loads it → the verifier checks safety (termination, valid memory access, bounded complexity) → JIT to machine code → attach to a hook (XDP/tc/kprobe/tracepoint/LSM/socket) → run when an event happens, using maps to hold state and talk to userspace, and calling only helpers rather than arbitrary kernel functions. We do not learn this in the abstract: worker-0 has 140 programs loaded, most of them from Cilium, and bpftool shows that each concept is real: xlated (verified), jited (native code), map_ids, and btf_id on a real tc program named tail_no_service_ipv4.

Article 1 looks inside: what exactly the "eBPF virtual machine" is, including its register set and instruction set, and why the verifier can prove a program safe before it runs.