Case Study: A Packet Through Cilium's eBPF Datapath

Across nineteen articles, we dissected each eBPF piece in isolation. Part VII puts them back together. This article introduces no new concepts: it follows a single packet through Cilium's datapath on the cluster, and each step points back to the article that explained that piece. The scenario: a pod makes a DNS query to the kube-dns Service at ClusterIP 10.32.0.10:53, which is load-balanced to the real CoreDNS pod at 10.200.0.44:53.

The big picture: what the packet goes through

  Source pod                                                   CoreDNS pod
  (e.g. 10.200.0.64)                                           (10.200.0.44)
      │ sends to Service 10.32.0.10:53                              ▲
      ▼                                                              │
  veth lxc… ──► cil_from_container (tc, Art.12) ──┐                  │
                                                  │ tail call (Art.4)│
                          tail_handle_ipv4 ───────┤                  │
                          tail_ipv4_ct_*    ──────┤                  │
                                                  ▼                  │
              ┌──────────── BPF map lookup (Article 3) ───────────┐  │
              │ cilium_lb4_services: 10.32.0.10:53 -> 10.200.0.44:53  (Article 12)
              │ cilium_ct4_global:   create a connection-tracking entry  (Article 12)
              │ cilium_lxc/policy:    allow by identity 18203          (Article 12/19)
              └───────────────────────────────────────────────────┘
                                                  │ DNAT destination -> backend
                                                  ▼
                            same node? redirect veth->veth ───────────┘
                                  │ different node
                                  ▼
                          cil_to_netdev ─► ens5 ─► dest node ─► cil_from_netdev ─► … ─► pod
                                  │
                                  └─► bpf_perf_event_output -> cilium_events -> Hubble (Article 19)

Step 1 — Leaving the pod: cil_from_container

The pod sends a packet to 10.32.0.10:53. The packet reaches the host-side end of the pod's veth pair (lxc…), where Cilium attaches cil_from_container at tc ingress (Article 12). This is the first point at which eBPF sees the packet: a sched_cls program that runs after the kernel has built the sk_buff, so it has full metadata (Article 13). Every packet access in it was proven safe by the verifier at load time (Article 2), and the program runs as JIT-compiled native code (Article 1).
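
To make "every packet access is proven safe" concrete, here is a minimal userspace sketch in Python of the bounds-checked parsing pattern the verifier insists on: check the length before every read. It is a model only, not Cilium's code. The function names and the source port 40000 are assumptions made for illustration.

```python
import socket
import struct

ETH_HLEN = 14          # Ethernet header length
ETH_P_IP = 0x0800      # EtherType for IPv4
IPPROTO_UDP = 17


def parse_ipv4_udp(frame: bytes):
    """Return (src_ip, dst_ip, src_port, dst_port), or None if the frame
    is truncated or is not IPv4/UDP. Hypothetical helper for illustration."""
    # Check the bound before reading the EtherType, the userspace analogue
    # of `if (data + ETH_HLEN > data_end) return DROP;` in an eBPF program.
    if len(frame) < ETH_HLEN:
        return None
    (eth_type,) = struct.unpack_from("!H", frame, 12)
    if eth_type != ETH_P_IP:
        return None
    # Check the bound for the fixed 20-byte part of the IPv4 header.
    if len(frame) < ETH_HLEN + 20:
        return None
    ver_ihl = frame[ETH_HLEN]
    ihl = (ver_ihl & 0x0F) * 4          # IPv4 header length in bytes
    if ver_ihl >> 4 != 4 or ihl < 20:
        return None
    if frame[ETH_HLEN + 9] != IPPROTO_UDP:
        return None
    l4_off = ETH_HLEN + ihl
    # Check the bound for the 8-byte UDP header.
    if len(frame) < l4_off + 8:
        return None
    src_ip = socket.inet_ntoa(frame[ETH_HLEN + 12:ETH_HLEN + 16])
    dst_ip = socket.inet_ntoa(frame[ETH_HLEN + 16:ETH_HLEN + 20])
    src_port, dst_port = struct.unpack_from("!HH", frame, l4_off)
    return src_ip, dst_ip, src_port, dst_port


def build_frame(src_ip, dst_ip, src_port, dst_port):
    """Build a minimal Ethernet + IPv4 + UDP frame (checksums left at 0)."""
    eth = b"\x00" * 12 + struct.pack("!H", ETH_P_IP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, IPPROTO_UDP, 0,
                     socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    udp = struct.pack("!HHHH", src_port, dst_port, 8, 0)
    return eth + ip + udp
```

Calling parse_ipv4_udp(build_frame("10.200.0.64", "10.32.0.10", 40000, 53)) recovers the four fields, while any truncated frame yields None instead of an out-of-bounds read. In real eBPF the verifier rejects the program at load time if such a check is missing; this model can only fail at run time.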

Step 2 — Tail call: splitting the datapath

cil_from_container does not do everything in one program: the verifier's limit of 1 million processed instructions per program (Article 2) does not allow it. Instead it tail calls (Article 4) through the prog_array cilium_call_policy into a chain of programs, with exactly the names we saw in Article 12:

cil_from_container -> tail_handle_ipv4 -> tail_ipv4_ct_ingress -> cil_lxc_policy -> ...

Each program handles one job and then jumps to the next, so each one stays within the verifier's limit on its own.
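
The chain can be modeled in a few lines of Python. This is a hedged sketch, not Cilium's implementation: the slot numbers are hypothetical, the cap of 33 jumps is an assumption mirroring MAX_TAIL_CALL_CNT in recent kernels, and a failed jump is simplified to a drop verdict (in the kernel, a failed tail call falls through to the caller's next instruction).

```python
MAX_TAIL_CALLS = 33  # assumption: mirrors MAX_TAIL_CALL_CNT in recent kernels

# Hypothetical slot numbers; the real indices are internal to Cilium.
SLOT_IPV4, SLOT_CT, SLOT_POLICY = 1, 2, 3


def cil_from_container(ctx):
    ctx["trace"].append("cil_from_container")
    return ("tail_call", SLOT_IPV4)


def tail_handle_ipv4(ctx):
    ctx["trace"].append("tail_handle_ipv4")
    return ("tail_call", SLOT_CT)


def tail_ipv4_ct_ingress(ctx):
    ctx["trace"].append("tail_ipv4_ct_ingress")
    return ("tail_call", SLOT_POLICY)


def cil_lxc_policy(ctx):
    ctx["trace"].append("cil_lxc_policy")
    return "TC_ACT_OK"  # final verdict, no further jump


# Stand-in for a BPF_MAP_TYPE_PROG_ARRAY: slot index -> program.
PROG_ARRAY = {
    SLOT_IPV4: tail_handle_ipv4,
    SLOT_CT: tail_ipv4_ct_ingress,
    SLOT_POLICY: cil_lxc_policy,
}


def run(entry, ctx, prog_array):
    """Run programs until one returns a verdict instead of a jump.

    The loop replaces the current program rather than nesting calls,
    which models the fact that a tail call never returns to its caller.
    """
    prog, jumps = entry, 0
    while True:
        result = prog(ctx)
        if not isinstance(result, tuple):
            return result
        _, slot = result
        nxt = prog_array.get(slot)
        if nxt is None or jumps >= MAX_TAIL_CALLS:
            return "TC_ACT_SHOT"  # simplification: treat a failed jump as a drop
        prog, jumps = nxt, jumps + 1
```

Running run(cil_from_container, {"trace": []}, PROG_ARRAY) visits the four programs in order and ends with the verdict of the last one. Removing a slot from the array cuts the chain at that point, which is why the contents of the prog_array matter as much as the programs themselves.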

Step 3 — Load balancing: look up cilium_lb4_services, then DNAT

10.32.0.10:53 is a virtual ClusterIP, not any pod's real address. The datapath looks up cilium_lb4_services (a BPF hash map, Articles 3 & 12), the Service table we read earlier in human-readable form:

10.32.0.10:53/UDP (2)   10.200.0.44:53/UDP  (6) (2)   <- CoreDNS backend

Having found the backend, the datapath picks a slot and DNATs the destination from 10.32.0.10:53 to 10.200.0.44:53. This happens in the kernel, at the pod's veth, without iptables and without kube-proxy (this cluster is kube-proxy-less). Kubernetes Service "load balancing" is thus reduced to a single hash-map lookup plus an address rewrite.
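
As a rough illustration of "lookup plus rewrite", the sketch below models the Service table as a Python dict and the DNAT as a field replacement. It is a simplification under stated assumptions: the real map layout differs, the Packet type and the source port 40000 are hypothetical, and the slot selection rule (a CRC of the source address modulo the backend count) is a stand-in rather than the algorithm Cilium uses.

```python
import zlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:  # hypothetical, minimal view of the fields that matter here
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


# Simplified stand-in for cilium_lb4_services:
# (ClusterIP, port, proto) -> list of (backend IP, backend port).
LB4_SERVICES = {
    ("10.32.0.10", 53, "UDP"): [("10.200.0.44", 53)],
}


def lb4_dnat(pkt, services=LB4_SERVICES):
    """Return the packet with its destination rewritten to a backend,
    or unchanged if the destination is not a known Service."""
    backends = services.get((pkt.dst_ip, pkt.dst_port, pkt.proto))
    if not backends:
        return pkt
    # Stand-in selection rule: deterministic hash of the source endpoint.
    key = f"{pkt.src_ip}:{pkt.src_port}".encode()
    slot = zlib.crc32(key) % len(backends)
    backend_ip, backend_port = backends[slot]
    return replace(pkt, dst_ip=backend_ip, dst_port=backend_port)
```

With the single CoreDNS backend from the table above, every query to 10.32.0.10:53/UDP leaves lb4_dnat addressed to 10.200.0.44:53, and the source fields are untouched. A packet whose destination is not in the table passes through unchanged.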

Step 4 — Conntrack and policy

So that replies can be reverse-NATed correctly, the datapath writes an entry into cilium_ct4_global (Article 12): the connection is now tracked, along with the source pod's SourceSecurityID. Then cil_lxc_policy applies the NetworkPolicy by identity rather than by IP (Articles 12 & 19): the source pod carries a numeric security identity, CoreDNS carries 18203, and the program looks up the policy map by the identity pair to allow or drop the packet. We confirmed that 18203 is indeed CoreDNS:

$ cilium identity get 18203
18203   k8s:k8s-app=kube-dns   k8s:io.kubernetes.pod.namespace=kube-system
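
The two mechanisms of this step can be sketched together in Python. This is a model under assumptions, not the layout of cilium_ct4_global or of the policy map: the key and value shapes are simplified, the function names are hypothetical, and the source identity 12345 is invented because the source pod's identity is not given here.

```python
SRC_IDENTITY = 12345      # hypothetical: the source pod's identity is not given
COREDNS_IDENTITY = 18203  # from `cilium identity get 18203`


def ct_create(ct, src, sport, dst, dport, proto, orig_dst, src_sec_id):
    """Record the post-DNAT flow plus what is needed to undo the DNAT."""
    ct[(src, sport, dst, dport, proto)] = {
        "orig_dst": orig_dst,        # Service address before DNAT
        "src_sec_id": src_sec_id,    # identity of the sender
    }


def reverse_nat(ct, src, sport, dst, dport, proto):
    """For a reply packet, return the source address the client should see."""
    # A reply's tuple is the tracked tuple with the two ends swapped.
    entry = ct.get((dst, dport, src, sport, proto))
    if entry is None:
        return (src, sport)          # untracked: leave the source as-is
    return entry["orig_dst"]


def policy_allows(policy, src_id, dst_id, dport, proto):
    """Identity-based check: no IP address appears in the policy key."""
    return (src_id, dst_id, dport, proto) in policy
```

After ct_create records the flow 10.200.0.64:40000 → 10.200.0.44:53 with orig_dst ("10.32.0.10", 53), a reply from 10.200.0.44:53 is mapped back by reverse_nat to 10.32.0.10:53, so the client sees the answer coming from the Service address it queried. The policy check never looks at an IP, which is why it keeps working when pods are rescheduled and their addresses change.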

Step 5 — Delivering the packet: same node or different node

After DNAT and an allow verdict from policy, the datapath delivers the packet to the backend:

Same node: redirect straight from the source veth to the destination pod's veth (tail_ipv4_to_endpoint), without going up the network stack. This is the fastest path.
Different node: push out through cil_to_netdev and ens5 to the destination node. There the packet first passes the XDP hook of Article 11 on the receiving NIC (XDP runs only on receive), then cil_from_netdev picks it up and runs the tail-call chain again to deliver it into the CoreDNS pod's veth.
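
The branch between the two cases comes down to one question: is the backend a local endpoint? A minimal Python sketch, assuming a hypothetical table of local endpoints and a made-up interface name lxc_coredns (the real veth name is not shown in this article):

```python
# Hypothetical table of endpoints on this node: pod IP -> host-side veth name.
LOCAL_ENDPOINTS = {"10.200.0.44": "lxc_coredns"}


def choose_delivery(dst_ip, local_endpoints, uplink="ens5"):
    """Return (action, interface) for a packet whose DNAT is already done."""
    ifname = local_endpoints.get(dst_ip)
    if ifname is not None:
        # Same node: hand the packet directly to the destination veth.
        return ("redirect", ifname)
    # Different node: send it out of the node's uplink.
    return ("to_netdev", uplink)
```

On the node that hosts CoreDNS, choose_delivery("10.200.0.44", LOCAL_ENDPOINTS) selects the redirect. On any other node the same call with that node's own endpoint table falls through to the uplink ens5.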

Step 6 — Observation: cilium_events

Along the way, the datapath calls bpf_perf_event_output to push events into the perf ring cilium_events (Article 19). cilium-agent reads them out, and Hubble resolves the numeric identities into names. A whole connection lifecycle then appears as a readable flow. The sample below is the capture we saw in Article 19: a TCP connection from the host to CoreDNS on port 8080 rather than our UDP DNS query, but the event path is the same:

10.200.0.64:54764 (host) -> kube-system/coredns-…:8080 (ID:18203) to-endpoint FORWARDED (SYN)
                         <- kube-system/coredns-…:8080 (ID:18203) to-stack    FORWARDED (SYN,ACK)
                         -> kube-system/coredns-…:8080 (ID:18203) to-endpoint FORWARDED (ACK)
                         ...                                                              (ACK,FIN)
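
The producer and consumer sides of this pipeline can be modeled as follows. This is a sketch under assumptions: EventRing is a hypothetical stand-in for the perf ring (a full ring here counts new events as lost), the event fields are invented, and the output format is illustrative rather than Hubble's exact format. Only the identity-to-label mapping for 18203 comes from the cluster.

```python
class EventRing:
    """Bounded buffer standing in for the cilium_events perf ring."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = []
        self.lost = 0

    def output(self, event):
        """Datapath side: enqueue an event, or count it as lost if full."""
        if len(self.buf) >= self.capacity:
            self.lost += 1
            return False
        self.buf.append(event)
        return True

    def drain(self):
        """Agent side: take everything currently queued."""
        events, self.buf = self.buf, []
        return events


# Numeric identity -> labels, as reported by `cilium identity get 18203`.
IDENTITY_LABELS = {18203: "k8s:k8s-app=kube-dns"}


def enrich(event, labels=IDENTITY_LABELS):
    """Turn a raw event with a numeric identity into a readable line."""
    name = labels.get(event["dst_id"], "unknown")
    return (f'{event["src"]} -> {event["dst"]} '
            f'({name}, ID:{event["dst_id"]}) '
            f'{event["point"]} {event["verdict"]}')
```

The split matters: the datapath writes only compact numeric events, and everything expensive (label lookup, formatting) happens in userspace after the packet has already moved on.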

The same machine, many roles

The thing worth pausing on: every step above is eBPF, the same technology we learned from Article 1. Load balancing, NAT, connection tracking, applying a security policy, generating observation events — not five separate systems, but eBPF programs attached at different hooks, sharing state through BPF maps, chained together by tail calls, all in the kernel. That is why Cilium can replace a whole stack of old tools (kube-proxy + iptables + an observation agent + a policy agent) with one unified datapath. The 74 sched_cls programs and 56 maps on the node (Article 0) are not a mess — they are this machine, dissected piece by piece across the entire series.

🧹 Cleanup

This is a synthesis article; it only references state and data already gathered in earlier articles — nothing new runs, nothing to clean up. Diagram + command references at github.com/nghiadaulau/ebpf-from-scratch, directory 20-case-study.

Wrap-up

A packet from a pod calling the DNS Service travels through one unified eBPF chain that ties the whole series together. It leaves the pod at cil_from_container (tc/sched_cls, Articles 12-13), a program already checked by the verifier (Article 2) and JIT-compiled (Article 1). It tail calls (Article 4) through cilium_call_policy into tail_handle_ipv4 → tail_ipv4_ct_ingress → cil_lxc_policy. It is load-balanced by a lookup in cilium_lb4_services (BPF map, Articles 3 and 12) followed by DNAT from 10.32.0.10:53 to 10.200.0.44:53, with no kube-proxy involved. The datapath writes a conntrack entry into cilium_ct4_global and applies identity-based policy, where 18203 is CoreDNS (Articles 12 and 19). It delivers the packet by veth redirect (same node) or via cil_to_netdev → ens5 → destination node. Finally, it emits events via bpf_perf_event_output → cilium_events → Hubble (Article 19). It is all eBPF: one technology, many hooks, shared maps, chained by tail calls. That is how Cilium folds LB, NAT, conntrack, policy, and observability into a single in-kernel datapath.

Article 21 (the last) closes the series with a hands-on task in the true "from scratch" spirit: writing a complete observation tool yourself — connmon, a real-time node-wide TCP connection monitor built with kprobe + ring buffer + a Go loader — then looking back over the whole journey.