Maps: Memory and the Bridge to Userspace

The eBPF programs in the previous articles run once per event and then terminate; the verifier even requires that they terminate (Article 2). So what does a program remember between two runs? How does userspace get the result it computed? The answer to both is maps.

What maps are

Maps are key-value data structures that live in the kernel, decoupled from a program's single-run lifecycle. They serve two purposes:

Hold state between program runs: each time the program runs on an event it writes to the map, and the map is still there on the next run.
Bridge to userspace: a userspace process reads and writes that same map via the bpf() syscall. This is how an eBPF tool reports results to the outside, and how userspace configures the program.

   event ──► [eBPF program] ──write──► ┌─────────┐ ◄──read/write── [userspace]
   (each run                            │   MAP   │                 (bpftool,
    then shuts off)                     └─────────┘                  your app)
                                     (lives independently, via bpf() syscall)

There are many map types for different needs: array (index), hash (arbitrary key), percpu_* (one copy per CPU, avoiding contention), lru_hash (self-evicting), ringbuf (push events to userspace)... Most of the Cilium datapath holds state in maps.

Inspecting a production map

bpftool map dump reads the contents of a running map directly. Take cilium_metrics (id 171, type percpu_hash), which we already saw in Article 0:

sudo bpftool map dump id 171

key:
03 01 63 04 01 00 00 00
value (CPU 00): cb 33 03 00 00 00 00 00  ...
value (CPU 01): 96 43 04 00 00 00 00 00  ...

These are real metrics that Cilium is collecting. Note that there is one separate value per CPU (percpu). That is an important trick: if multiple CPUs increment a shared counter, they need locking or atomics, which is slow; with a per-CPU map, each CPU writes its own copy without contention, and userspace sums the copies on read. Cilium uses this type for the hot path that processes packets.
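The "userspace sums them up on read" step can be sketched in plain C. This is a minimal illustration, not Cilium's code: it takes only the first eight bytes that the dump shows for each CPU and reads them as a little-endian 64-bit integer. That the value starts with a 64-bit counter is an assumption (the dump elides the rest of the value), and the names le64_decode, percpu_sum, and dump_copies are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode 8 bytes stored least-significant byte first, which is the order
 * bpftool prints them in on a little-endian machine. */
uint64_t le64_decode(const uint8_t bytes[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | bytes[i];
    return v;
}

/* Sum one decoded field across the per-CPU copies of a single key. */
uint64_t percpu_sum(const uint8_t copies[][8], size_t ncpus)
{
    assert(ncpus > 0);
    uint64_t total = 0;
    for (size_t cpu = 0; cpu < ncpus; cpu++)
        total += le64_decode(copies[cpu]);
    return total;
}

/* First eight bytes of each per-CPU value from the dump above.
 * ASSUMPTION: these bytes form one 64-bit field of the value. */
static const uint8_t dump_copies[2][8] = {
    { 0xcb, 0x33, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00 },  /* CPU 00 */
    { 0x96, 0x43, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00 },  /* CPU 01 */
};
```

Under that assumption, le64_decode(dump_copies[0]) is 209867, le64_decode(dump_copies[1]) is 279446, and percpu_sum(dump_copies, 2) is 489313. No CPU ever had to wait for another to produce that total; the addition happens once, on the reader's side.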

Writing your own: count exec into a map

To see the full "program writes — userspace reads" loop, write a program that counts every time the system execs a process. The map is declared with BTF syntax right in C:

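#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>    // SEC(), __uint(), __type(), helper declarations

// One-slot array map: index 0 holds the exec counter.
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} exec_count SEC(".maps");

SEC("tracepoint/sched/sched_process_exec")
int count_exec(void *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&exec_count, &key);   // helper: pointer to the value, or NULL
    if (val)                                               // the verifier requires this NULL check
        __sync_fetch_and_add(val, 1);                      // atomic increment
    return 0;
}

char LICENSE[] SEC("license") = "GPL";   // conventional license section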

Two points about the mechanism. First, the program reaches the map only through a helper, bpf_map_lookup_elem (Article 0: eBPF calls helpers, not arbitrary kernel functions); the helper returns a pointer to the value, or NULL if the key is missing, which is why the code checks val before using it. Second, because this is a plain array (one value shared across all CPUs), the increment must be atomic (__sync_fetch_and_add) so that two CPUs do not overwrite each other's update, which is exactly the cost a per-CPU map avoids.
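To make "two CPUs overwrite each other" concrete, here is a small userspace sketch in plain C. It is not eBPF code and it does not use real threads: it replays, on a single thread, one unlucky interleaving of a plain increment, then the same two increments done with __sync_fetch_and_add. The function names and the starting value 206 are chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Replay of a plain `counter = counter + 1` executed by two CPUs that both
 * load the old value before either of them stores the new one. */
uint64_t replay_plain_increment(uint64_t start)
{
    uint64_t counter = start;
    uint64_t cpu0_seen = counter;   /* CPU 0 loads the counter            */
    uint64_t cpu1_seen = counter;   /* CPU 1 loads the same old value     */
    counter = cpu0_seen + 1;        /* CPU 0 stores start + 1             */
    counter = cpu1_seen + 1;        /* CPU 1 overwrites it with start + 1 */
    return counter;
}

/* The same two increments as atomic read-modify-write operations:
 * load, add, and store cannot be split apart, so neither update is lost. */
uint64_t replay_atomic_increment(uint64_t start)
{
    uint64_t counter = start;
    __sync_fetch_and_add(&counter, 1);   /* CPU 0 */
    __sync_fetch_and_add(&counter, 1);   /* CPU 1 */
    return counter;
}

/* Self-check of both replays. */
void replay_demo(void)
{
    assert(replay_plain_increment(206) == 207);    /* one increment lost */
    assert(replay_atomic_increment(206) == 208);   /* both increments kept */
}
```

On real hardware the lost update happens only when the loads and stores actually interleave this way, so a non-atomic counter tends to be slightly low rather than obviously wrong, which makes the bug easy to miss.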

Compile, load, and auto-attach to the tracepoint:

clang -O2 -g -target bpf -I/usr/include/x86_64-linux-gnu -c count_exec.bpf.c -o count_exec.bpf.o
sudo bpftool prog loadall count_exec.bpf.o /sys/fs/bpf/cexec autoattach

Read the counter, run a few commands (each command is an exec), then read again:

sudo bpftool map dump name exec_count          # value: 206
for i in 1 2 3 4 5; do /bin/true; /bin/date >/dev/null; done
sudo bpftool map dump name exec_count

[{
        "key": 0,
        "value": 233
    }
]

206 → 233. The counter goes up because the eBPF program in the kernel increments it on every exec, while bpftool in userspace reads that same map via the bpf() syscall. The loop accounts for 10 of the 27 new execs (five runs each of /bin/true and /bin/date); the rest come from the sudo and bpftool invocation that performs the second read and from the system's background execs. Without a map, the program forgets everything as soon as it finishes; with a map, it accumulates and can report results to the outside. (The starting number is 206 rather than 0 because the system had already execed 206 times between attaching and the first read. It counts real events.)
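What bpftool does here can be approximated in a few lines of userspace C. The following is a minimal sketch, not bpftool's implementation: it calls the bpf() syscall directly, opens a map by id, and reads the __u64 at index 0. It assumes a Linux system with kernel headers installed and enough privilege to open maps by id (typically root); the function names sys_bpf and read_counter_by_map_id are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall; glibc does not export bpf(). */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Read the __u64 stored at index 0 of an array map, given the map's id.
 * Returns 0 on success, -1 on failure. */
int read_counter_by_map_id(uint32_t map_id, uint64_t *out)
{
    assert(out != NULL);

    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.map_id = map_id;
    int fd = (int)sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr);   /* id -> file descriptor */
    if (fd < 0)
        return -1;

    uint32_t key = 0;
    uint64_t value = 0;
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key = (uint64_t)(uintptr_t)&key;       /* pointer to the key buffer    */
    attr.value = (uint64_t)(uintptr_t)&value;   /* kernel copies the value here */
    long rc = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr);
    close(fd);
    if (rc != 0)
        return -1;

    *out = value;
    return 0;
}
```

The map id to pass in can be found with sudo bpftool map show name exec_count. Calling the function twice, with some commands in between, should show the same kind of increase as the two dumps above.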

🧹 Cleanup

sudo rm -rf /sys/fs/bpf/cexec          # unpin -> free the program + map
rm -f /tmp/count_exec.bpf.*

Unpinning frees both the program and the map it created; the node returns to 140 programs. Source code is at github.com/nghiadaulau/ebpf-from-scratch, directory 03-maps.

Wrap-up

An eBPF program runs once per event and then stops, keeping no variables between runs. Maps are its memory and its bridge to userspace: the program reads and writes through helpers (bpf_map_lookup_elem, bpf_map_update_elem), and userspace reads and writes the same map via the bpf() syscall. We inspected the real cilium_metrics map (type percpu_hash, one value per CPU to avoid contention on the hot path), then wrote our own tracepoint program that counts execs into an array map, using the atomic __sync_fetch_and_add because the array is shared across CPUs. After load and autoattach, bpftool map dump showed the counter going from 206 to 233 as the system execed: the kernel program writes, userspace reads, and the loop closes. Choosing the map type (array, hash, percpu, lru, ringbuf) means choosing a trade-off between key structure, CPU contention, and how data is pushed out.

Article 4 answers "where does the program attach": the program types and hooks — why an XDP program can see packets while a tracepoint sees syscall arguments, and what each kind is allowed to do.
