The Tetragon Way: From Observe to Enforce with bpf_send_signal

Kai · 5 min read

Articles 14 (LSM) and 15 (seccomp) enforce at dedicated security points. Tetragon (a Cilium project), one of the most widely used runtime security tools in the Kubernetes ecosystem, takes a different route: it attaches to the very observation hooks Part II used (kprobe, tracepoint), and acts from there. This article dissects the mechanism that turns observation into enforcement, and rebuilds it.

Tetragon: an observation tool + one "act" button

Tetragon is essentially a kprobe/tracepoint observation tool (like execsnoop in Article 9) plus the ability to enforce. According to the Tetragon docs, it enforces in two ways, both of which are eBPF helpers running straight in the kernel — no switch to userspace required:

  1. bpf_send_signal() — sends a signal (usually SIGKILL) to the process that just matched a policy, killing it synchronously right inside the kernel.
  2. bpf_override_return() — overrides the return value of a function/syscall (e.g. forcing openat to return -EPERM), blocking the operation without killing the process.

The subtle point: the eBPF program matches the event against the policy in the kernel; if it matches, it calls the helper — there is no userspace round trip in the enforcement path. That is what makes it fast and hard to evade.

Rebuilding the SIGKILL mechanism

We rebuild mechanism (1) exactly: a tracepoint on sched_process_exec (like Article 9) but instead of printing the event, it kills the process if it is a forbidden binary. Narrow and safe: it only kills a process that execs exactly /tmp/forbidden-bin.

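#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";   // bpf_probe_read_kernel_str is GPL-only

SEC("tracepoint/sched/sched_process_exec")
int kill_forbidden(struct trace_event_raw_sched_process_exec *ctx)
{
    char fn[24] = {};
    // __data_loc packs (length << 16) | offset; the offset is relative to ctx
    unsigned off = ctx->__data_loc_filename & 0xFFFF;
    bpf_probe_read_kernel_str(fn, sizeof(fn), (void *)ctx + off);   // the exec'd filename

    const char want[] = "/tmp/forbidden-bin";
    // compare the trailing NUL too, so "/tmp/forbidden-binary" does not match
    for (int i = 0; i < (int)sizeof(want); i++)
        if (fn[i] != want[i])
            return 0;                  // not the target -> let it run

    bpf_send_signal(9);                        // SIGKILL this very process
    return 0;
}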

The first half is identical to an observation tool — read the exec'd filename from the tracepoint's dynamic field (Article 9). The only difference is the last line: instead of bpf_ringbuf_submit to report, it calls bpf_send_signal(9) — sending SIGKILL to the current process (the one that just exec'd). Observation becomes enforcement with a single helper.
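The filename lookup and the match can be tried out in plain userspace C, without loading anything into the kernel. The sketch below is only an illustration under stated assumptions: the record buffer is simulated, and the names data_loc_offset, data_loc_length, read_dyn_str and is_forbidden are hypothetical helpers, not part of any kernel or libbpf API. It relies on the __data_loc convention (low 16 bits = offset from the start of the record, high 16 bits = length), and it compares the terminating NUL as well, so the match is exact rather than a prefix match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FN_BUF 24   /* same size as the fn[] buffer in the BPF program */

/* A __data_loc word packs two 16-bit values: the low half is the field's
 * offset from the start of the record, the high half is its length. */
static unsigned data_loc_offset(uint32_t loc) { return loc & 0xFFFF; }
static unsigned data_loc_length(uint32_t loc) { return loc >> 16; }

/* Copy the dynamic string out of a (simulated) record into dst.
 * dst is always NUL-terminated on success.
 * Returns 0 on success, -1 if the offset points outside the record. */
static int read_dyn_str(char *dst, size_t dst_size,
                        const void *record, size_t record_size, uint32_t loc)
{
    size_t off = data_loc_offset(loc);
    if (dst_size == 0 || off >= record_size)
        return -1;

    const char *src = (const char *)record + off;
    size_t i = 0;
    /* stop at the buffer limit, the end of the record, or the first NUL */
    while (i + 1 < dst_size && off + i < record_size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return 0;
}

/* Exact match: sizeof(want) counts the trailing NUL, so a longer name
 * such as "/tmp/forbidden-binary" fails at the last compared byte. */
static int is_forbidden(const char *fn)
{
    static const char want[] = "/tmp/forbidden-bin";
    for (size_t i = 0; i < sizeof(want); i++)
        if (fn[i] != want[i])
            return 0;
    return 1;
}
```

To use it, place a path somewhere in a byte buffer, build the word as (length << 16) | offset, call read_dyn_str into a FN_BUF-sized array, and pass the result to is_forbidden. The loop stops at the first mismatching byte, so it never reads past the end of a shorter string.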

Run it: forbidden binary killed, normal binary runs

Attach with bpftool (autoattach), then try it:

sudo bpftool prog loadall tetra_kill.bpf.o /sys/fs/bpf/tetra autoattach
cp /bin/sleep /tmp/forbidden-bin
/bin/sleep 0.2            # normal binary
/tmp/forbidden-bin 5     # forbidden binary

Output:

844: tracepoint  name kill_forbidden  ...        <- attached to exec
-- normal /bin/sleep 0.2 --
  sleep OK (exit 0)                               <- runs normally
-- /tmp/forbidden-bin 5 --
  Killed
  exit=137 (137 = 128 + SIGKILL)                  <- killed RIGHT at exec

/bin/sleep finishes normally. /tmp/forbidden-bin (the same sleep binary under a different name) is killed immediately, and the shell reports exit=137, which is exactly 128 + 9 (SIGKILL). The process dies right at exec, before the new image can run a single userspace instruction. The "forbid this binary" policy is enforced entirely in the kernel, from an observation tracepoint.
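Where the 137 comes from can be reproduced without any eBPF at all. The sketch below forks a child, sends it SIGKILL from the parent, and converts the wait status the way bash and most POSIX-style shells do for $?. The function names shell_exit_code and kill_child_and_wait are hypothetical, and the 128 + signal mapping is a shell convention rather than something the kernel reports directly.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a waitpid() status to the number a shell exposes as $?:
 * the exit code for a normal exit, 128 + signal for a signal death. */
static int shell_exit_code(int wstatus)
{
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);
    return -1;   /* stopped/continued: not a final status */
}

/* Fork a child that just blocks, SIGKILL it, and store its wait status.
 * Returns 0 on success, -1 if fork/kill/waitpid failed. */
static int kill_child_and_wait(int *wstatus)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();          /* child: wait until a signal arrives */
    }
    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, wstatus, 0) != pid)
        return -1;
    return 0;
}
```

Calling kill_child_and_wait and passing the status to shell_exit_code should give the same number the demo printed, because on Linux SIGKILL is signal 9.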

An important fact: SIGKILL is not always enough

The Tetragon docs make a point that is easy to miss: a SIGKILL sent synchronously stops the process, but it does not always prevent the operation already in progress. For example, a SIGKILL fired inside a write() does not guarantee that the data has not already reached the file: the syscall may finish its work before the process dies. To block an operation reliably you combine the two helpers: bpf_override_return() forces the syscall to return an error (so the operation does not happen) and bpf_send_signal() kills the process. For enforcement at exec (as in the demo), SIGKILL alone is enough, because the process dies before doing anything; for enforcement at a write/modify syscall you also have to override.
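A loose userspace analogy makes the point concrete. This is not the in-kernel race itself, only an illustration that killing a process does not roll back work a syscall has already done: the child writes one byte to a pipe, the parent waits until the byte is readable (so the write has completed), then sends SIGKILL. The child dies from the signal, yet the data is still there. The function name write_then_kill is hypothetical.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes one byte and blocks; parent reads it, then SIGKILLs the child.
 * Returns the byte that survived the kill, or -1 on error.
 * *killed is set to 1 if the child was terminated by SIGKILL. */
static int write_then_kill(int *killed)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        char c = 'X';
        if (write(fds[1], &c, 1) != 1)   /* the "operation" */
            _exit(1);
        for (;;)
            pause();                     /* stay alive until killed */
    }

    close(fds[1]);
    char got = 0;
    ssize_t n = read(fds[0], &got, 1);   /* returns once the write is done */

    kill(pid, SIGKILL);                  /* too late to undo the write */
    int st = 0;
    waitpid(pid, &st, 0);
    close(fds[0]);

    *killed = WIFSIGNALED(st) && WTERMSIG(st) == SIGKILL;
    return n == 1 ? got : -1;
}
```

The same reasoning applies to a file write: once the kernel has carried out the operation, a signal only ends the process, which is why the return-value override is needed to stop the operation itself.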

(This cluster has CONFIG_BPF_KPROBE_OVERRIDE=y and CONFIG_FUNCTION_ERROR_INJECTION=y, so bpf_override_return is usable — but it can only attach to functions marked ALLOW_ERROR_INJECTION, a narrow list of points safe to override.)

Why this route needs no reboot

Unlike Article 14, this demo attaches to a tracepoint rather than an LSM hook and enforces with a signal, so it runs without bpf in the LSM list and without a reboot. That is also why Tetragon chose this route: it works on stock kernels that don't have BPF LSM enabled, needing only kprobe/tracepoint + bpf_send_signal (available since kernel 5.3). The trade-off: it enforces after the event has started (killing a process that is exec'ing or has already exec'd), unlike LSM, which blocks before the operation happens. Each mechanism has its place.
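Whether a node has bpf in its active LSM list can be checked by reading /sys/kernel/security/lsm, which holds a comma-separated list of LSM names. The helper below is a minimal sketch of that check; lsm_list_contains is a hypothetical name, and the sample lists in the usage note are made up for illustration, not taken from this cluster.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears as a whole entry in a comma-separated LSM
 * list (the format of /sys/kernel/security/lsm), else 0.
 * A trailing newline after the last entry is tolerated. */
static int lsm_list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    if (name_len == 0)
        return 0;

    const char *p = list;
    while (*p) {
        size_t len = strcspn(p, ",\n");      /* length of this entry */
        if (len == name_len && memcmp(p, name, len) == 0)
            return 1;
        p += len;
        if (*p)
            p++;                             /* skip the separator */
    }
    return 0;
}
```

For example, lsm_list_contains("lockdown,capability,yama,apparmor", "bpf") returns 0, which is the situation where the tracepoint + signal route still works and BPF LSM does not. In a real check the first argument would be the line read from /sys/kernel/security/lsm with fopen and fgets.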

🧹 Cleanup

sudo rm -rf /sys/fs/bpf/tetra      # remove the pin -> unload the program
rm -f /tmp/forbidden-bin

After removal, bpftool prog show no longer lists kill_forbidden; the node is back to 140 programs. The source (tetra_kill.bpf.c, build/attach commands) is at github.com/nghiadaulau/ebpf-from-scratch, directory 16-tetragon-style.

Wrap-up

Tetragon does runtime security by attaching to observation hooks (kprobe/tracepoint) and then enforcing through two eBPF helpers that run straight in the kernel: bpf_send_signal() sends SIGKILL to kill a process synchronously, and bpf_override_return() overrides a syscall's return value to block an operation. We rebuilt the SIGKILL mechanism: a sched_process_exec tracepoint (exactly like the observation tool in Article 9) whose last line calls bpf_send_signal(9) instead of reporting. The forbidden binary /tmp/forbidden-bin is killed right at exec (exit 137 = 128 + SIGKILL), while the normal binary runs fine. A fact from the docs: synchronous SIGKILL does not always prevent an operation in progress (e.g. a write may already have written its data), so to be sure you must combine it with bpf_override_return. This route relies on a tracepoint plus a signal, so it needs no BPF LSM and no reboot (unlike Article 14). It runs on stock kernels, with the trade-off that enforcement happens after the event starts rather than blocking it beforehand.

Part V closes — three enforcement mechanisms: LSM BPF (block before, semantic, needs the bpf LSM), seccomp (raw syscall filtering, every container), the Tetragon way (kprobe/tracepoint + signal/override, no LSM needed). Part VI returns to observation but at the deepest layer: performance observability — profiling with perf_event, measuring kernel-layer latency, off-CPU, and dissecting how Hubble builds a cluster-wide network flow picture from eBPF events.