cilium/ebpf: Loading eBPF From Go

Kai · 4 min read

Article 9 built execsnoop in C: clang compiled the kernel side, bpftool generated the skeleton, and a C loader linked against libbpf. This post builds the same tool, but with the loader written in Go using the cilium/ebpf library. Much of the eBPF tooling in the Kubernetes world is written this way (Cilium and Tetragon both load their programs from Go), because Go is the lingua franca of the k8s ecosystem and produces a static binary that's easy to distribute.

The kernel side is unchanged, only the loader changes

The key point: exec.bpf.c (the kernel side, Article 9) stays the same — still the exec tracepoint pushing events through a ring buffer. Only the userspace part changes from C to Go. eBPF is eBPF; the only change is in who loads and reads it.

bpf2go: compile and generate Go bindings

In C, we ran clang then bpftool gen skeleton by hand. In Go, the bpf2go tool does both via a go:generate directive:

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -cc clang exec exec.bpf.c -- -I.

go generate                        # runs the directive above and produces:
exec_bpfel.go    exec_bpfel.o      # little-endian: Go binding + embedded object
exec_bpfeb.go    exec_bpfeb.o      # big-endian

bpf2go runs clang to compile exec.bpf.c, then generates Go files (exec_bpfel.go) that embed the bytecode object straight into the Go code, along with ready-made types: execObjects (collecting maps + programs), loadExecObjects(), the HandleExec field (program), Events (map). The names are derived from the exec prefix + the function/map names in C. This is the equivalent of the skeleton in Article 9, but as Go code.
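To make the naming rule concrete, here is a rough sketch of how a C identifier could map to the Go names listed above. It is a simplification for illustration, not bpf2go's actual implementation, and the C-side names handle_exec and events are assumptions (only the Go names appear in this post).

```go
package main

import (
	"fmt"
	"strings"
)

// goName approximates the C-to-Go naming rule: split on underscores and
// capitalize each part. A simplification, not bpf2go's real code.
func goName(cIdent string) string {
	var b strings.Builder
	for _, part := range strings.Split(cIdent, "_") {
		if part == "" {
			continue
		}
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return b.String()
}

func main() {
	prefix := "exec" // the identifier passed to bpf2go before the source file

	// Assumed C names; the post only shows the resulting Go names.
	fmt.Println(goName("handle_exec")) // HandleExec
	fmt.Println(goName("events"))      // Events

	// The prefix is used as-is for types and capitalized inside the loader name.
	fmt.Println(prefix + "Objects")                  // execObjects
	fmt.Println("load" + goName(prefix) + "Objects") // loadExecObjects
}
```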

One required detail: add //go:build ignore to the top of exec.bpf.c. That's a Go build constraint (clang treats it as a comment and ignores it) so go build does not try to compile that C file as Go code — without it, the build reports C source files not allowed.

main.go: load, attach, read

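package main

import (
    "bytes"
    "encoding/binary"
    "errors"
    "fmt"
    "log"

    "github.com/cilium/ebpf/link"
    "github.com/cilium/ebpf/ringbuf"
    "github.com/cilium/ebpf/rlimit"
)

type event struct {                 // matches the struct in exec.h
    Pid, Ppid uint32
    Comm      [16]byte
    Filename  [64]byte
}

func cstr(b []byte) string {        // NUL-terminated C buffer -> Go string
    s, _, _ := bytes.Cut(b, []byte{0})
    return string(s)
}

func main() {
    if err := rlimit.RemoveMemlock(); err != nil { log.Fatal(err) }        // raise the locked-memory limit
    var objs execObjects
    if err := loadExecObjects(&objs, nil); err != nil { log.Fatal(err) }   // load + verifier + CO-RE + JIT
    defer objs.Close()

    tp, err := link.Tracepoint("sched", "sched_process_exec", objs.HandleExec, nil)  // attach
    if err != nil { log.Fatal(err) }
    defer tp.Close()

    rd, err := ringbuf.NewReader(objs.Events)                // open the ring buffer
    if err != nil { log.Fatal(err) }
    defer rd.Close()

    fmt.Printf("%-16s %-8s %-8s %s\n", "COMM", "PID", "PPID", "FILENAME")
    for {
        rec, err := rd.Read()                                // wait for the next event
        if errors.Is(err, ringbuf.ErrClosed) { return }
        if err != nil {                                      // skip failed reads instead of decoding garbage
            log.Printf("reading ring buffer: %v", err)
            continue
        }
        var e event
        if err := binary.Read(bytes.NewReader(rec.RawSample), binary.LittleEndian, &e); err != nil {
            log.Printf("decoding event: %v", err)
            continue
        }
        fmt.Printf("%-16s %-8d %-8d %s\n", cstr(e.Comm[:]), e.Pid, e.Ppid, cstr(e.Filename[:]))
    }
}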

Compare with the C loader in Article 9: loadExecObjects replaces exec_bpf__open_and_load (same job: load, CO-RE relocate, verifier, JIT); the link package replaces the attach call; ringbuf.Reader replaces ring_buffer__poll. The record that comes back is raw bytes; binary.Read decodes it into the event struct, which must match the C layout field for field (same order, same sizes). rlimit.RemoveMemlock() is a standard first step in a Go eBPF application: kernels older than 5.11 account BPF memory against RLIMIT_MEMLOCK, so the default limit can make loading fail.

Build and run

go generate              # bpf2go: clang + embed object + generate binding
go build -o execsnoop-go .
sudo ./execsnoop-go
COMM             PID      PPID     FILENAME
iptables         372461   213711   /usr/sbin/iptables
iptables         372462   213711   /usr/sbin/iptables
iptables         372463   213711   /usr/sbin/iptables

Same result as Article 9 — it streams every exec, showing cilium-agent (ppid 213711) constantly calling iptables. But the key difference: execsnoop-go is a single 5.4MB static binary that already has the eBPF object embedded inside. No need to ship a separate .o file or skeleton, no dependency on libbpf.so at runtime — just one file, scp it to another machine (same architecture, kernel with BTF) and it runs, CO-RE handles the kernel differences. This is why the k8s ecosystem favors Go for eBPF: bundle it into one binary, easy to containerize, easy to distribute.

Two traps when building

  • llvm-strip missing: bpf2go strips the object after compiling, which needs llvm-strip (the llvm package, not present with clang alone). Without it, go generate reports exec: "llvm-strip": executable file not found.
  • //go:build ignore on .bpf.c: as mentioned, without it go build reports the C source error.
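Both traps can be caught before running go generate. The sketch below is one possible preflight check using only the standard library: it looks for clang and llvm-strip on PATH and checks that a source text carries the build constraint among its leading comment lines. The helper names and the inline stand-in for exec.bpf.c are assumptions for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// missingTools returns the names from tools that are not found on PATH.
func missingTools(tools []string) []string {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	return missing
}

// hasBuildIgnore reports whether src has //go:build ignore within its
// leading run of blank lines and // comments.
func hasBuildIgnore(src string) bool {
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "//go:build ignore":
			return true
		case line == "" || strings.HasPrefix(line, "//"):
			continue // still inside the leading comment block
		default:
			return false // first real line reached without the constraint
		}
	}
	return false
}

func main() {
	if m := missingTools([]string{"clang", "llvm-strip"}); len(m) > 0 {
		fmt.Println("missing tools:", strings.Join(m, ", "))
	}
	// Inline stand-in for the first lines of exec.bpf.c (contents assumed).
	src := "//go:build ignore\n\n#include \"vmlinux.h\"\n"
	fmt.Println("build constraint present:", hasBuildIgnore(src))
}
```

In a real project the source text would come from os.ReadFile("exec.bpf.c") rather than an inline string.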

🧹 Cleanup

go run github.com/cilium/ebpf/cmd/bpf2go ...   # its generated files (exec_bpfe*.go, exec_bpfe*.o): delete them or keep them in git
rm -rf /tmp/goebpf /tmp/gopath                 # project + module cache (cache needs sudo if read-only)

The program detaches itself on exit; the node returns to 140 programs. The source (exec.bpf.c, main.go, exec.h, go.mod/go.sum) is at github.com/nghiadaulau/ebpf-from-scratch, directory 10-cilium-ebpf-go; run go generate to produce the bindings, then go build.

Wrap-up

Same eBPF kernel-side program (Article 9), the loader switched from C to Go with cilium/ebpf. bpf2go (via go:generate) compiles exec.bpf.c with clang and embeds the object into the Go code along with typed bindings (execObjects, loadExecObjects, HandleExec, Events) — equivalent to the skeleton but in Go. main.go uses loadExecObjects (load + CO-RE + verifier + JIT), the link package to attach the tracepoint, ringbuf.Reader to read events, decoding the raw bytes into a struct matching the C layout. The result is a single static binary with eBPF embedded — no libbpf.so or loose files needed, easy to distribute, exactly why Cilium/Tetragon chose Go. Practical traps: you need llvm-strip (the llvm package) and //go:build ignore on the .bpf.c file.

Part III closes — we've written full eBPF tools ourselves, in both C (libbpf) and Go (cilium/ebpf). Part IV steps into eBPF's most famous domain: networking — XDP processing packets at the earliest point, tc, and dissecting Cilium's live datapath running on the cluster to see how its 74 sched_cls programs route each pod packet.