The eBPF Virtual Machine: Registers, Instruction Set, and Bytecode

Kai · 5 min read

Article 0 showed an eBPF program carrying two numbers: xlated 512B and jited 333B. xlated is the eBPF bytecode after the verifier; jited is the native machine code after JIT. This article goes inside that bytecode — eBPF runs on a virtual machine, and understanding this virtual machine is understanding why eBPF is both fast and safe.

The eBPF virtual machine: 11 registers, RISC-style

eBPF is not an arbitrary interpreted scripting language. It is a RISC-style virtual machine deliberately designed to resemble a real CPU architecture (x86-64, arm64), so that it translates to native code almost one-to-one. The virtual machine has 11 64-bit registers (r0-r10), each with a conventional role:

   r0        return value (of a helper, and the program's exit code)
   r1 - r5   arguments passed to a helper (and the context at start: r1 points to the input)
   r6 - r9   callee-saved (preserved across helper calls)
   r10       stack frame pointer — READ-ONLY (cannot be modified)

This convention matches the calling convention of a real CPU, so on x86-64 the JIT maps r1→rdi, r2→rsi, and so on directly. The read-only r10 is a first safety constraint: the program has a stack region to work in but cannot arbitrarily move the frame pointer.
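The register assignment can be written down as a small table. The mapping below is a sketch of what the kernel's x86-64 JIT (arch/x86/net/bpf_jit_comp.c) uses as far as I know; the dictionary and function names are made up for illustration, and the exact registers are worth checking against your kernel's source if they matter.

```python
# eBPF register -> x86-64 register.
# Assumption: mapping as used by the kernel's x86-64 JIT; verify for your kernel version.
BPF_TO_X86_64 = {
    "r0": "rax",                                   # return value
    "r1": "rdi", "r2": "rsi", "r3": "rdx",         # helper arguments
    "r4": "rcx", "r5": "r8",
    "r6": "rbx", "r7": "r13", "r8": "r14", "r9": "r15",  # callee-saved
    "r10": "rbp",                                  # frame pointer
}

# First five integer-argument registers of the System V AMD64 calling convention.
SYSV_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8"]


def helper_args_line_up() -> bool:
    """True if r1..r5 land in the native argument registers, in order."""
    return [BPF_TO_X86_64[f"r{i}"] for i in range(1, 6)] == SYSV_ARG_REGS


print(helper_args_line_up())
```

Because r1 to r5 already sit in the native argument registers, a helper call needs no argument shuffling in the generated code, and r6 to r9 land in registers that the native convention also treats as callee-saved.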

The instruction set is split into 8 classes: LD, LDX (load from memory), ST, STX (store to memory), ALU, ALU64 (32/64-bit arithmetic), JMP, JMP32 (branching). Each basic instruction is 64 bits long: 8-bit opcode, 4-bit destination register, 4-bit source register, 16-bit offset, 32-bit immediate (there is a wide 128-bit instruction form for a 64-bit immediate). Small and uniform — expressive enough, yet easy for the verifier to reason about.
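To make the encoding concrete, here is a minimal sketch that splits one 8-byte instruction into the five fields listed above. The names Insn and decode are hypothetical, and the bytes are hand-assembled for the instruction r2 = *(u32 *)(r1 + 0). It assumes a little-endian layout, where the destination register is the low nibble of the second byte.

```python
import struct
from typing import NamedTuple


class Insn(NamedTuple):
    opcode: int  # 8 bits
    dst: int     # 4 bits
    src: int     # 4 bits
    off: int     # signed 16 bits
    imm: int     # signed 32 bits


def decode(raw: bytes) -> Insn:
    """Split one 8-byte eBPF instruction (little-endian layout) into its fields."""
    opcode, regs, off, imm = struct.unpack("<BBhi", raw)
    # Low nibble is the destination register, high nibble the source register.
    return Insn(opcode, regs & 0x0F, regs >> 4, off, imm)


# Hand-assembled bytes for: r2 = *(u32 *)(r1 + 0)
raw = bytes([0x61, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])
insn = decode(raw)
print(f"opcode=0x{insn.opcode:02x} dst=r{insn.dst} src=r{insn.src} "
      f"off={insn.off} imm={insn.imm}")
```

The fixed 8-byte shape is what makes this a four-field unpack with no variable-length parsing, which is part of why the format is easy to analyze.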

Reading the bytecode of a real program

The theory above becomes clear when you read real bytecode. bpftool prog dump xlated prints the post-verifier bytecode of a running program. Take program id 903 (a Cilium/systemd cgroup_device that decides whether to allow device access or not):

sudo bpftool prog dump xlated id 903
 0: (61) r2 = *(u32 *)(r1 +0)      # LDX: load 4 bytes from context (r1+0) into r2
 1: (54) w2 &= 65535               # ALU: r2 &= 0xffff  (w2 = lower 32 bits of r2)
 2: (61) r3 = *(u32 *)(r1 +0)      # LDX: load that field again into r3
 3: (74) w3 >>= 16                 # ALU: r3 >>= 16
 4: (61) r4 = *(u32 *)(r1 +4)      # LDX: the next field of the context
 5: (61) r5 = *(u32 *)(r1 +8)
 6: (55) if r2 != 0x1 goto pc+5    # JMP: if r2 != 1 then jump to +5
 7: (bc) w1 = w3                   # ALU: r1 = r3
 8: (54) w1 &= 1
 9: (5d) if r1 != r3 goto pc+2     # JMP: compare two registers
10: (b4) w0 = 1                    # ALU: r0 = 1  (set the return value)
11: (95) exit                     # JMP: exit, return r0

It reads end to end:

  • r1 is the context, the program's input. For cgroup_device, it points to a structure describing the device operation (access_type, major, minor). The (61) ... = *(u32 *)(r1 +N) instructions are the LDX class, loading those fields into registers.
  • The (54) &=, (74) >>=, (bc) = instructions are the ALU class: arithmetic/logic, here splitting the packed access_type field into the device type (low 16 bits, r2) and the requested access (high 16 bits, r3). major and minor are loaded unchanged into r4 and r5.
  • (55), (5d) are the JMP class: conditional branches that encode the target as an offset relative to the next instruction, so goto pc+5 at instruction 6 lands on instruction 12, just past the end of this excerpt.
  • (b4) w0 = 1 then (95) exit — set r0 (the return value) then exit. For cgroup_device, r0=1 means "allow".
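The twelve instructions can also be modeled in a few lines of Python. This is a sketch, not the program's original source: the function name run_excerpt is hypothetical, and it assumes the first context field packs the requested access in its high 16 bits and the device type in its low 16 bits, as in the kernel's struct bpf_cgroup_dev_ctx. The two constants are taken from the kernel UAPI header linux/bpf.h as I understand it.

```python
BPF_DEVCG_DEV_BLOCK = 1  # device type: block device (assumed from linux/bpf.h)
BPF_DEVCG_ACC_MKNOD = 1  # access bit: mknod (assumed from linux/bpf.h)


def run_excerpt(access_type: int, major: int, minor: int):
    """Model instructions 0-11 of the listing.

    Returns 1 ("allow") if the excerpt reaches exit, or None when control
    jumps to instruction 12, which the listing does not show.
    """
    r2 = access_type & 0xFFFF   # insns 0-1: low 16 bits
    r3 = access_type >> 16      # insns 2-3: high 16 bits
    r4, r5 = major, minor       # insns 4-5: loaded, unused in this excerpt
    if r2 != 0x1:               # insn 6: goto pc+5 -> 6 + 1 + 5 = 12
        return None
    r1 = r3 & 1                 # insns 7-8
    if r1 != r3:                # insn 9: goto pc+2 -> 9 + 1 + 2 = 12
        return None
    return 1                    # insns 10-11: w0 = 1; exit


mknod_block = (BPF_DEVCG_ACC_MKNOD << 16) | BPF_DEVCG_DEV_BLOCK
print(run_excerpt(mknod_block, major=8, minor=0))
```

Read this way, the excerpt looks like a single rule: allow the request when the type field is 1 and the access field has no bits set other than bit 0, regardless of major and minor. Anything else falls through to the instructions after the excerpt.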

The numbers in parentheses — (61), (54), (95) — are the 8-bit opcodes. Each line is a 64-bit instruction exactly per the encoding above: opcode + registers + offset/immediate. This is not pseudo-code; this is what the eBPF virtual machine in the kernel executes.
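The opcode byte itself has internal structure: its low 3 bits select the instruction class, and the remaining bits mean different things for arithmetic/jump instructions than for loads and stores. The sketch below decodes the opcodes from the listing; the helper names opcode_class and describe are made up for illustration.

```python
# Class number (low 3 bits of the opcode) -> class name.
CLASSES = ["LD", "LDX", "ST", "STX", "ALU", "JMP", "JMP32", "ALU64"]


def opcode_class(opcode: int) -> str:
    """The low 3 bits of the opcode select the instruction class."""
    return CLASSES[opcode & 0x07]


def describe(opcode: int) -> str:
    cls = opcode_class(opcode)
    if cls in ("ALU", "ALU64", "JMP", "JMP32"):
        # Bit 3: operand source (0 = immediate, 1 = register).
        # Bits 4-7: the operation (AND, RSH, MOV, JNE, EXIT, ...).
        source = "reg" if opcode & 0x08 else "imm"
        return f"{cls} op=0x{opcode & 0xF0:02x} src={source}"
    # Loads and stores: bits 3-4 give the access size, bits 5-7 the mode.
    return f"{cls} size=0x{opcode & 0x18:02x} mode=0x{opcode & 0xE0:02x}"


for op in (0x61, 0x54, 0x74, 0xBC, 0x55, 0x5D, 0xB4, 0x95):
    print(f"0x{op:02x}: {describe(op)}")
```

This also explains why (55) and (5d) differ by only one bit: both are "jump if not equal", but 0x5d has the register-source bit set, so it compares two registers instead of a register and an immediate.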

From bytecode to machine code: JIT

The bytecode above is portable: the same program runs on x86-64 or arm64. Interpreting it instruction by instruction is slow, however, so once the verifier has accepted the program, the JIT translates the bytecode into the native machine code of the node's architecture. As in Article 0, we can confirm that the JIT is enabled system-wide and read off both sizes:

cat /proc/sys/net/core/bpf_jit_enable     # -> 1
sudo bpftool prog show id 903 | grep -oE 'xlated [0-9]+B|jited [0-9]+B'
xlated 512B      # eBPF bytecode (64 instruction slots × 8B; the listing above shows the first 12)
jited  333B      # native x86-64 machine code

jited 333B is the size of the real x86-64 code the CPU runs directly, with no interpretation. (Unfortunately the bpftool build on this node does not include a disassembler, so it cannot print the asm; but a reported jited size together with bpf_jit_enable=1 already confirms the program runs as native code.) This is why eBPF is usable in performance-critical spots like processing every packet: the cost is close to that of precompiled C code.
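The xlated size converts directly into an instruction count, because every basic instruction occupies 8 bytes. A small sketch of that arithmetic, with a hypothetical helper name:

```python
INSN_SIZE = 8  # bytes per basic eBPF instruction slot


def xlated_slots(xlated_bytes: int) -> int:
    """Number of 8-byte instruction slots in an xlated image."""
    if xlated_bytes % INSN_SIZE:
        raise ValueError("xlated size should be a multiple of 8 bytes")
    return xlated_bytes // INSN_SIZE


print(xlated_slots(512))  # 64
```

A wide 64-bit-immediate instruction takes two slots, so the slot count is an upper bound on the number of instructions. For this program, 64 slots suggests that the 12 instructions listed earlier are only the beginning of the program.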

Why this design is the foundation of safety

The key point is not just "compact." It is precisely the virtual machine's constraints that let the verifier prove safety before allowing a run:

  • A fixed number of registers, no arbitrary pointers → the verifier can track what each register holds (a number, a pointer to context, a pointer to stack...) at every step.
  • Memory access via register + offset with a clear type → the verifier can check that every read/write stays within bounds.
  • Jumps by relative offset, no arbitrary indirect jumps → the verifier can build the control-flow graph and prove the program terminates.
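The last point can be illustrated with a toy check. Because every jump target is just pc + 1 + offset, the successors of each instruction are known statically, and a depth-first walk can reject out-of-range targets and back-edges. This is a deliberately simplified sketch, not the kernel's algorithm: the instruction tuples and function names are invented for the example, and the real verifier tracks far more state (newer kernels also accept loops they can prove are bounded).

```python
def successors(prog, pc):
    """Instructions that can execute right after prog[pc]."""
    kind, off = prog[pc]
    if kind == "exit":
        return []
    if kind == "ja":            # unconditional jump
        return [pc + 1 + off]
    if kind == "jcond":         # conditional: fall through, or jump
        return [pc + 1, pc + 1 + off]
    return [pc + 1]             # any non-jump instruction


def check_cfg(prog):
    """Return None if all paths stay in range and never loop back,
    otherwise a string describing the first problem found."""
    if not prog:
        return "empty program"
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = [WHITE] * len(prog)

    def visit(pc):
        state[pc] = GREY
        for nxt in successors(prog, pc):
            if not 0 <= nxt < len(prog):
                return f"insn {pc}: target {nxt} out of range"
            if state[nxt] == GREY:
                return f"insn {pc}: back-edge to {nxt}"
            if state[nxt] == WHITE:
                err = visit(nxt)
                if err:
                    return err
        state[pc] = BLACK
        return None

    return visit(0)


ok = [("other", 0), ("jcond", 1), ("other", 0), ("exit", 0)]
loop = [("other", 0), ("ja", -2)]
print(check_cfg(ok), "|", check_cfg(loop))
```

With computed (indirect) jump targets, successors() could not be written at all, which is the sense in which the restricted jump encoding makes this kind of analysis possible.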

A Turing-complete virtual machine with arbitrary function pointers cannot be verified this way (the halting problem). eBPF deliberately restricts itself in exchange for the ability to prove safety — that is the price, and also the beauty.

🧹 Cleanup

This article only dumps and shows running programs; it loads or changes nothing, so there is nothing to clean up. The commands are at github.com/nghiadaulau/ebpf-from-scratch, directory 01-vm.

Wrap-up

eBPF is a RISC-style virtual machine: 11 64-bit registers (r0 return, r1-r5 arguments/context, r6-r9 callee-saved, r10 read-only frame pointer), an 8-class instruction set (LD/LDX/ST/STX/ALU/ALU64/JMP/JMP32), each basic instruction 64 bits long. We read the real xlated bytecode of the cgroup_device program id 903 and traced each instruction: LDX loads context fields through r1, ALU splits the packed field, JMP branches by relative offset, and the program sets r0 and exits. After the verifier, the JIT translates the bytecode (512B) to native x86-64 machine code (333B), which runs at native speed (bpf_jit_enable=1). Most importantly, it is precisely the virtual machine's constraints (fixed registers, typed memory access, relative jumps) that let the verifier prove safety.

Article 2 goes into the verifier itself: how it tracks each register's state, how it rejects an unsafe program — and we will load a deliberately wrong program to watch the verifier block it with a log explaining why.