Blog

Thoughts on engineering, design, and building great products.

Off-CPU and Scheduler Latency: Measuring the Time a Process Is NOT Running

On-CPU profiling (Article 17) only sees what the CPU does while it is busy. But most of the latency an application experiences is time it spends NOT running: waiting for disk I/O, for a lock, or for its turn on a CPU. eBPF measures that off-CPU interval through scheduler tracepoints. On a real node we measure two things: run-queue latency, the time from wakeup until the task actually runs, which exposes a 16-32 ms tail under CPU contention; and off-CPU time, how long a task stays off the CPU each time it is switched out, whose tail reaches several seconds for blocked tasks.

Kai · May 24, 2026 · 14 views
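A minimal Python sketch of the two measurements, assuming the scheduler events have already been collected into a list. The event tuples and the `measure` function are hypothetical; the article's tool runs this logic in eBPF. Run-queue latency is the gap from a task's wakeup to the switch that puts it on a CPU; off-CPU time is the gap from the switch that takes it off to the switch that puts it back. For simplicity the sketch ignores preempted tasks, which re-enter the run queue at switch-out without a wakeup event, and it skips pid 0 (the idle task).

```python
from collections import defaultdict

# Hypothetical pre-collected events, loosely modeled on the
# sched_wakeup and sched_switch tracepoints:
#   (ts_ns, "wakeup", pid)
#   (ts_ns, "switch", (prev_pid, next_pid))


def measure(events):
    """Return (run-queue latencies, off-CPU times) in ns, keyed by pid."""
    woken_at = {}      # pid -> ts of wakeup (runnable, waiting for a CPU)
    switched_out = {}  # pid -> ts it last left the CPU
    runq = defaultdict(list)
    offcpu = defaultdict(list)
    for ts, kind, data in sorted(events, key=lambda e: e[0]):
        if kind == "wakeup":
            woken_at[data] = ts
            continue
        prev_pid, next_pid = data
        if prev_pid != 0:  # skip the idle task
            switched_out[prev_pid] = ts
        if next_pid == 0:
            continue
        if next_pid in woken_at:  # wakeup -> running
            runq[next_pid].append(ts - woken_at.pop(next_pid))
        if next_pid in switched_out:  # off CPU -> back on CPU
            offcpu[next_pid].append(ts - switched_out.pop(next_pid))
    return dict(runq), dict(offcpu)


# Illustrative trace: pid 1 blocks at t=0, is woken at 5 ms,
# but pid 2 holds the CPU until 21 ms.
events = [
    (0, "switch", (1, 0)),
    (1_000_000, "switch", (0, 2)),
    (5_000_000, "wakeup", 1),
    (21_000_000, "switch", (2, 1)),
]
runq, offcpu = measure(events)
print(runq)    # {1: [16000000]}
print(offcpu)  # {1: [21000000]}
```

Here pid 1 spent 16 ms runnable but waiting behind pid 2 (run-queue latency) and 21 ms off the CPU in total; the 5 ms difference is the time it was blocked before the wakeup.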


Linux · Performance

CPU Profiling with perf_event: Sampling Stacks, the Foundation of Flame Graphs

To find out what the CPU is busy doing, we sample it: at a fixed rate (99 Hz here, roughly a hundred times per second), we interrupt each CPU and record the stack that is running. eBPF does this through the perf_event program type: attached to a kernel sampling counter, the program captures the stack each time the counter fires and aggregates it in the kernel. This article profiles a real node at 99 Hz, sees dd burning CPU reading /dev/zero while the other cores sit in the idle loop, and aggregates by process to get dd at 479 samples, the data a flame graph draws.

Kai · May 24, 2026 · 10 views
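The aggregation step can be sketched in Python, assuming the sampled stacks have already been copied out of the kernel as `(comm, frames)` pairs ordered root to leaf. The `fold` and `per_process` helpers and the frame names are illustrative, not the article's code. `fold` emits semicolon-joined "folded" stack lines with a trailing count, the input format that flame graph scripts such as Brendan Gregg's flamegraph.pl consume.

```python
from collections import Counter


def fold(samples):
    """Collapse identical stacks into 'comm;root;...;leaf count' lines."""
    counts = Counter(";".join((comm, *frames)) for comm, frames in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def per_process(samples):
    """Count samples per process name, as in the per-process summary."""
    return Counter(comm for comm, _ in samples)


# Illustrative samples (frame names are examples, root first).
dd_stack = ("entry_SYSCALL_64", "ksys_read", "read_zero")
idle_stack = ("cpu_startup_entry", "do_idle")
samples = [("dd", dd_stack)] * 3 + [("swapper/0", idle_stack)] * 2

for line in fold(samples):
    print(line)
# dd;entry_SYSCALL_64;ksys_read;read_zero 3
# swapper/0;cpu_startup_entry;do_idle 2
print(per_process(samples).most_common(1))  # [('dd', 3)]
```

In the eBPF version this counting happens in a kernel map, so only the totals cross into user space rather than every individual sample.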


Linux · Performance

bpftrace: Maps, Counting and Histograms

Printing events line by line floods the screen when they are dense. bpftrace's real power is aggregating data inside the kernel: counting by key, building distributions, and returning only a small summary. This post uses a bpftrace @ map to count syscalls by process, then builds a real vfs_read latency histogram with a kprobe/kretprobe pair, showing the distribution as ASCII bars, including the slow tail that an average would hide.

Kai · May 24, 2026 · 13 views
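What the kprobe/kretprobe pair and the histogram compute can be mirrored in a few lines of Python, assuming entry and return events with timestamps and thread IDs are already available. The event tuples and helper names are hypothetical. The `start` dictionary plays the role of a bpftrace map keyed by thread ID, and the buckets are powers of two, as in bpftrace's `hist()`; the exact bucket labels bpftrace prints differ from this sketch.

```python
from collections import Counter


def pair_latencies(events):
    """events: (ts_ns, tid, "entry" | "return"), as from a kprobe/kretprobe pair."""
    start = {}  # tid -> entry timestamp, like a map keyed by thread ID
    latencies = []
    for ts, tid, kind in events:
        if kind == "entry":
            start[tid] = ts
        elif tid in start:  # ignore returns whose entry we never saw
            latencies.append(ts - start.pop(tid))
    return latencies


def log2_bucket(value):
    """Lower bound of the power-of-two bucket holding value (0 -> 0)."""
    return 0 if value <= 0 else 1 << (value.bit_length() - 1)


def hist(values, width=40):
    """Render a power-of-two histogram as ASCII bars."""
    if not values:
        return []
    buckets = Counter(log2_bucket(v) for v in values)
    peak = max(buckets.values())
    lines = []
    for low in sorted(buckets):
        high = max(1, low * 2)
        count = buckets[low]
        bar = "@" * max(1, round(count * width / peak))
        lines.append(f"[{low}, {high}) {count:6d} |{bar:<{width}}|")
    return lines


events = [
    (100, 1, "entry"), (150, 2, "entry"),
    (400, 1, "return"), (5_150, 2, "return"),
]
lat = pair_latencies(events)  # [300, 5000]
for line in hist(lat, width=10):
    print(line)
```

The mean of these two reads would be 2,650 ns, which describes neither of them; the histogram keeps the fast read and the slow one in separate buckets.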


DevOps · Ansible

Optimization and Execution Strategy

Running Ansible quickly and safely across hundreds of hosts: forks and strategy to control parallelism, serial for zero-downtime rolling updates, delegate_to and run_once, async for long-running tasks, fact caching and pipelining for speed, and tags and check mode for control.

Kai · May 23, 2026 · 14 views
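How `serial` splits an inventory into rolling-update batches can be sketched in Python. The `batches` and `batch_size` helpers are hypothetical; they assume, as we understand Ansible's behavior, that `serial` may be an integer, a percentage string, or a list whose last entry repeats, and that a percentage is taken of all play hosts and rounded down with a minimum of one host. Check the Ansible documentation for the exact rules in your version.

```python
def batch_size(spec, total):
    """Convert one serial entry (int or "N%") to a host count of at least 1."""
    if isinstance(spec, str) and spec.endswith("%"):
        # Assumption: percentage of all play hosts, rounded down.
        return max(1, int(float(spec[:-1]) / 100 * total))
    return max(1, int(spec))


def batches(hosts, serial):
    """Split hosts into successive batches; the last serial entry repeats."""
    specs = serial if isinstance(serial, list) else [serial]
    result, i, step = [], 0, 0
    while i < len(hosts):
        spec = specs[min(step, len(specs) - 1)]
        size = batch_size(spec, len(hosts))
        result.append(hosts[i:i + size])
        i += size
        step += 1
    return result


hosts = [f"web{n}" for n in range(1, 11)]  # 10 hypothetical hosts
print([len(b) for b in batches(hosts, 3)])              # [3, 3, 3, 1]
print([len(b) for b in batches(hosts, "20%")])          # [2, 2, 2, 2, 2]
print([len(b) for b in batches(hosts, [1, 5, "50%"])])  # [1, 5, 4]
```

Each batch runs through the whole play before the next one starts, which is what keeps the rest of the fleet serving traffic during the update. Starting with a one-host canary batch, as in the list example, lets a broken change fail on a single machine first.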