Health Checks: Liveness and Readiness Probes

STATUS: Running does not mean the application is actually working. A process can be alive but completely hung (a deadlock, say), or the container may have just started while the app inside still hasn't connected to its database. Kubernetes needs a way to ask a container "are you healthy? are you ready?", and that mechanism is called a probe. Getting the two main probe types right is the first step toward a self-healing cluster that never sends traffic to a pod that isn't ready.

Two different questions

This is where many people get confused. Liveness and readiness answer two different questions, and Kubernetes reacts differently when each one fails:

   Liveness probe   ──  "Is the container still ALIVE (not hung)?"
                        Fail → kubelet RESTARTS the container.

   Readiness probe  ──  "Is the container READY to take traffic yet?"
                        Fail → remove pod from Service endpoints (NO restart).

In short: liveness restarts what's hung; readiness keeps traffic away from what isn't ready. A pod can be alive but not ready (for example, while it is still loading a cache). In that case you want it kept out of rotation, not restarted.

Three check types

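Every probe, whatever its purpose, performs its check in one of three ways (recent Kubernetes versions also offer a grpc check, which this article doesn't cover):

httpGet: send an HTTP GET to a path and port on the container; any status code from 200 to 399 counts as healthy. This is the most common choice for web apps, usually pointed at a /healthz endpoint.
exec: run a command inside the container; exit code 0 counts as healthy.
tcpSocket: try to open a TCP connection to a port; if the connection succeeds, the check passes.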

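Each probe also takes timing parameters: initialDelaySeconds (how long to wait after the container starts before the first probe), periodSeconds (probe every N seconds), timeoutSeconds (how long a single check may take before it counts as failed; default 1), and failureThreshold (how many consecutive failures before the probe as a whole is considered failed).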
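As a sketch of how these fields fit together, here is a container-spec fragment that uses httpGet for liveness and tcpSocket for readiness (exec appears in the demo that follows). Port 8080 and the /healthz path are assumptions about a hypothetical app, not values from this article:

```yaml
# Fragment of a container spec (goes under spec.containers[*]).
# Assumes a hypothetical app listening on port 8080 and serving /healthz.
livenessProbe:
  httpGet:
    path: /healthz          # any status from 200 to 399 counts as healthy
    port: 8080
  initialDelaySeconds: 10   # wait 10s after the container starts
  periodSeconds: 5          # then probe every 5s
  timeoutSeconds: 1         # each check must answer within 1s (the default)
  failureThreshold: 3       # 3 failures in a row (roughly 10-15s after a hang) trigger a restart
readinessProbe:
  tcpSocket:
    port: 8080              # passes if a TCP connection can be opened
  periodSeconds: 5
  failureThreshold: 2
```

With these numbers, the time to detect a hang is bounded by roughly periodSeconds × failureThreshold, so lowering either value reacts faster at the cost of more false alarms on a briefly slow app.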

Liveness probe: restart what's hung

To see it in action: the pod below deliberately stays healthy for its first 20 seconds (while /tmp/healthy exists), then deletes the file so the probe starts failing:

apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      args: ["sh", "-c", "touch /tmp/healthy; sleep 20; rm /tmp/healthy; sleep 600"]
      livenessProbe:
        exec:
          command: ["cat", "/tmp/healthy"]   # file present → healthy; gone → fail
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 2

kubectl apply -f liveness-fail.yaml
# wait ~45s then look
kubectl describe pod liveness-demo | grep -iE "Unhealthy|Killing"

Warning  Unhealthy  15s (x2 over 20s)  kubelet  Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal   Killing    15s                kubelet  Container app failed liveness probe, will be restarted

After the file disappears, the probe fails twice in a row (failureThreshold: 2), so kubelet concludes the container is hung and restarts it. Count the restarts:

kubectl get pod liveness-demo

NAME            READY   STATUS    RESTARTS     AGE
liveness-demo   1/1     Running   1 (6s ago)   66s

RESTARTS is 1: kubelet restarted the container on its own. This is self-healing at the container level, where a hung app is brought back to life with no one intervening. (If it keeps failing, the status eventually shows CrashLoopBackOff, meaning kubelet keeps restarting it with an increasing delay between attempts.)

Readiness probe: hide traffic from a not-ready pod

The next pod runs plain nginx, but its readiness probe requests /healthz, a path the default nginx image doesn't serve, so every check returns 404. Its probe section looks like this:

      readinessProbe:
        httpGet:
          path: /healthz       # nginx has no such path → 404 → never ready
          port: 80
        periodSeconds: 5
        failureThreshold: 2
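The snippet above shows only the probe. A complete readiness-demo.yaml might look like the following minimal sketch; the image tag nginx:1.25 and the app: readiness-demo label are assumptions added for illustration (the label only matters if you put a Service in front of the pod):

```yaml
# readiness-demo.yaml: a minimal sketch; image tag and label are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo
  labels:
    app: readiness-demo     # hypothetical label, lets a Service select this pod
spec:
  containers:
    - name: app
      image: nginx:1.25
      ports:
        - containerPort: 80
      readinessProbe:
        httpGet:
          path: /healthz    # nginx has no such path → 404 → never ready
          port: 80
        periodSeconds: 5
        failureThreshold: 2
```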

kubectl apply -f readiness-demo.yaml
kubectl get pod readiness-demo

NAME             READY   STATUS    RESTARTS   AGE
readiness-demo   0/1     Running   0          18s

Here is the core difference: STATUS is still Running and RESTARTS is still 0, so Kubernetes did not restart anything. But READY is 0/1, and kubectl describe pod shows why:

Warning  Unhealthy  ...  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 404

Because it isn't Ready, the pod is kept out of the ready endpoints of any Service that selects it (recall endpoints from Article 5), so that Service sends it no traffic until the probe passes. This is exactly what you want when a pod is starting up or temporarily overloaded: don't kill it, just stop routing customers to it until it can handle them.
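To watch this from the Service side, you could put a Service in front of the pod. This sketch assumes the pod carries the hypothetical label app: readiness-demo (set under metadata.labels, or added afterwards with kubectl label pod readiness-demo app=readiness-demo); the Service name readiness-svc is also made up for illustration:

```yaml
# readiness-svc.yaml: hypothetical Service selecting the demo pod by label.
apiVersion: v1
kind: Service
metadata:
  name: readiness-svc
spec:
  selector:
    app: readiness-demo   # must match the pod's label (assumed, see above)
  ports:
    - port: 80            # port the Service listens on
      targetPort: 80      # nginx's container port
```

After applying it, kubectl get endpoints readiness-svc should show no ready address for the pod for as long as its readiness probe keeps failing.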

startupProbe: for slow-starting apps

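Some apps start slowly (loading data, running database migrations, and so on). If the liveness probe is too aggressive, it kills the app while it is still starting, and the restarted app gets killed again the same way: a death loop. startupProbe solves this with a separate check for the startup phase. While it is configured and hasn't yet succeeded, kubelet holds off the liveness and readiness probes; once it succeeds, they take over as normal. Use it for any app whose startup time is long or unpredictable.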
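A minimal sketch, following the pattern used in the Kubernetes documentation: the startup probe checks the same endpoint as liveness, and failureThreshold × periodSeconds sets the startup budget. The /healthz path and port 8080 are again assumptions about a hypothetical app:

```yaml
# Fragment of a container spec for a hypothetical slow-starting app.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30     # up to 30 × 10s = 300s (5 minutes) to finish starting
livenessProbe:             # not run until the startup probe has succeeded
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3      # once running, a hang is caught within about 30s
```

If the startup probe still fails once its budget is used up, kubelet restarts the container, just as it would after a failed liveness probe.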

Why this matters

Probes aren't decorative options — they're the foundation of reliability:

Without liveness, a hung app just stays hung; no one brings it back.
Without readiness, a rolling update (Article 4) treats a new pod as ready as soon as its container starts, so traffic reaches it before the app can serve requests, and users may see errors (such as 502s from a proxy) for a few seconds on every deploy. With readiness, Kubernetes only shifts traffic once the new pod actually passes its check; that's what makes "zero-downtime deploy" real rather than just theoretical.
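One common way to wire this up is sketched below: a Deployment whose rolling update never takes an old pod out of service before a new one reports Ready. The name web, the placeholder image, the /readyz path, and the surge settings are assumptions chosen for illustration:

```yaml
# Hypothetical Deployment: names, image, and probe path are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start at most one extra pod at a time
      maxUnavailable: 0    # never drop below 3 available pods during the rollout
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example.com/web:2.0   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:  # the rollout only proceeds once this passes
            httpGet:
              path: /readyz
              port: 8080
            periodSeconds: 5
```

With maxUnavailable: 0, each new pod must pass its readiness probe before an old one is removed, so capacity never dips during the rollout.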

One caveat: don't make the liveness probe depend on anything external (such as a database). If the database goes down, every pod fails liveness at the same time and gets restarted en masse, which can't fix the database and only adds load and churn. Liveness should ask only "is my process still responding?"; checks on external dependencies belong in readiness.
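In manifest terms, that usually means pointing the two probes at different endpoints. In this sketch, /healthz and /readyz are hypothetical paths your app would have to implement: the first only confirms the process responds, the second may also check the database:

```yaml
# Fragment of a container spec; both paths are hypothetical app endpoints.
livenessProbe:
  httpGet:
    path: /healthz     # returns 200 if the process is responsive; no external calls
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz      # may also check the DB; failing here only pauses traffic
    port: 8080
  periodSeconds: 5
  failureThreshold: 2
```

If the database goes down, /readyz fails and the pods drop out of their Service endpoints, while /healthz keeps passing and nothing is restarted.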

Wrap-up

Probes tell Kubernetes whether a pod is actually healthy, via httpGet/exec/tcpSocket. Liveness fail → restart the container (revive what's hung, container-level self-healing). Readiness fail → remove the pod from Service endpoints without restarting (hide traffic from a not-ready pod) — exactly what makes zero-downtime rolling updates real. startupProbe protects slow-starting apps. The golden rule: liveness should only ask about the process itself; don't let it depend on external services.

Our pods now report healthy and ready at the right moments. Next, in Article 11: telling Kubernetes how many resources each pod needs (requests/limits) and scaling automatically with load using the HPA.