Observe and Debug: logs, exec, describe, events

Kai · 5 min read

However elegant the theory, most of a real working day with Kubernetes is debugging: a pod stuck in Pending, a container in CrashLoopBackOff, a service that gets no traffic. This article adds no new concepts. It drills a reflex: when something breaks, where to look and in what order. We learn through two classic failures, reproduced for real on minikube.

The toolkit and the order to use it

When a pod isn't well, this is the sequence to follow:

   1. kubectl get pods           → what STATUS? how many RESTARTS?  (first glance)
   2. kubectl describe pod <p>   → the Events section: WHY this happened  (most important)
   3. kubectl logs <p>           → what the app INSIDE says  (application errors)
   4. kubectl exec -it <p> -- sh → go inside and inspect directly  (when you need to dig)

Rule number one: describe first, guess later. The Events section at the bottom of describe almost always points straight at the cause.
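
The sequence above can be wrapped in a small shell helper. This is only a sketch: the function name triage and its optional namespace argument are assumptions made for illustration, not part of kubectl. It runs the three read-only steps in order and prints just the Events section of describe.

```shell
# triage POD [NAMESPACE]
# Hypothetical helper: runs the read-only steps of the sequence in order.
triage() {
  pod="$1"
  ns="${2:-default}"   # assumption: fall back to the "default" namespace

  echo "== 1. status and restarts =="
  kubectl get pod "$pod" -n "$ns"

  echo "== 2. events (why) =="
  # Print only from the "Events:" heading to the end of the describe output.
  kubectl describe pod "$pod" -n "$ns" | sed -n '/^Events:/,$p'

  echo "== 3. application logs =="
  # This step fails for a container that never started (e.g. a failed image pull).
  kubectl logs "$pod" -n "$ns" --tail=20
}
```

Usage: triage broken-image, or triage crasher default. Step 4 (exec) is left out on purpose, since it is interactive and only needed when the first three steps are inconclusive.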

Case 1: ImagePullBackOff — failed to pull the image

Create a pod with a nonexistent image:

kubectl run broken-image --image=nginx:doesnotexist-9999
kubectl get pod broken-image
NAME           READY   STATUS         RESTARTS   AGE
broken-image   0/1     ErrImagePull   0          12s

STATUS: ErrImagePull (then it turns into ImagePullBackOff as Kubernetes backs off and retries). Don't guess — ask describe:

kubectl describe pod broken-image
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  12s   default-scheduler  Successfully assigned default/broken-image to minikube
  Normal   Pulling    12s   kubelet            Pulling image "nginx:doesnotexist-9999"
  Warning  Failed     8s    kubelet            Failed to pull image "nginx:doesnotexist-9999":
                                               manifest for nginx:doesnotexist-9999 not found: manifest unknown

Events say it plainly: the image pull failed because that tag doesn't exist. In practice, ImagePullBackOff usually has one of three causes: a misspelled image name or tag, an image in a private registry with no imagePullSecret, or no network path to the registry. The Failed event in describe tells you which one.
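
To make the three causes concrete, here is a rough sketch that maps the text of a Failed event to a likely cause. The function name pull_failure_cause is hypothetical, and the substrings it matches are assumptions: "manifest unknown" comes from the output above, while the others are typical of what registries and container runtimes print, and the exact wording varies between them.

```shell
# pull_failure_cause "EVENT MESSAGE"
# Hypothetical helper: guesses the cause of a failed image pull from the
# message text. The matched substrings are assumptions, not a fixed format.
pull_failure_cause() {
  case "$1" in
    *"manifest unknown"*|*"not found"*)
      echo "wrong image name or tag" ;;
    *unauthorized*|*denied*)
      echo "missing or wrong imagePullSecret" ;;
    *timeout*|*"no such host"*|*"connection refused"*)
      echo "network problem reaching the registry" ;;
    *)
      echo "unknown: read the full Events message" ;;
  esac
}
```

Usage: pull_failure_cause "manifest for nginx:doesnotexist-9999 not found: manifest unknown" prints "wrong image name or tag". Treat the result as a first guess; the full message in Events remains the source of truth.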

Case 2: CrashLoopBackOff — the container keeps dying

Create a pod whose container exits immediately with an error:

kubectl run crasher --image=busybox:1.36 -- sh -c "echo 'starting...'; exit 1"
kubectl get pod crasher
NAME      READY   STATUS   RESTARTS        AGE
crasher   0/1     Error    5 (2m17s ago)   3m46s

The telltale sign: RESTARTS climbing (here, already 5). The container runs, dies, Kubernetes restarts it, it dies again... Between restarts, Kubernetes backs off, waiting progressively longer (10s, 20s, 40s... capped at 5 minutes), and during that wait the pod shows STATUS: CrashLoopBackOff. That is how it avoids frantically restarting a broken container. Right after a crash you may instead catch STATUS: Error, as in the output above; the two alternate. Unlike ImagePullBackOff (where the container never got to run), here the container does run but exits immediately, so the place to look is the logs:

kubectl logs crasher
starting...

The app prints "starting..." and then exits with code 1. In a real application this is where you'd see a stack trace, "couldn't connect to DB", "missing environment variable"... If the container has already restarted and you want the logs of the previous run (the one that crashed), add -p (short for --previous):

kubectl logs crasher -p        # logs of the immediately previous run
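
Because -p returns an error when the container has no previous run yet, a small wrapper can try the previous run first and fall back to the current one. This is a sketch; the name crash_logs and the 50-line tail are assumptions chosen for illustration.

```shell
# crash_logs POD
# Hypothetical helper: show the crashed run's logs if one exists,
# otherwise the logs of the current run.
crash_logs() {
  pod="$1"
  # --previous (-p) fails if the container has never restarted,
  # so fall back to the current container's logs in that case.
  kubectl logs "$pod" --previous --tail=50 2>/dev/null \
    || kubectl logs "$pod" --tail=50
}
```

Usage: crash_logs crasher.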

CrashLoopBackOff is usually from: an app misconfiguration (missing env/secret — Article 7), an overly aggressive liveness probe (Article 10), or a wrong start command. The pair logs + logs -p is the key.
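
The back-off delays in this case are easy to tabulate. The sketch below assumes the commonly documented kubelet behaviour: start at 10s, double after each crash, cap at 300s. The function name backoff_delays is hypothetical, and the real kubelet also resets the delay once a container has run without problems for a while, which this ignores.

```shell
# backoff_delays N
# Hypothetical helper: prints the first N restart delays in seconds,
# assuming a 10s start, doubling each time, capped at 300s.
backoff_delays() {
  n="$1"
  delay=10
  cap=300
  out=""
  i=0
  while [ "$i" -lt "$n" ]; do
    out="${out:+$out }$delay"      # append the current delay, space-separated
    delay=$((delay * 2))           # double for the next restart
    if [ "$delay" -gt "$cap" ]; then
      delay="$cap"                 # never wait longer than the cap
    fi
    i=$((i + 1))
  done
  echo "$out"
}
```

Usage: backoff_delays 7 prints 10 20 40 80 160 300 300. This is why a pod that has crashed many times can sit in CrashLoopBackOff for minutes before the next attempt.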

See all cluster events

describe gives events for one object. To see the timeline across every object in the current namespace (add -A to cover all namespaces), which is useful when you're not sure what broke first:

kubectl get events --sort-by=.lastTimestamp
Warning   Failed    pod/broken-image   Failed to pull image ... manifest unknown
Warning   Failed    pod/broken-image   Error: ImagePullBackOff
Normal    BackOff   pod/broken-image   Back-off pulling image "nginx:doesnotexist-9999"

Sorting by time helps you reconstruct "what happened in what order" — invaluable when debugging a cascading failure.
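
On a busy cluster the Normal events drown out the interesting ones. A field selector narrows the timeline to warnings. This is a sketch; the function name warning_events is an assumption, while --field-selector type=Warning and -A are standard kubectl options.

```shell
# warning_events [NAMESPACE]
# Hypothetical helper: time-sorted Warning events for one namespace,
# or for all namespaces when no argument is given.
warning_events() {
  if [ -n "${1:-}" ]; then
    kubectl get events -n "$1" --field-selector type=Warning --sort-by=.lastTimestamp
  else
    kubectl get events -A --field-selector type=Warning --sort-by=.lastTimestamp
  fi
}
```

Usage: warning_events for the whole cluster, or warning_events default for one namespace.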

exec: go inside and inspect

When you need to check from inside the container — is the config file correct, can it reach another service, is the environment variable set:

kubectl exec -it <pod> -- sh
# inside:
env | grep DB           # did the environment variable land? (Article 7)
cat /etc/config/app.conf
wget -qO- http://other-service     # can it reach another service? (Article 5)

This is the fastest way to verify a hypothesis. (For minimal images with no shell, Kubernetes has kubectl debug to attach a temporary container that comes with tools — more advanced but worth knowing.)
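
exec also works without an interactive shell, which is handy for a one-off check. The sketch below tests whether a pod can reach a URL. The function name probe_service and the 3-second timeout are assumptions, and it only works if the container image ships wget (busybox does; many minimal images do not).

```shell
# probe_service POD URL
# Hypothetical helper: runs wget inside the pod and reports whether the
# URL answered. Assumes the image contains wget.
probe_service() {
  pod="$1"
  url="$2"
  # -T 3: give up after 3 seconds; output is discarded, only the exit code matters.
  if kubectl exec "$pod" -- wget -qO- -T 3 "$url" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
```

Usage: probe_service <pod> http://other-service. An "unreachable" result can also mean that wget is missing from the image, so confirm with an interactive session if in doubt.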

Dashboard: view the cluster through a UI

minikube comes with a graphical dashboard — handy for the big picture, especially when you're starting out:

minikube dashboard

It opens a browser with a UI to view every namespace, workload, pod, log, and event: the same information kubectl gives, but easier to browse. For fast debugging and real work, kubectl is still quicker.

A debugging mental model

Boil it down to a reflex:

   Abnormal STATUS?
     ├─ Pending          → describe: out of resources? no node yet? PVC not bound?
     ├─ ImagePullBackOff → describe: wrong image name / missing pull secret
     ├─ CrashLoopBackOff → logs (+ logs -p): why the app died
     ├─ Running 0/1      → readiness probe failing (Article 10): describe to check the probe
     └─ Service unreachable → kubectl get endpoints: does the selector match the pods?

Almost every fundamentals-level incident falls into one of these branches, and describe + logs solve most of them.
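
The tree above translates directly into a lookup. This is a sketch for memorizing the branches, not a real tool: the function name next_step is hypothetical, and NotReady and ServiceUnreachable are labels invented here for the last two branches (they are not values kubectl prints in the STATUS column).

```shell
# next_step SYMPTOM
# Hypothetical helper: prints the first command to run for a symptom.
# <pod> and <service> are placeholders to fill in by hand.
next_step() {
  case "$1" in
    Pending)
      echo "kubectl describe pod <pod>  # resources? node? PVC bound?" ;;
    ErrImagePull|ImagePullBackOff)
      echo "kubectl describe pod <pod>  # image name, tag, pull secret" ;;
    Error|CrashLoopBackOff)
      echo "kubectl logs <pod> -p  # why the app died" ;;
    NotReady)            # invented label for "Running 0/1"
      echo "kubectl describe pod <pod>  # readiness probe" ;;
    ServiceUnreachable)  # invented label for a service with no traffic
      echo "kubectl get endpoints <service>  # does the selector match?" ;;
    *)
      echo "kubectl describe pod <pod>  # start with Events" ;;
  esac
}
```

Usage: next_step CrashLoopBackOff prints the logs command; anything unrecognized falls back to describe, in line with the rule of describing first.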

Wrap-up

Debugging Kubernetes is a process, not guesswork: get pods (STATUS, RESTARTS) → describe (the Events section: why) → logs (+ -p for the previous crashed run) → exec (inspect inside). ImagePullBackOff means the image can't be pulled (wrong name or tag, missing pull secret, no network to the registry): read Events. CrashLoopBackOff means the container runs and then dies repeatedly (RESTARTS climbing): read logs. kubectl get events --sort-by=.lastTimestamp gives the timeline for the namespace (add -A for the whole cluster), and minikube dashboard gives a graphical view. The golden rule: describe first, guess later.

We've got all the pieces — architecture, workloads, networking, configuration, storage, operations, debugging. Article 14 ties it all together into a complete project: deploy a multi-component application onto minikube end to end, then wrap up the whole series.