Workload Types: StatefulSet, DaemonSet, Job and CronJob

Kai · 5 min read

Throughout the series we've used Deployment — and it does handle most needs (web apps, stateless APIs). But Kubernetes has a few other workload types for problems Deployment doesn't fit. Knowing they exist and when to use them lets you pick the right tool instead of forcing Deployment onto everything.

   Deployment    →  STATELESS app, pods are identical, freely replaceable      (web, API)
   StatefulSet   →  STATEFUL app, each pod has a stable identity                (database)
   DaemonSet     →  one pod on EVERY node                                       (log agent, monitoring)
   Job           →  runs until COMPLETE then stops                              (batch, migration)
   CronJob       →  a Job that runs on a SCHEDULE                               (nightly backup)

Job: run until done

Deployment assumes a pod runs forever. But many tasks have a defined end: running a migration, processing a batch of data, sending a report. A Job runs a pod until it completes successfully, then stops, with no infinite restarts.

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
        - name: pi
          image: perl:5.40-slim
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(50)"]
      restartPolicy: Never   # a Job's pods must use Never or OnFailure
  backoffLimit: 2            # retry at most twice before marking the Job failed

kubectl apply -f job.yaml
kubectl get job pi
kubectl logs job/pi

NAME   STATUS     COMPLETIONS   DURATION   AGE
pi     Complete   1/1           17s        17s

3.1415926535897932384626433832795028841971693993751

The Job runs, computes 50 digits of pi, then goes Complete (1/1). Two fields matter here: restartPolicy: Never (a Job's pod template accepts only Never or OnFailure, not the Always that a Deployment uses) and backoffLimit: 2 (a Job-specific field: retry at most twice on failure). A Job keeps the pod around after it finishes so you can read the logs, quite unlike a Deployment.
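Because finished Jobs and their pods stay around, they can pile up over time. As a minimal sketch (the name pi-with-ttl and the 300-second value are assumptions for illustration, not part of the original example), the ttlSecondsAfterFinished field asks Kubernetes to clean up a Job some time after it finishes:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl              # hypothetical name for this sketch
spec:
  ttlSecondsAfterFinished: 300   # assumed value: delete the Job and its pods 5 minutes after it finishes
  backoffLimit: 2                # retry at most twice on failure
  template:
    spec:
      containers:
        - name: pi
          image: perl:5.40-slim
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(50)"]
      restartPolicy: Never
```

Once the TTL expires, kubectl logs job/pi-with-ttl no longer works, so pick a value long enough to read the logs or ship them elsewhere first.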

CronJob: a Job on a schedule

A CronJob is a Job that runs periodically, using the familiar cron syntax (recall the Linux series):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"        # every minute (cron syntax)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: hello
              image: busybox:1.36
              command: ["sh", "-c", "date; echo 'Hello from CronJob'"]
          restartPolicy: OnFailure

kubectl get cronjob hello

NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        <none>          0s

Every minute (*/1 * * * *), the CronJob creates a new Job that runs the command. (LAST SCHEDULE shows <none> above only because the CronJob is 0s old and hasn't fired yet.) This is the Kubernetes-native way to do nightly backups, periodic cleanup, and scheduled reports, instead of a crontab on a single machine.
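For the nightly-backup case, a sketch might look like the following. The name nightly-backup, the 02:00 schedule, and the placeholder command are assumptions for illustration; a real backup would swap in its own image and command.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup             # hypothetical name
spec:
  schedule: "0 2 * * *"            # 02:00 every day
  concurrencyPolicy: Forbid        # skip a run if the previous one is still running
  successfulJobsHistoryLimit: 3    # keep the last 3 successful Jobs (the default)
  failedJobsHistoryLimit: 1        # keep the last failed Job (the default)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: busybox:1.36
              # placeholder only: stands in for a real backup command
              command: ["sh", "-c", "date; echo 'backup would run here'"]
          restartPolicy: OnFailure
```

concurrencyPolicy: Forbid is a reasonable choice for backups, where two overlapping runs could step on each other; the default (Allow) lets runs overlap.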

DaemonSet: one pod per node

Some things must run on every node, not "N replicas placed wherever": log-collection agents, monitoring exporters, network/storage plugins. A DaemonSet keeps one pod on each eligible node (every schedulable node by default). Add a new node and it automatically adds a pod onto it.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-logger
spec:
  selector:
    matchLabels: { app: node-logger }
  template:
    metadata:
      labels: { app: node-logger }
    spec:
      containers:
        - name: logger
          image: busybox:1.36
          command: ["sh", "-c", "while true; do sleep 3600; done"]

kubectl get daemonset node-logger

NAME          DESIRED   CURRENT   READY   NODE SELECTOR   AGE
node-logger   1         1         1       <none>          5s

DESIRED is 1 because minikube has one node. On a 10-node cluster the same DaemonSet would typically run 10 pods, one per node. You don't declare replicas; the pod count tracks the number of eligible nodes, meaning nodes that match any node selector and whose taints the pod tolerates. (Components like kube-proxy from Article 1 typically run as a DaemonSet.)
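A real log agent usually needs two things the toy example leaves out: access to the node's own files, and permission to land on tainted nodes. The sketch below is one possible shape, assuming logs live under /var/log on the host and that control-plane nodes carry the common node-role.kubernetes.io/control-plane taint; the /host/var/log mount path and the ls command are illustrative placeholders.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-logger
spec:
  selector:
    matchLabels: { app: node-logger }
  template:
    metadata:
      labels: { app: node-logger }
    spec:
      tolerations:
        # assumed taint: lets the pod run on control-plane nodes too
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: logger
          image: busybox:1.36
          # placeholder: lists a few host log files once an hour
          command: ["sh", "-c", "while true; do ls /host/var/log | head -n 5; sleep 3600; done"]
          volumeMounts:
            - name: varlog
              mountPath: /host/var/log   # assumed path inside the container
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log               # assumed log directory on the node
```

To go the other way and restrict the DaemonSet to a subset of nodes, a nodeSelector in the pod spec does that, and its value is what the NODE SELECTOR column reports.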

StatefulSet: for stateful apps

This is the most subtle type. With a Deployment, pods are anonymous and replaceable — random names (web-5687-9n9j7), and killing one and standing up another makes no difference. But a database is different: node 0 is the primary, nodes 1–2 are replicas; each node needs a stable identity and its own disk. A StatefulSet provides exactly that.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-sts
spec:
  serviceName: nginx-sts
  replicas: 3
  selector:
    matchLabels: { app: nginx-sts }
  template:
    # ... like a Deployment ...
  volumeClaimTemplates:               # each pod gets its own PVC
    - metadata: { name: data }
      spec:
        accessModes: ["ReadWriteOnce"]
        resources: { requests: { storage: 50Mi } }

kubectl get pods -l app=nginx-sts
kubectl get pvc -l app=nginx-sts

NAME        READY   STATUS    AGE
web-sts-0   1/1     Running   6s
web-sts-1   1/1     Running   4s
web-sts-2   1/1     Running   2s

NAME             STATUS   VOLUME            CAPACITY   STORAGECLASS
data-web-sts-0   Bound    pvc-a745d57d...   50Mi       standard
data-web-sts-1   Bound    pvc-c50a2045...   50Mi       standard
data-web-sts-2   Bound    pvc-f1eaa136...   50Mi       standard

Three core differences from a Deployment show up immediately:

  • Stable, ordered names: web-sts-0, web-sts-1, web-sts-2 — not a random hash. Pod web-sts-0 dies and is recreated with the same name, and reattaches to the same disk.
  • One PVC per pod: volumeClaimTemplates automatically creates a PVC for each pod (data-web-sts-0/1/2) — pod 0's data isn't mixed with pod 1's. This is what a database needs.
  • Sequential create/delete: by default, pods are created in order 0→1→2, each waiting until the previous one is Running and Ready (note the descending AGE), and deleted in reverse. Important for clusters with a primary/replica.
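The manifest above elides the pod template, so here is one way it could be filled in to show how a pod actually uses its per-pod PVC. This is a sketch under assumptions: the nginx:1.27 image, the container name, port 80, and the mount path are illustrative choices, not taken from the original manifest.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-sts
spec:
  serviceName: nginx-sts
  replicas: 3
  selector:
    matchLabels: { app: nginx-sts }
  template:
    metadata:
      labels: { app: nginx-sts }               # must match spec.selector
    spec:
      containers:
        - name: nginx                          # assumed container name
          image: nginx:1.27                    # assumed image
          ports:
            - containerPort: 80                # assumed port
          volumeMounts:
            - name: data                       # refers to the volumeClaimTemplates entry below
              mountPath: /usr/share/nginx/html # assumed mount path
  volumeClaimTemplates:                        # each pod gets its own PVC
    - metadata: { name: data }
      spec:
        accessModes: ["ReadWriteOnce"]
        resources: { requests: { storage: 50Mi } }
```

The link between the two halves is the name data: the volumeMounts entry refers to the claim template, and the controller turns that template into data-web-sts-0, data-web-sts-1, and data-web-sts-2.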

Note that a StatefulSet usually pairs with a headless Service (clusterIP: None) so each pod gets its own DNS name (web-sts-0.nginx-sts...) — allowing you to address each pod directly, which is what a database cluster needs.
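A headless Service for the StatefulSet above could be sketched like this. The name and selector follow the serviceName and labels already used in the manifest; port 80 is an assumption.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-sts        # must match the StatefulSet's spec.serviceName
spec:
  clusterIP: None        # headless: no virtual IP, DNS resolves to the pods themselves
  selector:
    app: nginx-sts       # same label the StatefulSet's pods carry
  ports:
    - name: http
      port: 80           # assumed port
# Per-pod DNS names then take the general form:
#   <pod-name>.<service-name>.<namespace>.svc.<cluster-domain>
```

Create the Service before or alongside the StatefulSet so the per-pod DNS records exist as soon as the pods come up.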

In practice, running databases on Kubernetes usually uses an Operator (a custom controller that adds its own resource types via CRDs) rather than hand-writing a raw StatefulSet, because backup/failover gets complex. But understanding StatefulSet is the foundation for understanding those Operators.

Wrap-up

Beyond Deployment (stateless apps), Kubernetes has: Job (runs to completion then stops: migrations, batch), CronJob (a Job on a cron schedule: nightly backups), DaemonSet (one pod on every eligible node: log/monitoring agents, scales with node count), and StatefulSet (for stateful apps: stable ordered names, one PVC per pod, sequential create/delete, usually with a headless Service). Pick the right type by the nature of the task rather than cramming everything into a Deployment.

We've nearly got the full toolkit. Article 13 takes a step back to drill the daily survival skill: observe and debug — reading logs, exec, describe, events, and using the dashboard.