Docker Swarm: Cluster Architecture and Raft

Kai · 6 min read

So far we've run containers on one machine. Real systems need several: to handle more load, and so that the system stays up when one machine dies. Coordinating containers across multiple machines is called orchestration, and Docker ships a tool for it: Docker Swarm. The last four articles of the series cover Swarm, starting with the architecture.

What Swarm is

Per the Docker docs, "a swarm consists of multiple Docker hosts which run in swarm mode and act as managers (to manage membership and delegation) and workers (which run the services)". In other words, Swarm gathers many machines running Docker into one logical cluster, so you command the cluster as a whole instead of each machine individually.

Each machine in the cluster is a node — a Docker engine instance taking part in the swarm. There are two roles:

  • Manager node: takes commands from you, makes orchestration decisions (which container goes on which machine), and maintains the cluster state. A manager can also run tasks.
  • Worker node: receives and runs tasks assigned by a manager. Workers make no orchestration decisions.

               you give commands
                       │
            ┌──────────▼───────────┐
            │   Manager nodes      │   ── orchestrate + hold state (Raft)
            │  [m1*] [m2] [m3]     │      (* = leader)
            └─────┬────────┬───────┘
                  │ assign tasks
        ┌─────────▼──┐  ┌──▼─────────┐  ┌────────────┐
        │ Worker w1  │  │ Worker w2  │  │ Worker w3  │  ── run containers
        │[task][task]│  │[task]      │  │[task][task]│
        └────────────┘  └────────────┘  └────────────┘

The desired-state model: you declare, Swarm handles it

This is the most important mindset shift when moving from docker run to Swarm. With docker run you give an imperative command: "run this container". With Swarm you declare a desired state: "I want 5 replicas of this app running at all times".

The manager continuously compares the actual state to the desired state and self-adjusts. A worker dies and the 2 replicas on it are lost? The manager sees "there are 3, we need 5" and starts 2 new replicas on another node. You don't have to step in. This self-healing is the main reason people use orchestration.
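
The compare-and-correct step can be sketched as a toy shell function. This is only an illustration of the idea, not how Swarm is implemented: the `reconcile` helper and its output format are hypothetical names made up for this sketch.

```shell
#!/bin/sh
# Toy model of one reconciliation pass (hypothetical helper, not Swarm code).
# Usage: reconcile DESIRED ACTUAL
reconcile() (
  desired=$1
  actual=$2
  if [ "$actual" -lt "$desired" ]; then
    echo "start $((desired - actual))"    # too few replicas: schedule more
  elif [ "$actual" -gt "$desired" ]; then
    echo "stop $((actual - desired))"     # too many replicas: remove extras
  else
    echo "ok"                             # actual already matches desired
  fi
)

# A worker died and took 2 of the 5 replicas with it:
reconcile 5 3    # prints: start 2
```

A real orchestrator runs this kind of comparison continuously and also has to choose which node receives the new replicas; the sketch only covers the counting.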

Raft: why managers must reach consensus

If there's only one manager, the whole cluster loses its brain when that manager dies. So you usually run several. But with several managers you must make sure they agree on the cluster state; they can't each hold a different view of it. Swarm solves this with the Raft consensus algorithm.

Per the Docker docs, Raft is used to "ensure that all the manager nodes that are in charge of managing and scheduling tasks in the cluster store the same consistent state". The core mechanism:

  • The managers elect a leader; the leader orchestrates, and the other managers replicate the state.
  • Every state change must be agreed to by a majority (quorum) of managers to take effect. Quorum is (N/2)+1 of N total managers, with the division rounded down.
  • This lets the cluster tolerate up to (N-1)/2 failed managers (again rounded down) and still operate.

The docs give an example: in a 5-manager cluster, if 3 managers are unavailable then "the system cannot process any more requests to schedule additional tasks", because only 2 managers remain and quorum is 3. Running tasks keep running, but no further orchestration is possible.
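
The quorum check in that example is simple arithmetic. Here is a minimal sketch in shell. The `has_quorum` function is a hypothetical helper written for this article, not a Docker command.

```shell
#!/bin/sh
# has_quorum TOTAL ALIVE
# Succeeds (exit status 0) if ALIVE managers still form a majority of TOTAL.
has_quorum() (
  total=$1
  alive=$2
  quorum=$((total / 2 + 1))    # shell integer division rounds down
  [ "$alive" -ge "$quorum" ]
)

# The docs' example: 5 managers, 3 unavailable, so 2 remain.
if has_quorum 5 2; then
  echo "quorum held"
else
  echo "quorum lost"
fi
```

With 5 managers the quorum is 3, so 2 survivors fall short and the script takes the "quorum lost" branch.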

Why use an odd number of managers

This is a direct consequence of the quorum formula. Compare:

   N managers   quorum (N/2)+1   max failures tolerated (N-1)/2
   ─────────────────────────────────────────────────────────────
       1             1                  0
       3             2                  1
       4             3                  1   ← costs more but no better than 3
       5             3                  2
       7             4                  3

Notice: 4 managers tolerate exactly 1 failure, the same as 3 managers, but at the cost of one extra machine. An even number doesn't increase fault tolerance, and a network partition can split it into two equal halves, neither of which has quorum. So always use an odd number of managers: usually 3 (for small/medium clusters) or 5 (for large clusters). More than 7 managers is rarely needed and slows Raft down, because every state change has to be replicated to more managers.
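
To check the table for other cluster sizes, the two formulas can be evaluated in a loop. This is a small sketch, and `manager_row` is a hypothetical helper name. It prints the manager count, the quorum, and the tolerated failures on one line.

```shell
#!/bin/sh
# manager_row N: print "N quorum tolerated" using the formulas from the table.
manager_row() {
  echo "$1 $(( $1 / 2 + 1 )) $(( ($1 - 1) / 2 ))"
}

for n in 1 2 3 4 5 6 7; do
  manager_row "$n"
done
```

Each even row repeats the tolerance of the odd row before it (2 matches 1, 4 matches 3, 6 matches 5), which is the pattern the table illustrates.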

Workers don't take part in Raft, so you can add as many workers as you like without affecting quorum.

Initializing a swarm

Turn the current machine into the first manager:

docker swarm init
Swarm initialized: current node (s48p4l0i2md2...) is now a manager.
To add a worker to this swarm, run the following command:
    docker swarm join --token SWMTKN-1-4tvv... 192.168.65.3:2377

The command returns a ready-made command for another node to join as a worker, along with a token and the manager address (port 2377 is the cluster management port). On a worker machine you just run that exact docker swarm join --token ... <manager-ip>:2377 command.

See the nodes in the cluster:

docker node ls
HOSTNAME         STATUS   MANAGER STATUS
docker-desktop   Ready    Leader

Right now there's only one node, both manager and leader. Retrieve the token when needed:

docker swarm join-token worker     # token for a worker
docker swarm join-token manager    # token for a manager (add a manager)
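
When scripting a join, it can help to assemble the command from its parts. The sketch below assumes you already have a token and a manager IP. `join_command` is a hypothetical helper, and the token and address passed to it are placeholders, not real values.

```shell
#!/bin/sh
# join_command TOKEN MANAGER_IP
# Hypothetical helper: print the join command for a node, using port 2377.
join_command() {
  printf 'docker swarm join --token %s %s:2377\n' "$1" "$2"
}

# On a manager, the -q (--quiet) flag prints only the token, for example:
#   token=$(docker swarm join-token -q worker)
# Placeholder values are used here so the sketch runs without a swarm.
join_command "SWMTKN-1-example" "192.168.65.3"
```

This prints the same kind of line that docker swarm init shows, ready to run on the joining node.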

Check the swarm state:

docker info | grep -A4 Swarm
 Swarm: active
  Is Manager: true
  Managers: 1
  Nodes: 1
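
If a script needs one of these values, a small filter can pull a single field out of that text. This is a sketch that assumes the "Name: value" layout shown above. `swarm_field` is a hypothetical helper, and the sample text stands in for real docker info output.

```shell
#!/bin/sh
# swarm_field NAME: read "docker info"-style text on stdin, print NAME's value.
swarm_field() {
  sed -n "s/^ *$1: *//p"
}

# Sample text copied from the output above (stands in for real docker info).
sample=' Swarm: active
  Is Manager: true
  Managers: 1
  Nodes: 1'

printf '%s\n' "$sample" | swarm_field "Managers"    # prints: 1
```

On a real machine you would pipe docker info into the function instead of the sample. docker info also accepts --format with a Go template, for example '{{.Swarm.LocalNodeState}}', which avoids text parsing altogether.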

How to learn multi-node with only one machine

Most Swarm commands (Articles 11–13) work on a single-node cluster — you can still create services, scale, and deploy stacks. But to clearly see orchestration across multiple machines (tasks spread across nodes, the manager replacing a dead node), you need multiple nodes. A few ways that don't cost a real server:

  • play-with-docker.com: a multi-node Docker environment running right in the browser, free, no install. The fastest way to play with real multi-node.
  • Lightweight VMs: use Multipass or VirtualBox to spin up a few Linux VMs, install Docker, then swarm join.
  • Cheap VPS: 2–3 small machines from a cloud provider, the most realistic but costs money.

In the following articles I demo on a single-node cluster (which runs on your machine) and call out which parts need multiple nodes to see clearly.

🧹 Cleanup note

We'll keep the swarm running throughout Articles 11–13 for hands-on practice. The full cleanup (leaving the swarm) is at the end of Article 13. If you want to leave the swarm right now:

docker swarm leave --force

(--force is needed because this node is a manager. Since it is also the last manager, leaving discards the swarm's state.)

Wrap-up

Docker Swarm gathers many nodes into a cluster of managers (which orchestrate and hold state via Raft) and workers (which run tasks). You declare a desired state, and the manager keeps the actual state matching it; that is the foundation of self-healing. Raft requires a quorum of (N/2)+1, with the division rounded down, so the number of managers should be odd (3 or 5). docker swarm init creates the first manager and prints a ready-made join command, token included, for other nodes.

In Article 11 we use this cluster to deploy a service: run multiple replicas of an app, scale up and down, and update versions without downtime.