Swarm: Service, Scale and Rolling Update
In Article 10 we set up the cluster and talked about desired state. This article makes it concrete: on Swarm you don't run individual containers with docker run, you declare a service — and Swarm handles the rest.
Service and task
Per the Docker docs, "a service is the definition of the tasks to execute on the nodes", and "a task carries a container and the commands to run inside the container; it is the atomic scheduling unit of swarm". The relationship:
Service "web" (desired: 3 replicas)
│ the manager splits it into tasks
├── task web.1 ──► nginx container (on node X)
├── task web.2 ──► nginx container (on node Y)
└── task web.3 ──► nginx container (on node Z)
You declare that the service wants 3 replicas; the manager creates 3 tasks, and each task runs one container, possibly on different nodes. You manage the service as a whole, not each container individually.
There are two service modes:
- replicated (default): runs exactly the number of replicas you set, and the manager distributes them across nodes. Used for most applications.
- global: runs exactly one replica on each node in the cluster (set with --mode global). Good for monitoring/log agents that need to be present on every machine.
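To make the split from service to tasks concrete, here is a tiny bash model of where tasks would land in each mode. It is a toy: the function name plan_tasks and the node names are hypothetical, and the round-robin placement is an assumption for illustration, since Swarm's real scheduler also considers resources, constraints and how loaded each node already is.

```shell
#!/usr/bin/env bash
# Toy placement model, NOT Swarm's real scheduler (which also weighs
# resources, constraints and current load). Replicated mode deals tasks
# round-robin over the nodes; global mode puts one task on every node.
# Usage: plan_tasks SERVICE MODE REPLICAS NODE...
plan_tasks() {
  local service=$1 mode=$2 replicas=$3
  shift 3
  local nodes=("$@") slot node
  if (( ${#nodes[@]} == 0 )); then
    echo "need at least one node" >&2
    return 1
  fi
  case $mode in
    replicated)
      # task N (1-based) goes to node index (N-1) mod node_count
      for (( slot = 1; slot <= replicas; slot++ )); do
        echo "$service.$slot -> ${nodes[(slot - 1) % ${#nodes[@]}]}"
      done
      ;;
    global)
      # REPLICAS is ignored: the node count decides how many tasks exist
      for node in "${nodes[@]}"; do
        echo "$service (global) -> $node"
      done
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}

plan_tasks web replicated 3 node-x node-y node-z
plan_tasks agent global 0 node-x node-y node-z
```

With three nodes, the replicated service gets web.1 to web.3 spread one per node, while the global service gets exactly one task per node no matter what number you pass as REPLICAS.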
Creating a service
Create a web service with 3 replicas, publishing a port:
docker service create --name webv --replicas 3 -p 9090:80 nginx:alpine
verify: Service converged
"Converged" means the actual state now matches the desired state (all 3 replicas running). View the service:
docker service ls
NAME MODE REPLICAS IMAGE
webv replicated 3/3 nginx:alpine
3/3 reads as running/desired: 3 tasks running out of 3 desired. View each task and where it sits:
docker service ps webv
NAME CURRENT STATE
webv.1 Running
webv.2 Running
webv.3 Running
(On a multi-node cluster, the node column shows the tasks spread across different machines.)
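Because the REPLICAS column reads running/desired, a script can wait for convergence by comparing the two numbers. The helper below is a hypothetical bash sketch that only parses the text; you would feed it the real value, for example from docker service ls --filter name=webv --format '{{.Replicas}}' (assuming only one service name matches the filter). Anything after the first space is ignored, an assumption in case your Docker version appends extra notes to the column.

```shell
#!/usr/bin/env bash
# Returns 0 when a REPLICAS value such as "3/3" shows running == desired,
# 1 when it does not, and 2 when the text is not in RUNNING/DESIRED form.
replicas_converged() {
  local field=${1%% *}          # keep only the first token, e.g. "3/3"
  local running=${field%%/*}    # text before the slash
  local desired=${field#*/}     # text after the slash
  [[ $running =~ ^[0-9]+$ && $desired =~ ^[0-9]+$ ]] || return 2
  (( running == desired ))
}

for value in "3/3" "2/3"; do
  if replicas_converged "$value"; then
    echo "$value converged"
  else
    echo "$value not yet"
  fi
done
```

In a deploy script, a loop such as until replicas_converged "$(docker service ls --filter name=webv --format '{{.Replicas}}')"; do sleep 1; done would block until webv reports all its replicas running.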
Note:
docker service ... only runs on a manager node. Workers can't issue orchestration commands (Article 10). A service also differs from docker run: docker ps only shows containers on the current node, while docker service ps shows tasks across the whole cluster.
Scale: increase or decrease replicas
Change the replica count with a single command:
docker service scale webv=5
verify: Service converged
The manager creates 2 more tasks to reach 5 and places them on nodes with available capacity. Scaling down works in reverse: the manager shuts down tasks until the count matches. An equivalent command:
docker service update --replicas 5 webv
This is desired state in practice: you state the number, Swarm self-adjusts.
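Behind both commands, the manager only compares two numbers. This hypothetical scale_action function is a bash toy model of that comparison, not Swarm code: the gap between the new replica count and the running count says how many tasks to start or stop.

```shell
#!/usr/bin/env bash
# Toy model of the manager's reaction to "docker service scale":
# compare the running count with the desired count and act on the gap.
# Usage: scale_action RUNNING DESIRED
scale_action() {
  local running=$1 desired=$2
  if (( desired > running )); then
    echo "create $(( desired - running )) task(s)"
  elif (( desired < running )); then
    echo "shut down $(( running - desired )) task(s)"
  else
    echo "converged, nothing to do"
  fi
}

scale_action 3 5   # webv=5 while 3 tasks are running
scale_action 5 2   # scaling back down
scale_action 4 4   # already matches
```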
Self-healing
Because Swarm always keeps the actual state matching the desired state, if a task dies (container crash, or a whole node failing), the manager detects "there's a shortfall" and creates a replacement task on an available node. You don't have to do anything.
On a multi-node cluster you can try: shut Docker down on a worker, then run docker service ps webv on the manager — you'll see the tasks on that node move to Shutdown/Failed and new tasks spring up on another node to make up for them. (On a single-node cluster you can't simulate this — you need multi-node as suggested in Article 10.)
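Without a multi-node cluster at hand, the loop can still be sketched. The bash toy model below is an illustration only (the functions fail_node and reconcile, and the node names, are hypothetical): it drops the tasks of a node that went down, then starts replacements on the surviving nodes until the running count is back at the desired 3. Round-robin placement is a simplifying assumption; Swarm's scheduler spreads tasks according to how loaded each node is.

```shell
#!/usr/bin/env bash
# Toy model of self-healing (an illustration, not Swarm's actual code).
# Each element of TASKS is the node that a running webv task lives on.
DESIRED=3
NODES_UP=(node-x node-y node-z)
TASKS=(node-x node-y node-z)

# Simulate Docker going down on one node.
fail_node() {
  local n kept=()
  for n in "${NODES_UP[@]}"; do
    [[ $n != "$1" ]] && kept+=("$n")
  done
  NODES_UP=("${kept[@]}")
}

# One pass of the reconciliation loop: keep only tasks on nodes that are
# still up, then start replacements round-robin until DESIRED is met.
reconcile() {
  local t up alive=() i=0
  for t in "${TASKS[@]}"; do
    for up in "${NODES_UP[@]}"; do
      [[ $t == "$up" ]] && alive+=("$t")
    done
  done
  while (( ${#alive[@]} < DESIRED && ${#NODES_UP[@]} > 0 )); do
    alive+=("${NODES_UP[i % ${#NODES_UP[@]}]}")
    i=$(( i + 1 ))
  done
  TASKS=("${alive[@]}")
}

fail_node node-z
reconcile
echo "running tasks on: ${TASKS[*]}"
```

After node-z fails, its task is replaced on node-x, so three tasks are running again on the two remaining nodes.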
Rolling update: change versions without downtime
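When you need to update the image (deploy a new version), Swarm replaces tasks a few at a time instead of stopping them all at once; this is a rolling update. As long as the service has more than one replica, some replicas keep serving while others are being replaced, so the service isn't interrupted.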
docker service update --image nginx:1.27-alpine webv
verify: Service converged
By default Swarm does it one task at a time: stop an old task, start a new task with the new image, wait for it to run cleanly, then move on to the next one. Check that the image changed:
docker service inspect webv --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}'
Control how the update runs with flags:
docker service update \
  --image nginx:1.27-alpine \
  --update-parallelism 2 \
  --update-delay 10s \
  webv
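--update-parallelism decides how many tasks are replaced at once (the default is 1); --update-delay is the pause between batches, giving the new replicas time to stabilize before the next batch is touched. If webv is already on nginx:1.27-alpine from the previous step, this command only changes the update settings and no tasks are replaced; add --force to redeploy the tasks anyway.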
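Since webv was scaled to 5 replicas earlier, it helps to see how those two flags chop the update up. The rolling_plan function below is a hypothetical bash sketch: it lists which task slots are replaced together and adds up the pauses between batches, ignoring container start-up time and any monitoring window, so a real update takes longer. It also mirrors the documented rule that --update-parallelism 0 updates every task at once.

```shell
#!/usr/bin/env bash
# Toy schedule for a rolling update (illustration only): which task slots
# are replaced together, and how much waiting --update-delay adds.
# Usage: rolling_plan REPLICAS PARALLELISM DELAY_SECONDS
rolling_plan() {
  local replicas=$1 parallelism=$2 delay=$3 batch=1 start end
  (( replicas > 0 )) || return 1
  # --update-parallelism 0 means "update all tasks at once"
  if (( parallelism == 0 || parallelism > replicas )); then
    parallelism=$replicas
  fi
  for (( start = 1; start <= replicas; start += parallelism )); do
    end=$(( start + parallelism - 1 ))
    if (( end > replicas )); then end=$replicas; fi
    if (( start == end )); then
      echo "batch $batch: task $start"
    else
      echo "batch $batch: tasks $start-$end"
    fi
    batch=$(( batch + 1 ))
  done
  # the delay sits between batches, so there is one pause fewer than batches
  local batches=$(( batch - 1 ))
  echo "pauses: $(( batches - 1 )) x ${delay}s = $(( (batches - 1) * delay ))s"
}

rolling_plan 5 2 10
```

For 5 replicas, parallelism 2 and a 10 s delay, that gives three batches (1-2, 3-4, then 5) with two pauses, so at least 20 s of waiting on top of the time the new containers need to start.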
Rollback when an update goes wrong
If the new version has a problem, go back to the previous version with just:
docker service rollback webv
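Swarm keeps the service's previous spec, so it knows what to return to, and it rolls back the same way it updates: a few tasks at a time (tunable with the --rollback-parallelism and --rollback-delay flags). You can also have Swarm roll back automatically when an update fails by setting --update-failure-action rollback, either when creating the service or later with docker service update.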
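Automatic rollback hinges on when an update counts as failed. Below is a rough bash model of that decision; the function on_update_failures is hypothetical, and the rule it encodes is an assumption drawn from the documented flags: --update-max-failure-ratio sets the tolerated share of failed tasks (written here as a whole percentage, where 0 tolerates none) and --update-failure-action (pause, continue or rollback) says what happens once that share is exceeded. Swarm's exact accounting, such as which tasks are counted during the --update-monitor window, is not modeled.

```shell
#!/usr/bin/env bash
# Toy version of the decision made when tasks fail during an update.
# Assumption: the failed share is compared with the tolerated percentage,
# and if it is exceeded the configured failure action is applied.
# Usage: on_update_failures ACTION MAX_FAILURE_PERCENT FAILED UPDATED
on_update_failures() {
  local action=$1 max_pct=$2 failed=$3 updated=$4
  case $action in
    pause|continue|rollback) ;;
    *) echo "invalid action: $action" >&2; return 1 ;;
  esac
  # integer form of: failed / updated > max_pct / 100
  if (( failed * 100 > max_pct * updated )); then
    echo "$action"
  else
    echo "keep updating"
  fi
}

on_update_failures rollback 0 1 3    # ratio 0: a single failure is too many
on_update_failures rollback 50 1 3   # 1 of 3 failed, 50% tolerated
```

In this model, with --update-failure-action rollback and the ratio left at 0, one failed task is enough to send the service back to its previous spec; raising the ratio lets an update ride out occasional failures.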
🧹 Cleanup
docker service rm webv
Removing a service stops and deletes all its tasks/containers on every node. (We still keep the swarm for Articles 12–13; leaving the swarm is at the end of Article 13.)
Wrap-up
On Swarm, the unit of work is the service — a declaration of "I want N replicas of this image". The manager splits it into tasks spread across nodes, and continuously keeps the replica count met (self-healing). scale changes the replica count; a rolling update replaces the image batch by batch so there's no downtime; rollback returns to the previous version. It's all desired state: you declare, Swarm converges to it.
The service now runs many replicas across many nodes, but which network do they talk over, and how does a published port reach a replica that's sitting on another node? Article 12 answers that: overlay networks and the routing mesh.