Swarm: Overlay Network and Routing Mesh

Kai · 5 min read

Article 11 showed a service running many tasks spread across many nodes. That raises two questions: how do tasks on different machines talk to each other? And when you publish a port, how does a request reach a task no matter which node it lands on? This article answers both with the overlay network and the routing mesh.

Bridge isn't enough for multiple machines

Recall Article 7: a bridge network (docker0) is a single-machine internal network. Containers on two different hosts aren't on the same bridge, so they can't see each other over it. Swarm needs a kind of network that spans multiple hosts — that's the overlay.

When you run docker swarm init (Article 10), Docker creates two special networks for you:

docker network ls
NAME              DRIVER    SCOPE
docker_gwbridge   bridge    local      ← connects the overlay to each host's external network
ingress           overlay   swarm      ← the routing mesh network (covered below)

Notice ingress has the overlay driver and swarm scope (cluster-wide), unlike the bridge's local scope (single machine).

Overlay network: one flat network across multiple hosts

Per the Docker docs, an overlay network "connects multiple Docker daemons together". In use, containers on different nodes but on the same overlay see each other as if on the same LAN — even though they're really on physical machines far apart.

Underneath, the overlay uses VXLAN: a packet between two containers is "encapsulated" inside a UDP packet and sent across the physical network between the two hosts, then "decapsulated" at the destination. The container has no idea this wrapping happens — to it, the network is just flat.

   Node A                         Node B
   ┌──────────────┐               ┌──────────────┐
   │ [ctn api.1]  │  VXLAN wraps   │ [ctn api.2]  │
   │   10.0.1.3 ──┼─ over physical ┼─► 10.0.1.4   │
   │              │  network (UDP) │              │
   └──────┬───────┘               └───────┬──────┘
          └────── overlay "appoverlay" ────┘
              (same logical network 10.0.1.0/24)
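The wrapping has a cost: every encapsulated packet carries extra headers. Here is a rough sketch of that arithmetic, using the header sizes from the VXLAN specification (RFC 7348). The 1500-byte physical MTU and the use of IPv4 between hosts are assumptions for illustration, not something stated in this series.

```shell
# Bytes added around each container packet by VXLAN over IPv4.
# Assumptions: IPv4 between the hosts, no VLAN tag, 1500-byte physical MTU.
outer_ipv4=20       # outer IP header between the two hosts
outer_udp=8         # outer UDP header
vxlan_header=8      # VXLAN header (carries the 24-bit network identifier)
inner_ethernet=14   # the container's own Ethernet header, carried as payload

overhead=$((outer_ipv4 + outer_udp + vxlan_header + inner_ethernet))
physical_mtu=1500
inner_mtu=$((physical_mtu - overhead))

echo "Encapsulation overhead: ${overhead} bytes"
echo "Largest container IP packet without fragmentation: ${inner_mtu} bytes"
```

Under these assumptions the overhead comes to 50 bytes, which leaves 1450 bytes for the container's packet. If large transfers stall on an overlay while small requests work, the MTU is one thing to check.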

Create an overlay network:

docker network create -d overlay appoverlay
docker network ls --filter name=appoverlay
NAME         DRIVER    SCOPE
appoverlay   overlay   swarm

Then attach services to it:

docker service create --name api --network appoverlay --replicas 2 nginx:alpine
docker service create --name worker --network appoverlay alpine sleep 600

Now worker and api are on the same overlay, even when their tasks sit on different nodes.
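To confirm this on your own cluster, one option is to ask Docker for the network's driver and scope, and for the node each task landed on. This is a minimal sketch: the helper names network_summary and task_placement are hypothetical, and the commands are meant to be run on a manager node.

```shell
# Hypothetical helper: print "driver scope subnet(s)" for a network.
network_summary() {
  docker network inspect \
    --format '{{.Driver}} {{.Scope}}{{range .IPAM.Config}} {{.Subnet}}{{end}}' "$1"
}

# Hypothetical helper: list each running task of a service and its node.
task_placement() {
  docker service ps --filter desired-state=running \
    --format '{{.Name}} {{.Node}}' "$1"
}
```

Calling network_summary appoverlay should report the overlay driver and swarm scope, followed by the subnet Docker picked. Calling task_placement api shows which node hosts each replica.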

Service discovery: call by name, via a VIP

Like the user-defined bridge in Article 7, the overlay has internal DNS: call a service by name. But Swarm adds a layer: each service has a virtual IP (VIP). DNS resolves the service name to a VIP, and Swarm automatically load-balances that VIP across all the service's tasks.

Try it from inside worker, resolving the name api:

# (run inside one of worker's tasks)
nslookup api
Name:    api
Address: 10.0.1.2

10.0.1.2 is not the IP of any specific container — it's the VIP of the api service. When worker sends a request to api, Swarm distributes it to one of the api.1, api.2... tasks. You just call api, with no need to know how many replicas there are or where they sit.

   worker  ── calls "api" ──►  VIP 10.0.1.2 (service api)
                                  │ Swarm load-balances
                       ┌──────────┼──────────┐
                       ▼                      ▼
                  task api.1              task api.2
                  (node A)                (node B)

This is built-in service discovery + load balancing, with nothing else to set up. In code, worker just points at the host api.
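If you want to see the VIP from the outside rather than through DNS, the service's endpoint data records it. This is a small sketch, assuming the api service from above still exists; service_vips is a hypothetical helper name.

```shell
# Hypothetical helper: print the VIP(s) Swarm assigned to a service,
# one line per attached network (addresses are in CIDR form).
service_vips() {
  docker service inspect \
    --format '{{range .Endpoint.VirtualIPs}}{{println .Addr}}{{end}}' "$1"
}
```

Running service_vips api on a manager should list an address matching what nslookup api returned. Docker also registers the name tasks.api in the overlay's DNS, so nslookup tasks.api from inside worker returns the individual task IPs instead of the VIP.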

Routing mesh: publish a port on every node

A harder question: you publish a web service on port 80, but its tasks sit on only a few nodes. What if a request hits a node that has no task? The answer is the routing mesh.

When you publish a port for a service, Swarm opens that port on every node in the cluster (via the ingress network). A request to any-node:port is routed by the routing mesh to a running task of the service, even one on a different node.

docker service create --name pub -p 9091:80 nginx:alpine
curl -s -o /dev/null -w "HTTP %{http_code}\n" http://localhost:9091
HTTP 200

On a multi-node cluster, curl http://<any node>:9091 returns the web page — whether or not that node has a pub task.

   Request to ANY node:9091
        │ (every node listens on 9091 via ingress)
        ▼
   routing mesh ──► forwards to a live "pub" task
        ├──► task on node A
        └──► task on node B

The practical benefit: you can put an external load balancer in front and point it at every node on the same port, without tracking where the tasks run. Every node is a valid "entry point".
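A quick way to check the "every node is an entry point" claim is to request the published port on each node in turn. This sketch assumes curl is installed and that you pass in your own node hostnames or IPs. The helper name probe_nodes and the 5-second timeout are illustrative choices.

```shell
# Hypothetical helper: request the published port on each given node and
# print "<node> <HTTP status>". curl reports 000 when a node is unreachable.
probe_nodes() {
  port=$1
  shift
  for node in "$@"; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://${node}:${port}") || true
    echo "${node} ${code}"
  done
}
```

For example, probe_nodes 9091 followed by your node addresses prints one line per node. With the routing mesh working, nodes that run no pub task should answer with the same status as nodes that do.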

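There's another publish mode: mode=host, written in full as --publish mode=host,target=80,published=9091. It publishes the port only on the nodes that actually run a task, bypassing the routing mesh. Use it when you need exact control over which node answers, or when you want to avoid the extra hop for performance. The default is mode=ingress (the routing mesh) as above.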
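For comparison, here is a hedged sketch of a host-mode service. In host mode a node only listens if it runs a task, so this example pairs it with --mode global (one task per node). That pairing is one common choice, not the only one. The helper name and its arguments are hypothetical.

```shell
# Hypothetical helper: create a service with one task per node (--mode global)
# and publish container port 80 directly on each node's own port,
# bypassing the routing mesh.
create_host_published() {
  name=$1
  published_port=$2
  docker service create --name "$name" --mode global \
    --publish "mode=host,target=80,published=${published_port}" \
    nginx:alpine
}
```

Calling create_host_published edge 9092 would create a service named edge reachable on port 9092 of each node, served by that node's own task. Remember to remove it with docker service rm when you are done.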

What docker_gwbridge is

The docker_gwbridge network (local scope, present on every node) is a bridge that connects the overlay networks, including ingress, to that node's physical network. It is how containers on an overlay reach the outside world, such as the Internet. You rarely touch it directly, but it's worth knowing so you're not surprised to see it in docker network ls.

🧹 Cleanup

docker service rm api worker pub
docker network rm appoverlay

Remove the services first: Docker won't remove an overlay network that services are still using. Keep the swarm itself running for Article 13.

Wrap-up

The overlay network creates a flat network spanning multiple hosts (using VXLAN to wrap packets over the physical network), letting containers on different nodes talk as if on the same LAN. On it, each service has a VIP: call the service by name, and Swarm load-balances across the tasks. The routing mesh publishes a port on every node and routes requests to a live task anywhere — so any node can be the entry point.

The final article (13) ties it all together: using docker stack deploy to deploy a whole multi-service app from a compose file onto the cluster, managing secrets safely, then cleaning everything up and leaving the swarm.