Load Balancers and Reverse Proxies

A single server has limits: it can handle only so many connections, and if it dies the whole service dies. To serve many people with no single point of failure, we put up several servers and a load balancer to distribute traffic among them. This article explains load balancers, reverse proxies, and the surrounding concepts — things at the center of scalable systems.

The problem: one server isn't enough

Two problems with a single server:

Load: the traffic exceeds what one machine can handle.
Single point of failure: if that machine dies, the service goes down.

The solution: run multiple copies of the application (recall "replicas" from the Docker Swarm series) and put something in front to distribute requests among them — that's the load balancer (LB).

                          ┌─► Server A
   Client ──► Load        ├─► Server B     (multiple identical copies)
             Balancer ────┤
                          └─► Server C
        distributes requests, skips dead servers

Two benefits: load is split across machines (horizontal scaling, or "scaling out"), and if a server dies the LB stops sending traffic to it, so the service keeps running.

L4 and L7: which layer to balance at

Remember the layered model (Article 1)? A load balancer operates at one of two layers, and this is an important distinction:

L4 (transport layer, TCP/UDP): distributes based on IP and port, without looking at content. Fast, simple, fits any protocol. It just forwards the TCP connection (or UDP datagrams) to a backend.
L7 (application layer — HTTP): understands HTTP content (URL, headers, cookies), so it can route intelligently — e.g. /api/* to this group of servers, /images/* to another; or by host. More flexible but heavier on processing.

   L4 LB:  looks at  [ destination IP:port ]      → picks a backend
   L7 LB:  looks at  [ GET /api/users  Host: ... ]→ picks a backend by URL/host
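
To make the difference concrete, here is a minimal sketch of the two decisions in Python. The pool names, addresses, and the host static.example.com are made up for illustration, and a real L4 balancer works on connections in the kernel or in hardware rather than in a Python function; the point is only what information each layer has available.

```python
import zlib

# Hypothetical backend pools (illustrative names, not from any real setup).
API_POOL = ["api-1:8080", "api-2:8080"]
IMAGE_POOL = ["img-1:8080"]
DEFAULT_POOL = ["web-1:8080", "web-2:8080"]


def pick_l4(client_ip, client_port, backends):
    """L4 view: only addresses and ports are visible, never the payload."""
    key = f"{client_ip}:{client_port}".encode()
    return backends[zlib.crc32(key) % len(backends)]


def pick_pool_l7(host, path):
    """L7 view: the HTTP request has been parsed, so we can route on it."""
    if host == "static.example.com" or path.startswith("/images/"):
        return IMAGE_POOL
    if path.startswith("/api/"):
        return API_POOL
    return DEFAULT_POOL
```

With this sketch, pick_pool_l7("example.com", "/api/users") returns the API pool, while pick_l4 can only choose a backend from the connection's address and port, whatever the request inside it says.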

A real-world example (AWS series): AWS's NLB (Network Load Balancer) is L4; ALB (Application Load Balancer) is L7. Saying "a layer-7 load balancer" is just the OSI naming from Article 1.

Load-balancing algorithms

The LB uses an algorithm to pick which backend gets each request:

   Round-robin       in turn A, B, C, A, B, C...  (common default)
   Least connections sends to the server with the fewest open connections
   IP hash           the same client IP always hits the same server (sticky by IP)
   Weighted          a stronger server takes a larger share

Round-robin is simple and enough for most cases. Least-connections is good when requests vary in cost (some finish quickly, others hold a connection open for a long time).
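
The four algorithms in the table can be sketched in a few lines of Python each. This is a simplified illustration, not how any particular load balancer implements them: the function names are my own, the weighted version simply repeats a backend in the rotation, and the IP hash uses a plain CRC32 modulo the pool size.

```python
import itertools
import zlib


def round_robin(backends):
    """Return an endless iterator: A, B, C, A, B, C, ..."""
    return itertools.cycle(backends)


def least_connections(open_conns):
    """Pick the backend with the fewest open connections.

    open_conns maps backend name -> current number of open connections.
    """
    return min(open_conns, key=open_conns.get)


def ip_hash(client_ip, backends):
    """Map a client IP to a backend deterministically (sticky by IP)."""
    return backends[zlib.crc32(client_ip.encode()) % len(backends)]


def weighted_round_robin(weights):
    """Naive weighted rotation: weight 3 means 3 slots per cycle."""
    slots = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(slots)


if __name__ == "__main__":
    rr = round_robin(["A", "B", "C"])
    print([next(rr) for _ in range(6)])  # ['A', 'B', 'C', 'A', 'B', 'C']
    print(least_connections({"A": 12, "B": 3, "C": 7}))  # B
```

One thing the sketch makes visible: the modulo-based IP hash stays sticky only while the list of backends stays the same. Add or remove a server and most clients get mapped to a different one.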

Health checks: skipping dead servers

How does the LB know a server has died so it can stop sending to it? With health checks: the LB periodically "probes" each backend (e.g. calling GET /health; recall the /health endpoint we put in the app in the Docker/AWS series). If the backend returns 200, it is considered healthy and keeps receiving requests. If it returns an error or doesn't answer, it is temporarily taken out of rotation. When it recovers, the LB puts it back.

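This is the mechanism behind high availability: a failed server is skipped automatically, and most users see little or no disruption (requests already in flight on the failed server, or sent before the next probe notices the failure, can still fail). Swarm/Kubernetes also use health checks to replace dead copies automatically (recall the Docker series).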
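
The bookkeeping behind a health check can be sketched as a small state machine. The thresholds below (3 failures in a row to remove, 2 successes in a row to restore) are assumptions chosen for illustration; real load balancers make these configurable, and the probing itself (sending GET /health on a timer) is left out so the sketch stays self-contained.

```python
from dataclasses import dataclass

FAIL_THRESHOLD = 3  # assumed: remove after 3 failed probes in a row
PASS_THRESHOLD = 2  # assumed: restore after 2 successful probes in a row


@dataclass
class Backend:
    name: str
    healthy: bool = True
    fails: int = 0   # consecutive failed probes
    passes: int = 0  # consecutive successful probes


def record_probe(backend, status):
    """Update one backend from one probe result.

    status is the HTTP status code returned by GET /health,
    or None if the probe timed out or the connection failed.
    """
    if status == 200:
        backend.fails = 0
        backend.passes += 1
        if not backend.healthy and backend.passes >= PASS_THRESHOLD:
            backend.healthy = True
    else:
        backend.passes = 0
        backend.fails += 1
        if backend.healthy and backend.fails >= FAIL_THRESHOLD:
            backend.healthy = False


def eligible(backends):
    """Backends the balancer may send traffic to right now."""
    return [b for b in backends if b.healthy]
```

Requiring several consecutive failures before removing a server is a common design choice: a single slow response shouldn't knock a healthy backend out of rotation.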

Reverse proxies: a close cousin

A reverse proxy is a server that sits in front of the real servers, receiving requests on their behalf and forwarding them inward. An L7 load balancer is in fact a kind of reverse proxy. It helps to distinguish the two kinds of "proxy":

A forward proxy sits in front of the client (representing the client going out to the Internet — e.g. a company proxy).
A reverse proxy sits in front of the server (representing the server, receiving requests from outside).

   Forward:  Client ──► [Forward proxy] ──► Internet
   Reverse:  Internet ──► [Reverse proxy] ──► Backend server

A reverse proxy (like nginx, HAProxy, Caddy, Traefik) does many useful things beyond load balancing:

TLS termination: decrypt HTTPS at the proxy (recall Article 9), speak plain HTTP to the internal backend — the backend doesn't deal with certificates.
Routing: by path/host to different services.
Caching static content, compression, rate limiting — this blog's own nginx does rate limiting on /api/auth, which we hit during deploys.
Hiding the internal architecture (the client only sees the proxy).

This is why nearly every real web system has a reverse proxy such as nginx out front.
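
Of the jobs listed above, rate limiting is the easiest to show in a few lines. The sketch below uses a token bucket per client IP, applied only to paths under /api/auth. It is an illustration of the idea, not nginx's implementation and not this blog's actual configuration: the rate of 1 request per second and the burst of 5 are made-up numbers, and the names TokenBucket and allow_request are my own.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = now - self.updated
        self.updated = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # one bucket per client IP


def allow_request(client_ip, path, now=None):
    """Return True if the proxy should forward this request."""
    if not path.startswith("/api/auth"):
        return True  # only the auth endpoints are limited in this sketch
    if client_ip not in buckets:
        # Hypothetical limits: 1 request/second, burst of 5.
        buckets[client_ip] = TokenBucket(rate=1, capacity=5, now=now)
    return buckets[client_ip].allow(now)
```

The optional now argument exists only so the behavior can be checked without sleeping; in normal use it is omitted and the monotonic clock is used. A proxy would typically answer a rejected request with an error status instead of forwarding it.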

Sticky sessions: when you need to "stick" to a server

Because the LB spreads requests across servers, two requests from the same person may land on two different servers. If session state lives inside the server (in-memory), this causes "logged in one moment, not the next" bugs. Two ways to handle it:

Sticky sessions (session affinity): the LB pins a client to the same server (via a cookie or IP hash). Simple, but load can become uneven, and if that server dies its sessions are lost.
Externalize state (recommended): store sessions in a shared place (Redis, a database) any server can read. The servers are then stateless and scale freely — this is the modern architecture to aim for.
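
The difference between the two approaches shows up clearly in a toy model. In the sketch below a plain Python dict stands in for Redis or a database, and the class names are invented for illustration; a real application would use its framework's session backend.

```python
class StatefulServer:
    """Keeps sessions in its own memory: other servers cannot see them."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)


class StatelessServer:
    """Keeps no session data itself; reads and writes a shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # stand-in for Redis or a database

    def login(self, session_id, user):
        self.store[session_id] = user

    def whoami(self, session_id):
        return self.store.get(session_id)


# In-memory sessions: the login lands on A, the next request lands on B.
a, b = StatefulServer("A"), StatefulServer("B")
a.login("sess-1", "alice")
print(b.whoami("sess-1"))  # None: B never saw the login

# Shared store: any server can answer.
shared = {}
c, d = StatelessServer("C", shared), StatelessServer("D", shared)
c.login("sess-1", "alice")
print(d.whoami("sess-1"))  # alice
```

The first half reproduces the "logged in one moment, not the next" bug; the second half shows why it disappears once the state lives outside the servers.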

Wrap-up

A load balancer distributes traffic across multiple server copies to split load and avoid a single point of failure. It works at L4 (by IP/port, fast) or L7 (understands HTTP, so it can route intelligently), picks a backend with an algorithm (round-robin, least connections, IP hash, weighted), and uses health checks to skip dead servers (the foundation of high availability). A reverse proxy such as nginx sits in front of servers and handles TLS termination, routing, caching, and rate limiting. To scale well, keep servers stateless (push sessions to Redis or a database) rather than relying on sticky sessions.

You now have the full picture from packet to scalable system. Article 12 gathers the diagnostic tools into a practical network-troubleshooting workflow.