PKI and TLS: Why a Cluster Needs So Many Certificates

At the end of Article 1 we left a question open: every arrow between two components in the architecture diagram implicitly asks two things. Is the one on the other end really who it says it is, and is it allowed to do this? The answer lies in a PKI (Public Key Infrastructure) that we have to build ourselves. This article is theory, but it's theory you'll type out as real commands in Article 4, so getting it straight now will save a lot of confusion later.

One-way TLS and two-way mTLS

When we think of TLS we usually picture web HTTPS: the browser checks the server's certificate to be sure it's talking to the real my-bank.com. That's one-way TLS — only the client checks the server. The server doesn't care who the client is, because anyone can come look at the website.

In Kubernetes the reverse direction matters more. When the kubelet calls the api-server, the api-server must know exactly who the caller is, because that identity decides what the caller is allowed to do. So Kubernetes uses mTLS (mutual TLS), i.e. two-way TLS: both sides present a certificate and each verifies the other's.

   one-way TLS (ordinary web)        mTLS (in Kubernetes)
   ────────────────────────         ───────────────────────
   client ──► checks ──► server     client ◄──► both check ◄──► server
   "is the server correct?"         "is the server correct?" AND
   who is the client? — whatever    "WHO is the client?" — most important

One point to remember for the whole article: in Kubernetes a certificate isn't just for encryption, it's also an ID card. The api-server has no table of users with passwords; instead, when an mTLS connection succeeds, it reads the identity straight out of the caller's certificate. Later we'll see where that identity sits in the cert.
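
To make the one-way versus two-way difference concrete, here is a minimal sketch using Python's standard ssl module. It only illustrates the concept and is not how the api-server (written in Go) is actually configured; the function names and the file paths in the comments are hypothetical.

```python
import ssl


def make_server_context(require_client_cert: bool) -> ssl.SSLContext:
    """Build a server-side TLS context.

    require_client_cert=False -> one-way TLS (ordinary web server)
    require_client_cert=True  -> mTLS: the handshake fails unless the
                                 client presents a cert the server trusts
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED if require_client_cert else ssl.CERT_NONE
    # A real server would also load its own identity and the CA it uses
    # to check clients (hypothetical paths):
    #   ctx.load_cert_chain("server.crt", "server.key")
    #   ctx.load_verify_locations("ca.crt")
    return ctx


def make_client_context() -> ssl.SSLContext:
    """Build a client-side context; it always verifies the server."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # For mTLS the client would additionally present its own cert:
    #   ctx.load_cert_chain("client.crt", "client.key")
    return ctx


web = make_server_context(require_client_cert=False)
mutual = make_server_context(require_client_cert=True)
client = make_client_context()
print(web.verify_mode, mutual.verify_mode, client.verify_mode)
```

The only switch that separates the two modes on the server side is whether a client certificate is required; everything else about the connection stays the same.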

What it means for a CA to sign a certificate

A Certificate Authority (CA) is a key pair (private key plus certificate) playing the role of a notary. The process:

   1. Each component (e.g. kubelet) generates its own key pair: private key + public key.
   2. It creates a certificate signing request (CSR) containing the public key + the identity it claims (CN, O...).
   3. The CA SIGNS the request with the CA's private key, producing a certificate.
   4. That certificate says: "the CA vouches that this public key belongs to that identity."

Trust works like this: if two sides both trust one CA, then when A presents a certificate signed by that CA, B only needs to check that the signature really is the CA's and, during the TLS handshake, that A holds the private key matching the certificate. If both hold, B trusts the identity in the cert without needing to know A in advance. The CA is the root of trust: whoever holds the CA's private key can sign forged certs for any identity, so the CA's private key is the single most important thing to protect.
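
The sign-then-verify idea can be sketched with a toy example. The code below is NOT real X.509 and NOT secure: it uses textbook RSA without padding on a deliberately small key, and a "certificate" is just a dictionary. All names (ca_sign, verify, the sample identity strings) are made up for illustration; real certificates are created with tools like openssl or cfssl.

```python
import hashlib

# Toy RSA key for the "CA". The primes are two well-known Mersenne primes;
# a key this small, used without padding, is NOT secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus (part of the CA cert)
E = 65537                           # public exponent (part of the CA cert)
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent: the CA's secret


def digest(identity: str, public_key: str) -> int:
    """Hash the claimed identity together with the subject's public key."""
    data = f"{identity}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def ca_sign(identity: str, public_key: str) -> dict:
    """Step 3: the CA signs with its PRIVATE key, producing a 'certificate'."""
    signature = pow(digest(identity, public_key), D, N)
    return {"identity": identity, "public_key": public_key, "signature": signature}


def verify(cert: dict) -> bool:
    """Anyone holding the CA's PUBLIC values (N, E) can check the signature."""
    expected = digest(cert["identity"], cert["public_key"])
    return pow(cert["signature"], E, N) == expected


cert = ca_sign("CN=system:node:worker-0,O=system:nodes", "kubelet-public-key")
# An attacker edits the identity but cannot recompute the signature without D.
forged = dict(cert, identity="CN=admin,O=system:masters")
print(verify(cert), verify(forged))
```

Verification needs only the CA's public values, which is why every component can carry a copy of the CA certificate while the CA's private key stays in one well-guarded place.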

Why three CAs, not one

In theory, a single CA signing everything would still work. But the usual Kubernetes layout, which this series follows, splits it into three independent CAs to isolate the scope of trust:

   ┌─ Kubernetes CA ──────────┐   signs everything around the api-server:
   │ "kubernetes-ca"          │   apiserver, kubelet, controller-mgr,
   └──────────────────────────┘   scheduler, kube-proxy, admin...

   ┌─ etcd CA ────────────────┐   signs SEPARATELY for the etcd world:
   │ "etcd-ca"                │   etcd server, etcd peer, and the etcd
   └──────────────────────────┘   client (which is the api-server)

   ┌─ front-proxy CA ─────────┐   signs for the aggregation layer
   │ "front-proxy-ca"         │   (extension API server)
   └──────────────────────────┘

The etcd CA is split off because etcd holds all cluster state, including Secrets; whoever can talk to etcd can reach everything. A separate CA means that even if Kubernetes' main CA leaks, an attacker still can't sign a cert to access etcd directly. The front-proxy CA is split off for the same reason: it serves a narrow mechanism (aggregation, when you extend the API), so it shouldn't share a root of trust with the rest.

The CN and O fields: the identity the cluster reads

An X.509 certificate has two fields in its Subject that Kubernetes uses as identity:

CN (Common Name) — the proper name.
O (Organization) — the organization, and there can be several O's.

Kubernetes maps these two fields to RBAC identity directly:

   CN in the certificate   ──►  the USER name the api-server sees
   O  in the certificate   ──►  the GROUP that user belongs to

There's no user table at all. When you sign a certificate with CN=jane, O=developers, then anyone holding that cert, when calling the api-server, is the user jane in the group developers, and RBAC authorizes based on exactly those two things. Issuing a cert means issuing an identity.
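
The mapping is simple enough to sketch in a few lines. The example below assumes the subject is given in the nested-tuple shape that Python's ssl.SSLSocket.getpeercert() returns; the function name identity_from_subject is hypothetical, and the real api-server does this in Go.

```python
def identity_from_subject(subject):
    """Map a certificate Subject to (user, groups).

    `subject` is a tuple of RDNs, each a tuple of (field, value) pairs,
    as found under the "subject" key of ssl.SSLSocket.getpeercert().
    CN becomes the user name; every O becomes a group.
    """
    user = None
    groups = []
    for rdn in subject:
        for field, value in rdn:
            if field == "commonName":
                user = value
            elif field == "organizationName":
                groups.append(value)
    return user, groups


# A cert signed with CN=jane and two O values (sample data).
subject = (
    (("organizationName", "developers"),),
    (("organizationName", "qa"),),
    (("commonName", "jane"),),
)
print(identity_from_subject(subject))
```

Nothing in this lookup consults a database: the identity is whatever the CA agreed to sign.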

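A consequence is a set of special CN/O values that Kubernetes understands out of the box. We have to set them character for character when creating certs in Article 4: a typo still yields a valid certificate, but for an identity the cluster has no built-in permissions for, so the component's requests are denied.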

   Component            CN (user)                        O (group)        Meaning
   ──────────────────   ──────────────────────────────   ──────────────   ───────
   admin (you)          admin                            system:masters   super-power group, bypasses all RBAC checks
   kubelet per node     system:node:<node-name>          system:nodes     recognized by the dedicated Node authorizer
   controller-manager   system:kube-controller-manager   (none)           identity with built-in RBAC
   scheduler            system:kube-scheduler            (none)           identity with built-in RBAC
   kube-proxy           system:kube-proxy                (none)           identity with built-in RBAC

The first two rows are worth noting. O=system:masters lets the holder do anything through the api-server, bypassing RBAC entirely, so the admin cert must be carefully protected. And CN=system:node:worker-0, O=system:nodes for the kubelet isn't arbitrary: this exact user/group pattern is what the Node authorizer (enabled on the api-server with --authorization-mode=Node,RBAC) recognizes as a node. It then limits each kubelet to the pods bound to its own node and the secrets those pods reference, so a compromised worker can't grab another node's secrets. The name after the prefix must match the node name, which is why in Article 4 we sign a separate kubelet cert for each worker.
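
As a rough sketch of the node rule, the check below shows why both the CN prefix and the O group matter. It is a drastic simplification: the real Node authorizer tracks which pods, secrets and other objects relate to each node, whereas here every resource simply carries the name of the node it belongs to. The function names are hypothetical.

```python
NODE_USER_PREFIX = "system:node:"
NODES_GROUP = "system:nodes"


def node_name(user: str, groups: list):
    """Return the node name if the identity qualifies as a node, else None."""
    if NODES_GROUP not in groups:
        return None                      # right CN but wrong O: not a node
    if not user.startswith(NODE_USER_PREFIX):
        return None                      # right O but wrong CN: not a node
    name = user[len(NODE_USER_PREFIX):]
    return name or None                  # an empty name does not count


def may_touch(user: str, groups: list, resource_node: str) -> bool:
    """Simplified rule: a kubelet may only touch resources of its own node."""
    name = node_name(user, groups)
    return name is not None and name == resource_node


kubelet = ("system:node:worker-0", ["system:nodes"])
print(may_touch(*kubelet, "worker-0"))   # its own node
print(may_touch(*kubelet, "worker-1"))   # somebody else's node
```

Because the node name is baked into the certificate, one shared kubelet cert for all workers would defeat the per-node isolation.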

The full picture: who presents which cert to whom

Pulled together, here is every certificate we'll create, arranged by the conversation it serves:

   ┌──────────────── signed by Kubernetes CA ─────────────┐
   │                                                       │
   │  api-server  ──server cert "kube-apiserver"──►  (every client checks)
   │  api-server  ──client cert──► kubelet  (to call down to kubelet: logs, exec)
   │  kubelet     ──client cert "system:node:X" / O=system:nodes──► api-server
   │  controller-mgr ─client "system:kube-controller-manager"──► api-server
   │  scheduler   ──client "system:kube-scheduler"──► api-server
   │  kube-proxy  ──client "system:kube-proxy"──► api-server
   │  admin (you) ──client "admin" / O=system:masters──► api-server
   └───────────────────────────────────────────────────────┘

   ┌──────────────── signed by etcd CA ───────────────────┐
   │  etcd        ──server cert──►  (api-server checks)    │
   │  etcd ◄─peer cert─► etcd   (3 nodes talk to each other)│
   │  api-server  ──client cert──► etcd  (to read/write state)│
   └───────────────────────────────────────────────────────┘

   ┌──────────────── signed by front-proxy CA ────────────┐
   │  aggregation layer (used later, when extending the API)│
   └───────────────────────────────────────────────────────┘

A rough count shows the number isn't small: each arrow is a key pair plus a cert to create, and with 3 controllers and 2 workers, several of them multiply by the number of machines. kubeadm generates all of this behind a single command and keeps it out of sight; we'll create each one by hand in Article 4 to see every identity clearly.

SAN: why a server cert must list all its addresses

A technical detail that often causes baffling errors: with a server certificate (like the api-server's or etcd's), the client doesn't check the CN, it checks the SAN field (Subject Alternative Name). The SAN lists every name and IP this server can be reached at. If the client connects via an address not in the SAN, the TLS handshake fails.

For the api-server, the SAN must include: the IP of each controller, the load balancer address (because clients reach it through there), internal DNS names like kubernetes.default.svc.cluster.local, and the first IP of the Service range (because pods in the cluster reach the api-server through a virtual Service named kubernetes, which takes that IP as its ClusterIP). Miss one entry and later you hit x509: certificate is valid for ... not ..., one of the most time-consuming errors when building by hand. We'll list them all in Article 4 to avoid it.

   api-server server cert, SAN must include:
     ├── 127.0.0.1, IP of each controller (10.x.x.x ×3)
     ├── load balancer IP
     ├── kubernetes, kubernetes.default,
     │   kubernetes.default.svc, ...svc.cluster.local
     └── first ClusterIP of the Service range (e.g. 10.32.0.1)
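
A small sketch of how such a SAN list could be assembled, including deriving the first IP of the Service range with Python's ipaddress module. Every address and the Service CIDR below are assumed sample values, not the ones this series will use, and the helper names are hypothetical. The membership check is a plain string comparison, simpler than real TLS name matching.

```python
import ipaddress


def apiserver_sans(controller_ips, load_balancer_ip, service_cidr,
                   cluster_domain="cluster.local"):
    """Collect every IP and DNS name the api-server can be reached at."""
    # First usable address of the Service range, e.g. 10.32.0.0/24 -> 10.32.0.1
    first_service_ip = str(ipaddress.ip_network(service_cidr)[1])
    ips = ["127.0.0.1", *controller_ips, load_balancer_ip, first_service_ip]
    dns = [
        "kubernetes",
        "kubernetes.default",
        "kubernetes.default.svc",
        f"kubernetes.default.svc.{cluster_domain}",
    ]
    return {"ip": ips, "dns": dns}


def covered(sans, address) -> bool:
    """Would a client connecting to `address` find it in the SAN list?"""
    return address in sans["ip"] or address in sans["dns"]


sans = apiserver_sans(
    controller_ips=["10.240.0.10", "10.240.0.11", "10.240.0.12"],  # sample
    load_balancer_ip="203.0.113.10",                               # sample
    service_cidr="10.32.0.0/24",                                   # sample
)
print(sans["ip"])
print(covered(sans, "10.32.0.1"), covered(sans, "10.240.0.99"))
```

Running a check like covered() against every address a client might use is a cheap way to catch a missing entry before the certificate is signed.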

One exception: the ServiceAccount key pair is not a certificate

Finally, one thing that's easy to confuse in Article 4. To issue tokens for ServiceAccounts (the identity a pod uses to call the api-server), Kubernetes needs a key pair to sign and verify tokens, called sa.key (private) and sa.pub (public):

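   sa.key (private)  ──► used to SIGN ServiceAccount tokens: by the api-server for the
                         short-lived tokens mounted into pods, and by the
                         controller-manager for legacy Secret-based tokens
   sa.pub (public)   ──► api-server uses it to VERIFY those tokens when a pod calls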

The difference is that this is a bare key pair: signed by no CA, with no CN/O, and not X.509. It's just an asymmetric key pair for signing and verifying signatures on JWT tokens. The reason it exists separately: ServiceAccount tokens are created inside the cluster in large numbers with short lifetimes, so a lightweight token-signing mechanism makes more sense than issuing an X.509 certificate for each one. We still create it in Article 4 alongside the certs; just remember it's a different kind of object.
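
To see how different a token is from a certificate, the sketch below builds and decodes the three base64url segments of a JWT. It only shows the shape: the signature segment is a placeholder, not a real signature made with sa.key, and the header, issuer and account name are sample values. The identity sits in the sub claim, in the form system:serviceaccount:<namespace>:<name>, rather than in CN/O.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are written."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    """Decode a JWT segment, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def read_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    header, payload, signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Sample token: header.payload.signature
header = {"alg": "RS256"}                       # assumes an RSA sa.key
claims = {
    "iss": "https://kubernetes.default.svc.cluster.local",  # sample issuer
    "sub": "system:serviceaccount:default:my-app",          # sample account
}
token = ".".join([
    b64url(json.dumps(header).encode()),
    b64url(json.dumps(claims).encode()),
    b64url(b"placeholder-not-a-real-signature"),
])
print(read_claims(token)["sub"])
```

In a real cluster the third segment is what sa.key produces and what sa.pub checks; the first two segments are readable by anyone, which is why the signature is the part that matters.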

Wrap-up

Kubernetes uses mTLS (two-way TLS); a certificate both secures the connection and serves as an ID card.
Three CAs (Kubernetes, etcd, front-proxy) isolate the scope of trust, above all to lock down the door into etcd.
CN maps to user, O maps to group; some CN/O's are special (system:masters, system:node:<node> with system:nodes...) and must be set character for character.
A server cert needs a SAN listing every address the server can be reached at, including the load balancer and the ClusterIP of the kubernetes Service.
The ServiceAccount key pair (sa.key/sa.pub) is the exception, not a certificate.

That's the theoretical foundation. From the next article we leave theory and start getting our hands on the infrastructure: Article 3 stands up six EC2 machines, prepares the OS (hostname, kernel modules, sysctl, disable swap) and installs the tools we'll need, so that in Article 4 we can sit down and sign the certificates just listed above.