Backing Up etcd and Rotating Certificates

Kai · 5 min read

Part XII extended the cluster. Part XIII turns to keeping it alive: backups, upgrades, garbage collection and observability. We start with the one component whose loss takes everything else with it: etcd. Every object in the cluster (Pods, Secrets, RBAC rules, even the custom resources from Article 57) lives in etcd; if all three etcd members are lost and there is no backup, the only option is to rebuild the cluster from scratch. This article takes a snapshot, verifies it, restores it into a scratch directory to prove it is usable, and then looks at certificate expiry.

The cluster's etcd

The cluster runs etcd 3.6 stacked on the three controllers (Article 6), i.e. co-located with the control plane. Check the health of all three endpoints before backing up:

ssh controller-0
E="sudo etcdctl --cacert=/etc/etcd/etcd-ca.pem --cert=/etc/etcd/etcd.pem --key=/etc/etcd/etcd-key.pem"
$E --endpoints=https://10.0.1.11:2379,https://10.0.1.12:2379,https://10.0.1.13:2379 endpoint health -w table
+------------------------+--------+-------------+-------+
|        ENDPOINT        | HEALTH |    TOOK     | ERROR |
+------------------------+--------+-------------+-------+
| https://10.0.1.11:2379 |   true |  14.23545ms |       |
| https://10.0.1.13:2379 |   true | 15.991822ms |       |
| https://10.0.1.12:2379 |   true | 14.031481ms |       |
+------------------------+--------+-------------+-------+

All three members are healthy. A three-member cluster keeps running after losing one member, because a write only needs a quorum of 2 out of 3. But HA is not a backup: delete an object by mistake and the deletion is replicated to all three members. Backup solves a different problem.
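The 2-of-3 rule generalizes: a write commits once a majority of the n members (floor(n/2) + 1) has it, so the cluster survives the loss of n minus that majority. A small shell sketch of the arithmetic; quorum and fault_tolerance are illustrative helper names:

```shell
#!/usr/bin/env bash
# Quorum arithmetic for an etcd cluster of n voting members.

# quorum: how many members must agree before a write commits (a strict majority).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance: how many members can fail while a quorum still exists.
fault_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerates=%d\n' "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

A fourth member raises the quorum to 3 without tolerating any extra failure, which is why etcd clusters are normally sized at 3 or 5 members.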

Taking a snapshot

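etcdctl snapshot save captures the entire keyspace at a point in time into a single file. It must be pointed at exactly one endpoint; without --endpoints, etcdctl talks to the local member at 127.0.0.1:2379: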

$E snapshot save /tmp/etcd-backup.db
Snapshot saved at /tmp/etcd-backup.db
Server version 3.6.0

The file is 18–19 MB. A snapshot nobody has verified cannot be trusted, so look at what is in it. In etcd 3.6, snapshot status and snapshot restore are no longer etcdctl subcommands; they live in the separate etcdutl tool, which operates directly on the file and needs no running etcd:

sudo etcdutl snapshot status /tmp/etcd-backup.db -w table
+----------+----------+------------+------------+---------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE | VERSION |
+----------+----------+------------+------------+---------+
| 2bd8e3d9 |   115022 |        569 |      19 MB |   3.6.0 |
+----------+----------+------------+------------+---------+

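TOTAL KEYS (569) is the number of keys in the keyspace at snapshot time; Kubernetes stores each object under its own key, so this is roughly the number of objects in the cluster. REVISION is etcd's logical clock, incremented on every write. HASH is a checksum of the database contents, which helps detect a corrupt file.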
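etcdutl's HASH covers the database contents, but a snapshot is only useful once it has been copied somewhere other than the controllers, since a backup kept only on etcd's own disks is lost along with them. One common extra safeguard, sketched here assuming GNU coreutils, is a SHA-256 file kept next to each copy so the file can be checked after every transfer. record_checksum and check_checksum are hypothetical helper names, not etcd tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: store a SHA-256 next to a snapshot and check it later,
# e.g. after copying the pair off the controller. Requires GNU sha256sum.

# Writes <snapshot>.sha256 containing the checksum and the bare file name,
# so the two files can be moved together into any directory.
record_checksum() {
  local snap="$1"
  ( cd "$(dirname -- "$snap")" &&
    sha256sum -- "$(basename -- "$snap")" > "$(basename -- "$snap").sha256" )
}

# Returns 0 if the snapshot still matches its recorded checksum, non-zero otherwise.
check_checksum() {
  local snap="$1"
  ( cd "$(dirname -- "$snap")" &&
    sha256sum --quiet --check -- "$(basename -- "$snap").sha256" )
}
```

For example, record_checksum /tmp/etcd-backup.db writes /tmp/etcd-backup.db.sha256; after copying both files elsewhere, check_checksum on the copy fails if a single byte changed in transit.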

Restore-testing without touching the cluster

The easily missed point: a snapshot is only trustworthy if it restores. Verify it by restoring into a fresh data directory. The operation is entirely local and does not touch the running etcd:

sudo etcdutl snapshot restore /tmp/etcd-backup.db --data-dir=/tmp/etcd-restore
... restored snapshot ... data-dir: /tmp/etcd-restore ...
sudo ls /tmp/etcd-restore/member
snap  wal

The restore produces a valid data-dir with member/snap and member/wal, exactly the structure an etcd member needs to start. This is the safe way to run a disaster-recovery drill: take snapshots on a schedule, and periodically restore one elsewhere to prove the backup is actually usable, rather than discovering the file is corrupt only when the house is already on fire.
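Scheduled snapshots and periodic restore tests can be wrapped in a few shell functions. This is a minimal sketch, assuming etcdutl is on PATH and that snapshots are named etcd-<UTC timestamp>.db so that name order equals time order; snapshot_path, restore_drill and prune_snapshots are hypothetical helpers. Like the manual test, the drill restores into a throwaway directory and never touches the live etcd:

```shell
#!/usr/bin/env bash
# Sketch of a periodic restore drill plus snapshot retention.
# Assumptions: etcdutl is on PATH; snapshots are named etcd-<UTC timestamp>.db.

# Builds a timestamped snapshot path under $1, e.g. <dir>/etcd-20260601T020000Z.db.
snapshot_path() {
  printf '%s/etcd-%s.db\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# True if the directory has the layout an etcd member needs to start.
looks_like_data_dir() {
  [ -d "$1/member/snap" ] && [ -d "$1/member/wal" ]
}

# Restores a snapshot into a scratch directory, checks the layout, cleans up.
restore_drill() {
  local snap="$1" scratch rc=1
  scratch="$(mktemp -d)"
  # Restore into a path that does not exist yet, under the scratch directory.
  if etcdutl snapshot restore "$snap" --data-dir="$scratch/data" >/dev/null 2>&1 &&
     looks_like_data_dir "$scratch/data"; then
    echo "restore drill OK: $snap"
    rc=0
  else
    echo "restore drill FAILED: $snap" >&2
  fi
  rm -rf "$scratch"
  return "$rc"
}

# Deletes all but the newest $2 snapshots in directory $1.
prune_snapshots() {
  local dir="$1" keep="$2" i
  local files=("$dir"/etcd-*.db)
  # With no match, bash leaves the literal pattern in the array.
  [ -e "${files[0]}" ] || return 0
  # Globs expand in sorted order, so the oldest timestamps come first.
  for (( i = 0; i < ${#files[@]} - keep; i++ )); do
    rm -f -- "${files[i]}"
  done
}
```

A nightly job might save with $E snapshot save "$(snapshot_path /var/backups/etcd)" (the directory is a hypothetical choice), run restore_drill on the new file, then call prune_snapshots /var/backups/etcd 7 to keep a week of snapshots.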

The real restore procedure

When you have to restore for real (all three members lost, or a rollback to an earlier point in time), the procedure on a stacked HA cluster is:

1. Stop kube-apiserver on all 3 controllers, so nothing writes to etcd during the restore
2. Stop etcd on all 3 members: systemctl stop etcd
3. On EACH member, restore the same snapshot file into a new data-dir, passing that
   member's own --name, --initial-cluster and --initial-advertise-peer-urls
4. Replace the old data-dir with the freshly restored one
5. Start etcd again on all 3, then kube-apiserver
This is the subtle part of an HA restore: each member must be restored with its own correct identity (--name, --initial-cluster). Restore deliberately overwrites the member ID and cluster ID recorded in the snapshot and builds a fresh cluster identity from these flags, so if the three members are restored inconsistently they will not form a single cluster. This article does not run that destructive procedure on the live cluster; the verification here is that the snapshot is valid and restores into a data-dir with the correct structure.
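The five steps can be sketched as shell functions run from an admin host. This rests on several assumptions to check against the real cluster: ssh access to controller-0..2, systemd units named etcd and kube-apiserver, each member's etcd --name equal to its hostname, and peers on HTTPS port 2380. restore_member covers step 3 and runs on each member; step 4 (swapping the new data-dir in) is left manual. Restored etcd members are started at the same time because a lone member cannot reach quorum, and if the etcd unit waits for readiness (Type=notify, also an assumption), starting the members one by one could block. All function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical member table: etcd --name=IP. Verify against the real etcd config.
MEMBERS=(controller-0=10.0.1.11 controller-1=10.0.1.12 controller-2=10.0.1.13)
PEER_PORT=2380   # assumption: default etcd peer port

# Builds the --initial-cluster value; every member must use this same string.
initial_cluster() {
  local out="" m
  for m in "${MEMBERS[@]}"; do
    out+="${out:+,}${m%%=*}=https://${m#*=}:${PEER_PORT}"
  done
  printf '%s\n' "$out"
}

# Step 3, run ON the member itself: restore with that member's own identity.
# Usage: restore_member <name> <snapshot> <new-data-dir>
restore_member() {
  local name="$1" snap="$2" datadir="$3" ip="" m
  for m in "${MEMBERS[@]}"; do
    if [ "${m%%=*}" = "$name" ]; then ip="${m#*=}"; fi
  done
  if [ -z "$ip" ]; then
    echo "unknown member: $name" >&2
    return 1
  fi
  etcdutl snapshot restore "$snap" \
    --name="$name" \
    --initial-cluster="$(initial_cluster)" \
    --initial-advertise-peer-urls="https://${ip}:${PEER_PORT}" \
    --data-dir="$datadir"
}

# Runs a command on each controller in turn, stopping at the first failure.
on_each() {
  local m
  for m in "${MEMBERS[@]}"; do
    ssh "${m%%=*}" "$@" || return 1
  done
}

# Runs a command on all controllers at once and waits for every one.
on_all_at_once() {
  local m pid rc=0 pids=()
  for m in "${MEMBERS[@]}"; do
    ssh "${m%%=*}" "$@" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}

# Steps 1-2: stop every apiserver first so nothing writes to etcd, then etcd.
stop_control_plane() {
  on_each sudo systemctl stop kube-apiserver || return 1
  on_each sudo systemctl stop etcd
}

# Step 5: start the restored etcd members together, then the apiservers.
start_control_plane() {
  on_all_at_once sudo systemctl start etcd || return 1
  on_each sudo systemctl start kube-apiserver
}
```

The design choice is that the member table is the single source for both --initial-cluster and each member's peer URL, so the three restores cannot drift apart.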

Certificate expiry

The second operational concern stays invisible until the day it takes the cluster down: certificate expiry. Check the expiry dates of the certificate set built in Article 4, from the directory that holds the .pem files:

for c in ca kube-apiserver etcd admin front-proxy-ca; do
  printf "%-16s %s\n" "$c" "$(openssl x509 -in "$c.pem" -noout -enddate | cut -d= -f2)"
done
ca               May 22 13:33:00 2031 GMT      # CA: 5 years
kube-apiserver   May 23 13:34:00 2027 GMT      # leaf: 1 year
etcd             May 23 13:34:00 2027 GMT
admin            May 23 13:33:00 2027 GMT
front-proxy-ca   May 22 13:33:00 2031 GMT

Two different lifetimes: the CAs (ca and front-proxy-ca) last 5 years and expire in 2031, while leaf certs (apiserver, etcd, admin, kubelet...) last only one year and expire in 2027. Once a leaf cert expires, every TLS handshake that presents it fails: the apiserver can't talk to etcd, kubelets can't reach the apiserver, and the cluster stalls.

Rotating a leaf cert means re-signing it with the same existing CA (the CA is still valid, so clients keep trusting it), putting the new cert in place, and restarting the component. That is just the cert-signing part of Article 4 re-run against the existing CA; no new CA is built. Because the CA outlives the leaf certs by years, rotating them does not break trust. (A kubeadm cluster automates this with kubeadm certs renew; a self-built cluster re-signs by hand, which is the tradeoff of doing everything manually.)
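Both halves of that advice can be checked from the shell. The sketch below assumes the OpenSSL command-line tool and wraps three checks as hypothetical helper functions: expires_within uses openssl x509 -checkend to flag a cert that expires within N days, which suits a cron job or monitoring probe; after re-signing a leaf, chains_to_ca and key_matches_cert confirm the new cert still verifies against the unchanged CA and matches the private key the component is configured with, before the component is restarted:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for certificate monitoring and rotation checks (OpenSSL CLI).

# Returns 0 if the certificate expires within the given number of days.
# -checkend N exits 0 only if the cert is still valid N seconds from now.
# An unreadable or missing file also counts as "expiring", so it is not skipped silently.
expires_within() {
  local cert="$1" days="$2"
  ! openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null 2>&1
}

# Returns 0 if the certificate's public key matches the private key file.
key_matches_cert() {
  local cert="$1" key="$2"
  [ "$(openssl x509 -in "$cert" -noout -pubkey)" = "$(openssl pkey -in "$key" -pubout)" ]
}

# Returns 0 if the certificate verifies against the given (unchanged) CA.
chains_to_ca() {
  local cert="$1" ca="$2"
  openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1
}
```

For example, expires_within kube-apiserver.pem 30 succeeds only when that cert has less than 30 days left. After a rotation, chains_to_ca kube-apiserver.pem ca.pem && key_matches_cert kube-apiserver.pem kube-apiserver-key.pem should succeed before the apiserver is restarted; the key filename is an assumption that follows the etcd-key.pem naming used earlier.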

🧹 Cleanup

sudo rm -rf /tmp/etcd-backup.db /tmp/etcd-restore /tmp/etcdutl

Nothing in this article changes the cluster configuration: the snapshot is a read, and the restore writes only to a temp directory, so deleting the temp files is the whole cleanup. The commands used in this article are at github.com/nghiadaulau/kubernetes-from-scratch, directory 62-etcd-backup.

Wrap-up

etcd is the cluster's single source of truth, and HA is no substitute for backup: delete an object by mistake and all three members delete it. We took a snapshot with etcdctl snapshot save, inspected it with etcdutl snapshot status (HASH, REVISION, 569 keys, 19 MB), and restored it into a fresh data-dir (a valid member/snap + member/wal) to prove the backup is usable, all without touching the live etcd. The real DR procedure on stacked HA is: stop the apiservers and etcd, restore each member with its own correct identity, then restart. The second operational concern is certificates: the CA lasts 5 years (until 2031) but leaf certs only one (they expire in 2027), and an expired leaf cert stalls the cluster; rotation means re-signing the leaf certs with the existing CA, not building a new one. The shared lesson: both problems stay silent until they break something, so drill restores and watch certificate expiry before you need them.

Article 63 turns to a more frequent operational task: version upgrades. The cluster is on v1.36, and Kubernetes has strict version-skew rules about how far each component may lag the apiserver — upgrading in the right order is the core of not breaking the cluster halfway through.