Migrating to kube-proxy-less Cilium

Kai · 6 min read

Article 45 explained why; this one does it for real: move the live cluster from kube-proxy iptables + bridge (Part I) to eBPF-based Cilium 1.19, remove kube-proxy entirely, and enable Hubble. This is a migration on a live cluster (CoreDNS, metrics-server and EBS CSI are all running), so the risk is real, and we hit exactly the traps a self-built cluster faces: no cloud-controller-manager and no IMDS access for pods. I'm keeping the whole troubleshooting part because that's where the real value is.

Step 1 — install Cilium (kube-proxy still running alongside)

Install first and remove kube-proxy later, so there is always a way back. Use Helm, pinned to 1.19.4. The config follows Part I's architecture: native routing (like Article 13), pod CIDR 10.200.0.0/16 (like Article 14), and k8sServiceHost pointing at the internal LB. That last setting matters because, once kube-proxy is gone, nothing translates the kubernetes Service ClusterIP until Cilium itself is running, so the agent has to be told the API server's real address:

helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.19.4 --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=10.0.1.10 --set k8sServicePort=6443 \
  --set ipam.mode=cluster-pool \
  --set ipam.operator.clusterPoolIPv4PodCIDRList='{10.200.0.0/16}' \
  --set ipam.operator.clusterPoolIPv4MaskSize=24 \
  --set routingMode=native --set ipv4NativeRoutingCIDR=10.200.0.0/16 \
  --set autoDirectNodeRoutes=true --set enableIPv4Masquerade=true \
  --set hubble.relay.enabled=true --set hubble.ui.enabled=true
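If you prefer a values file to a long chain of --set flags (easier to diff and to commit next to the rest of the manifests), the same settings can be written as YAML. A minimal sketch: the file name cilium-values.yaml is an assumption, and each key simply mirrors one of the --set flags above.

```bash
#!/usr/bin/env bash
# Write Helm values equivalent to the --set flags above.
# The file name is an assumption; every key mirrors one flag.
cat > cilium-values.yaml <<'EOF'
kubeProxyReplacement: true
k8sServiceHost: 10.0.1.10
k8sServicePort: 6443
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
      - 10.200.0.0/16
    clusterPoolIPv4MaskSize: 24
routingMode: native
ipv4NativeRoutingCIDR: 10.200.0.0/16
autoDirectNodeRoutes: true
enableIPv4Masquerade: true
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
EOF
```

It can then be used with helm upgrade --install cilium cilium/cilium --version 1.19.4 -n kube-system -f cilium-values.yaml, which also works for later upgrades.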

autoDirectNodeRoutes=true has Cilium install, on every node, a route to each other node's pod CIDR with that node's IP as the next hop (instead of the hand-made VPC routes from Article 14): pure Cilium native routing. Wait for the agent (DaemonSet) and the operator to come up, then check the status:

kubectl exec ds/cilium -n kube-system -c cilium-agent -- cilium-dbg status | grep -E "KubeProxyReplacement|Routing"
KubeProxyReplacement:    True   [ens5 10.0.1.20 (Direct Routing)]
Routing:                 Network: Native   Host: Legacy

KubeProxyReplacement: True — eBPF is ready to do kube-proxy's job. But kube-proxy is still running (systemd, Article 12) — now we remove it.
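To make autoDirectNodeRoutes=true concrete: on each node, Cilium installs one kernel route per other node, sending that node's pod CIDR to that node's IP. The sketch below only prints the equivalent iproute2 commands from a small node table; the function name is hypothetical, and the node IPs and CIDR-to-node assignment are made up for illustration. Cilium installs these routes itself, so nothing here needs to be run on a node.

```bash
#!/usr/bin/env bash
# Print the routes autoDirectNodeRoutes would give one node:
# every OTHER node's pod CIDR, reached via that node's IP.
# Input lines: "<node-name> <node-ip> <pod-cidr>" (values below are hypothetical).
routes_for_node() {
  local self=$1 name ip cidr
  while read -r name ip cidr; do
    # Skip blank lines and this node's own pod CIDR (it is local to the node)
    if [ -z "$name" ] || [ "$name" = "$self" ]; then continue; fi
    echo "ip route replace $cidr via $ip"
  done
}

nodes='worker-0 10.0.1.20 10.200.0.0/24
worker-1 10.0.1.21 10.200.1.0/24'

routes_for_node worker-0 <<<"$nodes"
routes_for_node worker-1 <<<"$nodes"
```

Because each route's next hop is another node's IP, the Cilium docs expect the nodes to share an L2 segment for this mode; the next hop has to be directly reachable.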

Step 2 — remove kube-proxy

This cluster runs kube-proxy as a systemd service on the workers (not as a DaemonSet, as kubeadm would), so removing it means stopping the service and deleting the iptables rules it left behind:

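for w in worker-0 worker-1; do
  ssh "$w" 'sudo systemctl stop kube-proxy && sudo systemctl disable kube-proxy'
  # Flush the leftovers: drop every line mentioning KUBE, load the rest back
  ssh "$w" 'sudo iptables-save | grep -v KUBE | sudo iptables-restore'
  # Verify: count leftover KUBE lines and check the service state
  echo "$w: KUBE rules remaining: $(ssh "$w" 'sudo iptables-save | grep -c KUBE')   |   kube-proxy active: $(ssh "$w" 'systemctl is-active kube-proxy')"
done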
worker-0: KUBE rules remaining: 0   |   kube-proxy active: inactive
worker-1: KUBE rules remaining: 0   |   kube-proxy active: inactive
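The flush in the loop is blunt: grep -v KUBE drops every line that mentions KUBE, including chain declarations and the jump rules in the built-in chains, and iptables-restore loads whatever is left. Before running it on a real node it can help to preview what would go. A read-only sketch (the function name is hypothetical); on a node you would feed it a real dump, e.g. ssh worker-0 'sudo iptables-save' | preview_kube_flush.

```bash
#!/usr/bin/env bash
# Preview what `grep -v KUBE` would strip from an iptables-save dump.
# Reads the dump on stdin, prints dropped/kept line counts, then the
# names of the KUBE-* chains that would disappear. Nothing is restored.
preview_kube_flush() {
  local dump dropped kept
  dump=$(cat)
  dropped=$(printf '%s\n' "$dump" | grep -c KUBE)
  kept=$(printf '%s\n' "$dump" | grep -vc KUBE)
  echo "dropped=$dropped kept=$kept"
  # Chain declarations look like ":KUBE-SERVICES - [0:0]"
  printf '%s\n' "$dump" | grep -oE '^:KUBE-[A-Z0-9-]+' | cut -c2- | sort -u
}

# Tiny hand-written sample dump, for illustration only
sample=$'*nat\n:PREROUTING ACCEPT [0:0]\n:KUBE-SERVICES - [0:0]\n-A PREROUTING -j KUBE-SERVICES\nCOMMIT'
preview_kube_flush <<<"$sample"
```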

The 74 iptables rules from Article 45 are gone; from now on, Service ClusterIP traffic is handled only by eBPF. Cilium also cleans up the old CNI config by itself: it writes 05-cilium.conflist, which sorts first and therefore takes priority, and renames 10-bridge.conf (Article 14) and 99-loopback.conf with a .cilium_bak suffix:

ssh worker-0 'sudo ls /etc/cni/net.d/'
# 05-cilium.conflist  10-bridge.conf.cilium_bak  99-loopback.conf.cilium_bak
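Why a 05- prefix is enough, assuming the runtime is containerd with default settings: its CNI library lists files ending in .conf, .conflist or .json in the config directory, sorts them by name, and by default loads only the first one. A .cilium_bak suffix therefore takes the old files out of the running entirely. The sketch below models that selection rule on a scratch directory; the function name is hypothetical and this is not the runtime's actual code.

```bash
#!/usr/bin/env bash
# Model the selection rule: keep files ending in .conf/.conflist/.json,
# sort by name in byte order, take the first one.
pick_cni_conf() {
  ls -1 "$1" | grep -E '\.(conf|conflist|json)$' | LC_ALL=C sort | head -n1
}

dir=$(mktemp -d)
touch "$dir/05-cilium.conflist" "$dir/10-bridge.conf.cilium_bak" "$dir/99-loopback.conf.cilium_bak"
pick_cni_conf "$dir"          # only one candidate is left: 05-cilium.conflist

touch "$dir/10-bridge.conf"   # even if the bridge config came back...
pick_cni_conf "$dir"          # ...05- still sorts ahead of 10-
rm -r "$dir"
```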

Step 3 — restart pods to pick up the Cilium datapath

Pods that were already running still carry IPs from the old bridge. Restart them so that Cilium assigns their IPs and wires them into the new datapath:

for d in coredns metrics-server ebs-csi-controller snapshot-controller; do
  kubectl rollout restart deployment/$d -n kube-system
done
kubectl rollout restart daemonset/ebs-csi-node -n kube-system

CoreDNS comes back with new IPs (10.200.0.148 and 10.200.1.204, assigned by Cilium IPAM). Now verify the core thing, that Services and DNS work with no kube-proxy:

kubectl exec cli -- nslookup kubernetes.default.svc.cluster.local   # via DNS 10.32.0.10
kubectl exec cli -- wget -qO- https://10.32.0.1:443/healthz          # via ClusterIP
Address: 10.32.0.1          # DNS resolves OK (CoreDNS, reached via Cilium eBPF)
TLS error alert 47          # handshake fails, but the alert comes FROM the apiserver: the connection got through

Both the API server's ClusterIP (10.32.0.1) and the DNS Service (10.32.0.10, itself a ClusterIP) work: Cilium's eBPF service translation has successfully replaced kube-proxy.
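The two CoreDNS addresses also show cluster-pool IPAM at work. With clusterPoolIPv4MaskSize=24 the operator carves 10.200.0.0/16 into /24 blocks, one per node, so 10.200.0.148 and 10.200.1.204 coming from different blocks means the two replicas run on different nodes. A small sketch in pure bash arithmetic (function names are hypothetical) that checks an IP against a CIDR and works out the pool's capacity:

```bash
#!/usr/bin/env bash
# Convert a dotted IPv4 address to a 32-bit integer.
ip2int() {
  local IFS=. a b c d
  read -r a b c d <<<"$1"
  echo "$(( (a << 24) | (b << 16) | (c << 8) | d ))"
}

# Succeed if IP ($1) is inside CIDR ($2).
in_cidr() {
  local net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip2int "$1") & mask) == ($(ip2int "$net") & mask) ))
}

for ip in 10.200.0.148 10.200.1.204; do
  for block in 10.200.0.0/24 10.200.1.0/24; do
    if in_cidr "$ip" "$block"; then echo "$ip is in $block"; fi
  done
done

# Capacity: a /16 pool split into /24 per-node blocks
echo "node blocks: $(( 1 << (24 - 16) )), addresses per block: $(( 1 << (32 - 24) ))"
```

The capacity line is raw arithmetic: 256 blocks of 256 addresses each, before any addresses Cilium keeps for itself.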

Four traps of a self-built cluster (and how to clear them)

After the migration, ebs-csi (Article 43) crashes en masse. This is the most realistic part of the article: four stacked problems that a Kubernetes The Hard Way (KTHW) + Cilium cluster often hits:

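(1) Nodes missing providerID. EBS CSI needs to know which EC2 instance backs each node. Our cluster doesn't run a cloud-controller-manager, so node.spec.providerID is empty. Set it by hand, once per worker (the API server only accepts a providerID while the field is still empty, so get it right the first time):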

kubectl patch node worker-0 -p '{"spec":{"providerID":"aws:///ap-southeast-1a/i-0f1ab..."}}'
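The providerID format is aws:///<availability-zone>/<instance-id>, and each worker needs its own value. A tiny helper that builds the JSON patch avoids hand-editing quotes for every node. A sketch: the function name is hypothetical and the instance ID below is a placeholder, not a real instance.

```bash
#!/usr/bin/env bash
# Build the patch that sets node.spec.providerID in the AWS format
# aws:///<zone>/<instance-id>.
provider_id_patch() {
  local zone=$1 instance_id=$2
  printf '{"spec":{"providerID":"aws:///%s/%s"}}\n' "$zone" "$instance_id"
}

# Placeholder instance ID, for illustration only
provider_id_patch ap-southeast-1a i-0123456789abcdef0
```

On the cluster it would be used as kubectl patch node worker-0 -p "$(provider_id_patch <zone> <instance-id>)", with the real zone and instance ID of that worker.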

(2) Nodes missing topology labels. The driver also needs node.kubernetes.io/instance-type and topology.kubernetes.io/zone, which are normally set by the cloud-controller-manager we don't have. Add them by hand, together with the matching region label:

kubectl label node worker-0 node.kubernetes.io/instance-type=t3.medium \
  topology.kubernetes.io/region=ap-southeast-1 topology.kubernetes.io/zone=ap-southeast-1a
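For standard AZ names the region is just the zone without its trailing letter (ap-southeast-1a → ap-southeast-1), so the three labels can be derived from two inputs. A sketch with a hypothetical function name; the shortcut does not hold for Local Zone or Wavelength zone names, which have a different shape (e.g. us-west-2-lax-1a).

```bash
#!/usr/bin/env bash
# Print the three labels used above, deriving the region from a
# standard AZ name by dropping the trailing zone letter.
node_topology_labels() {
  local instance_type=$1 zone=$2
  local region=${zone%[a-z]}   # ap-southeast-1a -> ap-southeast-1 (standard AZs only)
  echo "node.kubernetes.io/instance-type=$instance_type" \
       "topology.kubernetes.io/region=$region" \
       "topology.kubernetes.io/zone=$zone"
}

node_topology_labels t3.medium ap-southeast-1a
```

Left unquoted, the output splits into separate arguments: kubectl label node worker-0 $(node_topology_labels t3.medium ap-southeast-1a).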

After (1)+(2), ebs-csi-node (which only needs metadata) comes up 3/3. But ebs-csi-controller still dies — because it needs credentials, not just metadata.

(3) The IMDS hop limit blocks the pod. The controller gets IAM credentials from IMDS (169.254.169.254). The error "no EC2 IMDS role found ... context deadline exceeded" means the pod can't reach IMDS. The culprit: IMDSv2 defaults to HttpPutResponseHopLimit=1, but the pod is one hop further from IMDS than the node is (the response has to be forwarded from the node into the pod, through Cilium's masquerade), so the token response is dropped. Raise the hop limit on each worker:

aws ec2 modify-instance-metadata-options --instance-id i-0f1ab... --http-put-response-hop-limit 2
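Why 1 fails and 2 works, as a simplified model: the IMDSv2 token (PUT) response starts with its IP TTL equal to the hop limit, and every forwarding hop decrements the TTL; a packet whose TTL would reach 0 is dropped. A hostNetwork process receives the response on the node itself (no forwarding), while a pod sits behind one extra forward from the node into its network namespace. The sketch only does that arithmetic; the function name and the hop counts (0 for the host, 1 for a pod) are assumptions of the model.

```bash
#!/usr/bin/env bash
# Simplified TTL model for the IMDSv2 token (PUT) response.
# $1 = hop limit configured on the instance
# $2 = forwarding hops between the node's NIC and the caller
#      (assumed: 0 for a hostNetwork process, 1 for a pod)
imds_token_delivered() {
  local ttl=$1 hops=$2
  # Each forward decrements TTL; the packet must arrive with TTL >= 1
  (( ttl - hops >= 1 ))
}

for limit in 1 2; do
  for caller in host:0 pod:1; do
    if imds_token_delivered "$limit" "${caller#*:}"; then r=delivered; else r=dropped; fi
    echo "hop-limit=$limit ${caller%:*}: $r"
  done
done
```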

(4) The controller needs hostNetwork to reliably reach IMDS. Even with hop limit 2, the pod→IMDS path through eBPF/masquerade is still flaky. The surest fix: put the controller on the host network (reaching IMDS directly, not through Cilium):

kubectl patch deployment ebs-csi-controller -n kube-system --type=strategic \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'

After that the controller is 2/2, the driver starts, and the dry-run AWS API call succeeds. (On a managed/EKS cluster, IRSA or Pod Identity handles credentials more cleanly, but on a hand-built cluster this is the direct way.) hubble-relay and hubble-ui also crashed at one point, but only transiently: during the migration they started before DNS was ready, and they recovered on their own once CoreDNS was stable.

Final confirmation: truly kube-proxy-less

kubectl get pods -n kube-system    # 17 pods Running
kubectl exec ds/cilium -n kube-system -c cilium-agent -- cilium-dbg status | grep -E "KubeProxy|Hubble|health"
ssh worker-0 'sudo iptables-save | grep -cE "KUBE-SERVICES|KUBE-SVC-"'
KubeProxyReplacement:  True
Hubble:                Ok   Current/Max Flows: 4095/4095, Flows/s: 17
Cluster health:        2/2 reachable
0          # <- NO kube-proxy rules left (KUBE-SERVICES/KUBE-SVC)

0 kube-proxy rules. The only KUBE-* entries left are a few KUBE-FIREWALL/KUBE-KUBELET-CANARY rules owned by kubelet, which manages its own chains and can recreate them after the flush in Step 2; they are harmless and unrelated to Service routing. Services are handled entirely by eBPF. And Hubble is Ok, so we can now see every flow (this article won't dwell on it; Hubble UI and observability are put to use in the policy articles).
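To turn this final check into something a script or CI job can repeat, the status lines can be parsed. A sketch assuming the output keeps the layout shown above (KubeProxyReplacement: followed by True, Cluster health: followed by reachable/total); the function name is hypothetical and the exact format may differ between Cilium versions.

```bash
#!/usr/bin/env bash
# Read `cilium-dbg status` text on stdin; succeed only if kube-proxy
# replacement is on and every node is reachable in the health check.
cilium_status_ok() {
  local status health
  status=$(cat)
  if ! grep -qE '^KubeProxyReplacement:[[:space:]]+True' <<<"$status"; then
    echo "KubeProxyReplacement is not True"; return 1
  fi
  health=$(grep -oE '^Cluster health:[[:space:]]+[0-9]+/[0-9]+' <<<"$status" | grep -oE '[0-9]+/[0-9]+')
  # reachable must equal total
  if [ -z "$health" ] || [ "${health%/*}" != "${health#*/}" ]; then
    echo "cluster health: ${health:-missing}"; return 1
  fi
  echo "ok (health $health)"
}

# The status lines from this article, as sample input
sample='KubeProxyReplacement:  True
Hubble:                Ok   Current/Max Flows: 4095/4095, Flows/s: 17
Cluster health:        2/2 reachable'

cilium_status_ok <<<"$sample"
```

On the cluster: kubectl exec ds/cilium -n kube-system -c cilium-agent -- cilium-dbg status | cilium_status_ok.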

🧹 Cleanup

kubectl delete deployment net-test ; kubectl delete svc net-test    # test workload

Delete only the test workload and keep everything else: Cilium + Hubble (the cluster's new CNI) and the permanent changes (kube-proxy disabled, providerID + labels on the nodes, IMDS hop limit, ebs-csi-controller on hostNetwork). The cluster now runs kube-proxy-less with 17 kube-system pods Running. Manifests/Helm values are at github.com/nghiadaulau/kubernetes-from-scratch, directory 46-cilium-migrate.

Wrap-up

The migration succeeded: the cluster moved from kube-proxy iptables + bridge to kube-proxy-less Cilium 1.19.4 eBPF + Hubble. The process: install Cilium alongside kube-proxy (Helm, kubeProxyReplacement=true, native routing, autoDirectNodeRoutes) → stop and disable the kube-proxy systemd service and flush its iptables rules (74 rules → 0 KUBE-SERVICES/SVC) → let Cilium replace the CNI config itself (.cilium_bak) → restart pods to get Cilium IPs. Verified results: KubeProxyReplacement: True, Service ClusterIP + DNS working through eBPF, Cluster health 2/2, Hubble Ok.

The four traps of a self-built cluster all have real fixes: providerID and topology labels set by hand (standing in for the cloud-controller-manager), IMDS hop limit 1→2 (so pods can reach IMDS through Cilium), and hostNetwork for ebs-csi-controller (so it gets IAM credentials reliably). This is an honest portrait of a production migration: the hard part isn't the install commands but the hidden dependencies that surface when you swap the datapath.

Article 47 puts the new capability to work: NetworkPolicy. Until now the cluster's CNI didn't enforce policy, so every pod could talk to every other pod freely; with Cilium, we'll block and allow traffic by label (and watch Hubble show the drop/allow verdicts), the identity-based network security that Article 45 promised.