CoreDNS: Calling Each Other by Name in the Cluster

Kai · 8 min read

Article 14 finished wiring up pod networking: pods get IPs and can ping each other across nodes. But those IPs are ephemeral: a pod that dies and is reborn gets a different IP, so no one writes a pod IP into config. What stays stable is the name: we want to call kube-dns.kube-system or my-app.default and have the system translate it into the current IP. That's the job of in-cluster DNS, and the default implementation is CoreDNS.

The interesting bit: CoreDNS isn't a special component sitting outside the cluster. It's an ordinary workload, a few pods behind a Service, running on the very cluster we just built. It uses the pod networking of Article 14 and the Service mechanism of Article 12. In other words, getting CoreDNS up is also a test that the earlier articles were correct.

The piece that's been waiting

Recall Article 11: in KubeletConfiguration we set clusterDNS: 10.32.0.10. From then on, every pod kubelet creates gets a /etc/resolv.conf pointing at nameserver 10.32.0.10. For the last several articles, nothing has been answering at that address. This article's job is to put something at exactly 10.32.0.10 so it responds: a Service named kube-dns with a fixed ClusterIP equal to that exact number, sitting in front of the CoreDNS pods.

   any pod
     │  /etc/resolv.conf: nameserver 10.32.0.10
     ▼
   Service kube-dns (ClusterIP 10.32.0.10)
     │  kube-proxy DNAT (Article 12)
     ▼
   CoreDNS pod ──┬── *.svc.cluster.local names ─► ask api-server, return ClusterIP
                 └── external names ─► forward to upstream (node's resolv.conf)

CoreDNS answers two kinds of questions. Names inside the cluster (Services, pods) it answers itself, from the data it watches on the api-server. Names outside (e.g. github.com) it forwards to the node's upstream DNS.

Step 1 — ServiceAccount and RBAC

CoreDNS needs to read the list of Services, Endpoints, Pods, and Namespaces to know which name maps to which IP. Like everything running in the cluster, it authenticates with a ServiceAccount, and permissions come through RBAC. Create a coredns ServiceAccount in kube-system, a read-only ClusterRole for exactly those resources, and a binding linking the two:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:coredns
rules:
  - apiGroups: [""]
    resources: ["endpoints", "services", "pods", "namespaces"]
    verbs: ["list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
  - kind: ServiceAccount
    name: coredns
    namespace: kube-system
EOF

verbs: ["list", "watch"] is just enough, since CoreDNS only observes, never writes. The endpointslices permission (the discovery.k8s.io group) is needed for recent Kubernetes versions, where endpoint information lives in EndpointSlice instead of the old Endpoints.
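A quick way to confirm the binding took effect is to ask the api-server directly, impersonating the ServiceAccount. The sketch below assumes your kubectl credentials are allowed to impersonate (cluster-admin is); the function name check_coredns_rbac is only illustrative.

```shell
# Sketch: confirm the coredns ServiceAccount may list/watch everything the
# kubernetes plugin reads. Prints one line per (verb, resource) pair and
# returns non-zero if any answer is not "yes".
check_coredns_rbac() {
  sa="system:serviceaccount:kube-system:coredns"
  failed=0
  for resource in endpoints services pods namespaces endpointslices.discovery.k8s.io; do
    for verb in list watch; do
      # "kubectl auth can-i" prints yes/no; --as impersonates the ServiceAccount.
      answer=$(kubectl auth can-i "$verb" "$resource" --as="$sa" --all-namespaces 2>/dev/null) || true
      echo "$verb $resource: ${answer:-error}"
      [ "$answer" = "yes" ] || failed=1
    done
  done
  return "$failed"
}
```

Calling check_coredns_rbac right after applying the manifests should print ten lines ending in yes; any no points at a typo in the ClusterRole or the binding.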

Step 2 — The Corefile in a ConfigMap

CoreDNS's config is a "Corefile", placed in a ConfigMap to be mounted into the pod. Inside the .:53 server block, each directive enables a plugin, and some take their own block of options:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
EOF

The two most notable plugins:

  • kubernetes cluster.local ...: this is the part that makes CoreDNS understand Kubernetes. It watches Service/Endpoint via the api-server and resolves every name under cluster.local (and the reverse zones). pods insecure also answers pod-IP-style names such as 10-200-1-3.default.pod.cluster.local, without checking that a pod with that IP exists. This is the plugin that uses the RBAC from Step 1.
  • forward . /etc/resolv.conf: every name not under cluster.local gets forwarded to the nameserver listed in the /etc/resolv.conf of the CoreDNS pod itself. This detail leads straight to a trap in a later step.

The remaining plugins are the supporting cast: errors logs errors, health/ready serve the probe endpoints, prometheus exposes metrics on :9153, cache keeps answers for up to 30 seconds, loop detects forwarding loops, reload picks up Corefile changes automatically, and loadbalance shuffles the order of A records.
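Since the forward target matters so much later on, it can help to read it back from the live ConfigMap instead of trusting the file you applied. This is a minimal sketch: forward_upstreams is a made-up helper, and it assumes the one-line "forward FROM TO..." form used in the Corefile above.

```shell
# Sketch: print the upstream(s) the "forward" plugin sends queries to.
# Reads a Corefile on stdin.
forward_upstreams() {
  # Field 1 is the plugin name, field 2 the zone, the rest are upstreams
  # (a trailing "{" only opens the options block, so skip it).
  awk '$1 == "forward" { for (i = 3; i <= NF; i++) if ($i != "{") print $i }'
}
```

For example, kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' | forward_upstreams should print /etc/resolv.conf for the Corefile in this step.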

Step 3 — The Deployment, and the dnsPolicy trap

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: coredns
      dnsPolicy: Default
      containers:
        - name: coredns
          image: registry.k8s.io/coredns/coredns:v1.12.4
          args: ["-conf", "/etc/coredns/Corefile"]
          ports:
            - { containerPort: 53, name: dns, protocol: UDP }
            - { containerPort: 53, name: dns-tcp, protocol: TCP }
            - { containerPort: 9153, name: metrics, protocol: TCP }
          resources:
            limits: { memory: 170Mi }
            requests: { cpu: 100m, memory: 70Mi }
          livenessProbe:
            httpGet: { path: /health, port: 8080 }
            initialDelaySeconds: 60
          readinessProbe:
            httpGet: { path: /ready, port: 8181 }
          volumeMounts:
            - { name: config-volume, mountPath: /etc/coredns, readOnly: true }
      volumes:
        - name: config-volume
          configMap:
            name: coredns
            items:
              - { key: Corefile, path: Corefile }
EOF

The line easiest to miss but most important is dnsPolicy: Default. By default a pod uses dnsPolicy: ClusterFirst — its resolv.conf points at 10.32.0.10, i.e. back at CoreDNS itself. If you left that on the CoreDNS pod, then forward . /etc/resolv.conf from Step 2 would forward external-name questions back to itself, forming a loop, and the loop plugin would detect it and crash the pod right at startup. dnsPolicy: Default tells kubelet to give the CoreDNS pod the node's resolv.conf (the real upstream, in this cluster the VPC's 10.0.0.2), so forward sends them on to the right upstream. This is a common mistake when standing up CoreDNS yourself; remember it to save an afternoon of reading crash logs.

priorityClassName: system-cluster-critical places CoreDNS in a high-priority group so it isn't evicted before ordinary workloads. replicas: 2 means DNS doesn't depend on a single pod.
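The loop condition behind the dnsPolicy trap boils down to one question: does the resolv.conf that forward reads list the cluster DNS IP as a nameserver? A small sketch of that check follows; points_at_cluster_dns is a hypothetical helper, not part of CoreDNS or kubectl.

```shell
# Sketch: read resolv.conf content on stdin and succeed (exit 0) if any
# nameserver equals the given cluster DNS IP, i.e. forwarding to it would
# send queries straight back to CoreDNS.
points_at_cluster_dns() {
  cluster_dns_ip="$1"
  awk -v ip="$cluster_dns_ip" '
    $1 == "nameserver" && $2 == ip { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

On a worker node, points_at_cluster_dns 10.32.0.10 < /etc/resolv.conf should fail (exit 1), which is what makes the node's file safe to hand to CoreDNS. Fed the resolv.conf of a ClusterFirst pod, like the one shown in Step 5, it succeeds. To confirm the Deployment really carries the setting, kubectl -n kube-system get deploy coredns -o jsonpath='{.spec.template.spec.dnsPolicy}' should print Default.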

Step 4 — The kube-dns Service at exactly 10.32.0.10

The key piece: a Service with a fixed ClusterIP equal to kubelet's clusterDNS.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  clusterIP: 10.32.0.10
  selector:
    k8s-app: kube-dns
  ports:
    - { name: dns, port: 53, protocol: UDP }
    - { name: dns-tcp, port: 53, protocol: TCP }
    - { name: metrics, port: 9153, protocol: TCP }
EOF

Unlike an ordinary Service (where you let Kubernetes assign the ClusterIP), here we specify clusterIP: 10.32.0.10, which must match the number baked into kubelet in Article 11, or else pods will ask DNS at an address nobody answers. selector: k8s-app=kube-dns ties this Service to the two CoreDNS pods.
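Because a mismatch between the two numbers fails silently (pods simply time out on DNS), a one-line comparison is cheap insurance. A sketch, with kube_dns_ip_matches as an illustrative name and the expected IP passed in by hand from your KubeletConfiguration:

```shell
# Sketch: compare the kube-dns Service's ClusterIP with the clusterDNS
# value configured in kubelet (given as the first argument).
kube_dns_ip_matches() {
  expected_ip="$1"
  actual_ip=$(kubectl -n kube-system get svc kube-dns \
    -o jsonpath='{.spec.clusterIP}' 2>/dev/null) || true
  if [ "$actual_ip" = "$expected_ip" ]; then
    echo "ok: kube-dns is at $actual_ip"
  else
    echo "mismatch: kubelet expects $expected_ip, Service has ${actual_ip:-<none>}"
    return 1
  fi
}
```

In this cluster, kube_dns_ip_matches 10.32.0.10 should print the ok line.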

Check that everything is up:

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system get svc kube-dns

NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE
coredns-596dd76cbc-4zxph   1/1     Running   0          7s    10.200.1.3   worker-1
coredns-596dd76cbc-9nlpq   1/1     Running   0          7s    10.200.0.3   worker-0

NAME       TYPE        CLUSTER-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.32.0.10   53/UDP,53/TCP,9153/TCP   7s

Both pods Running, one per node (the scheduler spreads them on its own), and the Service sits at 10.32.0.10. The CoreDNS pods got IPs from the Article 14 pod range (10.200.0.3, 10.200.1.3), proof that it's an ordinary workload using exactly the network we built.
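If you'd rather check the spread than eyeball it, counting distinct node names is enough. A sketch, assuming node names arrive one per line on stdin; count_distinct_nodes is just a name chosen for this example.

```shell
# Sketch: count distinct, non-empty node names read from stdin.
count_distinct_nodes() {
  sort -u | grep -c .
}
```

Feed it the pods' node names, for instance kubectl -n kube-system get pods -l k8s-app=kube-dns -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | count_distinct_nodes. With the placement shown above it prints 2; a 1 would mean both replicas landed on the same node.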

Step 5 — Resolve names from a pod

Create a test pod and see what resolv.conf it gets:

kubectl run dnstest --image=busybox:1.36 --restart=Never --command -- sleep 3600
kubectl exec dnstest -- cat /etc/resolv.conf

search default.svc.cluster.local svc.cluster.local cluster.local ap-southeast-1.compute.internal
nameserver 10.32.0.10
options ndots:5

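kubelet injected three things: nameserver 10.32.0.10 (CoreDNS), a list of search domains, and ndots:5. The search list is what allows short names: type kube-dns.kube-system and the resolver appends each search domain in turn (.default.svc.cluster.local first, then .svc.cluster.local, and so on) until one resolves. ndots:5 means a name with fewer than five dots goes through the search list before being tried as-is. Try resolving the full names of two Services: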

kubectl exec dnstest -- nslookup kube-dns.kube-system.svc.cluster.local
kubectl exec dnstest -- nslookup kubernetes.default.svc.cluster.local

Name:   kube-dns.kube-system.svc.cluster.local
Address: 10.32.0.10

Name:   kubernetes.default.svc.cluster.local
Address: 10.32.0.1

The Service names translate correctly to ClusterIPs: kube-dns to 10.32.0.10, kubernetes to 10.32.0.1. The naming convention is <service>.<namespace>.svc.cluster.local. Names outside the cluster also resolve, via forward:

kubectl exec dnstest -- nslookup one.one.one.one

Non-authoritative answer:
Name:   one.one.one.one
Address: 2606:4700:4700::1111

CoreDNS doesn't know one.one.one.one itself, so it forwards the question to 10.0.0.2 (the node's upstream) and returns the result, exactly the role dnsPolicy: Default arranged.

A note about busybox. If you try nslookup kubernetes.default (the short name) with busybox, you'll see NXDOMAIN. That's not a CoreDNS bug: busybox's nslookup doesn't apply the search list the way a standard resolver does. The proof: in that same pod, ping kubernetes.default prints PING kubernetes.default (10.32.0.1), because ping uses libc's getaddrinfo, which does apply search. When troubleshooting in-cluster DNS, remember to distinguish a tool's bug from the system's.
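To see what "applying the search list" means concretely, the expansion can be sketched in a few lines of shell. This models the glibc-style ordering only (search domains first when the name has fewer dots than ndots, the name as-is first otherwise); search_candidates is a hypothetical helper, and other resolvers may order or skip attempts differently.

```shell
# Sketch: print the names a glibc-style resolver would try, in order.
# Usage: search_candidates NAME NDOTS SEARCH_DOMAIN...
search_candidates() {
  name="$1"; ndots="$2"; shift 2
  # A trailing dot marks the name as fully qualified: no search list applies.
  case "$name" in
    *.) echo "$name"; return 0 ;;
  esac
  # Count the dots in the name.
  dots_only=$(printf '%s' "$name" | tr -cd '.')
  dot_count=${#dots_only}
  # With at least NDOTS dots, the name is tried as-is first.
  if [ "$dot_count" -ge "$ndots" ]; then echo "$name"; fi
  for domain in "$@"; do echo "$name.$domain"; done
  # With fewer dots, the as-is attempt comes last.
  if [ "$dot_count" -lt "$ndots" ]; then echo "$name"; fi
}
```

Running search_candidates kubernetes.default 5 default.svc.cluster.local svc.cluster.local cluster.local lists kubernetes.default.default.svc.cluster.local first (a miss), then kubernetes.default.svc.cluster.local, which is the one that resolves to 10.32.0.1. A tool that skips this expansion sends only the bare kubernetes.default, hence the NXDOMAIN.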

🧹 Cleanup

Delete the test pod; keep CoreDNS, since from now on it's a permanent component and later articles rely on it:

kubectl delete pod dnstest

CoreDNS runs in the cluster, so there's nothing to clean up on the node or in the VPC. If you stop/start the cluster, the Deployment rebuilds the pods on its own; just remember the masquerade rule from Article 14 needs to be re-run, otherwise CoreDNS will resolve internal names but won't be able to forward externally (packets to 10.0.0.2 won't be SNAT'd).

The full manifests are at github.com/nghiadaulau/kubernetes-from-scratch, directory 15-coredns.

Wrap-up

The cluster now has internal DNS: pods call each other by stable Service names instead of ephemeral IPs, and external names resolve too. The most memorable part is how CoreDNS is itself an ordinary workload (a Deployment behind a Service) rather than special infrastructure, along with the dnsPolicy: Default trap that keeps the CoreDNS pod from asking DNS at itself. The clusterDNS piece we planted back in Article 11 finally clicks into its slot here.

By now every component of a minimal cluster is present: control plane, workers, pod networking, Services, DNS. Article 16 gathers it all into a systematic smoke test: deploy a real application via a Deployment, expose it with a Service, call it by name, check kubectl logs/exec/port-forward, to confirm that every wire we've laid throughout the series really does work together.