PersistentVolume and PersistentVolumeClaim

Kai · 6 min read

Every volume in Article 41 was ephemeral: it died with the pod. Databases, user uploads, and any other data that must outlive the pod need persistent storage. Kubernetes provides it by splitting the job between two roles. Trace carefully here who creates what and what binds to what, because mixing up the roles muddles the whole picture.

Two objects, two creators:

  • PersistentVolume (PV): "a piece of storage in the cluster that has been provisioned by an administrator ... It is a resource in the cluster just like a node ... PVs have a lifecycle independent of any individual Pod." This is the real piece of storage, created by an admin (or generated dynamically through CSI, see Article 43). A PV is cluster-scoped: like a Node, it belongs to no namespace.
  • PersistentVolumeClaim (PVC): "a request for storage by a user ... Pods consume node resources and PVCs consume PV resources." This is the request, created by a user. A PVC is namespaced.
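To make the scope difference concrete, here is a toy in-memory store. It is a sketch built on my own assumptions: the `Store` class and its methods are hypothetical and are not how the real API server is written. A PV is identified by its name alone, while a PVC is identified by namespace plus name. Two claims can therefore share a name across namespaces, but PV names must be unique cluster-wide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for the API server's object storage."""

    # Cluster-scoped: a single flat set of names, like Nodes.
    pvs: dict[str, dict] = field(default_factory=dict)
    # Namespaced: identity is the pair (namespace, name).
    pvcs: dict[tuple[str, str], dict] = field(default_factory=dict)

    def create_pv(self, name: str, spec: dict) -> None:
        if name in self.pvs:
            raise ValueError(f"persistentvolume {name!r} already exists")
        self.pvs[name] = spec

    def create_pvc(self, namespace: str, name: str, spec: dict) -> None:
        if (namespace, name) in self.pvcs:
            raise ValueError(f"persistentvolumeclaim {namespace}/{name} already exists")
        self.pvcs[(namespace, name)] = spec


store = Store()
store.create_pv("pv-static", {"capacity": "1Gi"})
store.create_pvc("default", "pvc-app", {"storage": "500Mi"})
store.create_pvc("team-b", "pvc-app", {"storage": "500Mi"})  # same name, other namespace: allowed
```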

The docs' analogy: a Pod consumes Node resources, a PVC consumes PV resources. Now trace it step by step on the cluster.

Step 1 — admin creates the PV

The PV carries the real storage details. Here it uses hostPath on worker-0 for simplicity (no CSI yet):

apiVersion: v1
kind: PersistentVolume
metadata: {name: pv-static}
spec:
  capacity: {storage: 1Gi}
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
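  hostPath: {path: /mnt/data}        # real storage on the node
  nodeAffinity:                      # hostPath is node-local: keep consumers on worker-0
    required:                        # (assumes worker-0's kubernetes.io/hostname label is "worker-0")
      nodeSelectorTerms:
      - matchExpressions:
        - {key: kubernetes.io/hostname, operator: In, values: [worker-0]}

kubectl get pv pv-static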
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM
pv-static   1Gi        RWO            Retain           Available   <unset>

The new PV starts in phase Available with CLAIM=<unset>, which means nobody has requested it yet. It is inventory the admin set up in advance, waiting for someone to claim it. (The docs list four PV phases: Available, Bound, Released, Failed.)

Step 2 — user creates the PVC

The user does not know (and need not know) which specific PV will serve the claim. They only declare what they need, namely capacity and access mode:

apiVersion: v1
kind: PersistentVolumeClaim
metadata: {name: pvc-app}
spec:
  accessModes: ["ReadWriteOnce"]
  resources: {requests: {storage: 500Mi}}

This PVC says "I need 500Mi, read-write by one node". It does not point at any PV; matching is the system's job.
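To check a claim against a PV, the binder has to compare these quantities as numbers. `500Mi` and `1Gi` use binary suffixes (`Mi` = 2^20 bytes, `Gi` = 2^30 bytes). Below is a minimal Python sketch of the conversion. It assumes integer quantities only. `to_bytes` is a hypothetical helper, and the real Kubernetes quantity format also accepts decimals and exponents, which this sketch does not handle.

```python
# Binary (power-of-two) and decimal (power-of-ten) suffixes used by Kubernetes quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}


def to_bytes(quantity: str) -> int:
    """Convert an integer storage quantity such as "500Mi" to bytes (hypothetical helper)."""
    # Check the two-letter binary suffixes first, e.g. "Mi" before "M".
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


print(to_bytes("500Mi"))                     # 524288000
print(to_bytes("1Gi") >= to_bytes("500Mi"))  # True: pv-static is big enough for pvc-app
```

The two suffix families differ: `500M` is 500,000,000 bytes, slightly less than `500Mi`.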

Step 3 — control loop binds PVC ↔ PV (both ways)

Here's the crux: who matches a PVC to a PV? The docs: "A control loop in the control plane watches for new PVCs, finds a matching PV (if possible), and binds them together." That's the persistentvolume-binder controller inside kube-controller-manager (the component we set up in Article 8). It sees the new PVC pvc-app, looks for an Available PV that satisfies it (≥500Mi, RWO) → finds pv-static (1Gi, RWO) → binds:

kubectl get pv pv-static  -o jsonpath='PV.status={.status.phase} PV.claimRef={.spec.claimRef.namespace}/{.spec.claimRef.name}{"\n"}'
kubectl get pvc pvc-app   -o jsonpath='PVC.status={.status.phase} PVC.volumeName={.spec.volumeName}{"\n"}'
PV.status=Bound  PV.claimRef=default/pvc-app
PVC.status=Bound PVC.volumeName=pv-static
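The two jsonpath queries above let you check the binding by eye, but the same check can be scripted. Below is a minimal sketch, assuming Python 3 and `kubectl` on the PATH. `kubectl_get` and `is_bound_pair` are my own hypothetical helpers, not part of any Kubernetes client library. The function reads the standard `spec.claimRef`, `spec.volumeName` and `status.phase` fields. It also compares the `uid` recorded in `claimRef`, because a claimRef is tied to one specific PVC object, not just to its name.

```python
import json
import subprocess
from typing import Optional


def kubectl_get(kind: str, name: str, namespace: Optional[str] = None) -> dict:
    """Return `kubectl get <kind> <name> -o json` parsed into a dict."""
    cmd = ["kubectl", "get", kind, name, "-o", "json"]
    if namespace:
        cmd += ["-n", namespace]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def is_bound_pair(pv: dict, pvc: dict) -> bool:
    """True if the PV's claimRef and the PVC's volumeName point at each other."""
    ref = pv.get("spec", {}).get("claimRef") or {}
    meta = pvc.get("metadata", {})
    pv_points_at_pvc = (
        ref.get("namespace") == meta.get("namespace")
        and ref.get("name") == meta.get("name")
        # A uid mismatch means the claimRef belongs to an older PVC with the same name.
        and ref.get("uid") in (None, meta.get("uid"))
    )
    pvc_points_at_pv = pvc.get("spec", {}).get("volumeName") == pv.get("metadata", {}).get("name")
    both_bound = (
        pv.get("status", {}).get("phase") == "Bound"
        and pvc.get("status", {}).get("phase") == "Bound"
    )
    return pv_points_at_pvc and pvc_points_at_pv and both_bound


# Usage on a live cluster (not executed here):
#   is_bound_pair(kubectl_get("pv", "pv-static"), kubectl_get("pvc", "pvc-app", "default"))
```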

The bind is bidirectional and one-to-one, exactly as the docs say: "a PVC to PV binding is a one-to-one mapping, using a ClaimRef which is a bi-directional binding." The controller writes claimRef on the PV (pointing to default/pvc-app) and volumeName on the PVC (pointing to pv-static). From then on the two are locked together: "Once bound, PersistentVolumeClaim binds are exclusive." No other PVC can take pv-static, even though pvc-app asked for only 500Mi and the PV has 1Gi. A PVC gets the full capacity of the PV it binds to, and the leftover 524Mi simply goes unused. The chain of cause and effect so far:

admin ──creates──▶ PV (Available, real stock)
user  ──creates──▶ PVC (the request: 500Mi RWO)
                 │
kube-controller-manager / persistentvolume-binder
                 │  watch PVC → find matching PV → write claimRef(PV) + volumeName(PVC)
                 ▼
            PV.Bound ◀──── 1-1 ────▶ PVC.Bound
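The matching step in the diagram can be sketched as a toy Python model. Everything here is an assumption made for illustration. The `PV`/`PVC` dataclasses and `bind()` are hypothetical, capacities are plain byte counts, and only capacity and access modes are checked. The real binder also considers storage classes, label selectors, volume mode and more. Picking the smallest PV that fits is, as far as I know, what the real binder does too, but treat that as an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MI, GI = 2**20, 2**30  # binary units, as in "500Mi" / "1Gi"


@dataclass
class PV:
    """Hypothetical, stripped-down PersistentVolume."""
    name: str
    capacity: int                    # bytes
    access_modes: set[str]
    phase: str = "Available"
    claim_ref: Optional[str] = None  # "namespace/name" of the bound PVC


@dataclass
class PVC:
    """Hypothetical, stripped-down PersistentVolumeClaim."""
    namespace: str
    name: str
    request: int                     # bytes
    access_modes: set[str]
    phase: str = "Pending"
    volume_name: Optional[str] = None


def bind(pvc: PVC, pvs: list[PV]) -> Optional[PV]:
    """Bind pvc to a matching Available PV, writing both directions of the link."""
    candidates = [
        pv for pv in pvs
        if pv.phase == "Available"               # exclusive: Bound PVs are skipped
        and pv.capacity >= pvc.request           # at least what was asked for
        and pvc.access_modes <= pv.access_modes  # every requested mode is offered
    ]
    if not candidates:
        return None                              # claim stays Pending
    pv = min(candidates, key=lambda v: v.capacity)  # assumption: smallest fit wins
    pv.claim_ref = f"{pvc.namespace}/{pvc.name}"     # PV -> PVC (claimRef)
    pvc.volume_name = pv.name                        # PVC -> PV (volumeName)
    pv.phase = pvc.phase = "Bound"
    return pv


# Illustration only: pv-big, pvc-other and pvc-late are hypothetical extras.
pvs = [PV("pv-big", 5 * GI, {"ReadWriteOnce"}), PV("pv-static", 1 * GI, {"ReadWriteOnce"})]
claim = PVC("default", "pvc-app", 500 * MI, {"ReadWriteOnce"})
other = PVC("default", "pvc-other", 100 * MI, {"ReadWriteOnce"})
late = PVC("default", "pvc-late", 100 * MI, {"ReadWriteOnce"})
for c in (claim, other, late):
    got = bind(c, pvs)
    print(c.name, "->", got.name if got else None, c.phase)
# pvc-app -> pv-static Bound
# pvc-other -> pv-big Bound
# pvc-late -> None Pending
```

pvc-app ends up holding all 1Gi of pv-static. Once both PVs are taken, a further claim simply stays Pending until a matching PV appears.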

Step 4 — pod uses the claim (and data outlives the pod)

The pod does not point straight at the PV but at the PVC (the right abstraction layer: the app declares "I need this claim", without knowing the storage backend):

  volumes:
  - name: v
    persistentVolumeClaim: {claimName: pvc-app}

The docs: "Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for a Pod." The kubelet reads the PVC, follows it to the bound PV, then mounts the real storage into the pod. Write some data, delete the pod, then create a new pod that uses the same claim:

kubectl exec user-pod  -- sh -c 'echo "du-lieu-ben-vung-001" > /data/file.txt'
kubectl delete pod user-pod --now
# ... create user-pod2 using the same pvc-app ...
kubectl exec user-pod2 -- cat /data/file.txt
du-lieu-ben-vung-001

The new pod reads back exactly the data the old pod wrote, because the PV "has a lifecycle independent of any individual Pod". This is what emptyDir (Article 41) cannot do: delete the pod and it's gone. Persistent storage lives outside the pod's lifecycle.
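The chain pod → PVC → PV → storage is what lets the data survive. Here is a toy Python model of that chain, meant purely as an illustration. The dicts stand in for the API objects and the node's disk, and `mount`, `mount_empty_dir` and `Pod` are hypothetical names, not kubelet code. Because the storage dict lives outside every pod, a second pod that resolves the same claim lands on the same data. `mount_empty_dir` shows the emptyDir contrast.

```python
# Hypothetical stand-ins for API objects and the node's disk.
pvcs = {("default", "pvc-app"): {"volumeName": "pv-static"}}  # (namespace, name) -> PVC spec
pvs = {"pv-static": {"hostPath": "/mnt/data"}}                 # name -> PV spec
host_disk = {"/mnt/data": {}}                                  # path -> files; outlives pods


def mount(namespace: str, claim_name: str) -> dict:
    """Resolve a persistentVolumeClaim volume: PVC -> bound PV -> real storage."""
    pvc = pvcs[(namespace, claim_name)]
    pv = pvs[pvc["volumeName"]]
    return host_disk[pv["hostPath"]]  # same object every time, so data persists


def mount_empty_dir() -> dict:
    """emptyDir for contrast: fresh storage created with each pod."""
    return {}


class Pod:
    def __init__(self, name: str, namespace: str, claim_name: str) -> None:
        self.name = name
        self.data = mount(namespace, claim_name)  # plays the role of /data


pod = Pod("user-pod", "default", "pvc-app")
pod.data["file.txt"] = "du-lieu-ben-vung-001"
del pod  # deleting the pod removes only the pod object

pod2 = Pod("user-pod2", "default", "pvc-app")
print(pod2.data["file.txt"])  # du-lieu-ben-vung-001
```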

Step 5 — reclaim: what happens when the user is done

When done, the user deletes the PVC. The PV's next phase is decided by persistentVolumeReclaimPolicy:

kubectl delete pvc pvc-app
kubectl get pv pv-static
ssh worker-0 'sudo cat /mnt/data/file.txt'
NAME        ... RECLAIM POLICY   STATUS     CLAIM
pv-static   ... Retain           Released   default/pvc-app

du-lieu-ben-vung-001

The PVC is gone and the PV moves to phase Released. It still records the old claim (default/pvc-app) and does not return to Available, and the data on the host is still there. That is Retain at work: "The reclaim policy ... tells the cluster what to do with the volume after it has been released." Retain keeps the data intact and does not let the PV be reused automatically. The admin cleans it up by hand and then creates a new PV, which is the safe choice because nothing is lost by accident.

The other supported policy is Delete. When the PVC goes away, the PV and the real storage behind it are deleted too, which is handy for dynamic provisioning (Article 43). A third policy, Recycle, is deprecated.
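The two reclaim policies can be sketched as a toy Python model. The names are hypothetical throughout: `delete_pvc`, `bindable`, and the second PV `pv-scratch` with a Delete policy exist only for illustration. The model covers just the behavior described above. Retain leaves a Released PV that keeps its old claimRef and its data. Delete removes both the PV object and the storage.

```python
def delete_pvc(pv_name: str, pvs: dict, host: dict) -> None:
    """Apply the bound PV's reclaim policy after its PVC is deleted (toy model)."""
    pv = pvs[pv_name]
    policy = pv["reclaimPolicy"]
    if policy == "Retain":
        pv["phase"] = "Released"    # claimRef and data kept; not rebound automatically
    elif policy == "Delete":
        host.pop(pv["path"], None)  # remove the real storage asset
        del pvs[pv_name]            # and the PV object itself
    else:
        raise ValueError(f"unsupported reclaim policy {policy!r} (Recycle is deprecated)")


def bindable(pv: dict) -> bool:
    """Only Available PVs are candidates for a new claim."""
    return pv["phase"] == "Available"


pvs = {
    "pv-static": {"reclaimPolicy": "Retain", "phase": "Bound",
                  "claimRef": "default/pvc-app", "path": "/mnt/data"},
    "pv-scratch": {"reclaimPolicy": "Delete", "phase": "Bound",  # hypothetical
                   "claimRef": "default/pvc-tmp", "path": "/mnt/scratch"},
}
host = {"/mnt/data": {"file.txt": "du-lieu-ben-vung-001"}, "/mnt/scratch": {"tmp.txt": "x"}}

delete_pvc("pv-static", pvs, host)
delete_pvc("pv-scratch", pvs, host)
print(pvs["pv-static"]["phase"], pvs["pv-static"]["claimRef"])  # Released default/pvc-app
print(sorted(host))                                              # ['/mnt/data']
```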

Access modes — what kind of read/write

accessModes, set on both the PV and the PVC, determine how the volume can be mounted. Per the docs:

Mode               Abbrev.  Meaning
ReadWriteOnce      RWO      read-write by a single node (several pods on that node can still share it)
ReadOnlyMany       ROX      read-only by many nodes
ReadWriteMany      RWX      read-write by many nodes (needs backend support, e.g. NFS)
ReadWriteOncePod   RWOP     read-write by exactly one pod (CSI volumes only)

Most block storage (EBS, see Article 43) supports only RWO, since a disk is attached to one node at a time. File storage (NFS, EFS) can do RWX. Every access mode the PVC requests must be among the PV's access modes, or the two won't bind.
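The access-mode rule can be sketched in a few lines of Python. The sketch assumes the matching rule is plain set containment: every mode the claim requests must be offered by the PV. `modes_compatible` and `short` are hypothetical helpers, and the example mode sets below are illustrative, not read from any real driver.

```python
ABBREV = {
    "ReadWriteOnce": "RWO",
    "ReadOnlyMany": "ROX",
    "ReadWriteMany": "RWX",
    "ReadWriteOncePod": "RWOP",
}


def modes_compatible(pvc_modes: set, pv_modes: set) -> bool:
    """A PV can serve a PVC only if it offers every mode the PVC requests."""
    unknown = (pvc_modes | pv_modes) - set(ABBREV)
    if unknown:
        raise ValueError(f"unknown access modes: {sorted(unknown)}")
    return pvc_modes <= pv_modes


def short(modes: set) -> str:
    """Render modes the way kubectl's ACCESS MODES column abbreviates them."""
    return ",".join(sorted(ABBREV[m] for m in modes))


block_like = {"ReadWriteOnce"}                                  # e.g. a single-attach disk
nfs_like = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"}   # e.g. a shared filesystem

print(modes_compatible({"ReadWriteOnce"}, block_like))  # True
print(modes_compatible({"ReadWriteMany"}, block_like))  # False: RWX claim never binds here
print(short(nfs_like))                                  # ROX,RWO,RWX
```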

🧹 Cleanup

kubectl delete pod user-pod2 --now
kubectl delete pvc pvc-app          # -> PV Released
kubectl delete pv pv-static         # admin cleans up the PV
ssh worker-0 'sudo rm -rf /mnt/data'  # clean up the real data on the host (Retain doesn't auto-delete)

Because of Retain, the real data must be deleted by hand on the host, true to the spirit of "the PV lives outside the pod". The cluster is back to CoreDNS + metrics-server. Manifests at github.com/nghiadaulau/kubernetes-from-scratch, directory 42-pv-pvc.

Wrap-up

Persistent storage splits into two roles and one matcher. The admin creates the PV (the real piece of storage, cluster-scoped, born in phase Available). The user creates the PVC (the request, namespaced, declaring only capacity and access mode, pointing at no PV). The persistentvolume-binder control loop in kube-controller-manager watches PVCs, finds a matching PV, and binds the two one-to-one in both directions, via claimRef (on the PV) and volumeName (on the PVC). Both become Bound and are locked exclusively.

The pod points at the PVC, not the PV. The kubelet traces the bound PV and mounts it, and the data outlives the pod: delete the pod, recreate it, and the file is still readable. Delete the PVC and the PV goes Released. persistentVolumeReclaimPolicy: Retain keeps the data for the admin to clean up, while Delete removes it too. Access modes (RWO/ROX/RWX/RWOP) dictate who can mount the volume.

So far everything is static: the admin creates each PV by hand. Article 43 brings StorageClass + CSI, which generate PVs automatically. The user creates a PVC and the system spawns a matching PV on its own (dynamic provisioning). We'll install the real EBS CSI driver to watch AWS create a volume.