Pod Security Standards and Admission

Article 52 showed that RBAC decides who can create a pod, but RBAC doesn't look at a pod's contents. Someone with permission to create pods can still create one that runs privileged, borrows hostNetwork, or mounts hostPath /, all of which are escape routes from the container onto the node. Blocking that is the job of the admission stage, the third stage after authentication and authorization. Kubernetes ships an admission controller for exactly this: Pod Security Admission, enabled by default since it went beta in 1.23 and stable since 1.25.

Three levels and three modes

Pod Security Admission measures pods against the Pod Security Standards — three cumulative levels:

   privileged  ── no constraints (for infra workloads: CNI, agents)
        │
   baseline    ── blocks known escalations: hostNetwork/PID/IPC, privileged,
        │          hostPath, dangerous capabilities
        ▼
   restricted  ── baseline + hardening: runAsNonRoot, allowPrivilegeEscalation=false,
                   drop ALL capabilities, seccompProfile RuntimeDefault

Which level applies, in which mode, is set by a label on the namespace following the form pod-security.kubernetes.io/<MODE>=<LEVEL>. MODE has three values that differ in consequence:

MODE	when a pod violates
`enforce`	rejected — the pod is not created
`warn`	still created, but returns a warning to the submitter
`audit`	still created, writes an annotation into the audit log

A namespace can set a different level for each mode. The common pattern is to enforce a moderate level such as baseline while setting warn and audit to a stricter one such as restricted, so you can see in advance what will break when you tighten.
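As a sketch of that pattern, the labels can also be declared in the Namespace manifest itself. The namespace name psa-staged and the baseline/restricted split are assumptions chosen for illustration, not values taken from this article's repo.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: psa-staged   # hypothetical name
  labels:
    # Reject pods that violate baseline.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # Report, without blocking, anything that would fail restricted.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Each mode also accepts a matching -version label. Pinning it to a specific release in the form v<major>.<minor> instead of latest keeps the checks from changing underneath you on a cluster upgrade.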

enforce=restricted: the plain pod gets kicked

Label a namespace with enforce restricted:

kubectl create namespace psa-demo
kubectl label namespace psa-demo \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest

Then try to create an ordinary pod that declares no securityContext at all:

kubectl -n psa-demo run plain --image=busybox:1.36 --command -- sleep 100000

Error from server (Forbidden): pods "plain" is forbidden: violates PodSecurity "restricted:latest":
  allowPrivilegeEscalation != false (container "plain" must set securityContext.allowPrivilegeEscalation=false),
  unrestricted capabilities (container "plain" must set securityContext.capabilities.drop=["ALL"]),
  runAsNonRoot != true (pod or container "plain" must set securityContext.runAsNonRoot=true),
  seccompProfile (pod or container "plain" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

The pod is rejected right at admission, and the message lists exactly the four restricted requirements it fails to meet. The message doubles as a fix-it guide: to live in this namespace, a pod must declare all four of those fields.

A compliant pod runs

Rewrite the pod to match restricted:

apiVersion: v1
kind: Pod
metadata: {name: compliant}
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile: {type: RuntimeDefault}
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "100000"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities: {drop: ["ALL"]}

kubectl -n psa-demo apply -f compliant.yaml
kubectl -n psa-demo get pod compliant

NAME        READY   STATUS    RESTARTS   AGE
compliant   1/1     Running   0          9s

The pod runs. runAsNonRoot: true makes the kubelet refuse to start a container that would run as UID 0; because the busybox image defaults to root, the manifest also sets runAsUser: 1000. allowPrivilegeEscalation: false stops a process from gaining more privilege than its parent (for example through a setuid binary), drop: ["ALL"] removes every Linux capability, and seccompProfile RuntimeDefault enables the container runtime's default syscall filter. These four fields are what Article 55 dissects in detail; here they are the price of admission.
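To check that these settings actually reached the running process, one option is to read the kernel's view of PID 1 inside the container. This is only a sketch: the helper name show_hardening is hypothetical, and it assumes the image ships grep, as busybox does.

```shell
# Hypothetical helper: print the UID, no_new_privs flag, effective capability
# mask and seccomp mode of PID 1 in a pod's container.
# Usage: show_hardening <namespace> <pod>
show_hardening() {
  kubectl -n "$1" exec "$2" -- \
    grep -E '^(Uid|NoNewPrivs|CapEff|Seccomp):' /proc/1/status
}
```

Calling show_hardening psa-demo compliant should, on a typical Linux node, report a non-zero Uid, NoNewPrivs set to 1, an all-zero CapEff mask, and Seccomp mode 2 (filter). The exact set of fields depends on the kernel version.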

warn: warns without blocking

enforce mode rejects outright, which is sometimes too harsh for a namespace that's already running. warn shows which pods would violate without yet blocking — good for probing before tightening. In a different namespace, set warn=baseline, then create a pod that violates baseline:

kubectl create namespace psa-warn
kubectl label namespace psa-warn pod-security.kubernetes.io/warn=baseline
# pod with hostNetwork + privileged
kubectl -n psa-warn apply -f priv-pod.yaml

Warning: would violate PodSecurity "baseline:latest": host namespaces (hostNetwork=true),
  privileged (container "c" must not set securityContext.privileged=true)
pod/priv created
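
The priv-pod.yaml applied above is not reproduced in this article. A minimal manifest that would trigger those two baseline violations might look like the following; the pod name priv and container name c are taken from the output, while the image and command are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: priv
spec:
  hostNetwork: true          # baseline violation: host namespaces
  containers:
  - name: c
    image: busybox:1.36      # assumed image
    command: ["sleep", "100000"]
    securityContext:
      privileged: true       # baseline violation: privileged container
```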

The kubectl output carries a Warning spelling out the two baseline violations (hostNetwork, privileged), yet it ends with pod/priv created: the pod is still created. That's the difference between warn and enforce: both detect the violation, but warn only reports while enforce blocks. A security-tightening process usually starts with warn/audit to gather the list of pods that will break, fixes them gradually, and only then turns on enforce.
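
Before flipping a live namespace to enforce, a server-side dry run of the label change can preview the impact: the API server evaluates the pods already in the namespace against the new level and returns warnings without persisting the label. The helper name psa_preview is hypothetical; the kubectl flags it wraps are standard.

```shell
# Hypothetical helper: preview which existing pods would violate a level
# if it were enforced, without changing the namespace.
# Usage: psa_preview <namespace> <level>
psa_preview() {
  kubectl label --dry-run=server --overwrite namespace "$1" \
    "pod-security.kubernetes.io/enforce=$2"
}
```

For example, psa_preview psa-warn restricted would report the pods in psa-warn that fail restricted as warnings, leaving the namespace's labels untouched.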

PSA guards at the pod layer, not the workload

PSA is a validating admission plugin built into the API server (named PodSecurity), sitting at the admission stage from Article 51, running every time there's a CREATE/UPDATE request for a Pod. This detail matters and is easy to get wrong: it guards at the Pod layer, not Deployment or Job. Create a Deployment with a violating pod template in a namespace with enforce=restricted:

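# Setup for psa2; the warn label is an assumption, matching the Warning in the output
kubectl create namespace psa2
kubectl label namespace psa2 pod-security.kubernetes.io/enforce=restricted pod-security.kubernetes.io/warn=restricted
kubectl -n psa2 create deployment bad --image=busybox:1.36 -- sleep 100000
kubectl -n psa2 get deploy bad
kubectl -n psa2 get pods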

Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false ...
deployment.apps/bad created          # the Deployment IS CREATED
NAME   READY   UP-TO-DATE   AVAILABLE
bad    0/1     0            0          # but 0 pods
No resources found in psa2 namespace.

The Deployment is created (with only a Warning from PSA evaluating the template), but no pod is born. PSA enforces at pod CREATE, and a Deployment doesn't create pods directly: its ReplicaSet does (Article 24), and that's where enforce blocks. The error shows up in the ReplicaSet's events, not when the Deployment is created:

kubectl -n psa2 get events | grep FailedCreate

Warning  FailedCreate  replicaset/bad-df59c78fb  Error creating: pods "bad-..." is forbidden:
  violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false ...

This is the mechanism to grasp: enforce applies to the pod object when it's created, so for workloads created through a controller, the violation doesn't break Deployment creation but leaves the ReplicaSet stuck in a FailedCreate loop. warn and audit are different: they also run directly on the workload object (evaluating its pod template) to report early, which is why the Warning appears right at create deployment. The practical upshot: a Deployment that looks fine in kubectl get deploy but stays READY 0/1 forever, in a namespace with PSA, is very likely a pod being blocked by enforce, and you have to look at the ReplicaSet's events to see it.
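
The fix is to put the restricted fields into the pod template, since that is what the ReplicaSet stamps out. Below is a sketch of a Deployment that should pass restricted, reusing the settings from the compliant pod earlier; the name good is an assumption.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: good               # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels: {app: good}
  template:
    metadata:
      labels: {app: good}
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile: {type: RuntimeDefault}
      containers:
      - name: app
        image: busybox:1.36
        command: ["sleep", "100000"]
        securityContext:
          allowPrivilegeEscalation: false
          capabilities: {drop: ["ALL"]}
```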

🧹 Cleanup

kubectl delete namespace psa-demo psa-warn psa2

Pod Security Admission is a feature built into the API server, nothing extra to install; deleting the namespaces cleans up. Manifests are at github.com/nghiadaulau/kubernetes-from-scratch, directory 54-pod-security.

Wrap-up

Pod Security Admission blocks dangerous pods right at the admission stage, with nothing extra to install (enabled by default since 1.23, stable since 1.25). It measures pods against three cumulative levels: privileged (no constraints), baseline (blocks known escalations: host namespaces, privileged, hostPath), and restricted (adds hardening: runAsNonRoot, allowPrivilegeEscalation=false, drop ALL, seccomp RuntimeDefault). Levels are applied via the namespace label pod-security.kubernetes.io/<mode>=<level>. The three modes differ in consequence: enforce rejects, warn warns, audit logs. We saw enforce=restricted reject a plain pod with four specific violations, a pod declaring all four fields run, and warn=baseline only warn when it met a hostNetwork+privileged pod. On mechanism, PSA is a validating admission plugin that guards at the Pod layer on CREATE/UPDATE, so for workloads created through a controller, a violation doesn't block Deployment creation but leaves the ReplicaSet stuck in FailedCreate; a Deployment that stays READY 0/1 forever in a namespace with PSA is usually that sign. The practical rollout: turn on warn/audit first to probe and fix, then enforce.

The four fields restricted demands (runAsNonRoot, drop capabilities, seccomp, allowPrivilegeEscalation) are policy at the Kubernetes level. Article 55 goes down to the kernel layer to see what they actually do: which capability allows what, how seccomp filters syscalls, and how far AppArmor on the Ubuntu node constrains a process.