Kubernetes PDBs That Actually Budget

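A PodDisruptionBudget (PDB) caps voluntary disruptions, meaning evictions requested through the Eviction API: node drains, rolling node-group updates, ASG scale-downs that drain the node first. It does nothing about involuntary disruptions such as a node dying, although pods lost that way still count against the budget.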

The footgun

minAvailable: 1 on a Deployment with 2 replicas lets the cluster evict one pod at a time. That's correct for stateless APIs. It's catastrophic for a consumer Deployment whose pods are pinned 1-to-1 to partitions: whichever partition's pod gets evicted loses its in-flight work.
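As a minimal sketch, the budget described above might look like the manifest below. The name `api-pdb` and the label `app: api` are hypothetical; the selector has to match the labels on your own Deployment's pod template.

```yaml
# Sketch only: the name and labels are assumptions for illustration.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb            # hypothetical name
spec:
  minAvailable: 1          # with 2 replicas, at most 1 pod can be evicted at a time
  selector:
    matchLabels:
      app: api             # hypothetical label; must match the Deployment's pods
```

With both replicas healthy, `kubectl get pdb api-pdb` should show one allowed disruption. The manifest is identical for the stateless API and the partition-pinned consumer, which is exactly why the footgun is easy to miss.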

Write PDBs backwards from the rollout strategy

  • RollingUpdate with maxUnavailable: 25% on 8 replicas → PDB minAvailable: 75% (6 of 8 pods, so a drain can take out at most 2, the same as the rollout).
  • StatefulSet with leader election → PDB maxUnavailable: 1.
  • Leaderless queue consumers pinned 1:1 with partitions → PDB minAvailable: 100% (yes, really). That blocks every eviction, so accept that node upgrades need a planned maintenance window.
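The three cases above could be written roughly as follows. This is a sketch under stated assumptions: every name and label (`web`, `db`, `queue-consumer`) is hypothetical, and each selector needs to match the pods of the workload it protects.

```yaml
# Sketch only: all names and labels below are hypothetical.

# 1) Deployment with 8 replicas and RollingUpdate maxUnavailable: 25%
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: "75%"      # 6 of 8 pods must stay available
  selector:
    matchLabels:
      app: web
---
# 2) StatefulSet with leader election
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db-pdb
spec:
  maxUnavailable: 1        # evict one member at a time
  selector:
    matchLabels:
      app: db
---
# 3) Leaderless queue consumers pinned 1:1 with partitions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: queue-consumer-pdb
spec:
  minAvailable: "100%"     # no voluntary evictions are allowed while this is in place
  selector:
    matchLabels:
      app: queue-consumer
```

Percentages are quoted so they parse as strings. The third budget will make drains wait indefinitely, so the maintenance window presumably includes a deliberate step to relax or remove it once the consumers have been quiesced.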

The dashboard that catches the silent failures

Alert when kube_poddisruptionbudget_status_current_healthy < kube_poddisruptionbudget_status_desired_healthy holds continuously for more than 10 minutes. A PDB in that state allows zero disruptions, so it is the likely reason a drain is hanging during your cluster upgrade.
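One way to express that alert is a Prometheus alerting rule like the sketch below. The two metric names come from kube-state-metrics as quoted above; the group name, alert name, severity, and annotation wording are assumptions to adapt to your own conventions.

```yaml
# Sketch only: group name, alert name, severity, and summary text are assumptions.
groups:
  - name: pdb-health
    rules:
      - alert: PodDisruptionBudgetBelowDesired
        # Fires per PDB when fewer pods are healthy than the budget requires.
        expr: |
          kube_poddisruptionbudget_status_current_healthy
            < kube_poddisruptionbudget_status_desired_healthy
        for: 10m           # must hold continuously for 10 minutes
        labels:
          severity: warning
        annotations:
          summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} has fewer healthy pods than it requires"
```

The `for: 10m` clause is what keeps ordinary rollouts, where pods are briefly unhealthy, from paging anyone.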
