How to manage disruptions in Kubernetes? Setting a proper RollingUpdate strategy spec solves only one type of disruption.
What about other disruptions?
Voluntary and Involuntary Disruptions
Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.
We call these unavoidable cases involuntary disruptions to an application. Examples are:
- a hardware failure of the physical machine backing the node
- a cluster administrator deleting the VM (instance) by mistake
- a cloud provider or hypervisor failure making the VM disappear
- a kernel panic
- the node disappearing from the cluster due to a cluster network partition
- eviction of a pod because the node is out of resources
Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.
We call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator. Typical application owner actions include:
- deleting the Deployment or other controller that manages the pod
- updating a Deployment's pod template, causing a restart
- directly deleting a pod (e.g. by accident)
Cluster Administrator actions include:
- draining a node for repair or upgrade
- draining a node from the cluster to scale the cluster down
- removing a pod from a node to permit something else to fit on that node
Here are some ways to mitigate involuntary disruptions:
- ensure your pod requests the resources it needs
- replicate your application if you need higher availability
- for even higher availability, spread replicated applications across racks (using anti-affinity) or across zones (if using a multi-zone cluster), as in the sketch below
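A minimal sketch of the last two points, with hypothetical names (myservice, myrepo/myservice:1.0): resource requests guard against out-of-resources evictions, and a preferred pod anti-affinity spreads the replicas across nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myservice                    # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myservice
  template:
    metadata:
      labels:
        app: myservice
    spec:
      affinity:
        podAntiAffinity:
          # prefer to schedule the replicas on different nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: myservice
              topologyKey: kubernetes.io/hostname
      containers:
      - name: myservice
        image: myrepo/myservice:1.0  # hypothetical image
        resources:
          requests:                  # request what the pod actually needs
            cpu: 100m
            memory: 128Mi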
An Application Owner can create a PodDisruptionBudget object (PDB) for each application. A PDB limits the number of pods of a replicated application that are down simultaneously from voluntary disruptions. For example, a quorum-based application would like to ensure that the number of replicas running is never brought below the number needed for a quorum. A web front end might want to ensure that the number of replicas serving load never falls below a certain percentage of the total.
Cluster managers and hosting providers should use tools which respect Pod Disruption Budgets by calling the Eviction API instead of directly deleting pods. Examples are the kubectl drain command and the Kubernetes-on-GCE cluster upgrade script (cluster/gce/upgrade.sh).
When a cluster administrator wants to drain a node, they use the kubectl drain command. That tool tries to evict all the pods on the machine. The eviction request may be temporarily rejected, and the tool periodically retries all failed requests until all pods are terminated, or until a configurable timeout is reached.
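For illustration, "calling the Eviction API" means POSTing an Eviction object to the pod's eviction subresource instead of deleting the pod; the API server refuses the request if it would violate a PDB. A rough sketch (the API endpoint, namespace and pod name are placeholders, and authentication is omitted):

# evict a pod through the Eviction API so PDBs are honoured
curl -v -X POST -H 'Content-Type: application/json' \
  https://<your-cluster-api-endpoint>/api/v1/namespaces/<namespace>/pods/<pod-name>/eviction \
  -d '{
        "apiVersion": "policy/v1beta1",
        "kind": "Eviction",
        "metadata": { "name": "<pod-name>", "namespace": "<namespace>" }
      }'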
Example PDB Using minAvailable:
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper
Example PDB Using maxUnavailable (Kubernetes 1.7 or higher):
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: zookeeper
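Both minAvailable and maxUnavailable also accept percentages, which covers the "certain percentage of the total" case mentioned above; for example (reusing the zookeeper selector purely for illustration):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: zookeeper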
To manage this from a Helm chart, add a templates/pdb.yaml along these lines (the exact template expressions depend on your chart's helpers and values; the ones below are illustrative):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: {{ template "fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    app: {{ template "name" . }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  selector:
    matchLabels:
      app: {{ template "name" . }}
      env: {{ .Values.env }}
  minAvailable: {{ .Values.budget.minAvailable }}
Imagine you have a service with 2 replicas and you need at least 1 to be available even during node upgrades and other ops tasks.
install / upgrade your release:
helm upgrade --install --debug "$RELEASE_NAME" -f helm/values.yaml \
--set replicas=2,budget.minAvailable=1 myrepo/mychart
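For context, the helm/values.yaml defaults that those --set flags override might look like this (key names other than replicas and budget.minAvailable are assumptions):

# hypothetical defaults in helm/values.yaml
replicas: 2
env: prod              # assumed to feed the env label in the selector
budget:
  minAvailable: 1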
kubectl describe pdb "$RELEASE_NAME"
Name:           mysvc-prod
Namespace:      prod
Min available:  1
Selector:       app=myservice,env=prod
Status:
    Allowed disruptions:  1
    Current:              2
    Desired:              1
    Total:                2
Events:         <none>
drain a node with one of your pods running:
kubectl drain --delete-local-data --force --ignore-daemonsets gke-mycluster-prod-pool-2fca4c85-k6g5
node "gke-mycluster-prod-pool-2fca4c85-k6g5" already cordoned
WARNING: Deleting pods with local storage: sqlproxy-67f695889d-t778w;
Ignoring DaemonSet-managed pods: fluentd-gcp-v3.0.0-llp5s;
Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet:
kube-proxy-gke-testing-dev-pool-2fca4c85-k6g5
pod "tiller-deploy-7b7b795779-rcvkd" evicted
pod "mysvc-prod-6856d59f9b-lzrtf" evicted
node "gke-mycluster-prod-pool-2fca4c85-k6g5" drained
again: kubectl describe pdb "$RELEASE_NAME"
Name:           mysvc-prod
Namespace:      prod
Min available:  1
Selector:       app=myservice,env=prod
Status:
    Allowed disruptions:  0
    Current:              1
    Desired:              1
    Total:                2
Events:         <none>
Tadaaa! We drained a node without any disruption to our service.
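When the maintenance or upgrade on that node is finished, you would typically make it schedulable again (node name taken from the drain example above):

kubectl uncordon gke-mycluster-prod-pool-2fca4c85-k6g5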
If we had only 1 replica, kubectl drain would always get stuck, and node drains / upgrades would have to be handled manually.
You might expect the Eviction API to surge an extra replica to satisfy the minAvailable condition; instead, the drain gets stuck and it is your responsibility to resolve the situation yourself. Is it a bug or a feature? The Kubernetes community says you shouldn't use 1 replica in production at all if you want HA, which is fair :)
It does what is expected of it, though.
If you don't want your kubectl drains to get stuck, you might want to use a PDB only for Deployments with more than 1 replica (see the sketch below).
Edit your templates/pdb.yaml:
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
...
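A minimal sketch of that guard, assuming the chart exposes replicas as a value and using the same helpers as the template above, is to render the whole PDB conditionally:

{{- if gt (int .Values.replicas) 1 }}
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: {{ template "fullname" . }}
spec:
  selector:
    matchLabels:
      app: {{ template "name" . }}
      env: {{ .Values.env }}
  minAvailable: {{ .Values.budget.minAvailable }}
{{- end }}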
If you are a Cluster Administrator, and you need to perform a disruptive action on all the nodes in your cluster, such as a node or system software upgrade, here are some options:
- Accept downtime during the upgrade.
- Fail over to another complete replica cluster. No downtime, but it may be costly, both for the duplicated nodes and for the human effort to orchestrate the switchover.
- Write disruption-tolerant applications and use PDBs. No downtime and minimal resource duplication, and it allows more automation of cluster administration.