When a node is abnormal, Kubernetes will evict pods on the node to ensure workload availability.
In Kubernetes, both kube-controller-manager and kubelet can evict pods.
kube-controller-manager consists of multiple controllers, and eviction is implemented by the node controller. The node controller periodically checks the status of all nodes. If a node stays in the NotReady state for a period of time, all pods on the node are evicted.
kube-controller-manager supports the following startup parameters:
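pod-eviction-timeout: indicates the interval for which a node can stay in the NotReady state before pods on it are evicted. The default value is 5 minutes.
node-eviction-rate: indicates the number of nodes from which pods are evicted per second. The default value is 0.1, that is, pods are evicted from at most one node every 10 seconds.
secondary-node-eviction-rate: indicates the eviction rate used when a large portion of the nodes in a zone is unhealthy. The default value is 0.01.
unhealthy-zone-threshold: indicates the fraction of NotReady nodes at which a zone is considered unhealthy. The default value is 0.55.
large-cluster-size-threshold: indicates the number of nodes above which a cluster is considered large. The default value is 50. In smaller unhealthy zones, the eviction rate is reduced to 0.
These are the upstream flag names and defaults; your distribution may expose or override them differently.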
If a node is about to run out of resources, kubelet executes its eviction policy based on pod priority, resource usage, and resource requests. If pods have the same priority, the pod that uses or requests the most resources is evicted first.
kube-controller-manager evicts all pods on a faulty node, while kubelet evicts some pods on a node under resource pressure. kubelet periodically checks the memory and disk resources of its node. If they are insufficient, it evicts pods based on their priority. For details about the pod eviction priority, see Pod selection for kubelet eviction.
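To see two of the inputs kubelet weighs when selecting pods, you can query a pod's priority and QoS class. A quick check (the pod name and namespace are placeholders):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.priority} {.status.qosClass}'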
There are soft eviction thresholds and hard eviction thresholds.
You can configure hard eviction thresholds using the following parameters:
eviction-hard: indicates a hard eviction threshold. When an eviction signal on a node reaches this threshold, pod eviction is triggered immediately. For example, memory.available<1Gi means that eviction is triggered as soon as the node's available memory falls below 1 GiB.
kubelet supports the following default hard eviction thresholds:
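memory.available<100Mi
nodefs.available<10%
imagefs.available<15%
nodefs.inodesFree<5%
These are the upstream Kubernetes defaults on Linux; your distribution may override them.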
kubelet also supports other parameters:
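eviction-soft: indicates a soft eviction threshold. Eviction is triggered only if the signal stays at or beyond the threshold for the configured grace period.
eviction-soft-grace-period: indicates how long a soft eviction threshold must be exceeded before eviction is triggered, for example, memory.available=1m30s.
eviction-max-pod-grace-period: indicates the maximum termination grace period, in seconds, granted to pods evicted due to a soft threshold.
eviction-pressure-transition-period: indicates how long kubelet waits before clearing a node pressure condition. The default value is 5 minutes.
eviction-minimum-reclaim: indicates the minimum amount of resources kubelet reclaims in a single eviction, which prevents kubelet from repeatedly evicting one pod at a time near the threshold.
For example, a kubelet flag configuration combining hard and soft thresholds might look as follows (the threshold values are illustrative, not recommendations):
--eviction-hard=memory.available<500Mi,nodefs.available<10%
--eviction-soft=memory.available<1Gi
--eviction-soft-grace-period=memory.available=1m30s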
If the pods are not evicted when the node is faulty, perform the following steps to locate the fault:
After the following command is run, the output shows that many pods are in the Evicted state:
kubectl get pods
Run the following command on the node to search the kubelet logs for eviction records:
cat /var/log/cce/kubernetes/kubelet.log | grep -i Evicted -C3
The possible causes below are described in order of how likely they are to occur. Check them one by one until you locate the fault.
If a node suffers resource pressure, kubelet will change the node status and add taints to the node. Perform the following steps to check whether the corresponding taint exists on the node:
$ kubectl describe node 192.168.0.37
Name:               192.168.0.37
...
Taints:             key1=value1:NoSchedule
...
| Node Status | Taint | Eviction Signal | Description |
|---|---|---|---|
| MemoryPressure | node.kubernetes.io/memory-pressure | memory.available | The available memory on the node has reached the eviction threshold. |
| DiskPressure | node.kubernetes.io/disk-pressure | nodefs.available, nodefs.inodesFree, imagefs.available, or imagefs.inodesFree | The available disk space or inodes on the root file system or image file system of the node have reached the eviction threshold. |
| PIDPressure | node.kubernetes.io/pid-pressure | pid.available | The number of available process identifiers on the node has fallen below the eviction threshold. |
Use kubectl, or locate the row containing the target workload and choose More > Edit YAML in the Operation column, to check whether tolerations are configured for the workload. For details, see Taints and Tolerations.
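For reference, a toleration like the following (a minimal sketch; the key and effect must match the taint found on the node) allows a pod to be scheduled to, or keep running on, a node that carries the memory-pressure taint:
tolerations:
- key: node.kubernetes.io/memory-pressure
  operator: Exists
  effect: NoSchedule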
In a cluster that runs fewer than 50 worker nodes, if faulty nodes account for more than 55% of the total nodes, pod eviction is suspended. In this case, Kubernetes does not attempt to evict the workloads on faulty nodes. For details, see Rate limits on eviction.
An evicted pod is frequently scheduled back to the original node.
Possible Causes
Pods are evicted from a node based on its actual resource usage, but they are scheduled based on the node's allocated (requested) resources. Because eviction and scheduling follow different rules, an evicted pod may be scheduled back to the original node.
Solution
Set proper resource requests and limits for each container so that scheduling decisions reflect actual resource usage, as in the example below.
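For example, explicit requests and limits on every container keep the scheduler's view close to actual usage (the values below are illustrative):
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 2Gi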
A workload pod keeps failing and is constantly redeployed.
Analysis
After a pod is evicted and scheduled to a new node, if that node is also evicting pods, the pod will be evicted again. This may repeat many times.
If a pod is evicted by kube-controller-manager, it is in the Terminating state. The pod is automatically deleted only after the node where it ran is restored. If the node has been deleted or cannot be restored for other reasons, you can forcibly delete the pod, as shown below.
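For example, assuming the pod name and namespace of the stuck pod, it can be force-deleted as follows:
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force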
If a pod is evicted by kubelet, it is in the Evicted state. Such a pod is kept only for subsequent fault locating and can be deleted directly.
Solution
Run the following command to delete the evicted pods:
kubectl get pods -n <namespace> | grep Evicted | awk '{print $1}' | xargs kubectl delete pod -n <namespace>
In the preceding command, <namespace> indicates the namespace name. Configure it based on your requirements.
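For example, to clean up evicted pod records in the default namespace:
kubectl get pods -n default | grep Evicted | awk '{print $1}' | xargs kubectl delete pod -n default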