If the pod is in the Pending state and the event contains pod scheduling failure information, locate the cause based on the event information. For details about how to view events, see How Do I Use Events to Fix Abnormal Workloads?
Determine the cause based on the event information, as listed in Table 1.
| Event Information | Cause and Solution |
| --- | --- |
| no nodes available to schedule pods. | No node is available in the cluster. |
| 0/2 nodes are available: 2 Insufficient cpu. 0/2 nodes are available: 2 Insufficient memory. | Node resources (CPU and memory) are insufficient. Check Item 2: Whether Node Resources (CPU and Memory) Are Sufficient |
| 0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) didn't match pod affinity rules, 1 node(s) didn't match pod affinity/anti-affinity. | The node and pod affinity configurations are mutually exclusive. No node meets the pod requirements. Check Item 3: Affinity and Anti-Affinity Configuration of the Workload |
| 0/2 nodes are available: 2 node(s) had volume node affinity conflict. | The EVS volume mounted to the pod and the node are not in the same AZ. Check Item 4: Whether the Workload's Volume and Node Reside in the Same AZ |
| 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. | Taints exist on the node, but the pod cannot tolerate these taints. |
| 0/7 nodes are available: 7 Insufficient ephemeral-storage. | The ephemeral storage space of the node is insufficient. |
| 0/1 nodes are available: 1 everest driver not found at node | The everest-csi-driver on the node is not in the running state. |
| Failed to create pod sandbox: ... Create more free space in thin pool or use dm.min_free_space option to change behavior | The node thin pool space is insufficient. |
| 0/1 nodes are available: 1 Too many pods. | The number of pods scheduled to the node exceeded the maximum number allowed by the node. |
Log in to the CCE console and check whether the node status is Available. Alternatively, run the following command to check whether the node status is Ready:
$ kubectl get node
NAME           STATUS   ROLES    AGE   VERSION
192.168.0.37   Ready    <none>   21d   v1.19.10-r1.0.0-source-121-gb9675686c54267
192.168.0.71   Ready    <none>   21d   v1.19.10-r1.0.0-source-121-gb9675686c54267
If the status of all nodes is Not Ready, no node is available in the cluster.
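If a node is Not Ready, you can inspect it to find out why, for example (using the first node from the output above):

$ kubectl describe node 192.168.0.37
# Check the Conditions section (Ready, MemoryPressure, DiskPressure, and so on)
# and the Events at the end of the output for the reason the node is not ready.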
Solution
0/2 nodes are available: 2 Insufficient cpu. This means insufficient CPUs.
0/2 nodes are available: 2 Insufficient memory. This means insufficient memory.
If the resources requested by the pod exceed the resources that can be allocated on a node, that node cannot provide the resources required to run new pods, and scheduling onto it fails.
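As an illustration (a hypothetical pod, with values chosen to exceed the example node's 3920m allocatable CPU shown later in this section), the following request can never be satisfied, so the pod stays Pending:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo              # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:alpine
    resources:
      requests:
        cpu: "4"              # exceeds the 3920m allocatable on the example node
        memory: "512Mi"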
Solution
Add nodes to the cluster. Scale-out is the common solution to insufficient resources.
Inappropriate affinity policies will cause pod scheduling to fail.
Example:
An anti-affinity relationship is established between workload 1 and workload 2. Workload 1 is deployed on node 1 while workload 2 is deployed on node 2.
When you try to deploy workload 3 on node 1 and establish an affinity relationship with workload 2, a conflict occurs, resulting in a workload deployment failure.
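The following is a minimal sketch of how such conflicting rules could look in the pod spec of workload 3 (the node address and labels are illustrative): node affinity pins the pod to node 1, while pod affinity requires co-location with workload 2, which runs only on node 2, so no node can satisfy both rules.

apiVersion: v1
kind: Pod
metadata:
  name: workload-3                 # illustrative name
spec:
  affinity:
    nodeAffinity:                  # pins the pod to node 1
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - 192.168.0.37         # node 1
    podAffinity:                   # requires running next to workload 2, which is on node 2
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: workload-2        # illustrative label
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx
    image: nginx:alpine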
In this case, an event similar to the following is reported:
0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) didn't match pod affinity rules, 1 node(s) didn't match pod affinity/anti-affinity.
Solution
When configuring workload-workload affinity and workload-node affinity policies, ensure that the two types of policies do not conflict with each other; otherwise, the deployment will fail.
If the workload has an affinity policy for a specific node, make sure that the supportContainer label of that node is set to true. Otherwise, pods cannot be scheduled onto the node and an event similar to the following is generated:
No nodes are available that match all of the following predicates: MatchNodeSelector, NodeNotSupportsContainer
If the value is false, the scheduling fails.
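To review the labels on a node when diagnosing node affinity failures, you can list them, for example:

$ kubectl get node 192.168.0.37 --show-labels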
0/2 nodes are available: 2 node(s) had volume node affinity conflict. An affinity conflict occurs between volumes and nodes. As a result, the scheduling fails.
This is because EVS disks cannot be attached to nodes across AZs. For example, if the EVS volume is located in AZ 1 and the node is located in AZ 2, scheduling fails.
The EVS volume created on CCE has affinity settings by default, as shown below.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvc-c29bfac7-efa3-40e6-b8d6-229d8a5372ac
spec:
  ...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          -
Solution
In the AZ where the workload's node resides, create a volume. Alternatively, create an identical workload and select an automatically assigned cloud storage volume.
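To confirm whether the volume and the node are in the same AZ, you can compare the node's zone label with the PV's node affinity. For example, using the node and PV names from the examples above:

$ kubectl describe node 192.168.0.37 | grep zone
$ kubectl get pv pvc-c29bfac7-efa3-40e6-b8d6-229d8a5372ac -o yaml | grep -A 8 nodeAffinity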
0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. This means the node is tainted and the pod cannot be scheduled to the node.
Check the taints on the node. If the following information is displayed, taints exist on the node:
$ kubectl describe node 192.168.0.37
Name:      192.168.0.37
...
Taints:    key1=value1:NoSchedule
...
In some cases, the system automatically adds a taint to a node. The current built-in taints include:
- node.kubernetes.io/not-ready: The node is not ready.
- node.kubernetes.io/unreachable: The node controller cannot access the node.
- node.kubernetes.io/memory-pressure: The node has memory pressure.
- node.kubernetes.io/disk-pressure: The node has disk pressure.
- node.kubernetes.io/pid-pressure: The node has PID pressure.
- node.kubernetes.io/network-unavailable: The node network is unavailable.
- node.kubernetes.io/unschedulable: The node cannot be scheduled.
Solution
To schedule the pod onto the node, use either of the following methods: add a toleration to the pod so that it tolerates the taint (see the example below), or delete the taint from the node if it is no longer needed (see the command after the example).
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:alpine
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
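Alternatively, if the taint was added manually and is not required, you can remove it from the node. Using the node and taint from the example above:

$ kubectl taint nodes 192.168.0.37 key1=value1:NoSchedule-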
0/7 nodes are available: 7 Insufficient ephemeral-storage. This means insufficient ephemeral storage of the node.
Check whether the size of the ephemeral volume in the pod is limited. If the size of the ephemeral volume required by the application exceeds the existing capacity of the node, the application cannot be scheduled. To solve this problem, change the size of the ephemeral volume or expand the disk capacity of the node.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  volumes:
  - name: ephemeral
    emptyDir: {}
To obtain the total capacity (Capacity) and allocatable capacity (Allocatable) of the node's ephemeral storage, run the kubectl describe node command, and check the requested and limit values of ephemeral storage that have already been allocated on the node.
The following is an example of the output:
...
Capacity:
  cpu:                4
  ephemeral-storage:  61607776Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  localssd:           0
  localvolume:        0
  memory:             7614352Ki
  pods:               40
Allocatable:
  cpu:                3920m
  ephemeral-storage:  56777726268
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  localssd:           0
  localvolume:        0
  memory:             6180752Ki
  pods:               40
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1605m (40%)   6530m (166%)
  memory             2625Mi (43%)  5612Mi (92%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  localssd           0             0
  localvolume        0             0
Events:              <none>
0/1 nodes are available: 1 everest driver not found at node. This means that the everest-csi-driver add-on is not started properly on the node.
Check the DaemonSet named everest-csi-driver in the kube-system namespace and check whether its pod on the node is started properly. If not, delete the pod. The DaemonSet will recreate it.
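A minimal way to locate and restart the driver pod with kubectl (the pod name in the second command is a placeholder taken from the first command's output):

$ kubectl get pod -n kube-system -o wide | grep everest-csi-driver
$ kubectl delete pod -n kube-system <everest-csi-driver-pod-name>   # The DaemonSet automatically recreates the pod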
When a node is created, a data disk dedicated to kubelet and the container engine is attached to it. If the data disk space is insufficient, the pod cannot be created.
Solution 1: Clearing images
If the node uses containerd as its container engine, list the images and delete the ones that are no longer needed:
crictl images -v
crictl rmi <image ID>
If the node uses Docker, run the equivalent commands:
docker images
docker rmi <image ID>
Do not delete system images such as the cce-pause image. Otherwise, pods may fail to be created.
Solution 2: Expanding the disk capacity
To expand a disk capacity, perform the following steps:
Only the storage capacity of the EVS disk is expanded. You also need to perform the following steps to expand the capacity of the logical volume and file system.
How the data disk is divided depends on the container storage rootfs:
OverlayFS: No independent thin pool is allocated. Image data is stored in the dockersys volume.
# lsblk
NAME                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                     8:0    0   50G  0 disk
└─vda1                  8:1    0   50G  0 part /
vdb                     8:16   0  200G  0 disk                                 # Data disk has been expanded but not allocated
├─vgpaas-dockersys    253:0    0   90G  0 lvm  /var/lib/containerd             # Space used by the container engine
└─vgpaas-kubernetes   253:1    0   10G  0 lvm  /mnt/paas/kubernetes/kubelet    # Space used by Kubernetes
Add the new disk capacity to the dockersys logical volume used by the container engine.
Expand the physical volume so that LVM recognizes the new disk capacity:
pvresize /dev/vdb
Information similar to the following is displayed:
Physical volume "/dev/vdb" changed
1 physical volume(s) resized or updated / 0 physical volume(s) not resized
Extend the dockersys logical volume with all of the free space:
lvextend -l+100%FREE -n vgpaas/dockersys
Information similar to the following is displayed:
Size of logical volume vgpaas/dockersys changed from <90.00 GiB (23039 extents) to <190.00 GiB (48639 extents).
Logical volume vgpaas/dockersys successfully resized.
Resize the file system on the logical volume:
resize2fs /dev/vgpaas/dockersys
Information similar to the following is displayed:
Filesystem at /dev/vgpaas/dockersys is mounted on /var/lib/containerd; on-line resizing required
old_desc_blocks = 12, new_desc_blocks = 24
The filesystem on /dev/vgpaas/dockersys is now 49807360 (4k) blocks long.
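To confirm that the file system now has the expanded capacity, you can check the mount point afterwards, for example:

$ df -h /var/lib/containerd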
0/1 nodes are available: 1 Too many pods. This means that the number of pods scheduled to the node has exceeded the maximum number allowed by the node.
When creating a node, configure Max. Pods in Advanced Settings to specify the maximum number of pods that can run properly on the node. The default value varies with the node flavor. You can change the value as needed.
On the Nodes page, obtain the Pods (Allocated/Total) value of the node, and check whether the number of pods scheduled onto the node has reached the upper limit. If so, add nodes or change the maximum number of pods.
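You can also query a node's pod capacity directly with kubectl, for example:

$ kubectl get node 192.168.0.37 -o jsonpath='{.status.allocatable.pods}'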
To change the maximum number of pods that can run on a node, do as follows: