
<a name="cce_10_0132"></a>

<h1 class="topictitle1">CCE Node Problem Detector</h1>
<div id="body1544406897523"><div class="section" id="cce_10_0132__section173631312185614"><h4 class="sectiontitle">Introduction</h4><p id="cce_10_0132__p37354218618">CCE Node Problem Detector (NPD) is an add-on that monitors abnormal events of cluster nodes and connects to a third-party monitoring platform. It is a daemon running on each node. It collects node issues from different daemons and reports them to the API server. This add-on can run as a DaemonSet or a daemon.</p>
<p id="cce_10_0132__p152804433415">For more information, see <a href="https://github.com/kubernetes/node-problem-detector" target="_blank" rel="noopener noreferrer">node-problem-detector</a>.</p>
</div>
<div class="section" id="cce_10_0132__section119671349192611"><h4 class="sectiontitle">Notes and Constraints</h4><ul id="cce_10_0132__ul55121521141010"><li id="cce_10_0132__li8512142151010">When using this add-on, do not format or partition node disks.</li><li id="cce_10_0132__li151217213101">Each NPD process occupies 30m CPU (30 millicores) and 100 MiB of memory.</li><li id="cce_10_0132__li1462834917375">If the NPD version is 1.18.45 or later, the EulerOS version of the host machine must be 2.5 or later.</li></ul>
</div>
<div class="section" id="cce_10_0132__section158021093142"><h4 class="sectiontitle">Permissions</h4><p id="cce_10_0132__p17541021171417">To monitor kernel logs, the NPD add-on needs to read the host <strong id="cce_10_0132__b96321135103716">/dev/kmsg</strong>. Therefore, the privileged mode must be enabled. For details, see <a href="https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privileged" target="_blank" rel="noopener noreferrer">privileged</a>.</p>
<p id="cce_10_0132__p1992131751419">To mitigate risks, CCE follows the principle of least privilege and grants NPD only the following capabilities:</p>
<ul id="cce_10_0132__ul57753573145"><li id="cce_10_0132__li077514576147">cap_dac_read_search: permission to access <strong id="cce_10_0132__b183144419493">/run/log/journal</strong>.</li><li id="cce_10_0132__li15775205761415">cap_sys_admin: permission to access <strong id="cce_10_0132__b173812584913">/dev/kmsg</strong>.</li></ul>
</div>
<div class="section" id="cce_10_0132__section189463341114"><h4 class="sectiontitle">Installing the Add-on</h4><ol id="cce_10_0132__ol13949124616422"><li id="cce_10_0132__li13183153352515"><span>Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose <strong id="cce_10_0132__b07161840151517"><span id="cce_10_0132__text871610407151">Add-ons</span></strong>, locate <strong id="cce_10_0132__b13717104016154">CCE Node Problem Detector</strong> on the right, and click <strong id="cce_10_0132__b19717204021518">Install</strong>.</span></li><li id="cce_10_0132__li6185135511235"><span>On the <strong id="cce_10_0132__b1745119172456">Install Add-on</strong> page, configure the specifications as needed.</span><p><p id="cce_10_0132__p12804745248">You can adjust the number of add-on instances and resource quotas as required. High availability is not possible with a single pod. If an error occurs on the node where the add-on instance runs, the add-on will fail.</p>
</p></li><li id="cce_10_0132__li3450182972413"><span>Configure the add-on parameters.</span><p><p id="cce_10_0132__p152344311248"><strong id="cce_10_0132__b152741136174614">Maximum Number of Isolated Nodes in a Fault</strong>: specifies the maximum number of nodes that can be isolated to prevent avalanches in case of a fault occurring on multiple nodes. You can configure this parameter either by percentage or quantity.</p>
</p></li><li id="cce_10_0132__li155851217011"><span>Configure deployment policies for the add-on pods.</span><p><div class="note" id="cce_10_0132__cce_10_0129_note32098410561"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="cce_10_0132__cce_10_0129_ul220911419567"><li id="cce_10_0132__cce_10_0129_li152095435618">Scheduling policies do not take effect on add-on instances of the DaemonSet type.</li><li id="cce_10_0132__cce_10_0129_li1720914445612">When configuring multi-AZ deployment or node affinity, ensure that there are nodes meeting the scheduling policy and that resources are sufficient in the cluster. Otherwise, the add-on cannot run.</li></ul>
</div></div>

<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__cce_10_0129_table52109416562" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Configurations for add-on scheduling</caption><thead align="left"><tr id="cce_10_0132__cce_10_0129_row521016413569"><th align="left" class="cellrowborder" valign="top" width="24%" id="mcps1.3.4.2.4.2.2.2.3.1.1"><p id="cce_10_0132__cce_10_0129_p15210124175611">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="76%" id="mcps1.3.4.2.4.2.2.2.3.1.2"><p id="cce_10_0132__cce_10_0129_p13210142565">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__cce_10_0129_row162102049564"><td class="cellrowborder" valign="top" width="24%" headers="mcps1.3.4.2.4.2.2.2.3.1.1 "><p id="cce_10_0132__cce_10_0129_p421019416569">Multi-AZ Deployment</p>
</td>
<td class="cellrowborder" valign="top" width="76%" headers="mcps1.3.4.2.4.2.2.2.3.1.2 "><ul id="cce_10_0132__cce_10_0129_ul122101425619"><li id="cce_10_0132__cce_10_0129_li142101342560"><strong id="cce_10_0132__cce_10_0129_b14923247163911">Preferred</strong>: Deployment pods of the add-on will be preferentially scheduled to nodes in different AZs. If all the nodes in the cluster are deployed in the same AZ, the pods will be scheduled to different nodes in that AZ.</li><li id="cce_10_0132__cce_10_0129_li52682031184214"><strong id="cce_10_0132__cce_10_0129_b8203192017422">Equivalent mode</strong>: Deployment pods of the add-on are evenly scheduled to the nodes in the cluster in each AZ. If a new AZ is added, you are advised to increase add-on pods for cross-AZ HA deployment. With the Equivalent multi-AZ deployment, the difference between the number of add-on pods in different AZs will be less than or equal to 1. If resources in one of the AZs are insufficient, pods cannot be scheduled to that AZ.</li><li id="cce_10_0132__cce_10_0129_li3210440562"><strong id="cce_10_0132__cce_10_0129_b18511251183914">Forcible</strong>: Deployment pods of the add-on are forcibly scheduled to nodes in different AZs. There can be at most one pod in each AZ. If nodes in a cluster are not in different AZs, some add-on pods cannot run properly. If a node is faulty, add-on pods on it may fail to be migrated.</li></ul>
</td>
</tr>
<tr id="cce_10_0132__cce_10_0129_row1121010416566"><td class="cellrowborder" valign="top" width="24%" headers="mcps1.3.4.2.4.2.2.2.3.1.1 "><p id="cce_10_0132__cce_10_0129_p12210114165612">Node Affinity</p>
</td>
<td class="cellrowborder" valign="top" width="76%" headers="mcps1.3.4.2.4.2.2.2.3.1.2 "><ul id="cce_10_0132__cce_10_0129_ul1621054145617"><li id="cce_10_0132__cce_10_0129_li1721017413562"><strong id="cce_10_0132__cce_10_0129_b2074619819545">Not configured</strong>: Node affinity is disabled for the add-on.</li><li id="cce_10_0132__cce_10_0129_li52109417563"><strong id="cce_10_0132__cce_10_0129_b7658101316551">Specify node</strong>: Specify the nodes where the add-on is deployed. If you do not specify the nodes, the add-on will be randomly scheduled based on the default cluster scheduling policy.</li><li id="cce_10_0132__cce_10_0129_li1421015415561"><strong id="cce_10_0132__cce_10_0129_b98581358205610">Specify node pool</strong>: Specify the node pool where the add-on is deployed. If you do not specify the node pool, the add-on will be randomly scheduled based on the default cluster scheduling policy.</li><li id="cce_10_0132__cce_10_0129_li92101542568"><strong id="cce_10_0132__cce_10_0129_b634615619572">Customize affinity</strong>: Enter the labels of the nodes where the add-on is to be deployed for more flexible scheduling policies. If you do not specify node labels, the add-on will be randomly scheduled based on the default cluster scheduling policy.<p id="cce_10_0132__cce_10_0129_p19210104145617">If multiple custom affinity policies are configured, ensure that there are nodes that meet all the affinity policies in the cluster. Otherwise, the add-on cannot run.</p>
</li></ul>
</td>
</tr>
<tr id="cce_10_0132__cce_10_0129_row3210645563"><td class="cellrowborder" valign="top" width="24%" headers="mcps1.3.4.2.4.2.2.2.3.1.1 "><p id="cce_10_0132__cce_10_0129_p1821012465613">Toleration</p>
</td>
<td class="cellrowborder" valign="top" width="76%" headers="mcps1.3.4.2.4.2.2.2.3.1.2 "><p id="cce_10_0132__cce_10_0129_p11210164125619">Tolerations, used together with taints, allow (but do not force) the add-on Deployment to be scheduled to nodes with matching taints. They also control how the Deployment is evicted after the node where it runs is tainted.</p>
<p id="cce_10_0132__cce_10_0129_p19210174185613">By default, the add-on tolerates the <strong id="cce_10_0132__cce_10_0129_b17210184125619">node.kubernetes.io/not-ready</strong> and <strong id="cce_10_0132__cce_10_0129_b8210114115616">node.kubernetes.io/unreachable</strong> taints, each with a tolerance time window of 60s.</p>
<p id="cce_10_0132__cce_10_0129_p2210144135620">For details, see <a href="cce_10_0728.html">Configuring Tolerance Policies</a>.</p>
</td>
</tr>
</tbody>
</table>
</div>
</p></li><li id="cce_10_0132__li20337191915318"><span>Click <span class="uicontrol" id="cce_10_0132__uicontrol1828423720911"><b>Install</b></span>.</span></li></ol>
</div>
<div class="section" id="cce_10_0132__section0377457163618"><h4 class="sectiontitle">Components</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table1965341035819" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Add-on components</caption><thead align="left"><tr id="cce_10_0132__row1565319102582"><th align="left" class="cellrowborder" valign="top" width="23%" id="mcps1.3.5.2.2.4.1.1"><p id="cce_10_0132__p14653141018584">Component</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="55.00000000000001%" id="mcps1.3.5.2.2.4.1.2"><p id="cce_10_0132__p065391025820">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="22%" id="mcps1.3.5.2.2.4.1.3"><p id="cce_10_0132__p5653111015587">Resource Type</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__row872889165919"><td class="cellrowborder" valign="top" width="23%" headers="mcps1.3.5.2.2.4.1.1 "><p id="cce_10_0132__p019563144115">node-problem-controller</p>
</td>
<td class="cellrowborder" valign="top" width="55.00000000000001%" headers="mcps1.3.5.2.2.4.1.2 "><p id="cce_10_0132__p1455113183920">Performs basic fault isolation based on fault detection results.</p>
</td>
<td class="cellrowborder" valign="top" width="22%" headers="mcps1.3.5.2.2.4.1.3 "><p id="cce_10_0132__p772869115917">Deployment</p>
</td>
</tr>
<tr id="cce_10_0132__row2653710135812"><td class="cellrowborder" valign="top" width="23%" headers="mcps1.3.5.2.2.4.1.1 "><p id="cce_10_0132__p1352563913419">node-problem-detector</p>
</td>
<td class="cellrowborder" valign="top" width="55.00000000000001%" headers="mcps1.3.5.2.2.4.1.2 "><p id="cce_10_0132__p37055414113">Detect node faults.</p>
</td>
<td class="cellrowborder" valign="top" width="22%" headers="mcps1.3.5.2.2.4.1.3 "><p id="cce_10_0132__p365411016585">DaemonSet</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="cce_10_0132__section69115153399"><h4 class="sectiontitle">NPD Check Items</h4><div class="note" id="cce_10_0132__note173963403394"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="cce_10_0132__p5396184020391">Check items are supported only in 1.16.0 and later versions.</p>
</div></div>
<p id="cce_10_0132__p4958134764017">Check items cover events and statuses.</p>
<ul id="cce_10_0132__ul1699482505"><li id="cce_10_0132__li20691448115014">Event-related<p id="cce_10_0132__p9338344194514"><a name="cce_10_0132__li20691448115014"></a><a name="li20691448115014"></a>For event-related check items, when a problem occurs, NPD reports an event to the API server. The event type can be <strong id="cce_10_0132__b0674907539">Normal</strong> (normal event) or <strong id="cce_10_0132__b264113215531">Warning</strong> (abnormal event).</p>

<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table14820155834010" frame="border" border="1" rules="all"><caption><b>Table 3 </b>Event-related check items</caption><thead align="left"><tr id="cce_10_0132__row1287205884012"><th align="left" class="cellrowborder" valign="top" width="16%" id="mcps1.3.6.4.1.2.2.4.1.1"><p id="cce_10_0132__p7872185819403">Check Item</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="51%" id="mcps1.3.6.4.1.2.2.4.1.2"><p id="cce_10_0132__p1887295816407">Function</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33%" id="mcps1.3.6.4.1.2.2.4.1.3"><p id="cce_10_0132__p1287215814401">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__row19872758194019"><td class="cellrowborder" valign="top" width="16%" headers="mcps1.3.6.4.1.2.2.4.1.1 "><p id="cce_10_0132__p17872858134019">OOMKilling</p>
</td>
<td class="cellrowborder" valign="top" width="51%" headers="mcps1.3.6.4.1.2.2.4.1.2 "><p id="cce_10_0132__p1987215816404">Monitor kernel logs and report an event when an out-of-memory (OOM) kill occurs.</p>
<p id="cce_10_0132__p160216314713">Typical scenario: When the memory usage of a process in a container exceeds the limit, OOM is triggered and the process is terminated.</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.6.4.1.2.2.4.1.3 "><p id="cce_10_0132__p487265818405">Warning event</p>
<p id="cce_10_0132__p12872117515">Listening object: <strong id="cce_10_0132__b137891565425">/dev/kmsg</strong></p>
<p id="cce_10_0132__p6872710520">Matching rule: "Killed process \\d+ (.+) total-vm:\\d+kB, anon-rss:\\d+kB, file-rss:\\d+kB.*"</p>
</td>
</tr>
<tr id="cce_10_0132__row2087225814405"><td class="cellrowborder" valign="top" width="16%" headers="mcps1.3.6.4.1.2.2.4.1.1 "><p id="cce_10_0132__p1387245820403">TaskHung</p>
</td>
<td class="cellrowborder" valign="top" width="51%" headers="mcps1.3.6.4.1.2.2.4.1.2 "><p id="cce_10_0132__p18872155844013">Monitor kernel logs and report an event when a hung task is detected.</p>
<p id="cce_10_0132__p2317112419618">Typical scenario: Disk I/O suspension causes process suspension.</p>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.6.4.1.2.2.4.1.3 "><p id="cce_10_0132__p16872758114011">Warning event</p>
<p id="cce_10_0132__p173071159247">Listening object: <strong id="cce_10_0132__b1783717768">/dev/kmsg</strong></p>
<p id="cce_10_0132__p153079591241">Matching rule: "task \\S+:\\w+ blocked for more than \\w+ seconds\\."</p>
</td>
</tr>
<tr id="cce_10_0132__row137852513316"><td class="cellrowborder" valign="top" width="16%" headers="mcps1.3.6.4.1.2.2.4.1.1 "><p id="cce_10_0132__p147857511316">ReadonlyFilesystem</p>
</td>
<td class="cellrowborder" valign="top" width="51%" headers="mcps1.3.6.4.1.2.2.4.1.2 "><p id="cce_10_0132__p1434416191840">Check whether the <strong id="cce_10_0132__b1150242118561">Remount root filesystem read-only</strong> error occurs in the system kernel by listening to the kernel logs.</p>
<p id="cce_10_0132__p15344719349">Typical scenario: A user detaches a data disk from a node by mistake on the ECS, and applications continuously write data to the mount point of the data disk. As a result, an I/O error occurs in the kernel and the disk is remounted as a read-only disk.</p>
<div class="note" id="cce_10_0132__note181261749101412"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="cce_10_0132__p111261491145">If the rootfs of node pods is of the device mapper type, an error will occur in the thin pool if a data disk is detached. This will affect NPD and NPD will not be able to detect node faults.</p>
</div></div>
</td>
<td class="cellrowborder" valign="top" width="33%" headers="mcps1.3.6.4.1.2.2.4.1.3 "><p id="cce_10_0132__p944535317711">Warning event</p>
<p id="cce_10_0132__p183981710948">Listening object: <strong id="cce_10_0132__b1235710589">/dev/kmsg</strong></p>
<p id="cce_10_0132__p83993101042">Matching rule: <strong id="cce_10_0132__b15200817134219">Remounting filesystem read-only</strong></p>
</td>
</tr>
</tbody>
</table>
</div>
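<p>The matching rules in the preceding table can be tried against sample kernel log lines. The following is a minimal Python sketch, not the add-on's implementation. It assumes that the double backslashes in the table are JSON escaping, so the rules are written here as Python raw strings with single backslashes. The <strong>classify</strong> helper and the sample log lines are hypothetical.</p>

```python
import re

# Rules copied from the table above, unescaped for use as Python raw strings.
RULES = {
    "OOMKilling": r"Killed process \d+ (.+) total-vm:\d+kB, anon-rss:\d+kB, file-rss:\d+kB.*",
    "TaskHung": r"task \S+:\w+ blocked for more than \w+ seconds\.",
    "ReadonlyFilesystem": r"Remounting filesystem read-only",
}


def classify(line):
    """Return the name of the first check item whose rule matches, else None."""
    for name, pattern in RULES.items():
        if re.search(pattern, line):
            return name
    return None


# Hypothetical kernel log lines, for illustration only.
samples = [
    "Killed process 1234 (java) total-vm:2048000kB, anon-rss:1024000kB, file-rss:0kB",
    "INFO: task jbd2/sda1-8:345 blocked for more than 120 seconds.",
    "EXT4-fs (sda1): Remounting filesystem read-only",
    "usb 1-1: new high-speed USB device",
]
for line in samples:
    print(classify(line), "-", line)
```

<p>A line that matches none of the rules yields <strong>None</strong>, which corresponds to no event being reported.</p>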
</li><li id="cce_10_0132__li29881573504">Status-related<p id="cce_10_0132__p2700175815517"><a name="cce_10_0132__li29881573504"></a><a name="li29881573504"></a>For status-related check items, when a problem occurs, NPD reports an event to the API server and changes the node status synchronously. This function can be used together with <a href="#cce_10_0132__section1471610580474">Node-problem-controller fault isolation</a> to isolate nodes.</p>
<p id="cce_10_0132__p123464919476"><strong id="cce_10_0132__b289275011010">If the check period is not specified in the following check items, the default period is 30 seconds.</strong></p>

<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table5966193210414" frame="border" border="1" rules="all"><caption><b>Table 4 </b>Checking system components</caption><thead align="left"><tr id="cce_10_0132__row1439833124119"><th align="left" class="cellrowborder" valign="top" width="29.75%" id="mcps1.3.6.4.2.3.2.4.1.1"><p id="cce_10_0132__p039183318412">Check Item</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.97%" id="mcps1.3.6.4.2.3.2.4.1.2"><p id="cce_10_0132__p8397332416">Function</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="36.28%" id="mcps1.3.6.4.2.3.2.4.1.3"><p id="cce_10_0132__p153920338415">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__row43634312108"><td class="cellrowborder" valign="top" width="29.75%" headers="mcps1.3.6.4.2.3.2.4.1.1 "><p id="cce_10_0132__p230684616478">Container network component error</p>
<p id="cce_10_0132__p85821221101410">CNIProblem</p>
</td>
<td class="cellrowborder" valign="top" width="33.97%" headers="mcps1.3.6.4.2.3.2.4.1.2 "><p id="cce_10_0132__p2182113131012">Check the status of the CNI components (container network components).</p>
</td>
<td class="cellrowborder" valign="top" width="36.28%" headers="mcps1.3.6.4.2.3.2.4.1.3 "><p id="cce_10_0132__p10182183131016">None</p>
</td>
</tr>
<tr id="cce_10_0132__row1832083812107"><td class="cellrowborder" valign="top" width="29.75%" headers="mcps1.3.6.4.2.3.2.4.1.1 "><p id="cce_10_0132__p192511642175119">Container runtime component error</p>
<p id="cce_10_0132__p189016404149">CRIProblem</p>
</td>
<td class="cellrowborder" valign="top" width="33.97%" headers="mcps1.3.6.4.2.3.2.4.1.2 "><p id="cce_10_0132__p19541247101019">Check the status of Docker and containerd of the CRI components (container runtime components).</p>
</td>
<td class="cellrowborder" valign="top" width="36.28%" headers="mcps1.3.6.4.2.3.2.4.1.3 "><p id="cce_10_0132__p1954154717105">Check object: Docker or containerd</p>
</td>
</tr>
<tr id="cce_10_0132__row133983316414"><td class="cellrowborder" valign="top" width="29.75%" headers="mcps1.3.6.4.2.3.2.4.1.1 "><p id="cce_10_0132__p1273874481418">Frequent restarts of Kubelet</p>
<p id="cce_10_0132__p260214517150">FrequentKubeletRestart</p>
</td>
<td class="cellrowborder" valign="top" width="33.97%" headers="mcps1.3.6.4.2.3.2.4.1.2 "><p id="cce_10_0132__p187251156226">Periodically backtrack system logs to check whether the key component Kubelet restarts frequently.</p>
</td>
<td class="cellrowborder" rowspan="3" valign="top" width="36.28%" headers="mcps1.3.6.4.2.3.2.4.1.3 "><ul id="cce_10_0132__ul15361156122"><li id="cce_10_0132__li14361515126">Default threshold: 10 restarts within 10 minutes<p id="cce_10_0132__p9122024101116"><a name="cce_10_0132__li14361515126"></a><a name="li14361515126"></a>For example, if kubelet restarts 10 times within 10 minutes, it is considered to be restarting frequently and a fault alarm is generated.</p>
</li><li id="cce_10_0132__li33695151213">Listening object: logs in the <strong id="cce_10_0132__b745058162910">/run/log/journal</strong> directory</li></ul>
<div class="note" id="cce_10_0132__note755113461253"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="cce_10_0132__p159646492519">The Ubuntu and HCE 2.0 OSs do not support the preceding check items due to incompatible log formats.</p>
</div></div>
</td>
</tr>
<tr id="cce_10_0132__row639103354117"><td class="cellrowborder" valign="top" headers="mcps1.3.6.4.2.3.2.4.1.1 "><p id="cce_10_0132__p14986347185113">Frequent restarts of Docker</p>
<p id="cce_10_0132__p194684577142">FrequentDockerRestart</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.6.4.2.3.2.4.1.2 "><p id="cce_10_0132__p939933174112">Periodically backtrack system logs to check whether the container runtime Docker restarts frequently.</p>
</td>
</tr>
<tr id="cce_10_0132__row839733124114"><td class="cellrowborder" valign="top" headers="mcps1.3.6.4.2.3.2.4.1.1 "><p id="cce_10_0132__p184432453516">Frequent restarts of containerd</p>
<p id="cce_10_0132__p1447913571273">FrequentContainerdRestart</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.6.4.2.3.2.4.1.2 "><p id="cce_10_0132__p193311313312">Periodically backtrack system logs to check whether the container runtime containerd restarts frequently.</p>
</td>
</tr>
<tr id="cce_10_0132__row639123312418"><td class="cellrowborder" valign="top" width="29.75%" headers="mcps1.3.6.4.2.3.2.4.1.1 "><p id="cce_10_0132__p1237611179529">kubelet error</p>
<p id="cce_10_0132__p59951012101515">KubeletProblem</p>
</td>
<td class="cellrowborder" valign="top" width="33.97%" headers="mcps1.3.6.4.2.3.2.4.1.2 "><p id="cce_10_0132__p1639233134114">Check the status of the key component Kubelet.</p>
</td>
<td class="cellrowborder" valign="top" width="36.28%" headers="mcps1.3.6.4.2.3.2.4.1.3 "><p id="cce_10_0132__p1239133334118">None</p>
</td>
</tr>
<tr id="cce_10_0132__row6215145918531"><td class="cellrowborder" valign="top" width="29.75%" headers="mcps1.3.6.4.2.3.2.4.1.1 "><p id="cce_10_0132__p6413139195215">kube-proxy error</p>
<p id="cce_10_0132__p181881518181514">KubeProxyProblem</p>
</td>
<td class="cellrowborder" valign="top" width="33.97%" headers="mcps1.3.6.4.2.3.2.4.1.2 "><p id="cce_10_0132__p41511802546">Check the status of the key component kube-proxy.</p>
</td>
<td class="cellrowborder" valign="top" width="36.28%" headers="mcps1.3.6.4.2.3.2.4.1.3 "><p id="cce_10_0132__p1115120155419">None</p>
</td>
</tr>
</tbody>
</table>
</div>
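<p>The frequent-restart threshold in the preceding table (10 restarts within 10 minutes) amounts to counting restart timestamps in a time window. The following Python sketch illustrates that idea only. How NPD actually extracts restart times from the journal is not shown, and the <strong>restarts_frequently</strong> function and its parameters are hypothetical names.</p>

```python
from datetime import datetime, timedelta


def restarts_frequently(restart_times, now, threshold=10, window=timedelta(minutes=10)):
    """Return True if at least `threshold` restarts fall within `window` before `now`."""
    # Keep only restarts whose age is between zero and the window length.
    recent = [t for t in restart_times if window >= (now - t) >= timedelta(0)]
    return len(recent) >= threshold


now = datetime(2024, 1, 1, 12, 0, 0)
# Ten restarts, 50 seconds apart: all fall inside the 10-minute window.
burst = [now - timedelta(seconds=50 * i) for i in range(10)]
# Ten restarts, 2 minutes apart: only six fall inside the window.
spread = [now - timedelta(minutes=2 * i) for i in range(10)]

print(restarts_frequently(burst, now))
print(restarts_frequently(spread, now))
```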

<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table493185102713" frame="border" border="1" rules="all"><caption><b>Table 5 </b>Checking system metrics</caption><thead align="left"><tr id="cce_10_0132__row6932155122711"><th align="left" class="cellrowborder" valign="top" width="22.93%" id="mcps1.3.6.4.2.4.2.4.1.1"><p id="cce_10_0132__p159321857271">Check Item</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="40.82%" id="mcps1.3.6.4.2.4.2.4.1.2"><p id="cce_10_0132__p593216512271">Function</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="36.25%" id="mcps1.3.6.4.2.4.2.4.1.3"><p id="cce_10_0132__p1293212516270">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__row18250140195913"><td class="cellrowborder" valign="top" width="22.93%" headers="mcps1.3.6.4.2.4.2.4.1.1 "><p id="cce_10_0132__p8306646174712">Conntrack table full</p>
<p id="cce_10_0132__p141952518172">ConntrackFullProblem</p>
</td>
<td class="cellrowborder" valign="top" width="40.82%" headers="mcps1.3.6.4.2.4.2.4.1.2 "><p id="cce_10_0132__p1715712185912">Check whether the conntrack table is full.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.4.2.4.1.3 "><ul id="cce_10_0132__ul15378114164118"><li id="cce_10_0132__li937854144118">Default threshold: 90%</li></ul>
<ul id="cce_10_0132__ul137698406424"><li id="cce_10_0132__li6769104014425">Usage: <strong id="cce_10_0132__b136551027115912">nf_conntrack_count</strong></li><li id="cce_10_0132__li676944064214">Maximum value: <strong id="cce_10_0132__b198361535135914">nf_conntrack_max</strong></li></ul>
</td>
</tr>
<tr id="cce_10_0132__row1950431814401"><td class="cellrowborder" valign="top" width="22.93%" headers="mcps1.3.6.4.2.4.2.4.1.1 "><p id="cce_10_0132__p36813213539">Insufficient disk resources</p>
<p id="cce_10_0132__p42911646121819">DiskProblem</p>
</td>
<td class="cellrowborder" valign="top" width="40.82%" headers="mcps1.3.6.4.2.4.2.4.1.2 "><p id="cce_10_0132__p17441915115519">Check the usage of the system disk and CCE data disks (including the CRI logical disk and kubelet logical disk) on the node.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.4.2.4.1.3 "><ul id="cce_10_0132__ul4388122614120"><li id="cce_10_0132__li16388126201219">Default threshold: 90%</li><li id="cce_10_0132__li123880266127">Source:<pre class="screen" id="cce_10_0132__screen18388426171216">df -h</pre>
</li></ul>
<p id="cce_10_0132__p5388226151218">Currently, additional data disks are not supported.</p>
</td>
</tr>
<tr id="cce_10_0132__row7349154113409"><td class="cellrowborder" valign="top" width="22.93%" headers="mcps1.3.6.4.2.4.2.4.1.1 "><p id="cce_10_0132__p793565214526">Insufficient file handles</p>
<p id="cce_10_0132__p052091171916">FDProblem</p>
</td>
<td class="cellrowborder" valign="top" width="40.82%" headers="mcps1.3.6.4.2.4.2.4.1.2 "><p id="cce_10_0132__p8441131525518">Check whether file descriptors (FDs) are used up.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.4.2.4.1.3 "><ul id="cce_10_0132__ul1622954025812"><li id="cce_10_0132__li92294407583">Default threshold: 90%</li><li id="cce_10_0132__li12298402586">Usage: the first value in <strong id="cce_10_0132__b19956519133313">/proc/sys/fs/file-nr</strong></li><li id="cce_10_0132__li11229104019586">Maximum value: the third value in <strong id="cce_10_0132__b1671227113313">/proc/sys/fs/file-nr</strong></li></ul>
</td>
</tr>
<tr id="cce_10_0132__row4860175711407"><td class="cellrowborder" valign="top" width="22.93%" headers="mcps1.3.6.4.2.4.2.4.1.1 "><p id="cce_10_0132__p46795813527">Insufficient node memory</p>
<p id="cce_10_0132__p971762919196">MemoryProblem</p>
</td>
<td class="cellrowborder" valign="top" width="40.82%" headers="mcps1.3.6.4.2.4.2.4.1.2 "><p id="cce_10_0132__p24411015115516">Check whether memory is used up.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.4.2.4.1.3 "><ul id="cce_10_0132__ul1031348901"><li id="cce_10_0132__li133131781208">Default threshold: 80%</li><li id="cce_10_0132__li14313281409">Usage: <strong id="cce_10_0132__b471916287359">MemTotal-MemAvailable</strong> in <strong id="cce_10_0132__b46373110357">/proc/meminfo</strong></li><li id="cce_10_0132__li1131319819010">Maximum value: <strong id="cce_10_0132__b1280119350352">MemTotal</strong> in <strong id="cce_10_0132__b19312153853515">/proc/meminfo</strong></li></ul>
</td>
</tr>
<tr id="cce_10_0132__row11292101316587"><td class="cellrowborder" valign="top" width="22.93%" headers="mcps1.3.6.4.2.4.2.4.1.1 "><p id="cce_10_0132__p79371748105318">Insufficient process resources</p>
<p id="cce_10_0132__p7989035121918">PIDProblem</p>
</td>
<td class="cellrowborder" valign="top" width="40.82%" headers="mcps1.3.6.4.2.4.2.4.1.2 "><p id="cce_10_0132__p1244141515511">Check whether PID process resources are exhausted.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.4.2.4.1.3 "><ul id="cce_10_0132__ul20671412580"><li id="cce_10_0132__li1611143589">Default threshold: 90%</li><li id="cce_10_0132__li5651445818">Usage: <strong id="cce_10_0132__b1861118191124">nr_threads in /proc/loadavg</strong></li><li id="cce_10_0132__li16619146580">Maximum value: the smaller of <strong id="cce_10_0132__b51811252332">/proc/sys/kernel/pid_max</strong> and <strong id="cce_10_0132__b16653128163312">/proc/sys/kernel/threads-max</strong></li></ul>
</td>
</tr>
</tbody>
</table>
</div>
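<p>Each metric check in the preceding table compares a usage value with a maximum value. The following Python sketch shows how the file handle and memory ratios could be derived from the listed sources. It parses sample text instead of reading the real files, the function names are hypothetical, and treating the threshold as inclusive (usage greater than or equal to the threshold) is an assumption.</p>

```python
def fd_usage(file_nr_text):
    """Usage ratio from /proc/sys/fs/file-nr content: first value / third value."""
    fields = file_nr_text.split()
    return int(fields[0]) / int(fields[2])


def memory_usage(meminfo_text):
    """Usage ratio from /proc/meminfo content: (MemTotal - MemAvailable) / MemTotal."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])  # value in kB
    return (values["MemTotal"] - values["MemAvailable"]) / values["MemTotal"]


def exceeds(usage, threshold):
    """Assumed comparison: a problem is flagged when usage reaches the threshold."""
    return usage >= threshold


# Hypothetical sample contents, for illustration only.
sample_file_nr = "9500\t0\t10000\n"
sample_meminfo = "MemTotal:       16000000 kB\nMemFree:         1000000 kB\nMemAvailable:    4000000 kB\n"

print(exceeds(fd_usage(sample_file_nr), 0.90))          # FDProblem, default threshold 90%
print(exceeds(memory_usage(sample_meminfo), 0.80))      # MemoryProblem, default threshold 80%
```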

<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table24948715278" frame="border" border="1" rules="all"><caption><b>Table 6 </b>Checking the storage</caption><thead align="left"><tr id="cce_10_0132__row94951770273"><th align="left" class="cellrowborder" valign="top" width="24.2%" id="mcps1.3.6.4.2.5.2.4.1.1"><p id="cce_10_0132__p34951715272">Check Item</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="39.550000000000004%" id="mcps1.3.6.4.2.5.2.4.1.2"><p id="cce_10_0132__p164953713276">Function</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="36.25%" id="mcps1.3.6.4.2.5.2.4.1.3"><p id="cce_10_0132__p174955714276">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__row34978752711"><td class="cellrowborder" valign="top" width="24.2%" headers="mcps1.3.6.4.2.5.2.4.1.1 "><p id="cce_10_0132__p1520151125414">Disk read-only</p>
<p id="cce_10_0132__p13529143342016">DiskReadonly</p>
</td>
<td class="cellrowborder" valign="top" width="39.550000000000004%" headers="mcps1.3.6.4.2.5.2.4.1.2 "><p id="cce_10_0132__p249713713271">Periodically perform write tests on the system disk and CCE data disks (including the CRI logical disk and Kubelet logical disk) of the node to check the availability of key disks.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.5.2.4.1.3 "><p id="cce_10_0132__p164974742711">Detection paths:</p>
<ul id="cce_10_0132__ul15871334132818"><li id="cce_10_0132__li19587234112815">/mnt/paas/kubernetes/kubelet/</li><li id="cce_10_0132__li19808033154415">/var/lib/docker/</li><li id="cce_10_0132__li1232314353441">/var/lib/containerd/</li><li id="cce_10_0132__li445844115448">/var/paas/sys/log/cceaddon-npd/</li></ul>
<p id="cce_10_0132__p925152084514">The temporary file <strong id="cce_10_0132__b6967103431319">npd-disk-write-ping</strong> is generated in the detection path.</p>
<p id="cce_10_0132__p17833111174618">Currently, additional data disks are not supported.</p>
</td>
</tr>
<tr id="cce_10_0132__row652213985816"><td class="cellrowborder" valign="top" width="24.2%" headers="mcps1.3.6.4.2.5.2.4.1.1 "><p id="cce_10_0132__p1661435411711">emptyDir storage pool error</p>
<p id="cce_10_0132__p19824134712129">EmptyDirVolumeGroupStatusError</p>
</td>
<td class="cellrowborder" valign="top" width="39.550000000000004%" headers="mcps1.3.6.4.2.5.2.4.1.2 "><p id="cce_10_0132__p45224397584">Check whether the ephemeral volume group on the node is normal.</p>
<p id="cce_10_0132__p1964614101539">Impact: Pods that depend on the storage pool cannot write data to the temporary volume. The temporary volume is remounted as a read-only file system by the kernel due to an I/O error.</p>
<p id="cce_10_0132__p1649544615344">Typical scenario: When creating a node, a user configures two data disks as an ephemeral volume storage pool. Some data disks are deleted by mistake. As a result, the storage pool becomes abnormal.</p>
</td>
<td class="cellrowborder" rowspan="2" valign="top" width="36.25%" headers="mcps1.3.6.4.2.5.2.4.1.3 "><ul id="cce_10_0132__ul694675316171"><li id="cce_10_0132__li294635331719">Detection period: 30s</li><li id="cce_10_0132__li1194655311712">Source:<pre class="screen" id="cce_10_0132__screen1730112457179">vgs -o vg_name,vg_attr</pre>
</li><li id="cce_10_0132__li179461853101720">Principle: Check whether the VG (storage pool) is in the P state. If yes, some PVs (data disks) are lost.</li><li id="cce_10_0132__li6946145317178">Joint scheduling: The scheduler can automatically identify a PV storage pool error and prevent pods that depend on the storage pool from being scheduled to the node.</li><li id="cce_10_0132__li1194619530173">Exceptional scenario: If all PVs (data disks) are lost, the VG (storage pool) itself is lost and the NPD add-on cannot detect the problem. In this case, kubelet automatically isolates the node, detects the loss of the VG (storage pool), and sets the corresponding resources in <strong id="cce_10_0132__b17637957111913">nodestatus.allocatable</strong> to <strong id="cce_10_0132__b1086015931917">0</strong>. This prevents pods that depend on the storage pool from being scheduled to the node. Damage to a single PV is not detected by this check item; it is detected by the ReadonlyFilesystem check item.</li></ul>
</td>
</tr>
<tr id="cce_10_0132__row102785374911"><td class="cellrowborder" valign="top" headers="mcps1.3.6.4.2.5.2.4.1.1 "><p id="cce_10_0132__p8529108185">PV storage pool error</p>
<p id="cce_10_0132__p207911347151214">LocalPvVolumeGroupStatusError</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.6.4.2.5.2.4.1.2 "><p id="cce_10_0132__p027916334910">Check whether the persistent volume group on the node is normal.</p>
<p id="cce_10_0132__p063762014315">Impact: Pods that depend on the storage pool cannot write data to the persistent volume. The persistent volume is remounted as a read-only file system by the kernel due to an I/O error.</p>
<p id="cce_10_0132__p5448583352">Typical scenario: When creating a node, a user configures two data disks as a persistent volume storage pool. Some data disks are deleted by mistake.</p>
</td>
</tr>
<tr id="cce_10_0132__row846942934417"><td class="cellrowborder" valign="top" width="24.2%" headers="mcps1.3.6.4.2.5.2.4.1.1 "><p id="cce_10_0132__p18192172115182">Mount point error</p>
<p id="cce_10_0132__p154691829104415">MountPointProblem</p>
</td>
<td class="cellrowborder" valign="top" width="39.550000000000004%" headers="mcps1.3.6.4.2.5.2.4.1.2 "><p id="cce_10_0132__p1747082915445">Check the mount point on the node.</p>
<p id="cce_10_0132__p1528212414385">Exceptional definition: You cannot access the mount point by running the <strong id="cce_10_0132__b1364375693213">cd</strong> command.</p>
<p id="cce_10_0132__p11881753183217">Typical scenario: A network file system (NFS), such as obsfs or s3fs, is mounted to a node. When the connection becomes abnormal due to network or peer NFS server exceptions, all processes that access the mount point are suspended. For example, during a cluster upgrade, kubelet is restarted and scans all mount points. If an abnormal mount point is detected, the upgrade fails.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.5.2.4.1.3 "><p id="cce_10_0132__p141531643143416">Alternatively, you can run the following command:</p>
<pre class="screen" id="cce_10_0132__screen83631028111316">for dir in `df -h | grep -v "Mounted on" | awk "{print \\$NF}"`;do cd $dir; done &amp;&amp; echo "ok"</pre>
</td>
</tr>
<tr id="cce_10_0132__row12162627450"><td class="cellrowborder" valign="top" width="24.2%" headers="mcps1.3.6.4.2.5.2.4.1.1 "><p id="cce_10_0132__p106116421182">Suspended disk I/O</p>
<p id="cce_10_0132__p1516219213453">DiskHung</p>
</td>
<td class="cellrowborder" valign="top" width="39.550000000000004%" headers="mcps1.3.6.4.2.5.2.4.1.2 "><p id="cce_10_0132__p1162182184513">Check whether I/O suspension occurs on all disks on the node, that is, whether I/O read and write operations receive no response.</p>
<p id="cce_10_0132__p76931555500">Definition of I/O suspension: The system does not respond to disk I/O requests, and some processes are in the D state.</p>
<p id="cce_10_0132__p8166365514">Typical scenario: Disks cannot respond due to abnormal OS hard disk drivers or severe faults on the underlying network.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.5.2.4.1.3 "><ul id="cce_10_0132__ul44097139534"><li id="cce_10_0132__li178071253283">Check object: all data disks</li><li id="cce_10_0132__li7807165162812">Source:<p id="cce_10_0132__p72111118317"><a name="cce_10_0132__li7807165162812"></a><a name="li7807165162812"></a>/proc/diskstats</p>
<div class="p" id="cce_10_0132__p15201512173110">Alternatively, you can run the following command:<pre class="screen" id="cce_10_0132__screen59331319172519">iostat -xmt 1</pre>
</div>
</li><li id="cce_10_0132__li1880718514285">Thresholds (all of the following conditions must be met):<ul id="cce_10_0132__ul268520417114"><li id="cce_10_0132__li1068554191110">Average usage (<strong id="cce_10_0132__b10545171881515">ioutil</strong>) ≥ 0.99</li><li id="cce_10_0132__li659354341220">Average I/O queue length (<strong id="cce_10_0132__b11665145417149">avgqu-sz</strong>) ≥ 1</li><li id="cce_10_0132__li149951244195818">Average I/O transfer volume ≤ 1<p id="cce_10_0132__p86100454582"><a name="cce_10_0132__li149951244195818"></a><a name="li149951244195818"></a>Average I/O transfer volume = Number of writes completed per second (<strong id="cce_10_0132__b15014158145">iops</strong>, unit: w/s) + Amount of data written per second (<strong id="cce_10_0132__b1459618229145">ioth</strong>, unit: wMB/s)</p>
</li></ul>
<div class="note" id="cce_10_0132__note10906163042814"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="cce_10_0132__p0853240102816">In some OSs, these statistics do not change while I/O is suspended. In this case, the CPU I/O wait time is checked instead, and the value of <strong id="cce_10_0132__b7847339111419">iowait</strong> must be greater than 0.8.</p>
</div></div>
</li></ul>
</td>
</tr>
<tr id="cce_10_0132__row3985171574511"><td class="cellrowborder" valign="top" width="24.2%" headers="mcps1.3.6.4.2.5.2.4.1.1 "><p id="cce_10_0132__p159915595188">Slow disk I/O</p>
<p id="cce_10_0132__p69851115204518">DiskSlow</p>
</td>
<td class="cellrowborder" valign="top" width="39.550000000000004%" headers="mcps1.3.6.4.2.5.2.4.1.2 "><p id="cce_10_0132__p1398571574515">Check whether all disks on the node have slow I/Os, that is, whether I/Os respond slowly.</p>
<p id="cce_10_0132__p8921802560">Typical scenario: EVS disks have slow I/Os due to network fluctuation.</p>
</td>
<td class="cellrowborder" valign="top" width="36.25%" headers="mcps1.3.6.4.2.5.2.4.1.3 "><ul id="cce_10_0132__ul6519330155316"><li id="cce_10_0132__li128591254132810">Check object: all data disks</li><li id="cce_10_0132__li1285912549289">Source:<p id="cce_10_0132__p1748722411311"><a name="cce_10_0132__li1285912549289"></a><a name="li1285912549289"></a>/proc/diskstats</p>
<div class="p" id="cce_10_0132__p1029519252314">Alternatively, you can run the following command:<pre class="screen" id="cce_10_0132__screen168915347212">iostat -xmt 1</pre>
</div>
</li><li id="cce_10_0132__li1785935492816">Default threshold:<p id="cce_10_0132__p6917738102914"><a name="cce_10_0132__li1785935492816"></a><a name="li1785935492816"></a>Average I/O latency (<strong id="cce_10_0132__b123010620169">await</strong>) ≥ 5000 ms</p>
</li></ul>
<div class="note" id="cce_10_0132__note15410023122915"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="cce_10_0132__p178314521752">If I/O requests receive no response and the <strong id="cce_10_0132__b230416205719">await</strong> data is not updated, this check item does not take effect.</p>
</div></div>
</td>
</tr>
</tbody>
</table>
</div>
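<p>The DiskHung and DiskSlow thresholds above can be read as simple predicates over one disk sample. The following Python sketch is illustrative only and is not the add-on's implementation: the <strong>DiskSample</strong> type and both function names are hypothetical, and the field values are assumed to have already been derived from /proc/diskstats or <strong>iostat -xmt 1</strong>.</p>

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """One averaged sample for a data disk (hypothetical container type)."""
    ioutil: float    # average usage, as a fraction between 0.0 and 1.0
    avgqu_sz: float  # average I/O queue length
    iops: float      # writes completed per second (w/s)
    ioth: float      # data written per second (wMB/s)


def is_disk_hung(sample: DiskSample) -> bool:
    """Return True only if all three DiskHung conditions are met."""
    transfer_volume = sample.iops + sample.ioth
    return (
        sample.ioutil >= 0.99
        and sample.avgqu_sz >= 1
        and transfer_volume <= 1
    )


def is_disk_slow(await_ms: float, threshold_ms: float = 5000) -> bool:
    """Return True if the average I/O latency reaches the DiskSlow threshold."""
    return await_ms >= threshold_ms
```

<p>A disk that is fully busy with a non-empty queue but completes almost no writes matches DiskHung, whereas a busy disk that is still transferring data does not.</p>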

<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table991817246551" frame="border" border="1" rules="all"><caption><b>Table 7 </b>Other check items</caption><thead align="left"><tr id="cce_10_0132__row17919924115518"><th align="left" class="cellrowborder" valign="top" width="23.112311231123112%" id="mcps1.3.6.4.2.6.2.4.1.1"><p id="cce_10_0132__p1891915245556">Check Item</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="40.49404940494049%" id="mcps1.3.6.4.2.6.2.4.1.2"><p id="cce_10_0132__p69192249556">Function</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="36.393639363936394%" id="mcps1.3.6.4.2.6.2.4.1.3"><p id="cce_10_0132__p89196249551">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__row8192230115610"><td class="cellrowborder" valign="top" width="23.112311231123112%" headers="mcps1.3.6.4.2.6.2.4.1.1 "><p id="cce_10_0132__p3451528191910">Abnormal NTP</p>
<p id="cce_10_0132__p1211593117560">NTPProblem</p>
</td>
<td class="cellrowborder" valign="top" width="40.49404940494049%" headers="mcps1.3.6.4.2.6.2.4.1.2 "><p id="cce_10_0132__p311543119562">Check whether the node clock synchronization service (ntpd or chronyd) is running properly and whether the system time has drifted.</p>
</td>
<td class="cellrowborder" valign="top" width="36.393639363936394%" headers="mcps1.3.6.4.2.6.2.4.1.3 "><p id="cce_10_0132__p10115123114567">Default clock offset threshold: 8000 ms</p>
</td>
</tr>
<tr id="cce_10_0132__row1931541391210"><td class="cellrowborder" valign="top" width="23.112311231123112%" headers="mcps1.3.6.4.2.6.2.4.1.1 "><p id="cce_10_0132__p1254725281912">Process D error</p>
<p id="cce_10_0132__p23709741314">ProcessD</p>
</td>
<td class="cellrowborder" valign="top" width="40.49404940494049%" headers="mcps1.3.6.4.2.6.2.4.1.2 "><p id="cce_10_0132__p73701476137">Check whether the node has processes in D state.</p>
</td>
<td class="cellrowborder" rowspan="2" valign="top" width="36.393639363936394%" headers="mcps1.3.6.4.2.6.2.4.1.3 "><p id="cce_10_0132__p18976103221315">Default threshold: 10 abnormal processes detected in three consecutive checks</p>
<p id="cce_10_0132__p173704791313">Source:</p>
<ul id="cce_10_0132__ul1737010716138"><li id="cce_10_0132__li137012741320">/proc/{PID}/stat</li><li id="cce_10_0132__li83707781318">Alternatively, you can run the <strong id="cce_10_0132__b731118224388">ps aux</strong> command.</li></ul>
<p id="cce_10_0132__p1860519498129"></p>
</td>
</tr>
<tr id="cce_10_0132__row0605114920127"><td class="cellrowborder" valign="top" headers="mcps1.3.6.4.2.6.2.4.1.1 "><p id="cce_10_0132__p13814149206">Process Z error</p>
<p id="cce_10_0132__p1832332111319">ProcessZ</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.6.4.2.6.2.4.1.2 "><p id="cce_10_0132__p932392141311">Check whether the node has processes in Z state.</p>
</td>
</tr>
<tr id="cce_10_0132__row161198651218"><td class="cellrowborder" valign="top" width="23.112311231123112%" headers="mcps1.3.6.4.2.6.2.4.1.1 "><p id="cce_10_0132__p12496026172018">ResolvConf error</p>
<p id="cce_10_0132__p123412751218">ResolvConfFileProblem</p>
</td>
<td class="cellrowborder" valign="top" width="40.49404940494049%" headers="mcps1.3.6.4.2.6.2.4.1.2 "><p id="cce_10_0132__p2034187111218">Check whether the ResolvConf file is lost.</p>
<p id="cce_10_0132__p734137101215">Check whether the ResolvConf file is normal.</p>
<p id="cce_10_0132__p123413711218">Exceptional definition: No upstream domain name resolution server (nameserver) is included.</p>
</td>
<td class="cellrowborder" valign="top" width="36.393639363936394%" headers="mcps1.3.6.4.2.6.2.4.1.3 "><p id="cce_10_0132__p5347713123">Object: <strong id="cce_10_0132__b83131505377">/etc/resolv.conf</strong></p>
</td>
</tr>
<tr id="cce_10_0132__row164381610111213"><td class="cellrowborder" valign="top" width="23.112311231123112%" headers="mcps1.3.6.4.2.6.2.4.1.1 "><p id="cce_10_0132__p9227173419208">Existing scheduled event</p>
<p id="cce_10_0132__p12438141013123">ScheduledEvent</p>
</td>
<td class="cellrowborder" valign="top" width="40.49404940494049%" headers="mcps1.3.6.4.2.6.2.4.1.2 "><p id="cce_10_0132__p164381010121213">Check whether scheduled live migration events exist on the node. A scheduled live migration event is usually triggered by a hardware fault and is an automatic fault rectification method at the IaaS layer.</p>
<p id="cce_10_0132__p35542221815">Typical scenario: The host is faulty. For example, the fan is damaged or the disk has bad sectors. As a result, live migration is triggered for VMs.</p>
</td>
<td class="cellrowborder" valign="top" width="36.393639363936394%" headers="mcps1.3.6.4.2.6.2.4.1.3 "><p id="cce_10_0132__p20500123151810">Source:</p>
<ul id="cce_10_0132__ul175001023151816"><li id="cce_10_0132__li450022312185">http://169.254.169.254/meta-data/latest/events/scheduled</li></ul>
<p id="cce_10_0132__p1050112238180">This check item is an Alpha feature and is disabled by default.</p>
</td>
</tr>
</tbody>
</table>
</div>
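<p>The ProcessD and ProcessZ check items read the process state from /proc/{PID}/stat and report a problem only after the threshold is reached in three consecutive checks. The following Python sketch shows one way this could work. It is not the add-on's code: the function and class names are hypothetical, and treating "10 abnormal processes" as "10 or more" is an assumption.</p>

```python
def parse_state(stat_line: str) -> str:
    """Extract the state letter from the content of /proc/{PID}/stat.

    The command name (second field) is wrapped in parentheses and may
    itself contain spaces, so the line is split after the last ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_in_state(stat_lines, state: str) -> int:
    """Count the processes whose state letter equals `state` ('D' or 'Z')."""
    return sum(1 for line in stat_lines if parse_state(line) == state)


class ConsecutiveThreshold:
    """Report a problem after `rounds` consecutive checks reach `limit`."""

    def __init__(self, limit: int = 10, rounds: int = 3):
        self.limit = limit
        self.rounds = rounds
        self.hits = 0

    def observe(self, count: int) -> bool:
        # A check below the limit resets the streak (assumed behavior).
        self.hits = self.hits + 1 if count >= self.limit else 0
        return self.hits >= self.rounds
```

<p>With the default values, a problem is reported on the third consecutive check that finds at least 10 processes in the given state, and the streak restarts after any check that finds fewer.</p>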
<p id="cce_10_0132__p1141153045812">kubelet provides the following default check items, some of which have known defects. You can work around these defects by upgrading the cluster or by using NPD.</p>

<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table195521452174319" frame="border" border="1" rules="all"><caption><b>Table 8 </b>Default kubelet check items</caption><thead align="left"><tr id="cce_10_0132__row25985528436"><th align="left" class="cellrowborder" valign="top" width="23.112311231123112%" id="mcps1.3.6.4.2.8.2.4.1.1"><p id="cce_10_0132__p1059895213434">Check Item</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="40.49404940494049%" id="mcps1.3.6.4.2.8.2.4.1.2"><p id="cce_10_0132__p659835218435">Function</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="36.393639363936394%" id="mcps1.3.6.4.2.8.2.4.1.3"><p id="cce_10_0132__p6598135254318">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__row859855216439"><td class="cellrowborder" valign="top" width="23.112311231123112%" headers="mcps1.3.6.4.2.8.2.4.1.1 "><p id="cce_10_0132__p48922169211">Insufficient PID resources</p>
<p id="cce_10_0132__p115986525437">PIDPressure</p>
</td>
<td class="cellrowborder" valign="top" width="40.49404940494049%" headers="mcps1.3.6.4.2.8.2.4.1.2 "><p id="cce_10_0132__p19598185213439">Check whether PIDs are sufficient.</p>
</td>
<td class="cellrowborder" valign="top" width="36.393639363936394%" headers="mcps1.3.6.4.2.8.2.4.1.3 "><ul id="cce_10_0132__ul10652201210213"><li id="cce_10_0132__li16652812162116">Interval: 10 seconds</li><li id="cce_10_0132__li146526128212">Threshold: 90%</li><li id="cce_10_0132__li1165261219211">Defect: In community version 1.23.1 and earlier versions, this check item becomes invalid when over 65535 PIDs are used. For details, see <a href="https://github.com/kubernetes/kubernetes/issues/107107" target="_blank" rel="noopener noreferrer">issue 107107</a>. In community version 1.24 and earlier versions, thread-max is not considered in this check item.</li></ul>
</td>
</tr>
<tr id="cce_10_0132__row1759895214432"><td class="cellrowborder" valign="top" width="23.112311231123112%" headers="mcps1.3.6.4.2.8.2.4.1.1 "><p id="cce_10_0132__p6932102810213">Insufficient memory</p>
<p id="cce_10_0132__p159855217436">MemoryPressure</p>
</td>
<td class="cellrowborder" valign="top" width="40.49404940494049%" headers="mcps1.3.6.4.2.8.2.4.1.2 "><p id="cce_10_0132__p18598115210439">Check whether the allocatable memory for the containers is sufficient.</p>
</td>
<td class="cellrowborder" valign="top" width="36.393639363936394%" headers="mcps1.3.6.4.2.8.2.4.1.3 "><ul id="cce_10_0132__ul182919104218"><li id="cce_10_0132__li16291104219">Interval: 10 seconds</li><li id="cce_10_0132__li1029110182115">Threshold: max. 100 MiB</li><li id="cce_10_0132__li9291310112110">Allocatable memory = Total memory of a node – Reserved memory of a node</li><li id="cce_10_0132__li12913101213">Defect: This check item checks only the memory consumed by containers and does not consider the memory consumed by other processes on the node.</li></ul>
</td>
</tr>
<tr id="cce_10_0132__row11598185264317"><td class="cellrowborder" valign="top" width="23.112311231123112%" headers="mcps1.3.6.4.2.8.2.4.1.1 "><p id="cce_10_0132__p173361134192113">Insufficient disk resources</p>
<p id="cce_10_0132__p359855212431">DiskPressure</p>
</td>
<td class="cellrowborder" valign="top" width="40.49404940494049%" headers="mcps1.3.6.4.2.8.2.4.1.2 "><p id="cce_10_0132__p25986526438">Check the disk usage and inode usage of the kubelet and Docker disks.</p>
</td>
<td class="cellrowborder" valign="top" width="36.393639363936394%" headers="mcps1.3.6.4.2.8.2.4.1.3 "><ul id="cce_10_0132__ul18127142317248"><li id="cce_10_0132__li612762313240">Interval: 10 seconds</li><li id="cce_10_0132__li940617271242">Threshold: 90%</li></ul>
</td>
</tr>
</tbody>
</table>
</div>
</li></ul>
</div>
<div class="section" id="cce_10_0132__section1471610580474"><a name="cce_10_0132__section1471610580474"></a><a name="section1471610580474"></a><h4 class="sectiontitle">Node-problem-controller Fault Isolation</h4><div class="note" id="cce_10_0132__note57048194567"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="cce_10_0132__p07047195569">Fault isolation is supported only by add-ons of 1.16.0 and later versions.</p>
<p id="cce_10_0132__p14519162082">By default, if multiple nodes become faulty, NPC adds taints to up to 10% of the nodes. You can set <strong id="cce_10_0132__b13370194181513">npc.maxTaintedNode</strong> to increase the threshold.</p>
</div></div>
<p id="cce_10_0132__p321713044916">The open source NPD plugin provides fault detection but not fault isolation. CCE enhances the node-problem-controller (NPC) based on the open source NPD. This component is implemented based on the Kubernetes <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-based-evictions" target="_blank" rel="noopener noreferrer">node controller</a>. For faults reported by NPD, NPC automatically adds taints to nodes for node fault isolation.</p>

<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table205378534248" frame="border" border="1" rules="all"><caption><b>Table 9 </b>Parameters</caption><thead align="left"><tr id="cce_10_0132__row6537185313242"><th align="left" class="cellrowborder" valign="top" width="19%" id="mcps1.3.7.4.2.4.1.1"><p id="cce_10_0132__p105372053142420">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="41%" id="mcps1.3.7.4.2.4.1.2"><p id="cce_10_0132__p135371653202410">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="40%" id="mcps1.3.7.4.2.4.1.3"><p id="cce_10_0132__p0537353162412">Default</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__row25375532245"><td class="cellrowborder" valign="top" width="19%" headers="mcps1.3.7.4.2.4.1.1 "><p id="cce_10_0132__p05371853192416">npc.enable</p>
</td>
<td class="cellrowborder" valign="top" width="41%" headers="mcps1.3.7.4.2.4.1.2 "><p id="cce_10_0132__p6537155372414">Whether to enable NPC</p>
<p id="cce_10_0132__p1260511032812">This parameter is not supported in 1.18.0 or later versions.</p>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.7.4.2.4.1.3 "><p id="cce_10_0132__p1353716531240">true</p>
</td>
</tr>
<tr id="cce_10_0132__row12539185322410"><td class="cellrowborder" valign="top" width="19%" headers="mcps1.3.7.4.2.4.1.1 "><p id="cce_10_0132__p053910538244">npc.maxTaintedNode</p>
</td>
<td class="cellrowborder" valign="top" width="41%" headers="mcps1.3.7.4.2.4.1.2 "><p id="cce_10_0132__p15539105352410">Maximum number of nodes that NPC can add taints to when the same fault occurs on multiple nodes. This limit minimizes the impact of fault isolation.</p>
<p id="cce_10_0132__p9569114524816">The value can be in int or percentage format.</p>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.7.4.2.4.1.3 "><p id="cce_10_0132__p653945332418">10%</p>
<p id="cce_10_0132__p7505142243916">Value range:</p>
<ul id="cce_10_0132__ul932916246394"><li id="cce_10_0132__li1360112793913">In int format, the value ranges from 1 to infinity.</li><li id="cce_10_0132__li5330102473914">In percentage format, the value ranges from 1% to 100%. The result of multiplying this percentage by the number of cluster nodes has a minimum value of 1.</li></ul>
</td>
</tr>
<tr id="cce_10_0132__row8539553142410"><td class="cellrowborder" valign="top" width="19%" headers="mcps1.3.7.4.2.4.1.1 "><p id="cce_10_0132__p12539145352419">npc.nodeAffinity</p>
</td>
<td class="cellrowborder" valign="top" width="41%" headers="mcps1.3.7.4.2.4.1.2 "><p id="cce_10_0132__p85394532241">Node affinity of the controller</p>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.7.4.2.4.1.3 "><p id="cce_10_0132__p155391653132410">N/A</p>
</td>
</tr>
</tbody>
</table>
</div>
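<p>Because <strong>npc.maxTaintedNode</strong> accepts either an int or a percentage, the effective limit depends on the cluster size. The following Python sketch illustrates how such a value could be resolved into a node count. The function name is hypothetical, and rounding the percentage result down before applying the minimum of 1 is an assumption, since the rounding rule is not stated here.</p>

```python
def max_tainted_nodes(value: str, node_count: int) -> int:
    """Resolve an npc.maxTaintedNode value into a number of nodes."""
    value = value.strip()
    if value.endswith("%"):
        percent = int(value[:-1])
        if not 1 <= percent <= 100:
            raise ValueError("percentage must be between 1% and 100%")
        # Assumption: round down, then apply the documented minimum of 1.
        return max(1, node_count * percent // 100)
    count = int(value)
    if count < 1:
        raise ValueError("int value must be 1 or greater")
    return count
```

<p>Under these assumptions, the default of 10% allows 5 tainted nodes in a 50-node cluster and 1 tainted node in a 5-node cluster.</p>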
</div>
<div class="section" id="cce_10_0132__section148097511543"><h4 class="sectiontitle">Viewing NPD Events</h4><p id="cce_10_0132__p21774237520">Events reported by the NPD add-on can be queried on the <strong id="cce_10_0132__b32851737135319">Nodes</strong> page.</p>
<ol id="cce_10_0132__ol946010517717"><li id="cce_10_0132__li154602519716"><span>Log in to the CCE console.</span></li><li id="cce_10_0132__li14333111673"><span>Click the cluster name to access the cluster console. Choose <span class="uicontrol" id="cce_10_0132__uicontrol529111571497"><b>Nodes</b></span> in the navigation pane.</span></li><li id="cce_10_0132__li6498143711720"><span>Locate the row that contains the target node, and click <strong id="cce_10_0132__b121055125411">View Events</strong>.</span></li></ol>
</div>
<div class="section" id="cce_10_0132__section1424163811319"><h4 class="sectiontitle">Collecting Prometheus Metrics</h4><p id="cce_10_0132__p1278014271146">The NPD daemon pod exposes Prometheus metric data on port 19901. By default, the annotation <strong id="cce_10_0132__b370543615145">metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"prometheus","path":"/metrics","port":"19901","names":""}]'</strong> is added to the NPD pod. You can build a Prometheus collector to identify and obtain NPD metrics from <strong id="cce_10_0132__b944214591169">http://{{NpdPodIP}}:{{NpdPodPort}}/metrics</strong>.</p>
<div class="note" id="cce_10_0132__note103331531195320"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="cce_10_0132__p1733593118531">If the NPD add-on version is earlier than 1.16.5, the exposed port of Prometheus metrics is <strong id="cce_10_0132__b176723287178">20257</strong>.</p>
</div></div>
<p id="cce_10_0132__p207808271140">Currently, the metric data includes <strong id="cce_10_0132__b129718519273">problem_counter</strong> and <strong id="cce_10_0132__b826910717274">problem_gauge</strong>, as shown below.</p>
<pre class="screen" id="cce_10_0132__screen1898505318417"># HELP problem_counter Number of times a specific type of problem have occurred.
# TYPE problem_counter counter
problem_counter{reason="DockerHung"} 0
problem_counter{reason="DockerStart"} 0
problem_counter{reason="EmptyDirVolumeGroupStatusError"} 0
...
# HELP problem_gauge Whether a specific type of problem is affecting the node or not.
# TYPE problem_gauge gauge
problem_gauge{reason="CNIIsDown",type="CNIProblem"} 0
problem_gauge{reason="CNIIsUp",type="CNIProblem"} 0
problem_gauge{reason="CRIIsDown",type="CRIProblem"} 0
problem_gauge{reason="CRIIsUp",type="CRIProblem"} 0
...</pre>
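<p>A collector that has fetched the text from the metrics endpoint can extract the problems that currently affect a node from <strong>problem_gauge</strong>. The following Python sketch is a simplified illustration, not a full Prometheus text-format parser: the function names are hypothetical, it handles only labeled samples without timestamps or escaped quotes, and it assumes that a non-zero gauge value means the problem is active.</p>

```python
import re

# Matches lines such as: problem_gauge{reason="CNIIsDown",type="CNIProblem"} 0
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str):
    """Return (metric name, labels dict, value) for each labeled sample."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a labeled sample
        labels = dict(LABEL_RE.findall(match["labels"]))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def active_problems(text: str):
    """List the reasons whose problem_gauge value is non-zero."""
    return [
        labels["reason"]
        for name, labels, value in parse_samples(text)
        if name == "problem_gauge" and value != 0
    ]
```

<p>In production, an existing Prometheus client library or a Prometheus scrape job is usually a better choice than a hand-written parser.</p>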
</div>
<div class="section" id="cce_10_0132__section183121449435"><h4 class="sectiontitle">Change History</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0132__table88489551792" frame="border" border="1" rules="all"><caption><b>Table 10 </b>Release history</caption><thead align="left"><tr id="cce_10_0132__en-us_topic_0000001559693886_row10215348165819"><th align="left" class="cellrowborder" valign="top" width="17.65%" id="mcps1.3.10.2.2.5.1.1"><p id="cce_10_0132__en-us_topic_0000001559693886_p3874164435917">Add-on Version</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="29.409999999999997%" id="mcps1.3.10.2.2.5.1.2"><p id="cce_10_0132__en-us_topic_0000001559693886_p118748446592">Supported Cluster Version</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="38.07%" id="mcps1.3.10.2.2.5.1.3"><p id="cce_10_0132__en-us_topic_0000001559693886_p887474416596">New Feature</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="14.87%" id="mcps1.3.10.2.2.5.1.4"><p id="cce_10_0132__en-us_topic_0000001559693886_p5254449161013">Community Version</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0132__en-us_topic_0000001559693886_row1014341183211"><td class="cellrowborder" valign="top" width="17.65%" headers="mcps1.3.10.2.2.5.1.1 "><p id="cce_10_0132__en-us_topic_0000001559693886_p84326118326">1.19.11</p>
</td>
<td class="cellrowborder" valign="top" width="29.409999999999997%" headers="mcps1.3.10.2.2.5.1.2 "><p id="cce_10_0132__en-us_topic_0000001559693886_p5432181111321">v1.21</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p1543211143220">v1.23</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p44321711103217">v1.25</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p943241153219">v1.27</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p1943218115327">v1.28</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p64321411163214">v1.29</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p3432141117329">v1.30</p>
</td>
<td class="cellrowborder" valign="top" width="38.07%" headers="mcps1.3.10.2.2.5.1.3 "><p id="cce_10_0132__en-us_topic_0000001559693886_p050552703211">Fixed some issues.</p>
</td>
<td class="cellrowborder" valign="top" width="14.87%" headers="mcps1.3.10.2.2.5.1.4 "><p id="cce_10_0132__en-us_topic_0000001559693886_p11433111119328"><a href="https://github.com/kubernetes/node-problem-detector/releases/tag/v0.8.10" target="_blank" rel="noopener noreferrer">0.8.10</a></p>
</td>
</tr>
<tr id="cce_10_0132__en-us_topic_0000001559693886_row9597142003814"><td class="cellrowborder" valign="top" width="17.65%" headers="mcps1.3.10.2.2.5.1.1 "><p id="cce_10_0132__en-us_topic_0000001559693886_p0597162015388">1.19.1</p>
</td>
<td class="cellrowborder" valign="top" width="29.409999999999997%" headers="mcps1.3.10.2.2.5.1.2 "><p id="cce_10_0132__en-us_topic_0000001559693886_p18228141910207">v1.21</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p2228131962014">v1.23</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p5228101942016">v1.25</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p62281619112019">v1.27</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p1222813194207">v1.28</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p1984171972010">v1.29</p>
</td>
<td class="cellrowborder" valign="top" width="38.07%" headers="mcps1.3.10.2.2.5.1.3 "><p id="cce_10_0132__en-us_topic_0000001559693886_p2381944153814">Fixed some issues.</p>
</td>
<td class="cellrowborder" valign="top" width="14.87%" headers="mcps1.3.10.2.2.5.1.4 "><p id="cce_10_0132__en-us_topic_0000001559693886_p183821644123812"><a href="https://github.com/kubernetes/node-problem-detector/releases/tag/v0.8.10" target="_blank" rel="noopener noreferrer">0.8.10</a></p>
</td>
</tr>
<tr id="cce_10_0132__en-us_topic_0000001559693886_row195613123407"><td class="cellrowborder" valign="top" width="17.65%" headers="mcps1.3.10.2.2.5.1.1 "><p id="cce_10_0132__en-us_topic_0000001559693886_p156712164010">1.19.0</p>
</td>
<td class="cellrowborder" valign="top" width="29.409999999999997%" headers="mcps1.3.10.2.2.5.1.2 "><p id="cce_10_0132__en-us_topic_0000001559693886_p131441184020">v1.21</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p93341184014">v1.23</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p15394120409">v1.25</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p43144110402">v1.27</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p9344174018">v1.28</p>
</td>
<td class="cellrowborder" valign="top" width="38.07%" headers="mcps1.3.10.2.2.5.1.3 "><p id="cce_10_0132__en-us_topic_0000001559693886_p4653105517119">Fixed some issues.</p>
</td>
<td class="cellrowborder" valign="top" width="14.87%" headers="mcps1.3.10.2.2.5.1.4 "><p id="cce_10_0132__en-us_topic_0000001559693886_p3653255141116"><a href="https://github.com/kubernetes/node-problem-detector/releases/tag/v0.8.10" target="_blank" rel="noopener noreferrer">0.8.10</a></p>
</td>
</tr>
<tr id="cce_10_0132__en-us_topic_0000001559693886_row1043616238142"><td class="cellrowborder" valign="top" width="17.65%" headers="mcps1.3.10.2.2.5.1.1 "><p id="cce_10_0132__en-us_topic_0000001559693886_p36162026101415">1.18.48</p>
</td>
<td class="cellrowborder" valign="top" width="29.409999999999997%" headers="mcps1.3.10.2.2.5.1.2 "><p id="cce_10_0132__en-us_topic_0000001559693886_p14616192661418">v1.21</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p6616626151416">v1.23</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p7616102611411">v1.25</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p161682615147">v1.27</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p11616122619140">v1.28</p>
</td>
<td class="cellrowborder" valign="top" width="38.07%" headers="mcps1.3.10.2.2.5.1.3 "><p id="cce_10_0132__en-us_topic_0000001559693886_p2441425115712">Fixed some issues.</p>
</td>
<td class="cellrowborder" valign="top" width="14.87%" headers="mcps1.3.10.2.2.5.1.4 "><p id="cce_10_0132__en-us_topic_0000001559693886_p11616102641419"><a href="https://github.com/kubernetes/node-problem-detector/releases/tag/v0.8.10" target="_blank" rel="noopener noreferrer">0.8.10</a></p>
</td>
</tr>
<tr id="cce_10_0132__en-us_topic_0000001559693886_row18718023165810"><td class="cellrowborder" valign="top" width="17.65%" headers="mcps1.3.10.2.2.5.1.1 "><p id="cce_10_0132__en-us_topic_0000001559693886_p2718112355816">1.18.46</p>
</td>
<td class="cellrowborder" valign="top" width="29.409999999999997%" headers="mcps1.3.10.2.2.5.1.2 "><p id="cce_10_0132__en-us_topic_0000001559693886_p1991615199599">v1.21</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p4916131945915">v1.23</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p9916819195920">v1.25</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p14916719175914">v1.27</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p8630152045911">v1.28</p>
</td>
<td class="cellrowborder" valign="top" width="38.07%" headers="mcps1.3.10.2.2.5.1.3 "><p id="cce_10_0132__en-us_topic_0000001559693886_p2169114418574">CCE clusters of v1.28 are supported.</p>
</td>
<td class="cellrowborder" valign="top" width="14.87%" headers="mcps1.3.10.2.2.5.1.4 "><p id="cce_10_0132__en-us_topic_0000001559693886_p12442172917188"><a href="https://github.com/kubernetes/node-problem-detector/releases/tag/v0.8.10" target="_blank" rel="noopener noreferrer">0.8.10</a></p>
</td>
</tr>
<tr id="cce_10_0132__en-us_topic_0000001559693886_row10758123992215"><td class="cellrowborder" valign="top" width="17.65%" headers="mcps1.3.10.2.2.5.1.1 "><p id="cce_10_0132__en-us_topic_0000001559693886_p3173144519223">1.18.22</p>
</td>
<td class="cellrowborder" valign="top" width="29.409999999999997%" headers="mcps1.3.10.2.2.5.1.2 "><p id="cce_10_0132__en-us_topic_0000001559693886_p17173345152219">v1.19</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p61731145182211">v1.21</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p317364522216">v1.23</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p1617384510224">v1.25</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p0173545142214">v1.27</p>
</td>
<td class="cellrowborder" valign="top" width="38.07%" headers="mcps1.3.10.2.2.5.1.3 "><p id="cce_10_0132__en-us_topic_0000001559693886_p9173124512229">None</p>
</td>
<td class="cellrowborder" valign="top" width="14.87%" headers="mcps1.3.10.2.2.5.1.4 "><p id="cce_10_0132__en-us_topic_0000001559693886_p7173104532217"><a href="https://github.com/kubernetes/node-problem-detector/releases/tag/v0.8.10" target="_blank" rel="noopener noreferrer">0.8.10</a></p>
</td>
</tr>
<tr id="cce_10_0132__en-us_topic_0000001559693886_row1752182820340"><td class="cellrowborder" valign="top" width="17.65%" headers="mcps1.3.10.2.2.5.1.1 "><p id="cce_10_0132__en-us_topic_0000001559693886_p65216284340">1.17.4</p>
</td>
<td class="cellrowborder" valign="top" width="29.409999999999997%" headers="mcps1.3.10.2.2.5.1.2 "><p id="cce_10_0132__en-us_topic_0000001559693886_p81027538914">v1.17</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p459694419918">v1.19</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p1059614445916">v1.21</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p135961944693">v1.23</p>
<p id="cce_10_0132__en-us_topic_0000001559693886_p2059664412914">v1.25</p>
</td>
<td class="cellrowborder" valign="top" width="38.07%" headers="mcps1.3.10.2.2.5.1.3 "><p id="cce_10_0132__en-us_topic_0000001559693886_p22619219010">Optimized the DiskHung check item.</p>
</td>
<td class="cellrowborder" valign="top" width="14.87%" headers="mcps1.3.10.2.2.5.1.4 "><p id="cce_10_0132__en-us_topic_0000001559693886_p207112293612"><a href="https://github.com/kubernetes/node-problem-detector/releases/tag/v0.8.10" target="_blank" rel="noopener noreferrer">0.8.10</a></p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>