doc-exports/docs/cce/umn/cce_10_0425.html
qiujiandong1 ab1e53a279 CCE UMN 20251031 version
Reviewed-by: Gergo-Bence Lorincz <a200452876@noreply.gitea.eco.tsi-dev.otc-service.com>
Co-authored-by: qiujiandong1 <qiujiandong1@huawei.com>
Co-committed-by: qiujiandong1 <qiujiandong1@huawei.com>
2026-01-15 10:25:22 +00:00


<a name="cce_10_0425"></a><a name="cce_10_0425"></a>
<h1 class="topictitle1">NUMA Affinity Scheduling</h1>
<div id="body0000001404703353"><p id="cce_10_0425__p814515711327">In a non-uniform memory access (NUMA) architecture, a NUMA node is a basic building block that consists of a processor and its local memory. NUMA nodes are physically separate but interconnected through a high-speed bus to form a complete system. A processor accesses the memory of its own NUMA node faster than the memory of another NUMA node, so accessing memory across NUMA nodes on the same worker node increases latency. To improve memory access efficiency and overall performance, task scheduling and memory allocation must be optimized accordingly.</p>
<p id="cce_10_0425__p778213371498">For high-performance computing (HPC), real-time applications, and memory-intensive workloads that require frequent communication between CPUs, cross-NUMA access in a cloud native environment can degrade system performance due to increased latency and overhead. Volcano's NUMA affinity scheduling addresses this by scheduling pods to the worker node on which they span the fewest NUMA nodes. This reduces data transmission overhead, optimizes resource utilization, and improves overall system performance.</p>
<p id="cce_10_0425__p108771643195512">For more information, see <a href="https://github.com/volcano-sh/volcano/blob/master/docs/design/numa-aware.md" target="_blank" rel="noopener noreferrer">https://github.com/volcano-sh/volcano/blob/master/docs/design/numa-aware.md</a>.</p>
<div class="section" id="cce_10_0425__section1687487141012"><h4 class="sectiontitle">Prerequisites</h4><ul id="cce_10_0425__ul4141815141013"><li id="cce_10_0425__li10141815171014">A CCE standard or Turbo cluster is available. For details, see <a href="cce_10_0028.html">Creating a CCE Standard/Turbo Cluster</a>.</li><li id="cce_10_0425__li6772815141112">The Volcano add-on has been installed in the cluster. For details, see <a href="cce_10_0193.html">Volcano Scheduler</a>.</li></ul>
</div>
<div class="section" id="cce_10_0425__section2430103110429"><a name="cce_10_0425__section2430103110429"></a><a name="section2430103110429"></a><h4 class="sectiontitle">Pod Scheduling Process</h4><p id="cce_10_0425__p10429103104218">After a topology policy is configured for pods, Volcano predicts the nodes that match the policy. For details about how to configure a pod topology policy, see <a href="#cce_10_0425__section20735201818553">Example of NUMA Affinity Scheduling</a>. The scheduling process is as follows:</p>
<ol id="cce_10_0425__ol06921153092"><li id="cce_10_0425__li13692155313910">Volcano selects the nodes whose topology policy matches the one configured for the pod. The topology policies provided by Volcano are the same as those provided by the <a href="https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/" target="_blank" rel="noopener noreferrer">topology manager</a>.</li><li id="cce_10_0425__li6616381116">Among these nodes, Volcano selects the ones whose CPU topology meets the policy requirements for scheduling.</li></ol>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0425__table1230718582454" frame="border" border="1" rules="all"><thead align="left"><tr id="cce_10_0425__row45291417876"><th align="left" class="cellrowborder" rowspan="2" valign="top" id="mcps1.3.5.4.1.4.1.1"><p id="cce_10_0425__p1852911171675">Pod Topology Policy</p>
</th>
<th align="left" class="cellrowborder" colspan="2" valign="top" id="mcps1.3.5.4.1.4.1.2"><p id="cce_10_0425__p34408217716">How to Filter Nodes During Pod Scheduling</p>
</th>
</tr>
<tr id="cce_10_0425__row1030714588458"><th align="left" class="cellrowborder" valign="top" id="mcps1.3.5.4.1.4.2.1"><p id="cce_10_0425__p6475144316514">1. Filter nodes that meet the topology policy set for the pod.</p>
</th>
<th align="left" class="cellrowborder" valign="top" id="mcps1.3.5.4.1.4.2.2"><p id="cce_10_0425__p0149941152412">2. Further filter the nodes whose CPU topology meets the policy.</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0425__row193071458134515"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.5.4.1.4.1.1 mcps1.3.5.4.1.4.2.1 "><p id="cce_10_0425__p11195619465">none</p>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.5.4.1.4.1.2 mcps1.3.5.4.1.4.2.2 "><p id="cce_10_0425__p5536180181311">Nodes are not filtered out based on their topology policy during scheduling:</p>
<ul id="cce_10_0425__ul232718401574"><li id="cce_10_0425__li17327740579"><strong id="cce_10_0425__b615483311713">none</strong>: schedulable</li><li id="cce_10_0425__li13272401477"><strong id="cce_10_0425__b5694193313512">best-effort</strong>: schedulable</li><li id="cce_10_0425__li7327340178"><strong id="cce_10_0425__b2598338175110">restricted</strong>: schedulable</li><li id="cce_10_0425__li632715409713"><strong id="cce_10_0425__b132171558115116">single-numa-node</strong>: schedulable</li></ul>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.5.4.1.4.1.2 "><p id="cce_10_0425__p1814924182410">None</p>
</td>
</tr>
<tr id="cce_10_0425__row93084582457"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.5.4.1.4.1.1 mcps1.3.5.4.1.4.2.1 "><p id="cce_10_0425__p31199615469">best-effort</p>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.5.4.1.4.1.2 mcps1.3.5.4.1.4.2.2 "><p id="cce_10_0425__p33280231134">Only nodes with the <strong id="cce_10_0425__b86017292527">best-effort</strong> topology policy are selected:</p>
<ul id="cce_10_0425__ul57538412816"><li id="cce_10_0425__li37531343815"><strong id="cce_10_0425__b19833123219521">none</strong>: unschedulable</li><li id="cce_10_0425__li3753640816"><strong id="cce_10_0425__b294403725117">best-effort</strong>: schedulable</li><li id="cce_10_0425__li77531541882"><strong id="cce_10_0425__b14469343115216">restricted</strong>: unschedulable</li><li id="cce_10_0425__li11753194880"><strong id="cce_10_0425__b2082885011528">single-numa-node</strong>: unschedulable</li></ul>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.5.4.1.4.1.2 "><p id="cce_10_0425__p79321227185912">Best-effort scheduling:</p>
<p id="cce_10_0425__p13150184142413">Pods are preferentially scheduled to a single NUMA node. If a single NUMA node cannot meet the requested CPU cores, the pods can be scheduled to multiple NUMA nodes.</p>
</td>
</tr>
<tr id="cce_10_0425__row10308158124518"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.5.4.1.4.1.1 mcps1.3.5.4.1.4.2.1 "><p id="cce_10_0425__p0120065462">restricted</p>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.5.4.1.4.1.2 mcps1.3.5.4.1.4.2.2 "><p id="cce_10_0425__p383114214225">Only nodes with the <strong id="cce_10_0425__b1421175975619">restricted</strong> topology policy are selected:</p>
<ul id="cce_10_0425__ul128398145815"><li id="cce_10_0425__li1783901417819"><strong id="cce_10_0425__b933512393520">none</strong>: unschedulable</li><li id="cce_10_0425__li88399142813"><strong id="cce_10_0425__b103081523175714">best-effort</strong>: unschedulable</li><li id="cce_10_0425__li1883916141781"><strong id="cce_10_0425__b2855104975114">restricted</strong>: schedulable</li><li id="cce_10_0425__li283981417812"><strong id="cce_10_0425__b169621256165218">single-numa-node</strong>: unschedulable</li></ul>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.5.4.1.4.1.2 "><p id="cce_10_0425__p11507419244">Restricted scheduling:</p>
<ul id="cce_10_0425__ul1772065555517"><li id="cce_10_0425__li147202553553">If the total CPU cores of a single NUMA node are greater than or equal to the requested CPU cores, pods can only be scheduled to a single NUMA node. If no single NUMA node has enough available CPU cores, the pods cannot be scheduled.</li><li id="cce_10_0425__li117201655115519">If the total CPU cores of a single NUMA node are fewer than the requested CPU cores, pods can be scheduled to multiple NUMA nodes.</li></ul>
</td>
</tr>
<tr id="cce_10_0425__row1430855894517"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.5.4.1.4.1.1 mcps1.3.5.4.1.4.2.1 "><p id="cce_10_0425__p4120186174616">single-numa-node</p>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.5.4.1.4.1.2 mcps1.3.5.4.1.4.2.2 "><p id="cce_10_0425__p95501926111311">Only nodes with the <strong id="cce_10_0425__b2375105318">single-numa-node</strong> topology policy are selected:</p>
<ul id="cce_10_0425__ul2011151613816"><li id="cce_10_0425__li31117161684"><strong id="cce_10_0425__b11336839155211">none</strong>: unschedulable</li><li id="cce_10_0425__li111118161682"><strong id="cce_10_0425__b1943882785712">best-effort</strong>: unschedulable</li><li id="cce_10_0425__li21118161483"><strong id="cce_10_0425__b1628345065217">restricted</strong>: unschedulable</li><li id="cce_10_0425__li14111161789"><strong id="cce_10_0425__b951912045215">single-numa-node</strong>: schedulable</li></ul>
</td>
<td class="cellrowborder" valign="top" width="40%" headers="mcps1.3.5.4.1.4.1.2 "><p id="cce_10_0425__p1015074122416">Pods can only be scheduled to a single NUMA node.</p>
</td>
</tr>
</tbody>
</table>
</div>
<p id="cce_10_0425__p12954835443">For example, each worker node has two NUMA nodes with 16 CPU cores each, for a total of 32 CPU cores per worker node. The following table lists the resource allocation.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0425__table6517617610" frame="border" border="1" rules="all"><thead align="left"><tr id="cce_10_0425__row451713110615"><th align="left" class="cellrowborder" rowspan="2" valign="top" id="mcps1.3.5.6.1.7.1.1"><p id="cce_10_0425__p17517118614">Worker Node</p>
</th>
<th align="left" class="cellrowborder" rowspan="2" valign="top" id="mcps1.3.5.6.1.7.1.2"><p id="cce_10_0425__p9261124214246">Node Topology Policy</p>
</th>
<th align="left" class="cellrowborder" colspan="2" valign="top" id="mcps1.3.5.6.1.7.1.3"><p id="cce_10_0425__p65171911067">NUMA Node 1</p>
</th>
<th align="left" class="cellrowborder" colspan="2" valign="top" id="mcps1.3.5.6.1.7.1.4"><p id="cce_10_0425__p15517610618">NUMA Node 2</p>
</th>
</tr>
<tr id="cce_10_0425__row68381123142819"><th align="left" class="cellrowborder" valign="top" id="mcps1.3.5.6.1.7.2.1"><p id="cce_10_0425__p14839723142810">Total CPU Cores</p>
</th>
<th align="left" class="cellrowborder" valign="top" id="mcps1.3.5.6.1.7.2.2"><p id="cce_10_0425__p11187203452811">Available CPU Cores</p>
</th>
<th align="left" class="cellrowborder" valign="top" id="mcps1.3.5.6.1.7.2.3"><p id="cce_10_0425__p12979173912913">Total CPU Cores</p>
</th>
<th align="left" class="cellrowborder" valign="top" id="mcps1.3.5.6.1.7.2.4"><p id="cce_10_0425__p1661484582816">Available CPU Cores</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0425__row14517519610"><td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.1 mcps1.3.5.6.1.7.2.1 "><p id="cce_10_0425__p195171111064">Node 1</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.2 mcps1.3.5.6.1.7.2.2 "><p id="cce_10_0425__p626184210240">best-effort</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.3 mcps1.3.5.6.1.7.2.3 "><p id="cce_10_0425__p15517412617">16</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.3 mcps1.3.5.6.1.7.2.4 "><p id="cce_10_0425__p111871347288">7</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.4 "><p id="cce_10_0425__p7517911769">16</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.4 "><p id="cce_10_0425__p19614174516288">7</p>
</td>
</tr>
<tr id="cce_10_0425__row114038219253"><td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.1 mcps1.3.5.6.1.7.2.1 "><p id="cce_10_0425__p56431114122515">Node 2</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.2 mcps1.3.5.6.1.7.2.2 "><p id="cce_10_0425__p66431214172518">restricted</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.3 mcps1.3.5.6.1.7.2.3 "><p id="cce_10_0425__p464331412256">16</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.3 mcps1.3.5.6.1.7.2.4 "><p id="cce_10_0425__p191871534162816">7</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.4 "><p id="cce_10_0425__p1264314140257">16</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.4 "><p id="cce_10_0425__p15614184572816">7</p>
</td>
</tr>
<tr id="cce_10_0425__row20647173152513"><td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.1 mcps1.3.5.6.1.7.2.1 "><p id="cce_10_0425__p1319692255">Node 3</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.2 mcps1.3.5.6.1.7.2.2 "><p id="cce_10_0425__p9196183014253">restricted</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.3 mcps1.3.5.6.1.7.2.3 "><p id="cce_10_0425__p19319995256">16</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.3 mcps1.3.5.6.1.7.2.4 "><p id="cce_10_0425__p141876348288">7</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.4 "><p id="cce_10_0425__p1731949102517">16</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.4 "><p id="cce_10_0425__p206142453284">10</p>
</td>
</tr>
<tr id="cce_10_0425__row854654102514"><td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.1 mcps1.3.5.6.1.7.2.1 "><p id="cce_10_0425__p54281610112515">Node 4</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.2 mcps1.3.5.6.1.7.2.2 "><p id="cce_10_0425__p14467153711258">single-numa-node</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.3 mcps1.3.5.6.1.7.2.3 "><p id="cce_10_0425__p11428410112512">16</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.3 mcps1.3.5.6.1.7.2.4 "><p id="cce_10_0425__p191873349287">7</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.4 "><p id="cce_10_0425__p114280103255">16</p>
</td>
<td class="cellrowborder" valign="top" width="16.666666666666664%" headers="mcps1.3.5.6.1.7.1.4 "><p id="cce_10_0425__p126141345112819">10</p>
</td>
</tr>
</tbody>
</table>
</div>
<p id="cce_10_0425__p166430235223"><a href="#cce_10_0425__fig1216082014438">Figure 1</a> shows the scheduling of a pod after a topology policy is configured.</p>
<ul id="cce_10_0425__ul912180655"><li id="cce_10_0425__li31211703519">When a pod requests 9 CPU cores and uses the <strong id="cce_10_0425__b1774011237118">best-effort</strong> topology policy, Volcano selects node 1, whose topology policy is also <strong id="cce_10_0425__b2043124914110">best-effort</strong>. No single NUMA node on node 1 has 9 available CPU cores (each has 7), but this policy allows the pod to span multiple NUMA nodes. Therefore, the requested 9 CPU cores are allocated across the two NUMA nodes, and the pod can be scheduled to node 1.</li><li id="cce_10_0425__li4121905517">When a pod requests 11 CPU cores and uses the <strong id="cce_10_0425__b178471913148">restricted</strong> topology policy, Volcano selects nodes 2 and 3, whose topology policy is also <strong id="cce_10_0425__b4317171071520">restricted</strong>. A single NUMA node has 16 CPU cores in total, which is at least 11, so the pod must fit on a single NUMA node. However, no NUMA node on node 2 or 3 has 11 available CPU cores. Therefore, the pod cannot be scheduled.</li><li id="cce_10_0425__li21218020520">When a pod requests 17 CPU cores and uses the <strong id="cce_10_0425__b19663141812184">restricted</strong> topology policy, Volcano selects nodes 2 and 3, whose topology policy is also <strong id="cce_10_0425__b9497105113189">restricted</strong>. A single NUMA node has only 16 CPU cores in total, which is fewer than 17, so the policy allows the pod to span multiple NUMA nodes. Node 2 has only 14 available CPU cores across its NUMA nodes, whereas node 3 has 17. Therefore, the pod can be scheduled to node 3.</li><li id="cce_10_0425__li20121609511">When a pod requests 17 CPU cores and uses the <strong id="cce_10_0425__b1188725722313">single-numa-node</strong> topology policy, Volcano selects node 4, whose topology policy is also <strong id="cce_10_0425__b1946362016246">single-numa-node</strong>. However, no single NUMA node can provide 17 CPU cores. Therefore, the pod cannot be scheduled.</li></ul>
<div class="fignone" id="cce_10_0425__fig1216082014438"><a name="cce_10_0425__fig1216082014438"></a><a name="fig1216082014438"></a><span class="figcap"><b>Figure 1 </b>Comparison of NUMA scheduling policies</span><br><span><img class="eddx" id="cce_10_0425__image153435184218" src="en-us_image_0000002516199439.png"></span></div>
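<p>The four cases above can be reproduced with a short Python sketch. It is only an illustration of the filtering rules described in this section, not Volcano's actual implementation: the names <strong>NODES</strong>, <strong>fits</strong>, and <strong>schedulable_nodes</strong> are hypothetical, and the sketch assumes that a pod with the <strong>none</strong> policy passes the topology check unconditionally.</p>

```python
from typing import Dict, List

# Worker nodes from the example table: node topology policy plus the total
# and available CPU cores of each of the two NUMA nodes.
NODES: Dict[str, dict] = {
    "Node 1": {"policy": "best-effort", "total": [16, 16], "available": [7, 7]},
    "Node 2": {"policy": "restricted", "total": [16, 16], "available": [7, 7]},
    "Node 3": {"policy": "restricted", "total": [16, 16], "available": [7, 10]},
    "Node 4": {"policy": "single-numa-node", "total": [16, 16], "available": [7, 10]},
}


def fits(pod_policy: str, requested: int, node: dict) -> bool:
    """Return True if the node passes both filtering steps for the pod."""
    if pod_policy == "none":
        # Assumption: no topology-based filtering for the "none" policy.
        return True
    # Step 1: the node topology policy must match the pod topology policy.
    if node["policy"] != pod_policy:
        return False
    # Step 2: check the CPU topology of the node against the policy.
    fits_single = any(cores >= requested for cores in node["available"])
    fits_spread = sum(node["available"]) >= requested
    if pod_policy == "single-numa-node":
        return fits_single
    if pod_policy == "restricted":
        # Spanning NUMA nodes is allowed only if no single NUMA node is
        # large enough in total to hold the request.
        if max(node["total"]) >= requested:
            return fits_single
        return fits_spread
    if pod_policy == "best-effort":
        # A single NUMA node is preferred, but spanning is acceptable.
        return fits_single or fits_spread
    raise ValueError(f"unknown topology policy: {pod_policy}")


def schedulable_nodes(pod_policy: str, requested: int) -> List[str]:
    """List the worker nodes that the pod could be scheduled to."""
    return [name for name, node in NODES.items() if fits(pod_policy, requested, node)]


if __name__ == "__main__":
    cases = [("best-effort", 9), ("restricted", 11),
             ("restricted", 17), ("single-numa-node", 17)]
    for policy, cores in cases:
        print(policy, cores, schedulable_nodes(policy, cores))
```

<p>An empty list means that the pod cannot be scheduled, which is the case for the second and fourth requests.</p>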
</div>
<div class="section" id="cce_10_0425__section1338110172412"><h4 class="sectiontitle">Scheduling Priority</h4><p id="cce_10_0425__p6491162872419">A topology policy aims to schedule pods to the optimal node. To find it, Volcano scores each candidate node and selects the node with the highest score.</p>
<p id="cce_10_0425__p18491202872412">Principle: Schedule pods to the worker nodes that require the fewest NUMA nodes.</p>
<p id="cce_10_0425__p179068453246">The scoring formula is as follows:</p>
<p id="cce_10_0425__p9779896253">score = weight x (100 - 100 x numaNodeNum/maxNumaNodeNum)</p>
<p id="cce_10_0425__p12707192211268">Parameters:</p>
<ul id="cce_10_0425__ul1426411244267"><li id="cce_10_0425__li826422412269"><strong id="cce_10_0425__b1605155865115">weight</strong>: the weight of the NUMA Aware plugin.</li><li id="cce_10_0425__li1943533517265"><strong id="cce_10_0425__b1670914435211">numaNodeNum</strong>: the number of NUMA nodes required for running the pod on a given worker node.</li><li id="cce_10_0425__li14341164112612"><strong id="cce_10_0425__b7537152119273">maxNumaNodeNum</strong>: the maximum number of NUMA nodes required for running the pod among all worker nodes.</li></ul>
<p id="cce_10_0425__p1696682165310">For example, three nodes meet the CPU topology policy for a pod and the weight of the NUMA Aware plugin is set to <strong id="cce_10_0425__b15402659397">10</strong>.</p>
<ul id="cce_10_0425__ul186059535551"><li id="cce_10_0425__li1160565325511">Node A: One NUMA node provides the CPU resources required by the pod (numaNodeNum = 1).</li><li id="cce_10_0425__li1360525335510">Node B: Two NUMA nodes provide the CPU resources required by the pod (numaNodeNum = 2).</li><li id="cce_10_0425__li19605453105516">Node C: Four NUMA nodes provide the CPU resources required by the pod (numaNodeNum = 4).</li></ul>
<p id="cce_10_0425__p128521875558">Node C requires the most NUMA nodes, so <strong id="cce_10_0425__b457720482405">maxNumaNodeNum</strong> is <strong id="cce_10_0425__b1088924964018">4</strong>. According to the preceding formula:</p>
<ul id="cce_10_0425__ul17527575555"><li id="cce_10_0425__li97527571557">score (Node A) = 10 x (100 - 100 x 1/4) = 750</li><li id="cce_10_0425__li375218573554">score (Node B) = 10 x (100 - 100 x 2/4) = 500</li><li id="cce_10_0425__li075215574558">score (Node C) = 10 x (100 - 100 x 4/4) = 0</li></ul>
<p id="cce_10_0425__p7226173912577">Therefore, the optimal node is Node A.</p>
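<p>The scoring formula can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only: the function names <strong>numa_score</strong> and <strong>rank_nodes</strong> are hypothetical and are not part of Volcano.</p>

```python
from typing import Dict, Tuple


def numa_score(weight: int, numa_node_num: int, max_numa_node_num: int) -> float:
    """score = weight x (100 - 100 x numaNodeNum/maxNumaNodeNum)"""
    return weight * (100 - 100 * numa_node_num / max_numa_node_num)


def rank_nodes(weight: int, required: Dict[str, int]) -> Tuple[Dict[str, float], str]:
    """Score every worker node and return the scores and the best node.

    `required` maps a worker node name to the number of NUMA nodes
    that the pod would need on that worker node (numaNodeNum).
    """
    max_numa_node_num = max(required.values())
    scores = {
        name: numa_score(weight, num, max_numa_node_num)
        for name, num in required.items()
    }
    best = max(scores, key=scores.get)
    return scores, best


if __name__ == "__main__":
    scores, best = rank_nodes(10, {"Node A": 1, "Node B": 2, "Node C": 4})
    print(scores)  # Node A: 750.0, Node B: 500.0, Node C: 0.0
    print(best)    # Node A
```

<p>The worker node that needs the most NUMA nodes always scores 0, and a larger weight widens the gap between the candidates without changing their order.</p>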
</div>
<div class="section" id="cce_10_0425__section36347395215"><h4 class="sectiontitle">Enabling NUMA Affinity Scheduling for Volcano</h4><ol id="cce_10_0425__ol6132033164011"><li id="cce_10_0425__li31321833144018"><span>Enable CPU management in the node pool. For details, see <a href="cce_10_0351.html#cce_10_0351__section1460719557453">Configuring a Node Pool-level CPU Management Policy</a>.</span><p><ol type="a" id="cce_10_0425__ol034962625711"><li id="cce_10_0425__li330462393220">Log in to the <span id="cce_10_0425__cce_10_0004_ph18314322182">CCE console</span> and click the cluster name to access the cluster console.</li><li id="cce_10_0425__en-us_topic_0000001244101017_li7594232125718">Choose <span class="uicontrol" id="cce_10_0425__uicontrol195576591910"><b>Nodes</b></span> in the navigation pane and click the <span class="uicontrol" id="cce_10_0425__uicontrol955885161914"><b>Node Pools</b></span> tab on the right. Locate the target node pool and choose <strong id="cce_10_0425__b5558135151911">More</strong> &gt; <strong id="cce_10_0425__b3558559193">Manage</strong>.</li><li id="cce_10_0425__en-us_topic_0000001244101017_li19594113255713">On the <span class="uicontrol" id="cce_10_0425__uicontrol221188141916"><b>Manage Configurations</b></span> page, adjust the <strong id="cce_10_0425__b021178191911">CPU Management Policy (cpu-manager-policy)</strong> in the <strong id="cce_10_0425__b112112821918">kubelet</strong> area based on the QoS class of the service pods.<ul id="cce_10_0425__ul159221553471"><li id="cce_10_0425__li439116344487">If the QoS class is <strong id="cce_10_0425__b106591113195811">Guaranteed</strong> (where resource requests equal limits), set the policy to <strong id="cce_10_0425__b104599270584">static</strong>.</li><li id="cce_10_0425__li1924260164917">If the QoS class is <strong id="cce_10_0425__b660117301637">Burstable</strong> (where resource requests and limits are different), set the policy to <strong 
id="cce_10_0425__b67536366213">enhanced-static</strong>.</li><li id="cce_10_0425__li1316513993815">If the node pool contains pods with both <strong id="cce_10_0425__b12201102053">Guaranteed</strong> and <strong id="cce_10_0425__b36321987513">Burstable</strong> QoS classes, set the policy to <strong id="cce_10_0425__b281174519217">enhanced-static</strong>.</li></ul>
<p id="cce_10_0425__en-us_topic_0000001244101017_p4897657185210"></p>
</li><li id="cce_10_0425__en-us_topic_0000001244101017_li25944326577">Click <strong id="cce_10_0425__b1112919154194">OK</strong>.</li></ol>
</p></li><li id="cce_10_0425__li1240461917103"><span>Configure a CPU topology policy in the node pool.</span><p><ol type="a" id="cce_10_0425__ol42931721193819"><li id="cce_10_0425__li728113441839">Log in to the <span id="cce_10_0425__cce_10_0004_ph18314322182_1">CCE console</span> and click the cluster name to access the cluster console.</li><li id="cce_10_0425__li1462817302382">Choose <span class="uicontrol" id="cce_10_0425__uicontrol51581752162317"><b>Nodes</b></span> in the navigation pane and click the <span class="uicontrol" id="cce_10_0425__uicontrol1015945218235"><b>Node Pools</b></span> tab on the right. Locate the target node pool and choose <strong id="cce_10_0425__b1515918527233">More</strong> &gt; <strong id="cce_10_0425__b91594523234">Manage</strong>.</li><li id="cce_10_0425__li1421914355386">Change the kubelet <strong id="cce_10_0425__b1574710446477">Topology Management Policy (topology-manager-policy)</strong> value to the required CPU topology policy.<p id="cce_10_0425__p159544350385">Valid topology policies include <strong id="cce_10_0425__b1786014585311">none</strong>, <strong id="cce_10_0425__b108619455538">best-effort</strong>, <strong id="cce_10_0425__b1186254517531">restricted</strong>, and <strong id="cce_10_0425__b118633456535">single-numa-node</strong>. For details, see <a href="#cce_10_0425__section2430103110429">Pod Scheduling Process</a>.</p>
<p id="cce_10_0425__p169532765910"></p>
</li></ol>
</p></li><li id="cce_10_0425__li15561124416020"><span>Enable the numa-aware add-on and the <strong id="cce_10_0425__b9915151215516">resource_exporter</strong> function.</span><p><p id="cce_10_0425__p14912576012"><strong id="cce_10_0425__b1822402215554">Volcano 1.7.1 or later</strong></p>
<ol type="a" id="cce_10_0425__ol29221109112"><li id="cce_10_0425__li175291450330">Log in to the <span id="cce_10_0425__cce_10_0004_ph18314322182_2">CCE console</span> and click the cluster name to access the cluster console.</li><li id="cce_10_0425__li1219753519197">In the navigation pane, choose <strong id="cce_10_0425__b9961855172315"><span id="cce_10_0425__text1296165592317">Add-ons</span></strong>. Locate <strong id="cce_10_0425__b796185516231">Volcano Scheduler</strong> on the right and click <strong id="cce_10_0425__b11961755172318">Edit</strong>.</li><li id="cce_10_0425__li62262475194">In the <strong id="cce_10_0425__b610531743916">Extended Functions</strong> area, enable <strong id="cce_10_0425__b512018174511">NUMA Topology Scheduling</strong> and click <strong id="cce_10_0425__b4130132616457">OK</strong>.</li></ol>
<div class="p" id="cce_10_0425__p187571154204"><strong id="cce_10_0425__b050622018563">Volcano earlier than 1.7.1</strong><ol type="a" id="cce_10_0425__ol0122828903"><li id="cce_10_0425__li13710057534">Log in to the <span id="cce_10_0425__cce_10_0004_ph18314322182_3">CCE console</span> and click the cluster name to access the cluster console.</li><li id="cce_10_0425__li2611115310117">In the navigation pane, choose <strong id="cce_10_0425__b1732973710307">Settings</strong> and click the <strong id="cce_10_0425__b232913715305">Scheduling</strong> tab. Find the expert mode and click <strong id="cce_10_0425__b1532923710303">Try Now</strong>.<p id="cce_10_0425__p2949165711111"></p>
</li><li id="cce_10_0425__li14122192815012">Enable <strong id="cce_10_0425__b112316322512">resource_exporter_enable</strong> to collect node NUMA information. The following is an example in JSON format:<pre class="screen" id="cce_10_0425__screen7651947143817">{
    "plugins": {
        "eas_service": {
            "availability_zone_id": "",
            "driver_id": "",
            "enable": "false",
            "endpoint": "",
            "flavor_id": "",
            "network_type": "",
            "network_virtual_subnet_id": "",
            "pool_id": "",
            "project_id": "",
            "secret_name": "eas-service-secret"
        }
    },
    "resource_exporter_enable": "true"
}</pre>
<div class="p" id="cce_10_0425__p20961144974013">After enabling this function, run the following command to view the NUMA topology information of the nodes:<pre class="screen" id="cce_10_0425__screen136816544476">kubectl get numatopo</pre>
</div>
<p id="cce_10_0425__p1648733724812">Information similar to the following is displayed:</p>
<pre class="screen" id="cce_10_0425__screen562605716482">NAME      AGE
node-1    4h8m
node-2    4h8m
node-3    4h8m</pre>
</li><li id="cce_10_0425__li12132173324014">Enable the Volcano numa-aware algorithm add-on.<pre class="screen" id="cce_10_0425__screen46241355104913">kubectl edit cm -n kube-system volcano-scheduler-configmap</pre>
<div class="p" id="cce_10_0425__p14657653184018">Add the numa-aware plugin and its arguments (the last three lines of the following example) to the YAML file to enable Volcano NUMA-aware scheduling:<pre class="screen" id="cce_10_0425__screen76348310528">kind: ConfigMap
apiVersion: v1
metadata:
  name: volcano-scheduler-configmap
  namespace: kube-system
data:
  default-scheduler.conf: |-
    actions: "allocate, backfill"
    tiers:
      - plugins:
          - name: priority
          - name: gang
          - name: conformance
      - plugins:
          - name: overcommit
          - name: drf
          - name: predicates
          - name: nodeorder
      - plugins:
          - name: cce-gpu-topology-predicate
          - name: cce-gpu-topology-priority
          - name: cce-gpu
      - plugins:
          - name: nodelocalvolume
          - name: nodeemptydirvolume
          - name: nodeCSIscheduling
          - name: networkresource
            arguments:
              NetworkType: <i><span class="varname" id="cce_10_0425__varname0265182685713">vpc-router</span></i>    # The parameter value depends on the cluster type.
          - name: numa-aware    # Enable NUMA Aware.
            arguments:
              weight: 10    # Weight of NUMA Aware</pre>
</div>
</li></ol>
</div>
</p></li></ol>
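<p>For Volcano versions earlier than 1.7.1, the expert-mode JSON can also be edited programmatically before it is pasted back into the console. The following Python sketch assumes that the current configuration has been copied into a string. The helper name <strong>enable_resource_exporter</strong> is hypothetical, and the sketch keeps the value as the string <strong>"true"</strong>, as in the JSON example in this section.</p>

```python
import json


def enable_resource_exporter(config_text: str) -> str:
    """Return the expert-mode JSON with resource_exporter_enable set to "true".

    All other keys (for example "plugins") are left untouched.
    """
    config = json.loads(config_text)
    # The example in this section uses the string "true", not a JSON boolean.
    config["resource_exporter_enable"] = "true"
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    current = '{"plugins": {"eas_service": {"enable": "false"}}}'
    print(enable_resource_exporter(current))
```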
</div>
<div class="section" id="cce_10_0425__section20735201818553"><a name="cce_10_0425__section20735201818553"></a><a name="section20735201818553"></a><h4 class="sectiontitle">Example of NUMA Affinity Scheduling</h4><p id="cce_10_0425__p935022619016">The following describes how to choose NUMA nodes for scheduling pods according to pod scheduling policies. For details, see <a href="#cce_10_0425__section2430103110429">Pod Scheduling Process</a>.</p>
<ul id="cce_10_0425__ul43503268019"><li id="cce_10_0425__li133507262011"><strong id="cce_10_0425__b1155510322317">single-numa-node</strong>: When pods are scheduled, nodes in the node pool with the <strong id="cce_10_0425__b7617154112912">single-numa-node</strong> topology management policy are chosen, and a single NUMA node must provide all requested CPU cores. If no node in the pool meets these requirements, the pod cannot be scheduled.</li><li id="cce_10_0425__li53500269017"><strong id="cce_10_0425__b4738758696">restricted</strong>: When pods are scheduled, nodes in the node pool with the <strong id="cce_10_0425__b37382581097">restricted</strong> topology management policy are chosen, and a set of NUMA nodes on the same worker node must provide the requested CPU cores. If no node in the pool meets these requirements, the pod cannot be scheduled.</li><li id="cce_10_0425__li113506261303"><strong id="cce_10_0425__b23834215117">best-effort</strong>: When pods are scheduled, nodes in the node pool with the <strong id="cce_10_0425__b86524321113">best-effort</strong> topology management policy are chosen, and a single NUMA node is preferred for providing the requested CPU cores. If no single NUMA node can provide them, the pod is scheduled to the most suitable node.</li></ul>
<ol id="cce_10_0425__ol3865534145519"><li id="cce_10_0425__li5240121219311"><span>Refer to the following examples for configuration.</span><p><ol type="a" id="cce_10_0425__ol7422615936"><li id="cce_10_0425__li7865113445513">Example 1: Configure NUMA affinity for a Deployment.<pre class="screen" id="cce_10_0425__screen489510541187">kind: Deployment
apiVersion: apps/v1
metadata:
  name: numa-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: numa-test
  template:
    metadata:
      labels:
        app: numa-test
      annotations:
        volcano.sh/numa-topology-policy: single-numa-node    # Configure the topology policy.
    spec:
      schedulerName: volcano
      containers:
        - name: container-1
          image: nginx:alpine
          resources:
            requests:
              cpu: 2           # The value must be an integer.
              memory: 2048Mi
            limits:
              cpu: 2           # The value must be an integer.
              memory: 2048Mi
      imagePullSecrets:
        - name: default-secret</pre>
</li><li id="cce_10_0425__li203152420311">Example 2: Create a Volcano job and enable NUMA affinity for it.<pre class="screen" id="cce_10_0425__screen103895515610">apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vj-test
spec:
  schedulerName: volcano
  minAvailable: 1
  tasks:
    - replicas: 1
      name: "test"
      topologyPolicy: best-effort   # Set the topology policy for the task.
      template:
        spec:
          containers:
            - image: alpine
              command: ["/bin/sh", "-c", "sleep 1000"]
              imagePullPolicy: IfNotPresent
              name: running
              resources:
                limits:
                  cpu: 20
                  memory: "100Mi"
          restartPolicy: OnFailure</pre>
</li></ol>
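<p>The comments in Example 1 state that the CPU value must be an integer. A manifest can be checked for this before it is applied, for instance with the Python sketch below. The helpers <strong>cpu_cores</strong> and <strong>is_whole_cpu</strong> are hypothetical, and the sketch handles only plain numbers and the millicore suffix <strong>m</strong>, not every Kubernetes quantity format.</p>

```python
from decimal import Decimal


def cpu_cores(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "2" or "1500m" to a number of cores."""
    if quantity.endswith("m"):
        # Millicores: 1000m equals one CPU core.
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def is_whole_cpu(quantity) -> bool:
    """Return True if the CPU quantity is a positive whole number of cores."""
    cores = cpu_cores(str(quantity))
    return cores > 0 and cores == cores.to_integral_value()


if __name__ == "__main__":
    for value in (2, "2000m", "1500m", "0.5"):
        print(value, is_whole_cpu(value))
```

<p>In this sketch, <strong>2</strong> and <strong>2000m</strong> are accepted, whereas <strong>1500m</strong> and <strong>0.5</strong> are rejected because they do not correspond to whole CPU cores.</p>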
</p></li><li id="cce_10_0425__li161400324390"><span>Analyze NUMA scheduling.</span><p><p id="cce_10_0425__p16225156681">The following table shows example NUMA nodes.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="cce_10_0425__table8766337375" frame="border" border="1" rules="all"><thead align="left"><tr id="cce_10_0425__row1276812371778"><th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.8.4.2.2.2.1.5.1.1"><p id="cce_10_0425__p387914110719">Worker Node</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.8.4.2.2.2.1.5.1.2"><p id="cce_10_0425__p8306123841015">Topology Manager Policy</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.8.4.2.2.2.1.5.1.3"><p id="cce_10_0425__p20879941074">Allocatable CPU Cores on NUMA Node 0</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.8.4.2.2.2.1.5.1.4"><p id="cce_10_0425__p1987954111719">Allocatable CPU Cores on NUMA Node 1</p>
</th>
</tr>
</thead>
<tbody><tr id="cce_10_0425__row11768123715717"><td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.1 "><p id="cce_10_0425__p13879144115716">Node 1</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.2 "><p id="cce_10_0425__p13306173861011">single-numa-node</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.3 "><p id="cce_10_0425__p18805411373">16</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.4 "><p id="cce_10_0425__p68807411775">16</p>
</td>
</tr>
<tr id="cce_10_0425__row1942384515381"><td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.1 "><p id="cce_10_0425__p1788154893811">Node 2</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.2 "><p id="cce_10_0425__p162651155113819">best-effort</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.3 "><p id="cce_10_0425__p1088104803810">16</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.4 "><p id="cce_10_0425__p12881194815382">16</p>
</td>
</tr>
<tr id="cce_10_0425__row95293598815"><td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.1 "><p id="cce_10_0425__p234574490">Node 3</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.2 "><p id="cce_10_0425__p83067386108">best-effort</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.3 "><p id="cce_10_0425__p123452412913">20</p>
</td>
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.8.4.2.2.2.1.5.1.4 "><p id="cce_10_0425__p8345946911">20</p>
</td>
</tr>
</tbody>
</table>
</div>
<p id="cce_10_0425__p61028279154">In the preceding examples:</p>
<ul id="cce_10_0425__ul8734121116398"><li id="cce_10_0425__li1479310427175">In example 1, the pod requests 2 CPU cores and uses the <strong id="cce_10_0425__b19750155135510">single-numa-node</strong> topology policy. It will therefore be scheduled to node 1, the only node that uses the same policy.</li><li id="cce_10_0425__li7734111163915">In example 2, the pod requests 20 CPU cores and uses the <strong id="cce_10_0425__b16726349105612">best-effort</strong> topology policy. It will be scheduled to node 3, which can allocate all 20 requested CPU cores from a single NUMA node, whereas node 2 would have to spread them across two NUMA nodes.</li></ul>
</p></li></ol>
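<p>The reasoning behind example 2 can be sketched in a few lines of Python. This is a simplified illustration only, not Volcano's actual scoring code: the helper <strong>numa_nodes_needed</strong> is a hypothetical name, and the sketch assumes that nodes with a different topology policy (node 1 here) have already been filtered out.</p>
<pre class="screen">def numa_nodes_needed(free_cpus_per_numa, requested_cpus):
    """Return the fewest NUMA nodes whose free CPUs cover the request, or None."""
    remaining = requested_cpus
    # Take the NUMA nodes with the most free CPUs first.
    for count, free in enumerate(sorted(free_cpus_per_numa, reverse=True), start=1):
        remaining = max(remaining - free, 0)
        if remaining == 0:
            return count
    return None  # The worker node cannot satisfy the request at all.


# Allocatable CPU cores per NUMA node, taken from the table above
# (only the nodes that use the best-effort policy).
workers = {"Node 2": [16, 16], "Node 3": [20, 20]}

# Example 2 requests 20 CPU cores.
needed = {name: numa_nodes_needed(free, 20) for name, free in workers.items()}

# Prefer the worker node that spans the fewest NUMA nodes.
best = min(needed, key=needed.get)
print(needed, best)</pre>
<p>With the values from the table, node 2 needs two NUMA nodes and node 3 needs one, so node 3 is preferred.</p>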
</div>
<div class="section" id="cce_10_0425__section468717154445"><h4 class="sectiontitle">Checking NUMA Node Usage</h4><ol id="cce_10_0425__ol894031210544"><li id="cce_10_0425__li11940171214547">Run the following command to check the CPU and NUMA topology of the current node:<pre class="screen" id="cce_10_0425__screen1185118440435">lscpu</pre>
<p id="cce_10_0425__p328372715413">Information similar to the following is displayed:</p>
<pre class="screen" id="cce_10_0425__screen11891104145416">...
CPU(s): 32
NUMA node(s): 2
NUMA node0 CPU(s): 0-15
NUMA node1 CPU(s): 16-31</pre>
</li><li id="cce_10_0425__li874615712553">Run the following command to check NUMA node usage:<pre class="screen" id="cce_10_0425__screen1599964344417">cat /var/lib/kubelet/cpu_manager_state</pre>
<p id="cce_10_0425__p194247352553">Information similar to the following is displayed:</p>
<pre class="screen" id="cce_10_0425__screen841825013555">{"policyName":"static","defaultCpuSet":"0,10-15,25-31","entries":{"777870b5-c64f-42f5-9296-688b9dc212ba":<strong id="cce_10_0425__b74184509558">{"container-1":"16-24"}</strong>,"fb15e10a-b6a5-4aaa-8fcd-76c1aa64e6fd":<strong id="cce_10_0425__b16419115020557">{"container-1":"1-9"}</strong>},"checksum":318470969}</pre>
<p id="cce_10_0425__p12616174414565">The preceding example shows that two containers are running on the node. One container uses CPU cores 1 to 9 of NUMA node 0, and the other container uses CPU cores 16 to 24 of NUMA node 1.</p>
</li></ol>
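<p>To avoid matching CPU ranges by eye, the two outputs above can be combined with a short script. The following Python sketch is illustrative only: it hard-codes the sample <strong>lscpu</strong> ranges and the sample state file content shown above, and the helper <strong>parse_cpuset</strong> is a hypothetical name. On a real node, the JSON would be read from <strong>/var/lib/kubelet/cpu_manager_state</strong> instead.</p>
<pre class="screen">import json


def parse_cpuset(text):
    """Expand a cpuset string such as "0,10-15" into a set of CPU IDs."""
    cpus = set()
    for part in text.split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


# NUMA layout taken from the sample lscpu output above.
numa_cpus = {0: parse_cpuset("0-15"), 1: parse_cpuset("16-31")}

# Sample content of the CPU manager state file shown above.
state = json.loads(
    '{"policyName":"static","defaultCpuSet":"0,10-15,25-31","entries":{'
    '"777870b5-c64f-42f5-9296-688b9dc212ba":{"container-1":"16-24"},'
    '"fb15e10a-b6a5-4aaa-8fcd-76c1aa64e6fd":{"container-1":"1-9"}},'
    '"checksum":318470969}'
)

# Map each (pod UID, container name) to the NUMA nodes its CPUs belong to.
placement = {}
for pod_uid, containers in state["entries"].items():
    for name, cpuset in containers.items():
        cpus = parse_cpuset(cpuset)
        placement[(pod_uid, name)] = sorted(
            node for node, ids in numa_cpus.items() if not cpus.isdisjoint(ids)
        )

for key, nodes in placement.items():
    print(key, nodes)</pre>
<p>A container whose list contains more than one NUMA node has its CPU cores spread across NUMA nodes. In the sample data, each container maps to exactly one NUMA node.</p>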
</div>
<div class="section" id="cce_10_0425__section53743014220"><h4 class="sectiontitle">Common Issues</h4><p id="cce_10_0425__p1853984118215"><strong id="cce_10_0425__b18298112405616">Pods fail to be scheduled.</strong></p>
<p id="cce_10_0425__p1553911411821">If Volcano is set as the scheduler and NUMA affinity is enabled but no CPU management policy is configured, pod scheduling may fail. To fix this issue, perform the following operations:</p>
<ul id="cce_10_0425__ul1977801216313"><li id="cce_10_0425__li677815121737">Before using NUMA affinity scheduling, make sure that Volcano has been deployed and is running properly.</li><li id="cce_10_0425__li47784121839">When using NUMA affinity scheduling:<ol id="cce_10_0425__ol197787121230"><li id="cce_10_0425__li1577811121733">Correctly configure the CPU management policy (<strong id="cce_10_0425__b105382033612">cpu-manager-policy</strong>) of the node pool.<ul id="cce_10_0425__ul1213195813287"><li id="cce_10_0425__cce_10_0425_li439116344487">If the QoS class is <strong id="cce_10_0425__cce_10_0425_b106591113195811">Guaranteed</strong> (where resource requests equal limits), set the policy to <strong id="cce_10_0425__cce_10_0425_b104599270584">static</strong>.</li><li id="cce_10_0425__cce_10_0425_li1924260164917">If the QoS class is <strong id="cce_10_0425__cce_10_0425_b660117301637">Burstable</strong> (where resource requests and limits are different), set the policy to <strong id="cce_10_0425__cce_10_0425_b67536366213">enhanced-static</strong>.</li><li id="cce_10_0425__cce_10_0425_li1316513993815">If the node pool contains pods with both <strong id="cce_10_0425__cce_10_0425_b12201102053">Guaranteed</strong> and <strong id="cce_10_0425__cce_10_0425_b36321987513">Burstable</strong> QoS classes, set the policy to <strong id="cce_10_0425__cce_10_0425_b281174519217">enhanced-static</strong>.</li></ul>
</li><li id="cce_10_0425__li117789121336">Correctly configure the topology management policy (<strong id="cce_10_0425__b1177997521">topology-manager-policy</strong>) in the node pool.</li><li id="cce_10_0425__li1077817121339">Configure a correct topology policy for pods to filter nodes with the same topology policy in the node pool. For details, see <a href="#cce_10_0425__section20735201818553">Example of NUMA Affinity Scheduling</a>.</li><li id="cce_10_0425__li477811121335">Configure the Volcano scheduler to schedule application pods. For details, see <a href="cce_10_0722.html">Scheduling Workloads</a>. Make sure that the CPU requests for all containers within the pods are integers (measured in cores).</li></ol>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="cce_10_0423.html">Volcano Scheduling</a></div>
</div>
</div>