Lai, Weijian 6aa966a79a ModelArts UMN 24.3.0 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lai, Weijian <laiweijian4@huawei.com>
Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>
2024-11-02 09:04:52 +00:00


<a name="EN-US_TOPIC_0000002079097945"></a>
<h1 class="topictitle1">Viewing the Resource Usage of a Training Job</h1>
<div id="body0000001166070600"><div class="section" id="EN-US_TOPIC_0000002079097945__section1995355443518"><h4 class="sectiontitle">Operations</h4><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p1669748115810">You can view the resource usage of a compute node in the <strong id="EN-US_TOPIC_0000002079097945__b13521184817542">Resource Usages</strong> window. Data from the last three days at most can be displayed. While the window is open, the data is loaded and refreshed periodically.</p>
<p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p153211758154019">Operation 1: If a training job uses multiple compute nodes, select a node from the drop-down list to view its metrics.</p>
<p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p5332191611591">Operation 2: Click <span class="parmname" id="EN-US_TOPIC_0000002079097945__parmname162791629205611"><b>cpuUsage</b></span>, <span class="parmname" id="EN-US_TOPIC_0000002079097945__parmname7279229105615"><b>gpuMemUsage</b></span>, <span class="parmname" id="EN-US_TOPIC_0000002079097945__parmname22795296563"><b>gpuUtil</b></span>, or <span class="parmname" id="EN-US_TOPIC_0000002079097945__parmname11280429195612"><b>memUsage</b></span> to show or hide the usage chart for that parameter.</p>
<p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p342519281511">Operation 3: Hover the cursor over the chart to view the usage at a specific time.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_table29911160452" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters</caption><thead align="left"><tr id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_row15991516154519"><th align="left" class="cellrowborder" valign="top" width="18.85%" id="mcps1.3.1.6.2.3.1.1"><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p139911316154517">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="81.15%" id="mcps1.3.1.6.2.3.1.2"><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p8991141611457">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_row1199111616456"><td class="cellrowborder" valign="top" width="18.85%" headers="mcps1.3.1.6.2.3.1.1 "><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p4991151624510">cpuUsage</p>
</td>
<td class="cellrowborder" valign="top" width="81.15%" headers="mcps1.3.1.6.2.3.1.2 "><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p59911916204512">CPU usage</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_row209911616134519"><td class="cellrowborder" valign="top" width="18.85%" headers="mcps1.3.1.6.2.3.1.1 "><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p4991216174513">gpuMemUsage</p>
</td>
<td class="cellrowborder" valign="top" width="81.15%" headers="mcps1.3.1.6.2.3.1.2 "><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p109911166452">GPU memory usage</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_row1267403516468"><td class="cellrowborder" valign="top" width="18.85%" headers="mcps1.3.1.6.2.3.1.1 "><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p067463520461">gpuUtil</p>
</td>
<td class="cellrowborder" valign="top" width="81.15%" headers="mcps1.3.1.6.2.3.1.2 "><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p767443574614">GPU utilization</p>
</td>
</tr>
<tr id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_row12169174014460"><td class="cellrowborder" valign="top" width="18.85%" headers="mcps1.3.1.6.2.3.1.1 "><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p8169154015462">memUsage</p>
</td>
<td class="cellrowborder" valign="top" width="81.15%" headers="mcps1.3.1.6.2.3.1.2 "><p id="EN-US_TOPIC_0000002079097945__en-us_topic_0000001160408180_p17169440104618">Memory usage</p>
</td>
</tr>
</tbody>
</table>
</div>
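<p>The three operations above (selecting a node, showing or hiding metrics, and reading values within the three-day window) can be pictured as a filter over timestamped samples. The following is a minimal illustrative sketch only, not a ModelArts API: the function <code>visible_samples</code>, the sample dictionary keys, and the node names are hypothetical. Only the four metric names in Table 1 and the three-day limit come from this topic.</p>

```python
from datetime import datetime, timedelta

# Metric names as listed in Table 1.
METRICS = ("cpuUsage", "gpuMemUsage", "gpuUtil", "memUsage")

# The window displays data from at most the last three days.
WINDOW = timedelta(days=3)


def visible_samples(samples, node, shown_metrics, now):
    """Return the samples that would be displayed for one node.

    samples: list of dicts with hypothetical keys
             "node", "metric", "time" (datetime), and "value".
    shown_metrics: the metrics currently toggled on (Operation 2).
    """
    cutoff = now - WINDOW
    return [
        s for s in samples
        if s["node"] == node                 # Operation 1: selected node
        and s["metric"] in shown_metrics     # Operation 2: shown metrics
        and s["time"] >= cutoff              # three-day display limit
    ]


# Hypothetical data: node names and values are made up for illustration.
now = datetime(2024, 11, 2, 9, 0)
samples = [
    {"node": "worker-0", "metric": "cpuUsage", "time": now - timedelta(hours=1), "value": 42.0},
    {"node": "worker-0", "metric": "gpuUtil", "time": now - timedelta(days=4), "value": 88.0},
    {"node": "worker-1", "metric": "memUsage", "time": now - timedelta(hours=2), "value": 61.5},
]
shown = visible_samples(samples, "worker-0", {"cpuUsage", "gpuUtil"}, now)
```

<p>In this sketch, only the <code>cpuUsage</code> sample for <code>worker-0</code> remains: the <code>gpuUtil</code> sample is older than three days, and the <code>memUsage</code> sample belongs to a different node.</p>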
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="develop-modelarts-0010.html">Performing a Training</a></div>
</div>
</div>