doc-exports/docs/mrs/umn/ALM-14019.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-14019"></a>
<h1 class="topictitle1">ALM-14019 DataNode Non-heap Memory Usage Exceeds the Threshold</h1>
<div id="body15867681"><div class="section" id="ALM-14019__s78d38f8984ea440696b8e2268c0a6d07"><h4 class="sectiontitle">Description</h4><p id="ALM-14019__en-us_topic_0070543656_p58198496">The system checks the non-heap memory usage of the HDFS DataNode every 30 seconds and compares it with the threshold, for which a default value is provided. This alarm is generated when the non-heap memory usage of the HDFS DataNode exceeds the threshold.</p>
<p id="ALM-14019__en-us_topic_0070543656_p54024419">Users can choose <strong id="ALM-14019__b1186670114910">O&amp;M &gt; Alarm &gt; Thresholds &gt;</strong> <em id="ALM-14019__i138695014496">Name of the desired cluster</em> <strong id="ALM-14019__b38671010495">&gt;</strong> <strong id="ALM-14019__en-us_topic_0070543655_b3167375">HDFS</strong> to change the threshold.</p>
<p id="ALM-14019__en-us_topic_0070543656_p973304">This alarm is cleared when the non-heap memory usage of the HDFS DataNode is less than or equal to the threshold.</p>
</div>
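<p>The generate and clear conditions described above can be sketched as follows. This is a minimal illustration, not the monitoring system's actual code: the class name, the 90% threshold, and the sample readings are all hypothetical, and any smoothing that the real system may apply across consecutive checks is ignored.</p>

```python
from dataclasses import dataclass


@dataclass
class NonHeapAlarm:
    """Tracks the alarm state for one DataNode (illustrative only)."""
    threshold_percent: float  # hypothetical value; the real default is set in Manager
    active: bool = False

    def check(self, usage_percent: float) -> bool:
        """Apply one periodic check and return the resulting alarm state."""
        # Generated when usage exceeds the threshold; cleared when usage
        # is less than or equal to the threshold.
        self.active = usage_percent > self.threshold_percent
        return self.active


alarm = NonHeapAlarm(threshold_percent=90.0)
samples = [85.0, 92.5, 90.0]  # made-up readings, one per 30-second check
states = [alarm.check(s) for s in samples]
print(states)
```

<p>In this sketch a reading exactly equal to the threshold clears the alarm, which matches the "less than or equal to" wording of the clearing condition.</p>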
<div class="section" id="ALM-14019__s0a9b8d8cc90a44309d141adbd390dfe2"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14019__en-us_topic_0070543656_table11728806" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14019__en-us_topic_0070543656_row30516437"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14019__en-us_topic_0070543656_p55912309">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14019__en-us_topic_0070543656_p32603190">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14019__en-us_topic_0070543656_p23612735">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14019__en-us_topic_0070543656_row33583417"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14019__en-us_topic_0070543656_p35902284">14019</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14019__en-us_topic_0070543656_p22403871">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14019__en-us_topic_0070543656_p2774234">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14019__se71a528544fa485789ba06ac9a6ac19e"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14019__en-us_topic_0070543656_table23386406" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14019__en-us_topic_0070543656_row44820237"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14019__en-us_topic_0070543656_p6560614">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14019__en-us_topic_0070543656_p61647731">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14019__row880764414338"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14019__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14019__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14019__en-us_topic_0070543656_row27410333"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14019__en-us_topic_0070543656_p5644491">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14019__en-us_topic_0070543656_p54550630">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14019__en-us_topic_0070543656_row21193623"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14019__en-us_topic_0070543656_p38961920">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14019__en-us_topic_0070543656_p1798921">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14019__en-us_topic_0070543656_row16190289"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14019__en-us_topic_0070543656_p36345005">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14019__en-us_topic_0070543656_p58264318">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14019__en-us_topic_0070543656_row54616822"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14019__en-us_topic_0070543656_p61886481">Trigger condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14019__en-us_topic_0070543656_p46749077">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14019__s91206c0f1636442e8cae6b0adabb3900"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14019__en-us_topic_0070543656_p28578856">If the non-heap memory usage of the HDFS DataNode is too high, the data read/write performance of HDFS is affected.</p>
</div>
<div class="section" id="ALM-14019__se298ea1b1c454c9fba148ac95d853888"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-14019__en-us_topic_0070543656_p33186037">Non-heap memory of the HDFS DataNode is insufficient.</p>
</div>
<div class="section" id="ALM-14019__sad3ce5698f5542238198126ccf65d035"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14019__en-us_topic_0070543656_p3714466"><strong id="ALM-14019__b3304491895354">Delete unnecessary files.</strong></p>
<ol id="ALM-14019__ol524246189546"><li id="ALM-14019__li2962592295356"><span>Log in to the HDFS client as user <strong id="ALM-14019__b3453244195356">root</strong>. <span id="ALM-14019__text101733453110"></span>Run the <strong id="ALM-14019__b4235651595356">cd</strong> command to go to the client installation directory, and run the <strong id="ALM-14019__b4566431895356">source bigdata_env</strong> command.</span><p><p class="litext" id="ALM-14019__p832568495356">If the cluster adopts the security mode, perform security authentication.</p>
<p class="litext" id="ALM-14019__p329176995356">Run the <strong id="ALM-14019__b782229295356">kinit hdfs</strong> command and enter the password as prompted. Obtain the password from the administrator.</p>
</p></li><li id="ALM-14019__li5535251395356"><span>Run the <strong id="ALM-14019__b6530671495356">hdfs dfs -rm -r </strong><em id="ALM-14019__i5088952195356">file or directory path</em> command to delete unnecessary files.</span></li><li id="ALM-14019__li2832425495356"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14019__ul4177490695356"><li id="ALM-14019__li2841056995356">If yes, no further action is required.</li><li id="ALM-14019__li1955473795356">If no, go to <a href="#ALM-14019__li4596028395356">4</a>.</li></ul>
</p></li></ol>
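<p>The command sequence in steps 1 and 2 can be assembled as shown below. This is only a sketch: the function name and the <strong>/opt/client</strong> and <strong>/tmp/old logs</strong> paths are hypothetical placeholders, and the script only builds the command strings so that they can be reviewed before anyone runs them.</p>

```python
import shlex
from typing import List


def build_cleanup_commands(client_dir: str, target_path: str, secure: bool) -> List[str]:
    """Return the shell commands from steps 1 and 2, in order."""
    commands = [
        f"cd {shlex.quote(client_dir)}",  # go to the client installation directory
        "source bigdata_env",             # load the client environment
    ]
    if secure:
        # Security mode only: kinit prompts for the password interactively.
        commands.append("kinit hdfs")
    # Quote the path so that spaces or shell metacharacters are passed literally.
    commands.append(f"hdfs dfs -rm -r {shlex.quote(target_path)}")
    return commands


cmds = build_cleanup_commands("/opt/client", "/tmp/old logs", secure=True)
print("\n".join(cmds))
```

<p>Quoting the target path matters here because the delete is recursive; an unquoted path containing a space would be split into two separate arguments.</p>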
<p class="tableheading" id="ALM-14019__p4042984195356"><strong id="ALM-14019__b1756868995413">Check the DataNode JVM non-heap memory usage and configuration.</strong></p>
<ol start="4" id="ALM-14019__ol1632139995427"><li id="ALM-14019__li4596028395356"><a name="ALM-14019__li4596028395356"></a><a name="li4596028395356"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14019__b2054134011380">Cluster &gt; </strong><em id="ALM-14019__i354334043813">Name of the desired cluster</em><strong id="ALM-14019__b954154043816"> &gt; Services</strong> &gt; <strong id="ALM-14019__b1256323895356">HDFS</strong>.</span></li><li id="ALM-14019__li1772384495356"><span>In the <strong id="ALM-14019__b30282645162958">Basic Information</strong> area, click <strong id="ALM-14019__b3179547795356">NameNode(Active)</strong>. The HDFS WebUI is displayed.</span><p><div class="note" id="ALM-14019__note840916461457"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14019__en-us_topic_0193189480_p91833832915">By default, the <strong id="ALM-14019__en-us_topic_0193189480_b4780151814294">admin</strong> user does not have the permissions to manage other components. If the page cannot be opened or the displayed content is incomplete when you access the native UI of a component due to insufficient permissions, you can manually create a user with the permissions to manage that component.</p>
</div></div>
</p></li><li id="ALM-14019__li3578073195356"><a name="ALM-14019__li3578073195356"></a><a name="li3578073195356"></a><span>On the HDFS WebUI, click the <strong id="ALM-14019__b1972023018385">Datanodes</strong> tab to view the number of blocks of all DataNodes that report alarms.</span></li><li id="ALM-14019__li4116310695356"><a name="ALM-14019__li4116310695356"></a><a name="li4116310695356"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14019__b137462496383">Cluster &gt; </strong><em id="ALM-14019__i1574820497386">Name of the desired cluster</em><strong id="ALM-14019__b1074704913387"> &gt; Services</strong> &gt; <strong id="ALM-14019__b2162943793121">HDFS</strong> &gt; <strong id="ALM-14019__b6593994694053">Configurations</strong> &gt; <strong id="ALM-14019__b715398193121">All</strong> <strong id="ALM-14019__b6816162115232">Configurations</strong>. In <strong id="ALM-14019__b5092474295356">Search</strong>, enter <strong id="ALM-14019__b5566950195356">GC_OPTS</strong> to check the <strong id="ALM-14019__b3126346795356">GC_OPTS</strong> non-heap memory parameter of <strong id="ALM-14019__b1293575395356">HDFS-&gt;DataNode</strong>.</span></li></ol>
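<p>When reading the <strong>GC_OPTS</strong> value found in step 7, it can help to pull out only the options that bound non-heap areas. The sketch below assumes that the relevant options are the HotSpot flags <strong>-XX:MaxMetaspaceSize</strong>, <strong>-XX:MaxDirectMemorySize</strong>, and <strong>-XX:MaxPermSize</strong>; which of these a given cluster actually sets is an assumption, and the sample string is made up.</p>

```python
import re
from typing import Dict

# HotSpot flags that bound non-heap areas. Which of them appear in a real
# GC_OPTS value is an assumption of this sketch.
NON_HEAP_FLAGS = ("MaxMetaspaceSize", "MaxDirectMemorySize", "MaxPermSize")


def non_heap_settings(gc_opts: str) -> Dict[str, str]:
    """Extract -XX:Flag=value pairs for the non-heap flags listed above."""
    found = {}
    for token in gc_opts.split():
        match = re.fullmatch(r"-XX:(\w+)=(\S+)", token)
        if match and match.group(1) in NON_HEAP_FLAGS:
            found[match.group(1)] = match.group(2)
    return found


# Made-up GC_OPTS value for illustration only.
sample = "-Xms6G -Xmx6G -XX:MaxMetaspaceSize=256M -XX:MaxDirectMemorySize=512M"
settings = non_heap_settings(sample)
print(settings)
```

<p>Heap options such as <strong>-Xms</strong> and <strong>-Xmx</strong> are deliberately skipped, because they do not limit non-heap memory.</p>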
<p class="tableheading" id="ALM-14019__p4931292195356"><strong id="ALM-14019__b5347592395437">Adjust system configurations.</strong></p>
<ol start="8" id="ALM-14019__ol832400795450"><li id="ALM-14019__li540990795356"><span>Check whether the configured memory is too small for the number of blocks, based on the number of blocks in <a href="#ALM-14019__li3578073195356">6</a> and the memory parameters configured for DataNode in <a href="#ALM-14019__li4116310695356">7</a>.</span><p><ul class="subitemlist" id="ALM-14019__ul2297072295356"><li id="ALM-14019__li2433430895356">If yes, go to <a href="#ALM-14019__li3591807095356">9</a>.</li><li id="ALM-14019__li2492192395356">If no, go to <a href="#ALM-14019__li6341663395356">12</a>.</li></ul>
<div class="note" id="ALM-14019__note1467415310394"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14019__p858870114012">The mapping between the average number of blocks of a DataNode instance and the DataNode memory is as follows:</p>
<ul id="ALM-14019__ul135881204409"><li id="ALM-14019__li78401326406">If the average number of blocks of a DataNode instance reaches 2,000,000, the reference values of the JVM parameters of the DataNode are as follows: -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M</li><li id="ALM-14019__li1858840114013">If the average number of blocks of a DataNode instance reaches 5,000,000, the reference values of the JVM parameters of the DataNode are as follows: -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G</li></ul>
</div></div>
</p></li><li id="ALM-14019__li3591807095356"><a name="ALM-14019__li3591807095356"></a><a name="li3591807095356"></a><span>Modify the <strong id="ALM-14019__b16780121544011">GC_OPTS</strong> parameter of the DataNode based on the mapping between the number of blocks and memory.</span></li><li id="ALM-14019__li720993694211"><span>Save the configuration and click <strong id="ALM-14019__b4308232137">Dashboard </strong>&gt;<strong id="ALM-14019__b830915321435"> More</strong> &gt; <strong id="ALM-14019__b11170184254710">Restart Service</strong>.</span></li><li id="ALM-14019__li2420567795356"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14019__ul3923854695356"><li id="ALM-14019__li5482717795356">If yes, no further action is required.</li><li id="ALM-14019__li1181637895356">If no, go to <a href="#ALM-14019__li6341663395356">12</a>.</li></ul>
</p></li></ol>
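<p>The block-count mapping in the note under step 8 can be expressed as a small lookup. This is a sketch under one stated assumption: each documented tier is taken to cover every average block count up to and including its limit. The note gives no reference values above 5,000,000 blocks, so the function returns nothing in that case rather than guessing.</p>

```python
from typing import Optional

# (average block count, reference JVM parameters), copied from the note in step 8.
REFERENCE_TIERS = [
    (2_000_000, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
    (5_000_000, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
]


def reference_jvm_opts(avg_blocks: int) -> Optional[str]:
    """Return the reference parameters of the smallest tier covering avg_blocks.

    Assumption: a tier covers all block counts up to and including its limit.
    Returns None above the largest documented tier.
    """
    for limit, opts in REFERENCE_TIERS:
        if limit >= avg_blocks:
            return opts
    return None


print(reference_jvm_opts(1_500_000))
print(reference_jvm_opts(3_000_000))
print(reference_jvm_opts(6_000_000))
```

<p>The returned string is a reference value only. It still has to be merged by hand into the existing <strong>GC_OPTS</strong> value in step 9.</p>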
<p class="tableheading" id="ALM-14019__p1760260095356"><strong id="ALM-14019__b3139494095457">Collect fault information.</strong></p>
<ol start="12" id="ALM-14019__ol472747589550"><li id="ALM-14019__li6341663395356"><a name="ALM-14019__li6341663395356"></a><a name="li6341663395356"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14019__b39977366113627">O&amp;M</strong> &gt; <strong id="ALM-14019__b24251979113627">Log &gt; Download</strong>.</span></li><li id="ALM-14019__li6048203295356"><span>In <strong id="ALM-14019__b3387878895356">Service</strong>, select the following services in the required cluster.</span><p><ul class="subitemlist" id="ALM-14019__ul1417676695356"><li id="ALM-14019__li5982734595356">ZooKeeper</li><li id="ALM-14019__li157519695356">HDFS</li></ul>
</p></li><li id="ALM-14019__li1145664103113"><span>Click <span><img id="ALM-14019__image1945644173117" src="en-us_image_0269417343.png"></span> in the upper right corner, and set <strong id="ALM-14019__b6456941173117">Start Date</strong> and <strong id="ALM-14019__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14019__b13456164113319">Download</strong>.</span></li><li id="ALM-14019__li3606587995356"><span>Contact the <span id="ALM-14019__text4614151421417">O&amp;M personnel</span> and send them the collected logs.</span></li></ol>
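<p>The log collection window in step 14 is simple date arithmetic, sketched below. The function name and the alarm time are made up for illustration; the 10-minute margin is the only value taken from the procedure.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (start, end) for log collection around the alarm generation time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time.
start, end = log_window(datetime(2024, 1, 15, 9, 5, 0))
print(start.isoformat(), end.isoformat())
```

<p>The two returned values correspond to <strong>Start Date</strong> and <strong>End Date</strong> in the download dialog.</p>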
</div>
<div class="section" id="ALM-14019__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14019__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14019__s97580a68db3d41509ebf661c71db1fee"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14019__en-us_topic_0070543656_p5288017">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>