doc-exports/docs/mrs/umn/ALM-14017.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-14017"></a>
<h1 class="topictitle1">ALM-14017 NameNode Direct Memory Usage Exceeds the Threshold</h1>
<div id="body13013697"><div class="section" id="ALM-14017__s274309df15fe41d891bef6c000bdafb4"><h4 class="sectiontitle">Description</h4><p id="ALM-14017__en-us_topic_0070543654_p28789050">The system checks the direct memory usage of the HDFS service every 30 seconds. This alarm is generated when the direct memory usage of a NameNode instance exceeds the threshold (90% of the maximum direct memory).</p>
<p id="ALM-14017__en-us_topic_0070543654_p57774862">The alarm is cleared when the direct memory usage is less than the threshold.</p>
</div>
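<p>The trigger rule described above can be sketched as a simple comparison. This is only an illustration, not the check that FusionInsight Manager actually runs: the function name <code>direct_memory_alarm</code> and its parameters are hypothetical, and the 90% default is taken from the description.</p>

```python
def direct_memory_alarm(used_bytes: int, max_bytes: int, threshold_percent: int = 90) -> bool:
    """Return True when used direct memory strictly exceeds the threshold.

    Hypothetical helper for illustration only. Integer arithmetic is used
    so that a value sitting exactly on the threshold does not trigger.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return used_bytes * 100 > max_bytes * threshold_percent
```

<p>With a maximum of 1000 bytes, a usage of 950 bytes would raise the alarm in this sketch, while a usage of exactly 900 bytes would not.</p>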
<div class="section" id="ALM-14017__s2ce66d01e9ee40ab863aee74bd4c20c8"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14017__en-us_topic_0070543654_table49252247" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14017__en-us_topic_0070543654_row59445239"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14017__en-us_topic_0070543654_p50335026">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14017__en-us_topic_0070543654_p50605310">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14017__en-us_topic_0070543654_p5389442">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14017__en-us_topic_0070543654_row33891687"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14017__en-us_topic_0070543654_p60872149">14017</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14017__en-us_topic_0070543654_p31697011">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14017__en-us_topic_0070543654_p17321065">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14017__scc0fe379bea34012ac08145fd5f246a8"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14017__en-us_topic_0070543654_table60829024" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14017__en-us_topic_0070543654_row25333485"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14017__en-us_topic_0070543654_p38746393">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14017__en-us_topic_0070543654_p51450089">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14017__row8110121363417"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14017__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14017__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14017__en-us_topic_0070543654_row6707653"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14017__en-us_topic_0070543654_p6449046">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14017__en-us_topic_0070543654_p52610756">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14017__en-us_topic_0070543654_row3734761"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14017__en-us_topic_0070543654_p34080237">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14017__en-us_topic_0070543654_p9035776">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14017__en-us_topic_0070543654_row14213128"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14017__en-us_topic_0070543654_p10412728">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14017__en-us_topic_0070543654_p38124620">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14017__en-us_topic_0070543654_row7577263"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14017__en-us_topic_0070543654_p9778581">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14017__en-us_topic_0070543654_p53867603">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14017__s017feba06a1c4bffac7139bcea3594b2"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14017__en-us_topic_0070543654_p1199762">If the available direct memory of the HDFS service is insufficient, a memory overflow occurs and the service breaks down.</p>
</div>
<div class="section" id="ALM-14017__sbc3aec0dae6748e0a2e75bfd7ca803a7"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-14017__en-us_topic_0070543654_p30071930">The direct memory of the NameNode instance is overused or inappropriately allocated.</p>
</div>
<div class="section" id="ALM-14017__s3c90b939d990400c815f82fb4bd7418e"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14017__en-us_topic_0070543654_p19907250"><strong id="ALM-14017__b231836969426">Check the direct memory usage.</strong></p>
<ol id="ALM-14017__ol6578721694227"><li id="ALM-14017__li18750521749"><span>On the FusionInsight Manager portal, choose <strong id="ALM-14017__b137435401252"><strong id="ALM-14017__b274318404515">O&amp;M</strong> &gt; <strong id="ALM-14017__b16743104013517">Alarm</strong> &gt; <strong id="ALM-14017__b8743540951">Alarms</strong>.</strong> On the displayed page, click the drop-down button of <strong id="ALM-14017__b1764814191051">ALM-14017 NameNode Direct Memory Usage Exceeds the Threshold</strong>. Then check the role name in <strong id="ALM-14017__b14790172183618">Location</strong> and confirm the IP address of the instance.</span></li><li id="ALM-14017__li935077194211"><span>On the FusionInsight Manager portal, choose <strong id="ALM-14017__b192661866373">Cluster &gt; </strong><em id="ALM-14017__i6269564370">Name of the desired cluster</em><strong id="ALM-14017__b112675612374"> &gt; Services</strong> &gt; <strong id="ALM-14017__b2280278494211">HDFS</strong> &gt; <strong id="ALM-14017__b389846494211">Instance</strong> &gt; <strong id="ALM-14017__b3508618394211">NameNode (IP address for which the alarm is generated)</strong>. Click the drop-down menu in the upper right corner of <strong id="ALM-14017__b3273144141318">Chart</strong>, choose <strong id="ALM-14017__b7246166191312">Customize</strong> &gt; <strong id="ALM-14017__b15702441192211">Resource</strong>, and select <strong id="ALM-14017__b2340859594211">NameNode Memory</strong> to check the direct memory usage.</span></li><li id="ALM-14017__li3891305494211"><span>Check whether the used direct memory of NameNode reaches 90% of the maximum direct memory specified for NameNode by default.</span><p><ul class="subitemlist" id="ALM-14017__ul4906291594211"><li id="ALM-14017__li1921496194211">If yes, go to <a href="#ALM-14017__li5299688794211">4</a>.</li><li id="ALM-14017__li1290797594211">If no, go to <a href="#ALM-14017__li1686819594211">8</a>.</li></ul>
</p></li><li id="ALM-14017__li5299688794211"><a name="ALM-14017__li5299688794211"></a><a name="li5299688794211"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14017__b650671313375">Cluster &gt; </strong><em id="ALM-14017__i650815137377">Name of the desired cluster</em><strong id="ALM-14017__b750651313377"> &gt; Services</strong> &gt; <strong id="ALM-14017__b2162943793121">HDFS</strong> &gt; <strong id="ALM-14017__b6593994694053">Configurations</strong> &gt; <strong id="ALM-14017__b715398193121">All</strong> <strong id="ALM-14017__b6816162115232">Configurations</strong> &gt; <strong id="ALM-14017__b3655433194211">NameNode</strong> &gt; <strong id="ALM-14017__b6055352694211">System</strong> to check whether "-XX:MaxDirectMemorySize" exists in the <strong id="ALM-14017__b205777574306">GC_OPTS</strong> parameter.</span><p><ul id="ALM-14017__ul794196113118"><li id="ALM-14017__li29414612313">If yes, go to <a href="#ALM-14017__li817315147319">5</a>.</li><li id="ALM-14017__li69411567311">If no, go to <a href="#ALM-14017__li16393123713315">6</a>.</li></ul>
</p></li><li id="ALM-14017__li817315147319"><a name="ALM-14017__li817315147319"></a><a name="li817315147319"></a><span>In the <strong id="ALM-14017__b589412210317">GC_OPTS</strong> parameter, delete "-XX:MaxDirectMemorySize". Save the configuration and restart the NameNode instance.</span></li><li id="ALM-14017__li16393123713315"><a name="ALM-14017__li16393123713315"></a><a name="li16393123713315"></a><span>Check whether the <strong id="ALM-14017__b961202410912">ALM-14007 NameNode Heap Memory Usage Exceeds the Threshold</strong> alarm exists.</span><p><ul id="ALM-14017__ul11331814326"><li id="ALM-14017__li933171163212">If yes, handle the alarm by referring to <strong id="ALM-14017__b10341345726">ALM-14007 NameNode Heap Memory Usage Exceeds the Threshold</strong>.</li><li id="ALM-14017__li13338117321">If no, go to <a href="#ALM-14017__li812407194211">7</a>.</li></ul>
</p></li><li id="ALM-14017__li812407194211"><a name="ALM-14017__li812407194211"></a><a name="li812407194211"></a><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14017__ul5975262094211"><li id="ALM-14017__li6488942794211">If yes, no further action is required.</li><li id="ALM-14017__li2155226094211">If no, go to <a href="#ALM-14017__li1686819594211">8</a>.</li></ul>
</p></li></ol>
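<p>Steps 4 and 5 amount to looking for the "-XX:MaxDirectMemorySize" option in the <strong>GC_OPTS</strong> value and deleting it if present. The following is a minimal sketch of that text check, assuming the value is a whitespace-separated list of JVM options. The helper names and the sample option string are hypothetical. The actual change must still be made and saved on FusionInsight Manager as described in the procedure.</p>

```python
FLAG_PREFIX = "-XX:MaxDirectMemorySize"


def has_max_direct_memory(gc_opts: str) -> bool:
    """Return True if any option in gc_opts sets MaxDirectMemorySize."""
    return any(opt.startswith(FLAG_PREFIX) for opt in gc_opts.split())


def strip_max_direct_memory(gc_opts: str) -> str:
    """Return gc_opts with every MaxDirectMemorySize option removed."""
    kept = [opt for opt in gc_opts.split() if not opt.startswith(FLAG_PREFIX)]
    return " ".join(kept)


# Hypothetical GC_OPTS value, used only to exercise the helpers.
sample = "-Xms2G -Xmx4G -XX:MaxDirectMemorySize=512M -XX:+UseG1GC"
cleaned = strip_max_direct_memory(sample)
```

<p>Splitting on whitespace keeps the remaining options in their original order, so only the targeted option disappears from the string.</p>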
<p class="tableheading" id="ALM-14017__p90267494211"><strong id="ALM-14017__b664122694234">Collect fault information.</strong></p>
<ol start="8" id="ALM-14017__ol6238844294238"><li id="ALM-14017__li1686819594211"><a name="ALM-14017__li1686819594211"></a><a name="li1686819594211"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14017__b39977366113627">O&amp;M</strong> &gt; <strong id="ALM-14017__b24251979113627">Log &gt; Download</strong>.</span></li><li id="ALM-14017__li971354294211"><span>Select <strong id="ALM-14017__b1759603094211">NameNode</strong> in the required cluster from <strong id="ALM-14017__b2414655094211">Service</strong>.</span></li><li id="ALM-14017__li1145664103113"><span>Click <span><img id="ALM-14017__image1945644173117" src="en-us_image_0269383972.png"></span> in the upper right corner, and set <strong id="ALM-14017__b6456941173117">Start Date</strong> and <strong id="ALM-14017__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14017__b13456164113319">Download</strong>.</span></li><li id="ALM-14017__li2657131294211"><span>Contact the <span id="ALM-14017__text4614151421417">O&amp;M personnel</span> and send them the collected logs.</span></li></ol>
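<p>The log collection window in the steps above is the alarm generation time minus and plus 10 minutes. As a small worked example, the window can be computed as follows. The function name <code>log_window</code> is hypothetical, and the dates still have to be entered manually on the log download page.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Illustrative alarm generation time, not taken from a real alarm.
start, end = log_window(datetime(2022, 12, 13, 12, 0, 0))
```

<p>For an alarm generated at 12:00:00, this gives a Start Date of 11:50:00 and an End Date of 12:10:00 on the same day.</p>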
</div>
<div class="section" id="ALM-14017__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14017__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14017__sfd95476d197549f9b99529f66ff23295"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14017__en-us_topic_0070543654_p47303605">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>