doc-exports/docs/mrs/umn/ALM-18013.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-18013"></a>
<h1 class="topictitle1">ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold</h1>
<div id="body31046979"><div class="section" id="ALM-18013__sc51a6e755205424cb657d512988dee91"><h4 class="sectiontitle">Description</h4><p id="ALM-18013__en-us_topic_0070543510_p25660991">The system checks the direct memory usage of the Yarn service every 30 seconds. This alarm is generated when the direct memory usage of a ResourceManager instance exceeds the threshold (90% of the maximum direct memory).</p>
<p id="ALM-18013__en-us_topic_0070543510_p29622330">The alarm is cleared when the direct memory usage is less than the threshold.</p>
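<p>The alarm condition described above can be illustrated with a minimal sketch. It is not the FusionInsight Manager implementation: the function name <strong>direct_memory_alarm</strong> and its parameters are hypothetical, and the only values taken from this topic are the default 90% threshold and the rule that the alarm is generated when usage exceeds it.</p>

```python
def direct_memory_alarm(used_bytes, max_bytes, threshold_percent=90):
    """Return True when direct memory usage exceeds the threshold.

    Hypothetical helper for illustration only. Integer arithmetic is used
    so that usage exactly at the threshold does not raise the alarm.
    """
    if not max_bytes > 0:
        raise ValueError("max_bytes must be positive")
    return used_bytes * 100 > max_bytes * threshold_percent


# Example: 950 MiB used out of a 1024 MiB limit is above 90%, so this prints True.
MIB = 1024 * 1024
print(direct_memory_alarm(950 * MIB, 1024 * MIB))
```

<p>Usage exactly at 90% does not count as exceeding the threshold in this sketch, which matches the wording "exceeds the threshold" in the description.</p>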
</div>
<div class="section" id="ALM-18013__s41eaedad8c9f4f3eba799bd7ebd8eb95"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18013__en-us_topic_0070543510_table50598521" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18013__en-us_topic_0070543510_row9871675"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-18013__en-us_topic_0070543510_p61408172">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-18013__en-us_topic_0070543510_p8006030">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-18013__en-us_topic_0070543510_p44508680">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18013__en-us_topic_0070543510_row48433347"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-18013__en-us_topic_0070543510_p30787047">18013</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-18013__en-us_topic_0070543510_p10722842">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-18013__en-us_topic_0070543510_p63243911">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18013__se7c40815fea64570bb4c60d4b85579c1"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18013__en-us_topic_0070543510_table22483156" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18013__en-us_topic_0070543510_row60346796"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-18013__en-us_topic_0070543510_p56252285">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-18013__en-us_topic_0070543510_p60141268">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18013__row114691053162116"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18013__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18013__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18013__en-us_topic_0070543510_row39604502"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18013__en-us_topic_0070543510_p53848109">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18013__en-us_topic_0070543510_p66729601">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18013__en-us_topic_0070543510_row63695501"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18013__en-us_topic_0070543510_p59061958">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18013__en-us_topic_0070543510_p19289328">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18013__en-us_topic_0070543510_row39386231"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18013__en-us_topic_0070543510_p36168141">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18013__en-us_topic_0070543510_p43938279">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18013__en-us_topic_0070543510_row59900193"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18013__en-us_topic_0070543510_p20077469">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18013__en-us_topic_0070543510_p15662289">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18013__sc38331d8549e41439a2140c8fb84f5e1"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-18013__en-us_topic_0070543510_p60685926">If the available direct memory of the Yarn service is insufficient, a memory overflow may occur and the service may break down.</p>
</div>
<div class="section" id="ALM-18013__sc2adecba1bf249c99df1576ae8bbad62"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-18013__en-us_topic_0070543510_p16612996">The direct memory of the ResourceManager instance is overused or inappropriately allocated.</p>
</div>
<div class="section" id="ALM-18013__s529cf772f4694cbd85142be098a94bb8"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-18013__en-us_topic_0070543510_p3475410"><strong id="ALM-18013__b3390253719118">Check the direct memory usage.</strong></p>
<ol id="ALM-18013__ol8351773191121"><li id="ALM-18013__li1944561819112"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18013__b2543414519112">O&amp;M &gt; Alarm<strong id="ALM-18013__b27872374104950"> &gt; Alarms</strong></strong> &gt; <strong id="ALM-18013__b2758071719112">ALM-18013 ResourceManager Direct Memory Usage Exceeds the Threshold</strong> &gt; <strong id="ALM-18013__b4689986619112">Location</strong> to check the IP address of the instance for which the alarm is generated.</span></li><li id="ALM-18013__li435352519112"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18013__b133828336458">Cluster &gt; </strong><em id="ALM-18013__i738393334512">Name of the desired cluster</em><strong id="ALM-18013__b1138223319451"> &gt; Services</strong> &gt; <strong id="ALM-18013__b3159121119112">Yarn</strong> &gt; <strong id="ALM-18013__b1588544719112">Instance</strong> &gt; <strong id="ALM-18013__b875129719112">ResourceManager (IP address for which the alarm is generated)</strong>. Click the drop-down menu in the upper right corner of <strong id="ALM-18013__b3273144141318">Chart</strong> and choose <strong id="ALM-18013__b7246166191312">Customize</strong> &gt; <strong id="ALM-18013__b44597319295">Memory Usage Status of ResourceManager </strong>to check the direct memory usage.</span></li><li id="ALM-18013__li6388479419112"><span>Check whether the direct memory used by ResourceManager reaches the threshold (by default, 90% of the maximum direct memory specified for ResourceManager).</span><p><ul class="subitemlist" id="ALM-18013__ul4438101219112"><li id="ALM-18013__li1709124919112">If yes, go to <a href="#ALM-18013__li3060052619112">4</a>.</li><li id="ALM-18013__li4221392519112">If no, go to <a href="#ALM-18013__li1521968019112">9</a>.</li></ul>
</p></li><li id="ALM-18013__li3060052619112"><a name="ALM-18013__li3060052619112"></a><a name="li3060052619112"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-18013__b86310394453">Cluster &gt; </strong><em id="ALM-18013__i5633153954519">Name of the desired cluster</em><strong id="ALM-18013__b963114396455"> &gt; Services</strong> &gt; <strong id="ALM-18013__b728584819112">Yarn</strong> &gt; <strong id="ALM-18013__b6557263319112">Configurations</strong> &gt; <strong id="ALM-18013__b5328279019112">All</strong> <strong id="ALM-18013__b11114133017295">Configurations</strong> &gt; <strong id="ALM-18013__b978306219112">ResourceManager</strong> &gt; <strong id="ALM-18013__b2093870019112">System</strong> and check whether <strong id="ALM-18013__b2072813293537">-XX:MaxDirectMemorySize</strong> exists in the <strong id="ALM-18013__b79901348114618">GC_OPTS</strong> parameter.</span><p><ul class="subitemlist" id="ALM-18013__ul37221917135012"><li id="ALM-18013__li1772210177502">If yes, go to <a href="#ALM-18013__li28439618491">5</a>.</li><li id="ALM-18013__li27221017115012">If no, go to <a href="#ALM-18013__li558791074715">7</a>.</li></ul>
</p></li><li id="ALM-18013__li28439618491"><a name="ALM-18013__li28439618491"></a><a name="li28439618491"></a><span>In the <strong id="ALM-18013__b14927131615493">GC_OPTS</strong> parameter, delete <strong id="ALM-18013__b8801682536">-XX:MaxDirectMemorySize</strong>.</span></li><li id="ALM-18013__li696928219112"><span>Save the configuration and restart the ResourceManager instance.</span></li><li id="ALM-18013__li558791074715"><a name="ALM-18013__li558791074715"></a><a name="li558791074715"></a><span>Check whether the alarm <strong id="ALM-18013__b138981356111818">ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold</strong> exists.</span><p><ul class="subitemlist" id="ALM-18013__ul1122918311477"><li id="ALM-18013__li16229133134710">If yes, handle the alarm by referring to <strong id="ALM-18013__b1448741011195">ALM-18008 Heap Memory Usage of ResourceManager Exceeds the Threshold</strong>.</li><li id="ALM-18013__li13229103110472">If no, go to <a href="#ALM-18013__li2441573219112">8</a>.</li></ul>
</p></li><li id="ALM-18013__li2441573219112"><a name="ALM-18013__li2441573219112"></a><a name="li2441573219112"></a><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-18013__ul2432805919112"><li id="ALM-18013__li6272354519112">If yes, no further action is required.</li><li id="ALM-18013__li4744236019112">If no, go to <a href="#ALM-18013__li1521968019112">9</a>.</li></ul>
</p></li></ol>
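<p>The edit made to <strong>GC_OPTS</strong> in the steps above amounts to removing one JVM option from an option string. The following sketch shows that string edit only. The function name <strong>strip_max_direct_memory</strong> and the sample <strong>GC_OPTS</strong> value are hypothetical, and the sketch assumes that options are separated by whitespace and contain no quoted spaces. In practice the change is made on the configuration page as described in the procedure.</p>

```python
def strip_max_direct_memory(gc_opts):
    """Remove any -XX:MaxDirectMemorySize option from a JVM option string.

    Hypothetical helper: assumes whitespace-separated options with no
    quoted spaces inside a single option.
    """
    flag = "-XX:MaxDirectMemorySize"
    kept = [opt for opt in gc_opts.split() if not opt.startswith(flag)]
    return " ".join(kept)


# Made-up GC_OPTS value, used only to show the effect of the edit.
sample = "-Xms2G -Xmx2G -XX:MaxDirectMemorySize=512M -XX:+UseG1GC"
print(strip_max_direct_memory(sample))
```

<p>When the option is absent, the JVM chooses the direct memory limit automatically, which may be why the procedure then checks the heap memory alarm.</p>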
<p class="tableheading" id="ALM-18013__p1762594019112"><strong id="ALM-18013__b36945062191130">Collect fault information.</strong></p>
<ol start="9" id="ALM-18013__ol1029931191133"><li id="ALM-18013__li1521968019112"><a name="ALM-18013__li1521968019112"></a><a name="li1521968019112"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-18013__b825517545297">O&amp;M</strong> &gt; <strong id="ALM-18013__b3151723719112">Log &gt; Download</strong>.</span></li><li id="ALM-18013__li6544505519112"><span>For <strong id="ALM-18013__b2483459319112">Service</strong>, select <strong id="ALM-18013__b275939919112">ResourceManager</strong> in the required cluster.</span></li><li id="ALM-18013__li1145664103113"><span>Click <span><img id="ALM-18013__image1945644173117" src="en-us_image_0269417400.png"></span> in the upper right corner, and set <strong id="ALM-18013__b6456941173117">Start Date</strong> and <strong id="ALM-18013__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-18013__b13456164113319">Download</strong>.</span></li><li id="ALM-18013__li1040648319112"><span>Contact the <span id="ALM-18013__text4614151421417">O&amp;M personnel</span> and send them the collected logs.</span></li></ol>
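<p>The log collection window in the steps above can be worked out as follows. This is a small sketch under stated assumptions: the function name <strong>log_window</strong> is hypothetical, the alarm time in the example is made up, and the only value taken from this topic is the 10-minute margin on each side of the alarm generation time.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Made-up alarm generation time, for illustration only.
start, end = log_window(datetime(2022, 12, 13, 12, 3, 34))
print("Start Date:", start)
print("End Date:  ", end)
```

<p>For this example the window runs from 11:53:34 to 12:13:34 on the same day.</p>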
</div>
<div class="section" id="ALM-18013__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-18013__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-18013__s026be09b19a347abaff98dc5beaa02cc"><h4 class="sectiontitle">Related Information</h4><p id="ALM-18013__en-us_topic_0070543510_p11009764">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>