Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-18010"></a><a name="ALM-18010"></a>
<h1 class="topictitle1">ALM-18010 ResourceManager GC Time Exceeds the Threshold</h1>
<div id="body2407664"><div class="section" id="ALM-18010__sf91c4cb08e92432ba8701f07eeb7c62a"><h4 class="sectiontitle">Description</h4><p id="ALM-18010__en-us_topic_0070543507_p42526340">The system checks the garbage collection (GC) duration of the ResourceManager process every 60 seconds. This alarm is generated when the GC duration exceeds the threshold (12 seconds by default).</p>
<p id="ALM-18010__en-us_topic_0070543507_p47192740">This alarm is cleared when the GC duration is less than the threshold.</p>
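<p>The generate-and-clear rule described above can be sketched as follows. This is only an illustration, not the FusionInsight Manager implementation: the function name <code>alarm_transitions</code> and the list of per-check samples are hypothetical, and a sample exactly equal to the threshold is assumed to leave the alarm state unchanged, because the text only says "exceeds" and "less than".</p>

```python
# Hypothetical sketch of the alarm rule: one GC duration sample (in seconds)
# per 60-second check, compared against the default 12-second threshold.
DEFAULT_THRESHOLD_S = 12.0


def alarm_transitions(samples_s, threshold_s=DEFAULT_THRESHOLD_S):
    """Return (check_index, event) pairs for each alarm state change."""
    active = False
    events = []
    for index, gc_s in enumerate(samples_s):
        if not active and gc_s > threshold_s:
            # GC duration exceeds the threshold: the alarm is generated.
            active = True
            events.append((index, "generated"))
        elif active and gc_s < threshold_s:
            # GC duration is less than the threshold: the alarm is cleared.
            active = False
            events.append((index, "cleared"))
    return events


events = alarm_transitions([3, 13, 12, 15, 5])
```

<p>In this sketch, the sample equal to 12 seconds neither generates nor clears the alarm. That behavior is an assumption drawn from the wording above.</p>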
</div>
<div class="section" id="ALM-18010__saea7515081244e9f8e093bcc5694e880"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18010__en-us_topic_0070543507_table64515557" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18010__en-us_topic_0070543507_row25299820"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-18010__en-us_topic_0070543507_p36019552">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-18010__en-us_topic_0070543507_p31902563">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-18010__en-us_topic_0070543507_p33970848">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18010__en-us_topic_0070543507_row175293"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-18010__en-us_topic_0070543507_p14198754">18010</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-18010__en-us_topic_0070543507_p9248440">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-18010__en-us_topic_0070543507_p10926182">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18010__s2ebb175fbce9424fa16a9b551febba83"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18010__en-us_topic_0070543507_table12605589" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18010__en-us_topic_0070543507_row43066942"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-18010__en-us_topic_0070543507_p65870306">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-18010__en-us_topic_0070543507_p33894570">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18010__row52132417223"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18010__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18010__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18010__en-us_topic_0070543507_row61105663"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18010__en-us_topic_0070543507_p50611656">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18010__en-us_topic_0070543507_p5903470">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18010__en-us_topic_0070543507_row53131231"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18010__en-us_topic_0070543507_p8662426">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18010__en-us_topic_0070543507_p30567910">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18010__en-us_topic_0070543507_row6675738"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18010__en-us_topic_0070543507_p3863873">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18010__en-us_topic_0070543507_p44538317">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18010__en-us_topic_0070543507_row65300541"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18010__en-us_topic_0070543507_p54852443">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18010__en-us_topic_0070543507_p13862865">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18010__s14922de1e0484d07aaa11b217c599e48"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-18010__en-us_topic_0070543507_p49150280">Long GC pauses in the ResourceManager process may interrupt services.</p>
</div>
<div class="section" id="ALM-18010__en-us_topic_0070543507_section720995"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-18010__en-us_topic_0070543507_p21749735">The heap memory of the ResourceManager instance is overused or inappropriately allocated. As a result, GCs occur frequently.</p>
</div>
<div class="section" id="ALM-18010__sab506c6232b84c518c74e8a6c65f3f95"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-18010__en-us_topic_0070543507_p16898135"><strong id="ALM-18010__b42384480185526">Check the GC duration.</strong></p>
<ol id="ALM-18010__ol59061512185531"><li id="ALM-18010__li40096486185521"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18010__b8800327185521">O&M > Alarm<strong id="ALM-18010__b27872374104950"> > Alarms</strong></strong> > <strong id="ALM-18010__b12094081185521">ALM-18010 ResourceManager GC Time Exceeds the Threshold</strong> > <strong id="ALM-18010__b41737867185521">Location</strong> to check the IP address of the instance for which the alarm is generated.</span></li><li id="ALM-18010__li30891561185521"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18010__b12738113194213">Cluster </strong>> <em id="ALM-18010__i127398319421">Name of the desired cluster</em><strong id="ALM-18010__b167381531134212"> > Services</strong> > <strong id="ALM-18010__b26589948185521">Yarn</strong> > <strong id="ALM-18010__b37982945185521">Instance</strong> > <strong id="ALM-18010__b1715782810187">ResourceManager (IP address for which the alarm is generated)</strong>. Click the drop-down menu in the upper right corner of <strong id="ALM-18010__b3273144141318">Chart</strong> and choose <strong id="ALM-18010__b7246166191312">Customize</strong> > <strong id="ALM-18010__b40715097185521">Garbage Collection (GC) Time of ResourceManager</strong> to check the GC duration statistics of the ResourceManager process collected every minute.</span></li><li id="ALM-18010__li66468998185521"><span>Check whether the GC duration of the ResourceManager process collected every minute exceeds the threshold (12 seconds by default).</span><p><ul class="subitemlist" id="ALM-18010__ul29755065185521"><li id="ALM-18010__li19188475185521">If yes, go to <a href="#ALM-18010__li52460707185521">4</a>.</li><li id="ALM-18010__li10762658185521">If no, go to <a href="#ALM-18010__li2721601185521">7</a>.</li></ul>
</p></li><li id="ALM-18010__li52460707185521"><a name="ALM-18010__li52460707185521"></a><a name="li52460707185521"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-18010__b1323934784215">Cluster </strong>> <em id="ALM-18010__i4242184713423">Name of the desired cluster</em> > <strong id="ALM-18010__b61350074185521">Services</strong> > <strong id="ALM-18010__b15279758185521">Yarn</strong> > <strong id="ALM-18010__b3300099185521">Configurations</strong> > <strong id="ALM-18010__b29700899185521">All</strong> <strong id="ALM-18010__b1768963720193">Configurations </strong>> <strong id="ALM-18010__b65981501185521">ResourceManager</strong> > <strong id="ALM-18010__b56962604185521">System</strong>, and increase the JVM heap memory settings in the <strong id="ALM-18010__b50568210185521">GC_OPTS</strong> parameter as required.</span><p><div class="note" id="ALM-18010__note73095218437"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-18010__p189729915442">The mapping between the number of NodeManager instances in a cluster and the memory size of ResourceManager is as follows:</p>
<ul id="ALM-18010__ul1797219164417"><li id="ALM-18010__li19972159184412">If the number of NodeManager instances in the cluster reaches 100, the recommended JVM parameters of the ResourceManager instance are as follows: -Xms4G -Xmx4G -XX:NewSize=512M -XX:MaxNewSize=1G</li><li id="ALM-18010__li09721496446">If the number of NodeManager instances in the cluster reaches 200, the recommended JVM parameters of the ResourceManager instance are as follows: -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=1G</li><li id="ALM-18010__li09725914444">If the number of NodeManager instances in the cluster reaches 500, the recommended JVM parameters of the ResourceManager instance are as follows: -Xms10G -Xmx10G -XX:NewSize=1G -XX:MaxNewSize=2G</li><li id="ALM-18010__li9972179124412">If the number of NodeManager instances in the cluster reaches 1000, the recommended JVM parameters of the ResourceManager instance are as follows: -Xms20G -Xmx20G -XX:NewSize=1G -XX:MaxNewSize=2G</li><li id="ALM-18010__li14972591442">If the number of NodeManager instances in the cluster reaches 2000, the recommended JVM parameters of the ResourceManager instance are as follows: -Xms40G -Xmx40G -XX:NewSize=2G -XX:MaxNewSize=4G</li><li id="ALM-18010__li8972149114416">If the number of NodeManager instances in the cluster reaches 3000, the recommended JVM parameters of the ResourceManager instance are as follows: -Xms60G -Xmx60G -XX:NewSize=2G -XX:MaxNewSize=4G</li><li id="ALM-18010__li360719137440">If the number of NodeManager instances in the cluster reaches 4000, the recommended JVM parameters of the ResourceManager instance are as follows: -Xms80G -Xmx80G -XX:NewSize=2G -XX:MaxNewSize=4G</li><li id="ALM-18010__li159729915440">If the number of NodeManager instances in the cluster reaches 5000, the recommended JVM parameters of the ResourceManager instance are as follows: -Xms100G -Xmx100G -XX:NewSize=3G -XX:MaxNewSize=6G</li></ul>
</div></div>
</p></li><li id="ALM-18010__li2384316185521"><span>Save the configuration and restart the ResourceManager instance.</span></li><li id="ALM-18010__li40968677185521"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-18010__ul7133822185521"><li id="ALM-18010__li21458846185521">If yes, no further action is required.</li><li id="ALM-18010__li60444970185521">If no, go to <a href="#ALM-18010__li2721601185521">7</a>.</li></ul>
</p></li></ol>
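<p>The NodeManager-to-memory mapping in the note of step 4 can be expressed as a small lookup, sketched below. The option strings are copied from the note. The function name <code>recommended_jvm_opts</code> is hypothetical, and the tier selection is an assumption: "reaches" is read as "greater than or equal to", so a cluster between two tiers gets the options of the highest tier it has reached. Choose the next tier up instead if more headroom is preferred.</p>

```python
# Recommended ResourceManager JVM options, keyed by the number of
# NodeManager instances, copied from the note in step 4.
RECOMMENDED_JVM_OPTS = [
    (100, "-Xms4G -Xmx4G -XX:NewSize=512M -XX:MaxNewSize=1G"),
    (200, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=1G"),
    (500, "-Xms10G -Xmx10G -XX:NewSize=1G -XX:MaxNewSize=2G"),
    (1000, "-Xms20G -Xmx20G -XX:NewSize=1G -XX:MaxNewSize=2G"),
    (2000, "-Xms40G -Xmx40G -XX:NewSize=2G -XX:MaxNewSize=4G"),
    (3000, "-Xms60G -Xmx60G -XX:NewSize=2G -XX:MaxNewSize=4G"),
    (4000, "-Xms80G -Xmx80G -XX:NewSize=2G -XX:MaxNewSize=4G"),
    (5000, "-Xms100G -Xmx100G -XX:NewSize=3G -XX:MaxNewSize=6G"),
]


def recommended_jvm_opts(node_manager_count):
    """Return the options of the highest tier reached, or None below 100.

    Assumption: "reaches N" means node_manager_count >= N. The note gives
    no recommendation for clusters with fewer than 100 NodeManagers.
    """
    best = None
    for tier, opts in RECOMMENDED_JVM_OPTS:
        if node_manager_count >= tier:
            best = opts
    return best


opts_for_750 = recommended_jvm_opts(750)
```

<p>The returned string covers only the heap-related options. Any other options already present in <strong>GC_OPTS</strong> should be kept when the value is edited.</p>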
<p class="tableheading" id="ALM-18010__p64204398185521"><strong id="ALM-18010__b6414535185544">Collect fault information.</strong></p>
<ol start="7" id="ALM-18010__ol61582939185541"><li id="ALM-18010__li2721601185521"><a name="ALM-18010__li2721601185521"></a><a name="li2721601185521"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-18010__b8284162132113">O&M</strong> > <strong id="ALM-18010__b30128561185521">Log > Download</strong>.</span></li><li id="ALM-18010__li5467477185521"><span>Select <strong id="ALM-18010__b24494410185521">ResourceManager</strong> in the required cluster from <strong id="ALM-18010__b19123103185521">Service</strong>.</span></li><li id="ALM-18010__li1145664103113"><span>Click <span><img id="ALM-18010__image1945644173117" src="en-us_image_0269417397.png"></span> in the upper right corner, and set <strong id="ALM-18010__b6456941173117">Start Date</strong> and <strong id="ALM-18010__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-18010__b13456164113319">Download</strong>.</span></li><li id="ALM-18010__li29218203185521"><span>Contact <span id="ALM-18010__text4614151421417">O&M personnel</span> and send the collected logs.</span></li></ol>
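<p>The log collection window in step 9 is simple date arithmetic, sketched below. The function name <code>log_window</code> and the sample alarm time are hypothetical. The 10-minute margin comes from the step itself.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin=timedelta(minutes=10)):
    """Return (start, end): 10 minutes before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
start, end = log_window(datetime(2024, 1, 1, 12, 0))
```

<p>The resulting values correspond to <strong>Start Date</strong> and <strong>End Date</strong> on the log download page.</p>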
</div>
<div class="section" id="ALM-18010__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-18010__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-18010__s0dec1f25b0b84c849fc99cb116e4984e"><h4 class="sectiontitle">Related Information</h4><p id="ALM-18010__en-us_topic_0070543507_p19013575">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>