forked from docs/doc-exports
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14015"></a><a name="ALM-14015"></a>
<h1 class="topictitle1">ALM-14015 DataNode GC Time Exceeds the Threshold</h1>
<div id="body4800306"><div class="section" id="ALM-14015__se87ce95b1b6745239d9d24568564a13a"><h4 class="sectiontitle">Description</h4><p id="ALM-14015__en-us_topic_0070543652_p49601744">The system checks the garbage collection (GC) duration of the DataNode process every 60 seconds. This alarm is generated when the GC duration exceeds the threshold (12 seconds by default).</p>
<p id="ALM-14015__en-us_topic_0070543652_p43762514">This alarm is cleared when the GC duration is less than the threshold.</p>
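<p>The generate and clear rules above can be sketched as a small state function. This is only an illustration under stated assumptions: the name <code>evaluate_alarm</code> and the millisecond unit are hypothetical, and keeping the previous state when the duration equals the threshold is an assumption, because the text does not define that case. It does not reproduce the actual FusionInsight Manager implementation.</p>

```python
# Hypothetical sketch of the alarm rule described above; not the product's code.
DEFAULT_THRESHOLD_MS = 12_000  # 12 seconds, the default threshold stated in the text
CHECK_INTERVAL_S = 60          # the system samples the GC duration every 60 seconds


def evaluate_alarm(gc_duration_ms, threshold_ms=DEFAULT_THRESHOLD_MS, alarm_active=False):
    """Return whether the alarm should be active after one 60-second sample."""
    if gc_duration_ms > threshold_ms:
        # GC duration exceeds the threshold: the alarm is generated.
        return True
    if gc_duration_ms < threshold_ms:
        # GC duration is less than the threshold: the alarm is cleared.
        return False
    # Exactly equal: not specified in the text, so keep the previous state (assumption).
    return alarm_active
```

<p>For example, <code>evaluate_alarm(13000)</code> reports an active alarm, while <code>evaluate_alarm(11000, alarm_active=True)</code> reports a cleared one.</p>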
</div>
<div class="section" id="ALM-14015__s20085e09176145a491f18dd229b2f790"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14015__en-us_topic_0070543652_table55102727" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14015__en-us_topic_0070543652_row23476074"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14015__en-us_topic_0070543652_p22513874">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14015__en-us_topic_0070543652_p11684490">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14015__en-us_topic_0070543652_p6919610">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14015__en-us_topic_0070543652_row23617547"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14015__en-us_topic_0070543652_p33973156">14015</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14015__en-us_topic_0070543652_p362286">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14015__en-us_topic_0070543652_p29345233">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14015__sbc401981fe53416b9b24e997eb1c39e7"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14015__en-us_topic_0070543652_table28153652" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14015__en-us_topic_0070543652_row24226359"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14015__en-us_topic_0070543652_p16178030">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14015__en-us_topic_0070543652_p35352074">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14015__row183701627173418"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14015__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14015__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14015__en-us_topic_0070543652_row44945779"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14015__en-us_topic_0070543652_p16729488">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14015__en-us_topic_0070543652_p12911306">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14015__en-us_topic_0070543652_row49092893"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14015__en-us_topic_0070543652_p17101388">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14015__en-us_topic_0070543652_p43035212">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14015__en-us_topic_0070543652_row51772594"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14015__en-us_topic_0070543652_p32830574">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14015__en-us_topic_0070543652_p42030798">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14015__en-us_topic_0070543652_row42732863"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14015__en-us_topic_0070543652_p38809873">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14015__en-us_topic_0070543652_p56591995">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14015__sb9cc35fa136b413785522f01caba819f"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14015__en-us_topic_0070543652_p20548902">A long GC duration of the DataNode process may interrupt the services.</p>
</div>
<div class="section" id="ALM-14015__s07f5ea3e33634ab19eb19624c00fae23"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-14015__en-us_topic_0070543652_p53848378">The heap memory of the DataNode instance is overused or inappropriately allocated, which causes frequent GCs.</p>
</div>
<div class="section" id="ALM-14015__scaddaa7c4f494e7ab880936d375601be"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14015__en-us_topic_0070543652_p66751325"><strong id="ALM-14015__b918346193532">Check the GC duration.</strong></p>
<ol id="ALM-14015__ol2021026193552"><li id="ALM-14015__li3749189193538"><span>On the FusionInsight Manager portal, choose <strong id="ALM-14015__b0989163143314">O&M</strong> > <strong id="ALM-14015__b1098973103314">Alarm </strong>> <strong id="ALM-14015__b39898316336">Alarms</strong>. On the displayed interface, click the drop-down button of <strong id="ALM-14015__b75824883315">ALM-14015 DataNode GC Time Exceeds the Threshold</strong>. Then check the role name in <strong id="ALM-14015__b14790172183618">Location </strong>and confirm the IP address of the instance.</span></li><li id="ALM-14015__li2008564493538"><span>On the FusionInsight Manager portal, choose <strong id="ALM-14015__b1091918103615">Cluster > </strong><em id="ALM-14015__i111191810362">Name of the desired cluster</em><strong id="ALM-14015__b2101818103612"> > Services</strong> > <strong id="ALM-14015__b1694433693538">HDFS</strong> > <strong id="ALM-14015__b1828130293538">Instance</strong> > <strong id="ALM-14015__b3031399493538">DataNode (IP address for which the alarm is generated)</strong>. 
Click the drop-down menu in the upper right corner of <strong id="ALM-14015__b3273144141318">Chart</strong>, choose <strong id="ALM-14015__b7246166191312">Customize</strong> > <strong id="ALM-14015__b15702441192211">Garbage Collection</strong>, and select<strong id="ALM-14015__b540245213164"> </strong><strong id="ALM-14015__b3190969793121"></strong><strong id="ALM-14015__b3951444093538">DataNode Garbage Collection (GC)</strong> to check the GC duration statistics of the DataNode process collected every minute.</span></li><li id="ALM-14015__li6635855893538"><span>Check whether the GC duration of the DataNode process collected every minute exceeds the threshold (12 seconds by default).</span><p><ul class="subitemlist" id="ALM-14015__ul2228625493538"><li id="ALM-14015__li1632449293538">If yes, go to <a href="#ALM-14015__li1285468393538">4</a>.</li><li id="ALM-14015__li4721549393538">If no, go to <a href="#ALM-14015__li5362621093538">7</a>.</li></ul>
</p></li><li id="ALM-14015__li1285468393538"><a name="ALM-14015__li1285468393538"></a><a name="li1285468393538"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14015__b718112714368">Cluster > </strong><em id="ALM-14015__i15201427193612">Name of the desired cluster</em><strong id="ALM-14015__b191852711361"> > Services</strong> > <strong id="ALM-14015__b2162943793121">HDFS</strong> > <strong id="ALM-14015__b6593994694053">Configurations</strong> > <strong id="ALM-14015__b715398193121">All</strong> <strong id="ALM-14015__b6816162115232">Configurations</strong> > <strong id="ALM-14015__b5414114793538">DataNode</strong> > <strong id="ALM-14015__b1750828393538">System</strong>, and increase the heap memory values in the <strong id="ALM-14015__b888483893538">GC_OPTS</strong> parameter as required.</span><p><div class="note" id="ALM-14015__note128832410144"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14015__p14281368146">The mapping between the average number of blocks of a DataNode instance and the DataNode memory is as follows:</p>
<ul id="ALM-14015__ul19428156161410"><li id="ALM-14015__li937212918142">If the average number of blocks of a DataNode instance reaches 2,000,000, the reference values of the JVM parameters of the DataNode are as follows: -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M</li><li id="ALM-14015__li1542896131411">If the average number of blocks of a DataNode instance reaches 5,000,000, the reference values of the JVM parameters of the DataNode are as follows: -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G</li></ul>
</div></div>
</p></li><li id="ALM-14015__li4858329093538"><span>Save the configuration and restart the DataNode instance.</span></li><li id="ALM-14015__li2345579993538"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14015__ul5497087493538"><li id="ALM-14015__li3459642693538">If yes, no further action is required.</li><li id="ALM-14015__li5084711793538">If no, go to <a href="#ALM-14015__li5362621093538">7</a>.</li></ul>
</p></li></ol>
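<p>The reference mapping in the note for step 4 can be expressed as a simple lookup. The following is a minimal sketch, assuming that a block count between the two listed levels uses the values of the lower level it has reached; the function name <code>reference_gc_opts</code> is hypothetical. Block counts below 2,000,000 return <code>None</code> because the note gives no reference values for them.</p>

```python
# Reference JVM parameters copied from the note in step 4, highest tier first.
REFERENCE_TIERS = [
    (5_000_000, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
    (2_000_000, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
]


def reference_gc_opts(avg_blocks):
    """Return the reference JVM parameters for the highest tier reached, or None."""
    for min_blocks, opts in REFERENCE_TIERS:
        if avg_blocks >= min_blocks:
            return opts
    # Below the lowest listed tier the note provides no reference values.
    return None
```

<p>The returned string is only a reference value for the heap-related part of <strong>GC_OPTS</strong>. Any other options already present in the parameter should be kept when the values are changed.</p>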
<p class="tableheading" id="ALM-14015__p2497582193538"><strong id="ALM-14015__b4121244593559">Collect fault information.</strong></p>
<ol start="7" id="ALM-14015__ol444115469363"><li id="ALM-14015__li5362621093538"><a name="ALM-14015__li5362621093538"></a><a name="li5362621093538"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14015__b39977366113627">O&M</strong> > <strong id="ALM-14015__b24251979113627">Log > Download</strong>.</span></li><li id="ALM-14015__li5689914493538"><span>In <strong id="ALM-14015__b4875571993538">Service</strong>, select <strong id="ALM-14015__b1287384293538">DataNode</strong> for the required cluster.</span></li><li id="ALM-14015__li1145664103113"><span>Click <span><img id="ALM-14015__image1945644173117" src="en-us_image_0269383970.png"></span> in the upper right corner, and set <strong id="ALM-14015__b6456941173117">Start Date</strong> to 10 minutes before the alarm generation time and <strong id="ALM-14015__b11456154113318">End Date</strong> to 10 minutes after it. Then, click <strong id="ALM-14015__b13456164113319">Download</strong>.</span></li><li id="ALM-14015__li2253165693538"><span>Contact the <span id="ALM-14015__text4614151421417">O&M personnel</span> and send the collected logs.</span></li></ol>
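<p>The log collection window in step 9 is a symmetric interval around the alarm generation time. As a small illustration, the hypothetical helper <code>log_collection_window</code> below computes the two values to enter for <strong>Start Date</strong> and <strong>End Date</strong>. It is a sketch for working out the times by hand, not a FusionInsight Manager API.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin_minutes=10):
    """Return (start, end): margin_minutes before and after the alarm generation time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Example: an alarm generated at 12:00 gives a window from 11:50 to 12:10.
start, end = log_collection_window(datetime(2024, 1, 1, 12, 0))
```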
</div>
<div class="section" id="ALM-14015__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14015__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14015__s3df8b7c7bd0749448e1602cf4ffc974f"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14015__en-us_topic_0070543652_p51531231">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>