Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="alm_14002"></a><a name="alm_14002"></a>
<h1 class="topictitle1">ALM-14002 DataNode Disk Usage Exceeds the Threshold</h1>
<div id="body8662426"><div class="section" id="alm_14002__en-us_topic_0191813920_section20869327"><h4 class="sectiontitle">Description</h4><p id="alm_14002__en-us_topic_0191813920_p41580444">The system checks the DataNode disk usage every 30 seconds and compares the actual disk usage with the threshold. The <strong id="alm_14002__b147231426147">Percentage of DataNode Capacity</strong> indicator has a default threshold. This alarm is generated when the value of the <strong id="alm_14002__b17466210181512">Percentage of DataNode Capacity</strong> indicator exceeds the threshold.</p>
<p id="alm_14002__en-us_topic_0191813920_p12572829">This alarm is cleared when the value of the <strong id="alm_14002__b1732831391511">Percentage of DataNode Capacity</strong> indicator is less than or equal to the threshold.</p>
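<p>As a rough illustration of the generate and clear conditions described above, the following Python sketch models a single check. The function name and the sample threshold of 80 are hypothetical; the actual default threshold is configured in MRS and is not stated on this page.</p>

```python
def alarm_active(usage_percent, threshold_percent):
    """Return True if ALM-14002 would be active after one check.

    Mirrors the text above: the alarm is generated when usage exceeds
    the threshold and is cleared when usage is less than or equal to it.
    """
    return usage_percent > threshold_percent


if __name__ == "__main__":
    threshold = 80.0  # hypothetical value, for illustration only
    for sample in (79.9, 80.0, 80.1):
        state = "generated" if alarm_active(sample, threshold) else "cleared"
        print(sample, state)
```

<p>Note that a value exactly equal to the threshold does not generate the alarm.</p>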
</div>
<div class="section" id="alm_14002__en-us_topic_0191813920_section53606218"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_14002__en-us_topic_0191813920_table11766267" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_14002__en-us_topic_0191813920_row7304143"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="alm_14002__en-us_topic_0191813920_p54764719">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="alm_14002__en-us_topic_0191813920_p6757235">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="alm_14002__en-us_topic_0191813920_p10465156">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_14002__en-us_topic_0191813920_row42371273"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="alm_14002__en-us_topic_0191813920_p9521066">14002</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="alm_14002__en-us_topic_0191813920_p33008913">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="alm_14002__en-us_topic_0191813920_p56476259">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_14002__en-us_topic_0191813920_section12693918"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_14002__en-us_topic_0191813920_table11174282" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_14002__en-us_topic_0191813920_row15876907"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="alm_14002__en-us_topic_0191813920_p10961125">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="alm_14002__en-us_topic_0191813920_p15435960">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_14002__en-us_topic_0191813920_row42353227"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14002__en-us_topic_0191813920_p8059334">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14002__en-us_topic_0191813920_p48826322">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_14002__en-us_topic_0191813920_row36783718"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14002__en-us_topic_0191813920_p26691149">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14002__en-us_topic_0191813920_p14499481">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_14002__en-us_topic_0191813920_row63386473"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14002__en-us_topic_0191813920_p34030663">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14002__en-us_topic_0191813920_p5020285">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_14002__en-us_topic_0191813920_row45182569"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14002__en-us_topic_0191813920_p35909463">Trigger condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14002__en-us_topic_0191813920_p22985394">Generates an alarm when the actual indicator value exceeds the specified threshold.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_14002__en-us_topic_0191813920_section47136405"><h4 class="sectiontitle">Impact on the System</h4><p id="alm_14002__en-us_topic_0191813920_p49877601">Insufficient disk space affects HDFS read and write operations.</p>
</div>
<div class="section" id="alm_14002__en-us_topic_0191813920_section21574462"><h4 class="sectiontitle">Possible Causes</h4><ul id="alm_14002__en-us_topic_0191813920_ul13553869"><li id="alm_14002__en-us_topic_0191813920_li54875957">The disk space configured for the HDFS cluster is insufficient.</li><li id="alm_14002__en-us_topic_0191813920_li24121565">Data skew occurs among DataNodes.</li></ul>
</div>
<div class="section" id="alm_14002__en-us_topic_0191813920_section59952436"><h4 class="sectiontitle">Procedure</h4><ol id="alm_14002__en-us_topic_0191813920_ol2654089315352"><li class="tableheading" id="alm_14002__en-us_topic_0191813920_li3903743315352"><span>Check the cluster disk capacity.</span><p><ol type="a" id="alm_14002__en-us_topic_0191813920_ol2098632"><li id="alm_14002__en-us_topic_0191813920_li18887690">Go to the MRS cluster details page. On the <strong id="alm_14002__b9219154217166">Alarms</strong> page, check whether the ALM-14001 HDFS Disk Usage Exceeds the Threshold alarm exists.<ul class="subitemlist" id="alm_14002__en-us_topic_0191813920_ul35771490"><li id="alm_14002__en-us_topic_0191813920_li53507960">If yes, go to <a href="#alm_14002__en-us_topic_0191813920_yt2">1.b</a>.</li><li id="alm_14002__en-us_topic_0191813920_li39177514">If no, go to <a href="#alm_14002__en-us_topic_0191813920_li64268160">2.a</a>.</li></ul>
</li><li id="alm_14002__en-us_topic_0191813920_yt2"><a name="alm_14002__en-us_topic_0191813920_yt2"></a><a name="en-us_topic_0191813920_yt2"></a>Handle the alarm by following the instructions in ALM-14001 HDFS Disk Usage Exceeds the Threshold, and then check whether that alarm is cleared.<ul class="subitemlist" id="alm_14002__en-us_topic_0191813920_ul16725372"><li id="alm_14002__en-us_topic_0191813920_li16310625">If yes, go to <a href="#alm_14002__en-us_topic_0191813920_yt3">1.c</a>.</li><li id="alm_14002__en-us_topic_0191813920_li46092238">If no, go to <a href="#alm_14002__en-us_topic_0191813920_li572522141314">3</a>.</li></ul>
</li><li id="alm_14002__en-us_topic_0191813920_yt3"><a name="alm_14002__en-us_topic_0191813920_yt3"></a><a name="en-us_topic_0191813920_yt3"></a>Wait 5 minutes and check whether the alarm is cleared. <ul class="subitemlist" id="alm_14002__en-us_topic_0191813920_ul46809697"><li id="alm_14002__en-us_topic_0191813920_li18634089">If yes, no further action is required.</li><li id="alm_14002__en-us_topic_0191813920_li33489078">If no, go to <a href="#alm_14002__en-us_topic_0191813920_li64268160">2.a</a>.</li></ul>
</li></ol>
</p></li><li class="tableheading" id="alm_14002__en-us_topic_0191813920_li12337129153517"><span>Check the balance status of DataNodes.</span><p><ol type="a" id="alm_14002__en-us_topic_0191813920_ol23968912151440"><li id="alm_14002__en-us_topic_0191813920_li64268160"><a name="alm_14002__en-us_topic_0191813920_li64268160"></a><a name="en-us_topic_0191813920_li64268160"></a>On a cluster node where the client is installed, run the <strong id="alm_14002__b12199152713212">hdfs dfsadmin -report</strong> command. View the value of <strong id="alm_14002__b519916279217">DFS Used%</strong> on the DataNode for which the alarm is generated, and compare it with the values on the other DataNodes. Check whether the difference between the values is greater than 10 percentage points.<ul class="subitemlist" id="alm_14002__en-us_topic_0191813920_ul41542534"><li id="alm_14002__en-us_topic_0191813920_li38338487">If yes, go to <a href="#alm_14002__en-us_topic_0191813920_step17">2.b</a>.</li><li id="alm_14002__en-us_topic_0191813920_li18409771">If no, go to <a href="#alm_14002__en-us_topic_0191813920_li572522141314">3</a>.</li></ul>
</li><li id="alm_14002__en-us_topic_0191813920_step17"><a name="alm_14002__en-us_topic_0191813920_step17"></a><a name="en-us_topic_0191813920_step17"></a>If data skew occurs, run the <strong id="alm_14002__b19737250172111">hdfs balancer -threshold 10</strong> command on a cluster node where the client is installed to rebalance data among the DataNodes.</li><li id="alm_14002__en-us_topic_0191813920_li27263117">Wait 5 minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_14002__en-us_topic_0191813920_ul44041463"><li id="alm_14002__en-us_topic_0191813920_li60828848">If yes, no further action is required.</li><li id="alm_14002__en-us_topic_0191813920_li10588721">If no, go to <a href="#alm_14002__en-us_topic_0191813920_li572522141314">3</a>.</li></ul>
</li></ol>
</p></li><li id="alm_14002__en-us_topic_0191813920_li572522141314"><a name="alm_14002__en-us_topic_0191813920_li572522141314"></a><a name="en-us_topic_0191813920_li572522141314"></a><span>Collect fault information.</span><p><ol type="a" id="alm_14002__en-us_topic_0191813920_en-us_topic_0191813935_ol6089206913036"><li id="alm_14002__en-us_topic_0191813920_en-us_topic_0191813935_li4478836213036">On MRS Manager, choose <span class="menucascade" id="alm_14002__menucascade14801132872117"><b><span class="uicontrol" id="alm_14002__uicontrol4795328102118">System</span></b> > <b><span class="uicontrol" id="alm_14002__uicontrol16800028162112">Export Log</span></b></span>.</li><li id="alm_14002__li18574327401">Contact technical support engineers for help. For details, see <a href="https://docs.otc.t-systems.com/en-us/public/learnmore.html" target="_blank" rel="noopener noreferrer">technical support</a>.</li></ol>
</p></li></ol>
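<p>The comparison in step 2.a can also be scripted. The following Python sketch is one possible approach, not part of MRS: it assumes that each DataNode block in the <strong>hdfs dfsadmin -report</strong> output contains a <strong>Hostname:</strong> line followed later by a <strong>DFS Used%:</strong> line, and all function names are hypothetical.</p>

```python
import subprocess


def fetch_report():
    """Run `hdfs dfsadmin -report` on a node with the HDFS client installed."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_dfs_used(report):
    """Map each DataNode hostname to its DFS Used% value.

    Assumes a "Hostname:" line precedes the "DFS Used%:" line in every
    DataNode block. The cluster-wide summary at the top has no hostname
    and is therefore skipped.
    """
    usage = {}
    host = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("DFS Used%:") and host is not None:
            value = line.split(":", 1)[1].strip().rstrip("%")
            usage[host] = float(value)
            host = None
    return usage


def max_gap(usage, host):
    """Largest difference in DFS Used% between `host` and any other DataNode."""
    others = [v for h, v in usage.items() if h != host]
    if host not in usage or not others:
        return 0.0
    return max(abs(usage[host] - v) for v in others)


def needs_balancing(usage, host, limit=10.0):
    """True if the gap exceeds `limit` percentage points (10, as in step 2.a)."""
    return max_gap(usage, host) > limit
```

<p>For example, <strong>needs_balancing(parse_dfs_used(fetch_report()), "node1")</strong>, where <strong>node1</strong> is a placeholder for the host named in the alarm, returns <strong>True</strong> when step 2.b applies.</p>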
</div>
<div class="section" id="alm_14002__en-us_topic_0191813920_section2701015"><h4 class="sectiontitle">Reference</h4><p id="alm_14002__en-us_topic_0191813920_p18303590">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0241.html">Alarm Reference (Applicable to Versions Earlier Than MRS 3.x)</a></div>
</div>
</div>