Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14023"></a><a name="ALM-14023"></a>
<h1 class="topictitle1">ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold</h1>
<div id="body1512605981997"><div class="section" id="ALM-14023__section57262109102848"><h4 class="sectiontitle">Description</h4><p id="ALM-14023__p7719242102848">Every 30 seconds, the system calculates the percentage of total reserved disk space for replicas as Total reserved disk space for replicas/(Total reserved disk space for replicas + Total remaining disk space) and compares it with the threshold (<strong id="ALM-14023__b2364316102848">90%</strong> by default). This alarm is generated when the percentage exceeds the threshold in the number of consecutive checks specified by <strong id="ALM-14023__b48421890111935">Trigger Count</strong>.</p>
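<p>The percentage calculation and the <strong>Trigger Count</strong> rule described above can be sketched as follows. This is a minimal illustration, not the FusionInsight Manager implementation: the function names and the sample figures (in GB) are hypothetical, and the sketch assumes that a check "exceeds" the threshold only when the percentage is strictly greater than it.</p>

```python
from typing import Sequence


def reserved_replica_percentage(reserved_for_replicas: float, remaining: float) -> float:
    """Reserved space for replicas / (reserved space for replicas + remaining space), in percent."""
    total = reserved_for_replicas + remaining
    if total <= 0:
        raise ValueError("reserved + remaining space must be positive")
    return 100.0 * reserved_for_replicas / total


def alarm_triggered(percentages: Sequence[float], threshold: float = 90.0, trigger_count: int = 1) -> bool:
    """True if each of the last `trigger_count` checks (one every 30 s) exceeded the threshold."""
    if trigger_count < 1 or len(percentages) < trigger_count:
        return False
    return all(p > threshold for p in percentages[-trigger_count:])


# Hypothetical (reserved for replicas, remaining) pairs in GB from three consecutive checks.
checks = [(900, 120), (950, 80), (980, 60)]
percentages = [reserved_replica_percentage(r, rem) for r, rem in checks]
print([round(p, 1) for p in percentages])  # [88.2, 92.2, 94.2]
print(alarm_triggered(percentages, threshold=90.0, trigger_count=2))  # True
```

<p>With <strong>Trigger Count</strong> set to 2, the first check (88.2%) does not count, but the next two checks both exceed 90%, so the alarm would be generated.</p>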
<p id="ALM-14023__p45865061102848">The alarm is cleared in either of the following scenarios:</p><ul><li>The value of <strong id="ALM-14023__b4848144214381">Trigger Count</strong> is <strong id="ALM-14023__b24082467102848">1</strong>, and the percentage of total reserved disk space for replicas is less than or equal to the threshold.</li><li>The value of <strong id="ALM-14023__b1563614423818">Trigger Count</strong> is greater than <strong id="ALM-14023__b4522850102848">1</strong>, and the percentage of total reserved disk space for replicas is less than or equal to 90% of the threshold.</li></ul>
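<p>When <strong>Trigger Count</strong> is greater than 1, the clearing rule adds hysteresis: the percentage must fall to 90% of the threshold, not merely back to the threshold. The following sketch simulates the alarm state over a sequence of check results. It is illustrative only; the function names are hypothetical, and it assumes the alarm is raised once <strong>Trigger Count</strong> consecutive checks exceed the threshold.</p>

```python
from typing import Iterable, List


def should_clear(percentage: float, threshold: float, trigger_count: int) -> bool:
    """Clearing rule: at or below the threshold if Trigger Count is 1, else at or below 90% of it."""
    if trigger_count == 1:
        return percentage <= threshold
    return percentage <= 0.9 * threshold


def simulate(samples: Iterable[float], threshold: float = 90.0, trigger_count: int = 2) -> List[bool]:
    """Return the alarm state (True = active) after each 30-second check."""
    active = False
    consecutive = 0  # number of consecutive checks above the threshold
    states = []
    for p in samples:
        consecutive = consecutive + 1 if p > threshold else 0
        if not active and consecutive >= trigger_count:
            active = True
        elif active and should_clear(p, threshold, trigger_count):
            active = False
        states.append(active)
    return states


print(simulate([92.0, 93.0, 85.0, 80.0], threshold=90.0, trigger_count=2))
# [False, True, True, False]
```

<p>With the default 90% threshold and a <strong>Trigger Count</strong> of 2, a reading of 85% keeps the alarm active because it is still above 81% (90% of 90%); only the 80% reading clears it.</p>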
</div>
<div class="section" id="ALM-14023__section40705658102848"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14023__table8823964102848" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14023__row50528799102848"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14023__p66300932102848">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14023__p1666397102848">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14023__p760467102848">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14023__row61597837102848"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14023__p23368883102848">14023</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14023__p13831407102848">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14023__p46602150102848">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14023__section16677826102848"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14023__table8726699102848" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14023__row41024789102848"><th align="left" class="cellrowborder" valign="top" width="44.440000000000005%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14023__p34673604102848">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="55.559999999999995%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14023__p57098564102848">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14023__row162101183335"><td class="cellrowborder" valign="top" width="44.440000000000005%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14023__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="55.559999999999995%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14023__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14023__row61580989102848"><td class="cellrowborder" valign="top" width="44.440000000000005%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14023__p22004182102848">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="55.559999999999995%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14023__p37508279102848">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14023__row2030194102848"><td class="cellrowborder" valign="top" width="44.440000000000005%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14023__p30228051102848">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="55.559999999999995%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14023__p32553105102848">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14023__row175242516435"><td class="cellrowborder" valign="top" width="44.440000000000005%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14023__p1372617291401">NameServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="55.559999999999995%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14023__p5227992217318">Specifies the NameService for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14023__row24542494102848"><td class="cellrowborder" valign="top" width="44.440000000000005%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14023__p41785010102848">Trigger condition</p>
</td>
<td class="cellrowborder" valign="top" width="55.559999999999995%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14023__p29142613102848">Specifies the threshold for triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14023__section11741490102848"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14023__p11536623102848">HDFS write performance deteriorates. If all the remaining space on DataNodes is reserved for replicas, data can no longer be written to HDFS.</p>
</div>
<div class="section" id="ALM-14023__section36720745102848"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-14023__ul21590362102848"><li id="ALM-14023__li60095530102848">The alarm threshold is improperly configured.</li><li id="ALM-14023__li3988863102848">The disk space configured for the HDFS cluster is insufficient.</li><li id="ALM-14023__li35899768102848">The volume of service requests accessing HDFS is too large, which overloads DataNodes.</li></ul>
</div>
<div class="section" id="ALM-14023__section54662464102848"><h4 class="sectiontitle">Procedure</h4><p id="ALM-14023__p65583495102848"><strong id="ALM-14023__b53380543102848">Check whether the alarm threshold is appropriate.</strong></p>
<ol id="ALM-14023__ol7370157103714"><li id="ALM-14023__li28856711102848"><span>On the FusionInsight Manager portal, choose <strong id="ALM-14023__b1735017481541">O&M > Alarm > Thresholds ></strong> <em id="ALM-14023__i175981233194118">Name of the desired cluster</em> > <strong id="ALM-14023__b135384805411">HDFS</strong> > <strong id="ALM-14023__b22121736163219">Disk</strong> > <strong id="ALM-14023__b56075985102848">Percentage of Reserved Space for Replicas of Unused Space</strong> to check whether the alarm threshold is appropriate. (The default threshold is <strong id="ALM-14023__b34921825102848">90%</strong>. Users can change it as required.)</span><p><ul id="ALM-14023__ul45860971102848"><li id="ALM-14023__li10095562102848">If yes, go to <a href="#ALM-14023__li13034211102848">4</a>.</li><li id="ALM-14023__li23751198102848">If no, go to <a href="#ALM-14023__li44798865102848">2</a>.</li></ul>
</p></li><li id="ALM-14023__li44798865102848"><a name="ALM-14023__li44798865102848"></a><a name="li44798865102848"></a><span>Choose <strong id="ALM-14023__en-us_topic_0070543655_b35169757">O&M > Alarm > Thresholds ></strong> <em id="ALM-14023__i1396125519492">Name of the desired cluster</em> > <strong id="ALM-14023__en-us_topic_0070543655_b3167375">HDFS</strong> > <strong id="ALM-14023__b30986652102848">Disk</strong> > <strong id="ALM-14023__b10444412102848">Percentage of Reserved Space for Replicas of Unused Space</strong>, click <strong id="ALM-14023__b103281457102417">Modify</strong>, and change the threshold based on the actual usage.</span></li><li id="ALM-14023__li26890851102848"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14023__ul40691073102848"><li id="ALM-14023__li30675343102848">If yes, no further action is required.</li><li id="ALM-14023__li7642639102848">If no, go to <a href="#ALM-14023__li13034211102848">4</a>.</li></ul>
</p></li></ol>
<p id="ALM-14023__p42617803103722"><strong id="ALM-14023__b357104984414">Check whether an alarm indicating insufficient disk space is generated.</strong></p>
<ol start="4" id="ALM-14023__ol25959186103734"><li id="ALM-14023__li13034211102848"><a name="ALM-14023__li13034211102848"></a><a name="li13034211102848"></a><span>On the FusionInsight Manager portal, check whether <strong id="ALM-14023__b175188131255">ALM-14001 HDFS Disk Usage Exceeds the Threshold</strong> or <strong id="ALM-14023__b65581524458">ALM-14002 DataNode Disk Usage Exceeds the Threshold</strong> exists on the <strong id="ALM-14023__b24980487172449">O&M > Alarm > Alarms</strong> page.</span><p><ul id="ALM-14023__ul52701704102848"><li id="ALM-14023__li4553289102848">If yes, go to <a href="#ALM-14023__li31013859102848">5</a>.</li><li id="ALM-14023__li40979607102848">If no, go to <a href="#ALM-14023__li16883378102848">7</a>.</li></ul>
</p></li><li id="ALM-14023__li31013859102848"><a name="ALM-14023__li31013859102848"></a><a name="li31013859102848"></a><span>Handle the alarm by following the instructions in <strong id="ALM-14023__b1854193119514">ALM-14001 HDFS Disk Usage Exceeds the Threshold</strong> or <strong id="ALM-14023__b5541143120517">ALM-14002 DataNode Disk Usage Exceeds the Threshold</strong>, and check whether that alarm is cleared.</span><p><ul id="ALM-14023__ul60524908102848"><li id="ALM-14023__li7853263102848">If yes, go to <a href="#ALM-14023__li20775880102848">6</a>.</li><li id="ALM-14023__li3570510102848">If no, go to <a href="#ALM-14023__li16883378102848">7</a>.</li></ul>
</p></li><li id="ALM-14023__li20775880102848"><a name="ALM-14023__li20775880102848"></a><a name="li20775880102848"></a><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14023__ul52765198102848"><li id="ALM-14023__li5124739102848">If yes, no further action is required.</li><li id="ALM-14023__li46122656102848">If no, go to <a href="#ALM-14023__li16883378102848">7</a>.</li></ul>
</p></li></ol>
<p id="ALM-14023__p35145167103738"><strong id="ALM-14023__b47871048103738">Expand the DataNode capacity.</strong></p>
<ol start="7" id="ALM-14023__ol722550103752"><li id="ALM-14023__li16883378102848"><a name="ALM-14023__li16883378102848"></a><a name="li16883378102848"></a><span>Expand the DataNode capacity.</span></li><li id="ALM-14023__li25376386102848"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14023__ul27060884102848"><li id="ALM-14023__li42221366102848">If yes, no further action is required.</li><li id="ALM-14023__li44447977102848">If no, go to <a href="#ALM-14023__li35167437102848">9</a>.</li></ul>
</p></li></ol>
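<p>To get a rough idea of how much capacity to add when expanding DataNodes, the percentage formula from the Description can be solved for the remaining space: Reserved/(Reserved + Remaining) &lt;= t gives Remaining &gt;= Reserved &times; (1 - t)/t. The sketch below is a back-of-the-envelope estimate under a hypothetical simplification: all newly added capacity counts as remaining space, and the space reserved for replicas does not change during the expansion. The function name and figures (in GB) are illustrative. The 81% target used here is 90% of the default 90% threshold, which is the clearing level when <strong>Trigger Count</strong> is greater than 1.</p>

```python
def extra_remaining_space_needed(reserved_for_replicas: float, remaining: float, target_pct: float) -> float:
    """Additional remaining space so that reserved / (reserved + remaining) <= target_pct.

    Solves R / (R + F) <= t for F, giving F >= R * (1 - t) / t.
    Hypothetical simplification: added capacity counts entirely as remaining space.
    """
    if not 0.0 < target_pct < 100.0:
        raise ValueError("target_pct must be between 0 and 100 (exclusive)")
    t = target_pct / 100.0
    required_remaining = reserved_for_replicas * (1.0 - t) / t
    return max(0.0, required_remaining - remaining)


# Hypothetical values in GB: 980 GB reserved for replicas, 60 GB remaining.
target = 0.9 * 90.0  # 90% of a 90% threshold
print(round(extra_remaining_space_needed(980, 60, target), 1))  # 169.9
```

<p>In practice, leave headroom beyond this figure, because the space reserved for replicas grows with the write load.</p>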
<p id="ALM-14023__p12216856103754"><strong id="ALM-14023__b42842846103754">Collect fault information.</strong></p>
<ol start="9" id="ALM-14023__ol59646065103757"><li id="ALM-14023__li35167437102848"><a name="ALM-14023__li35167437102848"></a><a name="li35167437102848"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14023__b39977366113627">O&M</strong> > <strong id="ALM-14023__b24251979113627">Log > Download</strong>.</span></li><li id="ALM-14023__li1476244102848"><span>For <strong id="ALM-14023__b52466902102848">Service</strong>, select <strong id="ALM-14023__b13286196102848">HDFS</strong> in the required cluster.</span></li><li id="ALM-14023__li1145664103113"><span>Click <span><img id="ALM-14023__image1945644173117" src="en-us_image_0269417368.png"></span> in the upper right corner, and set <strong id="ALM-14023__b6456941173117">Start Date</strong> and <strong id="ALM-14023__b11456154113318">End Date</strong> for log collection to 20 minutes before and 20 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14023__b13456164113319">Download</strong>.</span></li><li id="ALM-14023__li1328825102848"><span>Contact the <span id="ALM-14023__text4614151421417">O&M personnel</span> and send them the collected logs.</span></li></ol>
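<p>The log collection window is the alarm generation time minus and plus 20 minutes. The small sketch below computes the values to enter for <strong>Start Date</strong> and <strong>End Date</strong>. It assumes the alarm generation time is read from the alarm details in the same time zone that the download page uses. The function name and sample timestamp are hypothetical.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_collection_window(alarm_time: datetime, margin_minutes: int = 20) -> Tuple[datetime, datetime]:
    """Return (Start Date, End Date): margin_minutes before and after the alarm generation time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


start, end = log_collection_window(datetime(2024, 5, 1, 10, 5))
print(start.strftime("%Y-%m-%d %H:%M"), end.strftime("%Y-%m-%d %H:%M"))
# 2024-05-01 09:45 2024-05-01 10:25
```

<p>Using <code>timedelta</code> arithmetic handles windows that cross midnight or a date boundary correctly.</p>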
</div>
<div class="section" id="ALM-14023__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14023__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14023__section11959430102848"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14023__p29189808102848">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>