doc-exports/docs/mrs/umn/ALM-14028.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-14028"></a>
<h1 class="topictitle1">ALM-14028 Number of Blocks to Be Supplemented Exceeds the Threshold</h1>
<div id="body1557302240379"><div class="section" id="ALM-14028__section8243740"><h4 class="sectiontitle">Description</h4><p id="ALM-14028__p7104400">The system checks the number of blocks to be supplemented every 30 seconds and compares the number with the threshold. The number of blocks to be supplemented has a default threshold. This alarm is generated when the number of blocks to be supplemented exceeds the threshold.</p>
<p id="ALM-14028__p63939602">You can change the threshold specified by <strong id="ALM-14028__b065411569507">Blocks Under Replicated (NameNode)</strong> by choosing <strong id="ALM-14028__b86544568503">O&amp;M</strong> &gt; <strong id="ALM-14028__b66541569507">Alarm</strong> &gt; <strong id="ALM-14028__b765475610508">Thresholds</strong> &gt; <em id="ALM-14028__i1021725095817">Name of the desired cluster</em> &gt; <strong id="ALM-14028__b965495675017">HDFS</strong> &gt; <strong id="ALM-14028__b86549562503">File and Block</strong>.</p>
<p id="ALM-14028__p58579705104741">If <strong id="ALM-14028__b19654195625011">Trigger Count</strong> is set to <strong id="ALM-14028__b1065435645013">1</strong> and the number of blocks to be supplemented is less than or equal to the threshold, this alarm is cleared. If <strong id="ALM-14028__b1465435615014">Trigger Count</strong> is greater than <strong id="ALM-14028__b11654156125018">1</strong> and the number of blocks to be supplemented is less than or equal to 90% of the threshold, this alarm is cleared.</p>
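<p>The clearing rule above can be sketched as a small bash function. This is only an illustration of the rule as described in this topic, not the code FusionInsight Manager runs. The function name <strong>alarm_clears</strong> and the sample values are hypothetical.</p>

```shell
# bash sketch of the clearing rule; alarm_clears is a hypothetical name.
# Arguments: current number of blocks to be supplemented, threshold, Trigger Count.
# Exit status 0 means the alarm would be cleared, 1 means it would stay active.
alarm_clears() {
    local count=$1 threshold=$2 trigger_count=$3
    if [ "$trigger_count" -eq 1 ]; then
        # Trigger Count = 1: cleared at or below the threshold itself.
        [ "$count" -le "$threshold" ]
    else
        # Trigger Count > 1: cleared at or below 90% of the threshold.
        # Integer form of count <= 0.9 * threshold.
        [ $((count * 10)) -le $((threshold * 9)) ]
    fi
}
```

<p>For example, with an assumed threshold of 1000 and a Trigger Count of 3, <strong>alarm_clears 950 1000 3</strong> reports that the alarm stays active, while <strong>alarm_clears 900 1000 3</strong> reports that it is cleared.</p>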
</div>
<div class="section" id="ALM-14028__section7084804"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14028__table38418539" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14028__row53418480"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14028__p31929608">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14028__p36161432">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14028__p43394889">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14028__row25325122"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14028__p38069036">14028</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14028__p63693103">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14028__p58867698">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14028__section63763242"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14028__table3554205" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14028__row22865724"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14028__p40184376">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14028__p33709057">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14028__row0371321261"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14028__p156438591896">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14028__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14028__row46079102"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14028__p65062640">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14028__p66669494">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14028__row63154538"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14028__p35626567">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14028__p26802723">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14028__row39897916"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14028__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14028__p45657288">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14028__row8262415"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14028__p65275865">NameServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14028__p52853732">Specifies the NameService for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14028__row5921545"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14028__p9883160">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14028__p62338491">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14028__section36998271"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14028__p16253019">Data stored in HDFS may be lost. HDFS may enter safe mode and become unable to provide write services. Lost block data cannot be restored.</p>
</div>
<div class="section" id="ALM-14028__section64548988"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-14028__ul41426143"><li id="ALM-14028__li37290970">The DataNode instance is abnormal.</li><li id="ALM-14028__li74416">Data is deleted.</li><li id="ALM-14028__li149845017332">The number of replicas configured for a file is greater than the number of DataNodes.</li></ul>
</div>
<div class="section" id="ALM-14028__section1757821214462"><h4 class="sectiontitle">Procedure</h4><ol id="ALM-14028__ol359796216321"><li id="ALM-14028__li14669113152720"><span>On FusionInsight Manager, choose <strong id="ALM-14028__b795313500367">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14028__b16953650153610">Alarm</strong> &gt; <strong id="ALM-14028__b159533501363">Alarms</strong>. On the page that is displayed, check whether alarm <strong id="ALM-14028__b19953135033610">ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold</strong> is generated.</span><p><ul class="subitemlist" id="ALM-14028__ul118434614516"><li id="ALM-14028__li81841146656">If yes, go to <a href="#ALM-14028__li23401293163156">2</a>.</li><li id="ALM-14028__li1918511461556">If no, go to <a href="#ALM-14028__li2696171714538">3</a>.</li></ul>
</p></li><li id="ALM-14028__li23401293163156"><a name="ALM-14028__li23401293163156"></a><a name="li23401293163156"></a><span>Rectify the fault according to the handling procedure of <strong id="ALM-14028__b145801012113713">ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold</strong>. Five minutes later, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14028__ul47339386163156"><li id="ALM-14028__li32560724163156">If yes, no further action is required.</li><li id="ALM-14028__li20173012163156">If no, go to <a href="#ALM-14028__li2696171714538">3</a>.</li></ul>
</p></li></ol><ol start="3" id="ALM-14028__ol60451969163226"><li id="ALM-14028__li2696171714538"><a name="ALM-14028__li2696171714538"></a><a name="li2696171714538"></a><span>Log in to the HDFS client as user <strong id="ALM-14028__b13941112752118">root</strong>. The user password is defined by the user before the installation. Contact the MRS cluster administrator to obtain the password. Run the following commands:</span><p><ul id="ALM-14028__ul186659391277"><li id="ALM-14028__li2665153919274">Security mode:<p id="ALM-14028__p15448113195317"><a name="ALM-14028__li2665153919274"></a><a name="li2665153919274"></a><strong id="ALM-14028__b307787015929">cd </strong><em id="ALM-14028__i12086576935929">Client installation directory</em></p>
<p id="ALM-14028__p242017192533"><strong id="ALM-14028__b1749214015534">source bigdata_env</strong></p>
<p id="ALM-14028__p1383154718581"><strong id="ALM-14028__b272414499589">kinit hdfs</strong></p>
</li><li id="ALM-14028__li18727104652710">Normal mode:<p id="ALM-14028__p19874185710584"><a name="ALM-14028__li18727104652710"></a><a name="li18727104652710"></a><strong id="ALM-14028__b187411310205919">su - omm</strong></p>
<p id="ALM-14028__p1481214635916"><strong id="ALM-14028__b15033402605929">cd </strong><em id="ALM-14028__i7363401075929">Client installation directory</em></p>
<p id="ALM-14028__p188128685916"><strong id="ALM-14028__b208122655912">source bigdata_env</strong></p>
</li></ul>
</p></li><li id="ALM-14028__li1810714615406"><span>Run the <strong id="ALM-14028__b31348387552">hdfs fsck / &gt;&gt; fsck.log</strong> command to obtain the status of the current cluster.</span></li><li id="ALM-14028__li86001758173011"><span>Run the following command to obtain the total number (<em id="ALM-14028__i57721236173014">M</em>) of blocks to be replicated, which is reported in the summary of the fsck output:</span><p><p id="ALM-14028__p442765917309"><strong id="ALM-14028__b19129148105520">cat fsck.log | grep "Under-replicated"</strong></p>
</p></li><li id="ALM-14028__li165181251103012"><span>Run the following command to count the number (<em id="ALM-14028__i04901945153010">N</em>) of blocks to be replicated in the <strong id="ALM-14028__b1955213500307">/tmp/hadoop-yarn/staging/</strong> directory:</span><p><p id="ALM-14028__p18563105333016"><strong id="ALM-14028__b56751745560">cat fsck.log | grep "Under replicated" | grep "/tmp/hadoop-yarn/staging/" | wc -l</strong></p>
<div class="note" id="ALM-14028__note15248193410311"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14028__p3248334539"><strong id="ALM-14028__b1088820424387">/tmp/hadoop-yarn/staging/</strong> is the default directory. If the directory is modified, obtain it from the configuration item <strong id="ALM-14028__b20874181294318">yarn.app.mapreduce.am.staging-dir</strong> in the <strong id="ALM-14028__b870152294320">mapred-site.xml</strong> file.</p>
</div></div>
</p></li><li id="ALM-14028__li14351159123219"><span>Check whether <em id="ALM-14028__i9111725132315">N</em> accounts for more than 50% of M (N/M &gt; 50%).</span><p><ul class="subitemlist" id="ALM-14028__ul1952618107282"><li id="ALM-14028__li5526910122818">If yes, go to <a href="#ALM-14028__li181311850105810">8</a>.</li><li id="ALM-14028__li252701020287">If no, go to <a href="#ALM-14028__li1649292775015">9</a>.</li></ul>
</p></li><li id="ALM-14028__li181311850105810"><a name="ALM-14028__li181311850105810"></a><a name="li181311850105810"></a><span>Run the following command to reconfigure the number of file replicas in the directory (set the number of file replicas to the number of DataNodes or the default number of file replicas):</span><p><p id="ALM-14028__p14636451135811"><strong id="ALM-14028__b3213729554">hdfs dfs -setrep -w</strong> <em id="ALM-14028__i95868371515"><strong id="ALM-14028__b1116618411758">Number of file replicas</strong></em> <strong id="ALM-14028__b151674413512">/tmp/hadoop-yarn/staging/</strong></p>
<div class="note" id="ALM-14028__note128631666620"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14028__p1426064220111">To obtain the default number of file replicas:</p>
<p id="ALM-14028__p1284481019711">Log in to FusionInsight Manager, choose <strong id="ALM-14028__b869531710497">Cluster &gt; Services &gt; HDFS &gt; Configurations &gt; All Configurations</strong>, and search for the <strong id="ALM-14028__b19533192164914">dfs.replication</strong> parameter. The value of this parameter is the default number of file replicas.</p>
</div></div>
<p id="ALM-14028__p199219569262">Check whether the alarm is cleared 5 minutes later.</p>
<ul class="subitemlist" id="ALM-14028__ul248083718277"><li id="ALM-14028__li194801437202718">If yes, no further action is required.</li><li id="ALM-14028__li24811137172710">If no, go to <a href="#ALM-14028__li1649292775015">9</a>.</li></ul>
</p></li></ol>
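<p>The comparison of N and M in the preceding steps can be sketched in bash as follows. This is a minimal sketch, not part of the product. The function names are hypothetical, and the assumed shape of the fsck summary line (a count following <strong>Under-replicated blocks:</strong>) may differ between HDFS versions, so check it against your own <strong>fsck.log</strong> before relying on it.</p>

```shell
# bash sketch; function names are hypothetical, not part of any HDFS tooling.

# M: total number of blocks to be replicated, read from the fsck summary line.
# Assumed line shape: " Under-replicated blocks:<TAB>12 (1.5 %)".
# "hdfs fsck / >> fsck.log" appends, so only the last summary line is used.
under_replicated_total() {
    local total
    total=$(grep "Under-replicated" "$1" | tail -n 1 |
            sed -nE 's/[^0-9]*([0-9]+).*/\1/p' || true)
    echo "${total:-0}"
}

# N: blocks to be replicated whose file path is under the staging directory.
# The second argument overrides the default /tmp/hadoop-yarn/staging/.
staging_under_replicated() {
    local log=$1 dir=${2:-/tmp/hadoop-yarn/staging/}
    grep "Under replicated" "$log" | grep -c -F -- "$dir" || true
}

# Exit status 0 when N/M > 50% (checked as 2*N > M); 1 otherwise or when M is 0.
staging_dominates() {
    local m=$1 n=$2
    [ "$m" -gt 0 ] && [ $((n * 2)) -gt "$m" ]
}

# Usage (after running "hdfs fsck / >> fsck.log"):
#   m=$(under_replicated_total fsck.log)
#   n=$(staging_under_replicated fsck.log)
#   if staging_dominates "$m" "$n"; then echo "go to step 8"; else echo "go to step 9"; fi
```

<p>Because the fsck command appends to <strong>fsck.log</strong>, an existing file from an earlier run would inflate N. Starting from a new or empty log file avoids this.</p>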
<p class="subitemlist" id="ALM-14028__p1099931814506"><strong id="ALM-14028__b117631559133715">Collect the fault information.</strong></p>
<ol start="9" id="ALM-14028__ol1949462725015"><li id="ALM-14028__li1649292775015"><a name="ALM-14028__li1649292775015"></a><a name="li1649292775015"></a><span>On FusionInsight Manager, choose <strong id="ALM-14028__b185479972815">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14028__b11547159102813">Log</strong> &gt; <strong id="ALM-14028__b354839162810">Download</strong>.</span></li><li id="ALM-14028__li1849210274501"><span>Expand the drop-down list next to the <strong id="ALM-14028__b2025014112205">Service</strong> field. In the <strong id="ALM-14028__b19257111116200">Services</strong> dialog box that is displayed, select <strong id="ALM-14028__b142571111112020">HDFS</strong> for the target cluster.</span></li><li id="ALM-14028__li184925277506"><span>Click <span><img id="ALM-14028__image1781714261388" src="en-us_image_0269417373.png"></span> in the upper right corner, and set <strong id="ALM-14028__b18173261382">Start Date</strong> and <strong id="ALM-14028__b1681710267389">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14028__b1681762617386">Download</strong>.</span></li><li id="ALM-14028__li17494527125010"><span>Contact <span id="ALM-14028__text144433518428">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
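<p>If it helps to work out the log collection window before entering it on the page, the following bash sketch computes the time 10 minutes before and 10 minutes after an alarm generation time. It assumes GNU <strong>date</strong> and a timestamp in <strong>YYYY-MM-DD HH:MM:SS</strong> form treated as UTC. The function name and the sample timestamp are hypothetical, and the time zone shown on FusionInsight Manager may differ.</p>

```shell
# bash sketch; log_window is a hypothetical helper and requires GNU date.
# Argument: alarm generation time as "YYYY-MM-DD HH:MM:SS", interpreted as UTC.
log_window() {
    local alarm_time=$1
    local margin=$((10 * 60))   # 10 minutes, in seconds
    local epoch
    # Convert to epoch seconds first so the offset is plain integer arithmetic.
    epoch=$(date -u -d "$alarm_time" +%s) || return 1
    echo "Start Date: $(date -u -d "@$((epoch - margin))" '+%Y-%m-%d %H:%M:%S')"
    echo "End Date:   $(date -u -d "@$((epoch + margin))" '+%Y-%m-%d %H:%M:%S')"
}
```

<p>For an alarm generated at 2022-12-13 12:03:34, <strong>log_window "2022-12-13 12:03:34"</strong> prints a start time of 2022-12-13 11:53:34 and an end time of 2022-12-13 12:13:34.</p>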
</div>
<div class="section" id="ALM-14028__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14028__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14028__section61085563"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14028__p11393601">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>