doc-exports/docs/mrs/umn/ALM-14029.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00

<a name="ALM-14029"></a><a name="ALM-14029"></a>
<h1 class="topictitle1">ALM-14029 Number of Blocks in a Replica Exceeds the Threshold</h1>
<div id="body1559547335321"><div class="section" id="ALM-14029__section8243740"><h4 class="sectiontitle">Description</h4><p id="ALM-14029__p7104400">Every four hours, the system checks the number of blocks that have only a single replica and compares it with the threshold. This alarm is generated when the number of single-replica blocks exceeds the threshold.</p>
<p id="ALM-14029__p58579705104741">This alarm is cleared when the number of blocks in a single replica is less than the threshold.</p>
</div>
<div class="section" id="ALM-14029__section7084804"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14029__table38418539" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14029__row53418480"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14029__p31929608">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14029__p36161432">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14029__p43394889">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14029__row25325122"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14029__p38069036">14029</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14029__p63693103">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14029__p58867698">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14029__section63763242"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14029__table3554205" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14029__row22865724"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14029__p40184376">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14029__p33709057">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14029__row10137556112512"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14029__p156438591896">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14029__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14029__row46079102"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14029__p65062640">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14029__p66669494">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14029__row63154538"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14029__p35626567">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14029__p26802723">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14029__row8262415"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14029__p65275865">NameServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14029__p52853732">Specifies the NameService for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14029__row5921545"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14029__p9883160">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14029__p62338491">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14029__section36998271"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14029__p16253019">Data that has only a single replica is prone to loss when a node is faulty. Too many single-replica files compromise the data security of HDFS.</p>
</div>
<div class="section" id="ALM-14029__section64548988"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-14029__ul41426143"><li id="ALM-14029__li37290970">The DataNode is faulty.</li><li id="ALM-14029__li74416">The disk is faulty.</li><li id="ALM-14029__li149845017332">Files are written with only a single replica.</li></ul>
</div>
<div class="section" id="ALM-14029__section10992153716435"><h4 class="sectiontitle">Procedure</h4><ol id="ALM-14029__ol102534323446"><li id="ALM-14029__li14669113152720"><span>On FusionInsight Manager, choose <strong id="ALM-14029__b471231715140">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14029__b13725141720149">Alarm</strong> &gt; <strong id="ALM-14029__b17726161711417">Alarms</strong>. On the page that is displayed, check whether alarm <strong id="ALM-14029__b1372614170146">ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold</strong> is generated.</span><p><ul class="subitemlist" id="ALM-14029__ul118434614516"><li id="ALM-14029__li81841146656">If yes, go to <a href="#ALM-14029__li23401293163156">2</a>.</li><li id="ALM-14029__li1918511461556">If no, go to <a href="#ALM-14029__li17602112155716">3</a>.</li></ul>
</p></li><li id="ALM-14029__li23401293163156"><a name="ALM-14029__li23401293163156"></a><a name="li23401293163156"></a><span>Rectify the fault according to the handling procedure of <strong id="ALM-14029__b1123353210430">ALM-14003 Number of Lost HDFS Blocks Exceeds the Threshold</strong>. In the next detection period, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14029__ul47339386163156"><li id="ALM-14029__li32560724163156">If yes, no further action is required.</li><li id="ALM-14029__li20173012163156">If no, go to <a href="#ALM-14029__li17602112155716">3</a>.</li></ul>
</p></li><li class="subitemlist" id="ALM-14029__li17602112155716"><a name="ALM-14029__li17602112155716"></a><a name="li17602112155716"></a><span>Check whether single-replica files have been written to the service.</span><p><ul class="subitemlist" id="ALM-14029__ul0701932165710"><li id="ALM-14029__li1470214320578">If yes, go to <a href="#ALM-14029__li2696171714538">4</a>.</li><li id="ALM-14029__li670214325575">If no, go to <a href="#ALM-14029__li12256203224411">7</a>.</li></ul>
</p></li><li id="ALM-14029__li2696171714538"><a name="ALM-14029__li2696171714538"></a><a name="li2696171714538"></a><span>Log in to the HDFS client as user <strong id="ALM-14029__b25671340142111">root</strong>. The password is set by the user before installation; contact the MRS cluster administrator to obtain it. Then run the following commands:</span><p><ul id="ALM-14029__ul136531649134219"><li id="ALM-14029__li1665316493421">Security mode:<p id="ALM-14029__p15448113195317"><a name="ALM-14029__li1665316493421"></a><a name="li1665316493421"></a><strong id="ALM-14029__b1629057291591">cd </strong><em id="ALM-14029__i1986000263591">Client installation directory</em></p>
<p id="ALM-14029__p242017192533"><strong id="ALM-14029__b1749214015534">source bigdata_env</strong></p>
<p id="ALM-14029__p1383154718581"><strong id="ALM-14029__b272414499589">kinit hdfs</strong></p>
</li><li id="ALM-14029__li99822554425">Normal mode:<p id="ALM-14029__p19874185710584"><a name="ALM-14029__li99822554425"></a><a name="li99822554425"></a><strong id="ALM-14029__b187411310205919">su - omm</strong></p>
<p id="ALM-14029__p1481214635916"><strong id="ALM-14029__b1413220132591">cd </strong><em id="ALM-14029__i1252173532591">Client installation directory</em></p>
<p id="ALM-14029__p188128685916"><strong id="ALM-14029__b208122655912">source bigdata_env</strong></p>
</li></ul>
</p></li><li class="subitemlist" id="ALM-14029__li2836155813912"><span>Run the following command on the client node to increase the number of replicas of each single-replica file:</span><p><p id="ALM-14029__p1592615471629"><strong id="ALM-14029__b171501246164620">hdfs dfs -setrep -w</strong> <em id="ALM-14029__i0151246114615">file replica number</em> <em id="ALM-14029__i1715113460463">file name or file path</em></p>
</p></li><li id="ALM-14029__li525513321444"><span>In the next detection period, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14029__ul1325563211448"><li id="ALM-14029__li14255632114417">If yes, no further action is required.</li><li id="ALM-14029__li1825553219446">If no, go to <a href="#ALM-14029__li12256203224411">7</a>.</li></ul>
</p></li></ol>
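<p>The check in step 3 and the command in step 5 can be sketched as a pair of shell helpers. This is a minimal sketch, not part of the MRS tooling. It assumes a Bash shell on the client node with the environment from step 4 already loaded, and it relies on the <strong>hdfs dfs -ls -R</strong> listing format, in which the second column holds the replication factor (directories show <strong>-</strong>). The function names, the <strong>DRY_RUN</strong> variable, and the example path are hypothetical.</p>

```shell
#!/usr/bin/env bash

# Read `hdfs dfs -ls -R <path>` output on stdin and print the paths of
# files whose replication factor (second column) is exactly 1.
# Directories carry "-" in that column, so they never match.
# Limitation: runs of several spaces inside a path collapse to one space.
list_single_replica_files() {
  awk 'NF >= 8 && $2 == "1" {
         path = $8
         for (i = 9; i <= NF; i++) path = path " " $i
         print path
       }'
}

# Read file paths on stdin and raise each one to the replica count given
# as the first argument, using the command from step 5.
# DRY_RUN is a hypothetical safety switch: it defaults to 1, so the
# commands are only printed. Set DRY_RUN=0 to execute them.
raise_replication() {
  local target_rep="${1:?usage: raise_replication <replica-count>}"
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'hdfs dfs -setrep -w %s %s\n' "$target_rep" "$path"
    else
      # Detach stdin so the hdfs command cannot consume the path list.
      hdfs dfs -setrep -w "$target_rep" "$path" < /dev/null
    fi
  done
}

# Hypothetical usage (the directory /user/example is an assumption):
#   hdfs dfs -ls -R /user/example | list_single_replica_files | raise_replication 3
```

<p>Keeping the default dry run makes it possible to review the affected files before any replica count changes. The <strong>-w</strong> option makes <strong>hdfs dfs -setrep</strong> wait until replication completes, which can take a long time for large files.</p>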
<p class="tableheading" id="ALM-14029__p3255143214441"><strong id="ALM-14029__b5144104519417">Collect the fault information.</strong></p>
<ol start="7" id="ALM-14029__ol8256123214449"><li id="ALM-14029__li12256203224411"><a name="ALM-14029__li12256203224411"></a><a name="li12256203224411"></a><span>On FusionInsight Manager, choose <strong id="ALM-14029__b543917158284">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14029__b1144216151281">Log</strong> &gt; <strong id="ALM-14029__b14431315112813">Download</strong>.</span></li><li id="ALM-14029__li18256123234420"><span>Expand the drop-down list next to the <strong id="ALM-14029__b3121175012202">Service</strong> field. In the <strong id="ALM-14029__b1127650122014">Services</strong> dialog box that is displayed, select <strong id="ALM-14029__b181287507208">HDFS</strong> for the target cluster.</span></li><li id="ALM-14029__li32561232114416"><span>Click <span><img id="ALM-14029__image1820414592617" src="en-us_image_0269417374.png"></span> in the upper right corner, and set <strong id="ALM-14029__b1220413455266">Start Date</strong> and <strong id="ALM-14029__b52042451260">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14029__b92042457265">Download</strong>.</span></li><li id="ALM-14029__li13256143234415"><span>Contact <span id="ALM-14029__text15913156194219">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
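<p>The log collection window in step 9 is simple arithmetic on the alarm generation time. The following sketch computes both values. It assumes GNU <strong>date</strong> (for the <strong>-d</strong> option) and treats the timestamp as UTC so that the 10-minute offsets are not affected by daylight saving changes. The function name and the sample timestamp are hypothetical.</p>

```shell
#!/usr/bin/env bash

# Print the Start Date and End Date for log collection: 10 minutes
# before and 10 minutes after the given alarm generation time.
# Expects a timestamp in "YYYY-MM-DD HH:MM:SS" form. Requires GNU date.
log_window() {
  local alarm_time="${1:?usage: log_window 'YYYY-MM-DD HH:MM:SS'}"
  local margin_seconds=600   # 10 minutes
  local epoch
  # Convert to seconds since the epoch first, so the offset is plain
  # integer arithmetic rather than a relative date expression.
  epoch=$(date -u -d "$alarm_time" +%s) || return 1
  printf 'Start Date: %s\n' "$(date -u -d "@$((epoch - margin_seconds))" '+%Y-%m-%d %H:%M:%S')"
  printf 'End Date: %s\n' "$(date -u -d "@$((epoch + margin_seconds))" '+%Y-%m-%d %H:%M:%S')"
}

# Hypothetical usage:
#   log_window "2023-01-15 08:05:00"
```

<p>The printed values can then be entered in the <strong>Start Date</strong> and <strong>End Date</strong> fields on the log download page.</p>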
</div>
<div class="section" id="ALM-14029__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14029__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14029__section61085563"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14029__p11393601">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>