Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="alm_14010"></a>
<h1 class="topictitle1">ALM-14010 NameService Service Is Abnormal</h1>
<div id="body8662426"><div class="section" id="alm_14010__en-us_topic_0191813899_section48163256"><h4 class="sectiontitle">Description</h4><p id="alm_14010__en-us_topic_0191813899_p30087318">The system checks the NameService service status every 180 seconds. This alarm is generated when the NameService service is unavailable.</p>
<p id="alm_14010__en-us_topic_0191813899_p2350413">This alarm is cleared when the NameService service recovers.</p>
</div>
<div class="section" id="alm_14010__en-us_topic_0191813899_section30816121"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_14010__en-us_topic_0191813899_table56165728" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_14010__en-us_topic_0191813899_row23522831"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="alm_14010__en-us_topic_0191813899_p26301173">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="alm_14010__en-us_topic_0191813899_p50020275">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="alm_14010__en-us_topic_0191813899_p25110514">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_14010__en-us_topic_0191813899_row20685786"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="alm_14010__en-us_topic_0191813899_p64935954">14010</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="alm_14010__en-us_topic_0191813899_p25320898">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="alm_14010__en-us_topic_0191813899_p37726867">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_14010__en-us_topic_0191813899_section8909633"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_14010__en-us_topic_0191813899_table35977388" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_14010__en-us_topic_0191813899_row30639779"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="alm_14010__en-us_topic_0191813899_p65903005">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="alm_14010__en-us_topic_0191813899_p36543171">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_14010__en-us_topic_0191813899_row7206911"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14010__en-us_topic_0191813899_p46888886">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14010__en-us_topic_0191813899_p39903442">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_14010__en-us_topic_0191813899_row23586666"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14010__en-us_topic_0191813899_p31471768">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14010__en-us_topic_0191813899_p66185246">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_14010__en-us_topic_0191813899_row58796306"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14010__en-us_topic_0191813899_p64880336">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14010__en-us_topic_0191813899_p20815867">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_14010__en-us_topic_0191813899_row53125076"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14010__en-us_topic_0191813899_p8163917">NSName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14010__en-us_topic_0191813899_p57297510">Specifies the NameService service for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_14010__en-us_topic_0191813899_section13077833"><h4 class="sectiontitle">Impact on the System</h4><p id="alm_14010__en-us_topic_0191813899_p10586695">HDFS cannot provide services to upper-layer components that rely on the NameService service, such as HBase and MapReduce. As a result, users cannot read or write files.</p>
</div>
<div class="section" id="alm_14010__en-us_topic_0191813899_section50591634"><h4 class="sectiontitle">Possible Causes</h4><ul id="alm_14010__en-us_topic_0191813899_ul52215961"><li id="alm_14010__en-us_topic_0191813899_li181603">The JournalNode is faulty.</li><li id="alm_14010__en-us_topic_0191813899_li1634435">The DataNode is faulty.</li><li id="alm_14010__en-us_topic_0191813899_li14709921">The disk capacity is insufficient.</li><li id="alm_14010__en-us_topic_0191813899_li65280430">The NameNode enters safe mode.</li></ul>
</div>
<div class="section" id="alm_14010__en-us_topic_0191813899_section52671525"><h4 class="sectiontitle">Procedure</h4><ol id="alm_14010__en-us_topic_0191813899_ol5629029161016"><li class="tableheading" id="alm_14010__en-us_topic_0191813899_li48614381161016"><span>Check the status of the JournalNode instance.</span><p><ol type="a" id="alm_14010__en-us_topic_0191813899_ol9249362"><li id="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step6">On the MRS Manager home page, click <strong id="alm_14010__b467318919295">Components</strong>.<div class="note" id="alm_14010__en-us_topic_0191813899_note161357103467"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="alm_14010__en-us_topic_0191813899_p01361310104613">For MRS 1.7.2 or earlier, log in to MRS Manager and choose <strong id="alm_14010__b113913331978">Services</strong>.</p>
</div></div>
</li><li id="alm_14010__en-us_topic_0191813899_li11000853">Click <strong id="alm_14010__b4342185119298">HDFS</strong>.</li><li id="alm_14010__en-us_topic_0191813899_li31898818">Click <strong id="alm_14010__b49821018317">Instance</strong>.</li><li id="alm_14010__en-us_topic_0191813899_li18653909">Check whether the <strong id="alm_14010__b21414123111">Health Status</strong> of the JournalNode is <strong id="alm_14010__b151504113117">Good</strong>.<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul33667455"><li id="alm_14010__en-us_topic_0191813899_li34571641">If yes, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step11">2.a</a>.</li><li id="alm_14010__en-us_topic_0191813899_li48839530">If no, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step12">1.e</a>.</li></ul>
</li><li id="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step12"><a name="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step12"></a><a name="en-us_topic_0191813899_alm14010_mmccppss_step12"></a>Select the faulty JournalNode, and choose <strong id="alm_14010__b10491025133110">More</strong> > <strong id="alm_14010__b14495255313">Restart Instance</strong>. Check whether the JournalNode successfully restarts.<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul36319496"><li id="alm_14010__en-us_topic_0191813899_li58440016">If yes, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step10">1.f</a>.</li><li id="alm_14010__en-us_topic_0191813899_li36020887">If no, go to <a href="#alm_14010__en-us_topic_0191813899_li572522141314">5</a>.</li></ul>
</li><li id="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step10"><a name="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step10"></a><a name="en-us_topic_0191813899_alm14010_mmccppss_step10"></a>Wait 5 minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul19660823"><li id="alm_14010__en-us_topic_0191813899_li42729680">If yes, no further action is required.</li><li id="alm_14010__en-us_topic_0191813899_li49022807">If no, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step11">2.a</a>.</li></ul>
</li></ol>
</p></li><li class="tableheading" id="alm_14010__en-us_topic_0191813899_li15366700161043"><span>Check the status of the DataNode instance.</span><p><ol type="a" id="alm_14010__en-us_topic_0191813899_ol5966621611152"><li id="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step11"><a name="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step11"></a><a name="en-us_topic_0191813899_alm14010_mmccppss_step11"></a>On the MRS cluster details page, click <strong id="alm_14010__b8743138163217">Components</strong>.<div class="note" id="alm_14010__en-us_topic_0191813899_note20350123718473"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="alm_14010__en-us_topic_0191813899_p8350737104713">For MRS 1.7.2 or earlier, log in to MRS Manager and choose <strong id="alm_14010__b1045819470712">Services</strong>.</p>
</div></div>
</li><li id="alm_14010__en-us_topic_0191813899_li52964449">Click <strong id="alm_14010__b879014273329">HDFS</strong>.</li><li id="alm_14010__en-us_topic_0191813899_li6918001">In <strong id="alm_14010__b1640792916321">Operation and Health Summary</strong>, check whether the <strong id="alm_14010__b3407152915320">Health Status</strong> of all DataNodes is <strong id="alm_14010__b17407142983212">Good</strong>.<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul62262011"><li id="alm_14010__en-us_topic_0191813899_li23487194">If yes, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step24">3.a</a>.</li><li id="alm_14010__en-us_topic_0191813899_li23414560">If no, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step14">2.d</a>.</li></ul>
</li><li id="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step14"><a name="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step14"></a><a name="en-us_topic_0191813899_alm14010_mmccppss_step14"></a>Click <strong id="alm_14010__b41828526322">Instances</strong>. On the DataNode management page, select the faulty DataNode, and choose <strong id="alm_14010__b584835513315">More</strong> > <strong id="alm_14010__b98541655173317">Restart Instance</strong>. Check whether the DataNode successfully restarts.<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul23562876"><li id="alm_14010__en-us_topic_0191813899_li10739296">If yes, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step15">2.e</a>.</li><li id="alm_14010__en-us_topic_0191813899_li64576684">If no, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step24">3.a</a>.</li></ul>
</li><li id="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step15"><a name="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step15"></a><a name="en-us_topic_0191813899_alm14010_mmccppss_step15"></a>Wait 5 minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul33089041"><li id="alm_14010__en-us_topic_0191813899_li29365919">If yes, no further action is required.</li><li id="alm_14010__en-us_topic_0191813899_li62966687">If no, go to <a href="#alm_14010__en-us_topic_0191813899_step28">4.a</a>.</li></ul>
</li></ol>
</p></li><li class="tableheading" id="alm_14010__en-us_topic_0191813899_li11089405161051"><span>Check the disk status.</span><p><ol type="a" id="alm_14010__en-us_topic_0191813899_ol1253331711152"><li id="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step24"><a name="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step24"></a><a name="en-us_topic_0191813899_alm14010_mmccppss_step24"></a>On the MRS cluster details page, click the <strong id="alm_14010__b131651037143416">Nodes</strong> tab and expand a node group.<div class="note" id="alm_14010__en-us_topic_0191813899_note185001644485"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="alm_14010__en-us_topic_0191813899_p1050044184818">For MRS 1.7.2 or earlier, log in to MRS Manager and click <strong id="alm_14010__b1346611531714">Hosts</strong>.</p>
</div></div>
</li><li id="alm_14010__en-us_topic_0191813899_li2267763">In the <strong id="alm_14010__b117541855173419">Disk Usage</strong> column, check whether disk space is insufficient. <ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul20409871"><li id="alm_14010__en-us_topic_0191813899_li49471114">If yes, go to <a href="#alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step26">3.c</a>.</li><li id="alm_14010__en-us_topic_0191813899_li47737289">If no, go to <a href="#alm_14010__en-us_topic_0191813899_step28">4.a</a>.</li></ul>
</li><li id="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step26"><a name="alm_14010__en-us_topic_0191813899_alm14010_mmccppss_step26"></a><a name="en-us_topic_0191813899_alm14010_mmccppss_step26"></a>Expand the disk capacity.</li><li id="alm_14010__en-us_topic_0191813899_li38092711">Wait 5 minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul7290082"><li id="alm_14010__en-us_topic_0191813899_li65610739">If yes, no further action is required.</li><li id="alm_14010__en-us_topic_0191813899_li53625739">If no, go to <a href="#alm_14010__en-us_topic_0191813899_step28">4.a</a>.</li></ul>
</li></ol>
</p></li><li class="tableheading" id="alm_14010__en-us_topic_0191813899_li5149441716110"><span>Check whether the NameNode is in safe mode.</span><p><ol type="a" id="alm_14010__en-us_topic_0191813899_ol3317679211152"><li id="alm_14010__en-us_topic_0191813899_step28"><a name="alm_14010__en-us_topic_0191813899_step28"></a><a name="en-us_topic_0191813899_step28"></a>On a cluster node, use the client to run the <strong id="alm_14010__b13631145714356">hdfs dfsadmin -safemode get</strong> command and check whether <strong id="alm_14010__b363195743518">Safe mode is ON</strong> is displayed.<p class="litext" id="alm_14010__en-us_topic_0191813899_p14515408">Any text that follows <strong id="alm_14010__b1462080173612">Safe mode is ON</strong> is alarm information and varies with actual conditions.</p>
<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul63529816"><li id="alm_14010__en-us_topic_0191813899_li34897438">If yes, go to <a href="#alm_14010__en-us_topic_0191813899_li66373591">4.b</a>.</li><li id="alm_14010__en-us_topic_0191813899_li8120240">If no, go to <a href="#alm_14010__en-us_topic_0191813899_li572522141314">5</a>.</li></ul>
</li><li id="alm_14010__en-us_topic_0191813899_li66373591"><a name="alm_14010__en-us_topic_0191813899_li66373591"></a><a name="en-us_topic_0191813899_li66373591"></a>Use the client on the cluster node and run the <strong id="alm_14010__b19785720113612">hdfs dfsadmin -safemode leave</strong> command.</li><li id="alm_14010__en-us_topic_0191813899_li7551780">Wait 5 minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_14010__en-us_topic_0191813899_ul857158"><li id="alm_14010__en-us_topic_0191813899_li7714425">If yes, no further action is required.</li><li id="alm_14010__en-us_topic_0191813899_li2320967">If no, go to <a href="#alm_14010__en-us_topic_0191813899_li572522141314">5</a>.</li></ul>
</li></ol>
</p></li><li id="alm_14010__en-us_topic_0191813899_li572522141314"><a name="alm_14010__en-us_topic_0191813899_li572522141314"></a><a name="en-us_topic_0191813899_li572522141314"></a><span>Collect fault information.</span><p><ol type="a" id="alm_14010__en-us_topic_0191813899_en-us_topic_0191813935_ol6089206913036"><li id="alm_14010__en-us_topic_0191813899_en-us_topic_0191813935_li4478836213036">On MRS Manager, choose <span class="menucascade" id="alm_14010__menucascade345253363614"><b><span class="uicontrol" id="alm_14010__uicontrol184513337362">System</span></b> > <b><span class="uicontrol" id="alm_14010__uicontrol1745219339367">Export Log</span></b></span>.</li><li id="alm_14010__li18574327401">Contact technical support engineers for help. For details, see <a href="https://docs.otc.t-systems.com/en-us/public/learnmore.html" target="_blank" rel="noopener noreferrer">technical support</a>.</li></ol>
</p></li></ol>
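<p>The safe mode check in step 4 could be scripted roughly as follows. This is a minimal sketch, assuming a POSIX shell on a cluster node where the HDFS client is installed and its environment is already loaded. The function name <strong>safemode_state</strong> is hypothetical and is not part of MRS or HDFS; only the two <strong>hdfs dfsadmin</strong> commands come from the procedure above.</p>
<pre class="screen"># Hypothetical helper: classify the text printed by "hdfs dfsadmin -safemode get".
# If several NameNodes report a state, any "ON" line makes the result "on".
safemode_state() {
  case "$1" in
    *"Safe mode is ON"*)  echo "on" ;;
    *"Safe mode is OFF"*) echo "off" ;;
    *)                    echo "unknown" ;;
  esac
}

# Usage (left commented out so that nothing runs against the cluster by accident):
#   state=$(safemode_state "$(hdfs dfsadmin -safemode get)")
#   if [ "$state" = "on" ]; then hdfs dfsadmin -safemode leave; fi</pre>
<p>Keeping the classification separate from the commands lets the matching logic be checked with sample text before it is used on a live cluster. An <strong>unknown</strong> result means the output did not contain either expected phrase and should be inspected manually.</p>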
</div>
<div class="section" id="alm_14010__en-us_topic_0191813899_section4281684"><h4 class="sectiontitle">Reference</h4><p id="alm_14010__en-us_topic_0191813899_p19182316">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0241.html">Alarm Reference (Applicable to Versions Earlier Than MRS 3.x)</a></div>
</div>
</div>