Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14031"></a><a name="ALM-14031"></a>
<h1 class="topictitle1">ALM-14031 DataNode Process Is Abnormal</h1>
<div id="body0000002008297081"><div class="section" id="ALM-14031__section8243740"><h4 class="sectiontitle"><span id="ALM-14031__text8925301575">Alarm Description</span></h4><p id="ALM-14031__p8353691349">The status of the DataNode process is checked every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for a long time.</p>
<p id="ALM-14031__p1931134211237">This alarm is cleared when the process status recovers.</p>
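<p>The generate-and-clear behavior described above can be illustrated with a small simulation. This is only a sketch: the function name <strong>simulate_alarm</strong> and the threshold argument are hypothetical, because this topic does not state how many consecutive abnormal checks count as "a long time".</p>

```shell
# Hypothetical sketch: derive raise/clear events from a series of check results.
# $1 is an ASSUMED number of consecutive abnormal checks before the alarm is raised;
# the remaining arguments are the results of successive 20-second checks.
simulate_alarm() {
  limit="$1"; shift
  streak=0
  raised=0
  for status in "$@"; do
    if [ "$status" = "abnormal" ]; then
      streak=$((streak + 1))
      if [ "$raised" -eq 0 ] && [ "$streak" -ge "$limit" ]; then
        raised=1
        echo "raise"
      fi
    else
      # Any normal check resets the streak and clears a raised alarm.
      streak=0
      if [ "$raised" -eq 1 ]; then
        raised=0
        echo "clear"
      fi
    fi
  done
}

simulate_alarm 3 ok abnormal abnormal abnormal abnormal ok
```

<p>With an assumed threshold of 3, the sample sequence raises the alarm on the third consecutive abnormal check and clears it on the next normal check.</p>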
</div>
<div class="section" id="ALM-14031__section7084804"><h4 class="sectiontitle"><span id="ALM-14031__text38748475555">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14031__table38418539" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14031__row53418480"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14031__p31929608"><span id="ALM-14031__text981514694317">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14031__p36161432"><span id="ALM-14031__text15260185184313">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14031__p43394889"><span id="ALM-14031__text27412586431">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14031__row25325122"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14031__p853895314331">14031</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14031__p115373532334">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14031__p1553517532330">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14031__section63763242"><h4 class="sectiontitle"><span id="ALM-14031__text155061195577">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14031__table3554205" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14031__row22865724"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14031__p21975462"><span id="ALM-14031__text776142495720">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14031__p35182007"><span id="ALM-14031__text632018391572">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14031__row10137556112512"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14031__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14031__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14031__row46079102"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14031__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14031__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14031__row63154538"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14031__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14031__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14031__row1089082402316"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14031__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14031__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14031__section36998271"><h4 class="sectiontitle"><span id="ALM-14031__text2266192715582">Impact on the System</span></h4><p id="ALM-14031__p16253019">While the process status is abnormal, the DataNode process cannot provide services properly. As a result, the HDFS service as a whole may become abnormal.</p>
</div>
<div class="section" id="ALM-14031__section64548988"><h4 class="sectiontitle"><span id="ALM-14031__text12656240135813">Possible Causes</span></h4><p id="ALM-14031__p8207814181819">The host responds slowly to I/O requests (disk I/O and network I/O), and some processes are in the D state (uninterruptible sleep) or the Z state (zombie). The process may also have been suspended and entered the T state (stopped).</p>
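<p>For reference, the state letters mentioned above are the first character of the STAT column printed by the Linux <strong>ps</strong> command. The following sketch maps that first character to its meaning. The helper name <strong>describe_state</strong> is hypothetical and is not part of the product.</p>

```shell
# Map the first character of a ps STAT code to its meaning (per the Linux ps manual).
# Trailing modifier characters such as "l", "s", or "+" are ignored.
describe_state() {
  case "$1" in
    D*) echo "uninterruptible sleep (usually waiting on I/O)" ;;
    Z*) echo "zombie (terminated but not reaped by its parent)" ;;
    T*) echo "stopped (for example by a job control signal)" ;;
    R*) echo "running or runnable" ;;
    S*) echo "interruptible sleep" ;;
    *)  echo "other" ;;
  esac
}

describe_state "Dl"
```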
</div>
<div class="section" id="ALM-14031__section770654563320"><h4 class="sectiontitle"><span id="ALM-14031__text19569135285811">Handling Procedure</span></h4><p id="ALM-14031__p1243515278455"><strong id="ALM-14031__b1655484819527">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14031__ol8805715143410"><li id="ALM-14031__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14031__b1530417210108">O&amp;M</strong> &gt; <strong id="ALM-14031__b664215411018">Alarm</strong> &gt; <strong id="ALM-14031__b63760791011">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14031__ul10505203319910"><li id="ALM-14031__li5505533895">If the alarm is no longer in the list, no further action is required.</li><li id="ALM-14031__li350517336917">If the alarm is still in the list, view the alarm details and record the IP address of the host where the alarm is generated. Then go to <a href="#ALM-14031__li162831544134616">2</a>.</li></ul>
</p></li><li id="ALM-14031__li162831544134616"><a name="ALM-14031__li162831544134616"></a><a name="li162831544134616"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14031__b1578713318414">root</strong> user and run the <strong id="ALM-14031__b131521842151211">su - omm</strong> command to switch to the <strong id="ALM-14031__b133931244201216">omm</strong> user.</span></li><li id="ALM-14031__li129386734811"><span>Run the following command to check the process state:</span><p><p id="ALM-14031__p114995439534"><strong id="ALM-14031__b105101533205318">ps ww -eo stat,cmd | grep -w org.apache.hadoop.hdfs.server.datanode.DataNode | grep -v grep | awk '{print $1}'</strong></p>
</p></li><li id="ALM-14031__li0510123385319"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14031__ul161804819579"><li id="ALM-14031__li161818483576">If the output contains any abnormal state, go to <a href="#ALM-14031__li39471558560">5</a>.</li><li id="ALM-14031__li1661854818575">If the output does not contain abnormal states, go to <a href="#ALM-14031__li14805191513412">7</a>.</li></ul>
</p></li><li id="ALM-14031__li39471558560"><a name="ALM-14031__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14031__b94993490139">root</strong> and run the <strong id="ALM-14031__b9500154991318">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is a high-risk operation. After the restart, ensure that the service processes are running properly.)</span></li><li id="ALM-14031__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14031__ul19652752195618"><li id="ALM-14031__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14031__li2065375285614">If the alarm persists, go to <a href="#ALM-14031__li14805191513412">7</a>.</li></ul>
</p></li></ol>
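<p>The check on the command output (looking for a D, Z, or T state) can also be scripted. The following is a minimal sketch, assuming the STAT codes arrive one per line on standard input, as produced by the command above. The function name <strong>has_abnormal_state</strong> is hypothetical.</p>

```shell
# Return 0 (true) if any STAT code read from stdin starts with D, Z, or T.
# Only the first character is the process state; later characters are modifiers.
has_abnormal_state() {
  while IFS= read -r code; do
    case "$code" in
      D*|Z*|T*) return 0 ;;
    esac
  done
  return 1
}

# Example with sample STAT codes instead of live ps output:
if printf 'Sl\nDl\n' | has_abnormal_state; then
  echo "abnormal state found"
else
  echo "no abnormal state"
fi
```

<p>On a live host, the output of the state check command would be piped into the function in place of the sample data.</p>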
<p class="tableheading" id="ALM-14031__p3255143214441"><strong id="ALM-14031__b17190233165214">Collect fault information.</strong></p>
<ol start="7" id="ALM-14031__ol480581514342"><li id="ALM-14031__li14805191513412"><a name="ALM-14031__li14805191513412"></a><a name="li14805191513412"></a><span>On FusionInsight Manager, choose <strong id="ALM-14031__b463700064113054">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14031__b402136686113054">Log</strong> &gt; <strong id="ALM-14031__b1875241582113054">Download</strong>.</span></li><li id="ALM-14031__li168051615113417"><span>Expand the drop-down list next to the <strong id="ALM-14031__b15369453141411">Service</strong> field. In the <strong id="ALM-14031__b10370353171419">Services</strong> dialog box that is displayed, select <strong id="ALM-14031__b14370153101416">HDFS</strong> for the target cluster.</span></li><li id="ALM-14031__li5805171503414"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14031__b18636418253">Start Date</strong> to 10 minutes before the alarm generation time and <strong id="ALM-14031__b28631142250">End Date</strong> to 10 minutes after it. Then, click <strong id="ALM-14031__b17864154132515">Download</strong>.</span></li><li id="ALM-14031__li10805181583414"><span>Contact <span id="ALM-14031__text19191183321513">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
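<p>The log collection window (10 minutes either side of the alarm generation time) can be worked out with a short script before entering the dates on the page. This is a sketch that assumes GNU <strong>date</strong> is available and treats the timestamp as UTC. The function name <strong>log_window</strong> and the sample alarm time are hypothetical.</p>

```shell
# Print the start and end of a log window 10 minutes (600 seconds)
# either side of an alarm time. Assumes GNU date; input is read as UTC.
log_window() {
  alarm_epoch=$(date -u -d "$1" +%s) || return 1
  fmt='+%Y-%m-%d %H:%M:%S'
  echo "start: $(date -u -d "@$((alarm_epoch - 600))" "$fmt")"
  echo "end:   $(date -u -d "@$((alarm_epoch + 600))" "$fmt")"
}

# Hypothetical alarm generation time:
log_window "2024-05-01 12:00:00"
```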
</div>
<div class="section" id="ALM-14031__section169311343318"><h4 class="sectiontitle"><span id="ALM-14031__text367020138593">Alarm Clearance</span></h4><p id="ALM-14031__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14031__section53362350"><h4 class="sectiontitle"><span id="ALM-14031__text1246242445916">Related Information</span></h4><p id="ALM-14031__p7522741"><span id="ALM-14031__text1881919412591">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>