Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14033"></a><a name="ALM-14033"></a>
<h1 class="topictitle1">ALM-14033 ZKFC Process Is Abnormal</h1>
<div id="body0000001971816508"><div class="section" id="ALM-14033__section979815471118"><h4 class="sectiontitle"><span id="ALM-14033__text1079812471120">Alarm Description</span></h4><p id="ALM-14033__p8353691349">The status of the ZKFC process is checked every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for an extended period.</p>
<p id="ALM-14033__p197982471413">This alarm is cleared when the process status recovers.</p>
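<p>The raise-and-clear behavior described above can be pictured with a minimal sketch. This is an illustration only, not the product's implementation: the class name <strong>ProcessAlarm</strong> and the threshold of three consecutive abnormal checks are hypothetical, because this page does not state how long the status must stay abnormal before the alarm is generated. Only the 20-second check interval comes from the description.</p>

```python
CHECK_INTERVAL_SECONDS = 20   # taken from the alarm description
ABNORMAL_CHECKS_TO_RAISE = 3  # hypothetical: the page only says "does not recover"


class ProcessAlarm:
    """Hypothetical model of an alarm that is raised after repeated abnormal
    checks and cleared as soon as one check reports a normal status."""

    def __init__(self, threshold=ABNORMAL_CHECKS_TO_RAISE):
        self.threshold = threshold
        self.abnormal_count = 0
        self.active = False

    def observe(self, healthy):
        """Record one periodic check result; return True while the alarm is active."""
        if healthy:
            # The alarm is cleared when the process status recovers.
            self.abnormal_count = 0
            self.active = False
        else:
            self.abnormal_count += 1
            if self.abnormal_count >= self.threshold:
                self.active = True
        return self.active
```

<p>In this sketch, a short interruption that recovers before the threshold is reached never raises the alarm, which matches the description that the alarm is generated only when the status does not recover.</p>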
</div>
<div class="section" id="ALM-14033__section18798204714110"><h4 class="sectiontitle"><span id="ALM-14033__text2798164712118">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14033__table87986471415" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14033__row167981047613"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14033__p12798647315"><span id="ALM-14033__text10798547517">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14033__p16798124719115"><span id="ALM-14033__text157981347317">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14033__p17992471410"><span id="ALM-14033__text15799194720117">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14033__row67994478118"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14033__p18799747419">14033</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14033__p279974710111">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14033__p107994471713">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14033__section19799184712110"><h4 class="sectiontitle"><span id="ALM-14033__text27993470117">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14033__table3799204720116" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14033__row1879915471215"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14033__p177993479118"><span id="ALM-14033__text207998471417">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14033__p579954720114"><span id="ALM-14033__text127995473116">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14033__row1179918471011"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14033__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14033__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14033__row1279964711115"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14033__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14033__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14033__row079994716117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14033__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14033__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14033__row149900404239"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14033__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14033__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14033__section0799144716115"><h4 class="sectiontitle"><span id="ALM-14033__text479911470117">Impact on the System</span></h4><p id="ALM-14033__p8799247918">If the ZKFC process status is abnormal, the process cannot provide services properly (for example, automatic NameNode failover). As a result, the entire HDFS service may become abnormal.</p>
</div>
<div class="section" id="ALM-14033__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14033__text187997470114">Possible Causes</span></h4><p id="ALM-14033__p1647015610239">The host responds slowly to I/O requests (disk I/O or network I/O), and the process is in the D (uninterruptible sleep) or Z (zombie) state. Alternatively, the process has been suspended and is in the T (stopped) state.</p>
</div>
<div class="section" id="ALM-14033__section179924719116"><h4 class="sectiontitle"><span id="ALM-14033__text1799947611">Handling Procedure</span></h4><p id="ALM-14033__p1243515278455"><strong id="ALM-14033__b6239811105419">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14033__ol67999471216"><li id="ALM-14033__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14033__b149051811214">O&amp;M</strong> &gt; <strong id="ALM-14033__b1290161817211">Alarm</strong> &gt; <strong id="ALM-14033__b189171812117">Alarms</strong>. Wait about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14033__ul10505203319910"><li id="ALM-14033__li5505533895">If the alarm is no longer in the list, no further action is required.</li><li id="ALM-14033__li350517336917">If the alarm is still in the list, view the alarm details and record the IP address of the host for which the alarm is generated. Then go to <a href="#ALM-14033__li191311041031">2</a>.</li></ul>
</p></li><li id="ALM-14033__li191311041031"><a name="ALM-14033__li191311041031"></a><a name="li191311041031"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14033__b31081042122115">root</strong> user and run the <strong id="ALM-14033__b171086426213">su - omm</strong> command to switch to the <strong id="ALM-14033__b131082423217">omm</strong> user.</span></li><li id="ALM-14033__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14033__p114995439534"><strong id="ALM-14033__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.tools.DFSZKFailoverController | grep -v grep | awk '{print$1}'</strong></p>
</p></li><li id="ALM-14033__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14033__ul161804819579"><li id="ALM-14033__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14033__li39471558560">5</a>.</li><li id="ALM-14033__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14033__li17799174711116">7</a>.</li></ul>
</p></li><li id="ALM-14033__li39471558560"><a name="ALM-14033__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14033__b937623310221">root</strong> and run the <strong id="ALM-14033__b537733311228">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14033__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14033__ul19652752195618"><li id="ALM-14033__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14033__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14033__li17799174711116">7</a>.</li></ul>
</p></li></ol>
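<p>The decision made on the command output in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the product: the helper names <strong>abnormal_states</strong> and <strong>next_step</strong> are hypothetical, and the input is assumed to be the lines printed by the <strong>ps</strong> command shown above, one STAT value per line. In <strong>ps</strong> output, only the first character of a STAT value is the process state; any following characters are additional flags.</p>

```python
# Process states this procedure treats as abnormal:
# D = uninterruptible sleep, Z = zombie, T = stopped.
ABNORMAL_STATES = {"D", "Z", "T"}


def abnormal_states(stat_lines):
    """Return the sorted abnormal state letters found in ps STAT values.

    Each input line is one STAT value such as "Sl", "D+", or "Z".
    Only the first character is checked; the rest are flags.
    """
    found = set()
    for line in stat_lines:
        stat = line.strip()
        if stat and stat[0] in ABNORMAL_STATES:
            found.add(stat[0])
    return sorted(found)


def next_step(stat_lines):
    """Map the check result to the step number used in this procedure."""
    return 5 if abnormal_states(stat_lines) else 7
```

<p>For example, an output consisting of the single line <strong>Sl</strong> contains no abnormal state, so the sketch points to step 7, whereas an output containing <strong>D+</strong> points to step 5.</p>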
<p class="tableheading" id="ALM-14033__p2079910471716"><strong id="ALM-14033__b14258101712544">Collect fault information.</strong></p>
<ol start="7" id="ALM-14033__ol37994471410"><li id="ALM-14033__li17799174711116"><a name="ALM-14033__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14033__b12332161919238">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14033__b1833361911230">Log</strong> &gt; <strong id="ALM-14033__b1133317198238">Download</strong>.</span></li><li id="ALM-14033__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14033__b16661422122318">Service</strong> field. In the <strong id="ALM-14033__b5667182202315">Services</strong> dialog box that is displayed, select <strong id="ALM-14033__b1166812210237">HDFS</strong> for the target cluster.</span></li><li id="ALM-14033__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14033__b1370492018259">Start Date</strong> and <strong id="ALM-14033__b18704142012518">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14033__b147041420182520">Download</strong>.</span></li><li id="ALM-14033__li57991247416"><span>Contact <span id="ALM-14033__text4716173792311">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
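<p>The log collection window from the steps above is simple date arithmetic, sketched below. The function name <strong>log_window</strong> and the sample alarm time are hypothetical and serve only as an illustration; the 10-minute margin is the value given in the procedure.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


alarm = datetime(2024, 1, 15, 9, 5)  # hypothetical alarm generation time
start, end = log_window(alarm)
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
# prints: 2024-01-15 08:55:00 to 2024-01-15 09:15:00
```

<p>Using date arithmetic rather than editing the minutes by hand avoids mistakes when the window crosses an hour or day boundary.</p>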
</div>
<div class="section" id="ALM-14033__section979934710111"><h4 class="sectiontitle"><span id="ALM-14033__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14033__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14033__section879913471915"><h4 class="sectiontitle"><span id="ALM-14033__text16799164711115">Related Information</span></h4><p id="ALM-14033__p1779913479110"><span id="ALM-14033__text879984715119">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>