Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14035"></a><a name="ALM-14035"></a>
<h1 class="topictitle1">ALM-14035 HttpFS Process Is Abnormal</h1>
<div id="body0000002008297085"><div class="section" id="ALM-14035__section979815471118"><h4 class="sectiontitle"><span id="ALM-14035__text1079812471120">Alarm Description</span></h4><p id="ALM-14035__p8353691349">The system checks the status of the HttpFS process every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for a long time.</p>
<p id="ALM-14035__p197982471413">This alarm is cleared when the process status recovers.</p>
</div>
<div class="section" id="ALM-14035__section18798204714110"><h4 class="sectiontitle"><span id="ALM-14035__text2798164712118">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14035__table87986471415" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14035__row167981047613"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14035__p12798647315"><span id="ALM-14035__text10798547517">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14035__p16798124719115"><span id="ALM-14035__text157981347317">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14035__p17992471410"><span id="ALM-14035__text15799194720117">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14035__row67994478118"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14035__p18799747419">14035</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14035__p279974710111">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14035__p107994471713">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14035__section19799184712110"><h4 class="sectiontitle"><span id="ALM-14035__text27993470117">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14035__table3799204720116" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14035__row1879915471215"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14035__p177993479118"><span id="ALM-14035__text207998471417">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14035__p579954720114"><span id="ALM-14035__text127995473116">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14035__row1179918471011"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14035__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14035__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14035__row1279964711115"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14035__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14035__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14035__row079994716117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14035__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14035__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14035__row1839713564234"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14035__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14035__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14035__section0799144716115"><h4 class="sectiontitle"><span id="ALM-14035__text479911470117">Impact on the System</span></h4><p id="ALM-14035__p8799247918">If the process status is abnormal, the process cannot provide services properly. As a result, the entire service may become abnormal.</p>
</div>
<div class="section" id="ALM-14035__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14035__text187997470114">Possible Causes</span></h4><p id="ALM-14035__p251412141245">The host responds slowly to I/O requests (disk I/O and network I/O), and some processes are in the D (uninterruptible sleep) or Z (zombie) state. The process may also have been suspended and entered the T (stopped) state.</p>
</div>
<div class="section" id="ALM-14035__section179924719116"><h4 class="sectiontitle"><span id="ALM-14035__text1799947611">Handling Procedure</span></h4><p id="ALM-14035__p1243515278455"><strong id="ALM-14035__b1988924517547">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14035__ol67999471216"><li id="ALM-14035__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14035__b10840950103119">O&amp;M</strong> &gt; <strong id="ALM-14035__b5841115013118">Alarm</strong> &gt; <strong id="ALM-14035__b14841155093119">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14035__ul10505203319910"><li id="ALM-14035__li5505533895">If the alarm is no longer in the list, no further action is required.</li><li id="ALM-14035__li350517336917">If the alarm is still in the list, view the alarm details and record the IP address of the host where the alarm is generated. Then go to <a href="#ALM-14035__li68511247311">2</a>.</li></ul>
</p></li><li id="ALM-14035__li68511247311"><a name="ALM-14035__li68511247311"></a><a name="li68511247311"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14035__b379415162324">root</strong> user and run the <strong id="ALM-14035__b07941316193217">su - omm</strong> command to switch to the <strong id="ALM-14035__b4795516173217">omm</strong> user.</span></li><li id="ALM-14035__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14035__p114995439534"><strong id="ALM-14035__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.fs.http.server.HttpFSServerWebServer | grep -v grep | awk '{print$1}'</strong></p>
</p></li><li id="ALM-14035__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14035__ul161804819579"><li id="ALM-14035__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14035__li39471558560">5</a>.</li><li id="ALM-14035__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14035__li17799174711116">7</a>.</li></ul>
</p></li><li id="ALM-14035__li39471558560"><a name="ALM-14035__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14035__b5858163753214">root</strong> and run the <strong id="ALM-14035__b4859203716322">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14035__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14035__ul19652752195618"><li id="ALM-14035__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14035__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14035__li17799174711116">7</a>.</li></ul>
</p></li></ol>
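<p>The state check in steps 3 and 4 can be sketched as a small shell helper. This is a minimal sketch, not part of FusionInsight Manager: the function name <strong>abnormal_states</strong> is hypothetical, and it assumes that the first letter of each <strong>ps</strong> STAT value is the process state, with any following characters (such as <strong>s</strong>, <strong>l</strong>, or <strong>+</strong>) being modifiers.</p>

```shell
#!/bin/sh
# Hypothetical helper (an illustration only, not shipped with the product).
# Reads one ps STAT value per line on stdin and prints every value whose
# first letter is D, Z, or T.
# Exit status: 0 if at least one abnormal state was found, 1 otherwise.
abnormal_states() {
    awk 'substr($1, 1, 1) ~ /^[DZT]$/ { print $1; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Example use as user omm, feeding it the output of the command from step 3:
#   ps ww -eo stat,cmd | grep -w org.apache.hadoop.fs.http.server.HttpFSServerWebServer \
#     | grep -v grep | awk '{print $1}' | abnormal_states
```

<p>If the helper prints any value and returns 0, the process is in an abnormal state and the procedure continues with step 5. If it prints nothing and returns 1, the procedure continues with step 7.</p>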
<p class="tableheading" id="ALM-14035__p2079910471716"><strong id="ALM-14035__b10284155114545">Collect fault information.</strong></p>
<ol start="7" id="ALM-14035__ol37994471410"><li id="ALM-14035__li17799174711116"><a name="ALM-14035__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14035__b1761410973312">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14035__b1861613933317">Log</strong> &gt; <strong id="ALM-14035__b186171298337">Download</strong>.</span></li><li id="ALM-14035__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14035__b1423117123337">Service</strong> field. In the <strong id="ALM-14035__b8232181263315">Services</strong> dialog box that is displayed, select <strong id="ALM-14035__b15232141214334">HDFS</strong> for the target cluster.</span></li><li id="ALM-14035__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14035__b86785334252">Start Date</strong> and <strong id="ALM-14035__b06791933142513">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14035__b18679833192510">Download</strong>.</span></li><li id="ALM-14035__li57991247416"><span>Contact <span id="ALM-14035__text6536822123311">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
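<p>The log collection window described above (10 minutes on either side of the alarm generation time) can be worked out with a short shell sketch. The function name <strong>log_window</strong> is hypothetical, and the sketch assumes GNU <strong>date</strong> and an alarm time written as <strong>YYYY-MM-DD HH:MM:SS</strong>. The time is handled as UTC only to keep the arithmetic independent of the local time zone, so the printed values are in the same zone as the input.</p>

```shell
#!/bin/sh
# Hypothetical helper (an illustration only): prints the start and end of a
# log window spanning 10 minutes before and after the given alarm time.
# Assumes GNU date (uses the -d option and the @EPOCH syntax).
log_window() {
    alarm_epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$((alarm_epoch - 600))" '+%Y-%m-%d %H:%M:%S'   # Start Date
    date -u -d "@$((alarm_epoch + 600))" '+%Y-%m-%d %H:%M:%S'   # End Date
}
```

<p>For example, <strong>log_window '2024-01-01 00:05:00'</strong> prints <strong>2023-12-31 23:55:00</strong> followed by <strong>2024-01-01 00:15:00</strong>.</p>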
</div>
<div class="section" id="ALM-14035__section979934710111"><h4 class="sectiontitle"><span id="ALM-14035__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14035__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14035__section879913471915"><h4 class="sectiontitle"><span id="ALM-14035__text16799164711115">Related Information</span></h4><p id="ALM-14035__p1779913479110"><span id="ALM-14035__text879984715119">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>