Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="alm_18002"></a><a name="alm_18002"></a>
<h1 class="topictitle1">ALM-18002 NodeManager Heartbeat Lost</h1>
<div id="body8662426"><div class="section" id="alm_18002__en-us_topic_0191813948_section65004788"><h4 class="sectiontitle">Description</h4><p id="alm_18002__en-us_topic_0191813948_p32117087">The system checks the number of lost NodeManager nodes every 30 seconds and compares it with the threshold. The <strong id="alm_18002__b1690616611237">Lost Nodes</strong> indicator has a default threshold. This alarm is generated when the value of the <strong id="alm_18002__b1151919121236">Lost Nodes</strong> indicator exceeds the threshold.</p>
<p id="alm_18002__en-us_topic_0191813948_p51347269">This alarm is cleared when the value of <strong id="alm_18002__b72981217132318">Lost Nodes</strong> is less than or equal to the threshold.</p>
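<p>The generate and clear rules above can be sketched as a small state machine. This is a minimal illustration, not the MRS implementation: the class name <code>LostNodesAlarm</code> is hypothetical, and because this topic does not state the default threshold value, the threshold is passed in as a parameter.</p>

```python
from dataclasses import dataclass

# Check interval stated in this topic (informational; the logic below is per check).
CHECK_INTERVAL_SECONDS = 30


@dataclass
class LostNodesAlarm:
    """Hypothetical model of the ALM-18002 generate/clear rule."""

    threshold: int  # assumption: supplied by the caller; the default value is not given here
    active: bool = False

    def evaluate(self, lost_nodes: int) -> str:
        """Apply one check to the Lost Nodes value and report the alarm transition."""
        if not self.active and lost_nodes > self.threshold:
            # Generated when Lost Nodes exceeds the threshold.
            self.active = True
            return "generated"
        if self.active and lost_nodes <= self.threshold:
            # Auto Clear: cleared when Lost Nodes is less than or equal to the threshold.
            self.active = False
            return "cleared"
        return "unchanged"


alarm = LostNodesAlarm(threshold=0)  # hypothetical threshold
print([alarm.evaluate(n) for n in [0, 1, 2, 0]])
# ['unchanged', 'generated', 'unchanged', 'cleared']
```

<p>In this sketch, the <code>active</code> flag makes the alarm fire once when the threshold is first exceeded rather than at every 30-second check, and it clears automatically, matching the <strong>Auto Clear</strong> attribute of this alarm.</p>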
</div>
<div class="section" id="alm_18002__en-us_topic_0191813948_section48172185"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_18002__en-us_topic_0191813948_table65488131" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_18002__en-us_topic_0191813948_row18827362"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="alm_18002__en-us_topic_0191813948_p48621330">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="alm_18002__en-us_topic_0191813948_p46013667">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="alm_18002__en-us_topic_0191813948_p36119548">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_18002__en-us_topic_0191813948_row40002310"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="alm_18002__en-us_topic_0191813948_p18961705">18002</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="alm_18002__en-us_topic_0191813948_p59503116">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="alm_18002__en-us_topic_0191813948_p55023063">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_18002__en-us_topic_0191813948_section30896486"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_18002__en-us_topic_0191813948_table27683123" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_18002__en-us_topic_0191813948_row23047586"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="alm_18002__en-us_topic_0191813948_p54915211">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="alm_18002__en-us_topic_0191813948_p18947112">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_18002__en-us_topic_0191813948_row58321122"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18002__en-us_topic_0191813948_p26390432">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18002__en-us_topic_0191813948_p57250251">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18002__en-us_topic_0191813948_row45490212"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18002__en-us_topic_0191813948_p60828573">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18002__en-us_topic_0191813948_p28167350">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18002__en-us_topic_0191813948_row52179558"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18002__en-us_topic_0191813948_p65794642">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18002__en-us_topic_0191813948_p27765811">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18002__en-us_topic_0191813948_row48565715"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18002__en-us_topic_0191813948_p41508862">Trigger condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18002__en-us_topic_0191813948_p6774674">Generates an alarm when the actual indicator value exceeds the specified threshold.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_18002__en-us_topic_0191813948_section9632925"><h4 class="sectiontitle">Impact on the System</h4><ul id="alm_18002__en-us_topic_0191813948_ul11877685"><li id="alm_18002__en-us_topic_0191813948_li39790309">Lost NodeManager nodes cannot provide the Yarn service.</li><li id="alm_18002__en-us_topic_0191813948_li22568466">Fewer containers are available, so cluster performance deteriorates.</li></ul>
</div>
<div class="section" id="alm_18002__en-us_topic_0191813948_section19587461"><h4 class="sectiontitle">Possible Causes</h4><ul id="alm_18002__en-us_topic_0191813948_ul16106447"><li id="alm_18002__en-us_topic_0191813948_li10740297">A NodeManager is forcibly deleted without being decommissioned.</li><li id="alm_18002__en-us_topic_0191813948_li29553812">All NodeManager instances are stopped, or the NodeManager process is faulty.</li><li id="alm_18002__en-us_topic_0191813948_li64657722">The host where the NodeManager node resides is faulty.</li><li id="alm_18002__en-us_topic_0191813948_li45048586">The network between NodeManager and ResourceManager is disconnected or congested.</li></ul>
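<p>To see which NodeManagers the ResourceManager currently reports as lost, you can query the Apache Hadoop YARN ResourceManager REST API (Cluster Nodes API). The following Python sketch is an illustration under assumptions, not an MRS tool: the function names are hypothetical, the ResourceManager address (for example, <code>http://rm-host:8088</code>, a hypothetical host and port) is assumed to be reachable without authentication, and clusters with Kerberos authentication enabled require SPNEGO, which this sketch does not handle. The <code>states</code> query parameter is documented for Hadoop 3.x, so the sketch also filters on each node's <code>state</code> field in case the server ignores the parameter.</p>

```python
import json
from urllib.request import Request, urlopen


def lost_node_ids(payload: str) -> list:
    """Return the IDs of nodes in the LOST state from a Cluster Nodes API JSON response."""
    # "nodes" (or "node") may be null when no node matches, so fall back to empty values.
    nodes = json.loads(payload).get("nodes") or {}
    # Filter client-side as well, in case the server ignores the states parameter.
    return [n["id"] for n in (nodes.get("node") or []) if n.get("state") == "LOST"]


def fetch_lost_nodes(rm_url: str, timeout: float = 10.0) -> list:
    """Query the ResourceManager REST API for lost NodeManagers.

    rm_url is an assumption, e.g. "http://rm-host:8088" (hypothetical host and port).
    No authentication is performed; Kerberos-enabled clusters need SPNEGO instead.
    """
    url = rm_url.rstrip("/") + "/ws/v1/cluster/nodes?states=LOST"
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as resp:
        return lost_node_ids(resp.read().decode("utf-8"))
```

<p>For example, <code>fetch_lost_nodes("http://rm-host:8088")</code> returns a list of YARN node IDs in <code>host:port</code> form; an empty list means the ResourceManager reported no lost nodes at the time of the query. Parsing is kept in a separate function so it can be checked without a running cluster.</p>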
</div>
<div class="section" id="alm_18002__en-us_topic_0191813948_section42069424"><h4 class="sectiontitle">Procedure</h4><ol id="alm_18002__en-us_topic_0191813948_ol8416164734914"><li id="alm_18002__en-us_topic_0191813948_li572522141314"><span>Collect fault information.</span><p><ol type="a" id="alm_18002__en-us_topic_0191813948_en-us_topic_0191813935_ol6089206913036"><li id="alm_18002__en-us_topic_0191813948_en-us_topic_0191813935_li4478836213036">On MRS Manager, choose <strong id="alm_18002__b96681417182519">System</strong> > <strong id="alm_18002__b46741717152516">Export Log</strong>.</li><li id="alm_18002__li18574327401">Contact technical support engineers for help. For details, see <a href="https://docs.otc.t-systems.com/en-us/public/learnmore.html" target="_blank" rel="noopener noreferrer">technical support</a>.</li></ol>
</p></li></ol>
</div>
<div class="section" id="alm_18002__en-us_topic_0191813948_section43080504"><h4 class="sectiontitle">Reference</h4><p id="alm_18002__en-us_topic_0191813948_p52956621">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0241.html">Alarm Reference (Applicable to Versions Earlier Than MRS 3.x)</a></div>
</div>
</div>