Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="alm_14012"></a><a name="alm_14012"></a>
<h1 class="topictitle1">ALM-14012 HDFS JournalNode Data Is Not Synchronized</h1>
<div id="body8662426"><div class="section" id="alm_14012__en-us_topic_0191813911_section18191719"><h4 class="sectiontitle">Description</h4><p id="alm_14012__en-us_topic_0191813911_p59085005">On the active NameNode, the system checks data synchronization on all JournalNodes in the cluster every 5 minutes. This alarm is generated when data on a JournalNode is not synchronized with that on other JournalNodes.</p>
<p id="alm_14012__en-us_topic_0191813911_p62003000">This alarm is cleared within 5 minutes after data on the JournalNodes is synchronized.</p>
</div>
<div class="section" id="alm_14012__en-us_topic_0191813911_section29507743"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_14012__en-us_topic_0191813911_table56187107" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_14012__en-us_topic_0191813911_row43395070"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="alm_14012__en-us_topic_0191813911_p25339754">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="alm_14012__en-us_topic_0191813911_p39254219">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="alm_14012__en-us_topic_0191813911_p25475209">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_14012__en-us_topic_0191813911_row50226059"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="alm_14012__en-us_topic_0191813911_p41779002">14012</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="alm_14012__en-us_topic_0191813911_p28655997">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="alm_14012__en-us_topic_0191813911_p39434429">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_14012__en-us_topic_0191813911_section64243102"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_14012__en-us_topic_0191813911_table40072161" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_14012__en-us_topic_0191813911_row29623216"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="alm_14012__en-us_topic_0191813911_p50670335">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="alm_14012__en-us_topic_0191813911_p10656503">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_14012__en-us_topic_0191813911_row57870399"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14012__en-us_topic_0191813911_p56990719">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14012__en-us_topic_0191813911_p52845536">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_14012__en-us_topic_0191813911_row5847780"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14012__en-us_topic_0191813911_p3908185">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14012__en-us_topic_0191813911_p48127554">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_14012__en-us_topic_0191813911_row30494806"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_14012__en-us_topic_0191813911_p54160201">IP</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_14012__en-us_topic_0191813911_p24900132">Specifies the service IP address of the JournalNode instance for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_14012__en-us_topic_0191813911_section41317012"><h4 class="sectiontitle">Impact on the System</h4><p id="alm_14012__en-us_topic_0191813911_p3644816">When a JournalNode is not working correctly, its data falls out of synchronization with the other JournalNodes. If data on more than half of the JournalNodes is not synchronized, the NameNode cannot work correctly and the HDFS service becomes unavailable.</p>
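<p>The threshold described above can be sketched as a small check. This is only an illustration of the "more than half" rule as stated in this section; the function name and its arguments are hypothetical and are not part of MRS or HDFS.</p>

```python
def more_than_half_unsynchronized(total_journalnodes: int, unsynchronized: int) -> bool:
    """Return True if more than half of the JournalNodes are out of sync.

    Hypothetical helper: it only restates the rule from this section and
    does not query a real cluster.
    """
    if total_journalnodes <= 0:
        raise ValueError("total_journalnodes must be positive")
    if not 0 <= unsynchronized <= total_journalnodes:
        raise ValueError("unsynchronized must be between 0 and total_journalnodes")
    # Integer comparison avoids floating-point division.
    return unsynchronized * 2 > total_journalnodes


if __name__ == "__main__":
    # Example: a cluster assumed to have three JournalNodes.
    for failed in range(4):
        print(failed, more_than_half_unsynchronized(3, failed))
```

<p>Under this rule, a cluster assumed to have three JournalNodes tolerates one unsynchronized node, which is why a single alarm is rated Major rather than indicating an immediate outage.</p>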
</div>
<div class="section" id="alm_14012__en-us_topic_0191813911_section36308794"><h4 class="sectiontitle">Possible Causes</h4><ul id="alm_14012__en-us_topic_0191813911_ul26794676"><li id="alm_14012__en-us_topic_0191813911_li39825499">The JournalNode instance has not been started or has been stopped.</li><li id="alm_14012__en-us_topic_0191813911_li22885177">The JournalNode instance is working incorrectly.</li><li id="alm_14012__en-us_topic_0191813911_li4640007">The network of the JournalNode is unreachable.</li></ul>
</div>
<div class="section" id="alm_14012__en-us_topic_0191813911_section58343698"><h4 class="sectiontitle">Procedure</h4><ol id="alm_14012__en-us_topic_0191813911_ol25565879153730"><li class="tableheading" id="alm_14012__en-us_topic_0191813911_li23513719153730"><span>Check whether the JournalNode instance has been started.</span><p><ol type="a" id="alm_14012__en-us_topic_0191813911_ol27122565"><li id="alm_14012__en-us_topic_0191813911_li42776493">On the MRS cluster details page, click <strong id="alm_14012__b144911310171">Alarms</strong>. In the alarm list, click the alarm.</li><li id="alm_14012__en-us_topic_0191813911_li49444123">In the <strong id="alm_14012__b172611122177">Alarm Details</strong> area, check <strong id="alm_14012__b8439225101710">Location</strong> and obtain the IP address of the JournalNode for which the alarm is generated.</li><li id="alm_14012__en-us_topic_0191813911_li42343927">Choose <strong id="alm_14012__b0725153117172">Components</strong> > <strong id="alm_14012__b1973018310172">HDFS</strong> > <strong id="alm_14012__b273023131711">Instances</strong>. In the instance list, click the JournalNode for which the alarm is generated and check whether <strong id="alm_14012__b11730031161716">Operating Status</strong> of the node is <strong id="alm_14012__b1973113191714">Started</strong>.<ul class="subitemlist" id="alm_14012__en-us_topic_0191813911_ul45551026"><li id="alm_14012__en-us_topic_0191813911_li7306053">If yes, go to <a href="#alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s6">2.a</a>.</li><li id="alm_14012__en-us_topic_0191813911_li54919424">If no, go to <a href="#alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s4">1.d</a>.</li></ul>
</li><li id="alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s4"><a name="alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s4"></a><a name="en-us_topic_0191813911_alm14012_mmccppss_s4"></a>Select the JournalNode instance and choose <strong id="alm_14012__b127191955189">More</strong> > <strong id="alm_14012__b571914516184">Start Instance</strong> to start it. </li><li id="alm_14012__en-us_topic_0191813911_li39377290">Wait 5 minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_14012__en-us_topic_0191813911_ul18851291"><li id="alm_14012__en-us_topic_0191813911_li35443891">If yes, no further action is required.</li><li id="alm_14012__en-us_topic_0191813911_li50559563">If no, go to <a href="#alm_14012__en-us_topic_0191813911_li572522141314">4</a>.</li></ul>
</li></ol>
</p></li><li class="tableheading" id="alm_14012__en-us_topic_0191813911_li3944544153915"><span>Check whether the JournalNode instance is working correctly.</span><p><ol type="a" id="alm_14012__en-us_topic_0191813911_ol42425799154644"><li id="alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s6"><a name="alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s6"></a><a name="en-us_topic_0191813911_alm14012_mmccppss_s6"></a>Check whether <strong id="alm_14012__b174911132111818">Health Status</strong> of the JournalNode instance is <strong id="alm_14012__b14491832121817">Good</strong>. <ul class="subitemlist" id="alm_14012__en-us_topic_0191813911_ul2179128"><li id="alm_14012__en-us_topic_0191813911_li19612160">If yes, go to <a href="#alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s10">3.a</a>.</li><li id="alm_14012__en-us_topic_0191813911_li45081147">If no, go to <a href="#alm_14012__en-us_topic_0191813911_s7">2.b</a>.</li></ul>
</li><li id="alm_14012__en-us_topic_0191813911_s7"><a name="alm_14012__en-us_topic_0191813911_s7"></a><a name="en-us_topic_0191813911_s7"></a>Select the JournalNode instance and choose <strong id="alm_14012__b3800175511187">More</strong> > <strong id="alm_14012__b9800115518187">Restart Instance</strong> to restart it. </li><li id="alm_14012__en-us_topic_0191813911_li47922135">Wait 5 minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_14012__en-us_topic_0191813911_ul28646034"><li id="alm_14012__en-us_topic_0191813911_li56487722">If yes, no further action is required.</li><li id="alm_14012__en-us_topic_0191813911_li38627455">If no, go to <a href="#alm_14012__en-us_topic_0191813911_li572522141314">4</a>.</li></ul>
</li></ol>
</p></li><li class="tableheading" id="alm_14012__en-us_topic_0191813911_li12440269153910"><span>Check whether the network of the JournalNode is reachable.</span><p><ol type="a" id="alm_14012__en-us_topic_0191813911_ol59634447154644"><li id="alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s10"><a name="alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s10"></a><a name="en-us_topic_0191813911_alm14012_mmccppss_s10"></a>On the MRS cluster details page, choose <strong id="alm_14012__b4312172519195">Components</strong> > <strong id="alm_14012__b531202514195">HDFS</strong> > <strong id="alm_14012__b4312112517196">Instances</strong> to check the service IP address of the active NameNode.</li><li id="alm_14012__en-us_topic_0191813911_li31664161">Log in to the active NameNode.</li><li id="alm_14012__en-us_topic_0191813911_li14660263">Run the <strong id="alm_14012__b3859163316194">ping</strong> command to check whether a timeout occurs or the network between the active NameNode and the JournalNode is unreachable.<p class="litext" id="alm_14012__en-us_topic_0191813911_p46630661"><strong id="alm_14012__b1152141610242">ping</strong> <em id="alm_14012__i165221916192410">service IP address of the JournalNode</em></p>
<ul class="subitemlist" id="alm_14012__en-us_topic_0191813911_ul17269710"><li id="alm_14012__en-us_topic_0191813911_li21209668">If yes, go to <a href="#alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s13">3.d</a>.</li><li id="alm_14012__en-us_topic_0191813911_li40261524">If no, go to <a href="#alm_14012__en-us_topic_0191813911_li572522141314">4</a>.</li></ul>
</li><li id="alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s13"><a name="alm_14012__en-us_topic_0191813911_alm14012_mmccppss_s13"></a><a name="en-us_topic_0191813911_alm14012_mmccppss_s13"></a>Contact O&M personnel to rectify the network fault. Wait 5 minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_14012__en-us_topic_0191813911_ul24077873"><li id="alm_14012__en-us_topic_0191813911_li15374270">If yes, no further action is required.</li><li id="alm_14012__en-us_topic_0191813911_li4150710">If no, go to <a href="#alm_14012__en-us_topic_0191813911_li572522141314">4</a>.</li></ul>
</li></ol>
</p></li><li id="alm_14012__en-us_topic_0191813911_li572522141314"><a name="alm_14012__en-us_topic_0191813911_li572522141314"></a><a name="en-us_topic_0191813911_li572522141314"></a><span>Collect fault information.</span><p><ol type="a" id="alm_14012__en-us_topic_0191813911_en-us_topic_0191813935_ol6089206913036"><li id="alm_14012__en-us_topic_0191813911_en-us_topic_0191813935_li4478836213036">On MRS Manager, choose <span class="menucascade" id="alm_14012__menucascade205841172212"><b><span class="uicontrol" id="alm_14012__uicontrol1257917202116">System</span></b> > <b><span class="uicontrol" id="alm_14012__uicontrol458487202116">Export Log</span></b></span>.</li><li id="alm_14012__li18574327401">Contact technical support engineers for help. For details, see <a href="https://docs.otc.t-systems.com/en-us/public/learnmore.html" target="_blank" rel="noopener noreferrer">technical support</a>.</li></ol>
</p></li></ol>
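<p>The reachability check in step 3 can also be scripted when several JournalNodes need to be tested from the active NameNode. The following is a minimal sketch, assuming a Linux host whose <strong>ping</strong> supports the <strong>-c</strong> (packet count) and <strong>-W</strong> (reply timeout in seconds) options and returns a non-zero exit code when no reply is received. The function names and the sample address are hypothetical.</p>

```python
import subprocess


def build_ping_command(journalnode_ip: str, count: int = 3, timeout_s: int = 2) -> list:
    """Build the ping command line (assumes Linux iputils-style options)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), journalnode_ip]


def journalnode_unreachable(journalnode_ip: str, run=subprocess.run) -> bool:
    """Return True if ping reports a timeout or an unreachable network.

    The `run` parameter exists only so the check can be exercised without
    a network; by default it calls subprocess.run.
    """
    result = run(
        build_ping_command(journalnode_ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Exit code 0 means at least one reply was received.
    return result.returncode != 0


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; use the service IP address of the JournalNode.
    if journalnode_unreachable("192.0.2.10"):
        print("Unreachable: contact O&M personnel (step 3.d).")
    else:
        print("Reachable: collect fault information (step 4).")
```

<p>A <code>True</code> result corresponds to the "If yes" branch of step 3.c, and <code>False</code> corresponds to the "If no" branch.</p>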
</div>
<div class="section" id="alm_14012__en-us_topic_0191813911_section55331235"><h4 class="sectiontitle">Reference</h4><p id="alm_14012__en-us_topic_0191813911_p4139237">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0241.html">Alarm Reference (Applicable to Versions Earlier Than MRS 3.x)</a></div>
</div>
</div>