Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="alm_19006"></a><a name="alm_19006"></a>
<h1 class="topictitle1">ALM-19006 HBase Replication Sync Failed</h1>
<div id="body8662426"><div class="section" id="alm_19006__en-us_topic_0191813928_section18389930"><h4 class="sectiontitle">Description</h4><p id="alm_19006__en-us_topic_0191813928_p40902273">This alarm is generated when disaster recovery (DR) data fails to be synchronized to a standby cluster.</p>
<p id="alm_19006__en-us_topic_0191813928_p32576140">This alarm is cleared when DR data synchronization succeeds.</p>
</div>
<div class="section" id="alm_19006__en-us_topic_0191813928_section31291646"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_19006__en-us_topic_0191813928_table57434139" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_19006__en-us_topic_0191813928_row461342"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="alm_19006__en-us_topic_0191813928_p37368736">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="alm_19006__en-us_topic_0191813928_p6968762">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="alm_19006__en-us_topic_0191813928_p27598869">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_19006__en-us_topic_0191813928_row20915929"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="alm_19006__en-us_topic_0191813928_p16468652">19006</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="alm_19006__en-us_topic_0191813928_p58892473">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="alm_19006__en-us_topic_0191813928_p5560998">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_19006__en-us_topic_0191813928_section13189358"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_19006__en-us_topic_0191813928_table47787675" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_19006__en-us_topic_0191813928_row20947391"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="alm_19006__en-us_topic_0191813928_p19017142">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="alm_19006__en-us_topic_0191813928_p63993496">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_19006__en-us_topic_0191813928_row16090703"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_19006__en-us_topic_0191813928_p28278595">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_19006__en-us_topic_0191813928_p8864859">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_19006__en-us_topic_0191813928_row12674872"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_19006__en-us_topic_0191813928_p20031746">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_19006__en-us_topic_0191813928_p11958757">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_19006__en-us_topic_0191813928_row40519951"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_19006__en-us_topic_0191813928_p60890569">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_19006__en-us_topic_0191813928_p33189039">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_19006__en-us_topic_0191813928_section51595365"><h4 class="sectiontitle">Impact on the System</h4><p id="alm_19006__en-us_topic_0191813928_p3957675">HBase data in the active cluster fails to be synchronized to the standby cluster, causing data inconsistency between the active and standby clusters.</p>
</div>
<div class="section" id="alm_19006__en-us_topic_0191813928_section61705107"><h4 class="sectiontitle">Possible Causes</h4><ul id="alm_19006__en-us_topic_0191813928_ul52136226"><li id="alm_19006__en-us_topic_0191813928_li66572856">The HBase service on the standby cluster is abnormal.</li><li id="alm_19006__en-us_topic_0191813928_li62284798">The network is abnormal.</li></ul>
</div>
<div class="section" id="alm_19006__en-us_topic_0191813928_section18475057"><h4 class="sectiontitle">Procedure</h4><ol id="alm_19006__en-us_topic_0191813928_ol44184035151419"><li class="tableheading" id="alm_19006__en-us_topic_0191813928_li47380068151419"><span>Observe whether the system automatically clears the alarm.</span><p><ol type="a" id="alm_19006__en-us_topic_0191813928_ol40026340"><li id="alm_19006__en-us_topic_0191813928_li1487713813414">Go to the cluster details page and choose <strong id="alm_19006__b137726226589">Alarms</strong>.</li><li id="alm_19006__en-us_topic_0191813928_li20908074">In the alarm list, click the alarm to obtain the alarm generation time from <strong id="alm_19006__b1422027155812">Generated Time</strong> in <strong id="alm_19006__b184742765816">Alarm Details</strong>. Check whether the alarm has existed for more than 5 minutes.<ul class="subitemlist" id="alm_19006__en-us_topic_0191813928_ul53954942"><li id="alm_19006__en-us_topic_0191813928_li15832433">If yes, go to <a href="#alm_19006__en-us_topic_0191813928_li1255962015108">2.a</a>.</li><li id="alm_19006__en-us_topic_0191813928_li7358688">If no, go to <a href="#alm_19006__en-us_topic_0191813928_step3">1.c</a>.</li></ul>
</li><li id="alm_19006__en-us_topic_0191813928_step3"><a name="alm_19006__en-us_topic_0191813928_step3"></a><a name="en-us_topic_0191813928_step3"></a>Wait 5 minutes and check whether the alarm is automatically cleared.<ul class="subitemlist" id="alm_19006__en-us_topic_0191813928_ul62883878"><li id="alm_19006__en-us_topic_0191813928_li29083993">If yes, no further action is required.</li><li id="alm_19006__en-us_topic_0191813928_li60429346">If no, go to <a href="#alm_19006__en-us_topic_0191813928_li1255962015108">2.a</a>.</li></ul>
</li></ol>
</p></li><li class="tableheading" id="alm_19006__en-us_topic_0191813928_li31929913151431"><span>Check the HBase service status of the standby cluster.</span><p><ol type="a" id="alm_19006__en-us_topic_0191813928_ol20295069154455"><li id="alm_19006__en-us_topic_0191813928_li1255962015108"><a name="alm_19006__en-us_topic_0191813928_li1255962015108"></a><a name="en-us_topic_0191813928_li1255962015108"></a>Go to the cluster details page and choose <strong id="alm_19006__b174651155918">Alarms</strong>.</li><li id="alm_19006__en-us_topic_0191813928_li64879470">In the alarm list, click the alarm and obtain <strong id="alm_19006__b5369833593">HostName</strong> from <strong id="alm_19006__b336913365916">Location</strong> in <strong id="alm_19006__b20370133175910">Alarm Details</strong>. </li><li id="alm_19006__en-us_topic_0191813928_li29810081105811">Log in to the node where the HBase client of the active cluster is located. Run the following commands to switch the user:<p id="alm_19006__en-us_topic_0191813928_p16193731105847"><a name="alm_19006__en-us_topic_0191813928_li29810081105811"></a><a name="en-us_topic_0191813928_li29810081105811"></a><strong id="alm_19006__en-us_topic_0191813928_b44973711105911">sudo su - root</strong></p>
<p id="alm_19006__en-us_topic_0191813928_p999195010594"><strong id="alm_19006__en-us_topic_0191813928_b36710212105911">su - omm</strong></p>
</li><li id="alm_19006__en-us_topic_0191813928_li16049999">In the HBase shell, run the <strong id="alm_19006__b14726213175915">status 'replication', 'source'</strong> command to check the synchronization status of the faulty node.<p class="litext" id="alm_19006__en-us_topic_0191813928_p24981566">The synchronization status of a node is displayed as follows:</p>
<pre class="screen" id="alm_19006__en-us_topic_0191813928_screen23507505">10-10-10-153:
SOURCE: PeerID=abc, SizeOfLogQueue=0, ShippedBatches=2, ShippedOps=2, ShippedBytes=320, LogReadInBytes=1636, LogEditsRead=5, LogEditsFiltered=3, SizeOfLogToReplicate=0, TimeForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Mon Jul 18 09:53:28 CST 2016, Replication Lag=0, FailedReplicationAttempts=0
SOURCE: <strong id="alm_19006__en-us_topic_0191813928_b25059790">PeerID=abc1</strong>, SizeOfLogQueue=0, ShippedBatches=1, ShippedOps=1, ShippedBytes=160, LogReadInBytes=1636, LogEditsRead=5, LogEditsFiltered=3, SizeOfLogToReplicate=0, TimeForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=16788, TimeStampsOfLastShippedOp=Sat Jul 16 13:19:00 CST 2016, Replication Lag=16788, <strong id="alm_19006__en-us_topic_0191813928_b24211521">FailedReplicationAttempts=5</strong></pre>
</li><li id="alm_19006__en-us_topic_0191813928_li16577099">Obtain the <strong id="alm_19006__b45065357594">PeerID</strong> of each record whose <strong id="alm_19006__b351217354598">FailedReplicationAttempts</strong> value is greater than 0.<p class="litext" id="alm_19006__en-us_topic_0191813928_p14976165">In the preceding example output, data on the faulty node <strong id="alm_19006__b45352408599">10-10-10-153</strong> fails to be synchronized to the standby cluster whose <strong id="alm_19006__b1154111401595">PeerID</strong> is <strong id="alm_19006__b9542440165917">abc1</strong>.</p>
</li><li id="alm_19006__en-us_topic_0191813928_peerid"><a name="alm_19006__en-us_topic_0191813928_peerid"></a><a name="en-us_topic_0191813928_peerid"></a>Run the <strong id="alm_19006__b192120467598">list_peers</strong> command to find the cluster and the HBase instance that correspond to the <strong id="alm_19006__b6925174618593">PeerID</strong>.<pre class="screen" id="alm_19006__en-us_topic_0191813928_screen45988614">PEER_ID CLUSTER_KEY STATE TABLE_CFS
abc1 10.10.10.110,10.10.10.119,10.10.10.133:24002:/hbase2 ENABLED
abc 10.10.10.110,10.10.10.119,10.10.10.133:24002:/hbase ENABLED </pre>
<p class="litext" id="alm_19006__en-us_topic_0191813928_p38376892">In the preceding information, <strong id="alm_19006__b1034517562596">/hbase2</strong> indicates that data is synchronized to the HBase2 instance of the standby cluster.</p>
</li><li id="alm_19006__en-us_topic_0191813928_li21520579">In the service list of the standby cluster, check whether the health status of the HBase instance obtained in <a href="#alm_19006__en-us_topic_0191813928_peerid">2.f</a> is <strong id="alm_19006__b191071353018">Good</strong>.<ul class="subitemlist" id="alm_19006__en-us_topic_0191813928_ul65445301"><li id="alm_19006__en-us_topic_0191813928_li52136803">If yes, go to <a href="#alm_19006__en-us_topic_0191813928_li594194191119">3.a</a>.</li><li id="alm_19006__en-us_topic_0191813928_li62331491">If no, go to <a href="#alm_19006__en-us_topic_0191813928_alm-19000">2.h</a>.</li></ul>
</li><li id="alm_19006__en-us_topic_0191813928_alm-19000"><a name="alm_19006__en-us_topic_0191813928_alm-19000"></a><a name="en-us_topic_0191813928_alm-19000"></a>In the alarm list, check whether the alarm ALM-19000 HBase Service Unavailable exists.<ul class="subitemlist" id="alm_19006__en-us_topic_0191813928_ul6956456"><li id="alm_19006__en-us_topic_0191813928_li62608109">If yes, go to <a href="#alm_19006__en-us_topic_0191813928_aalm-19006_mmccppss_process">2.i</a>.</li><li id="alm_19006__en-us_topic_0191813928_li38092103">If no, go to <a href="#alm_19006__en-us_topic_0191813928_li594194191119">3.a</a>.</li></ul>
</li><li id="alm_19006__en-us_topic_0191813928_aalm-19006_mmccppss_process"><a name="alm_19006__en-us_topic_0191813928_aalm-19006_mmccppss_process"></a><a name="en-us_topic_0191813928_aalm-19006_mmccppss_process"></a>Rectify the fault by following the steps provided in ALM-19000 HBase Service Unavailable.</li><li id="alm_19006__en-us_topic_0191813928_li53182860">Wait several minutes and check whether the alarm is cleared.<ul class="subitemlist" id="alm_19006__en-us_topic_0191813928_ul8883693"><li id="alm_19006__en-us_topic_0191813928_li12844374">If yes, no further action is required.</li><li id="alm_19006__en-us_topic_0191813928_li48490504">If no, go to <a href="#alm_19006__en-us_topic_0191813928_li594194191119">3.a</a>.</li></ul>
</li></ol>
</p></li><li class="tableheading" id="alm_19006__en-us_topic_0191813928_li5513747315151"><span>Check the network connection between RegionServers on the active and standby clusters.</span><p><ol type="a" id="alm_19006__en-us_topic_0191813928_ol44463001154455"><li id="alm_19006__en-us_topic_0191813928_li594194191119"><a name="alm_19006__en-us_topic_0191813928_li594194191119"></a><a name="en-us_topic_0191813928_li594194191119"></a>Go to the cluster details page and choose <strong id="alm_19006__b131521636429">Alarms</strong>.</li><li id="alm_19006__en-us_topic_0191813928_aalm-19006_mmccppss_ip">In the alarm list, click the alarm and obtain <strong id="alm_19006__b1195417377218">HostName</strong> from <strong id="alm_19006__b495412373211">Location</strong> in <strong id="alm_19006__b199541337120">Alarm Details</strong>.</li><li id="alm_19006__en-us_topic_0191813928_li49011679">Log in to the faulty RegionServer node.</li><li id="alm_19006__en-us_topic_0191813928_li27598891">Run the <strong id="alm_19006__b5572174513216">ping</strong> command to check whether the network connection between the faulty RegionServer node and the host where the RegionServer of the standby cluster resides is normal.<ul class="subitemlist" id="alm_19006__en-us_topic_0191813928_ul20917673"><li id="alm_19006__en-us_topic_0191813928_li54041329">If yes, go to <a href="#alm_19006__en-us_topic_0191813928_li572522141314">4</a>.</li><li id="alm_19006__en-us_topic_0191813928_li15271547">If no, go to <a href="#alm_19006__en-us_topic_0191813928_s1">3.e</a>.</li></ul>
</li><li id="alm_19006__en-us_topic_0191813928_s1"><a name="alm_19006__en-us_topic_0191813928_s1"></a><a name="en-us_topic_0191813928_s1"></a>Contact the O&amp;M personnel to restore the network.</li><li id="alm_19006__en-us_topic_0191813928_li59995477">After the network recovers, check whether the alarm is cleared.<ul class="subitemlist" id="alm_19006__en-us_topic_0191813928_ul3088388"><li id="alm_19006__en-us_topic_0191813928_li27795493">If yes, no further action is required.</li><li id="alm_19006__en-us_topic_0191813928_li48832851">If no, go to <a href="#alm_19006__en-us_topic_0191813928_li572522141314">4</a>.</li></ul>
</li></ol>
</p></li><li id="alm_19006__en-us_topic_0191813928_li572522141314"><a name="alm_19006__en-us_topic_0191813928_li572522141314"></a><a name="en-us_topic_0191813928_li572522141314"></a><span>Collect fault information.</span><p><ol type="a" id="alm_19006__en-us_topic_0191813928_en-us_topic_0191813935_ol6089206913036"><li id="alm_19006__en-us_topic_0191813928_en-us_topic_0191813935_li4478836213036">On MRS Manager, choose <span class="menucascade" id="alm_19006__menucascade20820135114318"><b><span class="uicontrol" id="alm_19006__uicontrol1581410511313">System</span></b> > <b><span class="uicontrol" id="alm_19006__uicontrol108191651834">Export Log</span></b></span>.</li><li id="alm_19006__li18574327401">Contact technical support engineers for help. For details, see <a href="https://docs.otc.t-systems.com/en-us/public/learnmore.html" target="_blank" rel="noopener noreferrer">technical support</a>.</li></ol>
</p></li></ol>
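<p>The checks in 2.d through 2.f can also be scripted. The following is a minimal Python sketch, not part of the product. It assumes that the output of <strong>status 'replication', 'source'</strong> and <strong>list_peers</strong> has been saved as text in the format shown in the examples in step 2, and that each cluster key has the form <em>ZooKeeper quorum</em>:<em>port</em>:<em>znode parent</em>. The sample text is abbreviated from those examples, and the function names <strong>failed_peers</strong> and <strong>peer_targets</strong> are hypothetical helpers introduced for illustration.</p>

```python
import re

# Sample text abbreviated from the example output in steps 2.d and 2.f.
STATUS_OUTPUT = """\
10-10-10-153:
SOURCE: PeerID=abc, SizeOfLogQueue=0, ShippedBatches=2, AgeOfLastShippedOp=0, Replication Lag=0, FailedReplicationAttempts=0
SOURCE: PeerID=abc1, SizeOfLogQueue=0, ShippedBatches=1, AgeOfLastShippedOp=16788, Replication Lag=16788, FailedReplicationAttempts=5
"""

LIST_PEERS_OUTPUT = """\
PEER_ID CLUSTER_KEY STATE TABLE_CFS
abc1 10.10.10.110,10.10.10.119,10.10.10.133:24002:/hbase2 ENABLED
abc 10.10.10.110,10.10.10.119,10.10.10.133:24002:/hbase ENABLED
"""


def failed_peers(status_text):
    """Return {(host, peer_id): attempts} for SOURCE records with failures."""
    failures = {}
    host = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("SOURCE:"):
            # Assumption: a non-SOURCE line ending in ':' names the host.
            if line.endswith(":"):
                host = line[:-1]
            continue
        peer = re.search(r"PeerID=([^,\s]+)", line)
        attempts = re.search(r"FailedReplicationAttempts=(\d+)", line)
        if peer and attempts and int(attempts.group(1)) > 0:
            failures[(host, peer.group(1))] = int(attempts.group(1))
    return failures


def peer_targets(list_peers_text):
    """Map each peer ID to (zookeeper_hosts, port, znode_parent)."""
    targets = {}
    for raw in list_peers_text.splitlines()[1:]:  # skip the header row
        fields = raw.split()
        if len(fields) < 2:
            continue
        # Assumption: the cluster key is quorum:port:znode_parent.
        parts = fields[1].split(":")
        if len(parts) != 3:
            continue  # not a peer row in the assumed format
        quorum, port, znode = parts
        targets[fields[0]] = (quorum.split(","), int(port), znode)
    return targets


if __name__ == "__main__":
    targets = peer_targets(LIST_PEERS_OUTPUT)
    for (host, peer_id), attempts in failed_peers(STATUS_OUTPUT).items():
        hosts, port, znode = targets[peer_id]
        print(host, peer_id, attempts, znode)
```

<p>With the sample text, the script reports node <strong>10-10-10-153</strong>, peer <strong>abc1</strong>, 5 failed attempts, and the znode parent <strong>/hbase2</strong>, which matches the manual reading in 2.e and 2.f. Real shell output may contain extra lines or a different column layout, so adjust the parsing before relying on it.</p>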
</div>
<div class="section" id="alm_19006__en-us_topic_0191813928_section32057793"><h4 class="sectiontitle">Reference</h4><p id="alm_19006__en-us_topic_0191813928_p29441404">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0241.html">Alarm Reference (Applicable to Versions Earlier Than MRS 3.x)</a></div>
</div>
</div>