doc-exports/docs/mrs/umn/ALM-19013.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00

<a name="ALM-19013"></a>
<h1 class="topictitle1">ALM-19013 Duration of Regions in transaction State Exceeds the Threshold</h1>
<div id="body1583718681617"><div class="section" id="ALM-19013__sb4ffbbcbbe9843cb86ae9b6b05fb5eea"><h4 class="sectiontitle">Description</h4><div class="p" id="ALM-19013__p563621218291">The system checks the number of regions in transaction state on HBase every 300 seconds. This alarm is generated when the system detects, in two consecutive checks, that the duration of regions in transaction state exceeds the threshold. This alarm is cleared when all timed-out regions are restored.<div class="note" id="ALM-19013__note14544102852418"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p class="text" id="ALM-19013__en-us_topic_0070543520_p32794215">If the multi-instance function is enabled in the cluster and multiple HBase service instances are installed, determine the HBase service instance for which the alarm is generated based on the value of <strong id="ALM-19013__en-us_topic_0070543520_b26712487">ServiceName</strong> in <strong id="ALM-19013__en-us_topic_0070543520_b39085796">Location</strong>. For example, if the alarm is generated for the HBase1 service, <strong id="ALM-19013__en-us_topic_0070543520_b11832897">ServiceName=HBase1</strong> is displayed in <strong id="ALM-19013__en-us_topic_0070543520_b39387211">Location</strong>, and HBase must be replaced with HBase1 in the following procedure.</p>
</div></div>
</div>
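<p>The raise and clear rule in the description above can be modeled as a small state machine. This is an illustrative sketch, not the Manager's implementation. It assumes that each input value is the longest time, in whole seconds, that any region has spent in transaction state at one 300-second check, that "all timed-out regions are restored" means that value has dropped back to or below the threshold, and that the threshold is the 60-second default mentioned in the procedure.</p>

```sh
#!/bin/sh
# Illustrative model of the ALM-19013 raise/clear rule (not Manager code).
# Assumptions: each argument is the longest time, in whole seconds, that any
# region has spent in transaction state at one 300-second check, and
# THRESHOLD defaults to the 60 seconds mentioned in the procedure.

THRESHOLD="${THRESHOLD:-60}"

# Prints one line per check: "check <n>: <seconds>s -> <state>", where
# <state> is normal, raised, active (still raised), or cleared.
alm19013_model() {
    over=0      # consecutive checks above the threshold
    raised=0    # 1 while the alarm is raised
    n=0
    for seconds in "$@"; do
        n=$((n + 1))
        if [ "$seconds" -gt "$THRESHOLD" ]; then
            over=$((over + 1))
        else
            over=0
        fi

        if [ "$raised" -eq 0 ] && [ "$over" -ge 2 ]; then
            raised=1; state=raised        # second consecutive check over threshold
        elif [ "$raised" -eq 1 ] && [ "$over" -eq 0 ]; then
            raised=0; state=cleared       # no region over the threshold any more
        elif [ "$raised" -eq 1 ]; then
            state=active
        else
            state=normal
        fi
        echo "check $n: ${seconds}s -> $state"
    done
}
```

<p>For example, <code>alm19013_model 30 90 120 45</code> reports the alarm as raised at the third check (the second consecutive check over 60 seconds) and cleared at the fourth. A single check over the threshold never raises the alarm in this model.</p>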
</div>
<div class="section" id="ALM-19013__s71389bab718642a1b9f299af777bad63"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19013__en-us_topic_0070543524_table31536769" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19013__en-us_topic_0070543524_row30299850"><th align="left" class="cellrowborder" valign="top" width="33.183318331833185%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-19013__en-us_topic_0070543524_p38368787">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.48334833483348%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-19013__en-us_topic_0070543524_p20864077">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-19013__en-us_topic_0070543524_p12268704">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19013__en-us_topic_0070543524_row54240947"><td class="cellrowborder" valign="top" width="33.183318331833185%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-19013__en-us_topic_0070543524_p31440558">19013</p>
</td>
<td class="cellrowborder" valign="top" width="33.48334833483348%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-19013__en-us_topic_0070543524_p63657274">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-19013__en-us_topic_0070543524_p55965566">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19013__sd187279c273945a3b238f7517f568b18"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19013__en-us_topic_0070543524_table36917027" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19013__en-us_topic_0070543524_row16343335"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-19013__en-us_topic_0070543524_p48741777">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-19013__en-us_topic_0070543524_p55769855">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19013__row19385130101"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19013__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19013__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19013__en-us_topic_0070543524_row21064410"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19013__en-us_topic_0070543524_p28495625">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19013__p6263923266">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19013__en-us_topic_0070543524_row36671901"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19013__en-us_topic_0070543524_p17633966">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19013__p1262413152619">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19013__en-us_topic_0070543524_row37368498"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19013__en-us_topic_0070543524_p6949536">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19013__p627116222260">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19013__saa7931d75c494f168206ee267c0cbc05"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-19013__p1653612611291">Some data in the affected table is lost or becomes unavailable.</p>
</div>
<div class="section" id="ALM-19013__s1cb3f01d39b04147b1050cfa338c7476"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-19013__ul14144143220294"><li id="ALM-19013__li1314416328296">Compaction is permanently blocked.</li><li id="ALM-19013__li1114453212292">The HDFS files are abnormal.</li></ul>
</div>
<div class="section" id="ALM-19013__section2527171553316"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-19013__en-us_topic_0070543524_p30826283"><strong id="ALM-19013__b1394845192812">Locate the alarm cause.</strong></p>
<ol id="ALM-19013__ol94581441133118"><li id="ALM-19013__li12457174173118"><span>On FusionInsight Manager, choose <strong id="ALM-19013__b17421132514919">O&amp;M</strong> &gt; <strong id="ALM-19013__b6421102534913">Alarm</strong> &gt; <strong id="ALM-19013__b20421192574910">Alarms</strong>, select this alarm, and view the <strong id="ALM-19013__b1535911521325">HostName</strong> and <strong id="ALM-19013__b1635995217328">RoleName</strong> in <strong id="ALM-19013__b1421142524918">Location</strong>.</span></li><li id="ALM-19013__li24565123511"><span>Choose <strong id="ALM-19013__b945791285112">Cluster</strong> &gt; <em id="ALM-19013__i2457151285110">Name of the desired cluster</em> &gt; <strong id="ALM-19013__b979511533814">Services &gt; HBase</strong>. Click the drop-down menu in the chart area and choose <strong id="ALM-19013__b0457201218516">Customize</strong> &gt; <strong id="ALM-19013__b17638171714511">Service</strong> &gt; <strong id="ALM-19013__b16930142520308">Region in transaction count</strong> to view <strong id="ALM-19013__b20102154118308">Region in transaction count over threshold</strong>.</span><p><div class="p" id="ALM-19013__p2045311214515">Check whether this monitoring item reports a value in three consecutive detection periods. (The default threshold is 60 seconds.)<ul id="ALM-19013__ul154581741103118"><li id="ALM-19013__li164573411317">If yes, go to <a href="#ALM-19013__li0444398318">3</a>.</li><li id="ALM-19013__li1245815413317">If no, go to <a href="#ALM-19013__li11456104183119">7</a>.</li></ul>
</div>
</p></li><li id="ALM-19013__li0444398318"><a name="ALM-19013__li0444398318"></a><a name="li0444398318"></a><span>Choose <strong id="ALM-19013__b1944173910312">Cluster</strong> &gt; <em id="ALM-19013__i1844139163119">Name of the desired cluster</em> &gt; <strong id="ALM-19013__b444173919318">Services</strong> &gt; <strong id="ALM-19013__b1144193973111">HBase</strong> &gt; <strong id="ALM-19013__b154463913112">HMaster (Active)</strong> &gt; <strong id="ALM-19013__b9446392319">Tables</strong> and check whether the timed-out regions in transaction state belong to only one table.</span><p><ul id="ALM-19013__ul823353623818"><li id="ALM-19013__li142341636193817">If yes, go to <a href="#ALM-19013__li1318724573113">4</a>.</li><li id="ALM-19013__li8234136143814">If no, go to <a href="#ALM-19013__li11456104183119">7</a>.</li></ul>
</p></li><li id="ALM-19013__li1318724573113"><a name="ALM-19013__li1318724573113"></a><a name="li1318724573113"></a><span>Run the <strong id="ALM-19013__b019184523111">hbase hbck</strong> command on the client and check whether the error message "No table descriptor file under hdfs://hacluster/hbase/data/default/table" is displayed.</span><p><ul id="ALM-19013__ul192018173516"><li id="ALM-19013__li122012119354">If yes, go to <a href="#ALM-19013__li417435203115">5</a>.</li><li id="ALM-19013__li92011411358">If no, go to <a href="#ALM-19013__li11456104183119">7</a>.</li></ul>
</p></li><li id="ALM-19013__li417435203115"><a name="ALM-19013__li417435203115"></a><a name="li417435203115"></a><span>Log in to the node where the client is installed as user <strong id="ALM-19013__b61741252163111">root</strong>. <span id="ALM-19013__text02267619200"></span>Run the following commands:</span><p><p id="ALM-19013__p92311121010"><strong id="ALM-19013__b168428301313">cd</strong> <em id="ALM-19013__i3649131817370">client installation directory</em></p>
<p id="ALM-19013__p193473119113"><strong id="ALM-19013__b897633619112">source bigdata_env</strong></p>
<p id="ALM-19013__p28620373116">If the cluster is in security mode, also run the <strong id="ALM-19013__b6653162211402">kinit hbase</strong> command.</p>
<p id="ALM-19013__en-us_topic_0266013831_p11897448174317">Log in to the HMaster WebUI, choose <strong id="ALM-19013__en-us_topic_0266013831_b53761932174920">Procedure &amp; Locks</strong> in the navigation tree, and check whether any procedure listed in <strong id="ALM-19013__b1184151921018">Procedures</strong> is in the <strong id="ALM-19013__en-us_topic_0266013831_b92021140114910">Waiting</strong> state. If yes, run the following command with the ID of that procedure (<em>pid</em>) to release the procedure lock:</p>
<p id="ALM-19013__en-us_topic_0266013831_p5597411164412"><strong id="ALM-19013__b1612413186476">hbase hbck -j </strong><em id="ALM-19013__i7126718184719">client installation directory</em><strong id="ALM-19013__b61241018164714">/HBase/hbase/tools/hbase-hbck2-*.jar bypass -o </strong><em id="ALM-19013__en-us_topic_0266013831_i91087177114">pid</em></p>
<p id="ALM-19013__en-us_topic_0266013831_p19326719134413">Check whether the procedure enters the <strong id="ALM-19013__en-us_topic_0266013831_b1922219569498">Bypass</strong> state. If the procedure stays in the <strong id="ALM-19013__en-us_topic_0266013831_b3344193865017">RUNNABLE(Bypass)</strong> state on the UI, perform an active/standby switchover, and then run the <strong id="ALM-19013__en-us_topic_0266013831_b05741245145011">assigns</strong> command to bring the region online again:</p>
<p id="ALM-19013__en-us_topic_0266013831_p2365135112447"><strong id="ALM-19013__b310823315473">hbase hbck -j </strong><em id="ALM-19013__i111217336476">client installation directory</em><strong id="ALM-19013__b111091833104719">/HBase/hbase/tools/hbase-hbck2-*.jar assigns -o </strong><em id="ALM-19013__en-us_topic_0266013831_i7933185410119">regionName</em></p>
</p></li><li id="ALM-19013__li14511419377"><span>Repeat step <a href="#ALM-19013__li1318724573113">4</a>: run the <strong id="ALM-19013__b19759052183719">hbase hbck</strong> command on the client and check whether the error message "No table descriptor file under hdfs://hacluster/hbase/data/default/table" is still displayed.</span><p><ul id="ALM-19013__ul1036512614391"><li id="ALM-19013__li173651226163913">If yes, go to <a href="#ALM-19013__li11456104183119">7</a>.</li><li id="ALM-19013__li11366192610399">If no, no further action is required.</li></ul>
</p></li></ol>
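<p>Steps 4 to 6 can be wrapped in shell functions, for example to rerun the check after each fix. This is a sketch under stated assumptions: <code>CLIENT_DIR</code> and <code>DRY_RUN</code> are hypothetical variables introduced here, <code>source bigdata_env</code> (and <code>kinit hbase</code> in security mode) is assumed to have been run already, and the HBCK2 jar is located with the same path pattern as in step 5. The <code>bypass -o</code> and <code>assigns -o</code> subcommands are used exactly as shown in step 5; the WebUI checks and the active/standby switchover remain manual.</p>

```sh
#!/bin/sh
# Steps 4 to 6 as reusable shell functions (illustrative sketch).
# Assumptions: CLIENT_DIR (hypothetical) points to the client installation
# directory, "source bigdata_env" (and "kinit hbase" in security mode) has
# already been run, and DRY_RUN=1 prints HBCK2 commands instead of running them.

# Error text that steps 4 and 6 look for in the hbck output.
MISSING_DESCRIPTOR_MSG="No table descriptor file under"

# Returns 0 if the hbck output read from stdin contains the error text.
has_missing_descriptor() {
    grep -q -F "$MISSING_DESCRIPTOR_MSG"
}

# Steps 4 and 6: returns 0 if "hbase hbck" currently reports the error.
hbck_reports_missing_descriptor() {
    hbase hbck 2>&1 | has_missing_descriptor
}

# Prints the HBCK2 jar path under the given client installation directory.
# Fails unless the pattern from step 5 matches exactly one file.
find_hbck2_jar() {
    set -- "$1"/HBase/hbase/tools/hbase-hbck2-*.jar
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "HBCK2 jar not found or not unique" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Runs an HBCK2 subcommand, or only prints it when DRY_RUN=1.
run_hbck2() {
    jar=$(find_hbck2_jar "$CLIENT_DIR") || return 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo hbase hbck -j "$jar" "$@"
    else
        hbase hbck -j "$jar" "$@"
    fi
}

# Step 5: release the lock of a procedure that is in the Waiting state.
bypass_procedure() {
    run_hbck2 bypass -o "$1"
}

# Step 5: bring a region online again after the active/standby switchover.
assign_region() {
    run_hbck2 assigns -o "$1"
}
```

<p>For example, after setting <code>CLIENT_DIR</code> to the client installation directory and <code>DRY_RUN=1</code>, <code>bypass_procedure 1234</code> prints the bypass command for a procedure with the illustrative ID 1234 without executing it. Unset <code>DRY_RUN</code> to run the command, then use <code>hbck_reports_missing_descriptor</code> to repeat the check in step 6.</p>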
<p class="tableheading" id="ALM-19013__p5564114195739"><strong id="ALM-19013__b48315751195859">Collect fault information.</strong></p>
<ol start="7" id="ALM-19013__ol24564416317"><li id="ALM-19013__li11456104183119"><a name="ALM-19013__li11456104183119"></a><a name="li11456104183119"></a><span>On FusionInsight Manager of the active and standby clusters, choose <strong id="ALM-19013__b845674115313">O&amp;M</strong> &gt; <strong id="ALM-19013__b645614413315">Log</strong> &gt; <strong id="ALM-19013__b12456741123117">Download</strong>.</span></li><li id="ALM-19013__li645614153112"><span>In the <strong id="ALM-19013__b84561641133115">Service</strong> area, select the faulty HBase service in the required cluster.</span></li><li id="ALM-19013__li1145664103113"><span>Click <span><img id="ALM-19013__image1945644173117" src="en-us_image_0269417429.png"></span> in the upper right corner, and set <strong id="ALM-19013__b6456941173117">Start Date</strong> and <strong id="ALM-19013__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-19013__b13456164113319">Download</strong>.</span></li><li id="ALM-19013__li245654113112"><span>Contact the <span id="ALM-19013__text4614151421417">O&amp;M personnel</span> and send them the collected logs.</span></li></ol>
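<p>The log collection window in step 9 spans 20 minutes centered on the alarm generation time. A minimal sketch of that arithmetic, assuming the alarm time is available as a Unix timestamp in seconds; the resulting dates are still entered manually on the Download page, and the formatting helper assumes GNU <code>date</code> and prints UTC, so adjust it to the time zone shown on FusionInsight Manager.</p>

```sh
#!/bin/sh
# Log collection window for step 9: 10 minutes before to 10 minutes after
# the alarm generation time.
# Assumption: the alarm time is available as a Unix timestamp in seconds.

WINDOW_SECONDS=600   # 10 minutes on each side of the alarm time

# Prints "<start> <end>" as Unix timestamps (seconds).
log_window() {
    alarm_epoch=$1
    echo "$((alarm_epoch - WINDOW_SECONDS)) $((alarm_epoch + WINDOW_SECONDS))"
}

# Formats a Unix timestamp as UTC "YYYY-MM-DD HH:MM:SS".
# Requires GNU date; BSD date does not accept the -d @EPOCH form.
format_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}
```

<p>For example, <code>log_window 1670932800</code> prints <code>1670932200 1670933400</code>, that is, 600 seconds on either side of the alarm time.</p>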
</div>
<div class="section" id="ALM-19013__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-19013__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-19013__sbf8de1837d864a08a95c2656ca095639"><h4 class="sectiontitle">Related Information</h4><p id="ALM-19013__en-us_topic_0070543524_p48858863">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>