doc-exports/docs/mrs/umn/ALM-16003.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-16003"></a>
<h1 class="topictitle1">ALM-16003 Background Thread Usage Exceeds the Threshold</h1>
<div id="body1546480527717"><div class="section" id="ALM-16003__section27621550"><h4 class="sectiontitle">Description</h4><p id="ALM-16003__p1519816401189">The system checks the background thread usage every 30 seconds. This alarm is generated when the usage of the Hive background thread pool exceeds the threshold (90% by default).</p>
<div class="note" id="ALM-16003__note2859924"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-16003__p9570113574215">MRS 3.x supports the multi-instance function. If multi-instance is enabled in the cluster and multiple Hive services are installed, determine the Hive service for which the alarm is generated based on the value of <strong id="ALM-16003__b427312493811">ServiceName</strong> in <strong id="ALM-16003__b1727317491687">Location</strong> of the alarm. For example, if the alarm is generated for the Hive1 service, <strong id="ALM-16003__b627344914815">ServiceName</strong> is set to <strong id="ALM-16003__b1727374919818">Hive1</strong> in <strong id="ALM-16003__b14273164915814">Location</strong>, and the operation object in the handling procedure changes from Hive to Hive1.</p>
</div></div>
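<p>The check described above can be sketched as follows. This is a minimal illustration, not the FusionInsight Manager implementation: it assumes that usage is the number of busy background threads divided by the pool size, and the function names are hypothetical. It also models the Auto Clear attribute by treating the alarm as cleared as soon as a sample no longer exceeds the threshold, which is an assumption (Manager may apply a delay before clearing).</p>

```python
# Hypothetical sketch of the ALM-16003 check (one sample every 30 seconds);
# not the FusionInsight Manager code.
DEFAULT_THRESHOLD_PERCENT = 90.0   # default alarm threshold


def thread_usage_percent(busy_threads: int, pool_size: int) -> float:
    """Return background thread pool usage as a percentage (assumed definition)."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return busy_threads * 100.0 / pool_size


def alarm_states(samples, pool_size, threshold=DEFAULT_THRESHOLD_PERCENT):
    """Return the alarm state after each sample: True = raised, False = cleared.

    The alarm is raised while usage exceeds the threshold and, because it is
    auto-cleared, drops back to False once usage no longer exceeds it
    (assumption: no hysteresis or delay before clearing).
    """
    return [thread_usage_percent(busy, pool_size) > threshold for busy in samples]


# A pool of 100 threads sampled four times, 30 seconds apart.
print(alarm_states([80, 95, 92, 60], pool_size=100))  # [False, True, True, False]
```

<p>Note that "exceeds" is modeled as strictly greater than the threshold, so exactly 90% usage does not raise the alarm in this sketch.</p>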
</div>
<div class="section" id="ALM-16003__section47267361"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-16003__table4510519" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-16003__row13556155"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-16003__p24306760">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-16003__p22690525">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-16003__p25993251">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-16003__row25078593"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-16003__p18100192">16003</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-16003__en-us_topic_0070543660_p29943579">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-16003__p39785152">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-16003__section22753069"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-16003__table1371856" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-16003__row6362194"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-16003__p45575691">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-16003__en-us_topic_0070585193_p57841085">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-16003__row133671056133110"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-16003__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-16003__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-16003__row52122619"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-16003__p61182593">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-16003__p1062616177113">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-16003__row41825285"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-16003__p32404954">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-16003__p819482611119">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-16003__row891897"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-16003__p5134838">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-16003__p1385817435110">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-16003__row137328481717"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-16003__en-us_topic_0070543656_p61886481">Trigger condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-16003__en-us_topic_0070543656_p46749077">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-16003__section3451036"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-16003__p1696618121017">Too many background threads are in use, so newly submitted tasks cannot run in time.</p>
</div>
<div class="section" id="ALM-16003__section31059325"><h4 class="sectiontitle">Possible Causes</h4><div class="p" id="ALM-16003__p1014764818453">The usage of the Hive background thread pool is excessively high when:<ul id="ALM-16003__ul103347393452"><li id="ALM-16003__li433483911452">Too many tasks are being executed in the HiveServer background thread pool.</li><li id="ALM-16003__li1633410393454">The capacity of the HiveServer background thread pool is too small.</li></ul>
</div>
</div>
<div class="section" id="ALM-16003__section882417588394"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-16003__p42972417"><strong id="ALM-16003__b15187155953311">Check the number of tasks executed in the background thread pool of HiveServer.</strong></p>
<ol id="ALM-16003__ol720318153816"><li id="ALM-16003__li17187158123815"><span>On the FusionInsight Manager portal, choose <strong id="ALM-16003__b1191235611411"><strong id="ALM-16003__b1791220566416">Cluster</strong></strong> &gt; <em id="ALM-16003__i18914856184120">Name of the desired cluster</em> &gt; <strong id="ALM-16003__b6475159203418">Services</strong> &gt; <strong id="ALM-16003__b134753963416">Hive</strong>. On the displayed page, click <strong id="ALM-16003__b347516963410">HiveServer Instance</strong> and check the values of <strong id="ALM-16003__b99961945112819">Background Thread Count</strong> and <strong id="ALM-16003__b658813502284">Background Thread Usage</strong>.</span></li><li id="ALM-16003__li12203108183817"><span>Check whether the number of background threads in the last 30 minutes is excessively high. (By default, the queue size is 100, and the thread count is considered high if it reaches 90 or more.)</span><p><ul id="ALM-16003__ul1520348173811"><li id="ALM-16003__li15914143615344">If it is, go to <a href="#ALM-16003__li7203188143816">3</a>.</li><li id="ALM-16003__li1391453618342">If it is not, go to <a href="#ALM-16003__li1418798143810">5</a>.</li></ul>
</p></li><li id="ALM-16003__li7203188143816"><a name="ALM-16003__li7203188143816"></a><a name="li7203188143816"></a><span>Reduce the number of tasks submitted to the background thread pool, for example, by canceling some time-consuming, low-performance tasks.</span></li><li id="ALM-16003__li52037810389"><span>Check whether the values of Background Thread Count and Background Thread Usage have decreased.</span><p><ul id="ALM-16003__ul62039814388"><li id="ALM-16003__li77710394368">If they have, go to <a href="#ALM-16003__li73422961119">7</a>.</li><li id="ALM-16003__li28093953617">If they have not, go to <a href="#ALM-16003__li1418798143810">5</a>.</li></ul>
</p></li></ol>
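<p>Step 2 leaves the exact decision rule open. The sketch below makes one interpretation explicit: it assumes Background Thread Count is sampled every 30 seconds (matching the alarm check interval) and treats the count as excessively high if any sample in the last 30 minutes reaches 90. The function <code>next_step</code> and the sample data are hypothetical; the returned number is the procedure step to go to.</p>

```python
# Hypothetical helper for step 2; assumes one sample every 30 seconds.
SAMPLE_INTERVAL_SECONDS = 30
WINDOW_SAMPLES = (30 * 60) // SAMPLE_INTERVAL_SECONDS  # last 30 minutes = 60 samples
HIGH_MARK = 90  # 90 or more threads (with the default of 100) counts as high


def next_step(thread_counts):
    """Return 3 if any count in the last 30 minutes reached HIGH_MARK, else 5."""
    recent = list(thread_counts)[-WINDOW_SAMPLES:]
    if any(count >= HIGH_MARK for count in recent):
        return 3  # too many tasks: reduce the submitted workload
    return 5      # workload looks normal: check the pool capacity instead


print(next_step([40] * 59 + [93]))  # 3
print(next_step([95] + [40] * 60))  # 5: the 95 is older than 30 minutes
```

<p>Using the peak rather than an average is a deliberate choice in this sketch: a short burst that fills the pool is enough to delay newly submitted tasks.</p>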
<p class="tableheading" id="ALM-16003__p10529446161349"><strong id="ALM-16003__b39108557367">Check the capacity of the HiveServer background thread pool.</strong></p>
<ol start="5" id="ALM-16003__ol21872813810"><li id="ALM-16003__li1418798143810"><a name="ALM-16003__li1418798143810"></a><a name="li1418798143810"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-16003__b1649018034313"><strong id="ALM-16003__b15490130164312">Cluster</strong></strong> &gt; <em id="ALM-16003__i174907013433">Name of the desired cluster</em> &gt; <strong id="ALM-16003__b72867793712">Services</strong> &gt; <strong id="ALM-16003__b42863793711">Hive</strong>. On the displayed page, click <strong id="ALM-16003__b112861715376">HiveServer Instance</strong> and check the values of Background Thread Count and Background Thread Usage.</span></li><li id="ALM-16003__li161877883810"><span>Increase the value of <strong id="ALM-16003__b59861186406">hive.server2.async.exec.threads</strong> in the <strong id="ALM-16003__b164670531132">${BIGDATA_HOME}/FusionInsight_HD_<span id="ALM-16003__text2078258416">8.1.0.1</span>/1_23_HiveServer/etc/hive-site.xml</strong> file, for example, by 20%.</span></li><li id="ALM-16003__li73422961119"><a name="ALM-16003__li73422961119"></a><a name="li73422961119"></a><span>Save the modification.</span></li><li id="ALM-16003__li151870863819"><span>Check whether the alarm is cleared.</span><p><ul id="ALM-16003__ul197447004119"><li id="ALM-16003__li274412011417">If it is, no further action is required.</li><li id="ALM-16003__li11744908418">If it is not, go to <a href="#ALM-16003__li3112518015571">9</a>.</li></ul>
</p></li></ol>
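<p>The change in step 6 can be illustrated with Python's standard <code>xml.etree.ElementTree</code> module. This sketch assumes hive-site.xml uses the usual Hadoop layout: a <code>&lt;configuration&gt;</code> root containing <code>&lt;property&gt;</code> elements, each with a <code>&lt;name&gt;</code> and a <code>&lt;value&gt;</code> child. The helper names and the demo content are hypothetical, and the sketch only rewrites XML text; it does not save the file or restart any instance.</p>

```python
import xml.etree.ElementTree as ET

PROPERTY = "hive.server2.async.exec.threads"


def raise_async_exec_threads(xml_text: str, percent: int = 20):
    """Increase PROPERTY by `percent` percent, rounded up; return (new_xml, new_value)."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == PROPERTY:
            value_elem = prop.find("value")
            if value_elem is None or not (value_elem.text or "").strip():
                raise ValueError(f"{PROPERTY} has no value")
            old = int(value_elem.text.strip())
            # Integer ceiling of old * (100 + percent) / 100 avoids float rounding.
            new = (old * (100 + percent) + 99) // 100
            value_elem.text = str(new)
            return ET.tostring(root, encoding="unicode"), new
    raise KeyError(f"{PROPERTY} not found")


def make_sample(value: int) -> str:
    """Build a minimal hive-site.xml-style document for the demo (hypothetical content)."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = PROPERTY
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


new_xml, new_value = raise_async_exec_threads(make_sample(100))
print(new_value)  # 120
```

<p>Rounding up ensures the pool grows by at least one thread even when 20% of a small value is less than one.</p>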
<p id="ALM-16003__p41204860155343"><strong id="ALM-16003__b39977233155638">Collect fault information.</strong></p>
<ol start="9" id="ALM-16003__ol52371914155725"><li id="ALM-16003__li3112518015571"><a name="ALM-16003__li3112518015571"></a><a name="li3112518015571"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-16003__b39977366113627">O&amp;M</strong> &gt; <strong id="ALM-16003__b24251979113627">Log &gt; Download</strong>.</span></li><li id="ALM-16003__li3694115571"><span>In <strong id="ALM-16003__b3811166215571">Service</strong>, select <strong id="ALM-16003__b1169116915571">Hive</strong> in the required cluster.</span></li><li id="ALM-16003__li1145664103113"><span>Click <span><img id="ALM-16003__image1945644173117" src="en-us_image_0269417379.png"></span> in the upper right corner, and set <strong id="ALM-16003__b6456941173117">Start Date</strong> and <strong id="ALM-16003__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-16003__b13456164113319">Download</strong>.</span></li><li id="ALM-16003__li4104619215571"><span>Contact <span id="ALM-16003__text4614151421417">O&amp;M personnel</span> and send the collected logs to them.</span></li></ol>
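<p>The log collection window in step 11 is plain date arithmetic. The helper name <code>log_collection_window</code> and the alarm time below are hypothetical, and the example time is chosen so that the window crosses midnight. The resulting values still have to be entered manually in the download dialog.</p>

```python
from datetime import datetime, timedelta

LOG_MARGIN = timedelta(minutes=10)  # 10 minutes on each side of the alarm


def log_collection_window(alarm_time: datetime):
    """Return (start, end) for log collection around the alarm generation time."""
    return alarm_time - LOG_MARGIN, alarm_time + LOG_MARGIN


start, end = log_collection_window(datetime(2024, 1, 1, 0, 5))
print(start, end)  # 2023-12-31 23:55:00 2024-01-01 00:15:00
```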
</div>
<div class="section" id="ALM-16003__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-16003__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-16003__section56407894"><h4 class="sectiontitle">Related Information</h4><p id="ALM-16003__p40534999">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>