Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00

<a name="alm_18010"></a>
<h1 class="topictitle1">ALM-18010 Number of Pending Yarn Tasks Exceeds the Threshold</h1>
<div id="body8662426"><div class="section" id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_section43920869"><h4 class="sectiontitle">Description</h4><p id="alm_18010__en-us_topic_0227101889_p1232016261796">The system checks the number of pending Yarn tasks every 30 seconds and compares the number of tasks with the threshold. This alarm is generated when the number of pending tasks exceeds the threshold.</p>
<p id="alm_18010__en-us_topic_0227101889_p143206265919">You can change the threshold by choosing <strong id="alm_18010__b15726609903337">System</strong> &gt; <strong id="alm_18010__b10231273183337">Configure Alarm Threshold</strong> &gt; <strong id="alm_18010__b1370329143337">Service</strong> &gt; <strong id="alm_18010__b3287657193337">Yarn</strong> &gt; <strong id="alm_18010__b6905606523337">Queue Root Pending Applications</strong> &gt; <strong id="alm_18010__b12213021763337">Queue Root Pending Applications</strong> on MRS Manager.</p>
<p id="alm_18010__en-us_topic_0227101889_p832019266918">This alarm is cleared when the number of pending tasks is less than or equal to the threshold.</p>
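<p>The raise and clear conditions described above can be sketched as a small state check. This is only an illustration, not the MRS Manager implementation: the names <code>AlarmState</code> and <code>evaluate</code>, the threshold of 60, and the sample values are all hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class AlarmState:
    """Tracks whether the (hypothetical) pending-task alarm is active."""
    active: bool = False


def evaluate(state: AlarmState, pending: int, threshold: int) -> str:
    """Apply one periodic check and return 'raised', 'cleared', or 'unchanged'."""
    exceeds = pending > threshold
    if exceeds and not state.active:
        # Raised when the number of pending tasks exceeds the threshold.
        state.active = True
        return "raised"
    if state.active and not exceeds:
        # Cleared when the number is less than or equal to the threshold.
        state.active = False
        return "cleared"
    return "unchanged"


state = AlarmState()
# Hypothetical samples, one per 30-second check, against an assumed threshold of 60.
results = [evaluate(state, pending, 60) for pending in (40, 61, 75, 60, 10)]
print(results)
```

<p>Note that a sample equal to the threshold clears the alarm, because only values strictly greater than the threshold count as exceeding it.</p>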
</div>
<div class="section" id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_section59743502"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_table64843092" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_row10409628"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p37873528">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p47856888">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p51202692">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_row53777413"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="alm_18010__en-us_topic_0227101889_p431016914314">18010</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="alm_18010__en-us_topic_0227101889_p73091983110">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="alm_18010__en-us_topic_0227101889_p2308169103111">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_section820607"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_table66543927" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_row61284534"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p65100236">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p38627770">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_row41841705"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18010__en-us_topic_0227101889_p9439174316">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p48178601">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_row30954226"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p24264406">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18010__en-us_topic_0227101889_p8405174319">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_row39121107"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p14693133">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p49293152">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18010__en-us_topic_0227101889_row16824930152416"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18010__en-us_topic_0227101889_p131752386244">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18010__en-us_topic_0227101889_p10175183822415">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_section7385465"><h4 class="sectiontitle">Impact on the System</h4><p id="alm_18010__en-us_topic_0227101889_p174771631162512">Tasks may pile up and not be processed in a timely manner.</p>
</div>
<div class="section" id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_section66469189"><h4 class="sectiontitle">Possible Causes</h4><p id="alm_18010__en-us_topic_0227101889_p95853492250">Tasks are submitted faster than the cluster's computing capacity can process them. As a result, tasks cannot be processed in a timely manner after being submitted.</p>
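<p>As a rough illustration of this cause, the backlog grows by the difference between the submission rate and the completion rate. The sketch below assumes both rates are constant, which is a simplification. The function name <code>pending_after</code> and all numbers are hypothetical.</p>

```python
def pending_after(initial_pending: int, submitted_per_min: float,
                  completed_per_min: float, minutes: float) -> float:
    """Estimate the number of pending tasks, assuming constant rates."""
    growth = (submitted_per_min - completed_per_min) * minutes
    # The backlog cannot drop below zero.
    return max(0.0, float(initial_pending + growth))


# Hypothetical case: 12 tasks submitted and 10 completed per minute, for 30 minutes.
backlog = pending_after(0, 12, 10, 30)
print(backlog)
```

<p>Under these assumptions, any sustained gap between the two rates eventually pushes the number of pending tasks past a fixed threshold.</p>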
</div>
<div class="section" id="alm_18010__en-us_topic_0227101889_section14111549283"><h4 class="sectiontitle">Procedure</h4><ol id="alm_18010__en-us_topic_0227101889_ol41220546282"><li id="alm_18010__en-us_topic_0227101889_li5561948133015"><span>Check the usage of memory and vCores on the Yarn page.</span><p><p id="alm_18010__en-us_topic_0227101889_p145621248143010">Check whether the values of <strong id="alm_18010__b21241347603337">Memory Used|Memory Total</strong> and <strong id="alm_18010__b21282556803337">VCores Used|VCores Total</strong> on the native Yarn page reach or approach the maximum values.</p>
<ul id="alm_18010__en-us_topic_0227101889_ul171551625143111"><li id="alm_18010__en-us_topic_0227101889_li415512593112">If yes, go to <a href="#alm_18010__en-us_topic_0227101889_li181801656143013">2</a>.</li><li id="alm_18010__en-us_topic_0227101889_li18777182723118">If no, go to <a href="#alm_18010__en-us_topic_0227101889_li572522141314">5</a>.</li></ul>
</p></li><li id="alm_18010__en-us_topic_0227101889_li181801656143013"><a name="alm_18010__en-us_topic_0227101889_li181801656143013"></a><a name="en-us_topic_0227101889_li181801656143013"></a><span>Check the number of submitted tasks.</span><p><p id="alm_18010__en-us_topic_0227101889_p19176742153114">Check whether tasks are being submitted at a normal frequency.</p>
<ul id="alm_18010__en-us_topic_0227101889_ul5721106323"><li id="alm_18010__en-us_topic_0227101889_li69027012322">If yes, go to <a href="#alm_18010__en-us_topic_0227101889_li10509161210322">3</a>.</li><li id="alm_18010__en-us_topic_0227101889_li107214010327">If no, go to <a href="#alm_18010__en-us_topic_0227101889_li572522141314">5</a>.</li></ul>
</p></li><li id="alm_18010__en-us_topic_0227101889_li10509161210322"><a name="alm_18010__en-us_topic_0227101889_li10509161210322"></a><a name="en-us_topic_0227101889_li10509161210322"></a><span>Scale out the cluster.</span><p><p id="alm_18010__en-us_topic_0227101889_p1933712013322">Scale out the cluster based on site requirements. For details, see <a href="mrs_01_0041.html">Manually Scaling Out a Cluster</a>.</p>
</p></li><li id="alm_18010__en-us_topic_0227101889_li017317411188"><span>After the scale-out is completed, check whether the alarm is cleared.</span><p><ul id="alm_18010__en-us_topic_0227101889_ul18904491882"><li id="alm_18010__en-us_topic_0227101889_li68901649483">If yes, no further action is required.</li><li id="alm_18010__en-us_topic_0227101889_li208906499811">If no, go to <a href="#alm_18010__en-us_topic_0227101889_li572522141314">5</a>.</li></ul>
</p></li><li id="alm_18010__en-us_topic_0227101889_li572522141314"><a name="alm_18010__en-us_topic_0227101889_li572522141314"></a><a name="en-us_topic_0227101889_li572522141314"></a><span>Collect fault information.</span><p><ol type="a" id="alm_18010__en-us_topic_0227101889_en-us_topic_0191813935_ol6089206913036"><li id="alm_18010__en-us_topic_0227101889_en-us_topic_0191813935_li4478836213036">On MRS Manager, choose <span class="menucascade" id="alm_18010__menucascade8828882223337"><b><span class="uicontrol" id="alm_18010__uicontrol6213912363337">System</span></b> &gt; <b><span class="uicontrol" id="alm_18010__uicontrol9711372773337">Export Log</span></b></span>.</li><li id="alm_18010__li18574327401">Contact technical support engineers for help. For details, see <a href="https://docs.otc.t-systems.com/en-us/public/learnmore.html" target="_blank" rel="noopener noreferrer">technical support</a>.</li></ol>
</p></li></ol>
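<p>The resource check in step 1 of the procedure above can be sketched as follows. This is a minimal sketch under several assumptions: the input dictionary uses the field names of the Apache Hadoop ResourceManager REST API cluster metrics response (<code>allocatedMB</code>, <code>totalMB</code>, <code>allocatedVirtualCores</code>, <code>totalVirtualCores</code>), which may not match what your MRS version exposes; the 0.9 cutoff for "approaching the maximum" is hypothetical; and the check treats either resource being near its maximum as a "yes". The function names and sample values are illustrative only.</p>

```python
def usage_ratios(cluster_metrics: dict) -> dict:
    """Return used/total ratios for memory and vCores."""
    memory = cluster_metrics["allocatedMB"] / cluster_metrics["totalMB"]
    vcores = (cluster_metrics["allocatedVirtualCores"]
              / cluster_metrics["totalVirtualCores"])
    return {"memory": memory, "vcores": vcores}


def next_step(cluster_metrics: dict, near_full: float = 0.9) -> int:
    """Return the procedure step to go to after step 1.

    near_full is an assumed cutoff; the procedure does not define one.
    """
    ratios = usage_ratios(cluster_metrics)
    saturated = max(ratios.values()) >= near_full
    return 2 if saturated else 5


# Hypothetical sample values, shaped like a clusterMetrics object.
busy = {"allocatedMB": 94208, "totalMB": 98304,
        "allocatedVirtualCores": 20, "totalVirtualCores": 48}
idle = {"allocatedMB": 20480, "totalMB": 98304,
        "allocatedVirtualCores": 10, "totalVirtualCores": 48}
print(next_step(busy), next_step(idle))
```

<p>The values shown on the native Yarn page as <strong>Memory Used|Memory Total</strong> and <strong>VCores Used|VCores Total</strong> can be entered by hand in the same structure if no API access is available.</p>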
</div>
<div class="section" id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_section15295265"><h4 class="sectiontitle">Reference</h4><p id="alm_18010__en-us_topic_0227101889_en-us_topic_0087039425_p7510612">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0241.html">Alarm Reference (Applicable to Versions Earlier Than MRS 3.x)</a></div>
</div>
</div>