doc-exports/docs/mrs/umn/alm_18006.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="alm_18006"></a>
<h1 class="topictitle1">ALM-18006 MapReduce Job Execution Timeout</h1>
<div id="body8662426"><div class="section" id="alm_18006__en-us_topic_0191813946_section6587942"><h4 class="sectiontitle">Description</h4><p id="alm_18006__en-us_topic_0191813946_p45484537">The alarm module checks the MapReduce job execution every 30 seconds. This alarm is generated when the execution of a submitted MapReduce job times out.</p>
<p id="alm_18006__en-us_topic_0191813946_p60368847">This alarm must be manually cleared.</p>
</div>
<div class="section" id="alm_18006__en-us_topic_0191813946_section59291480"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_18006__en-us_topic_0191813946_table58038438" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_18006__en-us_topic_0191813946_row33645886"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="alm_18006__en-us_topic_0191813946_p40962215">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="alm_18006__en-us_topic_0191813946_p29605080">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="alm_18006__en-us_topic_0191813946_p49201274">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_18006__en-us_topic_0191813946_row25880244"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="alm_18006__en-us_topic_0191813946_p15925039">18006</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="alm_18006__en-us_topic_0191813946_p14859795">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="alm_18006__en-us_topic_0191813946_p62792710">No</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_18006__en-us_topic_0191813946_section63861276"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="alm_18006__en-us_topic_0191813946_table53044787" frame="border" border="1" rules="all"><thead align="left"><tr id="alm_18006__en-us_topic_0191813946_row2530563"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="alm_18006__en-us_topic_0191813946_p3649016">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="alm_18006__en-us_topic_0191813946_p27134857">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="alm_18006__en-us_topic_0191813946_row50439840"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18006__en-us_topic_0191813946_p59095202">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18006__en-us_topic_0191813946_p21982073">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18006__en-us_topic_0191813946_row63620936"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18006__en-us_topic_0191813946_p53022201">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18006__en-us_topic_0191813946_p66939890">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18006__en-us_topic_0191813946_row65588106"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18006__en-us_topic_0191813946_p11036355">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18006__en-us_topic_0191813946_p21529561">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="alm_18006__en-us_topic_0191813946_row59548322"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="alm_18006__en-us_topic_0191813946_p58684749">Trigger condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="alm_18006__en-us_topic_0191813946_p55844233">Generates an alarm when the actual indicator value exceeds the specified threshold.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="alm_18006__en-us_topic_0191813946_section37880580"><h4 class="sectiontitle">Impact on the System</h4><p id="alm_18006__en-us_topic_0191813946_p27089021">Execution of the submitted MapReduce job times out, so no execution result can be obtained. Execute the job again after rectifying the fault.</p>
</div>
<div class="section" id="alm_18006__en-us_topic_0191813946_section5380904"><h4 class="sectiontitle">Possible Causes</h4><p id="alm_18006__en-us_topic_0191813946_p46727122">The MapReduce job takes a long time to execute, and the specified timeout interval is shorter than the time the job requires.</p>
</div>
<div class="section" id="alm_18006__en-us_topic_0191813946_section48428136"><h4 class="sectiontitle">Procedure</h4><ol id="alm_18006__en-us_topic_0191813946_ol52840577151946"><li class="tableheading" id="alm_18006__en-us_topic_0191813946_li59552285151946"><span>Check whether the timeout interval is improperly set.</span><p><div class="p" id="alm_18006__en-us_topic_0191813946_p43146212152024">Set <strong id="alm_18006__b125601650112919">-Dapplication.timeout.interval</strong> to a larger value, or leave the parameter unset. Then check whether the MapReduce job can be executed.<ul class="subitemlist" id="alm_18006__en-us_topic_0191813946_ul8928942"><li id="alm_18006__en-us_topic_0191813946_li13251620">If yes, go to <a href="#alm_18006__en-us_topic_0191813946_clean">2.e</a>.</li><li id="alm_18006__en-us_topic_0191813946_li66748299">If no, go to <a href="#alm_18006__en-us_topic_0191813946_substep_03d21a89">2.b</a>.</li></ul>
</div>
</p></li><li class="tableheading" id="alm_18006__en-us_topic_0191813946_li49337877152014"><span>Check the Yarn status.</span><p><ol type="a" id="alm_18006__en-us_topic_0191813946_ol39878449"><li id="alm_18006__en-us_topic_0191813946_li1487713813414">Go to the cluster details page and choose <strong id="alm_18006__b17504101243016">Alarms</strong>.</li><li id="alm_18006__en-us_topic_0191813946_substep_03d21a89"><a name="alm_18006__en-us_topic_0191813946_substep_03d21a89"></a><a name="en-us_topic_0191813946_substep_03d21a89"></a>In the alarm list on MRS Manager, check whether the alarm ALM-18000 Yarn Service Unavailable is generated.<ul class="subitemlist" id="alm_18006__en-us_topic_0191813946_ul50254714"><li id="alm_18006__en-us_topic_0191813946_li49639248">If yes, go to <a href="#alm_18006__en-us_topic_0191813946_substep_03d82569">2.c</a>.</li><li id="alm_18006__en-us_topic_0191813946_li61356140">If no, go to <a href="#alm_18006__en-us_topic_0191813946_li12092809151957">3</a>.</li></ul>
</li><li id="alm_18006__en-us_topic_0191813946_substep_03d82569"><a name="alm_18006__en-us_topic_0191813946_substep_03d82569"></a><a name="en-us_topic_0191813946_substep_03d82569"></a>Rectify the fault by following the handling procedure in <a href="alm_18000.html">ALM-18000 Yarn Service Unavailable</a>.</li><li id="alm_18006__en-us_topic_0191813946_li38668280">Run the MapReduce job command again to check whether the MapReduce job can be executed.<ul class="subitemlist" id="alm_18006__en-us_topic_0191813946_ul12470207"><li id="alm_18006__en-us_topic_0191813946_li45123001">If yes, go to <a href="#alm_18006__en-us_topic_0191813946_clean">2.e</a>.</li><li id="alm_18006__en-us_topic_0191813946_li31084503">If no, go to <a href="#alm_18006__en-us_topic_0191813946_li572522141314">4</a>.</li></ul>
</li><li id="alm_18006__en-us_topic_0191813946_clean"><a name="alm_18006__en-us_topic_0191813946_clean"></a><a name="en-us_topic_0191813946_clean"></a>In the alarm list, click <span><img id="alm_18006__en-us_topic_0191813946_image31436402911" src="en-us_image_0000001349257373.png"></span> in the <strong id="alm_18006__b159281558163316">Operation</strong> column of the alarm to manually clear the alarm. No further action is required.</li></ol>
</p></li><li class="tableheading" id="alm_18006__en-us_topic_0191813946_li12092809151957"><a name="alm_18006__en-us_topic_0191813946_li12092809151957"></a><a name="en-us_topic_0191813946_li12092809151957"></a><span>Adjust the timeout threshold.</span><p><div class="p" id="alm_18006__en-us_topic_0191813946_p53375769152140">On MRS Manager, choose <strong id="alm_18006__b10119940137">System</strong> &gt; <strong id="alm_18006__b1211934011312">Threshold Configuration</strong> &gt; <strong id="alm_18006__b71201401737">Services</strong> &gt; <strong id="alm_18006__b6121124015318">Yarn</strong> &gt; <strong id="alm_18006__b01211740237">Timed out Applications</strong>, and increase the maximum number of timed-out tasks allowed by the current threshold rule. Then check whether the alarm is cleared.<ul class="subitemlist" id="alm_18006__en-us_topic_0191813946_ul14315403"><li id="alm_18006__en-us_topic_0191813946_li61729771">If yes, no further action is required.</li><li id="alm_18006__en-us_topic_0191813946_li18697032">If no, go to <a href="#alm_18006__en-us_topic_0191813946_li572522141314">4</a>.</li></ul>
</div>
</p></li><li id="alm_18006__en-us_topic_0191813946_li572522141314"><a name="alm_18006__en-us_topic_0191813946_li572522141314"></a><a name="en-us_topic_0191813946_li572522141314"></a><span>Collect fault information.</span><p><ol type="a" id="alm_18006__en-us_topic_0191813946_en-us_topic_0191813935_ol6089206913036"><li id="alm_18006__en-us_topic_0191813946_en-us_topic_0191813935_li4478836213036">On MRS Manager, choose <span class="menucascade" id="alm_18006__menucascade143747723516"><b><span class="uicontrol" id="alm_18006__uicontrol53733718352">System</span></b> &gt; <b><span class="uicontrol" id="alm_18006__uicontrol437418703512">Export Log</span></b></span>.</li><li id="alm_18006__li18574327401">Contact technical support engineers for help. For details, see <a href="https://docs.otc.t-systems.com/en-us/public/learnmore.html" target="_blank" rel="noopener noreferrer">technical support</a>.</li></ol>
</p></li></ol>
</div>
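<div class="section"><p>As a rough illustration of step 1 in the procedure, a job might be resubmitted with a larger timeout by passing the parameter as a Hadoop generic option on the command line. This is only a sketch: the JAR file, main class, input and output paths, and the value <strong>7200</strong> are hypothetical placeholders, the unit of the value is not stated in this topic and should be confirmed for your MRS version, and the job is assumed to parse <strong>-D</strong> generic options (for example, through Hadoop's generic options parser).</p>
<pre class="screen"># Hypothetical resubmission with a larger timeout interval.
# JAR_FILE, MAIN_CLASS, and the paths below are placeholders, not values from this topic.
JAR_FILE=/opt/client/my-mapreduce-job.jar
MAIN_CLASS=com.example.MyJob
TIMEOUT_INTERVAL=7200   # assumed value; confirm the expected unit before use

# Generic options such as -D must come before the job's own arguments.
yarn jar "${JAR_FILE}" "${MAIN_CLASS}" \
  -Dapplication.timeout.interval="${TIMEOUT_INTERVAL}" \
  /tmp/input /tmp/output</pre>
<p>To submit the job without setting the parameter, omit the <strong>-Dapplication.timeout.interval</strong> option from the command.</p>
</div>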
<div class="section" id="alm_18006__en-us_topic_0191813946_section33200047"><h4 class="sectiontitle">Reference</h4><p id="alm_18006__en-us_topic_0191813946_p1501966">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0241.html">Alarm Reference (Applicable to Versions Earlier Than MRS 3.x)</a></div>
</div>
</div>