Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-45652"></a><a name="ALM-45652"></a>
<h1 class="topictitle1">ALM-45652 Flink Service Unavailable</h1>
<div id="body0000002008221049"><p id="ALM-45652__p12261122253615">This section applies to MRS 3.3.0 or later.</p>
<div class="section" id="ALM-45652__section663215"><h4 class="sectiontitle"><span id="ALM-45652__text516373020197">Alarm Description</span></h4><p id="ALM-45652__p13979662">The alarm module checks the Flink status every 60 seconds. This alarm is generated when the Flink service is unavailable. This alarm is cleared when the Flink service recovers.</p>
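<p>The generate-and-clear behavior described above can be sketched as a small state machine. This is a minimal illustration, not the alarm module's implementation: the names <code>AlarmState</code>, <code>evaluate</code>, and <code>replay</code> are hypothetical, and the availability samples stand in for whatever health check the alarm module actually performs every 60 seconds.</p>

```python
from dataclasses import dataclass
from typing import Iterable, List

CHECK_INTERVAL_SECONDS = 60  # check interval stated in the alarm description


@dataclass
class AlarmState:
    """Tracks whether ALM-45652 is currently raised (hypothetical helper)."""
    active: bool = False


def evaluate(state: AlarmState, service_available: bool) -> str:
    """Apply one periodic check and return the resulting event.

    Returns "generated", "cleared", or "unchanged".
    """
    if not service_available and not state.active:
        state.active = True
        return "generated"
    if service_available and state.active:
        state.active = False
        return "cleared"
    return "unchanged"


def replay(samples: Iterable[bool]) -> List[str]:
    """Replay availability samples, one per check interval, from a clean state."""
    state = AlarmState()
    return [evaluate(state, sample) for sample in samples]


if __name__ == "__main__":
    # One sample per 60-second check: up, down, down, up.
    print(replay([True, False, False, True]))
```

<p>In this sketch the alarm is raised once when the service first becomes unavailable and is not raised again while it stays unavailable, which matches the automatic clearance described for this alarm.</p>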
</div>
<div class="section" id="ALM-45652__section5968939"><h4 class="sectiontitle"><span id="ALM-45652__text20591447192117">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45652__table10143581" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45652__row61411666"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45652__p17386810"><span id="ALM-45652__text1864783145211">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45652__p66154394"><span id="ALM-45652__text297913110521">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45652__p49230886"><span id="ALM-45652__text0890175712305">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45652__row49774232"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45652__p5180964">45652</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45652__p17004965">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45652__p35224963">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45652__section53720453"><h4 class="sectiontitle"><span id="ALM-45652__text18171442142214">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45652__table34649765" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45652__row18974100"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45652__p42699947"><span id="ALM-45652__text6203173410617">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45652__p36143663"><span id="ALM-45652__text10819164319610">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45652__row16272251424"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45652__p9447153994219">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45652__p144723994214">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45652__row38292076"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45652__p164471639194216">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45652__p44471639174211">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45652__row9875225"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45652__p1244715394427">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45652__p44471439144216">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45652__row13243689"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45652__p1244716397426">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45652__p244713917425">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45652__section13722030"><h4 class="sectiontitle"><span id="ALM-45652__text98201443182317">Impact on the System</span></h4><p id="ALM-45652__p1727053153714">FlinkServer and the Flink client cannot be used to submit Flink jobs.</p>
</div>
<div class="section" id="ALM-45652__section56389407"><h4 class="sectiontitle"><span id="ALM-45652__text11871546172411">Possible Causes</span></h4><p id="ALM-45652__p633303895016">The ZooKeeper, HDFS, Yarn, KrbServer, or DBService service on which Flink depends is unavailable.</p>
</div>
<div class="section" id="ALM-45652__section4437121113517"><h4 class="sectiontitle"><span id="ALM-45652__text79051154102518">Handling Procedure</span></h4><p class="tableheading" id="ALM-45652__p1878424755911"><strong id="ALM-45652__b13373134131713">Check whether the ZooKeeper service on which Flink depends is abnormal.</strong></p>
<ol id="ALM-45652__ol37851247135911"><li id="ALM-45652__li97851447145910"><span>Log in to FusionInsight Manager and choose <strong id="ALM-45652__b178368144176">O&M</strong> > <strong id="ALM-45652__b7837114171711">Alarm</strong> > <strong id="ALM-45652__b1783811451718">Alarms</strong>.</span></li><li id="ALM-45652__li77851347185920"><span>In the alarm list, check whether "ALM-13000 ZooKeeper Service Unavailable" exists.</span><p><ul id="ALM-45652__ul10785104735919"><li id="ALM-45652__li20785647105914">If yes, go to <a href="#ALM-45652__li16785184716596">3</a>.</li><li id="ALM-45652__li1878518477593">If no, go to <a href="#ALM-45652__li936635985913">5</a>.</li></ul>
</p></li><li id="ALM-45652__li16785184716596"><a name="ALM-45652__li16785184716596"></a><a name="li16785184716596"></a><span>Handle the alarm by referring to "ALM-13000 ZooKeeper Service Unavailable."</span></li><li id="ALM-45652__li3785124715594"><span>After the ALM-13000 alarm is cleared, wait a few minutes and check whether this alarm (ALM-45652) is cleared.</span><p><ul id="ALM-45652__ul678504719592"><li id="ALM-45652__li1178554713593">If yes, no further action is required.</li><li id="ALM-45652__li11785847105920">If no, go to <a href="#ALM-45652__li936635985913">5</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-45652__p20365105910597"><strong id="ALM-45652__b142878482019">Check whether the HDFS service on which Flink depends is abnormal.</strong></p>
<ol start="5" id="ALM-45652__ol936519597596"><li id="ALM-45652__li936635985913"><a name="ALM-45652__li936635985913"></a><a name="li936635985913"></a><span>On FusionInsight Manager, choose <strong id="ALM-45652__b5216131613208">O&M</strong> > <strong id="ALM-45652__b02161165207">Alarm</strong> > <strong id="ALM-45652__b82171016142010">Alarms</strong>.</span></li><li id="ALM-45652__li123667593597"><span>In the alarm list, check whether "ALM-14000 HDFS Service Unavailable" exists.</span><p><ul id="ALM-45652__ul83661859135914"><li id="ALM-45652__li20366165915916">If yes, go to <a href="#ALM-45652__li14366125925911">7</a>.</li><li id="ALM-45652__li6366159145912">If no, go to <a href="#ALM-45652__li1540834513593">9</a>.</li></ul>
</p></li><li id="ALM-45652__li14366125925911"><a name="ALM-45652__li14366125925911"></a><a name="li14366125925911"></a><span>Handle the alarm by referring to "ALM-14000 HDFS Service Unavailable."</span></li><li id="ALM-45652__li1336655915595"><span>After the ALM-14000 alarm is cleared, wait a few minutes and check whether this alarm (ALM-45652) is cleared.</span><p><ul id="ALM-45652__ul53661759135918"><li id="ALM-45652__li10366105910595">If yes, no further action is required.</li><li id="ALM-45652__li183662593594">If no, go to <a href="#ALM-45652__li1540834513593">9</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-45652__p124075454594"><strong id="ALM-45652__b13125729122111">Check whether the Yarn service on which Flink depends is abnormal.</strong></p>
<ol start="9" id="ALM-45652__ol14408144585914"><li id="ALM-45652__li1540834513593"><a name="ALM-45652__li1540834513593"></a><a name="li1540834513593"></a><span>On FusionInsight Manager, choose <strong id="ALM-45652__b2454153462115">O&M</strong> > <strong id="ALM-45652__b10455434192112">Alarm</strong> > <strong id="ALM-45652__b15457134142112">Alarms</strong>.</span></li><li id="ALM-45652__li9408194555914"><span>In the alarm list, check whether "ALM-18000 Yarn Service Unavailable" exists.</span><p><ul id="ALM-45652__ul64081045135915"><li id="ALM-45652__li24081545125914">If yes, go to <a href="#ALM-45652__li1240810456591">11</a>.</li><li id="ALM-45652__li174081845175913">If no, go to <a href="#ALM-45652__li10537624124112">13</a>.</li></ul>
</p></li><li id="ALM-45652__li1240810456591"><a name="ALM-45652__li1240810456591"></a><a name="li1240810456591"></a><span>Handle the alarm by referring to "ALM-18000 Yarn Service Unavailable."</span></li><li id="ALM-45652__li840818455596"><span>After the ALM-18000 alarm is cleared, wait a few minutes and check whether this alarm (ALM-45652) is cleared.</span><p><ul id="ALM-45652__ul164081145185910"><li id="ALM-45652__li10408184525915">If yes, no further action is required.</li><li id="ALM-45652__li8408164575912">If no, go to <a href="#ALM-45652__li10537624124112">13</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-45652__p85371024114119"><strong id="ALM-45652__b453242002215">Check whether the KrbServer service on which Flink depends is abnormal.</strong></p>
<ol start="13" id="ALM-45652__ol13537172410414"><li id="ALM-45652__li10537624124112"><a name="ALM-45652__li10537624124112"></a><a name="li10537624124112"></a><span>On FusionInsight Manager, choose <strong id="ALM-45652__b9233142872210">O&M</strong> > <strong id="ALM-45652__b32341728182215">Alarm</strong> > <strong id="ALM-45652__b92344288223">Alarms</strong>.</span></li><li id="ALM-45652__li1957612118337"><span>In the alarm list, check whether "ALM-25500 KrbServer Service Unavailable" exists.</span><p><ul id="ALM-45652__ul653711248418"><li id="ALM-45652__li185371224164111">If yes, go to <a href="#ALM-45652__li1053752412412">15</a>.</li><li id="ALM-45652__li165379245419">If no, go to <a href="#ALM-45652__li1957661133312">17</a>.</li></ul>
</p></li><li id="ALM-45652__li1053752412412"><a name="ALM-45652__li1053752412412"></a><a name="li1053752412412"></a><span>Handle the alarm by referring to "ALM-25500 KrbServer Service Unavailable."</span></li><li id="ALM-45652__li1853752415416"><span>After the ALM-25500 alarm is cleared, wait a few minutes and check whether this alarm (ALM-45652) is cleared.</span><p><ul id="ALM-45652__ul7537924164112"><li id="ALM-45652__li185373246419">If yes, no further action is required.</li><li id="ALM-45652__li165387243419">If no, go to <a href="#ALM-45652__li1957661133312">17</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-45652__p7737436163910"><strong id="ALM-45652__b62399910233">Check whether the DBService service on which Flink depends is abnormal.</strong></p>
<ol start="17" id="ALM-45652__ol85761615337"><li id="ALM-45652__li1957661133312"><a name="ALM-45652__li1957661133312"></a><a name="li1957661133312"></a><span>On FusionInsight Manager, choose <strong id="ALM-45652__b138813179235">O&M</strong> > <strong id="ALM-45652__b1388212179231">Alarm</strong> > <strong id="ALM-45652__b12884111712315">Alarms</strong>.</span></li><li id="ALM-45652__li19253731104010"><span>In the alarm list, check whether "ALM-27001 DBService Service Unavailable" exists.</span><p><ul id="ALM-45652__ul8576101113318"><li id="ALM-45652__li657618163314">If yes, go to <a href="#ALM-45652__li1857611153310">19</a>.</li><li id="ALM-45652__li7576513337">If no, go to <a href="#ALM-45652__li4749473185459">21</a>.</li></ul>
</p></li><li id="ALM-45652__li1857611153310"><a name="ALM-45652__li1857611153310"></a><a name="li1857611153310"></a><span>Handle the alarm by referring to "ALM-27001 DBService Service Unavailable."</span></li><li id="ALM-45652__li1457618110336"><span>After the ALM-27001 alarm is cleared, wait a few minutes and check whether this alarm (ALM-45652) is cleared.</span><p><ul id="ALM-45652__ul9576719335"><li id="ALM-45652__li14576151203317">If yes, no further action is required.</li><li id="ALM-45652__li0576612336">If no, go to <a href="#ALM-45652__li4749473185459">21</a>.</li></ul>
</p></li></ol>
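<p>The dependency checks in steps 1 to 20 follow one pattern: look for each dependency's "Service Unavailable" alarm in a fixed order and handle the first one found. The following is a minimal sketch of that triage order, assuming the active alarm IDs have been copied by hand from the <strong>Alarms</strong> page. The function names are hypothetical and no FusionInsight Manager API is called.</p>

```python
from typing import Iterable, Optional, Tuple

# Dependency alarms in the order the handling procedure checks them.
DEPENDENCY_ALARMS: Tuple[Tuple[str, str], ...] = (
    ("ALM-13000", "ZooKeeper Service Unavailable"),
    ("ALM-14000", "HDFS Service Unavailable"),
    ("ALM-18000", "Yarn Service Unavailable"),
    ("ALM-25500", "KrbServer Service Unavailable"),
    ("ALM-27001", "DBService Service Unavailable"),
)


def first_dependency_alarm(
    active_alarm_ids: Iterable[str],
) -> Optional[Tuple[str, str]]:
    """Return the first dependency alarm that is active, in procedure order."""
    active = set(active_alarm_ids)
    for alarm_id, name in DEPENDENCY_ALARMS:
        if alarm_id in active:
            return alarm_id, name
    return None


def next_action(active_alarm_ids: Iterable[str]) -> str:
    """Describe the next step of the procedure for the given active alarms."""
    hit = first_dependency_alarm(active_alarm_ids)
    if hit is None:
        # No dependency alarm is present: go to the fault information step.
        return "Collect fault information and contact O&M personnel."
    return f"Handle {hit[0]} {hit[1]}, then recheck ALM-45652."


if __name__ == "__main__":
    # Alarm IDs entered manually for illustration.
    print(next_action(["ALM-45652", "ALM-18000", "ALM-27001"]))
```

<p>If ALM-45652 persists after the first dependency alarm is handled, running the same check again with the updated alarm list yields the next dependency to look at, which mirrors the "If no, go to" branches in the steps.</p>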
<p class="tableheading" id="ALM-45652__p3538354385459"><strong id="ALM-45652__b6160463585522">Collect fault information.</strong></p>
<ol start="21" id="ALM-45652__ol4790308885524"><li id="ALM-45652__li4749473185459"><a name="ALM-45652__li4749473185459"></a><a name="li4749473185459"></a><span>On FusionInsight Manager, choose <strong id="ALM-45652__b14716141019244">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-45652__b1371719109246">Log</strong> > <strong id="ALM-45652__b17171510142411">Download</strong>.</span></li><li id="ALM-45652__li2648019085459"><span>Expand the <strong id="ALM-45652__b19818101310243">Service</strong> drop-down list, and select <strong id="ALM-45652__b2818151313240">Flink</strong> for the target cluster.</span></li><li id="ALM-45652__li3699511985459"><span>Click <span><img id="ALM-45652__image149001122173310" src="en-us_image_0000002008248613.png"></span> in the upper right corner, and set <strong id="ALM-45652__b1173916177241">Start Date</strong> and <strong id="ALM-45652__b15740181718242">End Date</strong> for log collection to 30 minutes before and 30 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-45652__b7741117172417">Download</strong>.</span></li><li id="ALM-45652__li4381466885459"><span>Contact <span id="ALM-45652__text990052215334">O&M personnel</span> and provide the collected logs.</span></li></ol>
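<p>The log collection window in the steps above is simple date arithmetic: 30 minutes on either side of the alarm generation time. The following sketch computes the two values to enter for <strong>Start Date</strong> and <strong>End Date</strong>. The function name and the sample timestamp are hypothetical, and the time is assumed to be in the same time zone that FusionInsight Manager displays.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple

# Margin on each side of the alarm generation time, as stated in the procedure.
WINDOW = timedelta(minutes=30)


def log_collection_window(alarm_time: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) for log collection around the alarm generation time."""
    return alarm_time - WINDOW, alarm_time + WINDOW


if __name__ == "__main__":
    generated_at = datetime(2024, 5, 1, 10, 15)  # hypothetical alarm time
    start, end = log_collection_window(generated_at)
    print("Start Date:", start.isoformat(sep=" "))
    print("End Date:  ", end.isoformat(sep=" "))
```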
</div>
<div class="section" id="ALM-45652__section169311343318"><h4 class="sectiontitle"><span id="ALM-45652__text195945622616">Alarm Clearance</span></h4><p id="ALM-45652__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45652__section4139237"><h4 class="sectiontitle"><span id="ALM-45652__text143698488285">Related Information</span></h4><p id="ALM-45652__p33559471"><span id="ALM-45652__text19275105817121">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>