MRS UMN 20250806 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: yangtong <yangtong2@huawei.com>
Co-committed-by: yangtong <yangtong2@huawei.com>
@ -61,11 +61,11 @@
|
||||
<div class="section" id="ALM-12091__section950130153414"><h4 class="sectiontitle"><span id="ALM-12091__text12656240135813">Possible Causes</span></h4><p id="ALM-12091__p171771431115712">The disaster process is abnormal.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12091__section1548510327214"><h4 class="sectiontitle"><span id="ALM-12091__text19569135285811">Handling Procedure</span></h4><p class="tableheading" id="ALM-12091__p8324186"><strong id="ALM-12091__b1530064416313">Check whether the disaster process is normal.</strong></p>
|
||||
<ol id="ALM-12091__ol5558276163811"><li id="ALM-12091__li34357272165726"><span>In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and click <span><img id="ALM-12091__image168221113135319" src="en-us_image_0000002008258961.png"></span> to view the name of the host for which the alarm is generated.</span></li><li id="ALM-12091__li50024484163811"><span>Log in to the host for which the alarm is generated as user <strong id="ALM-12091__b11453743219">root</strong>. <span id="ALM-12091__text65184518511"></span></span></li><li id="ALM-12091__li1581327399"><span>Run the <strong id="ALM-12091__b249615917334">su - omm</strong> command to switch to user <strong id="ALM-12091__b1496159183317">omm</strong>.</span></li><li id="ALM-12091__li17626636132716"><span>Run the <strong id="ALM-12091__b32015537163811">sh ${BIGDATA_HOME}/om-server/OMS/workspace0/ha/module/hacom/script/status_ha.sh</strong> command to check whether the status of the disaster resources managed by the HA is normal. In the single-node system, the disaster resource is in the normal state. In the dual-node system, the disaster resource is in the normal state on the active node and in the stopped state on the standby node.</span><p><ul class="subitemlist" id="ALM-12091__ul66289368274"><li id="ALM-12091__li1062811360271">If yes, go to <a href="#ALM-12091__li6152360163635">7</a>.</li><li id="ALM-12091__li46281436112719">If no, go to <a href="#ALM-12091__li139657016249">5</a>.</li></ul>
|
||||
<ol id="ALM-12091__ol5558276163811"><li id="ALM-12091__li34357272165726"><span>In the alarm list on MRS Manager, locate the row that contains the alarm, and click <span><img id="ALM-12091__image168221113135319" src="en-us_image_0000002008258961.png"></span> to view the name of the host for which the alarm is generated.</span></li><li id="ALM-12091__li50024484163811"><span>Log in to the host for which the alarm is generated as user <strong id="ALM-12091__b11453743219">root</strong>. <span id="ALM-12091__text65184518511"></span></span></li><li id="ALM-12091__li1581327399"><span>Run the <strong id="ALM-12091__b249615917334">su - omm</strong> command to switch to user <strong id="ALM-12091__b1496159183317">omm</strong>.</span></li><li id="ALM-12091__li17626636132716"><span>Run the <strong id="ALM-12091__b32015537163811">sh ${BIGDATA_HOME}/om-server/OMS/workspace0/ha/module/hacom/script/status_ha.sh</strong> command to check whether the status of the disaster resources managed by the HA is normal. In the single-node system, the disaster resource is in the normal state. In the dual-node system, the disaster resource is in the normal state on the active node and in the stopped state on the standby node.</span><p><ul class="subitemlist" id="ALM-12091__ul66289368274"><li id="ALM-12091__li1062811360271">If yes, go to <a href="#ALM-12091__li6152360163635">7</a>.</li><li id="ALM-12091__li46281436112719">If no, go to <a href="#ALM-12091__li139657016249">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-12091__li139657016249"><a name="ALM-12091__li139657016249"></a><a name="li139657016249"></a><span>Run the <strong id="ALM-12091__b519675815717">vi ${BIGDATA_LOG_HOME}/disaster/disaster.log</strong> command to check whether the disaster resource log of HA contains the keyword <strong id="ALM-12091__b1919625895719">ERROR</strong>. If yes, analyze the logs to locate the resource exception cause and fix the exception.</span></li><li id="ALM-12091__li14736019164314"><span>Wait 5 minutes and check whether the alarm is automatically cleared.</span><p><ul class="subitemlist" id="ALM-12091__ul473671984320"><li id="ALM-12091__li9736151912432">If yes, no further action is required.</li><li id="ALM-12091__li4736141910439">If no, go to <a href="#ALM-12091__li6152360163635">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-12091__p3652216163758"><strong id="ALM-12091__b83507409354">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-12091__ol26111342163819"><li id="ALM-12091__li6152360163635"><a name="ALM-12091__li6152360163635"></a><a name="li6152360163635"></a><span>On FusionInsight Manager, choose <strong id="ALM-12091__b5931842173510">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12091__b169334283512">Log</strong> > <strong id="ALM-12091__b893154212351">Download</strong>.</span></li><li id="ALM-12091__li55371246163635"><span>Expand the <strong id="ALM-12091__b975881714366">Service</strong> drop-down list, select <strong id="ALM-12091__b1758517163617">Disaster</strong> for the target cluster, and click <strong id="ALM-12091__b8758417163620">OK</strong>.</span></li><li id="ALM-12091__li28579174163635"><span>Click <span><img id="ALM-12091__image69691781225" src="en-us_image_0000002008299541.png"></span> in the upper right corner, and set <strong id="ALM-12091__b17704133417363">Start Date</strong> and <strong id="ALM-12091__b87041334123611">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12091__b1870583453615">Download</strong>.</span></li><li id="ALM-12091__li33211732163635"><span>Contact <span id="ALM-12091__text12867404363">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="7" id="ALM-12091__ol26111342163819"><li id="ALM-12091__li6152360163635"><a name="ALM-12091__li6152360163635"></a><a name="li6152360163635"></a><span>On MRS Manager, choose <strong id="ALM-12091__b5931842173510">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12091__b169334283512">Log</strong> > <strong id="ALM-12091__b893154212351">Download</strong>.</span></li><li id="ALM-12091__li55371246163635"><span>Expand the <strong id="ALM-12091__b975881714366">Service</strong> drop-down list, select <strong id="ALM-12091__b1758517163617">Disaster</strong> for the target cluster, and click <strong id="ALM-12091__b8758417163620">OK</strong>.</span></li><li id="ALM-12091__li28579174163635"><span>Click <span><img id="ALM-12091__image69691781225" src="en-us_image_0000002008299541.png"></span> in the upper right corner, and set <strong id="ALM-12091__b17704133417363">Start Date</strong> and <strong id="ALM-12091__b87041334123611">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12091__b1870583453615">Download</strong>.</span></li><li id="ALM-12091__li33211732163635"><span>Contact <span id="ALM-12091__text12867404363">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
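<p>The log check in the handling procedure above (looking for the keyword ERROR in the disaster resource log) can be sketched as a small filter. This is a minimal illustration, not the product's own tooling: the function name <strong>find_error_lines</strong> and the sample log lines are hypothetical, and the real layout of disaster.log may differ.</p>

```python
from typing import Iterable, List, Tuple


def find_error_lines(lines: Iterable[str], keyword: str = "ERROR") -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for every line that contains the keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if keyword in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical log excerpt, used only for illustration.
sample_log = [
    "2025-08-06 10:00:01 INFO resource check started\n",
    "2025-08-06 10:00:02 ERROR resource check failed\n",
    "2025-08-06 10:00:03 INFO retry scheduled\n",
]

for number, text in find_error_lines(sample_log):
    print(f"{number}: {text}")
```

<p>Because the function accepts any iterable of lines, an open file object for the log can be passed in place of the sample list.</p>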
|
||||
<div class="section" id="ALM-12091__section129720811223"><h4 class="sectiontitle"><span id="ALM-12091__text367020138593">Alarm Clearance</span></h4><p id="ALM-12091__p19973168152211">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -62,13 +62,13 @@
|
||||
<div class="section" id="ALM-12186__section59738735"><h4 class="sectiontitle"><span id="ALM-12186__text12656240135813">Possible Causes</span></h4><p id="ALM-12186__p63545420285">The CGroup task usage exceeds 90%.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12186__section74438017253"><h4 class="sectiontitle"><span id="ALM-12186__text19569135285811">Handling Procedure</span></h4><p id="ALM-12186__p6803921141514"><strong id="ALM-12186__b10197143715218">Check whether the maximum number of threads that can be concurrently opened by user omm is properly set.</strong></p>
|
||||
<ol id="ALM-12186__ol18615415345"><li id="ALM-12186__li4861174163414"><span>Log in to FusionInsight Manager and choose <strong id="ALM-12186__b59771025152414">O&M</strong> > <strong id="ALM-12186__b12977325132418">Alarm</strong> > <strong id="ALM-12186__b29783254243">Alarms</strong>. On the page that is displayed, click <span><img id="ALM-12186__image178619416342" src="en-us_image_0000001971659200.png"></span> in the row containing the alarm, and view the name of the host for which the alarm is generated in <strong id="ALM-12186__b139781825112416">Location</strong>. Click the host name to view its IP address.</span></li><li id="ALM-12186__li38611343349"><span>Log in to the host for which the alarm is generated as user <strong id="ALM-12186__b108614453411">omm</strong>.</span></li><li id="ALM-12186__li78615410344"><span>Run the following command to obtain the maximum number of threads that can be concurrently opened by user <strong id="ALM-12186__b44603483302">omm</strong> and check whether this number is greater than or equal to <strong id="ALM-12186__b1359611271442">60000</strong>:</span><p><p id="ALM-12186__p186144103410"><strong id="ALM-12186__b188611440346">systemctl status user-$(id -u).slice | grep limit</strong></p>
|
||||
<ol id="ALM-12186__ol18615415345"><li id="ALM-12186__li4861174163414"><span>Log in to MRS Manager and choose <strong id="ALM-12186__b59771025152414">O&M</strong> > <strong id="ALM-12186__b12977325132418">Alarm</strong> > <strong id="ALM-12186__b29783254243">Alarms</strong>. On the page that is displayed, click <span><img id="ALM-12186__image178619416342" src="en-us_image_0000001971659200.png"></span> in the row containing the alarm, and view the name of the host for which the alarm is generated in <strong id="ALM-12186__b139781825112416">Location</strong>. Click the host name to view its IP address.</span></li><li id="ALM-12186__li38611343349"><span>Log in to the host for which the alarm is generated as user <strong id="ALM-12186__b108614453411">omm</strong>.</span></li><li id="ALM-12186__li78615410344"><span>Run the following command to obtain the maximum number of threads that can be concurrently opened by user <strong id="ALM-12186__b44603483302">omm</strong> and check whether this number is greater than or equal to <strong id="ALM-12186__b1359611271442">60000</strong>:</span><p><p id="ALM-12186__p186144103410"><strong id="ALM-12186__b188611440346">systemctl status user-$(id -u).slice | grep limit</strong></p>
|
||||
<ul id="ALM-12186__ul68351049153616"><li id="ALM-12186__li683524983611">If yes, go to <a href="#ALM-12186__li18602412348">6</a>.</li><li id="ALM-12186__li168351249113620">If no, go to <a href="#ALM-12186__li9448150105813">4</a>.</li></ul>
|
||||
</p></li><li id="ALM-12186__li9448150105813"><a name="ALM-12186__li9448150105813"></a><a name="li9448150105813"></a><span>Switch to user <strong id="ALM-12186__b1474565455812">root</strong> and run the following command to change the value for user <strong id="ALM-12186__b1974513542588">omm</strong> to <strong id="ALM-12186__b4971456144717">60000</strong>:</span><p><p id="ALM-12186__p3829509598"><strong id="ALM-12186__b98290075918">systemctl set-property user-2000.slice TasksMax=60000</strong></p>
|
||||
</p></li><li id="ALM-12186__li23671340113420"><span>Change the value of <strong id="ALM-12186__b5941937204815">UserTasksMax</strong> in the <strong id="ALM-12186__b143671241124814">/etc/systemd/logind.conf</strong> file to <strong id="ALM-12186__b19903204454814">60000</strong>. (If the parameter is commented out, uncomment it.) Save the file, wait 5 minutes, and check whether the alarm is cleared.</span><p><ul id="ALM-12186__ul068811558364"><li id="ALM-12186__li7689125573617">If yes, no further action is required.</li><li id="ALM-12186__li156890551363">If no, go to <a href="#ALM-12186__li18602412348">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-12186__p1215415911338"><strong id="ALM-12186__b594844774514">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-12186__ol98604413413"><li id="ALM-12186__li18602412348"><a name="ALM-12186__li18602412348"></a><a name="li18602412348"></a><span>On FusionInsight Manager of the cluster, choose <strong id="ALM-12186__b145611142462">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12186__b756164154619">Log</strong> > <strong id="ALM-12186__b25611140464">Download</strong>.</span></li><li id="ALM-12186__li128601240346"><span>Expand the <strong id="ALM-12186__b563015444610">Service</strong> drop-down list, select <strong id="ALM-12186__b435816267461">OmmServer</strong> and <strong id="ALM-12186__b17358926194615">NodeAgent</strong> for the target cluster, and click <strong id="ALM-12186__b1963711277476">OK</strong>.</span></li><li id="ALM-12186__li12860194133410"><span>Click <span><img id="ALM-12186__image104601319175315" src="en-us_image_0000001971818972.png"></span> in the upper right corner, and set <strong id="ALM-12186__b1417313467476">Start Date</strong> and <strong id="ALM-12186__b12174184612477">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12186__b16174164617476">Download</strong>.</span></li><li id="ALM-12186__li1886064173413"><span>Contact <span id="ALM-12186__text8470138174810">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="6" id="ALM-12186__ol98604413413"><li id="ALM-12186__li18602412348"><a name="ALM-12186__li18602412348"></a><a name="li18602412348"></a><span>On MRS Manager of the cluster, choose <strong id="ALM-12186__b145611142462">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12186__b756164154619">Log</strong> > <strong id="ALM-12186__b25611140464">Download</strong>.</span></li><li id="ALM-12186__li128601240346"><span>Expand the <strong id="ALM-12186__b563015444610">Service</strong> drop-down list, select <strong id="ALM-12186__b435816267461">OmmServer</strong> and <strong id="ALM-12186__b17358926194615">NodeAgent</strong> for the target cluster, and click <strong id="ALM-12186__b1963711277476">OK</strong>.</span></li><li id="ALM-12186__li12860194133410"><span>Click <span><img id="ALM-12186__image104601319175315" src="en-us_image_0000001971818972.png"></span> in the upper right corner, and set <strong id="ALM-12186__b1417313467476">Start Date</strong> and <strong id="ALM-12186__b12174184612477">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12186__b16174164617476">Download</strong>.</span></li><li id="ALM-12186__li1886064173413"><span>Contact <span id="ALM-12186__text8470138174810">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
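<p>The comparison in the handling procedure above (is the task limit for user omm at least 60000?) can be sketched as follows. This is a minimal illustration under stated assumptions: it assumes the filtered <strong>systemctl status</strong> output contains a fragment of the form <strong>(limit: N)</strong>, and the sample line and the helper names <strong>parse_tasks_limit</strong> and <strong>limit_is_sufficient</strong> are hypothetical.</p>

```python
import re
from typing import Optional

REQUIRED_TASKS_MAX = 60000  # minimum value named in the handling procedure


def parse_tasks_limit(status_output: str) -> Optional[int]:
    """Extract N from a '(limit: N)' fragment; return None if no such fragment exists."""
    match = re.search(r"limit:\s*(\d+)", status_output)
    return int(match.group(1)) if match else None


def limit_is_sufficient(status_output: str, required: int = REQUIRED_TASKS_MAX) -> bool:
    """Return True only when a limit was found and it is at least the required value."""
    limit = parse_tasks_limit(status_output)
    return limit is not None and limit >= required


# Hypothetical output line, used only for illustration.
sample = "    Tasks: 1834 (limit: 10813)"
print(parse_tasks_limit(sample), limit_is_sufficient(sample))
```

<p>A missing limit is treated as insufficient here, so the check fails safely when the output has an unexpected shape.</p>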
|
||||
<div class="section" id="ALM-12186__section169311343318"><h4 class="sectiontitle"><span id="ALM-12186__text367020138593">Alarm Clearance</span></h4><p id="ALM-12186__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -65,11 +65,11 @@
|
||||
<div class="section" id="ALM-12187__section18133852349"><h4 class="sectiontitle"><span id="ALM-12187__text12656240135813">Possible Causes</span></h4><ul id="ALM-12187__ul101335315302"><li id="ALM-12187__li9133143153018">The growpart scale-out tool is not installed.</li><li id="ALM-12187__li11331531153011">The system fails to execute the command for expanding the disk partition.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12187__section1481384019290"><h4 class="sectiontitle"><span id="ALM-12187__text19569135285811">Handling Procedure</span></h4><p id="ALM-12187__p240418532150"><strong id="ALM-12187__b944720112254">Check whether growpart is installed.</strong></p>
|
||||
<ol id="ALM-12187__ol14485161012158"><li id="ALM-12187__li5478151015150"><span>Log in to FusionInsight Manager, click <strong id="ALM-12187__b26822818259">O&M</strong>, and choose <strong id="ALM-12187__b5190625172512">Alarm</strong> > <strong id="ALM-12187__b08101827172511">Alarms</strong> to view the alarm details. In the <strong id="ALM-12187__b8106550112519">Location</strong> column, check the name of the host and mount directory for which the alarm is generated. Click the host name to view its IP address.</span></li><li id="ALM-12187__li14791210101518"><span>Log in to the node for which the alarm is generated as user <strong id="ALM-12187__b2490171214266">root</strong>.</span></li><li id="ALM-12187__li7480191011510"><span>Run the following command to check whether growpart is installed:</span><p><p id="ALM-12187__p1447911051511"><strong id="ALM-12187__b1647901081518">which growpart</strong></p>
|
||||
<ol id="ALM-12187__ol14485161012158"><li id="ALM-12187__li5478151015150"><span>Log in to MRS Manager, click <strong id="ALM-12187__b26822818259">O&M</strong>, and choose <strong id="ALM-12187__b5190625172512">Alarm</strong> > <strong id="ALM-12187__b08101827172511">Alarms</strong> to view the alarm details. In the <strong id="ALM-12187__b8106550112519">Location</strong> column, check the name of the host and mount directory for which the alarm is generated. Click the host name to view its IP address.</span></li><li id="ALM-12187__li14791210101518"><span>Log in to the node for which the alarm is generated as user <strong id="ALM-12187__b2490171214266">root</strong>.</span></li><li id="ALM-12187__li7480191011510"><span>Run the following command to check whether growpart is installed:</span><p><p id="ALM-12187__p1447911051511"><strong id="ALM-12187__b1647901081518">which growpart</strong></p>
|
||||
<div class="p" id="ALM-12187__p647981018150">If information similar to the following is displayed, the growpart tool is installed. Otherwise, contact <span id="ALM-12187__text10479141021520">O&M personnel</span> to install the growpart tool.<pre class="screen" id="ALM-12187__screen154791710111512">[root@<em id="ALM-12187__i1047981021510">xxx</em> ~]#which growpart
|
||||
/usr/bin/growpart</pre>
|
||||
</div>
|
||||
</p></li></ol><ol start="4" id="ALM-12187__ol84161339101720"><li id="ALM-12187__li9575114715178"><span>Wait for 5 minutes, then choose <strong id="ALM-12187__b4731131192718">O&M</strong>, and choose <strong id="ALM-12187__b16291162422715">Alarm</strong> > <strong id="ALM-12187__b11858112522718">Alarms</strong> on FusionInsight Manager. Check whether the alarm is cleared.</span><p><ul id="ALM-12187__ul16575747141717"><li id="ALM-12187__li45761471178">If yes, no further action is required.</li><li id="ALM-12187__li205761476175">If no, go to <a href="#ALM-12187__li88865011163">5</a>.</li></ul>
|
||||
</p></li></ol><ol start="4" id="ALM-12187__ol84161339101720"><li id="ALM-12187__li9575114715178"><span>Wait for 5 minutes, then choose <strong id="ALM-12187__b4731131192718">O&M</strong>, and choose <strong id="ALM-12187__b16291162422715">Alarm</strong> > <strong id="ALM-12187__b11858112522718">Alarms</strong> on MRS Manager. Check whether the alarm is cleared.</span><p><ul id="ALM-12187__ul16575747141717"><li id="ALM-12187__li45761471178">If yes, no further action is required.</li><li id="ALM-12187__li205761476175">If no, go to <a href="#ALM-12187__li88865011163">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-12187__p866454491610"><strong id="ALM-12187__b484365219278">Run the disk partition expansion command.</strong></p>
|
||||
<ol start="5" id="ALM-12187__ol39035031615"><li id="ALM-12187__li88865011163"><a name="ALM-12187__li88865011163"></a><a name="li88865011163"></a><span>Run the following command to view the disk and partition information:</span><p><p id="ALM-12187__p1871502165"><strong id="ALM-12187__b1887125015166">lsblk</strong></p>
|
||||
@ -86,7 +86,7 @@
|
||||
<p id="ALM-12187__p589195017168"><strong id="ALM-12187__b189165081615">resize2fs /dev/vdb1</strong></p>
|
||||
<p id="ALM-12187__p889125011166">If information similar to the following is displayed, the execution is successful:</p>
|
||||
<p id="ALM-12187__p6901050101614"><span><img id="ALM-12187__image99014507163" src="en-us_image_0000001971818980.png"></span></p>
|
||||
</p></li><li id="ALM-12187__li09010504164"><span>Wait for 5 minutes, click <strong id="ALM-12187__b1830463113918">O&M</strong>, and choose <strong id="ALM-12187__b159115113399">Alarm</strong> > <strong id="ALM-12187__b194154141390">Alarms</strong> on FusionInsight Manager. Check whether the alarm is cleared.</span><p><ul id="ALM-12187__ul49095010162"><li id="ALM-12187__li1090175001611">If yes, no further action is required.</li><li id="ALM-12187__li890850101611">If no, contact <span id="ALM-12187__text11900457101913">O&M personnel</span>.</li></ul>
|
||||
</p></li><li id="ALM-12187__li09010504164"><span>Wait for 5 minutes, click <strong id="ALM-12187__b1830463113918">O&M</strong>, and choose <strong id="ALM-12187__b159115113399">Alarm</strong> > <strong id="ALM-12187__b194154141390">Alarms</strong> on MRS Manager. Check whether the alarm is cleared.</span><p><ul id="ALM-12187__ul49095010162"><li id="ALM-12187__li1090175001611">If yes, no further action is required.</li><li id="ALM-12187__li890850101611">If no, contact <span id="ALM-12187__text11900457101913">O&M personnel</span>.</li></ul>
|
||||
</p></li></ol>
|
||||
</div>
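<p>The installation check in the handling procedure above can also be scripted. The sketch below uses Python's standard <strong>shutil.which</strong>, which searches the directories in PATH much like the <strong>which</strong> command does; the wrapper name <strong>locate_tool</strong> is hypothetical.</p>

```python
import shutil
from typing import Optional


def locate_tool(name: str) -> Optional[str]:
    """Return the full path of an executable found on PATH, or None if it is absent."""
    return shutil.which(name)


path = locate_tool("growpart")
if path is None:
    print("growpart was not found on PATH")
else:
    print(f"growpart found at {path}")
```

<p>The result depends on the PATH of the user running the script, so run it as the same user that performs the manual check.</p>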
|
||||
<div class="section" id="ALM-12187__section7293173912175"><h4 class="sectiontitle"><span id="ALM-12187__text367020138593">Alarm Clearance</span></h4><p id="ALM-12187__p4178195414013">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
|
||||
@ -60,7 +60,7 @@
|
||||
<div class="section" id="ALM-12188__section18133852349"><h4 class="sectiontitle"><span id="ALM-12188__text12656240135813">Possible Causes</span></h4><ul id="ALM-12188__ul20753183122111"><li id="ALM-12188__li187531031132115">The diskmgt disk monitoring service does not exist.</li><li id="ALM-12188__li137531431122114">The diskmgt disk monitoring service is not started.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12188__section1481384019290"><h4 class="sectiontitle"><span id="ALM-12188__text19569135285811">Handling Procedure</span></h4><p id="ALM-12188__p863220328248"><strong id="ALM-12188__b1780312320442">Check whether the diskmgt disk monitoring service exists.</strong></p>
|
||||
<ol id="ALM-12188__ol53051313141410"><li id="ALM-12188__li12658345203210"><span>Log in to FusionInsight Manager, click <strong id="ALM-12188__b295115361448">O&M</strong>, and choose <strong id="ALM-12188__b129521136154416">Alarm</strong> > <strong id="ALM-12188__b17952736144411">Alarms</strong> to view the alarm details. In the <strong id="ALM-12188__b1095213634418">Location</strong> column, check the name of the host for which the alarm is generated. Click the host name to view its IP address.</span></li><li id="ALM-12188__li639818123229"><span>Log in to the node for which the alarm is generated as user <strong id="ALM-12188__b123417532449">root</strong>.</span></li><li id="ALM-12188__li94191028135118"><span>Run the following command to check whether the core service file exists:</span><p><p id="ALM-12188__p1229375915528"><strong id="ALM-12188__b1691616572228">stat /usr/local/diskmgt/inner/diskmgtd</strong></p>
|
||||
<ol id="ALM-12188__ol53051313141410"><li id="ALM-12188__li12658345203210"><span>Log in to MRS Manager, click <strong id="ALM-12188__b295115361448">O&M</strong>, and choose <strong id="ALM-12188__b129521136154416">Alarm</strong> > <strong id="ALM-12188__b17952736144411">Alarms</strong> to view the alarm details. In the <strong id="ALM-12188__b1095213634418">Location</strong> column, check the name of the host for which the alarm is generated. Click the host name to view its IP address.</span></li><li id="ALM-12188__li639818123229"><span>Log in to the node for which the alarm is generated as user <strong id="ALM-12188__b123417532449">root</strong>.</span></li><li id="ALM-12188__li94191028135118"><span>Run the following command to check whether the core service file exists:</span><p><p id="ALM-12188__p1229375915528"><strong id="ALM-12188__b1691616572228">stat /usr/local/diskmgt/inner/diskmgtd</strong></p>
|
||||
<p id="ALM-12188__p11951165662314">If the file does not exist, contact <span id="ALM-12188__text138985011164">O&M personnel</span>.</p>
|
||||
</p></li></ol>
|
||||
<p id="ALM-12188__p10889644250"><strong id="ALM-12188__b5473634154515">Start the diskmgt disk monitoring service.</strong></p>
|
||||
@ -69,7 +69,7 @@
|
||||
<ul id="ALM-12188__ul11610104684116"><li id="ALM-12188__li5610124613415">If information similar to the following is displayed, the service is started successfully. Go to <a href="#ALM-12188__li09010504164">6</a>.<p id="ALM-12188__p653711564416"><span><img id="ALM-12188__image653745664119" src="en-us_image_0000002008258977.png"></span></p>
|
||||
</li></ul>
|
||||
<ul id="ALM-12188__ul923895804114"><li id="ALM-12188__li1423995894119">If no, contact <span id="ALM-12188__text127613104464">O&M personnel</span>.</li></ul>
|
||||
</p></li><li id="ALM-12188__li09010504164"><a name="ALM-12188__li09010504164"></a><a name="li09010504164"></a><span>Wait for 5 minutes, click <strong id="ALM-12188__b931583484612">O&M</strong>, and choose <strong id="ALM-12188__b17316234164616">Alarm</strong> > <strong id="ALM-12188__b18316183413466">Alarms</strong> on FusionInsight Manager. Check whether the alarm is cleared.</span><p><ul id="ALM-12188__ul49095010162"><li id="ALM-12188__li1090175001611">If yes, no further action is required.</li><li id="ALM-12188__li890850101611">If no, contact <span id="ALM-12188__text102591238154611">O&M personnel</span>.</li></ul>
|
||||
</p></li><li id="ALM-12188__li09010504164"><a name="ALM-12188__li09010504164"></a><a name="li09010504164"></a><span>Wait for 5 minutes, click <strong id="ALM-12188__b931583484612">O&M</strong>, and choose <strong id="ALM-12188__b17316234164616">Alarm</strong> > <strong id="ALM-12188__b18316183413466">Alarms</strong> on MRS Manager. Check whether the alarm is cleared.</span><p><ul id="ALM-12188__ul49095010162"><li id="ALM-12188__li1090175001611">If yes, no further action is required.</li><li id="ALM-12188__li890850101611">If no, contact <span id="ALM-12188__text102591238154611">O&M personnel</span>.</li></ul>
|
||||
</p></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-12188__section7293173912175"><h4 class="sectiontitle"><span id="ALM-12188__text367020138593">Alarm Clearance</span></h4><p id="ALM-12188__p4178195414013">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
|
||||
98
docs/mrs/umn/ALM-12191.html
Normal file
@ -0,0 +1,98 @@
|
||||
<a name="ALM-12191"></a><a name="ALM-12191"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12191 Disk I/O Usage Exceeds the Threshold</h1>
|
||||
<div id="body0000002414056805"><div class="section" id="ALM-12191__section118513241113"><h4 class="sectiontitle"><span id="ALM-12191__text6861824414">Alarm Description</span></h4><p id="ALM-12191__p17860241315">The system checks the disk I/O usage every 30 seconds and compares the actual disk I/O usage with the threshold. This alarm is generated when the disk I/O usage exceeds the threshold multiple consecutive times (<strong id="ALM-12191__b251371582017">3</strong> by default).</p>
|
||||
<p id="ALM-12191__p68652414113">If the <strong id="ALM-12191__b49830192012">hit number</strong> is <strong id="ALM-12191__b31016305209">1</strong>, this alarm is cleared when the disk I/O usage is less than or equal to the threshold. If the <strong id="ALM-12191__b210123010201">hit number</strong> is greater than <strong id="ALM-12191__b6101030192013">1</strong>, this alarm is cleared when the disk I/O usage is less than or equal to 90% of the threshold.</p>
|
||||
<div class="note" id="ALM-12191__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12191__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
|
||||
</div>
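<p>The trigger and clear rules described above can be modeled as a small state machine. This is a sketch of the rule as stated in the description, not the product's implementation: the class name <strong>ThresholdAlarm</strong>, the threshold value 80, and the sample readings are hypothetical. Each call to <strong>observe</strong> stands for one 30-second check.</p>

```python
class ThresholdAlarm:
    """Raise after `hit_number` consecutive readings above the threshold.

    Clear at <= threshold when hit_number is 1, otherwise at <= 90% of the threshold.
    """

    def __init__(self, threshold: float, hit_number: int = 3):
        self.threshold = threshold
        self.hit_number = hit_number
        self.consecutive_hits = 0
        self.active = False

    def clear_level(self) -> float:
        """Usage at or below this value clears an active alarm."""
        return self.threshold if self.hit_number == 1 else 0.9 * self.threshold

    def observe(self, usage: float) -> bool:
        """Process one periodic reading and return whether the alarm is active."""
        if not self.active:
            if usage > self.threshold:
                self.consecutive_hits += 1
                if self.consecutive_hits >= self.hit_number:
                    self.active = True
            else:
                # A reading at or below the threshold breaks the consecutive run.
                self.consecutive_hits = 0
        elif usage <= self.clear_level():
            self.active = False
            self.consecutive_hits = 0
        return self.active


# Hypothetical threshold and readings, used only for illustration.
alarm = ThresholdAlarm(threshold=80.0, hit_number=3)
states = [alarm.observe(usage) for usage in [85, 90, 95, 78, 70]]
print(states)
```

<p>In this run, the reading of 78 is below the threshold but above 90% of it, so the alarm stays active until the reading of 70. The gap between the trigger level and the clear level keeps the alarm from flapping when usage hovers near the threshold.</p>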
|
||||
<div class="section" id="ALM-12191__section158616245115"><h4 class="sectiontitle"><span id="ALM-12191__text086024218">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12191__table28652410119" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12191__row98618241212"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12191__p16866241117"><span id="ALM-12191__text158620245120">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12191__p138615244117"><span id="ALM-12191__text158614241113">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12191__p386324016"><span id="ALM-12191__text386172414118">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12191__row11867241210"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12191__p118672419117">12191</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12191__p11863241614">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12191__p1986102417115">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12191__section12888241312"><h4 class="sectiontitle"><span id="ALM-12191__text12883241212">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12191__table188324911" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12191__row88802416118"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12191__p9881241214"><span id="ALM-12191__text168812241214">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12191__p9881524711"><span id="ALM-12191__text6881724016">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12191__p1288224115"><span id="ALM-12191__text1988192417114">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12191__row78852416112"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12191__p1288162419117">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12191__p2885245113">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12191__p8883244117">Specifies the cluster or system for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12191__row1788224415"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12191__p28816246115">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12191__p108816240115">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12191__row888202414114"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12191__p4888249111">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12191__p17887241816">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12191__row20887247116"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12191__p12885243110">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12191__p9881024911">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12191__row1882024611"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12191__p2088192410115">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12191__p1288324616">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12191__p148862419114">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12191__section788172412113"><h4 class="sectiontitle"><span id="ALM-12191__text108812247114">Impact on the System</span></h4><ul id="ALM-12191__ul48813241511"><li id="ALM-12191__li9881724713">Latency: Service processes may run slowly, which increases latency.</li><li id="ALM-12191__li208862419120">Service failure: Service processing may be slow, time out, or fail. As a result, jobs may fail to run.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12191__section198914249112"><h4 class="sectiontitle"><span id="ALM-12191__text18917241317">Possible Causes</span></h4><ul id="ALM-12191__ul1389192413116"><li id="ALM-12191__li48915244110">The alarm threshold or alarm trigger count is improperly configured.</li><li id="ALM-12191__li78914241914">The disk configuration cannot meet service requirements, so the disk I/O usage reaches the upper limit. Alternatively, services are at peak hours, and the disk I/O usage reaches the upper limit within a short period.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12191__section68910249114"><h4 class="sectiontitle"><span id="ALM-12191__text158922411111">Handling Procedure</span></h4><p class="tableheading" id="ALM-12191__p389624217"><strong id="ALM-12191__b5898241914">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
|
||||
<ol id="ALM-12191__ol1189824717"><li id="ALM-12191__li58992418110"><span>Modify the alarm threshold and alarm trigger count based on the actual disk I/O usage.</span><p><ol type="a" id="ALM-12191__ol13162521162515"><li class="litext" id="ALM-12191__li1216210218257">Log in to MRS Manager and choose <strong id="ALM-12191__b1757764982920">O&M</strong> > <strong id="ALM-12191__b85771249122919">Alarm</strong> > <strong id="ALM-12191__b1357794922915">Thresholds</strong>, click the name of the desired cluster, and choose <strong id="ALM-12191__b145781493291">Host</strong> > <strong id="ALM-12191__b557815494292">Disk</strong> > <strong id="ALM-12191__b457864911292">Disk IO Utilization</strong>.</li><li class="litext" id="ALM-12191__li18429223192513">Click the edit button next to <strong id="ALM-12191__b145971023419">Trigger Count</strong> and change it to an appropriate value based on the actual service usage.<div class="note" id="ALM-12191__note38913241212"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p class="text" id="ALM-12191__p989102411115"><strong id="ALM-12191__b19741599157">Trigger Count</strong> indicates the number of consecutive times the threshold must be exceeded before the alarm is triggered.</p>
|
||||
</div></div>
|
||||
</li><li class="litext" id="ALM-12191__li1855212543258">Click <strong id="ALM-12191__b11923111224012">Modify</strong> in the <strong id="ALM-12191__b106281714164013">Operation</strong> column of the row that contains the rule and change the alarm threshold.</li></ol>
|
||||
</p></li><li id="ALM-12191__li17892241618"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul class="subitemlist" id="ALM-12191__ul14899248112"><li id="ALM-12191__li98952419118">If yes, no further action is required.</li><li id="ALM-12191__li108922416114">If no, go to <a href="#ALM-12191__li15891424513">3</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12191__p489102416113"><strong id="ALM-12191__b19974134414403">Check whether the disk I/O usage reaches the upper limit.</strong></p>
|
||||
<ol start="3" id="ALM-12191__ol1189192417113"><li id="ALM-12191__li15891424513"><a name="ALM-12191__li15891424513"></a><a name="li15891424513"></a><span>On MRS Manager, choose <strong id="ALM-12191__b16664164794217">O&M</strong> > <strong id="ALM-12191__b4483174911422">Alarm</strong> > <strong id="ALM-12191__b12304195154213">Alarms</strong>. In the alarm list, expand the alarm details and click the name of the host for which the alarm is generated in the <strong id="ALM-12191__b11639239184510">Location</strong> area.</span></li><li id="ALM-12191__li28914248113"><span>On the overview page of the host, observe the real-time disk I/O usage for about 5 minutes. If the disk I/O usage exceeds the threshold multiple times, contact the MRS cluster administrator to upgrade the disk specifications.</span><p><p id="ALM-12191__p187211915314">If the <strong id="ALM-12191__b204499181158">Disk IO Utilization</strong> chart is not displayed, click the drop-down arrow on the right, select <strong id="ALM-12191__b175081714195115">Customize</strong>, select the desired item, and click <strong id="ALM-12191__b250881425120">OK</strong>.</p>
|
||||
</p></li><li id="ALM-12191__li58972411117"><span>Check whether the alarm was generated during peak hours. If it was, expand the node capacity or contact the MRS cluster administrator to upgrade the disk specifications.</span></li><li id="ALM-12191__li3891246115"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12191__ul108913248118"><li id="ALM-12191__li88911244116">If yes, no further action is required.</li><li id="ALM-12191__li19897244119">If no, go to <a href="#ALM-12191__li289102416117">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12191__p1989192418113"><strong id="ALM-12191__b68912419116">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-12191__ol289152414119"><li id="ALM-12191__li289102416117"><a name="ALM-12191__li289102416117"></a><a name="li289102416117"></a><span>On MRS Manager, choose <strong id="ALM-12191__b148203163311">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12191__b78143193313">Log</strong> > <strong id="ALM-12191__b186373317">Download</strong>.</span></li><li id="ALM-12191__li18982416120"><span>Expand the <strong id="ALM-12191__b16322940223288">Service</strong> drop-down list, select <strong id="ALM-12191__b10482802353288">NodeAgent</strong> for the target cluster, and click <strong id="ALM-12191__b7916273543288">OK</strong>.</span></li><li id="ALM-12191__li989724817"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12191__b18695100753288">Start Date</strong> and <strong id="ALM-12191__b1633065223288">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-12191__b20905669423288">Download</strong>.</span></li><li id="ALM-12191__li168914245110"><span>Contact <span id="ALM-12191__text3901424916">O&M personnel</span> and provide the collected logs.</span></li></ol>
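<p>The log collection window in the last steps can be worked out as in the following minimal sketch. It only illustrates the arithmetic; the function name <strong>log_window</strong> and its parameters are hypothetical and are not part of MRS Manager.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) for log collection around an alarm.

    Hypothetical helper: the 10-minute margin comes from the procedure
    above; everything else is an illustrative assumption.
    """
    margin = timedelta(minutes=margin_minutes)
    # Start Date is the margin before the alarm, End Date the margin after it.
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    start, end = log_window(datetime(2025, 8, 6, 1, 5))
    print("Start Date:", start)
    print("End Date:", end)
```

<p>The two returned values correspond to the <strong>Start Date</strong> and <strong>End Date</strong> fields that are set before clicking <strong>Download</strong>.</p>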
|
||||
</div>
|
||||
<div class="section" id="ALM-12191__section99011241715"><h4 class="sectiontitle"><span id="ALM-12191__text1290102413116">Alarm Clearance</span></h4><p id="ALM-12191__p1909244116">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12191__section1890132416119"><h4 class="sectiontitle"><span id="ALM-12191__text1590192416114">Related Information</span></h4><p id="ALM-12191__p490162412113"><span id="ALM-12191__text11905241310">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
98
docs/mrs/umn/ALM-12192.html
Normal file
@ -0,0 +1,98 @@
|
||||
<a name="ALM-12192"></a><a name="ALM-12192"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12192 Host Load Exceeds the Threshold</h1>
|
||||
<div id="body0000002380457464"><div class="section" id="ALM-12192__section60313499"><h4 class="sectiontitle"><span id="ALM-12192__text164311244911">Alarm Description</span></h4><p id="ALM-12192__p47246084">The system checks the average load every 30 seconds and compares it with the threshold. This alarm is generated when the average load exceeds the threshold multiple consecutive times (10 by default).</p>
|
||||
<p id="ALM-12192__p22561573">If <strong id="ALM-12192__b17320161812540">Trigger Count</strong> is <strong id="ALM-12192__b1732011186549">1</strong>, this alarm is cleared when the average load is less than or equal to the threshold. If <strong id="ALM-12192__b23201518175419">Trigger Count</strong> is greater than 1, this alarm is cleared when the average load is less than or equal to 90% of the threshold.</p>
|
||||
<div class="note" id="ALM-12192__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12192__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
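<p>The trigger and clear rules described above can be sketched as follows. This is an illustrative model only, not MRS source code. The class name <strong>ThresholdAlarm</strong> and its fields are hypothetical, and the sketch assumes that the count of consecutive breaches resets whenever a sample does not exceed the threshold.</p>

```python
from dataclasses import dataclass


@dataclass
class ThresholdAlarm:
    """Illustrative model of the trigger/clear rule (hypothetical names)."""

    threshold: float         # alarm threshold for the sampled metric
    trigger_count: int = 10  # consecutive breaches needed to raise the alarm
    active: bool = False     # whether the alarm is currently raised
    breaches: int = 0        # current run of consecutive breaches

    def clear_level(self) -> float:
        # Trigger Count 1: clear at the threshold; otherwise at 90% of it.
        if self.trigger_count == 1:
            return self.threshold
        return 0.9 * self.threshold

    def observe(self, value: float) -> bool:
        """Feed one periodic sample and return the resulting alarm state."""
        if value > self.threshold:
            self.breaches += 1
            if self.breaches >= self.trigger_count:
                self.active = True
        else:
            # Assumption: a sample at or below the threshold breaks the run.
            self.breaches = 0
            if self.active and value <= self.clear_level():
                self.active = False
        return self.active


if __name__ == "__main__":
    alarm = ThresholdAlarm(threshold=10.0, trigger_count=3)
    for sample in [11, 12, 13, 9.5, 8.0]:
        print(sample, alarm.observe(sample))
```

<p>With a trigger count greater than 1, a sample between 90% and 100% of the threshold leaves a raised alarm in place, so the alarm does not flap when the load hovers just below the threshold.</p>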
|
||||
</div>
|
||||
<div class="section" id="ALM-12192__section5950580"><h4 class="sectiontitle"><span id="ALM-12192__text4431134419113">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12192__table15548096" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12192__row49989141"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12192__p5431124414118"><span id="ALM-12192__text164315441716">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12192__p543118441210"><span id="ALM-12192__text5431144415112">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12192__p94311444118"><span id="ALM-12192__text7431104415112">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12192__row30415758"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12192__p47757325">12192</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12192__p43138141">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12192__p4528550">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12192__section53555227"><h4 class="sectiontitle"><span id="ALM-12192__text134312441118">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12192__table31268239" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12192__row59179380"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12192__p313918575184"><span id="ALM-12192__text23739193194">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12192__p184311944115"><span id="ALM-12192__text543117441219">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12192__p184328441110"><span id="ALM-12192__text4432204415119">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12192__row12465939134110"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12192__p675011219199">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12192__p17935380415">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12192__p187931338134115">Specifies the cluster or system for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12192__row48724307"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12192__p54354790">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12192__p40661878">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12192__row30412584"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12192__p47500221">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12192__p22312707">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12192__row66596640"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12192__p25618737">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12192__p61851848">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12192__row19278171612917"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12192__p2226161952215">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12192__p32262019192217">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12192__p1222681911222">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12192__section12235000"><h4 class="sectiontitle"><span id="ALM-12192__text1343217441218">Impact on the System</span></h4><ul id="ALM-12192__ul1692661316257"><li id="ALM-12192__li199260133257">Latency: Service processes may run slowly, which increases latency.</li><li id="ALM-12192__li109263130250">Service failure: Service processing may be slow, time out, or fail. As a result, jobs may fail to run.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12192__section43006140"><h4 class="sectiontitle"><span id="ALM-12192__text74322441910">Possible Causes</span></h4><ul id="ALM-12192__ul9818458"><li id="ALM-12192__li21257266">The alarm threshold or alarm trigger count is improperly configured.</li><li id="ALM-12192__li57097668">The host cannot meet service requirements, so the average load reaches the upper limit. Alternatively, demand surges during peak hours, and the average load reaches the upper limit within a short period.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12192__section8793187020"><h4 class="sectiontitle"><span id="ALM-12192__text543211444114">Handling Procedure</span></h4><p class="tableheading" id="ALM-12192__p61508423"><strong id="ALM-12192__b778131416584">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
|
||||
<ol id="ALM-12192__ol28325612173738"><li id="ALM-12192__li65444081173731"><span>Modify the alarm threshold and alarm trigger count based on the actual average host load.</span><p><ol type="a" id="ALM-12192__ol12276421193817"><li class="litext" id="ALM-12192__li8715112313409">To change the threshold, log in to MRS Manager, choose <strong id="ALM-12192__b978114466596">O&M</strong> > <strong id="ALM-12192__b19471175012594">Alarm</strong> > <strong id="ALM-12192__b1625452105914">Thresholds</strong>, click the name of the desired cluster, and choose <strong id="ALM-12192__b2065343816018">Host</strong> > <strong id="ALM-12192__b01425499016">Host Status</strong> > <strong id="ALM-12192__b1560318545014">Average Host Load Information</strong>.</li><li class="litext" id="ALM-12192__li6276162113383">Click the edit button next to <strong id="ALM-12192__b109604421320">Trigger Count</strong> and set it to an appropriate value based on the actual service usage.<div class="note" id="ALM-12192__note2277146173731"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p class="text" id="ALM-12192__p253016173731"><strong id="ALM-12192__b816813532517">Trigger Count</strong> indicates the number of consecutive times the threshold must be exceeded before the alarm is triggered.</p>
|
||||
</div></div>
|
||||
</li><li class="litext" id="ALM-12192__li9666112074016">Click <strong id="ALM-12192__b468920342138">Modify</strong> in the <strong id="ALM-12192__b176901434111314">Operation</strong> column of the row that contains the rule and change the alarm threshold.</li></ol>
|
||||
</p></li><li id="ALM-12192__li29512697173731"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul class="subitemlist" id="ALM-12192__ul16105939173731"><li id="ALM-12192__li52125820173731">If yes, no further action is required.</li><li id="ALM-12192__li61441872173731">If no, go to <a href="#ALM-12192__li64287686173731">3</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12192__p10735729173731"><strong id="ALM-12192__b1919635451312">Check whether the average load reaches the upper limit.</strong></p>
|
||||
<ol start="3" id="ALM-12192__ol805769017382"><li id="ALM-12192__li64287686173731"><a name="ALM-12192__li64287686173731"></a><a name="li64287686173731"></a><span>On MRS Manager, choose <strong id="ALM-12192__b142177146">O&M</strong> > <strong id="ALM-12192__b14423761410">Alarm</strong> > <strong id="ALM-12192__b1442187161411">Alarms</strong>. In the alarm list, expand the alarm details and click the name of the host for which the alarm is generated in the <strong id="ALM-12192__b74210701416">Location</strong> area.</span></li><li id="ALM-12192__li12299054173731"><span>On the overview page of the host, observe the real-time average host load for about 5 minutes. If the average load exceeds the threshold multiple times, contact the MRS cluster administrator to upgrade the host specifications.</span><p><p id="ALM-12192__p187211915314">If the <strong id="ALM-12192__b13260185701416">Average Host Load Information</strong> chart is not displayed, click the drop-down arrow on the right, select <strong id="ALM-12192__b118771439101418">Customize</strong>, select the desired item, and click <strong id="ALM-12192__b1987713918145">OK</strong>.</p>
|
||||
</p></li><li id="ALM-12192__li7578531132216"><span>Check whether the alarm was generated during peak hours. If it was, expand the node capacity or contact the MRS cluster administrator to upgrade the host specifications.</span></li><li id="ALM-12192__li41709335173731"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12192__ul61824262173731"><li id="ALM-12192__li56699280173731">If yes, no further action is required.</li><li id="ALM-12192__li29238983173731">If no, go to <a href="#ALM-12192__li39839699173731">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12192__p19547451173731"><strong id="ALM-12192__b1709986317387">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-12192__ol56824812173810"><li id="ALM-12192__li39839699173731"><a name="ALM-12192__li39839699173731"></a><a name="li39839699173731"></a><span>On MRS Manager, choose <strong id="ALM-12192__b179942683317">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12192__b159949619336">Log</strong> > <strong id="ALM-12192__b2099515683315">Download</strong>.</span></li><li id="ALM-12192__li23012976173731"><span>Expand the <strong id="ALM-12192__b44703167032757">Service</strong> drop-down list, select <strong id="ALM-12192__b214522637432757">NodeAgent</strong> for the target cluster, and click <strong id="ALM-12192__b41808396632757">OK</strong>.</span></li><li id="ALM-12192__li5790200173731"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12192__b138048470032757">Start Date</strong> and <strong id="ALM-12192__b104119192432757">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-12192__b150670131432757">Download</strong>.</span></li><li id="ALM-12192__li66353041173731"><span>Contact <span id="ALM-12192__text1643218448114">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-12192__section743216441318"><h4 class="sectiontitle"><span id="ALM-12192__text204325441618">Alarm Clearance</span></h4><p id="ALM-12192__p1543274413111">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12192__section04326448115"><h4 class="sectiontitle"><span id="ALM-12192__text1443214448115">Related Information</span></h4><p id="ALM-12192__p194329447110"><span id="ALM-12192__text943211442016">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
81
docs/mrs/umn/ALM-12200.html
Normal file
@ -0,0 +1,81 @@
|
||||
<a name="ALM-12200"></a><a name="ALM-12200"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12200 Password Is About to Expire</h1>
|
||||
<div id="body0000002413936969"><div class="section" id="ALM-12200__section60313499"><h4 class="sectiontitle"><span id="ALM-12200__text164311244911">Alarm Description</span></h4><p id="ALM-12200__p1212203510114">At 1:00 a.m. every day, the system checks whether any user password is about to expire. This alarm is generated when a user password will expire in less than 5 days (the default value).</p>
|
||||
<p id="ALM-12200__p371028104017">This alarm is cleared when the password has at least 5 days (the default value) left before it expires.</p>
|
||||
<div class="note" id="ALM-12200__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12200__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
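<p>The daily check described above can be sketched as follows. This is a minimal illustration, not the MRS implementation. The function names and the <strong>WARNING_DAYS</strong> constant are hypothetical, and the sketch assumes that the remaining validity is counted in whole days.</p>

```python
from datetime import date

# Default warning period taken from the alarm description (5 days).
WARNING_DAYS = 5


def days_until_expiry(expires_on: date, today: date) -> int:
    """Whole days left before the password expires (hypothetical helper)."""
    return (expires_on - today).days


def alarm_should_be_active(expires_on: date, today: date,
                           warning_days: int = WARNING_DAYS) -> bool:
    # Raised when fewer than warning_days remain; cleared otherwise.
    return days_until_expiry(expires_on, today) < warning_days


if __name__ == "__main__":
    today = date(2025, 8, 6)
    print(alarm_should_be_active(date(2025, 8, 10), today))  # 4 days left
    print(alarm_should_be_active(date(2025, 8, 11), today))  # 5 days left
```

<p>Because the check runs once a day, a password changed during the day is only re-evaluated at the next 1:00 a.m. run, which is why the handling procedure waits until the next day to confirm that the alarm is cleared.</p>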
|
||||
</div>
|
||||
<div class="section" id="ALM-12200__section5950580"><h4 class="sectiontitle"><span id="ALM-12200__text4431134419113">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12200__table15548096" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12200__row49989141"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12200__p5431124414118"><span id="ALM-12200__text164315441716">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12200__p543118441210"><span id="ALM-12200__text5431144415112">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12200__p94311444118"><span id="ALM-12200__text7431104415112">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12200__row30415758"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12200__p052373115328">12200</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12200__p11522631133214">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12200__p4528550">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12200__section53555227"><h4 class="sectiontitle"><span id="ALM-12200__text134312441118">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12200__table31268239" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12200__row59179380"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12200__p313918575184"><span id="ALM-12200__text23739193194">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12200__p184311944115"><span id="ALM-12200__text543117441219">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12200__p184328441110"><span id="ALM-12200__text4432204415119">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12200__row12465939134110"><td class="cellrowborder" rowspan="2" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12200__p675011219199">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12200__p17935380415">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12200__p187931338134115">Specifies the cluster or system for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12200__row48724307"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12200__p54354790">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12200__p40661878">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12200__row19278171612917"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12200__p2226161952215">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12200__p32262019192217">Details</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12200__p1222681911222">Specifies the name of the user whose password is about to expire.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12200__section12235000"><h4 class="sectiontitle"><span id="ALM-12200__text1343217441218">Impact on the System</span></h4><p id="ALM-12200__p12150749154618">After the password expires, the account cannot be used.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12200__section43006140"><h4 class="sectiontitle"><span id="ALM-12200__text74322441910">Possible Causes</span></h4><p id="ALM-12200__p15323151284712">The password is about to expire.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12200__section8793187020"><h4 class="sectiontitle"><span id="ALM-12200__text543211444114">Handling Procedure</span></h4><p id="ALM-12200__p158951940339"><strong id="ALM-12200__b2078914361111">Change the user password.</strong></p>
|
||||
<ol id="ALM-12200__ol805769017382"><li id="ALM-12200__li91811959104819"><span>Log in to MRS Manager and choose <strong id="ALM-12200__b1122904015110">O&M</strong> > <strong id="ALM-12200__b7695841181115">Alarm</strong> > <strong id="ALM-12200__b113204310116">Alarms</strong>. In the alarm list, expand the alarm details, and record the name of the user whose password is about to expire, which is shown in the additional information.</span></li><li id="ALM-12200__li1312218213019"><span>Change the password.</span></li><li id="ALM-12200__li1510045515297"><span>If the DataArts Studio service is interconnected, check whether any DataArts Studio job uses a password that is about to expire. If yes, go to the DataArts Studio management center to change the password. Otherwise, a large number of jobs may fail.</span></li><li id="ALM-12200__li9879173695916"><span>Check whether the alarm is automatically cleared after 1:00 a.m. the next day.</span><p><ul class="subitemlist" id="ALM-12200__ul229363919451"><li id="ALM-12200__li32932039174514">If yes, no further action is required.</li><li id="ALM-12200__li12293183944516">If no, go to <a href="#ALM-12200__li39839699173731">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12200__p19547451173731"><strong id="ALM-12200__b1709986317387">Collect fault information.</strong></p>
|
||||
<ol start="5" id="ALM-12200__ol56824812173810"><li id="ALM-12200__li39839699173731"><a name="ALM-12200__li39839699173731"></a><a name="li39839699173731"></a><span>On MRS Manager, choose <strong id="ALM-12200__b681623815338">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12200__b14816103803317">Log</strong> > <strong id="ALM-12200__b781663843317">Download</strong>.</span></li><li id="ALM-12200__li23012976173731"><span>Select <strong id="ALM-12200__b14036516847030">Controller</strong> for <strong id="ALM-12200__b13375985127030">Service</strong> and click <strong id="ALM-12200__b61859527030">OK</strong>.</span></li><li id="ALM-12200__li5790200173731"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12200__b15490553507030">Start Date</strong> and <strong id="ALM-12200__b16372500157030">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-12200__b4983846327030">Download</strong>.</span></li><li id="ALM-12200__li66353041173731"><span>Contact <span id="ALM-12200__text1643218448114">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-12200__section743216441318"><h4 class="sectiontitle"><span id="ALM-12200__text204325441618">Alarm Clearance</span></h4><p id="ALM-12200__p1543274413111">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12200__section04326448115"><h4 class="sectiontitle"><span id="ALM-12200__text1443214448115">Related Information</span></h4><p id="ALM-12200__p194329447110"><span id="ALM-12200__text943211442016">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
98
docs/mrs/umn/ALM-12201.html
Normal file
@ -0,0 +1,98 @@
|
||||
<a name="ALM-12201"></a><a name="ALM-12201"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12201 Process CPU Usage Exceeds the Threshold</h1>
|
||||
<div id="body0000002380297588"><div class="section" id="ALM-12201__section60313499"><h4 class="sectiontitle"><span id="ALM-12201__text8925301575">Alarm Description</span></h4><p id="ALM-12201__p47246084">The system checks the CPU usage every 30 seconds and compares the result with the default threshold. This alarm is generated when the CPU usage exceeds the threshold multiple consecutive times (<strong id="ALM-12201__b113111232516">10</strong> by default).</p>
|
||||
<p id="ALM-12201__p22561573">If <strong id="ALM-12201__b177291514175114">Trigger Count</strong> is <strong id="ALM-12201__b5729161485111">1</strong>, this alarm is cleared when the CPU usage is less than or equal to the threshold. If <strong id="ALM-12201__b137291145512">Trigger Count</strong> is greater than 1, this alarm is cleared when the CPU usage is less than or equal to 90% of the threshold.</p>
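<p>The trigger and clear rules above can be sketched as a small state machine. This is a minimal illustration only, not the OMS implementation: the class name <code>CpuAlarm</code>, its fields, and the example threshold of 80% are assumptions made for the sketch. Only the 30-second sampling, the consecutive trigger count, and the 90% clear level come from the description.</p>

```python
from dataclasses import dataclass


@dataclass
class CpuAlarm:
    """Hypothetical model of the ALM-12201 trigger/clear rules (not OMS code)."""

    threshold: float          # CPU usage threshold in percent (assumed value)
    trigger_count: int = 10   # consecutive breaches needed to raise the alarm
    active: bool = False      # whether the alarm is currently raised
    _breaches: int = 0        # current run of consecutive breaches

    def observe(self, cpu_usage: float) -> bool:
        """Feed one periodic sample and return whether the alarm is active."""
        # Count consecutive samples strictly above the threshold.
        if cpu_usage > self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0

        if not self.active:
            # Raise the alarm once enough consecutive breaches are seen.
            if self._breaches >= self.trigger_count:
                self.active = True
        else:
            # Clear level: the threshold itself when trigger_count is 1,
            # otherwise 90% of the threshold.
            if self.trigger_count == 1:
                clear_level = self.threshold
            else:
                clear_level = 0.9 * self.threshold
            if cpu_usage <= clear_level:
                self.active = False
        return self.active
```

<p>With an assumed threshold of 80 and a trigger count of 3, the alarm in this sketch is raised on the third consecutive sample above 80 and stays raised at 75, because 75 is still above the clear level of 72.</p>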
|
||||
<div class="note" id="ALM-12201__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12201__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12201__section5950580"><h4 class="sectiontitle"><span id="ALM-12201__text38748475555">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12201__table15548096" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12201__row49989141"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12201__p57710042"><span id="ALM-12201__text17980150175619">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12201__p44001849"><span id="ALM-12201__text199471335614">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12201__p7380012"><span id="ALM-12201__text152400388563">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12201__row30415758"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12201__p47757325">12201</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12201__p43138141">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12201__p4528550">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12201__section53555227"><h4 class="sectiontitle"><span id="ALM-12201__text155061195577">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12201__table31268239" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12201__row59179380"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12201__p313918575184"><span id="ALM-12201__text23739193194">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12201__p21975462"><span id="ALM-12201__text776142495720">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12201__p35182007"><span id="ALM-12201__text632018391572">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12201__row12465939134110"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12201__p675011219199">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12201__p17935380415">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12201__p187931338134115">Specifies the cluster or system for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12201__row48724307"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12201__p54354790">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12201__p40661878">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12201__row30412584"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12201__p47500221">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12201__p22312707">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12201__row66596640"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12201__p25618737">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12201__p61851848">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12201__row19278171612917"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12201__p2226161952215">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12201__p32262019192217">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12201__p1222681911222">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12201__section12235000"><h4 class="sectiontitle"><span id="ALM-12201__text2266192715582">Impact on the System</span></h4><ul id="ALM-12201__ul1692661316257"><li id="ALM-12201__li199260133257">Latency: Service processes may run slowly, which increases latency.</li><li id="ALM-12201__li109263130250">Service failure: Service processing may be slow, time out, or fail. As a result, jobs may fail to run.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12201__section43006140"><h4 class="sectiontitle"><span id="ALM-12201__text12656240135813">Possible Causes</span></h4><ul id="ALM-12201__ul9818458"><li id="ALM-12201__li21257266">The alarm threshold or alarm trigger count is improperly configured.</li><li id="ALM-12201__li57097668">The CPU configuration cannot meet service requirements, so the CPU usage reaches the upper limit. Alternatively, services are in peak hours and the CPU usage reaches the upper limit in a short period.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12201__section151236875016"><h4 class="sectiontitle"><span id="ALM-12201__text19569135285811">Handling Procedure</span></h4><p class="tableheading" id="ALM-12201__p61508423"><strong id="ALM-12201__b1291384818526">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
|
||||
<ol id="ALM-12201__ol28325612173738"><li id="ALM-12201__li65444081173731"><span>Modify the alarm threshold and alarm trigger count based on the actual CPU usage.</span><p><ol type="a" id="ALM-12201__ol5242114310516"><li class="litext" id="ALM-12201__li12885165916617">Log in to MRS Manager and choose <strong id="ALM-12201__b78231163530">O&M</strong> > <strong id="ALM-12201__b329113865317">Alarm</strong> > <strong id="ALM-12201__b1731155116272">Thresholds</strong> > <strong id="ALM-12201__b998955842713">OMS</strong> > <strong id="ALM-12201__b15181195192812">OMSServices</strong> > <strong id="ALM-12201__b1882817912812">CPU</strong> > <strong id="ALM-12201__b684911346282">Process Used CPU (OMS)</strong>.</li><li class="litext" id="ALM-12201__li62425432058">Click the edit button next to <strong id="ALM-12201__b141766486061214">Trigger Count</strong> and set it to a proper value based on the actual service usage.<div class="note" id="ALM-12201__note2277146173731"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p class="text" id="ALM-12201__p253016173731"><strong id="ALM-12201__b49975566261214">Trigger Count</strong> indicates how many consecutive times the threshold must be exceeded before the alarm is triggered.</p>
|
||||
</div></div>
|
||||
</li><li class="litext" id="ALM-12201__li5272163918519">Click <strong id="ALM-12201__b0585155225411">Modify</strong> in the <strong id="ALM-12201__b1585195216549">Operation</strong> column of the row that contains the rule and change the alarm threshold.</li></ol>
|
||||
</p></li><li id="ALM-12201__li29512697173731"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul class="subitemlist" id="ALM-12201__ul16105939173731"><li id="ALM-12201__li52125820173731">If yes, no further action is required.</li><li id="ALM-12201__li61441872173731">If no, go to <a href="#ALM-12201__li64287686173731">3</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12201__p10735729173731"><strong id="ALM-12201__b4111954173747">Check whether the CPU usage reaches the upper limit.</strong></p>
|
||||
<ol start="3" id="ALM-12201__ol805769017382"><li id="ALM-12201__li64287686173731"><a name="ALM-12201__li64287686173731"></a><a name="li64287686173731"></a><span>On MRS Manager, choose <strong id="ALM-12201__b14130452561214">O&M</strong> > <strong id="ALM-12201__b43639832261214">Alarm</strong> > <strong id="ALM-12201__b213313829961214">Alarms</strong>. In the alarm list, expand the alarm details and click the name of the host for which the alarm is generated in the <strong id="ALM-12201__b137858407361214">Location</strong> area.</span></li><li id="ALM-12201__li12299054173731"><span>On the overview page of the host, observe the real-time data of the host CPU usage for about 5 minutes. If the CPU usage exceeds the threshold multiple times, contact the MRS cluster administrator to increase the CPU resources.</span><p><p id="ALM-12201__p187211915314">If no chart is available, click the drop-down arrow on the right, select <strong id="ALM-12201__b62956453561214">Customize</strong>, select the desired item, and click <strong id="ALM-12201__b126176264661214">OK</strong>.</p>
|
||||
</p></li><li id="ALM-12201__li7578531132216"><span>Check whether the alarm was generated during peak hours. If it was, expand the node capacity or contact the MRS cluster administrator to upgrade the CPU specifications.</span></li><li id="ALM-12201__li41709335173731"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12201__ul61824262173731"><li id="ALM-12201__li56699280173731">If yes, no further action is required.</li><li id="ALM-12201__li29238983173731">If no, go to <a href="#ALM-12201__li39839699173731">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12201__p19547451173731"><strong id="ALM-12201__b1709986317387">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-12201__ol56824812173810"><li id="ALM-12201__li39839699173731"><a name="ALM-12201__li39839699173731"></a><a name="li39839699173731"></a><span>On MRS Manager, choose <strong id="ALM-12201__b51824788561214">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12201__b151097751261214">Log</strong> > <strong id="ALM-12201__b199091625661214">Download</strong>.</span></li><li id="ALM-12201__li23012976173731"><span>Expand the <strong id="ALM-12201__b35229855661214">Service</strong> drop-down list, select <strong id="ALM-12201__b97569294261214">OmmServer</strong> for the target cluster, and click <strong id="ALM-12201__b125574938361214">OK</strong>.</span></li><li id="ALM-12201__li5790200173731"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12201__b180350874361214">Start Date</strong> and <strong id="ALM-12201__b8618195061214">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12201__b181002099461214">Download</strong>.</span></li><li id="ALM-12201__li66353041173731"><span>Contact <span id="ALM-12201__text126301214142412">O&M personnel</span> and provide the collected logs.</span></li></ol>
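<p>The log collection window in the step above (10 minutes before and after the alarm generation time) can be computed as follows. This is a minimal sketch for working out the values to enter in <strong>Start Date</strong> and <strong>End Date</strong>; the function name <code>log_collection_window</code> and its parameters are hypothetical and are not part of MRS Manager.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime, margin_minutes: int = 10) -> tuple[datetime, datetime]:
    """Return (start, end) covering margin_minutes before and after alarm_time.

    Hypothetical helper: it only does the date arithmetic and does not
    talk to MRS Manager.
    """
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin
```

<p>Using <code>datetime</code> arithmetic rather than editing the minute field by hand keeps the window correct when it crosses an hour or day boundary.</p>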
|
||||
</div>
|
||||
<div class="section" id="ALM-12201__section169311343318"><h4 class="sectiontitle"><span id="ALM-12201__text367020138593">Alarm Clearance</span></h4><p id="ALM-12201__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12201__section53362350"><h4 class="sectiontitle"><span id="ALM-12201__text1246242445916">Related Information</span></h4><p id="ALM-12201__p7522741"><span id="ALM-12201__text1881919412591">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
93
docs/mrs/umn/ALM-12202.html
Normal file
@ -0,0 +1,93 @@
|
||||
<a name="ALM-12202"></a><a name="ALM-12202"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12202 Process Memory Usage Exceeds the Threshold</h1>
|
||||
<div id="body0000002414056801"><div class="section" id="ALM-12202__s654199794cb646f5baa4518aefce49a3"><h4 class="sectiontitle"><span id="ALM-12202__text8183144004216">Alarm Description</span></h4><p id="ALM-12202__ab2a25e177015443ca3ded37167e5fc4d">The system checks the memory usage of main OMS processes every 30 seconds. This alarm is generated when the memory usage of main OMS processes is greater than 90% (default value) of the maximum memory.</p>
|
||||
<p id="ALM-12202__p10845144191216">This alarm is cleared when the memory usage of main OMS processes is less than or equal to 90% of the maximum memory.</p>
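<p>The boundary between the two conditions above can be illustrated with a short check. This is a sketch under stated assumptions: the function name <code>memory_alarm_state</code> and the byte-based inputs are hypothetical, and only the 90% default and the "greater than" versus "less than or equal to" comparison come from the description.</p>

```python
def memory_alarm_state(used_bytes: int, max_bytes: int, threshold_percent: float = 90.0) -> bool:
    """Return True if the alarm condition holds, False if the clear condition holds.

    Hypothetical helper: the alarm condition is usage strictly greater than
    the threshold; usage equal to the threshold counts as cleared.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    usage_percent = 100.0 * used_bytes / max_bytes
    return usage_percent > threshold_percent
```

<p>In this sketch, a process using exactly 90% of its maximum memory does not meet the alarm condition, while one using slightly more does.</p>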
|
||||
<div class="note" id="ALM-12202__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12202__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12202__sa26ae86d3dad41409f83a1377a9ffcfa"><h4 class="sectiontitle"><span id="ALM-12202__text817617154720">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12202__tcf229e81dd344017b6e4cffa8812ea38" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12202__r71b321b2dfa44544b9d38c31a7c564c0"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12202__a8c676edf82c34fe9ac00e771db46396a"><span id="ALM-12202__text1444220109553">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12202__a49e228ba3e9744a9ba62df793cb9f48a"><span id="ALM-12202__text126644116574">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12202__acf332948bf634701b2eb985488faaf8b"><span id="ALM-12202__text3482171635914">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12202__r97968a0e761c4d90b952c9bfc25f44f9"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12202__a441c0b910dff45a3822b47f0c38788b2">12202</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12202__p763318419313">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12202__a14026af7cc9948e494ecb783327d2acd">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12202__s8847e557c6b3453aaee9f6581c60c7f0"><h4 class="sectiontitle"><span id="ALM-12202__text113076815500">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12202__ta4b460384c754c91b30862b9ff824f4f" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12202__r99aa4ff1011848d48389427aebb04c06"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12202__p171301518194511"><span id="ALM-12202__text169109297467">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12202__aecb6aec3722f463da013cd6a9681d943"><span id="ALM-12202__text823417312010">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12202__ab6eca64948c8482f8161c84f93f75401"><span id="ALM-12202__text860714538113">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12202__row88669469128"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12202__p357045754617">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12202__p989615011559">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12202__p118966500554">Specifies the cluster for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12202__r6d7be8e0e35a4cf08993625fc62e4301"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12202__p41293795">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12202__afadd76ad17914fd18b2494f51b17997f">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12202__r5e5fd6c56f564e7ea629ac99dc22bcce"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12202__p23892775">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12202__ad6d4ca57de514e04b1eb96e905a2938f">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12202__r9a551a22222e4651b10174987c965499"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12202__p14847206">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12202__aa6ded27644dd42aba2f76e0ecd52a010">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12202__rb39a34fe9a5d406d8ffdb11e868ddecd"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12202__p65689564511">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12202__a22165862128b459a910b476586ac7149">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12202__a082c50ad862e4407b31c3d7e28fc781a">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12202__s7e65a524bbfd4a28ae8d0ad568b2d9bd"><h4 class="sectiontitle"><span id="ALM-12202__text104141775517">Impact on the System</span></h4><p id="ALM-12202__af87172183dc64942a222a190ec490a2f">If the memory usage of main OMS processes is too high, the performance of these processes deteriorates and memory overflow may even occur. As a result, main OMS processes may become unavailable, and OMS tasks run slowly or fail.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12202__s01cd2ae89bd34527b0a20a4ae96da722"><h4 class="sectiontitle"><span id="ALM-12202__text4611134585211">Possible Causes</span></h4><p id="ALM-12202__a5f14f3167e2848079676c82ff8ea6912">The memory usage of main OMS processes is too high or the memory is inappropriately allocated.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12202__section1152415311557"><h4 class="sectiontitle"><span id="ALM-12202__text16973885411">Handling Procedure</span></h4><p class="tableheading" id="ALM-12202__a10260c3a3ddf4e069f6222fd6e97959f"><strong id="ALM-12202__b587170513">Check the memory usage of main OMS processes.</strong></p>
|
||||
<ol id="ALM-12202__ol1825416321577"><li id="ALM-12202__li825463245714"><span>On MRS Manager, choose <strong id="ALM-12202__b1984316575213">O&M</strong> > <strong id="ALM-12202__b74514599215">Alarm</strong> > <strong id="ALM-12202__b47831601835">Alarms</strong>. In the alarm list, expand the alarm details, record the process name in <strong id="ALM-12202__b3935101417317">Location</strong>, click the reported host name, and record the service IP address of the host.</span></li><li id="ALM-12202__li22541232175718"><span>Choose <strong id="ALM-12202__b889111261333">System</strong> > <strong id="ALM-12202__b11374291318">OMS</strong> to view the <strong id="ALM-12202__b229553810310">OMS Process Memory Usage Ratio</strong> chart. Check whether the memory usage of the processes reaches the threshold (90% by default) at the time when the alarm is generated.</span><p><div class="p" id="ALM-12202__p187211915314">If no chart is available, click the drop-down arrow on the right, select <strong id="ALM-12202__b1815917537418">Customize</strong>, select the desired item, and click <strong id="ALM-12202__b41602533419">OK</strong>.<ul class="subitemlist" id="ALM-12202__ul111815243720"><li id="ALM-12202__li15181527373">If yes, go to <a href="#ALM-12202__li17254173220575">3</a>.</li><li id="ALM-12202__li11813253710">If the threshold is not reached, go to <a href="#ALM-12202__li17840184055712">6</a>.</li></ul>
|
||||
</div>
|
||||
</p></li><li id="ALM-12202__li17254173220575"><a name="ALM-12202__li17254173220575"></a><a name="li17254173220575"></a><span>Contact <span id="ALM-12202__text99023406416">O&M personnel</span> to modify the memory configurations of the processes.</span></li><li id="ALM-12202__li34811632712"><span>Restart the processes for which the alarm is generated.</span></li><li id="ALM-12202__li18254133295717"><span>Check whether the alarm is cleared in 10 minutes.</span><p><ul class="subitemlist" id="ALM-12202__ul35432427526"><li id="ALM-12202__li1654344217527">If yes, no further action is required.</li><li id="ALM-12202__li0544154235215">If no, go to <a href="#ALM-12202__li17840184055712">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-12202__a5f67c845ae194d15b4e5da41d8afcf80"><strong id="ALM-12202__b153101030184412">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-12202__ol6840174012570"><li id="ALM-12202__li17840184055712"><a name="ALM-12202__li17840184055712"></a><a name="li17840184055712"></a><span>On MRS Manager, choose <strong id="ALM-12202__b19293175818520">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12202__b1829355813515">Log</strong> > <strong id="ALM-12202__b2029445816515">Download</strong>.</span></li><li id="ALM-12202__li584084075713"><span>Expand the <strong id="ALM-12202__b7957961265">Service</strong> drop-down list, and select <strong id="ALM-12202__b18958166765">OmmServer</strong> for the target cluster.</span></li><li id="ALM-12202__li28401340155713"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12202__b032682417613">Start Date</strong> and <strong id="ALM-12202__b2032618245618">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12202__b15327122410615">Download</strong>.</span></li><li id="ALM-12202__li15840104065712"><span>Contact <span id="ALM-12202__text89761227462">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-12202__section16897185085520"><h4 class="sectiontitle"><span id="ALM-12202__text134681322839">Alarm Clearance</span></h4><p id="ALM-12202__p1889715508558">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12202__s9121af30e9174ff4a8ea197579ce835d"><h4 class="sectiontitle"><span id="ALM-12202__text1951416381860">Related Information</span></h4><p id="ALM-12202__ab49c3640168848f38fc60ef20476b006"><span id="ALM-12202__text1230461819">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
93
docs/mrs/umn/ALM-12203.html
Normal file
@ -0,0 +1,93 @@
|
||||
<a name="ALM-12203"></a><a name="ALM-12203"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12203 Process Full GC Duration Exceeds the Threshold</h1>
|
||||
<div id="body0000002380457460"><div class="section" id="ALM-12203__section35798275"><h4 class="sectiontitle"><span id="ALM-12203__text265103912375">Alarm Description</span></h4><p id="ALM-12203__p1118882117462">The system checks the GC duration of main OMS processes every 30 seconds. This alarm is generated when the GC duration of an OMS process exceeds the threshold three consecutive times. To change the threshold, choose <strong id="ALM-12203__b453418101192">O&M</strong> > <strong id="ALM-12203__b152687121992">Alarm</strong> > <strong id="ALM-12203__b164092055857">Thresholds</strong> > <strong id="ALM-12203__b331609619">OMS</strong> > <strong id="ALM-12203__b1587115212611">OMSServices</strong>.</p>
|
||||
<p id="ALM-12203__p6121288">This alarm is cleared when the GC duration of the OMS process is shorter than or equal to the threshold.</p>
|
||||
<div class="note" id="ALM-12203__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12203__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12203__section53749019"><h4 class="sectiontitle"><span id="ALM-12203__text7107134283715">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12203__table26062329" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12203__row59129055"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12203__p24724130"><span id="ALM-12203__text9351846183717">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12203__p56497485"><span id="ALM-12203__text192344883719">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12203__p12893577"><span id="ALM-12203__text22815522375">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12203__row37746813"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12203__p37593042">12203</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12203__p25137584">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12203__p22878398">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12203__section13979128"><h4 class="sectiontitle"><span id="ALM-12203__text17202105711376">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12203__table41210916" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12203__row28890097"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12203__p1789332383017"><span id="ALM-12203__text19218152013114">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12203__p58396496"><span id="ALM-12203__text125119019384">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12203__p32495724"><span id="ALM-12203__text9402629389">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12203__row18994184913243"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12203__p49971256314">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12203__p156438591896">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12203__p187931338134115">Specifies the cluster for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12203__row14907948"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12203__p65062640">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12203__p33433017">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12203__row32461705"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12203__p35626567">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12203__p44825864">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12203__row779592"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12203__p51620924">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12203__p14632331">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12203__row1016518552460"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12203__p88947231308">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12203__p57854422">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12203__p55696635">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12203__section58703289"><h4 class="sectiontitle"><span id="ALM-12203__text08997593815">Impact on the System</span></h4><p id="ALM-12203__p44368143">Read and write performance deteriorates. As a result, task execution may slow down, and the service may even restart unexpectedly.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12203__section58567561"><h4 class="sectiontitle"><span id="ALM-12203__text69658819380">Possible Causes</span></h4><p id="ALM-12203__p37049824">The memory usage of main OMS processes is too high or the memory is inappropriately allocated, causing frequent full GCs.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12203__section168711057121616"><h4 class="sectiontitle"><span id="ALM-12203__text145340111381">Handling Procedure</span></h4><p class="tableheading" id="ALM-12203__p48245733"><strong id="ALM-12203__b91730805661232">Check the GC duration.</strong></p>
|
||||
<ol id="ALM-12203__ol24110607142341"><li id="ALM-12203__li4291378163548"><span>On MRS Manager, choose <strong id="ALM-12203__b17615014141214">O&M</strong> > <strong id="ALM-12203__b161531413129">Alarm</strong> > <strong id="ALM-12203__b136168142126">Alarms</strong>. In the alarm list, expand the alarm details, record the process name in <strong id="ALM-12203__b5616171410128">Location</strong>, click the reported host name, and record the service IP address of the host.</span></li><li id="ALM-12203__li53865803163548"><span>Choose <strong id="ALM-12203__b540515197126">System</strong> > <strong id="ALM-12203__b13292121161215">OMS</strong>, view the Full GC Times of OMS Process chart, and check whether the GC time is longer than 12 seconds (default value).</span><p><div class="p" id="ALM-12203__p187211915314">If no chart is available, click the drop-down arrow on the right, select <strong id="ALM-12203__b200081056161232">Customize</strong>, select the desired item, and click <strong id="ALM-12203__b31604493961232">OK</strong>.<ul class="subitemlist" id="ALM-12203__ul111815243720"><li id="ALM-12203__li15181527373">If yes, go to <a href="#ALM-12203__li17254173220575">3</a>.</li><li id="ALM-12203__li11813253710">If no, go to <a href="#ALM-12203__li24184344163548">6</a>.</li></ul>
|
||||
</div>
|
||||
</p></li></ol><ol start="3" id="ALM-12203__ol22352282142356"><li id="ALM-12203__li17254173220575"><a name="ALM-12203__li17254173220575"></a><a name="li17254173220575"></a><span>Contact <span id="ALM-12203__text6386121137">O&M personnel</span> to modify the memory configurations of the processes.</span></li><li id="ALM-12203__li18585192918588"><span>Restart the process.</span></li><li id="ALM-12203__li62339472163548"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12203__ul18996721163548"><li id="ALM-12203__li27642875163548">If yes, no further action is required.</li><li id="ALM-12203__li24480368163548">If no, go to <a href="#ALM-12203__li24184344163548">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12203__p36752769163548"><strong id="ALM-12203__b126317185133">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-12203__ol29820692163654"><li id="ALM-12203__li24184344163548"><a name="ALM-12203__li24184344163548"></a><a name="li24184344163548"></a><span>On MRS Manager, choose <strong id="ALM-12203__b15560420181313">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12203__b8560142061317">Log</strong> > <strong id="ALM-12203__b1256142071310">Download</strong>.</span></li><li id="ALM-12203__li16332504163548"><span>Expand the <strong id="ALM-12203__b1582232471314">Service</strong> drop-down list, and select <strong id="ALM-12203__b18823524131316">OmmServer</strong> for the target cluster.</span></li><li id="ALM-12203__li12774810163548"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12203__b278742771311">Start Date</strong> and <strong id="ALM-12203__b1078702718134">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-12203__b1378732731317">Download</strong>.</span></li><li id="ALM-12203__li47864432163548"><span>Contact <span id="ALM-12203__text126301214142412">O&M personnel</span> and provide the collected logs.</span></li></ol>
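<p>The log collection window described in the preceding steps can be worked out as in the following minimal sketch. The function name <strong>log_window</strong> and the sample alarm time are hypothetical and used only for illustration. They are not part of MRS Manager.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
start, end = log_window(datetime(2025, 8, 6, 14, 5, 0))
print(start.isoformat(sep=" "), "->", end.isoformat(sep=" "))
```

<p>For an alarm generated at 14:05, this gives a start time of 13:55 and an end time of 14:15, which are the values to enter as the start and end of log collection.</p>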
|
||||
</div>
|
||||
<div class="section" id="ALM-12203__section169311343318"><h4 class="sectiontitle"><span id="ALM-12203__text182381117183819">Alarm Clearance</span></h4><p id="ALM-12203__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12203__section46352032"><h4 class="sectiontitle"><span id="ALM-12203__text4952151973819">Related Information</span></h4><p id="ALM-12203__p38036089"><span id="ALM-12203__text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
98
docs/mrs/umn/ALM-12204.html
Normal file
@ -0,0 +1,98 @@
|
||||
<a name="ALM-12204"></a><a name="ALM-12204"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12204 Wait Duration of a Disk Read Exceeds the Threshold</h1>
|
||||
<div id="body0000002413936965"><div class="section" id="ALM-12204__section118513241113"><h4 class="sectiontitle"><span id="ALM-12204__text6861824414">Alarm Description</span></h4><p id="ALM-12204__p17860241315">The system checks the wait duration of a disk read every 30 seconds and compares the actual wait duration with the threshold. This alarm is generated when the wait duration exceeds the threshold (10s by default) multiple consecutive times.</p>
|
||||
<p id="ALM-12204__p68652414113">This alarm is cleared when the wait duration is less than or equal to the threshold.</p>
|
||||
<div class="note" id="ALM-12204__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12204__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
|
||||
</div>
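<p>The trigger and clear behavior described above can be sketched as follows. This is an illustrative model only, not the MRS implementation. The class name <strong>ThresholdAlarm</strong> and the trigger count of 3 are assumptions, and the threshold of 10 mirrors the default stated above.</p>

```python
from dataclasses import dataclass


@dataclass
class ThresholdAlarm:
    """Tracks consecutive threshold violations for one metric (illustrative only)."""

    threshold: float      # same unit as the sampled wait duration
    trigger_count: int    # assumed: consecutive violations needed to raise the alarm
    consecutive: int = 0
    active: bool = False

    def observe(self, wait_duration: float) -> bool:
        """Feed one periodic sample and return whether the alarm is active."""
        if wait_duration > self.threshold:
            self.consecutive += 1
            if self.consecutive >= self.trigger_count:
                self.active = True
        else:
            # A sample at or below the threshold clears the alarm and resets the streak.
            self.consecutive = 0
            self.active = False
        return self.active


# Hypothetical samples, one per 30-second check.
alarm = ThresholdAlarm(threshold=10.0, trigger_count=3)
states = [alarm.observe(s) for s in [12.0, 11.0, 9.0, 15.0, 15.0, 15.0, 10.0]]
print(states)
```

<p>In this sketch, the alarm becomes active only on the third consecutive sample above the threshold and is cleared by the final sample, which is equal to the threshold.</p>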
|
||||
<div class="section" id="ALM-12204__section158616245115"><h4 class="sectiontitle"><span id="ALM-12204__text086024218">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12204__table28652410119" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12204__row98618241212"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12204__p16866241117"><span id="ALM-12204__text158620245120">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12204__p138615244117"><span id="ALM-12204__text158614241113">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12204__p386324016"><span id="ALM-12204__text386172414118">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12204__row11867241210"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12204__p118672419117">12204</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12204__p11863241614">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12204__p1986102417115">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12204__section12888241312"><h4 class="sectiontitle"><span id="ALM-12204__text12883241212">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12204__table188324911" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12204__row88802416118"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12204__p9881241214"><span id="ALM-12204__text168812241214">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12204__p9881524711"><span id="ALM-12204__text6881724016">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12204__p1288224115"><span id="ALM-12204__text1988192417114">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12204__row78852416112"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12204__p1288162419117">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12204__p2885245113">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12204__p8883244117">Specifies the cluster or system for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12204__row1788224415"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12204__p28816246115">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12204__p108816240115">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12204__row888202414114"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12204__p4888249111">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12204__p17887241816">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12204__row20887247116"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12204__p12885243110">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12204__p9881024911">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12204__row1882024611"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12204__p2088192410115">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12204__p1288324616">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12204__p148862419114">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12204__section788172412113"><h4 class="sectiontitle"><span id="ALM-12204__text108812247114">Impact on the System</span></h4><ul id="ALM-12204__ul48813241511"><li id="ALM-12204__li9881724713">Latency: Service processes may run slowly, resulting in increased latency.</li><li id="ALM-12204__li208862419120">Service failure: Service processing may be slow, time out, or fail. As a result, jobs may fail to run.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12204__section198914249112"><h4 class="sectiontitle"><span id="ALM-12204__text18917241317">Possible Causes</span></h4><ul id="ALM-12204__ul1389192413116"><li id="ALM-12204__li48915244110">The alarm threshold or alarm trigger count is improperly configured.</li><li id="ALM-12204__li78914241914">The disk configuration cannot meet service requirements, and the disk I/O performance has reached its upper limit. Alternatively, services are in peak hours, and the wait duration of a disk read reaches the upper limit within a short period.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12204__section68910249114"><h4 class="sectiontitle"><span id="ALM-12204__text158922411111">Handling Procedure</span></h4><p class="tableheading" id="ALM-12204__p389624217"><strong id="ALM-12204__b5898241914">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
|
||||
<ol id="ALM-12204__ol1189824717"><li id="ALM-12204__li58992418110"><span>Modify the alarm threshold and alarm trigger count based on the actual disk I/O usage.</span><p><ol type="a" id="ALM-12204__ol13162521162515"><li class="litext" id="ALM-12204__li1216210218257">Log in to MRS Manager and choose <strong id="ALM-12204__b194112541610">O&M</strong> > <strong id="ALM-12204__b11602122601613">Alarm</strong> > <strong id="ALM-12204__b1843435315613">Thresholds</strong> > <em id="ALM-12204__i3722125611616">Name of the desired cluster</em> > <strong id="ALM-12204__b47081659269">Host</strong> > <strong id="ALM-12204__b394816112710">Disk</strong> > <strong id="ALM-12204__b8934226101618">Average Time Required for Each Read Operation</strong>.</li><li class="litext" id="ALM-12204__li18429223192513">Click the edit button next to <strong id="ALM-12204__b163643463461241">Trigger Count</strong> to set it to a proper value based on the actual service usage.<div class="note" id="ALM-12204__note38913241212"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p class="text" id="ALM-12204__p989102411115"><strong id="ALM-12204__b121998861561241">Trigger Count</strong> indicates the number of consecutive times the threshold must be reached before the alarm is triggered.</p>
|
||||
</div></div>
|
||||
</li><li class="litext" id="ALM-12204__li1855212543258">Click <strong id="ALM-12204__b114290508561241">Modify</strong> in the <strong id="ALM-12204__b19744384461241">Operation</strong> column of the row that contains the rule and change the alarm threshold.</li></ol>
|
||||
</p></li><li id="ALM-12204__li17892241618"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul class="subitemlist" id="ALM-12204__ul14899248112"><li id="ALM-12204__li98952419118">If yes, no further action is required.</li><li id="ALM-12204__li108922416114">If no, go to <a href="#ALM-12204__li15891424513">3</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12204__p489102416113"><strong id="ALM-12204__b7575164519172">Check whether the average time required for each read operation reaches the upper limit.</strong></p>
|
||||
<ol start="3" id="ALM-12204__ol1189192417113"><li id="ALM-12204__li15891424513"><a name="ALM-12204__li15891424513"></a><a name="li15891424513"></a><span>On MRS Manager, choose <strong id="ALM-12204__b79324244175">O&M</strong> > <strong id="ALM-12204__b10933122417179">Alarm</strong> > <strong id="ALM-12204__b20934524181713">Alarms</strong>. In the alarm list, expand the alarm details and click the name of the host for which the alarm is generated in the <strong id="ALM-12204__b1293512411714">Location</strong> area.</span></li><li id="ALM-12204__li28914248113"><span>On the overview page of the host, observe the real-time data of the average time required for each read operation for about 5 minutes. If the wait duration exceeds the threshold multiple times, contact the MRS cluster administrator to upgrade the disk specifications.</span><p><p id="ALM-12204__p187211915314">If the <strong id="ALM-12204__b1497552612199">Average Time Required for Each Read Operation</strong> chart is unavailable, click the drop-down arrow on the right, select <strong id="ALM-12204__b144191836151916">Customize</strong>, select the corresponding item, and click <strong id="ALM-12204__b2031323811196">OK</strong>.</p>
|
||||
</p></li><li id="ALM-12204__li58972411117"><span>Check whether the alarm was generated during peak hours. If it was, expand the node capacity or contact the MRS cluster administrator to upgrade the disk specifications.</span></li><li id="ALM-12204__li3891246115"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12204__ul108913248118"><li id="ALM-12204__li88911244116">If yes, no further action is required.</li><li id="ALM-12204__li19897244119">If no, go to <a href="#ALM-12204__li289102416117">7</a>.</li></ul>
|
||||
</p></li></ol>
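<p>To cross-check the chart from the host itself, the average read wait over an interval can be estimated from two snapshots of the Linux <strong>/proc/diskstats</strong> file, which reports cumulative reads completed and milliseconds spent reading per device. The following is a minimal sketch. Whether MRS Manager derives its chart from the same counters is an assumption, and the helper names and sample snapshots are made up for illustration.</p>

```python
from typing import Dict, Tuple


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int]]:
    """Map device name -> (reads_completed, ms_spent_reading) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        # Columns: major, minor, name, reads completed, reads merged,
        # sectors read, ms spent reading, ...
        stats[fields[2]] = (int(fields[3]), int(fields[6]))
    return stats


def avg_read_wait_ms(before: str, after: str, device: str) -> float:
    """Average milliseconds per completed read between two snapshots."""
    reads0, ms0 = parse_diskstats(before)[device]
    reads1, ms1 = parse_diskstats(after)[device]
    completed = reads1 - reads0
    return (ms1 - ms0) / completed if completed else 0.0


# Fabricated snapshots for a hypothetical device "sda" (not real measurements).
sample_t0 = "   8       0 sda 1000 0 80000 5000 2000 0 160000 9000 0 7000 14000"
sample_t1 = "   8       0 sda 1200 0 96000 9000 2100 0 168000 9500 0 7600 18500"
wait = avg_read_wait_ms(sample_t0, sample_t1, "sda")
print(wait)
```

<p>With these sample values, 200 reads complete in 4000 ms of read time, giving an average wait of 20.0 ms per read. On a real host, the two snapshots would be read from <strong>/proc/diskstats</strong> a fixed interval apart.</p>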
|
||||
<p class="tableheading" id="ALM-12204__p1989192418113"><strong id="ALM-12204__b68912419116">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-12204__ol289152414119"><li id="ALM-12204__li289102416117"><a name="ALM-12204__li289102416117"></a><a name="li289102416117"></a><span>On MRS Manager, choose <strong id="ALM-12204__b1717510579339">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12204__b517685712331">Log</strong> > <strong id="ALM-12204__b17176115733313">Download</strong>.</span></li><li id="ALM-12204__li18982416120"><span>Expand the <strong id="ALM-12204__b8788068161241">Service</strong> drop-down list, select <strong id="ALM-12204__b25653840461241">NodeAgent</strong> for the target cluster, and click <strong id="ALM-12204__b96041276461241">OK</strong>.</span></li><li id="ALM-12204__li989724817"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12204__b110421531361241">Start Date</strong> and <strong id="ALM-12204__b199639447061241">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-12204__b59968104861241">Download</strong>.</span></li><li id="ALM-12204__li168914245110"><span>Contact <span id="ALM-12204__text3901424916">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-12204__section99011241715"><h4 class="sectiontitle"><span id="ALM-12204__text1290102413116">Alarm Clearance</span></h4><p id="ALM-12204__p1909244116">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12204__section1890132416119"><h4 class="sectiontitle"><span id="ALM-12204__text1590192416114">Related Information</span></h4><p id="ALM-12204__p490162412113"><span id="ALM-12204__text11905241310">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
98
docs/mrs/umn/ALM-12205.html
Normal file
@ -0,0 +1,98 @@
|
||||
<a name="ALM-12205"></a><a name="ALM-12205"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12205 Wait Duration of a Disk Write Exceeds the Threshold</h1>
|
||||
<div id="body0000002380297584"><div class="section" id="ALM-12205__section118513241113"><h4 class="sectiontitle"><span id="ALM-12205__text6861824414">Alarm Description</span></h4><p id="ALM-12205__p17860241315">The system checks the wait duration of a disk write every 30 seconds and compares the actual wait duration with the threshold. This alarm is generated when the wait duration exceeds the threshold (10s by default) multiple consecutive times.</p>
|
||||
<p id="ALM-12205__p68652414113">This alarm is cleared when the wait duration is less than or equal to the threshold.</p>
|
||||
<div class="note" id="ALM-12205__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12205__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12205__section158616245115"><h4 class="sectiontitle"><span id="ALM-12205__text086024218">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12205__table28652410119" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12205__row98618241212"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12205__p16866241117"><span id="ALM-12205__text158620245120">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12205__p138615244117"><span id="ALM-12205__text158614241113">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12205__p386324016"><span id="ALM-12205__text386172414118">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12205__row11867241210"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12205__p118672419117">12205</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12205__p11863241614">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12205__p1986102417115">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12205__section12888241312"><h4 class="sectiontitle"><span id="ALM-12205__text12883241212">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12205__table188324911" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12205__row88802416118"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12205__p9881241214"><span id="ALM-12205__text168812241214">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12205__p9881524711"><span id="ALM-12205__text6881724016">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12205__p1288224115"><span id="ALM-12205__text1988192417114">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12205__row78852416112"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12205__p1288162419117">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12205__p2885245113">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12205__p8883244117">Specifies the cluster or system for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12205__row1788224415"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12205__p28816246115">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12205__p108816240115">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12205__row888202414114"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12205__p4888249111">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12205__p17887241816">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12205__row20887247116"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12205__p12885243110">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12205__p9881024911">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12205__row1882024611"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12205__p2088192410115">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12205__p1288324616">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12205__p148862419114">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12205__section788172412113"><h4 class="sectiontitle"><span id="ALM-12205__text108812247114">Impact on the System</span></h4><ul id="ALM-12205__ul48813241511"><li id="ALM-12205__li9881724713">Latency: Service processes may run slowly, resulting in increased latency.</li><li id="ALM-12205__li208862419120">Service failure: Service processing may be slow, time out, or fail. As a result, jobs may fail to run.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12205__section198914249112"><h4 class="sectiontitle"><span id="ALM-12205__text18917241317">Possible Causes</span></h4><ul id="ALM-12205__ul1389192413116"><li id="ALM-12205__li48915244110">The alarm threshold or alarm trigger count is improperly configured.</li><li id="ALM-12205__li78914241914">The disk configuration cannot meet service requirements, and the disk I/O performance has reached its upper limit. Alternatively, services are in peak hours, and the wait duration of a disk write reaches the upper limit within a short period.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-12205__section68910249114"><h4 class="sectiontitle"><span id="ALM-12205__text158922411111">Handling Procedure</span></h4><p class="tableheading" id="ALM-12205__p389624217"><strong id="ALM-12205__b5898241914">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
|
||||
<ol id="ALM-12205__ol1189824717"><li id="ALM-12205__li58992418110"><span>Modify the alarm threshold and alarm trigger count based on the actual disk I/O usage.</span><p><ol type="a" id="ALM-12205__ol13162521162515"><li class="litext" id="ALM-12205__li1216210218257">Log in to MRS Manager and choose <strong id="ALM-12205__b871764718121">O&M</strong> > <strong id="ALM-12205__b551844920121">Alarm</strong> > <strong id="ALM-12205__b6420155411219">Thresholds</strong> > <em id="ALM-12205__i11415165716128">Name of the desired cluster</em> > <strong id="ALM-12205__b195541013139">Host</strong> > <strong id="ALM-12205__b153572214138">Disk</strong> > <strong id="ALM-12205__b61117162915">Average Time Required for Each Write Operation</strong>.</li><li class="litext" id="ALM-12205__li18429223192513">Click the edit button next to <strong id="ALM-12205__b58712236861223">Trigger Count</strong> to set it to a proper value based on the actual service usage.<div class="note" id="ALM-12205__note38913241212"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p class="text" id="ALM-12205__p989102411115"><strong id="ALM-12205__b163681063861223">Trigger Count</strong> indicates the number of consecutive times the threshold must be reached before the alarm is triggered.</p>
|
||||
</div></div>
|
||||
</li><li class="litext" id="ALM-12205__li1855212543258">Click <strong id="ALM-12205__b124442631561223">Modify</strong> in the <strong id="ALM-12205__b150246629361223">Operation</strong> column of the row that contains the rule and change the alarm threshold.</li></ol>
|
||||
</p></li><li id="ALM-12205__li17892241618"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul class="subitemlist" id="ALM-12205__ul14899248112"><li id="ALM-12205__li98952419118">If yes, no further action is required.</li><li id="ALM-12205__li108922416114">If no, go to <a href="#ALM-12205__li15891424513">3</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12205__p489102416113"><strong id="ALM-12205__b11420754131710">Check whether the average time required for each write operation reaches the upper limit.</strong></p>
|
||||
<ol start="3" id="ALM-12205__ol1189192417113"><li id="ALM-12205__li15891424513"><a name="ALM-12205__li15891424513"></a><a name="li15891424513"></a><span>On MRS Manager, choose <strong id="ALM-12205__b154911884261223">O&M</strong> > <strong id="ALM-12205__b177169868961223">Alarm</strong> > <strong id="ALM-12205__b149933576861223">Alarms</strong>. In the alarm list, expand the alarm details and click the name of the host for which the alarm is generated in the <strong id="ALM-12205__b27991774261223">Location</strong> area.</span></li><li id="ALM-12205__li28914248113"><span>On the overview page of the host, observe the real-time data of the average time required for each write operation for about 5 minutes. If the wait duration exceeds the threshold multiple times, contact the MRS cluster administrator to upgrade the disk specifications.</span><p><p id="ALM-12205__p187211915314">If the <strong id="ALM-12205__b1874116327915">Average Time Required for Each Write Operation</strong> chart is not displayed, click the drop-down arrow on the right, select <strong id="ALM-12205__b10366145416912">Customize</strong>, select the desired item, and click <strong id="ALM-12205__b111557321018">OK</strong>.</p>
|
||||
</p></li><li id="ALM-12205__li58972411117"><span>Check whether the alarm was generated during peak hours. If it was, expand the node capacity or contact the MRS cluster administrator to upgrade the disk specifications.</span></li><li id="ALM-12205__li3891246115"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12205__ul108913248118"><li id="ALM-12205__li88911244116">If yes, no further action is required.</li><li id="ALM-12205__li19897244119">If no, go to <a href="#ALM-12205__li289102416117">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12205__p1989192418113"><strong id="ALM-12205__b68912419116">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-12205__ol289152414119"><li id="ALM-12205__li289102416117"><a name="ALM-12205__li289102416117"></a><a name="li289102416117"></a><span>On MRS Manager, choose <strong id="ALM-12205__b727103173412">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12205__b5278353411">Log</strong> > <strong id="ALM-12205__b192793133412">Download</strong>.</span></li><li id="ALM-12205__li18982416120"><span>Expand the <strong id="ALM-12205__b161486186861223">Service</strong> drop-down list, select <strong id="ALM-12205__b180207094561223">NodeAgent</strong> for the target cluster, and click <strong id="ALM-12205__b202669836561223">OK</strong>.</span></li><li id="ALM-12205__li989724817"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12205__b140354476961223">Start Date</strong> and <strong id="ALM-12205__b47274985761223">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-12205__b48958483361223">Download</strong>.</span></li><li id="ALM-12205__li168914245110"><span>Contact <span id="ALM-12205__text3901424916">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-12205__section99011241715"><h4 class="sectiontitle"><span id="ALM-12205__text1290102413116">Alarm Clearance</span></h4><p id="ALM-12205__p1909244116">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12205__section1890132416119"><h4 class="sectiontitle"><span id="ALM-12205__text1590192416114">Related Information</span></h4><p id="ALM-12205__p490162412113"><span id="ALM-12205__text11905241310">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
81
docs/mrs/umn/ALM-12206.html
Normal file
@ -0,0 +1,81 @@
|
||||
<a name="ALM-12206"></a><a name="ALM-12206"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-12206 Password Has Expired</h1>
|
||||
<div id="body0000002414056797"><div class="section" id="ALM-12206__section60313499"><h4 class="sectiontitle"><span id="ALM-12206__text164311244911">Alarm Description</span></h4><p id="ALM-12206__p1212203510114">At 1:00 a.m. every day, the system checks whether any user password has expired. This alarm is generated when a user password has expired.</p>
|
||||
<p id="ALM-12206__p371028104017">This alarm is cleared when the user password in the system is within the validity period.</p>
|
||||
<div class="note" id="ALM-12206__note108112448549"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12206__p68111446548">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
</div></div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12206__section5950580"><h4 class="sectiontitle"><span id="ALM-12206__text4431134419113">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12206__table15548096" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12206__row49989141"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12206__p5431124414118"><span id="ALM-12206__text164315441716">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12206__p543118441210"><span id="ALM-12206__text5431144415112">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12206__p94311444118"><span id="ALM-12206__text7431104415112">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12206__row30415758"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12206__p052373115328">12206</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12206__p11522631133214">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12206__p4528550">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12206__section53555227"><h4 class="sectiontitle"><span id="ALM-12206__text134312441118">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12206__table31268239" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12206__row59179380"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-12206__p313918575184"><span id="ALM-12206__text23739193194">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-12206__p184311944115"><span id="ALM-12206__text543117441219">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-12206__p184328441110"><span id="ALM-12206__text4432204415119">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-12206__row12465939134110"><td class="cellrowborder" rowspan="2" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12206__p675011219199">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12206__p17935380415">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12206__p187931338134115">Specifies the cluster or system for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12206__row48724307"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12206__p54354790">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12206__p40661878">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-12206__row19278171612917"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-12206__p2226161952215">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-12206__p32262019192217">Details</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-12206__p1222681911222">Specifies the name of the user whose password has expired.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-12206__section12235000"><h4 class="sectiontitle"><span id="ALM-12206__text1343217441218">Impact on the System</span></h4><p id="ALM-12206__p12150749154618">The account whose password has expired cannot be used.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12206__section43006140"><h4 class="sectiontitle"><span id="ALM-12206__text74322441910">Possible Causes</span></h4><p id="ALM-12206__p15323151284712">The user password has expired.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12206__section8793187020"><h4 class="sectiontitle"><span id="ALM-12206__text543211444114">Handling Procedure</span></h4><p id="ALM-12206__p158951940339"><strong id="ALM-12206__b39813481312">Change the user password.</strong></p>
|
||||
<ol id="ALM-12206__ol805769017382"><li id="ALM-12206__li91811959104819"><span>Log in to MRS Manager and choose <strong id="ALM-12206__b626212816717">O&M</strong> > <strong id="ALM-12206__b182631528873">Alarm</strong> > <strong id="ALM-12206__b162635281974">Alarms</strong>. In the alarm list, expand the alarm details, and view and record the name of the user whose password has expired in additional information.</span></li><li id="ALM-12206__li1312218213019"><span>Change the user password that has expired.</span></li><li id="ALM-12206__li868173811516"><span>If the DataArts Studio service is interconnected, check whether DataArts Studio jobs use an expired user password. If yes, go to the DataArts Studio management center to change the password and execute the affected jobs again.</span></li><li id="ALM-12206__li9879173695916"><span>Check whether the alarm is automatically cleared after 1:00 a.m. the next day.</span><p><ul class="subitemlist" id="ALM-12206__ul229363919451"><li id="ALM-12206__li32932039174514">If yes, no further action is required.</li><li id="ALM-12206__li12293183944516">If no, go to <a href="#ALM-12206__li39839699173731">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-12206__p19547451173731"><strong id="ALM-12206__b1709986317387">Collect fault information.</strong></p>
|
||||
<ol start="5" id="ALM-12206__ol56824812173810"><li id="ALM-12206__li39839699173731"><a name="ALM-12206__li39839699173731"></a><a name="li39839699173731"></a><span>On MRS Manager, choose <strong id="ALM-12206__b751218633417">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-12206__b185125673411">Log</strong> > <strong id="ALM-12206__b75120643411">Download</strong>.</span></li><li id="ALM-12206__li23012976173731"><span>Select <strong id="ALM-12206__b18691993653422">Controller</strong> for <strong id="ALM-12206__b14412814663422">Service</strong> and click <strong id="ALM-12206__b960717653422">OK</strong>.</span></li><li id="ALM-12206__li5790200173731"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-12206__b10258321483422">Start Date</strong> and <strong id="ALM-12206__b13268990423422">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12206__b12217358133422">Download</strong>.</span></li><li id="ALM-12206__li66353041173731"><span>Contact <span id="ALM-12206__text1643218448114">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-12206__section743216441318"><h4 class="sectiontitle"><span id="ALM-12206__text204325441618">Alarm Clearance</span></h4><p id="ALM-12206__p1543274413111">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-12206__section04326448115"><h4 class="sectiontitle"><span id="ALM-12206__text1443214448115">Related Information</span></h4><p id="ALM-12206__p194329447110"><span id="ALM-12206__text943211442016">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
112
docs/mrs/umn/ALM-12207.html
Normal file
File diff suppressed because it is too large
@ -60,13 +60,13 @@
|
||||
<div class="section" id="ALM-14031__section64548988"><h4 class="sectiontitle"><span id="ALM-14031__text12656240135813">Possible Causes</span></h4><p id="ALM-14031__p8207814181819">The host responds slowly to I/O (disk I/O and network I/O) requests and some processes are in the D state and Z state. The process may also be suspended and enter the T state.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14031__section770654563320"><h4 class="sectiontitle"><span id="ALM-14031__text19569135285811">Handling Procedure</span></h4><p id="ALM-14031__p1243515278455"><strong id="ALM-14031__b1655484819527">Check whether the process is in the D, Z, or T state.</strong></p>
|
||||
<ol id="ALM-14031__ol8805715143410"><li id="ALM-14031__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14031__b1530417210108">O&M</strong> > <strong id="ALM-14031__b664215411018">Alarm</strong> > <strong id="ALM-14031__b63760791011">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14031__ul10505203319910"><li id="ALM-14031__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14031__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14031__li162831544134616">2</a>.</li></ul>
|
||||
<ol id="ALM-14031__ol8805715143410"><li id="ALM-14031__li1980611196816"><span>Log in to MRS Manager and choose <strong id="ALM-14031__b1530417210108">O&M</strong> > <strong id="ALM-14031__b664215411018">Alarm</strong> > <strong id="ALM-14031__b63760791011">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14031__ul10505203319910"><li id="ALM-14031__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14031__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14031__li162831544134616">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-14031__li162831544134616"><a name="ALM-14031__li162831544134616"></a><a name="li162831544134616"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14031__b1578713318414">root</strong> user and run the <strong id="ALM-14031__b131521842151211">su - omm</strong> command to switch to the <strong id="ALM-14031__b133931244201216">omm</strong> user.</span></li><li id="ALM-14031__li129386734811"><span>Run the following command to check the process state:</span><p><p id="ALM-14031__p114995439534"><strong id="ALM-14031__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.server.datanode.DataNode | grep -v grep | awk '{print$1}'</strong></p>
|
||||
</p></li><li id="ALM-14031__li0510123385319"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14031__ul161804819579"><li id="ALM-14031__li161818483576">If the output contains any abnormal state, go to <a href="#ALM-14031__li39471558560">5</a>.</li><li id="ALM-14031__li1661854818575">If the output does not contain abnormal states, go to <a href="#ALM-14031__li14805191513412">7</a>.</li></ul>
|
||||
</p></li><li id="ALM-14031__li39471558560"><a name="ALM-14031__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14031__b94993490139">root</strong> and run the <strong id="ALM-14031__b9500154991318">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14031__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14031__ul19652752195618"><li id="ALM-14031__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14031__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14031__li14805191513412">7</a>.</li></ul>
|
||||
</p></li></ol>
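The state check in steps 3 and 4 can be sketched as a small filter. This is a minimal illustration, assuming a POSIX shell; the function name `abnormal_states` is hypothetical and not part of MRS. It relies only on the fact that the first character of each `ps` STAT value is the process state, so any value beginning with D, Z, or T is treated as abnormal.

```shell
#!/bin/sh
# Hypothetical helper (not part of MRS): reads ps STAT values on stdin,
# one per line, and prints only those whose first character is D, Z, or T.
abnormal_states() {
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # being treated as a failure by callers running under "set -e".
    grep -E '^[DZT]' || true
}
```

One possible use is to append `| abnormal_states` to the command in step 3: empty output would then correspond to "no abnormal states" (go to step 7), and any output to an abnormal state (go to step 5).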
|
||||
<p class="tableheading" id="ALM-14031__p3255143214441"><strong id="ALM-14031__b17190233165214">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-14031__ol480581514342"><li id="ALM-14031__li14805191513412"><a name="ALM-14031__li14805191513412"></a><a name="li14805191513412"></a><span>On FusionInsight Manager, choose <strong id="ALM-14031__b463700064113054">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14031__b402136686113054">Log</strong> > <strong id="ALM-14031__b1875241582113054">Download</strong>.</span></li><li id="ALM-14031__li168051615113417"><span>Expand the drop-down list next to the <strong id="ALM-14031__b15369453141411">Service</strong> field. In the <strong id="ALM-14031__b10370353171419">Services</strong> dialog box that is displayed, select <strong id="ALM-14031__b14370153101416">HDFS</strong> for the target cluster.</span></li><li id="ALM-14031__li5805171503414"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14031__b18636418253">Start Date</strong> and <strong id="ALM-14031__b28631142250">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14031__b17864154132515">Download</strong>.</span></li><li id="ALM-14031__li10805181583414"><span>Contact <span id="ALM-14031__text19191183321513">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="7" id="ALM-14031__ol480581514342"><li id="ALM-14031__li14805191513412"><a name="ALM-14031__li14805191513412"></a><a name="li14805191513412"></a><span>On MRS Manager, choose <strong id="ALM-14031__b463700064113054">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14031__b402136686113054">Log</strong> > <strong id="ALM-14031__b1875241582113054">Download</strong>.</span></li><li id="ALM-14031__li168051615113417"><span>Expand the drop-down list next to the <strong id="ALM-14031__b15369453141411">Service</strong> field. In the <strong id="ALM-14031__b10370353171419">Services</strong> dialog box that is displayed, select <strong id="ALM-14031__b14370153101416">HDFS</strong> for the target cluster.</span></li><li id="ALM-14031__li5805171503414"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14031__b18636418253">Start Date</strong> and <strong id="ALM-14031__b28631142250">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14031__b17864154132515">Download</strong>.</span></li><li id="ALM-14031__li10805181583414"><span>Contact <span id="ALM-14031__text19191183321513">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-14031__section169311343318"><h4 class="sectiontitle"><span id="ALM-14031__text367020138593">Alarm Clearance</span></h4><p id="ALM-14031__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -60,13 +60,13 @@
|
||||
<div class="section" id="ALM-14032__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14032__text187997470114">Possible Causes</span></h4><p id="ALM-14032__p276313327196">The host responds slowly to I/O (disk I/O and network I/O) requests and some processes are in the D state and Z state. The process may also be suspended and enter the T state.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14032__section179924719116"><h4 class="sectiontitle"><span id="ALM-14032__text1799947611">Handling Procedure</span></h4><p id="ALM-14032__p1243515278455"><strong id="ALM-14032__b19561554105317">Check whether the process is in the D, Z, or T state.</strong></p>
|
||||
<ol id="ALM-14032__ol67999471216"><li id="ALM-14032__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14032__b633663291610">O&M</strong> > <strong id="ALM-14032__b73361132151618">Alarm</strong> > <strong id="ALM-14032__b133673291615">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14032__ul10505203319910"><li id="ALM-14032__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14032__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14032__li162831544134616">2</a>.</li></ul>
|
||||
<ol id="ALM-14032__ol67999471216"><li id="ALM-14032__li1980611196816"><span>Log in to MRS Manager and choose <strong id="ALM-14032__b633663291610">O&M</strong> > <strong id="ALM-14032__b73361132151618">Alarm</strong> > <strong id="ALM-14032__b133673291615">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14032__ul10505203319910"><li id="ALM-14032__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14032__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14032__li162831544134616">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-14032__li162831544134616"><a name="ALM-14032__li162831544134616"></a><a name="li162831544134616"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14032__b8152183916183">root</strong> user and run the <strong id="ALM-14032__b1215323991811">su - omm</strong> command to switch to the <strong id="ALM-14032__b11531439181818">omm</strong> user.</span></li><li id="ALM-14032__li129386734811"><span>Run the following command to check the process state:</span><p><p id="ALM-14032__p114995439534"><strong id="ALM-14032__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.qjournal.server.JournalNode | grep -v grep | awk '{print$1}'</strong></p>
|
||||
</p></li><li id="ALM-14032__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14032__ul161804819579"><li id="ALM-14032__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14032__li39471558560">5</a>.</li><li id="ALM-14032__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14032__li17799174711116">7</a>.</li></ul>
|
||||
</p></li><li id="ALM-14032__li39471558560"><a name="ALM-14032__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14032__b105163881919">root</strong> and run the <strong id="ALM-14032__b6517582194">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14032__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14032__ul19652752195618"><li id="ALM-14032__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14032__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14032__li17799174711116">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-14032__p2079910471716"><strong id="ALM-14032__b1648416219547">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-14032__ol37994471410"><li id="ALM-14032__li17799174711116"><a name="ALM-14032__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14032__b1377244511199">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14032__b1677394513195">Log</strong> > <strong id="ALM-14032__b12773144551914">Download</strong>.</span></li><li id="ALM-14032__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14032__b1524684816195">Service</strong> field. In the <strong id="ALM-14032__b92474487193">Services</strong> dialog box that is displayed, select <strong id="ALM-14032__b162479489190">HDFS</strong> for the target cluster.</span></li><li id="ALM-14032__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14032__b17773913202510">Start Date</strong> and <strong id="ALM-14032__b67743138253">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14032__b3774181332510">Download</strong>.</span></li><li id="ALM-14032__li57991247416"><span>Contact <span id="ALM-14032__text9526257151916">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="7" id="ALM-14032__ol37994471410"><li id="ALM-14032__li17799174711116"><a name="ALM-14032__li17799174711116"></a><a name="li17799174711116"></a><span>On MRS Manager, choose <strong id="ALM-14032__b1377244511199">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14032__b1677394513195">Log</strong> > <strong id="ALM-14032__b12773144551914">Download</strong>.</span></li><li id="ALM-14032__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14032__b1524684816195">Service</strong> field. In the <strong id="ALM-14032__b92474487193">Services</strong> dialog box that is displayed, select <strong id="ALM-14032__b162479489190">HDFS</strong> for the target cluster.</span></li><li id="ALM-14032__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14032__b17773913202510">Start Date</strong> and <strong id="ALM-14032__b67743138253">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14032__b3774181332510">Download</strong>.</span></li><li id="ALM-14032__li57991247416"><span>Contact <span id="ALM-14032__text9526257151916">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-14032__section979934710111"><h4 class="sectiontitle"><span id="ALM-14032__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14032__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -60,13 +60,13 @@
|
||||
<div class="section" id="ALM-14033__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14033__text187997470114">Possible Causes</span></h4><p id="ALM-14033__p1647015610239">The host responds slowly to I/O (disk I/O and network I/O) requests and some processes are in the D state and Z state. The process may also be suspended and enter the T state.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14033__section179924719116"><h4 class="sectiontitle"><span id="ALM-14033__text1799947611">Handling Procedure</span></h4><p id="ALM-14033__p1243515278455"><strong id="ALM-14033__b6239811105419">Check whether the process is in the D, Z, or T state.</strong></p>
|
||||
<ol id="ALM-14033__ol67999471216"><li id="ALM-14033__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14033__b149051811214">O&M</strong> > <strong id="ALM-14033__b1290161817211">Alarm</strong> > <strong id="ALM-14033__b189171812117">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14033__ul10505203319910"><li id="ALM-14033__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14033__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14033__li191311041031">2</a>.</li></ul>
|
||||
<ol id="ALM-14033__ol67999471216"><li id="ALM-14033__li1980611196816"><span>Log in to MRS Manager and choose <strong id="ALM-14033__b149051811214">O&M</strong> > <strong id="ALM-14033__b1290161817211">Alarm</strong> > <strong id="ALM-14033__b189171812117">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14033__ul10505203319910"><li id="ALM-14033__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14033__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14033__li191311041031">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-14033__li191311041031"><a name="ALM-14033__li191311041031"></a><a name="li191311041031"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14033__b31081042122115">root</strong> user and run the <strong id="ALM-14033__b171086426213">su - omm</strong> command to switch to the <strong id="ALM-14033__b131082423217">omm</strong> user.</span></li><li id="ALM-14033__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14033__p114995439534"><strong id="ALM-14033__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.tools.DFSZKFailoverController | grep -v grep | awk '{print$1}'</strong></p>
|
||||
</p></li><li id="ALM-14033__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14033__ul161804819579"><li id="ALM-14033__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14033__li39471558560">5</a>.</li><li id="ALM-14033__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14033__li17799174711116">7</a>.</li></ul>
|
||||
</p></li><li id="ALM-14033__li39471558560"><a name="ALM-14033__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14033__b937623310221">root</strong> and run the <strong id="ALM-14033__b537733311228">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14033__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14033__ul19652752195618"><li id="ALM-14033__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14033__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14033__li17799174711116">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-14033__p2079910471716"><strong id="ALM-14033__b14258101712544">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-14033__ol37994471410"><li id="ALM-14033__li17799174711116"><a name="ALM-14033__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14033__b12332161919238">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14033__b1833361911230">Log</strong> > <strong id="ALM-14033__b1133317198238">Download</strong>.</span></li><li id="ALM-14033__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14033__b16661422122318">Service</strong> field. In the <strong id="ALM-14033__b5667182202315">Services</strong> dialog box that is displayed, select <strong id="ALM-14033__b1166812210237">HDFS</strong> for the target cluster.</span></li><li id="ALM-14033__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14033__b1370492018259">Start Date</strong> and <strong id="ALM-14033__b18704142012518">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14033__b147041420182520">Download</strong>.</span></li><li id="ALM-14033__li57991247416"><span>Contact <span id="ALM-14033__text4716173792311">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="7" id="ALM-14033__ol37994471410"><li id="ALM-14033__li17799174711116"><a name="ALM-14033__li17799174711116"></a><a name="li17799174711116"></a><span>On MRS Manager, choose <strong id="ALM-14033__b12332161919238">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14033__b1833361911230">Log</strong> > <strong id="ALM-14033__b1133317198238">Download</strong>.</span></li><li id="ALM-14033__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14033__b16661422122318">Service</strong> field. In the <strong id="ALM-14033__b5667182202315">Services</strong> dialog box that is displayed, select <strong id="ALM-14033__b1166812210237">HDFS</strong> for the target cluster.</span></li><li id="ALM-14033__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14033__b1370492018259">Start Date</strong> and <strong id="ALM-14033__b18704142012518">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14033__b147041420182520">Download</strong>.</span></li><li id="ALM-14033__li57991247416"><span>Contact <span id="ALM-14033__text4716173792311">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-14033__section979934710111"><h4 class="sectiontitle"><span id="ALM-14033__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14033__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -60,13 +60,13 @@
|
||||
<div class="section" id="ALM-14034__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14034__text187997470114">Possible Causes</span></h4><p id="ALM-14034__p1626235122417">The host responds slowly to I/O (disk I/O and network I/O) requests and some processes are in the D state and Z state. The process may also be suspended and enter the T state.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14034__section179924719116"><h4 class="sectiontitle"><span id="ALM-14034__text1799947611">Handling Procedure</span></h4><p id="ALM-14034__p1243515278455"><strong id="ALM-14034__b34831828145411">Check whether the process is in the D, Z, or T state.</strong></p>
|
||||
<ol id="ALM-14034__ol67999471216"><li id="ALM-14034__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14034__b1937751732912">O&M</strong> > <strong id="ALM-14034__b1837741713297">Alarm</strong> > <strong id="ALM-14034__b6377121732913">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14034__ul10505203319910"><li id="ALM-14034__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14034__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14034__li16811215432">2</a>.</li></ul>
|
||||
<ol id="ALM-14034__ol67999471216"><li id="ALM-14034__li1980611196816"><span>Log in to MRS Manager and choose <strong id="ALM-14034__b1937751732912">O&M</strong> > <strong id="ALM-14034__b1837741713297">Alarm</strong> > <strong id="ALM-14034__b6377121732913">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14034__ul10505203319910"><li id="ALM-14034__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14034__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14034__li16811215432">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-14034__li16811215432"><a name="ALM-14034__li16811215432"></a><a name="li16811215432"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14034__b156544532910">root</strong> user and run the <strong id="ALM-14034__b175661745142920">su - omm</strong> command to switch to the <strong id="ALM-14034__b456614458299">omm</strong> user.</span></li><li id="ALM-14034__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14034__p114995439534"><strong id="ALM-14034__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.server.federation.router.DFSRouter | grep -v grep | awk '{print$1}'</strong></p>
|
||||
</p></li><li id="ALM-14034__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14034__ul161804819579"><li id="ALM-14034__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14034__li39471558560">5</a>.</li><li id="ALM-14034__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14034__li17799174711116">7</a>.</li></ul>
|
||||
</p></li><li id="ALM-14034__li39471558560"><a name="ALM-14034__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14034__b449481414300">root</strong> and run the <strong id="ALM-14034__b149421411305">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14034__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14034__ul19652752195618"><li id="ALM-14034__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14034__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14034__li17799174711116">7</a>.</li></ul>
|
||||
</p></li></ol>
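<p>The check in steps 3 and 4 can be sketched as a small shell helper. This is only an illustration: <code>has_abnormal_state</code> is a hypothetical function name, and the example feeds it canned STAT values rather than live <code>ps</code> output. It assumes that the first character of each STAT value is the process state.</p>

```shell
# Hypothetical helper: reads one ps STAT value per line on stdin and
# succeeds (exit status 0) if any value starts with D, Z, or T.
has_abnormal_state() {
  while IFS= read -r stat; do
    case "$stat" in
      [DZT]*) return 0 ;;
    esac
  done
  return 1
}

# Example with canned STAT values instead of live ps output.
if printf '%s\n' "Sl" "D+" | has_abnormal_state; then
  echo "abnormal state found: go to step 5"
else
  echo "no abnormal state: go to step 7"
fi
```

<p>On a real host, the output of the <strong>ps</strong> command in step 3 could be piped into the helper in place of the canned values.</p>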
|
||||
<p class="tableheading" id="ALM-14034__p2079910471716"><strong id="ALM-14034__b958603455414">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-14034__ol37994471410"><li id="ALM-14034__li17799174711116"><a name="ALM-14034__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14034__b1261064693015">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14034__b186101846163011">Log</strong> > <strong id="ALM-14034__b761174619309">Download</strong>.</span></li><li id="ALM-14034__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14034__b3138154923017">Service</strong> field. In the <strong id="ALM-14034__b1513820496307">Services</strong> dialog box that is displayed, select <strong id="ALM-14034__b313984916304">HDFS</strong> for the target cluster.</span></li><li id="ALM-14034__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14034__b14685025112516">Start Date</strong> and <strong id="ALM-14034__b96858253253">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14034__b5685225202516">Download</strong>.</span></li><li id="ALM-14034__li57991247416"><span>Contact <span id="ALM-14034__text640375883017">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="7" id="ALM-14034__ol37994471410"><li id="ALM-14034__li17799174711116"><a name="ALM-14034__li17799174711116"></a><a name="li17799174711116"></a><span>On MRS Manager, choose <strong id="ALM-14034__b1261064693015">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14034__b186101846163011">Log</strong> > <strong id="ALM-14034__b761174619309">Download</strong>.</span></li><li id="ALM-14034__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14034__b3138154923017">Service</strong> field. In the <strong id="ALM-14034__b1513820496307">Services</strong> dialog box that is displayed, select <strong id="ALM-14034__b313984916304">HDFS</strong> for the target cluster.</span></li><li id="ALM-14034__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14034__b14685025112516">Start Date</strong> and <strong id="ALM-14034__b96858253253">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14034__b5685225202516">Download</strong>.</span></li><li id="ALM-14034__li57991247416"><span>Contact <span id="ALM-14034__text640375883017">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-14034__section979934710111"><h4 class="sectiontitle"><span id="ALM-14034__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14034__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -60,13 +60,13 @@
|
||||
<div class="section" id="ALM-14035__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14035__text187997470114">Possible Causes</span></h4><p id="ALM-14035__p251412141245">The host responds slowly to I/O (disk I/O and network I/O) requests, and some processes are in the D or Z state. A process may also be suspended and enter the T state.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14035__section179924719116"><h4 class="sectiontitle"><span id="ALM-14035__text1799947611">Handling Procedure</span></h4><p id="ALM-14035__p1243515278455"><strong id="ALM-14035__b1988924517547">Check whether the process is in the D, Z, or T state.</strong></p>
|
||||
<ol id="ALM-14035__ol67999471216"><li id="ALM-14035__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14035__b10840950103119">O&M</strong> > <strong id="ALM-14035__b5841115013118">Alarm</strong> > <strong id="ALM-14035__b14841155093119">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14035__ul10505203319910"><li id="ALM-14035__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14035__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14035__li68511247311">2</a>.</li></ul>
|
||||
<ol id="ALM-14035__ol67999471216"><li id="ALM-14035__li1980611196816"><span>Log in to MRS Manager and choose <strong id="ALM-14035__b10840950103119">O&M</strong> > <strong id="ALM-14035__b5841115013118">Alarm</strong> > <strong id="ALM-14035__b14841155093119">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14035__ul10505203319910"><li id="ALM-14035__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14035__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14035__li68511247311">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-14035__li68511247311"><a name="ALM-14035__li68511247311"></a><a name="li68511247311"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14035__b379415162324">root</strong> user and run the <strong id="ALM-14035__b07941316193217">su - omm</strong> command to switch to the <strong id="ALM-14035__b4795516173217">omm</strong> user.</span></li><li id="ALM-14035__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14035__p114995439534"><strong id="ALM-14035__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.fs.http.server.HttpFSServerWebServer | grep -v grep | awk '{print$1}'</strong></p>
|
||||
</p></li><li id="ALM-14035__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14035__ul161804819579"><li id="ALM-14035__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14035__li39471558560">5</a>.</li><li id="ALM-14035__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14035__li17799174711116">7</a>.</li></ul>
|
||||
</p></li><li id="ALM-14035__li39471558560"><a name="ALM-14035__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14035__b5858163753214">root</strong> and run the <strong id="ALM-14035__b4859203716322">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14035__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14035__ul19652752195618"><li id="ALM-14035__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14035__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14035__li17799174711116">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-14035__p2079910471716"><strong id="ALM-14035__b10284155114545">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-14035__ol37994471410"><li id="ALM-14035__li17799174711116"><a name="ALM-14035__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14035__b1761410973312">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14035__b1861613933317">Log</strong> > <strong id="ALM-14035__b186171298337">Download</strong>.</span></li><li id="ALM-14035__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14035__b1423117123337">Service</strong> field. In the <strong id="ALM-14035__b8232181263315">Services</strong> dialog box that is displayed, select <strong id="ALM-14035__b15232141214334">HDFS</strong> for the target cluster.</span></li><li id="ALM-14035__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14035__b86785334252">Start Date</strong> and <strong id="ALM-14035__b06791933142513">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14035__b18679833192510">Download</strong>.</span></li><li id="ALM-14035__li57991247416"><span>Contact <span id="ALM-14035__text6536822123311">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="7" id="ALM-14035__ol37994471410"><li id="ALM-14035__li17799174711116"><a name="ALM-14035__li17799174711116"></a><a name="li17799174711116"></a><span>On MRS Manager, choose <strong id="ALM-14035__b1761410973312">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14035__b1861613933317">Log</strong> > <strong id="ALM-14035__b186171298337">Download</strong>.</span></li><li id="ALM-14035__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14035__b1423117123337">Service</strong> field. In the <strong id="ALM-14035__b8232181263315">Services</strong> dialog box that is displayed, select <strong id="ALM-14035__b15232141214334">HDFS</strong> for the target cluster.</span></li><li id="ALM-14035__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14035__b86785334252">Start Date</strong> and <strong id="ALM-14035__b06791933142513">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14035__b18679833192510">Download</strong>.</span></li><li id="ALM-14035__li57991247416"><span>Contact <span id="ALM-14035__text6536822123311">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-14035__section979934710111"><h4 class="sectiontitle"><span id="ALM-14035__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14035__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
103
docs/mrs/umn/ALM-14036.html
Normal file
File diff suppressed because it is too large
86
docs/mrs/umn/ALM-14037.html
Normal file
@ -0,0 +1,86 @@
|
||||
<a name="ALM-14037"></a><a name="ALM-14037"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-14037 DataNodes Outside the Cluster</h1>
|
||||
<div id="body0000002380297580"><p id="ALM-14037__p11365215121312">This alarm applies only to MRS 3.3.1 or later.</p>
|
||||
<div class="section" id="ALM-14037__section979815471118"><h4 class="sectiontitle"><span id="ALM-14037__text1079812471120">Alarm Description</span></h4><p id="ALM-14037__p8353691349">Every 8 hours, the NameNode checks whether any DataNodes are not managed by the cluster. This alarm is generated when a DataNode outside the cluster is detected. This alarm is cleared when no DataNode is outside the cluster.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14037__section18798204714110"><h4 class="sectiontitle"><span id="ALM-14037__text2798164712118">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14037__table87986471415" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14037__row167981047613"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-14037__p12798647315"><span id="ALM-14037__text10798547517">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-14037__p16798124719115"><span id="ALM-14037__text157981347317">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-14037__p17992471410"><span id="ALM-14037__text15799194720117">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-14037__row67994478118"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-14037__p18799747419">14037</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-14037__p279974710111">Major</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-14037__p107994471713">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-14037__section19799184712110"><h4 class="sectiontitle"><span id="ALM-14037__text27993470117">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14037__table3799204720116" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14037__row1879915471215"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.4.2.1.4.1.1"><p id="ALM-14037__p457420453319"><span id="ALM-14037__text179951097336">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.4.2.1.4.1.2"><p id="ALM-14037__p177993479118"><span id="ALM-14037__text207998471417">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.4.2.1.4.1.3"><p id="ALM-14037__p579954720114"><span id="ALM-14037__text127995473116">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-14037__row1179918471011"><td class="cellrowborder" rowspan="3" valign="top" width="20%" headers="mcps1.3.4.2.1.4.1.1 "><p id="ALM-14037__p1088261819334">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.4.2.1.4.1.2 "><p id="ALM-14037__p859219498522">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.4.2.1.4.1.3 "><p id="ALM-14037__p2059134995215">Specifies the cluster for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-14037__row1279964711115"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.4.1.1 "><p id="ALM-14037__p1059010490521">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.4.1.2 "><p id="ALM-14037__p35886492524">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-14037__row079994716117"><td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.4.1.1 "><p id="ALM-14037__p12587144965212">NameServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.4.2.1.4.1.2 "><p id="ALM-14037__p145851849195219">Specifies the NameService for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-14037__row75497132133"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.4.2.1.4.1.1 "><p id="ALM-14037__p129501240105">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.4.2.1.4.1.2 "><p id="ALM-14037__p1852102317910">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.4.2.1.4.1.3 "><p id="ALM-14037__p485211231395">Specifies the alarm triggering condition, that is, the IP address and port of the DataNode detected outside the cluster.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-14037__section0799144716115"><h4 class="sectiontitle"><span id="ALM-14037__text479911470117">Impact on the System</span></h4><p id="ALM-14037__p8799247918">Data may be lost.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14037__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14037__text187997470114">Possible Causes</span></h4><p id="ALM-14037__p251412141245">A host that was forcibly deleted from the cluster is powered on again, and the process on it is restarted.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14037__section1625313181813"><h4 class="sectiontitle"><span id="ALM-14037__text1799947611">Handling Procedure</span></h4><ol id="ALM-14037__ol13109229192319"><li id="ALM-14037__li20107102982319"><span>Log in to MRS Manager, click <strong id="ALM-14037__b6868164684419">O&M</strong>, and choose <strong id="ALM-14037__b128685462449">Alarm</strong> > <strong id="ALM-14037__b1868134612441">Alarms</strong> to view the alarm details. In the additional information area, check the IP address of the host for which the alarm is generated.</span></li><li id="ALM-14037__li615912460395"><span>Stop the DataNode process on the host for which the alarm is generated.</span><p><div class="notice" id="ALM-14037__note910852912235"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="ALM-14037__p1310852910233">If the alarm reports multiple host IP addresses, <strong id="ALM-14037__b21362058144713">stop only one DataNode process at a time</strong>, and stop the next DataNode process only after <strong id="ALM-14037__b131816574819">Number of Blocks to Be Replicated</strong> changes to <strong id="ALM-14037__b6492429184814">0</strong>.</p>
|
||||
</div></div>
|
||||
<ol type="a" id="ALM-14037__ol8814103134019"><li id="ALM-14037__li1079441012286">Log in to the host for which the alarm is generated as the <strong id="ALM-14037__b101351143863">root</strong> user and change the permission on the <strong id="ALM-14037__b189315501468">hadoop</strong> directory in the installation directory <strong id="ALM-14037__b11917078710">${BIGDATA_HOME}/FusionInsight_HD_*/install</strong>.<p id="ALM-14037__p6948225182413"><strong id="ALM-14037__b17292173411294">chmod 000 ${BIGDATA_HOME}/FusionInsight_HD_<span id="ALM-14037__ph15103223191010">8.1.0.1</span>/install/FusionInsight-Hadoop-3.3.1</strong></p>
|
||||
</li><li id="ALM-14037__li1610822922312">Run the following commands to obtain the PID of the DataNode process and stop it on the host:<p id="ALM-14037__p15303482176"><a name="ALM-14037__li1610822922312"></a><a name="li1610822922312"></a><strong id="ALM-14037__b1853044818173">ps -ef | grep Dproc_datanode</strong></p>
|
||||
<p id="ALM-14037__p353014818176"><strong id="ALM-14037__b2853175144917">kill -15 </strong><em id="ALM-14037__i104361783499">PID</em></p>
|
||||
</li><li id="ALM-14037__li11081629112319">Choose <strong id="ALM-14037__b163961819497">Cluster</strong> > <strong id="ALM-14037__b1682313198492">Services</strong> > <strong id="ALM-14037__b332102254910">HDFS</strong>. Check the <strong id="ALM-14037__b1435102674919">Basic Information</strong> area in the <strong id="ALM-14037__b055818123504">Dashboard</strong> tab (or the <strong id="ALM-14037__b1823272345110">NameService Summary</strong> area in the <strong id="ALM-14037__b2011942645212">Dashboard</strong> tab of HDFS), and wait until the value of <strong id="ALM-14037__b23971631205210">Blocks to be Replicated</strong> changes to <strong id="ALM-14037__b1879119353528">0</strong>.</li></ol>
|
||||
</p></li><li id="ALM-14037__li610972912237"><span>Wait for 8 hours and check whether the alarm is cleared and whether the number of blocks to be replicated is 0.</span><p><ul id="ALM-14037__ul15109729202314"><li id="ALM-14037__li1710910295232">If yes, no further action is required.</li><li id="ALM-14037__li010915292232">If no, go to <a href="#ALM-14037__li6107182982310">4</a>.</li></ul>
|
||||
</p></li></ol>
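<p>The PID lookup in step 2.b can be sketched as follows. This is a dry run under stated assumptions: <code>datanode_pids</code> is a hypothetical helper name, the sample lines are made up to resemble <code>ps -ef</code> output, and the loop prints the <strong>kill</strong> command instead of sending the signal. The helper skips the <strong>grep</strong> process itself, which also matches the search string.</p>

```shell
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints the PID
# (second column) of each line that mentions Dproc_datanode, skipping the
# grep process itself.
datanode_pids() {
  awk '/Dproc_datanode/ && !/grep/ { print $2 }'
}

# Canned sample in place of live `ps -ef` output (all values are made up).
sample='omm 4321 1 0 10:00 ? 00:01:02 java -Dproc_datanode -Xmx4G
root 9999 8888 0 10:05 pts/0 00:00:00 grep Dproc_datanode'

for pid in $(printf '%s\n' "$sample" | datanode_pids); do
  # Dry run: print the command instead of sending the signal.
  echo "kill -15 $pid"
done
```

<p>On a real host, <strong>ps -ef</strong> would replace the canned sample, and the printed command would be run only after confirming that the PID belongs to the DataNode process.</p>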
|
||||
<p class="tableheading" id="ALM-14037__p1851092518232"><strong id="ALM-14037__b67993473118">Collect fault information.</strong></p>
|
||||
<ol start="4" id="ALM-14037__ol17107729192312"><li id="ALM-14037__li6107182982310"><a name="ALM-14037__li6107182982310"></a><a name="li6107182982310"></a><span>On MRS Manager, choose <strong id="ALM-14037__b191917811353033">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14037__b77360713353033">Log</strong> > <strong id="ALM-14037__b195669191753033">Download</strong>.</span></li><li id="ALM-14037__li1710792913234"><span>Expand the drop-down list next to the <strong id="ALM-14037__b62295640353033">Service</strong> field. In the <strong id="ALM-14037__b103124844353033">Services</strong> dialog box that is displayed, select <strong id="ALM-14037__b155328321453033">HDFS</strong> for the target cluster.</span></li><li id="ALM-14037__li18107529142313"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14037__b66166800453033">Start Date</strong> and <strong id="ALM-14037__b170530574753033">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14037__b149784741753033">Download</strong>.</span></li><li id="ALM-14037__li111072029142310"><span>Contact <span id="ALM-14037__text141071329112312">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<p id="ALM-14037__p8060118"></p>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
92
docs/mrs/umn/ALM-14038.html
Normal file
@ -0,0 +1,92 @@
|
||||
<a name="ALM-14038"></a><a name="ALM-14038"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-14038 Router Heap Memory Usage Exceeds the Threshold</h1>
|
||||
<div id="body0000002414056793"><div class="section" id="ALM-14038__en-us_topic_0000001973076798_section61130422"><h4 class="sectiontitle"><span id="ALM-14038__en-us_topic_0000001973076798_text14838183534515">Alarm Description</span></h4><p id="ALM-14038__en-us_topic_0000001973076798_p14812162112019">Every 30 seconds, the system checks the used heap memory of the HDFS Router and the maximum heap memory that can be allocated. The heap memory usage is the ratio of the used heap memory to the maximum heap memory that can be allocated, and the system compares this usage with the threshold. The HDFS Router heap memory usage has a default threshold. This alarm is generated when the HDFS Router heap memory usage exceeds the threshold.</p>
|
||||
<p id="ALM-14038__en-us_topic_0000001973076798_en-us_topic_0070543644_p1491792">You can change the threshold in <strong id="ALM-14038__en-us_topic_0000001973076798_en-us_topic_0070543638_b55978213">O&M</strong> > <strong id="ALM-14038__en-us_topic_0000001973076798_b18216526383">Alarm ></strong> <strong id="ALM-14038__en-us_topic_0000001973076798_b122075817202">Thresholds</strong> > <em id="ALM-14038__en-us_topic_0000001973076798_i10674629125819">Name of the desired cluster</em><strong id="ALM-14038__en-us_topic_0000001973076798_b76731229185816"> ></strong> <strong id="ALM-14038__en-us_topic_0000001973076798_en-us_topic_0070543638_b5927966">HDFS</strong>.</p>
|
||||
<p id="ALM-14038__en-us_topic_0000001973076798_en-us_topic_0070543662_p37388429">The alarm is cleared when the heap memory usage is less than or equal to the threshold.</p>
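<p>The comparison described above can be sketched in shell. This is an illustration only: <code>heap_alarm_level</code> is a hypothetical helper, the input values are made up, and the 95% and 90% figures are the default thresholds listed under Alarm Attributes. Both arguments must use the same unit.</p>

```shell
# Hypothetical helper: prints Critical, Major, or OK for the given used and
# maximum heap sizes, using the default thresholds of 95% and 90%.
heap_alarm_level() {
  used=$1
  max=$2
  # Integer arithmetic: used/max > 95/100 is equivalent to used*100 > max*95.
  if [ $((used * 100)) -gt $((max * 95)) ]; then
    echo "Critical"
  elif [ $((used * 100)) -gt $((max * 90)) ]; then
    echo "Major"
  else
    echo "OK"
  fi
}

heap_alarm_level 3700 4096   # about 90.3% of the maximum
heap_alarm_level 3686 4096   # just under 90% of the maximum
```

<p>The comparison is strict, which matches the description: usage equal to the threshold does not raise the alarm.</p>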
|
||||
</div>
|
||||
<div class="section" id="ALM-14038__en-us_topic_0000001973076798_section13302888"><h4 class="sectiontitle"><span id="ALM-14038__en-us_topic_0000001973076798_text66488119489">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14038__en-us_topic_0000001973076798_table33986641" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14038__en-us_topic_0000001973076798_row13879140"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14038__en-us_topic_0000001973076798_p50468531"><span id="ALM-14038__en-us_topic_0000001973076798_text1074744511529">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14038__en-us_topic_0000001973076798_p61419199"><span id="ALM-14038__en-us_topic_0000001973076798_text529420513457">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14038__en-us_topic_0000001973076798_p8899183"><span id="ALM-14038__en-us_topic_0000001973076798_text139206232502">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-14038__en-us_topic_0000001973076798_row49745195"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14038__en-us_topic_0000001973076798_p2829020">14038</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14038__en-us_topic_0000001973076798_p6320216191712">Critical (default threshold: 95%)</p>
|
||||
<p id="ALM-14038__en-us_topic_0000001973076798_p51431020">Major (default threshold: 90%)</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14038__en-us_topic_0000001973076798_p39160124">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-14038__en-us_topic_0000001973076798_section52617132"><h4 class="sectiontitle"><span id="ALM-14038__en-us_topic_0000001973076798_text0580183514489">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14038__en-us_topic_0000001973076798_table17853499" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14038__en-us_topic_0000001973076798_row18143824"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-14038__en-us_topic_0000001973076798_p171301518194511"><span id="ALM-14038__en-us_topic_0000001973076798_text169109297467">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-14038__en-us_topic_0000001973076798_p60363621"><span id="ALM-14038__en-us_topic_0000001973076798_text12210145419505">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-14038__en-us_topic_0000001973076798_p57615147"><span id="ALM-14038__en-us_topic_0000001973076798_text1971012173566">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-14038__en-us_topic_0000001973076798_row13401184712152"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-14038__en-us_topic_0000001973076798_p357045754617">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-14038__en-us_topic_0000001973076798_p0124015142017">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-14038__en-us_topic_0000001973076798_p141241159202">Specifies the cluster for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-14038__en-us_topic_0000001973076798_row36315337"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-14038__en-us_topic_0000001973076798_p4124161572012">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-14038__en-us_topic_0000001973076798_p28465328">Specifies the service for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-14038__en-us_topic_0000001973076798_row54861362"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-14038__en-us_topic_0000001973076798_p91242015132019">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-14038__en-us_topic_0000001973076798_p40562973">Specifies the role for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-14038__en-us_topic_0000001973076798_row29522441"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-14038__en-us_topic_0000001973076798_p71246159207">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-14038__en-us_topic_0000001973076798_p20557292">Specifies the host for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-14038__en-us_topic_0000001973076798_row9721490519"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-14038__en-us_topic_0000001973076798_p899814213535">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-14038__en-us_topic_0000001973076798_p9124181519204">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-14038__en-us_topic_0000001973076798_p21241615172015">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-14038__en-us_topic_0000001973076798_section3792148"><h4 class="sectiontitle"><span id="ALM-14038__en-us_topic_0000001973076798_text1127833410585">Impact on the System</span></h4><p id="ALM-14038__en-us_topic_0000001973076798_p554052218813">High HDFS Router heap memory usage degrades the data read/write performance of HDFS.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14038__en-us_topic_0000001973076798_section34129336"><h4 class="sectiontitle"><span id="ALM-14038__en-us_topic_0000001973076798_text10245783115">Possible Causes</span></h4><p id="ALM-14038__en-us_topic_0000001973076798_en-us_topic_0070543644_p39238611">The heap memory configured for the HDFS Router is insufficient.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14038__en-us_topic_0000001973076798_section1548819735318"><h4 class="sectiontitle"><span id="ALM-14038__en-us_topic_0000001973076798_text35421632154">Handling Procedure</span></h4><ol id="ALM-14038__en-us_topic_0000001973076798_ol2380770"><li id="ALM-14038__en-us_topic_0000001973076798_li20888164335514"><span>On MRS Manager, choose <strong id="ALM-14038__en-us_topic_0000001973076798_b10889194316552">Cluster </strong>> <strong id="ALM-14038__en-us_topic_0000001973076798_b12889144313550">Services </strong>> <strong id="ALM-14038__en-us_topic_0000001973076798_b58891643185512">Ranger </strong>> <strong id="ALM-14038__en-us_topic_0000001973076798_b888924318558">Instance </strong>> <strong id="ALM-14038__en-us_topic_0000001973076798_b168898438554">PolicySync</strong>. Click <strong id="ALM-14038__en-us_topic_0000001973076798_b1788944395513">Instance Configuration</strong> and then <strong id="ALM-14038__en-us_topic_0000001973076798_b0889194317551">All Configurations</strong>, and choose <strong id="ALM-14038__en-us_topic_0000001973076798_b4889843165513">PolicySync </strong>> <strong id="ALM-14038__en-us_topic_0000001973076798_b1288918434558">System</strong>.</span></li><li id="ALM-14038__en-us_topic_0000001973076798_li23925171294"><span>On the MRS Manager portal, choose <strong id="ALM-14038__en-us_topic_0000001973076798_b1327004983117">Cluster > </strong><strong id="ALM-14038__en-us_topic_0000001973076798_b327134915318">Services</strong> > <strong id="ALM-14038__en-us_topic_0000001973076798_b185693571706">HDFS</strong> > <strong id="ALM-14038__en-us_topic_0000001973076798_b329064911706">Configurations</strong> > <strong id="ALM-14038__en-us_topic_0000001973076798_b481801251706">All <strong id="ALM-14038__en-us_topic_0000001973076798_b7158112382314">Configurations</strong></strong>. 
In <strong id="ALM-14038__en-us_topic_0000001973076798_b309679481706">Search</strong>, enter <strong id="ALM-14038__en-us_topic_0000001973076798_b102760761706">GC_OPTS</strong> to check the GC_OPTS memory parameter of <strong id="ALM-14038__en-us_topic_0000001973076798_b1590961212109">HDFS->Router</strong>.</span></li><li id="ALM-14038__en-us_topic_0000001973076798_li11521246145513"><span>Increase the values of <strong id="ALM-14038__en-us_topic_0000001973076798_b7392115517103">-Xms</strong> and <strong id="ALM-14038__en-us_topic_0000001973076798_b6108759201014">-Xmx</strong> in the <strong id="ALM-14038__en-us_topic_0000001973076798_b268013391116">GC_OPTS</strong> parameter based on the site requirements and save the configuration.</span><p><div class="note" id="ALM-14038__en-us_topic_0000001973076798_note14125215132018"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14038__en-us_topic_0000001973076798_p18103538171111">If this alarm is generated, the heap memory configured for Router cannot meet the requirements of the current process. You are advised to change the values of <strong id="ALM-14038__en-us_topic_0000001973076798_b19163152012173">-Xms</strong> and <strong id="ALM-14038__en-us_topic_0000001973076798_b1070482312179">-Xmx</strong> in the <strong id="ALM-14038__en-us_topic_0000001973076798_b109951326121712">GC_OPTS </strong>parameter to twice the size of the used heap memory or change the values based on the site requirements.</p>
|
||||
</div></div>
|
||||
</p></li><li id="ALM-14038__en-us_topic_0000001973076798_li35301418"><span>Restart the affected services or instances and check whether the alarm is cleared.</span><p><ul id="ALM-14038__en-us_topic_0000001973076798_ul49277313"><li id="ALM-14038__en-us_topic_0000001973076798_li40842634">If yes, no further action is required.</li><li id="ALM-14038__en-us_topic_0000001973076798_li32039392">If no, go to <a href="#ALM-14038__en-us_topic_0000001973076798_li42224042151734">5</a>.</li></ul>
|
||||
</p></li></ol>
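<p>The note in the preceding procedure advises setting <strong>-Xms</strong> and <strong>-Xmx</strong> to twice the size of the used heap memory. The following is a minimal Python sketch of that arithmetic only. The function name <strong>recommend_heap_opts</strong> and the rounding up to whole gigabytes are illustrative assumptions and are not part of MRS; the other options in <strong>GC_OPTS</strong> are expected to stay unchanged.</p>

```python
import math


def recommend_heap_opts(used_heap_mb: float) -> str:
    """Return -Xms/-Xmx flags sized at twice the used heap.

    Hypothetical helper: rounding up to whole GB is an assumption
    made for readability, not an MRS requirement.
    """
    if used_heap_mb <= 0:
        raise ValueError("used_heap_mb must be positive")
    # Twice the used heap, converted from MB to GB and rounded up.
    target_gb = math.ceil(used_heap_mb * 2 / 1024)
    return f"-Xms{target_gb}G -Xmx{target_gb}G"


if __name__ == "__main__":
    # Example: the Router currently uses about 3500 MB of heap.
    print(recommend_heap_opts(3500))
```

<p>Treat the result as a starting point and adjust it based on the site requirements, for example, the memory actually available on the Router host.</p>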
|
||||
<p id="ALM-14038__en-us_topic_0000001973076798_p45053948"><strong id="ALM-14038__en-us_topic_0000001973076798_b35235717112450">Collect fault information.</strong></p>
|
||||
<ol start="5" id="ALM-14038__en-us_topic_0000001973076798_ol41031367112456"><li id="ALM-14038__en-us_topic_0000001973076798_li42224042151734"><a name="ALM-14038__en-us_topic_0000001973076798_li42224042151734"></a><a name="en-us_topic_0000001973076798_li42224042151734"></a><span>On MRS Manager, choose <strong id="ALM-14038__en-us_topic_0000001973076798_b11470051171118">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-14038__en-us_topic_0000001973076798_b2047175118119">Log</strong> > <strong id="ALM-14038__en-us_topic_0000001973076798_b847125151116">Download</strong>.</span></li><li id="ALM-14038__en-us_topic_0000001973076798_li28093597"><span>Expand the <strong id="ALM-14038__en-us_topic_0000001973076798_b6537154161118">Service</strong> drop-down list, and select <strong id="ALM-14038__en-us_topic_0000001973076798_b11914100105615">HDFS </strong>for the target cluster.</span></li><li id="ALM-14038__en-us_topic_0000001973076798_li51515784"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14038__en-us_topic_0000001973076798_b1896295911114">Start Date</strong> and <strong id="ALM-14038__en-us_topic_0000001973076798_b396385917117">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14038__en-us_topic_0000001973076798_b4963195919116">Download</strong>.</span></li><li id="ALM-14038__en-us_topic_0000001973076798_li60988879"><span>Contact <span id="ALM-14038__en-us_topic_0000001973076798_text157218215128">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-14038__en-us_topic_0000001973076798_section2125181572010"><h4 class="sectiontitle"><span id="ALM-14038__en-us_topic_0000001973076798_text976142215819">Alarm Clearance</span></h4><p id="ALM-14038__en-us_topic_0000001973076798_p17125121572018">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-14038__en-us_topic_0000001973076798_section891955662611"><h4 class="sectiontitle"><span id="ALM-14038__en-us_topic_0000001973076798_text13373191116114">Related Information</span></h4><p id="ALM-14038__en-us_topic_0000001973076798_p139191756122619"><span id="ALM-14038__en-us_topic_0000001973076798_text13669101910115">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
103
docs/mrs/umn/ALM-14039.html
Normal file
File diff suppressed because it is too large
95
docs/mrs/umn/ALM-16051.html
Normal file
@ -0,0 +1,95 @@
|
||||
<a name="ALM-16051"></a><a name="ALM-16051"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-16051 Percentage of Sessions Connected to MetaStore Exceeds the Threshold</h1>
|
||||
<div id="body0000001971167310"><div class="section" id="ALM-16051__en-us_topic_0000001759357929_section14753556"><h4 class="sectiontitle"><span id="ALM-16051__en-us_topic_0000001759357929_text194151118891">Alarm Description</span></h4><p id="ALM-16051__en-us_topic_0000001759357929_p9833111914396">Every 30 seconds, the system checks the number of sessions connected to MetaStore as a percentage of the maximum number of sessions allowed by MetaStore. This alarm is generated when the percentage exceeds the threshold.</p>
|
||||
<p id="ALM-16051__en-us_topic_0000001759357929_p9432052193112">This alarm is cleared when the percentage of MetaStore sessions is less than or equal to the threshold.</p>
|
||||
<p id="ALM-16051__en-us_topic_0000001759357929_p175307509351">This alarm applies to MRS 3.3.1 or later.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16051__en-us_topic_0000001759357929_section65673142"><h4 class="sectiontitle"><span id="ALM-16051__en-us_topic_0000001759357929_text47499211097">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-16051__en-us_topic_0000001759357929_table2697805" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-16051__en-us_topic_0000001759357929_row10450762"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-16051__en-us_topic_0000001759357929_p41205356"><span id="ALM-16051__en-us_topic_0000001759357929_text18577824394">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-16051__en-us_topic_0000001759357929_p49299555"><span id="ALM-16051__en-us_topic_0000001759357929_text1247920271299">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-16051__en-us_topic_0000001759357929_p33841047"><span id="ALM-16051__en-us_topic_0000001759357929_text1817013114915">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-16051__en-us_topic_0000001759357929_row56770287"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-16051__en-us_topic_0000001759357929_p34990548">16051</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-16051__en-us_topic_0000001759357929_p1871702316619">Critical (default threshold: 90%)</p>
|
||||
<p id="ALM-16051__en-us_topic_0000001759357929_p107068377612">Major (default threshold: 80%)</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-16051__en-us_topic_0000001759357929_p60672611">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
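<p>The relationship between the session percentage and the default thresholds in the preceding table can be sketched as follows. This is an illustrative Python example, not the MRS implementation: the function name <strong>session_alarm_severity</strong> and its inputs are assumptions, and only the default values of 80% and 90% come from this document.</p>

```python
from typing import Optional

# Default thresholds from the Alarm Attributes table (percent).
MAJOR_THRESHOLD = 80.0
CRITICAL_THRESHOLD = 90.0


def session_alarm_severity(current_sessions: int, max_sessions: int) -> Optional[str]:
    """Return "Critical", "Major", or None for a given session count.

    Hypothetical helper: the alarm is raised only when the percentage
    exceeds a threshold; a value equal to the threshold raises nothing.
    """
    if max_sessions <= 0:
        raise ValueError("max_sessions must be positive")
    percentage = 100.0 * current_sessions / max_sessions
    if percentage > CRITICAL_THRESHOLD:
        return "Critical"
    if percentage > MAJOR_THRESHOLD:
        return "Major"
    return None


if __name__ == "__main__":
    # 8500 of 10000 allowed sessions is 85%, above the Major threshold.
    print(session_alarm_severity(8500, 10000))
```

<p>With the same number of connected sessions, a larger maximum lowers the percentage, which is why the handling procedure raises the maximum number of MetaStore connections.</p>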
|
||||
</div>
|
||||
<div class="section" id="ALM-16051__en-us_topic_0000001759357929_section54187374"><h4 class="sectiontitle"><span id="ALM-16051__en-us_topic_0000001759357929_text1073853410916">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-16051__en-us_topic_0000001759357929_table15534429" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-16051__en-us_topic_0000001759357929_row48561591"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-16051__en-us_topic_0000001759357929_p66611417155311"><span id="ALM-16051__en-us_topic_0000001759357929_text22831541205318">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-16051__en-us_topic_0000001759357929_p41174828"><span id="ALM-16051__en-us_topic_0000001759357929_text17559163911916">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-16051__en-us_topic_0000001759357929_p46826794"><span id="ALM-16051__en-us_topic_0000001759357929_text1037918429916">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-16051__en-us_topic_0000001759357929_row1687144862510"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16051__en-us_topic_0000001759357929_p891212519535">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16051__en-us_topic_0000001759357929_p156438591896">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-16051__en-us_topic_0000001759357929_p187931338134115">Specifies the cluster for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16051__en-us_topic_0000001759357929_row34873944"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16051__en-us_topic_0000001759357929_p65062640">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16051__en-us_topic_0000001759357929_p33829733">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16051__en-us_topic_0000001759357929_row36032144"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16051__en-us_topic_0000001759357929_p35626567">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16051__en-us_topic_0000001759357929_p49481274">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16051__en-us_topic_0000001759357929_row42678285"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16051__en-us_topic_0000001759357929_p51620924">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16051__en-us_topic_0000001759357929_p34048007">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16051__en-us_topic_0000001759357929_row37996610"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16051__en-us_topic_0000001759357929_p3661017185311">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16051__en-us_topic_0000001759357929_p57826595">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-16051__en-us_topic_0000001759357929_p53442657">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-16051__en-us_topic_0000001759357929_section17924324"><h4 class="sectiontitle"><span id="ALM-16051__en-us_topic_0000001759357929_text590212459910">Impact on the System</span></h4><p id="ALM-16051__en-us_topic_0000001759357929_p33887971">If this alarm is generated, too many sessions are connected to MetaStore. As a result, new connections cannot be established.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16051__en-us_topic_0000001759357929_section27101193"><h4 class="sectiontitle"><span id="ALM-16051__en-us_topic_0000001759357929_text38482481894">Possible Causes</span></h4><p id="ALM-16051__en-us_topic_0000001759357929_p60571154">Too many clients are connected to MetaStore.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16051__en-us_topic_0000001759357929_section4155171914486"><h4 class="sectiontitle"><span id="ALM-16051__en-us_topic_0000001759357929_text20124952495">Handling Procedure</span></h4><p class="tableheading" id="ALM-16051__en-us_topic_0000001759357929_p7316437"><strong id="ALM-16051__en-us_topic_0000001759357929_b10474858104020">Change the maximum number of MetaStore connections.</strong></p>
|
||||
<ol id="ALM-16051__en-us_topic_0000001759357929_ol39919782154939"><li id="ALM-16051__en-us_topic_0000001759357929_li24592442154756"><span>On MRS Manager, choose <strong id="ALM-16051__en-us_topic_0000001759357929_b7668151104218">Cluster</strong> > <strong id="ALM-16051__en-us_topic_0000001759357929_b66172039422">Services</strong> > <strong id="ALM-16051__en-us_topic_0000001759357929_b1020036164211">Hive</strong>. Click <strong id="ALM-16051__en-us_topic_0000001759357929_b35928815420">Configuration</strong> and then <strong id="ALM-16051__en-us_topic_0000001759357929_b1488218107429">All Configurations</strong>.</span></li><li id="ALM-16051__en-us_topic_0000001759357929_li57566667154756"><span>On the <strong id="ALM-16051__en-us_topic_0000001759357929_b148244262424">All Configurations</strong> tab, search for <strong id="ALM-16051__en-us_topic_0000001759357929_b13672103414216">hive.metastore.server.max.threads</strong> and check whether its value is already set to the maximum of <strong id="ALM-16051__en-us_topic_0000001759357929_b20592343204212">10000</strong>.</span><p><ul id="ALM-16051__en-us_topic_0000001759357929_ul490973112514"><li id="ALM-16051__en-us_topic_0000001759357929_li12909103195115">If yes, go to <a href="#ALM-16051__en-us_topic_0000001759357929_li19517422154756">6</a>.</li><li id="ALM-16051__en-us_topic_0000001759357929_li84759384518">If no, go to <a href="#ALM-16051__en-us_topic_0000001759357929_li15632114075420">3</a>.</li></ul>
|
||||
</p></li><li id="ALM-16051__en-us_topic_0000001759357929_li15632114075420"><a name="ALM-16051__en-us_topic_0000001759357929_li15632114075420"></a><a name="en-us_topic_0000001759357929_li15632114075420"></a><span>Change the value of <strong id="ALM-16051__en-us_topic_0000001759357929_b574195910447">hive.metastore.server.max.threads</strong> to <strong id="ALM-16051__en-us_topic_0000001759357929_b3923508457">10000</strong> and click <strong id="ALM-16051__en-us_topic_0000001759357929_b285962164510">Save</strong>.</span></li><li id="ALM-16051__en-us_topic_0000001759357929_li5647193425214"><span>Click <strong id="ALM-16051__en-us_topic_0000001759357929_b1033816318451">Instances</strong>, select all MetaStore instances, and choose <strong id="ALM-16051__en-us_topic_0000001759357929_b5983135174510">More</strong> > <strong id="ALM-16051__en-us_topic_0000001759357929_b19907403459">Restart Instance</strong>.</span><p><div class="notice" id="ALM-16051__en-us_topic_0000001759357929_note786333193517"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="ALM-16051__en-us_topic_0000001759357929_p086303111351">During MetaStore instance restart, the instance cannot provide services for external systems. SQL tasks that are being executed on the instance may fail.</p>
|
||||
</div></div>
|
||||
</p></li><li id="ALM-16051__en-us_topic_0000001759357929_li55981066154756"><span>Check whether this alarm is cleared.</span><p><ul id="ALM-16051__en-us_topic_0000001759357929_ul969924185813"><li id="ALM-16051__en-us_topic_0000001759357929_li1870014110581">If yes, no further action is required.</li><li id="ALM-16051__en-us_topic_0000001759357929_li0700154105813">If no, go to <a href="#ALM-16051__en-us_topic_0000001759357929_li19517422154756">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-16051__en-us_topic_0000001759357929_p18757678154812"><strong id="ALM-16051__en-us_topic_0000001759357929_b52838820155257">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-16051__en-us_topic_0000001759357929_ol1534104915533"><li id="ALM-16051__en-us_topic_0000001759357929_li19517422154756"><a name="ALM-16051__en-us_topic_0000001759357929_li19517422154756"></a><a name="en-us_topic_0000001759357929_li19517422154756"></a><span>On MRS Manager, choose <strong id="ALM-16051__en-us_topic_0000001759357929_b194411115467">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-16051__en-us_topic_0000001759357929_b29441411124615">Log</strong> > <strong id="ALM-16051__en-us_topic_0000001759357929_b19440119467">Download</strong>.</span></li><li id="ALM-16051__en-us_topic_0000001759357929_li23762613154756"><span>Expand the <strong id="ALM-16051__en-us_topic_0000001759357929_b104046381491526">Service</strong> drop-down list, and select <strong id="ALM-16051__en-us_topic_0000001759357929_b47197577691526">Hive</strong> for the target cluster.</span></li><li id="ALM-16051__en-us_topic_0000001759357929_li46450927154756"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-16051__en-us_topic_0000001759357929_b170455552991526">Start Date</strong> and <strong id="ALM-16051__en-us_topic_0000001759357929_b114668113991526">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-16051__en-us_topic_0000001759357929_b180228772791526">Download</strong>.</span></li><li id="ALM-16051__en-us_topic_0000001759357929_li299017126594"><span>On MRS Manager, choose <strong id="ALM-16051__en-us_topic_0000001759357929_b19966202115298">Cluster</strong> > <strong id="ALM-16051__en-us_topic_0000001759357929_b396742114294">Services</strong> > <strong id="ALM-16051__en-us_topic_0000001759357929_b1496718219293">Hive</strong>. 
On the displayed <strong id="ALM-16051__en-us_topic_0000001759357929_b1896710216292">Dashboard</strong> page, click <strong id="ALM-16051__en-us_topic_0000001759357929_b796712219295">More</strong> and select <strong id="ALM-16051__en-us_topic_0000001759357929_b14967162142919">Collect Stack Information</strong>. On the displayed page, set the following parameters:</span><p><ul id="ALM-16051__en-us_topic_0000001759357929_ul37952019155910"><li id="ALM-16051__en-us_topic_0000001759357929_li172743511594">Select <strong id="ALM-16051__en-us_topic_0000001759357929_b816811610482">MetaStore</strong> as the role for which data is to be collected.</li><li id="ALM-16051__en-us_topic_0000001759357929_li13947145795919">Select <strong id="ALM-16051__en-us_topic_0000001759357929_b154537423482">jstack</strong> and <strong id="ALM-16051__en-us_topic_0000001759357929_b7590195111485">Enable continuous collection of jstack and jmap -histo information</strong>.</li><li id="ALM-16051__en-us_topic_0000001759357929_li18905310116">Set the collection interval to 10 seconds and the duration to 2 minutes.</li></ul>
|
||||
</p></li><li id="ALM-16051__en-us_topic_0000001759357929_li185743711111"><span>Click <strong id="ALM-16051__en-us_topic_0000001759357929_b106135314496">OK</strong>. After the collection is complete, click <strong id="ALM-16051__en-us_topic_0000001759357929_b1041315340490">Download</strong>.</span></li><li id="ALM-16051__en-us_topic_0000001759357929_li7303743154756"><span>Contact <span id="ALM-16051__en-us_topic_0000001759357929_text126301214142412">O&M personnel</span> and provide the collected logs.</span></li></ol>
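<p>The time values used in the preceding steps can be worked out as in the following Python sketch. It shows the log collection window (10 minutes before and after the alarm generation time) and the approximate number of stack samples implied by a 10-second interval over 2 minutes. Both function names are hypothetical, and the sample count is simple arithmetic under the assumption of one sample per interval; the number of dumps that MRS Manager actually produces may differ.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_collection_window(alarm_time: datetime, margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (start, end) for log collection around the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


def expected_samples(interval_seconds: int = 10, duration_minutes: int = 2) -> int:
    """Approximate sample count: duration divided by interval (assumption)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return duration_minutes * 60 // interval_seconds


if __name__ == "__main__":
    # Example alarm generation time; replace with the time shown in the alarm list.
    start, end = log_collection_window(datetime(2025, 8, 6, 12, 0))
    print(start, end)
    print(expected_samples())
```

<p>The computed start and end times correspond to the <strong>Start Date</strong> and <strong>End Date</strong> fields on the log download page.</p>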
|
||||
</div>
|
||||
<div class="section" id="ALM-16051__en-us_topic_0000001759357929_section169311343318"><h4 class="sectiontitle"><span id="ALM-16051__en-us_topic_0000001759357929_text36991258595">Alarm Clearance</span></h4><p id="ALM-16051__en-us_topic_0000001759357929_p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16051__en-us_topic_0000001759357929_section47713037"><h4 class="sectiontitle"><span id="ALM-16051__en-us_topic_0000001759357929_text156071924105">Related Information</span></h4><p id="ALM-16051__en-us_topic_0000001759357929_p31026455"><span id="ALM-16051__en-us_topic_0000001759357929_text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
97
docs/mrs/umn/ALM-16052.html
Normal file
@ -0,0 +1,97 @@
|
||||
<a name="ALM-16052"></a><a name="ALM-16052"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-16052 Latency for MetaStore to Access the Meta Database During Table Creation Exceeds the Threshold</h1>
|
||||
<div id="body0000002007647337"><div class="section" id="ALM-16052__en-us_topic_0000002019514985_section14753556"><h4 class="sectiontitle"><span id="ALM-16052__en-us_topic_0000002019514985_text194151118891">Alarm Description</span></h4><p id="ALM-16052__en-us_topic_0000002019514985_p9833111914396">The system periodically checks the latency for MetaStore to access the meta database during table creation. This alarm is generated when the average latency in the last 5 minutes exceeds the threshold.</p>
|
||||
<p id="ALM-16052__en-us_topic_0000002019514985_p9432052193112">This alarm is cleared when the average latency falls below the threshold.</p>
|
||||
<p id="ALM-16052__en-us_topic_0000002019514985_p175307509351">This alarm applies to MRS 3.5.0 or later.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16052__en-us_topic_0000002019514985_section65673142"><h4 class="sectiontitle"><span id="ALM-16052__en-us_topic_0000002019514985_text47499211097">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-16052__en-us_topic_0000002019514985_table2697805" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-16052__en-us_topic_0000002019514985_row10450762"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-16052__en-us_topic_0000002019514985_p41205356"><span id="ALM-16052__en-us_topic_0000002019514985_text18577824394">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-16052__en-us_topic_0000002019514985_p49299555"><span id="ALM-16052__en-us_topic_0000002019514985_text1247920271299">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-16052__en-us_topic_0000002019514985_p33841047"><span id="ALM-16052__en-us_topic_0000002019514985_text1817013114915">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-16052__en-us_topic_0000002019514985_row56770287"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-16052__en-us_topic_0000002019514985_p34990548">16052</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-16052__en-us_topic_0000002019514985_p1871702316619">Critical (default threshold: 60 seconds)</p>
|
||||
<p id="ALM-16052__en-us_topic_0000002019514985_p107068377612">Major (default threshold: 10 seconds)</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-16052__en-us_topic_0000002019514985_p60672611">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
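<p>The following Python sketch illustrates how an average latency over the last 5 minutes maps to the default thresholds in the preceding table. How MRS samples the latency and computes the average is not described in this document, so the use of a plain arithmetic mean, the function name <strong>latency_alarm_severity</strong>, and the sample values are assumptions for illustration only.</p>

```python
from statistics import mean
from typing import Optional, Sequence

# Default thresholds from the Alarm Attributes table (seconds).
MAJOR_THRESHOLD_S = 10.0
CRITICAL_THRESHOLD_S = 60.0


def latency_alarm_severity(samples_seconds: Sequence[float]) -> Optional[str]:
    """Classify the mean of latency samples taken in the last 5 minutes.

    Hypothetical helper: assumes an arithmetic mean and that the alarm
    is raised only when the mean exceeds a threshold.
    """
    if not samples_seconds:
        raise ValueError("at least one latency sample is required")
    average = mean(samples_seconds)
    if average > CRITICAL_THRESHOLD_S:
        return "Critical"
    if average > MAJOR_THRESHOLD_S:
        return "Major"
    return None


if __name__ == "__main__":
    # Made-up latency samples (seconds) for table-creation calls.
    print(latency_alarm_severity([5.0, 12.0, 20.0]))
```

<p>Because the check uses an average, a single slow table creation does not necessarily trigger the alarm, whereas sustained slowness does.</p>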
|
||||
</div>
|
||||
<div class="section" id="ALM-16052__en-us_topic_0000002019514985_section54187374"><h4 class="sectiontitle"><span id="ALM-16052__en-us_topic_0000002019514985_text1073853410916">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-16052__en-us_topic_0000002019514985_table15534429" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-16052__en-us_topic_0000002019514985_row48561591"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-16052__en-us_topic_0000002019514985_p66611417155311"><span id="ALM-16052__en-us_topic_0000002019514985_text22831541205318">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-16052__en-us_topic_0000002019514985_p41174828"><span id="ALM-16052__en-us_topic_0000002019514985_text17559163911916">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-16052__en-us_topic_0000002019514985_p46826794"><span id="ALM-16052__en-us_topic_0000002019514985_text1037918429916">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-16052__en-us_topic_0000002019514985_row1687144862510"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16052__en-us_topic_0000002019514985_p891212519535">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16052__en-us_topic_0000002019514985_p156438591896">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-16052__en-us_topic_0000002019514985_p187931338134115">Specifies the cluster for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16052__en-us_topic_0000002019514985_row34873944"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16052__en-us_topic_0000002019514985_p65062640">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16052__en-us_topic_0000002019514985_p33829733">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16052__en-us_topic_0000002019514985_row36032144"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16052__en-us_topic_0000002019514985_p35626567">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16052__en-us_topic_0000002019514985_p49481274">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16052__en-us_topic_0000002019514985_row42678285"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16052__en-us_topic_0000002019514985_p51620924">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16052__en-us_topic_0000002019514985_p34048007">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16052__en-us_topic_0000002019514985_row37996610"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16052__en-us_topic_0000002019514985_p3661017185311">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16052__en-us_topic_0000002019514985_p57826595">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-16052__en-us_topic_0000002019514985_p53442657">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-16052__en-us_topic_0000002019514985_section17924324"><h4 class="sectiontitle"><span id="ALM-16052__en-us_topic_0000002019514985_text590212459910">Impact on the System</span></h4><p id="ALM-16052__en-us_topic_0000002019514985_p33887971">If this alarm is generated, MetaStore takes a long time to insert table information into the meta database during table creation. As a result, calls to MetaStore APIs become slow or fail with an error.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16052__en-us_topic_0000002019514985_section27101193"><h4 class="sectiontitle"><span id="ALM-16052__en-us_topic_0000002019514985_text38482481894">Possible Causes</span></h4><p id="ALM-16052__en-us_topic_0000002019514985_p60571154">The MetaStore GC takes a long time or the meta database is abnormal (for example, the disk I/O usage is too high or there are too many long transactions).</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16052__en-us_topic_0000002019514985_section371559134715"><h4 class="sectiontitle"><span id="ALM-16052__en-us_topic_0000002019514985_text20124952495">Handling Procedure</span></h4><p id="ALM-16052__en-us_topic_0000002019514985_p1475721633118"><strong id="ALM-16052__en-us_topic_0000002019514985_b79031539183112">Check whether the GC time of MetaStore is too long.</strong></p>
|
||||
<ol id="ALM-16052__en-us_topic_0000002019514985_ol0325141516406"><li id="ALM-16052__en-us_topic_0000002019514985_li2325815164015"><span>Log in to MRS Manager, choose <strong id="ALM-16052__en-us_topic_0000002019514985_b107631022201911">O&M</strong> > <strong id="ALM-16052__en-us_topic_0000002019514985_b144182516198">Alarm</strong> > <strong id="ALM-16052__en-us_topic_0000002019514985_b1696782601911">Alarms</strong>, and check whether alarm <strong id="ALM-16052__en-us_topic_0000002019514985_b3921175911918">Heap Memory Usage of the Hive Process Exceeds the Threshold</strong> exists in the alarm list.</span><p><ul id="ALM-16052__en-us_topic_0000002019514985_ul332511518400"><li id="ALM-16052__en-us_topic_0000002019514985_li2325615124010">If yes, go to <a href="#ALM-16052__en-us_topic_0000002019514985_li032518153407">2</a>.</li><li id="ALM-16052__en-us_topic_0000002019514985_li732511155402">If no, go to <a href="#ALM-16052__en-us_topic_0000002019514985_li64031235409">4</a>.</li></ul>
|
||||
</p></li><li id="ALM-16052__en-us_topic_0000002019514985_li032518153407"><a name="ALM-16052__en-us_topic_0000002019514985_li032518153407"></a><a name="en-us_topic_0000002019514985_li032518153407"></a><span>Rectify the fault by following the handling procedure of <strong id="ALM-16052__en-us_topic_0000002019514985_b1897319762117">ALM-16005 Heap Memory Usage of the Hive Process Exceeds the Threshold</strong>.</span></li><li id="ALM-16052__en-us_topic_0000002019514985_li83257159401"><span>Check whether the alarm is cleared in the alarm list.</span><p><ul id="ALM-16052__en-us_topic_0000002019514985_ul123252159402"><li id="ALM-16052__en-us_topic_0000002019514985_li7325171515405">If yes, no further action is required.</li><li id="ALM-16052__en-us_topic_0000002019514985_li43251315194019">If no, go to <a href="#ALM-16052__en-us_topic_0000002019514985_li64031235409">4</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-16052__en-us_topic_0000002019514985_p5491163573911"><strong id="ALM-16052__en-us_topic_0000002019514985_b13914108174016">Check whether the meta database is normal.</strong></p>
|
||||
<ol start="4" id="ALM-16052__en-us_topic_0000002019514985_ol832520157405"><li id="ALM-16052__en-us_topic_0000002019514985_li64031235409"><a name="ALM-16052__en-us_topic_0000002019514985_li64031235409"></a><a name="en-us_topic_0000002019514985_li64031235409"></a><span>Contact the administrator of the cluster meta database to check whether the database is normal.</span><p><ul id="ALM-16052__en-us_topic_0000002019514985_ul144931976416"><li id="ALM-16052__en-us_topic_0000002019514985_li194933711417">If yes, go to <a href="#ALM-16052__en-us_topic_0000002019514985_li19517422154756">6</a>.</li><li id="ALM-16052__en-us_topic_0000002019514985_li145781013104113">If no, go to <a href="#ALM-16052__en-us_topic_0000002019514985_li032541514407">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-16052__en-us_topic_0000002019514985_li032541514407"><a name="ALM-16052__en-us_topic_0000002019514985_li032541514407"></a><a name="en-us_topic_0000002019514985_li032541514407"></a><span>Contact the meta database <span id="ALM-16052__en-us_topic_0000002019514985_text1275715251220">O&M personnel</span> to rectify the fault. After the meta database is restored, check whether the alarm is cleared in the alarm list.</span><p><ul id="ALM-16052__en-us_topic_0000002019514985_ul591211181832"><li id="ALM-16052__en-us_topic_0000002019514985_li19913718938">If yes, no further action is required.</li><li id="ALM-16052__en-us_topic_0000002019514985_li7439824239">If no, go to <a href="#ALM-16052__en-us_topic_0000002019514985_li19517422154756">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-16052__en-us_topic_0000002019514985_p18757678154812"><strong id="ALM-16052__en-us_topic_0000002019514985_b134171927102317">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-16052__en-us_topic_0000002019514985_ol1534104915533"><li id="ALM-16052__en-us_topic_0000002019514985_li19517422154756"><a name="ALM-16052__en-us_topic_0000002019514985_li19517422154756"></a><a name="en-us_topic_0000002019514985_li19517422154756"></a><span>On MRS Manager, choose <strong id="ALM-16052__en-us_topic_0000002019514985_b135442920238">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-16052__en-us_topic_0000002019514985_b113548293234">Log</strong> > <strong id="ALM-16052__en-us_topic_0000002019514985_b1135417292233">Download</strong>.</span></li><li id="ALM-16052__en-us_topic_0000002019514985_li23762613154756"><span>Expand the <strong id="ALM-16052__en-us_topic_0000002019514985_b442213472319">Service</strong> drop-down list, and select <strong id="ALM-16052__en-us_topic_0000002019514985_b114221034132311">Hive</strong> for the target cluster.</span></li><li id="ALM-16052__en-us_topic_0000002019514985_li46450927154756"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-16052__en-us_topic_0000002019514985_b178877386238">Start Date</strong> and <strong id="ALM-16052__en-us_topic_0000002019514985_b088793802316">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-16052__en-us_topic_0000002019514985_b1888716386238">Download</strong>.</span></li><li id="ALM-16052__en-us_topic_0000002019514985_li299017126594"><span>On MRS Manager, choose <strong id="ALM-16052__en-us_topic_0000002019514985_b17715144122320">Cluster</strong> > <strong id="ALM-16052__en-us_topic_0000002019514985_b18715114182313">Services</strong> > <strong id="ALM-16052__en-us_topic_0000002019514985_b117155416235">Hive</strong>. 
On the displayed <strong id="ALM-16052__en-us_topic_0000002019514985_b571514152315">Dashboard</strong> page, click <strong id="ALM-16052__en-us_topic_0000002019514985_b187151041112311">More</strong> and select <strong id="ALM-16052__en-us_topic_0000002019514985_b1871504116239">Collect Stack Information</strong>. On the displayed page, set the following parameters:</span><p><ul id="ALM-16052__en-us_topic_0000002019514985_ul37952019155910"><li id="ALM-16052__en-us_topic_0000002019514985_li172743511594">Select <strong id="ALM-16052__en-us_topic_0000002019514985_b1969616122265">MetaStore</strong> for the role where you want to collect data.</li><li id="ALM-16052__en-us_topic_0000002019514985_li13947145795919">Select <strong id="ALM-16052__en-us_topic_0000002019514985_b1236112218261">jstack</strong> and <strong id="ALM-16052__en-us_topic_0000002019514985_b2036112211264">Enable continuous collection of jstack and jmap -histo information</strong>.</li><li id="ALM-16052__en-us_topic_0000002019514985_li18905310116">Set the collection interval to 10 seconds and the duration to 2 minutes.</li></ul>
|
||||
</p></li><li id="ALM-16052__en-us_topic_0000002019514985_li185743711111"><span>Click <strong id="ALM-16052__en-us_topic_0000002019514985_b1682485082619">OK</strong>. After the collection is complete, click <strong id="ALM-16052__en-us_topic_0000002019514985_b198257504269">Download</strong>.</span></li><li id="ALM-16052__en-us_topic_0000002019514985_li7303743154756"><span>Contact <span id="ALM-16052__en-us_topic_0000002019514985_text126301214142412">O&M personnel</span> and provide the collected logs and stack information.</span></li></ol>
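<p>The log collection window described in the steps above can be worked out in advance. The following is a minimal sketch, assuming the alarm generation time is already known as a local timestamp; the function name <strong>log_collection_window</strong> and the sample timestamp are illustrative only and are not part of MRS Manager.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin_minutes=10):
    """Return (start, end) for log collection around an alarm.

    The 10-minute margin follows the handling procedure; the function
    itself is a hypothetical helper, not an MRS Manager API.
    """
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time used only for illustration.
start_date, end_date = log_collection_window(datetime(2025, 8, 6, 14, 30))
print("Start Date:", start_date)
print("End Date:  ", end_date)
```

<p>The two printed values are what you would enter as <strong>Start Date</strong> and <strong>End Date</strong> before clicking <strong>Download</strong>.</p>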
|
||||
</div>
|
||||
<div class="section" id="ALM-16052__en-us_topic_0000002019514985_section169311343318"><h4 class="sectiontitle"><span id="ALM-16052__en-us_topic_0000002019514985_text36991258595">Alarm Clearance</span></h4><p id="ALM-16052__en-us_topic_0000002019514985_p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16052__en-us_topic_0000002019514985_section47713037"><h4 class="sectiontitle"><span id="ALM-16052__en-us_topic_0000002019514985_text156071924105">Related Information</span></h4><p id="ALM-16052__en-us_topic_0000002019514985_p31026455"><span id="ALM-16052__en-us_topic_0000002019514985_text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
97
docs/mrs/umn/ALM-16053.html
Normal file
@ -0,0 +1,97 @@
|
||||
<a name="ALM-16053"></a><a name="ALM-16053"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-16053 Average HQL Submission Time of Hive in the Last 5 Minutes Exceeds the Threshold</h1>
|
||||
<div id="body0000001971007542"><div class="section" id="ALM-16053__en-us_topic_0000001982875596_section14753556"><h4 class="sectiontitle"><span id="ALM-16053__en-us_topic_0000001982875596_text194151118891">Alarm Description</span></h4><p id="ALM-16053__en-us_topic_0000001982875596_p9833111914396">The system periodically checks the average HQL submission time, which is the time for calling the MapReduce/Spark/Tez APIs to submit Yarn jobs, including the time for uploading dependent temporary JAR packages and splitting files. This alarm is generated when the average HQL submission time exceeds the threshold.</p>
|
||||
<p id="ALM-16053__en-us_topic_0000001982875596_p9432052193112">This alarm is cleared when the average HQL submission time falls below the threshold.</p>
|
||||
<p id="ALM-16053__en-us_topic_0000001982875596_p175307509351">This alarm applies to MRS 3.5.0 or later.</p>
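<p>To illustrate what an average over the last 5 minutes means, the following sketch keeps a sliding window of submission times and averages only the samples that fall inside it. This is an assumption for illustration: the document does not describe how MRS computes the metric internally, and the class name <strong>RollingAverage</strong> and its methods are hypothetical.</p>

```python
from collections import deque


class RollingAverage:
    """Average of samples observed within the last `window_seconds`."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        # Each entry is (timestamp_seconds, submission_time_seconds).
        self.samples = deque()

    def add(self, timestamp, submission_seconds):
        """Record one HQL submission time and drop expired samples."""
        self.samples.append((timestamp, submission_seconds))
        self._evict(timestamp)

    def _evict(self, now):
        # Remove samples that are older than the window.
        while self.samples and self.samples[0][0] <= now - self.window_seconds:
            self.samples.popleft()

    def average(self, now):
        """Return the windowed average, or None if no samples remain."""
        self._evict(now)
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


window = RollingAverage()
window.add(0, 100)    # submission took 100 seconds at t=0
window.add(100, 200)  # submission took 200 seconds at t=100
print(window.average(100))
```

<p>A single slow submission therefore raises the average only for as long as it stays inside the window.</p>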
|
||||
</div>
|
||||
<div class="section" id="ALM-16053__en-us_topic_0000001982875596_section65673142"><h4 class="sectiontitle"><span id="ALM-16053__en-us_topic_0000001982875596_text47499211097">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-16053__en-us_topic_0000001982875596_table2697805" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-16053__en-us_topic_0000001982875596_row10450762"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-16053__en-us_topic_0000001982875596_p41205356"><span id="ALM-16053__en-us_topic_0000001982875596_text18577824394">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-16053__en-us_topic_0000001982875596_p49299555"><span id="ALM-16053__en-us_topic_0000001982875596_text1247920271299">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-16053__en-us_topic_0000001982875596_p33841047"><span id="ALM-16053__en-us_topic_0000001982875596_text1817013114915">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-16053__en-us_topic_0000001982875596_row56770287"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-16053__en-us_topic_0000001982875596_p34990548">16053</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-16053__en-us_topic_0000001982875596_p1871702316619">Critical (default threshold: 240 seconds)</p>
|
||||
<p id="ALM-16053__en-us_topic_0000001982875596_p107068377612">Major (default threshold: 120 seconds)</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-16053__en-us_topic_0000001982875596_p60672611">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
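<p>The two default thresholds in the table can be read as a simple mapping from the measured average to a severity. The sketch below assumes that "exceeds" means strictly greater than the threshold and that the higher severity takes precedence; the function name <strong>classify_severity</strong> is hypothetical and the thresholds may have been changed on your cluster.</p>

```python
def classify_severity(avg_seconds, major_threshold=120, critical_threshold=240):
    """Map an average HQL submission time to an alarm severity.

    Defaults mirror the documented default thresholds. Returns None
    when the value does not exceed any threshold (no alarm).
    """
    if avg_seconds > critical_threshold:
        return "Critical"
    if avg_seconds > major_threshold:
        return "Major"
    return None


for value in (90, 150, 300):
    print(value, classify_severity(value))
```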
|
||||
</div>
|
||||
<div class="section" id="ALM-16053__en-us_topic_0000001982875596_section54187374"><h4 class="sectiontitle"><span id="ALM-16053__en-us_topic_0000001982875596_text1073853410916">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-16053__en-us_topic_0000001982875596_table15534429" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-16053__en-us_topic_0000001982875596_row48561591"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-16053__en-us_topic_0000001982875596_p66611417155311"><span id="ALM-16053__en-us_topic_0000001982875596_text22831541205318">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-16053__en-us_topic_0000001982875596_p41174828"><span id="ALM-16053__en-us_topic_0000001982875596_text17559163911916">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-16053__en-us_topic_0000001982875596_p46826794"><span id="ALM-16053__en-us_topic_0000001982875596_text1037918429916">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-16053__en-us_topic_0000001982875596_row1687144862510"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16053__en-us_topic_0000001982875596_p891212519535">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16053__en-us_topic_0000001982875596_p156438591896">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-16053__en-us_topic_0000001982875596_p187931338134115">Specifies the cluster for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16053__en-us_topic_0000001982875596_row34873944"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16053__en-us_topic_0000001982875596_p65062640">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16053__en-us_topic_0000001982875596_p33829733">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16053__en-us_topic_0000001982875596_row36032144"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16053__en-us_topic_0000001982875596_p35626567">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16053__en-us_topic_0000001982875596_p49481274">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16053__en-us_topic_0000001982875596_row42678285"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16053__en-us_topic_0000001982875596_p51620924">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16053__en-us_topic_0000001982875596_p34048007">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-16053__en-us_topic_0000001982875596_row37996610"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-16053__en-us_topic_0000001982875596_p3661017185311">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-16053__en-us_topic_0000001982875596_p57826595">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-16053__en-us_topic_0000001982875596_p53442657">Specifies the alarm triggering condition.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-16053__en-us_topic_0000001982875596_section17924324"><h4 class="sectiontitle"><span id="ALM-16053__en-us_topic_0000001982875596_text590212459910">Impact on the System</span></h4><p id="ALM-16053__en-us_topic_0000001982875596_p33887971">If this alarm is generated, the average HQL submission time in the last 5 minutes exceeds the threshold. As a result, HQL statements take longer to run, and errors may occur in Hive on Spark jobs.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16053__en-us_topic_0000001982875596_section27101193"><h4 class="sectiontitle"><span id="ALM-16053__en-us_topic_0000001982875596_text38482481894">Possible Causes</span></h4><p id="ALM-16053__en-us_topic_0000001982875596_p60571154">The HiveServer GC time is too long or the HDFS NameNode/Router RPC latency is too high.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16053__en-us_topic_0000001982875596_section371559134715"><h4 class="sectiontitle"><span id="ALM-16053__en-us_topic_0000001982875596_text20124952495">Handling Procedure</span></h4><p id="ALM-16053__en-us_topic_0000001982875596_p1475721633118"><strong id="ALM-16053__en-us_topic_0000001982875596_b1511865993615">Check whether the GC time of HiveServer is too long.</strong></p>
|
||||
<ol id="ALM-16053__en-us_topic_0000001982875596_ol8822858203413"><li id="ALM-16053__en-us_topic_0000001982875596_li1582205817344"><span>Log in to MRS Manager, choose <strong id="ALM-16053__en-us_topic_0000001982875596_b74241193372">O&M</strong> > <strong id="ALM-16053__en-us_topic_0000001982875596_b042410933712">Alarm</strong> > <strong id="ALM-16053__en-us_topic_0000001982875596_b64251896375">Alarms</strong>, and check whether alarm <strong id="ALM-16053__en-us_topic_0000001982875596_b104261691374">Heap Memory Usage of the Hive Process Exceeds the Threshold</strong> exists in the alarm list.</span><p><ul id="ALM-16053__en-us_topic_0000001982875596_ul982215581348"><li id="ALM-16053__en-us_topic_0000001982875596_li1982255883410">If yes, go to <a href="#ALM-16053__en-us_topic_0000001982875596_li12822145813344">2</a>.</li><li id="ALM-16053__en-us_topic_0000001982875596_li7822195810345">If no, go to <a href="#ALM-16053__en-us_topic_0000001982875596_li7821358163411">4</a>.</li></ul>
|
||||
</p></li><li id="ALM-16053__en-us_topic_0000001982875596_li12822145813344"><a name="ALM-16053__en-us_topic_0000001982875596_li12822145813344"></a><a name="en-us_topic_0000001982875596_li12822145813344"></a><span>Rectify the fault by following the handling procedure of <strong id="ALM-16053__en-us_topic_0000001982875596_b10726132512376">ALM-16005 Heap Memory Usage of the Hive Process Exceeds the Threshold</strong>.</span></li><li id="ALM-16053__en-us_topic_0000001982875596_li6822135853416"><span>Check whether the alarm is cleared in the alarm list.</span><p><ul id="ALM-16053__en-us_topic_0000001982875596_ul78221058103411"><li id="ALM-16053__en-us_topic_0000001982875596_li1582245813346">If yes, no further action is required.</li><li id="ALM-16053__en-us_topic_0000001982875596_li68222589347">If no, go to <a href="#ALM-16053__en-us_topic_0000001982875596_li7821358163411">4</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-16053__en-us_topic_0000001982875596_p15209185510317"><strong id="ALM-16053__en-us_topic_0000001982875596_b1767952683216">Check whether the HDFS RPC latency is too high.</strong></p>
|
||||
<ol start="4" id="ALM-16053__en-us_topic_0000001982875596_ol13822165819341"><li id="ALM-16053__en-us_topic_0000001982875596_li7821358163411"><a name="ALM-16053__en-us_topic_0000001982875596_li7821358163411"></a><a name="en-us_topic_0000001982875596_li7821358163411"></a><span>Check whether alarm <strong id="ALM-16053__en-us_topic_0000001982875596_b416992994718">Average NameNode RPC Processing Time Exceeds the Threshold</strong> exists in the alarm list.</span><p><ul id="ALM-16053__en-us_topic_0000001982875596_ul1182115818347"><li id="ALM-16053__en-us_topic_0000001982875596_li9821175813343">If yes, go to <a href="#ALM-16053__en-us_topic_0000001982875596_li1882125813417">5</a>.</li><li id="ALM-16053__en-us_topic_0000001982875596_li108212058113415">If no, go to <a href="#ALM-16053__en-us_topic_0000001982875596_li598725474015">7</a>.</li></ul>
|
||||
</p></li><li id="ALM-16053__en-us_topic_0000001982875596_li1882125813417"><a name="ALM-16053__en-us_topic_0000001982875596_li1882125813417"></a><a name="en-us_topic_0000001982875596_li1882125813417"></a><span>Rectify the fault by following the handling procedure of <strong id="ALM-16053__en-us_topic_0000001982875596_b1290513214487">ALM-14021 Average NameNode RPC Processing Time Exceeds the Threshold</strong>.</span></li><li id="ALM-16053__en-us_topic_0000001982875596_li282265816349"><span>Check whether the alarm is cleared in the alarm list.</span><p><ul id="ALM-16053__en-us_topic_0000001982875596_ul1182213587345"><li id="ALM-16053__en-us_topic_0000001982875596_li78214588349">If yes, no further action is required.</li><li id="ALM-16053__en-us_topic_0000001982875596_li3822758123420">If no, go to <a href="#ALM-16053__en-us_topic_0000001982875596_li598725474015">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-16053__en-us_topic_0000001982875596_p18757678154812"><strong id="ALM-16053__en-us_topic_0000001982875596_b1418116342483">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-16053__en-us_topic_0000001982875596_ol1534104915533"><li id="ALM-16053__en-us_topic_0000001982875596_li598725474015"><a name="ALM-16053__en-us_topic_0000001982875596_li598725474015"></a><a name="en-us_topic_0000001982875596_li598725474015"></a><span>On MRS Manager, choose <strong id="ALM-16053__en-us_topic_0000001982875596_b1371835204815">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-16053__en-us_topic_0000001982875596_b1537110353486">Log</strong> > <strong id="ALM-16053__en-us_topic_0000001982875596_b19371123515484">Download</strong>.</span></li><li id="ALM-16053__en-us_topic_0000001982875596_li23762613154756"><span>Expand the <strong id="ALM-16053__en-us_topic_0000001982875596_b13776103614814">Service</strong> drop-down list, and select <strong id="ALM-16053__en-us_topic_0000001982875596_b0776536194819">Hive</strong> for the target cluster.</span></li><li id="ALM-16053__en-us_topic_0000001982875596_li46450927154756"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-16053__en-us_topic_0000001982875596_b3308173814487">Start Date</strong> and <strong id="ALM-16053__en-us_topic_0000001982875596_b83092382484">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-16053__en-us_topic_0000001982875596_b1430933816484">Download</strong>.</span></li><li id="ALM-16053__en-us_topic_0000001982875596_li299017126594"><span>On MRS Manager, choose <strong id="ALM-16053__en-us_topic_0000001982875596_b13962405486">Cluster</strong> > <strong id="ALM-16053__en-us_topic_0000001982875596_b20396440154817">Services</strong> > <strong id="ALM-16053__en-us_topic_0000001982875596_b123961640134818">Hive</strong>. 
On the displayed <strong id="ALM-16053__en-us_topic_0000001982875596_b93964407487">Dashboard</strong> page, click <strong id="ALM-16053__en-us_topic_0000001982875596_b19396154084819">More</strong> and select <strong id="ALM-16053__en-us_topic_0000001982875596_b839611403489">Collect Stack Information</strong>. On the displayed page, set the following parameters:</span><p><ul id="ALM-16053__en-us_topic_0000001982875596_ul37952019155910"><li id="ALM-16053__en-us_topic_0000001982875596_li172743511594">Select <strong id="ALM-16053__en-us_topic_0000001982875596_b206507423483">HiveServer</strong> for the role where you want to collect data.</li><li id="ALM-16053__en-us_topic_0000001982875596_li13947145795919">Select <strong id="ALM-16053__en-us_topic_0000001982875596_b1200450104815">jstack</strong> and <strong id="ALM-16053__en-us_topic_0000001982875596_b102001050104817">Enable continuous collection of jstack and jmap -histo information</strong>.</li><li id="ALM-16053__en-us_topic_0000001982875596_li18905310116">Set the collection interval to 10 seconds and the duration to 2 minutes.</li></ul>
|
||||
</p></li><li id="ALM-16053__en-us_topic_0000001982875596_li185743711111"><span>Click <strong id="ALM-16053__en-us_topic_0000001982875596_b1697954114813">OK</strong>. After the collection is complete, click <strong id="ALM-16053__en-us_topic_0000001982875596_b09725418484">Download</strong>.</span></li><li id="ALM-16053__en-us_topic_0000001982875596_li7303743154756"><span>Contact <span id="ALM-16053__en-us_topic_0000001982875596_text03191755144815">O&M personnel</span> and provide the collected logs and stack information.</span></li></ol>
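<p>As a rough check on the downloaded stack information, the collection settings above (a 10-second interval over 2 minutes) suggest about 12 samples. The sketch below is plain arithmetic under the assumption that one sample is taken per interval; the actual tool may collect one sample more or fewer, and the function name <strong>expected_samples</strong> is hypothetical.</p>

```python
def expected_samples(interval_seconds=10, duration_seconds=120):
    """Estimate how many jstack samples a continuous collection yields.

    Assumes exactly one sample per full interval; this is an
    approximation, not documented behavior of the collection tool.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return duration_seconds // interval_seconds


print(expected_samples())
```

<p>If the archive contains far fewer files than this estimate, the collection may have been interrupted and is worth repeating before you hand the data over.</p>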
|
||||
</div>
|
||||
<div class="section" id="ALM-16053__en-us_topic_0000001982875596_section169311343318"><h4 class="sectiontitle"><span id="ALM-16053__en-us_topic_0000001982875596_text36991258595">Alarm Clearance</span></h4><p id="ALM-16053__en-us_topic_0000001982875596_p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-16053__en-us_topic_0000001982875596_section47713037"><h4 class="sectiontitle"><span id="ALM-16053__en-us_topic_0000001982875596_text156071924105">Related Information</span></h4><p id="ALM-16053__en-us_topic_0000001982875596_p31026455"><span id="ALM-16053__en-us_topic_0000001982875596_text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
89
docs/mrs/umn/ALM-17008.html
Normal file
@ -0,0 +1,89 @@
|
||||
<a name="ALM-17008"></a><a name="ALM-17008"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-17008 Abnormal Connection Between Oozie and ZooKeeper</h1>
|
||||
<div id="body0000001480131269"><div class="section" id="ALM-17008__section17558163642310"><h4 class="sectiontitle"><span id="ALM-17008__text141671903013">Alarm Description</span></h4><p id="ALM-17008__p9749152652116">In HA mode, Oozie depends on ZooKeeper. This alarm is generated when the connection between Oozie and ZooKeeper is detected as abnormal three consecutive times.</p>
|
||||
<p id="ALM-17008__p6622162219224">This alarm is cleared when the connection between Oozie and ZooKeeper becomes normal.</p>
|
||||
<p id="ALM-17008__p175307509351">This alarm applies to MRS 3.3.0 or later.</p>
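<p>The trigger and clear conditions above amount to a consecutive-failure counter. The following is a minimal sketch of that logic, assuming each connection check yields a simple success or failure and that one successful check clears the alarm; the class name <strong>ConsecutiveFailureAlarm</strong> is hypothetical and does not reflect the actual Oozie implementation.</p>

```python
class ConsecutiveFailureAlarm:
    """Raise an alarm after `threshold` failed checks in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0
        self.active = False

    def record(self, connected):
        """Record one check result and return whether the alarm is active."""
        if connected:
            # A normal connection resets the counter and clears the alarm.
            self.failures = 0
            self.active = False
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.active = True
        return self.active


alarm = ConsecutiveFailureAlarm()
checks = [False, False, True, False, False, False, True]
print([alarm.record(result) for result in checks])
```

<p>Two failures followed by a success do not raise the alarm, because the counter starts again from zero after every successful check.</p>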
|
||||
</div>
|
||||
<div class="section" id="ALM-17008__section0241152918232"><h4 class="sectiontitle"><span id="ALM-17008__text10304425193010">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17008__table36969235" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17008__row42433012"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-17008__p6820189"><span id="ALM-17008__text15625428153013">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-17008__p15564413"><span id="ALM-17008__text1785183153010">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-17008__p52757950"><span id="ALM-17008__text0945193443018">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-17008__row21396528"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-17008__p9126191145711">17008</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-17008__p1912519111570">Minor</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-17008__p13123014577">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-17008__section134680194295"><h4 class="sectiontitle"><span id="ALM-17008__text1656918386302">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17008__table33959796" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17008__row12041852"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-17008__p10293461"><span id="ALM-17008__text336474319309">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-17008__p28464046"><span id="ALM-17008__text9445144613014">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-17008__row6761927132120"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17008__p156438591896">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17008__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17008__row32756501"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17008__p65062640">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17008__p32823928">Specifies the service for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17008__row26979903"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17008__p35626567">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17008__p49072303">Specifies the role for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17008__row38997545"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17008__p51620924">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17008__p43906777">Specifies the host for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17008__row59616680"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17008__p64221796">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17008__p34583021">Specifies the threshold for triggering the alarm.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-17008__section7522603"><h4 class="sectiontitle"><span id="ALM-17008__text1741875023020">Impact on the System</span></h4><p id="ALM-17008__p14916535293">Running scheduling tasks are blocked and new scheduling tasks cannot be submitted. In HA mode, the Oozie service will restart if this alarm is reported.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17008__section594567"><h4 class="sectiontitle"><span id="ALM-17008__text32611054143010">Possible Causes</span></h4><ul id="ALM-17008__ul241163443715"><li id="ALM-17008__li1188327193">The ZooKeeper service is abnormal.</li><li id="ALM-17008__li3439144013713">Oozie fails to connect to ZooKeeper.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-17008__section16780114115290"><h4 class="sectiontitle"><span id="ALM-17008__text13657135716302">Handling Procedure</span></h4><p id="ALM-17008__p8305102222517"><strong id="ALM-17008__b1226145915255">Check the ZooKeeper service status.</strong></p>
|
||||
<ol id="ALM-17008__ol16711171114299"><li id="ALM-17008__li19710611182916"><span>In the service list on MRS Manager, check whether <strong id="ALM-17008__b1751978068113529">Running Status</strong> of ZooKeeper is <strong id="ALM-17008__b481266500113529">Normal</strong>.</span><p><ul id="ALM-17008__ul1471016119299"><li id="ALM-17008__li1171016113294">If yes, go to <a href="#ALM-17008__li571051132910">5</a>.</li><li id="ALM-17008__li371031132918">If no, go to <a href="#ALM-17008__li107100116298">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-17008__li107100116298"><a name="ALM-17008__li107100116298"></a><a name="li107100116298"></a><span>In the alarm list, check whether <strong id="ALM-17008__b1380745416275">ALM-13000 ZooKeeper Service Unavailable</strong> is reported.</span><p><ul id="ALM-17008__ul10710511112911"><li id="ALM-17008__li15710411122914">If yes, go to <a href="#ALM-17008__li197112115297">3</a>.</li><li id="ALM-17008__li13710191162913">If no, go to <a href="#ALM-17008__li571051132910">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-17008__li197112115297"><a name="ALM-17008__li197112115297"></a><a name="li197112115297"></a><span>Rectify the fault by performing the operations provided for <a href="ALM-13000.html">ALM-13000 ZooKeeper Service Unavailable</a>.</span></li><li id="ALM-17008__li167111811182912"><span>Wait for several minutes and check whether the alarm <strong id="ALM-17008__b43532622813">Abnormal Connection Between Oozie and ZooKeeper</strong> is cleared.</span><p><ul id="ALM-17008__ul1971119114297"><li id="ALM-17008__li17111511172912">If yes, no further action is required.</li><li id="ALM-17008__li1771141117290">If no, go to <a href="#ALM-17008__li571051132910">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-17008__p1799018108279"><strong id="ALM-17008__b1430845419287">Check the connectivity between Oozie and ZooKeeper.</strong></p>
|
||||
<ol start="5" id="ALM-17008__ol127108116298"><li id="ALM-17008__li571051132910"><a name="ALM-17008__li571051132910"></a><a name="li571051132910"></a><span>Log in to MRS Manager, choose <strong id="ALM-17008__b4826165573011">O&M</strong> > <strong id="ALM-17008__b982645516305">Log</strong> > <strong id="ALM-17008__b9826125516301">Online Search</strong>, select the Oozie service, and search for the keyword <strong id="ALM-17008__b38266556305">[Oozie Alarm Enhancement][ZooKeeper]</strong> in the log. View the cause in the log, and rectify the fault. In the alarm list, check whether the alarm <strong id="ALM-17008__b565117743116">Abnormal Connection Between Oozie and ZooKeeper</strong> is cleared.</span><p><ul id="ALM-17008__ul11710711182918"><li id="ALM-17008__li177104118290">If yes, no further action is required.</li><li id="ALM-17008__li171031112294">If no, go to <a href="#ALM-17008__li113211349123620">6</a>.</li></ul>
|
||||
</p></li></ol>
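<p>The keyword search in step 5 can also be reproduced offline on a downloaded log file. The following is a minimal sketch, assuming the log is available as plain text lines; the function name and the sample lines are hypothetical and are not taken from Oozie. The keyword is matched as a literal substring, so its square brackets need no escaping (a regular-expression tool would require them to be escaped).</p>

```python
# Literal keyword from step 5; the brackets are ordinary characters here.
KEYWORD = "[Oozie Alarm Enhancement][ZooKeeper]"


def find_alarm_lines(log_lines, keyword=KEYWORD):
    """Return the log lines that contain the keyword as a literal substring."""
    return [line for line in log_lines if keyword in line]


# Hypothetical log lines, invented only to exercise the filter.
sample_log = [
    "2025-01-01 10:00:00 INFO  job submitted",
    "2025-01-01 10:00:05 ERROR [Oozie Alarm Enhancement][ZooKeeper] connection check failed",
    "2025-01-01 10:00:06 ERROR [Oozie Alarm Enhancement][HDFS] connection check failed",
]

matches = find_alarm_lines(sample_log)
for line in matches:
    print(line)
```

<p>Only the ZooKeeper-related entry is returned; entries tagged for other dependencies are ignored.</p>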
|
||||
<p id="ALM-17008__p10510389"><strong id="ALM-17008__b41027473192141">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-17008__ol2322174912366"><li id="ALM-17008__li113211349123620"><a name="ALM-17008__li113211349123620"></a><a name="li113211349123620"></a><span>On MRS Manager, choose <strong id="ALM-17008__b965845778113530">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-17008__b534383092113530">Log</strong> > <strong id="ALM-17008__b1325857938113530">Download</strong>.</span></li><li id="ALM-17008__li5321149203613"><span>Select <strong id="ALM-17008__b12949142003716">Oozie</strong> for <strong id="ALM-17008__b199171014163715">Service</strong> and click <strong id="ALM-17008__b114889240377">OK</strong>.</span></li><li id="ALM-17008__li33221549123612"><span>Click <span><img id="ALM-17008__image832124913364" src="en-us_image_0000002415853277.png"></span> in the upper right corner, and set <strong id="ALM-17008__b959249210113530">Start Date</strong> and <strong id="ALM-17008__b423222151113530">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-17008__b2139765547113530">Download</strong>.</span></li><li id="ALM-17008__li13322124973618"><span>Contact <span id="ALM-17008__text1932204919369">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-17008__section169311343318"><h4 class="sectiontitle"><span id="ALM-17008__text11828153133114">Alarm Clearance</span></h4><p id="ALM-17008__p330820123019">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17008__section40120107"><h4 class="sectiontitle"><span id="ALM-17008__text1428897183118">Related Information</span></h4><p id="ALM-17008__p104585259303"><span id="ALM-17008__text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
89
docs/mrs/umn/ALM-17009.html
Normal file
@ -0,0 +1,89 @@
|
||||
<a name="ALM-17009"></a><a name="ALM-17009"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-17009 Abnormal Connection Between Oozie and DBService</h1>
|
||||
<div id="body0000001430011836"><div class="section" id="ALM-17009__section17558163642310"><h4 class="sectiontitle"><span id="ALM-17009__text141671903013">Alarm Description</span></h4><p id="ALM-17009__p15361191315219">Oozie depends on DBService. After a task is submitted, the system checks DBService connectivity. This alarm is generated when the connectivity check fails 10 consecutive times.</p>
|
||||
<p id="ALM-17009__p14655103992310">This alarm is cleared when the connection between Oozie and DBService becomes normal.</p>
|
||||
<p id="ALM-17009__p175307509351">This alarm applies to MRS 3.3.0 or later.</p>
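<p>The trigger and clearance conditions described above amount to a consecutive-failure counter. The sketch below illustrates that idea only; the class and method names are hypothetical, and resetting the counter on the first successful check is an assumption, because the actual Oozie implementation is not described here.</p>

```python
class ConsecutiveFailureAlarm:
    """Raise an alarm after `threshold` failed checks in a row (illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0
        self.active = False

    def record(self, check_ok):
        """Record one connectivity check result and return the alarm state."""
        if check_ok:
            # Assumption: a single successful check resets the count and clears the alarm.
            self.failures = 0
            self.active = False
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.active = True
        return self.active


# Threshold of 10 matches the description of this alarm.
alarm = ConsecutiveFailureAlarm(threshold=10)
states = [alarm.record(False) for _ in range(10)]
cleared_state = alarm.record(True)
```

<p>With a threshold of 10, the alarm state stays inactive for the first nine failed checks and becomes active on the tenth; a later successful check returns it to inactive.</p>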
|
||||
</div>
|
||||
<div class="section" id="ALM-17009__section0241152918232"><h4 class="sectiontitle"><span id="ALM-17009__text10304425193010">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17009__table36969235" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17009__row42433012"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-17009__p6820189"><span id="ALM-17009__text15625428153013">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-17009__p15564413"><span id="ALM-17009__text1785183153010">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-17009__p52757950"><span id="ALM-17009__text0945193443018">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-17009__row21396528"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-17009__p9126191145711">17009</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-17009__p1912519111570">Minor</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-17009__p13123014577">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-17009__section134680194295"><h4 class="sectiontitle"><span id="ALM-17009__text1656918386302">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17009__table33959796" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17009__row12041852"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-17009__p10293461"><span id="ALM-17009__text336474319309">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-17009__p28464046"><span id="ALM-17009__text9445144613014">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-17009__row6761927132120"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17009__p156438591896">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17009__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17009__row32756501"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17009__p65062640">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17009__p32823928">Specifies the service for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17009__row26979903"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17009__p35626567">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17009__p49072303">Specifies the role for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17009__row38997545"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17009__p51620924">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17009__p43906777">Specifies the host for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17009__row59616680"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17009__p64221796">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17009__p34583021">Specifies the threshold for triggering the alarm.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-17009__section7522603"><h4 class="sectiontitle"><span id="ALM-17009__text1741875023020">Impact on the System</span></h4><p id="ALM-17009__p149391530114416">Running scheduling tasks are blocked and new scheduling tasks cannot be submitted.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17009__section594567"><h4 class="sectiontitle"><span id="ALM-17009__text32611054143010">Possible Causes</span></h4><ul id="ALM-17009__ul241163443715"><li id="ALM-17009__li1188327193">The DBService service is abnormal.</li><li id="ALM-17009__li3439144013713">Oozie fails to connect to DBService.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-17009__section16780114115290"><h4 class="sectiontitle"><span id="ALM-17009__text13657135716302">Handling Procedure</span></h4><p id="ALM-17009__p284344852311"><strong id="ALM-17009__b177581420102410">Check the DBService status.</strong></p>
|
||||
<ol id="ALM-17009__ol591224932419"><li id="ALM-17009__li6912194922417"><span>In the service list on MRS Manager, check whether <strong id="ALM-17009__b3287205313394">Running Status</strong> of DBService is <strong id="ALM-17009__b9287253203911">Normal</strong>.</span><p><ul id="ALM-17009__ul179121649112415"><li id="ALM-17009__li691214494248">If yes, go to <a href="#ALM-17009__li1877010345257">5</a>.</li><li id="ALM-17009__li179121649182418">If no, go to <a href="#ALM-17009__li1691284919247">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-17009__li1691284919247"><a name="ALM-17009__li1691284919247"></a><a name="li1691284919247"></a><span>In the alarm list, check whether <strong id="ALM-17009__b17338145111438">ALM-27001 DBService Service Unavailable</strong> is reported.</span><p><ul id="ALM-17009__ul14912164902420"><li id="ALM-17009__li139207456260">If yes, go to <a href="#ALM-17009__li139121449162411">3</a>.</li><li id="ALM-17009__li9912194916243">If no, go to <a href="#ALM-17009__li1877010345257">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-17009__li139121449162411"><a name="ALM-17009__li139121449162411"></a><a name="li139121449162411"></a><span>Rectify the fault by performing the operations provided for <a href="ALM-27001.html">ALM-27001 DBService Service Unavailable</a>.</span></li><li id="ALM-17009__li891274932411"><span>Wait for several minutes and check whether the alarm <strong id="ALM-17009__b192995034415">Abnormal Connection Between Oozie and DBService</strong> is cleared.</span><p><ul id="ALM-17009__ul159121849112417"><li id="ALM-17009__li6912184919246">If yes, no further action is required.</li><li id="ALM-17009__li139121049122410">If no, go to <a href="#ALM-17009__li1877010345257">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-17009__p9228113862418"><strong id="ALM-17009__b35772027457">Check the connectivity between Oozie and DBService.</strong></p>
|
||||
<ol start="5" id="ALM-17009__ol8770133419253"><li id="ALM-17009__li1877010345257"><a name="ALM-17009__li1877010345257"></a><a name="li1877010345257"></a><span>Log in to MRS Manager, choose <strong id="ALM-17009__b20410554324">O&M</strong> > <strong id="ALM-17009__b1741855173217">Log</strong> > <strong id="ALM-17009__b12455514328">Online Search</strong>, select the Oozie service, and search for the keyword <strong id="ALM-17009__b124135518323">[Oozie Alarm Enhancement][DB Service]</strong> in the log. View the cause in the log, and rectify the fault. In the alarm list, check whether the alarm <strong id="ALM-17009__b109481951337">Abnormal Connection Between Oozie and DBService</strong> is cleared.</span><p><ul id="ALM-17009__ul577015344253"><li id="ALM-17009__li1277083472513">If yes, no further action is required.</li><li id="ALM-17009__li19770153432519">If no, go to <a href="#ALM-17009__li16610154216258">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-17009__p10510389"><strong id="ALM-17009__b41027473192141">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-17009__ol13610144282520"><li id="ALM-17009__li16610154216258"><a name="ALM-17009__li16610154216258"></a><a name="li16610154216258"></a><span>On MRS Manager, choose <strong id="ALM-17009__b11290834204519">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-17009__b1329073484510">Log</strong> > <strong id="ALM-17009__b1290143416458">Download</strong>.</span></li><li id="ALM-17009__li166104426257"><span>Select <strong id="ALM-17009__b381913183467">Oozie</strong> for <strong id="ALM-17009__b68191118154616">Service</strong> and click <strong id="ALM-17009__b58201218194613">OK</strong>.</span></li><li id="ALM-17009__li12610124242519"><span>Click <span><img id="ALM-17009__image1761044214257" src="en-us_image_0000002382453800.png"></span> in the upper right corner, and set <strong id="ALM-17009__b45861621194619">Start Date</strong> and <strong id="ALM-17009__b19586172118462">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-17009__b19587132134610">Download</strong>.</span></li><li id="ALM-17009__li56101542152510"><span>Contact <span id="ALM-17009__text15884143294614">O&M personnel</span> and provide the collected logs.</span></li></ol>
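<p>The log collection window in the steps above is the alarm generation time plus and minus 10 minutes. As a small worked example, assuming the alarm time is known as a timestamp (the helper name and the sample time are hypothetical):</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, used only for illustration.
alarm_time = datetime(2025, 1, 1, 10, 0, 0)
start_date, end_date = log_window(alarm_time)
print(start_date, end_date)
```

<p>For an alarm generated at 10:00:00, this gives 09:50:00 for Start Date and 10:10:00 for End Date.</p>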
|
||||
</div>
|
||||
<div class="section" id="ALM-17009__section169311343318"><h4 class="sectiontitle"><span id="ALM-17009__text11828153133114">Alarm Clearance</span></h4><p id="ALM-17009__p330820123019">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17009__section40120107"><h4 class="sectiontitle"><span id="ALM-17009__text1428897183118">Related Information</span></h4><p id="ALM-17009__p104585259303"><span id="ALM-17009__text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
89
docs/mrs/umn/ALM-17010.html
Normal file
@ -0,0 +1,89 @@
|
||||
<a name="ALM-17010"></a><a name="ALM-17010"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-17010 Abnormal Connection Between Oozie and HDFS</h1>
|
||||
<div id="body0000001480251229"><div class="section" id="ALM-17010__section17558163642310"><h4 class="sectiontitle"><span id="ALM-17010__text141671903013">Alarm Description</span></h4><p id="ALM-17010__p895315602811">Oozie depends on HDFS. After a task is submitted, the system checks HDFS connectivity. This alarm is generated when the connectivity check fails 3 consecutive times.</p>
|
||||
<p id="ALM-17010__p14655103992310">This alarm is cleared when the connection between Oozie and HDFS becomes normal.</p>
|
||||
<p id="ALM-17010__p175307509351">This alarm applies to MRS 3.3.0 or later.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17010__section0241152918232"><h4 class="sectiontitle"><span id="ALM-17010__text10304425193010">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17010__table36969235" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17010__row42433012"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-17010__p6820189"><span id="ALM-17010__text15625428153013">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-17010__p15564413"><span id="ALM-17010__text1785183153010">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-17010__p52757950"><span id="ALM-17010__text0945193443018">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-17010__row21396528"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-17010__p9126191145711">17010</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-17010__p1912519111570">Minor</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-17010__p13123014577">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-17010__section134680194295"><h4 class="sectiontitle"><span id="ALM-17010__text1656918386302">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17010__table33959796" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17010__row12041852"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-17010__p10293461"><span id="ALM-17010__text336474319309">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-17010__p28464046"><span id="ALM-17010__text9445144613014">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-17010__row6761927132120"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17010__p156438591896">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17010__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17010__row32756501"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17010__p65062640">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17010__p32823928">Specifies the service for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17010__row26979903"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17010__p35626567">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17010__p49072303">Specifies the role for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17010__row38997545"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17010__p51620924">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17010__p43906777">Specifies the host for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17010__row59616680"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17010__p64221796">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17010__p34583021">Specifies the threshold for triggering the alarm.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-17010__section7522603"><h4 class="sectiontitle"><span id="ALM-17010__text1741875023020">Impact on the System</span></h4><p id="ALM-17010__p384234419443">Running scheduling tasks are blocked and new scheduling tasks cannot be submitted.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17010__section594567"><h4 class="sectiontitle"><span id="ALM-17010__text32611054143010">Possible Causes</span></h4><p id="ALM-17010__p10636155622911">The HDFS service is restarting or faulty, or the network connection between Oozie and HDFS is abnormal.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17010__section16780114115290"><h4 class="sectiontitle"><span id="ALM-17010__text13657135716302">Handling Procedure</span></h4><p id="ALM-17010__p99073488298"><strong id="ALM-17010__b1226145915255">Check the HDFS service status.</strong></p>
|
||||
<ol id="ALM-17010__ol3102133583118"><li id="ALM-17010__li1110114358319"><span>In the service list on MRS Manager, check whether <strong id="ALM-17010__b6980155134812">Running Status</strong> of HDFS is <strong id="ALM-17010__b1598025117489">Normal</strong>.</span><p><ul id="ALM-17010__ul31011335173118"><li id="ALM-17010__li2101135153113">If yes, go to <a href="#ALM-17010__li21011035113113">5</a>.</li><li id="ALM-17010__li1410114355318">If no, go to <a href="#ALM-17010__li41021352312">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-17010__li41021352312"><a name="ALM-17010__li41021352312"></a><a name="li41021352312"></a><span>In the alarm list, check whether the alarm "ALM-14000 HDFS Service Unavailable" is reported.</span><p><ul id="ALM-17010__ul20102133513117"><li id="ALM-17010__li1810115355315">If yes, go to <a href="#ALM-17010__li16102123513112">3</a>.</li><li id="ALM-17010__li19102113516316">If no, go to <a href="#ALM-17010__li21011035113113">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-17010__li16102123513112"><a name="ALM-17010__li16102123513112"></a><a name="li16102123513112"></a><span>Rectify the fault by performing the operations provided for <a href="ALM-14000.html">ALM-14000 HDFS Service Unavailable</a>.</span></li><li id="ALM-17010__li101021435203119"><span>Wait for several minutes and check whether the alarm <strong id="ALM-17010__b1898515251491">Abnormal Connection Between Oozie and HDFS</strong> is cleared.</span><p><ul id="ALM-17010__ul3102103517314"><li id="ALM-17010__li1710214358317">If yes, no further action is required.</li><li id="ALM-17010__li2102113533113">If no, go to <a href="#ALM-17010__li21011035113113">5</a>.</li></ul>
|
||||
</p></li></ol>
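<p>The branching in steps 1 to 4 can be summarized as a small lookup. This is only a sketch of the documented flow; the function name and its arguments are hypothetical and do not correspond to any MRS tool.</p>

```python
def next_step(step, answer_yes=True):
    """Return the next step number, or None when no further action is required."""
    if step == 1:  # Is Running Status of HDFS Normal?
        return 5 if answer_yes else 2
    if step == 2:  # Is ALM-14000 HDFS Service Unavailable reported?
        return 3 if answer_yes else 5
    if step == 3:  # Rectify the fault per ALM-14000, then continue to step 4.
        return 4
    if step == 4:  # Is the alarm cleared after several minutes?
        return None if answer_yes else 5
    raise ValueError(f"step {step} is not part of the status check")


# Example path: HDFS is not Normal and ALM-14000 is reported.
path = [1, next_step(1, False), next_step(2, True), next_step(3)]
print(path)
```

<p>Every path that does not end with the alarm being cleared leads to step 5, the connectivity check.</p>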
|
||||
<p id="ALM-17010__p1799018108279"><strong id="ALM-17010__b163324357496">Check the connectivity between Oozie and HDFS.</strong></p>
|
||||
<ol start="5" id="ALM-17010__ol5101635173118"><li id="ALM-17010__li21011035113113"><a name="ALM-17010__li21011035113113"></a><a name="li21011035113113"></a><span>Log in to MRS Manager, choose <strong id="ALM-17010__b15221142113420">O&M</strong> > <strong id="ALM-17010__b1422164218348">Log</strong> > <strong id="ALM-17010__b422184213419">Online Search</strong>, select the Oozie service, and search for the keyword <strong id="ALM-17010__b172215425344">[Oozie Alarm Enhancement][HDFS]</strong> in the log. View the cause in the log, and rectify the fault. In the alarm list, check whether the alarm <strong id="ALM-17010__b1544615403417">Abnormal Connection Between Oozie and HDFS</strong> is cleared.</span><p><ul id="ALM-17010__ul7101435113113"><li id="ALM-17010__li19101143511319">If yes, no further action is required.</li><li id="ALM-17010__li6101133512314">If no, go to <a href="#ALM-17010__li181920381319">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-17010__p10510389"><strong id="ALM-17010__b41027473192141">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-17010__ol81925389317"><li id="ALM-17010__li181920381319"><a name="ALM-17010__li181920381319"></a><a name="li181920381319"></a><span>On MRS Manager, choose <strong id="ALM-17010__b1221855519491">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-17010__b121815554492">Log</strong> > <strong id="ALM-17010__b13218205564920">Download</strong>.</span></li><li id="ALM-17010__li1419283811317"><span>Select <strong id="ALM-17010__b12862156164915">Oozie</strong> for <strong id="ALM-17010__b3862165654910">Service</strong> and click <strong id="ALM-17010__b17862165674914">OK</strong>.</span></li><li id="ALM-17010__li3192938193114"><span>Click <span><img id="ALM-17010__image61925387311" src="en-us_image_0000002416013105.png"></span> in the upper right corner, and set <strong id="ALM-17010__b6722459194914">Start Date</strong> and <strong id="ALM-17010__b19722205916496">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-17010__b17722659114917">Download</strong>.</span></li><li id="ALM-17010__li5192183813312"><span>Contact <span id="ALM-17010__text03612525016">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-17010__section169311343318"><h4 class="sectiontitle"><span id="ALM-17010__text11828153133114">Alarm Clearance</span></h4><p id="ALM-17010__p330820123019">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17010__section40120107"><h4 class="sectiontitle"><span id="ALM-17010__text1428897183118">Related Information</span></h4><p id="ALM-17010__p104585259303"><span id="ALM-17010__text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
89
docs/mrs/umn/ALM-17011.html
Normal file
@ -0,0 +1,89 @@
|
||||
<a name="ALM-17011"></a><a name="ALM-17011"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-17011 Abnormal Connection Between Oozie and YARN</h1>
|
||||
<div id="body0000001480011481"><div class="section" id="ALM-17011__section17558163642310"><h4 class="sectiontitle"><span id="ALM-17011__text141671903013">Alarm Description</span></h4><p id="ALM-17011__p3367145893316">Oozie depends on YARN. After a task is submitted, the system checks YARN connectivity. This alarm is generated when the connectivity check fails 5 consecutive times.</p>
|
||||
<p id="ALM-17011__p14655103992310">This alarm is cleared when the connection between Oozie and YARN becomes normal.</p>
|
||||
<p id="ALM-17011__p175307509351">This alarm applies to MRS 3.3.0 or later.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17011__section0241152918232"><h4 class="sectiontitle"><span id="ALM-17011__text10304425193010">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17011__table36969235" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17011__row42433012"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-17011__p6820189"><span id="ALM-17011__text15625428153013">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-17011__p15564413"><span id="ALM-17011__text1785183153010">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-17011__p52757950"><span id="ALM-17011__text0945193443018">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-17011__row21396528"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-17011__p9126191145711">17011</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-17011__p1912519111570">Minor</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-17011__p13123014577">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-17011__section134680194295"><h4 class="sectiontitle"><span id="ALM-17011__text1656918386302">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17011__table33959796" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17011__row12041852"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-17011__p10293461"><span id="ALM-17011__text336474319309">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-17011__p28464046"><span id="ALM-17011__text9445144613014">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-17011__row6761927132120"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17011__p156438591896">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17011__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17011__row32756501"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17011__p65062640">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17011__p32823928">Specifies the service for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17011__row26979903"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17011__p35626567">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17011__p49072303">Specifies the role for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17011__row38997545"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17011__p51620924">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17011__p43906777">Specifies the host for which the alarm is generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-17011__row59616680"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17011__p64221796">Trigger Condition</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17011__p34583021">Specifies the threshold for triggering the alarm.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-17011__section7522603"><h4 class="sectiontitle"><span id="ALM-17011__text1741875023020">Impact on the System</span></h4><p id="ALM-17011__p19221151174514">Running scheduling tasks are blocked and new scheduling tasks cannot be submitted.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17011__section594567"><h4 class="sectiontitle"><span id="ALM-17011__text32611054143010">Possible Causes</span></h4><ul id="ALM-17011__ul241163443715"><li id="ALM-17011__li1188327193">The Yarn service is abnormal.</li><li id="ALM-17011__li3439144013713">The connection between Oozie and Yarn is abnormal.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-17011__section16780114115290"><h4 class="sectiontitle"><span id="ALM-17011__text13657135716302">Handling Procedure</span></h4><p id="ALM-17011__p10681171711356"><strong id="ALM-17011__b7206171817519">Check the YARN service status.</strong></p>
|
||||
<ol id="ALM-17011__ol25203713362"><li id="ALM-17011__li11463718361"><span>In the service list on MRS Manager, check whether <strong id="ALM-17011__b520011200514">Running Status</strong> of Yarn is <strong id="ALM-17011__b1920092025119">Normal</strong>.</span><p><ul id="ALM-17011__ul1541337163618"><li id="ALM-17011__li204123793617">If yes, go to <a href="#ALM-17011__li1041037183619">5</a>.</li><li id="ALM-17011__li045372368">If no, go to <a href="#ALM-17011__li1641237133617">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-17011__li1641237133617"><a name="ALM-17011__li1641237133617"></a><a name="li1641237133617"></a><span>In the alarm list, check whether <strong id="ALM-17011__b12531114013514">ALM-18000 YARN Service Unavailable</strong> is generated.</span><p><ul id="ALM-17011__ul194133719367"><li id="ALM-17011__li3433715366">If yes, go to <a href="#ALM-17011__li14163773614">3</a>.</li><li id="ALM-17011__li194123717369">If no, go to <a href="#ALM-17011__li1041037183619">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-17011__li14163773614"><a name="ALM-17011__li14163773614"></a><a name="li14163773614"></a><span>Rectify the fault by performing the operations provided for <a href="ALM-18000.html">ALM-18000 YARN Service Unavailable</a>.</span></li><li id="ALM-17011__li8511372367"><span>Wait for several minutes and check whether the alarm <strong id="ALM-17011__b115801712175217">Abnormal Connection Between Oozie and Yarn</strong> is cleared.</span><p><ul id="ALM-17011__ul18543718363"><li id="ALM-17011__li641937193612">If yes, no further action is required.</li><li id="ALM-17011__li9543719360">If no, go to <a href="#ALM-17011__li1041037183619">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-17011__p194917194361"><strong id="ALM-17011__b16623321165218">Check the connectivity between Oozie and Yarn.</strong></p>
|
||||
<ol start="5" id="ALM-17011__ol14123713614"><li id="ALM-17011__li1041037183619"><a name="ALM-17011__li1041037183619"></a><a name="li1041037183619"></a><span>Log in to MRS Manager, choose <strong id="ALM-17011__b173981815363">O&M</strong> > <strong id="ALM-17011__b14739918163614">Log</strong> > <strong id="ALM-17011__b1773951810367">Online Search</strong>, select the Oozie service, and search for the keyword <strong id="ALM-17011__b1873941814363">[Oozie Alarm Enhancement][Yarn]</strong> in the log. View the cause in the log, and rectify the fault. In the alarm list, check whether the alarm <strong id="ALM-17011__b146461528123612">Abnormal Connection Between Oozie and Yarn</strong> is cleared.</span><p><ul id="ALM-17011__ul154203712364"><li id="ALM-17011__li3318374361">If yes, no further action is required.</li><li id="ALM-17011__li1141737163617">If no, go to <a href="#ALM-17011__li17344124033615">6</a>.</li></ul>
|
||||
</p></li></ol>
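<p>The keyword in step 5 contains square brackets, which are regular-expression metacharacters, so a literal substring match is the safer way to filter an exported log. The following is a minimal sketch, assuming the Oozie log has been saved as plain text; the function name and the sample lines are hypothetical and do not reproduce real Oozie output.</p>

```python
# Keyword taken from step 5 of the handling procedure.
KEYWORD = "[Oozie Alarm Enhancement][Yarn]"


def find_alarm_lines(log_text, keyword=KEYWORD):
    """Return the log lines that contain the keyword as a literal substring."""
    return [line for line in log_text.splitlines() if keyword in line]


# Hypothetical sample text, for illustration only.
sample_log = "\n".join([
    "INFO unrelated message",
    "WARN [Oozie Alarm Enhancement][Yarn] hypothetical failure detail",
    "INFO another unrelated message",
])

for line in find_alarm_lines(sample_log):
    print(line)
```

<p>A substring test avoids having to escape the brackets; with a regular expression, the keyword would first need to be passed through <strong>re.escape</strong>.</p>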
|
||||
<p id="ALM-17011__p10510389"><strong id="ALM-17011__b41027473192141">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-17011__ol1234510406369"><li id="ALM-17011__li17344124033615"><a name="ALM-17011__li17344124033615"></a><a name="li17344124033615"></a><span>On MRS Manager, choose <strong id="ALM-17011__b36851841145217">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-17011__b76856415526">Log</strong> > <strong id="ALM-17011__b5686174145211">Download</strong>.</span></li><li id="ALM-17011__li1134534020369"><span>Select <strong id="ALM-17011__b818934365213">Oozie</strong> for <strong id="ALM-17011__b1918916439523">Service</strong> and click <strong id="ALM-17011__b14189154313522">OK</strong>.</span></li><li id="ALM-17011__li1334517408369"><span>Click <span><img id="ALM-17011__image10345124017364" src="en-us_image_0000002382293908.png"></span> in the upper right corner, and set <strong id="ALM-17011__b111846205217">Start Date</strong> and <strong id="ALM-17011__b711946135212">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-17011__b2154618522">Download</strong>.</span></li><li id="ALM-17011__li1234504010367"><span>Contact <span id="ALM-17011__text16466145095213">O&M personnel</span> and provide the collected logs.</span></li></ol>
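<p>The log collection window in step 8 can be worked out as follows. This is a minimal sketch: the function name and the sample alarm time are hypothetical, and the 10-minute margin is the value given in the step.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering margin_minutes before and after the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
alarm_time = datetime(2025, 8, 6, 14, 30)
start, end = log_collection_window(alarm_time)
print(start.isoformat(), end.isoformat())
```

<p>For an alarm generated at 14:30, the sketch gives 14:20 as <strong>Start Date</strong> and 14:40 as <strong>End Date</strong>.</p>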
|
||||
</div>
|
||||
<div class="section" id="ALM-17011__section169311343318"><h4 class="sectiontitle"><span id="ALM-17011__text11828153133114">Alarm Clearance</span></h4><p id="ALM-17011__p330820123019">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-17011__section40120107"><h4 class="sectiontitle"><span id="ALM-17011__text1428897183118">Related Information</span></h4><p id="ALM-17011__p104585259303"><span id="ALM-17011__text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
@ -62,17 +62,17 @@
|
||||
<div class="section" id="ALM-19022__section20958252"><h4 class="sectiontitle"><span id="ALM-19022__text941851614397">Possible Causes</span></h4><ul id="ALM-19022__ul20817398"><li id="ALM-19022__li1188327193">The ZooKeeper service is abnormal.</li><li id="ALM-19022__li9280164">The HBase service is abnormal.</li><li id="ALM-19022__li16412613">In the current HBase service, the MetricController instance on the same node as the active HMaster instance is not started.</li><li id="ALM-19022__li7418144120197">The network is abnormal.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-19022__section118143257718"><h4 class="sectiontitle"><span id="ALM-19022__text7119112018395">Handling Procedure</span></h4><p class="tableheading" id="ALM-19022__p54353294"><strong id="ALM-19022__b15135086935">Check the ZooKeeper service status.</strong></p>
|
||||
<ol id="ALM-19022__ol967113713192"><li id="ALM-19022__li116753791911"><span>In the service list on FusionInsight Manager, check whether <strong id="ALM-19022__b700098901114933">Running Status</strong> of ZooKeeper is <strong id="ALM-19022__b2002078094114933">Normal</strong>.</span><p><ul class="subitemlist" id="ALM-19022__ul167113714196"><li id="ALM-19022__li867163716190">If yes, go to <a href="#ALM-19022__li18661164216271">5</a>.</li><li id="ALM-19022__li14673371194">If no, go to <a href="#ALM-19022__li1267193701920">2</a>.</li></ul>
|
||||
<ol id="ALM-19022__ol967113713192"><li id="ALM-19022__li116753791911"><span>In the service list on MRS Manager, check whether <strong id="ALM-19022__b700098901114933">Running Status</strong> of ZooKeeper is <strong id="ALM-19022__b2002078094114933">Normal</strong>.</span><p><ul class="subitemlist" id="ALM-19022__ul167113714196"><li id="ALM-19022__li867163716190">If yes, go to <a href="#ALM-19022__li18661164216271">5</a>.</li><li id="ALM-19022__li14673371194">If no, go to <a href="#ALM-19022__li1267193701920">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-19022__li1267193701920"><a name="ALM-19022__li1267193701920"></a><a name="li1267193701920"></a><span>In the alarm list, check whether <strong id="ALM-19022__b1414187519114933">ALM-13000 ZooKeeper Service Unavailable</strong> exists.</span><p><ul class="subitemlist" id="ALM-19022__ul26783713195"><li id="ALM-19022__li76793711910">If yes, go to <a href="#ALM-19022__li667113714198">3</a>.</li><li id="ALM-19022__li14671437191915">If no, go to <a href="#ALM-19022__li18661164216271">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-19022__li667113714198"><a name="ALM-19022__li667113714198"></a><a name="li667113714198"></a><span>Rectify the fault by performing the operations provided for <strong id="ALM-19022__b836413547337">ALM-13000 ZooKeeper Service Unavailable</strong>.</span></li><li id="ALM-19022__li367113701911"><span>Wait for several minutes and check whether the alarm <strong id="ALM-19022__b147001165344">HBase Hotspot Detection Is Unavailable</strong> is cleared.</span><p><ul class="subitemlist" id="ALM-19022__ul76793751911"><li id="ALM-19022__li2671837191915">If yes, no further action is required.</li><li id="ALM-19022__li19671237151914">If no, go to <a href="#ALM-19022__li18661164216271">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19022__p865314531778"><strong id="ALM-19022__b1748012335616">Check the HBase service status.</strong></p>
|
||||
<ol start="5" id="ALM-19022__ol466218426271"><li id="ALM-19022__li18661164216271"><a name="ALM-19022__li18661164216271"></a><a name="li18661164216271"></a><span>In the service list on FusionInsight Manager, check whether <strong id="ALM-19022__b18974248183410">Running Status</strong> of HBase is <strong id="ALM-19022__b1097474893418">Normal</strong>.</span><p><ul class="subitemlist" id="ALM-19022__ul146611042202714"><li id="ALM-19022__li4661842122717">If yes, go to <a href="#ALM-19022__li61381651152817">9</a>.</li><li id="ALM-19022__li566194262714">If no, go to <a href="#ALM-19022__li18662154292714">6</a>.</li></ul>
|
||||
<ol start="5" id="ALM-19022__ol466218426271"><li id="ALM-19022__li18661164216271"><a name="ALM-19022__li18661164216271"></a><a name="li18661164216271"></a><span>In the service list on MRS Manager, check whether <strong id="ALM-19022__b18974248183410">Running Status</strong> of HBase is <strong id="ALM-19022__b1097474893418">Normal</strong>.</span><p><ul class="subitemlist" id="ALM-19022__ul146611042202714"><li id="ALM-19022__li4661842122717">If yes, go to <a href="#ALM-19022__li61381651152817">9</a>.</li><li id="ALM-19022__li566194262714">If no, go to <a href="#ALM-19022__li18662154292714">6</a>.</li></ul>
|
||||
</p></li><li id="ALM-19022__li18662154292714"><a name="ALM-19022__li18662154292714"></a><a name="li18662154292714"></a><span>In the alarm list, check whether the alarm ALM-19000 HBase Service Unavailable exists.</span><p><ul class="subitemlist" id="ALM-19022__ul1366214211276"><li id="ALM-19022__li2066144217277">If yes, go to <a href="#ALM-19022__li66625429278">7</a>.</li><li id="ALM-19022__li126627425274">If no, go to <a href="#ALM-19022__li61381651152817">9</a>.</li></ul>
|
||||
</p></li><li id="ALM-19022__li66625429278"><a name="ALM-19022__li66625429278"></a><a name="li66625429278"></a><span>Rectify the fault by following the steps provided for <strong id="ALM-19022__b6885292357">ALM-19000 HBase Service Unavailable</strong>.</span></li><li id="ALM-19022__li3662542162713"><span>Wait for several minutes and check whether the alarm <strong id="ALM-19022__b9249143393512">HBase Hotspot Detection Is Unavailable</strong> is cleared.</span><p><ul class="subitemlist" id="ALM-19022__ul11662144222718"><li id="ALM-19022__li166628421274">If yes, no further action is required.</li><li class="subitemlist" id="ALM-19022__li1266215424271">If no, go to <a href="#ALM-19022__li61381651152817">9</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19022__p868752102714"><strong id="ALM-19022__b1191412289287">Check whether the MetricController instance deployed on the same node as the active HMaster instance is started.</strong></p>
|
||||
<ol start="9" id="ALM-19022__ol1113913517286"><li id="ALM-19022__li61381651152817"><a name="ALM-19022__li61381651152817"></a><a name="li61381651152817"></a><span>On FusionInsight Manager, choose <strong id="ALM-19022__b1855525619369">Cluster</strong> > <strong id="ALM-19022__b42663582366">Service</strong> > <strong id="ALM-19022__b1921119013715">HBase</strong>, and click <strong id="ALM-19022__b152267183717">Instances</strong> to check whether the <strong id="ALM-19022__b7728614153719">MetricController(Active)</strong> instance exists.</span><p><ul id="ALM-19022__ul18137551182817"><li id="ALM-19022__li213685102813">If yes, go to <a href="#ALM-19022__li182979395366">12</a>.</li><li id="ALM-19022__li1013719517283">If no, go to <a href="#ALM-19022__li12138165182818">10</a>.</li></ul>
|
||||
<ol start="9" id="ALM-19022__ol1113913517286"><li id="ALM-19022__li61381651152817"><a name="ALM-19022__li61381651152817"></a><a name="li61381651152817"></a><span>On MRS Manager, choose <strong id="ALM-19022__b1855525619369">Cluster</strong> > <strong id="ALM-19022__b42663582366">Service</strong> > <strong id="ALM-19022__b1921119013715">HBase</strong>, and click <strong id="ALM-19022__b152267183717">Instances</strong> to check whether the <strong id="ALM-19022__b7728614153719">MetricController(Active)</strong> instance exists.</span><p><ul id="ALM-19022__ul18137551182817"><li id="ALM-19022__li213685102813">If yes, go to <a href="#ALM-19022__li182979395366">12</a>.</li><li id="ALM-19022__li1013719517283">If no, go to <a href="#ALM-19022__li12138165182818">10</a>.</li></ul>
|
||||
</p></li><li id="ALM-19022__li12138165182818"><a name="ALM-19022__li12138165182818"></a><a name="li12138165182818"></a><span>Select the MetricController instance whose management IP address is the same as that of the active HMaster instance, and click <strong id="ALM-19022__b16234161112506">Start Instance</strong>.</span></li><li id="ALM-19022__li2139155152815"><span>After the MetricController instance is started, check whether the alarm <strong id="ALM-19022__b165503410387">HBase Hotspot Detection Is Unavailable</strong> is cleared.</span><p><ul id="ALM-19022__ul41391251132811"><li id="ALM-19022__li613819516284">If yes, no further action is required.</li><li id="ALM-19022__li813913519284">If no, go to <a href="#ALM-19022__li182979395366">12</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19022__p69991826393"><strong id="ALM-19022__b34087649221">Check the network connectivity between the started MetricController instances and the active HMaster node.</strong></p>
|
||||
@ -80,7 +80,7 @@
|
||||
</p></li><li id="ALM-19022__li929715395365"><a name="ALM-19022__li929715395365"></a><a name="li929715395365"></a><span>Contact the network administrator to restore the network.</span></li><li id="ALM-19022__li6298193923611"><span>After the network recovers, check whether the alarm <strong id="ALM-19022__b4583132544015">HBase Hotspot Detection Is Unavailable</strong> is cleared.</span><p><ul class="subitemlist" id="ALM-19022__ul5298133993617"><li id="ALM-19022__li42981839123610">If yes, no further action is required.</li><li id="ALM-19022__li3298239123613">If no, go to <a href="#ALM-19022__li107641231103617">15</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19022__p15601739207"><strong id="ALM-19022__b3606332013">Collect fault information.</strong></p>
|
||||
<ol start="15" id="ALM-19022__ol167651631113615"><li id="ALM-19022__li107641231103617"><a name="ALM-19022__li107641231103617"></a><a name="li107641231103617"></a><span>On FusionInsight Manager, choose <strong id="ALM-19022__b1985771627114933">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19022__b1016475368114933">Log</strong> > <strong id="ALM-19022__b1811065636114933">Download</strong>.</span></li><li id="ALM-19022__li07645310363"><span>Expand the <strong id="ALM-19022__b1683209692114933">Service</strong> drop-down list, and select <strong id="ALM-19022__b1178843058114933">HBase</strong> for the target cluster.</span></li><li id="ALM-19022__li73388391699"><span>In the <strong id="ALM-19022__b109542059414">Host</strong> area, select the host where the HMaster instance is deployed.</span></li><li id="ALM-19022__li976593115360"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19022__b103081519194118">Start Date</strong> and <strong id="ALM-19022__b130917192419">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19022__b23091319184112">Download</strong>.</span></li><li id="ALM-19022__li77651631163618"><span>Contact <span id="ALM-19022__text12765631133618">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="15" id="ALM-19022__ol167651631113615"><li id="ALM-19022__li107641231103617"><a name="ALM-19022__li107641231103617"></a><a name="li107641231103617"></a><span>On MRS Manager, choose <strong id="ALM-19022__b1985771627114933">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19022__b1016475368114933">Log</strong> > <strong id="ALM-19022__b1811065636114933">Download</strong>.</span></li><li id="ALM-19022__li07645310363"><span>Expand the <strong id="ALM-19022__b1683209692114933">Service</strong> drop-down list, and select <strong id="ALM-19022__b1178843058114933">HBase</strong> for the target cluster.</span></li><li id="ALM-19022__li73388391699"><span>In the <strong id="ALM-19022__b109542059414">Host</strong> area, select the host where the HMaster instance is deployed.</span></li><li id="ALM-19022__li976593115360"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19022__b103081519194118">Start Date</strong> and <strong id="ALM-19022__b130917192419">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19022__b23091319184112">Download</strong>.</span></li><li id="ALM-19022__li77651631163618"><span>Contact <span id="ALM-19022__text12765631133618">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-19022__section563505465818"><h4 class="sectiontitle"><span id="ALM-19022__text1761202610393">Alarm Clearance</span></h4><p id="ALM-19022__p715945811718">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -61,10 +61,10 @@
|
||||
<div class="section" id="ALM-19023__section20958252"><h4 class="sectiontitle"><span id="ALM-19023__text189626102409">Possible Causes</span></h4><p id="ALM-19023__p1635445605811">Too many requests are directed to a single region when the HBase service is accessed.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-19023__section1147916303585"><h4 class="sectiontitle"><span id="ALM-19023__text8893161374012">Handling Procedure</span></h4><p id="ALM-19023__p1452914817195"><strong id="ALM-19023__b1249473125012">Check whether there are too many requests in a single region of HBase.</strong></p>
|
||||
<ol id="ALM-19023__ol13645123675812"><li id="ALM-19023__li56451636195813"><span>Log in to FusionInsight Manager, and Choose <strong id="ALM-19023__b946811165018">O&M</strong> > <strong id="ALM-19023__b3468111155015">Alarm</strong> > <strong id="ALM-19023__b24681115502">Alarms</strong>.</span></li><li id="ALM-19023__li864533612584"><a name="ALM-19023__li864533612584"></a><a name="li864533612584"></a><span>In <strong id="ALM-19023__b059910382565">Additional Information</strong> of <strong id="ALM-19023__b016916295712">Region Traffic Restriction for HBase</strong>, view the reported table name and region information.</span></li><li id="ALM-19023__li3155164012258"><span>On FusionInsight Manager, choose <strong id="ALM-19023__b79531638105717">Cluster</strong> > <strong id="ALM-19023__b125105407578">Service</strong> > <strong id="ALM-19023__b39691344125715">HBase</strong> and click the hyperlink on the right of HMaster web UI.</span></li><li id="ALM-19023__li143233034717"><span>Click <strong id="ALM-19023__b152025412583">Table Details</strong> and adjust service configurations in the region where the table in <a href="#ALM-19023__li864533612584">2</a> is deployed.</span></li><li id="ALM-19023__li12733112072918"><span>Wait a moment and then check whether the alarm is cleared.</span><p><ul id="ALM-19023__ul815863116392"><li id="ALM-19023__li142853445474">If yes, no further action is required.</li><li id="ALM-19023__li571104311394">If no, go to <a href="#ALM-19023__li16644173610580">6</a>.</li></ul>
|
||||
<ol id="ALM-19023__ol13645123675812"><li id="ALM-19023__li56451636195813"><span>Log in to MRS Manager and choose <strong id="ALM-19023__b946811165018">O&M</strong> > <strong id="ALM-19023__b3468111155015">Alarm</strong> > <strong id="ALM-19023__b24681115502">Alarms</strong>.</span></li><li id="ALM-19023__li864533612584"><a name="ALM-19023__li864533612584"></a><a name="li864533612584"></a><span>In <strong id="ALM-19023__b059910382565">Additional Information</strong> of <strong id="ALM-19023__b016916295712">Region Traffic Restriction for HBase</strong>, view the reported table name and region information.</span></li><li id="ALM-19023__li3155164012258"><span>On MRS Manager, choose <strong id="ALM-19023__b79531638105717">Cluster</strong> > <strong id="ALM-19023__b125105407578">Service</strong> > <strong id="ALM-19023__b39691344125715">HBase</strong> and click the hyperlink on the right of HMaster web UI.</span></li><li id="ALM-19023__li143233034717"><span>Click <strong id="ALM-19023__b152025412583">Table Details</strong> and adjust service configurations in the region where the table in <a href="#ALM-19023__li864533612584">2</a> is deployed.</span></li><li id="ALM-19023__li12733112072918"><span>Wait a moment and then check whether the alarm is cleared.</span><p><ul id="ALM-19023__ul815863116392"><li id="ALM-19023__li142853445474">If yes, no further action is required.</li><li id="ALM-19023__li571104311394">If no, go to <a href="#ALM-19023__li16644173610580">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19023__p15601739207"><strong id="ALM-19023__b158753340553">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-19023__ol164433617585"><li id="ALM-19023__li16644173610580"><a name="ALM-19023__li16644173610580"></a><a name="li16644173610580"></a><span>On FusionInsight Manager, choose <strong id="ALM-19023__b207011479527">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19023__b170114745213">Log</strong> > <strong id="ALM-19023__b17015720524">Download</strong>.</span></li><li id="ALM-19023__li20644736165810"><span>Expand the <strong id="ALM-19023__b8635913155214">Service</strong> drop-down list, and select <strong id="ALM-19023__b146357134525">HBase</strong> for the target cluster.</span></li><li id="ALM-19023__li186448362587"><span>In the <strong id="ALM-19023__b10379201915216">Host</strong> area, select the host where the HMaster instance is deployed.</span></li><li id="ALM-19023__li18644133665817"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19023__b13562112005217">Start Date</strong> and <strong id="ALM-19023__b35628203522">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19023__b1556252085213">Download</strong>.</span></li><li id="ALM-19023__li064413695811"><span>Contact <span id="ALM-19023__text127091922205217">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="6" id="ALM-19023__ol164433617585"><li id="ALM-19023__li16644173610580"><a name="ALM-19023__li16644173610580"></a><a name="li16644173610580"></a><span>On MRS Manager, choose <strong id="ALM-19023__b207011479527">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19023__b170114745213">Log</strong> > <strong id="ALM-19023__b17015720524">Download</strong>.</span></li><li id="ALM-19023__li20644736165810"><span>Expand the <strong id="ALM-19023__b8635913155214">Service</strong> drop-down list, and select <strong id="ALM-19023__b146357134525">HBase</strong> for the target cluster.</span></li><li id="ALM-19023__li186448362587"><span>In the <strong id="ALM-19023__b10379201915216">Host</strong> area, select the host where the HMaster instance is deployed.</span></li><li id="ALM-19023__li18644133665817"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19023__b13562112005217">Start Date</strong> and <strong id="ALM-19023__b35628203522">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19023__b1556252085213">Download</strong>.</span></li><li id="ALM-19023__li064413695811"><span>Contact <span id="ALM-19023__text127091922205217">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-19023__section169311343318"><h4 class="sectiontitle"><span id="ALM-19023__text8397201834010">Alarm Clearance</span></h4><p id="ALM-19023__p3969205517187">This alarm will be automatically cleared.</p>
|
||||
</div>
|
||||
|
||||
@ -65,21 +65,21 @@
|
||||
</div>
|
||||
<div class="section" id="ALM-19024__section20958252"><h4 class="sectiontitle"><span id="ALM-19024__text202332032152613">Possible Causes</span></h4><ul id="ALM-19024__ul178628366354"><li id="ALM-19024__li14862193663518">RegionServer GC duration is too long.</li><li id="ALM-19024__li1686243619356">The HDFS RPC response is too slow.</li><li id="ALM-19024__li3862143663513">RegionServer request concurrency is too high.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-19024__section1147916303585"><h4 class="sectiontitle"><span id="ALM-19024__text1733835102615">Handling Procedure</span></h4><ol id="ALM-19024__ol6708234101512"><li id="ALM-19024__li187081734191516"><a name="ALM-19024__li187081734191516"></a><a name="li187081734191516"></a><span>Log in to FusionInsight Manager and choose <strong id="ALM-19024__b42297502143">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19024__b722945015141">Alarm</strong> > <strong id="ALM-19024__b1222911503147">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19024__b122291750191414">Alarm ID</strong> is <strong id="ALM-19024__b122995061413">19024</strong>, and view the service instance and host name in <strong id="ALM-19024__b9229850141415">Location</strong>.</span></li></ol>
|
||||
<div class="section" id="ALM-19024__section1147916303585"><h4 class="sectiontitle"><span id="ALM-19024__text1733835102615">Handling Procedure</span></h4><ol id="ALM-19024__ol6708234101512"><li id="ALM-19024__li187081734191516"><a name="ALM-19024__li187081734191516"></a><a name="li187081734191516"></a><span>Log in to MRS Manager and choose <strong id="ALM-19024__b42297502143">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19024__b722945015141">Alarm</strong> > <strong id="ALM-19024__b1222911503147">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19024__b122291750191414">Alarm ID</strong> is <strong id="ALM-19024__b122995061413">19024</strong>, and view the service instance and host name in <strong id="ALM-19024__b9229850141415">Location</strong>.</span></li></ol>
|
||||
<p id="ALM-19024__p18769103663611"><strong id="ALM-19024__b1938714493153">Check the GC duration of RegionServer.</strong></p>
|
||||
<ol start="2" id="ALM-19024__ol37085342153"><li id="ALM-19024__li97081834111516"><span>In the alarm list on FusionInsight Manager, check whether the "HBase GC Duration Exceeds the Threshold" alarm is generated for the service instance in <a href="#ALM-19024__li187081734191516">1</a>.</span><p><ul id="ALM-19024__ul13708153410158"><li id="ALM-19024__li1170714345158">If yes, go to <a href="#ALM-19024__li167081134161511">3</a>.</li><li id="ALM-19024__li270723411517">If no, go to <a href="#ALM-19024__li2708203154412">5</a>.</li></ul>
|
||||
<ol start="2" id="ALM-19024__ol37085342153"><li id="ALM-19024__li97081834111516"><span>In the alarm list on MRS Manager, check whether the "HBase GC Duration Exceeds the Threshold" alarm is generated for the service instance in <a href="#ALM-19024__li187081734191516">1</a>.</span><p><ul id="ALM-19024__ul13708153410158"><li id="ALM-19024__li1170714345158">If yes, go to <a href="#ALM-19024__li167081134161511">3</a>.</li><li id="ALM-19024__li270723411517">If no, go to <a href="#ALM-19024__li2708203154412">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-19024__li167081134161511"><a name="ALM-19024__li167081134161511"></a><a name="li167081134161511"></a><span>Rectify the fault by following the handling procedure of "ALM-19007 HBase GC Duration Exceeds the Threshold".</span></li><li id="ALM-19024__li18708234121516"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19024__ul1170803417150"><li id="ALM-19024__li3708183415151">If yes, no further action is required.</li><li id="ALM-19024__li970893414153">If no, go to <a href="#ALM-19024__li2708203154412">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19024__p17294242437"><strong id="ALM-19024__b02081554181812">Check the HDFS RPC response time.</strong></p>
|
||||
<ol start="5" id="ALM-19024__ol19709113194416"><li id="ALM-19024__li2708203154412"><a name="ALM-19024__li2708203154412"></a><a name="li2708203154412"></a><span>In the alarm list on FusionInsight Manager, check whether alarm "Average NameNode RPC Processing Time Exceeds the Threshold" is generated for the HDFS service on which the HBase service depends.</span><p><ul id="ALM-19024__ul1045945610445"><li id="ALM-19024__li1545913562445">If yes, go to <a href="#ALM-19024__li87091331184413">6</a>.</li><li id="ALM-19024__li745905654413">If no, go to <a href="#ALM-19024__li2133184710441">8</a>.</li></ul>
|
||||
<ol start="5" id="ALM-19024__ol19709113194416"><li id="ALM-19024__li2708203154412"><a name="ALM-19024__li2708203154412"></a><a name="li2708203154412"></a><span>In the alarm list on MRS Manager, check whether the "Average NameNode RPC Processing Time Exceeds the Threshold" alarm is generated for the HDFS service on which the HBase service depends.</span><p><ul id="ALM-19024__ul1045945610445"><li id="ALM-19024__li1545913562445">If yes, go to <a href="#ALM-19024__li87091331184413">6</a>.</li><li id="ALM-19024__li745905654413">If no, go to <a href="#ALM-19024__li2133184710441">8</a>.</li></ul>
|
||||
</p></li><li id="ALM-19024__li87091331184413"><a name="ALM-19024__li87091331184413"></a><a name="li87091331184413"></a><span>Rectify the fault by following the handling procedure of "ALM-14021 Average NameNode RPC Processing Time Exceeds the Threshold".</span></li><li id="ALM-19024__li970913314444"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19024__ul770681614614"><li id="ALM-19024__li9706816124617">If yes, no further action is required.</li><li id="ALM-19024__li0706416104619">If no, go to <a href="#ALM-19024__li2133184710441">8</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19024__p7985133944418"><strong id="ALM-19024__b20818193111">Check the number of concurrent requests on a RegionServer.</strong></p>
|
||||
<ol start="8" id="ALM-19024__ol31346472448"><li id="ALM-19024__li2133184710441"><a name="ALM-19024__li2133184710441"></a><a name="li2133184710441"></a><span>In the alarm list on FusionInsight Manager, check whether the "Handler Usage of RegionServer Exceeds the Threshold" alarm is generated for the service instance in <a href="#ALM-19024__li187081734191516">1</a>.</span><p><ul id="ALM-19024__ul14691255124610"><li id="ALM-19024__li16469115518463">If yes, go to <a href="#ALM-19024__li1781144374611">9</a>.</li><li id="ALM-19024__li11469755194617">If no, go to <a href="#ALM-19024__li959275915215">11</a>.</li></ul>
|
||||
<ol start="8" id="ALM-19024__ol31346472448"><li id="ALM-19024__li2133184710441"><a name="ALM-19024__li2133184710441"></a><a name="li2133184710441"></a><span>In the alarm list on MRS Manager, check whether the "Handler Usage of RegionServer Exceeds the Threshold" alarm is generated for the service instance in <a href="#ALM-19024__li187081734191516">1</a>.</span><p><ul id="ALM-19024__ul14691255124610"><li id="ALM-19024__li16469115518463">If yes, go to <a href="#ALM-19024__li1781144374611">9</a>.</li><li id="ALM-19024__li11469755194617">If no, go to <a href="#ALM-19024__li959275915215">11</a>.</li></ul>
|
||||
</p></li><li id="ALM-19024__li1781144374611"><a name="ALM-19024__li1781144374611"></a><a name="li1781144374611"></a><span>Rectify the fault by following the handling procedure of "ALM-19021 Handler Usage of RegionServer Exceeds the Threshold".</span></li><li id="ALM-19024__li61341947114413"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19024__ul197493917470"><li id="ALM-19024__li774103917479">If yes, no further action is required.</li><li id="ALM-19024__li874203913474">If no, go to <a href="#ALM-19024__li959275915215">11</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19024__p15601739207"><strong id="ALM-19024__b1595128153218">Collect fault information.</strong></p>
|
||||
<ol start="11" id="ALM-19024__ol1559215914523"><li id="ALM-19024__li959275915215"><a name="ALM-19024__li959275915215"></a><a name="li959275915215"></a><span>On FusionInsight Manager, choose <strong id="ALM-19024__b11396101324">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19024__b139201003211">Log</strong> > <strong id="ALM-19024__b1139191053220">Download</strong>.</span></li><li id="ALM-19024__li1959211592529"><span>Expand the <strong id="ALM-19024__b142056118321">Service</strong> drop-down list, and select <strong id="ALM-19024__b162052117320">HBase</strong> for the target cluster.</span></li><li id="ALM-19024__li19592145975215"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19024__b153341712103211">Start Date</strong> and <strong id="ALM-19024__b6334171283215">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19024__b833410121328">Download</strong>.</span></li><li id="ALM-19024__li1759215945217"><span>Contact <span id="ALM-19024__text1059295916526">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="11" id="ALM-19024__ol1559215914523"><li id="ALM-19024__li959275915215"><a name="ALM-19024__li959275915215"></a><a name="li959275915215"></a><span>On MRS Manager, choose <strong id="ALM-19024__b11396101324">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19024__b139201003211">Log</strong> > <strong id="ALM-19024__b1139191053220">Download</strong>.</span></li><li id="ALM-19024__li1959211592529"><span>Expand the <strong id="ALM-19024__b142056118321">Service</strong> drop-down list, and select <strong id="ALM-19024__b162052117320">HBase</strong> for the target cluster.</span></li><li id="ALM-19024__li19592145975215"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19024__b153341712103211">Start Date</strong> and <strong id="ALM-19024__b6334171283215">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19024__b833410121328">Download</strong>.</span></li><li id="ALM-19024__li1759215945217"><span>Contact <span id="ALM-19024__text1059295916526">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-19024__section169311343318"><h4 class="sectiontitle"><span id="ALM-19024__text596254111265">Alarm Clearance</span></h4><p id="ALM-19024__p5635175455816">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -4,7 +4,7 @@
|
||||
<div id="body0000002007527853"><div class="section" id="ALM-19025__section42400121"><h4 class="sectiontitle"><span id="ALM-19025__text14720102111505">Alarm Description</span></h4><p id="ALM-19025__p42166519292">The system checks the <strong id="ALM-19025__b2710259145714">hdfs://hacluster/hbase/autocorrupt</strong> and <strong id="ALM-19025__b426042055118">hdfs://hacluster/hbase/MasterData/autocorrupt</strong> directories on HDFS of each HBase service every 120 seconds. This alarm is generated when there are files in the directories.</p>
|
||||
<p id="ALM-19025__p1231351418316">This alarm is cleared when the <strong id="ALM-19025__b2036914565214">hdfs://hacluster/hbase/autocorrupt</strong> and <strong id="ALM-19025__b3369125205213">hdfs://hacluster/hbase/MasterData/autocorrupt</strong> directories do not exist or are empty.</p>
|
||||
<p id="ALM-19025__p10779958195019">This alarm applies only to MRS 3.3.0 or later.</p>
|
||||
<div class="note" id="ALM-19025__note227923535513"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-19025__p13733223134811"><strong id="ALM-19025__b7548014135912">hdfs://hacluster</strong> indicates the name of the file system used by HBase, and <strong id="ALM-19025__b87122095910">/hbase</strong> indicates the root directory of HBase in the file system. You can log in to FusionInsight Manager, choose <strong id="ALM-19025__b156701940135912">Cluster</strong> > <strong id="ALM-19025__b167334325910">Services</strong> > <strong id="ALM-19025__b63431447135914">HBase</strong> and click <strong id="ALM-19025__b159671454175915">Configuration</strong>. Search for <strong id="ALM-19025__b8912221606">fs.defaultFS</strong> and <strong id="ALM-19025__b21192209019">hbase.data.rootdir</strong>.</p>
|
||||
<div class="note" id="ALM-19025__note227923535513"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-19025__p13733223134811"><strong id="ALM-19025__b7548014135912">hdfs://hacluster</strong> indicates the name of the file system used by HBase, and <strong id="ALM-19025__b87122095910">/hbase</strong> indicates the root directory of HBase in the file system. You can log in to MRS Manager, choose <strong id="ALM-19025__b156701940135912">Cluster</strong> > <strong id="ALM-19025__b167334325910">Services</strong> > <strong id="ALM-19025__b63431447135914">HBase</strong> and click <strong id="ALM-19025__b159671454175915">Configuration</strong>. Search for <strong id="ALM-19025__b8912221606">fs.defaultFS</strong> and <strong id="ALM-19025__b21192209019">hbase.data.rootdir</strong>.</p>
|
||||
</div></div>
|
||||
</div>
|
||||
<div class="section" id="ALM-19025__section46056776"><h4 class="sectiontitle"><span id="ALM-19025__text2972174455013">Alarm Attributes</span></h4>
|
||||
@ -62,7 +62,7 @@
|
||||
</div>
|
||||
<div class="section" id="ALM-19025__section20958252"><h4 class="sectiontitle"><span id="ALM-19025__text11410115911514">Possible Causes</span></h4><p id="ALM-19025__p1844714448">The StoreFile files are damaged.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-19025__section14353515104812"><h4 class="sectiontitle"><span id="ALM-19025__text157397117523">Handling Procedure</span></h4><ol id="ALM-19025__ol12758176556"><li id="ALM-19025__li0272417155517"><span>Log in to FusionInsight Manager and choose <strong id="ALM-19025__b095616141645">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19025__b179561314545">Alarm</strong> > <strong id="ALM-19025__b179572141418">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19025__b995741415411">Alarm ID</strong> is <strong id="ALM-19025__b195713145414">19025</strong>, and view the service in <strong id="ALM-19025__b29579141040">Location</strong>.</span></li><li id="ALM-19025__li182722017165515"><span>Log in to the node where the HDFS and HBase clients are installed as the client installation user and run the following commands:</span><p><p id="ALM-19025__p927291765511"><strong id="ALM-19025__b1773610562317">cd </strong><em id="ALM-19025__i87372564312">Client installation directory</em></p>
|
||||
<div class="section" id="ALM-19025__section14353515104812"><h4 class="sectiontitle"><span id="ALM-19025__text157397117523">Handling Procedure</span></h4><ol id="ALM-19025__ol12758176556"><li id="ALM-19025__li0272417155517"><span>Log in to MRS Manager and choose <strong id="ALM-19025__b095616141645">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19025__b179561314545">Alarm</strong> > <strong id="ALM-19025__b179572141418">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19025__b995741415411">Alarm ID</strong> is <strong id="ALM-19025__b195713145414">19025</strong>, and view the service in <strong id="ALM-19025__b29579141040">Location</strong>.</span></li><li id="ALM-19025__li182722017165515"><span>Log in to the node where the HDFS and HBase clients are installed as the client installation user and run the following commands:</span><p><p id="ALM-19025__p927291765511"><strong id="ALM-19025__b1773610562317">cd </strong><em id="ALM-19025__i87372564312">Client installation directory</em></p>
|
||||
<p id="ALM-19025__p1227212173551"><strong id="ALM-19025__b102721017155513">source bigdata_env</strong></p>
|
||||
<p id="ALM-19025__p1027281735519"><strong id="ALM-19025__b1616016594418">kinit</strong> <em id="ALM-19025__i1523393652">Component service user</em> (If <span id="ALM-19025__ph132721317155510">Kerberos authentication is disabled for the cluster (the cluster is in normal mode)</span>, skip this step.)</p>
|
||||
</p></li><li id="ALM-19025__li11272201715518"><span>Check the damaged StoreFile file.</span><p><ul id="ALM-19025__ul138228219290"><li id="ALM-19025__li1748012243294">Run the following command to check whether the <strong id="ALM-19025__b12848115495217">/hbase/autocorrupt</strong> directory of HDFS is empty. If it is not, go to <a href="#ALM-19025__li202731117105511">4</a>.<p id="ALM-19025__p1301102363115"><strong id="ALM-19025__b527231785512">hdfs dfs -ls -R</strong><strong id="ALM-19025__b1727212176550"> hdfs://hacluster</strong><strong id="ALM-19025__b1327291735518">/hbase</strong><strong id="ALM-19025__b5272917135519">/autocorrupt</strong></p>
|
||||
@ -79,7 +79,7 @@
|
||||
</p></li><li id="ALM-19025__li122754172555"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19025__ul15275111795517"><li id="ALM-19025__li1327413175559">If yes, no further action is required.</li><li id="ALM-19025__li182755176556">If no, go to <a href="#ALM-19025__li13270141719556">9</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19025__p8551171215559"><strong id="ALM-19025__b194921811118">Collect fault information.</strong></p>
|
||||
<ol start="9" id="ALM-19025__ol15271317145513"><li id="ALM-19025__li13270141719556"><a name="ALM-19025__li13270141719556"></a><a name="li13270141719556"></a><span>On FusionInsight Manager, choose <strong id="ALM-19025__b1597510214111">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19025__b1975152118113">Log</strong> > <strong id="ALM-19025__b12975172115113">Download</strong>.</span></li><li id="ALM-19025__li327016177553"><span>Expand the <strong id="ALM-19025__b840925131118">Service</strong> drop-down list, and select <strong id="ALM-19025__b240102519115">HBase</strong> for the target cluster.</span></li><li id="ALM-19025__li14270121705519"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19025__b11744525101114">Start Date</strong> and <strong id="ALM-19025__b1874416255114">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19025__b774410259112">Download</strong>.</span></li><li id="ALM-19025__li42711517125520"><span>Contact <span id="ALM-19025__text1137338201119">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="9" id="ALM-19025__ol15271317145513"><li id="ALM-19025__li13270141719556"><a name="ALM-19025__li13270141719556"></a><a name="li13270141719556"></a><span>On MRS Manager, choose <strong id="ALM-19025__b1597510214111">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19025__b1975152118113">Log</strong> > <strong id="ALM-19025__b12975172115113">Download</strong>.</span></li><li id="ALM-19025__li327016177553"><span>Expand the <strong id="ALM-19025__b840925131118">Service</strong> drop-down list, and select <strong id="ALM-19025__b240102519115">HBase</strong> for the target cluster.</span></li><li id="ALM-19025__li14270121705519"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19025__b11744525101114">Start Date</strong> and <strong id="ALM-19025__b1874416255114">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19025__b774410259112">Download</strong>.</span></li><li id="ALM-19025__li42711517125520"><span>Contact <span id="ALM-19025__text1137338201119">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-19025__section169311343318"><h4 class="sectiontitle"><span id="ALM-19025__text19963162185211">Alarm Clearance</span></h4><p id="ALM-19025__p5635175455816">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -4,7 +4,7 @@
|
||||
<div id="body0000001971167294"><div class="section" id="ALM-19026__section42400121"><h4 class="sectiontitle"><span id="ALM-19026__text1917154618563">Alarm Description</span></h4><p id="ALM-19026__p42166519292">The system checks the <strong id="ALM-19026__b19150158121414">hdfs://hacluster/hbase/corrupt</strong> directory on the HDFS of each HBase service every 120 seconds. This alarm is generated when there are WAL files in the <strong id="ALM-19026__b141501558161415">/hbase/corrupt</strong> directory.</p>
|
||||
<p id="ALM-19026__p1231351418316">This alarm is cleared when the <strong id="ALM-19026__b17645927111513">/hbase/corrupt</strong> directory does not exist or does not contain WAL files.</p>
|
||||
<p id="ALM-19026__p10779958195019">This alarm applies only to MRS 3.3.0 or later.</p>
|
||||
<div class="note" id="ALM-19026__note227923535513"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-19026__p111138587412"><strong id="ALM-19026__b0626835101515">hdfs://hacluster</strong> indicates the name of the file system used by HBase, and <strong id="ALM-19026__b16263356158">/hbase</strong> indicates the root directory of HBase in the file system. You can log in to FusionInsight Manager, choose <strong id="ALM-19026__b8627143514156">Cluster</strong> > <strong id="ALM-19026__b1562711357157">Services</strong> > <strong id="ALM-19026__b1162753521510">HBase</strong> and click <strong id="ALM-19026__b206271535131517">Configuration</strong>. Search for <strong id="ALM-19026__b762853520153">fs.defaultFS</strong> and <strong id="ALM-19026__b46285355150">hbase.data.rootdir</strong>.</p>
|
||||
<div class="note" id="ALM-19026__note227923535513"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-19026__p111138587412"><strong id="ALM-19026__b0626835101515">hdfs://hacluster</strong> indicates the name of the file system used by HBase, and <strong id="ALM-19026__b16263356158">/hbase</strong> indicates the root directory of HBase in the file system. You can log in to MRS Manager, choose <strong id="ALM-19026__b8627143514156">Cluster</strong> > <strong id="ALM-19026__b1562711357157">Services</strong> > <strong id="ALM-19026__b1162753521510">HBase</strong> and click <strong id="ALM-19026__b206271535131517">Configuration</strong>. Search for <strong id="ALM-19026__b762853520153">fs.defaultFS</strong> and <strong id="ALM-19026__b46285355150">hbase.data.rootdir</strong>.</p>
|
||||
</div></div>
|
||||
</div>
|
||||
<div class="section" id="ALM-19026__section46056776"><h4 class="sectiontitle"><span id="ALM-19026__text17511658165619">Alarm Attributes</span></h4>
|
||||
@ -62,13 +62,13 @@
|
||||
</div>
|
||||
<div class="section" id="ALM-19026__section20958252"><h4 class="sectiontitle"><span id="ALM-19026__text9496174017583">Possible Causes</span></h4><p id="ALM-19026__p1844714448">The WAL files are damaged.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-19026__section14353515104812"><h4 class="sectiontitle"><span id="ALM-19026__text37162049145815">Handling Procedure</span></h4><ol id="ALM-19026__ol252113230207"><li id="ALM-19026__li85201923102019"><span>Log in to FusionInsight Manager and choose <strong id="ALM-19026__b7730412121613">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19026__b17316128160">Alarm</strong> > <strong id="ALM-19026__b7731712201616">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19026__b16731191281613">Alarm ID</strong> is <strong id="ALM-19026__b1573151291620">19026</strong>, and view the service in <strong id="ALM-19026__b1873241201613">Location</strong>.</span></li><li id="ALM-19026__li0521723112015"><span>Log in to the node where the HDFS clients are installed as the client installation user and run the following commands:</span><p><p id="ALM-19026__p19521523192012"><strong id="ALM-19026__b2197732101613">cd </strong><em id="ALM-19026__i1119793281615">Client installation directory</em></p>
|
||||
<div class="section" id="ALM-19026__section14353515104812"><h4 class="sectiontitle"><span id="ALM-19026__text37162049145815">Handling Procedure</span></h4><ol id="ALM-19026__ol252113230207"><li id="ALM-19026__li85201923102019"><span>Log in to MRS Manager and choose <strong id="ALM-19026__b7730412121613">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19026__b17316128160">Alarm</strong> > <strong id="ALM-19026__b7731712201616">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19026__b16731191281613">Alarm ID</strong> is <strong id="ALM-19026__b1573151291620">19026</strong>, and view the service in <strong id="ALM-19026__b1873241201613">Location</strong>.</span></li><li id="ALM-19026__li0521723112015"><span>Log in to the node where the HDFS clients are installed as the client installation user and run the following commands:</span><p><p id="ALM-19026__p19521523192012"><strong id="ALM-19026__b2197732101613">cd </strong><em id="ALM-19026__i1119793281615">Client installation directory</em></p>
|
||||
<p id="ALM-19026__p7521162322017"><strong id="ALM-19026__b75213238207">source bigdata_env</strong></p>
|
||||
<p id="ALM-19026__p1152112392016"><strong id="ALM-19026__b12782035171611">kinit</strong> <em id="ALM-19026__i187903581619">Component service user</em> (If <span id="ALM-19026__ph167953551616">Kerberos authentication is disabled for the cluster (the cluster is in normal mode)</span>, skip this step.)</p>
|
||||
</p></li><li id="ALM-19026__li4521122362019"><span>Run the following command to check the damaged WAL files and go to <a href="#ALM-19026__li135201823182014">4</a>:</span><p><p id="ALM-19026__p1452116235200"><strong id="ALM-19026__b252192317202">hdfs dfs -ls </strong><strong id="ALM-19026__b18521122317205">hdfs://hacluster</strong><strong id="ALM-19026__b352132314200">/hbase</strong><strong id="ALM-19026__b652122352014">/corrupt/*%2C*</strong></p>
|
||||
</p></li></ol>
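<p>The <strong>*%2C*</strong> wildcard in the command above keeps only entries whose names contain <strong>%2C</strong> (a URL-encoded comma). A minimal sketch of that filter follows, using Python's <strong>fnmatch</strong> as a local stand-in for the HDFS glob. The file names in <strong>listing</strong> and the helper <strong>select_wal_files</strong> are hypothetical and only illustrate the matching.</p>

```python
from fnmatch import fnmatchcase

# Same wildcard as in the hdfs dfs -ls command; '%' is a literal character here.
WAL_PATTERN = "*%2C*"


def select_wal_files(names):
    """Return the names that match the WAL wildcard (hypothetical helper)."""
    return [name for name in names if fnmatchcase(name, WAL_PATTERN)]


# Hypothetical directory listing, used only to illustrate the filter.
listing = [
    "node1%2C16020%2C1700000000000.1700000001234",
    "README.txt",
]
wal_files = select_wal_files(listing)
```

<p>In this sketch, only the first entry is selected because it is the only name that contains <strong>%2C</strong>.</p>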
|
||||
<p id="ALM-19026__p8661181992019"><strong id="ALM-19026__b6869655151611">Collect fault information.</strong></p>
|
||||
<ol start="4" id="ALM-19026__ol952032316206"><li id="ALM-19026__li135201823182014"><a name="ALM-19026__li135201823182014"></a><a name="li135201823182014"></a><span>On FusionInsight Manager, choose <strong id="ALM-19026__b5452059191614">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19026__b1945175971610">Log</strong> > <strong id="ALM-19026__b34645941615">Download</strong>.</span></li><li id="ALM-19026__li13520623172011"><span>Expand the <strong id="ALM-19026__b19634159131613">Service</strong> drop-down list, and select <strong id="ALM-19026__b1963414599164">HBase</strong> for the target cluster.</span></li><li id="ALM-19026__li5520152313207"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19026__b18361000175">Start Date</strong> and <strong id="ALM-19026__b16836120111719">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19026__b178363018173">Download</strong>.</span></li><li id="ALM-19026__li1652011236208"><span>Contact <span id="ALM-19026__text16951220175">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="4" id="ALM-19026__ol952032316206"><li id="ALM-19026__li135201823182014"><a name="ALM-19026__li135201823182014"></a><a name="li135201823182014"></a><span>On MRS Manager, choose <strong id="ALM-19026__b5452059191614">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19026__b1945175971610">Log</strong> > <strong id="ALM-19026__b34645941615">Download</strong>.</span></li><li id="ALM-19026__li13520623172011"><span>Expand the <strong id="ALM-19026__b19634159131613">Service</strong> drop-down list, and select <strong id="ALM-19026__b1963414599164">HBase</strong> for the target cluster.</span></li><li id="ALM-19026__li5520152313207"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19026__b18361000175">Start Date</strong> and <strong id="ALM-19026__b16836120111719">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19026__b178363018173">Download</strong>.</span></li><li id="ALM-19026__li1652011236208"><span>Contact <span id="ALM-19026__text16951220175">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-19026__section169311343318"><h4 class="sectiontitle"><span id="ALM-19026__text38561858145810">Alarm Clearance</span></h4><p id="ALM-19026__p5635175455816">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
102
docs/mrs/umn/ALM-19030.html
Normal file
File diff suppressed because it is too large
Load Diff
98
docs/mrs/umn/ALM-19031.html
Normal file
@ -0,0 +1,98 @@
|
||||
<a name="ALM-19031"></a><a name="ALM-19031"></a>
|
||||
|
||||
<h1 class="topictitle1">ALM-19031 Number of RegionServer RPC Connections Exceeds the Threshold</h1>
|
||||
<div id="body0000001971167314"><div class="section" id="ALM-19031__en-us_topic_0000001774710640_section42400121"><h4 class="sectiontitle"><span id="ALM-19031__en-us_topic_0000001774710640_text11958175822511">Alarm Description</span></h4><p id="ALM-19031__en-us_topic_0000001774710640_p42166519292">The system checks the number of RegionServer RPC connections in each HBase service every 30 seconds. This alarm is generated when the number of RPC connections of a RegionServer instance exceeds the threshold in 10 consecutive checks.</p>
|
||||
<p id="ALM-19031__en-us_topic_0000001774710640_p1231351418316">This alarm is cleared when the number of RPC connections of a RegionServer instance is less than or equal to the threshold.</p>
|
||||
<p id="ALM-19031__en-us_topic_0000001774710640_p10779958195019">This alarm is generated only for MRS 3.3.1 or later.</p>
|
||||
</div>
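<p>The trigger and clear rules described above can be sketched as a small state machine. This is an illustration only, not the MRS Manager implementation: the class name <strong>ConsecutiveThresholdAlarm</strong> and its fields are hypothetical, and resetting the counter on any sample at or below the threshold is an assumption drawn from the word "consecutive".</p>

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveThresholdAlarm:
    """Hypothetical model of the alarm rule; one observe() call per 30-second check."""

    threshold: int
    required_hits: int = 10  # consecutive checks above the threshold
    hits: int = 0
    active: bool = False

    def observe(self, connections: int) -> bool:
        """Record one sample and return whether the alarm is active."""
        if connections > self.threshold:
            self.hits += 1
            if self.hits >= self.required_hits:
                self.active = True
        else:
            # At or below the threshold: clear the alarm and restart the count (assumption).
            self.hits = 0
            self.active = False
        return self.active


alarm = ConsecutiveThresholdAlarm(threshold=1000)
states = [alarm.observe(1001) for _ in range(10)]
cleared = alarm.observe(1000)
```

<p>With the default Major threshold of 1000, the sketch stays inactive for the first nine samples above the threshold, becomes active on the tenth, and clears as soon as a sample is less than or equal to the threshold.</p>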
|
||||
<div class="section" id="ALM-19031__en-us_topic_0000001774710640_section46056776"><h4 class="sectiontitle"><span id="ALM-19031__en-us_topic_0000001774710640_text18690329262">Alarm Attributes</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19031__en-us_topic_0000001774710640_table3909558" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19031__en-us_topic_0000001774710640_row9358345"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-19031__en-us_topic_0000001774710640_p19828475"><span id="ALM-19031__en-us_topic_0000001774710640_text14553174118286">Alarm ID</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-19031__en-us_topic_0000001774710640_p62602629"><span id="ALM-19031__en-us_topic_0000001774710640_text10623610112610">Alarm Severity</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-19031__en-us_topic_0000001774710640_p37648208"><span id="ALM-19031__en-us_topic_0000001774710640_text1568825511215">Auto Cleared</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-19031__en-us_topic_0000001774710640_row29606020"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-19031__en-us_topic_0000001774710640_p581094414588">19031</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><ul id="ALM-19031__en-us_topic_0000001774710640_ul499215152411"><li id="ALM-19031__en-us_topic_0000001774710640_li49921251182415">Critical (default threshold: 2000)</li><li id="ALM-19031__en-us_topic_0000001774710640_li15992451132411">Major (default threshold: 1000)</li></ul>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-19031__en-us_topic_0000001774710640_p98051144165816">Yes</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-19031__en-us_topic_0000001774710640_section11857806"><h4 class="sectiontitle"><span id="ALM-19031__en-us_topic_0000001774710640_text137861872618">Alarm Parameters</span></h4>
|
||||
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19031__en-us_topic_0000001774710640_table63098886" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19031__en-us_topic_0000001774710640_row42029922"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-19031__en-us_topic_0000001774710640_p11716123011521"><span id="ALM-19031__en-us_topic_0000001774710640_text118041758205214">Type</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-19031__en-us_topic_0000001774710640_p48980553"><span id="ALM-19031__en-us_topic_0000001774710640_text296118200264">Parameter</span></p>
|
||||
</th>
|
||||
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-19031__en-us_topic_0000001774710640_p8001819"><span id="ALM-19031__en-us_topic_0000001774710640_text511510248263">Description</span></p>
|
||||
</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody><tr id="ALM-19031__en-us_topic_0000001774710640_row88451931718"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-19031__en-us_topic_0000001774710640_p198154148537">Location Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-19031__en-us_topic_0000001774710640_p13858113752316">Source</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-19031__en-us_topic_0000001774710640_p187931338134115">Specifies the cluster for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-19031__en-us_topic_0000001774710640_row44167618"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-19031__en-us_topic_0000001774710640_p39123317">ServiceName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-19031__en-us_topic_0000001774710640_p83161014635">Specifies the service for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-19031__en-us_topic_0000001774710640_row1943587"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-19031__en-us_topic_0000001774710640_p37226997">RoleName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-19031__en-us_topic_0000001774710640_p18316114535">Specifies the role for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-19031__en-us_topic_0000001774710640_row10765874"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-19031__en-us_topic_0000001774710640_p66118565">HostName</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-19031__en-us_topic_0000001774710640_p33168145315">Specifies the host for which the alarm was generated.</p>
|
||||
</td>
|
||||
</tr>
|
||||
<tr id="ALM-19031__en-us_topic_0000001774710640_row498425193419"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-19031__en-us_topic_0000001774710640_p187169301529">Additional Information</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-19031__en-us_topic_0000001774710640_p1624673963311">Threshold</p>
|
||||
</td>
|
||||
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-19031__en-us_topic_0000001774710640_p35562471168">Specifies the threshold for generating the alarm.</p>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
<div class="section" id="ALM-19031__en-us_topic_0000001774710640_section39611396"><h4 class="sectiontitle"><span id="ALM-19031__en-us_topic_0000001774710640_text7398142810268">Impact on the System</span></h4><p id="ALM-19031__en-us_topic_0000001774710640_p1199521583">A large number of concurrent access requests on the RegionServer node puts the node under heavy load and slows its responses. For latency-sensitive services, a large number of service read and write requests may time out.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-19031__en-us_topic_0000001774710640_section20958252"><h4 class="sectiontitle"><span id="ALM-19031__en-us_topic_0000001774710640_text202332032152613">Possible Causes</span></h4><p id="ALM-19031__en-us_topic_0000001774710640_p9847102513373">Too many concurrent requests are sent from applications to access HBase.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-19031__en-us_topic_0000001774710640_section9251129132119"><h4 class="sectiontitle"><span id="ALM-19031__en-us_topic_0000001774710640_text1733835102615">Handling Procedure</span></h4><ol id="ALM-19031__en-us_topic_0000001774710640_ol6708234101512"><li id="ALM-19031__en-us_topic_0000001774710640_li187081734191516"><span>Log in to MRS Manager and choose <strong id="ALM-19031__en-us_topic_0000001774710640_b18491152463911">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19031__en-us_topic_0000001774710640_b144911024123916">Alarm</strong> > <strong id="ALM-19031__en-us_topic_0000001774710640_b11491424133920">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19031__en-us_topic_0000001774710640_b149132423916">Alarm ID</strong> is <strong id="ALM-19031__en-us_topic_0000001774710640_b0869103363916">19031</strong>, and view the service instance and host name in <strong id="ALM-19031__en-us_topic_0000001774710640_b14927242398">Location</strong>.</span></li></ol>
|
||||
<p id="ALM-19031__en-us_topic_0000001774710640_p18769103663611"><strong id="ALM-19031__en-us_topic_0000001774710640_b1146734118395">Check the number of concurrent requests accessing HBase.</strong></p>
|
||||
<ol start="2" id="ALM-19031__en-us_topic_0000001774710640_ol37085342153"><li id="ALM-19031__en-us_topic_0000001774710640_li1169848199"><span>Log in to the node where the HBase client is installed and check whether <strong id="ALM-19031__en-us_topic_0000001774710640_b168775332401">hbase.client.ipc.pool.size</strong> in the <em id="ALM-19031__en-us_topic_0000001774710640_i1578243784012">Client installation directory</em><strong id="ALM-19031__en-us_topic_0000001774710640_b6205421400">/HBase/hbase/conf/hbase-site.xml</strong> file is set to a value greater than <strong id="ALM-19031__en-us_topic_0000001774710640_b2669102913712">5</strong>.</span><p><ul id="ALM-19031__en-us_topic_0000001774710640_ul197493917470"><li id="ALM-19031__en-us_topic_0000001774710640_li774103917479">If yes, go to <a href="#ALM-19031__en-us_topic_0000001774710640_li923414287410">3</a>.</li><li id="ALM-19031__en-us_topic_0000001774710640_li874203913474">If no, go to <a href="#ALM-19031__en-us_topic_0000001774710640_li1891341510112">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-19031__en-us_topic_0000001774710640_li923414287410"><a name="ALM-19031__en-us_topic_0000001774710640_li923414287410"></a><a name="en-us_topic_0000001774710640_li923414287410"></a><span>Decrease the value of <strong id="ALM-19031__en-us_topic_0000001774710640_b19981459154019">hbase.client.ipc.pool.size</strong> and save the change.</span></li><li id="ALM-19031__en-us_topic_0000001774710640_li176234184110"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19031__en-us_topic_0000001774710640_ul362318184111"><li id="ALM-19031__en-us_topic_0000001774710640_li116232018191117">If yes, no further action is required.</li><li id="ALM-19031__en-us_topic_0000001774710640_li126237180111">If no, go to <a href="#ALM-19031__en-us_topic_0000001774710640_li1891341510112">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-19031__en-us_topic_0000001774710640_li1891341510112"><a name="ALM-19031__en-us_topic_0000001774710640_li1891341510112"></a><a name="en-us_topic_0000001774710640_li1891341510112"></a><span>Check whether the number of concurrent requests accessing the HBase service exceeds the limit described in the note below.</span><p><ul id="ALM-19031__en-us_topic_0000001774710640_ul16708525161315"><li id="ALM-19031__en-us_topic_0000001774710640_li57083253131">If yes, go to <a href="#ALM-19031__en-us_topic_0000001774710640_li7961131445410">6</a>.</li><li id="ALM-19031__en-us_topic_0000001774710640_li20708142519133">If no, go to <a href="#ALM-19031__en-us_topic_0000001774710640_li959275915215">8</a>.</li></ul>
|
||||
<div class="note" id="ALM-19031__en-us_topic_0000001774710640_note7541131893510"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-19031__en-us_topic_0000001774710640_p85007425353">The number of concurrent HBase API calls by all applications must be less than or equal to the following value:</p>
|
||||
<p id="ALM-19031__en-us_topic_0000001774710640_p9541161833519">Number of RegionServer instances x Maximum number of handlers of a single RegionServer</p>
|
||||
<p id="ALM-19031__en-us_topic_0000001774710640_p18263730153813">To obtain the number of RegionServer instances, log in to Manager and choose <strong id="ALM-19031__en-us_topic_0000001774710640_b174361815172314">Cluster</strong> > <strong id="ALM-19031__en-us_topic_0000001774710640_b11281518112313">Services</strong> > <strong id="ALM-19031__en-us_topic_0000001774710640_b5843112212234">HBase</strong> > <strong id="ALM-19031__en-us_topic_0000001774710640_b193612582314">Instances</strong>. To obtain the maximum number of handlers of a single RegionServer, click <strong id="ALM-19031__en-us_topic_0000001774710640_b1468142162319">Configurations</strong> and search for the <strong id="ALM-19031__en-us_topic_0000001774710640_b17336195018235">hbase.regionserver.handler.count</strong> parameter.</p>
|
||||
</div></div>
|
||||
</p></li></ol><ol start="6" id="ALM-19031__en-us_topic_0000001774710640_ol19709113194416"><li id="ALM-19031__en-us_topic_0000001774710640_li7961131445410"><a name="ALM-19031__en-us_topic_0000001774710640_li7961131445410"></a><a name="en-us_topic_0000001774710640_li7961131445410"></a><span>Contact the upper-layer service support personnel to decrease the concurrent requests based on actual service requirements.</span></li><li id="ALM-19031__en-us_topic_0000001774710640_li970913314444"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19031__en-us_topic_0000001774710640_ul770681614614"><li id="ALM-19031__en-us_topic_0000001774710640_li9706816124617">If yes, no further action is required.</li><li id="ALM-19031__en-us_topic_0000001774710640_li0706416104619">If no, go to <a href="#ALM-19031__en-us_topic_0000001774710640_li959275915215">8</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-19031__en-us_topic_0000001774710640_p15601739207"><strong id="ALM-19031__en-us_topic_0000001774710640_b3606332013">Collect fault information.</strong></p>
|
||||
<ol start="8" id="ALM-19031__en-us_topic_0000001774710640_ol1559215914523"><li id="ALM-19031__en-us_topic_0000001774710640_li959275915215"><a name="ALM-19031__en-us_topic_0000001774710640_li959275915215"></a><a name="en-us_topic_0000001774710640_li959275915215"></a><span>On MRS Manager, choose <strong id="ALM-19031__en-us_topic_0000001774710640_b706159576111728">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-19031__en-us_topic_0000001774710640_b1056485871111728">Log</strong> > <strong id="ALM-19031__en-us_topic_0000001774710640_b1646220034111728">Download</strong>.</span></li><li id="ALM-19031__en-us_topic_0000001774710640_li1959211592529"><span>Expand the <strong id="ALM-19031__en-us_topic_0000001774710640_b668788891111728">Service</strong> drop-down list, and select <strong id="ALM-19031__en-us_topic_0000001774710640_b835086941111728">HBase</strong> for the target cluster.</span></li><li id="ALM-19031__en-us_topic_0000001774710640_li19592145975215"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19031__en-us_topic_0000001774710640_b1765658928111728">Start Date</strong> and <strong id="ALM-19031__en-us_topic_0000001774710640_b550824030111728">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19031__en-us_topic_0000001774710640_b722823580111728">Download</strong>.</span></li><li id="ALM-19031__en-us_topic_0000001774710640_li1759215945217"><span>Contact <span id="ALM-19031__en-us_topic_0000001774710640_text1059295916526">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-19031__en-us_topic_0000001774710640_section169311343318"><h4 class="sectiontitle"><span id="ALM-19031__en-us_topic_0000001774710640_text596254111265">Alarm Clearance</span></h4><p id="ALM-19031__en-us_topic_0000001774710640_p5635175455816">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-19031__en-us_topic_0000001774710640_section19896826"><h4 class="sectiontitle"><span id="ALM-19031__en-us_topic_0000001774710640_text7831044102616">Related Information</span></h4><p id="ALM-19031__en-us_topic_0000001774710640_p9275082"><span id="ALM-19031__en-us_topic_0000001774710640_text61294221672">None.</span></p>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div class="familylinks">
|
||||
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
124
docs/mrs/umn/ALM-19032.html
Normal file
File diff suppressed because it is too large
132
docs/mrs/umn/ALM-19033.html
Normal file
File diff suppressed because it is too large
106
docs/mrs/umn/ALM-19034.html
Normal file
File diff suppressed because it is too large
124
docs/mrs/umn/ALM-19035.html
Normal file
File diff suppressed because it is too large
133
docs/mrs/umn/ALM-19036.html
Normal file
File diff suppressed because it is too large
@ -65,15 +65,15 @@
|
||||
<div class="section" id="ALM-25007__section27017478"><h4 class="sectiontitle"><span id="ALM-25007__text12656240135813">Possible Causes</span></h4><ul id="ALM-25007__ul928305117158"><li id="ALM-25007__li12831051111518">There are too many SlapdServer connections.</li><li id="ALM-25007__li20252875511">The alarm threshold or alarm trigger count is improperly configured.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-25007__section535785120256"><h4 class="sectiontitle"><span id="ALM-25007__text19569135285811">Handling Procedure</span></h4><p id="ALM-25007__p13680121610197"><strong id="ALM-25007__b554744191416">Check whether there are too many SlapdServer process connections.</strong></p>
|
||||
<ol id="ALM-25007__ol3606133663917"><li id="ALM-25007__li360615369397"><span>Log in to FusionInsight Manager and choose <strong id="ALM-25007__b274121091511">Cluster</strong> > <strong id="ALM-25007__b15878191591510">Services</strong> > <strong id="ALM-25007__b262119189157">LdapServer</strong>.</span></li><li id="ALM-25007__li1360653613918"><span>On the LdapServer dashboard page, observe the SlapdServer process connections and decrease the connections based on service requirements.</span><p><div class="fignone" id="ALM-25007__fig1360663643914"><span class="figcap"><b>Figure 1 </b>SlapdServer process connections</span><br><span><img id="ALM-25007__image86061536153916" src="en-us_image_0000001971659216.png"></span></div>
|
||||
<ol id="ALM-25007__ol3606133663917"><li id="ALM-25007__li360615369397"><span>Log in to MRS Manager and choose <strong id="ALM-25007__b274121091511">Cluster</strong> > <strong id="ALM-25007__b15878191591510">Services</strong> > <strong id="ALM-25007__b262119189157">LdapServer</strong>.</span></li><li id="ALM-25007__li1360653613918"><span>On the LdapServer dashboard page, observe the SlapdServer process connections and decrease the connections based on service requirements.</span><p><div class="fignone" id="ALM-25007__fig1360663643914"><span class="figcap"><b>Figure 1 </b>SlapdServer process connections</span><br><span><img id="ALM-25007__image86061536153916" src="en-us_image_0000001971659216.png"></span></div>
|
||||
</p></li><li id="ALM-25007__li36061236183919"><span>Wait about 2 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-25007__ul4606143673918"><li id="ALM-25007__li160693618391">If yes, no further action is required.</li><li id="ALM-25007__li106061436163913">If no, go to <a href="#ALM-25007__li1860517366397">4</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-25007__p197271833958"><strong id="ALM-25007__b13441112181">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
|
||||
<ol start="4" id="ALM-25007__ol7606036183911"><li id="ALM-25007__li1860517366397"><a name="ALM-25007__li1860517366397"></a><a name="li1860517366397"></a><span>On FusionInsight Manager, choose <strong id="ALM-25007__b97002030184">O&M</strong> > <strong id="ALM-25007__b37017311816">Alarm</strong> > <strong id="ALM-25007__b1370216312180">Thresholds</strong>, click the name of the desired cluster, choose <strong id="ALM-25007__b177031439188">LdapServer</strong> > <strong id="ALM-25007__b186991181409">Other</strong> > <strong id="ALM-25007__b1270520331810">SlapdServer Service Connections</strong>, and check whether the alarm trigger count and alarm threshold are set properly.</span><p><ul id="ALM-25007__ul15605736123916"><li id="ALM-25007__li10605836173911">If yes, go to <a href="#ALM-25007__li2086435114014">7</a>.</li><li id="ALM-25007__li12605236173920">If no, go to <a href="#ALM-25007__li20605336193916">5</a>.</li></ul>
|
||||
<ol start="4" id="ALM-25007__ol7606036183911"><li id="ALM-25007__li1860517366397"><a name="ALM-25007__li1860517366397"></a><a name="li1860517366397"></a><span>On MRS Manager, choose <strong id="ALM-25007__b97002030184">O&M</strong> > <strong id="ALM-25007__b37017311816">Alarm</strong> > <strong id="ALM-25007__b1370216312180">Thresholds</strong>, click the name of the desired cluster, choose <strong id="ALM-25007__b177031439188">LdapServer</strong> > <strong id="ALM-25007__b186991181409">Other</strong> > <strong id="ALM-25007__b1270520331810">SlapdServer Service Connections</strong>, and check whether the alarm trigger count and alarm threshold are set properly.</span><p><ul id="ALM-25007__ul15605736123916"><li id="ALM-25007__li10605836173911">If yes, go to <a href="#ALM-25007__li2086435114014">7</a>.</li><li id="ALM-25007__li12605236173920">If no, go to <a href="#ALM-25007__li20605336193916">5</a>.</li></ul>
|
||||
</p></li><li id="ALM-25007__li20605336193916"><a name="ALM-25007__li20605336193916"></a><a name="li20605336193916"></a><span>Change the trigger count and alarm threshold based on the actual number of process connections, and apply the changes.</span></li><li id="ALM-25007__li760611368392"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-25007__ul76061436193914"><li id="ALM-25007__li8605536113913">If yes, no further action is required.</li><li id="ALM-25007__li1060613362395">If no, go to <a href="#ALM-25007__li2086435114014">7</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-25007__p22707215144835"><strong id="ALM-25007__b295134514200">Collect fault information.</strong></p>
|
||||
<ol start="7" id="ALM-25007__ol78649514010"><li id="ALM-25007__li2086435114014"><a name="ALM-25007__li2086435114014"></a><a name="li2086435114014"></a><span>On FusionInsight Manager, choose <strong id="ALM-25007__b1874116469201">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-25007__b157416465209">Log</strong> > <strong id="ALM-25007__b974144602013">Download</strong>.</span></li><li id="ALM-25007__li188641959409"><span>Expand the <strong id="ALM-25007__b8504952013">Service</strong> drop-down list, and select <strong id="ALM-25007__b176174902013">LdapServer</strong> for the target cluster.</span></li><li id="ALM-25007__li8864165104018"><span>Specify <strong id="ALM-25007__b81165715209">Hosts</strong> for collecting logs, which is optional. By default, all hosts are selected.</span></li><li id="ALM-25007__li286419594011"><span>Click <span><img id="ALM-25007__image68642513407" src="en-us_image_0000001971818984.png"></span> in the upper right corner, and set <strong id="ALM-25007__b55031022112216">Start Date</strong> and <strong id="ALM-25007__b135056229226">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-25007__b15508322182218">Download</strong>.</span></li><li id="ALM-25007__li15864954406"><span>Contact <span id="ALM-25007__text3163192382317">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="7" id="ALM-25007__ol78649514010"><li id="ALM-25007__li2086435114014"><a name="ALM-25007__li2086435114014"></a><a name="li2086435114014"></a><span>On MRS Manager, choose <strong id="ALM-25007__b1874116469201">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-25007__b157416465209">Log</strong> > <strong id="ALM-25007__b974144602013">Download</strong>.</span></li><li id="ALM-25007__li188641959409"><span>Expand the <strong id="ALM-25007__b8504952013">Service</strong> drop-down list, and select <strong id="ALM-25007__b176174902013">LdapServer</strong> for the target cluster.</span></li><li id="ALM-25007__li8864165104018"><span>Specify <strong id="ALM-25007__b81165715209">Hosts</strong> for collecting logs, which is optional. By default, all hosts are selected.</span></li><li id="ALM-25007__li286419594011"><span>Click <span><img id="ALM-25007__image68642513407" src="en-us_image_0000001971818984.png"></span> in the upper right corner, and set <strong id="ALM-25007__b55031022112216">Start Date</strong> and <strong id="ALM-25007__b135056229226">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-25007__b15508322182218">Download</strong>.</span></li><li id="ALM-25007__li15864954406"><span>Contact <span id="ALM-25007__text3163192382317">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-25007__section169311343318"><h4 class="sectiontitle"><span id="ALM-25007__text367020138593">Alarm Clearance</span></h4><p id="ALM-25007__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -66,16 +66,16 @@
|
||||
<div class="section" id="ALM-25008__section27017478"><h4 class="sectiontitle"><span id="ALM-25008__text12656240135813">Possible Causes</span></h4><ul id="ALM-25008__ul460131185210"><li id="ALM-25008__li1373752155210">The alarm threshold or alarm trigger count is improperly configured.</li><li id="ALM-25008__li1760201165215">The CPU configuration cannot meet service requirements, and the CPU usage reaches the upper limit.</li></ul>
|
||||
</div>
|
||||
<div class="section" id="ALM-25008__section535785120256"><h4 class="sectiontitle"><span id="ALM-25008__text19569135285811">Handling Procedure</span></h4><p id="ALM-25008__p18319915115316"><strong id="ALM-25008__b1692240193419">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
|
||||
<ol id="ALM-25008__ol12485153614462"><li id="ALM-25008__li124853366461"><span>Log in to FusionInsight Manager, choose <strong id="ALM-25008__b10959185319342">O&M</strong> > <strong id="ALM-25008__b8960125312349">Alarm</strong> > <strong id="ALM-25008__b896118537342">Thresholds</strong>, click the name of the desired cluster, choose <strong id="ALM-25008__b3962253113412">LdapServer</strong> > <strong id="ALM-25008__b142712140111">Other</strong> > <strong id="ALM-25008__b8963253153419">SlapdServer Service Total CPU Percentage</strong>, and check whether the alarm trigger count and alarm threshold are set properly.</span><p><ul id="ALM-25008__ul17485136134617"><li id="ALM-25008__li1748553654613">If yes, go to <a href="#ALM-25008__li848412361466">4</a>.</li><li id="ALM-25008__li19485153694610">If no, go to <a href="#ALM-25008__li174859361464">2</a>.</li></ul>
|
||||
<ol id="ALM-25008__ol12485153614462"><li id="ALM-25008__li124853366461"><span>Log in to MRS Manager, choose <strong id="ALM-25008__b10959185319342">O&M</strong> > <strong id="ALM-25008__b8960125312349">Alarm</strong> > <strong id="ALM-25008__b896118537342">Thresholds</strong>, click the name of the desired cluster, choose <strong id="ALM-25008__b3962253113412">LdapServer</strong> > <strong id="ALM-25008__b142712140111">Other</strong> > <strong id="ALM-25008__b8963253153419">SlapdServer Service Total CPU Percentage</strong>, and check whether the alarm trigger count and alarm threshold are set properly.</span><p><ul id="ALM-25008__ul17485136134617"><li id="ALM-25008__li1748553654613">If yes, go to <a href="#ALM-25008__li848412361466">4</a>.</li><li id="ALM-25008__li19485153694610">If no, go to <a href="#ALM-25008__li174859361464">2</a>.</li></ul>
|
||||
</p></li><li id="ALM-25008__li174859361464"><a name="ALM-25008__li174859361464"></a><a name="li174859361464"></a><span>Change the trigger count and alarm threshold based on the actual CPU usage, and apply the changes.</span></li><li id="ALM-25008__li1148563612460"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-25008__ul9485143618462"><li id="ALM-25008__li748513618463">If yes, no further action is required.</li><li id="ALM-25008__li548563615468">If no, go to <a href="#ALM-25008__li848412361466">4</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p id="ALM-25008__p832011512539"><strong id="ALM-25008__b45360489359">Check whether the CPU usage reaches the upper limit.</strong></p>
|
||||
<ol start="4" id="ALM-25008__ol5485203614469"><li id="ALM-25008__li848412361466"><a name="ALM-25008__li848412361466"></a><a name="li848412361466"></a><span>On FusionInsight Manager, choose <strong id="ALM-25008__b971913559358">O&M</strong> > <strong id="ALM-25008__b972118552354">Alarm</strong> > <strong id="ALM-25008__b1272255523516">Alarms</strong>. In the right pane, click this alarm and obtain the host name in <strong id="ALM-25008__b972310555352">Location</strong>.</span></li><li id="ALM-25008__li1248417366465"><a name="ALM-25008__li1248417366465"></a><a name="li1248417366465"></a><span>Choose <strong id="ALM-25008__b3925118345">Cluster</strong> > <strong id="ALM-25008__b199251915349">Services</strong> > <strong id="ALM-25008__b69256113343">LdapServer</strong>, click the <strong id="ALM-25008__b8916032173415">Instance</strong> tab, and click the SlapdServer instance corresponding to the host name in <a href="#ALM-25008__li848412361466">4</a>.</span></li><li id="ALM-25008__li133258517208"><a name="ALM-25008__li133258517208"></a><a name="li133258517208"></a><span>On the dashboard of the instance, observe the real-time data of the <strong id="ALM-25008__b159977196486">CPU Usage of a Single SlapdServer Instance</strong> chart for about 5 minutes and check whether the CPU usage exceeds the threshold (<strong id="ALM-25008__b128032916541">75%</strong> by default) multiple times.</span><p><ul id="ALM-25008__ul1846911369207"><li id="ALM-25008__li1246923622018">If yes, go to <a href="#ALM-25008__li14826210161714">7</a>.</li><li id="ALM-25008__li0145124915202">If no, go to <a href="#ALM-25008__li89991152124618">9</a>.</li></ul>
|
||||
<ol start="4" id="ALM-25008__ol5485203614469"><li id="ALM-25008__li848412361466"><a name="ALM-25008__li848412361466"></a><a name="li848412361466"></a><span>On MRS Manager, choose <strong id="ALM-25008__b971913559358">O&M</strong> > <strong id="ALM-25008__b972118552354">Alarm</strong> > <strong id="ALM-25008__b1272255523516">Alarms</strong>. In the right pane, click this alarm and obtain the host name in <strong id="ALM-25008__b972310555352">Location</strong>.</span></li><li id="ALM-25008__li1248417366465"><a name="ALM-25008__li1248417366465"></a><a name="li1248417366465"></a><span>Choose <strong id="ALM-25008__b3925118345">Cluster</strong> > <strong id="ALM-25008__b199251915349">Services</strong> > <strong id="ALM-25008__b69256113343">LdapServer</strong>, click the <strong id="ALM-25008__b8916032173415">Instance</strong> tab, and click the SlapdServer instance corresponding to the host name in <a href="#ALM-25008__li848412361466">4</a>.</span></li><li id="ALM-25008__li133258517208"><a name="ALM-25008__li133258517208"></a><a name="li133258517208"></a><span>On the dashboard of the instance, observe the real-time data of the <strong id="ALM-25008__b159977196486">CPU Usage of a Single SlapdServer Instance</strong> chart for about 5 minutes and check whether the CPU usage exceeds the threshold (<strong id="ALM-25008__b128032916541">75%</strong> by default) multiple times.</span><p><ul id="ALM-25008__ul1846911369207"><li id="ALM-25008__li1246923622018">If yes, go to <a href="#ALM-25008__li14826210161714">7</a>.</li><li id="ALM-25008__li0145124915202">If no, go to <a href="#ALM-25008__li89991152124618">9</a>.</li></ul>
|
||||
</p></li><li id="ALM-25008__li14826210161714"><a name="ALM-25008__li14826210161714"></a><a name="li14826210161714"></a><span>Check whether the status of other SlapdServer instances is normal. For details, see <a href="#ALM-25008__li1248417366465">5</a> to <a href="#ALM-25008__li133258517208">6</a>.</span><p><ul id="ALM-25008__ul53828202177"><li id="ALM-25008__li1298672511175">If yes, contact the MRS cluster administrator to evaluate whether to expand the capacity of SlapdServer instances. Then, go to <a href="#ALM-25008__li12485203614616">8</a>.</li><li id="ALM-25008__li4382920191715">If no, repair the faulty SlapdServer instance and go to <a href="#ALM-25008__li12485203614616">8</a>.</li></ul>
|
||||
</p></li><li id="ALM-25008__li12485203614616"><a name="ALM-25008__li12485203614616"></a><a name="li12485203614616"></a><span>Check whether the alarm is cleared.</span><p><ul id="ALM-25008__ul16484153654614"><li id="ALM-25008__li15484163634617">If yes, no further action is required.</li><li id="ALM-25008__li184842368460">If no, go to <a href="#ALM-25008__li89991152124618">9</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-25008__p22707215144835"><strong id="ALM-25008__b17539175141012">Collect fault information.</strong></p>
|
||||
<ol start="9" id="ALM-25008__ol14015319462"><li id="ALM-25008__li89991152124618"><a name="ALM-25008__li89991152124618"></a><a name="li89991152124618"></a><span>On FusionInsight Manager, choose <strong id="ALM-25008__b18375135361015">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-25008__b1637515331015">Log</strong> > <strong id="ALM-25008__b3375553171015">Download</strong>.</span></li><li id="ALM-25008__li15999175218461"><span>Expand the <strong id="ALM-25008__b2060812549107">Service</strong> drop-down list, and select <strong id="ALM-25008__b176086541100">LdapServer</strong> for the target cluster.</span></li><li id="ALM-25008__li1799955234619"><span>Click <span><img id="ALM-25008__image1299965219461" src="en-us_image_0000002008258989.png"></span> in the upper right corner, and set <strong id="ALM-25008__b9290115818109">Start Date</strong> and <strong id="ALM-25008__b02911158101019">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-25008__b4291185881017">Download</strong>.</span></li><li id="ALM-25008__li1602535462"><span>Contact <span id="ALM-25008__text176166613113">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="9" id="ALM-25008__ol14015319462"><li id="ALM-25008__li89991152124618"><a name="ALM-25008__li89991152124618"></a><a name="li89991152124618"></a><span>On MRS Manager, choose <strong id="ALM-25008__b18375135361015">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-25008__b1637515331015">Log</strong> > <strong id="ALM-25008__b3375553171015">Download</strong>.</span></li><li id="ALM-25008__li15999175218461"><span>Expand the <strong id="ALM-25008__b2060812549107">Service</strong> drop-down list, and select <strong id="ALM-25008__b176086541100">LdapServer</strong> for the target cluster.</span></li><li id="ALM-25008__li1799955234619"><span>Click <span><img id="ALM-25008__image1299965219461" src="en-us_image_0000002008258989.png"></span> in the upper right corner, and set <strong id="ALM-25008__b9290115818109">Start Date</strong> and <strong id="ALM-25008__b02911158101019">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-25008__b4291185881017">Download</strong>.</span></li><li id="ALM-25008__li1602535462"><span>Contact <span id="ALM-25008__text176166613113">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-25008__section169311343318"><h4 class="sectiontitle"><span id="ALM-25008__text367020138593">Alarm Clearance</span></h4><p id="ALM-25008__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -70,13 +70,13 @@
|
||||
</div>
|
||||
<div class="section" id="ALM-29007__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29007__p9402509505">The Impalad process is executing a large number of query tasks.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-29007__section61311810131118"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29007__ol1598183211416"><li id="ALM-29007__li39816321245"><span>On FusionInsight Manager, choose <strong id="ALM-29007__b1159183925518">O&M</strong> > <strong id="ALM-29007__b5160939195510">Alarm</strong> > <strong id="ALM-29007__b116013916555">Thresholds</strong> > <strong id="ALM-29007__b616015395555">Impala</strong> > <strong id="ALM-29007__b6160939195519">CPU and Memory</strong> > <strong id="ALM-29007__b1160153918559">Impalad Process Memory Usage (Impalad)</strong> and check the threshold.</span></li><li id="ALM-29007__li6595161750"><span>If the alarm threshold is smaller than 80%, increase the alarm threshold as required and check whether the alarm is cleared.</span><p><ul id="ALM-29007__ul941175682912"><li id="ALM-29007__li241456102913">If yes, no further action is required.</li><li id="ALM-29007__li1032055153019">If no, go to <a href="#ALM-29007__li54643151153">3</a>.</li></ul>
|
||||
<div class="section" id="ALM-29007__section61311810131118"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29007__ol1598183211416"><li id="ALM-29007__li39816321245"><span>On MRS Manager, choose <strong id="ALM-29007__b1159183925518">O&M</strong> > <strong id="ALM-29007__b5160939195510">Alarm</strong> > <strong id="ALM-29007__b116013916555">Thresholds</strong> > <strong id="ALM-29007__b616015395555">Impala</strong> > <strong id="ALM-29007__b6160939195519">CPU and Memory</strong> > <strong id="ALM-29007__b1160153918559">Impalad Process Memory Usage (Impalad)</strong> and check the threshold.</span></li><li id="ALM-29007__li6595161750"><span>If the alarm threshold is smaller than 80%, increase the alarm threshold as required and check whether the alarm is cleared.</span><p><ul id="ALM-29007__ul941175682912"><li id="ALM-29007__li241456102913">If yes, no further action is required.</li><li id="ALM-29007__li1032055153019">If no, go to <a href="#ALM-29007__li54643151153">3</a>.</li></ul>
|
||||
</p></li><li id="ALM-29007__li54643151153"><a name="ALM-29007__li54643151153"></a><a name="li54643151153"></a><span>If the threshold is greater than 80%, check whether a large number of concurrent query tasks exist when the alarm is generated. A large number of concurrent query tasks will cause the memory usage to increase sharply. After the tasks are complete, check whether the alarm is automatically cleared. During this period, some tasks may fail to be executed or may be canceled due to insufficient memory. In this case, try again.</span><p><div class="note" id="ALM-29007__note35700516451"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29007__p257015514518">If the memory usage always exceeds the threshold, the cluster capacity needs to be expanded.</p>
|
||||
</div></div>
|
||||
<ul id="ALM-29007__ul1769835811449"><li id="ALM-29007__li10698175824412">If yes, no further action is required.</li><li id="ALM-29007__li14698258154413">If no, go to <a href="#ALM-29007__li1698242954313">4</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-29007__p39821129144316"><strong id="ALM-29007__b17406162195715">Collect fault information.</strong></p>
|
||||
<ol start="4" id="ALM-29007__ol189821329134317"><li id="ALM-29007__li1698242954313"><a name="ALM-29007__li1698242954313"></a><a name="li1698242954313"></a><span>On FusionInsight Manager of the active or standby cluster, choose <strong id="ALM-29007__b694017416572">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-29007__b59401846573">Log</strong> > <strong id="ALM-29007__b109404445710">Download</strong>.</span></li><li id="ALM-29007__li27049781154249"><span>Expand the <strong id="ALM-29007__b149911610572">Service</strong> drop-down list, and select <strong id="ALM-29007__b74991264575">Impala</strong> for the target cluster.</span></li><li id="ALM-29007__li1498212919436"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29007__b935629185712">Start Date</strong> and <strong id="ALM-29007__b53568916572">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29007__b33564918571">Download</strong>.</span></li><li id="ALM-29007__li56393916154249"><span>Contact <span id="ALM-29007__text16720101425714">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="4" id="ALM-29007__ol189821329134317"><li id="ALM-29007__li1698242954313"><a name="ALM-29007__li1698242954313"></a><a name="li1698242954313"></a><span>On MRS Manager of the active or standby cluster, choose <strong id="ALM-29007__b694017416572">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-29007__b59401846573">Log</strong> > <strong id="ALM-29007__b109404445710">Download</strong>.</span></li><li id="ALM-29007__li27049781154249"><span>Expand the <strong id="ALM-29007__b149911610572">Service</strong> drop-down list, and select <strong id="ALM-29007__b74991264575">Impala</strong> for the target cluster.</span></li><li id="ALM-29007__li1498212919436"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29007__b935629185712">Start Date</strong> and <strong id="ALM-29007__b53568916572">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29007__b33564918571">Download</strong>.</span></li><li id="ALM-29007__li56393916154249"><span>Contact <span id="ALM-29007__text16720101425714">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
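As a rough illustration of the log collection window described in the step above (10 minutes before and after the alarm generation time), the following sketch computes the Start Date and End Date values. The function name `log_collection_window` and the sample alarm time are hypothetical and are not part of the product; the same helper would apply to the 1-hour window used for other Impala alarms by passing a different margin.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin):
    """Return (start, end) covering `margin` before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
alarm_time = datetime(2025, 8, 6, 14, 30)

# ALM-29007 asks for 10 minutes on either side of the alarm generation time.
start, end = log_collection_window(alarm_time, timedelta(minutes=10))
print("Start Date:", start.isoformat(sep=" "))
print("End Date:  ", end.isoformat(sep=" "))
```

The two printed values would then be entered manually in the Start Date and End Date fields on the log download page.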
|
||||
<div class="section" id="ALM-29007__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29007__p55781648135011">The alarm is automatically cleared after the burst of concurrent query tasks is complete.</p>
|
||||
</div>
|
||||
|
||||
@ -70,13 +70,13 @@
|
||||
</div>
|
||||
<div class="section" id="ALM-29008__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29008__p9402509505">The number of client connections maintained by the Impalad service is too large or the threshold is too small.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-29008__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29008__ol398575283918"><li id="ALM-29008__li1898555216394"><span>On FusionInsight Manager, choose <strong id="ALM-29008__b121256143357">O&M </strong>> <strong id="ALM-29008__b16125121473512">Alarm</strong> > <strong id="ALM-29008__b16125141412354">Thresholds</strong> > <strong id="ALM-29008__b6125101443510">Impala</strong> > <strong id="ALM-29008__b412515147355">Connections</strong> > <strong id="ALM-29008__b12125101410353">Number of ODBC Connections to Impalad Process (Impalad)</strong> to check the threshold.</span></li><li id="ALM-29008__li1232161715409"><span>Check the number of ODBC applications connected to Impalad and stop idle applications. Check whether the alarm is automatically cleared.</span><p><ul id="ALM-29008__ul1437394518402"><li id="ALM-29008__li9373184519406">If yes, no further action is required.</li><li id="ALM-29008__li5327195094015">If no, go to <a href="#ALM-29008__li1507754134111">3</a> to change the number of concurrent connections supported by Impalad.</li></ul>
|
||||
</p></li><li id="ALM-29008__li1507754134111"><a name="ALM-29008__li1507754134111"></a><a name="li1507754134111"></a><span>On FusionInsight Manager, choose <strong id="ALM-29008__b1618423443519">Cluster</strong> > <strong id="ALM-29008__b17184193412354">Impala</strong> > <strong id="ALM-29008__b21848340351">Configurations</strong> > <strong id="ALM-29008__b1718415344354">All Configurations</strong> > <strong id="ALM-29008__b1918516346351">Impalad</strong> > <strong id="ALM-29008__b7185173413359">Customization</strong>. Add the custom parameter <strong id="ALM-29008__b3185334173516">--fe_service_threads</strong>. The default value of this parameter is <strong id="ALM-29008__b11185934143518">64</strong>. Change the value as required and click <strong id="ALM-29008__b918563414350">Save</strong>.</span></li><li id="ALM-29008__li128051144134613"><span>After the query tasks on all clients are complete, click the <strong id="ALM-29008__b154063993614">Instances</strong> tab. Select all Impalad instances, and restart them.</span><p><div class="note" id="ALM-29008__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29008__en-us_topic_0294848303_p0481144124012">The service will become unavailable when all instances are restarted. If only a single instance is restarted, the tasks that are being executed on that instance will fail, but the service will remain available.</p>
|
||||
<div class="section" id="ALM-29008__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29008__ol398575283918"><li id="ALM-29008__li1898555216394"><span>On MRS Manager, choose <strong id="ALM-29008__b121256143357">O&M </strong>> <strong id="ALM-29008__b16125121473512">Alarm</strong> > <strong id="ALM-29008__b16125141412354">Thresholds</strong> > <strong id="ALM-29008__b6125101443510">Impala</strong> > <strong id="ALM-29008__b412515147355">Connections</strong> > <strong id="ALM-29008__b12125101410353">Number of ODBC Connections to Impalad Process (Impalad)</strong> to check the threshold.</span></li><li id="ALM-29008__li1232161715409"><span>Check the number of ODBC applications connected to Impalad and stop idle applications. Check whether the alarm is automatically cleared.</span><p><ul id="ALM-29008__ul1437394518402"><li id="ALM-29008__li9373184519406">If yes, no further action is required.</li><li id="ALM-29008__li5327195094015">If no, go to <a href="#ALM-29008__li1507754134111">3</a> to change the number of concurrent connections supported by Impalad.</li></ul>
|
||||
</p></li><li id="ALM-29008__li1507754134111"><a name="ALM-29008__li1507754134111"></a><a name="li1507754134111"></a><span>On MRS Manager, choose <strong id="ALM-29008__b1618423443519">Cluster</strong> > <strong id="ALM-29008__b17184193412354">Impala</strong> > <strong id="ALM-29008__b21848340351">Configurations</strong> > <strong id="ALM-29008__b1718415344354">All Configurations</strong> > <strong id="ALM-29008__b1918516346351">Impalad</strong> > <strong id="ALM-29008__b7185173413359">Customization</strong>. Add the custom parameter <strong id="ALM-29008__b3185334173516">--fe_service_threads</strong>. The default value of this parameter is <strong id="ALM-29008__b11185934143518">64</strong>. Change the value as required and click <strong id="ALM-29008__b918563414350">Save</strong>.</span></li><li id="ALM-29008__li128051144134613"><span>After the query tasks on all clients are complete, click the <strong id="ALM-29008__b154063993614">Instances</strong> tab. Select all Impalad instances, and restart them.</span><p><div class="note" id="ALM-29008__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29008__en-us_topic_0294848303_p0481144124012">The service will become unavailable when all instances are restarted. If only a single instance is restarted, the tasks that are being executed on that instance will fail, but the service will remain available.</p>
|
||||
</div></div>
|
||||
</p></li><li id="ALM-29008__li313119456566"><span>After the restart is complete, check whether the alarm is cleared.</span><p><ul id="ALM-29008__ul554711711577"><li id="ALM-29008__li105478719578">If yes, no further action is required.</li><li id="ALM-29008__li165471275576">If no, go to <a href="#ALM-29008__li17918612154249">6</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-29008__p3847019615437"><strong id="ALM-29008__b1184566123718">Collect fault information.</strong></p>
|
||||
<ol start="6" id="ALM-29008__ol18403783154311"><li id="ALM-29008__li17918612154249"><a name="ALM-29008__li17918612154249"></a><a name="li17918612154249"></a><span>On FusionInsight Manager, choose <strong id="ALM-29008__b92041888374">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-29008__b10204158143719">Log</strong> > <strong id="ALM-29008__b1520416853712">Download</strong>.</span></li><li id="ALM-29008__li27049781154249"><span>Expand the <strong id="ALM-29008__b145588914370">Service</strong> drop-down list, and select <strong id="ALM-29008__b14558199173717">Impala</strong> for the target cluster.</span></li><li id="ALM-29008__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29008__b18417611143713">Start Date</strong> and <strong id="ALM-29008__b1341791133710">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29008__b12417111113713">Download</strong>.</span></li><li id="ALM-29008__li56393916154249"><span>Contact <span id="ALM-29008__text876211216374">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="6" id="ALM-29008__ol18403783154311"><li id="ALM-29008__li17918612154249"><a name="ALM-29008__li17918612154249"></a><a name="li17918612154249"></a><span>On MRS Manager, choose <strong id="ALM-29008__b92041888374">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-29008__b10204158143719">Log</strong> > <strong id="ALM-29008__b1520416853712">Download</strong>.</span></li><li id="ALM-29008__li27049781154249"><span>Expand the <strong id="ALM-29008__b145588914370">Service</strong> drop-down list, and select <strong id="ALM-29008__b14558199173717">Impala</strong> for the target cluster.</span></li><li id="ALM-29008__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29008__b18417611143713">Start Date</strong> and <strong id="ALM-29008__b1341791133710">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29008__b12417111113713">Download</strong>.</span></li><li id="ALM-29008__li56393916154249"><span>Contact <span id="ALM-29008__text876211216374">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-29008__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29008__p55781648135011">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -70,7 +70,7 @@
|
||||
</div>
|
||||
<div class="section" id="ALM-29010__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29010__p9402509505">The Impalad service has maintained a large number of queries, or the threshold is too small.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-29010__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29010__ol398575283918"><li id="ALM-29010__li1898555216394"><span>On FusionInsight Manager, choose <strong id="ALM-29010__b112652457235">O&M</strong> > <strong id="ALM-29010__b1926584552312">Alarm</strong> > <strong id="ALM-29010__b426564522319">Thresholds</strong> > <strong id="ALM-29010__b6265114532310">Impala</strong> > <strong id="ALM-29010__b326554511233">Query Task Sum Statistics</strong> > <strong id="ALM-29010__b4266345162313">Total number of Queries Being Submitted (Impalad)</strong> and check the threshold.</span><p><p id="ALM-29010__p16200204422518"><span><img id="ALM-29010__image17964155802615" src="en-us_image_0000002007649989.png"></span></p>
|
||||
<div class="section" id="ALM-29010__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29010__ol398575283918"><li id="ALM-29010__li1898555216394"><span>On MRS Manager, choose <strong id="ALM-29010__b112652457235">O&M</strong> > <strong id="ALM-29010__b1926584552312">Alarm</strong> > <strong id="ALM-29010__b426564522319">Thresholds</strong> > <strong id="ALM-29010__b6265114532310">Impala</strong> > <strong id="ALM-29010__b326554511233">Query Task Sum Statistics</strong> > <strong id="ALM-29010__b4266345162313">Total number of Queries Being Submitted (Impalad)</strong> and check the threshold.</span><p><p id="ALM-29010__p16200204422518"><span><img id="ALM-29010__image17964155802615" src="en-us_image_0000002007649989.png"></span></p>
|
||||
</p></li><li id="ALM-29010__li1232161715409"><span>Change the threshold.</span><p><p id="ALM-29010__p1428013915914"><span><img id="ALM-29010__image441151014594" src="en-us_image_0000001971169950.png"></span></p>
|
||||
</p></li><li id="ALM-29010__li1507754134111"><span>Click the <strong id="ALM-29010__b2014612516170">Instances</strong> tab, select all Impalad instances, and restart them.</span><p><div class="note" id="ALM-29010__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29010__en-us_topic_0294848303_p0481144124012">The service will become unavailable when all instances are restarted. If only a single instance is restarted, the tasks that are being executed on that instance will fail, but the service will remain available.</p>
|
||||
</div></div>
|
||||
@ -78,7 +78,7 @@
|
||||
</p></li><li id="ALM-29010__li10975203610439"><span>After the restart is complete, check whether the alarm is cleared.</span><p><ul id="ALM-29010__ul5550203425918"><li id="ALM-29010__li15501934145913">If yes, no further action is required.</li><li id="ALM-29010__li55501534135916">If no, go to <a href="#ALM-29010__li17918612154249">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-29010__p3847019615437"><strong id="ALM-29010__b445483441518">Collect fault information.</strong></p>
|
||||
<ol start="5" id="ALM-29010__ol18403783154311"><li id="ALM-29010__li17918612154249"><a name="ALM-29010__li17918612154249"></a><a name="li17918612154249"></a><span>On FusionInsight Manager, choose <strong id="ALM-29010__b1363713516156">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-29010__b14638935181518">Log</strong> > <strong id="ALM-29010__b1463853510152">Download</strong>.</span></li><li id="ALM-29010__li27049781154249"><span>Expand the <strong id="ALM-29010__b173407374153">Service</strong> drop-down list, and select <strong id="ALM-29010__b15340103720156">Impala</strong> for the target cluster.</span></li><li id="ALM-29010__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29010__b1313103991517">Start Date</strong> and <strong id="ALM-29010__b1913239131519">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29010__b71313971513">Download</strong>.</span></li><li id="ALM-29010__li56393916154249"><span>Contact <span id="ALM-29010__text164301140111513">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="5" id="ALM-29010__ol18403783154311"><li id="ALM-29010__li17918612154249"><a name="ALM-29010__li17918612154249"></a><a name="li17918612154249"></a><span>On MRS Manager, choose <strong id="ALM-29010__b1363713516156">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-29010__b14638935181518">Log</strong> > <strong id="ALM-29010__b1463853510152">Download</strong>.</span></li><li id="ALM-29010__li27049781154249"><span>Expand the <strong id="ALM-29010__b173407374153">Service</strong> drop-down list, and select <strong id="ALM-29010__b15340103720156">Impala</strong> for the target cluster.</span></li><li id="ALM-29010__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29010__b1313103991517">Start Date</strong> and <strong id="ALM-29010__b1913239131519">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29010__b71313971513">Download</strong>.</span></li><li id="ALM-29010__li56393916154249"><span>Contact <span id="ALM-29010__text164301140111513">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-29010__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29010__p55781648135011">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
@ -70,7 +70,7 @@
|
||||
</div>
|
||||
<div class="section" id="ALM-29011__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29011__p9402509505">The Impalad service has maintained a large number of queries, or the threshold is too small.</p>
|
||||
</div>
|
||||
<div class="section" id="ALM-29011__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29011__ol398575283918"><li id="ALM-29011__li1898555216394"><span>On FusionInsight Manager, choose <strong id="ALM-29011__b15560629125317">O&M</strong> > <strong id="ALM-29011__b1560112914533">Alarm</strong> > <strong id="ALM-29011__b2560122916538">Thresholds</strong> > <strong id="ALM-29011__b75611929135314">Impala</strong> > <strong id="ALM-29011__b16561829125312">Query Task Sum Statistics</strong> > <strong id="ALM-29011__b85611829165317">Total number of Queries Being Executed (Impalad)</strong> and check the threshold.</span><p><p id="ALM-29011__p9615111114018"><span><img id="ALM-29011__image585410413" src="en-us_image_0000002007530501.png"></span></p>
|
||||
<div class="section" id="ALM-29011__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29011__ol398575283918"><li id="ALM-29011__li1898555216394"><span>On MRS Manager, choose <strong id="ALM-29011__b15560629125317">O&M</strong> > <strong id="ALM-29011__b1560112914533">Alarm</strong> > <strong id="ALM-29011__b2560122916538">Thresholds</strong> > <strong id="ALM-29011__b75611929135314">Impala</strong> > <strong id="ALM-29011__b16561829125312">Query Task Sum Statistics</strong> > <strong id="ALM-29011__b85611829165317">Total number of Queries Being Executed (Impalad)</strong> and check the threshold.</span><p><p id="ALM-29011__p9615111114018"><span><img id="ALM-29011__image585410413" src="en-us_image_0000002007530501.png"></span></p>
|
||||
</p></li><li id="ALM-29011__li1232161715409"><span>Change the threshold.</span><p><p id="ALM-29011__p1428013915914"><span><img id="ALM-29011__image441151014594" src="en-us_image_0000002007649997.png"></span></p>
|
||||
</p></li><li id="ALM-29011__li1507754134111"><span>Click the <strong id="ALM-29011__b076818269173">Instances</strong> tab, select all Impalad instances, and restart them.</span><p><div class="note" id="ALM-29011__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29011__en-us_topic_0294848303_p0481144124012">The service will become unavailable when all instances are restarted. If only a single instance is restarted, the tasks that are being executed on that instance will fail, but the service will remain available.</p>
|
||||
</div></div>
|
||||
@ -78,7 +78,7 @@
|
||||
</p></li><li id="ALM-29011__li10975203610439"><span>After the restart is complete, check whether the alarm is cleared.</span><p><ul id="ALM-29011__ul5550203425918"><li id="ALM-29011__li15501934145913">If yes, no further action is required.</li><li id="ALM-29011__li55501534135916">If no, go to <a href="#ALM-29011__li17918612154249">5</a>.</li></ul>
|
||||
</p></li></ol>
|
||||
<p class="tableheading" id="ALM-29011__p3847019615437"><strong id="ALM-29011__b1529294803515">Collect fault information.</strong></p>
|
||||
<ol start="5" id="ALM-29011__ol18403783154311"><li id="ALM-29011__li17918612154249"><a name="ALM-29011__li17918612154249"></a><a name="li17918612154249"></a><span>On FusionInsight Manager, choose <strong id="ALM-29011__b11424175273513">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-29011__b5425105283513">Log</strong> > <strong id="ALM-29011__b74253525354">Download</strong>.</span></li><li id="ALM-29011__li27049781154249"><span>Expand the <strong id="ALM-29011__b19135754143511">Service</strong> drop-down list, and select <strong id="ALM-29011__b121357544355">Impala</strong> for the target cluster.</span></li><li id="ALM-29011__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29011__b1778245513356">Start Date</strong> and <strong id="ALM-29011__b47821755193514">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29011__b778217552359">Download</strong>.</span></li><li id="ALM-29011__li56393916154249"><span>Contact <span id="ALM-29011__text4636165793517">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
<ol start="5" id="ALM-29011__ol18403783154311"><li id="ALM-29011__li17918612154249"><a name="ALM-29011__li17918612154249"></a><a name="li17918612154249"></a><span>On MRS Manager, choose <strong id="ALM-29011__b11424175273513">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-29011__b5425105283513">Log</strong> > <strong id="ALM-29011__b74253525354">Download</strong>.</span></li><li id="ALM-29011__li27049781154249"><span>Expand the <strong id="ALM-29011__b19135754143511">Service</strong> drop-down list, and select <strong id="ALM-29011__b121357544355">Impala</strong> for the target cluster.</span></li><li id="ALM-29011__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29011__b1778245513356">Start Date</strong> and <strong id="ALM-29011__b47821755193514">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29011__b778217552359">Download</strong>.</span></li><li id="ALM-29011__li56393916154249"><span>Contact <span id="ALM-29011__text4636165793517">O&M personnel</span> and provide the collected logs.</span></li></ol>
|
||||
</div>
|
||||
<div class="section" id="ALM-29011__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29011__p55781648135011">This alarm is automatically cleared after the fault is rectified.</p>
|
||||
</div>
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.