doc-exports/docs/mrs/umn/ALM-12061.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-12061"></a><a name="ALM-12061"></a>
<h1 class="topictitle1">ALM-12061 Process Usage Exceeds the Threshold</h1>
<div id="body1546852608752"><div class="section" id="ALM-12061__section45251551191910"><h4 class="sectiontitle">Description</h4><p id="ALM-12061__p8690451111916">The system checks the process usage of user omm every 30 seconds. Users can run the <strong id="ALM-12061__b8690125131915">ps -o nlwp,pid,args -u omm | awk '{sum+=$1} END {print "", sum}'</strong> command to obtain the number of concurrent processes (including threads) of user <strong id="ALM-12061__b1969014511198">omm</strong>. Run the <strong id="ALM-12061__b166906516196">ulimit -u</strong> command to obtain the maximum number of processes that can be simultaneously opened by user <strong id="ALM-12061__b19690175115197">omm</strong>. Divide the number of concurrent processes by the maximum number to obtain the process usage of user <strong id="ALM-12061__b196901551201915">omm</strong>. The process usage has a default threshold. This alarm is generated when the process usage exceeds the threshold.</p>
<p id="ALM-12061__p96908512194">If <strong id="ALM-12061__b2690155141916">Trigger Count </strong>is <strong id="ALM-12061__b069095113193">1</strong>, this alarm is cleared when the process usage is less than or equal to the threshold. If <strong id="ALM-12061__b8690185151913">Trigger Count</strong> is greater than <strong id="ALM-12061__b0690551141910">1</strong>, this alarm is cleared when the process usage is less than or equal to 90% of the threshold.</p>
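<p>The calculation described above can be sketched as a small shell script. This is only an illustration: the helper names <strong>usage_percent</strong> and <strong>count_user_threads</strong> are hypothetical and are not part of MRS, and the thread count is assumed to come from the NLWP column of a procps-style <strong>ps</strong>.</p>

```shell
# Sketch: process usage (%) = threads in use / per-user limit * 100.
# usage_percent and count_user_threads are hypothetical helper names.

# usage_percent USED MAX
#   USED = number of processes (including threads) currently in use
#   MAX  = maximum number reported by `ulimit -u`
usage_percent() {
    awk -v used="$1" -v max="$2" 'BEGIN { printf "%.1f\n", used * 100 / max }'
}

# count_user_threads USER
#   Sums the NLWP (thread count) column over all processes of USER.
#   "nlwp=" suppresses the header line so only numbers are summed.
count_user_threads() {
    ps -o nlwp= -u "$1" | awk '{ sum += $1 } END { print sum + 0 }'
}

# Example with fixed numbers: 45000 threads against a limit of 60000.
usage_percent 45000 60000   # prints 75.0
```

<p>On a real host, the two inputs would typically be <strong>count_user_threads omm</strong> and the output of <strong>ulimit -u</strong> run as user <strong>omm</strong>. If <strong>ulimit -u</strong> reports <strong>unlimited</strong>, the division does not apply.</p>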
</div>
<div class="section" id="ALM-12061__section75265516199"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12061__table11528115171913" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12061__row13691351131919"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12061__p1169145119198">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12061__p206914516195">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12061__p126911651151918">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12061__row1669113514192"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12061__p1269145151915">12061</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12061__p8691175121917">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12061__p4691145115196">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12061__section115319514194"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12061__table105321951141912" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12061__row1269219516194"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-12061__p15692105115190">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-12061__p469295120198">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12061__row759218834110"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12061__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12061__p692551319435">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12061__row4692145112197"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12061__p1969235120195">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12061__p1969215513194">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12061__row9692165112196"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12061__p1692175110197">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12061__p9692105141912">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12061__row669212515194"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12061__p16692951181919">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12061__p6692251101911">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12061__row569215101914"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12061__p1569215131917">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12061__p569318516195">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12061__section554019510195"><h4 class="sectiontitle">Impact on the System</h4><ul id="ALM-12061__ul10693185117194"><li id="ALM-12061__li13693185111915">Switching to user <strong id="ALM-12061__b36932051161915">omm</strong> fails.</li><li id="ALM-12061__li186937513198">New omm processes cannot be created.</li></ul>
</div>
<div class="section" id="ALM-12061__section19542851121912"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-12061__ul116931351161910"><li id="ALM-12061__li369312514191">The alarm threshold is improperly configured.</li><li id="ALM-12061__li176935515190">The maximum number of processes (including threads) that can be concurrently opened by user <strong id="ALM-12061__b116937517193">omm</strong> is inappropriate.</li><li id="ALM-12061__li669355121914">An excessive number of threads are opened at the same time.</li></ul>
</div>
<div class="section" id="ALM-12061__section145451851131917"><h4 class="sectiontitle">Procedure</h4><p id="ALM-12061__p12693135116199"><strong id="ALM-12061__b166935517198">Check whether the alarm threshold or Trigger Count is properly configured.</strong></p>
<ol id="ALM-12061__ol1937419236218"><li id="ALM-12061__li63741123102117"><span>On FusionInsight Manager, change the alarm threshold and <strong id="ALM-12061__b1936942319210">Trigger Count</strong> based on the actual process usage.</span><p><p id="ALM-12061__p53741023132117">Specifically, choose <strong id="ALM-12061__b12369223182120">O&amp;M </strong>&gt; <strong id="ALM-12061__b7369182362114">Alarm</strong> &gt; <strong id="ALM-12061__b183696238213">Thresholds</strong> &gt;<em id="ALM-12061__i2811143010409"> Name of the desired cluster</em> &gt; <strong id="ALM-12061__b1736902316215">Host</strong> &gt; <strong id="ALM-12061__b1369122314213">Process</strong> &gt; <strong id="ALM-12061__b1736992318217">omm Process Usage</strong> to change Trigger Count.</p>
<div class="note" id="ALM-12061__note1837419235216"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12061__p6374102312216">The alarm is generated when the process usage exceeds the threshold for the number of times specified by <strong id="ALM-12061__b1237411237214">Trigger Count</strong>.</p>
</div></div>
<p id="ALM-12061__p1737417236213">Set the alarm threshold based on the actual process usage. To check the process usage, choose <strong id="ALM-12061__b4374172315215">O&amp;M</strong> &gt; <strong id="ALM-12061__b11374192352114">Alarm</strong> &gt; <strong id="ALM-12061__b183741423162110">Thresholds</strong> &gt; <em id="ALM-12061__i18450436164420">Name of the desired cluster</em> &gt; <strong id="ALM-12061__b2374102311219">Host</strong> &gt; <strong id="ALM-12061__b51371152474">Process</strong> &gt; <strong id="ALM-12061__b1693614974714">omm Process Usage</strong>, as shown in <a href="#ALM-12061__fig437414238216">Figure 1</a>.</p>
<div class="fignone" id="ALM-12061__fig437414238216"><a name="ALM-12061__fig437414238216"></a><a name="fig437414238216"></a><span class="figcap"><b>Figure 1 </b>Setting an alarm threshold</span><br><span><img id="ALM-12061__image1615410501365" src="en-us_image_0000001440858217.png"></span></div>
</p></li><li id="ALM-12061__li33745237217"><span>Two minutes later, check whether the alarm is cleared.</span><p><ul id="ALM-12061__ul1437412317219"><li id="ALM-12061__li2374182312217">If it is, no further action is required.</li><li id="ALM-12061__li2374112315211">If it is not, go to <a href="#ALM-12061__li936717234216">3</a>.</li></ul>
</p></li></ol>
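<p>To reason about how the threshold and <strong>Trigger Count</strong> interact, the following sketch may help. It assumes that the checks must exceed the threshold consecutively and that usage values are whole percentages. Both points are assumptions made for illustration, and <strong>should_raise</strong> is a hypothetical helper, not an MRS command.</p>

```shell
# should_raise THRESHOLD TRIGGER_COUNT SAMPLE...
# Succeeds when the most recent TRIGGER_COUNT samples all exceed THRESHOLD.
# Assumption: "exceeds the threshold for N times" means N checks in a row.
should_raise() {
    threshold=$1
    trigger_count=$2
    shift 2
    streak=0
    for sample in "$@"; do
        if [ "$sample" -gt "$threshold" ]; then
            streak=$((streak + 1))   # one more check above the threshold
        else
            streak=0                 # a normal check resets the streak
        fi
    done
    [ "$streak" -ge "$trigger_count" ]
}

# Threshold 90%, Trigger Count 3, four successive usage samples.
if should_raise 90 3 80 91 92 95; then
    echo "raise alarm"
else
    echo "no alarm"
fi
```

<p>Under these assumptions, a larger <strong>Trigger Count</strong> makes the alarm less sensitive to short spikes, because a single check at or below the threshold resets the count.</p>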
<p id="ALM-12061__p630219198214"><strong id="ALM-12061__b6695451191916">Check whether the maximum number of processes (including threads) opened by user omm is appropriate.</strong></p>
<ol start="3" id="ALM-12061__ol13367112317219"><li id="ALM-12061__li936717234216"><a name="ALM-12061__li936717234216"></a><a name="li936717234216"></a><span>In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and view the IP address of the host for which the alarm is generated.</span></li><li id="ALM-12061__li1136752311217"><span>Log in to the host where the alarm is generated as user <strong id="ALM-12061__b1136717231212">root</strong>. <span id="ALM-12061__text985593916354"></span></span></li><li id="ALM-12061__li15367523112112"><span>Run the <strong id="ALM-12061__b5367122302115">su - omm</strong> command to switch to user <strong id="ALM-12061__b193671623132111">omm</strong>.</span></li><li id="ALM-12061__li8367112332111"><span>Run the <strong id="ALM-12061__b14367122392112">ulimit -u</strong> command to obtain the maximum number of threads that can be concurrently opened by user <strong id="ALM-12061__b1236732392116">omm</strong> and check whether the number is greater than or equal to 60000.</span><p><ul id="ALM-12061__ul136710230215"><li id="ALM-12061__li13367423122115">If it is, go to <a href="#ALM-12061__li293443912213">8</a>.</li><li id="ALM-12061__li2367102320214">If it is not, go to <a href="#ALM-12061__li8367152314217">7</a>.</li></ul>
</p></li><li id="ALM-12061__li8367152314217"><a name="ALM-12061__li8367152314217"></a><a name="li8367152314217"></a><span>Run the <strong id="ALM-12061__b53671823112118">ulimit -u 60000</strong> command to change the maximum number to 60000. Two minutes later, check whether the alarm is cleared.</span><p><ul id="ALM-12061__ul19367423152119"><li id="ALM-12061__li93671123122113">If it is, no further action is required.</li><li id="ALM-12061__li836702332116">If it is not, go to <a href="#ALM-12061__li1668345092117">12</a>.</li></ul>
</p></li></ol>
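<p>The limit check in the preceding steps can be expressed as a short sketch. The helper name <strong>limit_is_sufficient</strong> is hypothetical, and treating <strong>unlimited</strong> as sufficient is an assumption. The value 60000 comes from the procedure above.</p>

```shell
# limit_is_sufficient LIMIT [MINIMUM]
# Succeeds when LIMIT (as printed by `ulimit -u`) is at least MINIMUM.
# MINIMUM defaults to 60000, the value used in this procedure.
limit_is_sufficient() {
    limit=$1
    minimum=${2:-60000}
    case "$limit" in
        unlimited) return 0 ;;   # assumption: no cap counts as sufficient
    esac
    [ "$limit" -ge "$minimum" ]
}

# Demonstration with fixed values instead of a live `ulimit -u` call.
for value in 4096 60000 unlimited; do
    if limit_is_sufficient "$value"; then
        echo "$value: sufficient, check the thread count next"
    else
        echo "$value: below 60000, consider raising the limit"
    fi
done
```

<p>On a real host, the argument would be the output of <strong>ulimit -u</strong> run as user <strong>omm</strong>. Note that a limit changed with <strong>ulimit</strong> applies to the current shell and to processes started from it.</p>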
<p id="ALM-12061__p7839436162117"><strong id="ALM-12061__b1836742382111">Check whether an excessive number of processes are opened at the same time.</strong></p>
<ol start="8" id="ALM-12061__ol1093673902112"><li id="ALM-12061__li293443912213"><a name="ALM-12061__li293443912213"></a><a name="li293443912213"></a><span>In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and view the IP address of the host for which the alarm is generated.</span></li><li id="ALM-12061__li3934143952119"><span>Log in to the host where the alarm is generated as user <strong id="ALM-12061__b209341539202116">root</strong>.</span></li><li id="ALM-12061__li893473922118"><span>Run the <strong id="ALM-12061__b199341039112112">ps -o nlwp,pid,lwp,args -u omm | sort -n</strong> command to check the number of threads used by each process of user omm. The result is sorted by thread count in ascending order, so the processes with the most threads appear last. Analyze the five processes with the most threads and check whether the threads are incorrectly used. If they are, contact maintenance personnel to rectify the fault. If they are not, run the <strong id="ALM-12061__b209343391212">ulimit -u</strong> command to set the maximum number to a value greater than 60000.</span></li><li id="ALM-12061__li119349396211"><span>Five minutes later, check whether the alarm is cleared.</span><p><ul id="ALM-12061__ul11934203918217"><li id="ALM-12061__li29341139172111">If it is, no further action is required.</li><li id="ALM-12061__li10934539102120">If it is not, go to <a href="#ALM-12061__li1668345092117">12</a>.</li></ul>
</p></li></ol>
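<p>As a variation on the thread check above, the following sketch lists the largest thread consumers first. The helper <strong>top_thread_consumers</strong> is a hypothetical name, and the sample lines stand in for real <strong>ps</strong> output so that the example does not depend on a running cluster.</p>

```shell
# top_thread_consumers N
# Reads "NLWP PID ARGS" lines on stdin and prints the N lines with the
# highest thread counts, largest first.
top_thread_consumers() {
    sort -k1,1nr | head -n "$1"
}

# Sample data standing in for the output of:
#   ps -o nlwp= -o pid= -o args= -u omm
# (the trailing "=" on each column suppresses the header line)
printf '%s\n' \
    '12 101 java -jar a.jar' \
    '250 102 java -jar b.jar' \
    '3 103 sh run.sh' \
    | top_thread_consumers 2
```

<p>With the sample data, the process with 250 threads is printed first, followed by the one with 12 threads. On a real host, the <strong>ps</strong> command shown in the comment would be piped into <strong>top_thread_consumers 5</strong>.</p>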
<p id="ALM-12061__p56917471218"><strong id="ALM-12061__b1493463982113">Collect fault information.</strong></p>
<ol start="12" id="ALM-12061__ol18685115014216"><li id="ALM-12061__li1668345092117"><a name="ALM-12061__li1668345092117"></a><a name="li1668345092117"></a><span>On the FusionInsight Manager home page of the active cluster, choose <strong id="ALM-12061__b968317505217">O&amp;M </strong>&gt; <strong id="ALM-12061__b156836505210">Log</strong> &gt; <strong id="ALM-12061__b7683135018213">Download</strong>.</span></li><li id="ALM-12061__li868355022113"><span>Select <strong id="ALM-12061__b6683950172114">OmmServer</strong> and <strong id="ALM-12061__b468318504214">NodeAgent</strong> for <strong id="ALM-12061__b33411729132615">Service</strong> and click <strong id="ALM-12061__b3991118545">OK</strong>.</span></li><li id="ALM-12061__li8685135062120"><span>Click <span><img id="ALM-12061__image12683135092120" src="en-us_image_0269383906.png"></span> in the upper right corner. In the displayed dialog box, set <strong id="ALM-12061__b136837501219">Start Date</strong> and <strong id="ALM-12061__b86832508216">End Date</strong> to 10 minutes before and 10 minutes after the alarm generation time, respectively, and click <strong id="ALM-12061__b1168545014219">OK</strong>. Then, click <strong id="ALM-12061__b13685125042113">Download</strong>.</span></li><li id="ALM-12061__li495644512588"><span>Contact the <span id="ALM-12061__text4614151421417">O&amp;M personnel</span> and send the collected log information.</span></li></ol>
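<p>The start and end of the log collection window can be worked out with a short sketch such as the following. It assumes a <strong>date</strong> command that accepts <strong>-d</strong> (as in GNU coreutils) and treats the alarm time as UTC for simplicity. The helper name <strong>log_window</strong> and the sample timestamp are hypothetical.</p>

```shell
# log_window "YYYY-MM-DD HH:MM:SS" [MINUTES]
# Prints the start and the end of the window around the alarm time,
# one per line. MINUTES defaults to 10, as in the procedure above.
# Assumption: timestamps are handled as UTC.
log_window() {
    minutes=${2:-10}
    alarm_epoch=$(date -u -d "$1" +%s)   # alarm time as seconds since epoch
    offset=$((minutes * 60))
    date -u -d "@$((alarm_epoch - offset))" '+%Y-%m-%d %H:%M:%S'
    date -u -d "@$((alarm_epoch + offset))" '+%Y-%m-%d %H:%M:%S'
}

# Hypothetical alarm generation time just after midnight.
log_window "2024-01-01 00:05:00"
```

<p>For the sample time, the first line gives the value for <strong>Start Date</strong> (2023-12-31 23:55:00) and the second line gives the value for <strong>End Date</strong> (2024-01-01 00:15:00).</p>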
</div>
<div class="section" id="ALM-12061__section10584175161919"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-12061__p6698105111191">This alarm will be automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-12061__section8584185131911"><h4 class="sectiontitle">Related Information</h4><p id="ALM-12061__p11698651141916">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>