doc-exports/docs/mrs/umn/ALM-12085.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-12085"></a>
<h1 class="topictitle1">ALM-12085 Service Audit Log Dump Failure</h1>
<div id="body1547193420659"><div class="section" id="ALM-12085__section1090412854716"><h4 class="sectiontitle">Description</h4><p id="ALM-12085__p7337263451">The system dumps service audit logs at 03:00 every day and stores them on the OMS node. This alarm is generated when the dump fails. This alarm is cleared when the next dump succeeds.</p>
</div>
<div class="section" id="ALM-12085__section12861182513454"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12085__table178641425154516" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12085__row10336267457"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12085__p3334269455">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12085__p43322674516">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12085__p17338264456">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12085__row133626104520"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12085__p233182612451">12085</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12085__p10332026194516">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12085__p1133162617456">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12085__section108741925164518"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12085__table15875325104520" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12085__row335326114513"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-12085__p6351926154513">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-12085__p5351026164516">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12085__row186515557386"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12085__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12085__p692551319435">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12085__row8351126104516"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12085__p17356264454">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12085__p1935726114512">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12085__row035182617453"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12085__p13515269457">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12085__p2351626144519">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12085__row23562618451"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12085__p153512611456">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12085__p1935102634516">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12085__section387913251457"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-12085__p18337113415318">The service audit logs may be lost.</p>
</div>
<div class="section" id="ALM-12085__section2884132516453"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-12085__ul1935172644512"><li id="ALM-12085__li435172615451">The service audit logs are oversized.</li><li id="ALM-12085__li63562644518">The OMS backup storage space is insufficient.</li><li id="ALM-12085__li1435826184514">The storage space of a host where the service is located is insufficient.</li></ul>
</div>
<div class="section" id="ALM-12085__section38891252458"><h4 class="sectiontitle">Procedure</h4><p id="ALM-12085__p1635182614452"><strong id="ALM-12085__b18355261456">Check whether the service audit logs are oversized.</strong></p>
<ol id="ALM-12085__ol14251115516515"><li id="ALM-12085__li18250105510514"><span>In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and view the IP address of the host for which the alarm is generated and the additional information of the alarm.</span></li><li id="ALM-12085__li202502556513"><span>Log in to the host where the alarm is generated as user <strong id="ALM-12085__b16250355155117">root</strong>. <span id="ALM-12085__text985593916354"></span></span></li><li id="ALM-12085__li825145510516"><span>Run the <strong id="ALM-12085__b1925020558514">vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log</strong> command and check whether the file contains the keyword "LOG SIZE is more than 5000MB".</span><p><ul id="ALM-12085__ul9251155175119"><li id="ALM-12085__li62515551518">If it does, go to <a href="#ALM-12085__li1525114552513">4</a>.</li><li id="ALM-12085__li1125145519515">If it does not, go to <a href="#ALM-12085__li17248145525118">5</a>.</li></ul>
</p></li><li id="ALM-12085__li1525114552513"><a name="ALM-12085__li1525114552513"></a><a name="li1525114552513"></a><span>Check whether the oversized service audit logs are caused by exceptions.</span></li></ol>
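<p>The keyword check in step 3 can also be run non-interactively instead of opening the file in vi. The following is a minimal sketch, not a tool shipped with FusionInsight Manager: the function name <strong>check_oversized_audit_log</strong> is hypothetical, and the commented usage assumes that <strong>BIGDATA_LOG_HOME</strong> is set in the root shell of the alarm host.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: a non-interactive version of the step 3 check.
# check_oversized_audit_log FILE
# Exit status: 0 if FILE contains the keyword logged when the service audit
# logs exceed 5000 MB, 1 if it does not, 2 if FILE cannot be read.
check_oversized_audit_log() {
  local log_file="$1"
  [ -r "$log_file" ] || return 2
  # -F: match the keyword literally; -q: print nothing, report via exit status.
  grep -qF "LOG SIZE is more than 5000MB" "$log_file"
}

# Usage on the alarm host (assumes BIGDATA_LOG_HOME is set for root):
# log="${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log"
# check_oversized_audit_log "$log"
# case $? in
#   0) echo "Keyword found: go to step 4." ;;
#   1) echo "Keyword not found: go to step 5." ;;
#   *) echo "Cannot read $log" ;;
# esac
```

<p>Using <strong>grep -F</strong> avoids any interpretation of the keyword as a regular expression, and the separate exit status 2 keeps an unreadable log from being mistaken for "keyword not found".</p>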
<p id="ALM-12085__p165332487519"><strong id="ALM-12085__b637152618459">Check whether the OMS backup storage space is insufficient.</strong></p>
<ol start="5" id="ALM-12085__ol12250855195118"><li id="ALM-12085__li17248145525118"><a name="ALM-12085__li17248145525118"></a><a name="li17248145525118"></a><span>Run the <strong id="ALM-12085__b1424775565116">vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log</strong> command and check whether the file contains the keyword "Collect log failed, too many logs on".</span><p><ul id="ALM-12085__ul12248195555118"><li id="ALM-12085__li92471755145117">If it does, obtain the host IP address that follows the keyword "Collect log failed, too many logs on" and go to <a href="#ALM-12085__li1324811555511">6</a>.</li><li id="ALM-12085__li15248135585116">If it does not, go to <a href="#ALM-12085__li274114665213">11</a>.</li></ul>
</p></li><li id="ALM-12085__li1324811555511"><a name="ALM-12085__li1324811555511"></a><a name="li1324811555511"></a><span>Log in as user <strong id="ALM-12085__b524820550519">root</strong> to the host whose IP address was obtained in <a href="#ALM-12085__li17248145525118">5</a>.</span></li><li id="ALM-12085__li1024875511517"><span>Run the <strong id="ALM-12085__b1324819555516">vi ${BIGDATA_LOG_HOME}/nodeagent/scriptlog/collectLog.log</strong> command and check whether the file contains the keyword "log size exceeds".</span><p><ul id="ALM-12085__ul1924820556516"><li id="ALM-12085__li16248185512518">If it does, go to <a href="#ALM-12085__li1411119282589">9</a>.</li><li id="ALM-12085__li82481755185111">If it does not, go to <a href="#ALM-12085__li1532033151617">8</a>.</li></ul>
</p></li><li class="subitemlist" id="ALM-12085__li1532033151617"><a name="ALM-12085__li1532033151617"></a><a name="li1532033151617"></a><span>Check whether the additional information of the alarm contains the keyword "no enough space".</span><p><ul id="ALM-12085__ul1731818543217"><li id="ALM-12085__li11318165472110">If yes, go to <a href="#ALM-12085__li1411119282589">9</a>.</li><li id="ALM-12085__li31789204224">If no, go to <a href="#ALM-12085__li274114665213">11</a>.</li></ul>
</p></li><li id="ALM-12085__li1411119282589"><a name="ALM-12085__li1411119282589"></a><a name="li1411119282589"></a><span>Perform either of the following operations to expand the disk capacity or reduce the maximum number of audit log backups:</span><p><ul id="ALM-12085__ul88871628185918"><li id="ALM-12085__li1424835520516">Expand the capacity of the OMS node<em id="ALM-12085__i1370982115464">.</em></li><li id="ALM-12085__li7888182814599">Run the following command to open the file and decrease the value of <strong id="ALM-12085__b10816349897">MAX_NUM_BK_AUDITLOG</strong>.<p id="ALM-12085__p73231585014"><strong id="ALM-12085__b174148111704">vi ${CONTROLLER_HOME}/etc/om/componentsauditlog.properties</strong></p>
</li></ul>
</p></li><li id="ALM-12085__li162499550512"><span>After the next scheduled dump at 03:00, check whether the alarm is cleared.</span><p><ul id="ALM-12085__ul14248955155118"><li id="ALM-12085__li18248655105117">If it is, no further action is required.</li><li id="ALM-12085__li10248155535113">If it is not, go to <a href="#ALM-12085__li274114665213">11</a>.</li></ul>
</p></li></ol>
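<p>The file checks in steps 5, 7, and 9 can be sketched as read-only shell functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, the IP address is assumed to follow the keyword "Collect log failed, too many logs on" directly after whitespace, and <strong>componentsauditlog.properties</strong> is assumed to store <strong>MAX_NUM_BK_AUDITLOG</strong> as a plain key=value line. The functions only read files; decreasing the value is still done in vi as described in step 9.</p>

```shell
#!/usr/bin/env bash
# Hypothetical read-only helpers for steps 5, 7, and 9.

# hosts_with_too_many_logs FILE  (step 5)
# Prints each unique IPv4 address that follows the keyword
# "Collect log failed, too many logs on" in FILE.
# Assumption: the address comes right after the keyword, separated by whitespace.
hosts_with_too_many_logs() {
  grep -oE 'Collect log failed, too many logs on[[:space:]]+[0-9]{1,3}(\.[0-9]{1,3}){3}' "$1" \
    | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}$' \
    | sort -u
}

# log_size_exceeded FILE  (step 7)
# Exit status 0 if FILE contains the keyword "log size exceeds".
log_size_exceeded() {
  grep -qF "log size exceeds" "$1"
}

# current_max_audit_backups FILE  (step 9)
# Prints the current value of MAX_NUM_BK_AUDITLOG.
# Assumption: the property is stored on one line as MAX_NUM_BK_AUDITLOG=<value>.
current_max_audit_backups() {
  sed -n 's/^[[:space:]]*MAX_NUM_BK_AUDITLOG[[:space:]]*=[[:space:]]*//p' "$1"
}

# Usage (assumes BIGDATA_LOG_HOME and CONTROLLER_HOME are set for root):
#   hosts_with_too_many_logs "${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log"
#   current_max_audit_backups "${CONTROLLER_HOME}/etc/om/componentsauditlog.properties"
# On each host printed by the first command:
#   log_size_exceeded "${BIGDATA_LOG_HOME}/nodeagent/scriptlog/collectLog.log" && echo "Go to step 9."
```

<p>The two-stage <strong>grep -o</strong> first isolates the keyword together with the address and then keeps only the trailing address, so other IP addresses elsewhere in the log line are not reported.</p>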
<p id="ALM-12085__p7231123185212"><strong id="ALM-12085__b624915505113">Check whether the space of the host where the service is located is insufficient.</strong></p>
<ol start="11" id="ALM-12085__ol474716605220"><li id="ALM-12085__li274114665213"><a name="ALM-12085__li274114665213"></a><a name="li274114665213"></a><span>Run the <strong id="ALM-12085__b87391669521">vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log</strong> command and check whether the file contains the keyword "Collect log failed, no enough space on <em id="ALM-12085__i12739868522">hostIp</em>".</span><p><ul id="ALM-12085__ul1674115611521"><li id="ALM-12085__li974117610521">If it does, obtain the IP address of the abnormal host and go to <a href="#ALM-12085__li137411362525">12</a>.</li><li id="ALM-12085__li127412695212">If it does not, go to <a href="#ALM-12085__li1181415165216">15</a>.</li></ul>
</p></li><li id="ALM-12085__li137411362525"><a name="ALM-12085__li137411362525"></a><a name="li137411362525"></a><span>Log in as user <strong id="ALM-12085__b3741168528">root</strong> to the host whose IP address was obtained in <a href="#ALM-12085__li274114665213">11</a>, and run the <strong id="ALM-12085__b1774126175216">df "$BIGDATA_HOME/tmp" -lP | tail -1 | awk '{print ($4/1024)}'</strong> command to obtain the remaining space, in MB, of the host log directory. Check whether the value is less than 1000 MB.</span><p><ul id="ALM-12085__ul157411161523"><li id="ALM-12085__li1274126125218">If it is, go to <a href="#ALM-12085__li274186155216">13</a>.</li><li id="ALM-12085__li27415617522">If it is not, go to <a href="#ALM-12085__li1181415165216">15</a>.</li></ul>
</p></li><li id="ALM-12085__li274186155216"><a name="ALM-12085__li274186155216"></a><a name="li274186155216"></a><span>Expand the capacity of the node.</span></li><li id="ALM-12085__li37461665219"><span>After the next scheduled dump at 03:00, check whether the alarm is cleared.</span><p><ul id="ALM-12085__ul27413610528"><li id="ALM-12085__li074196105220">If it is, no further action is required.</li><li id="ALM-12085__li3741126145211">If it is not, go to <a href="#ALM-12085__li1181415165216">15</a>.</li></ul>
</p></li></ol>
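<p>The free-space check in step 12 can be wrapped in a function that compares the result with the 1000 MB threshold. This is a minimal sketch: the function names are hypothetical, and it reuses the df/tail/awk pipeline from step 12 (<strong>df -P</strong> reports sizes in 1024-byte blocks, so dividing the Available column by 1024 gives MB). Unlike the command in step 12, it truncates the value to a whole number so that the shell can compare it as an integer.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 12.

# free_space_mb DIR
# Prints the available space, in whole MB, of the local file system holding DIR.
# df -lP: local file systems only, POSIX output (one line per file system,
# sizes in 1024-byte blocks); column 4 is the available space.
free_space_mb() {
  [ -d "$1" ] || return 1
  df -lP "$1" | tail -n 1 | awk '{print int($4 / 1024)}'
}

# is_space_low DIR [THRESHOLD_MB]
# Exit status: 0 if DIR has less than THRESHOLD_MB (default 1000) free,
# 1 if it has at least that much, 2 if DIR does not exist.
is_space_low() {
  local dir="$1" threshold="${2:-1000}" free
  free=$(free_space_mb "$dir") || return 2
  [ "$free" -lt "$threshold" ]
}

# Usage on the host obtained in step 11 (assumes BIGDATA_HOME is set for root):
# is_space_low "$BIGDATA_HOME/tmp"
# case $? in
#   0) echo "Less than 1000 MB free: go to step 13." ;;
#   1) echo "At least 1000 MB free: go to step 15." ;;
#   *) echo "Directory not found." ;;
# esac
```

<p>Declaring <strong>free</strong> separately from its assignment keeps the exit status of the command substitution, which is what lets a missing directory be reported as status 2 instead of a false "space is low".</p>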
<p id="ALM-12085__p188812125211"><strong id="ALM-12085__b97465619525">Collect fault information.</strong></p>
<ol start="15" id="ALM-12085__ol39315165214"><li id="ALM-12085__li1181415165216"><a name="ALM-12085__li1181415165216"></a><a name="li1181415165216"></a><span>On FusionInsight Manager, choose <strong id="ALM-12085__b784154521">O&amp;M</strong> &gt; <strong id="ALM-12085__b1189158527">Log</strong> &gt; <strong id="ALM-12085__b888159524">Download</strong>.</span></li><li id="ALM-12085__li148121511522"><span>Select <strong id="ALM-12085__b188181513525">Controller</strong> for <strong id="ALM-12085__b3881565219">Service</strong> and click <strong id="ALM-12085__b3991118545">OK</strong>.</span></li><li id="ALM-12085__li59815195214"><span>Click <span><img id="ALM-12085__image386151529" src="en-us_image_0269383932.png"></span> in the upper-right corner. In the displayed dialog box, set <strong id="ALM-12085__b58115165217">Start Date</strong> and <strong id="ALM-12085__b138515105216">End Date</strong> to 10 minutes before and 10 minutes after the alarm generation time, respectively, and click <strong id="ALM-12085__b13941575214">OK</strong>. Then, click <strong id="ALM-12085__b1999156522">Download</strong>.</span></li><li id="ALM-12085__li495644512588"><span>Contact the <span id="ALM-12085__text4614151421417">O&amp;M personnel</span> and send them the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-12085__section9927125174518"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-12085__p14381226174516">This alarm will be automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-12085__section692710251457"><h4 class="sectiontitle">Related Information</h4><p id="ALM-12085__p0384267453">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>