Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-27006"></a>
<h1 class="topictitle1">ALM-27006 Disk Space Usage of the Data Directory Exceeds the Threshold</h1>
<div id="body1593485868636"><div class="section" id="ALM-27006__section132254217017"><h4 class="sectiontitle">Description</h4><p id="ALM-27006__p852913428014">The system checks the disk space usage of the data directory on the active DBServer node every 30 seconds and compares it with the threshold. The alarm is generated when the disk space usage exceeds the threshold for five consecutive checks (the default value). Both the number of consecutive checks and the threshold, which is 80% by default, are configurable.</p>
<p id="ALM-27006__p135297421808">The value of <strong id="ALM-27006__b17529104214018">hit number</strong> is configurable and determines when the alarm is cleared. If it is set to <strong id="ALM-27006__b352934217017">1</strong>, the alarm is cleared when the disk space usage is lower than or equal to the threshold. If it is greater than 1, the alarm is cleared when the disk space usage is lower than 90% of the threshold.</p>
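<p>The trigger and clearing rules described above can be sketched as a small state machine. This is only an illustration, not the product's implementation: the class name <strong>DiskAlarm</strong> and its fields are hypothetical, the sketch assumes that <strong>hit number</strong> is the configured number of consecutive checks, that a single sample at or below the threshold resets the consecutive count, and that one qualifying sample is enough to clear the alarm.</p>

```python
from dataclasses import dataclass


@dataclass
class DiskAlarm:
    """Hypothetical model of the ALM-27006 trigger and clearing rules."""

    threshold: float = 80.0  # percent; default value stated in the text
    hit_number: int = 5      # consecutive checks; default value stated in the text
    hits: int = 0            # current run of samples above the threshold
    active: bool = False

    def observe(self, usage: float) -> bool:
        """Feed one periodic sample (percent) and return whether the alarm is active."""
        if usage > self.threshold:
            self.hits += 1
            if self.hits >= self.hit_number:
                self.active = True
        else:
            # Assumption: any sample at or below the threshold breaks the run.
            self.hits = 0
            if self.hit_number == 1:
                # usage <= threshold is already known in this branch.
                self.active = False
            elif usage < 0.9 * self.threshold:
                self.active = False
        return self.active
```

<p>With the default values, the alarm in this sketch is raised on the fifth consecutive sample above 80% and stays active until a sample falls below 72% (90% of the threshold).</p>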
</div>
<div class="section" id="ALM-27006__section732315421101"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-27006__table183241429014" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-27006__row19529174211018"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-27006__p452910421907">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-27006__p25291642603">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-27006__p25294421016">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-27006__row05292421502"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-27006__p252919421203">27006</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-27006__p205291427014">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-27006__p2052944216010">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-27006__section73337421809"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-27006__table10333114218013" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-27006__row15530842904"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-27006__p1453018426016">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-27006__p175308424015">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-27006__row18530134219015"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-27006__p125309421502">ClusterName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-27006__p95300427015">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-27006__row5530114215012"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-27006__p553034214019">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-27006__p75304428020">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-27006__row55306421501"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-27006__p853017422011">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-27006__p6530164210012">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-27006__row1353020428012"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-27006__p25301942104">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-27006__p205301421500">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-27006__row17530194219018"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-27006__p25304427012">PartitionName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-27006__p0530184212010">Specifies the disk partition for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-27006__row18530204210013"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-27006__p75302426019">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-27006__p115303421301">Specifies the threshold triggering the alarm. If the actual indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-27006__section1434020421009"><h4 class="sectiontitle">Impact on the System</h4><ul id="ALM-27006__ul453016425019"><li id="ALM-27006__li95301426012">Service processes become unavailable.</li><li id="ALM-27006__li1553010424011">When the disk space usage of the data directory exceeds 90%, the database reports the "Database Enters the Read-Only Mode" alarm and enters the read-only mode, which may cause service data loss.</li></ul>
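<p>To see how close a directory's file system is to the two limits mentioned in this topic, a rough check along the following lines may help. It is a sketch only: the function names are hypothetical, the 80% and 90% values are the ones stated in this topic, and the percentage is computed as used/total from <strong>shutil.disk_usage</strong>, which may differ slightly from the value shown on FusionInsight Manager or by other tools.</p>

```python
import shutil


def usage_percent(path: str) -> float:
    """Return the used share (percent) of the file system that holds path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def classify(usage: float,
             alarm_threshold: float = 80.0,
             read_only_threshold: float = 90.0) -> str:
    """Map a usage percentage to the condition described in this topic."""
    if usage > read_only_threshold:
        return "read-only risk"   # database may enter the read-only mode
    if usage > alarm_threshold:
        return "alarm"            # ALM-27006 range
    return "ok"
```

<p>For example, <strong>classify(usage_percent(path))</strong> could be called with the data directory as <strong>path</strong>; the path itself is not assumed here.</p>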
</div>
<div class="section" id="ALM-27006__section1234319420014"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-27006__ul65314421702"><li id="ALM-27006__li155311242506">The alarm threshold is improperly configured.</li><li id="ALM-27006__li10531194215012">The data volume of the database is too large or the disk configuration cannot meet service requirements, causing excessive disk usage.</li></ul>
</div>
<div class="section" id="ALM-27006__section63441742407"><h4 class="sectiontitle">Procedure</h4><p id="ALM-27006__p14531242202"><strong id="ALM-27006__b7531194213015">Check whether the threshold is set properly.</strong></p>
<ol id="ALM-27006__ol15316424011"><li id="ALM-27006__li13531442704"><span>On FusionInsight Manager, choose <strong id="ALM-27006__b195311242605">O&amp;M</strong> &gt; <strong id="ALM-27006__b053117427017">Alarm</strong> &gt; <strong id="ALM-27006__b953112429012">Thresholds</strong> &gt; <em id="ALM-27006__i155311442401">Name of the desired cluster</em> &gt; <strong id="ALM-27006__b1353116421309">DBService</strong> &gt; <strong id="ALM-27006__b135311042108">Database</strong> &gt; <strong id="ALM-27006__b353154214010">Disk Space Usage of the Data Directory</strong> and check whether the alarm threshold is appropriate (the default value, 80%, is appropriate).</span><p><ul id="ALM-27006__ul653144213016"><li id="ALM-27006__li7531154214017">If yes, go to <a href="#ALM-27006__li165316427014">3</a>.</li><li id="ALM-27006__li5531842402">If no, go to <a href="#ALM-27006__li165311142601">2</a>.</li></ul>
</p></li><li id="ALM-27006__li165311142601"><a name="ALM-27006__li165311142601"></a><a name="li165311142601"></a><span>Change the alarm threshold based on actual service requirements.</span></li><li id="ALM-27006__li165316427014"><a name="ALM-27006__li165316427014"></a><a name="li165316427014"></a><span>Choose <strong id="ALM-27006__b19531194212015">Cluster</strong> &gt; <em id="ALM-27006__i1953114216013">Name of the desired cluster</em> &gt; <strong id="ALM-27006__b125312042308">Services</strong> &gt; <strong id="ALM-27006__b85314421901">DBService</strong>. On the <strong id="ALM-27006__b9531942206">Dashboard</strong> page, view the <strong id="ALM-27006__b6531144219011">Disk Space Usage of the Data Directory</strong> chart and check whether the disk space usage of the data directory is lower than the threshold.</span><p><ul id="ALM-27006__ul35319427013"><li id="ALM-27006__li8531242101">If yes, go to <a href="#ALM-27006__li1553118426012">4</a>.</li><li id="ALM-27006__li1531042209">If no, go to <a href="#ALM-27006__li1453211421204">5</a>.</li></ul>
</p></li><li id="ALM-27006__li1553118426012"><a name="ALM-27006__li1553118426012"></a><a name="li1553118426012"></a><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-27006__ul10531154213015"><li id="ALM-27006__li125311421900">If yes, no further action is required.</li><li id="ALM-27006__li4532142804">If no, go to <a href="#ALM-27006__li1453211421204">5</a>.</li></ul>
<p id="ALM-27006__p1653214421001"><strong id="ALM-27006__b13532542203">Check whether large files are incorrectly written into the disk.</strong></p>
</p></li><li id="ALM-27006__li1453211421204"><a name="ALM-27006__li1453211421204"></a><a name="li1453211421204"></a><span>Log in to the active DBService node as user <strong id="ALM-27006__b15321424016">omm</strong>.</span></li><li id="ALM-27006__li653284218015"><span>Run the following commands to list the files larger than 500 MB in the data directory and check whether any large files were written there by mistake:</span><p><p id="ALM-27006__p1453284211011"><strong id="ALM-27006__b553211428019">source $DBSERVER_HOME/.dbservice_profile</strong></p>
<p id="ALM-27006__p353211421010"><strong id="ALM-27006__b1532134219011">find "$DBSERVICE_DATA_DIR"/../ -type f -size +500M</strong></p>
<ul id="ALM-27006__ul753218420013"><li id="ALM-27006__li753217421011">If yes, go to <a href="#ALM-27006__li1453214421204">7</a>.</li><li id="ALM-27006__li1553217424013">If no, go to <a href="#ALM-27006__li1853216425019">8</a>.</li></ul>
</p></li><li id="ALM-27006__li1453214421204"><a name="ALM-27006__li1453214421204"></a><a name="li1453214421204"></a><span>Handle the large files based on the actual scenario and check whether the alarm is cleared 2 minutes later.</span><p><ul id="ALM-27006__ul15532194219014"><li id="ALM-27006__li6532194216019">If yes, no further action is required.</li><li id="ALM-27006__li205327429011">If no, go to <a href="#ALM-27006__li1853216425019">8</a>.</li></ul>
<p id="ALM-27006__p205329427018"><strong id="ALM-27006__b7532114213018">Collect fault information.</strong></p>
</p></li><li id="ALM-27006__li1853216425019"><a name="ALM-27006__li1853216425019"></a><a name="li1853216425019"></a><span>On FusionInsight Manager, choose <strong id="ALM-27006__b55325429016">O&amp;M</strong> &gt; <strong id="ALM-27006__b153212421507">Log</strong> &gt; <strong id="ALM-27006__b14532542705">Download</strong>.</span></li><li id="ALM-27006__li18532154217017"><span>Expand the <strong id="ALM-27006__b9532114212013">Service</strong> drop-down list, and select <strong id="ALM-27006__b95325428010">DBService</strong> for the target cluster.</span></li><li id="ALM-27006__li553219421903"><span>(Optional) Set the <strong id="ALM-27006__b2532184219018">Host</strong> parameter to specify the hosts from which logs are collected. By default, all hosts are selected.</span></li><li id="ALM-27006__li10532144218017"><span>Click <span><img id="ALM-27006__image1353216429015" src="en-us_image_0269623978.png"></span> in the upper right corner, and set <strong id="ALM-27006__b15532144218019">Start Date</strong> and <strong id="ALM-27006__b14532542805">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-27006__b15532164216013">Download</strong>.</span></li><li id="ALM-27006__li75321425016"><span>Contact the <span id="ALM-27006__text4614151421417">O&amp;M personnel</span> and send the collected logs.</span></li></ol>
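<p>Where a script is preferred over the <strong>find</strong> command used in the large-file check of this procedure, the same search could be approximated as follows. This is a minimal sketch, not a tool shipped with the product: the function name <strong>find_large_files</strong> is hypothetical, and the 500 MB limit is interpreted as 500 × 1024 × 1024 bytes to match <strong>-size +500M</strong>. Only regular files are reported and symbolic links are not followed, mirroring <strong>-type f</strong>.</p>

```python
import os
import stat
from typing import Iterator, Tuple

DEFAULT_LIMIT = 500 * 1024 * 1024  # 500 MiB, as in "find -size +500M"


def find_large_files(root: str,
                     min_bytes: int = DEFAULT_LIMIT) -> Iterator[Tuple[str, int]]:
    """Yield (path, size_in_bytes) for regular files under root larger than min_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_size > min_bytes:
                yield path, info.st_size
```

<p>Assuming the profile has been sourced and <strong>DBSERVICE_DATA_DIR</strong> is exported to the Python process, the search root would be <strong>os.path.join(os.environ["DBSERVICE_DATA_DIR"], "..")</strong>, the same location the <strong>find</strong> command inspects.</p>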
</div>
<div class="section" id="ALM-27006__section14354104218015"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-27006__p35320421506">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-27006__section147771053193515"><h4 class="sectiontitle">Related Information</h4><p id="ALM-27006__p198871353113510">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>