doc-exports/docs/mrs/umn/ALM-38008.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-38008"></a>
<h1 class="topictitle1">ALM-38008 Abnormal Kafka Data Directory Status</h1>
<div id="body1544512676719"><div class="section" id="ALM-38008__s3c5af36d89d44702bd46d9e007a3d832"><h4 class="sectiontitle">Description</h4><p id="ALM-38008__p1266010215274">The system checks the Kafka data directory status every 60 seconds. This alarm is generated when the system detects that the status of a data directory is abnormal.</p>
<p id="ALM-38008__p46602211272"><strong id="ALM-38008__b136604214272">Trigger Count</strong> is set to <strong id="ALM-38008__b9660102192710">1</strong>. This alarm is cleared when the data directory status becomes normal.</p>
</div>
<div class="section" id="ALM-38008__s6d1548c0ed8e453dad7e8bfa61016bbd"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-38008__en-us_topic_0070543591_table33687968" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-38008__en-us_topic_0070543591_row66116154"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-38008__en-us_topic_0070543591_p53808254">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-38008__en-us_topic_0070543591_p63501347">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-38008__en-us_topic_0070543591_p43335505">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-38008__en-us_topic_0070543591_row20514991"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-38008__en-us_topic_0070543591_p51101541">38008</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-38008__en-us_topic_0070543591_p45584156">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-38008__en-us_topic_0070543591_p1329147">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-38008__s6aeb3d426eed41b5b22f2df7ea8fff3b"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-38008__en-us_topic_0070543591_table40552107" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-38008__en-us_topic_0070543591_row50031493"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-38008__en-us_topic_0070543591_p26019105">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-38008__en-us_topic_0070543591_p27172786">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-38008__row593413955720"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38008__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38008__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-38008__en-us_topic_0070543591_row53512096"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38008__en-us_topic_0070543591_p39512530">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38008__p15775113422916">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-38008__en-us_topic_0070543591_row14932284"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38008__en-us_topic_0070543591_p1555529">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38008__p1754034613176">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-38008__row182205713223"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38008__p882857112210">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38008__p163571420182314">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-38008__row1054380102315"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38008__p1854310092319">DirName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38008__p89256204231">Specifies the directory for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-38008__en-us_topic_0070543591_row60239080"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38008__p2039082642918">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38008__p748015062916">Specifies the condition that triggers the alarm, that is, an abnormal Kafka data directory status.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-38008__s6246d25e195c44c89205c3294c587977"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-38008__p41212320303">If the status of a Kafka data directory is abnormal, the current replicas of all partitions in that directory are brought offline. If the data directory status is abnormal on multiple nodes at the same time, some partitions may become unavailable.</p>
</div>
<div class="section" id="ALM-38008__s073f4bd54c5f43498ef7607a06660555"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-38008__ul132011214173014"><li id="ALM-38008__li1512441723018">The permission on the data directory has been tampered with.</li><li id="ALM-38008__li12021914153018">The disk where the data directory is located is faulty.</li></ul>
</div>
<div class="section" id="ALM-38008__section1386954531616"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-38008__en-us_topic_0070543591_p55770280"><strong id="ALM-38008__b1475918276279">Check the permission on the faulty data directory.</strong></p>
<ol id="ALM-38008__ol46864976154410"><li id="ALM-38008__li2868309615440"><span>Obtain the host information from the alarm information and log in to the host.</span></li><li id="ALM-38008__li1654108315440"><a name="ALM-38008__li1654108315440"></a><a name="li1654108315440"></a><span>Check whether the data directory indicated in the alarm information and its subdirectories are owned by omm:wheel (user omm, group wheel).</span><p><ul class="subitemlist" id="ALM-38008__ul1394143651911"><li id="ALM-38008__li7957365192">If yes, record the host name of the node and go to <a href="#ALM-38008__li7931254192720">4</a>.</li><li id="ALM-38008__li297436101920">If no, go to <a href="#ALM-38008__li1465202115440">3</a>.</li></ul>
</p></li><li id="ALM-38008__li1465202115440"><a name="ALM-38008__li1465202115440"></a><a name="li1465202115440"></a><span>Restore the owner and group of the data directory and its subdirectories to omm:wheel.</span><p><ul id="ALM-38008__ul2034177203318"><li id="ALM-38008__li2034219719334">If yes, go to <a href="#ALM-38008__li1893217544278">6</a>.</li><li id="ALM-38008__li934237133319">If no, go to <a href="#ALM-38008__li4931105420275">5</a>.</li></ul>
</p></li></ol>
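<p>The ownership check in the steps above can be scripted. The following is a minimal sketch, not part of the product: the function names <code>owner_ids</code> and <code>find_wrong_ownership</code> are hypothetical, and it assumes a Linux host where the user omm and the group wheel exist. It walks the data directory and its subdirectories and lists every directory whose owner or group differs from the expected values.</p>

```python
import grp
import os
import pwd


def owner_ids(user="omm", group="wheel"):
    """Resolve the numeric UID and GID for the expected owner and group.

    Assumes the named user and group exist on the host (KeyError otherwise).
    """
    return pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid


def find_wrong_ownership(data_dir, uid, gid):
    """Return the directories under data_dir (inclusive) not owned by uid:gid."""
    mismatched = []
    # os.walk yields data_dir itself first, then every subdirectory.
    for root, _dirs, _files in os.walk(data_dir):
        st = os.stat(root)
        if st.st_uid != uid or st.st_gid != gid:
            mismatched.append(root)
    return mismatched
```

<p>An empty result from <code>find_wrong_ownership(data_dir, *owner_ids())</code> corresponds to the "yes" branch of step 2. If directories are listed, ownership is typically restored as user root with <code>chown -R omm:wheel</code> on the data directory.</p>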
<p id="ALM-38008__p1392511547271"><strong id="ALM-38008__b183810502813">Check whether the disk where the data directory is located is faulty.</strong></p>
<ol start="4" id="ALM-38008__ol5932145482712"><li id="ALM-38008__li7931254192720"><a name="ALM-38008__li7931254192720"></a><a name="li7931254192720"></a><span>In the upper-level directory of the data directory, create and delete files as user <strong id="ALM-38008__b31764919329">omm</strong>. Check whether data read/write on the disk is normal.</span></li><li id="ALM-38008__li4931105420275"><a name="ALM-38008__li4931105420275"></a><a name="li4931105420275"></a><span>Replace or repair the disk where the data directory is located to ensure that data read/write on the disk is normal.</span></li><li id="ALM-38008__li1893217544278"><a name="ALM-38008__li1893217544278"></a><a name="li1893217544278"></a><span>On the FusionInsight Manager home page, choose <strong id="ALM-38008__b3931854142710">Cluster</strong> &gt; <em id="ALM-38008__i1993115482710">Name of the desired cluster</em><strong id="ALM-38008__b12931115442713"> </strong>&gt; <strong id="ALM-38008__b1093175452710">Services</strong> &gt; <strong id="ALM-38008__b17931354142713">Kafka</strong> &gt; <strong id="ALM-38008__b1793105415279">Instance</strong>. On the Kafka instance page that is displayed, restart the Broker instance on the host recorded in <a href="#ALM-38008__li1654108315440">2</a>.</span></li><li id="ALM-38008__li0932954152718"><span>After Broker is started, check whether the alarm is cleared.</span><p><ul id="ALM-38008__ul17932135413279"><li id="ALM-38008__li14932135417272">If yes, no further action is required.</li><li id="ALM-38008__li3932454102720">If no, go to <a href="#ALM-38008__li783366415440">8</a>.</li></ul>
</p></li></ol>
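<p>The read/write check in step 4 can be sketched as a small probe. This is an illustration under stated assumptions rather than a product tool: the function name <code>probe_read_write</code> and the probe file name are hypothetical, and the script is assumed to be run as user omm on the affected host. It creates a file in the upper-level directory of the data directory, forces the data to disk, reads it back, and deletes it.</p>

```python
import os
import uuid


def probe_read_write(data_dir):
    """Create, read back, and delete a small file in the parent of data_dir.

    Returns True only if every step succeeds and the content matches.
    """
    parent = os.path.dirname(os.path.abspath(data_dir))
    probe = os.path.join(parent, ".rw_probe_" + uuid.uuid4().hex)
    payload = b"kafka-data-dir-probe"
    try:
        with open(probe, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the write through to the disk
        with open(probe, "rb") as fh:
            content_ok = fh.read() == payload
        os.remove(probe)  # deletion is part of the check
        return content_ok
    except OSError:
        # Best-effort cleanup if the probe file was left behind.
        try:
            os.remove(probe)
        except OSError:
            pass
        return False
```

<p>A return value of <code>False</code> suggests that read/write on the disk is not normal. The probe only exercises one small file, so a passing result does not rule out every disk fault.</p>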
<p class="tableheading" id="ALM-38008__p10382276154414"><strong id="ALM-38008__b55150482154417">Collect fault information.</strong></p>
<ol start="8" id="ALM-38008__ol33468126154421"><li id="ALM-38008__li783366415440"><a name="ALM-38008__li783366415440"></a><a name="li783366415440"></a><span>On FusionInsight Manager, choose <strong id="ALM-38008__b4551182792713">O&amp;M</strong> &gt; <strong id="ALM-38008__b1655392720277">Log</strong> &gt; <strong id="ALM-38008__b4560964915440">Download</strong>.</span></li><li id="ALM-38008__li5839163215440"><span>In the <strong id="ALM-38008__b128041418162315">Service</strong> area, select <strong id="ALM-38008__b10865621510">Kafka</strong> in the required cluster.</span></li><li id="ALM-38008__li1145664103113"><span>Click <span><img id="ALM-38008__image1945644173117" src="en-us_image_0269417506.png"></span> in the upper right corner, and set <strong id="ALM-38008__b6456941173117">Start Date</strong> to 10 minutes before the alarm generation time and <strong id="ALM-38008__b11456154113318">End Date</strong> to 10 minutes after it. Then, click <strong id="ALM-38008__b13456164113319">Download</strong>.</span></li><li id="ALM-38008__li3186770515440"><span>Contact the <span id="ALM-38008__text4614151421417">O&amp;M personnel</span> and send the collected logs.</span></li></ol>
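<p>The log collection window described above is the alarm generation time minus and plus 10 minutes. A minimal sketch of that arithmetic follows; the function name <code>log_window</code> and the sample alarm time are hypothetical and serve only as an illustration.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end) spanning margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, chosen to show the date rollover.
start, end = log_window(datetime(2024, 1, 1, 0, 5, 0))
print(start.isoformat(sep=" "), end.isoformat(sep=" "))
```

<p>For an alarm generated shortly after midnight, the start of the window falls on the previous day, which is easy to miss when the dates are entered by hand.</p>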
</div>
<div class="section" id="ALM-38008__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-38008__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-38008__s469a3fa614484b529bfca51a77ce5e1d"><h4 class="sectiontitle">Related Information</h4><p id="ALM-38008__en-us_topic_0070543591_p64814367">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>