Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Reviewed-by: Rechenburg, Matthias <matthias.rechenburg@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-38006"></a><a name="ALM-38006"></a>
<h1 class="topictitle1">ALM-38006 Percentage of Kafka Partitions That Are Not Completely Synchronized Exceeds the Threshold</h1>
<div id="body48078326"><div class="section" id="ALM-38006__s485fd543da0f423ea758d6d86575ee2d"><h4 class="sectiontitle">Description</h4><p id="ALM-38006__en-us_topic_0070543590_p64679424">Every 60 seconds, the system checks the percentage of Kafka partitions that are not completely synchronized relative to the total number of partitions. This alarm is generated when the percentage exceeds the threshold (50% by default) for three consecutive checks.</p>
<p id="ALM-38006__p23969228145930">When the <strong id="ALM-38006__b1855881691815">Trigger Count</strong> is 1, this alarm is cleared when the percentage is less than or equal to the threshold. When the <strong id="ALM-38006__b1553716592195">Trigger Count</strong> is greater than 1, this alarm is cleared when the percentage is less than or equal to 90% of the threshold.</p>
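<p>The generation and clearing rules above can be sketched as a small state machine. This is only an illustration, not the MRS implementation: the class name <code>AlarmState</code> and its fields are hypothetical, and it assumes that the Trigger Count is the number of consecutive checks that must exceed the threshold.</p>

```python
from dataclasses import dataclass


@dataclass
class AlarmState:
    """Hypothetical model of the alarm rules; names are not from MRS."""
    threshold: float = 50.0   # percent; 50% is the documented default
    trigger_count: int = 3    # assumed: consecutive breaches needed to raise
    breaches: int = 0
    active: bool = False

    def observe(self, percentage: float) -> bool:
        """Feed one 60-second sample and return whether the alarm is active."""
        if not self.active:
            # Count consecutive samples above the threshold; a sample at or
            # below the threshold resets the run.
            self.breaches = self.breaches + 1 if percentage > self.threshold else 0
            if self.breaches >= self.trigger_count:
                self.active = True
        else:
            # Clearing level: the threshold itself when trigger_count is 1,
            # otherwise 90% of the threshold.
            if self.trigger_count == 1:
                clear_at = self.threshold
            else:
                clear_at = 0.9 * self.threshold
            if percentage <= clear_at:
                self.active = False
                self.breaches = 0
        return self.active
```

<p>With the defaults, the alarm is raised on the third consecutive sample above 50% and stays active until a sample is at or below 45%, so a value hovering just under the threshold does not clear it.</p>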
</div>
<div class="section" id="ALM-38006__sc895595722ae431a9d1d38fb9543bd22"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-38006__en-us_topic_0070543590_table40877649" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-38006__en-us_topic_0070543590_row47054252"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-38006__en-us_topic_0070543590_p53298078">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-38006__en-us_topic_0070543590_p22177085">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-38006__en-us_topic_0070543590_p51513456">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-38006__en-us_topic_0070543590_row11840416"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-38006__en-us_topic_0070543590_p19549620">38006</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-38006__en-us_topic_0070543590_p40015381">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-38006__en-us_topic_0070543590_p20020394">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-38006__s2a9bb2c32b904dad926f1f4901900cf3"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-38006__en-us_topic_0070543590_table11039195" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-38006__en-us_topic_0070543590_row25081306"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-38006__en-us_topic_0070543590_p18319882">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-38006__en-us_topic_0070543590_p7515483">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-38006__row13141151175712"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38006__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38006__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-38006__en-us_topic_0070543590_row4774388"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38006__en-us_topic_0070543590_p51181182">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38006__en-us_topic_0070543590_p52035105">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-38006__en-us_topic_0070543590_row65662763"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38006__en-us_topic_0070543590_p17083557">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38006__en-us_topic_0070543590_p41590859">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-38006__en-us_topic_0070543590_row38773419"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-38006__en-us_topic_0070543590_p53639231">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-38006__en-us_topic_0070543590_p49810484">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-38006__s576222aeb64a4f618db716ded92e71af"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-38006__en-us_topic_0070543590_p8117426">If too many Kafka partitions are not completely synchronized, service reliability is affected. In addition, data may be lost when a partition leader is switched.</p>
</div>
<div class="section" id="ALM-38006__s765baaaf095b44d98526d729f2cc0e82"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-38006__en-us_topic_0070543590_p53531744">Some nodes where Broker instances reside are abnormal or have stopped running. As a result, replicas of some Kafka partitions fall out of the in-sync replica (ISR) set.</p>
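<p>To make the monitored indicator concrete, the following sketch computes the percentage of partitions whose ISR set is smaller than their assigned replica set. The function name and the input layout (a list of dictionaries with <code>replicas</code> and <code>isr</code> broker ID lists) are assumptions made for illustration and do not correspond to an MRS or Kafka API.</p>

```python
from typing import Iterable, Mapping, Sequence


def unsynced_percentage(partitions: Iterable[Mapping[str, Sequence[int]]]) -> float:
    """Percentage of partitions whose ISR set is a proper subset of the replicas.

    Each item is a hypothetical record such as
    {"replicas": [1, 2, 3], "isr": [1, 2]} holding broker IDs.
    """
    parts = list(partitions)
    if not parts:
        return 0.0
    # A partition is "not completely synchronized" when at least one assigned
    # replica is missing from the in-sync replica set.
    unsynced = sum(1 for p in parts if set(p["isr"]) < set(p["replicas"]))
    return 100.0 * unsynced / len(parts)
```

<p>For example, if the broker with ID 3 stops, every partition that has a replica on that broker loses it from the ISR set, and the percentage rises with the number of such partitions.</p>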
</div>
<div class="section" id="ALM-38006__s202b85a787724830bc0abf1f919ec0e4"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-38006__en-us_topic_0070543590_p41103974"><strong id="ALM-38006__b42612563154711">Check Broker instances.</strong></p>
<ol id="ALM-38006__ol51234866154723"><li id="ALM-38006__li4651447915478"><span>On the <span id="ALM-38006__text34789336432">MRS</span> Manager portal, choose <strong id="ALM-38006__b4831185113332">Cluster</strong> > <em id="ALM-38006__i14240109123418">Name of the desired cluster</em><strong id="ALM-38006__b9831125114333"> </strong>><strong id="ALM-38006__b11282328205216"> Services</strong> > <strong id="ALM-38006__b4259370315478">Kafka</strong> > <strong id="ALM-38006__b4779900815478">Instance</strong>. The Kafka instances page is displayed.</span></li><li id="ALM-38006__li1743646315478"><a name="ALM-38006__li1743646315478"></a><a name="li1743646315478"></a><span>Check whether faulty nodes exist among all Broker nodes.</span><p><ul class="subitemlist" id="ALM-38006__ul193738415478"><li id="ALM-38006__li957648715478">If yes, record the host name of the node and go to <a href="#ALM-38006__li2760667615478">3</a>.</li><li id="ALM-38006__li3749796715478">If no, go to <a href="#ALM-38006__li6648467215478">5</a>.</li></ul>
</p></li><li id="ALM-38006__li2760667615478"><a name="ALM-38006__li2760667615478"></a><a name="li2760667615478"></a><span>On the <span id="ALM-38006__text284441561716">MRS</span> Manager portal, choose <strong id="ALM-38006__b1547012412537">O&M </strong>><strong id="ALM-38006__b1456513439539"> Alarm </strong>><strong id="ALM-38006__b4565843155310"> Alarm</strong><strong id="ALM-38006__b932085253011">s</strong> and check whether an alarm has been reported for the fault found in <a href="#ALM-38006__li1743646315478">2</a>. If so, handle it by following the procedure for that alarm.</span></li><li id="ALM-38006__li523456615478"><span>On the <span id="ALM-38006__text1262214171177">MRS</span> Manager portal, choose <strong id="ALM-38006__b053921663919">Cluster</strong> > <em id="ALM-38006__i55392016173915">Name of the desired cluster</em><strong id="ALM-38006__b65391163398"> </strong>> <strong id="ALM-38006__b4713349715478">Services</strong> > <strong id="ALM-38006__b2154829715478">Kafka</strong> > <strong id="ALM-38006__b5971694715478">Instance</strong>. The Kafka instances page is displayed.</span></li><li id="ALM-38006__li6648467215478"><a name="ALM-38006__li6648467215478"></a><a name="li6648467215478"></a><span>Check whether any Broker instances are stopped.</span><p><ul class="subitemlist" id="ALM-38006__ul5958296815478"><li id="ALM-38006__li2134668115478">If yes, go to <a href="#ALM-38006__li1472641115478">6</a>.</li><li id="ALM-38006__li5135957215478">If no, go to <a href="#ALM-38006__li5037705315478">7</a>.</li></ul>
</p></li><li id="ALM-38006__li1472641115478"><a name="ALM-38006__li1472641115478"></a><a name="li1472641115478"></a><span>Select all stopped Broker instances and click <strong id="ALM-38006__b6149114015478">Start Instance</strong>.</span></li><li id="ALM-38006__li5037705315478"><a name="ALM-38006__li5037705315478"></a><a name="li5037705315478"></a><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-38006__ul5033220815478"><li id="ALM-38006__li6542884115478">If yes, no further action is required.</li><li id="ALM-38006__li6524479115478">If no, go to <a href="#ALM-38006__li1632355415478">8</a>.</li></ul>
</p></li></ol>
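<p>The branching in steps 1 to 7 can be summarized as a short decision function. This is a sketch for orientation only: the function name and its inputs are hypothetical, and the values would come from what the operator observes on the Manager portal, not from an API call.</p>

```python
from typing import List


def triage(faulty_hosts: List[str], stopped_brokers: List[str],
           alarm_cleared: bool) -> List[str]:
    """Return the actions the procedure calls for, in order.

    All three arguments are operator observations (hypothetical inputs):
    hosts found faulty in step 2, Broker instances found stopped in step 5,
    and whether the alarm is cleared when checked in step 7.
    """
    actions: List[str] = []
    if faulty_hosts:
        # Steps 2-3: record the host names and handle their alarms first.
        actions.append("handle alarms for faulty hosts: " + ", ".join(faulty_hosts))
    if stopped_brokers:
        # Steps 5-6: start every stopped Broker instance.
        actions.append("start Broker instances: " + ", ".join(stopped_brokers))
    # Step 7: stop here if the alarm is cleared, otherwise go on to step 8.
    actions.append("no further action" if alarm_cleared else "collect logs (step 8)")
    return actions
```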
<p class="tableheading" id="ALM-38006__p32640666154724"><strong id="ALM-38006__b25330546154724">Collect fault information.</strong></p>
<ol start="8" id="ALM-38006__ol26063454154728"><li id="ALM-38006__li1632355415478"><a name="ALM-38006__li1632355415478"></a><a name="li1632355415478"></a><span>On the <span id="ALM-38006__text59881910174">MRS</span> Manager portal, choose <strong id="ALM-38006__b33241914145619">O&M</strong> > <strong id="ALM-38006__b87621619205610">Log </strong>><strong id="ALM-38006__b107624197562"> Download</strong>.</span></li><li id="ALM-38006__li6020228915478"><span>Select <strong id="ALM-38006__b1269426115478">Kafka</strong> in the required cluster from the <strong id="ALM-38006__b4713948915478">Service</strong> drop-down list.</span></li><li id="ALM-38006__li1145664103113"><span>Click <span><img id="ALM-38006__image1945644173117" src="en-us_image_0000001582927557.png"></span> in the upper right corner, and set <strong id="ALM-38006__b6456941173117">Start Date</strong> and <strong id="ALM-38006__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-38006__b13456164113319">Download</strong>.</span></li><li id="ALM-38006__li1545774815478"><span>Contact the <span id="ALM-38006__text4614151421417">O&M personnel</span> and send the collected logs.</span></li></ol>
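<p>The log collection window in step 10 is simple date arithmetic around the alarm generation time. A minimal sketch follows, with a hypothetical helper name; the 10-minute margin is the value given in the step.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (start, end) for log collection around the alarm generation time."""
    margin = timedelta(minutes=margin_minutes)
    # Start Date is the alarm time minus the margin; End Date is the alarm
    # time plus the margin.
    return alarm_time - margin, alarm_time + margin
```

<p>The returned pair corresponds to the <strong>Start Date</strong> and <strong>End Date</strong> fields. Note that the window can cross a date boundary when the alarm is generated close to midnight.</p>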
</div>
<div class="section" id="ALM-38006__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-38006__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-38006__s6df5673b785e46e6aecec239c153c02d"><h4 class="sectiontitle">Related Information</h4><p id="ALM-38006__en-us_topic_0070543590_p6323433">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>