Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-45441"></a><a name="ALM-45441"></a>
<h1 class="topictitle1">ALM-45441 Zookeeper Disconnected</h1>
<div id="body0000001194005538"><div class="note" id="ALM-45441__note12303191265"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45441__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45441__section4181191543314"><h4 class="sectiontitle"><span id="ALM-45441__text14838183534515">Alarm Description</span></h4><p id="ALM-45441__p363513175232">The system checks the connection between ClickHouse and ZooKeeper every minute. This alarm is generated when the connection check fails three consecutive times.</p>
<p id="ALM-45441__p842285323314">This alarm is automatically cleared when the system detects that the connection is normal.</p>
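<p>The rule described above can be illustrated with a small state model. The following Python sketch is not the product's implementation: the class name <em>ConnectionAlarm</em> and its method are hypothetical, and it assumes that a single successful check is enough to clear the alarm.</p>
<pre class="screen">
FAILURE_THRESHOLD = 3  # consecutive failed checks before the alarm is raised


class ConnectionAlarm:
    """Tracks consecutive failed checks; call record() once per one-minute check."""

    def __init__(self):
        self.consecutive_failures = 0
        self.active = False

    def record(self, connected):
        """Record one check result and return whether the alarm is active."""
        if connected:
            # Assumption: one successful check resets the counter and clears the alarm.
            self.consecutive_failures = 0
            self.active = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURE_THRESHOLD:
                self.active = True
        return self.active


alarm = ConnectionAlarm()
# Hypothetical sequence of check results (True means the connection is normal).
checks = (False, False, True, False, False, False, True)
states = [alarm.record(ok) for ok in checks]
print(states)
</pre>
<p>In this sample sequence, two failures followed by a success do not raise the alarm. It becomes active only on the third consecutive failure and is cleared by the next successful check.</p>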
</div>
<div class="section" id="ALM-45441__section6432132533414"><h4 class="sectiontitle"><span id="ALM-45441__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45441__table15811244124611" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45441__row115971544184611"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45441__p12597174434618"><span id="ALM-45441__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45441__p5597114494615"><span id="ALM-45441__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45441__p1559716445469"><span id="ALM-45441__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45441__row155971644124612"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45441__p65978447466">45441</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45441__p13598344144611">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45441__p175981544194611">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45441__section105471213143515"><h4 class="sectiontitle"><span id="ALM-45441__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45441__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45441__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45441__p12276527485"><span id="ALM-45441__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45441__p72767277812"><span id="ALM-45441__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45441__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45441__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45441__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45441__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45441__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45441__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45441__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45441__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45441__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45441__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45441__p202768273810">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45441__p1227618271580">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45441__section0918121233917"><h4 class="sectiontitle"><span id="ALM-45441__text1127833410585">Impact on the System</span></h4><p id="ALM-45441__p115222903917">If ClickHouse is disconnected from ZooKeeper, the ClickHouse service cannot be used.</p>
</div>
<div class="section" id="ALM-45441__section15920165211392"><h4 class="sectiontitle"><span id="ALM-45441__text10245783115">Possible Causes</span></h4><ul id="ALM-45441__ul99361645184215"><li id="ALM-45441__li1046817283012">The ZooKeeper service is abnormal.</li><li id="ALM-45441__li12936945154211">The ClickHouse service is overloaded.</li></ul>
</div>
<div class="section" id="ALM-45441__section1437654425314"><h4 class="sectiontitle"><span id="ALM-45441__text35421632154">Handling Procedure</span></h4><p id="ALM-45441__p15198192793613"><strong id="ALM-45441__b561045314214">Check whether ZooKeeper is normal.</strong></p>
<ol id="ALM-45441__ol79577513018"><li id="ALM-45441__li1195114518013"><span>On FusionInsight Manager, choose <strong id="ALM-45441__b19366759622">Cluster</strong> > <strong id="ALM-45441__b33669592218">Services</strong> > <strong id="ALM-45441__b536625916215">ZooKeeper</strong> > <strong id="ALM-45441__b836716591922">quorumpeer</strong>.</span></li><li id="ALM-45441__li20952135802"><span>Check whether ZooKeeper instances are normal.</span><p><ul id="ALM-45441__ul4144613813"><li id="ALM-45441__li17144311818">If yes, go to <a href="#ALM-45441__li15319205119354">6</a>.</li><li id="ALM-45441__li214431387">If no, go to <a href="#ALM-45441__li1395215202">3</a>.</li></ul>
</p></li><li id="ALM-45441__li1395215202"><a name="ALM-45441__li1395215202"></a><a name="li1395215202"></a><span>Select the instances whose status is not good and choose <strong id="ALM-45441__b172351721665">More</strong> > <strong id="ALM-45441__b1823512211866">Restart Instance</strong>.</span></li><li id="ALM-45441__li99531855012"><span>Check whether the instance status is good after the restart.</span><p><ul id="ALM-45441__ul9953054010"><li id="ALM-45441__li1995315703">If yes, go to <a href="#ALM-45441__li6946141915104">5</a>.</li><li id="ALM-45441__li995315511013">If no, go to <a href="#ALM-45441__li6769733151816">10</a>.</li></ul>
</p></li><li id="ALM-45441__li6946141915104"><a name="ALM-45441__li6946141915104"></a><a name="li6946141915104"></a><span>Choose <strong id="ALM-45441__b16111472714">O&M</strong> > <strong id="ALM-45441__b116121579716">Alarm</strong> > <strong id="ALM-45441__b13612107578">Alarms</strong> and check whether the alarm is cleared.</span><p><ul id="ALM-45441__ul17946619191016"><li id="ALM-45441__li14946191918101">If yes, no further action is required.</li><li id="ALM-45441__li260244816319">If no, go to <a href="#ALM-45441__li15319205119354">6</a>.</li></ul>
</p></li></ol>
<p id="ALM-45441__p348310210267"><strong id="ALM-45441__b336616166720">Check whether the ClickHouse service load is heavy.</strong></p>
<ol start="6" id="ALM-45441__ol113197516357"><li id="ALM-45441__li15319205119354"><a name="ALM-45441__li15319205119354"></a><a name="li15319205119354"></a><span>Log in to FusionInsight Manager, choose <strong id="ALM-45441__b5853201316202">O&M</strong> > <strong id="ALM-45441__b1285361352013">Alarm</strong> > <strong id="ALM-45441__b188531913182010">Alarms</strong>, and view the role name and the IP address corresponding to the hostname in <strong id="ALM-45441__b48531713112019">Location</strong>.</span></li><li id="ALM-45441__li13812198153717"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45441__p20951185605"><strong id="ALM-45441__b4951351605">cd </strong><em id="ALM-45441__i49516515011">{Client installation path}</em></p>
<p id="ALM-45441__p10951851109"><strong id="ALM-45441__b895175403">source bigdata_env</strong></p>
<ul id="ALM-45441__ul119521151019"><li id="ALM-45441__li12952051017">For a cluster with Kerberos authentication enabled (security mode):<p id="ALM-45441__p199511657011"><a name="ALM-45441__li12952051017"></a><a name="li12952051017"></a><strong id="ALM-45441__b159515511014">kinit</strong> <em id="ALM-45441__i69513518010">Component service user</em></p>
<p id="ALM-45441__p2233164043715"><strong id="ALM-45441__b9928999924216">clickhouse client --host </strong><em id="ALM-45441__i17831994784216">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45441__b3090799784216"> --port </strong>9440 <strong id="ALM-45441__b5464036404216">--secure</strong></p>
</li><li id="ALM-45441__li59521456012">For a cluster with Kerberos authentication disabled (normal mode):<p id="ALM-45441__p4952175200"><a name="ALM-45441__li59521456012"></a><a name="li59521456012"></a><strong id="ALM-45441__b797211811917">clickhouse client --host </strong>IP address of the ClickHouseServer instance that reports the alarm<strong id="ALM-45441__b1896203614918"> --user</strong><em id="ALM-45441__i952216364916"> User name</em><strong id="ALM-45441__b934319531498"> --password --port </strong>9440</p>
</li></ul>
</p></li><li id="ALM-45441__li14242152404310"><span>Run the following statement to list the queries that are currently running and check whether data is being written frequently. If yes, wait until the running tasks are complete and check whether the alarm is cleared.</span><p><p id="ALM-45441__p89521551509"><strong id="ALM-45441__b4952195308">SELECT query_id, user, FQDN(), elapsed, query FROM system.processes ORDER BY query_id;</strong></p>
<ul id="ALM-45441__ul99521851808"><li id="ALM-45441__li18952658011">If yes, no further action is required.</li><li id="ALM-45441__li19952159015">If no, go to <a href="#ALM-45441__li195348914449">9</a>.</li></ul>
</p></li><li id="ALM-45441__li195348914449"><a name="ALM-45441__li195348914449"></a><a name="li195348914449"></a><span>Check whether a large amount of data is being written. If yes, wait until the write task is complete and check whether the alarm is cleared.</span><p><ul id="ALM-45441__ul153413918448"><li id="ALM-45441__li25354954411">If yes, no further action is required.</li><li id="ALM-45441__li65357912445">If no, go to <a href="#ALM-45441__li6769733151816">10</a>.</li></ul>
</p></li></ol>
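<p>The result of the <strong>system.processes</strong> statement above can be summarized with a short script. The following Python sketch assumes the result rows have already been exported as tuples in the column order of that statement (query ID, user, host, elapsed time, query text). The function name, the sample rows, and the rule that a query starting with INSERT counts as a write are illustrative assumptions, not part of the product.</p>
<pre class="screen">
def summarize_writes(rows):
    """Count running INSERT statements in rows taken from system.processes.

    Each row is a (query_id, user, host, elapsed_seconds, query_text) tuple.
    """
    rows = list(rows)
    # Assumption: a query is treated as a write if its text starts with INSERT.
    writes = [r for r in rows if r[4].lstrip().upper().startswith("INSERT")]
    longest = max((r[3] for r in writes), default=0.0)
    return {"running": len(rows), "writes": len(writes), "longest_write_s": longest}


# Hypothetical sample rows, not real cluster output.
sample = [
    ("q1", "app", "node-1", 12.5, "INSERT INTO t1 SELECT * FROM src"),
    ("q2", "app", "node-1", 0.3, "SELECT count() FROM t1"),
    ("q3", "etl", "node-1", 48.0, "insert into t2 values"),
]
summary = summarize_writes(sample)
print(summary)
</pre>
<p>A high share of long-running writes in such a summary would suggest waiting for the tasks to finish before rechecking the alarm, as described in the steps above.</p>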
<p id="ALM-45441__p1086712560313"><strong id="ALM-45441__b10676357193114">Collect fault information.</strong></p>
<ol start="10" id="ALM-45441__ol14770133318187"><li id="ALM-45441__li6769733151816"><a name="ALM-45441__li6769733151816"></a><a name="li6769733151816"></a><span>On FusionInsight Manager, choose <strong id="ALM-45441__b19849230121814">O&M</strong>. In the navigation pane on the left, choose <strong id="ALM-45441__b785093001817">Log</strong> > <strong id="ALM-45441__b198501930101814">Download</strong>.</span></li><li id="ALM-45441__li10902033134212"><span>Expand the <strong id="ALM-45441__b3487113271812">Service</strong> drop-down list, and select <strong id="ALM-45441__b84870323183">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45441__li1848161911347"><span>Expand the <strong id="ALM-45441__b11864103414188">Hosts</strong> drop-down list. In the <strong id="ALM-45441__b10864193401817">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45441__b17864534171819">OK</strong>.</span></li><li id="ALM-45441__li181213284341"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45441__b15588153631815">Start Date</strong> and <strong id="ALM-45441__b1458893620182">End Date</strong> for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click <strong id="ALM-45441__b3588183601817">Download</strong>.</span></li><li id="ALM-45441__li1539653315345"><span>Contact <span id="ALM-45441__text5701498183">O&M personnel</span> and provide the collected logs.</span></li></ol>
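<p>The log collection window in step 13 is simple date arithmetic. A minimal Python sketch, assuming the alarm generation time is known; the function name and the sample timestamp are hypothetical:</p>
<pre class="screen">
from datetime import datetime, timedelta


def log_window(alarm_time, margin=timedelta(hours=1)):
    """Return (start, end) covering one margin before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time.
start, end = log_window(datetime(2024, 1, 15, 10, 30))
print(start.isoformat(sep=" "), end.isoformat(sep=" "))
</pre>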
</div>
<div class="section" id="ALM-45441__section1069512919569"><h4 class="sectiontitle"><span id="ALM-45441__text976142215819">Alarm Clearance</span></h4><p id="ALM-45441__p391831655614">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45441__section891955662611"><h4 class="sectiontitle"><span id="ALM-45441__text13373191116114">Related Information</span></h4><p id="ALM-45441__p139191756122619"><span id="ALM-45441__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>