doc-exports/docs/mrs/umn/admin_guide_000012.html
yangtong c285e88a17 MRS UMN 20250806 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: yangtong <yangtong2@huawei.com>
Co-committed-by: yangtong <yangtong2@huawei.com>
2025-09-02 10:43:57 +00:00


<a name="admin_guide_000012"></a><a name="admin_guide_000012"></a>
<h1 class="topictitle1">Performing a Rolling Restart of a Cluster</h1>
<div id="body1556440081183"><div class="section" id="admin_guide_000012__section3621144214401"><h4 class="sectiontitle">Scenario</h4><p id="admin_guide_000012__p18293445407">A rolling restart restarts all services in a cluster in batches after they are modified or upgraded, without interrupting workloads.</p>
<p id="admin_guide_000012__p64181242426">You can perform a rolling restart of a cluster as needed.</p>
<div class="note" id="admin_guide_000012__note5534111819436"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="admin_guide_000012__ul4479352694035"><li id="admin_guide_000012__li1949364194326">Certain services in a cluster do not support rolling restart. These services are restarted in normal mode during the rolling restart of the cluster, so workloads may be interrupted. Decide whether to proceed based on the prompts displayed.</li><li id="admin_guide_000012__li91281555154315">For configurations that must take effect immediately, such as server port configurations, perform a normal restart instead of a rolling restart.</li></ul>
</div></div>
</div>
<div class="section" id="admin_guide_000012__section25021132154413"><h4 class="sectiontitle">Impact on the System</h4><p id="admin_guide_000012__en-us_topic_0118210076_a21d53323b19b4b15a4b488084a8d7358">A rolling restart takes longer than a normal restart and may affect service throughput and performance.</p>
</div>
<div class="section" id="admin_guide_000012__section16333559440"><h4 class="sectiontitle">Procedure</h4><ol id="admin_guide_000012__en-us_topic_0046737068_ol59198436"><li id="admin_guide_000012__en-us_topic_0046737068_li63023881"><span>Log in to <span id="admin_guide_000012__text15946118176">MRS</span> Manager.</span></li><li id="admin_guide_000012__en-us_topic_0046737068_li30344022"><span>Choose <strong id="admin_guide_000012__b1840013118442">Cluster</strong> &gt; <em id="admin_guide_000012__i72664103305">Name of the target cluster</em> &gt; <strong id="admin_guide_000012__b174015119445">Dashboard</strong>. On this tab page, choose <strong id="admin_guide_000012__b640115117448">More</strong> &gt; <strong id="admin_guide_000012__b740171194418">Rolling-restart Service</strong>.</span><p><div class="note" id="admin_guide_000012__note15466144215117"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="admin_guide_000012__admin_guide_000011_p98116010369">For MRS 3.3.0 or later, the <strong id="admin_guide_000012__admin_guide_000011_b1613994105012">Cluster</strong> &gt; <strong id="admin_guide_000012__admin_guide_000011_b44246194501">Dashboard</strong> page has been removed from Manager. You can choose <strong id="admin_guide_000012__admin_guide_000011_b20392164813615">More</strong> in the upper right corner of the <strong id="admin_guide_000012__admin_guide_000011_b823793316367">Homepage</strong> to access cluster maintenance and management functions.</p>
</div></div>
</p></li><li id="admin_guide_000012__li147445458458"><span>In the dialog box that is displayed, enter the password of the current login user and click <strong id="admin_guide_000012__b934419859113418">OK</strong>.</span></li><li id="admin_guide_000012__li74461344154616"><span>Configure the parameters based on site requirements.</span><p>
<div class="tablenoborder"><a name="admin_guide_000012__en-us_topic_0118210076_t65f951fcfc8a4a37b6c7f3481125fe35"></a><a name="en-us_topic_0118210076_t65f951fcfc8a4a37b6c7f3481125fe35"></a><table cellpadding="4" cellspacing="0" summary="" id="admin_guide_000012__en-us_topic_0118210076_t65f951fcfc8a4a37b6c7f3481125fe35" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Rolling restart parameters</caption><thead align="left"><tr id="admin_guide_000012__en-us_topic_0118210076_rc85481c745524300bedbca2144a66df7"><th align="left" class="cellrowborder" valign="top" width="32.71%" id="mcps1.3.3.2.4.2.1.2.3.1.1"><p id="admin_guide_000012__en-us_topic_0118210076_a7452edafdcb64e2f83afe45a973773ee">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="67.29%" id="mcps1.3.3.2.4.2.1.2.3.1.2"><p id="admin_guide_000012__en-us_topic_0118210076_a49a04e30f8d54a6bb9374969101f503c">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="admin_guide_000012__en-us_topic_0118210076_r6847022d2bba48abaaccb49ad0717c44"><td class="cellrowborder" valign="top" width="32.71%" headers="mcps1.3.3.2.4.2.1.2.3.1.1 "><p id="admin_guide_000012__en-us_topic_0118210076_ae29e533d82874e7e8cce0ceb11b46f56">Restart only instances with expired configurations in the cluster</p>
</td>
<td class="cellrowborder" valign="top" width="67.29%" headers="mcps1.3.3.2.4.2.1.2.3.1.2 "><p id="admin_guide_000012__en-us_topic_0118210076_en-us_topic_0049504339_p364696516151">Whether to restart only the modified instances in a cluster</p>
</td>
</tr>
<tr id="admin_guide_000012__en-us_topic_0118210076_r067cff235d7d41448245883a5347ac1c"><td class="cellrowborder" valign="top" width="32.71%" headers="mcps1.3.3.2.4.2.1.2.3.1.1 "><p id="admin_guide_000012__en-us_topic_0118210076_a6965f9543cba45ee9c8c09e9cab17227">Enable rack strategy</p>
</td>
<td class="cellrowborder" valign="top" width="67.29%" headers="mcps1.3.3.2.4.2.1.2.3.1.2 "><p id="admin_guide_000012__en-us_topic_0118210076_ab5bcc6c3709446e2bc05e3c7fb7888af">Whether to enable the concurrent rack rolling restart strategy. This parameter takes effect only for roles that meet the rack rolling restart strategy. (The roles support rack awareness, and instances of the roles belong to two or more racks.)</p>
<div class="note" id="admin_guide_000012__en-us_topic_0118210076_note38687787202428"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="admin_guide_000012__en-us_topic_0118210076_p12645767202428">This parameter is configurable only when a rolling restart is performed on HDFS or YARN.</p>
</div></div>
</td>
</tr>
<tr id="admin_guide_000012__en-us_topic_0118210076_r55197725e71845a0a99b408dfc8ab2d9"><td class="cellrowborder" valign="top" width="32.71%" headers="mcps1.3.3.2.4.2.1.2.3.1.1 "><p id="admin_guide_000012__en-us_topic_0118210076_aa15a79b6bd804abb9779096defa847f9">Data Nodes to Be Batch Restarted</p>
</td>
<td class="cellrowborder" valign="top" width="67.29%" headers="mcps1.3.3.2.4.2.1.2.3.1.2 "><p id="admin_guide_000012__en-us_topic_0118210076_p157461842161814">Number of instances that are restarted in each batch when the batch rolling restart strategy is used. The default value is <strong id="admin_guide_000012__b1660319920617">1</strong>.</p>
<div class="note" id="admin_guide_000012__en-us_topic_0118210076_note574714214185"><span class="notetitle"> NOTE: </span><div class="notebody"><ul id="admin_guide_000012__en-us_topic_0118210076_ul13747194221810"><li id="admin_guide_000012__en-us_topic_0118210076_li3747242191810">This parameter is valid only when the batch rolling restart strategy is used and the instance type is DataNode.</li><li id="admin_guide_000012__en-us_topic_0118210076_li16747184231812">This parameter is invalid when the rack strategy is enabled. In this case, the cluster uses the maximum number of instances (20 by default) configured in the rack strategy as the maximum number of instances that are concurrently restarted in a rack.</li><li id="admin_guide_000012__en-us_topic_0118210076_li59017316193">This parameter is configurable only when a rolling restart is performed on HDFS, HBase, YARN, Kafka, Storm, or Flume.</li><li id="admin_guide_000012__en-us_topic_0118210076_li15586176122412">For the RegionServer of HBase, this parameter cannot be manually configured. Instead, it is automatically adjusted based on the number of RegionServer nodes. If the number of RegionServer nodes is less than 30, the parameter value is <strong id="admin_guide_000012__b17517114910222">1</strong>. If the number is greater than or equal to 30 and less than 300, the parameter value is <strong id="admin_guide_000012__b47639463232">2</strong>. If the number is greater than or equal to 300, the parameter value is 1% of the number (rounded down).</li></ul>
</div></div>
</td>
</tr>
<tr id="admin_guide_000012__en-us_topic_0118210076_r327da26d16da4552b831ff81bfc305a7"><td class="cellrowborder" valign="top" width="32.71%" headers="mcps1.3.3.2.4.2.1.2.3.1.1 "><p id="admin_guide_000012__en-us_topic_0118210076_en-us_topic_0049504339_p576793392935">Batch Interval</p>
</td>
<td class="cellrowborder" valign="top" width="67.29%" headers="mcps1.3.3.2.4.2.1.2.3.1.2 "><p id="admin_guide_000012__en-us_topic_0118210076_a21a822d829d84ed29c7c6c437f25d34d">Interval between two batches of instances to be roll-restarted. The default value is <strong id="admin_guide_000012__b667516392261">0</strong>.</p>
</td>
</tr>
<tr id="admin_guide_000012__en-us_topic_0118210076_rcd86d7ea78f4432faed15789f6955fa7"><td class="cellrowborder" valign="top" width="32.71%" headers="mcps1.3.3.2.4.2.1.2.3.1.1 "><p id="admin_guide_000012__en-us_topic_0118210076_ac3af9ecf55194ecc86cf689f59fd96cd">Decommissioning Timeout Interval</p>
</td>
<td class="cellrowborder" valign="top" width="67.29%" headers="mcps1.3.3.2.4.2.1.2.3.1.2 "><p id="admin_guide_000012__en-us_topic_0118210076_p116263715191">Timeout interval for decommissioning role instances during a rolling restart. The default value is <strong id="admin_guide_000012__b371714319337">1800s</strong>.</p>
<p id="admin_guide_000012__en-us_topic_0118210076_p41631837181915">Some roles (such as HiveServer and JDBCServer) stop providing services before the rolling restart. Stopped instances cannot accept new client connections, and existing connections need some time to complete. An appropriate timeout interval helps ensure service continuity.</p>
<div class="note" id="admin_guide_000012__en-us_topic_0118210076_note7507462202634"><span class="notetitle"> NOTE: </span><div class="notebody"><p id="admin_guide_000012__en-us_topic_0118210076_p458298202634">This parameter is configurable only when a rolling restart is performed on Hive or Spark2x.</p>
</div></div>
</td>
</tr>
<tr id="admin_guide_000012__en-us_topic_0118210076_rc87535526c26449f94dce582eb93a314"><td class="cellrowborder" valign="top" width="32.71%" headers="mcps1.3.3.2.4.2.1.2.3.1.1 "><p id="admin_guide_000012__en-us_topic_0118210076_ad53062aff93d461fb8cbb1e6f5ad6dea">Batch Fault Tolerance Threshold</p>
</td>
<td class="cellrowborder" valign="top" width="67.29%" headers="mcps1.3.3.2.4.2.1.2.3.1.2 "><p id="admin_guide_000012__en-us_topic_0118210076_a167ad3ce261a41ac8d525490ef9307e8">Number of batch failures tolerated during a rolling restart. The default value is <strong id="admin_guide_000012__b102711511144613">0</strong>, which indicates that the rolling restart task ends as soon as any batch of instances fails to restart.</p>
</td>
</tr>
</tbody>
</table>
</div>
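<p>The RegionServer rule described in Table 1 can be sketched as a small function. This is an illustrative sketch only: the function name <code>regionserver_batch_size</code> is hypothetical and is not part of MRS Manager, which applies the rule automatically.</p>

```python
def regionserver_batch_size(num_regionservers: int) -> int:
    """Hypothetical helper mirroring the RegionServer rule in Table 1.

    Returns how many RegionServer instances are restarted per batch.
    """
    if num_regionservers >= 300:
        # 1% of the node count, rounded down
        return num_regionservers // 100
    if num_regionservers >= 30:
        return 2
    return 1


if __name__ == "__main__":
    for n in (10, 30, 299, 300, 1250):
        print(n, regionserver_batch_size(n))
```

<p>For example, under this rule a cluster with 1250 RegionServer nodes restarts 12 instances per batch, while a cluster with 299 nodes restarts 2.</p>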
<div class="note" id="admin_guide_000012__en-us_topic_0118210076_nee5569bd4ed240a3b814af969d421941"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="admin_guide_000012__en-us_topic_0118210076_a2e12e675d5b64f5584c73ab6693e61e5">Advanced parameters, such as <strong id="admin_guide_000012__b1926414217497">Data Nodes to Be Batch Restarted</strong>, <strong id="admin_guide_000012__b20475205724918">Batch Interval</strong>, and <strong id="admin_guide_000012__b1711212875018">Batch Fault Tolerance Threshold</strong>, should be properly configured based on site requirements. Otherwise, services may be interrupted or cluster performance may be severely affected.</p>
<p id="admin_guide_000012__en-us_topic_0118210076_a8ff53168887141179cca6287c26258f8">Example:</p>
<ul id="admin_guide_000012__en-us_topic_0118210076_u10fa31ee3f4f48c59bd499614665e20c"><li id="admin_guide_000012__en-us_topic_0118210076_l7aed63bc0c6b4934b6cff299a74bd85d">If <strong id="admin_guide_000012__b686369279113418">Data Nodes to Be Batch Restarted</strong> is set to an unnecessarily large value, many instances are restarted concurrently. Too few working instances remain, so services may be interrupted or cluster performance may be severely affected.</li><li id="admin_guide_000012__en-us_topic_0118210076_la40843f322a841e49e49829666c2e0f2">If <strong id="admin_guide_000012__b1560044433113418">Batch Fault Tolerance Threshold</strong> is set too large, services may be interrupted because the next batch of instances is still restarted after a batch fails to restart.</li></ul>
</div></div>
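<p>How the batch parameters interact can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the MRS Manager implementation: the function <code>rolling_restart</code> and its <code>restart</code> callback are hypothetical, the batch interval is assumed to be in seconds, and the fault tolerance threshold is assumed to count failed batches.</p>

```python
import time
from typing import Callable, Sequence


def rolling_restart(
    instances: Sequence[str],
    batch_size: int,
    restart: Callable[[str], bool],
    batch_interval: float = 0,
    fault_tolerance: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical model of a batch rolling restart.

    Returns True if every batch was attempted, or False if the task ended
    early because more batches failed than fault_tolerance allows.
    """
    if 1 > batch_size:
        raise ValueError("batch_size must be at least 1")
    failed_batches = 0
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        # Every instance in the batch is attempted before the batch is judged.
        results = [restart(name) for name in batch]
        if not all(results):
            failed_batches += 1
            # Assumption: with a threshold of 0, the first failed batch ends the task.
            if failed_batches > fault_tolerance:
                return False
        # Wait between batches, but not after the last one.
        if len(instances) > start + batch_size and batch_interval > 0:
            sleep(batch_interval)
    return True


# Usage: six simulated instances, where "dn-3" always fails to restart.
nodes = [f"dn-{i}" for i in range(1, 7)]
attempted = []


def fake_restart(name: str) -> bool:
    attempted.append(name)
    return name != "dn-3"


if __name__ == "__main__":
    print(rolling_restart(nodes, batch_size=2, restart=fake_restart))
    print(attempted)
```

<p>In this model, a threshold of 0 stops the task after the batch containing the failed instance, so later instances are never restarted. A threshold of 1 lets the task continue to the remaining batches even though one instance is already down, which is the risk described in the note above.</p>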
</p></li><li id="admin_guide_000012__li115673451469"><span>Click <span class="uicontrol" id="admin_guide_000012__en-us_topic_0118210076_u200c150990a446038f46e109fd7035c4"><b>OK</b></span>.</span></li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="admin_guide_000010.html">Cluster Management</a></div>
</div>
</div>