This section applies only to MRS 3.2.0 or later.
Scenario 1: As the number of MRS ClickHouse services grows, the storage and compute resources of a cluster can no longer meet service requirements. The cluster therefore needs to be split so that some user services and database data can be migrated to a new cluster.
Scenario 2: The data center that hosts the backend nodes of an MRS ClickHouse cluster needs to be relocated, and the ClickHouse cluster data must be migrated along with it.
To meet these migration requirements, MRS provides a one-click data migration tool that migrates ClickHouse databases, table objects (DDL), and service data from a source cluster to a new cluster.
In this migration solution, ClickHouse uses ZooKeeper to automatically synchronize the data of Replicated*MergeTree tables of different replicas in the same shard. The logical procedure is as follows:
During the migration of a replicated table, its metadata in the source cluster is exported and modified to point to the ZooKeeper path and replica of the destination cluster. A table is then created in the destination cluster based on the modified metadata.
To migrate data in non-replicated tables and materialized views, you can call the remote function, as in the sketch below.
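For reference, a minimal sketch of such a remote-based copy. All host names, ports, credentials, and database/table names below are placeholder assumptions, not values from this guide; in a security-mode cluster you would use the remoteSecure function and the secure TCP port instead.

```sql
-- Hypothetical example: pull rows of a non-replicated table from the
-- source cluster into the destination cluster via the remote() function.
-- 'src-ch-node', 9000, 'ck_user', 'password', and demo_db.demo_table
-- are placeholders.
INSERT INTO demo_db.demo_table
SELECT *
FROM remote('src-ch-node:9000', demo_db.demo_table, 'ck_user', 'password');
```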
The preceding migration operations are encapsulated in the migration tool scripts: you modify the related configuration files and then run the migration scripts to complete the migration in one step. For details, see the procedure description.
| Step | Description |
|---|---|
| Step 1: Connect the source cluster to the destination cluster. | Ensures that the source and destination ClickHouse clusters and their nodes can communicate with each other, so that ZooKeeper in the source cluster can be used as the auxiliary ZooKeeper during data migration. |
| Step 2: Migrate metadata. | Run the corresponding script to migrate metadata, such as the database names, table names, and table structures of the ClickHouse databases and tables, from the source cluster to the destination cluster. |
| Step 3: Migrate service data. | Run the corresponding script to migrate the ClickHouse database and table data from the source cluster to the destination cluster. |
Run the following command to edit the hosts file of the ClickHouse instance node:
vi /etc/hosts
Copy the host information of the ClickHouse instance of the source cluster obtained in 2.b to the hosts file.
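For illustration, the appended entries might look like the following; the IP addresses and host names are placeholders for the actual source-cluster ClickHouse instance hosts obtained in 2.b:

```
192.168.1.2 source-clickhouse-node-1
192.168.1.3 source-clickhouse-node-2
192.168.1.4 source-clickhouse-node-3
```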
| Parameter | Value |
|---|---|
| auxiliary_zookeepers.zookeeper2.node[1].host | Service IP address of the first ZooKeeper quorumpeer instance in the source cluster obtained in 1. Currently, only the IP address of the ZooKeeper instance can be configured. |
| auxiliary_zookeepers.zookeeper2.node[1].port | 2181 |
| auxiliary_zookeepers.zookeeper2.node[2].host | Service IP address of the second ZooKeeper quorumpeer instance in the source cluster obtained in 1. Currently, only the IP address of the ZooKeeper instance can be configured. |
| auxiliary_zookeepers.zookeeper2.node[2].port | 2181 |
| auxiliary_zookeepers.zookeeper2.node[3].host | Service IP address of the third ZooKeeper quorumpeer instance in the source cluster obtained in 1. Currently, only the IP address of the ZooKeeper instance can be configured. |
| auxiliary_zookeepers.zookeeper2.node[3].port | 2181 |
ps -ef | grep clickhouse
cat ENV_VARS | grep ZK
Obtain the values of ZK_SERVER_FQDN, ZK_USER_PRINCIPAL, and ZK_USER_REALM.
| Parameter | Value |
|---|---|
| AUXILIARY_ZK_SERVER_FQDN | Value of ZK_SERVER_FQDN obtained in 7. |
| AUXILIARY_ZK_SERVER_PRINCIPAL | Value of ZK_USER_PRINCIPAL obtained in 7. |
| AUXILIARY_ZK_SERVER_REALM | Value of ZK_USER_REALM obtained in 7. |
| METADATA_COLLECTION_TIMEOUT | 180. This parameter specifies the timeout interval, in seconds, for waiting for metadata backup to complete on other nodes. |
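For illustration only, with the values obtained from ENV_VARS substituted in, the settings might look like the following; the FQDN, principal, and realm are placeholders, and the exact file in which they are set follows the tool's template:

```
AUXILIARY_ZK_SERVER_FQDN=node-master1.example.com
AUXILIARY_ZK_SERVER_PRINCIPAL=zookeeper/hadoop.example.com
AUXILIARY_ZK_SERVER_REALM=EXAMPLE.COM
METADATA_COLLECTION_TIMEOUT=180
```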
cd Client installation directory/ClickHouse/clickhouse_migration_tool/clickhouse-metadata-migration
vi example_config.yaml
After modifying the configuration, you must delete all comments marked with a number sign (#) and retain only valid configuration items. Otherwise, an error may occur when the migration script runs.
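If you prefer not to delete the comment lines by hand, a one-liner such as the following removes full-line # comments; it does not touch inline comments, so review the file afterward:

```
sed -i '/^[[:space:]]*#/d' example_config.yaml
```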
| Configuration Item | Sub-item | Value and Description |
|---|---|---|
| source_cluster | host | IP address of any ClickHouseServer node in the source cluster. |
| | cluster_name | Name of the source ClickHouse cluster. Log in to the ClickHouse client by referring to Using ClickHouse from Scratch and run the following command to obtain the value. If the source cluster name has not been changed, the default value is default_cluster: select cluster,shard_num,replica_num,host_name from system.clusters; |
| | https_port | To obtain the port number, log in to FusionInsight Manager of the source cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for https_port. |
| | zookeeper_root_path | To obtain the value, log in to FusionInsight Manager of the source cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for clickhouse.zookeeper.root.path. |
| | system | System parameter. Retain the default value. |
| | databases | Optional. |
| | tables | Optional. The value is in the format Database name.Table name. The database name must be in the databases parameter list. |
| destination_cluster | host | IP address of any ClickHouseServer node in the destination cluster. |
| | cluster_name | Name of the destination ClickHouse cluster. Log in to the ClickHouse client by referring to Using ClickHouse from Scratch and run the following command to obtain the value. If the destination cluster name has not been changed, the default value is default_cluster: select cluster,shard_num,replica_num,host_name from system.clusters; |
| | user | Username created in 1 for logging in to FusionInsight Manager of the destination ClickHouse cluster. |
| | https_port | To obtain the port number, log in to FusionInsight Manager of the destination cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for https_port. |
| | zookeeper_root_path | To obtain the value, log in to FusionInsight Manager of the destination cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for clickhouse.zookeeper.root.path. |
| | system | System parameter. Retain the default value. |
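A minimal sketch of a completed metadata migration configuration is shown below. The key names follow the table above, but the exact layout must follow the example_config.yaml template shipped with the tool; all IP addresses, ports, and names are placeholders:

```yaml
source_cluster:
  host: "192.168.1.2"              # any ClickHouseServer node in the source cluster
  cluster_name: "default_cluster"
  https_port: "21426"              # placeholder; use the value from Manager
  zookeeper_root_path: "/clickhouse"
  # system: retain the value from the shipped template
  databases:                       # optional
    - "demo_db"
  tables:                          # optional, format: database.table
    - "demo_db.demo_table"
destination_cluster:
  host: "192.168.10.2"             # any ClickHouseServer node in the destination cluster
  cluster_name: "default_cluster"
  user: "ck_user"                  # user created in 1
  https_port: "21426"              # placeholder; use the value from Manager
  zookeeper_root_path: "/clickhouse"
  # system: retain the value from the shipped template
```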
./clickhouse_migrate_metadata.sh -f yaml_file
Enter the usernames and passwords of the source and destination clusters.
If metadata migration fails, perform the following steps:
cd Client installation directory/ClickHouse/clickhouse_migration_tool/clickhouse-data-migration
vi example_config.yaml
| Configuration Item | Sub-item | Value and Description |
|---|---|---|
| source_cluster | host | IP address of any ClickHouseServer node in the source cluster. |
| | cluster_name | Name of the source ClickHouse cluster. Log in to the ClickHouse client by referring to Using ClickHouse from Scratch and run the following command to obtain the value. If the source cluster name has not been changed, the default value is default_cluster: select cluster,shard_num,replica_num,host_name from system.clusters; |
| | user | Username created in 1 for logging in to FusionInsight Manager of the source ClickHouse cluster. |
| | https_port | To obtain the port number, log in to FusionInsight Manager of the source cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for https_port. |
| | tcp_port | To obtain the value, log in to FusionInsight Manager of the source cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for tcp_port_secure if the cluster is in security mode, or tcp_port otherwise. |
| | zookeeper_root_path | To obtain the value, log in to FusionInsight Manager of the source cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for clickhouse.zookeeper.root.path. |
| | system | System parameter. Retain the default value. |
| | databases | Optional. |
| | tables | Optional. The value is in the format Database name.Table name. The database name must be in the databases parameter list. |
| destination_cluster | host | IP address of any ClickHouseServer node in the destination cluster. |
| | cluster_name | Name of the destination ClickHouse cluster. Log in to the ClickHouse client by referring to Using ClickHouse from Scratch and run the following command to obtain the value. If the destination cluster name has not been changed, the default value is default_cluster: select cluster,shard_num,replica_num,host_name from system.clusters; |
| | user | Username created in 1 for logging in to FusionInsight Manager of the destination ClickHouse cluster. |
| | https_port | To obtain the port number, log in to FusionInsight Manager of the destination cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for https_port. |
| | tcp_port | To obtain the value, log in to FusionInsight Manager of the destination cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for tcp_port_secure if the cluster is in security mode, or tcp_port otherwise. |
| | zookeeper_root_path | To obtain the value, log in to FusionInsight Manager of the destination cluster, choose Cluster > Services > ClickHouse, click the Configurations tab and then All Configurations, and search for clickhouse.zookeeper.root.path. |
| | system | System parameter. Retain the default value. |
| auxiliary_zookeepers | name | ZooKeeper name of the source ClickHouse cluster configured in 3, for example, zookeeper2. |
| | hosts | Service IP addresses of the ZooKeeper quorumpeer instances in the source ClickHouse cluster. To obtain them, log in to FusionInsight Manager of the source cluster, choose Cluster > Services > ZooKeeper, and click the Instance tab. The required list format is shown in the YAML sketch after this table. |
| | port | 2181 |
| execution_procedure | - | Left blank by default, indicating that the script is executed once to synchronize service data. Value options are firststep and secondstep. |
| onereplica_use_auxiliaryzookeeper | - | |
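As referenced in the hosts row above, a sketch of the auxiliary_zookeepers and trailing sections of the data migration configuration is shown below; the IP addresses are placeholders, and the exact layout must follow the example_config.yaml template shipped with the tool:

```yaml
auxiliary_zookeepers:
  name: "zookeeper2"               # ZooKeeper name configured in 3
  hosts:
    - "192.168.1.2"
    - "192.168.1.3"
    - "192.168.1.4"
  port: "2181"
execution_procedure: ""            # blank: run once to synchronize service data
onereplica_use_auxiliaryzookeeper: ""   # set per the shipped template
```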
./clickhouse_migrate_data.sh -f yaml_file
Enter the usernames and passwords of the source and destination clusters.
Log in to the ClickHouse client node in the destination cluster and go to the Client installation directory/ClickHouse/clickhouse_migration_tool/clickhouse-data-migration/comparison_result directory.
Compare the following result file information to check the data consistency between the source cluster and the destination cluster:
If the data is inconsistent before and after the migration, clear the data in the affected table in the destination cluster and then migrate that table separately, or migrate its data manually.
In addition, you can log in to the ClickHouse databases of the source and destination clusters and manually check whether the numbers of table records and partitions are consistent, as in the sketch below.
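For such a manual spot check, queries like the following compare record and active-partition counts when run against each cluster; demo_db.demo_table is a placeholder for an actual migrated table:

```sql
-- Total number of records in the table.
SELECT count() FROM demo_db.demo_table;

-- Number of active partitions for the table.
SELECT count(DISTINCT partition)
FROM system.parts
WHERE database = 'demo_db' AND table = 'demo_table' AND active;
```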
Click Save. In the displayed dialog box, click OK.
vi example_config.yaml
Delete the password from the configuration file to prevent password leakage.
If service data migration fails, perform the following steps: