
chenxiaoxiong f9e2808b7c DataArts UMN 20250810 version

Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: chenxiaoxiong <chenxiaoxiong@huawei.com>
Co-committed-by: chenxiaoxiong <chenxiaoxiong@huawei.com>

2025-09-02 10:44:13 +00:00


Managing CDM Job Configuration

On the Settings tab page, you can configure the following:

Maximum Concurrent Extractors
Scheduled Backup/Restoration
Environment Variables of Job Parameters

Maximum Concurrent Extractors

This parameter specifies the maximum number of extraction tasks that can run concurrently in a cluster.

This parameter is also available on the Cluster Configuration page. You can change its value on either page.

CDM migrates data through data migration jobs, which are processed as follows:

1. When data migration jobs are submitted, CDM splits each job into multiple tasks based on the Concurrent Extractors parameter in the job configuration.

   Jobs for different data sources may be split along different dimensions, and some jobs may not be split based on the Concurrent Extractors parameter at all.
2. CDM submits the tasks to the running pool in sequence. At most Maximum Concurrent Extractors tasks run concurrently, and excess tasks are queued.

By setting appropriate values for the Concurrent Extractors and Maximum Concurrent Extractors parameters, you can accelerate migration.
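The interaction between the two parameters can be sketched as a small simulation. This is a simplified model, not CDM's actual scheduler: it assumes every job is split into exactly as many tasks as its Concurrent Extractors value, and the job names and the `simulate` function are hypothetical.

```python
from collections import deque


def simulate(jobs, max_concurrent):
    """Model how tasks fill the running pool.

    jobs: dict mapping a (hypothetical) job name to its Concurrent Extractors value.
    max_concurrent: the cluster's Maximum Concurrent Extractors value.
    Returns (running, queued) lists of task names.
    """
    # Assumption: each job is split into one task per concurrent extractor.
    pending = deque()
    for name, extractors in jobs.items():
        for i in range(extractors):
            pending.append(f"{name}-task{i + 1}")

    # Tasks enter the running pool in submission order until the pool is full.
    running = []
    while pending and len(running) < max_concurrent:
        running.append(pending.popleft())

    # Whatever is left waits in the queue.
    return running, list(pending)


running, queued = simulate({"job_a": 3, "job_b": 2}, max_concurrent=4)
print("running:", running)
print("queued: ", queued)
```

With five tasks and a pool size of four, one task of `job_b` waits in the queue until a running task finishes.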

You are advised to set Maximum Concurrent Extractors to twice the number of vCPUs. For details, see Table 1.

**Table 1** Recommended maximum number of concurrent extractors for a CDM cluster
| Flavor | vCPUs/Memory | Recommended Maximum Concurrent Extractors |
| --- | --- | --- |
| cdm.large | 8 vCPUs, 16 GB | 16 |
| cdm.xlarge | 16 vCPUs, 32 GB | 32 |
| cdm.4xlarge | 64 vCPUs, 128 GB | 128 |
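The values in Table 1 follow directly from the "twice the number of vCPUs" guideline. A minimal sketch of that arithmetic, where the function name and the `FLAVOR_VCPUS` mapping are illustrative and not part of any CDM API:

```python
def recommended_max_extractors(vcpus: int) -> int:
    """Apply the guideline: Maximum Concurrent Extractors = 2 x vCPUs."""
    if vcpus < 1:
        raise ValueError("vcpus must be a positive integer")
    return 2 * vcpus


# vCPU counts taken from Table 1.
FLAVOR_VCPUS = {"cdm.large": 8, "cdm.xlarge": 16, "cdm.4xlarge": 64}

recommended = {
    flavor: recommended_max_extractors(vcpus)
    for flavor, vcpus in FLAVOR_VCPUS.items()
}
print(recommended)
```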

Configure the number of concurrent extractors based on the following rules:
1. When data is migrated to files, CDM does not support multiple concurrent tasks. In this case, use a single process to extract data.
2. If each row of the table contains 1 MB of data or less, data can be extracted concurrently. If each row contains more than 1 MB of data, you are advised to extract data in a single thread.
3. Set Concurrent Extractors for a job based on Maximum Concurrent Extractors for the cluster. You are advised to set Concurrent Extractors to a value smaller than Maximum Concurrent Extractors.
4. If the migration source is Hive and JDBC is used to read data, CDM does not support concurrent extraction. In this case, set the number of concurrent extractors to 1.
5. If the destination is DLI, you are advised to set the number of concurrent extractors to 1. Otherwise, data may fail to be written.
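The rules above can be condensed into a small helper for planning purposes. This is only a sketch: `JobProfile`, its field names, and `suggest_extractors` are hypothetical and do not correspond to CDM parameters, and rule 3 is interpreted here as "cap the value at one below the cluster maximum".

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Hypothetical description of a migration job (not a CDM data structure)."""
    destination_is_file: bool = False
    max_row_size_mb: float = 0.0
    source_is_hive_jdbc: bool = False
    destination_is_dli: bool = False


def suggest_extractors(profile: JobProfile, desired: int, cluster_max: int) -> int:
    """Return a Concurrent Extractors value that respects the five rules."""
    # Rules 1, 2, 4, and 5: these cases call for a single extractor.
    if (
        profile.destination_is_file
        or profile.max_row_size_mb > 1
        or profile.source_is_hive_jdbc
        or profile.destination_is_dli
    ):
        return 1
    # Rule 3 (assumed interpretation): stay below the cluster maximum,
    # but never drop below one extractor.
    return max(1, min(desired, cluster_max - 1))


print(suggest_extractors(JobProfile(), desired=8, cluster_max=16))
print(suggest_extractors(JobProfile(destination_is_dli=True), desired=8, cluster_max=16))
print(suggest_extractors(JobProfile(), desired=40, cluster_max=16))
```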

Scheduled Backup/Restoration

This function depends on the OBS service. Backup files are not aged out automatically, so you must delete them manually on a regular basis.

Prerequisites
An OBS link has been created. For details, see OBS Link Parameters.

Scheduled backup

On the Job Management page, click Settings and configure Scheduled Backup and its related parameters.

**Table 2** Scheduled backup parameters
| Parameter | Description | Example Value |
| --- | --- | --- |
| Scheduled Backup | Whether to enable automatic backup. Only jobs are backed up; links are not. | Enable |
| Backup Policy | All jobs: CDM backs up all table/file migration jobs and entire DB migration jobs regardless of job status. Historical jobs are not backed up. All jobs by groups: CDM backs up only the job groups you select (one or more). | All jobs |
| Backup Cycle | How often the backup runs. Day: daily at 00:00:00. Week: every Monday at 00:00:00. Month: on the first day of each month at 00:00:00. | Day |
| OBS Link for Writing Backups | Link used to back up jobs to OBS buckets. Select a link you have created on the Links page. | obslink |
| OBS Bucket | OBS bucket where backup files are stored. | cdm |
| Backup Data Directory | Directory where backup files are stored. | /cdm-bk/ |
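To make the Backup Cycle options concrete, the following sketch computes when the next backup after a given moment would run under each cycle. It only illustrates the schedule described in Table 2; the `next_backup` function is hypothetical, and the time zone CDM uses for 00:00:00 is not stated here, so naive datetimes are assumed.

```python
from datetime import datetime, timedelta


def next_backup(now: datetime, cycle: str) -> datetime:
    """Return the first scheduled backup time strictly after `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if cycle == "Day":
        # Daily at 00:00:00.
        return midnight + timedelta(days=1)
    if cycle == "Week":
        # Every Monday at 00:00:00; weekday() is 0 for Monday.
        return midnight + timedelta(days=7 - now.weekday())
    if cycle == "Month":
        # First day of the next month at 00:00:00.
        if now.month == 12:
            return midnight.replace(year=now.year + 1, month=1, day=1)
        return midnight.replace(month=now.month + 1, day=1)
    raise ValueError(f"unknown backup cycle: {cycle!r}")


now = datetime(2025, 9, 2, 10, 44, 13)
for cycle in ("Day", "Week", "Month"):
    print(cycle, next_backup(now, cycle))
```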

Restoring jobs
If automatic backup has been performed, the backup list is displayed on the Configuration Management tab page. The OBS buckets where the backup files reside, backup paths, and backup time are displayed.

You can click Restore Backup in the Operation column of the backup list to restore the CDM jobs.

Environment Variables of Job Parameters

When you create a migration job on CDM, any manually configurable parameter (such as an OBS bucket name or file path), a field in a parameter, or a character in a field can be set as a global variable. This lets you change parameter values in batches, or replace certain characters in batches after jobs are exported or imported.
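The idea of substituting global variables into job parameters can be illustrated with Python's standard `string.Template`. This is only an analogy: the `${name}` placeholder syntax, the parameter names, and the `resolve` helper are assumptions made for this sketch, not CDM's documented format.

```python
from string import Template

# Hypothetical global variables shared by several jobs.
variables = {"bucket": "cdm", "backup_dir": "cdm-bk"}

# Hypothetical job parameters that reference those variables, either as a
# whole value or as part of a field.
job_params = {
    "bucket_name": "${bucket}",
    "output_directory": "/${backup_dir}/2025/",
}


def resolve(params: dict, env: dict) -> dict:
    """Replace every ${name} placeholder with its value from env.

    Template.substitute raises KeyError if a placeholder has no value,
    which surfaces a missing variable instead of silently keeping it.
    """
    return {key: Template(value).substitute(env) for key, value in params.items()}


resolved = resolve(job_params, variables)
print(resolved)

# Changing one variable updates every parameter that references it.
updated = resolve(job_params, {**variables, "bucket": "cdm-prod"})
print(updated)
```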

