This guide applies to Flink 1.12 only.
Change Data Capture (CDC) synchronizes incremental changes from a source database to one or more destinations. During synchronization, CDC can also process the data, for example, by grouping it (GROUP BY) or joining multiple tables (JOIN).
This example creates a PostgreSQL CDC source table to monitor PostgreSQL data changes and inserts the changed data into a GaussDB(DWS) database.
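As a minimal sketch of such processing (the table orders_src and its columns are hypothetical stand-ins; the actual source table for this example is defined later in this guide), an aggregation can run directly on a CDC change stream:

-- Hypothetical illustration: aggregate a CDC change stream by area.
-- orders_src stands for any CDC source table defined with a connector such as postgres-cdc.
select area_id, sum(pay_amount) as total_pay
from orders_src
group by area_id;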
The version of the RDS PostgreSQL database cannot be earlier than 11.
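To confirm that the instance meets this requirement, you can run a quick check in any PostgreSQL client (the reported version should be 11 or later):

-- Returns the PostgreSQL server version string
SELECT version();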
Step 2: Create an RDS PostgreSQL Database and Table
Step 3: Create a GaussDB(DWS) Database and Table
The queue name can contain only digits, letters, and underscores (_), but cannot contain only digits or start with an underscore (_). The name must contain 1 to 128 characters.
The queue name is case-insensitive. Uppercase letters will be automatically converted to lowercase letters.
The CIDR block of the queue cannot overlap with the CIDR block of the data source (the RDS PostgreSQL instance in this example). Otherwise, datasource connections will fail to be created.
create table test.cdc_order(
  order_id VARCHAR,
  order_channel VARCHAR,
  order_time VARCHAR,
  pay_amount FLOAT8,
  real_pay FLOAT8,
  pay_time VARCHAR,
  user_id VARCHAR,
  user_name VARCHAR,
  area_id VARCHAR,
  primary key(order_id)
);
ALTER TABLE test.cdc_order REPLICA IDENTITY FULL;
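To confirm that the replica identity took effect, you can query the PostgreSQL catalog ('f' in the relreplident column indicates REPLICA IDENTITY FULL):

-- Expected result: f (FULL)
SELECT relreplident FROM pg_class WHERE oid = 'test.cdc_order'::regclass;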
gsql -d gaussdb -h <connection address of the GaussDB(DWS) cluster> -U dbadmin -p 8000 -W <password> -r
CREATE DATABASE testdwsdb;
\q
gsql -d testdwsdb -h <connection address of the GaussDB(DWS) cluster> -U dbadmin -p 8000 -W <password> -r
create schema test;
set current_schema = test;
drop table if exists dws_order;
CREATE TABLE dws_order (
  order_id VARCHAR,
  order_channel VARCHAR,
  order_time VARCHAR,
  pay_amount FLOAT8,
  real_pay FLOAT8,
  pay_time VARCHAR,
  user_id VARCHAR,
  user_name VARCHAR,
  area_id VARCHAR
);
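Optionally, verify the table structure before wiring up the job (a quick check using gsql's \d meta-command):

\d test.dws_order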
Click OK. Click the name of the created datasource connection to view its status. You can perform subsequent steps only after the connection status changes to Active.
In this example, the syntax version of Flink OpenSource SQL is 1.12, the data source is a PostgreSQL CDC source table, and the result data is written to GaussDB(DWS).
Parameter | Description
---|---
Queue | A shared queue is selected by default. You can select a CCE queue with dedicated resources and configure the following parameters: UDF Jar: UDF JAR file. Before selecting such a file, upload the corresponding JAR file to the OBS bucket and choose Data Management > Package Management to create a package. In SQL, you can call a UDF that is packaged in the JAR file. NOTE: When creating a job, a sub-user can only select a queue that has been allocated to the user. If the remaining capacity of the selected queue cannot meet the job requirements, the system automatically scales up the capacity and you will be billed based on the increased capacity. When a queue is idle, the system automatically scales in its capacity.
CUs | Total number of CUs used by the job, that is, the sum of the compute CUs and the Job Manager CUs. One CU equals 1 vCPU and 4 GB of memory. The value cannot exceed the number of CUs in the bound queue.
Job Manager CUs | Number of CUs of the management unit.
Parallelism | Maximum number of tasks that the Flink OpenSource SQL job runs in parallel. NOTE: This value cannot be greater than four times the number of compute CUs (total CUs minus Job Manager CUs).
Task Manager Configuration | Whether to set Task Manager resource parameters. If this option is selected, you need to set the related Task Manager parameters.
OBS Bucket | OBS bucket used to store job logs and checkpoint information. If the selected OBS bucket is not authorized, click Authorize.
Save Job Log | Whether to save job run logs to OBS. The logs are saved in Bucket name/jobs/logs/Directory starting with the job ID. CAUTION: You are advised to configure this parameter. Otherwise, no run log is generated after the job is executed, and if the job fails, the run log cannot be obtained for fault locating. If this option is selected, set OBS Bucket: select an OBS bucket to store the job logs; if the selected OBS bucket is not authorized, click Authorize. NOTE: If Enable Checkpointing and Save Job Log are both selected, you only need to authorize OBS once.
Alarm Generation upon Job Exception | Whether to notify users of job exceptions, such as running exceptions or arrears, via SMS or email. If this option is selected, set SMN Topic: select a user-defined SMN topic. For details about how to create a custom SMN topic, see "Creating a Topic" in the Simple Message Notification User Guide.
Enable Checkpointing | Whether to enable job snapshots. If this function is enabled, jobs can be restored from checkpoints. If this option is selected, you need to set the related checkpointing parameters.
Auto Restart upon Exception | Whether to enable automatic restart. If this function is enabled, jobs are automatically restarted and restored when exceptions occur. If this option is selected, you need to set the related restart parameters.
Idle State Retention Time | How long the state of a key is retained without being updated before it is removed in GroupBy or Window. The default value is 1 hour.
Dirty Data Policy | Policy for processing dirty data. The following policies are supported: Ignore, Trigger a job exception, and Save. If you set this field to Save, Dirty Data Dump Address must also be set: click the address box to select the OBS path for storing dirty data.
create table PostgreCdcSource(
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string,
  primary key (order_id) not enforced
) with (
  'connector' = 'postgres-cdc',
  'hostname' = '192.168.15.153', -- IP address of the PostgreSQL instance
  'port' = '5432', -- Port number of the PostgreSQL instance
  'pwd_auth_name' = 'xxxxx', -- Name of the password-type datasource authentication created on DLI. If datasource authentication is used, you do not need to set a username and password for the job.
  'database-name' = 'testrdsdb', -- Database name of the PostgreSQL instance
  'schema-name' = 'test', -- Schema in the PostgreSQL database
  'table-name' = 'cdc_order' -- Table name in the PostgreSQL database
);

create table dwsSink(
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string,
  primary key (order_id) not enforced
) with (
  'connector' = 'gaussdb',
  'driver' = 'com.gauss200.jdbc.Driver',
  'url' = 'jdbc:gaussdb://192.168.168.16:8000/testdwsdb', -- 192.168.168.16:8000 is the internal IP address and port of the GaussDB(DWS) instance; testdwsdb is the created GaussDB(DWS) database.
  'table-name' = 'test\".\"dws_order', -- test is the schema of the created GaussDB(DWS) table; dws_order is the table name.
  'username' = 'xxxxx', -- Username of the GaussDB(DWS) instance
  'password' = 'xxxxx', -- Password of the GaussDB(DWS) instance
  'write.mode' = 'insert'
);

insert into dwsSink
select * from PostgreCdcSource
where pay_amount > 100;
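If you prefer not to use DLI datasource authentication, the postgres-cdc connector also accepts explicit credentials. A minimal sketch of the source table with the 'pwd_auth_name' option replaced by a username and password (both credential values are placeholders):

create table PostgreCdcSource(
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string,
  primary key (order_id) not enforced
) with (
  'connector' = 'postgres-cdc',
  'hostname' = '192.168.15.153',
  'port' = '5432',
  'username' = 'xxxxx', -- PostgreSQL username (placeholder)
  'password' = 'xxxxx', -- PostgreSQL password (placeholder)
  'database-name' = 'testrdsdb',
  'schema-name' = 'test',
  'table-name' = 'cdc_order'
);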
insert into test.cdc_order values
('202103241000000001','webShop','2021-03-24 10:00:00','50.00','100.00','2021-03-24 10:02:03','0001','Alice','330106'),
('202103251606060001','appShop','2021-03-24 12:06:06','200.00','180.00','2021-03-24 16:10:06','0002','Jason','330106'),
('202103261000000001','webShop','2021-03-24 14:03:00','300.00','100.00','2021-03-24 10:02:03','0003','Lily','330106'),
('202103271606060001','appShop','2021-03-24 16:36:06','99.00','150.00','2021-03-24 16:10:06','0001','Henry','330106');
gsql -d testdwsdb -h <connection address of the GaussDB(DWS) cluster> -U dbadmin -p 8000 -W <password> -r
select * from test.dws_order;
order_id | order_channel | order_time | pay_amount | real_pay | pay_time | user_id | user_name | area_id
---|---|---|---|---|---|---|---|---
202103251606060001 | appShop | 2021-03-24 12:06:06 | 200.0 | 180.0 | 2021-03-24 16:10:06 | 0002 | Jason | 330106
202103261000000001 | webShop | 2021-03-24 14:03:00 | 300.0 | 100.0 | 2021-03-24 10:02:03 | 0003 | Lily | 330106
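To see the pay_amount > 100 filter at work, you can insert one more row below the threshold in PostgreSQL (the order ID and user below are hypothetical) and confirm that it is not synchronized:

-- Run in PostgreSQL: this change is captured by CDC but filtered out by the Flink job (pay_amount <= 100).
insert into test.cdc_order values
('202103281000000001','webShop','2021-03-24 18:00:00','60.00','60.00','2021-03-24 18:02:03','0004','Mike','330106');

-- Run in GaussDB(DWS): no row should be returned.
select * from test.dws_order where order_id = '202103281000000001';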