Reading Data from Kafka and Writing Data to GaussDB(DWS)

This guide applies to Flink 1.12 only.

Description

This example analyzes real-time vehicle driving data and selects the records that meet specific conditions. The real-time vehicle driving data is read from a Kafka source table, and the analysis result is written to GaussDB(DWS).

For example, enter the following sample data:

{"car_id":"3027", "car_owner":"lilei", "car_age":"7", "average_speed":"76", "total_miles":"15000"}
{"car_id":"3028", "car_owner":"hanmeimei", "car_age":"6", "average_speed":"92", "total_miles":"17000"}
{"car_id":"3029", "car_owner":"Ann", "car_age":"10", "average_speed":"81", "total_miles":"230000"}

The expected output contains only the vehicles that meet the condition average_speed <= 90 and total_miles <= 200000:

{"car_id":"3027", "car_owner":"lilei", "car_age":"7", "average_speed":"76", "total_miles":"15000"}
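
The selection condition can be checked offline against the sample data. The following is a minimal sketch in plain Python, not Flink code; the function name is_qualified and its default thresholds are assumptions introduced for illustration.

```python
import json

# Sample records copied from this example; every value is a JSON string.
SAMPLE_LINES = [
    '{"car_id":"3027", "car_owner":"lilei", "car_age":"7", "average_speed":"76", "total_miles":"15000"}',
    '{"car_id":"3028", "car_owner":"hanmeimei", "car_age":"6", "average_speed":"92", "total_miles":"17000"}',
    '{"car_id":"3029", "car_owner":"Ann", "car_age":"10", "average_speed":"81", "total_miles":"230000"}',
]


def is_qualified(record, max_speed=90.0, max_miles=200000.0):
    """Return True if the record meets both limits of the example condition."""
    return (float(record["average_speed"]) <= max_speed
            and float(record["total_miles"]) <= max_miles)


qualified = [r for r in map(json.loads, SAMPLE_LINES) if is_qualified(r)]
print([r["car_id"] for r in qualified])  # ['3027']
```

Vehicle 3028 is excluded because its average speed exceeds 90, and vehicle 3029 is excluded because its total mileage exceeds 200000.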

Prerequisites

  1. You have created a DMS for Kafka instance.

    When you create the instance, do not enable Kafka SASL_SSL.

  2. You have created a GaussDB(DWS) instance.

Overall Development Process

Figure 1 Job development process

Step 1: Create a Queue

Step 2: Create a Kafka Topic

Step 3: Create a GaussDB(DWS) Database and Table

Step 4: Create an Enhanced Datasource Connection

Step 5: Run a Job

Step 6: Send Data and Query Results

Step 1: Create a Queue

  1. Log in to the DLI console. In the navigation pane on the left, choose Resources > Queue Management.
  2. On the displayed page, click Buy Queue in the upper right corner.
  3. On the Buy Queue page, set queue parameters as follows:
    • Billing Mode: .
    • Region and Project: Retain the default values.
    • Name: Enter a queue name.

      The queue name can contain only digits, letters, and underscores (_), but cannot contain only digits or start with an underscore (_). The name must contain 1 to 128 characters.

      The queue name is case-insensitive. Uppercase letters will be automatically converted to lowercase letters.

    • Type: Select For general purpose, and then select Dedicated Resource Mode.
    • AZ Mode and Specifications: Retain the default values.
    • Enterprise Project: Select default.
    • Advanced Settings: Select Custom.
    • CIDR Block: Specify the queue network segment. For example, 10.0.0.0/16.

      The CIDR block of a queue cannot overlap with the CIDR blocks of the DMS for Kafka and GaussDB(DWS) instances. Otherwise, datasource connections will fail to be created.

    • Set other parameters as required.
  4. Click Buy. Confirm the configuration and click Submit.
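
The queue naming rules and the CIDR block restriction in this step can be checked before you submit the order. The following is a minimal sketch that uses only the Python standard library. It assumes that "letters" means ASCII letters. The function names and the instance subnets in the usage lines are hypothetical; replace the subnets with the ones used by your instances.

```python
import ipaddress
import re

# Assumption: "letters" means ASCII letters. The pattern allows 1 to 128
# characters and rejects a leading underscore.
QUEUE_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]{0,127}")


def normalize_queue_name(name):
    """Return the name in lowercase, or raise ValueError if it breaks a rule."""
    if not QUEUE_NAME_PATTERN.fullmatch(name):
        raise ValueError("use 1 to 128 letters, digits, or underscores; "
                         "do not start with an underscore")
    if name.isdigit():
        raise ValueError("the name cannot contain only digits")
    # Uppercase letters are converted to lowercase, as described above.
    return name.lower()


def find_overlaps(queue_cidr, instance_cidrs):
    """Return the instance CIDR blocks that overlap with the queue CIDR block."""
    queue_net = ipaddress.ip_network(queue_cidr)
    return [cidr for cidr in instance_cidrs
            if queue_net.overlaps(ipaddress.ip_network(cidr))]


print(normalize_queue_name("Flink_Queue_1"))
# Hypothetical instance subnets; an empty list means no overlap was found.
print(find_overlaps("10.0.0.0/16", ["10.128.0.0/24", "192.168.168.0/24"]))
```

If find_overlaps returns a non-empty list, choose a different CIDR block for the queue.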

Step 2: Create a Kafka Topic

  1. On the Kafka management console, click an instance name on the DMS for Kafka page. Basic information of the Kafka instance is displayed.
  2. Choose Topics in the navigation pane on the left. On the displayed page, click Create Topic. Configure the following parameters:
    • Topic Name: For this example, enter testkafkatopic.
    • Partitions: Set the value to 1.
    • Replicas: Set the value to 1.

    Retain default values for other parameters.

Step 3: Create a GaussDB(DWS) Database and Table

  1. Connect to the created GaussDB(DWS) cluster.
  2. Connect to the default database gaussdb of a GaussDB(DWS) cluster.
    gsql -d gaussdb -h Connection address of the GaussDB(DWS) cluster -U dbadmin -p 8000 -W password -r
    • gaussdb: Default database of the GaussDB(DWS) cluster
    • Connection address of the GaussDB(DWS) cluster: To connect over a public network, set this parameter to the public network IP address or domain name. To connect over a private network, set this parameter to the private network IP address or domain name. To connect through an ELB, set this parameter to the ELB address.
    • dbadmin: Default administrator username used during cluster creation
    • password: Default password of the administrator
  3. Run the following command to create the testdwsdb database:
    CREATE DATABASE testdwsdb;
  4. Run the following commands to exit the gaussdb database and connect to testdwsdb:
    \q
    gsql -d testdwsdb -h Connection address of the GaussDB(DWS) cluster -U dbadmin -p 8000 -W password -r
  5. Run the following commands to create a table:
    CREATE SCHEMA test;
    SET current_schema = test;
    DROP TABLE IF EXISTS qualified_cars;
    CREATE TABLE qualified_cars
    (
        car_id VARCHAR,
        car_owner VARCHAR,
        car_age INTEGER,
        average_speed FLOAT8,
        total_miles FLOAT8
    );

Step 4: Create an Enhanced Datasource Connection

Step 5: Run a Job

  1. On the DLI management console, choose Job Management > Flink Jobs. On the Flink Jobs page, click Create Job.
  2. In the Create Job dialog box, set Type to Flink OpenSource SQL and Name to FlinkKafkaDWS. Click OK.
  3. On the job editing page, set the following parameters and retain the default values of other parameters.
    • Queue: Select the queue created in Step 1: Create a Queue.
    • Flink Version: Select 1.12.
    • Save Job Log: Enable this function.
    • OBS Bucket: Select an OBS bucket for storing job logs and grant access permissions of the OBS bucket as prompted.
    • Enable Checkpointing: Enable this function.
    • Enter a SQL statement in the editing pane. The following is an example. Modify the parameter values to match your environment.

      This example uses Flink OpenSource SQL 1.12 syntax. The data source is Kafka, and the result data is written to GaussDB(DWS).

      create table car_infos (
        car_id STRING,
        car_owner STRING,
        car_age INT,
        average_speed DOUBLE,
        total_miles DOUBLE
      ) with (
        'connector' = 'kafka',
        'properties.bootstrap.servers' = '10.128.0.120:9092,10.128.0.89:9092,10.128.0.83:9092', -- Internal network addresses and port numbers of the Kafka instance
        'properties.group.id' = 'click',
        'topic' = 'testkafkatopic', -- Created Kafka topic
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
      );
      
      create table qualified_cars (
        car_id STRING,
        car_owner STRING,
        car_age INT,
        average_speed DOUBLE,
        total_miles DOUBLE
      )
      WITH (
        'connector' = 'gaussdb',
        'driver' = 'com.gauss200.jdbc.Driver',
        'url' = 'jdbc:gaussdb://192.168.168.16:8000/testdwsdb', -- 192.168.168.16:8000 indicates the internal IP address and port of the GaussDB(DWS) instance. testdwsdb indicates the name of the created GaussDB(DWS) database.
        'table-name' = 'test\".\"qualified_cars', -- test indicates the schema of the created GaussDB(DWS) table, and qualified_cars indicates the GaussDB(DWS) table name.
        'pwd_auth_name' = 'xxxxx', -- Name of the password-type datasource authentication created on DLI. If datasource authentication is used, you do not need to set the username and password for the job.
        'write.mode' = 'insert'
      );
      
      /** Output information about qualified vehicles **/
      INSERT INTO qualified_cars
      SELECT *
      FROM car_infos
      where average_speed <= 90 and total_miles <= 200000;
  4. Click Check Semantic and ensure that the SQL statement passes the check. Click Save. Click Start, confirm the job parameters, and click Start Now to execute the job. Wait until the job status changes to Running.
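
Both tables in the job declare car_age as INT and average_speed and total_miles as DOUBLE, whereas the sample messages carry every value as a JSON string. The following minimal sketch shows the row that the first sample message corresponds to, assuming the string values are parsed into the declared column types. It is plain Python, not Flink code, and SCHEMA and to_row are hypothetical names used for illustration only.

```python
import json

# Column names and Python equivalents of the types declared in the job:
# STRING -> str, INT -> int, DOUBLE -> float.
SCHEMA = [
    ("car_id", str),
    ("car_owner", str),
    ("car_age", int),
    ("average_speed", float),
    ("total_miles", float),
]


def to_row(message):
    """Parse one JSON message and cast each field to its declared type."""
    record = json.loads(message)
    return tuple(cast(record[name]) for name, cast in SCHEMA)


row = to_row('{"car_id":"3027", "car_owner":"lilei", "car_age":"7", '
             '"average_speed":"76", "total_miles":"15000"}')
print(row)  # ('3027', 'lilei', 7, 76.0, 15000.0)
```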

Step 6: Send Data and Query Results

  1. Use the Kafka client to send data to the topic created in Step 2: Create a Kafka Topic to simulate a real-time data stream.

    The sample data is as follows:

    {"car_id":"3027", "car_owner":"lilei", "car_age":"7", "average_speed":"76", "total_miles":"15000"}
    {"car_id":"3028", "car_owner":"hanmeimei", "car_age":"6", "average_speed":"92", "total_miles":"17000"}
    {"car_id":"3029", "car_owner":"Ann", "car_age":"10", "average_speed":"81", "total_miles":"230000"}
  2. Connect to the created GaussDB(DWS) cluster.
  3. Connect to the testdwsdb database of the GaussDB(DWS) cluster.
    gsql -d testdwsdb -h Connection address of the GaussDB(DWS) cluster -U dbadmin -p 8000 -W password -r
  4. Run the following statement to query GaussDB(DWS) table data:
    select * from test.qualified_cars;
    The query result is as follows:
    car_id  car_owner  car_age  average_speed  total_miles
    3027    lilei      7        76.0           15000.0
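
The sample data in this step can also be sent from a script instead of a command-line Kafka client. The following is a minimal sketch, assuming that the third-party kafka-python package is installed and that the host running the script can reach the Kafka instance. The bootstrap addresses are the sample values used in this example, and the helper names encode and send_samples are hypothetical.

```python
import json

TOPIC = "testkafkatopic"  # Topic created in Step 2
# Sample broker addresses from this example; replace them with your own.
BOOTSTRAP_SERVERS = "10.128.0.120:9092,10.128.0.89:9092,10.128.0.83:9092"

SAMPLE_RECORDS = [
    {"car_id": "3027", "car_owner": "lilei", "car_age": "7",
     "average_speed": "76", "total_miles": "15000"},
    {"car_id": "3028", "car_owner": "hanmeimei", "car_age": "6",
     "average_speed": "92", "total_miles": "17000"},
    {"car_id": "3029", "car_owner": "Ann", "car_age": "10",
     "average_speed": "81", "total_miles": "230000"},
]


def encode(record):
    """Serialize one record as a UTF-8 encoded JSON message."""
    return json.dumps(record).encode("utf-8")


def send_samples(records=SAMPLE_RECORDS):
    """Send each record to the topic as one message."""
    # Assumption: kafka-python is installed (pip install kafka-python).
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP_SERVERS.split(","))
    for record in records:
        producer.send(TOPIC, value=encode(record))
    producer.flush()  # Block until all buffered messages are sent.
    producer.close()


# Uncomment on a host that can reach the Kafka instance:
# send_samples()
```

Because the job reads from the latest offset, send the data only after the job status changes to Running.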