DataGen is used to generate random data for debugging and testing.
| Type | Description |
|---|---|
| Supported Table Types | Source table |
create table dataGenSource(
  attr_name attr_type
  (',' attr_name attr_type)*
  (',' WATERMARK FOR rowtime_column_name AS watermark-strategy_expression)
)
with (
  'connector' = 'datagen'
);
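As a concrete instance of the syntax above, the following sketch declares a DataGen source with an event-time column and a watermark. The table name `ordersGen`, the column names, and the 5-second watermark delay are hypothetical and chosen only for illustration. The only connector option used is `'connector' = 'datagen'`, so every field falls back to the default random generator.

```sql
-- Hypothetical table and column names; adjust them to your own schema.
create table ordersGen (
  order_id bigint,
  price double,
  order_time timestamp(3),
  -- Assumed watermark strategy: tolerate events up to 5 seconds late.
  watermark for order_time as order_time - interval '5' second
) with (
  'connector' = 'datagen'
);
```

Because no `fields.#.*` options are set, the generated values are unconstrained random values of each column type, which is usually enough for a quick smoke test of downstream logic.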
| Parameter | Mandatory | Default Value | Data Type | Description |
|---|---|---|---|---|
| connector | Yes | None | String | Connector to be used. Set this parameter to datagen. |
| rows-per-second | No | 10000 | Long | Number of rows generated per second. This controls the emit rate. |
| number-of-rows | No | None | Long | Total number of rows to emit. By default, the number of generated rows is not limited. If a field uses a sequence generator, data generation stops when either the maximum number of rows has been reached or the sequence has reached its end value. |
| fields.#.kind | No | random | String | Generator of the # field. Replace # with the name of an actual field in the DataGen table. The # placeholder has the same meaning in the other parameters. The value can be sequence or random. |
| fields.#.min | No | Minimum value of the field type specified by # | Field type specified by # | This parameter is valid only when fields.#.kind is set to random. Minimum value of the random generator. It applies only to numeric field types specified by #. |
| fields.#.max | No | Maximum value of the field type specified by # | Field type specified by # | This parameter is valid only when fields.#.kind is set to random. Maximum value of the random generator. It applies only to numeric field types specified by #. |
| fields.#.max-past | No | 0 | Duration | This parameter is valid only when fields.#.kind is set to random. Maximum offset into the past, relative to the current time, of the timestamps produced by the random generator. It applies only to timestamp field types specified by #. |
| fields.#.length | No | 100 | Integer | This parameter is valid only when fields.#.kind is set to random. Length of the character strings generated by the random generator. It applies only to char, varchar, and string field types specified by #. |
| fields.#.start | No | None | Field type specified by # | This parameter is valid only when fields.#.kind is set to sequence. Start value of the sequence generator. |
| fields.#.end | No | None | Field type specified by # | This parameter is valid only when fields.#.kind is set to sequence. End value of the sequence generator. |
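The parameters above can be combined in a single table definition. The following is a minimal sketch, not a recommended configuration: the table name `boundedGen`, the column names, and all option values are hypothetical, and the duration literal `'1 h'` assumes the standard Flink duration format. It shows a bounded source in which `number-of-rows` is smaller than the sequence range, so generation would be expected to stop at the row limit before the sequence reaches its end value.

```sql
-- Hypothetical table; names and values are for illustration only.
create table boundedGen (
  id bigint,
  score int,
  code string,
  event_time timestamp(3)
) with (
  'connector' = 'datagen',
  'rows-per-second' = '5',                 -- Emit rate.
  'number-of-rows' = '50',                 -- Row limit, lower than the sequence range below.
  'fields.id.kind' = 'sequence',           -- Sequence generator for id.
  'fields.id.start' = '1',
  'fields.id.end' = '1000',
  'fields.score.kind' = 'random',          -- Random generator bounded to [0, 100].
  'fields.score.min' = '0',
  'fields.score.max' = '100',
  'fields.code.length' = '8',              -- code uses the default random generator.
  'fields.event_time.max-past' = '1 h'     -- Assumed duration format; timestamps up to one hour in the past.
);
```

Bounding a numeric field with `min` and `max` and shortening string fields with `length` tends to make the printed output easier to inspect during debugging.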
Create a Flink OpenSource SQL job. Run the following script to generate random data through the DataGen table and output the data to the Print result table.
create table dataGenSource(
  user_id string,
  amount int
) with (
  'connector' = 'datagen',
  'rows-per-second' = '1', -- Generates one row per second.
  'fields.user_id.kind' = 'random', -- Specifies a random generator for the user_id field.
  'fields.user_id.length' = '3', -- Limits the length of the user_id field to 3.
  'fields.amount.kind' = 'sequence', -- Specifies a sequence generator for the amount field.
  'fields.amount.start' = '1', -- Start value of the amount field.
  'fields.amount.end' = '1000' -- End value of the amount field.
);

create table printSink(
  user_id string,
  amount int
) with (
  'connector' = 'print'
);

insert into printSink select * from dataGenSource;
After the job is submitted, the job status changes to Running. You can use either of the following methods to view the output: