<a name="dli_08_15031"></a><a name="dli_08_15031"></a>
<h1 class="topictitle1">DataGen</h1>
<div id="body0000001737592992"><div class="section" id="dli_08_15031__section1598811329411"><h4 class="sectiontitle">Function</h4><p id="dli_08_15031__p138701039965">The DataGen connector generates random or sequential data for debugging and testing.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15031__table3954102713514" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Supported types</caption><thead align="left"><tr id="dli_08_15031__row139551727153515"><th align="left" class="cellrowborder" valign="top" width="33.87%" id="mcps1.3.1.3.2.3.1.1"><p id="dli_08_15031__p169550272355">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="66.13%" id="mcps1.3.1.3.2.3.1.2"><p id="dli_08_15031__p9955172713520">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15031__row595518271358"><td class="cellrowborder" valign="top" width="33.87%" headers="mcps1.3.1.3.2.3.1.1 "><p id="dli_08_15031__p4955182716353">Supported Table Types</p>
</td>
<td class="cellrowborder" valign="top" width="66.13%" headers="mcps1.3.1.3.2.3.1.2 "><p id="dli_08_15031__p1595518273356">Source table</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="dli_08_15031__section17666546857"><h4 class="sectiontitle">Caveats</h4><ul id="dli_08_15031__ul144771616134614"><li id="dli_08_15031__li164771916194619">The fields of a DataGen table cannot be of the Array, Map, or Row type. To produce values of these types, define a <strong id="dli_08_15031__b382012118440">COMPUTED COLUMN</strong> in <a href="dli_08_15006.html">CREATE TABLE</a> that builds them from the generated fields.</li><li id="dli_08_15031__li13608118132418">When you create a Flink OpenSource SQL job, set <strong id="dli_08_15031__dli_08_15029_b163001353185217">Flink Version</strong> to <strong id="dli_08_15031__dli_08_15029_b1430115539523">1.15</strong> on the <strong id="dli_08_15031__dli_08_15029_b1030175315523">Running Parameters</strong> tab, select <strong id="dli_08_15031__dli_08_15029_b430135325212">Save Job Log</strong>, and specify the OBS bucket for saving job logs.</li><li id="dli_08_15031__li980192610493">Storing authentication credentials such as usernames and passwords in code or in plaintext poses significant security risks. Use DEW to manage credentials instead: store encrypted credentials in configuration files or environment variables and decrypt them only when needed. For details, see .</li></ul>
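<p>The following DDL is a minimal sketch of the computed-column workaround described above. It assumes the Flink 1.15 value constructors <strong>ARRAY[...]</strong>, <strong>MAP[...]</strong>, and <strong>ROW(...)</strong> are accepted in computed columns on DLI; the table name, field names, and option values are hypothetical. DataGen fills only the scalar fields, and Flink evaluates the computed columns for each generated row.</p>
<pre class="screen">create table dataGenComplex(
  id int,
  score int,
  tag string,
  -- Hypothetical computed columns: DataGen generates id, score, and tag,
  -- and Flink derives the Array, Map, and Row values from them on read.
  score_list as ARRAY[id, score],
  tag_map as MAP['tag', tag],
  info as ROW(id, tag)
) with (
  'connector' = 'datagen',
  'rows-per-second' = '1',
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '1',
  'fields.id.end' = '100',
  'fields.score.min' = '0',
  'fields.score.max' = '100',
  'fields.tag.length' = '5'
);</pre>
<p>Computed columns are virtual, so no fields.#.* options are needed for them. If the constructors are rejected in your Flink version, the same expressions can be moved into the SELECT clause of the query that reads the table.</p>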
</div>
<div class="section" id="dli_08_15031__section69469501551"><h4 class="sectiontitle">Syntax</h4><pre class="screen" id="dli_08_15031__screen19281731397">create table dataGenSource(
  attr_name attr_type
  (',' attr_name attr_type)*
  (',' WATERMARK FOR rowtime_column_name AS watermark-strategy_expression)
)
with (
  'connector' = 'datagen'
);</pre>
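<p>To illustrate the syntax, the following sketch declares an event-time DataGen source. The table name, field names, rates, and the five-second watermark delay are assumptions chosen for the example. The rowtime column uses TIMESTAMP(3), and <strong>fields.#.max-past</strong> lets the random generator produce timestamps up to five seconds before the current time, which should keep generated rows within the watermark delay.</p>
<pre class="screen">create table orderSource(
  order_id bigint,
  price double,
  order_time timestamp(3),
  -- Event-time attribute that tolerates up to 5 seconds of out-of-order data
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) with (
  'connector' = 'datagen',
  'rows-per-second' = '10',
  'fields.order_id.kind' = 'sequence',  -- 1, 2, 3, ... up to 1000000
  'fields.order_id.start' = '1',
  'fields.order_id.end' = '1000000',
  'fields.price.min' = '1',             -- random prices between 1 and 500
  'fields.price.max' = '500',
  'fields.order_time.max-past' = '5 s'  -- random timestamps from 5 s ago up to now
);</pre>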
</div>
<div class="section" id="dli_08_15031__section0475313610"><h4 class="sectiontitle">Parameter Description</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15031__table517231215112" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Parameters</caption><thead align="left"><tr id="dli_08_15031__row6172712121117"><th align="left" class="cellrowborder" valign="top" width="11.72%" id="mcps1.3.4.2.2.6.1.1"><p id="dli_08_15031__p1417216126113">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="9.53%" id="mcps1.3.4.2.2.6.1.2"><p id="dli_08_15031__p161724126119">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="16.32%" id="mcps1.3.4.2.2.6.1.3"><p id="dli_08_15031__p16172151215110">Default Value</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="11.53%" id="mcps1.3.4.2.2.6.1.4"><p id="dli_08_15031__p19172512121111">Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50.9%" id="mcps1.3.4.2.2.6.1.5"><p id="dli_08_15031__p19172101221114">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15031__row8172141214114"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p1017211124118">connector</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p13172912101115">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p168024243501">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p141721112151113">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p15172161213119">Connector to be used. Set this parameter to <strong id="dli_08_15031__b120006448044045">datagen</strong>.</p>
</td>
</tr>
<tr id="dli_08_15031__row1172131216111"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p8172161218116">rows-per-second</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p131721127112">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p111721412111110">10000</p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p117211126113">Long</p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p121721112101118">Number of rows generated per second. Use this parameter to control the emit rate.</p>
</td>
</tr>
<tr id="dli_08_15031__row16544110151218"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p1354430121216">number-of-rows</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p954410019121">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p12544140141214">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p3544170161217">Long</p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p1654490161215">Total number of rows to emit. By default, the number of generated rows is unlimited. If a field uses the sequence generator, data generation stops as soon as either the maximum number of rows is reached or the sequence reaches its end value.</p>
</td>
</tr>
<tr id="dli_08_15031__row8172812171117"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p15172101291119">fields.#.kind</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p181721712171120">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p01721212141119">random</p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p41721712111118">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p31971475539">Generator used for the <strong id="dli_08_15031__b111610498744045">#</strong> field. <strong id="dli_08_15031__b132446440144045">#</strong> must be the name of a field that exists in the DataGen table; replace <strong id="dli_08_15031__b82448296844045">#</strong> with that field name. <strong id="dli_08_15031__b208576370244045">#</strong> has the same meaning in all the other parameters in this table.</p>
<p id="dli_08_15031__p8543610141814">The value can be <strong id="dli_08_15031__b111400711844045">sequence</strong> or <strong id="dli_08_15031__b53968837644045">random</strong>.</p>
<ul id="dli_08_15031__ul11237181812018"><li id="dli_08_15031__li129801523117"><strong id="dli_08_15031__b1328112710366">random</strong> (default): an unbounded random generator. Use <strong id="dli_08_15031__b121181072944045">fields.#.max</strong> and <strong id="dli_08_15031__b157454507944045">fields.#.min</strong> to specify the maximum and minimum generated values. For char, varchar, and string fields, use <strong id="dli_08_15031__b15343643844045">fields.#.length</strong> to specify the string length. For timestamp fields, use <strong id="dli_08_15031__b71441147121119">fields.#.max-past</strong> to specify the maximum offset into the past from the current time.</li><li id="dli_08_15031__li525092319018"><strong id="dli_08_15031__b1731416308125">sequence</strong>: a bounded sequence generator. Use <strong id="dli_08_15031__b281215641213">fields.#.start</strong> and <strong id="dli_08_15031__b17807133191316">fields.#.end</strong> to specify the start and end values of the sequence. Once the sequence reaches the end value, no more data is generated.</li></ul>
</td>
</tr>
<tr id="dli_08_15031__row1717213122117"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p151728122110">fields.#.min</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p71721912141111">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p17172191251117">Minimum value of the field type specified by <strong id="dli_08_15031__b100339631844045">#</strong></p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p1617221213117">Field type specified by <strong id="dli_08_15031__b72809986544045">#</strong></p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p420473212613">This parameter is valid only when <strong id="dli_08_15031__b73744834944045">fields.#.kind</strong> is set to <strong id="dli_08_15031__b11267253344045">random</strong>.</p>
<p id="dli_08_15031__p13172712121120">Minimum value generated by the random generator. It applies only when the field specified by <strong id="dli_08_15031__b20695360444045">#</strong> is of a numeric type.</p>
</td>
</tr>
<tr id="dli_08_15031__row8172121251110"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p17172212181117">fields.#.max</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p117201215119">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p1985120175617">Maximum value of the field type specified by <strong id="dli_08_15031__b141746208444045">#</strong></p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p1661330654">Field type specified by <strong id="dli_08_15031__b109821267744045">#</strong></p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p368316402811">This parameter is valid only when <strong id="dli_08_15031__b2183006544045">fields.#.kind</strong> is set to <strong id="dli_08_15031__b110764909944045">random</strong>.</p>
<p id="dli_08_15031__p10172912121116">Maximum value generated by the random generator. It applies only when the field specified by <strong id="dli_08_15031__b109109501044045">#</strong> is of a numeric type.</p>
</td>
</tr>
<tr id="dli_08_15031__row66591534162012"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p13659173411201">fields.#.max-past</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p56591134182019">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p14659134122017">0</p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p106591634132019">Duration</p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p823916108407">This parameter is valid only when <strong id="dli_08_15031__b147097076244045">fields.#.kind</strong> is set to <strong id="dli_08_15031__b75353631244045">random</strong>.</p>
<p id="dli_08_15031__p865973417207">Maximum offset into the past, relative to the current time, of the timestamps generated by the random generator. It applies only when the field specified by <strong id="dli_08_15031__b18336205391313">#</strong> is of a timestamp type.</p>
</td>
</tr>
<tr id="dli_08_15031__row417211219118"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p20172141217114">fields.#.length</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p131729128113">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p217281291119">100</p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p12172912181112">Integer</p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p36511201197">This parameter is valid only when <strong id="dli_08_15031__b178900662044045">fields.#.kind</strong> is set to <strong id="dli_08_15031__b133444771244045">random</strong>.</p>
<p id="dli_08_15031__p171727125117">Length of the strings generated by the random generator. It applies only when the field specified by <strong id="dli_08_15031__b85908916344045">#</strong> is of the char, varchar, or string type.</p>
</td>
</tr>
<tr id="dli_08_15031__row15172121213111"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p10172181231115">fields.#.start</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p1717281271114">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p217221281110">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p3432185819915">Field type specified by <strong id="dli_08_15031__b5897836944045">#</strong></p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p1067273311910">This parameter is valid only when <strong id="dli_08_15031__b170188773444045">fields.#.kind</strong> is set to <strong id="dli_08_15031__b80595396544045">sequence</strong>.</p>
<p id="dli_08_15031__p1917201261115">Start value of the sequence generator.</p>
</td>
</tr>
<tr id="dli_08_15031__row17172812201110"><td class="cellrowborder" valign="top" width="11.72%" headers="mcps1.3.4.2.2.6.1.1 "><p id="dli_08_15031__p9172712161114">fields.#.end</p>
</td>
<td class="cellrowborder" valign="top" width="9.53%" headers="mcps1.3.4.2.2.6.1.2 "><p id="dli_08_15031__p2172171271120">No</p>
</td>
<td class="cellrowborder" valign="top" width="16.32%" headers="mcps1.3.4.2.2.6.1.3 "><p id="dli_08_15031__p12172171241117">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.53%" headers="mcps1.3.4.2.2.6.1.4 "><p id="dli_08_15031__p817291271110">Field type specified by <strong id="dli_08_15031__b172641533344045">#</strong></p>
</td>
<td class="cellrowborder" valign="top" width="50.9%" headers="mcps1.3.4.2.2.6.1.5 "><p id="dli_08_15031__p172726421499">This parameter is valid only when <strong id="dli_08_15031__b26636673844045">fields.#.kind</strong> is set to <strong id="dli_08_15031__b3893391644045">sequence</strong>.</p>
<p id="dli_08_15031__p017251217118">End value of the sequence generator.</p>
</td>
</tr>
</tbody>
</table>
</div>
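<p>The following sketch combines the random-generator options from Table 2 in one table. The table name, field names, and all values are assumptions for illustration. Because <strong>fields.#.kind</strong> defaults to <strong>random</strong>, it is omitted; each field is instead bounded by its own min/max or length setting.</p>
<pre class="screen">create table userProfileSource(
  user_name string,
  age int,
  balance double
) with (
  'connector' = 'datagen',
  'rows-per-second' = '100',
  -- fields.#.kind is omitted, so every field uses the default random generator.
  'fields.user_name.length' = '8',  -- 8-character random strings
  'fields.age.min' = '18',          -- random ages between 18 and 60
  'fields.age.max' = '60',
  'fields.balance.min' = '0',       -- random balances between 0 and 10000
  'fields.balance.max' = '10000'
);</pre>
<p>Since no field uses the sequence generator and number-of-rows is not set, this source is unbounded and keeps emitting about 100 rows per second until the job is stopped.</p>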
</div>
<div class="section" id="dli_08_15031__section1572208365"><h4 class="sectiontitle">Example</h4><p id="dli_08_15031__p133815267248">Create a Flink OpenSource SQL job and run the following script. It generates data through the DataGen table and writes the data to the Print result table.</p>
<pre class="screen" id="dli_08_15031__screen1990817895210">create table dataGenSource(
  user_id string,
  amount int
) with (
  'connector' = 'datagen',
  'rows-per-second' = '1', -- Generates one row per second.
  'fields.user_id.kind' = 'random', -- Uses a random generator for the user_id field.
  'fields.user_id.length' = '3', -- Limits the length of the user_id field to 3 characters.
  'fields.amount.kind' = 'sequence', -- Uses a sequence generator for the <strong id="dli_08_15031__b3973222201510">amount</strong> field.
  'fields.amount.start' = '1', -- Start value of the <strong id="dli_08_15031__b67424171518">amount</strong> field
  'fields.amount.end' = '1000' -- End value of the <strong id="dli_08_15031__b688165131511">amount</strong> field
);

create table printSink(
  user_id string,
  amount int
) with (
  'connector' = 'print'
);

insert into printSink select * from dataGenSource;</pre>
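<p>In the script above, generation stops once the <strong>amount</strong> sequence reaches 1000. When every field uses the random generator, the source is unbounded unless <strong>number-of-rows</strong> is set. The following variant is a sketch with an assumed limit of 10 rows and assumed random bounds for <strong>amount</strong>; with it, the source should become bounded and the job should end after 10 rows are printed.</p>
<pre class="screen">create table dataGenSource(
  user_id string,
  amount int
) with (
  'connector' = 'datagen',
  'rows-per-second' = '1',
  'number-of-rows' = '10',        -- Stops after 10 rows in total.
  'fields.user_id.length' = '3',  -- Random 3-character strings
  'fields.amount.min' = '1',      -- Random amounts between 1 and 100
  'fields.amount.max' = '100'
);

create table printSink(
  user_id string,
  amount int
) with (
  'connector' = 'print'
);

insert into printSink select * from dataGenSource;</pre>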
<p id="dli_08_15031__p23018165418">After the job is submitted, its status changes to <strong id="dli_08_15031__b152814328891914">Running</strong>. Use either of the following methods to view the output:</p>
<ul id="dli_08_15031__ul133191514102210"><li id="dli_08_15031__li11970649381">Method 1:<ol id="dli_08_15031__ol4711275385"><li id="dli_08_15031__li5612141010388">Log in to the DLI console. In the navigation pane, choose <strong id="dli_08_15031__b13543276491927">Job Management</strong> > <strong id="dli_08_15031__b64336516991927">Flink Jobs</strong>.</li><li id="dli_08_15031__li117110711383">Locate the row that contains the target Flink job, and choose <strong id="dli_08_15031__b4539672191920">More</strong> > <strong id="dli_08_15031__b177002727391920">FlinkUI</strong> in the <strong id="dli_08_15031__b126476822691920">Operation</strong> column.</li><li id="dli_08_15031__li07833554385">On the Flink UI, choose <strong id="dli_08_15031__b70112549491921">Task Managers</strong>, click the task name, and select <strong id="dli_08_15031__b159390306591921">Stdout</strong> to view the job output.</li></ol>
</li><li id="dli_08_15031__li341910155285">Method 2: If you selected <strong id="dli_08_15031__b120018362491924">Save Job Log</strong> on the <strong id="dli_08_15031__b97127601291924">Running Parameters</strong> tab before submitting the job, perform the following operations:<ol id="dli_08_15031__ol864115198285"><li id="dli_08_15031__li10901621122819">Log in to the DLI console. In the navigation pane, choose <strong id="dli_08_15031__b156441691491927">Job Management</strong> > <strong id="dli_08_15031__b167760250591927">Flink Jobs</strong>.</li><li id="dli_08_15031__li1912163912282">Click the name of the Flink job, choose <strong id="dli_08_15031__b103083841991937">Run Log</strong>, click <strong id="dli_08_15031__b205495117691937">OBS Bucket</strong>, and locate the log folder for the date you want to view.</li><li id="dli_08_15031__li0641191914285">Open the date folder, find the folder whose name contains <strong id="dli_08_15031__b198559959592234">taskmanager</strong>, download the file whose name contains <strong id="dli_08_15031__b175184934192234">taskmanager.out</strong>, and view the result logs.</li></ol>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_15027.html">Connectors</a></div>
</div>
</div>