doc-exports/docs/dli/sqlreference/dli_08_15065.html
Su, Xiaomeng be9eabe464 dli_sqlreference_20250305
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2025-03-25 09:06:21 +00:00


<a name="dli_08_15065"></a><a name="dli_08_15065"></a>
<h1 class="topictitle1">Upsert Kafka</h1>
<div id="body0000001737753660"><div class="section" id="dli_08_15065__section77915506517"><h4 class="sectiontitle">Function</h4><p id="dli_08_15065__p1370804594310">Apache Kafka is a fast, scalable, and fault-tolerant distributed message publishing and subscription system. It delivers high throughput with built-in partitioning, data replication, and fault tolerance, which makes it suitable for handling massive volumes of messages. The Upsert Kafka connector reads data from and writes data to Kafka topics in upsert fashion. Source tables and result tables are supported.</p>
<ul id="dli_08_15065__ul19293135815715"><li id="dli_08_15065__li25071681185">As a source, the upsert-kafka connector produces a changelog stream, where each data record represents an update or delete event.<p id="dli_08_15065__p7437691583"><a name="dli_08_15065__li25071681185"></a><a name="li25071681185"></a>The value in a data record is interpreted as an UPDATE of the last value for the same key, if any (if a corresponding key does not exist yet, the UPDATE will be considered an INSERT). Using the table analogy, a data record in a changelog stream is interpreted as an UPSERT, also known as INSERT/UPDATE, because any existing row with the same key is overwritten. Also, null values are interpreted in a special way: A record with a null value represents a DELETE.</p>
</li><li id="dli_08_15065__li1429314581975">As a sink, the upsert-kafka connector can consume a changelog stream. It writes INSERT/UPDATE_AFTER data as normal Kafka message values and writes DELETE data as Kafka messages with null values (a tombstone for the key). Flink guarantees message ordering on the primary key by partitioning data on the values of the primary key columns, so UPDATE/DELETE messages on the same key fall into the same partition.</li></ul>
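<p>As a rough illustration of the sink behavior described above, the following sketch writes a continuously updated aggregate to an Upsert Kafka result table. The table names <strong>orders</strong> and <strong>payPerUser</strong>, the topic names, the consumer group, and the broker addresses are placeholders assumed for this example only.</p>
<pre class="screen">-- Hypothetical append-only source of orders (regular Kafka connector).
CREATE TABLE orders (
  order_id string,
  user_id string,
  real_pay double
) WITH (
  'connector' = 'kafka',
  'topic' = 'KafkaSourceTopic',
  'properties.bootstrap.servers' = 'KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort',
  'properties.group.id' = 'GroupId',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

-- Hypothetical upsert sink keyed by user_id.
-- The primary key becomes the Kafka message key.
CREATE TABLE payPerUser (
  user_id string,
  total_pay double,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'KafkaSinkTopic',
  'properties.bootstrap.servers' = 'KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort',
  'key.format' = 'json',
  'value.format' = 'json'
);

-- The GROUP BY produces an updating result: each new order for a user
-- emits a new total for that user's key.
INSERT INTO payPerUser
SELECT user_id, SUM(real_pay) AS total_pay
FROM orders
GROUP BY user_id;</pre>
<p>Under these assumptions, every new order for a user should produce another message with the same key and the latest total, so a consumer that keeps only the last value per key sees the current sum.</p>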
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15065__table3954102713514" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Supported types</caption><thead align="left"><tr id="dli_08_15065__row139551727153515"><th align="left" class="cellrowborder" valign="top" width="33.87%" id="mcps1.3.1.4.2.3.1.1"><p id="dli_08_15065__p169550272355">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="66.13%" id="mcps1.3.1.4.2.3.1.2"><p id="dli_08_15065__p9955172713520">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15065__row595518271358"><td class="cellrowborder" valign="top" width="33.87%" headers="mcps1.3.1.4.2.3.1.1 "><p id="dli_08_15065__p4955182716353">Supported Table Types</p>
</td>
<td class="cellrowborder" valign="top" width="66.13%" headers="mcps1.3.1.4.2.3.1.2 "><p id="dli_08_15065__p1595518273356">Source table and result table</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="dli_08_15065__dli_08_0256_en-us_topic_0132788972_section2579142713429"><h4 class="sectiontitle">Prerequisites</h4><div class="p" id="dli_08_15065__p12653145115206">An enhanced datasource connection has been created for DLI to connect to Kafka clusters, so that jobs can run on the dedicated queue of DLI and you can set the security group rules as required.
</div>
</div>
<div class="section" id="dli_08_15065__section1230618441125"><h4 class="sectiontitle">Caveats</h4><ul id="dli_08_15065__ul181711754173415"><li id="dli_08_15065__li13608118132418">When you create a Flink OpenSource SQL job, set <strong id="dli_08_15065__dli_08_15029_b163001353185217">Flink Version</strong> to <strong id="dli_08_15065__dli_08_15029_b1430115539523">1.15</strong> in the <strong id="dli_08_15065__dli_08_15029_b1030175315523">Running Parameters</strong> tab. Select <strong id="dli_08_15065__dli_08_15029_b430135325212">Save Job Log</strong>, and specify the OBS bucket for saving job logs.</li><li id="dli_08_15065__li980192610493">Storing authentication credentials such as usernames and passwords in code or plaintext poses significant security risks. You are advised to use DEW to manage credentials instead. Storing encrypted credentials in configuration files or environment variables and decrypting them when needed ensures security. For details, see .</li><li id="dli_08_15065__li19866203716378">The Upsert Kafka connector always works in upsert mode and requires a primary key to be defined in the DDL. Assuming that records with the same key are ordered within the same partition, the primary key semantics on the changelog source mean that the materialized changelog is unique on the primary keys. The primary key definition also controls which fields end up in the Kafka message key.</li><li id="dli_08_15065__li19218997204">Because the connector works in upsert mode, only the last record for a given key takes effect when the topic is read back as a source.</li><li id="dli_08_15065__li118441048194615">For details about how to use data types, see <a href="dli_08_15014.html">Format</a>.</li></ul>
</div>
<div class="section" id="dli_08_15065__section892013201167"><h4 class="sectiontitle">Syntax</h4><div class="codecoloring" codetype="Sql" id="dli_08_15065__dli_08_0256_screen1461215294716"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">kafkaTable</span><span class="p">(</span>
<span class="w"> </span><span class="n">attr_name</span><span class="w"> </span><span class="n">attr_type</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="s1">','</span><span class="w"> </span><span class="n">attr_name</span><span class="w"> </span><span class="n">attr_type</span><span class="p">)</span><span class="o">*</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="s1">','</span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">attr_name</span><span class="p">,</span><span class="w"> </span><span class="p">...)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">ENFORCED</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">with</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s1">'connector'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'upsert-kafka'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'topic'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'properties.bootstrap.servers'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'key.format'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'value.format'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span>
<span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
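<p>A filled-in instance of the syntax above might look as follows. This is only a sketch: the table name, column names, topic, and broker addresses are assumed placeholders. It uses a composite primary key to show the <strong>PRIMARY KEY (attr_name, ...)</strong> form.</p>
<pre class="screen">-- Hypothetical table; replace the topic and broker addresses with your own values.
CREATE TABLE userAreaStats (
  user_id string,
  area_id string,
  order_count bigint,
  -- Both columns together form the Kafka message key.
  PRIMARY KEY (user_id, area_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'KafkaTopic',
  'properties.bootstrap.servers' = 'KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort',
  'key.format' = 'csv',
  'value.format' = 'json'
);</pre>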
</div>
<div class="section" id="dli_08_15065__section893519312815"><h4 class="sectiontitle">Parameter Description</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15065__table82421231587" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Parameters</caption><thead align="left"><tr id="dli_08_15065__row13242831486"><th align="left" class="cellrowborder" valign="top" width="19.25%" id="mcps1.3.5.2.2.6.1.1"><p id="dli_08_15065__p524213118810">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="9.5%" id="mcps1.3.5.2.2.6.1.2"><p id="dli_08_15065__p1124218311813">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="9.93%" id="mcps1.3.5.2.2.6.1.3"><p id="dli_08_15065__p8335102131613">Default Value</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="11.19%" id="mcps1.3.5.2.2.6.1.4"><p id="dli_08_15065__p12962019151617">Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50.129999999999995%" id="mcps1.3.5.2.2.6.1.5"><p id="dli_08_15065__p15242163119815">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15065__row15242113115811"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p02421431888">connector</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p112425311817">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p12117414204814">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p229631913168">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p11242431287">Connector to be used. For the Upsert Kafka connector, set this parameter to <strong id="dli_08_15065__b3957144613018">upsert-kafka</strong>.</p>
</td>
</tr>
<tr id="dli_08_15065__row82421531587"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p2024317318813">topic</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p12431831982">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p8332518174816">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p1429741971613">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p12432311483">Kafka topic name</p>
</td>
</tr>
<tr id="dli_08_15065__row724317317820"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p924303118810">properties.bootstrap.servers</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p32435314815">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p1333611214161">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p162972194169">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p024319311282">Comma-separated list of Kafka brokers</p>
</td>
</tr>
<tr id="dli_08_15065__row17243203117811"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p112431031983">key.format</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p124317311385">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p183361526165">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p9297171920164">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p9603137143311">Format used to deserialize and serialize the key part of Kafka messages. The key fields are specified by the <strong id="dli_08_15065__b1522716261768">PRIMARY KEY</strong> syntax. The following formats are supported:</p>
<ul id="dli_08_15065__ul6122242153320"><li id="dli_08_15065__li19328194414334">csv</li><li id="dli_08_15065__li1156344893319">json</li><li id="dli_08_15065__li853832410492">avro</li></ul>
<p id="dli_08_15065__p953952912491">Refer to <a href="dli_08_15014.html">Format</a> for more details and format parameters.</p>
</td>
</tr>
<tr id="dli_08_15065__row1912194914110"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p179131249144115">key.fields-prefix</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p4913149114111">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p11800322104818">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p16913114917417">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p3744911145019">Defines a custom prefix for all fields of the key format to avoid name clashes with fields of the value format.</p>
<p id="dli_08_15065__p39131349154111">By default, the prefix is empty. If a custom prefix is defined, both the table schema and <strong id="dli_08_15065__b76225521762">key.fields</strong> will work with prefixed names. When constructing the data type of the key format, the prefix will be removed and the non-prefixed names will be used within the key format. Note that this option requires that <strong id="dli_08_15065__b6529751275">value.fields-include</strong> be set to <strong id="dli_08_15065__b11530151677">EXCEPT_KEY</strong>.</p>
</td>
</tr>
<tr id="dli_08_15065__row750314113432"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p145042011104317">value.format</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p85041711124316">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p81161246489">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p10504151120439">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p666811347347">Format used to deserialize and serialize the value part of Kafka messages. The following formats are supported:</p>
<ul id="dli_08_15065__ul19198153716341"><li id="dli_08_15065__li18422154073420">csv</li><li id="dli_08_15065__li19501446349">json</li><li id="dli_08_15065__li17198153714342">avro</li></ul>
<p id="dli_08_15065__p56981358165010">Refer to <a href="dli_08_15014.html">Format</a> for more details and format parameters.</p>
</td>
</tr>
<tr id="dli_08_15065__row4575174020442"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p0575154004417">value.fields-include</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p4576940134417">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p195761240124419">ALL</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p1957634016440">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p1576154015440">Controls which fields should appear in the value part. Possible values are:</p>
<ul id="dli_08_15065__ul198451237165110"><li id="dli_08_15065__li484553712516"><strong id="dli_08_15065__b12861553377">ALL</strong>: All fields in the schema, including the primary key field, are included in the value part.</li><li id="dli_08_15065__li16282847523"><strong id="dli_08_15065__b1850412572079">EXCEPT_KEY</strong>: All the fields of the table schema are included, except the primary key field.</li></ul>
</td>
</tr>
<tr id="dli_08_15065__row56454611484"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p18929131412486">properties.*</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p793014149489">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p1493061417486">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p1293061484819">String</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p91931621185216">This option can set and pass arbitrary Kafka configurations.</p>
<p id="dli_08_15065__p779051717532">The suffix to <strong id="dli_08_15065__b19244131117818">properties.</strong> must match the parameter defined in <a href="https://kafka.apache.org/documentation/#configuration" target="_blank" rel="noopener noreferrer">Kafka Configuration documentation</a>. Flink will remove the <strong id="dli_08_15065__b2093614784">properties.</strong> key prefix and pass the transformed key and value to the underlying KafkaClient.</p>
<p id="dli_08_15065__p7512441155310">For example, you can disable automatic topic creation via <strong id="dli_08_15065__b1976104913012">'properties.allow.auto.create.topics' = 'false'</strong>.</p>
<p id="dli_08_15065__p12930141413480">Some configurations cannot be set this way because Flink overrides them, for example, <strong id="dli_08_15065__b566414209818">'key.deserializer'</strong> and <strong id="dli_08_15065__b126650201988">'value.deserializer'</strong>.</p>
</td>
</tr>
<tr id="dli_08_15065__row20376833175016"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p3565185125013">sink.parallelism</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p13377133135012">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p143771133205013">None</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p143775338503">Integer</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p1837715330506">Defines the parallelism of the Upsert Kafka sink operator. By default, the framework determines the parallelism, using the same parallelism as the upstream chained operator.</p>
</td>
</tr>
<tr id="dli_08_15065__row681019388509"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p12434122795214">sink.buffer-flush.max-rows</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p8810133818506">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p1881019383505">0</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p68104387506">Integer</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p20768618205610">Maximum number of records buffered before a flush.</p>
<p id="dli_08_15065__p142265438318">When the sink receives many updates on the same key, the buffer retains only the last record for that key. This helps reduce data shuffling and avoid unnecessary tombstone messages in the Kafka topic. Set this parameter to <strong id="dli_08_15065__b11101104832010">0</strong> to disable buffering.</p>
<p id="dli_08_15065__p158572125617">By default, this is disabled. Note that both <strong id="dli_08_15065__b20802313192119">sink.buffer-flush.max-rows</strong> and <strong id="dli_08_15065__b455020172118">sink.buffer-flush.interval</strong> must be set to values greater than zero to enable sink buffer flushing.</p>
</td>
</tr>
<tr id="dli_08_15065__row035816437507"><td class="cellrowborder" valign="top" width="19.25%" headers="mcps1.3.5.2.2.6.1.1 "><p id="dli_08_15065__p61311416528">sink.buffer-flush.interval</p>
</td>
<td class="cellrowborder" valign="top" width="9.5%" headers="mcps1.3.5.2.2.6.1.2 "><p id="dli_08_15065__p8358443145019">No</p>
</td>
<td class="cellrowborder" valign="top" width="9.93%" headers="mcps1.3.5.2.2.6.1.3 "><p id="dli_08_15065__p5358164335018">0</p>
</td>
<td class="cellrowborder" valign="top" width="11.19%" headers="mcps1.3.5.2.2.6.1.4 "><p id="dli_08_15065__p835813430501">Duration</p>
</td>
<td class="cellrowborder" valign="top" width="50.129999999999995%" headers="mcps1.3.5.2.2.6.1.5 "><p id="dli_08_15065__p842492411317">Flush interval. When this interval elapses, asynchronous threads flush the buffered data. The unit can be millisecond (ms), second (s), minute (min), or hour (h). For example, <strong id="dli_08_15065__b6507105792412">'sink.buffer-flush.interval'='10 ms'</strong>.</p>
<p id="dli_08_15065__p13589439506">By default, this is disabled. Note that both <strong id="dli_08_15065__b928919611232">sink.buffer-flush.max-rows</strong> and <strong id="dli_08_15065__b182903612317">sink.buffer-flush.interval</strong> must be set to values greater than zero to enable sink buffer flushing.</p>
</td>
</tr>
</tbody>
</table>
</div>
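<p>The optional parameters in Table 2 can be combined as in the following sketch. The table name, column names, prefix <strong>k_</strong>, and buffer sizes are assumptions chosen for illustration, not recommended settings.</p>
<pre class="screen">-- Hypothetical result table showing key prefixing and sink buffering.
CREATE TABLE upsertKafkaSink (
  k_order_id string,          -- key column, declared with the k_ prefix
  order_channel string,
  pay_amount double,
  PRIMARY KEY (k_order_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'KafkaTopic',
  'properties.bootstrap.servers' = 'KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort',
  'key.format' = 'json',
  -- The prefix is stripped when the key is serialized,
  -- so the key format sees the field as order_id.
  'key.fields-prefix' = 'k_',
  'value.format' = 'json',
  -- Required when key.fields-prefix is set: keep key fields out of the value.
  'value.fields-include' = 'EXCEPT_KEY',
  -- Passed through to the Kafka client with the properties. prefix removed.
  'properties.allow.auto.create.topics' = 'false',
  -- Both buffer options must be greater than zero to enable buffering.
  'sink.buffer-flush.max-rows' = '1000',
  'sink.buffer-flush.interval' = '1 s'
);</pre>
<p>With these settings, the sink should flush when 1,000 records are buffered or 1 second has passed, whichever comes first, keeping only the last record per key within each batch.</p>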
</div>
<div class="section" id="dli_08_15065__section669818312571"><h4 class="sectiontitle">Metadata</h4><p id="dli_08_15065__p28301241115719">For a list of available metadata fields, see <a href="dli_08_15058.html#dli_08_15058__section9326019161710">Kafka Connector</a>.</p>
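<p>As a sketch of how metadata columns might be declared on an Upsert Kafka source table, the example below assumes that the metadata keys <strong>timestamp</strong>, <strong>partition</strong>, and <strong>offset</strong> of the open-source Flink Kafka connector are available. Check the Kafka Connector topic for the keys and data types actually supported. The table and column names are placeholders.</p>
<pre class="screen">-- Hypothetical source table exposing Kafka record metadata as columns.
CREATE TABLE upsertKafkaSource (
  order_id string,
  real_pay double,
  -- VIRTUAL columns are read-only and are excluded when writing.
  event_time TIMESTAMP_LTZ(3) METADATA FROM 'timestamp' VIRTUAL,
  kafka_partition INT METADATA FROM 'partition' VIRTUAL,
  kafka_offset BIGINT METADATA FROM 'offset' VIRTUAL,
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'KafkaTopic',
  'properties.bootstrap.servers' = 'KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort',
  'key.format' = 'csv',
  'value.format' = 'json'
);</pre>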
</div>
<div class="section" id="dli_08_15065__section59890504504"><h4 class="sectiontitle">Example</h4><ul id="dli_08_15065__ul169111047194317"><li id="dli_08_15065__li5128916184416"><strong id="dli_08_15065__b1288562182914">Example 1: This example reads data from a DMS Kafka data source and writes it to the Print result table.</strong><ol id="dli_08_15065__ol9976201974411"><li id="dli_08_15065__li1845952844412">Create an enhanced datasource connection in the VPC and subnet where Kafka is located, and bind the connection to the required Flink elastic resource pool.</li><li id="dli_08_15065__li496053314418">Set Kafka security groups and add inbound rules to allow access from the Flink queue. Test the connectivity using the Kafka address. If the connection passes the test, it is bound to the queue.</li><li id="dli_08_15065__li89840388448">Create a Flink OpenSource SQL job. Enter the following job script and submit the job.<div class="p" id="dli_08_15065__p8332044619"><a name="dli_08_15065__li89840388448"></a><a name="li89840388448"></a>When you create a job, set <strong id="dli_08_15065__b1810143720104">Flink Version</strong> to <strong id="dli_08_15065__b31113717108">1.15</strong> in the <strong id="dli_08_15065__b1312133751013">Running Parameters</strong> tab. Select <strong id="dli_08_15065__b1113937171017">Save Job Log</strong>, and specify the OBS bucket for saving job logs. <strong id="dli_08_15065__b8218311379953">Change the values of the parameters in bold as needed in the following script.</strong><pre class="screen" id="dli_08_15065__screen5311155384510">CREATE TABLE upsertKafkaSource (
order_id string,
order_channel string,
order_time string,
pay_amount double,
real_pay double,
pay_time string,
user_id string,
user_name string,
area_id string,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'upsert-kafka',
'topic' = '<em id="dli_08_15065__i103121753174514"><strong id="dli_08_15065__b63121532453">KafkaTopic</strong></em>',
'properties.bootstrap.servers' = '<em id="dli_08_15065__i4312185311455"><strong id="dli_08_15065__b15312053114519">Kafka</strong></em><em id="dli_08_15065__i133121353194516"><strong id="dli_08_15065__b13121853174511">Address1:KafkaPort,KafkaAddress2:KafkaPort</strong></em>',
'key.format' = 'csv',
'value.format' = 'json'
);
CREATE TABLE printSink (
order_id string,
order_channel string,
order_time string,
pay_amount double,
real_pay double,
pay_time string,
user_id string,
user_name string,
area_id string,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'print'
);
INSERT INTO printSink SELECT * FROM upsertKafkaSource;</pre>
</div>
</li><li id="dli_08_15065__li1868011202451">Insert the following data to the specified topics in Kafka. (Note: Specify the key when inserting data to Kafka.)<pre class="screen" id="dli_08_15065__screen939221534614">{"order_id":"202303251202020001", "order_channel":"miniAppShop", "order_time":"2023-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2023-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330110"}
{"order_id":"202303251505050001", "order_channel":"appshop", "order_time":"2023-03-25 15:05:05", "pay_amount":"500.00", "real_pay":"400.00", "pay_time":"2023-03-25 15:10:00", "user_id":"0003", "user_name":"Cindy", "area_id":"330108"}
{"order_id":"202303251202020001", "order_channel":"miniAppShop", "order_time":"2023-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2023-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330111"}</pre>
</li><li id="dli_08_15065__li18187113934517">View the <strong id="dli_08_15065__b1466114547260">out</strong> file of the TaskManager. The data results are as follows:<pre class="screen" id="dli_08_15065__screen7330246154614">+I(202303251202020001,miniAppShop,2023-03-25 12:02:02,60.0,60.0,2023-03-25 12:03:00,0002,Bob,330110)
+I(202303251505050001,appshop,2023-03-25 15:05:05,500.0,400.0,2023-03-25 15:10:00,0003,Cindy,330108)
-U(202303251202020001,miniAppShop,2023-03-25 12:02:02,60.0,60.0,2023-03-25 12:03:00,0002,Bob,330110)
+U(202303251202020001,miniAppShop,2023-03-25 12:02:02,60.0,60.0,2023-03-25 12:03:00,0002,Bob,330111)</pre>
</li></ol>
</li></ul>
</div>
<ul id="dli_08_15065__ul6850193624718"><li id="dli_08_15065__li4850123614713"><strong id="dli_08_15065__b11984153932914">Example 2: This example reads data from a DMS Kafka source topic through a Kafka source table and writes it to a Kafka sink topic through an Upsert Kafka result table.</strong><ol id="dli_08_15065__ol6897011184819"><li id="dli_08_15065__li118975114481">Create an enhanced datasource connection in the VPC and subnet where Kafka is located, and bind the connection to the required Flink elastic resource pool.</li><li id="dli_08_15065__li4126215184814">Set Kafka security groups and add inbound rules to allow access from the Flink queue. Test the connectivity using the Kafka address. If the connection passes the test, it is bound to the queue.</li><li id="dli_08_15065__li5712045101217">Create a Flink OpenSource SQL job. Enter the following job script and submit the job.<p id="dli_08_15065__p1111612171319"><a name="dli_08_15065__li5712045101217"></a><a name="li5712045101217"></a>When you create a job, set <strong id="dli_08_15065__b20533895535355">Flink Version</strong> to <strong id="dli_08_15065__b111938596735355">1.15</strong> in the <strong id="dli_08_15065__b186945659935355">Running Parameters</strong> tab. Select <strong id="dli_08_15065__b187881779535355">Save Job Log</strong>, and specify the OBS bucket for saving job logs. <strong id="dli_08_15065__b1949595018477">Change the values of the parameters in bold as needed in the following script.</strong></p>
<pre class="screen" id="dli_08_15065__screen910517456538">CREATE TABLE orders (
order_id string,
order_channel string,
order_time string,
pay_amount double,
real_pay double,
pay_time string,
user_id string,
user_name string,
area_id string
) WITH (
'connector' = 'kafka',
'topic' = '<em id="dli_08_15065__i5663552125810"><strong id="dli_08_15065__b544510556581">KafkaTopic</strong></em>',
'properties.bootstrap.servers' = '<em id="dli_08_15065__i1477713010594"><strong id="dli_08_15065__b14777309596">Kafka</strong></em><em id="dli_08_15065__i15795191714599"><strong id="dli_08_15065__b1454541665919">Address1:KafkaPort,KafkaAddress2:KafkaPort</strong></em>',
'properties.group.id' = '<em id="dli_08_15065__i48641216140"><strong id="dli_08_15065__b1258711171413">GroupId</strong></em>',
'scan.startup.mode' = 'latest-offset',
'format' = 'json'
);
CREATE TABLE upsertKafkaSink (
order_id string,
order_channel string,
order_time string,
pay_amount double,
real_pay double,
pay_time string,
user_id string,
user_name string,
area_id string,
PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
'connector' = 'upsert-kafka',
'topic' = '<em id="dli_08_15065__i1615711522219"><strong id="dli_08_15065__b131571315112220">KafkaTopic</strong></em>',
'properties.bootstrap.servers' = '<em id="dli_08_15065__i557924202211"><strong id="dli_08_15065__b155799414224">Kafka</strong></em><em id="dli_08_15065__i857913422217"><strong id="dli_08_15065__b45793442219">Address1:KafkaPort,KafkaAddress2:KafkaPort</strong></em>',
'key.format' = 'csv',
'value.format' = 'json'
);
insert into upsertKafkaSink select * from orders;</pre>
</li><li id="dli_08_15065__li975951604818">Connect to the Kafka cluster and send the following test data to the Kafka source topic:<pre class="screen" id="dli_08_15065__screen12321712549">{"order_id":"202303251202020001", "order_channel":"miniAppShop", "order_time":"2023-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2023-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330110"}
{"order_id":"202303251505050001", "order_channel":"appshop", "order_time":"2023-03-25 15:05:05", "pay_amount":"500.00", "real_pay":"400.00", "pay_time":"2023-03-25 15:10:00", "user_id":"0003", "user_name":"Cindy", "area_id":"330108"}
{"order_id":"202303251202020001", "order_channel":"miniAppShop", "order_time":"2023-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2023-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330111"}</pre>
</li><li id="dli_08_15065__li176961346201518">Connect to the Kafka cluster and read data from the Kafka sink topic. The result is as follows:<pre class="screen" id="dli_08_15065__screen17299105417107">{"order_id":"202303251202020001", "order_channel":"miniAppShop", "order_time":"2023-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2023-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330110"}
{"order_id":"202303251505050001", "order_channel":"appshop", "order_time":"2023-03-25 15:05:05", "pay_amount":"500.00", "real_pay":"400.00", "pay_time":"2023-03-25 15:10:00", "user_id":"0003", "user_name":"Cindy", "area_id":"330108"}
{"order_id":"202303251202020001", "order_channel":"miniAppShop", "order_time":"2023-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2023-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330111"}</pre>
</li></ol>
</li><li id="dli_08_15065__li053020568110"><strong id="dli_08_15065__b889717813010">Example 3: In this scenario, the MRS cluster has enabled Kerberos authentication and Kafka is using the SASL_PLAINTEXT protocol. Data is retrieved from a Kafka source table and written to the Print result table.</strong><ol id="dli_08_15065__ol1420175911129"><li id="dli_08_15065__li3389205133">Create an enhanced datasource connection in the VPC and subnet where the MRS cluster is located, and bind the connection to the required Flink elastic resource pool.</li><li id="dli_08_15065__li133899031310">Set MRS cluster security groups and add inbound rules to allow access from the Flink queue. Test the connectivity using the Kafka address. If the connection passes the test, it is bound to the queue.</li><li id="dli_08_15065__li2038912013133">Create a Flink OpenSource SQL job. Enter the following job script and submit the job.<p id="dli_08_15065__p1138916013135"><a name="dli_08_15065__li2038912013133"></a><a name="li2038912013133"></a>When you create a job, set <strong id="dli_08_15065__b3184454335355">Flink Version</strong> to <strong id="dli_08_15065__b137611478135355">1.15</strong> in the <strong id="dli_08_15065__b53223807735355">Running Parameters</strong> tab. Select <strong id="dli_08_15065__b125702232335355">Save Job Log</strong>, and specify the OBS bucket for saving job logs. <strong id="dli_08_15065__b2503209479953">Change the values of the parameters in bold as needed in the following script.</strong></p>
<pre class="screen" id="dli_08_15065__screen19163103012224">CREATE TABLE upsertKafkaSource (
order_id string,
order_channel string,
order_time string,
pay_amount double,
real_pay double,
pay_time string,
user_id string,
user_name string,
area_id string,
PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
'connector' = 'upsert-kafka',
'topic' = '<em id="dli_08_15065__i46091742172412"><strong id="dli_08_15065__b12609164282413">KafkaTopic</strong></em>',
'properties.bootstrap.servers' = '<em id="dli_08_15065__i1886314422264"><strong id="dli_08_15065__b8863114219268">Kafka</strong></em><em id="dli_08_15065__i98631142152612"><strong id="dli_08_15065__b1863642122618">Address1:KafkaPort,KafkaAddress2:KafkaPort</strong></em>',
'key.format' = 'csv',
'value.format' = 'json',
'properties.sasl.mechanism' = 'GSSAPI',
'properties.security.protocol' = 'SASL_PLAINTEXT',
'properties.sasl.kerberos.service.name' = 'kafka', -- <em id="dli_08_15065__i1491114518248">Configured in MRS</em>
'properties.connector.auth.open' = 'true',
'properties.connector.kerberos.principal' = '<strong id="dli_08_15065__b11978758165114">username</strong>', --Username
'properties.connector.kerberos.krb5' = '<strong id="dli_08_15065__b1128184113515">obs://xx/krb5.conf</strong>', --krb5_conf path
'properties.connector.kerberos.keytab' = '<strong id="dli_08_15065__b48499469511">obs://xx/user.keytab</strong>' --keytab path
);
CREATE TABLE printSink (
order_id string,
order_channel string,
order_time string,
pay_amount double,
real_pay double,
pay_time string,
user_id string,
user_name string,
area_id string,
PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
'connector' = 'print'
);
INSERT INTO printSink SELECT * FROM upsertKafkaSource;</pre>
</li><li id="dli_08_15065__li104051114191313">Insert the following data into the specified Kafka topic. (Note: Specify the key when inserting data into Kafka.)<pre class="screen" id="dli_08_15065__screen1286614414406">{"order_id":"202303251202020001", "order_channel":"miniAppShop", "order_time":"2023-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2023-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330110"}
{"order_id":"202303251505050001", "order_channel":"appshop", "order_time":"2023-03-25 15:05:05", "pay_amount":"500.00", "real_pay":"400.00", "pay_time":"2023-03-25 15:10:00", "user_id":"0003", "user_name":"Cindy", "area_id":"330108"}
{"order_id":"202303251202020001", "order_channel":"miniAppShop", "order_time":"2023-03-25 12:02:02", "pay_amount":"60.00", "real_pay":"60.00", "pay_time":"2023-03-25 12:03:00", "user_id":"0002", "user_name":"Bob", "area_id":"330111"}</pre>
</li><li id="dli_08_15065__li152001834111312">View the <strong id="dli_08_15065__b109367013284">out</strong> file of the TaskManager. The data results are as follows:<pre class="screen" id="dli_08_15065__screen1272875213402">+I(202303251202020001,miniAppShop,2023-03-25 12:02:02,60.0,60.0,2023-03-25 12:03:00,0002,Bob,330110)
+I(202303251505050001,appshop,2023-03-25 15:05:05,500.0,400.0,2023-03-25 15:10:00,0003,Cindy,330108)
-U(202303251202020001,miniAppShop,2023-03-25 12:02:02,60.0,60.0,2023-03-25 12:03:00,0002,Bob,330110)
+U(202303251202020001,miniAppShop,2023-03-25 12:02:02,60.0,60.0,2023-03-25 12:03:00,0002,Bob,330111)</pre>
</li></ol>
</li></ul>
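<p>The relationship between the keyed messages in Example 3 and the <strong>+I</strong>, <strong>-U</strong>, and <strong>+U</strong> rows in the TaskManager output can be illustrated with a minimal Python sketch. It is a simplified model of upsert semantics, not the connector's implementation. The function names <strong>to_keyed_message</strong> and <strong>replay_as_changelog</strong> are hypothetical, the records are reduced to three fields for brevity, and the sketch assumes that the key is the single primary key column (<strong>order_id</strong>) and that a null value marks a DELETE.</p>

```python
import json


def to_keyed_message(record, key_field="order_id"):
    """Build a (key, value) pair: the key is the primary key column,
    the value is the whole record serialized as JSON."""
    key = str(record[key_field])
    value = json.dumps(record)
    return key, value


def replay_as_changelog(messages):
    """Simplified model: turn keyed upsert messages into changelog rows.

    The first message for a key becomes +I, a later message for the same
    key becomes -U (old row) followed by +U (new row), and a None value
    (tombstone) becomes -D for a key that is currently present.
    """
    state = {}       # last known row per key
    changelog = []   # list of (row kind, row) tuples
    for key, value in messages:
        previous = state.get(key)
        if value is None:
            if previous is not None:
                changelog.append(("-D", previous))
                del state[key]
            continue
        current = json.loads(value)
        if previous is None:
            changelog.append(("+I", current))
        else:
            changelog.append(("-U", previous))
            changelog.append(("+U", current))
        state[key] = current
    return changelog


# Reduced versions of the three records inserted in Example 3.
records = [
    {"order_id": "202303251202020001", "user_name": "Bob", "area_id": "330110"},
    {"order_id": "202303251505050001", "user_name": "Cindy", "area_id": "330108"},
    {"order_id": "202303251202020001", "user_name": "Bob", "area_id": "330111"},
]
messages = [to_keyed_message(r) for r in records]
changelog = replay_as_changelog(messages)
for kind, row in changelog:
    print(kind, row["order_id"], row["area_id"])
```

<p>In this model, the third record shares its key with the first one, so it yields a <strong>-U</strong>/<strong>+U</strong> pair instead of a second <strong>+I</strong>, which matches the order of the rows shown in the output of Example 3.</p>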
<div class="section" id="dli_08_15065__section1373912193420"><h4 class="sectiontitle">FAQ</h4><p id="dli_08_15065__p412711653412">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_15027.html">Connectors</a></div>
</div>
</div>