Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com> Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
<a name="dli_08_15025"></a>
<h1 class="topictitle1">Parquet</h1>
<div id="body0000001778994273"><div class="section" id="dli_08_15025__section322883116715"><h4 class="sectiontitle">Function</h4><p id="dli_08_15025__p1162914361370">The Apache Parquet format allows reading and writing Parquet data. For details, see <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/formats/parquet/" target="_blank" rel="noopener noreferrer">Parquet Format</a>.</p>
</div>
<div class="section" id="dli_08_15025__section122491371116"><h4 class="sectiontitle">Supported Connectors</h4><ul id="dli_08_15025__ul188074312166"><li id="dli_08_15025__li14357112884017">FileSystem</li></ul>
</div>
<div class="section" id="dli_08_15025__section168154297811"><h4 class="sectiontitle">Parameter Description</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15025__table51831049681" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters</caption><thead align="left"><tr id="dli_08_15025__row11832491881"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.2.6.1.1"><p id="dli_08_15025__p10183349181">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.2.6.1.2"><p id="dli_08_15025__p31834491182">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.2.6.1.3"><p id="dli_08_15025__p1518317494810">Default Value</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.2.6.1.4"><p id="dli_08_15025__p191834491584">Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.2.6.1.5"><p id="dli_08_15025__p91839491485">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15025__row3183849987"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.1 "><p id="dli_08_15025__p94431926691">format</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.2 "><p id="dli_08_15025__p161837491988">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.3 "><p id="dli_08_15025__p11832491884">None</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.4 "><p id="dli_08_15025__p131838495813">String</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.5 "><p id="dli_08_15025__p41836492815">Format to use. Set this parameter to <strong id="dli_08_15025__b5217171012475">parquet</strong>.</p>
</td>
</tr>
<tr id="dli_08_15025__row111831249287"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.1 "><p id="dli_08_15025__p41831549087">parquet.utc-timezone</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.2 "><p id="dli_08_15025__p10183249685">No</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.3 "><p id="dli_08_15025__p1018334913818">false</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.4 "><p id="dli_08_15025__p1918310494819">Boolean</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.2.6.1.5 "><p id="dli_08_15025__p1318364913817">Whether to use the UTC time zone instead of the local time zone for conversion between epoch time and LocalDateTime. Hive 0.x, 1.x, and 2.x use the local time zone, whereas Hive 3.<em id="dli_08_15025__i6381737484">x</em> uses the UTC time zone.</p>
</td>
</tr>
</tbody>
</table>
</div>
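<p>As a minimal sketch of how the parameters in Table 1 fit into a table definition, the following statement sets both of them on a FileSystem table. The table name <strong>parquetSink</strong> and its two columns are hypothetical and chosen only for illustration, and setting <strong>parquet.utc-timezone</strong> to <strong>true</strong> assumes that the files will be consumed by a reader that follows the Hive 3.x convention.</p>
<pre class="screen">CREATE TABLE parquetSink (
  order_id STRING,
  pay_time TIMESTAMP(3)
) WITH (
  'connector' = 'filesystem',
  'path' = 'obs://xx',             -- placeholder path, as in the example in this topic
  'format' = 'parquet',            -- mandatory
  'parquet.utc-timezone' = 'true'  -- optional; the default is false (local time zone)
);</pre>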
</div>
<div class="section" id="dli_08_15025__section19492140142312"><h4 class="sectiontitle">Data Type Mapping</h4><p id="dli_08_15025__p14448845516">Currently, the Parquet format type mapping is compatible with Apache Hive but differs from that of Apache Spark:</p>
<ul id="dli_08_15025__ul164458205520"><li id="dli_08_15025__li24415815558">Timestamp: The timestamp type is mapped to INT96 regardless of precision.</li><li id="dli_08_15025__li104416805514">Decimal: The decimal type is mapped to a fixed-length byte array sized according to the precision.</li></ul>
<p id="dli_08_15025__p144413855514">The following table lists the type mapping from Flink type to Parquet type.</p>
<p id="dli_08_15025__p73531834115516">Note that currently only writing is supported for composite data types (Array, Map, and Row), while reading is not supported.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15025__table76243521557" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Data type mapping</caption><thead align="left"><tr id="dli_08_15025__row1062445215558"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.4.6.2.4.1.1"><p id="dli_08_15025__p16241352145518">Flink SQL Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.4.6.2.4.1.2"><p id="dli_08_15025__p126241522551">Parquet Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.4.6.2.4.1.3"><p id="dli_08_15025__p1262415295511">Parquet Logical Type</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15025__row17624205295515"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p13624552185512">CHAR/VARCHAR/STRING</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p126241852115512">BINARY</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p6624155212552">UTF8</p>
</td>
</tr>
<tr id="dli_08_15025__row14624652105516"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p162495225511">BOOLEAN</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p46241952115516">BOOLEAN</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p16241752145515">-</p>
</td>
</tr>
<tr id="dli_08_15025__row16624252125520"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p8624115213557">BINARY/VARBINARY</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p46241252165513">BINARY</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p1562485295517">-</p>
</td>
</tr>
<tr id="dli_08_15025__row16248522550"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p10624105235514">DECIMAL</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p1462475211554">FIXED_LEN_BYTE_ARRAY</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p76241352155510">DECIMAL</p>
</td>
</tr>
<tr id="dli_08_15025__row4624185255510"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p8624252105514">TINYINT</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p462435211552">INT32</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p1862419528550">INT_8</p>
</td>
</tr>
<tr id="dli_08_15025__row162445285515"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p1162417527553">SMALLINT</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p196256526551">INT32</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p10625155245517">INT_16</p>
</td>
</tr>
<tr id="dli_08_15025__row7625452145511"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p662517527555">INT</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p1762545213554">INT32</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p116258523557">-</p>
</td>
</tr>
<tr id="dli_08_15025__row56256522558"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p11625165225517">BIGINT</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p66250528557">INT64</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p762514520554">-</p>
</td>
</tr>
<tr id="dli_08_15025__row5625105285516"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p1762545285512">FLOAT</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p1962585213558">FLOAT</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p0625452155514">-</p>
</td>
</tr>
<tr id="dli_08_15025__row16625185245518"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p36251652195519">DOUBLE</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p15625175205511">DOUBLE</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p36253520550">-</p>
</td>
</tr>
<tr id="dli_08_15025__row1262575255510"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p8626105212552">DATE</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p13626145255511">INT32</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p1626352155513">DATE</p>
</td>
</tr>
<tr id="dli_08_15025__row862675265519"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p26269529555">TIME</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p46261152155511">INT32</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p5626185225518">TIME_MILLIS</p>
</td>
</tr>
<tr id="dli_08_15025__row2626115213555"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p562613528554">TIMESTAMP</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p86264523557">INT96</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p66263521557">-</p>
</td>
</tr>
<tr id="dli_08_15025__row1562616529552"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p1962635211557">ARRAY</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p2062625210557">-</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p146261552165511">LIST</p>
</td>
</tr>
<tr id="dli_08_15025__row262635255510"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p1362655216555">MAP</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p1626195215512">-</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p86261552195511">MAP</p>
</td>
</tr>
<tr id="dli_08_15025__row10626452185517"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.1 "><p id="dli_08_15025__p7626352165511">ROW</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.2 "><p id="dli_08_15025__p362655210552">-</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.4.6.2.4.1.3 "><p id="dli_08_15025__p162665265513">STRUCT</p>
</td>
</tr>
</tbody>
</table>
</div>
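<p>To make the mapping in Table 2 concrete, the following sketch annotates each column of a sink table with the Parquet type that Table 2 lists for it. The table name <strong>typedParquetSink</strong> and all column names are hypothetical. The composite columns at the end rely on the write-only support noted above, so a table declared this way is assumed to be used as a sink only.</p>
<pre class="screen">CREATE TABLE typedParquetSink (
  order_id    STRING,                       -- BINARY, logical type UTF8
  is_paid     BOOLEAN,                      -- BOOLEAN
  payload     VARBINARY(16),                -- BINARY
  pay_amount  DECIMAL(10, 2),               -- FIXED_LEN_BYTE_ARRAY, logical type DECIMAL
  item_count  INT,                          -- INT32
  total_items BIGINT,                       -- INT64
  order_date  DATE,                         -- INT32, logical type DATE
  pay_time    TIMESTAMP(3),                 -- INT96, regardless of the declared precision
  tags        ARRAY&lt;STRING&gt;,                -- logical type LIST (write only)
  attributes  MAP&lt;STRING, STRING&gt;,          -- logical type MAP (write only)
  buyer       ROW&lt;id STRING, name STRING&gt;   -- logical type STRUCT (write only)
) WITH (
  'connector' = 'filesystem',
  'path' = 'obs://xx',
  'format' = 'parquet'
);</pre>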
</div>
<div class="section" id="dli_08_15025__section457955774517"><h4 class="sectiontitle">Example</h4><p id="dli_08_15025__p15881132116016">Use Kafka to send data and write the data to OBS as Parquet files through the FileSystem connector.</p>
<ol id="dli_08_15025__ol840395722311"><li id="dli_08_15025__li04031578234"><span>Create a datasource connection to the VPC and subnet where Kafka is located and bind the connection to the queue. Set a security group and inbound rule to allow access from the queue, and then test the connectivity of the queue using the Kafka IP address. For example, locate the general-purpose queue where the job runs and choose <strong id="dli_08_15025__b17553040155418">More</strong> > <strong id="dli_08_15025__b12554340105413">Test Address Connectivity</strong> in the <strong id="dli_08_15025__b75546404544">Operation</strong> column. If the connection is successful, the datasource is bound to the queue. Otherwise, the binding fails.</span></li><li id="dli_08_15025__li1599913011242"><span>Create a Flink OpenSource SQL job and enable checkpointing. Copy the following statements and submit the job:</span><p><pre class="screen" id="dli_08_15025__screen299960162418">-- Source table: reads JSON messages from Kafka.
CREATE TABLE kafkaSource (
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string
) WITH (
  'connector' = 'kafka',
  'topic-pattern' = '<em id="dli_08_15025__i16836144119516"><strong id="dli_08_15025__b18837174117517">kafkaTopic</strong></em>',
  'properties.bootstrap.servers' = '<em id="dli_08_15025__i13283145113512"><strong id="dli_08_15025__b4283751135115">KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort</strong></em>',
  'properties.group.id' = '<em id="dli_08_15025__i2022058195111"><strong id="dli_08_15025__b42165813518">GroupId</strong></em>',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

-- Sink table: writes Parquet files to the OBS path.
-- Parquet files are finalized on checkpoints, which is why checkpointing is enabled for this job.
CREATE TABLE sink (
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string
) WITH (
  'connector' = 'filesystem',
  'format' = 'parquet',
  'path' = 'obs://xx'
);

insert into sink select * from kafkaSource;</pre>
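</p></li><li id="dli_08_15025__li1511420343241"><span>Send the following JSON messages to the source Kafka topic:</span><p><pre class="screen" id="dli_08_15025__screen107391221112410">{"order_id":"202103251202020001","order_channel":"miniAppShop","order_time":"2021-03-25 12:02:02","pay_amount":60.00,"real_pay":60.00,"pay_time":"2021-03-25 12:03:00","user_id":"0002","user_name":"Bob","area_id":"330110"}
{"order_id":"202103241606060001","order_channel":"appShop","order_time":"2021-03-24 16:06:06","pay_amount":200.00,"real_pay":180.00,"pay_time":"2021-03-24 16:10:06","user_id":"0001","user_name":"Alice","area_id":"330106"}</pre>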
</p></li><li id="dli_08_15025__li4353143193117"><span>Read the Parquet file in the OBS path configured in the sink table. The result is as follows:</span><p><pre class="screen" id="dli_08_15025__screen14251955184812">202103251202020001, miniAppShop, 2021-03-25 12:02:02, 60.0, 60.0, 2021-03-25 12:03:00, 0002, Bob, 330110
202103241606060001, appShop, 2021-03-24 16:06:06, 200.0, 180.0, 2021-03-24 16:10:06, 0001, Alice, 330106</pre>
</p></li></ol>
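<p>One possible way to carry out the read in the last step is a second Flink OpenSource SQL job that declares the same OBS path as a source. The following is a minimal sketch under stated assumptions rather than a verified procedure: the table names <strong>parquetSource</strong> and <strong>printSink</strong> are hypothetical, the path is the placeholder used by the sink table above, and writing the rows out through the Print connector is an assumption about what is available in your environment. Only scalar columns are read, which is consistent with the limitation on composite types described in Data Type Mapping.</p>
<pre class="screen">-- Hypothetical source over the files written by the sink table above.
CREATE TABLE parquetSource (
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string
) WITH (
  'connector' = 'filesystem',
  'format' = 'parquet',
  'path' = 'obs://xx'
);

-- Hypothetical sink that prints a few of the columns (assumes the Print connector).
CREATE TABLE printSink (
  order_id string,
  user_name string,
  pay_amount double
) WITH (
  'connector' = 'print'
);

insert into printSink select order_id, user_name, pay_amount from parquetSource;</pre>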
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_15014.html">Formats</a></div>
</div>
</div>