The Apache Parquet format allows you to read and write Parquet data. For details, see Parquet Format.
| Parameter | Mandatory | Default Value | Data Type | Description |
|---|---|---|---|---|
| format | Yes | None | String | Format to use. Set this to parquet. |
| parquet.utc-timezone | No | false | Boolean | Whether to use the UTC time zone (true) or the local time zone (false) when converting between epoch time and LocalDateTime. Hive 0.x, 1.x, and 2.x use the local time zone, whereas Hive 3.x uses UTC. |
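As a minimal sketch of how the two parameters fit together, a filesystem sink that writes Parquet files with UTC-based timestamp conversion (for example, to line up with Hive 3.x) might be declared as follows. The table name and columns are hypothetical, and the path is the same placeholder used elsewhere in this section.

```sql
-- Hypothetical sink table: the name and columns are illustrative only.
CREATE TABLE parquetSink (
  order_id string,
  pay_time timestamp(3)
) WITH (
  'connector' = 'filesystem',
  'path' = 'obs://xx',              -- placeholder path, left as-is
  'format' = 'parquet',             -- mandatory: selects the Parquet format
  'parquet.utc-timezone' = 'true'   -- optional: convert using UTC (default is false, i.e. local time zone)
);
```

Leaving parquet.utc-timezone at its default of false keeps the local time zone behavior, which matches what the table above says about Hive 0.x, 1.x, and 2.x.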
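Currently, the Parquet format type mapping is compatible with Apache Hive but differs from that of Apache Spark. In particular, TIMESTAMP is mapped to INT96 and DECIMAL is mapped to FIXED_LEN_BYTE_ARRAY.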
The following table lists the type mapping from Flink type to Parquet type.
Note that currently only writing is supported for composite data types (Array, Map, and Row), while reading is not supported.
| Flink SQL Type | Parquet Type | Parquet Logical Type |
|---|---|---|
| CHAR/VARCHAR/STRING | BINARY | UTF8 |
| BOOLEAN | BOOLEAN | - |
| BINARY/VARBINARY | BINARY | - |
| DECIMAL | FIXED_LEN_BYTE_ARRAY | DECIMAL |
| TINYINT | INT32 | INT_8 |
| SMALLINT | INT32 | INT_16 |
| INT | INT32 | - |
| BIGINT | INT64 | - |
| FLOAT | FLOAT | - |
| DOUBLE | DOUBLE | - |
| DATE | INT32 | DATE |
| TIME | INT32 | TIME_MILLIS |
| TIMESTAMP | INT96 | - |
| ARRAY | - | LIST |
| MAP | - | MAP |
| ROW | - | STRUCT |
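To make the mapping concrete, here is a sketch of a sink table whose columns cover several rows of the mapping. Each comment gives the Parquet type (and logical type, where one exists) listed for that Flink type. The table name, column names, and precisions are hypothetical, and the path is a placeholder. Because reading composite types is not supported, a table with ARRAY, MAP, or ROW columns like this one is assumed to be used only for writing.

```sql
-- Hypothetical write-only sink; names and precisions are illustrative.
CREATE TABLE parquetTypesSink (
  id         bigint,                                -- INT64
  name       string,                                -- BINARY (UTF8)
  is_paid    boolean,                               -- BOOLEAN
  amount     decimal(10, 2),                        -- FIXED_LEN_BYTE_ARRAY (DECIMAL)
  quantity   smallint,                              -- INT32 (INT_16)
  order_date date,                                  -- INT32 (DATE)
  created_at timestamp(3),                          -- INT96
  tags       array<string>,                         -- LIST (write only)
  attributes map<string, string>,                   -- MAP (write only)
  buyer      row<user_id string, area_id string>    -- STRUCT (write only)
) WITH (
  'connector' = 'filesystem',
  'path' = 'obs://xx',   -- placeholder path, left as-is
  'format' = 'parquet'
);
```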
Use Kafka as the data source and write the data to a filesystem sink (an OBS path) in Parquet format.
CREATE TABLE kafkaSource (
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string
) WITH (
  'connector' = 'kafka',
  'topic-pattern' = 'kafkaTopic',
  'properties.bootstrap.servers' = 'KafkaAddress1:KafkaPort,KafkaAddress2:KafkaPort',
  'properties.group.id' = 'GroupId',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

CREATE TABLE sink (
  order_id string,
  order_channel string,
  order_time string,
  pay_amount double,
  real_pay double,
  pay_time string,
  user_id string,
  user_name string,
  area_id string
) WITH (
  'connector' = 'filesystem',
  'format' = 'parquet',
  'path' = 'obs://xx'
);

insert into sink select * from kafkaSource;
202103251505050001,appShop,2021-03-25 15:05:05,500.00,400.00,2021-03-25 15:10:00,0003,Cindy,330108
202103241606060001,appShop,2021-03-24 16:06:06,200.00,180.00,2021-03-24 16:10:06,0001,Alice,330106

202103251202020001, miniAppShop, 2021-03-25 12:02:02, 60.0, 60.0, 2021-03-25 12:03:00, 0002, Bob, 330110
202103241606060001, appShop, 2021-03-24 16:06:06, 200.0, 180.0, 2021-03-24 16:10:06, 0001, Alice, 330106