Creates a foreign table in the current database for parallel data import and export of OBS data. The server used is gsmpp_server, which is created by the database by default.
Only hybrid data warehouse (standalone) clusters of version 8.2.0.100 or later support OBS foreign table import and export.
Data Type | DIST_FDW (READ ONLY) | DIST_FDW (WRITE ONLY) |
---|---|---|
ORC | × | × |
PARQUET | × | × |
CARBONDATA | × | × |
TEXT | √ | √ |
CSV | √ | √ |
JSON | × | × |
    CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name
    ( { column_name type_name [column_constraint ] | LIKE source_table | table_constraint [, ...]} [, ...] )
    SERVER server_name
    OPTIONS ( { option_name ' value ' } [, ...] )
    [ { WRITE ONLY | READ ONLY }]
    [ WITH error_table_name | LOG INTO error_table_name]
    [PER NODE REJECT LIMIT 'value']
    ;
column_constraint is as follows:

    [CONSTRAINT constraint_name]
    {PRIMARY KEY | UNIQUE}
    [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]

table_constraint is as follows:

    [CONSTRAINT constraint_name]
    {PRIMARY KEY | UNIQUE} (column_name)
    [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
If IF NOT EXISTS is specified, no error is thrown when a table with the same name already exists. A notice is issued instead.
table_name specifies the name of the foreign table to be created.
Value range: a string. It must comply with the naming convention.
column_name specifies the name of a column in the foreign table.
Value range: a string. It must comply with the naming convention.
type_name specifies the data type of the column.
SERVER server_name specifies the server name of the foreign table. For OBS foreign tables used for data import and export, you can use gsmpp_server, which the database creates by default, or a customized server.
OPTIONS ( { option_name ' value ' } [, ...] ) specifies the parameters of the foreign table data.
encrypt specifies whether HTTPS is enabled for data transfer. The value on enables HTTPS and off disables it (in this case, HTTP is used). The default value is off.
access_key indicates the access key (AK, obtained from the user information on the console) used for the OBS access protocol. When you create a foreign table, its AK value is not encrypted and saved to the metadata table of the database. The correctness of the parameter is not verified when a foreign table is created.
secret_access_key indicates the secret access key (SK, obtained from the user information on the console) used for the OBS access protocol. When you create a foreign table, its SK value is encrypted and saved to the metadata table of the database. The correctness of the parameter is not verified when a foreign table is created.
This parameter corresponds to the SecurityToken value of the temporary security credential in IAM. A temporary AK, a temporary SK, and a temporary security token together form a temporary security credential. It is supported by clusters of version 8.2.0 or later.
chunksize specifies the size of the read cache used by each OBS thread on a DN. The value ranges from 8 to 512, in MB. The default value is 64.
location specifies the data source location of a foreign table. Currently, only URLs are allowed. Multiple URLs are separated by vertical bars (|).
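As a rough illustration of the vertical-bar separator, the following sketch reads from two OBS paths through one foreign table. The table name, bucket name, paths, and key placeholders are assumptions made for illustration and do not come from this reference; replace them with your own values.

```sql
-- Hypothetical table, bucket, and paths; the keys are placeholders.
CREATE FOREIGN TABLE obs_multi_src_ft
(
    a int,
    b int
)
SERVER gsmpp_server
OPTIONS (
    -- Two source URLs separated by a vertical bar (|).
    location 'obs://example-bucket/input/part1/|obs://example-bucket/input/part2/',
    format 'text',
    encoding 'utf8',
    access_key 'access_key_value_to_be_replaced',
    secret_access_key 'secret_access_key_value_to_be_replaced'
)
READ ONLY;
```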
When importing and exporting data, you are advised to use the location parameter as follows:
region (optional) specifies the value of regionCode, that is, the region information on the cloud.
If the region parameter is explicitly specified, its value is read. Otherwise, the value of defaultRegion is read.
Note the following when setting parameters for importing or exporting OBS foreign tables in TEXT or CSV format:
format specifies the format of the source data file in a foreign table.
Valid value: CSV and TEXT. The default value is TEXT. GaussDB(DWS) supports only the CSV and TEXT formats.
header specifies whether a file contains a header row with the names of the columns in the file.
For data export to OBS, this parameter cannot be set to true. Use the default value false, which means that the first row of the exported data file is not a header.
For data import, if header is on, the first row of the data file is treated as the header row and ignored. If header is off, the first row is treated as a data row.
Valid value: true, on, false, and off. The default value is false or off.
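A minimal sketch of importing CSV files whose first row is a header might look as follows. The table name, bucket, and path are hypothetical, and the keys are placeholders.

```sql
-- Hypothetical table and OBS path; the keys are placeholders.
CREATE FOREIGN TABLE obs_csv_header_ft
(
    a int,
    b int
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://example-bucket/input/csv_data/',
    format 'csv',
    -- The first row of each data file is treated as a header and ignored.
    header 'on',
    encoding 'utf8',
    access_key 'access_key_value_to_be_replaced',
    secret_access_key 'secret_access_key_value_to_be_replaced'
)
READ ONLY;
```

Because header cannot be set to true for export, an option like this belongs only on a READ ONLY foreign table.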
delimiter specifies the column delimiter of data. If it is not set, the default delimiter is used: a tab for TEXT and a comma (,) for CSV.
Value range:
The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.
quote specifies the quotation mark for the CSV format. The default value is a double quotation mark (").
escape specifies an escape character for a CSV file. The value must be a single-byte character.
The default value is a double quotation mark ("). If the value is the same as the quote value, it will be replaced with \0.
Value range:
noescaping specifies whether to escape the backslash (\) and its following characters in the TEXT format.
This parameter is available only for the TEXT format.
Valid value: true, on, false, and off. The default value is false or off.
encoding specifies the encoding of a data file, that is, the encoding used to parse, check, and generate a data file. Its default value is the default client_encoding value of the current database.
Before importing data through a foreign table, set client_encoding to the encoding of the file, or to an encoding that matches the character set of the file. Otherwise, unnecessary parsing and check errors may occur, leading to import errors, rollback, or even the import of invalid data. Specify this parameter before exporting data as well, because a result exported with the default character set may not be what you expect.
If this parameter is not specified when you create a foreign table, a warning message will be displayed on the client.
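The recommendation above can be sketched as follows, assuming the source files are UTF-8 encoded. The table names, bucket, and path are hypothetical, and the keys are placeholders.

```sql
-- Align the session encoding with the encoding of the data files.
SET client_encoding = 'UTF8';

-- Hypothetical foreign table; encoding is set explicitly so that
-- no warning about a missing encoding is raised at creation time.
CREATE FOREIGN TABLE obs_utf8_ft
(
    a int,
    b int
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://example-bucket/input/utf8_data/',
    format 'text',
    encoding 'utf8',
    access_key 'access_key_value_to_be_replaced',
    secret_access_key 'secret_access_key_value_to_be_replaced'
)
READ ONLY;

-- Hypothetical target table for the imported rows.
CREATE TABLE utf8_target_tbl (a int, b int);
INSERT INTO utf8_target_tbl SELECT * FROM obs_utf8_ft;
```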
Specifies how to handle a row in the source file whose last column is missing during data import.
Valid value: true, on, false, and off. The default value is false or off.
missing data for column "tt"
Specifies whether to ignore excessive columns when the number of columns in a source data file exceeds that defined in the foreign table. This parameter is available only for data import.
Valid value: true, on, false, and off. The default value is false or off.
extra data after last expected column
If the linefeed at the end of a row is lost and this parameter is set to true, data in the next row will be ignored.
reject_limit specifies the maximum number of data format errors allowed during a data import task. As long as the number of errors does not reach this maximum, the import task continues.
You are advised to replace this syntax with PER NODE REJECT LIMIT 'value'.
Examples of data format errors include the following: a column is missing, an extra column exists, a data type is incorrect, and encoding is incorrect. Once a non-data format error occurs, the whole data import process is stopped.
Value range: an integer or unlimited.
If this parameter is not specified, an error message is returned immediately.
obs_null_file specifies whether empty files are imported and exported between GaussDB(DWS) and OBS.
Valid value: true, on, false, and off. The default value is false or off.
If obs_null_file is set to true or on:
No such file or directory: 'XXX'
Specifies the newline character style of the imported or exported data file.
Value range: multi-character newline characters within 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #.
Specifies the DATE format for data import. This syntax is available only for READ ONLY foreign tables.
Value range: a valid DATE value. For details, see Date and Time Processing Functions and Operators.
If ORACLE is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below.
Specifies the TIME format for data import. This syntax is available only for READ ONLY foreign tables.
Value range: a valid TIME value. Time zones cannot be used.
timestamp_format specifies the TIMESTAMP format for data import. This syntax is available only for READ ONLY foreign tables.
Value range: any valid TIMESTAMP value. Time zones cannot be used.
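As a hedged illustration, a read-only foreign table might declare the timestamp layout of its source files as shown below. The format string, table name, bucket, and path are assumptions chosen for this sketch, and the keys are placeholders.

```sql
-- Hypothetical table and path; the format string is an example value
-- without a time zone, since time zones cannot be used.
CREATE FOREIGN TABLE obs_ts_ft
(
    id int,
    created_at timestamp
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://example-bucket/input/ts_data/',
    format 'text',
    encoding 'utf8',
    timestamp_format 'yyyy-mm-dd hh24:mi:ss',
    access_key 'access_key_value_to_be_replaced',
    secret_access_key 'secret_access_key_value_to_be_replaced'
)
READ ONLY;
```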
Specifies the SMALLDATETIME format for data import. This syntax is available only for READ ONLY foreign tables.
Value range: a valid SMALLDATETIME value.
compatible_illegal_chars specifies whether to enable fault tolerance on invalid characters during data import. This syntax is available only for READ ONLY foreign tables.
Valid value: true, on, false, and off. The default value is false or off.
On a Windows platform, if OBS reads data files in the TEXT format, 0x1A is treated as an EOF symbol and a parsing error occurs. This is an implementation constraint of the Windows platform. Because OBS on a Windows platform does not support BINARY read, such data can be read by OBS on a Linux platform instead.
The rule of error tolerance for invalid characters imported is as follows:
(1) \0 is converted to a space.
(2) Other invalid characters are converted to question marks.
(3) If compatible_illegal_chars is set to true or on, invalid characters are tolerated. If NULL, DELIMITER, QUOTE, or ESCAPE is set to a space or a question mark, an error such as "illegal chars conversion may confuse COPY escape 0x20" is displayed, prompting you to change the parameter value that causes confusion and thereby preventing import errors.
Indicates whether a CSV file contains the UTF-8 BOM.
Value range: true, on, false, and off
Default value: false
This parameter is valid only when the foreign table is read-only and uses UTF8 encoding.
file_split_threshold is used to optimize the performance of importing data in TEXT format. It specifies the lower limit of the logical block size of a file. If this parameter is specified, large files are split based on the actual file and DN status to improve import concurrency. The purpose is to distribute tasks evenly across the DNs. This parameter is therefore useful when the number of files is less than the number of DNs or when file sizes are unbalanced.
The value ranges from 0 to 2147483647, in MB. The default value is 0, which indicates that this parameter does not take effect.
For example, suppose the file size is 1024 MB and the number of DNs is 4. If the value of file_split_threshold is less than 256, the file is evenly divided into four blocks, and a 256 MB import task is allocated to each DN. If file_split_threshold is set to 500, the file is split into blocks of 500 MB and 524 MB, which are allocated to two DNs, because a block cannot be smaller than 500 MB. This parameter also applies to multiple files.
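The scenario above could be written roughly as follows. The table name, bucket, and path are hypothetical, and the keys are placeholders; the threshold value 500 mirrors the worked example.

```sql
-- Hypothetical table and path. With a single 1024 MB file and 4 DNs,
-- a threshold of 500 MB yields two blocks (500 MB and 524 MB).
CREATE FOREIGN TABLE obs_split_ft
(
    a int,
    b int
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://example-bucket/input/large_text_data/',
    format 'text',
    encoding 'utf8',
    file_split_threshold '500',
    access_key 'access_key_value_to_be_replaced',
    secret_access_key 'secret_access_key_value_to_be_replaced'
)
READ ONLY;
```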
READ ONLY specifies that the foreign table is read-only. This parameter is available only for data import.
WRITE ONLY specifies that the foreign table is write-only. This parameter is available only for data export.
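For the export direction, a minimal sketch might define a WRITE ONLY foreign table and insert into it from a local table. The table names, bucket, and output path are assumptions for illustration, and the keys are placeholders.

```sql
-- Hypothetical local table holding the rows to export.
CREATE TABLE export_src_tbl (a int, b int);

-- Hypothetical write-only foreign table pointing at an OBS output path.
CREATE FOREIGN TABLE obs_export_ft
(
    a int,
    b int
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://example-bucket/output/',
    format 'csv',
    encoding 'utf8',
    access_key 'access_key_value_to_be_replaced',
    secret_access_key 'secret_access_key_value_to_be_replaced'
)
WRITE ONLY;

-- Export: rows selected from the local table are written to OBS.
INSERT INTO obs_export_ft SELECT * FROM export_src_tbl;
```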
WITH error_table_name specifies the table where data format errors generated during parallel data import are recorded. You can query this error table after data is imported to obtain error details. This parameter is available only after reject_limit is set.
To be compatible with PostgreSQL open source interfaces, you are advised to replace this syntax with LOG INTO. When this parameter is specified, an error table is automatically created.
Value range: a string. It must comply with the naming convention.
LOG INTO error_table_name specifies the table where data format errors generated during parallel data import are recorded. You can query this error table after data is imported to obtain error details.
Value range: a string. It must comply with the naming convention.
PER NODE REJECT LIMIT 'value' specifies the maximum number of data format errors allowed on each DN during data import. If the number of errors on any DN exceeds the specified value, the import fails, an error is reported, and the system exits the import.
This syntax specifies the error tolerance of a single node.
Examples of data format errors include the following: a column is missing, an extra column exists, a data type is incorrect, and encoding is incorrect. When a non-data format error occurs, the whole data import process stops.
Value range: an integer or unlimited. If this parameter is not specified, an error message is returned immediately.
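Putting the error-handling clauses together, a sketch of a fault-tolerant import might look as follows. The table names, the error table name, the limit of 10, and the OBS path are assumptions for illustration, and the keys are placeholders.

```sql
-- Hypothetical foreign table that tolerates up to 10 data format
-- errors per DN and records them in the error table obs_import_err.
CREATE FOREIGN TABLE obs_tolerant_ft
(
    a int,
    b int
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://example-bucket/input/dirty_data/',
    format 'text',
    encoding 'utf8',
    access_key 'access_key_value_to_be_replaced',
    secret_access_key 'secret_access_key_value_to_be_replaced'
)
READ ONLY
LOG INTO obs_import_err
PER NODE REJECT LIMIT '10';

-- Hypothetical target table for the imported rows.
CREATE TABLE tolerant_target_tbl (a int, b int);
INSERT INTO tolerant_target_tbl SELECT * FROM obs_tolerant_ft;

-- Inspect the rows that were rejected during the import.
SELECT * FROM obs_import_err;
```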
NOT ENFORCED specifies the constraint to be an informational constraint. Such a constraint is guaranteed by the user instead of the database.
ENFORCED is the default value. It is a reserved parameter and is currently not supported.
{PRIMARY KEY | UNIQUE} (column_name) specifies the informational constraint on column_name.
Value range: a string. It must comply with the naming convention, and the value of column_name must exist.
ENABLE QUERY OPTIMIZATION optimizes the query plan using an informational constraint.
DISABLE QUERY OPTIMIZATION disables the optimization of the query plan using an informational constraint.
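Following the column_constraint syntax shown earlier, an informational constraint could be declared roughly as below. The table name, bucket, and path are hypothetical, the keys are placeholders, and the uniqueness of column a is assumed to be guaranteed by the data itself rather than by the database.

```sql
-- Hypothetical table. The PRIMARY KEY is informational only (NOT ENFORCED)
-- and is made available to the optimizer for query planning.
CREATE FOREIGN TABLE obs_constrained_ft
(
    a int PRIMARY KEY NOT ENFORCED ENABLE QUERY OPTIMIZATION,
    b int
)
SERVER gsmpp_server
OPTIONS (
    location 'obs://example-bucket/input/unique_data/',
    format 'text',
    encoding 'utf8',
    access_key 'access_key_value_to_be_replaced',
    secret_access_key 'secret_access_key_value_to_be_replaced'
)
READ ONLY;
```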
Hard-coded or plaintext AK and SK are risky. For security purposes, encrypt your AK and SK and store them in the configuration file or environment variables.
    DROP FOREIGN TABLE IF EXISTS OBS_ft;
    NOTICE: foreign table "obs_ft" does not exist, skipping
    DROP FOREIGN TABLE

    CREATE FOREIGN TABLE OBS_ft( a int, b int)
    SERVER gsmpp_server
    OPTIONS (location 'obs://gaussdbcheck/obs_ddl/test_case_data/txt_obs_informatonal_test001',
             format 'text',
             encoding 'utf8',
             chunksize '32',
             encrypt 'on',
             ACCESS_KEY 'access_key_value_to_be_replaced',
             SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',
             delimiter E'\x08')
    read only;
    CREATE FOREIGN TABLE

    DROP TABLE row_tbl;
    DROP TABLE

    CREATE TABLE row_tbl( a int, b int);
    NOTICE: The 'DISTRIBUTE BY' clause is not specified. Using 'a' as the distribution column by default.
    HINT: Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
    CREATE TABLE

    INSERT INTO row_tbl select * from OBS_ft;
    INSERT 0 3