CREATE TABLE

Function

Create an HStore table in the current database. The table will be owned by the user who created it.

In a hybrid data warehouse, you can use DDL statements to create HStore tables. To create an HStore table, set enable_hstore to on and set orientation to column.
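
As a minimal sketch of the statement described above, the following creates an HStore table by setting both storage parameters in the WITH clause. The table name hstore_demo and its columns are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table; only the two WITH parameters come from the text above.
CREATE TABLE hstore_demo
(
    id      INTEGER      NOT NULL,
    remark  VARCHAR(20)
)
WITH (ORIENTATION = COLUMN, ENABLE_HSTORE = ON)
DISTRIBUTE BY HASH (id);
```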

To enhance performance, GaussDB(DWS) 9.1.0 and later versions provide optimized HStore tables, known as HStore Opt tables, and retain the original HStore tables for compatibility. HStore Opt tables can replace HStore tables for better performance, except in scenarios that require high-performance micro-batch imports with no data updates.

Precautions

Syntax

CREATE TABLE [ IF NOT EXISTS ] table_name
({ column_name data_type
    | LIKE source_table [like_option [...] ] }
    [, ... ])
[ WITH ( {storage_parameter = value}  [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ DISTRIBUTE BY  HASH ( column_name [,...])]
[ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]
[ PARTITION BY { 
        {RANGE (partition_key) ( partition_less_than_item [, ... ] )} 
 } [ { ENABLE | DISABLE } ROW MOVEMENT ] ]; 
The options for LIKE are as follows:
{ INCLUDING | EXCLUDING } { DEFAULTS | CONSTRAINTS | INDEXES | STORAGE | COMMENTS | PARTITION | RELOPTIONS | DISTRIBUTION | ALL }
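
The syntax above can be illustrated with a short sketch that combines the WITH, DISTRIBUTE BY, and PARTITION BY clauses, followed by a LIKE clause with explicit options. The table names, columns, and partition names are hypothetical. The form of each partition item (PARTITION name VALUES LESS THAN (...)) is an assumption, because the syntax above does not expand partition_less_than_item.

```sql
-- Hypothetical range-partitioned HStore Opt table, hash-distributed on order_id.
CREATE TABLE IF NOT EXISTS orders_hstore
(
    order_id    INTEGER        NOT NULL,
    order_date  DATE           NOT NULL,
    amount      DECIMAL(10,2)
)
WITH (ORIENTATION = COLUMN, ENABLE_HSTORE_OPT = ON)
DISTRIBUTE BY HASH (order_id)
PARTITION BY RANGE (order_date)
(
    PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION p2024 VALUES LESS THAN ('2025-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- Copy the definition, choosing individual LIKE options instead of INCLUDING ALL.
CREATE TABLE orders_hstore_copy
(
    LIKE orders_hstore
        INCLUDING DEFAULTS
        INCLUDING RELOPTIONS
        INCLUDING DISTRIBUTION
        INCLUDING PARTITION
);
```

INCLUDING RELOPTIONS is used here on the assumption that it carries over the storage parameters set in the WITH clause, so that the copy is also created as an HStore Opt table.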

Differences Between Delta Tables

Table 1 Differences between the delta tables of HStore and column-store tables

Types compared: column-store delta table, HStore delta table, and HStore Opt delta table.

Table structure

  • Column-store delta table: Same as that defined for the column-store primary table.
  • HStore delta table: Different from that defined for the primary table.
  • HStore Opt delta table: Different from that defined for the primary table, but the same as that defined for the HStore table.

Functionality

  • Column-store delta table: Temporarily stores small batches of inserted data. After the data size reaches the threshold, the data is merged into the primary table. This avoids inserting data directly into the primary table and generating a large number of small CUs.
  • HStore delta table: Persistently stores UPDATE, DELETE, and INSERT information. It is used to restore the memory structure that manages concurrent updates, such as the memory update chain, in the case of a fault.
  • HStore Opt delta table: Persistently stores UPDATE, DELETE, and INSERT information. It is used to restore the memory structure that manages concurrent updates, such as the memory update chain, in the case of a fault. It is further optimized compared with HStore.

Weakness

  • Column-store delta table: If data is not merged in a timely manner, the delta table grows large and affects query performance. In addition, the table cannot resolve lock conflicts during concurrent updates.
  • HStore delta table: The merge operation depends on the background AUTOVACUUM.
  • HStore Opt delta table: The merge operation depends on the background AUTOVACUUM.

Specification differences

Column-store delta table:

  Concurrent requests in the same CU are not supported. This type is suitable for scenarios with few concurrent updates.

HStore delta table:

  1. Insertion and update restrictions:
    • MERGE INTO does not support concurrent updates of the same row or repeated updates of the same key.
    • Concurrent UPDATE or DELETE operations on the same row are not supported. Otherwise, an error is reported.
  2. Index and query restrictions:
    • Indexes do not support array condition filtering, IN expression filtering, partial indexes, or expression indexes.
    • Indexes cannot be invalidated.
  3. Table structure and operation restrictions:
    • Ensure that the tables to be exchanged are HStore tables during partition exchange or relfilenode operations.
    • The distribution column cannot be modified using the UPDATE command. You are not advised to modify the partition column using the UPDATE command. (No error is reported, but the performance is poor.)

HStore Opt delta table:

  1. Insertion and update restrictions:
    • MERGE INTO does not support concurrent updates of the same row or repeated updates of the same key.
    • Concurrent updates or deletions of the same row are not supported.
    • hstore_opt does not support cross-partition upserts.
  2. Index and query restrictions:
    • Bitmap indexes are supported.
    • Global dictionaries are supported.
    • bitmap_columns must be specified during table creation and cannot be modified after being set.
    • HStore Opt tables do not support transparent parameter transmission during SMP streaming. In multi-table join queries that require partition pruning, avoid using replicated tables or setting query_dop.
  3. Table structure and operation restrictions:
    • Distribution columns and partition columns cannot be modified using UPDATE.
    • The enable_hstore_opt attribute must be set when the table is created and cannot be changed after being set.

Data import suggestions

  1. For the best import performance, query performance, and space utilization, choose the HStore Opt table. For micro-batch copying with high performance demands and no data updates, you can choose the HStore table.
  2. Suggestions common to HStore and HStore Opt tables:
    • The performance of importing data using UPDATE is poor. You are advised to use UPSERT to import data.
    • When using DELETE during data import, make sure that index scans are used. The JDBC batch method is recommended.
    • Use MERGE INTO to import data when the data volume exceeds 1 million records per DN and there is no concurrent data.
    • Do not modify or add data in cold partitions.
  3. Suggestions on HStore table data import using UPSERT:
    • Select a method.

    Step 1: Select Method 2 for partial column upsert. For full column upsert (update all columns to new values without expressions when a conflict occurs), go to step 2.

    Step 2: Check whether the same key may be updated concurrently while data is being imported to the database. If no such conflict occurs, select Method 1. If a conflict occurs, go to step 3.

    Step 3: If duplicate data exists in the database, select Method 2. Otherwise, go to step 4.

    Step 4: If copying of temporary tables is used for import, select Method 3. Otherwise, select Method 2.

    • The methods are as follows:
      • Method 1: Enable enable_hstore_nonconflict_upsert_optimization and disable enable_hstore_partial_upsert_optimization.
      • Method 2: Disable enable_hstore_nonconflict_upsert_optimization and enable enable_hstore_partial_upsert_optimization.
      • Method 3: Disable both enable_hstore_nonconflict_upsert_optimization and enable_hstore_partial_upsert_optimization.
    • Note: If the number of accumulated batches is less than 2,000, import data in batches into the database. For accumulated batches exceeding 2,000, import data into the database by copying temporary tables.
  4. Suggestions on HStore Opt table data import using UPSERT:

    If there is no concurrency conflict, enable the enable_hstore_nonconflict_upsert_optimization parameter. In other scenarios, disable the parameter. The optimal path is automatically selected.
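
The three methods listed above differ only in how the two parameters are set. The following sketch shows the settings as session-level SET statements, followed by a full-column upsert. It assumes that both parameters can be set at the session level and that UPSERT is written as INSERT ... ON CONFLICT ... DO UPDATE; check both against the GUC parameter reference and the UPSERT syntax for your version. The table upsert_demo is hypothetical.

```sql
-- Hypothetical HStore table with a primary key to upsert against.
CREATE TABLE upsert_demo
(
    id    INTEGER      PRIMARY KEY,
    qty   INTEGER,
    note  VARCHAR(20)
)
WITH (ORIENTATION = COLUMN, ENABLE_HSTORE = ON)
DISTRIBUTE BY HASH (id);

-- Method 1: full-column upsert with no concurrent updates to the same key.
SET enable_hstore_nonconflict_upsert_optimization = on;
SET enable_hstore_partial_upsert_optimization = off;

-- Method 2 (alternative): partial-column upsert, or conflicting keys.
-- SET enable_hstore_nonconflict_upsert_optimization = off;
-- SET enable_hstore_partial_upsert_optimization = on;

-- Method 3 (alternative): both optimizations disabled.
-- SET enable_hstore_nonconflict_upsert_optimization = off;
-- SET enable_hstore_partial_upsert_optimization = off;

-- Full-column upsert: on a key conflict, every non-key column takes the new value.
INSERT INTO upsert_demo (id, qty, note)
VALUES (1, 10, 'first'), (2, 20, 'second')
ON CONFLICT (id) DO UPDATE
SET qty = EXCLUDED.qty, note = EXCLUDED.note;
```

For an HStore Opt table, only enable_hstore_nonconflict_upsert_optimization needs to be decided, as described in item 4.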

Point query suggestions

  1. Generally, the HStore Opt table is recommended for point queries.
  2. Suggestion common to HStore and HStore Opt tables:

    Create a level-2 partition on the column where the equal-value filter condition is most frequently used and distinct values are evenly distributed.

  3. Suggestions on using HStore tables for point queries:
    • Indexes other than the primary key may provide little acceleration. You are advised not to enable index acceleration.
    • If the data type is numeric or a string shorter than 16 bytes, Turbo acceleration is recommended.
  4. Suggestions on using HStore Opt tables:
    • For equal-value filter columns that are not level-2 partition columns: if the columns used in the filter conditions are largely fixed across queries, use the CB-tree index. If the columns change continuously, you are advised to use the GIN index. Do not select more than five index columns.
    • For string columns used in equal-value filtering, bitmap indexes can be specified during table creation. The number of such columns is not limited, but the setting cannot be modified later.
    • Specify columns that are filtered by time range as the partition columns.
    • If more than 100,000 records are returned per DN, index scanning may not significantly enhance performance. In this case, you are advised to use the GUC parameter enable_seqscan to test the performance and then determine which optimization method to use.
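
Some of the HStore Opt suggestions above can be sketched as follows: string filter columns are declared as bitmap columns at creation time, the time column is used as the partition column, and enable_seqscan is toggled to compare plans. The table, columns, and query are hypothetical. The comma-separated string passed to bitmap_columns and the partition item form are assumptions about the exact syntax.

```sql
-- Hypothetical HStore Opt table for point queries.
CREATE TABLE events_hstore
(
    event_id    INTEGER      NOT NULL,
    event_time  TIMESTAMP    NOT NULL,
    city        VARCHAR(60),
    status      VARCHAR(16)
)
WITH (ORIENTATION = COLUMN, ENABLE_HSTORE_OPT = ON,
      BITMAP_COLUMNS = 'city,status')   -- assumed value format; fixed after creation
DISTRIBUTE BY HASH (event_id)
PARTITION BY RANGE (event_time)         -- time-range filter column as partition column
(
    PARTITION p2024 VALUES LESS THAN ('2025-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- Compare the same query with sequential scans allowed and discouraged.
SET enable_seqscan = on;
EXPLAIN ANALYZE
SELECT event_id, status
FROM events_hstore
WHERE city = 'Shenzhen' AND event_time >= '2024-06-01';

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT event_id, status
FROM events_hstore
WHERE city = 'Shenzhen' AND event_time >= '2024-06-01';

-- Restore the default once the comparison is done.
RESET enable_seqscan;
```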

Parameters

Example

Create a simple HStore Opt table.

CREATE TABLE warehouse_t1
(
    W_WAREHOUSE_SK            INTEGER               NOT NULL,
    W_WAREHOUSE_ID            CHAR(16)              NOT NULL,
    W_WAREHOUSE_NAME          VARCHAR(20)                   ,
    W_WAREHOUSE_SQ_FT         INTEGER                       ,
    W_STREET_NUMBER           CHAR(10)                      ,
    W_STREET_NAME             VARCHAR(60)                   ,
    W_STREET_TYPE             CHAR(15)                      ,
    W_SUITE_NUMBER            CHAR(10)                      ,
    W_CITY                    VARCHAR(60)                   ,
    W_COUNTY                  VARCHAR(30)                   ,
    W_STATE                   CHAR(2)                       ,
    W_ZIP                     CHAR(10)                      ,
    W_COUNTRY                 VARCHAR(20)                   ,
    W_GMT_OFFSET              DECIMAL(5,2)
)WITH(ORIENTATION=COLUMN, ENABLE_HSTORE_OPT=ON);

CREATE TABLE warehouse_t2 (LIKE warehouse_t1 INCLUDING ALL);