CREATE TABLE

Function

Creates a new empty table in the current database.

The table is owned by the user who executes the command. However, if a system administrator creates a table in a schema that has the same name as a common user, the table is owned by that user, not by the system administrator.

Precautions

  • Specifying a user-defined tablespace is not recommended when creating an ordinary table.
  • Do not specify the COMPRESS compression attribute when creating a row-store table.
  • When creating a hash-distributed table, ensure that data is evenly distributed. (For a table with more than 10 GB of data, the skew rate must be less than 10%.)
  • When creating a REPLICATION-distributed table, ensure that the table contains fewer than 1 million rows.
  • When creating an H-Store table, ensure that the database GUC parameter settings meet the following requirements:
    • autovacuum is set to on.
    • The value of autovacuum_max_workers_hstore is greater than 0.
    • The value of autovacuum_max_workers is greater than that of autovacuum_max_workers_hstore.
  • A large table (more than 50 million rows) that contains a time field must be designed as a partitioned table, with a partition interval chosen based on the query characteristics.
  • For a table with frequent inserts, deletes, or updates, it is recommended that the number of indexes be three or fewer. The maximum number of indexes is five.
  • For more information about development and design specifications, see "GaussDB(DWS) Development and Design Proposal" in the Data Warehouse Service (DWS) Developer Guide.
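
The H-Store parameter requirements and the skew guideline above can be checked with a few statements. This is a minimal sketch: the table name sales_fact is hypothetical, and the per-node row count assumes that the xc_node_id hidden column is available in your cluster version.

```sql
-- Check the GUC parameters that H-Store tables depend on.
SHOW autovacuum;                      -- should be on
SHOW autovacuum_max_workers_hstore;   -- should be greater than 0
SHOW autovacuum_max_workers;          -- should be greater than autovacuum_max_workers_hstore

-- Rough skew check for a hash-distributed table (sales_fact is a hypothetical name).
-- Rows are counted per node; a large gap between the highest and lowest counts suggests skew.
SELECT xc_node_id, count(*) AS row_count
FROM sales_fact
GROUP BY xc_node_id
ORDER BY row_count DESC;
```

If the counts differ widely between nodes, a different distribute column is usually worth considering before loading more data.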

Syntax

CREATE [ [ GLOBAL | LOCAL | VOLATILE ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name 
    { ({ column_name data_type [ compress_mode ] [ COLLATE collation ] [ column_constraint [ ... ] ]
        | table_constraint
        | LIKE source_table [ like_option [...] ] }
        [, ... ])|
        LIKE source_table [ like_option [...] ] } 
    [ WITH ( {storage_parameter = value} [, ... ] ) ]
    [ ON COMMIT { PRESERVE ROWS | DELETE ROWS } ]
    [ COMPRESS | NOCOMPRESS ]
    [ DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { HASH ( column_name [,...] ) } } ]
    [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]
    [ COMMENT [=] 'text' ];
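
As an illustration of how the optional clauses combine, the following sketch uses only clauses that appear in the syntax above. The table and column names are hypothetical, and it assumes that your cluster version supports every clause listed.

```sql
-- A temporary table whose rows are cleared at the end of each transaction.
CREATE TEMPORARY TABLE IF NOT EXISTS tmp_orders
(
    order_id  BIGINT NOT NULL,
    amount    NUMERIC(12,2)
)
ON COMMIT DELETE ROWS;

-- A small dimension table that is replicated to all nodes and carries a comment.
CREATE TABLE IF NOT EXISTS region_dim
(
    region_id    INT NOT NULL,
    region_name  VARCHAR(32)
)
DISTRIBUTE BY REPLICATION
COMMENT 'small dimension table';
```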

Table Design Reference

GaussDB(DWS) is compatible with the PostgreSQL ecosystem. Row storage and its B-tree index are similar to those of PostgreSQL. Column storage and its index are self-developed. When creating a table, it is crucial to choose the right storage method, distribution column, partition key, and index. This ensures efficient data access during SQL execution, reducing I/O consumption. The following figure illustrates the process from SQL statement initiation to data acquisition, helping you understand the function of each technical method for performance optimization.

  1. When an SQL statement is executed on a partitioned table, the Partition Column is used to pinpoint the specific partition.
  2. For a hash-distributed table, the Distribute Column is used to quickly identify the data shard where the data resides. In a storage-compute coupled architecture, the data shard is located on a DN; in a storage-compute decoupled architecture, it is located in a bucket.
  3. In row-store mode, B-tree is used to quickly locate the data page. In column-store mode, the min-max index is used to quickly locate the CU data block that may contain relevant data. This index is particularly effective when filtering on the PCK column.
  4. In column-store mode, the system automatically maintains the min-max index for all columns, so no manual index definition is needed. The min-max index is used for coarse filtering: a CU data block that meets the min-max condition does not necessarily contain rows that meet the filter condition. If a bitmap column is defined, the bitmap index can quickly locate the row numbers of matching data in the CU. For ordered CUs, binary search is also used to quickly locate the row numbers.
  5. Column storage supports B-tree and GIN indexes, which can quickly locate the CU and row number of data that meets the conditions. However, because index maintenance costs are high, it is advised to use bitmap indexes instead unless point queries have high performance requirements.
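
To make the access path above concrete, here is a hedged sketch of a column-store table that declares a distribute column, a partition column, and a PCK. All names and partition boundaries are hypothetical, and the PARTIAL CLUSTER KEY and range-partition clauses are assumed to be available for column-store tables in your version.

```sql
CREATE TABLE sales_fact
(
    sale_id    BIGINT        NOT NULL,
    store_id   INT           NOT NULL,
    sale_date  DATE          NOT NULL,
    amount     NUMERIC(12,2),
    PARTIAL CLUSTER KEY (store_id)        -- PCK: keeps min-max filtering effective on store_id
)
WITH (ORIENTATION = COLUMN)
DISTRIBUTE BY HASH (sale_id)              -- distribute column: locates the data shard
PARTITION BY RANGE (sale_date)            -- partition column: enables partition pruning
(
    PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION p2024 VALUES LESS THAN ('2025-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- The filter on sale_date can prune partitions; the filter on store_id can use min-max filtering.
SELECT sum(amount)
FROM sales_fact
WHERE sale_date >= '2024-03-01' AND sale_date < '2024-04-01'
  AND store_id = 42;
```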

The following table lists the existing optimization methods of GaussDB(DWS).

Table 1 Optimization methods

Each entry lists the following fields: No., Method, Usage, Example SQL, and Modifiable After Creation.

No. 1
Method: String
Usage:
  1. The string type performs more slowly than fixed-length types, so it is not recommended where a fixed-length type is more suitable.
  2. If the specified length is less than 16, performance is significantly improved.
Example SQL: -
Modifiable After Creation: Yes (The existing data can be rewritten.)

No. 2
Method: Numeric
Usage: Specifying precision for the numeric type is essential for improving performance. It is not advisable to use the numeric type without specifying precision.
Example SQL: -
Modifiable After Creation: Yes (The existing data can be rewritten.)

No. 3
Method: Partition by Column
Usage:
  1. This requires user-defined settings and is designed for partitioned tables. Partition keys enable pruning, and partition-wise joins are supported. This method is suitable for equality and range queries.
  2. Having more than 1000 partitions is not recommended, and it is advisable to limit the number of partition columns to two.
Example SQL: SELECT * FROM t1 WHERE t1.c1='p1';
Modifiable After Creation: No (You need to create a new table to make modifications.)

No. 4
Method: secondary_part_column
Usage:
  1. This requires user-defined settings and is applicable only to column-store tables and equality queries.
  2. Specify the level-2 partition on the most commonly used equality filter column.
Example SQL: SELECT * FROM t1 WHERE t1.c1='p1';
Modifiable After Creation: No (You need to create a new table to make modifications.)

No. 5
Method: Distribute by Column
Usage: This requires user-defined settings and is suitable for columns that are frequently used in GROUP BY or as join fields in multi-table joins. It reduces data shuffling through local joins and is ideal for equality queries.
Example SQL: SELECT * FROM t1 join t2 on t1.c3 = t2.c1;
Modifiable After Creation: No (You need to create a new table to make modifications.)

No. 6
Method: Bitmap column
Usage: Define the bitmap index (cardinality ≤ 32) or bloom filter (cardinality > 32) based on the repeated values in the CU. This method is applicable to equality queries on varchar or text columns. It is advised to create indexes on columns involved in the WHERE condition.
Example SQL: SELECT * FROM t1 WHERE t1.c4='hello';
Modifiable After Creation: Yes (Modification does not rewrite existing data. Only the new data is affected.)

No. 7
Method: min-max index
Usage:
  1. The min-max index is automatically generated and can be used for both equality and range queries.
  2. The min-max filtering effect depends on the data order. Specifying the PCK column enhances the filtering effect.
Example SQL: SELECT * FROM t1 WHERE c3 > 100 and c3 < 200;
Modifiable After Creation: Yes (The PCK columns can be modified. Modification does not rewrite existing data and only the new data is affected.)

No. 8
Method: Primary key (B-tree index)
Usage:
  1. This requires user-defined settings, and UPSERT data import strongly depends on the primary key. It is applicable to equality and range queries. We suggest limiting the number of columns to five or fewer.
  2. If service requirements are met, it is better to use fixed-length type columns. During definition, place columns with more distinct values at the beginning.
Example SQL: SELECT * FROM t1 WHERE c3 > 100 and c3 < 200;
Modifiable After Creation: Yes (The index can be modified and re-created.)

No. 9
Method: GIN index
Usage:
  1. This requires user-defined settings and is suitable for multi-condition equality queries. Avoid using columns with more than 1 million distinct values.
  2. It is recommended when the data volume after filtering is less than 1000. If the data volume remains large after filtering, it is not recommended.
Example SQL: SELECT * FROM t1 WHERE c1 = 100 and c3 = 200 and c2 = 105;
Modifiable After Creation: Yes (The index can be modified and re-created.)

No. 10
Method: Orientation=column/row
Usage: This method specifies whether a table is stored in rows or columns. Row-store tables cannot be compressed and are best suited for point queries and frequent updates. Column-store tables can be compressed and are ideal for analysis purposes.
Example SQL: -
Modifiable After Creation: No (You need to create a new table to make modifications.)
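
The String and Numeric entries in Table 1 carry no example SQL. A minimal sketch of what they recommend, with hypothetical table and column names, might look like this:

```sql
CREATE TABLE account_balance
(
    account_id   BIGINT        NOT NULL,
    status_code  SMALLINT,                -- a code stored in a fixed-length type instead of a string
    short_tag    VARCHAR(12),             -- string length kept below 16
    balance      NUMERIC(18,2)            -- precision and scale specified explicitly
)
DISTRIBUTE BY HASH (account_id);
```

The specific lengths and precision are illustrative only; choose them from the actual value range of the data.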

Parameters

Examples

Create a V3 table with decoupled storage and compute (supported only in the storage-compute decoupling 3.0 version):

CREATE TABLE public.t1
(
    id    integer NOT NULL,
    data  integer,
    age   integer
)
WITH (ORIENTATION = COLUMN, COLVERSION = 3.0)
DISTRIBUTE BY ROUNDROBIN;

Specify the cache policy when creating a table (supported only in clusters of the storage-compute decoupling 3.0 version).

CREATE TABLE Sports
(
    N_NATIONKEY  INT NOT NULL
  , N_NAME       CHAR(25) NOT NULL
  , N_REGIONKEY  INT NOT NULL
  , N_COMMENT    VARCHAR(152)
) WITH (orientation = column, colversion = 3.0, cache_policy = 'HPL: Balls, Basketball')
tablespace cu_obs_tbs
DISTRIBUTE BY ROUNDROBIN
partition by list(N_NAME)
(
  partition Balls values ('Basketball', 'football', 'badminton'),
  partition Athletics values ('High jump', 'long jump', 'javelin'),
  partition Water_Sports values ('Surfing', 'diving', 'swimming'),
  partition Shooting values ('air guns', 'Rifles', 'archery'),
  partition rest values (DEFAULT)
);

Define a primary key column constraint for the table:

CREATE TABLE CUSTOMER
(    
    C_CUSTKEY     BIGINT NOT NULL CONSTRAINT C_CUSTKEY_pk PRIMARY KEY  , 
    C_NAME        VARCHAR(25)  , 
    C_ADDRESS     VARCHAR(40)  , 
    C_NATIONKEY   INT          , 
    C_PHONE       CHAR(15)     , 
    C_ACCTBAL     DECIMAL(15,2)  
)
DISTRIBUTE BY HASH(C_CUSTKEY);

Define a primary key table constraint for the table. You can define a primary key table constraint on one or more columns of a table:

CREATE TABLE CUSTOMER
(    
    C_CUSTKEY     BIGINT       , 
    C_NAME        VARCHAR(25)  , 
    C_ADDRESS     VARCHAR(40)  , 
    C_NATIONKEY   INT          , 
    C_PHONE       CHAR(15)     , 
    C_ACCTBAL     DECIMAL(15,2)   , 
    CONSTRAINT C_CUSTKEY_KEY PRIMARY KEY(C_CUSTKEY,C_NAME)
)
DISTRIBUTE BY HASH(C_CUSTKEY,C_NAME);

Define the CHECK column constraint:

CREATE TABLE CUSTOMER
(    
    C_CUSTKEY     BIGINT NOT NULL CONSTRAINT C_CUSTKEY_pk PRIMARY KEY  , 
    C_NAME        VARCHAR(25)  , 
    C_ADDRESS     VARCHAR(40)  , 
    C_NATIONKEY   INT NOT NULL  CHECK (C_NATIONKEY > 0)  
)
DISTRIBUTE BY HASH(C_CUSTKEY);
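
A quick way to see the column constraint in action, assuming the CUSTOMER table was created exactly as shown above (the inserted values are illustrative only):

```sql
-- Accepted: C_NATIONKEY satisfies CHECK (C_NATIONKEY > 0).
INSERT INTO CUSTOMER (C_CUSTKEY, C_NAME, C_NATIONKEY) VALUES (1, 'Alice', 7);

-- Expected to be rejected with a check-constraint violation,
-- because C_NATIONKEY is not greater than 0.
INSERT INTO CUSTOMER (C_CUSTKEY, C_NAME, C_NATIONKEY) VALUES (2, 'Bob', 0);
```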

Define the CHECK table constraint:

CREATE TABLE CUSTOMER
(    
    C_CUSTKEY     BIGINT NOT NULL CONSTRAINT C_CUSTKEY_pk PRIMARY KEY  , 
    C_NAME        VARCHAR(25)      , 
    C_ADDRESS     VARCHAR(40)      , 
    C_NATIONKEY   INT              , 
    CONSTRAINT C_CUSTKEY_KEY2 CHECK(C_CUSTKEY > 0 AND C_NAME <> '')
)
DISTRIBUTE BY HASH(C_CUSTKEY);

Create a column-store table and specify the storage format and compression mode:

CREATE TABLE customer_address
(
    ca_address_sk       INTEGER                  NOT NULL   ,
    ca_address_id       CHARACTER(16)            NOT NULL   ,
    ca_street_number    CHARACTER(10)                       ,
    ca_street_name      CHARACTER varying(60)               ,
    ca_street_type      CHARACTER(15)                       ,
    ca_suite_number     CHARACTER(10)                    
)
WITH (ORIENTATION = COLUMN, COMPRESSION=HIGH,COLVERSION=2.0)
DISTRIBUTE BY HASH (ca_address_sk);

Use DEFAULT to declare a default value for column W_STATE:

CREATE TABLE warehouse_t
(
    W_WAREHOUSE_SK            INTEGER                NOT NULL,
    W_WAREHOUSE_ID            CHAR(16)               NOT NULL,
    W_WAREHOUSE_NAME          VARCHAR(20)   UNIQUE DEFERRABLE,
    W_WAREHOUSE_SQ_FT         INTEGER                        ,
    W_COUNTY                  VARCHAR(30)                    ,
    W_STATE                   CHAR(2)            DEFAULT 'GA',
    W_ZIP                     CHAR(10)                       
);
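
To check that the default takes effect, a row can be inserted without W_STATE and then read back. This is a sketch that assumes warehouse_t was created as shown above; the inserted values are illustrative only.

```sql
-- W_STATE is omitted, so the declared default is applied.
INSERT INTO warehouse_t (W_WAREHOUSE_SK, W_WAREHOUSE_ID) VALUES (1, 'WH0001');

-- W_STATE should come back as the default declared in the table definition.
SELECT W_WAREHOUSE_SK, W_STATE FROM warehouse_t WHERE W_WAREHOUSE_SK = 1;
```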

Create the CUSTOMER_bk table in LIKE mode:

CREATE TABLE CUSTOMER_bk (LIKE CUSTOMER INCLUDING ALL);

Helpful Links

ALTER TABLE, RENAME TABLE, and DROP TABLE