Hot and Cold Data Management

Introduction to Hot and Cold Data

In large-scale big data scenarios, storage consumption grows as services and data volumes grow. Because demand for data varies over time, data is managed in tiers. This improves data analysis performance and reduces service costs.

For example, users of a network traffic analysis system may care about security events and network access records from the last month, but they seldom look at data generated several months earlier. In such scenarios, data can be classified into hot data and cold data by time period.

Hot and cold data are classified by how often the data is accessed and updated.

You can define hot and cold tables so that cold data meeting the specified rules is moved to OBS (Object Storage Service) for storage. Hot and cold data are identified and migrated automatically, one partition at a time.

Hot and Cold Data Migration

When data is inserted into GaussDB(DWS) column-store tables, it is first stored in hot partitions. As data accumulates, you can migrate cold data to OBS manually or automatically. The metadata, description tables, and indexes of the migrated cold data remain stored locally to preserve read performance.

Cold/Hot Switchover Policies

Currently, partitions can be switched from hot to cold under two policies: LMT (Last Modify Time) and HPN (Hot Partition Number). Under LMT, the switchover depends on when a partition was last updated. Under HPN, it depends on the number of hot partitions to retain.
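The following Python sketch models how each policy might classify a table's partitions. It is a simplified illustration, not the GaussDB(DWS) implementation. The `Partition` class and the `classify_lmt` and `classify_hpn` helpers are hypothetical. Two further assumptions apply: under LMT, a partition counts as cold once its last modification is older than the cutoff; under HPN, the hot partitions are the N newest ones in partition order.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Partition:
    # Hypothetical model of a range partition: its name, its position in
    # partition order (higher = newer range), and its last modification time.
    name: str
    order: int
    last_modified: datetime


def classify_lmt(partitions, days, now):
    """LMT sketch (assumption): a partition not modified within `days` days is cold."""
    cutoff = now - timedelta(days=days)
    return {p.name: ("cold" if p.last_modified < cutoff else "hot") for p in partitions}


def classify_hpn(partitions, hot_count):
    """HPN sketch (assumption): the `hot_count` newest partitions stay hot; the rest are cold."""
    newest_first = sorted(partitions, key=lambda p: p.order, reverse=True)
    hot = {p.name for p in newest_first[:hot_count]}
    return {p.name: ("hot" if p.name in hot else "cold") for p in partitions}


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    parts = [
        Partition("P1", 1, now - timedelta(days=200)),
        Partition("P2", 2, now - timedelta(days=120)),
        Partition("P3", 3, now - timedelta(days=30)),
        Partition("P8", 4, now - timedelta(days=1)),
    ]
    print(classify_lmt(parts, 100, now))  # {'P1': 'cold', 'P2': 'cold', 'P3': 'hot', 'P8': 'hot'}
    print(classify_hpn(parts, 2))         # {'P1': 'cold', 'P2': 'cold', 'P3': 'hot', 'P8': 'hot'}
```

Both policies produce the same split here, but in this model they diverge as data ages. Under LMT, a partition becomes cold only after writes to it stop. Under HPN, each new partition pushes the oldest hot partition out, regardless of when that partition was last written.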

Hot and cold data management supports the following functions:

Restrictions on Hot and Cold Data Management

Examples

  1. Create a column-store hot and cold table and set the hot data validity period (LMT) to 100 days.
    -- storage_policy = 'LMT:100': partitions whose data has not been modified for 100 days are treated as cold.
    CREATE TABLE lifecycle_table (i int, val text)
    WITH (ORIENTATION = COLUMN, storage_policy = 'LMT:100')
    PARTITION BY RANGE (i)
    (
        PARTITION P1 VALUES LESS THAN (5),
        PARTITION P2 VALUES LESS THAN (10),
        PARTITION P3 VALUES LESS THAN (15),
        PARTITION P8 VALUES LESS THAN (MAXVALUE)
    )
    ENABLE ROW MOVEMENT;
    
  2. Switch cold data to the OBS tablespace.
    • Automatic switchover: The scheduler automatically triggers the switchover at 00:00 every day.

      You can customize the automatic switchover time. For example, to change it to 06:30 every morning, run:

      SELECT * FROM pg_obs_cold_refresh_time('lifecycle_table', '06:30:00');
      
    • Manual switchover

      To manually switch over a single table, run:

      ALTER TABLE lifecycle_table REFRESH STORAGE;
      

      To switch over all hot and cold tables in a batch, run:

      SELECT pg_catalog.pg_refresh_storage();
      
  3. View data distribution in hot and cold tables.

    View the data distribution in a single table:

    SELECT * FROM pg_catalog.pg_lifecycle_table_data_distribute('lifecycle_table');
    

    View the data distribution of all hot and cold tables:

    SELECT * FROM pg_catalog.pg_lifecycle_node_data_distribute();
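The automatic switchover in step 2 runs once a day at a configured time. The default is 00:00, and you can set a custom time such as 06:30:00 with `pg_obs_cold_refresh_time`. The Python sketch below computes when the next automatic run would occur for a given configured time. The `next_switchover` helper is hypothetical and uses naive datetimes with no time-zone handling. It only illustrates the daily schedule, not how the GaussDB(DWS) scheduler is implemented.

```python
from datetime import datetime, time, timedelta


def next_switchover(now: datetime, refresh_at: str = "00:00:00") -> datetime:
    """Hypothetical helper: next daily trigger at or after `now` for an HH:MM:SS time."""
    hh, mm, ss = (int(x) for x in refresh_at.split(":"))
    candidate = datetime.combine(now.date(), time(hh, mm, ss))
    # If today's slot has already passed, the next run is tomorrow at the same time.
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 8, 0, 0)
    print(next_switchover(now))              # 2024-06-02 00:00:00 (default schedule)
    print(next_switchover(now, "06:30:00"))  # 2024-06-02 06:30:00 (customized schedule)
```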