Impact of rule violation:
An unsuitable distribution method or distribution column can cause storage skew, degrade access performance, and even overload storage and compute resources.
Solution:
| Distribution Method | Description | Scenario |
|---|---|---|
| Hash | Table data is distributed to each DN based on the mapping between hash values generated by distribution columns and DNs. | Large tables and fact tables |
| RoundRobin | Table data is distributed to DNs in polling mode. | Large tables, fact tables, and tables without proper distribution columns |
| Replication | Full data in a table is copied to each DN in the cluster. | Small tables and dimension tables |
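The effect of the distribution method and column choice on storage skew can be illustrated with a small simulation. This is a minimal sketch, not the database's actual algorithm: the cluster size `NUM_DNS`, the sample table, the column names `order_id` and `status`, and the use of MD5 as the hash function are all assumptions made for illustration.

```python
import hashlib

NUM_DNS = 4  # hypothetical number of DNs in the cluster


def hash_distribute(rows, column, num_dns=NUM_DNS):
    """Count rows per DN when the DN is derived from a hash of one column."""
    counts = [0] * num_dns
    for row in rows:
        digest = hashlib.md5(str(row[column]).encode()).digest()
        # MD5 is only a stand-in for the database's internal hash function.
        dn = int.from_bytes(digest[:4], "big") % num_dns
        counts[dn] += 1
    return counts


def round_robin_distribute(rows, num_dns=NUM_DNS):
    """Count rows per DN when rows are handed to the DNs in turn."""
    counts = [0] * num_dns
    for index, _ in enumerate(rows):
        counts[index % num_dns] += 1
    return counts


def replicate(rows, num_dns=NUM_DNS):
    """Every DN stores a full copy of the table."""
    return [len(rows)] * num_dns


def skew(counts):
    """Fullest DN divided by the average DN; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# Hypothetical fact table: order_id is unique, status has only two values.
rows = [
    {"order_id": i, "status": "open" if i % 100 else "closed"}
    for i in range(10_000)
]

by_order_id = hash_distribute(rows, "order_id")  # high-cardinality column
by_status = hash_distribute(rows, "status")      # low-cardinality column
round_robin = round_robin_distribute(rows)
replicated = replicate(rows)

print("hash(order_id):", by_order_id, "skew", round(skew(by_order_id), 2))
print("hash(status):  ", by_status, "skew", round(skew(by_status), 2))
print("round robin:   ", round_robin, "skew", round(skew(round_robin), 2))
print("replication:   ", replicated, "total rows stored", sum(replicated))
```

In this sketch, hashing on the two-valued `status` column places nearly all rows on one or two DNs, while hashing on `order_id` or using round robin spreads them evenly. Replication stores the full table on every DN, which is why the table above recommends it only for small tables and dimension tables.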
Impact of rule violation:
Row-store tables are not properly used. As a result, the query performance is poor and resources are overloaded.
Solution:
| Storage Type | Applicable Scenario | Inapplicable Scenario |
|---|---|---|
| Row storage | | DML query: statistical analysis query (with mass data involved in GROUP and JOIN processes) |
| Column storage | | |

CAUTION: When creating a row-store table (orientation is set to row), do not specify the compress attribute or use a row-store compressed table.
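Why statistical analysis queries fit row storage poorly can be seen by counting how many values each layout touches to aggregate a single column. The following is a simplified sketch under stated assumptions: the table, its column names, and the in-memory layouts are hypothetical and ignore pages, compression, and indexes.

```python
COLUMNS = ("id", "customer", "region", "amount")  # hypothetical table columns

# Sample data: 1,000 rows with four columns each.
rows = [(i, f"c{i % 50}", f"r{i % 5}", float(i % 7)) for i in range(1_000)]

# Row layout: all values of one row are stored together.
row_store = rows

# Column layout: all values of one column are stored together.
column_store = {
    name: [row[position] for row in rows]
    for position, name in enumerate(COLUMNS)
}

amount_position = COLUMNS.index("amount")

# SUM(amount) on the row layout walks every row, so every column of every
# row is read even though only one column is needed.
row_sum = sum(row[amount_position] for row in row_store)
row_values_touched = len(row_store) * len(COLUMNS)

# SUM(amount) on the column layout reads only the "amount" column.
column_sum = sum(column_store["amount"])
column_values_touched = len(column_store["amount"])

print("row layout:   ", row_sum, "values touched:", row_values_touched)
print("column layout:", column_sum, "values touched:", column_values_touched)
```

Both layouts return the same sum, but in this sketch the row layout touches four times as many values, and the gap grows with the number of columns in the table.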
Impact of rule violation:
Without partitioning, query performance and data governance efficiency will deteriorate. The larger the data volume, the greater the deterioration. The advantages of partitioning include:
Solution:
| Partitioning Policy | Description | Scenario |
|---|---|---|
| Range partitioning | Data is stored in different partitions based on the range of partition key values. The partition key ranges are contiguous and do not overlap. | |
| List partitioning | Partitioning is performed based on a unique list of partition key values. | |
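The two partitioning policies differ in how a partition key value is mapped to a partition. The sketch below illustrates that mapping only. The partition names, boundaries, and key values are hypothetical, and the range boundaries are assumed to be exclusive upper bounds.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical range partitions: (partition name, exclusive upper bound).
# The ranges are contiguous and do not overlap.
RANGE_PARTITIONS = [
    ("p2023", date(2024, 1, 1)),
    ("p2024", date(2025, 1, 1)),
    ("p2025", date(2026, 1, 1)),
]

# Hypothetical list partitions: each key value belongs to exactly one list.
LIST_PARTITIONS = {
    "p_asia": {"CN", "JP"},
    "p_europe": {"DE", "FR"},
}


def range_partition(key):
    """Return the first partition whose upper bound is greater than the key."""
    bounds = [upper for _, upper in RANGE_PARTITIONS]
    index = bisect_right(bounds, key)
    if index == len(RANGE_PARTITIONS):
        raise ValueError(f"no range partition accepts {key}")
    return RANGE_PARTITIONS[index][0]


def list_partition(key):
    """Return the partition whose value list contains the key."""
    for name, values in LIST_PARTITIONS.items():
        if key in values:
            return name
    raise ValueError(f"no list partition accepts {key!r}")


print(range_partition(date(2023, 6, 1)))   # falls below the first bound
print(range_partition(date(2024, 1, 1)))   # a bound belongs to the next range
print(list_partition("DE"))
```

Because each key maps to exactly one partition, a query that filters on the partition key only needs to scan the partitions whose range or list can contain the requested values.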
Impact of rule violation:
Solution:
Impact of rule violation:
When auto-increment sequences or data types are heavily used, the GTM may become overloaded and slow down sequence generation.
Solution: