doc-exports/docs/dws/dev/dws_04_0438.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00


<a name="EN-US_TOPIC_0000001188642066"></a>
<h1 class="topictitle1">Reviewing and Modifying a Table Definition</h1>
<div id="body1536646420232"><p id="EN-US_TOPIC_0000001188642066__aaecf81fc3e5c4bb0994e03744ac40898">In a distributed architecture, table data is spread across DNs, and the data of one or more DNs is stored on a single physical storage device. To define a table properly, you must:</p>
<ol id="EN-US_TOPIC_0000001188642066__o150a9d6062844a98be2847fc15639453"><li id="EN-US_TOPIC_0000001188642066__le17aea1509e741ae98ededeecb7518ca"><strong id="EN-US_TOPIC_0000001188642066__b842352706204622">Evenly distribute data on each DN</strong> so that a shortage of space on the storage device of one DN does not reduce the available capacity of the whole cluster. Specifically, select a proper distribution key to avoid data skew.</li><li id="EN-US_TOPIC_0000001188642066__l6bd2cce7739545cdbb5aab0b065838d9"><strong id="EN-US_TOPIC_0000001188642066__b84235270621021">Evenly assign table scanning tasks on each DN</strong> so that no single DN is overloaded by table scanning tasks. Specifically, do not select a column that is used in equality filters on the base table as the distribution key.</li><li id="EN-US_TOPIC_0000001188642066__leab523f5a3b54591bbf3d50b9b42209e"><strong id="EN-US_TOPIC_0000001188642066__en-us_topic_0076211989_en-us_topic_0071158044_b976457">Reduce the data volume scanned</strong> by using the partition pruning mechanism.</li><li id="EN-US_TOPIC_0000001188642066__lc8342b3efbaa40f3b65bfb2546e5bd84"><strong id="EN-US_TOPIC_0000001188642066__b842352706211330">Avoid random I/O</strong> by using clustering or partial clustering.</li><li id="EN-US_TOPIC_0000001188642066__ldf22f2ef03284d2aab0b01714e3504be"><strong id="EN-US_TOPIC_0000001188642066__b84235270621167">Avoid data shuffling</strong>, which reduces network pressure, by selecting the <strong id="EN-US_TOPIC_0000001188642066__b12816112055414">join-condition</strong> column or <strong id="EN-US_TOPIC_0000001188642066__b126748165542">group by</strong> column as the distribution column.</li></ol>
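<p>The guidelines above can be sketched in a single column-store table definition. This is a minimal illustration, not a recommended schema: the table <strong>sales_fact</strong>, its columns, and the partition boundaries are hypothetical. It assumes that <strong>customer_id</strong> has many distinct values, is the usual join and group by column, and is rarely filtered by a single constant, and that queries usually filter on a range of <strong>sale_date</strong>.</p>
<pre class="screen">-- Hypothetical fact table used only to illustrate the guidelines.
CREATE TABLE sales_fact
(
    order_id     BIGINT         NOT NULL,
    customer_id  INTEGER        NOT NULL,
    sale_date    DATE           NOT NULL,
    amount       DECIMAL(12,2),
    -- Partial clustering on the range-filter column to reduce random I/O.
    PARTIAL CLUSTER KEY (sale_date)
)
WITH (ORIENTATION = COLUMN)
-- High-cardinality join/group by column: even data distribution, less shuffling.
DISTRIBUTE BY HASH (customer_id)
-- Range partitions let the optimizer prune partitions outside the filtered dates.
PARTITION BY RANGE (sale_date)
(
    PARTITION p2023 VALUES LESS THAN ('2024-01-01'),
    PARTITION p2024 VALUES LESS THAN ('2025-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);</pre>
<p>With this definition, a query that filters on a <strong>sale_date</strong> range and joins on <strong>customer_id</strong> would be expected to scan only the matching partitions and to join without redistributing <strong>sales_fact</strong>, provided the other table is distributed on the same column.</p>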
<p id="EN-US_TOPIC_0000001188642066__a14e12065f8794bc38c23d9ac7a665315">The distribution column is the core of a table definition. The following figure shows the procedure for defining a table. The table definition is created during database design and is reviewed and modified during SQL statement optimization.</p>
<div class="fignone" id="EN-US_TOPIC_0000001188642066__f517dfe25e4a54b85bbd72aa421728ea4"><span class="figcap"><b>Figure 1 </b>Procedure for defining a table</span><br><span><img class="vsd" id="EN-US_TOPIC_0000001188642066__ida852153e8d642549ca959ced0ec7e62" src="figure/en-us_image_0000001188642240.png"></span></div>
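<p>When a table definition is reviewed, a common first check is whether the rows are spread evenly across DNs. The queries below are a sketch of such a check. The table name <strong>sales_fact</strong> is hypothetical, and the second query assumes that the <strong>table_skewness</strong> function is available in your cluster version.</p>
<pre class="screen">-- Count the rows stored on each DN; xc_node_id identifies the DN that holds a row.
SELECT xc_node_id, count(1) AS row_count
FROM sales_fact
GROUP BY xc_node_id
ORDER BY row_count DESC;

-- Alternative, assuming table_skewness() is available: per-DN row counts and ratios.
SELECT table_skewness('sales_fact');</pre>
<p>If one DN holds noticeably more rows than the others, the distribution column is a candidate for modification, for example by re-creating the table with a different <strong>DISTRIBUTE BY HASH</strong> column.</p>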
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0437.html">Reviewing and Modifying a Table Definition</a></div>
</div>
</div>