Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Lu, Huayi <luhuayi@huawei.com> Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
21 lines
3.6 KiB
HTML
<a name="EN-US_TOPIC_0000001629070625"></a><a name="EN-US_TOPIC_0000001629070625"></a>
<h1 class="topictitle1">Reviewing and Modifying a Table Definition</h1>
<div id="body32001227"><p id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_aaecf81fc3e5c4bb0994e03744ac40898">In a distributed architecture, table data is spread across DNs, and the data of one or more DNs is stored on a single physical storage device. To define a table properly, you must:</p>
<ol id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_o150a9d6062844a98be2847fc15639453"><li id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_le17aea1509e741ae98ededeecb7518ca"><strong id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_b842352706204622">Evenly distribute data across DNs</strong> so that a full storage device associated with one DN does not reduce the available capacity of the whole cluster. Specifically, select a proper distribution key to avoid data skew.</li><li id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_l6bd2cce7739545cdbb5aab0b065838d9"><strong id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_b84235270621021">Evenly assign table scanning tasks to each DN</strong> so that no single DN is overloaded by scanning. Specifically, do not select a column that appears in an equality filter on a base table as the distribution key.</li><li id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_leab523f5a3b54591bbf3d50b9b42209e"><strong id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_en-us_topic_0076211989_en-us_topic_0071158044_b976457">Reduce the data volume scanned</strong> by using the partition pruning mechanism.</li><li id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_lc8342b3efbaa40f3b65bfb2546e5bd84"><strong id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_b842352706211330">Avoid random I/O</strong> by using clustering or partial clustering.</li><li id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_ldf22f2ef03284d2aab0b01714e3504be"><strong id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_b84235270621167">Avoid data shuffling</strong>, and thereby reduce network pressure, by selecting the <strong id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_b12816112055414">join-condition</strong> column or <strong id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_b126748165542">group by</strong> column as the distribution column.</li></ol>
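<p>The effect of the distribution key on data skew and on data shuffling can be illustrated with a small simulation. The following Python sketch is only an approximation under stated assumptions: the four-DN cluster, the MD5-based hash, the <code>orders</code> rows, and the helper names (<code>dn_for</code>, <code>rows_per_dn</code>, <code>skew_ratio</code>, <code>rows_to_shuffle</code>) are all hypothetical and do not reflect how the product actually hashes or places rows.</p>

```python
import hashlib
from collections import Counter

NUM_DNS = 4  # assumption: a hypothetical cluster with four DNs


def dn_for(value, num_dns=NUM_DNS):
    """Map a distribution-key value to a DN index with a stable hash.

    MD5 is used only because it is deterministic across runs; it is an
    assumption of this sketch, not the database's real hash function.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_dns


def rows_per_dn(rows, key, num_dns=NUM_DNS):
    """Count how many rows land on each DN when distributing by `key`."""
    counts = Counter(dn_for(row[key], num_dns) for row in rows)
    return [counts.get(dn, 0) for dn in range(num_dns)]


def skew_ratio(counts):
    """Largest DN share relative to a perfectly even share (1.0 is ideal)."""
    return max(counts) * len(counts) / sum(counts)


def rows_to_shuffle(rows, dist_key, join_key, num_dns=NUM_DNS):
    """Count rows stored on a different DN than a join on `join_key` needs."""
    return sum(
        1
        for row in rows
        if dn_for(row[dist_key], num_dns) != dn_for(row[join_key], num_dns)
    )


# Hypothetical table: order_id is unique, status has only two distinct values.
orders = [
    {"order_id": i, "status": "open" if i % 10 == 0 else "closed"}
    for i in range(10000)
]

by_id = rows_per_dn(orders, "order_id")
by_status = rows_per_dn(orders, "status")

print("distributed by order_id:", by_id, "skew", round(skew_ratio(by_id), 2))
print("distributed by status:  ", by_status, "skew", round(skew_ratio(by_status), 2))
print("rows moved for a join on order_id when distributed by status:",
      rows_to_shuffle(orders, "status", "order_id"))
print("rows moved for a join on order_id when distributed by order_id:",
      rows_to_shuffle(orders, "order_id", "order_id"))
```

<p>In this sketch, the high-cardinality <code>order_id</code> column spreads rows almost evenly, whereas the two-valued <code>status</code> column leaves at least two of the four DNs empty. Distributing by the join column also means that no row has to move for the join, which is the motivation for choosing the join-condition column as the distribution column.</p>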
<p id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_a14e12065f8794bc38c23d9ac7a665315">The distribution column is the core of a table definition. The following figure shows the procedure for defining a table. A table definition is created during database design and is reviewed and modified during SQL statement optimization.</p>
<div class="fignone" id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_f517dfe25e4a54b85bbd72aa421728ea4"><span class="figcap"><b>Figure 1 </b>Procedure of defining a table</span><br><span><img class="imgResize" id="EN-US_TOPIC_0000001629070625__en-us_topic_0000001188642066_ida852153e8d642549ca959ced0ec7e62" src="figure/en-us_image_0000001188642240.png" width="494.76000000000005" height="264.43725" title="Click to enlarge"></span></div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0437.html">Reviewing and Modifying a Table Definition</a></div>
</div>
</div>
<script language="JavaScript">
<!--
image_size('.imgResize');
var msg_imageMax = "view original image";
var msg_imageClose = "close";
//--></script>