Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_1992"></a>
<h1 class="topictitle1">Optimizing Memory when Data Is Inserted into Dynamic Partitioned Tables</h1>
<div id="body1595920218639"><div class="section" id="mrs_01_1992__se7cd8e362d444eefb02fe601787937f4"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1992__a085e5d75807347a688f52b015841d1b2">When Spark SQL inserts data into dynamically partitioned tables, the more partitions there are, the more HDFS files a single task generates and the more memory the file metadata occupies. As a result, garbage collection (GC) pressure becomes severe and out-of-memory (OOM) errors may occur.</p>
<p id="mrs_01_1992__a3f499fb95639456b83046e625b48a755">Assume there are 10240 tasks and 2000 partitions. Before the HDFS files are renamed from the temporary directory to the target directory, their FileStatus metadata occupies about 29 GB.</p>
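<p>The scale of these figures can be checked with a short calculation. The sketch below assumes the worst case, in which every task writes one file to every partition, and reads "29 GB" as 29 * 10**9 bytes; neither assumption is stated in this topic, so the per-file figure is only a rough estimate.</p>

```python
# Worst-case file count for the scenario above.
# Assumption: every task writes one file into every partition.
tasks = 10240
partitions = 2000
files = tasks * partitions  # one file per (task, partition) pair

# Assumption: "29 GB" is taken as 29 * 10**9 bytes (the topic does not say GB or GiB).
metadata_bytes = 29 * 10**9
bytes_per_file = metadata_bytes / files

print(f"{files:,} files, roughly {bytes_per_file:.0f} bytes of FileStatus metadata each")
```

<p>Under these assumptions the job produces 20,480,000 temporary files, which is why the metadata alone can reach tens of gigabytes.</p>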
</div>
<div class="section" id="mrs_01_1992__sca4d708db25446e5ad0379233f155ef8"><h4 class="sectiontitle">Procedure</h4><p id="mrs_01_1992__a3a642af243c0494485740880255e48e0">Append a <strong id="mrs_01_1992__a6530e4465088470286bb7b30ceece126">distribute by</strong> clause, followed by the partition fields, to the dynamic partition insert statement. Rows with the same partition value are then sent to the same task, so each task writes files for fewer partitions.</p>
<p id="mrs_01_1992__a7ce82dc1925b4d61bf5d0450f6ebc395">For example:</p>
<p id="mrs_01_1992__ac5947be3a8f048e69945b2969257d791">insert into table store_returns partition (sr_returned_date_sk) select sr_return_time_sk,sr_item_sk,sr_customer_sk,sr_cdemo_sk,sr_hdemo_sk,sr_addr_sk,sr_store_sk,sr_reason_sk,sr_ticket_number,sr_return_quantity,sr_return_amt,sr_return_tax,sr_return_amt_inc_tax,sr_fee,sr_return_ship_cost,sr_refunded_cash,sr_reversed_charge,sr_store_credit,sr_net_loss,sr_returned_date_sk from ${SOURCE}.store_returns <strong id="mrs_01_1992__a8aa6b16af38e4e5da3abd919ca5900ed">distribute by sr_returned_date_sk</strong>;</p>
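<p>The effect of the clause on the file count can be illustrated with a toy model. The following sketch is not Spark code: <code>count_files</code>, the task count, and the partition count are hypothetical names and values chosen for illustration, and routing by <code>hash(key) % num_tasks</code> only approximates how rows are shuffled by the partition field.</p>

```python
def count_files(rows, num_tasks, distribute_by_key):
    """Count distinct (task, partition) pairs, i.e. output files, in a toy insert."""
    files = set()
    for i, partition_key in enumerate(rows):
        if distribute_by_key:
            # Mimics "distribute by": rows with the same key go to the same task.
            task = hash(partition_key) % num_tasks
        else:
            # No shuffle on the key: rows reach tasks regardless of their partition.
            task = i % num_tasks
        files.add((task, partition_key))
    return len(files)


# Hypothetical sizes, far smaller than the 10240 tasks and 2000 partitions above.
num_tasks, num_partitions = 8, 20

# Build rows so that, without the clause, every task sees every partition key.
rows = [(i // num_tasks) % num_partitions for i in range(num_tasks * num_partitions)]

files_without = count_files(rows, num_tasks, distribute_by_key=False)
files_with = count_files(rows, num_tasks, distribute_by_key=True)

print("without distribute by:", files_without)  # 160 = tasks * partitions
print("with distribute by:", files_with)        # 20 = one file per partition
```

<p>In this model the file count drops from tasks * partitions to the number of partitions, which is the reduction in FileStatus metadata that the procedure aims for.</p>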
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1985.html">Spark SQL and DataFrame Tuning</a></div>
</div>
</div>