doc-exports/docs/dws/dev/dws_04_0465.html
Lu, Huayi e6fa411af0 DWS DEV 830.201 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2024-05-16 07:24:04 +00:00

<a name="EN-US_TOPIC_0000001188163750"></a>
<h1 class="topictitle1">Plan Hint Cases</h1>
<div id="body1534471018691"><p id="EN-US_TOPIC_0000001188163750__p8060118">This section uses TPC-DS Q24 as an example to describe how to optimize an execution plan with hints in a 1000X+24DN environment. The statement is as follows:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188163750__screen9193617117"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="n">c_last_name</span>
<span class="p">,</span><span class="n">c_first_name</span>
<span class="p">,</span><span class="n">s_store_name</span>
<span class="p">,</span><span class="n">ca_state</span>
<span class="p">,</span><span class="n">s_state</span>
<span class="p">,</span><span class="n">i_color</span>
<span class="p">,</span><span class="n">i_current_price</span>
<span class="p">,</span><span class="n">i_manager_id</span>
<span class="p">,</span><span class="n">i_units</span>
<span class="p">,</span><span class="n">i_size</span>
<span class="p">,</span><span class="k">sum</span><span class="p">(</span><span class="n">ss_sales_price</span><span class="p">)</span><span class="w"> </span><span class="n">netpaid</span>
<span class="k">from</span><span class="w"> </span><span class="n">store_sales</span>
<span class="p">,</span><span class="n">store_returns</span>
<span class="p">,</span><span class="n">store</span>
<span class="p">,</span><span class="n">item</span>
<span class="p">,</span><span class="n">customer</span>
<span class="p">,</span><span class="n">customer_address</span>
<span class="k">where</span><span class="w"> </span><span class="n">ss_ticket_number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sr_ticket_number</span>
<span class="k">and</span><span class="w"> </span><span class="n">ss_item_sk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sr_item_sk</span>
<span class="k">and</span><span class="w"> </span><span class="n">ss_customer_sk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c_customer_sk</span>
<span class="k">and</span><span class="w"> </span><span class="n">ss_item_sk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i_item_sk</span>
<span class="k">and</span><span class="w"> </span><span class="n">ss_store_sk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">s_store_sk</span>
<span class="k">and</span><span class="w"> </span><span class="n">c_birth_country</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">upper</span><span class="p">(</span><span class="n">ca_country</span><span class="p">)</span>
<span class="k">and</span><span class="w"> </span><span class="n">s_zip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ca_zip</span>
<span class="k">and</span><span class="w"> </span><span class="n">s_market_id</span><span class="o">=</span><span class="mi">7</span>
<span class="k">group</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">c_last_name</span>
<span class="p">,</span><span class="n">c_first_name</span>
<span class="p">,</span><span class="n">s_store_name</span>
<span class="p">,</span><span class="n">ca_state</span>
<span class="p">,</span><span class="n">s_state</span>
<span class="p">,</span><span class="n">i_color</span>
<span class="p">,</span><span class="n">i_current_price</span>
<span class="p">,</span><span class="n">i_manager_id</span>
<span class="p">,</span><span class="n">i_units</span>
<span class="p">,</span><span class="n">i_size</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<ol id="EN-US_TOPIC_0000001188163750__ol207971158175310"><li id="EN-US_TOPIC_0000001188163750__li11797105865313">The original plan of this statement is as follows. The statement execution takes 110s:</li></ol>
<p id="EN-US_TOPIC_0000001188163750__p1464858821"><span><img id="EN-US_TOPIC_0000001188163750__image23681242193912" src="figure/en-us_image_0000001188642278.png"></span></p>
<p id="EN-US_TOPIC_0000001188163750__p9986844164634">In this plan, the layer-10 <strong id="EN-US_TOPIC_0000001188163750__b9655112954817">broadcast</strong> performs poorly because layer 11 is estimated to produce 2140 rows, which is far fewer than the actual number of rows. The inaccurate estimate is mainly caused by the underestimated number of rows in the layer-13 hash join. At this layer, <strong id="EN-US_TOPIC_0000001188163750__b193711024214">store_sales</strong> and <strong id="EN-US_TOPIC_0000001188163750__b154865149423">store_returns</strong> are joined on two column pairs (<strong id="EN-US_TOPIC_0000001188163750__b14441166124218">ss_ticket_number</strong> and <strong id="EN-US_TOPIC_0000001188163750__b4335510194214">ss_item_sk</strong> in <strong id="EN-US_TOPIC_0000001188163750__b6221410134620">store_sales</strong>, and <strong id="EN-US_TOPIC_0000001188163750__b1871342819468">sr_ticket_number</strong> and <strong id="EN-US_TOPIC_0000001188163750__b134980373463">sr_item_sk</strong> in <strong id="EN-US_TOPIC_0000001188163750__b16918165812462">store_returns</strong>), but the multi-column correlation is not considered in the estimate.</p>
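<p>One way to confirm the underestimate is to run the two-table join on its own and compare the estimated and actual row counts reported for the hash join. The following is a minimal sketch, assuming <strong>EXPLAIN PERFORMANCE</strong> is available in your cluster. The join conditions are copied from Q24; the standalone <strong>count(*)</strong> wrapper is added only for illustration.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Illustration only: isolate the store_sales/store_returns join from Q24.
-- EXPLAIN PERFORMANCE executes the statement and reports estimated and
-- actual rows per operator, so both can be compared at the hash join.
explain performance
select count(*)
from store_sales
,store_returns
where ss_ticket_number = sr_ticket_number
and ss_item_sk = sr_item_sk;
</pre></div>
</div>
<p>The ratio of actual rows to estimated rows at the hash join gives a rough idea of the correction factor that a rows hint would need.</p>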
<p id="EN-US_TOPIC_0000001188163750__p65441763165542">2. After the <strong id="EN-US_TOPIC_0000001188163750__b1228215133317">rows</strong> hint is used for optimization, the plan is as follows and the statement execution takes 318s:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188163750__screen1388233054017"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="cm">/*+rows(store_sales store_returns * 11270)*/</span><span class="w"> </span><span class="n">c_last_name</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188163750__p76514582218"><span><img id="EN-US_TOPIC_0000001188163750__image105910124010" src="figure/en-us_image_0000001188163836.png"></span></p>
<p id="EN-US_TOPIC_0000001188163750__p9663288165638">The execution takes longer because layer-9 <strong id="EN-US_TOPIC_0000001188163750__b19468172115319">redistribute</strong> is slow. Because no data skew occurs at layer-9 <strong id="EN-US_TOPIC_0000001188163750__b197521239415">redistribute</strong> itself, the slow redistribution is caused by the slow layer-8 <strong id="EN-US_TOPIC_0000001188163750__b173801113105216">hashjoin</strong>, which in turn is slowed by data skew at layer-18 <strong id="EN-US_TOPIC_0000001188163750__b1349411511509">redistribute</strong>.</p>
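<p>The rows hint used in this step multiplies the optimizer's estimate for the join result. As a sketch, assuming the <strong>rows(table_list #|+|-|* const)</strong> form of the hint, the estimate can also be set to an absolute value. The value 50000000 below is a made-up placeholder, not a measured figure, and the standalone <strong>count(*)</strong> query is for illustration only.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Form used in this section: multiply the estimated rows of the join result by 11270.
select /*+rows(store_sales store_returns *11270)*/ count(*)
from store_sales
,store_returns
where ss_ticket_number = sr_ticket_number
and ss_item_sk = sr_item_sk;

-- Alternative form (hypothetical value): set the estimate to an absolute row count.
select /*+rows(store_sales store_returns #50000000)*/ count(*)
from store_sales
,store_returns
where ss_ticket_number = sr_ticket_number
and ss_item_sk = sr_item_sk;
</pre></div>
</div>
<p>As the 318s result shows, a corrected estimate does not by itself guarantee a faster plan. It changes the optimizer's choices, which must then be checked against the new plan.</p>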
<p id="EN-US_TOPIC_0000001188163750__p3640410172335">3. Data skew occurs at layer-18 <strong id="EN-US_TOPIC_0000001188163750__b1349411511509_1">redistribute</strong> because <strong id="EN-US_TOPIC_0000001188163750__b4243317171018">customer_address</strong> has only a few distinct values in its two join keys. Therefore, join <strong id="EN-US_TOPIC_0000001188163750__b1113575154018">customer_address</strong> last. After a leading hint is added for this purpose, the plan is as follows and the statement execution takes 116s:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188163750__screen857018543401"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="cm">/*+rows(store_sales store_returns *11270)</span>
<span class="cm">leading((store_sales store_returns store item customer) customer_address)*/</span>
<span class="n">c_last_name</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188163750__p567135811215"><span><img id="EN-US_TOPIC_0000001188163750__image151981946124013" src="figure/en-us_image_0000001188323808.png"></span></p>
<p id="EN-US_TOPIC_0000001188163750__p2732476172730">Most of the time is spent on layer-6 <strong id="EN-US_TOPIC_0000001188163750__b15716959481">redistribute</strong>. The plan needs to be further optimized.</p>
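<p>The skew attributed to <strong>customer_address</strong> in step 3 can be checked directly by counting the distinct values of its join keys. This is a minimal sketch, assuming the standard TPC-DS schema. The two keys, <strong>ca_zip</strong> and <strong>upper(ca_country)</strong>, are taken from the Q24 join conditions.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Count distinct values of the two customer_address join keys used by Q24
-- (s_zip = ca_zip and c_birth_country = upper(ca_country)).
select count(*) as total_rows
,count(distinct ca_zip) as distinct_zip
,count(distinct upper(ca_country)) as distinct_country
from customer_address;
</pre></div>
</div>
<p>If the distinct counts are small compared with the total number of rows, redistributing on these keys is likely to concentrate rows on a few DNs.</p>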
<p id="EN-US_TOPIC_0000001188163750__p1753514917519">4. Layer-6 <strong id="EN-US_TOPIC_0000001188163750__b15716959481_1">redistribute</strong> is slow because of data skew. To avoid the skew, join the <strong id="EN-US_TOPIC_0000001188163750__b87887035220">item</strong> table last, because joining <strong id="EN-US_TOPIC_0000001188163750__b17153111916534">item</strong> does not reduce the number of rows. After the leading hint is changed accordingly, the plan is as follows and the statement execution takes 120s:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188163750__screen1174655464115"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="cm">/*+rows(store_sales store_returns *11270)</span>
<span class="cm">leading((customer_address (store_sales store_returns store customer) item))*/</span>
<span class="n">c_last_name</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188163750__p569135810217"><span><img id="EN-US_TOPIC_0000001188163750__image2993111284211" src="figure/en-us_image_0000001233883441.png"></span></p>
<p id="EN-US_TOPIC_0000001188163750__p11684954175337">Data skew occurs after the join of <strong id="EN-US_TOPIC_0000001188163750__b18453173185910">item</strong> and <strong id="EN-US_TOPIC_0000001188163750__b1898773475914">customer_address</strong> because <strong id="EN-US_TOPIC_0000001188163750__b142518405150">item</strong> is broadcast at layer 22. As a result, layer-6 <strong id="EN-US_TOPIC_0000001188163750__b2313135044714">redistribute</strong> is still slow.</p>
<p id="EN-US_TOPIC_0000001188163750__p34007604175422">5. Add a hint to disable <strong id="EN-US_TOPIC_0000001188163750__b93931221022">broadcast</strong> for <strong id="EN-US_TOPIC_0000001188163750__b689890524">item</strong>, or add a <strong id="EN-US_TOPIC_0000001188163750__b782961316511">redistribute</strong> hint for the join result of <strong id="EN-US_TOPIC_0000001188163750__b1350523615110">item</strong> and <strong id="EN-US_TOPIC_0000001188163750__b539217579514">customer_address</strong>. The following example disables broadcast for item. The plan is as follows and the statement execution takes 105s:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001188163750__screen12189192620423"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="cm">/*+rows(store_sales store_returns *11270)</span>
<span class="cm">leading((customer_address (store_sales store_returns store customer) item))</span>
<span class="cm">no broadcast(item)*/</span>
<span class="n">c_last_name</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
<p id="EN-US_TOPIC_0000001188163750__p37115581725"><span><img id="EN-US_TOPIC_0000001188163750__image1151418428" src="figure/en-us_image_0000001188482366.png"></span></p>
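<p>For reference, the abbreviated statement in step 5 can be written out in full by combining the original Q24 text with the three hints. This only expands the "..." with the query shown at the start of this section; nothing else is added.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>select avg(netpaid) from
(select /*+rows(store_sales store_returns *11270)
leading((customer_address (store_sales store_returns store customer) item))
no broadcast(item)*/
c_last_name
,c_first_name
,s_store_name
,ca_state
,s_state
,i_color
,i_current_price
,i_manager_id
,i_units
,i_size
,sum(ss_sales_price) netpaid
from store_sales
,store_returns
,store
,item
,customer
,customer_address
where ss_ticket_number = sr_ticket_number
and ss_item_sk = sr_item_sk
and ss_customer_sk = c_customer_sk
and ss_item_sk = i_item_sk
and ss_store_sk = s_store_sk
and c_birth_country = upper(ca_country)
and s_zip = ca_zip
and s_market_id=7
group by c_last_name
,c_first_name
,s_store_name
,ca_state
,s_state
,i_color
,i_current_price
,i_manager_id
,i_units
,i_size);
</pre></div>
</div>
<p>The hints sit directly after the <strong>select</strong> keyword of the subquery, because that is the query block that contains the joins they refer to.</p>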
<p id="EN-US_TOPIC_0000001188163750__p17911270172628">6. The last layer uses a single-layer <strong id="EN-US_TOPIC_0000001188163750__b48062189713">Agg</strong>, and the number of rows is greatly reduced by the aggregation. Set <strong id="EN-US_TOPIC_0000001188163750__b7469219917">best_agg_plan</strong> to <strong id="EN-US_TOPIC_0000001188163750__b119281831594">3</strong> to change the single-layer <strong id="EN-US_TOPIC_0000001188163750__b02491118154611">Agg</strong> to a double-layer <strong id="EN-US_TOPIC_0000001188163750__b48062189713_1">Agg</strong>. The plan is as follows and the statement execution takes 94s. The optimization ends.</p>
<p id="EN-US_TOPIC_0000001188163750__p87117581923"><span><img id="EN-US_TOPIC_0000001188163750__image19946125864217" src="figure/en-us_image_0000001233761953.png"></span></p>
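<p>The setting in step 6 is a parameter change, not a hint. A minimal sketch, assuming <strong>best_agg_plan</strong> can be changed at session level with <strong>SET</strong>:</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Session-level setting; it applies to statements run afterwards in this session.
set best_agg_plan = 3;

-- Run the hinted Q24 statement from step 5 here.

-- Restore the default so that other queries in the session are not affected.
reset best_agg_plan;
</pre></div>
</div>
<p>Resetting the parameter afterwards keeps the change scoped to the tuned statement, because a forced aggregation strategy may not suit other queries.</p>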
<p id="EN-US_TOPIC_0000001188163750__p9812141112473"></p>
<p id="EN-US_TOPIC_0000001188163750__p108711021134617">If query performance deteriorates because statistics have changed, you can use hints to optimize the query plan. Take TPC-H Q17 as an example. The query performance deteriorates after <strong id="EN-US_TOPIC_0000001188163750__b1352962295615">default_statistics_target</strong> is changed from its default value to <strong id="EN-US_TOPIC_0000001188163750__b18712171265516">2</strong> for statistics collection.</p>
<p id="EN-US_TOPIC_0000001188163750__p2031111474503">1. If <strong id="EN-US_TOPIC_0000001188163750__b12924155913217">default_statistics_target</strong> is set to the default value <strong id="EN-US_TOPIC_0000001188163750__b683010713312">100</strong>, the plan is as follows:</p>
<p id="EN-US_TOPIC_0000001188163750__p1144621112525"><span><img id="EN-US_TOPIC_0000001188163750__image18109101412534" src="figure/en-us_image_0000001233761951.png"></span></p>
<p id="EN-US_TOPIC_0000001188163750__p198252022105313">2. If <strong id="EN-US_TOPIC_0000001188163750__b53091221040">default_statistics_target</strong> is set to <strong id="EN-US_TOPIC_0000001188163750__b113091621445">2</strong>, the plan is as follows:</p>
<p id="EN-US_TOPIC_0000001188163750__p961319379537"><span><img id="EN-US_TOPIC_0000001188163750__image15154101125413" src="figure/en-us_image_0000001188482364.png"></span></p>
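<p>The two plans above can be reproduced by collecting statistics under each setting and checking the plan of Q17 after each collection. The following is a sketch, assuming the parameter is changed at session level and that statistics are collected on the TPC-H tables <strong>lineitem</strong> and <strong>part</strong>, which Q17 joins.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Collect statistics with the default target, then check the plan of Q17.
set default_statistics_target = 100;
analyze lineitem;
analyze part;

-- Collect statistics again with the changed target, then compare the plan.
set default_statistics_target = 2;
analyze lineitem;
analyze part;
</pre></div>
</div>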
<p id="EN-US_TOPIC_0000001188163750__p115342515413">3. Analysis shows that the performance deteriorates because the stream type changes from <strong id="EN-US_TOPIC_0000001188163750__b130316185416">BroadCast</strong> to <strong id="EN-US_TOPIC_0000001188163750__b2398321174118">Redistribute</strong> during the join of the <strong id="EN-US_TOPIC_0000001188163750__b1835714510619">lineitem</strong> and <strong id="EN-US_TOPIC_0000001188163750__b15343135414613">part</strong> tables. You can use a hint to change the stream type back to <strong id="EN-US_TOPIC_0000001188163750__b83431571982">BroadCast</strong>. For example:</p>
<p id="EN-US_TOPIC_0000001188163750__p1132682295611"><span><img id="EN-US_TOPIC_0000001188163750__image88541293211" src="figure/en-us_image_0000001188323810.png"></span></p>
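<p>In text form, such a hint might look like the following sketch. The query is the standard TPC-H Q17 with its default substitution values. The hint <strong>broadcast(part)</strong> is an assumption made for illustration and is not necessarily the hint shown in the figure. Because <strong>lineitem</strong> appears twice in the statement, aliases may be needed if a hint has to name that table.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><pre>-- Assumed hint: ask the optimizer to broadcast part for its join with lineitem.
select /*+broadcast(part)*/ sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem
,part
where p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity &lt; (
select 0.2 * avg(l_quantity)
from lineitem
where l_partkey = p_partkey
);
</pre></div>
</div>
<p>After adding the hint, check the plan to confirm that the stream type on the join has changed as intended.</p>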
<p id="EN-US_TOPIC_0000001188163750__p169813157218"></p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0454.html">Hint-based Tuning</a></div>
</div>
</div>