Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: luhuayi <luhuayi@huawei.com> Co-committed-by: luhuayi <luhuayi@huawei.com>
<a name="EN-US_TOPIC_0000001764650836"></a><a name="EN-US_TOPIC_0000001764650836"></a>
<h1 class="topictitle1">Plan Hint Cases</h1>
<div id="body1534471018691"><p id="EN-US_TOPIC_0000001764650836__p8060118">This section uses TPC-DS Q24 as an example to show how hints can be used to optimize an execution plan in a 1000X+24DN environment. The statement is as follows:</p>
<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001764650836__screen9193617117"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="n">c_last_name</span>
<span class="p">,</span><span class="n">c_first_name</span>
<span class="p">,</span><span class="n">s_store_name</span>
<span class="p">,</span><span class="n">ca_state</span>
<span class="p">,</span><span class="n">s_state</span>
<span class="p">,</span><span class="n">i_color</span>
<span class="p">,</span><span class="n">i_current_price</span>
<span class="p">,</span><span class="n">i_manager_id</span>
<span class="p">,</span><span class="n">i_units</span>
<span class="p">,</span><span class="n">i_size</span>
<span class="p">,</span><span class="k">sum</span><span class="p">(</span><span class="n">ss_sales_price</span><span class="p">)</span><span class="w"> </span><span class="n">netpaid</span>
<span class="k">from</span><span class="w"> </span><span class="n">store_sales</span>
<span class="p">,</span><span class="n">store_returns</span>
<span class="p">,</span><span class="n">store</span>
<span class="p">,</span><span class="n">item</span>
<span class="p">,</span><span class="n">customer</span>
<span class="p">,</span><span class="n">customer_address</span>
<span class="k">where</span><span class="w"> </span><span class="n">ss_ticket_number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sr_ticket_number</span>
<span class="k">and</span><span class="w"> </span><span class="n">ss_item_sk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sr_item_sk</span>
<span class="k">and</span><span class="w"> </span><span class="n">ss_customer_sk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c_customer_sk</span>
<span class="k">and</span><span class="w"> </span><span class="n">ss_item_sk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i_item_sk</span>
<span class="k">and</span><span class="w"> </span><span class="n">ss_store_sk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">s_store_sk</span>
<span class="k">and</span><span class="w"> </span><span class="n">c_birth_country</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">upper</span><span class="p">(</span><span class="n">ca_country</span><span class="p">)</span>
<span class="k">and</span><span class="w"> </span><span class="n">s_zip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ca_zip</span>
<span class="k">and</span><span class="w"> </span><span class="n">s_market_id</span><span class="o">=</span><span class="mi">7</span>
<span class="k">group</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">c_last_name</span>
<span class="p">,</span><span class="n">c_first_name</span>
<span class="p">,</span><span class="n">s_store_name</span>
<span class="p">,</span><span class="n">ca_state</span>
<span class="p">,</span><span class="n">s_state</span>
<span class="p">,</span><span class="n">i_color</span>
<span class="p">,</span><span class="n">i_current_price</span>
<span class="p">,</span><span class="n">i_manager_id</span>
<span class="p">,</span><span class="n">i_units</span>
<span class="p">,</span><span class="n">i_size</span><span class="p">);</span>
</pre></div></td></tr></table></div>
</div>
<ol id="EN-US_TOPIC_0000001764650836__ol207971158175310"><li id="EN-US_TOPIC_0000001764650836__li11797105865313">The original plan of this statement is as follows, and the statement takes 110s to execute:<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig562061716275"><span class="figcap"><b>Figure 1 </b>Statement initial plan</span><br><span><img id="EN-US_TOPIC_0000001764650836__image23681242193912" src="figure/en-us_image_0000001764492316.png"></span></div>
<p id="EN-US_TOPIC_0000001764650836__p9986844164634">In this plan, the layer-10 <strong id="EN-US_TOPIC_0000001764650836__b9655112954817">broadcast</strong> performs poorly because the layer-11 operator is estimated to return 2,140 rows, far fewer than the actual number. The inaccurate estimate mainly comes from the underestimated row count of the layer-13 hash join. This layer joins <strong id="EN-US_TOPIC_0000001764650836__b193711024214">store_sales</strong> and <strong id="EN-US_TOPIC_0000001764650836__b154865149423">store_returns</strong> on two column pairs (the <strong id="EN-US_TOPIC_0000001764650836__b14441166124218">ss_ticket_number</strong> and <strong id="EN-US_TOPIC_0000001764650836__b4335510194214">ss_item_sk</strong> columns in <strong id="EN-US_TOPIC_0000001764650836__b6221410134620">store_sales</strong> and the <strong id="EN-US_TOPIC_0000001764650836__b1871342819468">sr_ticket_number</strong> and <strong id="EN-US_TOPIC_0000001764650836__b134980373463">sr_item_sk</strong> columns in <strong id="EN-US_TOPIC_0000001764650836__b16918165812462">store_returns</strong>), but the multi-column correlation is not considered in the estimate.</p>
</li><li id="EN-US_TOPIC_0000001764650836__li14457122012164">After the <strong id="EN-US_TOPIC_0000001764650836__b539418403389">rows</strong> hint is applied to correct the estimate, the plan is as follows, and the statement takes 318s to execute:<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001764650836__screen1388233054017"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="cm">/*+rows(store_sales store_returns * 11270)*/</span><span class="w"> </span><span class="n">c_last_name</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig5341144415262"><span class="figcap"><b>Figure 2 </b>Using rows hints for optimization</span><br><span><img id="EN-US_TOPIC_0000001764650836__image105910124010" src="figure/en-us_image_0000001764651264.png"></span></div>
<p id="EN-US_TOPIC_0000001764650836__p9663288165638">Execution takes longer because the layer-9 <strong id="EN-US_TOPIC_0000001764650836__b19468172115319">redistribute</strong> is slow. No data skew occurs at the layer-9 <strong id="EN-US_TOPIC_0000001764650836__b197521239415">redistribute</strong> itself, so the slowness comes from the slow layer-8 <strong id="EN-US_TOPIC_0000001764650836__b173801113105216">hashjoin</strong>, which in turn is caused by data skew at the layer-18 <strong id="EN-US_TOPIC_0000001764650836__b1349411511509">redistribute</strong>.</p>
</li><li id="EN-US_TOPIC_0000001764650836__li4505542141617">Data skew occurs at the layer-18 <strong id="EN-US_TOPIC_0000001764650836__b184321651133817">redistribute</strong> because <strong id="EN-US_TOPIC_0000001764650836__b1443325110382">customer_address</strong> has only a few distinct values in its two join keys. Therefore, make <strong id="EN-US_TOPIC_0000001764650836__b174341051103820">customer_address</strong> the last table to be joined. After the <strong>leading</strong> hint is added, the plan is as follows, and the statement takes 116s to execute:<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001764650836__screen857018543401"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="cm">/*+rows(store_sales store_returns *11270)</span>
<span class="cm">leading((store_sales store_returns store item customer) customer_address)*/</span>
<span class="n">c_last_name</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig94322715269"><span class="figcap"><b>Figure 3 </b>Hint optimization</span><br><span><img id="EN-US_TOPIC_0000001764650836__image151981946124013" src="figure/en-us_image_0000001764492324.png"></span></div>
<p id="EN-US_TOPIC_0000001764650836__p2732476172730">Most of the time is now spent on the layer-6 <strong id="EN-US_TOPIC_0000001764650836__b15716959481">redistribute</strong>, so the plan needs further optimization.</p>
</li><li id="EN-US_TOPIC_0000001764650836__li199801782171">The last-layer redistribute is skewed and therefore takes a long time. To avoid the data skew, make the <strong id="EN-US_TOPIC_0000001764650836__b87887035220">item</strong> table the last one to be joined, because joining <strong id="EN-US_TOPIC_0000001764650836__b17153111916534">item</strong> does not reduce the number of rows. After the hint is modified, the plan is as follows, and the statement takes 120s to execute:<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001764650836__screen1174655464115"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="cm">/*+rows(store_sales store_returns *11270)</span>
<span class="cm">leading((customer_address (store_sales store_returns store customer) item))*/</span>
<span class="n">c_last_name</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig9309113462515"><span class="figcap"><b>Figure 4 </b>Modifying hints and executing statements</span><br><span><img id="EN-US_TOPIC_0000001764650836__image2993111284211" src="figure/en-us_image_0000001764651272.png"></span></div>
<p id="EN-US_TOPIC_0000001764650836__p11684954175337">Data skew occurs after the join of <strong id="EN-US_TOPIC_0000001764650836__b18453173185910">item</strong> and <strong id="EN-US_TOPIC_0000001764650836__b1898773475914">customer_address</strong> because <strong id="EN-US_TOPIC_0000001764650836__b142518405150">item</strong> is broadcast at layer-22. As a result, the layer-6 <strong id="EN-US_TOPIC_0000001764650836__b2313135044714">redistribute</strong> is still slow.</p>
</li><li id="EN-US_TOPIC_0000001764650836__li15314624121719">Add a hint to disable <strong id="EN-US_TOPIC_0000001764650836__b8818172723918">broadcast</strong> for <strong id="EN-US_TOPIC_0000001764650836__b13818162783918">item</strong>, or add a <strong id="EN-US_TOPIC_0000001764650836__b148191627113916">redistribute</strong> hint for the join result of <strong id="EN-US_TOPIC_0000001764650836__b138209275393">item</strong> and <strong id="EN-US_TOPIC_0000001764650836__b4820327153916">customer_address</strong>. The following example disables broadcast for item. With this hint, the plan is as follows, and the statement takes 105s to execute:<div class="codecoloring" codetype="Sql" id="EN-US_TOPIC_0000001764650836__screen12189192620423"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre><span></span><span class="k">select</span><span class="w"> </span><span class="k">avg</span><span class="p">(</span><span class="n">netpaid</span><span class="p">)</span><span class="w"> </span><span class="k">from</span>
<span class="p">(</span><span class="k">select</span><span class="w"> </span><span class="cm">/*+rows(store_sales store_returns *11270)</span>
<span class="cm">leading((customer_address (store_sales store_returns store customer) item))</span>
<span class="cm">no broadcast(item)*/</span>
<span class="n">c_last_name</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig5361156102412"><span class="figcap"><b>Figure 5 </b>Execution plan</span><br><span><img id="EN-US_TOPIC_0000001764650836__image1151418428" src="figure/en-us_image_0000001811610665.png"></span></div>
</li><li id="EN-US_TOPIC_0000001764650836__li14989103618176">The last layer uses a single-layer <strong id="EN-US_TOPIC_0000001764650836__b106231338173920">Agg</strong>, and the aggregation greatly reduces the number of rows. Set <strong id="EN-US_TOPIC_0000001764650836__b11731942143910">best_agg_plan</strong> to <strong id="EN-US_TOPIC_0000001764650836__b16174342103910">3</strong> to change the single-layer <strong id="EN-US_TOPIC_0000001764650836__b9174942143913">Agg</strong> to a double-layer <strong id="EN-US_TOPIC_0000001764650836__b181755426391">Agg</strong>. The plan is as follows, and the statement takes 94s to execute. The optimization is complete.<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig623533414226"><span class="figcap"><b>Figure 6 </b>Final optimization plan</span><br><span><img id="EN-US_TOPIC_0000001764650836__image19946125864217" src="figure/en-us_image_0000001811491585.png"></span></div>
</li></ol>
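<p>The hints from the steps above can be combined into a single statement. The following is a minimal sketch that assumes <strong>best_agg_plan</strong> is changed at the session level with SET and restored afterwards; the hint text and the value 3 are taken from the steps above, while the session-level scope and the RESET are assumptions for illustration.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>-- Assumption: the parameter is set for the current session only.
set best_agg_plan = 3;

select avg(netpaid) from
(select /*+rows(store_sales store_returns *11270)
leading((customer_address (store_sales store_returns store customer) item))
no broadcast(item)*/
c_last_name
,c_first_name
,s_store_name
,ca_state
,s_state
,i_color
,i_current_price
,i_manager_id
,i_units
,i_size
,sum(ss_sales_price) netpaid
from store_sales
,store_returns
,store
,item
,customer
,customer_address
where ss_ticket_number = sr_ticket_number
and ss_item_sk = sr_item_sk
and ss_customer_sk = c_customer_sk
and ss_item_sk = i_item_sk
and ss_store_sk = s_store_sk
and c_birth_country = upper(ca_country)
and s_zip = ca_zip
and s_market_id=7
group by c_last_name
,c_first_name
,s_store_name
,ca_state
,s_state
,i_color
,i_current_price
,i_manager_id
,i_units
,i_size);

-- Assumption: restore the default so that other statements are not affected.
reset best_agg_plan;
</pre></div></td></tr></table></div></div>
<p>Because the row multiplier and the join order were tuned for this data set and cluster size, the hints should be checked against the actual plan again after the data volume or statistics change.</p>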
<p id="EN-US_TOPIC_0000001764650836__p9812141112473"></p>
<p id="EN-US_TOPIC_0000001764650836__p108711021134617">If query performance deteriorates because statistics have changed, you can use hints to optimize the query plan. Take TPC-H Q17 as an example: its performance deteriorates after <strong id="EN-US_TOPIC_0000001764650836__b1352962295615">default_statistics_target</strong> is changed from the default value to <strong id="EN-US_TOPIC_0000001764650836__b18712171265516">-2</strong> and statistics are collected again.</p>
<ol id="EN-US_TOPIC_0000001764650836__ol56141934131511"><li id="EN-US_TOPIC_0000001764650836__li16141334151517">If <strong id="EN-US_TOPIC_0000001764650836__b263204917392">default_statistics_target</strong> is set to the default value <strong id="EN-US_TOPIC_0000001764650836__b166494913394">100</strong>, the plan is as follows:<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig16126556112311"><span class="figcap"><b>Figure 7 </b>Default statistics</span><br><span><img id="EN-US_TOPIC_0000001764650836__image18109101412534" src="figure/en-us_image_0000001764651268.png"></span></div>
</li><li id="EN-US_TOPIC_0000001764650836__li1716512430154">If <strong id="EN-US_TOPIC_0000001764650836__b138651154113918">default_statistics_target</strong> is set to <strong id="EN-US_TOPIC_0000001764650836__b0866195410399">-2</strong>, the plan is as follows:<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig12387104852313"><span class="figcap"><b>Figure 8 </b>Changes in statistics</span><br><span><img id="EN-US_TOPIC_0000001764650836__image15154101125413" src="figure/en-us_image_0000001764492328.png"></span></div>
</li><li id="EN-US_TOPIC_0000001764650836__li471412547153">Analysis shows that the deterioration is caused by the stream type changing from <strong id="EN-US_TOPIC_0000001764650836__b172941399409">BroadCast</strong> to <strong id="EN-US_TOPIC_0000001764650836__b11295159174010">Redistribute</strong> in the join of the <strong id="EN-US_TOPIC_0000001764650836__b2296119124011">lineitem</strong> and <strong id="EN-US_TOPIC_0000001764650836__b72967964017">part</strong> tables. You can use a hint to change the stream type back to <strong id="EN-US_TOPIC_0000001764650836__b91405973655030">BroadCast</strong>. The figure below shows an example.<div class="fignone" id="EN-US_TOPIC_0000001764650836__fig3923153272315"><span class="figcap"><b>Figure 9 </b>Statements</span><br><span><img id="EN-US_TOPIC_0000001764650836__image88541293211" src="figure/en-us_image_0000001811610661.png"></span></div>
</li></ol>
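<p>Because the statement for this case is shown only as a figure, the following is a minimal sketch of what such a hinted statement could look like. It assumes the standard TPC-H Q17 text with the example substitution values Brand#23 and MED BOX, and it assumes that <strong>part</strong> is the table whose stream should be forced to broadcast. Both are illustrative assumptions, so Figure 9 and the actual plan remain the reference.</p>
<div class="codecoloring" codetype="Sql"><div class="highlight"><table class="highlighttable"><tr><td class="code"><div><pre>-- Assumption: standard TPC-H Q17 text; the hint target (part) is illustrative.
select /*+ broadcast(part) */
sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem
,part
where p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity &lt; (
select 0.2 * avg(l_quantity)
from lineitem
where l_partkey = p_partkey
);
</pre></div></td></tr></table></div></div>
<p>Running the statement with EXPLAIN before and after adding the hint shows whether the stream operator in the join of lineitem and part has actually changed back to broadcast.</p>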
<p id="EN-US_TOPIC_0000001764650836__p169813157218"></p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0454.html">Hint-based Tuning</a></div>
</div>
</div>