Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_1998"></a><a name="mrs_01_1998"></a>
<h1 class="topictitle1">Merging CBO</h1>
<div id="body1595920218992"><div class="section" id="mrs_01_1998__sa3e2bbd4b1d540c69c41a1aaf7e55cd5"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1998__p1288985151716">By default, Spark SQL uses rule-based optimization, which cannot ensure that Spark selects the optimal query plan. The Cost-Based Optimizer (CBO) is a technology that intelligently selects query plans for SQL statements. After CBO is enabled, it performs a series of estimations based on table and column statistics to select the optimal query plan.</p>
</div>
<div class="section" id="mrs_01_1998__s637bd309b51a4763afd52b7872e94745"><h4 class="sectiontitle">Procedure</h4><p id="mrs_01_1998__ac20d9480dec2461f9600837df33f55fa">Perform the following steps to enable CBO:</p>
<ol id="mrs_01_1998__o376f60bbdf7d427d9e87c97e54246212"><li id="mrs_01_1998__lfe73aa51a9e0487da199fea9461c866c">Run the corresponding SQL commands to collect the required table and column statistics.<p id="mrs_01_1998__a73d4a626d46946b895335d83a8f87e55"><a name="mrs_01_1998__lfe73aa51a9e0487da199fea9461c866c"></a><a name="lfe73aa51a9e0487da199fea9461c866c"></a>The SQL commands are as follows (choose them as required):</p>
<ul id="mrs_01_1998__u1c84db8115d64342b337ea9f59595671"><li id="mrs_01_1998__l9b48fd1d7734467083183c65c4aafdb6">Generate table-level statistics (table scanning):<p id="mrs_01_1998__a8e8093cb7b2543a68786bc7d0180e295"><a name="mrs_01_1998__l9b48fd1d7734467083183c65c4aafdb6"></a><a name="l9b48fd1d7734467083183c65c4aafdb6"></a><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_1998__cfc672366cef34e9ebd73760e0b70c898">ANALYZE TABLE src COMPUTE STATISTICS</span></b></i></p>
<p id="mrs_01_1998__a01e8e6f94f0e44c5808276f343c6d950">This command generates <strong id="mrs_01_1998__b1191875153116">sizeInBytes</strong> and <strong id="mrs_01_1998__b09238517318">rowCount</strong>.</p>
<p id="mrs_01_1998__abe04b910cb7c482489dba0394e0741d6">When you use the ANALYZE statement to collect statistics, the sizes of tables that are not stored in HDFS cannot be calculated.</p>
</li><li id="mrs_01_1998__l52df6922c3cd4bfeb58bdbd9df2ebe1c">Generate table-level statistics (no table scanning):<p id="mrs_01_1998__a64b5f2f7811c40e3882e42c85b6ee8dd"><a name="mrs_01_1998__l52df6922c3cd4bfeb58bdbd9df2ebe1c"></a><a name="l52df6922c3cd4bfeb58bdbd9df2ebe1c"></a><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_1998__c2f088223bb984cb7b4100f5ddab5df5f">ANALYZE TABLE src COMPUTE STATISTICS NOSCAN</span></b></i></p>
<p id="mrs_01_1998__a81cf77448cfa4d14b4ebfc0eff0cdf7a">This command generates only <strong id="mrs_01_1998__b02857562317">sizeInBytes</strong>. The result is compared with the previously generated <strong id="mrs_01_1998__b15290145633116">sizeInBytes</strong> and <strong id="mrs_01_1998__b1329155617317">rowCount</strong>: if <strong id="mrs_01_1998__b1029115614319">sizeInBytes</strong> is unchanged, the existing <strong id="mrs_01_1998__b1729145611310">rowCount</strong> (if any) is retained. Otherwise, <strong id="mrs_01_1998__b142921756103114">rowCount</strong> is cleared.</p>
</li><li id="mrs_01_1998__l31fddb46eedd4629a0d780018ae92d64">Generate column-level statistics:<p id="mrs_01_1998__ac88fda91ab8a4e03a2341a4581fb6fb2"><a name="mrs_01_1998__l31fddb46eedd4629a0d780018ae92d64"></a><a name="l31fddb46eedd4629a0d780018ae92d64"></a><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_1998__cd2fb564cb33143a9a8a98d5de09916c3">ANALYZE TABLE src COMPUTE STATISTICS FOR COLUMNS a, b, c</span></b></i></p>
<p id="mrs_01_1998__a14f17d5d0f5041ca98125479119a0811">This command generates column statistics and updates the table statistics to keep them consistent. Statistics cannot be generated for complex data types (such as Seq and Map) or for HiveStringType.</p>
</li><li id="mrs_01_1998__lfbd7b7fdf3d549c5b603462c3379d697">Display statistics:<p id="mrs_01_1998__a053b885f56a341efb7dbf21833ed0fda"><a name="mrs_01_1998__lfbd7b7fdf3d549c5b603462c3379d697"></a><a name="lfbd7b7fdf3d549c5b603462c3379d697"></a><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_1998__c543ab8362d3b468b80716989d01682a7">DESC FORMATTED src</span></b></i></p>
<p id="mrs_01_1998__aec4002b70959496bbd411bb7562acf69">This command shows the table-level statistics as <em id="mrs_01_1998__i32451024123318">xxx</em> bytes and <em id="mrs_01_1998__i32511924133314">xxx</em> rows in the <strong id="mrs_01_1998__b525162443314">Statistics</strong> field. To display the statistics of a column, run the following command:</p>
<p id="mrs_01_1998__p63218251284"><strong id="mrs_01_1998__b13226655185510">DESC FORMATTED src a</strong></p>
</li></ul>
<p id="mrs_01_1998__aa7be78fc5be94ea8b72e410a59d82fa4"><strong id="mrs_01_1998__b876973216331">Limitation</strong>: Statistics collection currently does not support partition-level statistics for partitioned tables.</p>
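<p>As a rough sketch only, the commands above could be combined into the following sequence in a Spark SQL session. It assumes the example table <strong>src</strong> with columns <strong>a</strong>, <strong>b</strong>, and <strong>c</strong> used in the commands above; replace them with your own table and the columns that appear in your joins and filters.</p>
<pre>
-- Collect table-level statistics with a full scan (sizeInBytes and rowCount).
ANALYZE TABLE src COMPUTE STATISTICS;

-- Collect column-level statistics for the example columns a, b, and c.
ANALYZE TABLE src COMPUTE STATISTICS FOR COLUMNS a, b, c;

-- Check the table-level result in the Statistics field.
DESC FORMATTED src;

-- Check the statistics collected for column a.
DESC FORMATTED src a;
</pre>
<p>If the table data changes later, the statistics are likely to become stale, so it is reasonable to rerun the ANALYZE commands after large data loads.</p>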
</li></ol><ol start="2" id="mrs_01_1998__oc3347a1a52254b35a8acbc28a69ff2e2"><li id="mrs_01_1998__l90e62173aabc47c2ba1ee9d868609fdb">Configure parameters in <a href="#mrs_01_1998__t0f01c19bf3d342d69b1e43165d3651c4">Table 1</a> in the <span class="filepath" id="mrs_01_1998__f389335c59bc940f3880bdc2e4e07b73e"><b>spark-defaults.conf</b></span> file on the Spark client.
<div class="tablenoborder"><a name="mrs_01_1998__t0f01c19bf3d342d69b1e43165d3651c4"></a><a name="t0f01c19bf3d342d69b1e43165d3651c4"></a><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1998__t0f01c19bf3d342d69b1e43165d3651c4" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="mrs_01_1998__rb9ef34d486ad4457b7c140ac7c5243e8"><th align="left" class="cellrowborder" valign="top" width="36.4%" id="mcps1.3.2.4.1.3.2.4.1.1"><p id="mrs_01_1998__a05b274f73c3041f6a34dd69c90341c4f">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50.88%" id="mcps1.3.2.4.1.3.2.4.1.2"><p id="mrs_01_1998__a8e83292fff7743b490d6d0d245bd6738"><strong id="mrs_01_1998__b557655015338">Description</strong></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="12.72%" id="mcps1.3.2.4.1.3.2.4.1.3"><p id="mrs_01_1998__a6d9950c769ba499b8b18f1556640b762"><strong id="mrs_01_1998__b243185214335">Default Value</strong></p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1998__rb515ca4fb5d94847b8c421e7a81db874"><td class="cellrowborder" valign="top" width="36.4%" headers="mcps1.3.2.4.1.3.2.4.1.1 "><p id="mrs_01_1998__a073693eb4a7441568d003eed9ed60697">spark.sql.cbo.enabled</p>
</td>
<td class="cellrowborder" valign="top" width="50.88%" headers="mcps1.3.2.4.1.3.2.4.1.2 "><p id="mrs_01_1998__af4ddd8ddb91944dca86307f5f2e9b23a">Specifies whether to enable CBO.</p>
<ul id="mrs_01_1998__uea6421717be94a15bd8509940df3c474"><li id="mrs_01_1998__l4d3160acde444f3aa96b5543b28839af"><strong id="mrs_01_1998__b16961115633317">true</strong>: Enable</li><li id="mrs_01_1998__l47ff8e44339d4850980ac5c63fdb415d"><strong id="mrs_01_1998__b1819716581337">false</strong>: Disable</li></ul>
<p id="mrs_01_1998__ad48fdbbb06ca465885a1cb00cb8234f1">Before enabling this function, ensure that statistics have been generated for the related tables and columns.</p>
</td>
<td class="cellrowborder" valign="top" width="12.72%" headers="mcps1.3.2.4.1.3.2.4.1.3 "><p id="mrs_01_1998__a4bc2f3c6b42d4ecdb7a5e5a417f43ae3">false</p>
</td>
</tr>
<tr id="mrs_01_1998__r20c07787337549d3942c7d5b9a663e0b"><td class="cellrowborder" valign="top" width="36.4%" headers="mcps1.3.2.4.1.3.2.4.1.1 "><p id="mrs_01_1998__a2d8493df5d9d420e8bc79f89eb4919b3">spark.sql.cbo.joinReorder.enabled</p>
</td>
<td class="cellrowborder" valign="top" width="50.88%" headers="mcps1.3.2.4.1.3.2.4.1.2 "><p id="mrs_01_1998__acc47cf33f68c42a58b1d7a20216d5c0d">Specifies whether to automatically adjust the sequence of consecutive inner joins by using CBO.</p>
<ul id="mrs_01_1998__u107898c36a0b43d18ea673a6af8f60f1"><li id="mrs_01_1998__l2f8b505484fe4b21bc4fd03b7216c024"><strong id="mrs_01_1998__b4545182910342">true</strong>: Enable</li><li id="mrs_01_1998__lc51aa79d44384e17bfb8a575899a22a9"><strong id="mrs_01_1998__b111762310341">false</strong>: Disable</li></ul>
<p id="mrs_01_1998__a45b93eff1fc04058acd31f5f525e7389">Before enabling this function, ensure that statistics have been generated for the related tables and columns and that CBO is enabled.</p>
</td>
<td class="cellrowborder" valign="top" width="12.72%" headers="mcps1.3.2.4.1.3.2.4.1.3 "><p id="mrs_01_1998__adc1a7e5d65b24052a58e6e641c4ebd02">false</p>
</td>
</tr>
<tr id="mrs_01_1998__r33966ba5f4564dd2be55c3cc54865fc5"><td class="cellrowborder" valign="top" width="36.4%" headers="mcps1.3.2.4.1.3.2.4.1.1 "><p id="mrs_01_1998__a00dfc86e624c4b2fb2080c4c0592014a">spark.sql.cbo.joinReorder.dp.threshold</p>
</td>
<td class="cellrowborder" valign="top" width="50.88%" headers="mcps1.3.2.4.1.3.2.4.1.2 "><p id="mrs_01_1998__afe191515553f48f7aee92f1f140e7501">Specifies the maximum number of tables in a sequence of consecutive inner joins that CBO automatically reorders.</p>
<p id="mrs_01_1998__a4d17f1b52cb0493c8d9a3ebf538f4199">If the threshold is exceeded, the sequence of joins is not adjusted.</p>
</td>
<td class="cellrowborder" valign="top" width="12.72%" headers="mcps1.3.2.4.1.3.2.4.1.3 "><p id="mrs_01_1998__a88198061a4294d40869d22f6c8afa377">12</p>
</td>
</tr>
</tbody>
</table>
</div>
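<p>For illustration, the corresponding entries in <b>spark-defaults.conf</b> might look like the following fragment. The <b>true</b> values are an assumption about a setup where both CBO and join reordering are wanted; the threshold shown is simply the default value from the table.</p>
<pre>
# Enable CBO (requires table and column statistics to be collected first).
spark.sql.cbo.enabled                     true
# Let CBO reorder consecutive inner joins (takes effect only when CBO is enabled).
spark.sql.cbo.joinReorder.enabled         true
# Maximum number of joined tables that CBO reorders; 12 is the default.
spark.sql.cbo.joinReorder.dp.threshold    12
</pre>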
</li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1985.html">Spark SQL and DataFrame Tuning</a></div>
</div>
</div>