doc-exports/docs/dws/dev/dws_04_0426.html
luhuayi 177cd61a57 DWS DEVG 910.211 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: luhuayi <luhuayi@huawei.com>
Co-committed-by: luhuayi <luhuayi@huawei.com>
2025-05-05 07:44:03 +00:00


<a name="EN-US_TOPIC_0000002088892837"></a><a name="EN-US_TOPIC_0000002088892837"></a>
<h1 class="topictitle1">Configuring LLVM</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_ae61276586da246dc81db3670549711cc">LLVM dynamic compilation generates customized machine code for each query to replace the original general-purpose functions. Query performance improves because redundant conditional checks and virtual function calls are reduced and data locality is improved during actual queries.</p>
<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_en-us_topic_0059779342_p168506395224">LLVM takes extra time to pre-generate the intermediate representation (IR) and compile it into machine code. Therefore, if the data volume is small or the query itself takes little time, the compilation overhead can outweigh the performance gain.</p>
<div class="section" id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_section1978941756"><h4 class="sectiontitle">LLVM Application Scenarios and Constraints</h4><p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_p1989722512246"><strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b1248684412288">Applicable Scenarios</strong></p>
<ul id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_ul1596716304255"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l103432947a7f4768a2995f252c1e37e5">Expressions that support LLVM. Query statements that contain the following expressions support LLVM optimization:<ol id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_en-us_topic_0066033419_ol66421493368"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l9aab2a41834341fda8af19448699e5ce">CASE...WHEN...</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_en-us_topic_0066033419_li16420499364">IN</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_en-us_topic_0066033419_li06427499364">Bool (AND/OR/NOT)</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l25bef0201d244cf986b3708b2a7bc408">BooleanTest (IS_NOT_UNKNOWN/IS_UNKNOWN/IS_TRUE/IS_NOT_TRUE/IS_FALSE/IS_NOT_FALSE)</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l5b0b817852cb45d9b591385b8a91a89f">NullTest (IS_NOT_NULL/IS_NULL)</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_lc913227ed0e8444f8b022c8893ad5fc6">Operators</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l55d469a5aa8341c8878f7e955d5039cb">Functions (lpad, substring, btrim, rtrim, and length)</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l57788fc7998c4ac690c437a03d614612">Nullif</li></ol>
<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_a6c8797ede3e54ef5b309a147aae5cca2">The following data types are supported for expression calculation: bool, tinyint, smallint, int, bigint, float4, float8, numeric, date, time, timetz, timestamp, timestamptz, interval, bpchar, varchar, text, and oid.</p>
<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_p1487895172817">Consider using LLVM dynamic compilation and optimization only when expressions are used in the following scenarios:</p>
<ul id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_ul98857582285"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li12282556132810"><strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b6165234750">filter</strong> on the <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b174738174719">Scan</strong> node in the case of a vectorized executor.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li10282356162816"><strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b74871246669">complicate hash condition</strong>, <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b16264517612">hash join filter</strong>, and <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b533614571462">hash join target</strong> in the <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b18220425772">Hash Join</strong> node.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li15283105662815"><strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b7166104518719">filter</strong> and <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b268214491970">join filter</strong> in the <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b216216563716">Nested Loop</strong> node.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li1428305612288"><strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b760413515817">merge join filter</strong> and <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b9613199889">merge join target</strong> in the <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b24921200810">Merge Join</strong> node.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li628455682817"><strong 
id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b546442882">filter</strong> in the Group node.</li></ul>
</li></ul>
</div>
<ul id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_u9274e99851b946e6b5d3a7c488f78b6c"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_le449ae6114a543a6b00667be8ba232e3">Operators that can use LLVM:<ol id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_o732da3a860274858842c6fbc61f07011"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l40520a7a58fe44de8a394bd0c29823ab">Join: HashJoin</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_en-us_topic_0066033419_li16445493366">Agg: HashAgg</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li1141120323914">Sort</li></ol>
<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_p1322295519294">Among them:</p>
<ul id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_ul1547173403019"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li12121330309">HashJoin supports only Hash Inner Join, and the corresponding hash cond supports comparisons between int4, bigint, and bpchar.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li9213123303010">HashAgg supports the sum and avg operations on the bigint and numeric data types, as well as the count(*) aggregation operation. GROUP BY columns can be of the int4, bigint, bpchar, text, varchar, or timestamp type.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li1221483313302">Sort supports only comparisons between int4, bigint, numeric, bpchar, text, and varchar data types.</li></ul>
<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_p1522561414316">LLVM dynamic compilation and optimization cannot be used for operations other than those listed above. To confirm whether LLVM is applied to a specific query, check the output of EXPLAIN PERFORMANCE.</p>
</li></ul>
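<p>As a minimal sketch of such a check, a query that combines several of the supported constructs (an IN list, a NullTest, and a HashAgg sum with GROUP BY) can be run under EXPLAIN PERFORMANCE. The table <strong>sales_fact</strong> and its columns are hypothetical, and the sketch assumes a table that is executed by the vectorized engine. The plan output varies by cluster version and is therefore not reproduced here.</p>
<pre>
-- Hypothetical table and columns, for illustration only.
EXPLAIN PERFORMANCE
SELECT region_id, sum(amount)
FROM sales_fact
WHERE status IN (1, 2, 3)
  AND amount IS NOT NULL
GROUP BY region_id;
</pre>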
<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_p12529171914240"><strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b112758853916">Non-Applicable Scenarios</strong></p>
<ul id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_ua27de605a72c4f93a035ef9a0fec8894"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_ld1156f530bd34390badfd6d0841390fc">LLVM dynamic compilation and optimization are not supported on CNs.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l453f595680724eeaa95175739f806932">LLVM dynamic compilation is not applied to tables that contain only a small amount of data.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li20957175217595">LLVM code cannot be generated for query jobs that take a non-vectorized execution path.</li></ul>
<div class="section" id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_section190275833116"><h4 class="sectiontitle">Other Factors Impacting LLVM Performance</h4><p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_aff64dcc417604486ad190147fb9e6962">The result of LLVM optimization depends not only on operations and computation in the database, but also on the hardware environment.</p>
<ul id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_u8e208e167aa04189ab320d306358a139"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l03d3a5739d7045acb5cf979db6e967bb">Number of C functions invoked by query statements<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_afac41cff7edf4ec083a3c4b579703c81"><a name="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l03d3a5739d7045acb5cf979db6e967bb"></a><a name="en-us_topic_0000001233681667_l03d3a5739d7045acb5cf979db6e967bb"></a>CodeGen cannot be applied to every sub-expression of a complete expression: some sub-expressions use CodeGen, while others invoke the original C code for computation. The more sub-expressions of a complete expression that invoke the original C code, the more likely LLVM dynamic compilation and optimization is to reduce computational performance. By setting <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b209531648663612">log_min_messages</strong> to <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b105175760763612">DEBUG1</strong>, you can check which expressions directly invoke C code.</p>
</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_la78e6bbd243940b9a884e4c46dc50fa4">Memory resources<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_af617ba886bde4377b81f6639c3b5512b"><a name="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_la78e6bbd243940b9a884e4c46dc50fa4"></a><a name="en-us_topic_0000001233681667_la78e6bbd243940b9a884e4c46dc50fa4"></a>A key goal of LLVM optimization is data locality: data should be kept in registers whenever possible, and data loading should be minimized. Therefore, when using LLVM optimization, set work_mem large enough for the LLVM-compiled code to process the data in memory. Otherwise, performance may deteriorate.</p>
</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l6910907f79f74533b5bd71c993aa43d6">Optimizer cost estimation<p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_a87d4981cd3f1497da702bba6044c58d7"><a name="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l6910907f79f74533b5bd71c993aa43d6"></a><a name="en-us_topic_0000001233681667_l6910907f79f74533b5bd71c993aa43d6"></a>The LLVM feature implements a simple cost estimation model that decides whether to use LLVM dynamic compilation and optimization for the current node based on the sizes of the tables involved in the node computation. If the optimizer underestimates the actual number of rows involved, the expected performance gains may not be realized. Overestimation has the same effect.</p>
</li></ul>
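<p>The two parameters mentioned in this list could be adjusted at the session level roughly as follows. This is only a sketch: the <strong>work_mem</strong> value is an illustrative placeholder rather than a recommendation, and it is assumed that the current user has permission to change <strong>log_min_messages</strong> for the session.</p>
<pre>
-- Assumption: the session is allowed to change this parameter.
-- At DEBUG1, the log shows which expressions directly invoke C code.
SET log_min_messages = debug1;

-- Illustrative value only; size work_mem to the workload and the available memory.
SET work_mem = '512MB';
</pre>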
</div>
<div class="section" id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_section212525593215"><h4 class="sectiontitle">Recommended Usage of LLVM</h4><p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_a74e96233566b4439b8ef0e836b9c5739">LLVM is enabled in the database kernel by default, and users can configure it based on the analysis above. The overall suggestions are as follows:</p>
<ol id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_o79007bf1997543a392f243e649a64126"><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l1a57b9f1518b40189ae3b0b0a022ee7e">Set <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b11429924463612">work_mem</strong> to an appropriate value that is as large as resources allow. If a large amount of data is still flushed to disk, you are advised to disable LLVM dynamic compilation and optimization by setting <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b10946011363612">enable_codegen</strong> to <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b29489589763612">off</strong>.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l0a98ef65bbd74d5e89d56419a40e1a7b">Set an appropriate value for <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b165216516263612">codegen_cost_threshold</strong> (the default value is 10000) so that LLVM dynamic compilation and optimization is not used when the data volume is small. After the value is set, if the database performance deteriorates due to the use of LLVM dynamic compilation and optimization, increase the value.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_l7baa672a60e44a7281c770f6d59ff130">If a large number of C functions are invoked, you are advised to disable LLVM dynamic compilation and optimization.</li><li id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_li8308205120418">The number of constants following the <strong id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_b29216241263612">IN</strong> expression cannot exceed 10.
Otherwise, LLVM compilation and optimization cannot be performed.<div class="note" id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_n6ff2a71af153457b9322c69a873c4fc1"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="EN-US_TOPIC_0000002088892837__en-us_topic_0000001233681667_a5e8d9dc8889c4fa889e118935ef425cc">If resources are sufficient, the database performance will improve as the data volume increases.</p>
</div></div>
</li></ol>
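<p>The recommendations in this list might translate into session-level settings such as the following. The threshold value is an arbitrary example rather than a tuning recommendation, and the sketch assumes that both parameters can be changed for the current session.</p>
<pre>
-- Raise the threshold above the default (10000) if LLVM slows down small queries.
-- The value 100000 is only an example.
SET codegen_cost_threshold = 100000;

-- Disable LLVM dynamic compilation and optimization, for example when a large
-- amount of data is flushed to disk or many C functions are invoked.
SET enable_codegen = off;
</pre>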
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0414.html">System Optimization</a></div>
</div>
</div>