doc-exports/docs/dli/sqlreference/dli_08_15076.html
Su, Xiaomeng be9eabe464 dli_sqlreference_20250305
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2025-03-25 09:06:21 +00:00


<a name="dli_08_15076"></a>
<h1 class="topictitle1">Over Aggregation</h1>
<div id="body0000001871115657"><p id="dli_08_15076__p2032458191313"><strong id="dli_08_15076__b196438415296">OVER</strong> aggregates compute an aggregated value for every input row over a range of ordered rows. In contrast to <strong id="dli_08_15076__b324618992910">GROUP BY</strong> aggregates, <strong id="dli_08_15076__b2092711112298">OVER</strong> aggregates do not reduce the result to a single row per group. Instead, <strong id="dli_08_15076__b87561924192915">OVER</strong> aggregates produce an aggregated value for every input row.</p>
<p id="dli_08_15076__p0535194614238">For more information, see <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/zh/docs/dev/table/sql/queries/over-agg/" target="_blank" rel="noopener noreferrer">Over Aggregation</a>.</p>
<div class="section" id="dli_08_15076__section36317259497"><h4 class="sectiontitle">Syntax</h4><div class="codecoloring" codetype="Sql" id="dli_08_15076__dli_08_0218_screen19309191263420"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span>
<span class="w"> </span><span class="n">agg_func</span><span class="p">(</span><span class="n">agg_col</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="p">[</span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">col1</span><span class="p">[,</span><span class="w"> </span><span class="n">col2</span><span class="p">,</span><span class="w"> </span><span class="p">...]]</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">time_col</span>
<span class="w"> </span><span class="n">range_definition</span><span class="p">),</span>
<span class="w"> </span><span class="p">...</span>
<span class="k">FROM</span><span class="w"> </span><span class="p">...</span>
</pre></div></td></tr></table></div>
</div>
</div>
<div class="section" id="dli_08_15076__section1915564514911"><h4 class="sectiontitle">Caveats</h4><ul id="dli_08_15076__ul177866310176"><li id="dli_08_15076__dli_08_0218_li13261491242">Currently, only windows from <strong id="dli_08_15076__b889914312304">PRECEDING</strong> (unbounded or bounded) to <strong id="dli_08_15076__b20549646183016">CURRENT ROW</strong> are supported. Ranges that use <strong id="dli_08_15076__b1720565233018">FOLLOWING</strong> are not supported.</li><li id="dli_08_15076__dli_08_0218_li19326174912416"><strong id="dli_08_15076__b17294191920304">ORDER BY</strong> must specify a single time attribute.</li><li id="dli_08_15076__li169761726121415">You can define multiple <strong id="dli_08_15076__b10937175221020">OVER</strong> window aggregates in a <strong id="dli_08_15076__b1016817555106">SELECT</strong> clause. However, for streaming queries, the <strong id="dli_08_15076__b172851732116">OVER</strong> windows for all aggregates must be identical due to a current limitation.</li><li id="dli_08_15076__li5786123181713"><strong id="dli_08_15076__b222977201117">OVER</strong> windows are defined on an ordered sequence of rows. Since tables do not have an inherent order, the <strong id="dli_08_15076__b10996112916118">ORDER BY</strong> clause is mandatory. For streaming queries, Flink currently only supports <strong id="dli_08_15076__b156415571319">OVER</strong> windows that are ordered by a time attribute in ascending order. Other orderings are not supported.</li></ul>
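<p>The caveats above can be illustrated with a minimal sketch. It assumes a hypothetical <strong>Orders</strong> table: the column names, the 5-second watermark delay, and the <strong>datagen</strong> connector are assumptions chosen for illustration. The <strong>WATERMARK</strong> clause makes <strong>order_time</strong> an event-time attribute so that it can be used in <strong>ORDER BY</strong>, and both aggregates share one identical window that runs from <strong>UNBOUNDED PRECEDING</strong> to <strong>CURRENT ROW</strong>.</p>
<pre class="screen">-- Hypothetical source table; adjust the schema and connector to your job.
CREATE TABLE Orders (
  order_id   STRING,
  product    STRING,
  amount     DOUBLE,
  order_time TIMESTAMP(3),
  -- Declares order_time as an event-time attribute (assumed 5-second delay).
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'datagen'
);

-- Both aggregates use the same window w, as streaming queries require.
SELECT order_id, order_time, amount,
  SUM(amount) OVER w AS running_amount,
  COUNT(amount) OVER w AS running_count
FROM Orders
WINDOW w AS (
  PARTITION BY product
  ORDER BY order_time
  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);</pre>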
</div>
<div class="section" id="dli_08_15076__section1790165584916"><h4 class="sectiontitle">Description</h4><pre class="screen" id="dli_08_15076__screen480717411207">SELECT order_id, order_time, amount,
  SUM(amount) OVER w AS sum_amount,
  AVG(amount) OVER w AS avg_amount
FROM Orders
WINDOW w AS (
  PARTITION BY product
  ORDER BY order_time
  RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW)</pre>
<ul id="dli_08_15076__ul194661734101610"><li id="dli_08_15076__li1319431314505"><strong id="dli_08_15076__b16194713195010">ORDER BY</strong>: <strong id="dli_08_15076__b1262432510319">OVER</strong> windows are defined on an ordered sequence of rows. Since tables do not have an inherent order, the <strong id="dli_08_15076__b4186103918314">ORDER BY</strong> clause is mandatory. For streaming queries, Flink currently only supports <strong id="dli_08_15076__b1969875333113">OVER</strong> windows that are ordered by a time attribute in ascending order. Other orderings are not supported.</li><li id="dli_08_15076__li1526921517501"><strong id="dli_08_15076__b1426910157508">PARTITION BY</strong>: <strong id="dli_08_15076__b1470713133212">OVER</strong> windows can be defined on a partitioned table. In the presence of a <strong id="dli_08_15076__b918122317323">PARTITION BY</strong> clause, the aggregate for each input row is computed only over the rows of its partition.</li><li id="dli_08_15076__li3466834171614"><strong id="dli_08_15076__b18989193683218">Range Definitions</strong>: The range definition specifies how many rows are included in the aggregate. The range is defined with a <strong id="dli_08_15076__b1663410010332">BETWEEN</strong> clause that defines a lower and an upper boundary. All rows between these boundaries are included in the aggregate. Flink only supports <strong id="dli_08_15076__b14867117193312">CURRENT ROW</strong> as the upper boundary.
There are two ways to define the range: <strong id="dli_08_15076__b15964129143310">ROWS</strong> intervals and <strong id="dli_08_15076__b1637093311336">RANGE</strong> intervals.<ol id="dli_08_15076__ol164549533174"><li id="dli_08_15076__li745485331717"><strong id="dli_08_15076__b310315819177">RANGE intervals</strong><p id="dli_08_15076__p179029311189">A <strong id="dli_08_15076__b986216589338">RANGE</strong> interval is defined on the values of the <strong id="dli_08_15076__b1085252173418">ORDER BY</strong> column, which in Flink is always a time attribute. The following <strong id="dli_08_15076__b113656141349">RANGE</strong> interval includes in the aggregate all rows whose time attribute is at most 30 minutes earlier than that of the current row.</p>
<pre class="screen" id="dli_08_15076__screen1993722031818">RANGE BETWEEN INTERVAL '30' MINUTE PRECEDING AND CURRENT ROW</pre>
</li><li id="dli_08_15076__li346292111812"><strong id="dli_08_15076__b1610172716181">ROWS intervals</strong><p id="dli_08_15076__p95121734101812">A <strong id="dli_08_15076__b630193943416">ROWS</strong> interval is a count-based interval. It defines exactly how many rows are included in the aggregate. The following <strong id="dli_08_15076__b068272718354">ROWS</strong> interval includes the 10 rows preceding the current row plus the current row (11 rows in total) in the aggregate.</p>
<pre class="screen" id="dli_08_15076__screen176415761910">ROWS BETWEEN 10 PRECEDING AND CURRENT ROW</pre>
</li></ol>
</li><li id="dli_08_15076__li764919383502"><strong id="dli_08_15076__b98351950183511">WINDOW</strong>: The <strong id="dli_08_15076__b108131054173515">WINDOW</strong> clause can be used to define an <strong id="dli_08_15076__b0516105773519">OVER</strong> window outside of the <strong id="dli_08_15076__b97635093610">SELECT</strong> clause. It can make queries more readable and lets you reuse one window definition for multiple aggregates.</li></ul>
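<p>The <strong>ROWS</strong> interval and the <strong>WINDOW</strong> clause described above can be combined as in the following sketch. It assumes the same <strong>Orders</strong> table as the query at the start of this section; the output column names are illustrative only.</p>
<pre class="screen">-- Sum and maximum over the current order and the 10 orders before it,
-- computed separately for each product.
SELECT order_id, order_time, amount,
  SUM(amount) OVER w AS sum_last_11,
  MAX(amount) OVER w AS max_last_11
FROM Orders
WINDOW w AS (
  PARTITION BY product
  ORDER BY order_time
  ROWS BETWEEN 10 PRECEDING AND CURRENT ROW)</pre>
<p>For the first rows of a partition, fewer than 10 preceding rows exist, so the aggregate covers only the rows that are available.</p>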
</div>
<div class="section" id="dli_08_15076__section950414289518"><h4 class="sectiontitle">Example</h4><p id="dli_08_15076__p8826182516207">For every order, the following query computes the sum of the amounts of all orders for the same product that were received within one hour before the current order.</p>
<div class="codecoloring" codetype="Sql" id="dli_08_15076__dli_08_0218_screen42401715953"><div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span>
<span class="normal">3</span>
<span class="normal">4</span>
<span class="normal">5</span>
<span class="normal">6</span>
<span class="normal">7</span></pre></div></td><td class="code"><div><pre><span></span><span class="k">SELECT</span><span class="w"> </span><span class="n">order_id</span><span class="p">,</span><span class="w"> </span><span class="n">order_time</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span>
<span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="n">OVER</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">PARTITION</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">product</span>
<span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">order_time</span>
<span class="w"> </span><span class="n">RANGE</span><span class="w"> </span><span class="k">BETWEEN</span><span class="w"> </span><span class="nb">INTERVAL</span><span class="w"> </span><span class="s1">'1'</span><span class="w"> </span><span class="n">HOUR</span><span class="w"> </span><span class="n">PRECEDING</span><span class="w"> </span><span class="k">AND</span><span class="w"> </span><span class="k">CURRENT</span><span class="w"> </span><span class="k">ROW</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">one_hour_prod_amount_sum</span>
<span class="k">FROM</span><span class="w"> </span><span class="n">Orders</span>
</pre></div></td></tr></table></div>
</div>
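<p>To make the one-hour range concrete, the following illustration uses hypothetical input rows (the values are invented for this sketch) and shows the sums the query above would be expected to produce. Rows are aggregated only with rows of the same product whose <strong>order_time</strong> is at most one hour earlier.</p>
<pre class="screen">-- Hypothetical rows in Orders and the expected one_hour_prod_amount_sum:
--
-- order_id  product  order_time           amount  one_hour_prod_amount_sum
-- o1        A        2025-01-01 10:00:00  10      10   (o1)
-- o2        A        2025-01-01 10:30:00  20      30   (o1 + o2)
-- o3        B        2025-01-01 10:40:00   7       7   (o3, different partition)
-- o4        A        2025-01-01 11:15:00   5      25   (o2 + o4; o1 is more than 1 hour earlier)</pre>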
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_15066.html">DML Syntax</a></div>
</div>
</div>