Files
doc-exports/docs/dli/sqlreference/dli_08_15071.html
Su, Xiaomeng be9eabe464 dli_sqlreference_20250305
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2025-03-25 09:06:21 +00:00

145 lines
14 KiB
HTML

<a name="dli_08_15071"></a><a name="dli_08_15071"></a>
<h1 class="topictitle1">Window Aggregation</h1>
<div id="body0000001824150228"><div class="section" id="dli_08_15071__section17795153816216"><h4 class="sectiontitle">Window TVF Aggregation</h4><p id="dli_08_15071__p1149217498213">Window aggregations are defined in the <strong id="dli_08_15071__b1245320414353">GROUP BY</strong> clause contains "window_start" and "window_end" columns of the relation applied <a href="dli_08_15070.html#dli_08_15070__section3516193316120">Windowing TVF</a>. Just like queries with regular <strong id="dli_08_15071__b43071432372">GROUP BY</strong> clauses, queries with a group by window aggregation will compute a single result row per group. Unlike other aggregations on continuous tables, window aggregation do not emit intermediate results but only a final result, the total aggregation at the end of the window. Moreover, window aggregations purge all intermediate state when no longer needed.</p>
<p id="dli_08_15071__p6744154310334">For more information, see <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/zh/docs/dev/table/sql/queries/window-agg/" target="_blank" rel="noopener noreferrer">Window Aggregation</a>.</p>
<div class="note" id="dli_08_15071__note171701338510"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_08_15071__p71708381518">The start and end timestamps of group windows can be selected with the grouped <strong id="dli_08_15071__b17692983227">window_start</strong> and <strong id="dli_08_15071__b15746111162216">window_end</strong> columns.</p>
</div></div>
<ul id="dli_08_15071__ul995195014011"><li id="dli_08_15071__li89517506011"><strong id="dli_08_15071__b19769144593919">Windowing TVFs</strong><p id="dli_08_15071__p1327834620412">Flink supports <strong id="dli_08_15071__b167557365400">TUMBLE</strong>, <strong id="dli_08_15071__b8812738134019">HOP</strong> and <strong id="dli_08_15071__b2115114218403">CUMULATE</strong> types of window aggregations.</p>
<ul id="dli_08_15071__ul13650135617416"><li id="dli_08_15071__li192761651104116">In streaming mode, the time attribute field of a window table-valued function must be on either event or processing time attributes. See <a href="dli_08_15070.html#dli_08_15070__section3516193316120">Windowing TVF</a> for more windowing functions information.</li><li id="dli_08_15071__li6276751124118">In batch mode, the time attribute field of a window table-valued function must be an attribute of type <strong id="dli_08_15071__b29871436144118">TIMESTAMP</strong> or <strong id="dli_08_15071__b1589415382417">TIMESTAMP_LTZ</strong>.</li></ul>
<pre class="screen" id="dli_08_15071__screen34901014810">-- tables must have time attribute, e.g. `bidtime` in this table
Flink SQL&gt; desc Bid;
+-------------+------------------------+------+-----+--------+---------------------------------+
| name | type | null | key | extras | watermark |
+-------------+------------------------+------+-----+--------+---------------------------------+
| bidtime | TIMESTAMP(3) *ROWTIME* | true | | | `bidtime` - INTERVAL '1' SECOND |
| price | DECIMAL(10, 2) | true | | | |
| item | STRING | true | | | |
| supplier_id | STRING | true | | | |
+-------------+------------------------+------+-----+--------+---------------------------------+
Flink SQL&gt; SELECT * FROM Bid;
+------------------+-------+------+-------------+
| bidtime | price | item | supplier_id |
+------------------+-------+------+-------------+
| 2020-04-15 08:05 | 4.00 | C | supplier1 |
| 2020-04-15 08:07 | 2.00 | A | supplier1 |
| 2020-04-15 08:09 | 5.00 | D | supplier2 |
| 2020-04-15 08:11 | 3.00 | B | supplier2 |
| 2020-04-15 08:13 | 1.00 | E | supplier1 |
| 2020-04-15 08:17 | 6.00 | F | supplier2 |
+------------------+-------+------+-------------+
-- tumbling window aggregation
Flink SQL&gt; SELECT window_start, window_end, SUM(price)
FROM TABLE(
TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
+------------------+------------------+-------+
| window_start | window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |
+------------------+------------------+-------+
-- hopping window aggregation
Flink SQL&gt; SELECT window_start, window_end, SUM(price)
FROM TABLE(
HOP(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
+------------------+------------------+-------+
| window_start | window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:05 | 2020-04-15 08:15 | 15.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |
| 2020-04-15 08:15 | 2020-04-15 08:25 | 6.00 |
+------------------+------------------+-------+
-- cumulative window aggregation
Flink SQL&gt; SELECT window_start, window_end, SUM(price)
FROM TABLE(
CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
+------------------+------------------+-------+
| window_start | window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:06 | 4.00 |
| 2020-04-15 08:00 | 2020-04-15 08:08 | 6.00 |
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:10 | 2020-04-15 08:12 | 3.00 |
| 2020-04-15 08:10 | 2020-04-15 08:14 | 4.00 |
| 2020-04-15 08:10 | 2020-04-15 08:16 | 4.00 |
| 2020-04-15 08:10 | 2020-04-15 08:18 | 10.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |</pre>
</li></ul>
</div>
<ul id="dli_08_15071__ul06731615116"><li id="dli_08_15071__li1267317116119"><strong id="dli_08_15071__b08383401298">GROUPING SETS</strong><p id="dli_08_15071__p1913811530917">Window aggregations also support <strong id="dli_08_15071__b141815544428">GROUPING SETS</strong> syntax. Grouping sets allow for more complex grouping operations than those describable by a standard <strong id="dli_08_15071__b15398112164311">GROUP BY</strong>. Rows are grouped separately by each specified grouping set and aggregates are computed for each group just as for simple <strong id="dli_08_15071__b1582211449436">GROUP BY</strong> clauses.</p>
<p id="dli_08_15071__p151386531791">Window aggregations with <strong id="dli_08_15071__b4598165516434">GROUPING SETS</strong> require both the <strong id="dli_08_15071__b11481854411">window_start</strong> and <strong id="dli_08_15071__b11920181013442">window_end</strong> columns have to be in the <strong id="dli_08_15071__b2024041724414">GROUP BY</strong> clause, but not in the <strong id="dli_08_15071__b18961112017447">GROUPING SETS</strong> clause.</p>
<pre class="screen" id="dli_08_15071__screen10246144519103">Flink SQL&gt; SELECT window_start, window_end, supplier_id, SUM(price) as price
FROM TABLE(
TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end, GROUPING SETS ((supplier_id), ());
+------------------+------------------+-------------+-------+
| window_start | window_end | supplier_id | price |
+------------------+------------------+-------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:10 | (NULL) | 11.00 |
| 2020-04-15 08:00 | 2020-04-15 08:10 | supplier2 | 5.00 |
| 2020-04-15 08:00 | 2020-04-15 08:10 | supplier1 | 6.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | (NULL) | 10.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | supplier2 | 9.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | supplier1 | 1.00 |
+------------------+------------------+-------------+-------+</pre>
<p id="dli_08_15071__p1683813391018">Each sublist of <strong id="dli_08_15071__b1683134504410">GROUPING SETS</strong> may specify zero or more columns or expressions and is interpreted the same way as though used directly in the <strong id="dli_08_15071__b16135516440">GROUP BY</strong> clause. An empty grouping set means that all rows are aggregated down to a single group, which is output even if no input rows were present.</p>
<p id="dli_08_15071__p2838173311015">References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear. For example, <strong id="dli_08_15071__b1467562662217">()</strong> in <strong id="dli_08_15071__b155311830182213">GROUPING SETS ((supplier_id), ())</strong> in the preceding example is an empty sublist, and the <strong id="dli_08_15071__b1923105242218">supplier_id</strong> column in the corresponding result data is filled with <strong id="dli_08_15071__b19164185602216">NULL</strong>.</p>
</li><li id="dli_08_15071__li18529811712"><strong id="dli_08_15071__b144213151210">ROLLUP</strong><p id="dli_08_15071__p33441535114211"><strong id="dli_08_15071__b1234019774619">ROLLUP</strong> is a shorthand notation for specifying a common type of grouping set. It represents the given list of expressions and all prefixes of the list, including the empty list.</p>
<p id="dli_08_15071__p6961869122">For example, <strong id="dli_08_15071__b11828141102314">ROLLUP (one,two)</strong> is equivalent to <strong id="dli_08_15071__b1323026112318">GROUPING SET((one,two),(one),())</strong>.</p>
<p id="dli_08_15071__p109613641213">Window aggregations with <strong id="dli_08_15071__b1610371418471">ROLLUP</strong> requires both the <strong id="dli_08_15071__b1611571713473">window_start</strong> and <strong id="dli_08_15071__b18557181911479">window_end</strong> columns have to be in the <strong id="dli_08_15071__b143861423184714">GROUP BY</strong> clause, but not in the <strong id="dli_08_15071__b144047278472">ROLLUP</strong> clause.</p>
<p id="dli_08_15071__p1496110619128">For example, the following query is equivalent to the one above.</p>
<pre class="screen" id="dli_08_15071__screen1565551716124">SELECT window_start, window_end, supplier_id, SUM(price) as price
FROM TABLE(
TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end, ROLLUP (supplier_id);</pre>
</li><li id="dli_08_15071__li1242191813111"><strong id="dli_08_15071__b20893192614123">CUBE</strong><p id="dli_08_15071__p1646014319125"><strong id="dli_08_15071__b876771124913">CUBE</strong> is a shorthand notation for specifying a common type of grouping set. It represents the given list and all of its possible subsets - the power set.</p>
<p id="dli_08_15071__p114608430126">Window aggregations with <strong id="dli_08_15071__b1895841114913">CUBE</strong> requires both the <strong id="dli_08_15071__b161411454495">window_start</strong> and <strong id="dli_08_15071__b1712844816491">window_end</strong> columns have to be in the <strong id="dli_08_15071__b2229185184919">GROUP BY</strong> clause, but not in the <strong id="dli_08_15071__b246715541499">CUBE</strong> clause.</p>
<p id="dli_08_15071__p2460164316122">For example, the following two queries are equivalent.</p>
<pre class="screen" id="dli_08_15071__screen99195534124">SELECT window_start, window_end, item, supplier_id, SUM(price) as price
FROM TABLE(
TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end, CUBE (supplier_id, item);
SELECT window_start, window_end, item, supplier_id, SUM(price) as price
FROM TABLE(
TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end, GROUPING SETS (
(supplier_id, item),
(supplier_id ),
( item),
( )
)</pre>
</li></ul>
<ul id="dli_08_15071__ul15646446718"><li id="dli_08_15071__li964617462112"><strong id="dli_08_15071__b55571227135013">Cascading Window Aggregation</strong><p id="dli_08_15071__p1930394244218">The <strong id="dli_08_15071__b106181339175015">window_start</strong> and <strong id="dli_08_15071__b5104174213508">window_end</strong> columns are regular timestamp columns, not time attributes. Thus they can not be used as time attributes in subsequent time-based operations.</p>
<p id="dli_08_15071__p36071854151519">To propagate time attributes, you need to additionally add <strong id="dli_08_15071__b16549192112517">window_time</strong> column into <strong id="dli_08_15071__b1244626105119">GROUP BY</strong> clause. The <strong id="dli_08_15071__b1882173717516">window_time</strong> is the third column produced by <a href="dli_08_15070.html#dli_08_15070__section3516193316120">Windowing Table-Valued Functions (Windowing TVFs)</a> which is a time attribute of the assigned window. Adding <strong id="dli_08_15071__b413017209520">window_time</strong> into <strong id="dli_08_15071__b12973112218522">GROUP BY</strong> clause makes <strong id="dli_08_15071__b12248112612526">window_time</strong> also to be group key that can be selected. Then following queries can use this column for subsequent time-based operations, such as cascading window aggregations and Window TopN.</p>
<p id="dli_08_15071__p17607125441512">The following shows a cascading window aggregation where the first window aggregation propagates the time attribute for the second window aggregation.</p>
<pre class="screen" id="dli_08_15071__screen61565920175">-- tumbling 5 minutes for each supplier_id
CREATE VIEW window1 AS
-- Note: The window start and window end fields of inner Window TVF are optional in the select clause. However, if they appear in the clause, they need to be aliased to prevent name conflicting with the window start and window end of the outer Window TVF.
SELECT window_start as window_5mintumble_start, window_end as window_5mintumble_end, window_time as rowtime, SUM(price) as partial_price
FROM TABLE(
TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES))
GROUP BY supplier_id, window_start, window_end, window_time;
-- tumbling 10 minutes on the first window
SELECT window_start, window_end, SUM(partial_price) as total_price
FROM TABLE(
TUMBLE(TABLE window1, DESCRIPTOR(rowtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;</pre>
</li></ul>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_15069.html">Window</a></div>
</div>
</div>