Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com> Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
<a name="dli_08_15070"></a><a name="dli_08_15070"></a>
<h1 class="topictitle1">Window Functions</h1>
<div id="body0000001870730249"><div class="section" id="dli_08_15070__section3516193316120"><a name="dli_08_15070__section3516193316120"></a><a name="section3516193316120"></a><h4 class="sectiontitle">Windowing Table-Valued Functions (Windowing TVFs)</h4><p id="dli_08_15070__p1739213520121">Windows are at the heart of processing infinite streams. Windows split the stream into "buckets" of finite size, over which we can apply computations.</p>
<p id="dli_08_15070__p116691712201315">Apache Flink provides several <strong id="dli_08_15070__b134961613312">window table-valued functions (TVF)</strong> to divide the elements of your table into windows, including:</p>
<ul id="dli_08_15070__ul119672415137"><li id="dli_08_15070__li1919612246131">Tumble Windows</li><li id="dli_08_15070__li153601730161318">Hop Windows</li><li id="dli_08_15070__li1655912187149">Cumulate Windows</li></ul>
<p id="dli_08_15070__p1051921516318">Note that each element can logically belong to more than one window, depending on the windowing table-valued function you use. For example, HOP windowing creates overlapping windows, so a single element can be assigned to multiple windows.</p>
<p id="dli_08_15070__p793431313311">Windowing TVFs are Flink-defined Polymorphic Table Functions (PTFs). PTFs are part of the SQL:2016 standard. A PTF is a special table function that can take a table as a parameter.</p>
<p id="dli_08_15070__p175281173315">Windowing TVFs replace the legacy Grouped Window Functions. They are more compliant with the SQL standard and more powerful, supporting complex window-based computations such as Window TopN and Window Join, whereas Grouped Window Functions support only Window Aggregation.</p>
<p id="dli_08_15070__p793320584219">For more information, see <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/zh/docs/dev/table/sql/queries/window-tvf/" target="_blank" rel="noopener noreferrer">Window Functions</a>.</p>
</div>
<div class="section" id="dli_08_15070__section06021219101211"><h4 class="sectiontitle">Window Functions</h4><p id="dli_08_15070__p13526645543">Apache Flink provides 3 built-in windowing TVFs: <strong id="dli_08_15070__b57794115417">TUMBLE</strong>, <strong id="dli_08_15070__b1429924411417">HOP</strong> and <strong id="dli_08_15070__b250016454414">CUMULATE</strong>.</p>
<p id="dli_08_15070__p87335117417">The return value of a windowing TVF is a new relation that includes all columns of the original relation plus three additional columns named "window_start", "window_end", and "window_time" that indicate the assigned window.</p>
<p id="dli_08_15070__p033415302126">In streaming mode, the "window_time" field is a time attribute of the window. In batch mode, the "window_time" field is an attribute of type <strong id="dli_08_15070__b327012519617">TIMESTAMP</strong> or <strong id="dli_08_15070__b172171028565">TIMESTAMP_LTZ</strong>, depending on the type of the input time field. The "window_time" field can be used in subsequent time-based operations, for example, another windowing TVF, an interval join, or an over aggregation. The value of window_time is always equal to window_end - 1 ms.</p>
</div>
<div class="section" id="dli_08_15070__section541623111168"><h4 class="sectiontitle">TUMBLE</h4><ul id="dli_08_15070__ul155467281886"><li id="dli_08_15070__li15827592517"><strong id="dli_08_15070__b376530412104456">Function</strong><p id="dli_08_15070__p22733517368">The <strong id="dli_08_15070__b27381932491">TUMBLE</strong> function assigns each element to a window of specified window size. Tumbling windows have a fixed size and do not overlap.</p>
<p id="dli_08_15070__p42411191256">For example, if you specify a tumbling window with a size of 5 minutes, Flink evaluates the current window and starts a new window every five minutes.</p>
<div class="fignone" id="dli_08_15070__fig146599127245"><span class="figcap"><b>Figure 1 </b>Tumbling window</span><br><span><img id="dli_08_15070__image1933652113217" src="en-us_image_0000001870733085.png"></span></div>
</li><li id="dli_08_15070__li8801144213615"><strong id="dli_08_15070__b1301133814611">Description</strong><p id="dli_08_15070__p10983928173612">The <strong id="dli_08_15070__b1766815338503">TUMBLE</strong> function assigns a window to each row of a relation based on a time attribute field. In streaming mode, the time attribute field must be either an event time or a processing time attribute. In batch mode, the time attribute field of the window table function must be an attribute of type <strong id="dli_08_15070__b81005142511">TIMESTAMP</strong> or <strong id="dli_08_15070__b76571516175120">TIMESTAMP_LTZ</strong>.</p>
<p id="dli_08_15070__p1977554116361">The return value of <strong id="dli_08_15070__b2813193917919">TUMBLE</strong> is a new relation that includes all columns of the original relation plus three additional columns named "window_start", "window_end", and "window_time" that indicate the assigned window. The original time attribute "timecol" becomes a regular timestamp column after the windowing TVF is applied.</p>
<pre class="screen" id="dli_08_15070__screen177871928382">TUMBLE(TABLE data, DESCRIPTOR(timecol), size [, offset ])</pre>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15070__table2152513173710" frame="border" border="1" rules="all"><caption><b>Table 1 </b>TUMBLE function parameters</caption><thead align="left"><tr id="dli_08_15070__row415213136377"><th align="left" class="cellrowborder" valign="top" width="22.322232223222326%" id="mcps1.3.3.2.2.5.2.4.1.1"><p id="dli_08_15070__p615261311376">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="23.25232523252325%" id="mcps1.3.3.2.2.5.2.4.1.2"><p id="dli_08_15070__p141531713133718">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="54.42544254425443%" id="mcps1.3.3.2.2.5.2.4.1.3"><p id="dli_08_15070__p15153513103715">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15070__row1915314139372"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.3.2.2.5.2.4.1.1 "><p id="dli_08_15070__p141531613143711">data</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.3.2.2.5.2.4.1.2 "><p id="dli_08_15070__p15153181353711">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.3.2.2.5.2.4.1.3 "><p id="dli_08_15070__p315319136371">A table parameter that can be any relation with a time attribute column.</p>
</td>
</tr>
<tr id="dli_08_15070__row0153313103718"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.3.2.2.5.2.4.1.1 "><p id="dli_08_15070__p181531613183715">timecol</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.3.2.2.5.2.4.1.2 "><p id="dli_08_15070__p1015351310374">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.3.2.2.5.2.4.1.3 "><p id="dli_08_15070__p131531813183711">A column descriptor indicating which time attribute column of data should be mapped to tumbling windows.</p>
</td>
</tr>
<tr id="dli_08_15070__row175597413371"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.3.2.2.5.2.4.1.1 "><p id="dli_08_15070__p355934193715">size</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.3.2.2.5.2.4.1.2 "><p id="dli_08_15070__p11559541163716">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.3.2.2.5.2.4.1.3 "><p id="dli_08_15070__p955974117371">A duration specifying the width of the tumbling windows.</p>
</td>
</tr>
<tr id="dli_08_15070__row1559184111370"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.3.2.2.5.2.4.1.1 "><p id="dli_08_15070__p20559204110373">offset</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.3.2.2.5.2.4.1.2 "><p id="dli_08_15070__p5559124153711">No</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.3.2.2.5.2.4.1.3 "><p id="dli_08_15070__p95591041183714">An optional duration by which the window start is shifted.</p>
</td>
</tr>
</tbody>
</table>
</div>
</li><li id="dli_08_15070__li1970017531620"><strong id="dli_08_15070__b1068764145104438">Example</strong><pre class="screen" id="dli_08_15070__screen31861248171819">-- tables must have time attribute, e.g. `bidtime` in this table
Flink SQL> desc Bid;
+-------------+------------------------+------+-----+--------+---------------------------------+
|        name |                   type | null | key | extras |                       watermark |
+-------------+------------------------+------+-----+--------+---------------------------------+
|     bidtime | TIMESTAMP(3) *ROWTIME* | true |     |        | `bidtime` - INTERVAL '1' SECOND |
|       price |         DECIMAL(10, 2) | true |     |        |                                 |
|        item |                 STRING | true |     |        |                                 |
+-------------+------------------------+------+-----+--------+---------------------------------+

Flink SQL> SELECT * FROM Bid;
+------------------+-------+------+
|          bidtime | price | item |
+------------------+-------+------+
| 2020-04-15 08:05 |  4.00 | C    |
| 2020-04-15 08:07 |  2.00 | A    |
| 2020-04-15 08:09 |  5.00 | D    |
| 2020-04-15 08:11 |  3.00 | B    |
| 2020-04-15 08:13 |  1.00 | E    |
| 2020-04-15 08:17 |  6.00 | F    |
+------------------+-------+------+

Flink SQL> SELECT * FROM TABLE(
   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
Flink SQL> SELECT * FROM TABLE(
   TUMBLE(
     DATA => TABLE Bid,
     TIMECOL => DESCRIPTOR(bidtime),
     SIZE => INTERVAL '10' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |             window_time |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
+------------------+-------+------+------------------+------------------+-------------------------+

-- apply aggregation on the tumbling windowed table
Flink SQL> SELECT window_start, window_end, SUM(price)
  FROM TABLE(
    TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end;
+------------------+------------------+-------+
|     window_start |       window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |
+------------------+------------------+-------+</pre>
</li></ul>
</div>
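<p>The tumbling assignment shown in the example above can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only and is not Flink code: the helper name <code>tumble_window</code> is hypothetical, and it assumes that window boundaries are aligned to 1970-01-01 00:00:00 and that no offset is set.</p>

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for window boundaries


def tumble_window(ts, size):
    """Return (window_start, window_end, window_time) for one timestamp."""
    # Time elapsed since the most recent window boundary.
    remainder = (ts - EPOCH) % size
    window_start = ts - remainder
    window_end = window_start + size
    # window_time is the last millisecond that still belongs to the window.
    window_time = window_end - timedelta(milliseconds=1)
    return window_start, window_end, window_time


# Bid times taken from the Bid table in the example above.
bid_times = [datetime(2020, 4, 15, 8, m) for m in (5, 7, 9, 11, 13, 17)]
for ts in bid_times:
    start, end, wtime = tumble_window(ts, timedelta(minutes=10))
    print(ts.time(), start.time(), end.time(), wtime.time())
```

<p>Each timestamp maps to exactly one window, which is why tumbling windows never overlap.</p>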
<div class="section" id="dli_08_15070__section894031011135"><h4 class="sectiontitle">HOP</h4><ul id="dli_08_15070__ul43231010796"><li id="dli_08_15070__li378858192"><strong id="dli_08_15070__b249623498104456">Function</strong><p id="dli_08_15070__p1729711351411">The <strong id="dli_08_15070__b142512354159">HOP</strong> function assigns elements to windows of fixed length. Like a <strong id="dli_08_15070__b13794144915153">TUMBLE</strong> windowing function, the size of the windows is configured by the window size parameter. An additional window slide parameter controls how frequently a hopping window is started. Hence, hopping windows can be overlapping if the slide is smaller than the window size. In this case, elements are assigned to multiple windows.</p>
<p id="dli_08_15070__p04861758131319">For example, you could have windows of size 10 minutes that slide by 5 minutes. With this, every 5 minutes you get a window that contains the events that arrived during the last 10 minutes, as depicted in the following figure.</p>
<div class="fignone" id="dli_08_15070__fig11496931201312"><span class="figcap"><b>Figure 2 </b>Hopping window</span><br><span><img id="dli_08_15070__image1576214262363" src="en-us_image_0000001827630114.png"></span></div>
</li></ul>
</div>
<ul id="dli_08_15070__ul12937415181411"><li id="dli_08_15070__li11937215151410"><strong id="dli_08_15070__b18816191712146">Description</strong><p id="dli_08_15070__p1977810499404">The <strong id="dli_08_15070__b2016021671912">HOP</strong> function assigns windows that cover rows within an interval of the given size and that shift by the given slide, based on a time attribute field. In streaming mode, the time attribute field must be either an event time or a processing time attribute. In batch mode, the time attribute field of the window table function must be an attribute of type <strong id="dli_08_15070__b416034531917">TIMESTAMP</strong> or <strong id="dli_08_15070__b144401948181914">TIMESTAMP_LTZ</strong>.</p>
<p id="dli_08_15070__p177257444404">The return value of <strong id="dli_08_15070__b1441080112017">HOP</strong> is a new relation that includes all columns of the original relation plus three additional columns named "window_start", "window_end", and "window_time" that indicate the assigned window. The original time attribute "timecol" becomes a regular timestamp column after the windowing TVF is applied.</p>
<pre class="screen" id="dli_08_15070__screen20953122853813">HOP(TABLE data, DESCRIPTOR(timecol), slide, size [, offset ])</pre>
<p id="dli_08_15070__p16874183211394"></p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15070__table1046311463387" frame="border" border="1" rules="all"><caption><b>Table 2 </b>HOP function parameters</caption><thead align="left"><tr id="dli_08_15070__row34638460384"><th align="left" class="cellrowborder" valign="top" width="22.322232223222326%" id="mcps1.3.5.1.6.2.4.1.1"><p id="dli_08_15070__p84631346143811">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="23.25232523252325%" id="mcps1.3.5.1.6.2.4.1.2"><p id="dli_08_15070__p746354618387">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="54.42544254425443%" id="mcps1.3.5.1.6.2.4.1.3"><p id="dli_08_15070__p11463846173816">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15070__row246324693814"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.5.1.6.2.4.1.1 "><p id="dli_08_15070__p15463194683815">data</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.5.1.6.2.4.1.2 "><p id="dli_08_15070__p5464846123819">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.5.1.6.2.4.1.3 "><p id="dli_08_15070__p13464646183813">A table parameter that can be any relation with a time attribute column.</p>
</td>
</tr>
<tr id="dli_08_15070__row1146415467386"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.5.1.6.2.4.1.1 "><p id="dli_08_15070__p1746404616387">timecol</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.5.1.6.2.4.1.2 "><p id="dli_08_15070__p13464154663814">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.5.1.6.2.4.1.3 "><p id="dli_08_15070__p14464114617385">A column descriptor indicating which time attribute column of data should be mapped to hopping windows.</p>
</td>
</tr>
<tr id="dli_08_15070__row1574155913382"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.5.1.6.2.4.1.1 "><p id="dli_08_15070__p67495917389">slide</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.5.1.6.2.4.1.2 "><p id="dli_08_15070__p1174185933817">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.5.1.6.2.4.1.3 "><p id="dli_08_15070__p1474259153818">A duration specifying the interval between the starts of consecutive hopping windows.</p>
</td>
</tr>
<tr id="dli_08_15070__row11464246193820"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.5.1.6.2.4.1.1 "><p id="dli_08_15070__p54641846183812">size</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.5.1.6.2.4.1.2 "><p id="dli_08_15070__p1746404617389">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.5.1.6.2.4.1.3 "><p id="dli_08_15070__p54641346123815">A duration specifying the width of the hopping windows.</p>
</td>
</tr>
<tr id="dli_08_15070__row2464154612388"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.5.1.6.2.4.1.1 "><p id="dli_08_15070__p6464154673814">offset</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.5.1.6.2.4.1.2 "><p id="dli_08_15070__p164649464382">No</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.5.1.6.2.4.1.3 "><p id="dli_08_15070__p146484673816">An optional duration by which the window start is shifted.</p>
</td>
</tr>
</tbody>
</table>
</div>
</li><li id="dli_08_15070__li179371515131415"><strong id="dli_08_15070__b18585171412238">Example</strong><pre class="screen" id="dli_08_15070__screen788393021910">> SELECT * FROM TABLE(
    HOP(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
> SELECT * FROM TABLE(
    HOP(
      DATA => TABLE Bid,
      TIMECOL => DESCRIPTOR(bidtime),
      SLIDE => INTERVAL '5' MINUTES,
      SIZE => INTERVAL '10' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |             window_time |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:05 | 2020-04-15 08:15 | 2020-04-15 08:14:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:15 | 2020-04-15 08:25 | 2020-04-15 08:24:59.999 |
+------------------+-------+------+------------------+------------------+-------------------------+

-- apply aggregation on the hopping windowed table
> SELECT window_start, window_end, SUM(price)
  FROM TABLE(
    HOP(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end;
+------------------+------------------+-------+
|     window_start |       window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:05 | 2020-04-15 08:15 | 15.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |
| 2020-04-15 08:15 | 2020-04-15 08:25 |  6.00 |
+------------------+------------------+-------+</pre>
</li></ul>
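<p>To see why each bid lands in two windows in the HOP example, the assignment and the grouped sum can be recomputed outside Flink. The Python sketch below is an illustration under assumptions, not Flink's implementation: <code>hop_windows</code> is a hypothetical helper, window starts are assumed to be aligned to 1970-01-01 00:00:00, and no offset is set.</p>

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for window starts


def hop_windows(ts, slide, size):
    """Return every (window_start, window_end) pair that contains ts."""
    # Latest window start that is not after ts.
    start = ts - (ts - EPOCH) % slide
    windows = []
    # Walk back one slide at a time while the window still covers ts.
    while start + size > ts:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


# (minute of bidtime, price) pairs taken from the Bid table.
bids = [(5, "4.00"), (7, "2.00"), (9, "5.00"),
        (11, "3.00"), (13, "1.00"), (17, "6.00")]

totals = defaultdict(Decimal)
for minute, price in bids:
    ts = datetime(2020, 4, 15, 8, minute)
    for window in hop_windows(ts, timedelta(minutes=5), timedelta(minutes=10)):
        totals[window] += Decimal(price)

for (start, end), total in sorted(totals.items()):
    print(start.time(), end.time(), total)
```

<p>With a slide of 5 minutes and a size of 10 minutes, every timestamp is covered by size / slide = 2 windows, which matches the doubled rows in the example output.</p>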
<div class="section" id="dli_08_15070__section17718626191418"><h4 class="sectiontitle">CUMULATE</h4><ul id="dli_08_15070__ul15391337111418"><li id="dli_08_15070__li1539119374144">Function<p id="dli_08_15070__p15443559174115"><a name="dli_08_15070__li1539119374144"></a><a name="li1539119374144"></a>Cumulating windows are very useful in some scenarios, such as tumbling windows with early firing at a fixed interval. For example, a daily dashboard might plot cumulative unique visitors (UVs) from 00:00 up to each minute, so the UV value at 10:00 represents the total number of UVs from 00:00 to 10:00. This can be implemented easily and efficiently with CUMULATE windowing.</p>
<p id="dli_08_15070__p1877514417424">The <strong id="dli_08_15070__b62707194259">CUMULATE</strong> function assigns elements to windows that cover rows within an initial interval of one step size and then expand by one more step size every step (keeping the window start fixed) until the maximum window size is reached. You can think of the <strong id="dli_08_15070__b167701427143012">CUMULATE</strong> function as first applying <strong id="dli_08_15070__b5770427163010">TUMBLE</strong> windowing with the maximum window size and then splitting each tumbling window into several windows that share the same window start and whose window ends differ by one step size. Cumulating windows therefore overlap and do not have a fixed size.</p>
<p id="dli_08_15070__p2054861419">For example, with a cumulating window of 1 hour step and 1 day maximum size, you get the windows [00:00, 01:00), [00:00, 02:00), [00:00, 03:00), …, [00:00, 24:00) for every day.</p>
<div class="fignone" id="dli_08_15070__fig119981447141413"><span class="figcap"><b>Figure 3 </b>Cumulating window</span><br><span><img id="dli_08_15070__image16303113910424" src="en-us_image_0000001874189597.png"></span></div>
</li></ul>
</div>
<ul id="dli_08_15070__ul12328910111510"><li id="dli_08_15070__li1232813106153"><strong id="dli_08_15070__b134361313276">Description</strong><p id="dli_08_15070__p87403341407">The <strong id="dli_08_15070__b6659133613279">CUMULATE</strong> function assigns windows based on a time attribute column. In streaming mode, the time attribute field must be either an event time or a processing time attribute. In batch mode, the time attribute field of the window table function must be an attribute of type <strong id="dli_08_15070__b1131725082714">TIMESTAMP</strong> or <strong id="dli_08_15070__b173181650182718">TIMESTAMP_LTZ</strong>.</p>
<p id="dli_08_15070__p17221394431">The return value of <strong id="dli_08_15070__b894105412276">CUMULATE</strong> is a new relation that includes all columns of the original relation plus three additional columns named "window_start", "window_end", and "window_time" that indicate the assigned window. The original time attribute "timecol" becomes a regular timestamp column after the windowing TVF is applied.</p>
<pre class="screen" id="dli_08_15070__screen157691622124311">CUMULATE(TABLE data, DESCRIPTOR(timecol), step, size [, offset ])</pre>
<p id="dli_08_15070__p7434184673917"></p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15070__table12646447163918" frame="border" border="1" rules="all"><caption><b>Table 3 </b>CUMULATE function parameters</caption><thead align="left"><tr id="dli_08_15070__row186461047173916"><th align="left" class="cellrowborder" valign="top" width="22.322232223222326%" id="mcps1.3.7.1.6.2.4.1.1"><p id="dli_08_15070__p16646447163919">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="23.25232523252325%" id="mcps1.3.7.1.6.2.4.1.2"><p id="dli_08_15070__p1664634719393">Mandatory</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="54.42544254425443%" id="mcps1.3.7.1.6.2.4.1.3"><p id="dli_08_15070__p6646144733916">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15070__row86461247123915"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.7.1.6.2.4.1.1 "><p id="dli_08_15070__p19646184717392">data</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.7.1.6.2.4.1.2 "><p id="dli_08_15070__p7646047123914">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.7.1.6.2.4.1.3 "><p id="dli_08_15070__p1064694719397">A table parameter that can be any relation with a time attribute column.</p>
</td>
</tr>
<tr id="dli_08_15070__row1646154723912"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.7.1.6.2.4.1.1 "><p id="dli_08_15070__p11646947103919">timecol</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.7.1.6.2.4.1.2 "><p id="dli_08_15070__p1164634712392">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.7.1.6.2.4.1.3 "><p id="dli_08_15070__p13646204753919">A column descriptor indicating which time attribute column of data should be mapped to cumulating windows.</p>
</td>
</tr>
<tr id="dli_08_15070__row166461147203911"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.7.1.6.2.4.1.1 "><p id="dli_08_15070__p46461847173912">step</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.7.1.6.2.4.1.2 "><p id="dli_08_15070__p1164624710395">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.7.1.6.2.4.1.3 "><p id="dli_08_15070__p2646134715399">A duration specifying the increase in window size between the ends of consecutive cumulating windows.</p>
</td>
</tr>
<tr id="dli_08_15070__row1264654720398"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.7.1.6.2.4.1.1 "><p id="dli_08_15070__p10647134763918">size</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.7.1.6.2.4.1.2 "><p id="dli_08_15070__p764764723918">Yes</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.7.1.6.2.4.1.3 "><p id="dli_08_15070__p126474479397">A duration specifying the maximum width of the cumulating windows. The size must be an integral multiple of step.</p>
</td>
</tr>
<tr id="dli_08_15070__row4647647203915"><td class="cellrowborder" valign="top" width="22.322232223222326%" headers="mcps1.3.7.1.6.2.4.1.1 "><p id="dli_08_15070__p116470471399">offset</p>
</td>
<td class="cellrowborder" valign="top" width="23.25232523252325%" headers="mcps1.3.7.1.6.2.4.1.2 "><p id="dli_08_15070__p12647174715397">No</p>
</td>
<td class="cellrowborder" valign="top" width="54.42544254425443%" headers="mcps1.3.7.1.6.2.4.1.3 "><p id="dli_08_15070__p76473475394">An optional duration by which the window start is shifted.</p>
</td>
</tr>
</tbody>
</table>
</div>
</li><li id="dli_08_15070__li5328141031510"><strong id="dli_08_15070__b030691632316">Example</strong><pre class="screen" id="dli_08_15070__screen1853344815198">> SELECT * FROM TABLE(
    CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
> SELECT * FROM TABLE(
    CUMULATE(
      DATA => TABLE Bid,
      TIMECOL => DESCRIPTOR(bidtime),
      STEP => INTERVAL '2' MINUTES,
      SIZE => INTERVAL '10' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |             window_time |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:00 | 2020-04-15 08:06 | 2020-04-15 08:05:59.999 |
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:00 | 2020-04-15 08:08 | 2020-04-15 08:07:59.999 |
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:00 | 2020-04-15 08:08 | 2020-04-15 08:07:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:12 | 2020-04-15 08:11:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:14 | 2020-04-15 08:13:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:16 | 2020-04-15 08:15:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:18 | 2020-04-15 08:17:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:14 | 2020-04-15 08:13:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:16 | 2020-04-15 08:15:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:18 | 2020-04-15 08:17:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:10 | 2020-04-15 08:18 | 2020-04-15 08:17:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
+------------------+-------+------+------------------+------------------+-------------------------+

-- apply aggregation on the cumulating windowed table
> SELECT window_start, window_end, SUM(price)
  FROM TABLE(
    CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '2' MINUTES, INTERVAL '10' MINUTES))
  GROUP BY window_start, window_end;
+------------------+------------------+-------+
|     window_start |       window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:00 | 2020-04-15 08:06 |  4.00 |
| 2020-04-15 08:00 | 2020-04-15 08:08 |  6.00 |
| 2020-04-15 08:00 | 2020-04-15 08:10 | 11.00 |
| 2020-04-15 08:10 | 2020-04-15 08:12 |  3.00 |
| 2020-04-15 08:10 | 2020-04-15 08:14 |  4.00 |
| 2020-04-15 08:10 | 2020-04-15 08:16 |  4.00 |
| 2020-04-15 08:10 | 2020-04-15 08:18 | 10.00 |
| 2020-04-15 08:10 | 2020-04-15 08:20 | 10.00 |
+------------------+------------------+-------+</pre>
</li></ul>
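<p>The "tumble first, then split by step" view of CUMULATE can be written down directly. The Python sketch below is illustrative only: <code>cumulate_windows</code> is a hypothetical helper, and it assumes that the size is an integral multiple of the step, that windows are aligned to 1970-01-01 00:00:00, and that no offset is set.</p>

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin


def cumulate_windows(ts, step, size):
    """Return the cumulating (window_start, window_end) pairs that contain ts."""
    # Start of the enclosing tumbling window of the maximum size.
    start = ts - (ts - EPOCH) % size
    # Number of cumulating windows inside one maximum-size window.
    steps = size // step
    ends = [start + k * step for k in range(1, steps + 1)]
    # All windows share the same start; keep those whose end lies after ts.
    return [(start, end) for end in ends if end > ts]


ts = datetime(2020, 4, 15, 8, 5)
for start, end in cumulate_windows(ts, timedelta(minutes=2), timedelta(minutes=10)):
    print(start.time(), end.time())
```

<p>A timestamp early in the maximum-size window appears in more cumulating windows than a late one, which explains why the 08:11 bid produces five rows in the example while the 08:17 bid produces two.</p>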
<div class="section" id="dli_08_15070__section145761730125914"><h4 class="sectiontitle">Window Offset</h4><p id="dli_08_15070__p14134547105910"><strong id="dli_08_15070__b1165911306">Offset</strong> is an optional parameter which could be used to change the window assignment. It could be positive duration and negative duration. Default values for window offset is <strong id="dli_08_15070__b141982614308">0</strong>. The same record maybe assigned to the different window if set different offset value. For example, which window would be assigned to for a record with timestamp 2021-06-30 00:00:04 for a Tumble window with 10 MINUTE as size?</p>
<ul id="dli_08_15070__ul213411472594"><li id="dli_08_15070__li1813494716599">If <strong id="dli_08_15070__b2360182323114">offset</strong> is <strong id="dli_08_15070__b158399286312">-16</strong> MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00).</li><li id="dli_08_15070__li16134164765915">If <strong id="dli_08_15070__b2055924853116">offset</strong> is <strong id="dli_08_15070__b45602481318">-6</strong> MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00).</li><li id="dli_08_15070__li6134194725919">If <strong id="dli_08_15070__b847151519324">offset</strong> is <strong id="dli_08_15070__b19934131819320">-4</strong> MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00).</li><li id="dli_08_15070__li1113414472596">If <strong id="dli_08_15070__b151215337323">offset</strong> is <strong id="dli_08_15070__b14222135123218">0</strong>, the record is assigned to window [2021-06-30 00:00:00, 2021-06-30 00:10:00).</li><li id="dli_08_15070__li1813418477594">If <strong id="dli_08_15070__b1575833719320">offset</strong> is <strong id="dli_08_15070__b1375915370325">4</strong> MINUTE, the record is assigned to window [2021-06-29 23:54:00, 2021-06-30 00:04:00).</li><li id="dli_08_15070__li513418471590">If <strong id="dli_08_15070__b335134615324">offset</strong> is <strong id="dli_08_15070__b835174653214">6</strong> MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00).</li><li id="dli_08_15070__li11134154755918">If <strong id="dli_08_15070__b1692725119324">offset</strong> is <strong id="dli_08_15070__b10928155153219">16</strong> MINUTE, the record is assigned to window [2021-06-29 23:56:00, 2021-06-30 00:06:00).</li></ul>
<p>Different offset values can have the same effect on window assignment, because offsets that differ by a multiple of the window size produce the same window boundaries. In the above case, <strong id="dli_08_15070__b852013363331">-16</strong> MINUTE, <strong id="dli_08_15070__b19322388332">-6</strong> MINUTE, and <strong id="dli_08_15070__b4495194118336">4</strong> MINUTE have the same effect for a Tumble window with a size of 10 MINUTE.</p>
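<p>The assignments listed above can be checked with a short Python sketch. It is an illustration only, not Flink's implementation: the function name <code>tumble_window</code> is hypothetical, and windows are assumed to be aligned to the Unix epoch shifted by the offset.</p>

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for windows


def tumble_window(ts, size, offset=timedelta(0)):
    """Return the tumbling window [start, end) that contains ts.

    Hypothetical helper: window starts are assumed to fall on
    EPOCH + offset + k * size for integer k. Python's % on timedelta
    always yields a non-negative remainder for a positive size, so
    negative offsets need no special handling.
    """
    remainder = (ts - EPOCH - offset) % size
    start = ts - remainder
    return start, start + size


if __name__ == "__main__":
    record = datetime(2021, 6, 30, 0, 0, 4)
    size = timedelta(minutes=10)
    for minutes in (-16, -6, -4, 0, 4, 6, 16):
        start, end = tumble_window(record, size, timedelta(minutes=minutes))
        print(f"offset {minutes:>3} MINUTE -> [{start}, {end})")
```

<p>Offsets that differ by a multiple of the window size leave the same remainder, so they select the same window.</p>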
<div class="note" id="dli_08_15070__note14867193751518"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dli_08_15070__p148678378156">The window offset only changes window assignment; it has no effect on the watermark.</p>
</div></div>
<pre class="screen" id="dli_08_15070__screen010031710209">-- NOTE: Currently Flink does not support evaluating a window table-valued function on its own;
-- a window table-valued function should be used with an aggregate operation.
-- This example only illustrates the syntax and the data produced by the table-valued function.
Flink SQL> SELECT * FROM TABLE(
   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
Flink SQL> SELECT * FROM TABLE(
   TUMBLE(
     DATA => TABLE Bid,
     TIMECOL => DESCRIPTOR(bidtime),
     SIZE => INTERVAL '10' MINUTES,
     OFFSET => INTERVAL '1' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |             window_time |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:01 | 2020-04-15 08:11 | 2020-04-15 08:10:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:01 | 2020-04-15 08:11 | 2020-04-15 08:10:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:01 | 2020-04-15 08:11 | 2020-04-15 08:10:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:11 | 2020-04-15 08:21 | 2020-04-15 08:20:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:11 | 2020-04-15 08:21 | 2020-04-15 08:20:59.999 |
| 2020-04-15 08:17 |  6.00 | F    | 2020-04-15 08:11 | 2020-04-15 08:21 | 2020-04-15 08:20:59.999 |
+------------------+-------+------+------------------+------------------+-------------------------+

-- apply aggregation on the tumbling windowed table
Flink SQL> SELECT window_start, window_end, SUM(price)
  FROM TABLE(
    TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES))
  GROUP BY window_start, window_end;
+------------------+------------------+-------+
|     window_start |       window_end | price |
+------------------+------------------+-------+
| 2020-04-15 08:01 | 2020-04-15 08:11 | 11.00 |
| 2020-04-15 08:11 | 2020-04-15 08:21 | 10.00 |
+------------------+------------------+-------+</pre>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_15069.html">Window</a></div>
</div>
</div>