doc-exports/docs/dli/sqlreference/dli_08_15075.html
Su, Xiaomeng be9eabe464 dli_sqlreference_20250305
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2025-03-25 09:06:21 +00:00


<a name="dli_08_15075"></a>
<h1 class="topictitle1">Group Aggregation</h1>
<div id="body0000001871124341"><p id="dli_08_15075__p8060118">An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the <strong id="dli_08_15075__b1222102318219">COUNT</strong>, <strong id="dli_08_15075__b85977246217">SUM</strong>, <strong id="dli_08_15075__b195239281215">AVG</strong> (average), <strong id="dli_08_15075__b247012328214">MAX</strong> (maximum) and <strong id="dli_08_15075__b17146163572115">MIN</strong> (minimum) over a set of rows.</p>
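<p>As a minimal sketch, the following query groups a hypothetical <strong>Orders</strong> table by its <strong>users</strong> column and computes two aggregates for each group. The column names <strong>users</strong> and <strong>amount</strong> are assumptions borrowed from the other examples in this topic.</p>
<pre class="screen">-- Assumed schema: Orders(order_id, users, amount)
-- One output row per distinct value of users.
SELECT users,
       COUNT(*)    AS order_cnt,     -- number of rows in the group
       SUM(amount) AS total_amount   -- sum of amount over the group
FROM Orders
GROUP BY users</pre>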
<p id="dli_08_15075__p18151815163518">For streaming queries, the state required to compute the query result might grow without bound. State size depends on the number of groups and on the number and type of aggregate functions. For example, MIN and MAX are heavy on state size, while COUNT is cheap. You can provide a query configuration with an appropriate state time-to-live (TTL) to prevent excessive state size. Note that expiring state might affect the correctness of the query result.</p>
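<p>As an illustration, open-source Flink 1.15 exposes the state TTL through the <strong>table.exec.state.ttl</strong> option. The statement below is only a sketch: it assumes an environment that accepts <strong>SET</strong> statements, the value of one hour is an arbitrary example, and the way this option is passed to a DLI job may differ.</p>
<pre class="screen">-- Assumption: the client accepts SET statements, as the open-source Flink SQL client does.
-- State of a group that has not been updated for 1 hour becomes eligible for cleanup.
SET 'table.exec.state.ttl' = '1 h';</pre>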
<p id="dli_08_15075__p1699101292413">For more information, see <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/zh/docs/dev/table/sql/queries/group-agg/" target="_blank" rel="noopener noreferrer">Group Aggregation</a>.</p>
<div class="section" id="dli_08_15075__section1517113447239"><h4 class="sectiontitle">DISTINCT Aggregation</h4><p id="dli_08_15075__p599125852318">Distinct aggregates remove duplicate values before applying an aggregation function. The following example counts the number of distinct order_ids instead of the total number of rows in the <strong id="dli_08_15075__b1479184614245">Orders</strong> table.</p>
<pre class="screen" id="dli_08_15075__screen36812457249">SELECT COUNT(DISTINCT order_id) FROM Orders</pre>
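<p>To make the difference visible, a sketch such as the following returns both counts side by side. It assumes the same <strong>Orders</strong> table; the aliases <strong>total_rows</strong> and <strong>distinct_orders</strong> are illustrative names only.</p>
<pre class="screen">-- total_rows counts every row; distinct_orders counts each order_id value once.
SELECT COUNT(*)                 AS total_rows,
       COUNT(DISTINCT order_id) AS distinct_orders
FROM Orders</pre>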
</div>
<div class="section" id="dli_08_15075__section20922164692419"><h4 class="sectiontitle">GROUPING SETS</h4><p id="dli_08_15075__p1793511555248">Grouping sets allow for more complex grouping operations than those describable by a standard <strong id="dli_08_15075__b1650135092415">GROUP BY</strong>. Rows are grouped separately by each specified grouping set and aggregates are computed for each group just as for simple <strong id="dli_08_15075__b376925410240">GROUP BY</strong> clauses.</p>
<p id="dli_08_15075__p1895216239257">Each sublist of <strong id="dli_08_15075__b9632201212515">GROUPING SETS</strong> may specify zero or more columns or expressions and is interpreted the same way as though used directly in the <strong id="dli_08_15075__b663381218250">GROUP BY</strong> clause. An empty grouping set means that all rows are aggregated down to a single group, which is output even if no input rows were present.</p>
<p id="dli_08_15075__p169522234255">References to the grouping columns or expressions are replaced by null values in result rows for grouping sets in which those columns do not appear.</p>
<pre class="screen" id="dli_08_15075__screen1730011862510">SELECT supplier_id, rating, COUNT(*) AS total
FROM (VALUES
('supplier1', 'product1', 4),
('supplier1', 'product2', 3),
('supplier2', 'product3', 3),
('supplier2', 'product4', 4))
AS Products(supplier_id, product_id, rating)
GROUP BY GROUPING SETS ((supplier_id, rating), (supplier_id), ())</pre>
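<p>One way to read the query above is as the union of one plain <strong>GROUP BY</strong> query per grouping set. The sketch below spells this out for the same four rows; it assumes they are available as a table or view named <strong>Products</strong> (a hypothetical name) and is meant to explain the result, not to replace <strong>GROUPING SETS</strong>. Seven rows are returned in total, in no guaranteed order.</p>
<pre class="screen">-- Grouping set (supplier_id, rating): four rows, each with total = 1.
SELECT supplier_id, rating, COUNT(*) AS total
FROM Products
GROUP BY supplier_id, rating
UNION ALL
-- Grouping set (supplier_id): rating is NULL; supplier1 and supplier2 each have total = 2.
SELECT supplier_id, CAST(NULL AS INT) AS rating, COUNT(*) AS total
FROM Products
GROUP BY supplier_id
UNION ALL
-- Grouping set (): both grouping columns are NULL; a single row with total = 4.
SELECT CAST(NULL AS STRING) AS supplier_id, CAST(NULL AS INT) AS rating, COUNT(*) AS total
FROM Products</pre>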
</div>
<div class="section" id="dli_08_15075__section132599153253"><h4 class="sectiontitle">ROLLUP</h4><p id="dli_08_15075__p117756432255"><strong id="dli_08_15075__b1117014269254">ROLLUP</strong> is a shorthand notation for specifying a common type of grouping set. It represents the given list of expressions and all prefixes of the list, including the empty list.</p>
<pre class="screen" id="dli_08_15075__screen327519113266">SELECT supplier_id, rating, COUNT(*)
FROM (VALUES
('supplier1', 'product1', 4),
('supplier1', 'product2', 3),
('supplier2', 'product3', 3),
('supplier2', 'product4', 4))
AS Products(supplier_id, product_id, rating)
GROUP BY ROLLUP (supplier_id, rating)</pre>
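<p>Following the definition above, the list <strong>(supplier_id, rating)</strong> has the prefixes <strong>(supplier_id, rating)</strong>, <strong>(supplier_id)</strong>, and the empty list, so the <strong>ROLLUP</strong> query can be written with explicit grouping sets as sketched here.</p>
<pre class="screen">-- Same grouping as ROLLUP (supplier_id, rating), written out as grouping sets.
SELECT supplier_id, rating, COUNT(*)
FROM (VALUES
('supplier1', 'product1', 4),
('supplier1', 'product2', 3),
('supplier2', 'product3', 3),
('supplier2', 'product4', 4))
AS Products(supplier_id, product_id, rating)
GROUP BY GROUPING SETS (
( supplier_id, rating ),
( supplier_id ),
( )
)</pre>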
</div>
<div class="section" id="dli_08_15075__section37011318265"><h4 class="sectiontitle">CUBE</h4><p id="dli_08_15075__p15476149122613"><strong id="dli_08_15075__b18811123518255">CUBE</strong> is a shorthand notation for specifying a common type of grouping set. It represents the given list and all of its possible subsets, that is, the power set of the list.</p>
<p id="dli_08_15075__p157666153264">For example, the following two queries are equivalent.</p>
<pre class="screen" id="dli_08_15075__screen1158331912616">SELECT supplier_id, rating, product_id, COUNT(*)
FROM (VALUES
('supplier1', 'product1', 4),
('supplier1', 'product2', 3),
('supplier2', 'product3', 3),
('supplier2', 'product4', 4))
AS Products(supplier_id, product_id, rating)
GROUP BY CUBE (supplier_id, rating, product_id)

SELECT supplier_id, rating, product_id, COUNT(*)
FROM (VALUES
('supplier1', 'product1', 4),
('supplier1', 'product2', 3),
('supplier2', 'product3', 3),
('supplier2', 'product4', 4))
AS Products(supplier_id, product_id, rating)
GROUP BY GROUPING SETS (
( supplier_id, product_id, rating ),
( supplier_id, product_id ),
( supplier_id, rating ),
( supplier_id ),
( product_id, rating ),
( product_id ),
( rating ),
( )
)</pre>
</div>
<div class="section" id="dli_08_15075__section19877172912620"><h4 class="sectiontitle">HAVING</h4><p id="dli_08_15075__p9447123418261"><strong id="dli_08_15075__b0127185817252">HAVING</strong> eliminates group rows that do not satisfy the condition. <strong id="dli_08_15075__b1055007152612">HAVING</strong> is different from <strong id="dli_08_15075__b18951610172612">WHERE</strong>: <strong id="dli_08_15075__b19346161310268">WHERE</strong> filters individual rows before the <strong id="dli_08_15075__b1566741617267">GROUP BY</strong>, while <strong id="dli_08_15075__b1037041912614">HAVING</strong> filters group rows created by <strong id="dli_08_15075__b2602822102612">GROUP BY</strong>. Each column referenced in the condition must unambiguously reference a grouping column unless it appears within an aggregate function.</p>
<p id="dli_08_15075__p125553212714">The presence of <strong id="dli_08_15075__b8680182714273">HAVING</strong> turns a query into a grouped query even if there is no <strong id="dli_08_15075__b5992113111276">GROUP BY</strong> clause. It is the same as what happens when the query contains aggregate functions but no <strong id="dli_08_15075__b127824542710">GROUP BY</strong> clause. The query considers all selected rows to form a single group, and the <strong id="dli_08_15075__b1155415042811">SELECT</strong> list and <strong id="dli_08_15075__b36358392814">HAVING</strong> clause can only reference table columns from within aggregate functions. Such a query will emit a single row if the <strong id="dli_08_15075__b1321852816">HAVING</strong> condition is true, zero rows if it is not true.</p>
<pre class="screen" id="dli_08_15075__screen3219164710262">SELECT SUM(amount)
FROM Orders
GROUP BY users
HAVING SUM(amount) &gt; 50</pre>
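<p>The two filters can also be combined. The following sketch assumes the same <strong>Orders</strong> table with <strong>users</strong> and <strong>amount</strong> columns; the thresholds 10 and 50 are arbitrary illustration values.</p>
<pre class="screen">-- WHERE removes individual rows before grouping;
-- HAVING then removes whole groups after aggregation.
SELECT users, SUM(amount) AS total_amount
FROM Orders
WHERE amount &gt; 10
GROUP BY users
HAVING SUM(amount) &gt; 50</pre>
<p>Without a <strong>GROUP BY</strong> clause, all selected rows form a single group, as described above. A sketch of that case, under the same assumptions:</p>
<pre class="screen">-- All rows of Orders form one group.
-- Emits one row if the overall sum exceeds 50, otherwise no rows.
SELECT SUM(amount) AS total_amount
FROM Orders
HAVING SUM(amount) &gt; 50</pre>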
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_15066.html">DML Syntax</a></div>
</div>
</div>