Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality allows data to be aggregated by several
different GROUP BY clauses at once. Each grouping set returns rows in
which the columns grouped by in other sets, but not in that one, are
set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
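As a minimal illustration (using a hypothetical table t(a,b,v), not one
of the objects defined in this file), the following two queries are
equivalent, but the GROUPING SETS form needs only one scan:

```sql
-- One scan, two grouping sets:
select a, b, sum(v) from t group by grouping sets ((a), (b));

-- The previous equivalent: two scans, conjoined by UNION ALL, with the
-- columns grouped by only the other set returned as NULL:
select a, null::integer as b, sum(v) from t group by a
union all
select null::integer as a, b, sum(v) from t group by b;
```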
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced
from the output of the previous sort step, thus avoiding repeated scans
of the source data.
The code is structured so that support for pure hash aggregation, or a
mix of hashing and sorting, can be added later. Sorting was chosen to be
supported first, as it is the most general implementation method.
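To illustrate the sort-sharing behavior described above (again with a
hypothetical table t(a,b,v)): ROLLUP expands to grouping sets that are
all prefixes of one sort order, so a single sorted pass suffices,
whereas disjoint sets force an extra internal sort:

```sql
-- rollup (a,b) = grouping sets ((a,b), (a), ()): every set is a prefix
-- of the sort order (a,b), so one pass over sorted input computes all three.
select a, b, sum(v) from t group by rollup (a, b);

-- (a) and (b) share no sort order; the second set is computed by an
-- additional sort+aggregate pass fed from the first sort's output.
select a, b, sum(v) from t group by grouping sets ((a), (b));
```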
Instead of, as in earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg nodes to describe each
phase, but they're not part of the plan tree; instead they are
additional data for the aggregation node. They're a convenient and
preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and makes it easy to
skip the sorting step if the underlying data is already sorted by other
means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension, it has not been renamed, nor has cube been made a
reserved keyword. Instead, precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
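Concretely (hypothetical examples; the second assumes a user-defined
function actually named cube):

```sql
-- Refers to the CUBE grouping-sets feature, even if the cube extension
-- is installed:
select a, b, sum(v) from t group by cube (a, b);

-- To group by a call to a function named cube(), quote the name:
select "cube"(a, b) from t group by "cube"(a, b);
```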
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-15 21:40:59 -04:00
--
-- grouping sets
--

-- test data sources

create temp view gstest1(a,b,v)
  as values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),
            (2,3,15),
            (3,3,16),(3,4,17),
            (4,1,18),(4,1,19);

create temp table gstest2 (a integer, b integer, c integer, d integer,
                           e integer, f integer, g integer, h integer);
copy gstest2 from stdin;
1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	2
1	1	1	1	1	1	2	2
1	1	1	1	1	2	2	2
1	1	1	1	2	2	2	2
1	1	1	2	2	2	2	2
1	1	2	2	2	2	2	2
1	2	2	2	2	2	2	2
2	2	2	2	2	2	2	2
\.

create temp table gstest3 (a integer, b integer, c integer, d integer);
copy gstest3 from stdin;
1	1	1	1
2	2	2	2
\.
alter table gstest3 add primary key (a);

create temp table gstest4(id integer, v integer,
                          unhashable_col bit(4), unsortable_col xid);
insert into gstest4
values (1,1,b'0000','1'), (2,2,b'0001','1'),
       (3,4,b'0010','2'), (4,8,b'0011','2'),
       (5,16,b'0000','2'), (6,32,b'0001','2'),
       (7,64,b'0010','1'), (8,128,b'0011','1');
create temp table gstest_empty (a integer, b integer, v integer);

create function gstest_data(v integer, out a integer, out b integer)
  returns setof record
  as $f$
    begin
      return query select v, i from generate_series(1,3) i;
    end;
  $f$ language plpgsql;

-- basic functionality

set enable_hashagg = false; -- test hashing explicitly later
-- simple rollup with multiple plain aggregates, with and without ordering
-- (and with ordering differing from grouping)
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by rollup (a,b);
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by rollup (a,b) order by a,b;
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by rollup (a,b) order by b desc, a;
select a, b, grouping(a,b), sum(v), count(*), max(v)
  from gstest1 group by rollup (a,b) order by coalesce(a,0)+coalesce(b,0);

-- various types of ordered aggs
select a, b, grouping(a,b),
       array_agg(v order by v),
       string_agg(v::text, ':' order by v desc),
       percentile_disc(0.5) within group (order by v),
       rank(1,2,12) within group (order by a,b,v)
  from gstest1 group by rollup (a,b) order by a,b;

-- test usage of grouped columns in direct args of aggs
select grouping(a), a, array_agg(b),
       rank(a) within group (order by b nulls first),
       rank(a) within group (order by b nulls last)
  from (values (1,1),(1,4),(1,5),(3,1),(3,2)) v(a,b)
 group by rollup (a) order by a;

-- nesting with window functions
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
  from gstest2 group by rollup (a,b) order by rsum, a, b;

-- nesting with grouping sets
select sum(c) from gstest2
  group by grouping sets((), grouping sets((), grouping sets(())))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets((), grouping sets((), grouping sets(((a, b)))))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets(rollup(c), grouping sets(cube(c))))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(a, grouping sets(a, cube(b)))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets((a, (b))))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets((a, b)))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets(a, grouping sets(a), a))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets(grouping sets(a, grouping sets(a, grouping sets(a), ((a)), a, grouping sets(a), (a)), a))
  order by 1 desc;
select sum(c) from gstest2
  group by grouping sets((a,(a,b)), grouping sets((a,(a,b)),a))
  order by 1 desc;
-- empty input: first is 0 rows, second 1, third 3 etc.
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());

-- empty input with joins tests some important code paths
select t1.a, t2.b, sum(t1.v), count(*) from gstest_empty t1, gstest_empty t2
 group by grouping sets ((t1.a,t2.b),());

-- simple joins, var resolution, GROUPING on join vars
select t1.a, t2.b, grouping(t1.a, t2.b), sum(t1.v), max(t2.a)
  from gstest1 t1, gstest2 t2
 group by grouping sets ((t1.a, t2.b), ());

select t1.a, t2.b, grouping(t1.a, t2.b), sum(t1.v), max(t2.a)
  from gstest1 t1 join gstest2 t2 on (t1.a=t2.a)
 group by grouping sets ((t1.a, t2.b), ());

select a, b, grouping(a, b), sum(t1.v), max(t2.c)
  from gstest1 t1 join gstest2 t2 using (a,b)
 group by grouping sets ((a, b), ());

-- check that functionally dependent cols are not nulled
select a, d, grouping(a,b,c)
  from gstest3
 group by grouping sets ((a,b), (a,c));
Make setrefs.c match by ressortgroupref even for plain Vars.
Previously, we skipped using search_indexed_tlist_for_sortgroupref()
if the tlist expression being sought in the child plan node was merely
a Var. This is purely an optimization, based on the theory that
search_indexed_tlist_for_var() is faster, and one copy of a Var should
be as good as another. However, the GROUPING SETS patch broke the
latter assumption: grouping columns containing the "same" Var can
sometimes have different outputs, as shown in the test case added here.
So do it the hard way whenever a ressortgroupref marking exists.
(If this seems like a bottleneck, we could imagine building a tlist index
data structure for ressortgroupref values, as we do for Vars. But I'll
let that idea go until there's some evidence it's worthwhile.)
Back-patch to 9.6. The problem also exists in 9.5 where GROUPING SETS
came in, but this patch is insufficient to resolve the problem in 9.5:
there is some obscure dependency on the upper-planner-pathification
work that happened in 9.6. Given that this is such a weird corner case,
and no end users have complained about it, it doesn't seem worth the work
to develop a fix for 9.5.
Patch by me, per a report from Heikki Linnakangas. (This does not fix
Heikki's original complaint, just the follow-on one.)
Discussion: https://postgr.es/m/aefc657e-edb2-64d5-6df1-a0828f6e9104@iki.fi
2017-10-26 12:17:40 -04:00
-- check that distinct grouping columns are kept separate
-- even if they are equal()
explain (costs off)
select g as alias1, g as alias2
  from generate_series(1,3) g
 group by alias1, rollup(alias2);

select g as alias1, g as alias2
  from generate_series(1,3) g
 group by alias1, rollup(alias2);
-- check that pulled-up subquery outputs still go to null when appropriate
select four, x
  from (select four, ten, 'foo'::text as x from tenk1) as t
 group by grouping sets (four, x)
 having x = 'foo';

select four, x || 'x'
  from (select four, ten, 'foo'::text as x from tenk1) as t
 group by grouping sets (four, x)
 order by four;

select (x+y)*1, sum(z)
  from (select 1 as x, 2 as y, 3 as z) s
 group by grouping sets (x+y, x);

select x, not x as not_x, q2 from
  (select *, q1 = 1 as x from int8_tbl i1) as t
 group by grouping sets(x, q2)
 order by x, q2;
-- check qual push-down rules for a subquery with grouping sets
explain (verbose, costs off)
select * from (
  select 1 as x, q1, sum(q2)
  from int8_tbl i1
  group by grouping sets(1, 2)
) ss
where x = 1 and q1 = 123;

select * from (
  select 1 as x, q1, sum(q2)
  from int8_tbl i1
  group by grouping sets(1, 2)
) ss
where x = 1 and q1 = 123;
-- simple rescan tests

select a, b, sum(v.x)
  from (values (1),(2)) v(x), gstest_data(v.x)
 group by rollup (a,b);

select *
  from (values (1),(2)) v(x),
       lateral (select a, b, sum(v.x) from gstest_data(v.x) group by rollup (a,b)) s;
-- min max optimization should still work with GROUP BY ()
|
Support GROUPING SETS, CUBE and ROLLUP.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-15 21:40:59 -04:00
|
|
|
explain (costs off)
|
|
|
|
|
select min(unique1) from tenk1 GROUP BY ();
|
|
|
|
|
|
|
|
|
|
-- Views with GROUPING SET queries
|
|
|
|
|
CREATE VIEW gstest_view AS select a, b, grouping(a,b), sum(c), count(*), max(c)
|
|
|
|
|
from gstest2 group by rollup ((a,b,c),(c,d));
|
|
|
|
|
|
|
|
|
|
select pg_get_viewdef('gstest_view'::regclass, true);
|
|
|
|
|
|
|
|
|
|
-- Nested queries with 3 or more levels of nesting
|
|
|
|
|
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
|
|
|
|
|
select(select (select grouping(e,f) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
|
|
|
|
|
select(select (select grouping(c) from (values (1)) v2(c) GROUP BY c) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP(e,f);
|
|
|
|
|
|
|
|
|
|
-- Combinations of operations
|
|
|
|
|
select a, b, c, d from gstest2 group by rollup(a,b),grouping sets(c,d);
|
|
|
|
|
select a, b from (values (1,2),(2,3)) v(a,b) group by a,b, grouping sets(a);
|
|
|
|
|
|
|
|
|
|
-- Tests for chained aggregates
|
|
|
|
|
select a, b, grouping(a,b), sum(v), count(*), max(v)
|
2017-03-26 23:20:54 -04:00
|
|
|
from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
|
Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality allows data to be aggregated by several
different GROUP BY clauses at once. Each grouping set returns rows in
which the columns grouped by the other sets are set to NULL.
Previously this could be achieved by running each grouping as a separate
query and combining the results with UNION ALL. Besides being
considerably more concise, grouping sets will in many cases be faster,
requiring only one scan over the underlying data.
The current implementation of grouping sets supports only sorted input.
Individual sets that share a sort order are computed in one pass. If
there are sets that don't share a sort order, additional sort and
aggregation steps are performed. These additional passes are sourced
from the previous sort step, thus avoiding repeated scans of the source
data.
The code is structured so that support for aggregation purely by
hashing, or by a mix of hashing and sorting, can be added later. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of representing the chain of sort and aggregation steps as
full-blown planner and executor nodes, as earlier versions of the patch
did, all but the first sort are performed inside the aggregation node
itself. This avoids unusual gymnastics to handle returning aggregated
and non-aggregated tuples from underlying nodes, as well as having to
shut down underlying nodes early to limit memory usage. The optimizer
still builds Sort/Agg nodes to describe each phase, but they're not part
of the plan tree; instead they're additional data for the aggregation
node. They're a convenient, preexisting way to describe aggregation and
sorting. The first (and possibly only) sort step is still performed as a
separate execution step. That retains similarity with existing GROUP BY
plans, makes rescans fairly simple, avoids very deep plans (which lead
to slow EXPLAIN output), and makes it easy to skip the sort when the
underlying data is already sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension's
functions named cube (and rollup). To avoid breaking existing
deployments of the cube extension, it has not been renamed, nor has cube
been made a reserved keyword. Instead, precedence hacking is used to
make GROUP BY cube(..) refer to the CUBE grouping-sets feature rather
than the function cube(). To actually group by a function cube(),
unlikely as that might be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-15 21:40:59 -04:00
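-- Illustrative sketch only (not part of the original test suite): as the
-- commit message above notes, a grouping-sets query is equivalent to a
-- UNION ALL of per-set GROUP BY queries (ignoring GROUPING()'s ability
-- to distinguish genuine NULLs from set-induced ones), e.g. over the
-- gstest2 table used throughout this file:
--
--   select a, b, sum(c) from gstest2 group by grouping sets ((a),(b));
--   -- is equivalent to:
--   select a, null::int as b, sum(c) from gstest2 group by a
--   union all
--   select null::int as a, b, sum(c) from gstest2 group by b;
--
-- And to call the cube extension's cube() function rather than invoke
-- the CUBE construct in GROUP BY, the function name must be quoted,
-- e.g. (hypothetical, requires the cube extension):
--
--   select x from some_table group by "cube"(x);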
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY ROLLUP((e+1),(f+1));
select(select (select grouping(a,b) from (values (1)) v2(c)) from (values (1,2)) v1(a,b) group by (a,b)) from (values(6,7)) v3(e,f) GROUP BY CUBE((e+1),(f+1)) ORDER BY (e+1),(f+1);

select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
from gstest2 group by cube (a,b) order by rsum, a, b;

select a, b, sum(c) from (values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),(2,3,15),(3,3,16),(3,4,17),(4,1,18),(4,1,19)) v(a,b,c) group by rollup (a,b);

select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by cube (a,b) order by a,b;

-- Test reordering of grouping sets

explain (costs off)
select * from gstest1 group by grouping sets((a,b,v),(v)) order by v,b,a;
-- Agg level check. This query should error out.

select (select grouping(a,b) from gstest2) from gstest2 group by a,b;

-- Nested queries

select a, b, sum(c), count(*) from gstest2 group by grouping sets (rollup(a,b),a);

-- HAVING queries

select ten, sum(distinct four) from onek a
group by grouping sets((ten,four),(ten))
having exists (select 1 from onek b where sum(distinct a.four) = b.four);

-- Tests around pushdown of HAVING clauses, partially testing against previous bugs

select a,count(*) from gstest2 group by rollup(a) order by a;
select a,count(*) from gstest2 group by rollup(a) having a is distinct from 1 order by a;
explain (costs off)
select a,count(*) from gstest2 group by rollup(a) having a is distinct from 1 order by a;

select v.c, (select count(*) from gstest2 group by () having v.c)
from (values (false),(true)) v(c) order by v.c;
explain (costs off)
select v.c, (select count(*) from gstest2 group by () having v.c)
from (values (false),(true)) v(c) order by v.c;

-- HAVING with GROUPING queries

select ten, grouping(ten) from onek
group by grouping sets(ten) having grouping(ten) >= 0
order by 2,1;
select ten, grouping(ten) from onek
group by grouping sets(ten, four) having grouping(ten) > 0
order by 2,1;
select ten, grouping(ten) from onek
group by rollup(ten) having grouping(ten) > 0
order by 2,1;
select ten, grouping(ten) from onek
group by cube(ten) having grouping(ten) > 0
order by 2,1;
select ten, grouping(ten) from onek
group by (ten) having grouping(ten) >= 0
order by 2,1;
-- FILTER queries

select ten, sum(distinct four) filter (where four::text ~ '123') from onek a
group by rollup(ten);

-- More rescan tests

select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by cube(four,ten)) s on true order by v.a,four,ten;
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by cube(two,four) order by two,four) s1) from (values (1),(2)) v(a);

-- Grouping on text columns

select sum(ten) from onek group by two, rollup(four::text) order by 1;
select sum(ten) from onek group by rollup(four::text), two order by 1;

-- hashing support

set enable_hashagg = true;

-- failure cases

select count(*) from gstest4 group by rollup(unhashable_col,unsortable_col);
select array_agg(v order by v) from gstest4 group by grouping sets ((id,unsortable_col),(id));

-- simple cases

select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;
explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;

select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by cube(a,b) order by 3,1,2;
explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by cube(a,b) order by 3,1,2;

-- shouldn't try and hash

explain (costs off)
select a, b, grouping(a,b), array_agg(v order by v)
from gstest1 group by cube(a,b);

-- unsortable cases

select unsortable_col, count(*)
from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
order by unsortable_col::text;
-- mixed hashable/sortable cases

select unhashable_col, unsortable_col,
grouping(unhashable_col, unsortable_col),
count(*), sum(v)
from gstest4 group by grouping sets ((unhashable_col),(unsortable_col))
order by 3, 5;
explain (costs off)
select unhashable_col, unsortable_col,
grouping(unhashable_col, unsortable_col),
count(*), sum(v)
from gstest4 group by grouping sets ((unhashable_col),(unsortable_col))
order by 3,5;

select unhashable_col, unsortable_col,
grouping(unhashable_col, unsortable_col),
count(*), sum(v)
from gstest4 group by grouping sets ((v,unhashable_col),(v,unsortable_col))
order by 3,5;
explain (costs off)
select unhashable_col, unsortable_col,
grouping(unhashable_col, unsortable_col),
count(*), sum(v)
from gstest4 group by grouping sets ((v,unhashable_col),(v,unsortable_col))
order by 3,5;

-- empty input: first is 0 rows, second 1, third 3 etc.

select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
explain (costs off)
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),a);
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
explain (costs off)
select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),(),(),());
select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());
explain (costs off)
select sum(v), count(*) from gstest_empty group by grouping sets ((),(),());

-- check that functionally dependent cols are not nulled

select a, d, grouping(a,b,c)
from gstest3
group by grouping sets ((a,b), (a,c));
explain (costs off)
select a, d, grouping(a,b,c)
from gstest3
group by grouping sets ((a,b), (a,c));

-- simple rescan tests

select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by grouping sets (a,b)
order by 1, 2, 3;
explain (costs off)
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by grouping sets (a,b)
order by 3, 1, 2;
select *
from (values (1),(2)) v(x),
lateral (select a, b, sum(v.x) from gstest_data(v.x) group by grouping sets (a,b)) s;
explain (costs off)
select *
from (values (1),(2)) v(x),
lateral (select a, b, sum(v.x) from gstest_data(v.x) group by grouping sets (a,b)) s;

-- Tests for chained aggregates

select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
explain (costs off)
select a, b, grouping(a,b), sum(v), count(*), max(v)
from gstest1 group by grouping sets ((a,b),(a+1,b+1),(a+2,b+2)) order by 3,6;
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
from gstest2 group by cube (a,b) order by rsum, a, b;
explain (costs off)
select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
from gstest2 group by cube (a,b) order by rsum, a, b;
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by cube (a,b) order by a,b;
explain (costs off)
select a, b, sum(v.x)
from (values (1),(2)) v(x), gstest_data(v.x)
group by cube (a,b) order by a,b;
Fix slot type handling for Agg nodes performing internal sorts.
Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1
rely on - the slot type for an expression's
ecxt_{outer,inner,scan}tuple not changing, unless explicitly flagged
as such. That allows us either to skip deforming (for a virtual tuple
slot) or to optimize the code for JIT-accelerated deforming
appropriately (for other known slot types).
This assumption was sometimes violated for grouping sets, when
nodeAgg.c internally uses tuplesorts and the child node doesn't
return a TTSOpsMinimalTuple type slot. Detect that case, and flag that
the outer slot might not be "fixed".
It's probably worthwhile to optimize this further in the future, and
determine more granularly whether the slot is fixed. As we already
instantiate per-phase transition and equal expressions, we could
cheaply set the slot type appropriately for each phase. But that's a
separate change from this bugfix.
This commit does include a very minor optimization: it avoids creating
a slot for handling tuplesorts if no such sorts are performed.
Previously we created that slot unnecessarily in the common case of
computing all grouping sets via hashing. The code looked too confusing
without that, as the conditions for needing a sort slot and for
flagging that the slot type isn't fixed are the same.
Reported-By: Ashutosh Sharma
Author: Andres Freund
Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com
Backpatch: 12-, where the bug was introduced in 15d8f8312
2019-07-25 17:22:52 -04:00
-- Verify that we correctly handle the child node returning a
-- non-minimal slot, which happens if the input is pre-sorted,
-- e.g. due to an index scan.
BEGIN;
SET LOCAL enable_hashagg = false;
EXPLAIN (COSTS OFF) SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
SET LOCAL enable_seqscan = false;
EXPLAIN (COSTS OFF) SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
SELECT a, b, count(*), max(a), max(b) FROM gstest3 GROUP BY GROUPING SETS(a, b,()) ORDER BY a, b;
COMMIT;

-- More rescan tests

select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by cube(four,ten)) s on true order by v.a,four,ten;
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by cube(two,four) order by two,four) s1) from (values (1),(2)) v(a);

-- Rescan logic changes when there are no empty grouping sets, so test
-- that too:

select * from (values (1),(2)) v(a) left join lateral (select v.a, four, ten, count(*) from onek group by grouping sets(four,ten)) s on true order by v.a,four,ten;
select array(select row(v.a,s1.*) from (select two,four, count(*) from onek group by grouping sets(two,four) order by two,four) s1) from (values (1),(2)) v(a);

-- test the knapsack

set enable_indexscan = false;
set work_mem = '64kB';
explain (costs off)
select unique1,
count(two), count(four), count(ten),
count(hundred), count(thousand), count(twothousand),
count(*)
from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);
explain (costs off)
select unique1,
count(two), count(four), count(ten),
count(hundred), count(thousand), count(twothousand),
count(*)
from tenk1 group by grouping sets (unique1,hundred,ten,four,two);

set work_mem = '384kB';
explain (costs off)
select unique1,
count(two), count(four), count(ten),
count(hundred), count(thousand), count(twothousand),
count(*)
from tenk1 group by grouping sets (unique1,twothousand,thousand,hundred,ten,four,two);

-- check collation-sensitive matching between grouping expressions
-- (similar to a check for aggregates, but there are additional code
-- paths for GROUPING, so check again here)

select v||'a', case grouping(v||'a') when 1 then 1 else 0 end, count(*)
from unnest(array[1,1], array['a','b']) u(i,v)
group by rollup(i, v||'a') order by 1,3;
select v||'a', case when grouping(v||'a') = 1 then 1 else 0 end, count(*)
from unnest(array[1,1], array['a','b']) u(i,v)
group by rollup(i, v||'a') order by 1,3;

-- Bug #16784

create table bug_16784(i int, j int);
analyze bug_16784;
alter table bug_16784 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname='bug_16784';

insert into bug_16784 select g/10, g from generate_series(1,40) g;

set work_mem='64kB';
set enable_sort = false;

select * from
(values (1),(2)) v(a),
lateral (select a, i, j, count(*) from
bug_16784 group by cube(i,j)) s
order by v.a, i, j;
Disk-based Hash Aggregation.
While performing hash aggregation, track memory usage when adding new
groups to a hash table. If the memory usage exceeds work_mem, enter
"spill mode".
In spill mode, new groups are not created in the hash table(s), but
existing groups continue to be advanced if input tuples match. Tuples
that would cause a new group to be created are instead spilled to a
logical tape to be processed later.
The tuples are spilled in a partitioned fashion. When all tuples from
the outer plan are processed (either by advancing the group or
spilling the tuple), finalize and emit the groups from the hash
table. Then, create new batches of work from the spilled partitions,
and select one of the saved batches and process it (possibly spilling
recursively).
Author: Jeff Davis
Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman
Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
2020-03-18 18:42:02 -04:00
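-- Illustrative sketch only (not part of the original test suite): the
-- spill behaviour described in the commit message above can be observed
-- with EXPLAIN ANALYZE once work_mem is set low; the HashAggregate node
-- then reports batch and disk-usage figures. The query is hypothetical
-- (it reuses the gs_data_1 table created later in this file), and the
-- reported numbers are installation-dependent, so no expected output is
-- shown:
--
--   set work_mem = '64kB';
--   explain (analyze, costs off)
--   select g100, count(*) from gs_data_1 group by cube (g1000, g100);
--   -- look for "Batches:" and "Disk Usage:" on the HashAggregate line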
--
-- Compare results between plans using sorting and plans using hash
-- aggregation. Force spilling in both cases by setting work_mem low
-- and altering the statistics.
--

create table gs_data_1 as
select g%1000 as g1000, g%100 as g100, g%10 as g10, g
from generate_series(0,1999) g;

analyze gs_data_1;
alter table gs_data_1 set (autovacuum_enabled = 'false');
update pg_class set reltuples = 10 where relname='gs_data_1';

set work_mem='64kB';
-- Produce results with sorting.

set enable_sort = true;
set enable_hashagg = false;
set jit_above_cost = 0;

explain (costs off)
select g100, g10, sum(g::numeric), count(*), max(g::text)
from gs_data_1 group by cube (g1000, g100,g10);
create table gs_group_1 as
select g100, g10, sum(g::numeric), count(*), max(g::text)
  from gs_data_1 group by cube (g1000, g100, g10);
-- Produce results with hash aggregation.

set enable_hashagg = true;
set enable_sort = false;

explain (costs off)
select g100, g10, sum(g::numeric), count(*), max(g::text)
  from gs_data_1 group by cube (g1000, g100, g10);
create table gs_hash_1 as
select g100, g10, sum(g::numeric), count(*), max(g::text)
  from gs_data_1 group by cube (g1000, g100, g10);
set enable_sort = true;
set work_mem to default;

-- Compare results

(select * from gs_hash_1 except select * from gs_group_1)
  union all
(select * from gs_group_1 except select * from gs_hash_1);

drop table gs_group_1;
drop table gs_hash_1;
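The EXCEPT/UNION ALL pair used for the comparison computes a set-level symmetric difference: an empty result means the sorted and hashed paths produced the same rows. A rough Python equivalent (note that EXCEPT is set-based, so duplicate-row-count differences within one side would not be caught by this check):

```python
def symmetric_diff(a, b):
    """Model of (a EXCEPT b) UNION ALL (b EXCEPT a): EXCEPT collapses
    duplicate rows within each input before differencing."""
    sa, sb = set(a), set(b)
    return sorted(sa - sb) + sorted(sb - sa)

sort_rows = [(1, 'x'), (2, 'y'), (2, 'y')]
hash_rows = [(2, 'y'), (1, 'x')]
diff = symmetric_diff(sort_rows, hash_rows)   # empty: equal as sets
```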
Implement GROUP BY DISTINCT
With grouping sets, it's possible that some of the grouping sets are
duplicates. This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to
GROUP BY GROUPING SETS (
(a, b, c),
(a, b, c),
(a, b, c),
(a, b),
(a, b),
(a, b),
(a),
(a),
(a),
(c, a),
(c, a),
(c, a),
(c),
(b, c),
(b),
()
)
Some of the grouping sets are calculated multiple times, which is mostly
unnecessary. This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.
Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
2021-03-18 12:45:38 -04:00
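The expansion and deduplication described above can be sketched as follows. This is a toy model (not PostgreSQL code): `cube` expands a CUBE clause into its power set of columns, two clauses combine as a cross product, and GROUP BY DISTINCT keeps one copy per distinct set of grouped columns, shrinking the example's 16 sets to 8.

```python
def cube(cols):
    """Expand CUBE(c1, ..., cn) into its 2**n grouping sets (the power
    set of the columns, preserving column order within each set)."""
    sets = [()]
    for c in cols:
        sets += [s + (c,) for s in sets]
    return sets

# GROUP BY CUBE (a,b), CUBE (b,c): the cross product of the two expansions
all_sets = [x + y for x in cube(('a', 'b')) for y in cube(('b', 'c'))]
# GROUP BY DISTINCT keeps one copy per distinct *set* of grouped columns
distinct_sets = {frozenset(s) for s in all_sets}
```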
-- GROUP BY DISTINCT

-- "normal" behavior...
select a, b, c
  from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
 group by all rollup(a, b), rollup(a, c)
 order by a, b, c;

-- ...which is also the default
select a, b, c
  from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
 group by rollup(a, b), rollup(a, c)
 order by a, b, c;

-- "group by distinct" behavior...
select a, b, c
  from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
 group by distinct rollup(a, b), rollup(a, c)
 order by a, b, c;

-- ...which is not the same as "select distinct"
select distinct a, b, c
  from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
 group by rollup(a, b), rollup(a, c)
 order by a, b, c;
Support GROUPING SETS, CUBE and ROLLUP.
This SQL-standard functionality allows data to be aggregated by several
different GROUP BY clauses at once. Each grouping set returns rows in which
the columns grouped by only in other sets are set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg nodes to describe each
phase, but they're not part of the plan tree; instead they're additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, nor has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
2015-05-15 21:40:59 -04:00
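The commit's opening point — each grouping set NULLs out the columns it doesn't group by, and the whole construct is semantically a UNION ALL of one GROUP BY per set — can be modelled in a few lines. A toy sketch with hypothetical helper names, supporting only count(*); the real executor shares sort passes between compatible sets rather than scanning once per set.

```python
from collections import defaultdict

def group_by_sets(rows, columns, grouping_sets):
    """Emulate GROUP BY GROUPING SETS with count(*): one pass per set,
    with columns not in the current set emitted as None (SQL NULL)."""
    out = []
    for gset in grouping_sets:
        counts = defaultdict(int)
        for row in rows:
            key = tuple(row[columns.index(c)] if c in gset else None
                        for c in columns)
            counts[key] += 1
        out.extend(counts.items())         # like the UNION ALL of the passes
    return out

rows = [(1, 'x'), (1, 'y'), (2, 'x')]
# ROLLUP(a) == GROUPING SETS ((a), ()): per-a counts plus a grand total
result = group_by_sets(rows, ('a', 'b'), [('a',), ()])
```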
-- end