postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-10 10:11:28 -04:00

Author	SHA1	Message	Date
Jeff Davis	76df765e88	Reduce test time for disk-based Hash Aggregation. Discussion: https://postgr.es/m/23196.1584943506@sss.pgh.pa.us	2020-03-23 19:03:49 -07:00
Tom Lane	24e2885ee3	Introduce "anycompatible" family of polymorphic types. This patch adds the pseudo-types anycompatible, anycompatiblearray, anycompatiblenonarray, and anycompatiblerange. They work much like anyelement, anyarray, anynonarray, and anyrange respectively, except that the actual input values need not match precisely in type. Instead, if we can find a common supertype (using the same rules as for UNION/CASE type resolution), then the parser automatically promotes the input values to that type. For example, "myfunc(anycompatible, anycompatible)" can match a call with one integer and one bigint argument, with the integer automatically promoted to bigint. With anyelement in the definition, the user would have had to cast the integer explicitly. The new types also provide a second, independent set of type variables for function matching; thus with "myfunc(anyelement, anyelement, anycompatible) returns anycompatible" the first two arguments are constrained to be the same type, but the third can be some other type, and the result has the type of the third argument. The need for more than one set of type variables was foreseen back when we first invented the polymorphic types, but we never did anything about it. Pavel Stehule, revised a bit by me Discussion: https://postgr.es/m/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com	2020-03-19 11:43:11 -04:00
Jeff Davis	1f39bce021	Disk-based Hash Aggregation. While performing hash aggregation, track memory usage when adding new groups to a hash table. If the memory usage exceeds work_mem, enter "spill mode". In spill mode, new groups are not created in the hash table(s), but existing groups continue to be advanced if input tuples match. Tuples that would cause a new group to be created are instead spilled to a logical tape to be processed later. The tuples are spilled in a partitioned fashion. When all tuples from the outer plan are processed (either by advancing the group or spilling the tuple), finalize and emit the groups from the hash table. Then, create new batches of work from the spilled partitions, and select one of the saved batches and process it (possibly spilling recursively). Author: Jeff Davis Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com	2020-03-18 15:42:02 -07:00
Tom Lane	823e739d4a	Test GROUP BY matching of join columns that are type-coerced by USING. If we have, say, an int column that is left-joined to a bigint column with USING, the merged column is the int column promoted to bigint. GROUP BY's tests for whether grouping on the merged column allows a reference to the underlying column, or vice versa, should know about that relationship --- and they do. But I nearly broke this case with an ill-advised optimization, so the lack of any test coverage for it seems like a bad idea.	2020-01-01 19:31:41 -05:00
David Rowley	a5be4062f7	Don't remove surplus columns from GROUP BY for inheritance parents `d4c3a156c` added code to remove columns that were not part of a table's PRIMARY KEY constraint from the GROUP BY clause when all the primary key columns were present in the group by. This is fine to do since we know that there will only be one row per group coming from this relation. However, the logic failed to consider inheritance parent relations. These can have child relations without a primary key, but even if they did, they could duplicate one of the parent's rows or one from another child relation. In this case, those additional GROUP BY columns are required. Fix this by disabling the optimization for inheritance parent tables. In v11 and beyond, partitioned tables are fine since partitions cannot overlap and before v11 partitioned tables could not have a primary key. Reported-by: Manuel Rigger Discussion: http://postgr.es/m/CA+u7OA7VLKf_vEr6kLF3MnWSA9LToJYncgpNX2tQ-oWzYCBQAw@mail.gmail.com Backpatch-through: 9.6	2019-07-03 23:44:54 +12:00
Andrew Gierth	44e95b5728	Fix array size allocation for HashAggregate hash keys. When there were duplicate columns in the hash key list, the array sizes could be miscomputed, resulting in access off the end of the array. Adjust the computation to ensure the array is always large enough. (I considered whether the duplicates could be removed in planning, but I can't rule out the possibility that duplicate columns might have different hash functions assigned. Simpler to just make sure it works at execution time regardless.) Bug apparently introduced in `fc4b3dea2` as part of narrowing down the tuples stored in the hashtable. Reported by Colm McHugh of Salesforce, though I didn't use their patch. Backpatch back to version 10 where the bug was introduced. Discussion: https://postgr.es/m/CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com	2019-05-23 15:26:01 +01:00
Andres Freund	2657283256	Minimally fix partial aggregation for aggregates that don't have one argument. For partial aggregation combine steps, AggStatePerTrans->numTransInputs was set to the transition function's number of inputs, rather than the combine function's number of inputs (always 1). That lead to partial aggregates with strict combine functions to wrongly check for NOT NULL input as required by strictness. When the aggregate wasn't exactly passed one argument, the strictness check was either omitted (in the 0 args case) or too many arguments were checked. In the latter case we'd read beyond the end of FunctionCallInfoData->args (only in master). AggStatePerTrans->numTransInputs actually has been wrong since since 9.6, where partial aggregates were added. But it turns out to not be an active problem in 9.6 and 10, because numTransInputs wasn't used at all for combine functions: Before `c253b722f6` there simply was no NULL check for the input to strict trans functions, and after that the check was simply hardcoded for the right offset in fcinfo, as it's done by code specific to combine functions. In `bf6c614a2f` (11) the strictness check was generalized, with common code doing the strictness checks for both plain and combine transition functions, based on numTransInputs. For combine functions this lead to not emitting an expression step to check for strict input in the 0 arguments case, and in the > 1 arguments case, we'd check too many arguments.Due to the fact that the relevant fcinfo->isnull[2..] was always zero-initialized (more or less by accident, by being part of the AggStatePerTrans struct, which is palloc0'ed), there was no observable damage in the latter case before `a9c35cf85c`, we just checked too many array elements. Due to the changes in `a9c35cf85c`, > 1 argument bug became visible, because these days fcinfo is a) dynamically allocated without being zeroed b) exactly the length required for the number of specified arguments (hardcoded to 2 in this case). This commit only contains a fairly minimal fix, setting numTransInputs to a hardcoded 1 when building a pertrans for a combine function. It seems likely that we'll want to clean this up further (e.g. the arguments build_pertrans_for_aggref() aren't particularly meaningful for combine functions). But the wrap date for 12 beta1 is coming up fast, so it seems good to have a minimal fix in place. Backpatch to 11. While AggStatePerTrans->numTransInputs was set wrongly before that, the value was not used for combine functions. Reported-By: Rajkumar Raghuwanshi Diagnosed-By: Kyotaro Horiguchi, Jeevan Chalke, Andres Freund, David Rowley Author: David Rowley, Kyotaro Horiguchi, Andres Freund Discussion: https://postgr.es/m/CAKcux6=uZEyWyLw0N7HtR9OBc-sWEFeByEZC7t-KDf15FKxVew@mail.gmail.com	2019-05-19 18:01:06 -07:00
Andrew Gierth	02ddd49932	Change floating-point output format for improved performance. Previously, floating-point output was done by rounding to a specific decimal precision; by default, to 6 or 15 decimal digits (losing information) or as requested using extra_float_digits. Drivers that wanted exact float values, and applications like pg_dump that must preserve values exactly, set extra_float_digits=3 (or sometimes 2 for historical reasons, though this isn't enough for float4). Unfortunately, decimal rounded output is slow enough to become a noticable bottleneck when dealing with large result sets or COPY of large tables when many floating-point values are involved. Floating-point output can be done much faster when the output is not rounded to a specific decimal length, but rather is chosen as the shortest decimal representation that is closer to the original float value than to any other value representable in the same precision. The recently published Ryu algorithm by Ulf Adams is both relatively simple and remarkably fast. Accordingly, change float4out/float8out to output shortest decimal representations if extra_float_digits is greater than 0, and make that the new default. Applications that need rounded output can set extra_float_digits back to 0 or below, and take the resulting performance hit. We make one concession to portability for systems with buggy floating-point input: we do not output decimal values that fall exactly halfway between adjacent representable binary values (which would rely on the reader doing round-to-nearest-even correctly). This is known to be a problem at least for VS2013 on Windows. Our version of the Ryu code originates from https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the following (significant) modifications: - Output format is changed to use fixed-point notation for small exponents, as printf would, and also to use lowercase 'e', a minimum of 2 exponent digits, and a mandatory sign on the exponent, to keep the formatting as close as possible to previous output. - The output of exact midpoint values is disabled as noted above. - The integer fast-path code is changed somewhat (since we have fixed-point output and the upstream did not). - Our project style has been largely applied to the code with the exception of C99 declaration-after-statement, which has been retained as an exception to our present policy. - Most of upstream's debugging and conditionals are removed, and we use our own configure tests to determine things like uint128 availability. Changing the float output format obviously affects a number of regression tests. This patch uses an explicit setting of extra_float_digits=0 for test output that is not expected to be exactly reproducible (e.g. due to numerical instability or differing algorithms for transcendental functions). Conversions from floats to numeric are unchanged by this patch. These may appear in index expressions and it is not yet clear whether any change should be made, so that can be left for another day. This patch assumes that the only supported floating point format is now IEEE format, and the documentation is updated to reflect that. Code by me, adapting the work of Ulf Adams and other contributors. References: https://dl.acm.org/citation.cfm?id=3192369 Reviewed-by: Tom Lane, Andres Freund, Donald Dong Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk	2019-02-13 15:20:33 +00:00
Andrew Gierth	d16d453870	Postpone aggregate checks until after collation is assigned. Previously, parseCheckAggregates was run before assign_query_collations, but this causes problems if any expression has already had a collation assigned by some transform function (e.g. transformCaseExpr) before parseCheckAggregates runs. The differing collations would cause expressions not to be recognized as equal to the ones in the GROUP BY clause, leading to spurious errors about unaggregated column references. The result was that CASE expr WHEN val ... would fail when "expr" contained a GROUPING() expression or matched one of the group by expressions, and where collatable types were involved; whereas the supposedly identical CASE WHEN expr = val ... would succeed. Backpatch all the way; this appears to have been wrong ever since collations were introduced. Per report from Guillaume Lelarge, analysis and patch by me. Discussion: https://postgr.es/m/CAECtzeVSO_US8C2Khgfv54ZMUOBR4sWq+6_bLrETnWExHT=rFg@mail.gmail.com Discussion: https://postgr.es/m/87muo0k0c7.fsf@news-spur.riddles.org.uk	2019-01-17 06:46:10 +00:00
Andres Freund	4c640f4f38	Fix STRICT check for strict aggregates with NULL ORDER BY columns. I (Andres) broke this unintentionally in `69c3936a14`, by checking strictness for all input expressions computed for an aggregate, rather than just the input for the aggregate transition function. Reported-By: Ondřej Bouda Bisected-By: Tom Lane Diagnosed-By: Andrew Gierth Discussion: https://postgr.es/m/2a505161-2727-2473-7c46-591ed108ac52@email.cz Backpatch: 11-, like `69c3936a14`	2018-11-03 14:48:42 -07:00
Dean Rasheed	e954a727f0	Improve the accuracy of floating point statistical aggregates. When computing statistical aggregates like variance, the common schoolbook algorithm which computes the sum of the squares of the values and subtracts the square of the mean can lead to a large loss of precision when using floating point arithmetic, because the difference between the two terms is often very small relative to the terms themselves. To avoid this, re-work these aggregates to use the Youngs-Cramer algorithm, which is a proven, numerically stable algorithm that directly aggregates the sum of the squares of the differences of the values from the mean in a single pass over the data. While at it, improve the test coverage to test the aggregate combine functions used during parallel aggregation. Per report and suggested algorithm from Erich Schubert. Patch by me, reviewed by Madeleine Thompson. Discussion: https://postgr.es/m/153313051300.1397.9594490737341194671@wrigleys.postgresql.org	2018-10-06 11:20:09 +01:00
Andres Freund	249126e761	Use context with correct lifetime in hypothetical_dense_rank_final. The query lifetime expression context created in hypothetical_dense_rank_final() was buggily allocated in the calling memory context. I (Andres) broke that in `bf6c614a2f`. Reported-By: Rajkumar Raghuwanshi Author: Amit Langote Discussion: https://postgr.es/m/CAKcux6kmzWmur5HhA_aU6gYVFu0RLQdgJJ+aC9SLdcOvBSrpfA@mail.gmail.com Backpatch: 11-	2018-07-04 17:36:01 -07:00
Tom Lane	ec4719cd15	Fix partial aggregation for variance(int4) and related aggregates. A typo in numeric_poly_combine caused bogus results for queries using it, but of course would only manifest if parallel aggregation is performed. Reported by Rajkumar Raghuwanshi. David Rowley did the diagnosis and the fix; I editorialized rather heavily on his regression test additions. Back-patch to v10 where the breakage was introduced (by `9cca11c91`). Discussion: https://postgr.es/m/CAKcux6nU4E2x8nkSBpLOT2DPvQ5LviJ3SGyAN6Sz7qDH4G4+Pw@mail.gmail.com	2018-06-21 16:18:39 -04:00
Tom Lane	fb8697b31a	Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers. We have a lot of code in which option names, which from the user's viewpoint are logically keywords, are passed through the grammar as plain identifiers, and then matched to string literals during command execution. This approach avoids making words into lexer keywords unnecessarily. Some places matched these strings using plain strcmp, some using pg_strcasecmp. But the latter should be unnecessary since identifiers would have been downcased on their way through the parser. Aside from any efficiency concerns (probably not a big factor), the lack of consistency in this area creates a hazard of subtle bugs due to different places coming to different conclusions about whether two option names are the same or different. Hence, standardize on using strcmp() to match any option names that are expected to have been fed through the parser. This does create a user-visible behavioral change, which is that while formerly all of these would work: alter table foo set (fillfactor = 50); alter table foo set (FillFactor = 50); alter table foo set ("fillfactor" = 50); alter table foo set ("FillFactor" = 50); now the last case will fail because that double-quoted identifier is different from the others. However, none of our documentation says that you can use a quoted identifier in such contexts at all, and we should discourage doing so since it would break if we ever decide to parse such constructs as true lexer keywords rather than poor man's substitutes. So this shouldn't create a significant compatibility issue for users. Daniel Gustafsson, reviewed by Michael Paquier, small changes by me Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se	2018-01-26 18:25:14 -05:00
Tom Lane	e842791b0f	Fix unstable regression test added by commits `59b71c6fe` et al. The query didn't really have a preferred index, leading to platform- specific choices of which one to use. Adjust it to make sure tenk1_hundred is always chosen. Per buildfarm.	2017-11-24 00:29:20 -05:00
Andres Freund	59b71c6fe6	Fix handling of NULLs returned by aggregate combine functions. When strict aggregate combine functions, used in multi-stage/parallel aggregation, returned NULL, we didn't check for that, invoking the combine function with NULL the next round, despite it being strict. The equivalent code invoking normal transition functions has a check for that situation, which did not get copied in `a7de3dc5c3`. Fix the bug by adding the equivalent check. Based on a quick look I could not find any strict combine functions in core actually returning NULL, and it doesn't seem very likely external users have done so. So this isn't likely to have caused issues in practice. Add tests verifying transition / combine functions returning NULL is tested. Reported-By: Andres Freund Author: Andres Freund Discussion: https://postgr.es/m/20171121033642.7xvmjqrl4jdaaat3@alap3.anarazel.de Backpatch: 9.6, where parallel aggregation was introduced	2017-11-23 17:15:27 -08:00
Tom Lane	be0ebb65f5	Allow the built-in ordered-set aggregates to share transition state. The built-in OSAs all share the same transition function, so they can share transition state as long as the final functions cooperate to not do the sort step more than once. To avoid running the tuplesort object in randomAccess mode unnecessarily, add a bit of infrastructure to nodeAgg.c to let the aggregate functions find out whether the transition state is actually being shared or not. This doesn't work for the hypothetical aggregates, since those inject a hypothetical row that isn't traceable to the shared input state. So they remain marked aggfinalmodify = 'w'. Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com	2017-10-16 15:51:23 -04:00
Tom Lane	c3dfe0fec0	Repair breakage of aggregate FILTER option. An aggregate's input expression(s) are not supposed to be evaluated at all for a row where its FILTER test fails ... but commit `8ed3f11bb` overlooked that requirement. Reshuffle so that aggregates having a filter clause evaluate their arguments separately from those without. This still gets the benefit of doing only one ExecProject in the common case of multiple Aggrefs, none of which have filters. While at it, arrange for filter clauses to be included in the common ExecProject evaluation, thus perhaps buying a little bit even when there are filters. Back-patch to v10 where the bug was introduced. Discussion: https://postgr.es/m/30065.1508161354@sss.pgh.pa.us	2017-10-16 15:24:36 -04:00
Tom Lane	52328727be	Prevent sharing transition states between ordered-set aggregates. This ought to work, but the built-in OSAs are not capable of coping, because their final-functions destructively modify their transition state (specifically, the contained tuplesort object). That was fine when those functions were written, but commit `804163bc2` moved the goalposts without telling orderedsetaggs.c. We should fix the built-in OSAs to support this, but it will take a little work, especially if we don't want to sacrifice performance in the normal non-shared-state case. Given that it took a year after 9.6 release for anyone to notice this bug, we should not prioritize sharable-state over nonsharable-state performance. And a proper fix is likely to be more complicated than we'd want to back-patch, too. Therefore, let's just put in this stop-gap patch to prevent nodeAgg.c from choosing to use shared state for OSAs. We can revert it in HEAD when we get a better fix. Report from Lukas Eder, diagnosis by me, patch by David Rowley. Back-patch to 9.6 where the problem was introduced. Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com	2017-10-11 22:18:10 -04:00
Heikki Linnakangas	db80acfc9d	Fix sharing Agg transition state of DISTINCT or ordered aggs. If a query contained two aggregates that could share the transition value, we would correctly collect the input into a tuplesort only once, but incorrectly run the transition function over the accumulated input twice, in finalize_aggregates(). That caused a crash, when we tried to call tuplesort_performsort() on an already-freed NULL tuplestore. Backport to 9.6, where sharing of transition state and this bug were introduced. Analysis by Tom Lane. Discussion: https://www.postgresql.org/message-id/ac5b0b69-744c-9114-6218-8300ac920e61@iki.fi	2016-12-20 09:20:17 +02:00
Tom Lane	2c00fad286	Fix improper repetition of previous results from a hashed aggregate. ExecReScanAgg's check for whether it could re-use a previously calculated hashtable neglected the possibility that the Agg node might reference PARAM_EXEC Params that are not referenced by its input plan node. That's okay if the Params are in upper tlist or qual expressions; but if one appears in aggregate input expressions, then the hashtable contents need to be recomputed when the Param's value changes. To avoid unnecessary performance degradation in the case of a Param that isn't within an aggregate input, add logic to the planner to determine which Params are within aggregate inputs. This requires a new field in struct Agg, but fortunately we never write plans to disk, so this isn't an initdb-forcing change. Per report from Jeevan Chalke. This has been broken since forever, so back-patch to all supported branches. Andrew Gierth, with minor adjustments by me Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>	2016-08-24 14:38:12 -04:00
Robert Haas	5ce5e4a12e	Set consider_parallel correctly for upper planner rels. Commit `3fc6e2d7f5` introduced new "upper" RelOptInfo structures but didn't set consider_parallel for them correctly, a point I completely missed when reviewing it. Later, commit `e06a38965b` made the situation worse by doing it incorrectly for the grouping relation. Try to straighten all of that out. Along the way, get rid of the annoying wholePlanParallelSafe flag, which was only necessarily because of the fact that upper planning stages didn't use paths at the time that code was written. The most important immediate impact of these changes is that force_parallel_mode will provide useful test coverage in quite a few more scenarios than it did previously, but it's also necessary preparation for fixing some problems related to subqueries. Patch by me, reviewed by Tom Lane.	2016-07-01 11:52:56 -04:00
Tom Lane	c12f02ffc9	Don't apply sortgroupref labels to a tlist that might not match. If we need to use a gating Result node for pseudoconstant quals, create_scan_plan() intentionally suppresses use_physical_tlist's checks on whether there are matches for sortgroupref labels, on the grounds that we don't need matches because we can label the Result's projection output properly. However, it then called apply_pathtarget_labeling_to_tlist anyway. This oversight was harmless when written, but in commit `aeb9ae645` I made that function throw an error if there was no match. Thus, the combination of a table scan, pseudoconstant quals, and a non-simple-Var sortgroupref column threw the dreaded "ORDER/GROUP BY expression not found in targetlist" error. To fix, just skip applying the labeling in this case. Per report from Rushabh Lathia. Report: <CAGPqQf2iLB8t6t-XrL-zR233DFTXxEsfVZ4WSqaYfLupEoDxXA@mail.gmail.com>	2016-06-28 10:43:11 -04:00
Tom Lane	d4c3a156cb	Remove GROUP BY columns that are functionally dependent on other columns. If a GROUP BY clause includes all columns of a non-deferred primary key, as well as other columns of the same relation, those other columns are redundant and can be dropped from the grouping; the pkey is enough to ensure that each row of the table corresponds to a separate group. Getting rid of the excess columns will reduce the cost of the sorting or hashing needed to implement GROUP BY, and can indeed remove the need for a sort step altogether. This seems worth testing for since many query authors are not aware of the GROUP-BY-primary-key exception to the rule about queries not being allowed to reference non-grouped-by columns in their targetlists or HAVING clauses. Thus, redundant GROUP BY items are not uncommon. Also, we can make the test pretty cheap in most queries where it won't help by not looking up a rel's primary key until we've found that at least two of its columns are in GROUP BY. David Rowley, reviewed by Julien Rouhaud	2016-02-11 17:34:59 -05:00
Heikki Linnakangas	804163bc25	Share transition state between different aggregates when possible. If there are two different aggregates in the query with same inputs, and the aggregates have the same initial condition and transition function, only calculate the state value once, and only call the final functions separately. For example, AVG(x) and SUM(x) aggregates have the same transition function, which accumulates the sum and number of input tuples. For a query like "SELECT AVG(x), SUM(x) FROM x", we can therefore accumulate the state function only once, which gives a nice speedup. David Rowley, reviewed and edited by me.	2015-08-04 17:53:10 +03:00
Tom Lane	b0f479113a	Repair corner-case bug in array version of percentile_cont(). The code for advancing through the input rows overlooked the case that we might already be past the first row of the row pair now being considered, in case the previous percentile also fell between the same two input rows. Report and patch by Andrew Gierth; logic rewritten a bit for clarity by me.	2014-12-13 11:49:41 -05:00
Tom Lane	8d65da1f01	Support ordered-set (WITHIN GROUP) aggregates. This patch introduces generic support for ordered-set and hypothetical-set aggregate functions, as well as implementations of the instances defined in SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(), percent_rank(), cume_dist()). We also added mode() though it is not in the spec, as well as versions of percentile_cont() and percentile_disc() that can compute multiple percentile values in one pass over the data. Unlike the original submission, this patch puts full control of the sorting process in the hands of the aggregate's support functions. To allow the support functions to find out how they're supposed to sort, a new API function AggGetAggref() is added to nodeAgg.c. This allows retrieval of the aggregate call's Aggref node, which may have other uses beyond the immediate need. There is also support for ordered-set aggregates to install cleanup callback functions, so that they can be sure that infrastructure such as tuplesort objects gets cleaned up. In passing, make some fixes in the recently-added support for variadic aggregates, and make some editorial adjustments in the recent FILTER additions for aggregates. Also, simplify use of IsBinaryCoercible() by allowing it to succeed whenever the target type is ANY or ANYELEMENT. It was inconsistent that it dealt with other polymorphic target types but not these. Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing, and rather heavily editorialized upon by Tom Lane	2013-12-23 16:11:35 -05:00
Tom Lane	b5e0a2a384	Tweak placement of explicit ANALYZE commands in the regression tests. Make the COPY test, which loads most of the large static tables used in the tests, also explicitly ANALYZE those tables. This allows us to get rid of various ad-hoc, and rather redundant, ANALYZE commands that had gotten stuck into various test scripts over time to ensure we got consistent plan choices. (We could have done a database-wide ANALYZE, but that would cause stats to get attached to the small static tables too, which results in plan changes compared to the historical behavior. I'm not sure that's a good idea, so not going that far for now.) Back-patch to 9.0, since 9.0 and 9.1 are currently sometimes failing regression tests for lack of an "ANALYZE tenk1" in the subselect test. There's no need for this in 8.4 since we didn't print any plans back then.	2013-12-11 15:09:15 -05:00
Tom Lane	69c8fbac20	Improve performance of numeric sum(), avg(), stddev(), variance(), etc. This patch improves performance of most built-in aggregates that formerly used a NUMERIC or NUMERIC array as their transition type; this includes not only aggregates on numeric inputs, but some aggregates on integer inputs where overflow of an int8 value is a possibility. The code now uses a special-purpose data structure to avoid array construction and deconstruction overhead, as well as packing and unpacking overhead for numeric values. These aggregates' transition type is now declared as INTERNAL, since it doesn't correspond to any SQL data type. To keep the planner from thinking that that means a lot of storage will be used, we make use of the just-added pg_aggregate.aggtransspace feature. The space estimate is set to 128 bytes, which is at least in the right ballpark. Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra	2013-11-16 18:46:34 -05:00
Tom Lane	0d3f4406df	Allow aggregate functions to be VARIADIC. There's no inherent reason why an aggregate function can't be variadic (even VARIADIC ANY) if its transition function can handle the case. Indeed, this patch to add the feature touches none of the planner or executor, and little of the parser; the main missing stuff was DDL and pg_dump support. It is true that variadic aggregates can create the same sort of ambiguity about parameters versus ORDER BY keys that was complained of when we (briefly) had both one- and two-argument forms of string_agg(). However, the policy formed in response to that discussion only said that we'd not create any built-in aggregates with varying numbers of arguments, not that we shouldn't allow users to do it. So the logical extension of that is we can allow users to make variadic aggregates as long as we're wary about shipping any such in core. In passing, this patch allows aggregate function arguments to be named, to the extent of remembering the names in pg_proc and dumping them in pg_dump. You can't yet call an aggregate using named-parameter notation. That seems like a likely future extension, but it'll take some work, and it's not what this patch is really about. Likewise, there's still some work needed to make window functions handle VARIADIC fully, but I left that for another day. initdb forced because of new aggvariadic field in Aggref parse nodes.	2013-09-03 17:08:46 -04:00
Noah Misch	b560ec1b0d	Implement the FILTER clause for aggregate function calls. This is SQL-standard with a few extensions, namely support for subqueries and outer references in clause expressions. catversion bump due to change in Aggref and WindowFunc. David Fetter, reviewed by Dean Rasheed.	2013-07-16 20:15:36 -04:00
Tom Lane	d3237e04ca	Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees. In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is pretty useless (there being only one output row), but nonetheless it shouldn't fail. But it could fail if "tab" is an inheritance parent, because planagg.c's code for fixing up equivalence classes after making the index-optimized MIN/MAX transformation wasn't prepared to find child-table versions of the aggregate expression. The least ugly fix seems to be to add an option to mutate_eclass_expressions() to skip child-table equivalence class members, which aren't used anymore at this stage of planning so it's not really necessary to fix them. Since child members are ignored in many cases already, it seems plausible for mutate_eclass_expressions() to have an option to ignore them too. Per bug #7703 from Maxim Boguk. Back-patch to 9.1. Although the same code exists before that, it cannot encounter child-table aggregates AFAICS, because the index optimization transformation cannot succeed on inheritance trees before 9.1 (for lack of MergeAppend).	2012-11-26 12:57:58 -05:00
Tom Lane	eaccfded98	Centralize the logic for detecting misplaced aggregates, window funcs, etc. Formerly we relied on checking after-the-fact to see if an expression contained aggregates, window functions, or sub-selects when it shouldn't. This is grotty, easily forgotten (indeed, we had forgotten to teach DefineIndex about rejecting window functions), and none too efficient since it requires extra traversals of the parse tree. To improve matters, define an enum type that classifies all SQL sub-expressions, store it in ParseState to show what kind of expression we are currently parsing, and make transformAggregateCall, transformWindowFuncCall, and transformSubLink check the expression type and throw error if the type indicates the construct is disallowed. This allows removal of a large number of ad-hoc checks scattered around the code base. The enum type is sufficiently fine-grained that we can still produce error messages of at least the same specificity as before. Bringing these error checks together revealed that we'd been none too consistent about phrasing of the error messages, so standardize the wording a bit. Also, rewrite checking of aggregate arguments so that it requires only one traversal of the arguments, rather than up to three as before. In passing, clean up some more comments left over from add_missing_from support, and annotate some tests that I think are dead code now that that's gone. (I didn't risk actually removing said dead code, though.)	2012-08-10 11:36:15 -04:00
Peter Eisentraut	c0cc526e8b	Rename bytea_agg to string_agg and add delimiter argument Per mailing list discussion, we would like to keep the bytea functions parallel to the text functions, so rename bytea_agg to string_agg, which already exists for text. Also, to satisfy the rule that we don't want aggregate functions of the same name with a different number of arguments, add a delimiter argument, just like string_agg for text already has.	2012-04-13 21:36:59 +03:00
Robert Haas	d5448c7d31	Add bytea_agg, parallel to string_agg. Pavel Stehule	2011-12-23 08:40:25 -05:00
Tom Lane	8df08c8489	Reimplement planner's handling of MIN/MAX aggregate optimization (again). Instead of playing cute games with pathkeys, just build a direct representation of the intended sub-select, and feed it through query_planner to get a Path for the index access. This is a bit slower than 9.1's previous method, since we'll duplicate most of the overhead of query_planner; but since the whole optimization only applies to rather simple single-table queries, that probably won't be much of a problem in practice. The advantage is that we get to do the right thing when there's a partial index that needs the implicit IS NOT NULL clause to be usable. Also, although this makes planagg.c be a bit more closely tied to the ordering of operations in grouping_planner, we can get rid of some coupling to lower-level parts of the planner. Per complaint from Marti Raudsepp.	2011-03-22 00:34:31 -04:00
Peter Eisentraut	fc946c39ae	Remove useless whitespace at end of lines	2010-11-23 22:34:55 +02:00
Tom Lane	034967bdcb	Reimplement planner's handling of MIN/MAX aggregate optimization. Per my recent proposal, get rid of all the direct inspection of indexes and manual generation of paths in planagg.c. Instead, set up EquivalenceClasses for the aggregate argument expressions, and let the regular path generation logic deal with creating paths that can satisfy those sort orders. This makes planagg.c a bit more visible to the rest of the planner than it was originally, but the approach is basically a lot cleaner than before. A major advantage of doing it this way is that we get MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans) practically for free, whereas in the old way we'd have had to add a whole lot more duplicative logic. One small disadvantage of this approach is that MIN/MAX aggregates can no longer exploit partial indexes having an "x IS NOT NULL" predicate, unless that restriction or something that implies it is specified in the query. The previous implementation was able to use the added "x IS NOT NULL" condition as an extra predicate proof condition, but in this version we rely entirely on indexes that are considered usable by the main planning process. That seems a fair tradeoff for the simplicity and functionality gained.	2010-11-04 12:01:17 -04:00
Tom Lane	b0c451e145	Remove the single-argument form of string_agg(). It added nothing much in functionality, while creating an ambiguity in usage with ORDER BY that at least two people have already gotten seriously confused by. Also, add an opr_sanity test to check that we don't in future violate the newly minted policy of not having built-in aggregates with the same name and different numbers of parameters. Per discussion of a complaint from Thom Brown.	2010-08-05 18:21:19 +00:00
Tom Lane	fba999cb2c	Allow ORDER BY/GROUP BY/etc items to match targetlist items regardless of any implicit casting previously applied to the targetlist item. This is reasonable because the implicit cast, by definition, wasn't written by the user; so we are preserving the expected behavior that ORDER BY items match textually equivalent tlist items. The case never arose before because there couldn't be any implicit casting of a top-level SELECT item before we process ORDER BY etc. But now it can arise in the context of aggregates containing ORDER BY clauses, since the "targetlist" is the already-casted list of arguments for the aggregate. The net effect is that the datatype used for ORDER BY/DISTINCT purposes is the aggregate's declared input type, not that of the original input column; which is a bit debatable but not horrendous, and to do otherwise would require major rework that doesn't seem justified. Per bug #5564 from Daniel Grace. Back-patch to 9.0 where aggregate ORDER BY was implemented.	2010-07-18 19:37:49 +00:00
Itagaki Takahiro	9ea9918e37	Add string_agg aggregate functions. The one argument version concatenates the input values into a string. The two argument version also does the same thing, but inserts delimiters between elements. Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.	2010-02-01 03:14:45 +00:00
Tom Lane	34d26872ed	Support ORDER BY within aggregate function calls, at long last providing a non-kluge method for controlling the order in which values are fed to an aggregate function. At the same time eliminate the old implementation restriction that DISTINCT was only supported for single-argument aggregates. Possibly release-notable behavioral change: formerly, agg(DISTINCT x) dropped null values of x unconditionally. Now, it does so only if the agg transition function is strict; otherwise nulls are treated as DISTINCT normally would, ie, you get one copy. Andrew Gierth, reviewed by Hitoshi Harada	2009-12-15 17:57:48 +00:00
Tom Lane	20a3ddbbf9	Fix the handling of sub-SELECTs appearing in the arguments of an outer-level aggregate function. By definition, such a sub-SELECT cannot reference any variables of query levels between itself and the aggregate's semantic level (else the aggregate would've been assigned to that lower level instead). So the correct, most efficient implementation is to treat the sub-SELECT as being a sub-select of that outer query level, not the level the aggregate syntactically appears in. Not doing so also confuses the heck out of our parameter-passing logic, as illustrated in bug report from Daniel Grace. Fortunately, we were already copying the whole Aggref expression up to the outer query level, so all that's needed is to delay SS_process_sublinks processing of the sub-SELECT until control returns to the outer level. This has been broken since we introduced spec-compliant treatment of outer aggregates in 7.4; so patch all the way back.	2009-04-25 16:44:56 +00:00
Tom Lane	d344115519	Apply my original fix for Taiki Yamaguchi's bug report about DISTINCT MAX(). Add some regression tests for plausible failures in this area.	2008-03-31 16:59:26 +00:00
Tom Lane	1249cf8f38	SQL2003-standard statistical aggregates, by Sergey Koposov. I've added only the float8 versions of the aggregates, which is all that the standard requires. Sergey's original patch also provided versions using numeric arithmetic, but given the size and slowness of the code, I doubt we ought to include those in core.	2006-07-28 18:33:04 +00:00
Tom Lane	108fe47301	Aggregate functions now support multiple input arguments. I also took the opportunity to treat COUNT(*) as a zero-argument aggregate instead of the old hack that equated it to COUNT(1); this is materially cleaner (no more weird ANYOID cases) and ought to be at least a tiny bit faster. Original patch by Sergey Koposov; review, documentation, simple regression tests, pg_dump and psql support by moi.	2006-07-27 19:52:07 +00:00
Neil Conway	0ebf1cc834	Implement 4 new aggregate functions from SQL2003. Specifically: var_pop(), var_samp(), stddev_pop(), and stddev_samp(). var_samp() and stddev_samp() are just renamings of the historical Postgres aggregates variance() and stddev() -- the latter names have been kept for backward compatibility. This patch includes updates for the documentation and regression tests. The catversion has been bumped. NB: SQL2003 requires that DISTINCT not be specified for any of these aggregates. Per discussion on -patches, I have NOT implemented this restriction: if the user asks for stddev(DISTINCT x), presumably they know what they are doing.	2006-03-10 20:15:28 +00:00
Tom Lane	addc42c339	Create the planner mechanism for optimizing simple MIN and MAX queries into indexscans on matching indexes. For the moment, it only handles int4 and text datatypes; next step is to add a column to pg_aggregate so that all MIN/MAX aggregates can be handled. Per my recent proposal.	2005-04-11 23:06:57 +00:00
Bruce Momjian	8096fe45ce	The added aggregates are: (1) boolean-and and boolean-or aggregates named bool_and and bool_or. they (SHOULD;-) correspond to standard sql every and some/any aggregates. they do not have the right name as there is a problem with the standard and the parser for some/any. Tom also think that the standard name is misleading because NULL are ignored. Also add 'every' aggregate. (2) bitwise integer aggregates named bit_and and bit_or for int2, int4, int8 and bit types. They are not standard, but I find them useful. I needed them once. The patches adds: - 2 new very short strict functions for boolean aggregates in src/backed/utils/adt/bool.c, src/include/utils/builtins.h and src/include/catalog/pg_proc.h - the new aggregates declared in src/include/catalog/pg_proc.h and src/include/catalog/pg_aggregate.h - some documentation and validation about these new aggregates. Fabien COELHO	2004-05-26 15:26:28 +00:00
Tom Lane	e649796f12	Implement outer-level aggregates to conform to the SQL spec, with extensions to support our historical behavior. An aggregate belongs to the closest query level of any of the variables in its argument, or the current query level if there are no variables (e.g., COUNT(*)). The implementation involves adding an agglevelsup field to Aggref, and treating outer aggregates like outer variables at planning time.	2003-06-06 15:04:03 +00:00

1 2

57 commits