postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-18 00:22:18 -04:00

Author	SHA1	Message	Date
Tom Lane	efd0c16bec	Avoid using list_length() to test for empty list. The standard way to check for list emptiness is to compare the List pointer to NIL; our list code goes out of its way to ensure that that is the only representation of an empty list. (An acceptable alternative is a plain boolean test for non-null pointer, but explicit mention of NIL is usually preferable.) Various places didn't get that memo and expressed the condition with list_length(), which might not be so bad except that there were such a variety of ways to check it exactly: equal to zero, less than or equal to zero, less than one, yadda yadda. In the name of code readability, let's standardize all those spellings as "list == NIL" or "list != NIL". (There's probably some microscopic efficiency gain too, though few of these look to be at all performance-critical.) A very small number of cases were left as-is because they seemed more consistent with other adjacent list_length tests that way. Peter Smith, with bikeshedding from a number of us Discussion: https://postgr.es/m/CAHut+PtQYe+ENX5KrONMfugf0q6NHg4hR5dAhqEXEc2eefFeig@mail.gmail.com	2022-08-17 11:12:35 -04:00
David Rowley	53823a06be	Fix failure to set correct operator in window run condition This was a simple omission in `9d9c02ccd` where the code didn't correctly set the operator to use in the run condition OpExpr when the window function was both monotonically increasing and decreasing. Bug discovered by Julien Roze, although he did not report it. Reported-by: Phil Florent Discussion: https://postgr.es/m/PA4P191MB160009A09B9D0624359278CFBA9F9@PA4P191MB1600.EURP191.PROD.OUTLOOK.COM Backpatch-through: 15, where `9d9c02ccd` was added	2022-08-05 10:14:00 +12:00
Tom Lane	1aa8dad41f	Fix incorrect tests for SRFs in relation_can_be_sorted_early(). Commit `fac1b470a` thought we could check for set-returning functions by testing only the top-level node in an expression tree. This is wrong in itself, and to make matters worse it encouraged others to make the same mistake, by exporting tlist.c's special-purpose IS_SRF_CALL() as a widely-visible macro. I can't find any evidence that anyone's taken the bait, but it was only a matter of time. Use expression_returns_set() instead, and stuff the IS_SRF_CALL() genie back in its bottle, this time with a warning label. I also added a couple of cross-reference comments. After a fair amount of fooling around, I've despaired of making a robust test case that exposes the bug reliably, so no test case here. (Note that the test case added by `fac1b470a` is itself broken, in that it doesn't notice if you remove the code change. The repro given by the bug submitter currently doesn't fail either in v15 or HEAD, though I suspect that may indicate an unrelated bug.) Per bug #17564 from Martijn van Oosterhout. Back-patch to v13, as the faulty patch was. Discussion: https://postgr.es/m/17564-c7472c2f90ef2da3@postgresql.org	2022-08-03 17:33:42 -04:00
David Rowley	1349d2790b	Improve performance of ORDER BY / DISTINCT aggregates ORDER BY / DISTINCT aggreagtes have, since implemented in Postgres, been executed by always performing a sort in nodeAgg.c to sort the tuples in the current group into the correct order before calling the transition function on the sorted tuples. This was not great as often there might be an index that could have provided pre-sorted input and allowed the transition functions to be called as the rows come in, rather than having to store them in a tuplestore in order to sort them once all the tuples for the group have arrived. Here we change the planner so it requests a path with a sort order which supports the most amount of ORDER BY / DISTINCT aggregate functions and add new code to the executor to allow it to support the processing of ORDER BY / DISTINCT aggregates where the tuples are already sorted in the correct order. Since there can be many ORDER BY / DISTINCT aggregates in any given query level, it's very possible that we can't find an order that suits all of these aggregates. The sort order that the planner chooses is simply the one that suits the most aggregate functions. We take the most strictly sorted variation of each order and see how many aggregate functions can use that, then we try again with the order of the remaining aggregates to see if another order would suit more aggregate functions. For example: SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ... would request the sort order to be {a, b} because {a} is a subset of the sort order of {a,b}, but; SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ... would just pick a plan ordered by {a} (we give precedence to aggregates which are earlier in the targetlist). SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ... would choose to order by {b} since two aggregates suit that vs just one that requires input ordered by {a}. Author: David Rowley Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com	2022-08-02 23:11:45 +12:00
David Rowley	b592422095	Relax overly strict rules in select_outer_pathkeys_for_merge() The select_outer_pathkeys_for_merge function made an attempt to build the merge join pathkeys in the same order as query_pathkeys. This was done as it may have led to no sort being required for an ORDER BY or GROUP BY clause in the upper planner. However, this restriction seems overly strict as it required that we match the query_pathkeys entirely or we don't bother putting the merge join pathkeys in that order. Here we relax this rule so that we use a prefix of the query_pathkeys providing that prefix matches all of the join quals. This may provide the upper planner with partially sorted input which will allow the use of incremental sorts instead of full sorts. Author: David Rowley Reviewed-by: Richard Guo Discussion: https://postgr.es/m/CAApHDvrtZu0PHVfDPFM4Yx3jNR2Wuwosv+T2zqa7LrhhBr2rRg@mail.gmail.com	2022-08-02 11:02:46 +12:00
Tom Lane	d8e34fa7a1	Fix incorrect is-this-the-topmost-join tests in parallel planning. Two callers of generate_useful_gather_paths were testing the wrong thing when deciding whether to call that function: they checked for being at the top of the current join subproblem, rather than being at the actual top join. This'd result in failing to construct parallel paths for a sub-join for which they might be useful. While set_rel_pathlist() isn't actively broken, it seems best to make its identical-in-intention test for this be like the other two. This has been wrong all along, but given the lack of field complaints I'm hesitant to back-patch into stable branches; we usually prefer to avoid non-bug-fix changes in plan choices in minor releases. It seems not too late for v15 though. Richard Guo, reviewed by Antonin Houska and Tom Lane Discussion: https://postgr.es/m/CAMbWs4-mH8Zf87-w+3P2J=nJB+5OyicO28ia9q_9o=Lamf_VHg@mail.gmail.com	2022-07-30 13:05:15 -04:00
Thomas Munro	4f1f5a7f85	Remove fls(), use pg_leftmost_one_pos32() instead. Commit `4f658dc8` provided the traditional BSD fls() function in src/port/fls.c so it could be used in several places. Later we added a bunch of similar facilities in pg_bitutils.h, based on compiler builtins that map to hardware instructions. It's a bit confusing to have both 1-based and 0-based variants of this operation in use in different parts of the tree, and neither is blessed by a standard. Let's drop fls.c and the configure probe, and reuse the newer code. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKG%2B7dSX1XF8yFGmYk-%3D48dbjH2kmzZj16XvhbrWP-9BzRg%40mail.gmail.com	2022-07-22 10:41:50 +12:00
Tom Lane	e2f6c307c0	Estimate cost of elided SubqueryScan, Append, MergeAppend nodes better. setrefs.c contains logic to discard no-op SubqueryScan nodes, that is, ones that have no qual to check and copy the input targetlist unchanged. (Formally it's not very nice to be applying such optimizations so late in the planner, but there are practical reasons for it; mostly that we can't unify relids between the subquery and the parent query until we flatten the rangetable during setrefs.c.) This behavior falsifies our previous cost estimates, since we would've charged cpu_tuple_cost per row just to pass data through the node. Most of the time that's little enough to not matter, but there are cases where this effect visibly changes the plan compared to what you would've gotten with no sub-select. To improve the situation, make the callers of cost_subqueryscan tell it whether they think the targetlist is trivial. cost_subqueryscan already has the qual list, so it can check the other half of the condition easily. It could make its own determination of tlist triviality too, but doing so would be repetitive (for callers that may call it several times) or unnecessarily expensive (for callers that can determine this more cheaply than a general test would do). This isn't a 100% solution, because createplan.c also does things that can falsify any earlier estimate of whether the tlist is trivial. However, it fixes nearly all cases in practice, if results for the regression tests are anything to go by. setrefs.c also contains logic to discard no-op Append and MergeAppend nodes. We did have knowledge of that behavior at costing time, but somebody failed to update it when a check on parallel-awareness was added to the setrefs.c logic. Fix that while we're here. These changes result in two minor changes in query plans shown in our regression tests. Neither is relevant to the purposes of its test case AFAICT. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/2581077.1651703520@sss.pgh.pa.us	2022-07-19 11:18:19 -04:00
Peter Eisentraut	b449afb582	Attempt to fix compiler warning on old compiler Build farm member lapwing (using gcc 4.7.2) didn't like one part of `9fd45870c1`, raising a compiler warning. Revert that for now.	2022-07-16 13:45:57 +02:00
Peter Eisentraut	9fd45870c1	Replace many MemSet calls with struct initialization This replaces all MemSet() calls with struct initialization where that is easily and obviously possible. (For example, some cases have to worry about padding bits, so I left those.) (The same could be done with appropriate memset() calls, but this patch is part of an effort to phase out MemSet(), so it doesn't touch memset() calls.) Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/9847b13c-b785-f4e2-75c3-12ec77a3b05c@enterprisedb.com	2022-07-16 08:50:49 +02:00
Michael Paquier	6203583b72	Remove support for Visual Studio 2013 No members of the buildfarm are using this version of Visual Studio, resulting in all the code cleaned up here as being mostly dead, and VS2017 is the oldest version still supported. More versions could be cut, but the gain would be minimal, while removing only VS2013 has the advantage to remove from the core code all the dependencies on the value defined by _MSC_VER, where compatibility tweaks have accumulated across the years mostly around locales and strtof(), so that's a nice isolated cleanup. Note that this commit additionally allows a revert of `3154e16`. The versions of Visual Studio now supported range from 2015 to 2022. Author: Michael Paquier Reviewed-by: Juan José Santamaría Flecha, Tom Lane, Thomas Munro, Justin Pryzby Discussion: https://postgr.es/m/YoH2IMtxcS3ncWn+@paquier.xyz	2022-07-14 11:22:49 +09:00
David Rowley	c23e3e6beb	Use list_copy_head() instead of list_truncate(list_copy(...), ...) Truncating off the end of a freshly copied List is not a very efficient way of copying the first N elements of a List. In many of the cases that are updated here, the pattern was only being used to remove the final element of a List. That's about the best case for it, but there were many instances where the truncate trimming the List down much further. `4cc832f94` added list_copy_head(), so let's use it in cases where it's useful. Author: David Rowley Discussion: https://postgr.es/m/1986787.1657666922%40sss.pgh.pa.us	2022-07-13 15:03:47 +12:00
David Rowley	4cc832f94a	Tidy up code in get_cheapest_group_keys_order() There are a few things that we could do a little better within get_cheapest_group_keys_order(): 1. We should be using list_free() rather than pfree() on a List. 2. We should use for_each_from() instead of manually coding a for loop to skip the first n elements of a List 3. list_truncate(list_copy(...), n) is not a great way to copy the first n elements of a list. Let's invent list_copy_head() for that. That way we don't need to copy the entire list just to truncate it directly afterwards. 4. We can simplify finding the cheapest cost by setting the cheapest cost variable to DBL_MAX. That allows us to skip special-casing the initial iteration of the loop. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvrGyL3ft8waEkncG9y5HDMu5TFFJB1paoTC8zi9YK97Nw@mail.gmail.com Backpatch-through: 15, where get_cheapest_group_keys_order was added.	2022-07-13 14:02:20 +12:00
David Rowley	3e9abd2eb1	Teach remove_unused_subquery_outputs about window run conditions `9d9c02ccd` added code to allow the executor to take shortcuts when quals on monotonic window functions guaranteed that once the qual became false it could never become true again. When possible, baserestrictinfo quals are converted to become these quals, which we call run conditions. Unfortunately, in `9d9c02ccd`, I forgot to update remove_unused_subquery_outputs to teach it about these run conditions. This could cause a WindowFunc column which was unused in the target list but referenced by an upper-level WHERE clause to be removed from the subquery when the qual in the WHERE clause was converted into a window run condition. Because of this, the entire WindowClause would be removed from the query resulting in additional rows making it into the resultset when they should have been filtered out by the WHERE clause. Here we fix this by recording which target list items in the subquery have run conditions. That gets passed along to remove_unused_subquery_outputs to tell it not to remove these items from the target list. Bug: #17495 Reported-by: Jeremy Evans Reviewed-by: Richard Guo Discussion: https://postgr.es/m/17495-7ffe2fa0b261b9fa@postgresql.org	2022-05-27 10:37:58 +12:00
Tom Lane	a916cb9d5a	Avoid overflow hazard when clamping group counts to "long int". Several places in the planner tried to clamp a double value to fit in a "long" by doing (long) Min(x, (double) LONG_MAX); This is subtly incorrect, because it casts LONG_MAX to double and potentially back again. If long is 64 bits then the double value is inexact, and the platform might round it up to LONG_MAX+1 resulting in an overflow and an undesirably negative output. While it's not hard to rewrite the expression into a safe form, let's put it into a common function to reduce the risk of someone doing it wrong in future. In principle this is a bug fix, but since the problem could only manifest with group count estimates exceeding 2^63, it seems unlikely that anyone has actually hit this or will do so anytime soon. We're fixing it mainly to satisfy fuzzer-type tools. That being the case, a HEAD-only fix seems sufficient. Andrey Lepikhov Discussion: https://postgr.es/m/ebbc2efb-7ef9-bf2f-1ada-d6ec48f70e58@postgrespro.ru	2022-05-21 13:13:44 -04:00
David Rowley	1e731ed12a	Fix incorrect row estimates used for Memoize costing In order to estimate the cache hit ratio of a Memoize node, one of the inputs we require is the estimated number of times the Memoize node will be rescanned. The higher this number, the large the cache hit ratio is likely to become. Unfortunately, the value being passed as the number of "calls" to the Memoize was incorrectly using the Nested Loop's outer_path->parent->rows instead of outer_path->rows. This failed to account for the fact that the outer_path might be parameterized by some upper-level Nested Loop. This problem could lead to Memoize plans appearing more favorable than they might actually be. It could also lead to extended executor startup times when work_mem values were large due to the planner setting overly large MemoizePath->est_entries resulting in the Memoize hash table being initially made much larger than might be required. Fix this simply by passing outer_path->rows rather than outer_path->parent->rows. Also, adjust the expected regression test output for a plan change. Reported-by: Pavel Stehule Author: David Rowley Discussion: https://postgr.es/m/CAFj8pRAMp%3DQsMi6sPQJ4W3hczoFJRvyXHJV3AZAZaMyTVM312Q%40mail.gmail.com Backpatch-through: 14, where Memoize was introduced	2022-05-16 16:07:56 +12:00
Tom Lane	23e7b38bfe	Pre-beta mechanical code beautification. Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.	2022-05-12 15:17:30 -04:00
Tom Lane	c40ba5f318	Fix rowcount estimate for SubqueryScan that's under a Gather. SubqueryScan was always getting labeled with a rowcount estimate appropriate for non-parallel cases. However, nodes that are underneath a Gather should be treated as processing only one worker's share of the rows, whether the particular node is explicitly parallel-aware or not. Most non-scan-level node types get this right automatically because they base their rowcount estimate on that of their input sub-Path(s). But SubqueryScan didn't do that, instead using the whole-relation rowcount estimate as if it were a non-parallel-aware scan node. If there is a parallel-aware node below the SubqueryScan, this is wrong, and it results in inflating the cost estimates for nodes above the SubqueryScan, which can cause us to not choose a parallel plan, or choose a silly one --- as indeed is visible in the one regression test whose results change with this patch. (Although that plan tree appears to contain no SubqueryScans, there were some in it before setrefs.c deleted them.) To fix, use path->subpath->rows not baserel->tuples as the number of input tuples we'll process. This requires estimating the quals' selectivity afresh, which is slightly annoying; but it shouldn't really add much cost thanks to the caching done in RestrictInfo. This is pretty clearly a bug fix, but I'll refrain from back-patching as people might not appreciate plan choices changing in stable branches. The fact that it took us this long to identify the bug suggests that it's not a major problem. Per report from bucoo, though this is not his proposed patch. Discussion: https://postgr.es/m/202204121457159307248@sohu.com	2022-05-04 14:44:40 -04:00
Tom Lane	92e7a53752	Remove inadequate assertion check in CTE inlining. inline_cte() expected to find exactly as many references to the target CTE as its cterefcount indicates. While that should be accurate for the tree as emitted by the parser, there are some optimizations that occur upstream of here that could falsify it, notably removal of unused subquery output expressions. Trying to make the accounting 100% accurate seems expensive and doomed to future breakage. It's not really worth it, because all this code is protecting is downstream assumptions that every referenced CTE has a plan. Let's convert those assertions to regular test-and-elog just in case there's some actual problem, and then drop the failing assertion. Per report from Tomas Vondra (thanks also to Richard Guo for analysis). Back-patch to v12 where the faulty code came in. Discussion: https://postgr.es/m/29196a1e-ed47-c7ca-9be2-b1c636816183@enterprisedb.com	2022-04-21 17:58:52 -04:00
David Rowley	b0e5f02ddc	Fix various typos and spelling mistakes in code comments Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com	2022-04-11 20:49:41 +12:00
David Rowley	9d9c02ccd1	Teach planner and executor about monotonic window funcs Window functions such as row_number() always return a value higher than the previously returned value for tuples in any given window partition. Traditionally queries such as; SELECT * FROM ( SELECT , row_number() over (order by c) rn FROM t ) t WHERE rn <= 10; were executed fairly inefficiently. Neither the query planner nor the executor knew that once rn made it to 11 that nothing further would match the outer query's WHERE clause. It would blindly continue until all tuples were exhausted from the subquery. Here we implement means to make the above execute more efficiently. This is done by way of adding a pg_proc.prosupport function to various of the built-in window functions and adding supporting code to allow the support function to inform the planner if the window function is monotonically increasing, monotonically decreasing, both or neither. The planner is then able to make use of that information and possibly allow the executor to short-circuit execution by way of adding a "run condition" to the WindowAgg to allow it to determine if some of its execution work can be skipped. This "run condition" is not like a normal filter. These run conditions are only built using quals comparing values to monotonic window functions. For monotonic increasing functions, quals making use of the btree operators for <, <= and = can be used (assuming the window function column is on the left). You can see here that once such a condition becomes false that a monotonic increasing function could never make it subsequently true again. For monotonically decreasing functions the >, >= and = btree operators for the given type can be used for run conditions. The best-case situation for this is when there is a single WindowAgg node without a PARTITION BY clause. Here when the run condition becomes false the WindowAgg node can simply return NULL. No more tuples will ever match the run condition. It's a little more complex when there is a PARTITION BY clause. In this case, we cannot return NULL as we must still process other partitions. To speed this case up we pull tuples from the outer plan to check if they're from the same partition and simply discard them if they are. When we find a tuple belonging to another partition we start processing as normal again until the run condition becomes false or we run out of tuples to process. When there are multiple WindowAgg nodes to evaluate then this complicates the situation. For intermediate WindowAggs we must ensure we always return all tuples to the calling node. Any filtering done could lead to incorrect results in WindowAgg nodes above. For all intermediate nodes, we can still save some work when the run condition becomes false. We've no need to evaluate the WindowFuncs anymore. Other WindowAgg nodes cannot reference the value of these and these tuples will not appear in the final result anyway. The savings here are small in comparison to what can be saved in the top-level WingowAgg, but still worthwhile. Intermediate WindowAgg nodes never filter out tuples, but here we change WindowAgg so that the top-level WindowAgg filters out tuples that don't match the intermediate WindowAgg node's run condition. Such filters appear in the "Filter" clause in EXPLAIN for the top-level WindowAgg node. Here we add prosupport functions to allow the above to work for; row_number(), rank(), dense_rank(), count() and count(expr). It appears technically possible to do the same for min() and max(), however, it seems unlikely to be useful enough, so that's not done here. Bump catversion Author: David Rowley Reviewed-by: Andy Fan, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqvp3At8++yF8ij06sdcoo1S_b2YoaT9D4Nf+MObzsrLQ@mail.gmail.com	2022-04-08 10:34:36 +12:00
Andrew Dunstan	9f91344223	Fix comments with "a expression"	2022-03-31 15:45:25 -04:00
Tom Lane	f3dd9fe1dd	Fix postgres_fdw to check shippability of sort clauses properly. postgres_fdw would push ORDER BY clauses to the remote side without verifying that the sort operator is safe to ship. Moreover, it failed to print a suitable USING clause if the sort operator isn't default for the sort expression's type. The net result of this is that the remote sort might not have anywhere near the semantics we expect, which'd be disastrous for locally-performed merge joins in particular. We addressed similar issues in the context of ORDER BY within an aggregate function call in commit `7012b132d`, but failed to notice that query-level ORDER BY was broken. Thus, much of the necessary logic already existed, but it requires refactoring to be usable in both cases. Back-patch to all supported branches. In HEAD only, remove the core code's copy of find_em_expr_for_rel, which is no longer used and really should never have been pushed into equivclass.c in the first place. Ronan Dunklau, per report from David Rowley; reviews by David Rowley, Ranier Vilela, and myself Discussion: https://postgr.es/m/CAApHDvr4OeC2DBVY--zVP83-K=bYrTD7F8SZDhN4g+pj2f2S-A@mail.gmail.com	2022-03-31 14:29:48 -04:00
Tomas Vondra	db0d67db24	Optimize order of GROUP BY keys When evaluating a query with a multi-column GROUP BY clause using sort, the cost may be heavily dependent on the order in which the keys are compared when building the groups. Grouping does not imply any ordering, so we're allowed to compare the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we simply compared keys in the order as specified in the query. This commit explores alternative ordering of the keys, trying to find a cheaper one. In principle, we might generate grouping paths for all permutations of the keys, and leave the rest to the optimizer. But that might get very expensive, so we try to pick only a couple interesting orderings based on both local and global information. When planning the grouping path, we explore statistics (number of distinct values, cost of the comparison function) for the keys and reorder them to minimize comparison costs. Intuitively, it may be better to perform more expensive comparisons (for complex data types etc.) last, because maybe the cheaper comparisons will be enough. Similarly, the higher the cardinality of a key, the lower the probability we’ll need to compare more keys. The patch generates and costs various orderings, picking the cheapest ones. The ordering of group keys may interact with other parts of the query, some of which may not be known while planning the grouping. E.g. there may be an explicit ORDER BY clause, or some other ordering-dependent operation, higher up in the query, and using the same ordering may allow using either incremental sort or even eliminate the sort entirely. The patch generates orderings and picks those minimizing the comparison cost (for various pathkeys), and then adds orderings that might be useful for operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps the ordering specified in the query, on the assumption the user might have additional insights. This introduces a new GUC enable_group_by_reordering, so that the optimization may be disabled if needed. The original patch was proposed by Teodor Sigaev, and later improved and reworked by Dmitry Dolgov. Reviews by a number of people, including me, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu. Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com	2022-03-31 01:13:33 +02:00
Andrew Dunstan	1a36bc9dba	SQL/JSON query functions This introduces the SQL/JSON functions for querying JSON data using jsonpath expressions. The functions are: JSON_EXISTS() JSON_QUERY() JSON_VALUE() All of these functions only operate on jsonb. The workaround for now is to cast the argument to jsonb. JSON_EXISTS() tests if the jsonpath expression applied to the jsonb value yields any values. JSON_VALUE() must return a single value, and an error occurs if it tries to return multiple values. JSON_QUERY() must return a json object or array, and there are various WRAPPER options for handling scalar or multi-value results. Both these functions have options for handling EMPTY and ERROR conditions. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru	2022-03-29 16:57:13 -04:00
Tom Lane	0bd7af082a	Invent recursive_worktable_factor GUC to replace hard-wired constant. Up to now, the planner estimated the size of a recursive query's worktable as 10 times the size of the non-recursive term. It's hard to see how to do significantly better than that automatically, but we can give users control over the multiplier to allow tuning for specific use-cases. The default behavior remains the same. Simon Riggs Discussion: https://postgr.es/m/CANbhV-EuaLm4H3g0+BSTYHEGxJj3Kht0R+rJ8vT57Dejnh=_nA@mail.gmail.com	2022-03-24 11:47:41 -04:00
Tom Lane	2591ee8ec4	Fix assorted missing logic for GroupingFunc nodes. The planner needs to treat GroupingFunc like Aggref for many purposes, in particular with respect to processing of the argument expressions, which are not to be evaluated at runtime. A few places hadn't gotten that memo, notably including subselect.c's processing of outer-level aggregates. This resulted in assertion failures or wrong plans for cases in which a GROUPING() construct references an outer aggregation level. Also fix missing special cases for GroupingFunc in cost_qual_eval (resulting in wrong cost estimates for GROUPING(), although it's not clear that that would affect plan shapes in practice) and in ruleutils.c (resulting in excess parentheses in pretty-print mode). Per bug #17088 from Yaoguang Chen. Back-patch to all supported branches. Richard Guo, Tom Lane Discussion: https://postgr.es/m/17088-e33882b387de7f5c@postgresql.org	2022-03-21 17:44:29 -04:00
Tomas Vondra	7b65862e22	Correct type of front_pathkey to PathKey In sort_inner_and_outer we iterate a list of PathKey elements, but the variable is declared as (List *). This mistake is benign, because we only pass the pointer to lcons() and never dereference it. This exists since ~2004, but it's confusing. So fix and backpatch to all supported branches. Backpatch-through: 10 Discussion: https://postgr.es/m/bf3a6ea1-a7d8-7211-0669-189d5c169374%40enterprisedb.com	2022-01-23 03:53:18 +01:00
Tomas Vondra	6b94e7a6da	Consider fractional paths in generate_orderedappend_paths When building append paths, we've been looking only at startup and total costs for the paths. When building fractional paths that may eliminate the cheapest one, because it may be dominated by two separate paths (one for startup, one for total cost). This extends generate_orderedappend_paths() to also consider which paths have lowest fractional cost. Currently we only consider paths matching pathkeys - in the future this may be improved by also considering paths that are only partially sorted, with an incremental sort on top. Original report of an issue by Arne Roland, patch by me (based on a suggestion by Tom Lane). Reviewed-by: Arne Roland, Zhihong Yu Discussion: https://postgr.es/m/e8f9ec90-546d-e948-acce-0525f3e92773%40enterprisedb.com Discussion: https://postgr.es/m/1581042da8044e71ada2d6e3a51bf7bb%40index.de	2022-01-12 22:27:24 +01:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
Tom Lane	8a2e323f20	Handle mixed returnable and non-returnable columns better in IOS. We can revert the code changes of commit `b5febc1d1` now, because commit `9a3ddeb51` installed a real solution for the difficulty that `b5febc1d1` just dodged, namely that the planner might pick the wrong one of several index columns nominally containing the same value. It only matters which one we pick if we pick one that's not returnable, and that mistake is now foreclosed. Although both of the aforementioned commits were back-patched, I don't feel a need to take any risk by back-patching this one. The cases that it improves are very corner-ish. Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us	2022-01-03 16:12:11 -05:00
David Rowley	e502150f7d	Allow Memoize to operate in binary comparison mode Memoize would always use the hash equality operator for the cache key types to determine if the current set of parameters were the same as some previously cached set. Certain types such as floating points where -0.0 and +0.0 differ in their binary representation but are classed as equal by the hash equality operator may cause problems as unless the join uses the same operator it's possible that whichever join operator is being used would be able to distinguish the two values. In which case we may accidentally return in the incorrect rows out of the cache. To fix this here we add a binary mode to Memoize to allow it to the current set of parameters to previously cached values by comparing bit-by-bit rather than logically using the hash equality operator. This binary mode is always used for LATERAL joins and it's used for normal joins when any of the join operators are not hashable. Reported-by: Tom Lane Author: David Rowley Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us Backpatch-through: 14, where Memoize was added	2021-11-24 10:06:59 +13:00
David Rowley	39a3105678	Fix incorrect hash equality operator bug in Memoize In v14, because we don't have a field in RestrictInfo to cache both the left and right type's hash equality operator, we just restrict the scope of Memoize to only when the left and right types of a RestrictInfo are the same. In master we add another field to RestrictInfo and cache both hash equality operators. Reported-by: Jaime Casanova Author: David Rowley Discussion: https://postgr.es/m/20210929185544.GB24346%40ahch-to Backpatch-through: 14	2021-11-08 14:40:33 +13:00
Etsuro Fujita	700c733128	Add missing word to comment in joinrels.c. Author: Amit Langote Backpatch-through: 13 Discussion: https://postgr.es/m/CA%2BHiwqGQNbtamQ_9DU3osR1XiWR4wxWFZurPmN6zgbdSZDeWmw%40mail.gmail.com	2021-10-07 17:45:00 +09:00
Tom Lane	e3ec3c00d8	Remove arbitrary 64K-or-so limit on rangetable size. Up to now the size of a query's rangetable has been limited by the constants INNER_VAR et al, which mustn't be equal to any real rangetable index. 65000 doubtless seemed like enough for anybody, and it still is orders of magnitude larger than the number of joins we can realistically handle. However, we need a rangetable entry for each child partition that is (or might be) processed by a query. Queries with a few thousand partitions are getting more realistic, so that the day when that limit becomes a problem is in sight, even if it's not here yet. Hence, let's raise the limit. Rather than just increase the values of INNER_VAR et al, this patch adopts the approach of making them small negative values, so that rangetables could theoretically become as long as INT_MAX. The bulk of the patch is concerned with changing Var.varno and some related variables from "Index" (unsigned int) to plain "int". This is basically cosmetic, with little actual effect other than to help debuggers print their values nicely. As such, I've only bothered with changing places that could actually see INNER_VAR et al, which the parser and most of the planner don't. We do have to be careful in places that are performing less/greater comparisons on varnos, but there are very few such places, other than the IS_SPECIAL_VARNO macro itself. A notable side effect of this patch is that while it used to be possible to add INNER_VAR et al to a Bitmapset, that will now draw an error. I don't see any likelihood that it wouldn't be a bug to include these fake varnos in a bitmapset of real varnos, so I think this is all to the good. Although this touches outfuncs/readfuncs, I don't think a catversion bump is required, since stored rules would never contain Vars with these fake varnos. Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru	2021-09-15 14:11:21 -04:00
Peter Eisentraut	18fea737b5	Change NestPath node to contain JoinPath node This makes the structure of all JoinPath-derived nodes the same, independent of whether they have additional fields. Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com	2021-08-08 18:46:34 +02:00
David Rowley	db632fbca3	Allow ordered partition scans in more cases `959d00e9d` added the ability to make use of an Append node instead of a MergeAppend when we wanted to perform a scan of a partitioned table and the required sort order was the same as the partitioned keys and the partitioned table was defined in such a way that earlier partitions were guaranteed to only contain lower-order values than later partitions. However, previously we didn't allow these ordered partition scans for LIST partitioned table when there were any partitions that allowed multiple Datums. This was a very cheap check to make and we could likely have done a little better by checking if there were interleaved partitions, but at the time we didn't have visibility about which partitions were pruned, so we still may have disallowed cases where all interleaved partitions were pruned. Since `475dbd0b7`, we now have knowledge of pruned partitions, we can do a much better job inside partitions_are_ordered(). Here we pass which partitions survived partition pruning into partitions_are_ordered() and, for LIST partitioning, have it check to see if any live partitions exist that are also in the new "interleaved_parts" field defined in PartitionBoundInfo. For RANGE partitioning we can relax the code which caused the partitions to be unordered if a DEFAULT partition existed. Since we now know which partitions were pruned, partitions_are_ordered() now returns true when the DEFAULT partition was pruned. Reviewed-by: Amit Langote, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvrdoN_sXU52i=QDXe2k3WAo=EVry29r2+Tq2WYcn2xhEA@mail.gmail.com	2021-08-03 12:25:52 +12:00
David Rowley	475dbd0b71	Track a Bitmapset of non-pruned partitions in RelOptInfo For partitioned tables with large numbers of partitions where queries are able to prune all but a very small number of partitions, the time spent in the planner looping over RelOptInfo.part_rels checking for non-NULL RelOptInfos could become a large portion of the overall planning time. Here we add a Bitmapset that records the non-pruned partitions. This allows us to more efficiently skip the pruned partitions by looping over the Bitmapset. This will cause a very slight slow down in cases where no or not many partitions could be pruned, however, those cases are already slow to plan anyway and the overhead of looping over the Bitmapset would be unmeasurable when compared with the other tasks such as path creation for a large number of partitions. Reviewed-by: Amit Langote, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com	2021-08-03 11:47:24 +12:00
Tom Lane	28d936031a	Get rid of artificial restriction on hash table sizes on Windows. The point of introducing the hash_mem_multiplier GUC was to let users reproduce the old behavior of hash aggregation, i.e. that it could use more than work_mem at need. However, the implementation failed to get the job done on Win64, where work_mem is clamped to 2GB to protect various places that calculate memory sizes using "long int". As written, the same clamp was applied to hash_mem. This resulted in severe performance regressions for queries requiring a bit more than 2GB for hash aggregation, as they now spill to disk and there's no way to stop that. Getting rid of the work_mem restriction seems like a good idea, but it's a big job and could not conceivably be back-patched. However, there's only a fairly small number of places that are concerned with the hash_mem value, and it turns out to be possible to remove the restriction there without too much code churn or any ABI breaks. So, let's do that for now to fix the regression, and leave the larger task for another day. This patch does introduce a bit more infrastructure that should help with the larger task, namely pg_bitutils.h support for working with size_t values. Per gripe from Laurent Hasson. Back-patch to v13 where the behavior change came in. Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com	2021-07-25 14:02:27 -04:00
David Rowley	83f4fcc655	Change the name of the Result Cache node to Memoize "Result Cache" was never a great name for this node, but nobody managed to come up with another name that anyone liked enough. That was until David Johnston mentioned "Node Memoization", which Tom Lane revised to just "Memoize". People seem to like "Memoize", so let's do the rename. Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us Backpatch-through: 14, where Result Cache was introduced	2021-07-14 12:43:58 +12:00
David Rowley	9ee91cc583	Fix typo in comment Author: James Coleman Discussion: https://postgr.es/m/CAAaqYe8f8ENA0i1PdBtUNWDd2sxHSMgscNYbjhaXMuAdfBrZcg@mail.gmail.com	2021-07-06 12:38:50 +12:00
David Rowley	99c5852e20	Add missing NULL check when building Result Cache paths Code added in `9e215378d` to disable building of Result Cache paths when not all join conditions are part of the parameterization of a unique join failed to first check if the inner path's param_info was set before checking the param_info's ppi_clauses. Add a check for NULL values here and just bail on trying to build the path if param_info is NULL. lateral_vars are not considered when deciding if the join is unique, so we're not missing out on doing the optimization when there are lateral_vars and no param_info. Reported-by: Coverity, via Tom Lane Discussion: https://postgr.es/m/457998.1621779290@sss.pgh.pa.us	2021-05-24 12:37:11 +12:00
David Rowley	9e215378d7	Fix planner's use of Result Cache with unique joins When the planner considered using a Result Cache node to cache results from the inner side of a Nested Loop Join, it failed to consider that the inner path's parameterization may not be the entire join condition. If the join was marked as inner_unique then we may accidentally put the cache in singlerow mode. This meant that entries would be marked as complete after caching the first row. That was wrong as if only part of the join condition was parameterized then the uniqueness of the unique join was not guaranteed at the Result Cache's level. The uniqueness is only guaranteed after Nested Loop applies the join filter. If subsequent rows were found, this would lead to: ERROR: cache entry already complete This could have been fixed by only putting the cache in singlerow mode if the entire join condition was parameterized. However, Nested Loop will only read its inner side so far as the first matching row when the join is unique, so that might mean we never get an opportunity to mark cache entries as complete. Since non-complete cache entries are useless for subsequent lookups, we just don't bother considering a Result Cache path in this case. In passing, remove the XXX comment that claimed the above ERROR might be better suited to be an Assert. After there being an actual case which triggered it, it seems better to keep it an ERROR. Reported-by: David Christensen Discussion: https://postgr.es/m/CAOxo6X+dy-V58iEPFgst8ahPKEU+38NZzUuc+a7wDBZd4TrHMQ@mail.gmail.com	2021-05-22 16:22:27 +12:00
Peter Eisentraut	544b28088f	doc: Improve hyphenation consistency	2021-04-21 08:14:43 +02:00
Tom Lane	7645376774	Rename find_em_expr_usable_for_sorting_rel. I didn't particularly like this function name, as it fails to express what's going on. Also, returning the sort expression alone isn't too helpful --- typically, a caller would also need some other fields of the EquivalenceMember. But the sole caller really only needs a bool result, so let's make it "bool relation_can_be_sorted_early()". Discussion: https://postgr.es/m/91f3ec99-85a4-fa55-ea74-33f85a5c651f@swarm64.com	2021-04-20 11:37:36 -04:00
Tom Lane	3753982441	Fix planner failure in some cases of sorting by an aggregate. An oversight introduced by the incremental-sort patches caused "could not find pathkey item to sort" errors in some situations where a sort key involves an aggregate or window function. The basic problem here is that find_em_expr_usable_for_sorting_rel isn't properly modeling what prepare_sort_from_pathkeys will do later. Rather than hoping we can keep those functions in sync, let's refactor so that they actually share the code for identifying a suitable sort expression. With this refactoring, tlist.c's tlist_member_ignore_relabel is unused. I removed it in HEAD but left it in place in v13, in case any extensions are using it. Per report from Luc Vlaming. Back-patch to v13 where the problem arose. James Coleman and Tom Lane Discussion: https://postgr.es/m/91f3ec99-85a4-fa55-ea74-33f85a5c651f@swarm64.com	2021-04-20 11:32:02 -04:00
Tom Lane	e1623b7d86	Fix obsolete comments referencing JoinPathExtraData.extra_lateral_rels. That field went away in commit `edca44b15`, but it seems that commit `45be99f8c` re-introduced some comments mentioning it. Noted by James Coleman, though this isn't exactly his proposed new wording. Also thanks to Justin Pryzby for software archaeology. Discussion: https://postgr.es/m/CAAaqYe8fxZjq3na+XkNx4C78gDqykH-7dbnzygm9Qa9nuDTePg@mail.gmail.com	2021-04-14 14:28:24 -04:00
David Rowley	50e17ad281	Speedup ScalarArrayOpExpr evaluation ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand side have traditionally been evaluated by using a linear search over the array. When these arrays contain large numbers of elements then this linear search could become a significant part of execution time. Here we add a new method of evaluating ScalarArrayOpExpr expressions to allow them to be evaluated by first building a hash table containing each element, then on subsequent evaluations, we just probe that hash table to determine if there is a match. The planner is in charge of determining when this optimization is possible and it enables it by setting hashfuncid in the ScalarArrayOpExpr. The executor will only perform the hash table evaluation when the hashfuncid is set. This means that not all cases are optimized. For example CHECK constraints containing an IN clause won't go through the planner, so won't get the hashfuncid set. We could maybe do something about that at some later date. The reason we're not doing it now is from fear that we may slow down cases where the expression is evaluated only once. Those cases can be common, for example, a single row INSERT to a table with a CHECK constraint containing an IN clause. In the planner, we enable this when there are suitable hash functions for the ScalarArrayOpExpr's operator and only when there is at least MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array. The threshold is currently set to 9. Author: James Coleman, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com	2021-04-08 23:51:22 +12:00
David Rowley	9eacee2e62	Add Result Cache executor node (take 2) Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com	2021-04-02 14:10:56 +13:00
David Rowley	28b3e3905c	Revert `b6002a796` This removes "Add Result Cache executor node". It seems that something weird is going on with the tracking of cache hits and misses as highlighted by many buildfarm animals. It's not yet clear what the problem is as other parts of the plan indicate that the cache did work correctly, it's just the hits and misses that were being reported as 0. This is especially a bad time to have the buildfarm so broken, so reverting before too many more animals go red. Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com	2021-04-01 13:33:23 +13:00

1 2 3 4 5 ...

1124 commits