Commit graph

2393 commits

Author SHA1 Message Date
Tom Lane
445b9020c9 Fix incorrect tests for SRFs in relation_can_be_sorted_early().
Commit fac1b470a thought we could check for set-returning functions
by testing only the top-level node in an expression tree.  This is
wrong in itself, and to make matters worse it encouraged others
to make the same mistake, by exporting tlist.c's special-purpose
IS_SRF_CALL() as a widely-visible macro.  I can't find any evidence
that anyone's taken the bait, but it was only a matter of time.

Use expression_returns_set() instead, and stuff the IS_SRF_CALL()
genie back in its bottle, this time with a warning label.  I also
added a couple of cross-reference comments.

After a fair amount of fooling around, I've despaired of making
a robust test case that exposes the bug reliably, so no test case
here.  (Note that the test case added by fac1b470a is itself
broken, in that it doesn't notice if you remove the code change.
The repro given by the bug submitter currently doesn't fail either
in v15 or HEAD, though I suspect that may indicate an unrelated bug.)

Per bug #17564 from Martijn van Oosterhout.  Back-patch to v13,
as the faulty patch was.

Discussion: https://postgr.es/m/17564-c7472c2f90ef2da3@postgresql.org
2022-08-03 17:33:42 -04:00
David Rowley
23c2b76a83 Fix incorrect row estimates used for Memoize costing
In order to estimate the cache hit ratio of a Memoize node, one of the
inputs we require is the estimated number of times the Memoize node will
be rescanned.  The higher this number, the large the cache hit ratio is
likely to become.  Unfortunately, the value being passed as the number of
"calls" to the Memoize was incorrectly using the Nested Loop's
outer_path->parent->rows instead of outer_path->rows.  This failed to
account for the fact that the outer_path might be parameterized by some
upper-level Nested Loop.

This problem could lead to Memoize plans appearing more favorable than
they might actually be.  It could also lead to extended executor startup
times when work_mem values were large due to the planner setting overly
large MemoizePath->est_entries resulting in the Memoize hash table being
initially made much larger than might be required.

Fix this simply by passing outer_path->rows rather than
outer_path->parent->rows.  Also, adjust the expected regression test
output for a plan change.

Reported-by: Pavel Stehule
Author: David Rowley
Discussion: https://postgr.es/m/CAFj8pRAMp%3DQsMi6sPQJ4W3hczoFJRvyXHJV3AZAZaMyTVM312Q%40mail.gmail.com
Backpatch-through: 14, where Memoize was introduced
2022-05-16 16:08:37 +12:00
Tom Lane
ac51c9fba5 Make pull_var_clause() handle GroupingFuncs exactly like Aggrefs.
This follows in the footsteps of commit 2591ee8ec by removing one more
ill-advised shortcut from planning of GroupingFuncs.  It's true that
we don't intend to execute the argument expression(s) at runtime, but
we still have to process any Vars appearing within them, or we risk
failure at setrefs.c time (or more fundamentally, in EXPLAIN trying
to print such an expression).  Vars in upper plan nodes have to have
referents in the next plan level, whether we ever execute 'em or not.

Per bug #17479 from Michael J. Sullivan.  Back-patch to all supported
branches.

Richard Guo

Discussion: https://postgr.es/m/17479-6260deceaf0ad304@postgresql.org
2022-05-12 11:31:46 -04:00
Etsuro Fujita
ebb7902415 Disable asynchronous execution if using gating Result nodes.
mark_async_capable_plan(), which is called from create_append_plan() to
determine whether subplans are async-capable, failed to take into
account that the given subplan created from a given subpath might
include a gating Result node if the subpath is a SubqueryScanPath or
ForeignPath, causing a segmentation fault there when the subplan created
from a SubqueryScanPath includes the Result node, or causing
ExecAsyncRequest() to throw an error about an unrecognized node type
when the subplan created from a ForeignPath includes the Result node,
because in the latter case the Result node was unintentionally
considered as async-capable, but we don't currently support executing
Result nodes asynchronously.  Fix by modifying mark_async_capable_plan()
to disable asynchronous execution in such cases.  Also, adjust code in
the ProjectionPath case in mark_async_capable_plan(), for consistency
with other cases, and adjust/improve comments there.

is_async_capable_path() added in commit 27e1f1456, which was rewritten
to mark_async_capable_plan() in a later commit, has the same issue,
causing the error at execution mentioned above, so back-patch to v14
where the aforesaid commit went in.

Per report from Justin Pryzby.

Etsuro Fujita, reviewed by Zhihong Yu and Justin Pryzby.

Discussion: https://postgr.es/m/20220408124338.GK24419%40telsasoft.com
2022-04-28 15:15:02 +09:00
Tom Lane
da22ef388a Remove inadequate assertion check in CTE inlining.
inline_cte() expected to find exactly as many references to the
target CTE as its cterefcount indicates.  While that should be
accurate for the tree as emitted by the parser, there are some
optimizations that occur upstream of here that could falsify it,
notably removal of unused subquery output expressions.

Trying to make the accounting 100% accurate seems expensive and
doomed to future breakage.  It's not really worth it, because
all this code is protecting is downstream assumptions that every
referenced CTE has a plan.  Let's convert those assertions to
regular test-and-elog just in case there's some actual problem,
and then drop the failing assertion.

Per report from Tomas Vondra (thanks also to Richard Guo for
analysis).  Back-patch to v12 where the faulty code came in.

Discussion: https://postgr.es/m/29196a1e-ed47-c7ca-9be2-b1c636816183@enterprisedb.com
2022-04-21 17:58:52 -04:00
Tom Lane
48b6035f0f Fix assorted missing logic for GroupingFunc nodes.
The planner needs to treat GroupingFunc like Aggref for many purposes,
in particular with respect to processing of the argument expressions,
which are not to be evaluated at runtime.  A few places hadn't gotten
that memo, notably including subselect.c's processing of outer-level
aggregates.  This resulted in assertion failures or wrong plans for
cases in which a GROUPING() construct references an outer aggregation
level.

Also fix missing special cases for GroupingFunc in cost_qual_eval
(resulting in wrong cost estimates for GROUPING(), although it's
not clear that that would affect plan shapes in practice) and in
ruleutils.c (resulting in excess parentheses in pretty-print mode).

Per bug #17088 from Yaoguang Chen.  Back-patch to all supported
branches.

Richard Guo, Tom Lane

Discussion: https://postgr.es/m/17088-e33882b387de7f5c@postgresql.org
2022-03-21 17:44:29 -04:00
Tom Lane
277e744ae1 Don't use_physical_tlist for an IOS with non-returnable columns.
createplan.c tries to save a runtime projection step by specifying
a scan plan node's output as being exactly the table's columns, or
index's columns in the case of an index-only scan, if there is not a
reason to do otherwise.  This logic did not previously pay attention
to whether an index's columns are returnable.  That worked, sort of
accidentally, until commit 9a3ddeb51 taught setrefs.c to reject plans
that try to read a non-returnable column.  I have no desire to loosen
setrefs.c's new check, so instead adjust use_physical_tlist() to not
try to optimize this way when there are non-returnable column(s).

Per report from Ryan Kelly.  Like the previous patch, back-patch
to all supported branches.

Discussion: https://postgr.es/m/CAHUie24ddN+pDNw7fkhNrjrwAX=fXXfGZZEHhRuofV_N_ftaSg@mail.gmail.com
2022-02-11 15:23:52 -05:00
David Rowley
357ff66153 Consider parallel awareness when removing single-child Appends
8edd0e794 added some code to remove Append and MergeAppend nodes when they
contained a single child node.  As it turned out, this was unsafe to do
when the Append/MergeAppend was parallel_aware and the child node was not.
Removing the Append/MergeAppend, in this case, could lead to the child plan
being called multiple times by parallel workers when it was unsafe to do
so.

Here we fix this by just not removing the Append/MergeAppend when the
parallel_aware flag of the parent and child node don't match.

Reported-by: Yura Sokolov
Bug: #17335
Discussion: https://postgr.es/m/b59605fecb20ba9ea94e70ab60098c237c870628.camel%40postgrespro.ru
Backpatch-through: 12, where 8edd0e794 was first introduced
2022-01-25 21:14:27 +13:00
Tom Lane
ef4edf88df Suppress variable-set-but-not-used warning from clang 13.
In the normal configuration where GEQO_DEBUG isn't defined,
recent clang versions have started to complain that geqo_main.c
accumulates the edge_failures count but never does anything
with it.  As a minimal back-patchable fix, insert a void cast
to silence this warning.  (I'd speculated about ripping out the
GEQO_DEBUG logic altogether, but I don't think we'd wish to
back-patch that.)

Per recently-established project policy, this is a candidate
for back-patching into out-of-support branches: it suppresses
an annoying compiler warning but changes no behavior.  Hence,
back-patch all the way to 9.2.

Discussion: https://postgr.es/m/CA+hUKGLTSZQwES8VNPmWO9AO0wSeLt36OCPDAZTccT1h7Q7kTQ@mail.gmail.com
2022-01-23 11:09:19 -05:00
Tomas Vondra
72ac4d71b5 Correct type of front_pathkey to PathKey
In sort_inner_and_outer we iterate a list of PathKey elements, but the
variable is declared as (List *). This mistake is benign, because we
only pass the pointer to lcons() and never dereference it.

This exists since ~2004, but it's confusing. So fix and backpatch to all
supported branches.

Backpatch-through: 10
Discussion: https://postgr.es/m/bf3a6ea1-a7d8-7211-0669-189d5c169374%40enterprisedb.com
2022-01-23 04:05:08 +01:00
Tom Lane
17da9d4c28 Teach hash_ok_operator() that record_eq is only sometimes hashable.
The need for this was foreseen long ago, but when record_eq
actually became hashable (in commit 01e658fa7), we missed updating
this spot.

Per bug #17363 from Elvis Pranskevichus.  Back-patch to v14 where
the faulty commit came in.

Discussion: https://postgr.es/m/17363-f6d42fd0d726be02@postgresql.org
2022-01-16 16:39:26 -05:00
Tom Lane
d228af79d0 Fix index-only scan plans, take 2.
Commit 4ace45677 failed to fix the problem fully, because the
same issue of attempting to fetch a non-returnable index column
can occur when rechecking the indexqual after using a lossy index
operator.  Moreover, it broke EXPLAIN for such indexquals (which
indicates a gap in our test cases :-().

Revert the code changes of 4ace45677 in favor of adding a new field
to struct IndexOnlyScan, containing a version of the indexqual that
can be executed against the index-returned tuple without using any
non-returnable columns.  (The restrictions imposed by check_index_only
guarantee this is possible, although we may have to recompute indexed
expressions.)  Support construction of that during setrefs.c
processing by marking IndexOnlyScan.indextlist entries as resjunk
if they can't be returned, rather than removing them entirely.
(We could alternatively require setrefs.c to look up the IndexOptInfo
again, but abusing resjunk this way seems like a reasonably safe way
to avoid needing to do that.)

This solution isn't great from an API-stability standpoint: if there
are any extensions out there that build IndexOnlyScan structs directly,
they'll be broken in the next minor releases.  However, only a very
invasive extension would be likely to do such a thing.  There's no
change in the Path representation, so typical planner extensions
shouldn't have a problem.

As before, back-patch to all supported branches.

Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us
Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
2022-01-03 15:42:27 -05:00
Tom Lane
cabea571d1 Fix index-only scan plans when not all index columns can be returned.
If an index has both returnable and non-returnable columns, and one of
the non-returnable columns is an expression using a Var that is in a
returnable column, then a query returning that expression could result
in an index-only scan plan that attempts to read the non-returnable
column, instead of recomputing the expression from the returnable
column as intended.

To fix, redefine the "indextlist" list of an IndexOnlyScan plan node
as containing null Consts in place of any non-returnable columns.
This solves the problem by preventing setrefs.c from falsely matching
to such entries.  The executor is happy since it only cares about the
exposed types of the entries, and ruleutils.c doesn't care because a
correct plan won't reference those entries.  I considered some other
ways to prevent setrefs.c from doing the wrong thing, but this way
seems good since (a) it allows a very localized fix, (b) it makes
the indextlist structure more compact in many cases, and (c) the
indextlist is now a more faithful representation of what the index AM
will actually produce, viz. nulls for any non-returnable columns.

This is easier to hit since we introduced included columns, but it's
possible to construct failing examples without that, as per the
added regression test.  Hence, back-patch to all supported branches.

Per bug #17350 from Louis Jachiet.

Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
2022-01-01 16:12:03 -05:00
David Rowley
c2dc7b9e15 Flush Memoize cache when non-key parameters change, take 2
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 23:29:56 +13:00
David Rowley
7933bc0d13 Revert "Flush Memoize cache when non-key parameters change"
This reverts commit f94edb06ab.
2021-11-24 15:28:34 +13:00
David Rowley
f94edb06ab Flush Memoize cache when non-key parameters change
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 14:57:07 +13:00
David Rowley
6c32c09777 Allow Memoize to operate in binary comparison mode
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters were the same as some
previously cached set.  Certain types such as floating points where -0.0
and +0.0 differ in their binary representation but are classed as equal by
the hash equality operator may cause problems as unless the join uses the
same operator it's possible that whichever join operator is being used
would be able to distinguish the two values.  In which case we may
accidentally return in the incorrect rows out of the cache.

To fix this here we add a binary mode to Memoize to allow it to the
current set of parameters to previously cached values by comparing
bit-by-bit rather than logically using the hash equality operator.  This
binary mode is always used for LATERAL joins and it's used for normal
joins when any of the join operators are not hashable.

Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
2021-11-24 10:07:38 +13:00
David Rowley
1f194ed6c2 Fix incorrect hash equality operator bug in Memoize
In v14, because we don't have a field in RestrictInfo to cache both the
left and right type's hash equality operator, we just restrict the scope
of Memoize to only when the left and right types of a RestrictInfo are the
same.

In master we add another field to RestrictInfo and cache both hash
equality operators.

Reported-by: Jaime Casanova
Author: David Rowley
Discussion: https://postgr.es/m/20210929185544.GB24346%40ahch-to
Backpatch-through: 14
2021-11-08 14:41:13 +13:00
Tom Lane
08cfa5981e Avoid O(N^2) behavior in SyncPostCheckpoint().
As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive
list_delete_first() operations, since that would be expensive when
there are many files waiting to be unlinked.  This is a slightly
larger change than in those cases.  We have to keep the list state
valid for calls to AbsorbSyncRequests(), so it's necessary to invent a
"canceled" field instead of immediately deleting PendingUnlinkEntry
entries.  Also, because we might not be able to process all the
entries, we need a new list primitive list_delete_first_n().

list_delete_first_n() is almost list_copy_tail(), but it modifies the
input List instead of making a new copy.  I found a couple of existing
uses of the latter that could profitably use the new function.  (There
might be more, but the other callers look like they probably shouldn't
overwrite the input List.)

As before, back-patch to v13.

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-11-02 11:31:54 -04:00
Tom Lane
fd059ac2e4 Fix planner error with pulling up subquery expressions into function RTEs.
If a function-in-FROM laterally references the output of some sub-SELECT
earlier in the FROM clause, and we are able to flatten that sub-SELECT
into the outer query, the expression(s) copied into the function RTE
missed being processed by eval_const_expressions.  This'd lead to trouble
and probable crashes at execution if such expressions contained
named-argument function call syntax or functions with defaulted arguments.
The bug is masked if the query contains any explicit JOIN syntax, which
may help explain why we'd not noticed.

Per bug #17227 from Bernd Dorn.  This is an oversight in commit 7266d0997,
so back-patch to v13 where that came in.

Discussion: https://postgr.es/m/17227-5a28ed1512189fa4@postgresql.org
2021-10-14 12:43:43 -04:00
Etsuro Fujita
ef2e107f54 Add missing word to comment in joinrels.c.
Author: Amit Langote
Backpatch-through: 13
Discussion: https://postgr.es/m/CA%2BHiwqGQNbtamQ_9DU3osR1XiWR4wxWFZurPmN6zgbdSZDeWmw%40mail.gmail.com
2021-10-07 17:45:01 +09:00
Tom Lane
4d5b4483db Fix pull_varnos to cope with translated PlaceHolderVars.
Commit 55dc86eca changed pull_varnos to use (if possible) the associated
ph_eval_at for a PlaceHolderVar.  I missed a fine point though: we might
be looking at a PHV in the quals or tlist of a child appendrel, in which
case we need to compute a ph_eval_at value that's been translated in the
same way that the PHV itself has been (cf. adjust_appendrel_attrs).
Fortunately, enough info is available in the PlaceHolderInfo to make
such translation possible without additional outside data, so we don't
need another round of uglification of planner APIs.  This is a little
bit complicated, but since it's a hard-to-hit corner case, I'm not much
worried about adding cycles here.

Per report from Jaime Casanova.  Back-patch to v12, like the previous
commit.

Discussion: https://postgr.es/m/20210915230959.GB17635@ahch-to
2021-09-17 15:41:16 -04:00
Tom Lane
29aa0ce361 Fix planner error with multiple copies of an AlternativeSubPlan.
It's possible for us to copy an AlternativeSubPlan expression node
into multiple places, for example the scan quals of several
partition children.  Then it's possible that we choose a different
one of the alternatives as optimal in each place.  Commit 41efb8340
failed to consider this scenario, so its attempt to remove "unused"
subplans could remove subplans that were still used elsewhere.

Fix by delaying the removal logic until we've examined all the
AlternativeSubPlans in a given query level.  (This does assume that
AlternativeSubPlans couldn't get copied to other query levels, but
for the foreseeable future that's fine; cf qual_is_pushdown_safe.)

Per report from Rajkumar Raghuwanshi.  Back-patch to v14
where the faulty logic came in.

Discussion: https://postgr.es/m/CAKcux6==O3NNZC3bZ2prRYv3cjm3_Zw1GfzmOjEVqYN4jub2+Q@mail.gmail.com
2021-09-14 15:11:21 -04:00
Tom Lane
983d7033df Fix missed lock acquisition while inlining new-style SQL functions.
When starting to use a query parsetree loaded from the catalogs,
we must begin by applying AcquireRewriteLocks(), to obtain the same
relation locks that the parser would have gotten if the query were
entered interactively, and to do some other cleanup such as dealing
with later-dropped columns.  New-style SQL functions are just as
subject to this rule as other stored parsetrees; however, of the
places dealing with such functions, only init_sql_fcache had gotten
the memo.  In particular, if we successfully inlined a new-style
set-returning SQL function that contained any relation references,
we'd either get an assertion failure or attempt to use those
relation(s) sans locks.

I also added AcquireRewriteLocks calls to fmgr_sql_validator and
print_function_sqlbody.  Desultory experiments didn't demonstrate any
failures in those, but I suspect that I just didn't try hard enough.
Certainly we don't expect nearby code paths to operate without locks.

On the same logic of it-ought-to-have-the-same-effects-as-the-old-code,
call pg_rewrite_query() in fmgr_sql_validator, too.  It's possible
that neither code path there needs to bother with rewriting, but
doing the analysis to prove that is beyond my goals for today.

Per bug #17161 from Alexander Lakhin.

Discussion: https://postgr.es/m/17161-048a1cdff8422800@postgresql.org
2021-08-31 12:02:36 -04:00
David Rowley
dece64a941 Fix incorrect comment for get_agg_clause_costs
Adjust the header comment in get_agg_clause_costs so that it matches what
the function currently does.  No recursive searching has been done ever
since 0a2bc5d61.  It also does not determine the aggtranstype like the
comment claimed. That's all done in preprocess_aggref().
preprocess_aggref also now determines the numOrderedAggs, so remove the
mention that get_agg_clause_costs also calculates "counts".

Normally, since this is just an adjustment of a comment it might not be
worth back-patching, but since this code is new to PG14 and that version
is still in beta, then it seems worth having the comments match.

Discussion: https://postgr.es/m/CAApHDvrrGrTJFPELrjx0CnDtz9B7Jy2XYW3Z2BKifAWLSaJYwQ@mail.gmail.com
Backpatch-though: 14
2021-07-26 14:56:09 +12:00
Tom Lane
b154ee63bb Get rid of artificial restriction on hash table sizes on Windows.
The point of introducing the hash_mem_multiplier GUC was to let users
reproduce the old behavior of hash aggregation, i.e. that it could use
more than work_mem at need.  However, the implementation failed to get
the job done on Win64, where work_mem is clamped to 2GB to protect
various places that calculate memory sizes using "long int".  As
written, the same clamp was applied to hash_mem.  This resulted in
severe performance regressions for queries requiring a bit more than
2GB for hash aggregation, as they now spill to disk and there's no
way to stop that.

Getting rid of the work_mem restriction seems like a good idea, but
it's a big job and could not conceivably be back-patched.  However,
there's only a fairly small number of places that are concerned with
the hash_mem value, and it turns out to be possible to remove the
restriction there without too much code churn or any ABI breaks.
So, let's do that for now to fix the regression, and leave the
larger task for another day.

This patch does introduce a bit more infrastructure that should help
with the larger task, namely pg_bitutils.h support for working with
size_t values.

Per gripe from Laurent Hasson.  Back-patch to v13 where the
behavior change came in.

Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
2021-07-25 14:02:27 -04:00
David Rowley
47ca483644 Change the name of the Result Cache node to Memoize
"Result Cache" was never a great name for this node, but nobody managed
to come up with another name that anyone liked enough.  That was until
David Johnston mentioned "Node Memoization", which Tom Lane revised to
just "Memoize".  People seem to like "Memoize", so let's do the rename.

Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us
Backpatch-through: 14, where Result Cache was introduced
2021-07-14 12:45:00 +12:00
Tom Lane
1d98fdaed8 Avoid creating a RESULT RTE that's marked LATERAL.
Commit 7266d0997 added code to pull up simple constant function
results, converting the RTE_FUNCTION RTE to a dummy RTE_RESULT
RTE since it no longer need be scanned.  But I forgot to clear
the LATERAL flag if the RTE has it set.  If the function reduced
to a constant, it surely contains no lateral references so this
simplification is logically OK.  It's needed because various other
places will Assert that RESULT RTEs aren't LATERAL.

Per bug #17097 from Yaoguang Chen.  Back-patch to v13 where the
faulty code came in.

Discussion: https://postgr.es/m/17097-3372ef9f798fc94f@postgresql.org
2021-07-09 13:38:24 -04:00
Tom Lane
e56bce5d43 Reconsider the handling of procedure OUT parameters.
Commit 2453ea142 redefined pg_proc.proargtypes to include the types of
OUT parameters, for procedures only.  While that had some advantages
for implementing the SQL-spec behavior of DROP PROCEDURE, it was pretty
disastrous from a number of other perspectives.  Notably, since the
primary key of pg_proc is name + proargtypes, this made it possible to
have multiple procedures with identical names + input arguments and
differing output argument types.  That would make it impossible to call
any one of the procedures by writing just NULL (or "?", or any other
data-type-free notation) for the output argument(s).  The change also
seems likely to cause grave confusion for client applications that
examine pg_proc and expect the traditional definition of proargtypes.

Hence, revert the definition of proargtypes to what it was, and
undo a number of complications that had been added to support that.

To support the SQL-spec behavior of DROP PROCEDURE, when there are
no argmode markers in the command's parameter list, we perform the
lookup both ways (that is, matching against both proargtypes and
proallargtypes), succeeding if we get just one unique match.
In principle this could result in ambiguous-function failures
that would not happen when using only one of the two rules.
However, overloading of procedure names is thought to be a pretty
rare usage, so this shouldn't cause many problems in practice.
Postgres-specific code such as pg_dump can defend against any
possibility of such failures by being careful to specify argmodes
for all procedure arguments.

This also fixes a few other bugs in the area of CALL statements
with named parameters, and improves the documentation a little.

catversion bump forced because the representation of procedures
with OUT arguments changes.

Discussion: https://postgr.es/m/3742981.1621533210@sss.pgh.pa.us
2021-06-10 17:11:36 -04:00
Tom Lane
889592344c Fix planner's row-mark code for inheritance from a foreign table.
Commit 428b260f8 broke planning of cases where row marks are needed
(SELECT FOR UPDATE, etc) and one of the query's tables is a foreign
table that has regular table(s) as inheritance children.  We got the
reverse case right, but apparently were thinking that foreign tables
couldn't be inheritance parents.  Not so; so we need to be able to
add a CTID junk column while adding a new child, not only a wholerow
junk column.

Back-patch to v12 where the faulty code came in.

Amit Langote

Discussion: https://postgr.es/m/CA+HiwqEmo3FV1LAQ4TVyS2h1WM=kMkZUmbNuZSCnfHvMcUcPeA@mail.gmail.com
2021-06-02 14:38:14 -04:00
Tom Lane
6ee41a301e Fix mis-planning of repeated application of a projection.
create_projection_plan contains a hidden assumption (here made
explicit by an Assert) that a projection-capable Path will yield a
projection-capable Plan.  Unfortunately, that assumption is violated
only a few lines away, by create_projection_plan itself.  This means
that two stacked ProjectionPaths can yield an outcome where we try to
jam the upper path's tlist into a non-projection-capable child node,
resulting in an invalid plan.

There isn't any good reason to have stacked ProjectionPaths; indeed the
whole concept is faulty, since the set of Vars/Aggs/etc needed by the
upper one wouldn't necessarily be available in the output of the lower
one, nor could the lower one create such values if they weren't
available from its input.  Hence, we can fix this by adjusting
create_projection_path to strip any top-level ProjectionPath from the
subpath it's given.  (This amounts to saying "oh, we changed our
minds about what we need to project here".)

The test case added here only fails in v13 and HEAD; before that, we
don't attempt to shove the Sort into the parallel part of the plan,
for reasons that aren't entirely clear to me.  However, all the
directly-related code looks generally the same as far back as v11,
where the hazard was introduced (by d7c19e62a).  So I've got no faith
that the same type of bug doesn't exist in v11 and v12, given the
right test case.  Hence, back-patch the code changes, but not the
irrelevant test case, into those branches.

Per report from Bas Poot.

Discussion: https://postgr.es/m/534fca83789c4a378c7de379e9067d4f@politie.nl
2021-05-31 12:03:00 -04:00
Tom Lane
e30e3fdea8 Fix use of uninitialized variable in inline_function().
Commit e717a9a18 introduced a code path that bypassed the call of
get_expr_result_type, which is not good because we need its rettupdesc
result to pass to check_sql_fn_retval.  We'd failed to notice right
away because the code path in which check_sql_fn_retval uses that
argument is fairly hard to reach in this context.  It's not impossible
though, and in any case inline_function would have no business
assuming that check_sql_fn_retval doesn't need that value.

To fix, move get_expr_result_type out of the if-block, which in
turn requires moving the construction of the dummy FuncExpr
out of it.

Per report from Ranier Vilela.  (I'm bemused by the lack of any
compiler complaints...)

Discussion: https://postgr.es/m/CAEudQAqBqQpQ3HruWAGU_7WaMJ7tntpk0T8k_dVtNB46DqdBgw@mail.gmail.com
2021-05-25 12:55:55 -04:00
David Rowley
cba5c70b95 Fix setrefs.c code for Result Cache nodes
Result Cache, added in 9eacee2e6 neglected to properly adjust the plan
references in setrefs.c.  This could lead to the following error during
EXPLAIN:

ERROR:  cannot decompile join alias var in plan tree

Fix that.

Bug: 17030
Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/17030-5844aecae42fe223@postgresql.org
2021-05-25 12:50:22 +12:00
David Rowley
99c5852e20 Add missing NULL check when building Result Cache paths
Code added in 9e215378d to disable building of Result Cache paths when
not all join conditions are part of the parameterization of a unique
join failed to first check if the inner path's param_info was set before
checking the param_info's ppi_clauses.

Add a check for NULL values here and just bail on trying to build the
path if param_info is NULL. lateral_vars are not considered when
deciding if the join is unique, so we're not missing out on doing the
optimization when there are lateral_vars and no param_info.

Reported-by: Coverity, via Tom Lane
Discussion: https://postgr.es/m/457998.1621779290@sss.pgh.pa.us
2021-05-24 12:37:11 +12:00
David Rowley
9e215378d7 Fix planner's use of Result Cache with unique joins
When the planner considered using a Result Cache node to cache results
from the inner side of a Nested Loop Join, it failed to consider that the
inner path's parameterization may not be the entire join condition.  If
the join was marked as inner_unique then we may accidentally put the cache
in singlerow mode.  This meant that entries would be marked as complete
after caching the first row.  That was wrong as if only part of the join
condition was parameterized then the uniqueness of the unique join was not
guaranteed at the Result Cache's level.  The uniqueness is only guaranteed
after Nested Loop applies the join filter.  If subsequent rows were found,
this would lead to:

ERROR: cache entry already complete

This could have been fixed by only putting the cache in singlerow mode if
the entire join condition was parameterized.  However, Nested Loop will
only read its inner side so far as the first matching row when the join is
unique, so that might mean we never get an opportunity to mark cache
entries as complete.  Since non-complete cache entries are useless for
subsequent lookups, we just don't bother considering a Result Cache path
in this case.

In passing, remove the XXX comment that claimed the above ERROR might be
better suited to be an Assert.  After there being an actual case which
triggered it, it seems better to keep it an ERROR.

Reported-by: David Christensen
Discussion: https://postgr.es/m/CAOxo6X+dy-V58iEPFgst8ahPKEU+38NZzUuc+a7wDBZd4TrHMQ@mail.gmail.com
2021-05-22 16:22:27 +12:00
Tom Lane
def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Tom Lane
049e1e2edb Fix mishandling of resjunk columns in ON CONFLICT ... UPDATE tlists.
It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE
list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present.
If it happens, the ON CONFLICT UPDATE code path would end up storing
tuples that include the values of the extra resjunk columns.  That's
fairly harmless in the short run, but if new columns are added to
the table then the values would become accessible, possibly leading
to malfunctions if they don't match the datatypes of the new columns.

This had escaped notice through a confluence of missing sanity checks,
including

* There's no cross-check that a tuple presented to heap_insert or
heap_update matches the table rowtype.  While it's difficult to
check that fully at reasonable cost, we can easily add assertions
that there aren't too many columns.

* The output-column-assignment cases in execExprInterp.c lacked
any sanity checks on the output column numbers, which seems like
an oversight considering there are plenty of assertion checks on
input column numbers.  Add assertions there too.

* We failed to apply nodeModifyTable's ExecCheckPlanOutput() to
the ON CONFLICT UPDATE tlist.  That wouldn't have caught this
specific error, since that function is chartered to ignore resjunk
columns; but it sure seems like a bad omission now that we've seen
this bug.

In HEAD, the right way to fix this is to make the processing of
ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists
now do, that is don't add "SET x = x" entries, and use
ExecBuildUpdateProjection to evaluate the tlist and combine it with
old values of the not-set columns.  This adds a little complication
to ExecBuildUpdateProjection, but allows removal of a comparable
amount of now-dead code from the planner.

In the back branches, the most expedient solution seems to be to
(a) use an output slot for the ON CONFLICT UPDATE projection that
actually matches the target table, and then (b) invent a variant of
ExecBuildProjectionInfo that can be told to not store values resulting
from resjunk columns, so it doesn't try to store into nonexistent
columns of the output slot.  (We can't simply ignore the resjunk columns
altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.)
This works back to v10.  In 9.6, projections work much differently and
we can't cheaply give them such an option.  The 9.6 version of this
patch works by inserting a JunkFilter when it's necessary to get rid
of resjunk columns.

In addition, v11 and up have the reverse problem when trying to
perform ON CONFLICT UPDATE on a partitioned table.  Through a
further oversight, adjust_partition_tlist() discarded resjunk columns
when re-ordering the ON CONFLICT UPDATE tlist to match a partition.
This accidentally prevented the storing-bogus-tuples problem, but
at the cost that MULTIEXPR_SUBLINK cases didn't work, typically
crashing if more than one row has to be updated.  Fix by preserving
resjunk columns in that routine.  (I failed to resist the temptation
to add more assertions there too, and to do some minor code
beautification.)

Per report from Andres Freund.  Back-patch to all supported branches.

Security: CVE-2021-32028
2021-05-10 11:02:29 -04:00
Thomas Munro
ec48314708 Revert per-index collation version tracking feature.
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose.  We're out of time for this release so we'll revert
and try again.

Commits reverted:

1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.

Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
2021-05-07 21:10:11 +12:00
Alvaro Herrera
8aba932251
Fix relcache inconsistency hazard in partition detach
During queries coming from ri_triggers.c, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition.  Which is bogus: once the detach operation
completes, the row becomes an orphan.

However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query.  This commit changes the
partdesc cache code to only keep descriptors that aren't dependent on
a snapshot (namely: those where no detached partition exist, and those
where detached partitions are included).  When a partdesc-without-
detached-partitions is requested, we create one afresh each time; also,
those partdescs are stored in PortalContext instead of
CacheMemoryContext.

find_inheritance_children gets a new output *detached_exist boolean,
which indicates whether any partition marked pending-detach is found.
Its "include_detached" input flag is changed to "omit_detached", because
that name captures desired the semantics more naturally.
CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are
identically renamed.

This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus.  This commit also corrects that expected output.

Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
2021-04-22 15:13:25 -04:00
Peter Eisentraut
544b28088f doc: Improve hyphenation consistency 2021-04-21 08:14:43 +02:00
Tom Lane
7645376774 Rename find_em_expr_usable_for_sorting_rel.
I didn't particularly like this function name, as it fails to
express what's going on.  Also, returning the sort expression
alone isn't too helpful --- typically, a caller would also
need some other fields of the EquivalenceMember.  But the
sole caller really only needs a bool result, so let's make
it "bool relation_can_be_sorted_early()".

Discussion: https://postgr.es/m/91f3ec99-85a4-fa55-ea74-33f85a5c651f@swarm64.com
2021-04-20 11:37:36 -04:00
Tom Lane
3753982441 Fix planner failure in some cases of sorting by an aggregate.
An oversight introduced by the incremental-sort patches caused
"could not find pathkey item to sort" errors in some situations
where a sort key involves an aggregate or window function.

The basic problem here is that find_em_expr_usable_for_sorting_rel
isn't properly modeling what prepare_sort_from_pathkeys will do
later.  Rather than hoping we can keep those functions in sync,
let's refactor so that they actually share the code for
identifying a suitable sort expression.

With this refactoring, tlist.c's tlist_member_ignore_relabel
is unused.  I removed it in HEAD but left it in place in v13,
in case any extensions are using it.

Per report from Luc Vlaming.  Back-patch to v13 where the
problem arose.

James Coleman and Tom Lane

Discussion: https://postgr.es/m/91f3ec99-85a4-fa55-ea74-33f85a5c651f@swarm64.com
2021-04-20 11:32:02 -04:00
Tom Lane
1111b2668d Undo decision to allow pg_proc.prosrc to be NULL.
Commit e717a9a18 changed the longstanding rule that prosrc is NOT NULL
because when a SQL-language function is written in SQL-standard style,
we don't currently have anything useful to put there.  This seems a poor
decision though, as it could easily have negative impacts on external
PLs (opening them to crashes they didn't use to have, for instance).
SQL-function-related code can just as easily test "is prosqlbody not
null" as "is prosrc null", so there's no real gain there either.
Hence, revert the NOT NULL marking removal and adjust related logic.

For now, we just put an empty string into prosrc for SQL-standard
functions.  Maybe we'll have a better idea later, although the
history of things like pg_attrdef.adsrc suggests that it's not
easy to maintain a string equivalent of a node tree.

This also adds an assertion that queryDesc->sourceText != NULL
to standard_ExecutorStart.  We'd been silently relying on that
for awhile, so let's make it less silent.

Also fix some overlooked documentation and test cases.

Discussion: https://postgr.es/m/2197698.1617984583@sss.pgh.pa.us
2021-04-15 17:17:20 -04:00
Tom Lane
e1623b7d86 Fix obsolete comments referencing JoinPathExtraData.extra_lateral_rels.
That field went away in commit edca44b15, but it seems that
commit 45be99f8c re-introduced some comments mentioning it.
Noted by James Coleman, though this isn't exactly his
proposed new wording.  Also thanks to Justin Pryzby for
software archaeology.

Discussion: https://postgr.es/m/CAAaqYe8fxZjq3na+XkNx4C78gDqykH-7dbnzygm9Qa9nuDTePg@mail.gmail.com
2021-04-14 14:28:24 -04:00
David Rowley
50e17ad281 Speedup ScalarArrayOpExpr evaluation
ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand
side have traditionally been evaluated by using a linear search over the
array.  When these arrays contain large numbers of elements then this
linear search could become a significant part of execution time.

Here we add a new method of evaluating ScalarArrayOpExpr expressions to
allow them to be evaluated by first building a hash table containing each
element, then on subsequent evaluations, we just probe that hash table to
determine if there is a match.

The planner is in charge of determining when this optimization is possible
and it enables it by setting hashfuncid in the ScalarArrayOpExpr.  The
executor will only perform the hash table evaluation when the hashfuncid
is set.

This means that not all cases are optimized. For example CHECK constraints
containing an IN clause won't go through the planner, so won't get the
hashfuncid set.  We could maybe do something about that at some later
date.  The reason we're not doing it now is from fear that we may slow
down cases where the expression is evaluated only once.  Those cases can
be common, for example, a single row INSERT to a table with a CHECK
constraint containing an IN clause.

In the planner, we enable this when there are suitable hash functions for
the ScalarArrayOpExpr's operator and only when there is at least
MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array.  The threshold is
currently set to 9.

Author: James Coleman, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com
2021-04-08 23:51:22 +12:00
Peter Eisentraut
e717a9a18b SQL-standard function body
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.

Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:

    CREATE FUNCTION add(a integer, b integer) RETURNS integer
        LANGUAGE SQL
        RETURN a + b;

or as a block

    CREATE PROCEDURE insert_data(a integer, b integer)
    LANGUAGE SQL
    BEGIN ATOMIC
      INSERT INTO tbl VALUES (a);
      INSERT INTO tbl VALUES (b);
    END;

The function body is parsed at function definition time and stored as
expression nodes in a new pg_proc column prosqlbody.  So at run time,
no further parsing is required.

However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.

Dependencies between the function and the objects it uses are fully
tracked.

A new RETURN statement is introduced.  This can only be used inside
function bodies.  Internally, it is treated much like a SELECT
statement.

psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.

Tested-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com
2021-04-07 21:47:55 +02:00
Peter Eisentraut
9c5f67fd62 Add support for NullIfExpr in eval_const_expressions
Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Discussion: https://www.postgresql.org/message-id/flat/7ea5ce773bbc4eea9ff1a381acd3b102@G08CNEXMBPEKD05.g08.fujitsu.local
2021-04-02 11:01:49 +02:00
David Rowley
9eacee2e62 Add Result Cache executor node (take 2)
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-02 14:10:56 +13:00
David Rowley
28b3e3905c Revert b6002a796
This removes "Add Result Cache executor node".  It seems that something
weird is going on with the tracking of cache hits and misses as
highlighted by many buildfarm animals.  It's not yet clear what the
problem is as other parts of the plan indicate that the cache did work
correctly, it's just the hits and misses that were being reported as 0.

This is especially a bad time to have the buildfarm so broken, so
reverting before too many more animals go red.

Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
2021-04-01 13:33:23 +13:00
David Rowley
b6002a796d Add Result Cache executor node
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-01 12:32:22 +13:00