Commit graph

1105 commits

Author SHA1 Message Date
Alvaro Herrera
3ad4c8615a
Avoid failure to open dropped detached partition
When a partition is detached and immediately dropped, a prepared
statement could try to compute a new partition descriptor that includes
it.  This leads to this kind of error:
ERROR:  could not open relation with OID 457639

Avoid this by skipping the partition in expand_partitioned_rtentry if it
doesn't exist.

Noted by me while investigating bug #18559.  Kuntal Ghosh helped to
identify the exact failure.

Backpatch to 14, where DETACH CONCURRENTLY was introduced.
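
A rough reproducer sketch (object names hypothetical; the actual
failure needs a concurrent window between the detach and the drop):

    CREATE TABLE p (a int) PARTITION BY LIST (a);
    CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
    PREPARE q AS SELECT * FROM p;
    ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY;
    DROP TABLE p1;
    EXECUTE q;  -- could fail: could not open relation with OID ...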

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/202408122233.bo4adt3vh5bi@alvherre.pgsql
2024-08-19 16:09:10 -04:00
Masahiko Sawada
72ef1675e2 Restrict accesses to non-system views and foreign tables during pg_dump.
When pg_dump retrieves the list of database objects and performs the
data dump, there was a possibility that objects could be replaced with
others of the same name, such as views, and then accessed.  This
vulnerability could result in code execution with superuser privileges
during the pg_dump process.

This issue can arise when dumping the data of sequences, foreign
tables (only 13 or later), or tables registered with a WHERE clause in
the extension configuration table.

To address this, pg_dump now utilizes the newly introduced
restrict_nonsystem_relation_kind GUC parameter to restrict the
accesses to non-system views and foreign tables during the dump
process. This new GUC parameter is added to back branches too, but
these changes do not require cluster recreation.

Back-patch to all supported branches.
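
For reference, the new parameter is set like any other GUC; a sketch
of what a restricted dump session looks like (both kinds may be
listed):

    SET restrict_nonsystem_relation_kind = 'view, foreign-table';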

Reviewed-by: Noah Misch
Security: CVE-2024-7348
Backpatch-through: 12
2024-08-05 06:05:23 -07:00
David Rowley
dae9f16aa3 Fix possible Assert failure in cost_memoize_rescan
In cost_memoize_rescan(), the hit_ratio is calculated using the calls
and ndistinct estimates.  If the value that was set in
MemoizePath.calls had not been processed through clamp_row_est(), it
could hold a non-integer value, which could result in ndistinct being
1 higher than calls due to estimate_num_groups() performing
clamp_row_est() on its input_rows.  This could result in hit_ratio
values slightly below 0.0, which would cause an Assert failure.

The value of MemoizePath.calls comes from the final parameter of the
create_memoize_path() function, of which there is only one true caller.
That caller passes outer_path->rows.  All the core code I looked at
always seems to call clamp_row_est() on the Path.rows, so there might
have been no issues with any core Paths causing troubles here.  The bug
report was about a CustomPath with a non-clamped row estimate.

The misbehavior as a result of this seems to be mostly limited to the
Assert() failing.  Aside from that, it seems the Memoize costs would
just come out slightly higher than they should have, which is likely
fairly harmless.

Reported-by: Kohei KaiGai <kaigai@heterodb.com>
Discussion: https://postgr.es/m/CAOP8fzZnTU+N64UYJYogb1hN-5hFP+PwTb3m_cnGAD7EsQwrKw@mail.gmail.com
Reviewed-by: Richard Guo
Backpatch-through: 14, where Memoize was introduced
2024-06-19 10:21:52 +12:00
Tom Lane
096f2132c6 Fix infer_arbiter_indexes() to not assume resultRelation is 1.
infer_arbiter_indexes failed to renumber varnos in index expressions
or predicates that it got from the catalogs.  This escaped detection
up to now because the stored varnos in such trees will be 1, and an
INSERT's result relation is usually the first rangetable entry,
so that was fine.  However, in cases such as inserting through
an updatable view, it's not fine, leading to failure to match the
expressions to the query with ensuing "there is no unique or exclusion
constraint matching the ON CONFLICT specification" errors.

Fix by copy-and-paste from get_relation_info().

Per bug #18502 from Michael Wang.  Back-patch to all supported
versions.
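
A hypothetical sketch of the failing shape, where the index
expression's stored varnos (always 1) no longer match the rewritten
query:

    CREATE TABLE t (a int, b text);
    CREATE UNIQUE INDEX ON t (lower(b));
    CREATE VIEW v AS SELECT * FROM t;
    INSERT INTO v VALUES (1, 'X')
      ON CONFLICT (lower(b)) DO UPDATE SET a = excluded.a;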

Discussion: https://postgr.es/m/18502-545b53f5b81e54e0@postgresql.org
2024-06-11 17:57:46 -04:00
Tom Lane
ab2402268c Fix generation of EC join conditions at the wrong plan level.
get_baserel_parampathinfo previously assumed without checking that
the results of generate_join_implied_equalities "necessarily satisfy
join_clause_is_movable_into".  This turns out to be wrong in the
presence of outer joins, because the generated clauses could include
Vars that mustn't be evaluated below a relevant outer join.  That
led to applying clauses at the wrong plan level and possibly getting
incorrect query results.  We must check each clause's nullable_relids,
and really the right thing to do is test join_clause_is_movable_into.

However, trying to fix it that way exposes an oversight in
equivclass.c: it wasn't careful about marking join clauses for
appendrel children with the correct clause_relids.  That caused the
modified get_baserel_parampathinfo code to reject some clauses it
still needs to accept.  (See parallel commit for HEAD/v16 for more
commentary about that.)

Per bug #18429 from Benoît Ryder.  This misbehavior existed for
a long time before commit 2489d76c4, so patch v12-v15 this way.

Discussion: https://postgr.es/m/18429-8982d4a348cc86c6@postgresql.org
2024-04-16 11:22:39 -04:00
Tom Lane
78e81e14db Fix type-checking of RECORD-returning functions in FROM, redux.
Commit 2ed8f9a01 intended to institute a policy that if a
RangeTblFunction has a coldeflist, then the function return type is
certainly RECORD, and we should use the coldeflist as the source of
truth about what the columns of the record type are.  When the
original function has been folded to a constant, inspection of the
constant might give a different answer.  This situation will lead to
a tuple-type-mismatch error at execution, but up until that point we
need to consistently believe the coldeflist, or we'll have problems
from different bits of code reaching different conclusions.

expandRTE didn't get that memo though, and would try to produce a
tupdesc based on the constant in this situation, leading to an
assertion failure.  (Desultory testing suggests that non-assert
builds often manage to give the expected error, although I also
saw a "cache lookup failed for type 0" error, and it seems at
least possible that a crash could happen.)

Some other callers of get_expr_result_type and get_expr_result_tupdesc
were also being incautious about this.  While none of them seem to
have actual bugs, they're working harder than necessary in this case,
besides which it seems safest to have an explicit policy of not using
those functions on an RTE with a coldeflist.  Adjust the code
accordingly, and add commentary to funcapi.c about this policy.

Also fix an obsolete comment that claimed "get_expr_result_type()
doesn't know how to extract type info from a RECORD constant".
That hasn't been true since commit d57534740.

Per bug #18422 from Alexander Lakhin.
As with the previous commit, back-patch to all supported branches.

Discussion: https://postgr.es/m/18422-89ca86c8eac5246d@postgresql.org
2024-04-15 12:56:56 -04:00
Tom Lane
649bbba113 Fix confusion about the return rowtype of SQL-language procedures.
There is a very ancient hack in check_sql_fn_retval that allows a
single SELECT targetlist entry of composite type to be taken as
supplying all the output columns of a function returning composite.
(This is grotty and fundamentally ambiguous, but it's really hard
to do nested composite-returning functions without it.)

As far as I know, that doesn't cause any problems in ordinary
functions.  It's disastrous for procedures however.  All procedures
that have any output parameters are labeled with prorettype RECORD,
and the CALL code expects it will get back a record with one column
per output parameter, regardless of whether any of those parameters
is composite.  Doing something else leads to an assertion failure
or core dump.

This is simple enough to fix: we just need to not apply that rule
when considering procedures.  However, that requires adding another
argument to check_sql_fn_retval, which at least in principle might be
getting called by external callers.  Therefore, in the back branches
convert check_sql_fn_retval into an ABI-preserving wrapper around a
new function check_sql_fn_retval_ext.

Per report from Yahor Yuzefovich.  This has been broken since we
implemented procedures, so back-patch to all supported branches.
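
A minimal sketch of the problematic shape (names hypothetical):

    CREATE TYPE two_ints AS (x int, y int);
    CREATE PROCEDURE p(OUT r two_ints)
      LANGUAGE SQL
      AS $$ SELECT ROW(1, 2)::two_ints $$;
    CALL p(NULL);  -- previously risked an assertion failure or crash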

Discussion: https://postgr.es/m/CABz5gWHSjj2df6uG0NRiDhZ_Uz=Y8t0FJP-_SVSsRsnrQT76Gg@mail.gmail.com
2024-03-12 18:16:10 -04:00
Alexander Korotkov
84cc1a5527 Backpatch missing check_stack_depth() to some recursive functions
Backpatch changes from d57b7cc333, 75bcba6cbd to all supported branches per
proposal of Egor Chindyaskin.

Discussion: https://postgr.es/m/DE5FD776-A8CD-4378-BCFA-3BF30F1F6D60%40mail.ru
2024-03-11 03:06:15 +02:00
Tom Lane
a595c3075f Fix type-checking of RECORD-returning functions in FROM.
In the corner case where a function returning RECORD has been
simplified to a RECORD constant or an inlined ROW() expression,
ExecInitFunctionScan failed to cross-check the function's result
rowtype against the coldeflist provided by the calling query.
That happened because get_expr_result_type is able to extract a
tupdesc from such expressions, which led ExecInitFunctionScan to
ignore the coldeflist.  (Instead, it used the extracted tupdesc
to check the function's output, which of course always succeeds.)

I have not been able to demonstrate any really serious consequences
from this, because if some column of the result is of the wrong
type and is directly referenced by a Var of the calling query,
CheckVarSlotCompatibility will catch it.  However, we definitely do
fail to report the case where the function returns more columns than
the coldeflist expects, and in the converse case where it returns
fewer columns, we get an assert failure (but, seemingly, no worse
results in non-assert builds).

To fix, always build the expected tupdesc from the coldeflist if there
is one, and consult get_expr_result_type only when there isn't one.

Also remove the failing Assert, even though it is no longer reached
after this fix.  It doesn't seem to be adding anything useful, since
later checking will deal with cases with the wrong number of columns.

The only other place I could find that is doing something similar
is inline_set_returning_function.  There's no live bug there because
we cannot be looking at a Const or RowExpr, but for consistency
change that code to agree with ExecInitFunctionScan.
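
A hypothetical sketch of the corner case, where constant-folding
replaces the function call and the coldeflist asks for a different
column count:

    CREATE FUNCTION f() RETURNS record
      LANGUAGE SQL IMMUTABLE
      AS $$ SELECT ROW(1, 2) $$;
    SELECT * FROM f() AS t(a int, b int, c int);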

Per report from PetSerAl.  After some debate I've concluded that
this should be back-patched.  There is a small risk that somebody
has been relying on such a case not throwing an error, but I judge
this outweighed by the risk that I've missed some way in which the
failure to cross-check has worse consequences than sketched above.

Discussion: https://postgr.es/m/CAKygsHSerA1eXsJHR9wft3Gn3wfHQ5RfP8XHBzF70_qcrrRvEg@mail.gmail.com
2024-03-06 14:41:13 -05:00
Tom Lane
cbeb455275 Avoid dangling-pointer problem with partitionwise joins under GEQO.
build_child_join_sjinfo creates a derived SpecialJoinInfo in
the short-lived GEQO context, but afterwards the semi_rhs_exprs
from that may be used in a UniquePath for a child base relation.
This breaks the expectation that all base-relation-level structures
are in the planning-lifespan context, leading to use of a dangling
pointer with probable ensuing crash later on in create_unique_plan.
To fix, copy the expression trees when making a UniquePath.

Per bug #18360 from Alexander Lakhin.  This has been broken since
partitionwise joins were added, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/18360-a23caf3157f34e62@postgresql.org
2024-02-23 15:21:53 -05:00
Tom Lane
e031995d5c Apply band-aid fix for an oversight in reparameterize_path_by_child.
The path we wish to reparameterize is not a standalone object:
in particular, it implicitly references baserestrictinfo clauses
in the associated RelOptInfo, and if it's a SampleScan path then
there is also the TableSampleClause in the RTE to worry about.
Both of those could contain lateral references to the join partner
relation, which would need to be modified to refer to its child.
Since we aren't doing that, affected queries can give wrong answers,
or odd failures such as "variable not found in subplan target list",
or executor crashes.  But we can't just summarily modify those
expressions, because they are shared with other paths for the rel.
We'd break things if we modify them and then end up using some
non-partitioned-join path.

In HEAD, we plan to fix this by postponing reparameterization
until create_plan(), when we know that those other paths are
no longer of interest, and then adjusting those expressions along
with the ones in the path itself.  That seems like too big a change
for stable branches however.  In the back branches, let's just detect
whether any troublesome lateral references actually exist in those
expressions, and fail reparameterization if so.  This will result in
not performing a partitioned join in such cases.  Given the lack of
field complaints, nobody's likely to miss the optimization.

Report and patch by Richard Guo.  Apply to 12-16 only, since
the intended fix for HEAD looks quite different.  We're not quite
ready to push the HEAD fix, but with back-branch releases coming
up soon, it seems wise to get this stopgap fix in place there.

Discussion: https://postgr.es/m/CAMbWs496+N=UAjOc=rcD3P7B6oJe4rZw08e_TZRUsWbPxZW3Tw@mail.gmail.com
2024-02-01 12:34:21 -05:00
Tom Lane
9bd0f74eac Ensure we preprocess expressions before checking their volatility.
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions).  Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.

Two actual bugs of this ilk are fixed here.  We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest.  Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism.  A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.
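
A sketch of how the volatility can hide in a default argument, which
only preprocessing expands (declaring f IMMUTABLE is itself wrong, but
that is how such cases arise; names hypothetical):

    CREATE FUNCTION f(a int, b float8 DEFAULT random())
      RETURNS float8 IMMUTABLE LANGUAGE SQL
      AS $$ SELECT a + b $$;
    -- f(a) looks immutable at top level but expands to f(a, random()):
    CREATE TABLE t (a int,
                    g float8 GENERATED ALWAYS AS (f(a)) STORED);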

In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after.  The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.

Per bug #18097 from Jim Keener.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
2023-11-16 10:05:14 -05:00
Tom Lane
8f4a6b9e4f Fix problems when a plain-inheritance parent table is excluded.
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not.  (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.)  In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:

1. The wrong table's statement-level triggers would get fired.

2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan.  This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.
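
A hypothetical sketch of such a case, where constraint exclusion can
remove the parent but not the child:

    CREATE TABLE parent (a int, CHECK (a > 0) NO INHERIT);
    CREATE TABLE child () INHERITS (parent);
    UPDATE parent SET a = a WHERE a <= 0;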

In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.

While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch.  Given the lack of field complaints about the
trigger issue, I'll desist for now.  (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)

Per bug #18147 from Hans Buschmann.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
2023-10-24 14:48:34 -04:00
Etsuro Fujita
b0e390e6d1 Disallow replacing joins with scans in problematic cases.
Commit e7cb7ee14, which introduced the infrastructure for FDWs and
custom scan providers to replace joins with scans, failed to add
support for handling pseudoconstant quals assigned to replaced joins in
createplan.c, leading to an incorrect plan without a gating Result node
when postgres_fdw replaced a join with such a qual.

To fix, we could add the support by 1) modifying the ForeignPath and
CustomPath structs to store the list of RestrictInfo nodes to apply to
the join, as in JoinPaths, if they represent foreign and custom scans
replacing a join with a scan, and by 2) modifying create_scan_plan() in
createplan.c to use that list in that case, instead of the
baserestrictinfo list, to get pseudoconstant quals assigned to the join;
but #1 would cause an ABI break.  So fix by modifying the infrastructure
to just disallow replacing joins with such quals.

Back-patch to all supported branches.

Reported by Nishant Sharma.  Patch by me, reviewed by Nishant Sharma and
Richard Guo.

Discussion: https://postgr.es/m/CADrsxdbcN1vejBaf8a%2BQhrZY5PXL-04mCd4GDu6qm6FigDZd6Q%40mail.gmail.com
2023-07-28 15:45:04 +09:00
Tom Lane
f8d799eda2 Handle RLS dependencies in inlined set-returning functions properly.
If an SRF in the FROM clause references a table having row-level
security policies, and we inline that SRF into the calling query,
we neglected to mark the plan as potentially dependent on which
role is executing it.  This could lead to later executions in the
same session returning or hiding rows that should have been hidden
or returned instead.
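
A minimal sketch of the affected shape (names hypothetical):

    CREATE TABLE t (a int);
    ALTER TABLE t ENABLE ROW LEVEL SECURITY;
    CREATE POLICY p ON t USING (a > 0);
    CREATE FUNCTION f() RETURNS SETOF t
      LANGUAGE SQL STABLE AS $$ SELECT * FROM t $$;
    SELECT * FROM f();  -- inlined; plan must be marked role-dependent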

Our thanks to Wolfgang Walther for reporting this problem.

Stephen Frost and Tom Lane

Security: CVE-2023-2455
2023-05-08 10:12:44 -04:00
Tom Lane
0dd55ef9b8 Fix parallel-safety marking when moving initplans to another node.
Our policy since commit ab77a5a45 has been that a plan node having
any initplans is automatically not parallel-safe.  (This could be
relaxed, but not today.)  clean_up_removed_plan_level neglected
this, and could attach initplans to a parallel-safe child plan
node without clearing the plan's parallel-safe flag.  That could
lead to "subplan was not initialized" errors at runtime, in case
an initplan referenced another one and only the referencing one
got transmitted to parallel workers.

The fix in clean_up_removed_plan_level is trivial enough.
materialize_finished_plan also moves initplans from one node
to another, but it's okay because it already copies the source
node's parallel_safe flag.  The other place that does this kind
of thing is standard_planner's hack to inject a top-level Gather
when debug_parallel_query is active.  But that's actually dead
code given that we're correctly enforcing the "initplans aren't
parallel safe" rule, so just replace it with an Assert that
there are no initplans.

Also improve some related comments.

Normally we'd add a regression test case for this sort of bug.
The mistake itself is already reached by existing tests, but there
is accidentally no visible problem.  The only known test case that
creates an actual failure seems too indirect and fragile to justify
keeping it as a regression test (not least because it fails to fail
in v11, though the bug is clearly present there too).

Per report from Justin Pryzby.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/ZDVt6MaNWkRDO1LQ@telsasoft.com
2023-04-12 10:46:30 -04:00
Tom Lane
8cd190e13a Fix calculation of which GENERATED columns need to be updated.
We were identifying the updatable generated columns of inheritance
children by transposing the calculation made for their parent.
However, there's nothing that says a traditional-inheritance child
can't have generated columns that aren't there in its parent, or that
have different dependencies than are in the parent's expression.
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)

Hence, drop the very-klugy-anyway "extraUpdatedCols" RTE field
in favor of identifying which generated columns depend on updated
columns during executor startup.  In HEAD we can remove
extraUpdatedCols altogether; in back branches, it's still there but
always empty.  Another difference between the HEAD and back-branch
versions of this patch is that in HEAD we can add the new bitmap field
to ResultRelInfo, but that would cause an ABI break in back branches.
Like 4b3e37993, add a List field at the end of struct EState instead.

Back-patch to v13.  The bogus calculation is also being made in v12,
but it doesn't have the same visible effect because we don't use it
to decide which generated columns to recalculate; as a consequence of
which the patch doesn't apply easily.  I think that there might still
be a demonstrable bug associated with trigger firing conditions, but
that's such a weird corner-case usage that I'm content to leave it
unfixed in v12.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqFshLKNvQUd1DgwJ-7tsTp=dwv7KZqXC4j2wYBV1aCDUA@mail.gmail.com
Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
2023-01-05 14:12:17 -05:00
Michael Paquier
e373e5578b Fix typos in comments, code and documentation
While at it, newlines are removed from the end of two elog() strings.
The others are simple grammar mistakes.  One comment in pg_upgrade
referred incorrectly to sequences since a7e5457.

Author: Justin Pryzby
Discussion: https://postgr.es/m/20221230231257.GI1153@telsasoft.com
Backpatch-through: 11
2023-01-03 16:26:30 +09:00
Tom Lane
dc3648f65b Fix Memoize to work with partitionwise joining.
A couple of places weren't up to speed for this.  By sheer good
luck, we didn't fail but just selected a non-memoized join plan,
at least in the test case we have.  Nonetheless, it's a bug,
and I'm not quite sure that it couldn't have worse consequences
in other examples.  So back-patch to v14 where Memoize came in.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs48GkNom272sfp0-WeD6_0HSR19BJ4H1c9ZKSfbVnJsvRg@mail.gmail.com
2022-12-05 12:36:41 -05:00
Tom Lane
ce093aa180 Fix broken MemoizePath support in reparameterize_path().
It neglected to recurse to the subpath, meaning you'd get back
a path identical to the input.  This could produce wrong query
results if the omission meant that the subpath fails to enforce
some join clause it should be enforcing.  We don't have a test
case for this at the moment, but the code is obviously broken
and the fix is equally obvious.  Back-patch to v14 where
Memoize was introduced.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs4_R=ORpz=Lkn2q3ebPC5EuWyfZF+tmfCPVLBVK5W39mHA@mail.gmail.com
2022-12-04 13:48:12 -05:00
Tom Lane
8c611602bd Guard against table-AM-less relations in planner.
The executor will dump core if it's asked to execute a seqscan on
a relation having no table AM, such as a view.  While that shouldn't
really happen, it's possible to get there via catalog corruption,
such as a missing ON SELECT rule.  It seems worth installing a defense
against that.  There are multiple plausible places for such a defense,
but I picked the planner's get_relation_info().

Per discussion of bug #17646 from Kui Liu.  Back-patch to v12 where
the tableam APIs were introduced; in older versions you won't get a
SIGSEGV, so it seems less pressing.

Discussion: https://postgr.es/m/17646-70c93cfa40365776@postgresql.org
2022-10-17 11:35:23 -04:00
Tom Lane
88c947cb52 Suppress more variable-set-but-not-used warnings from clang 15.
Mop up assorted set-but-not-used warnings in the back branches.
This includes back-patching relevant fixes from commit 152c9f7b8
the rest of the way, but there are also several cases that did not
appear in HEAD.  Some of those we'd fixed in a retail way but not
back-patched, and others I think just got rewritten out of existence
during nearby refactoring.

While here, also back-patch b1980f6d0 (PL/Tcl: Fix compiler warnings
with Tcl 8.6) into 9.2, so that that branch compiles warning-free
with modern Tcl.

Per project policy, this is a candidate for back-patching into
out-of-support branches: it suppresses annoying compiler warnings
but changes no behavior.  Hence, back-patch all the way to 9.2.

Discussion: https://postgr.es/m/514615.1663615243@sss.pgh.pa.us
2022-09-21 13:52:38 -04:00
Tom Lane
445b9020c9 Fix incorrect tests for SRFs in relation_can_be_sorted_early().
Commit fac1b470a thought we could check for set-returning functions
by testing only the top-level node in an expression tree.  This is
wrong in itself, and to make matters worse it encouraged others
to make the same mistake, by exporting tlist.c's special-purpose
IS_SRF_CALL() as a widely-visible macro.  I can't find any evidence
that anyone's taken the bait, but it was only a matter of time.

Use expression_returns_set() instead, and stuff the IS_SRF_CALL()
genie back in its bottle, this time with a warning label.  I also
added a couple of cross-reference comments.

After a fair amount of fooling around, I've despaired of making
a robust test case that exposes the bug reliably, so no test case
here.  (Note that the test case added by fac1b470a is itself
broken, in that it doesn't notice if you remove the code change.
The repro given by the bug submitter currently doesn't fail either
in v15 or HEAD, though I suspect that may indicate an unrelated bug.)

Per bug #17564 from Martijn van Oosterhout.  Back-patch to v13,
as the faulty patch was.

Discussion: https://postgr.es/m/17564-c7472c2f90ef2da3@postgresql.org
2022-08-03 17:33:42 -04:00
Tom Lane
ac51c9fba5 Make pull_var_clause() handle GroupingFuncs exactly like Aggrefs.
This follows in the footsteps of commit 2591ee8ec by removing one more
ill-advised shortcut from planning of GroupingFuncs.  It's true that
we don't intend to execute the argument expression(s) at runtime, but
we still have to process any Vars appearing within them, or we risk
failure at setrefs.c time (or more fundamentally, in EXPLAIN trying
to print such an expression).  Vars in upper plan nodes have to have
referents in the next plan level, whether we ever execute 'em or not.

Per bug #17479 from Michael J. Sullivan.  Back-patch to all supported
branches.

Richard Guo

Discussion: https://postgr.es/m/17479-6260deceaf0ad304@postgresql.org
2022-05-12 11:31:46 -04:00
David Rowley
c2dc7b9e15 Flush Memoize cache when non-key parameters change, take 2
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 23:29:56 +13:00
David Rowley
7933bc0d13 Revert "Flush Memoize cache when non-key parameters change"
This reverts commit f94edb06ab.
2021-11-24 15:28:34 +13:00
David Rowley
f94edb06ab Flush Memoize cache when non-key parameters change
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node.  If this parameter changes then cache entries
may become out-dated due to the new parameter value.

Previously Memoize was mistakenly not aware of this.  We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.

Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
2021-11-24 14:57:07 +13:00
David Rowley
6c32c09777 Allow Memoize to operate in binary comparison mode
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters was the same as
some previously cached set.  Certain types, such as floating point,
where -0.0 and +0.0 differ in their binary representation but are
classed as equal by the hash equality operator, may cause problems:
unless the join uses that same operator, whichever join operator is
being used might be able to distinguish the two values, in which case
we could accidentally return incorrect rows out of the cache.

To fix this, we add a binary mode to Memoize that allows it to compare
the current set of parameters to previously cached values bit-by-bit,
rather than logically using the hash equality operator.  This binary
mode is always used for LATERAL joins, and it's used for normal joins
when any of the join operators are not hashable.
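
A quick illustration of the hazard, using float8send to expose the
binary representation:

    SELECT float8 '-0.0' = float8 '+0.0';                         -- true
    SELECT float8send(float8 '-0.0') = float8send(float8 '+0.0'); -- false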

Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
2021-11-24 10:07:38 +13:00
Tom Lane
08cfa5981e Avoid O(N^2) behavior in SyncPostCheckpoint().
As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive
list_delete_first() operations, since that would be expensive when
there are many files waiting to be unlinked.  This is a slightly
larger change than in those cases.  We have to keep the list state
valid for calls to AbsorbSyncRequests(), so it's necessary to invent a
"canceled" field instead of immediately deleting PendingUnlinkEntry
entries.  Also, because we might not be able to process all the
entries, we need a new list primitive list_delete_first_n().

list_delete_first_n() is almost list_copy_tail(), but it modifies the
input List instead of making a new copy.  I found a couple of existing
uses of the latter that could profitably use the new function.  (There
might be more, but the other callers look like they probably shouldn't
overwrite the input List.)

As before, back-patch to v13.

Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
2021-11-02 11:31:54 -04:00
Tom Lane
4d5b4483db Fix pull_varnos to cope with translated PlaceHolderVars.
Commit 55dc86eca changed pull_varnos to use (if possible) the associated
ph_eval_at for a PlaceHolderVar.  I missed a fine point though: we might
be looking at a PHV in the quals or tlist of a child appendrel, in which
case we need to compute a ph_eval_at value that's been translated in the
same way that the PHV itself has been (cf. adjust_appendrel_attrs).
Fortunately, enough info is available in the PlaceHolderInfo to make
such translation possible without additional outside data, so we don't
need another round of uglification of planner APIs.  This is a little
bit complicated, but since it's a hard-to-hit corner case, I'm not much
worried about adding cycles here.

Per report from Jaime Casanova.  Back-patch to v12, like the previous
commit.

Discussion: https://postgr.es/m/20210915230959.GB17635@ahch-to
2021-09-17 15:41:16 -04:00
Tom Lane
983d7033df Fix missed lock acquisition while inlining new-style SQL functions.
When starting to use a query parsetree loaded from the catalogs,
we must begin by applying AcquireRewriteLocks(), to obtain the same
relation locks that the parser would have gotten if the query were
entered interactively, and to do some other cleanup such as dealing
with later-dropped columns.  New-style SQL functions are just as
subject to this rule as other stored parsetrees; however, of the
places dealing with such functions, only init_sql_fcache had gotten
the memo.  In particular, if we successfully inlined a new-style
set-returning SQL function that contained any relation references,
we'd either get an assertion failure or attempt to use those
relation(s) sans locks.

I also added AcquireRewriteLocks calls to fmgr_sql_validator and
print_function_sqlbody.  Desultory experiments didn't demonstrate any
failures in those, but I suspect that I just didn't try hard enough.
Certainly we don't expect nearby code paths to operate without locks.

On the same logic of it-ought-to-have-the-same-effects-as-the-old-code,
call pg_rewrite_query() in fmgr_sql_validator, too.  It's possible
that neither code path there needs to bother with rewriting, but
doing the analysis to prove that is beyond my goals for today.
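
A minimal sketch of the inlining case that was missing locks (names
hypothetical):

    CREATE TABLE t (a int);
    CREATE FUNCTION f() RETURNS SETOF t
      LANGUAGE SQL
      BEGIN ATOMIC
        SELECT * FROM t;
      END;
    SELECT * FROM f();  -- inlining this must still lock t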

Per bug #17161 from Alexander Lakhin.

Discussion: https://postgr.es/m/17161-048a1cdff8422800@postgresql.org
2021-08-31 12:02:36 -04:00
Tom Lane
b154ee63bb Get rid of artificial restriction on hash table sizes on Windows.
The point of introducing the hash_mem_multiplier GUC was to let users
reproduce the old behavior of hash aggregation, i.e. that it could use
more than work_mem at need.  However, the implementation failed to get
the job done on Win64, where work_mem is clamped to 2GB to protect
various places that calculate memory sizes using "long int".  As
written, the same clamp was applied to hash_mem.  This resulted in
severe performance regressions for queries requiring a bit more than
2GB for hash aggregation, as they now spill to disk and there's no
way to stop that.

Getting rid of the work_mem restriction seems like a good idea, but
it's a big job and could not conceivably be back-patched.  However,
there's only a fairly small number of places that are concerned with
the hash_mem value, and it turns out to be possible to remove the
restriction there without too much code churn or any ABI breaks.
So, let's do that for now to fix the regression, and leave the
larger task for another day.

This patch does introduce a bit more infrastructure that should help
with the larger task, namely pg_bitutils.h support for working with
size_t values.
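
For reference, hash_mem is the product of these two settings; after
this fix that product is no longer clamped to 2GB on Win64:

    SET work_mem = '2GB';
    SET hash_mem_multiplier = 2.0;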

Per gripe from Laurent Hasson.  Back-patch to v13 where the
behavior change came in.

Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
2021-07-25 14:02:27 -04:00
David Rowley
47ca483644 Change the name of the Result Cache node to Memoize
"Result Cache" was never a great name for this node, but nobody managed
to come up with another name that anyone liked enough.  That was until
David Johnston mentioned "Node Memoization", which Tom Lane revised to
just "Memoize".  People seem to like "Memoize", so let's do the rename.

Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us
Backpatch-through: 14, where Result Cache was introduced
2021-07-14 12:45:00 +12:00
Tom Lane
e56bce5d43 Reconsider the handling of procedure OUT parameters.
Commit 2453ea142 redefined pg_proc.proargtypes to include the types of
OUT parameters, for procedures only.  While that had some advantages
for implementing the SQL-spec behavior of DROP PROCEDURE, it was pretty
disastrous from a number of other perspectives.  Notably, since the
primary key of pg_proc is name + proargtypes, this made it possible to
have multiple procedures with identical names + input arguments and
differing output argument types.  That would make it impossible to call
any one of the procedures by writing just NULL (or "?", or any other
data-type-free notation) for the output argument(s).  The change also
seems likely to cause grave confusion for client applications that
examine pg_proc and expect the traditional definition of proargtypes.

Hence, revert the definition of proargtypes to what it was, and
undo a number of complications that had been added to support that.

To support the SQL-spec behavior of DROP PROCEDURE, when there are
no argmode markers in the command's parameter list, we perform the
lookup both ways (that is, matching against both proargtypes and
proallargtypes), succeeding if we get just one unique match.
In principle this could result in ambiguous-function failures
that would not happen when using only one of the two rules.
However, overloading of procedure names is thought to be a pretty
rare usage, so this shouldn't cause many problems in practice.
Postgres-specific code such as pg_dump can defend against any
possibility of such failures by being careful to specify argmodes
for all procedure arguments.
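
A sketch of the lookup behavior (names hypothetical):

    CREATE PROCEDURE p(IN a int, OUT b int)
      LANGUAGE SQL AS $$ SELECT a + 1 $$;
    CALL p(41, NULL);
    DROP PROCEDURE p(int);  -- matches proargtypes (input args only)
    -- or, unambiguous with explicit argmodes:
    -- DROP PROCEDURE p(IN int, OUT int);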

This also fixes a few other bugs in the area of CALL statements
with named parameters, and improves the documentation a little.

catversion bump forced because the representation of procedures
with OUT arguments changes.

Discussion: https://postgr.es/m/3742981.1621533210@sss.pgh.pa.us
2021-06-10 17:11:36 -04:00
Tom Lane
889592344c Fix planner's row-mark code for inheritance from a foreign table.
Commit 428b260f8 broke planning of cases where row marks are needed
(SELECT FOR UPDATE, etc) and one of the query's tables is a foreign
table that has regular table(s) as inheritance children.  We got the
reverse case right, but apparently were thinking that foreign tables
couldn't be inheritance parents.  Not so; so we need to be able to
add a CTID junk column while adding a new child, not only a wholerow
junk column.

Back-patch to v12 where the faulty code came in.

Amit Langote

Discussion: https://postgr.es/m/CA+HiwqEmo3FV1LAQ4TVyS2h1WM=kMkZUmbNuZSCnfHvMcUcPeA@mail.gmail.com
2021-06-02 14:38:14 -04:00
Tom Lane
6ee41a301e Fix mis-planning of repeated application of a projection.
create_projection_plan contains a hidden assumption (here made
explicit by an Assert) that a projection-capable Path will yield a
projection-capable Plan.  Unfortunately, that assumption is violated
only a few lines away, by create_projection_plan itself.  This means
that two stacked ProjectionPaths can yield an outcome where we try to
jam the upper path's tlist into a non-projection-capable child node,
resulting in an invalid plan.

There isn't any good reason to have stacked ProjectionPaths; indeed the
whole concept is faulty, since the set of Vars/Aggs/etc needed by the
upper one wouldn't necessarily be available in the output of the lower
one, nor could the lower one create such values if they weren't
available from its input.  Hence, we can fix this by adjusting
create_projection_path to strip any top-level ProjectionPath from the
subpath it's given.  (This amounts to saying "oh, we changed our
minds about what we need to project here".)

The test case added here only fails in v13 and HEAD; before that, we
don't attempt to shove the Sort into the parallel part of the plan,
for reasons that aren't entirely clear to me.  However, all the
directly-related code looks generally the same as far back as v11,
where the hazard was introduced (by d7c19e62a).  So I've got no faith
that the same type of bug doesn't exist in v11 and v12, given the
right test case.  Hence, back-patch the code changes, but not the
irrelevant test case, into those branches.

Per report from Bas Poot.

Discussion: https://postgr.es/m/534fca83789c4a378c7de379e9067d4f@politie.nl
2021-05-31 12:03:00 -04:00
Tom Lane
e30e3fdea8 Fix use of uninitialized variable in inline_function().
Commit e717a9a18 introduced a code path that bypassed the call of
get_expr_result_type, which is not good because we need its rettupdesc
result to pass to check_sql_fn_retval.  We'd failed to notice right
away because the code path in which check_sql_fn_retval uses that
argument is fairly hard to reach in this context.  It's not impossible
though, and in any case inline_function would have no business
assuming that check_sql_fn_retval doesn't need that value.

To fix, move get_expr_result_type out of the if-block, which in
turn requires moving the construction of the dummy FuncExpr
out of it.

Per report from Ranier Vilela.  (I'm bemused by the lack of any
compiler complaints...)

Discussion: https://postgr.es/m/CAEudQAqBqQpQ3HruWAGU_7WaMJ7tntpk0T8k_dVtNB46DqdBgw@mail.gmail.com
2021-05-25 12:55:55 -04:00
Tom Lane
def5b065ff Initial pgindent and pgperltidy run for v14.
Also "make reformat-dat-files".

The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
2021-05-12 13:14:10 -04:00
Thomas Munro
ec48314708 Revert per-index collation version tracking feature.
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose.  We're out of time for this release so we'll revert
and try again.

Commits reverted:

1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.

Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
2021-05-07 21:10:11 +12:00
Alvaro Herrera
8aba932251
Fix relcache inconsistency hazard in partition detach
During queries coming from ri_triggers.c, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition.  Which is bogus: once the detach operation
completes, the row becomes an orphan.

However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query.  This commit changes the
partdesc cache code to only keep descriptors that aren't dependent on
a snapshot (namely: those where no detached partitions exist, and those
where detached partitions are included).  When a partdesc-without-
detached-partitions is requested, we create one afresh each time; also,
those partdescs are stored in PortalContext instead of
CacheMemoryContext.

find_inheritance_children gets a new output *detached_exist boolean,
which indicates whether any partition marked pending-detach is found.
Its "include_detached" input flag is changed to "omit_detached", because
that name captures the desired semantics more naturally.
CreatePartitionDirectory() and RelationGetPartitionDesc() arguments are
identically renamed.

This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus.  This commit also corrects that expected output.

Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3269784.1617215412@sss.pgh.pa.us
2021-04-22 15:13:25 -04:00
Tom Lane
3753982441 Fix planner failure in some cases of sorting by an aggregate.
An oversight introduced by the incremental-sort patches caused
"could not find pathkey item to sort" errors in some situations
where a sort key involves an aggregate or window function.

The basic problem here is that find_em_expr_usable_for_sorting_rel
isn't properly modeling what prepare_sort_from_pathkeys will do
later.  Rather than hoping we can keep those functions in sync,
let's refactor so that they actually share the code for
identifying a suitable sort expression.

With this refactoring, tlist.c's tlist_member_ignore_relabel
is unused.  I removed it in HEAD but left it in place in v13,
in case any extensions are using it.

Per report from Luc Vlaming.  Back-patch to v13 where the
problem arose.

James Coleman and Tom Lane

Discussion: https://postgr.es/m/91f3ec99-85a4-fa55-ea74-33f85a5c651f@swarm64.com
2021-04-20 11:32:02 -04:00
Tom Lane
1111b2668d Undo decision to allow pg_proc.prosrc to be NULL.
Commit e717a9a18 changed the longstanding rule that prosrc is NOT NULL
because when a SQL-language function is written in SQL-standard style,
we don't currently have anything useful to put there.  This seems a poor
decision though, as it could easily have negative impacts on external
PLs (opening them to crashes they didn't use to have, for instance).
SQL-function-related code can just as easily test "is prosqlbody not
null" as "is prosrc null", so there's no real gain there either.
Hence, revert the NOT NULL marking removal and adjust related logic.

For now, we just put an empty string into prosrc for SQL-standard
functions.  Maybe we'll have a better idea later, although the
history of things like pg_attrdef.adsrc suggests that it's not
easy to maintain a string equivalent of a node tree.

This also adds an assertion that queryDesc->sourceText != NULL
to standard_ExecutorStart.  We'd been silently relying on that
for awhile, so let's make it less silent.

Also fix some overlooked documentation and test cases.

Discussion: https://postgr.es/m/2197698.1617984583@sss.pgh.pa.us
2021-04-15 17:17:20 -04:00
David Rowley
50e17ad281 Speedup ScalarArrayOpExpr evaluation
ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand
side have traditionally been evaluated by using a linear search over the
array.  When these arrays contain large numbers of elements then this
linear search could become a significant part of execution time.

Here we add a new method of evaluating ScalarArrayOpExpr expressions to
allow them to be evaluated by first building a hash table containing each
element, then on subsequent evaluations, we just probe that hash table to
determine if there is a match.

The planner is in charge of determining when this optimization is possible
and it enables it by setting hashfuncid in the ScalarArrayOpExpr.  The
executor will only perform the hash table evaluation when the hashfuncid
is set.

This means that not all cases are optimized.  For example, CHECK constraints
containing an IN clause won't go through the planner, so won't get the
hashfuncid set.  We could maybe do something about that at some later
date.  The reason we're not doing it now is from fear that we may slow
down cases where the expression is evaluated only once.  Those cases can
be common, for example, a single row INSERT to a table with a CHECK
constraint containing an IN clause.

In the planner, we enable this when there are suitable hash functions for
the ScalarArrayOpExpr's operator and only when there is at least
MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array.  The threshold is
currently set to 9.
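
A sketch of a qualifying expression (table name hypothetical); with at
least 9 Consts and a hashable operator, the planner sets hashfuncid
and the executor probes a hash table instead of scanning the array:

    SELECT * FROM t WHERE x IN (1, 2, 3, 4, 5, 6, 7, 8, 9);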

Author: James Coleman, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com
2021-04-08 23:51:22 +12:00
Peter Eisentraut
e717a9a18b SQL-standard function body
This adds support for writing CREATE FUNCTION and CREATE PROCEDURE
statements for language SQL with a function body that conforms to the
SQL standard and is portable to other implementations.

Instead of the PostgreSQL-specific AS $$ string literal $$ syntax,
this allows writing out the SQL statements making up the body
unquoted, either as a single statement:

    CREATE FUNCTION add(a integer, b integer) RETURNS integer
        LANGUAGE SQL
        RETURN a + b;

or as a block

    CREATE PROCEDURE insert_data(a integer, b integer)
    LANGUAGE SQL
    BEGIN ATOMIC
      INSERT INTO tbl VALUES (a);
      INSERT INTO tbl VALUES (b);
    END;

The function body is parsed at function definition time and stored as
expression nodes in a new pg_proc column prosqlbody.  So at run time,
no further parsing is required.

However, this form does not support polymorphic arguments, because
there is no more parse analysis done at call time.

Dependencies between the function and the objects it uses are fully
tracked.

A new RETURN statement is introduced.  This can only be used inside
function bodies.  Internally, it is treated much like a SELECT
statement.

psql needs some new intelligence to keep track of function body
boundaries so that it doesn't send off statements when it sees
semicolons that are inside a function body.

Tested-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com
2021-04-07 21:47:55 +02:00
Peter Eisentraut
9c5f67fd62 Add support for NullIfExpr in eval_const_expressions
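
With this change, a NULLIF over constant arguments can be folded to a
Const during planning, e.g.:

    SELECT NULLIF(1, 2);  -- simplified at plan time, no run-time NullIfExpr
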
Author: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Discussion: https://www.postgresql.org/message-id/flat/7ea5ce773bbc4eea9ff1a381acd3b102@G08CNEXMBPEKD05.g08.fujitsu.local
2021-04-02 11:01:49 +02:00
David Rowley
9eacee2e62 Add Result Cache executor node (take 2)
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to lookup the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven off of what the expected cache hit ratio will be.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality being added here is newly introducing a dependency on
the return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, then this will
significantly reduce the planner's cost for the join.   However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did previous
to having this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct value for.

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always an element of risk when we teach the
query planner to do new tricks that it decides to use that new trick at
the wrong time and causes a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.
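
For reference, the GUC named above works like any other planner
switch, and the targeted plan shape is a parameterized nested loop
(table names hypothetical):

    SET enable_resultcache = on;  -- the default
    EXPLAIN SELECT *
    FROM outer_tbl o,
         LATERAL (SELECT * FROM inner_tbl i WHERE i.key = o.key) s;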

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-02 14:10:56 +13:00
David Rowley
28b3e3905c Revert b6002a796
This removes "Add Result Cache executor node".  It seems that something
weird is going on with the tracking of cache hits and misses, as
highlighted by many buildfarm animals.  It's not yet clear what the
problem is, as other parts of the plan indicate that the cache did work
correctly; it's just the hits and misses that were being reported as 0.

This is an especially bad time to have the buildfarm so broken, so
reverting before too many more animals go red.

Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
2021-04-01 13:33:23 +13:00
David Rowley
b6002a796d Add Result Cache executor node
Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, the result cache uses a hash table to quickly find
tuples that have been previously cached.

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to look up the cache can be much
faster than the hash probe in a hash join, as it's common for the result
cache's hash table to be much smaller than the hash join's, because the
result cache caches only useful tuples rather than all tuples from the
inner side of the join.  This variation in hash probe performance is more
significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to no more than work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this
cache and, when adding new tuples, if we realize we've exceeded that
memory budget, we evict cache entries starting with the least recently
used ones until we have enough memory to add the new tuples to the cache.
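
As a rough illustration of that eviction scheme, here is a standalone
sketch in plain C (all names here are invented; the real node pairs a
list like this with a hash table keyed on the parameter values so that
lookups stay O(1)):

    #include <stdlib.h>

    typedef struct CacheEntry
    {
        struct CacheEntry *prev;    /* towards least recently used end */
        struct CacheEntry *next;    /* towards most recently used end */
        size_t      size;           /* memory charged to this entry */
    } CacheEntry;

    typedef struct Cache
    {
        CacheEntry *head;           /* least recently used entry */
        CacheEntry *tail;           /* most recently used entry */
        size_t      mem_used;
        size_t      mem_limit;      /* think work_mem * hash_mem_multiplier */
    } Cache;

    static void
    unlink_entry(Cache *c, CacheEntry *e)
    {
        if (e->prev) e->prev->next = e->next; else c->head = e->next;
        if (e->next) e->next->prev = e->prev; else c->tail = e->prev;
        e->prev = e->next = NULL;
    }

    static void
    push_mru(Cache *c, CacheEntry *e)
    {
        e->prev = c->tail;
        e->next = NULL;
        if (c->tail) c->tail->next = e; else c->head = e;
        c->tail = e;
    }

    /* On a cache hit, promote the entry to the most recently used end. */
    static void
    touch_entry(Cache *c, CacheEntry *e)
    {
        unlink_entry(c, e);
        push_mru(c, e);
    }

    /*
     * Add a new entry, evicting least recently used entries until it
     * fits the budget.  (A real implementation must also decide what to
     * do with entries too large to ever fit; that's glossed over here.)
     */
    static void
    add_entry(Cache *c, CacheEntry *e)
    {
        while (c->head != NULL && c->mem_used + e->size > c->mem_limit)
        {
            CacheEntry *victim = c->head;

            unlink_entry(c, victim);
            c->mem_used -= victim->size;
            free(victim);
        }
        push_mru(c, e);
        c->mem_used += e->size;
    }

    int
    main(void)
    {
        Cache       c = {NULL, NULL, 0, 64};

        for (int i = 0; i < 8; i++)
        {
            CacheEntry *e = calloc(1, sizeof(CacheEntry));

            e->size = 16;           /* a budget of 64 holds 4 entries */
            add_entry(&c, e);
        }
        touch_entry(&c, c.head);    /* a hit keeps an entry alive longer */
        return 0;
    }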

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven by the expected cache hit ratio.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works both for normal joins and
for LATERAL-type joins to subqueries.  It is possible to put this new
node to other uses in the future, for example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties in obtaining a distinct estimation on the outer plan with
which to calculate the estimated cache hit ratio.  Currently, we plan the
inner plan before the outer plan, so there is no good way to know whether
a result cache would be useful, since we can't estimate the number of
times the subplan will be called until the outer plan is generated.

The functionality being added here newly introduces a dependency on the
return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.  In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, this will significantly
reduce the planner's cost for the join.  However, it's fairly easy to see
that things will go bad when estimate_num_groups() incorrectly returns a
value that's significantly lower than the actual number of distinct
values.  If that happens, the planner may choose a nested loop join with
a result cache instead of some other join type, such as a merge or hash
join.  Our distinct estimations have been known to be a source of trouble
in the past, so the extra reliance on them here could cause the planner
to choose slower plans than it did before this feature existed.  Distinct
values are also fairly hard to estimate accurately when several tables
have already been joined or when a WHERE clause filters out a set of
values that are correlated to the expressions we're estimating the number
of distinct values for.

For now, the costing we perform during query planning for result caches
puts quite a bit of faith in the distinct estimations being accurate.
When they are accurate, we should generally see faster execution times
for plans containing a result cache.  However, in the real world, we may
find that we need to either change the costing to put less trust in the
distinct estimations or perhaps even disable this feature by default.
There's always a risk, when we teach the query planner a new trick, that
it will decide to use that trick at the wrong time and cause a
regression.  Users may opt to get the old behavior by turning the feature
off using the enable_resultcache GUC.  Currently, this is enabled by
default.  It remains to be seen if we'll maintain that setting for the
release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name.  A few people did suggest other names, but
none seemed to dominate in the brief discussion there was about names.
Let's allow the beta period to show whether the current name pleases
enough people.  If there's some consensus on a better name, we can change
it before the release.  Please see the second discussion link below for
the discussion of the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
2021-04-01 12:32:22 +13:00
Tom Lane
86dc90056d Rework planning and execution of UPDATE and DELETE.
This patch makes two closely related sets of changes:

1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns.  The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree.  The disadvantage of course
is an extra fetch of each tuple to be updated.  However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost.  At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.

2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that.  To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information.  This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.

Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy.  Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD.  (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)

There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.

FDW authors should note several API changes:

* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query.  Call
add_row_identity_var() instead of manipulating the parse tree directly
(see the sketch after this list).  You might want to reconsider exactly
what you're adding, too.

* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable.  See postgres_fdw for sample code.

* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.
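
As a concrete illustration of the first point above, a callback for an
FDW that uses ctid as its row identity might now look roughly like this
(a minimal sketch modeled loosely on postgres_fdw; the function name is
invented and the include list is approximate):

    #include "postgres.h"

    #include "access/sysattr.h"
    #include "catalog/pg_type_d.h"
    #include "nodes/makefuncs.h"
    #include "optimizer/appendinfo.h"

    /*
     * Sketch of the reworked AddForeignUpdateTargets callback.  Rather
     * than appending junk target-list entries to the parse tree, we
     * build a Var for our row identity column and hand it to
     * add_row_identity_var().
     */
    static void
    mySketchAddForeignUpdateTargets(PlannerInfo *root,
                                    Index rtindex,
                                    RangeTblEntry *target_rte,
                                    Relation target_relation)
    {
        /* Make a Var representing the target row's ctid. */
        Var        *var = makeVar(rtindex,
                                  SelfItemPointerAttributeNumber,
                                  TIDOID,
                                  -1,
                                  InvalidOid,
                                  0);

        /* Register it as this relation's row identity column. */
        add_row_identity_var(root, var, rtindex, "ctid");
    }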

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2021-03-31 11:52:37 -04:00
David Rowley
ed934d4fa3 Allow estimate_num_groups() to pass back further details about the estimation
Here we add a new output parameter to estimate_num_groups() to allow it to
inform the caller of additional, possibly useful information about the
estimation.

The new output parameter is a struct that currently contains just a single
field with a set of flags.  This was done, rather than having the flags as
an output parameter themselves, to allow future fields to be added without
having to change the function's signature at a later date, should we want
to pass back further information that might not be suitable to store in
the flags field.

It seems reasonable that one day in the future the planner would want to
know more about the estimation.  For example, how many individual sets of
statistics were used to generate the estimate?  The planner may want to
take that into account if we ever want to consider risks as well as costs
when generating plans.

For now, there's only one flag we set in the flags field.  It indicates
whether the estimation fell back on using the hard-coded constants in any
part of the estimation.  Callers may wish to change their behavior when
this is set, and this gives them the ability to do so.  Callers may pass
the pointer as NULL if they have no interest in obtaining any additional
information about the estimate.
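
In rough outline, using the new output parameter looks like this (the
wrapper function and its name are invented for illustration; passing
NULL instead of &estinfo skips the extra reporting):

    #include "postgres.h"

    #include "utils/selfuncs.h"   /* EstimationInfo, SELFLAG_USED_DEFAULT */

    /*
     * Invented helper: run estimate_num_groups() and report whether any
     * part of the estimate fell back on a hard-coded default constant.
     */
    static double
    estimate_groups_checked(PlannerInfo *root, List *groupExprs,
                            double input_rows, bool *used_default)
    {
        EstimationInfo estinfo;
        double      ndistinct;

        estinfo.flags = 0;
        ndistinct = estimate_num_groups(root, groupExprs, input_rows,
                                        NULL, &estinfo);

        *used_default = (estinfo.flags & SELFLAG_USED_DEFAULT) != 0;
        return ndistinct;
    }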

We're not adding any actual usages of these flags here; some follow-up
commits will make use of this feature.  Additionally, we're not making
any changes to add support for clauselist_selectivity() and
clauselist_selectivity_ext().  However, if this is required in the
future, the same struct being added here should be fine to use as a new
output argument for those functions too.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com
2021-03-30 20:52:46 +13:00