Commit graph

10449 commits

Author SHA1 Message Date
Noah Misch
0fe002d0c9 Unpin buffer before inplace update waits for an XID to end.
Commit a07e03fd8f changed inplace updates
to wait for heap_update() commands like GRANT TABLE and GRANT DATABASE.
By keeping the pin during that wait, a sequence of autovacuum workers
and an uncommitted GRANT starved one foreground LockBufferForCleanup()
for six minutes, on buildfarm member sarus.  Prevent, at the cost of a
bit of complexity.  Back-patch to v12, like the earlier commit.  That
commit and heap_inplace_lock() have not yet appeared in any release.

Discussion: https://postgr.es/m/20241026184936.ae.nmisch@google.com
2024-10-29 09:39:59 -07:00
Michael Paquier
14440d3b07 doc: Add better description for rewrite functions in event triggers
There are two functions that can be used in event triggers to get more
details about a rewrite happening on a relation.  Both had a limited
documentation:
- pg_event_trigger_table_rewrite_reason() and
pg_event_trigger_table_rewrite_oid() were not mentioned in the main
event trigger section in the paragraph dedicated to the event
table_rewrite.
- pg_event_trigger_table_rewrite_reason() returns an integer which is a
bitmap of the reasons why a rewrite happens.  There was no explanation
about the meaning of these values, forcing the reader to look at the
code to find out that these are defined in event_trigger.h.

While on it, let's add a comment in event_trigger.h where the
AT_REWRITE_* are defined, telling to update the documentation when
these values are changed.

Backpatch down to 13 as a consequence of 1ad23335f3, where this area
of the documentation has been heavily reworked.

Author: Greg Sabino Mullane
Discussion: https://postgr.es/m/CAKAnmmL+Z6j-C8dAx1tVrnBmZJu+BSoc68WSg3sR+CVNjBCqbw@mail.gmail.com
Backpatch-through: 13
2024-10-29 15:35:19 +09:00
Noah Misch
4eac5a1fa7 For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK.  The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f.  That is a
source of index corruption.  Back-patch to v12 (all supported versions).
The back branch versions don't change WAL, because those branches just
added end-of-recovery SIResetAll().  All branches change the ABI of
extern function PrepareToInvalidateCacheTuple().  No PGXN extension
calls that, and there's no apparent use case in extensions.

Reviewed by Nitin Motiani and (in earlier versions) Andres Freund.

Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
2024-10-25 06:51:07 -07:00
Noah Misch
3baf804b72 At end of recovery, reset all sinval-managed caches.
An inplace update's invalidation messages are part of its transaction's
commit record.  However, the update survives even if its transaction
aborts or we stop recovery before replaying its transaction commit.
After recovery, a backend that started in recovery could update the row
without incorporating the inplace update.  That could result in a table
with an index, yet relhasindex=f.  That is a source of index corruption.

This bulk invalidation avoids the functional consequences.  A future
change can fix the !RecoveryInProgress() scenario without changing the
WAL format.  Back-patch to v17 - v12 (all supported versions).  v18 will
instead add invalidations to WAL.

Discussion: https://postgr.es/m/20240618152349.7f.nmisch@google.com
2024-10-25 06:51:07 -07:00
Andrew Dunstan
d700e8d75b Bump MIN_WINNT for MINGW to clear a build error
Because we have been setting this too low, there has been a
long-standing warning about a missing declaration for inet_pton().
Modern gcc now considers this an error, so we have been getting failures
on the buildfarm animal fairywren.

Fix suggested by Thomas Munro.

This isn't needed in later branches, as they already set MIN_WINNT
higher, nor on earlier branches because they don't use inet_pton().

Discussion: https://postgr.es/m/574fae43-c993-4a25-b0e5-04c3e9c36d6d@dunslane.net
2024-09-30 11:32:32 -04:00
Noah Misch
5c837f8fa0 For inplace update durability, make heap_update() callers wait.
The previous commit fixed some ways of losing an inplace update.  It
remained possible to lose one when a backend working toward a
heap_update() copied a tuple into memory just before inplace update of
that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
to govern admission to the steps of copying an old tuple, modifying it,
and issuing heap_update().  This includes MERGE commands.  To avoid
changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when
holding a relation lock sufficient to exclude inplace updaters.
Back-patch to v12 (all supported versions).  In v13 and v12, "UPDATE
pg_class" or "UPDATE pg_database" can still lose an inplace update.  The
v14+ UPDATE fix needs commit 86dc90056d,
and it wasn't worth reimplementing that fix without such infrastructure.

Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas.

Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
2024-09-24 15:25:23 -07:00
Noah Misch
8590c942c1 Fix data loss at inplace update after heap_update().
As previously-added tests demonstrated, heap_inplace_update() could
instead update an unrelated tuple of the same catalog.  It could lose
the update.  Losing relhasindex=t was a source of index corruption.
Inplace-updating commands like VACUUM will now wait for heap_update()
commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
long-running GRANT already hurts VACUUM progress more just by keeping an
XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
the uncommitted change.

For implementation details, start at the systable_inplace_update_begin()
header comment and README.tuplock.  Back-patch to v12 (all supported
versions).  In back branches, retain a deprecated heap_inplace_update(),
for extensions.

Reported by Smolkin Grigory.  Reviewed by Nitin Motiani, (in earlier
versions) Heikki Linnakangas, and (in earlier versions) Alexander
Lakhin.

Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com
2024-09-24 15:25:23 -07:00
Tom Lane
2f4e895be7 Allow adjusting session_authorization and role in parallel workers.
The code intends to allow GUCs to be set within parallel workers
via function SET clauses, but not otherwise.  However, doing so fails
for "session_authorization" and "role", because the assign hooks for
those attempt to set the subsidiary "is_superuser" GUC, and that call
falls foul of the "not otherwise" prohibition.  We can't switch to
using GUC_ACTION_SAVE for this, so instead add a new GUC variable
flag GUC_ALLOW_IN_PARALLEL to mark is_superuser as being safe to set
anyway.  (This is okay because is_superuser has context PGC_INTERNAL
and thus only hard-wired calls can change it.  We'd need more thought
before applying the flag to other GUCs; but maybe there are other
use-cases.)  This isn't the prettiest fix perhaps, but other
alternatives we thought of would be much more invasive.

While here, correct a thinko in commit 059de3ca4: when rejecting
a GUC setting within a parallel worker, we should return 0 not -1
if the ereport doesn't longjmp.  (This seems to have no consequences
right now because no caller cares, but it's inconsistent.)  Improve
the comments to try to forestall future confusion of the same kind.

Despite the lack of field complaints, this seems worth back-patching.
Thanks to Nathan Bossart for the idea to invent a new flag,
and for review.

Discussion: https://postgr.es/m/2833457.1723229039@sss.pgh.pa.us
2024-08-10 15:51:28 -04:00
Masahiko Sawada
e81e53a0c1 Restrict accesses to non-system views and foreign tables during pg_dump.
When pg_dump retrieves the list of database objects and performs the
data dump, there was possibility that objects are replaced with others
of the same name, such as views, and access them. This vulnerability
could result in code execution with superuser privileges during the
pg_dump process.

This issue can arise when dumping data of sequences, foreign
tables (only 13 or later), or tables registered with a WHERE clause in
the extension configuration table.

To address this, pg_dump now utilizes the newly introduced
restrict_nonsystem_relation_kind GUC parameter to restrict the
accesses to non-system views and foreign tables during the dump
process. This new GUC parameter is added to back branches too, but
these changes do not require cluster recreation.

Back-patch to all supported branches.

Reviewed-by: Noah Misch
Security: CVE-2024-7348
Backpatch-through: 12
2024-08-05 06:05:25 -07:00
Etsuro Fujita
d7bc9c9d04 Update comment in portal.h.
We store tuples into the portal's tuple store for a PORTAL_ONE_MOD_WITH
query as well.

Back-patch to all supported branches.

Reviewed by Andy Fan.

Discussion: https://postgr.es/m/CAPmGK14HVYBZYZtHabjeCd-e31VT%3Dwx6rQNq8QfehywLcpZ2Hw%40mail.gmail.com
2024-08-01 17:45:04 +09:00
Daniel Gustafsson
970cd5c62b Fix macro placement in pg_config.h.in
Commit 274bbced85 accidentally placed the pg_config.h.in
for SSL_CTX_set_num_tickets on the wrong line wrt where autoheader
places it.  Fix by re-arranging and backpatch to the same level as
the original commit.

Reported-by: Marina Polyakova <m.polyakova@postgrespro.ru>
Discussion: https://postgr.es/m/48cebe8c3eaf308bae253b1dbf4e4a75@postgrespro.ru
Backpatch-through: v12
2024-07-26 14:16:40 +02:00
Daniel Gustafsson
118ec331bf Disable all TLS session tickets
OpenSSL supports two types of session tickets for TLSv1.3, stateless
and stateful. The option we've used only turns off stateless tickets
leaving stateful tickets active. Use the new API introduced in 1.1.1
to disable all types of tickets.

Backpatch to all supported versions.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240617173803.6alnafnxpiqvlh3g@awork3.anarazel.de
Backpatch-through: v12
2024-07-26 11:09:45 +02:00
Melanie Plageman
dc6354c670 Ensure vacuum removes all visibly dead tuples older than OldestXmin
If vacuum fails to remove a tuple with xmax older than
VacuumCutoffs->OldestXmin and younger than GlobalVisState->maybe_needed,
it will loop infinitely in lazy_scan_prune(), which compares tuples'
visibility information to OldestXmin.

Starting in version 14, which uses GlobalVisState for visibility testing
during pruning, it is possible for GlobalVisState->maybe_needed to
precede OldestXmin if maybe_needed is forced to go backward while vacuum
is running. This can happen if a disconnected standby with a running
transaction older than VacuumCutoffs->OldestXmin reconnects to the
primary after vacuum initially calculates GlobalVisState and OldestXmin.

Fix this by having vacuum always remove tuples older than OldestXmin
during pruning. This is okay because the standby won't replay the tuple
removal until the tuple is removable. Thus, the worst that can happen is
a recovery conflict.

Fixes BUG# 17257

Back-patched in versions 14-17

Author: Melanie Plageman
Reviewed-by: Noah Misch, Peter Geoghegan, Robert Haas, Andres Freund, and Heikki Linnakangas
Discussion: https://postgr.es/m/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com
2024-07-19 12:05:51 -04:00
Masahiko Sawada
aee8c2b954 Fix possibility of logical decoding partial transaction changes.
When creating and initializing a logical slot, the restart_lsn is set
to the latest WAL insertion point (or the latest replay point on
standbys). Subsequently, WAL records are decoded from that point to
find the start point for extracting changes in the
DecodingContextFindStartpoint() function. Since the initial
restart_lsn could be in the middle of a transaction, the start point
must be a consistent point where we won't see the data for partial
transactions.

Previously, when not building a full snapshot, serialized snapshots
were restored, and the SnapBuild jumps to the consistent state even
while finding the start point. Consequently, the slot's restart_lsn
and confirmed_flush could be set to the middle of a transaction. This
could lead to various unexpected consequences. Specifically, there
were reports of logical decoding decoding partial transactions, and
assertion failures occurred because only subtransactions were decoded
without decoding their top-level transaction until decoding the commit
record.

To resolve this issue, the changes prevent restoring the serialized
snapshot and jumping to the consistent state while finding the start
point.

On v17 and HEAD, a flag indicating whether snapshot restores should be
skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION
has been bumpded.

On backbranches, the flag is stored in the LogicalDecodingContext
instead, preserving on-disk compatibility.

Backpatch to all supported versions.

Reported-by: Drew Callahan
Reviewed-by: Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com
Backpatch-through: 12
2024-07-11 22:48:16 +09:00
Thomas Munro
467d77bb16 Cope with <regex.h> name clashes.
macOS 15's SDK pulls in headers related to <regex.h> when we include
<xlocale.h>.  This causes our own regex_t implementation to clash with
the OS's regex_t implementation.  Luckily our function names already had
pg_ prefixes, but the macros and typenames did not.

Include <regex.h> explicitly on all POSIX systems, and fix everything
that breaks.  Then we can prove that we are capable of fully hiding and
replacing the system regex API with our own.

1.  Deal with standard-clobbering macros by undefining them all first.
POSIX says they are "symbolic constants".  If they are macros, this
allows us to redefine them.  If they are enums or variables, our macros
will hide them.

2.  Deal with standard-clobbering types by giving our types pg_
prefixes, and then using macros to redirect xxx_t -> pg_xxx_t.

After including our "regex/regex.h", the system <regex.h> is hidden,
because we've replaced all the standard names.  The PostgreSQL source
tree and extensions can continue to use standard prefix-less type and
macro names, but reach our implementation, if they included our
"regex/regex.h" header.

Back-patch to all supported branches, so that macOS 15's tool chain can
build them.

Reported-by: Stan Hu <stanhu@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/CAMBWrQnEwEJtgOv7EUNsXmFw2Ub4p5P%2B5QTBEgYwiyjy7rAsEQ%40mail.gmail.com
2024-07-06 10:53:13 +12:00
Noah Misch
2ca8ca4827 Remove comment about xl_heap_inplace "AT END OF STRUCT".
Commit 2c03216d83 moved the tuple data
from there to the buffer-0 data.  Back-patch to v12 (all supported
versions), the plan for the next change to this struct.

Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
2024-06-27 19:21:11 -07:00
Noah Misch
b08a4b6163 Cope with inplace update making catcache stale during TOAST fetch.
This extends ad98fb1422 to invals of
inplace updates.  Trouble requires an inplace update of a catalog having
a TOAST table, so only pg_database was at risk.  (The other catalog on
which core code performs inplace updates, pg_class, has no TOAST table.)
Trouble would require something like the inplace-inval.spec test.
Consider GRANT ... ON DATABASE fetching a stale row from cache and
discarding a datfrozenxid update that vac_truncate_clog() has already
relied upon.  Back-patch to v12 (all supported versions).

Reviewed (in an earlier version) by Robert Haas.

Discussion: https://postgr.es/m/20240114201411.d0@rfd.leadboat.com
Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
2024-06-27 19:21:11 -07:00
Noah Misch
a338e41374 Lock before setting relhassubclass on RELKIND_PARTITIONED_INDEX.
Commit 5b562644fe added a comment that
SetRelationHasSubclass() callers must hold this lock.  When commit
17f206fbc8 extended use of this column to
partitioned indexes, it didn't take the lock.  As the latter commit
message mentioned, we currently never reset a partitioned index to
relhassubclass=f.  That largely avoids harm from the lock omission.  The
cause for fixing this now is to unblock introducing a rule about locks
required to heap_update() a pg_class row.  This might cause more
deadlocks.  It gives minor user-visible benefits:

- If an ALTER INDEX SET TABLESPACE runs concurrently with ALTER TABLE
  ATTACH PARTITION or CREATE PARTITION OF, one transaction blocks
  instead of failing with "tuple concurrently updated".  (Many cases of
  DDL concurrency still fail that way.)

- Match ALTER INDEX ATTACH PARTITION in choosing to lock the index.

While not user-visible today, we'll need this if we ever make something
set the flag to false for a partitioned index, like ANALYZE does today
for tables.  Back-patch to v12 (all supported versions), the plan for
the commit relying on the new rule.  In back branches, add
LockOrStrongerHeldByMe() instead of adding a LockHeldByMe() parameter.

Reviewed (in an earlier version) by Robert Haas.

Discussion: https://postgr.es/m/20240611024525.9f.nmisch@google.com
2024-06-27 19:21:10 -07:00
Heikki Linnakangas
0e2f3d78bf Fix MVCC bug with prepared xact with subxacts on standby
We did not recover the subtransaction IDs of prepared transactions
when starting a hot standby from a shutdown checkpoint. As a result,
such subtransactions were considered as aborted, rather than
in-progress. That would lead to hint bits being set incorrectly, and
the subtransactions suddenly becoming visible to old snapshots when
the prepared transaction was committed.

To fix, update pg_subtrans with prepared transactions's subxids when
starting hot standby from a shutdown checkpoint. The snapshots taken
from that state need to be marked as "suboverflowed", so that we also
check the pg_subtrans.

Backport to all supported versions.

Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0@iki.fi
2024-06-27 21:10:31 +03:00
Tom Lane
f550833193 Fix insertion of SP-GiST REDIRECT tuples during REINDEX CONCURRENTLY.
Reconstruction of an SP-GiST index by REINDEX CONCURRENTLY may
insert some REDIRECT tuples.  This will typically happen in
a transaction that lacks an XID, which leads either to assertion
failure in spgFormDeadTuple or to insertion of a REDIRECT tuple
with zero xid.  The latter's not good either, since eventually
VACUUM will apply GlobalVisTestIsRemovableXid() to the zero xid,
resulting in either an assertion failure or a garbage answer.

In practice, since REINDEX CONCURRENTLY locks out index scans
till it's done, it doesn't matter whether it inserts REDIRECTs
or PLACEHOLDERs; and likewise it doesn't matter how soon VACUUM
reduces such a REDIRECT to a PLACEHOLDER.  So in non-assert builds
there's no observable problem here, other than perhaps a little
index bloat.  But it's not behaving as intended.

To fix, remove the failing Assert in spgFormDeadTuple, acknowledging
that we might sometimes insert a zero XID; and guard VACUUM's
GlobalVisTestIsRemovableXid() call with a test for valid XID,
ensuring that we'll reduce such a REDIRECT the first time VACUUM
sees it.  (Versions before v14 use TransactionIdPrecedes here,
which won't fail on zero xid, so they really have no bug at all
in non-assert builds.)

Another solution could be to not create REDIRECTs at all during
REINDEX CONCURRENTLY, making the relevant code paths treat that
case like index build (which likewise knows that no concurrent
index scans can be happening).  That would allow restoring the
Assert in spgFormDeadTuple, but we'd still need the VACUUM change
because redirection tuples with zero xid may be out there already.
But there doesn't seem to be a nice way for spginsert() to tell that
it's being called in REINDEX CONCURRENTLY without some API changes,
so we'll leave that as a possible future improvement.

In HEAD, also rename the SpGistState.myXid field to redirectXid,
which seems less misleading (since it might not in fact be our
transaction's XID) and is certainly less uninformatively generic.

Per bug #18499 from Alexander Lakhin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18499-8a519c280f956480@postgresql.org
2024-06-17 14:30:59 -04:00
Tom Lane
4ac385adc5 Account for optimized MinMax aggregates during SS_finalize_plan.
We are capable of optimizing MIN() and MAX() aggregates on indexed
columns into subqueries that exploit the index, rather than the normal
thing of scanning the whole table.  When we do this, we replace the
Aggref node(s) with Params referencing subquery outputs.  Such Params
really ought to be included in the per-plan-node extParam/allParam
sets computed by SS_finalize_plan.  However, we've never done so
up to now because of an ancient implementation choice to perform
that substitution during set_plan_references, which runs after
SS_finalize_plan, so that SS_finalize_plan never sees these Params.

The cleanest fix would be to perform a separate tree walk to do
these substitutions before SS_finalize_plan runs.  That seems
unattractive, first because a whole-tree mutation pass is expensive,
and second because we lack infrastructure for visiting expression
subtrees in a Plan tree, so that we'd need a new function knowing
as much as SS_finalize_plan knows about that.  I also considered
swapping the order of SS_finalize_plan and set_plan_references,
but that fell foul of various assumptions that seem tricky to fix.
So the approach adopted here is to teach SS_finalize_plan itself
to check for such Aggrefs.  I refactored things a bit in setrefs.c
to avoid having three copies of the code that does that.

Back-patch of v17 commits d0d44049d and 779ac2c74.  When d0d44049d
went in, there was no evidence that it was fixing a reachable bug,
so I refrained from back-patching.  Now we have such evidence.

Per bug #18465 from Hal Takahara.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18465-2fae927718976b22@postgresql.org
Discussion: https://postgr.es/m/2391880.1689025003@sss.pgh.pa.us
2024-05-18 14:31:35 -04:00
David Rowley
52f21f9287 Ensure we allocate NAMEDATALEN bytes for names in Index Only Scans
As an optimization, we store "name" columns as cstrings in btree
indexes.

Here we modify it so that Index Only Scans convert these cstrings back
to names with NAMEDATALEN bytes rather than storing the cstring in the
tuple slot, as was happening previously.

Bug: #17855
Reported-by: Alexander Lakhin
Reviewed-by: Alexander Lakhin, Tom Lane
Discussion: https://postgr.es/m/17855-5f523e0f9769a566@postgresql.org
Backpatch-through: 12, all supported versions
2024-05-01 13:22:16 +12:00
Andres Freund
dcb7cf945c simplehash: Free collisions array in SH_STAT
While SH_STAT() is only used for debugging, the allocated array can be large,
and therefore should be freed.

It's unclear why coverity started warning now.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Coverity
Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us
Backpatch: 12-
2024-04-07 19:09:04 -07:00
Alexander Korotkov
c2faf48fa3 Fix the parameters order for TableAmRoutine.relation_copy_for_cluster()
Specify OldTable first, NewTable second as used by
table_relation_copy_for_cluster() and as implemented in
heapam_relation_copy_for_cluster().

Backpatch to PostgreSQL 12, where TableAmRoutine was introduced.

Discussion: https://postgr.es/m/ME3P282MB3166860D4911AE82F92DF7C5B63F2%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
Author: Japin Li
Reviewed-by: Pavel Borisov
Backpatch-through: 12
2024-04-03 21:31:06 +03:00
Tom Lane
4fb56a734d Avoid deadlock during orphan temp table removal.
If temp tables have dependencies (such as sequences) then it's
possible for autovacuum's cleanup of orphan temp tables to deadlock
against an incoming backend that's trying to clean out the temp
namespace for its own use.  That can happen because RemoveTempRelations'
performDeletion call can visit objects within the namespace in
an order different from the order in which a per-table deletion
will visit them.

To fix, observe that performDeletion will begin by taking an exclusive
lock on the temp namespace (even though it won't actually delete it).
So, if we can get a shared lock on the namespace, we can be sure we're
not running concurrently with RemoveTempRelations, while also not
conflicting with ordinary use of the namespace.  This requires
introducing a conditional version of LockDatabaseObject, but that's no
big deal.  (It's surprising we've got along without that this long.)

Report and patch by Mikhail Zhilin.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/c43ce028-2bc2-4865-9b89-3f706246eed5@postgrespro.ru
2024-04-02 14:59:04 -04:00
Tom Lane
6f66fadad9 Fix confusion about the return rowtype of SQL-language procedures.
There is a very ancient hack in check_sql_fn_retval that allows a
single SELECT targetlist entry of composite type to be taken as
supplying all the output columns of a function returning composite.
(This is grotty and fundamentally ambiguous, but it's really hard
to do nested composite-returning functions without it.)

As far as I know, that doesn't cause any problems in ordinary
functions.  It's disastrous for procedures however.  All procedures
that have any output parameters are labeled with prorettype RECORD,
and the CALL code expects it will get back a record with one column
per output parameter, regardless of whether any of those parameters
is composite.  Doing something else leads to an assertion failure
or core dump.

This is simple enough to fix: we just need to not apply that rule
when considering procedures.  However, that requires adding another
argument to check_sql_fn_retval, which at least in principle might be
getting called by external callers.  Therefore, in the back branches
convert check_sql_fn_retval into an ABI-preserving wrapper around a
new function check_sql_fn_retval_ext.

Per report from Yahor Yuzefovich.  This has been broken since we
implemented procedures, so back-patch to all supported branches.

Discussion: https://postgr.es/m/CABz5gWHSjj2df6uG0NRiDhZ_Uz=Y8t0FJP-_SVSsRsnrQT76Gg@mail.gmail.com
2024-03-12 18:16:10 -04:00
Michael Paquier
e82b64a92c Revert "Fix parallel-safety check of expressions and predicate for index builds"
This reverts commit eae7be600b, following a discussion with Tom Lane,
due to concerns that this impacts the decisions made by the planner for
the number of workers spawned based on the inlining and const-folding of
index expressions and predicate for cases that would have worked until
this commit.

Discussion: https://postgr.es/m/162802.1709746091@sss.pgh.pa.us
Backpatch-through: 12
2024-03-07 08:31:04 +09:00
Michael Paquier
50b5913a87 Fix parallel-safety check of expressions and predicate for index builds
As coded, the planner logic that calculates the number of parallel
workers to use for a parallel index build uses expressions and
predicates from the relcache, which are flattened for the planner by
eval_const_expressions().

As reported in the bug, an immutable parallel-unsafe function flattened
in the relcache would become a Const, which would be considered as
parallel-safe, even if the predicate or the expressions including the
function are not safe in parallel workers.  Depending on the expressions
or predicate used, this could cause the parallel build to fail.

Tests are included that check parallel index builds with parallel-unsafe
predicate and expressions.  Two routines are added to lsyscache.h to be
able to retrieve expressions and predicate of an index from its pg_index
data.

Reported-by: Alexander Lakhin
Author: Tender Wang
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAHewXN=UaAaNn9ruHDH3Os8kxLVmtWqbssnf=dZN_s9=evHUFA@mail.gmail.com
Backpatch-through: 12
2024-03-06 17:24:06 +09:00
Noah Misch
8fa4a1ac61 Sync PG_VERSION file in CREATE DATABASE.
An OS crash could leave PG_VERSION empty or missing.  The same symptom
appeared in a backup by block device snapshot, taken after the next
checkpoint and before the OS flushes the PG_VERSION blocks.  Device
snapshots are not a documented backup method, however.  Back-patch to
v15, where commit 9c08aea6a3 introduced
STRATEGY=WAL_LOG and made it the default.

Discussion: https://postgr.es/m/20240130195003.0a.nmisch@google.com
2024-02-01 13:44:23 -08:00
Nathan Bossart
3726c1cb0e Move is_valid_ascii() to ascii.h.
This function requires simd.h, which is a rather large dependency
for a widely-used header file like pg_wchar.h.  Furthermore, there
is a report of a third-party tool that is struggling to use
pg_wchar.h due to its dependence on simd.h (presumably because
simd.h uses several intrinsics).  Moving the function to the much
less popular ascii.h resolves these issues for now.

This commit is back-patched for the benefit of the aforementioned
third-party tool.  The simd.h dependency was only added in v16,
but we've opted to back-patch to v15 so that is_valid_ascii() lives
in the same file for all versions where it exists.  This could
break existing third-party code that uses the function, but we
couldn't find any examples of such code.  It should be possible to
fix any code that this commit breaks by including ascii.h in the
file that uses is_valid_ascii().

Author: Jubilee Young
Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge
Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com
Backpatch-through: 15
2024-01-29 12:09:08 -06:00
Michael Paquier
1cf2dba84b Add try_index_open(), conditional variant of index_open()
try_index_open() is able to open an index if its relkind fits, except
that it would return NULL instead of generated an error if the relation
does not exist.  This new routine will be used by an upcoming patch to
make REINDEX on partitioned relations more robust when an index in a
partition tree is dropped.

Extracted from a larger patch by the same author.

Author: Fei Changhong
Discussion: https://postgr.es/m/tencent_6A52106095ACDE55333E3AD33F304C0C3909@qq.com
Backpatch-through: 14
2024-01-18 15:04:35 +09:00
Andres Freund
f374fb4aab lwlock: Fix quadratic behavior with very long wait lists
Until now LWLockDequeueSelf() sequentially searched the list of waiters to see
if the current proc is still is on the list of waiters, or has already been
removed. In extreme workloads, where the wait lists are very long, this leads
to a quadratic behavior. #backends iterating over a list #backends
long. Additionally, the likelihood of needing to call LWLockDequeueSelf() in
the first place also increases with the increased length of the wait queue, as
it becomes more likely that a lock is released while waiting for the wait list
lock, which is held for longer during lock release.

Due to the exponential back-off in perform_spin_delay() this is surprisingly
hard to detect. We should make that easier, e.g. by adding a wait event around
the pg_usleep() - but that's a separate patch.

The fix is simple - track whether a proc is currently waiting in the wait list
or already removed but waiting to be woken up in PGPROC->lwWaiting.

In some workloads with a lot of clients contending for a small number of
lwlocks (e.g. WALWriteLock), the fix can substantially increase throughput.

This has been originally fixed for 16~ with a4adc31f69 without a
backpatch, and we have heard complaints from users impacted by this
quadratic behavior in older versions as well.

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/20221027165914.2hofzp4cvutj6gin@awork3.anarazel.de
Discussion: https://postgr.es/m/CALj2ACXktNbG=K8Xi7PSqbofTZozavhaxjatVc14iYaLu4Maag@mail.gmail.com
Backpatch-through: 12
2024-01-18 11:12:31 +09:00
Tom Lane
5dd30bb54b Use BIO_{get,set}_app_data instead of BIO_{get,set}_data.
We should have done it this way all along, but we accidentally got
away with using the wrong BIO field up until OpenSSL 3.2.  There,
the library's BIO routines that we rely on use the "data" field
for their own purposes, and our conflicting use causes assorted
weird behaviors up to and including core dumps when SSL connections
are attempted.  Switch to using the approved field for the purpose,
i.e. app_data.

While at it, remove our configure probes for BIO_get_data as well
as the fallback implementation.  BIO_{get,set}_app_data have been
there since long before any OpenSSL version that we still support,
even in the back branches.

Also, update src/test/ssl/t/001_ssltests.pl to allow for a minor
change in an error message spelling that evidently came in with 3.2.

Tristan Partin and Bo Andreson.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/CAN55FZ1eDDYsYaL7mv+oSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ@mail.gmail.com
2023-11-28 12:34:03 -05:00
Heikki Linnakangas
2873fbfe0d Fix assertions with RI triggers in heap_update and heap_delete.
If the tuple being updated is not visible to the crosscheck snapshot,
we return TM_Updated but the assertions would not hold in that case.
Move them to before the cross-check.

Fixes bug #17893. Backpatch to all supported versions.

Author: Alexander Lakhin
Backpatch-through: 12
Discussion: https://www.postgresql.org/message-id/17893-35847009eec517b5%40postgresql.org
2023-11-28 11:59:50 +02:00
Daniel Gustafsson
aef521849b llvmjit: Use explicit LLVMContextRef for inlining
When performing inlining LLVM unfortunately "leaks" types (the
types survive and are usable, but a new round of inlining will
recreate new structurally equivalent types). This accumulation
will over time amount to a memory leak which for some queries
can be large enough to trigger the OOM process killer.

To avoid accumulation of types, all IR related data is stored
in an LLVMContextRef which is dropped and recreated in order
to release all types.  Dropping and recreating incurs overhead,
so it will be done only after 100 queries. This is a heuristic
which might be revisited, but until we can get the size of the
context from LLVM we are flying a bit blind.

This issue has been reported several times, there may be more
references to it in the archives on top of the threads linked
below.

This is a backpatch of 9dce22033d to all supported branches.

Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Reported-By: Kurt Roeckx <kurt@roeckx.be>
Reported-By: Jaime Casanova <jcasanov@systemguards.com.ec>
Reported-By: Lauri Laanmets <pcspets@gmail.com>
Author: Andres Freund and Daniel Gustafsson
Discussion: https://postgr.es/m/7acc8678-df5f-4923-9cf6-e843131ae89d@www.fastmail.com
Discussion: https://postgr.es/m/20201218235607.GC30237@telsasoft.com
Discussion: https://postgr.es/m/CAPH-tTxLf44s3CvUUtQpkDr1D8Hxqc2NGDzGXS1ODsfiJ6WSqA@mail.gmail.com
Backpatch-through: v12
2023-11-17 10:21:34 +01:00
Tom Lane
9057ddbefe Ensure we preprocess expressions before checking their volatility.
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions).  Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.

Two actual bugs of this ilk are fixed here.  We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest.  Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism.  A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.

In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after.  The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.

Per bug #18097 from Jim Keener.  Back-patch to all supported versions.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
2023-11-16 10:05:14 -05:00
Nathan Bossart
18f47989ec Fix fallback implementation for pg_atomic_test_set_flag().
The fallback implementation of pg_atomic_test_set_flag() that uses
atomic-exchange gives pg_atomic_exchange_u32_impl() an extra
argument.  This issue has been present since the introduction of
the atomics API in commit b64d92f1a5.

Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20231114035439.GA1809032%40nathanxps13
Backpatch-through: 12
2023-11-15 15:04:34 -06:00
David Rowley
456d697bae Ensure we use the correct spelling of "ensure"
We seem to have accidentally used "insure" in a few places.  Correct
that.

Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com
Backpatch-through: 12, oldest supported version
2023-11-10 00:17:07 +13:00
Dean Rasheed
308a69a987 Fix corner-case 64-bit integer subtraction bug on some platforms.
When computing "0 - INT64_MIN", most platforms would report an
overflow error, which is correct. However, platforms without integer
overflow builtins or 128-bit integers would fail to spot the overflow,
and incorrectly return INT64_MIN.

Back-patch to all supported branches.

Patch be me. Thanks to Jian He for initial investigation, and Laurenz
Albe and Tom Lane for review.

Discussion: https://postgr.es/m/CAEZATCUNK-AZSD0jVdgkk0N%3DNcAXBWeAEX-QU9AnJPensikmdQ%40mail.gmail.com
2023-11-09 09:54:22 +00:00
Tom Lane
3bc6bc3ee2 Detect integer overflow while computing new array dimensions.
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds.  In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen.  Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.

To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.

The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text.  v11 psql
lacks that parameter, so omit the tests in that branch.

Our thanks to Pedro Gallegos for reporting this problem.

Security: CVE-2023-5869
2023-11-06 10:56:43 -05:00
Tom Lane
ae33659d42 Be more wary about NULL values for GUC string variables.
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val.  Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice.  Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.

Xing Guo, Aleksander Alekseev, Tom Lane

Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
2023-11-02 11:47:33 -04:00
Tom Lane
1268e73781 Fix problems when a plain-inheritance parent table is excluded.
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not.  (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.)  In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:

1. The wrong table's statement-level triggers would get fired.

2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan.  This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.

In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.

While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch.  Given the lack of field complaints about the
trigger issue, I'll desist for now.  (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)

Per bug #18147 from Hans Buschmann.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
2023-10-24 14:48:34 -04:00
Thomas Munro
b2e0977886 jit: Supply LLVMGlobalGetValueType() for LLVM < 8.
Commit 37d5babb used this C API function while adding support for LLVM
16 and opaque pointers, but it's not available in LLVM 7 and older.
Provide it in our own llvmjit_wrap.cpp.  It just calls a C++ function
that pre-dates LLVM 3.9, our minimum target.

Back-patch to 12, like 37d5babb.

Discussion: https://postgr.es/m/CA%2BhUKGKnLnJnWrkr%3D4mSGhE5FuTK55FY15uULR7%3Dzzc%3DwX4Nqw%40mail.gmail.com
2023-10-19 03:03:27 +13:00
Thomas Munro
eed1feb3fe jit: Support opaque pointers in LLVM 16.
Remove use of LLVMGetElementType() and provide the type of all pointers
to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
versions[1].

 * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.
 * For LLVM == 15, we'll continue to do the same, explicitly opting
   out of opaque pointer mode.
 * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take
   the extra type argument.

The difference is hidden behind some new IR emitting wrapper functions
l_load(), l_gep(), l_call() etc.  The change is mostly mechanical,
except that at each site the correct type had to be provided.

In some places we needed to do some extra work to get functions types,
including some new wrappers for C++ APIs that are not yet exposed by in
LLVM's C API, and some new "example" functions in llvmjit_types.c
because it's no longer possible to start from the function pointer type
and ask for the function type.

Back-patch to 12, because it's a little tricker in 11 and we agreed not
to put the latest LLVM support into the upcoming final release of 11.

[1] https://llvm.org/docs/OpaquePointers.html

Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
2023-10-18 22:59:46 +13:00
Nathan Bossart
c9265ae80b Avoid calling proc_exit() in processes forked by system().
The SIGTERM handler for the startup process immediately calls
proc_exit() for the duration of the restore_command, i.e., a call
to system().  This system() call forks a new process to execute the
shell command, and this child process inherits the parent's signal
handlers.  If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit().  This can end badly.  For
example, both processes will try to remove themselves from the
PGPROC shared array.

To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before.  If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.

This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.

Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
2023-10-17 10:42:06 -05:00
Noah Misch
782be0f712 Dissociate btequalimage() from interval_ops, ending its deduplication.
Under interval_ops, some equal values are distinguishable.  One such
pair is '24:00:00' and '1 day'.  With that being so, btequalimage()
breaches the documented contract for the "equalimage" btree support
function.  This can cause incorrect results from index-only scans.
Users should REINDEX any btree indexes having interval-type columns.
After updating, pg_amcheck will report an error for almost all such
indexes.  This fix makes interval_ops simply omit the support function,
like numeric_ops does.  Back-pack to v13, where btequalimage() first
appeared.  In back branches, for the benefit of old catalog content,
btequalimage() code will return false for type "interval".  Going
forward, back-branch initdb will include the catalog change.

Reviewed by Peter Geoghegan.

Discussion: https://postgr.es/m/20231011013317.22.nmisch@google.com
2023-10-14 16:33:54 -07:00
Dean Rasheed
3c1a1af91d Fix EvalPlanQual rechecking during MERGE.
Under some circumstances, concurrent MERGE operations could lead to
inconsistent results, that varied according the plan chosen. This was
caused by a lack of rowmarks on the source relation, which meant that
EvalPlanQual rechecking was not guaranteed to return the same source
tuples when re-running the join query.

Fix by ensuring that preprocess_rowmarks() sets up PlanRowMarks for
all non-target relations used in MERGE, in the same way that it does
for UPDATE and DELETE.

Per bug #18103. Back-patch to v15, where MERGE was introduced.

Dean Rasheed, reviewed by Richard Guo.

Discussion: https://postgr.es/m/18103-c4386baab8e355e3%40postgresql.org
2023-09-30 10:55:24 +01:00
Peter Geoghegan
cac37c1a1b Fix btmarkpos/btrestrpos array key wraparound bug.
nbtree's mark/restore processing failed to correctly handle an edge case
involving array key advancement and related search-type scan key state.
Scans with ScalarArrayScalarArrayOpExpr quals requiring mark/restore
processing (for a merge join) could incorrectly conclude that an
affected array/scan key must not have advanced during the time between
marking and restoring the scan's position.

As a result of all this, array key handling within btrestrpos could skip
a required call to _bt_preprocess_keys().  This confusion allowed later
primitive index scans to overlook tuples matching the true current array
keys.  The scan's search-type scan keys would still have spurious values
corresponding to the final array element(s) -- not values matching the
first/now-current array element(s).

To fix, remember that "array key wraparound" has taken place during the
ongoing btrescan in a flag variable stored in the scan's state, and use
that information at the point where btrestrpos decides if another call
to _bt_preprocess_keys is required.

Oversight in commit 70bc5833, which taught nbtree to handle array keys
during mark/restore processing, but missed this subtlety.  That commit
was itself a bug fix for an issue in commit 9e8da0f7, which taught
nbtree to handle ScalarArrayOpExpr quals natively.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkgP3DDRJxw6DgjCxo-cu-DKrvjEv_ArkP2ctBJatDCYg@mail.gmail.com
Backpatch: 11- (all supported branches).
2023-09-28 16:29:32 -07:00
Tom Lane
8700851352 Avoid unnecessary plancache revalidation of utility statements.
Revalidation of a plancache entry (after a cache invalidation event)
requires acquiring a snapshot.  Normally that is harmless, but not
if the cached statement is one that needs to run without acquiring a
snapshot.  We were already aware of that for TransactionStmts,
but for some reason hadn't extrapolated to the other statements that
PlannedStmtRequiresSnapshot() knows mustn't set a snapshot.  This can
lead to unexpected failures of commands such as SET TRANSACTION
ISOLATION LEVEL.  We can fix it in the same way, by excluding those
command types from revalidation.

However, we can do even better than that: there is no need to
revalidate for any statement type for which parse analysis, rewrite,
and plan steps do nothing interesting, which is nearly all utility
commands.  To mechanize this, invent a parser function
stmt_requires_parse_analysis() that tells whether parse analysis does
anything beyond wrapping a CMD_UTILITY Query around the raw parse
tree.  If that's what it does, then rewrite and plan will just
skip the Query, so that it is not possible for the same raw parse
tree to produce a different plan tree after cache invalidation.

stmt_requires_parse_analysis() is basically equivalent to the
existing function analyze_requires_snapshot(), except that for
obscure reasons that function omits ReturnStmt and CallStmt.
It is unclear whether those were oversights or intentional.
I have not been able to demonstrate a bug from not acquiring a
snapshot while analyzing these commands, but at best it seems mighty
fragile.  It seems safer to acquire a snapshot for parse analysis of
these commands too, which allows making stmt_requires_parse_analysis
and analyze_requires_snapshot equivalent.

In passing this fixes a second bug, which is that ResetPlanCache
would exclude ReturnStmts and CallStmts from revalidation.
That's surely *not* safe, since they contain parsable expressions.

Per bug #18059 from Pavel Kulakov.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18059-79c692f036b25346@postgresql.org
2023-08-24 12:02:40 -04:00
Peter Eisentraut
908f711d23 Update DECLARE_INDEX documentation
Update source code comment changes belonging to the changes in
6a6389a08b.

Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org
2023-08-24 14:00:54 +02:00