Since OpenBSD core dumps do not embed executable paths, the script now
searches for the corresponding binary manually within the specified
directory before invoking LLDB. This is imperfect but should find the
right executable in practice, as needed for meaningful backtraces.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ36R74TZ8RKsFueYwLxGKDAm3LU2FHM_ZUCSB6imd3vYA@mail.gmail.com
Backpatch-through: 18
We've successfully used libsanitizer for a while with the undefined
and alignment sanitizers, but with some other sanitizers (at least
thread and hwaddress) it crashes due to internal recursion before
it has fully initialized itself. It turns out that that's due to the
"__ubsan_default_options" hack installed by commit f686ae82f, and we
can fix it by ensuring that __ubsan_default_options is built without
any sanitizer instrumentation hooks.
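As a rough sketch of the idea (the attribute spelling and the option
string are assumptions that vary by compiler), the hook is annotated so
that no instrumentation is emitted for it:
    /* hedged sketch: keep the hook free of sanitizer instrumentation */
    __attribute__((no_sanitize("address", "thread", "undefined")))
    const char *
    __ubsan_default_options(void)
    {
        return "print_stacktrace=1";    /* illustrative option string only */
    }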
Reported-by: Emmanuel Sibi <emmanuelsibi.mec@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Emmanuel Sibi <emmanuelsibi.mec@gmail.com>
Fix-suggested-by: Jacob Champion <jacob.champion@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/F7543B04-E56C-4D68-A040-B14CCBAD38F1@gmail.com
Discussion: https://postgr.es/m/dbf77bf7-6e54-ed8a-c4ae-d196eeb664ce@gmail.com
Backpatch-through: 16
In generate_orderedappend_paths(), there is an assumption that a child
relation's row estimate is always greater than zero. There is an
Assert verifying this assumption, and the estimate is also used to
convert an absolute tuple count into a fraction.
However, this assumption is not always valid -- for example, upper
relations can have their row estimates unset, resulting in a value of
zero. This can cause an assertion failure in debug builds or lead to
the tuple fraction being computed as infinity in production builds.
To fix, use the row estimate from the cheapest_total path to compute
the tuple fraction. The row estimate in this path should already have
been forced to a valid value.
In passing, update the comment for generate_orderedappend_paths() to
note that the function also considers the cheapest-fractional case
when not all tuples need to be retrieved. That is, it collects all
the cheapest fractional paths and builds an ordered append path for
each interesting ordering.
Backpatch to v18, where this issue was introduced.
Bug: #19102
Reported-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Discussion: https://postgr.es/m/19102-93480667e1200169@postgresql.org
Backpatch-through: 18
The test introduced by 17b2d5ec75 verifies that a WAL receiver
survives across a timeline jump by searching the server logs for
termination messages. However, it called restart() before the timeline
switch, which kills the WAL receiver and may log the exact message being
checked, hence failing the test. As TAP tests reuse the same log file
across restarts, a rotate_logfile() is used before the restart so that the
log matching check is not impacted by log entries generated by a
previous shutdown.
Recent changes to file handle inheritance altered I/O timing enough to
make this fail consistently while testing another patch.
While at it, this adds an extra check based on a PID comparison. This
test may lead to false positives as it could be possible that the WAL
receiver has processed a timeline jump before the initial PID is
grabbed, but it should be good enough in most cases.
Like 17b2d5ec75, backpatch down to v13.
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/9d00b597-d64a-4f1e-802e-90f9dc394c70@gmail.com
Backpatch-through: 13
The comment for ChangeVarNodes() refers to a parameter named
change_RangeTblRef, which does not exist in the code.
The comment for ChangeVarNodesExtended() contains an extra space,
while the comment for replace_relid_callback() has an awkward line
break and a typo.
This patch fixes these issues and revises some sentences for smoother
wording.
Oversights in commits ab42d643c and fc069a3a6.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs480j16HC1JtjKCgj5WshivT8ZJYkOfTyZAM0POjFomJkg@mail.gmail.com
Backpatch-through: 18
First, the assertions in assign_io_method() were the wrong way round. Second,
the lengthof() assertion checked the length of io_method_options, which is the
wrong array to check and is always longer than pgaio_method_ops_table.
While at it, add a static assert to ensure pgaio_method_ops_table and
io_method_options stay in sync.
Per Coverity and Tom Lane.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 18
In 2a0faed9d7, which added JIT compilation support for expressions, I
accidentally used sizeof(LLVMBasicBlockRef *) instead of
sizeof(LLVMBasicBlockRef) as part of computing the size of an allocation. That
turns out to have no real negative consequences due to LLVMBasicBlockRef being
a pointer itself (and thus having the same size). It still is wrong and
confusing, so fix it.
Reported by Coverity.
Backpatch-through: 13
Allow pg_newlocale_from_collation(C_COLLATION_OID) to work even if
there's no catalog access, which some extensions expect.
Not known to be a bug without extensions involved, but backport to 18.
Also corrects an issue in master with dummy_c_locale (introduced in
commit 5a38104b36) where deterministic was not set. That wasn't a bug,
but could have been if that structure was used more widely.
Reported-by: Alexander Kukushkin <cyberdemn@gmail.com>
Reviewed-by: Alexander Kukushkin <cyberdemn@gmail.com>
Discussion: https://postgr.es/m/CAFh8B=nj966ECv5vi_u3RYij12v0j-7NPZCXLYzNwOQp9AcPWQ@mail.gmail.com
Backpatch-through: 18
This commit adds CHECK_FOR_INTERRUPTS to the shared buffer iteration
loops in EvictRelUnpinnedBuffers and EvictAllUnpinnedBuffers. These
functions, used by pg_buffercache's pg_buffercache_evict_relation and
pg_buffercache_evict_all, can now be interrupted during long-running
operations.
Backpatch to version 18, where these functions and their corresponding
pg_buffercache functions were introduced.
Author: Yuhang Qiu <iamqyh@gmail.com>
Discussion: https://postgr.es/m/8DC280D4-94A2-4E7B-BAB9-C345891D0B78%40gmail.com
Backpatch-through: 18
Commit a95e3d84c0 added ActiveSnapshot push+pop when processing
work-items (BRIN autosummarization), but forgot to handle the case of
a transaction failing during the run, which pops the snapshot prematurely.
Fix by making the pop conditional on an element actually being there.
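In sketch form, assuming the usual snapshot-manager API (not the
literal patch):
    /* pop only if an active snapshot remains; a failed transaction may
     * already have popped it during abort cleanup */
    if (ActiveSnapshotSet())
        PopActiveSnapshot();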
Author: Álvaro Herrera <alvherre@kurilemu.de>
Backpatch-through: 13
Discussion: https://postgr.es/m/202511041648.nofajnuddmwk@alvherre.pgsql
When building intermediate TID lists during parallel GIN builds, split
the sorted lists into smaller chunks, to limit the amount of memory
needed when merging the chunks later.
The leader may need to keep in memory up to one chunk per worker, and
possibly one extra chunk (before evicting some of the data). The code
processing item pointers uses regular palloc/repalloc calls, which means
it's subject to the MaxAllocSize (1GB) limit.
We could fix this by allowing huge allocations, but that'd require
changes in many places without much benefit. Larger chunks do not
actually improve performance, so the memory usage would be wasted.
Fixed by limiting the chunk size to not hit MaxAllocSize. Each worker
gets a fair share.
This requires remembering the number of participating workers, in a
place that can be accessed from the callback. Luckily, the bs_worker_id
field in GinBuildState was unused, so repurpose that.
Report by Greg Smith, investigation and fix by me. Backpatched to 18,
where parallel GIN builds were introduced.
Reported-by: Gregory Smith <gregsmithpgsql@gmail.com>
Discussion: https://postgr.es/m/CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com
Backpatch-through: 18
A generated column may end up being part of the partition key
expression if it's specified as an expression, e.g. "(<generated
column name>)", or if the partition key expression contains a whole-row
reference, even though we do not allow a generated column to be part
of a partition key expression. Fix this hole.
Co-authored-by: jian he <jian.universality@gmail.com>
Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CACJufxF%3DWDGthXSAQr9thYUsfx_1_t9E6N8tE3B8EqXcVoVfQw%40mail.gmail.com
It's possible to define BRIN indexes on functions that require a
snapshot to run, but the autosummarization feature introduced by commit
7526e10224 fails to provide one. This causes autovacuum to leave a
BRIN placeholder tuple behind after a failed work-item execution, making
such indexes less efficient. Repair by obtaining a snapshot prior to
running the task, and add a test to verify this behavior.
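The shape of the fix, as a hedged sketch using the standard
snapshot-manager calls (not the literal patch):
    /* provide a snapshot so index expressions can be evaluated */
    PushActiveSnapshot(GetTransactionSnapshot());
    /* ... run the autosummarization work-item ... */
    PopActiveSnapshot();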
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: Giovanni Fabris <giovanni.fabris@icon.it>
Reported-by: Arthur Nascimento <tureba@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/202511031106.h4fwyuyui6fz@alvherre.pgsql
Commit b4f584f9d2 (affecting v15~, later backpatched down to 13 as of
3635a0a35a) introduced an unconditional WAL receiver shutdown when
switching from streaming to archive WAL sources. This causes problems
during a timeline switch, when a WAL receiver enters WALRCV_WAITING
state but remains alive, waiting for instructions.
The unconditional shutdown can break some monitoring scenarios as the
WAL receiver gets repeatedly terminated and re-spawned, causing
pg_stat_wal_receiver.status to show a "streaming" instead of "waiting"
status, masking the fact that the WAL receiver is waiting for a new TLI
and a new LSN to be able to continue streaming.
This commit changes the WAL receiver behavior so that the shutdown becomes
conditional, with InstallXLogFileSegmentActive always being reset to
prevent the regression fixed by b4f584f9d2: only terminate the WAL
receiver when it is actively streaming (WALRCV_STREAMING,
WALRCV_STARTING, or WALRCV_RESTARTING). When in WALRCV_WAITING state,
just reset the InstallXLogFileSegmentActive flag to allow archive
restoration without killing the process. WALRCV_STOPPED and
WALRCV_STOPPING are not reachable states in this code path. For the
latter, the startup process is the one in charge of setting
WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to
reach a WALRCV_STOPPED state after switching walRcvState, so
WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is
in a WALRCV_STOPPING state.
A regression test is added to check that a WAL receiver is not stopped
on a timeline jump; it fails when the fix of this commit is reverted.
Reported-by: Ryan Bird <ryanzxg@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Backpatch-through: 13
Add comments explaining when and where it is safe for nbtree to treat
row compare keys as if they were simple scalar inequality keys on the
row's most significant column. This is particularly important within
_bt_advance_array_keys, which deals with required inequality keys in a
general and uniform way, without any special handling for row compares.
Also spell out the implications of _bt_check_rowcompare's approach of
_conditionally_ evaluating lower-order row compare subkeys, particularly
when one of its lower-order subkeys might see NULL index tuple values
(these may or may not affect whether the qual as a whole is satisfied).
The behavior in this area isn't particularly intuitive, so these issues
seem worth going into.
In passing, add a few more defensive/documenting row comparison related
assertions to _bt_first and _bt_check_rowcompare.
Follow-up to commits bd3f59fd and ec986020.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wznwkak_K7pcAdv9uH8ZfNo8QO7+tHXOaCUddMeTfaCCFw@mail.gmail.com
Backpatch-through: 18
_bt_first reliably uses the same equality key (on each index column) for
initial positioning purposes as the one that _bt_checkkeys can use to
end the scan following commit f09816a0. _bt_first no longer applies its
own independent rules to determine which initial positioning key to use
on each column (for equality and inequality keys alike). Preprocessing
is now fully in control of determining which keys start and end each
scan, ensuring that _bt_first and _bt_checkkeys have symmetric behavior.
Remove obsolete comments that described why _bt_first was expected to
use at least one of the available required equality keys for initial
positioning purposes. The rules in this area are now maximally strict
and uniform, so there's no reason to draw attention to equality keys.
Any column with a required equality key cannot have a redundant required
inequality key (nor can it have a redundant required equality key).
Oversight in commit f09816a0, which removed similar comments from
_bt_first, but missed these comments.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
The C standard says that the second and third arguments of a
conditional operator shall be both void type or both not-void
type. The Windows version of INTERRUPTS_PENDING_CONDITION()
got this wrong. It's pretty harmless because the result of
the operator is ignored anyway, but apparently recent versions
of MSVC have started issuing a warning about it. Silence the
warning by casting the dummy zero to void.
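To illustrate the C rule with a generic example (not the actual macro):
    extern void do_stuff(void);
    /* arms disagree, void vs. int: recent MSVC warns */
    cond ? (void) do_stuff() : 0;
    /* both arms void: no warning */
    cond ? (void) do_stuff() : (void) 0;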
Reported-by: Christian Ullrich <chris@chrullrich.net>
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/cc4ef8db-f8dc-4347-8a22-e7ebf44c0308@chrullrich.net
Backpatch-through: 13
The original messages were confusing in dry-run mode in that they state
that something is being done, when in reality it isn't. Use alternative
wording in that case, to make the distinction clear.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAHut+PsvQJQnQO0KT0S2oegenkvJ8FUuY-QS5syyqmT24R2xFQ@mail.gmail.com
The code updates the system identifier, then runs pg_resetwal; if the
latter fails, it complains about the former, which makes no sense.
Change the error message to complain about the right thing.
Noticed while reviewing a patch touching nearby code.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Backpatch-through: 17
This commit reverts 818fefd8fd, which was introduced to address an
instability in some of the TAP tests due to the presence of random
standby snapshot WAL records, when slots are invalidated by
InvalidatePossiblyObsoleteSlot().
However, that commit also had the consequence of introducing a behavior
regression. After 818fefd8fd, the code may determine that a slot needs
to be invalidated while it may not require one: the slot may have moved
from a conflicting state to a non-conflicting state between the moment
when the mutex is released and the moment when we recheck the slot, in
InvalidatePossiblyObsoleteSlot(). Hence, the invalidations may be more
aggressive than they actually have to be.
105b2cb336 has tackled the test instability in a way that should be
hopefully sufficient for the buildfarm, even for slow members:
- In v18, the test relies on an injection point that bypasses the
creation of the random records generated for standby snapshots,
eliminating the random factor that impacted the test. This option was
not available when 818fefd8fd was discussed.
- In v16 and v17, the problem was bypassed by disallowing a slot to
become active in some of the scenarios tested.
While at it, this commit adds a comment to document that it is fine for
a recheck to use xmin and LSN values stored in the slot, without storing
and reusing them across multiple checks.
Reported-by: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/f492465f-657e-49af-8317-987460cb68b0.mengjuan.cmj@alibaba-inc.com
Backpatch-through: 16
RIGHT_SEMI joins rely on the HEAP_TUPLE_HAS_MATCH flag to guarantee
that only the first match for each inner tuple is considered.
However, in a parallel hash join, the inner relation is stored in a
shared global hash table that can be probed by multiple workers
concurrently. This allows different workers to inspect and set the
match flags of the same inner tuples at the same time.
If two workers probe the same inner tuple concurrently, both may see
the match flag as unset and emit the same tuple, leading to duplicate
output rows and violating RIGHT_SEMI join semantics.
For now, we disable parallel plans for RIGHT_SEMI joins. In the long
term, it may be possible to support parallel execution by performing
atomic operations on the match flag, for example using a CAS or
similar mechanism.
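For illustration only (not executor code): with PostgreSQL's atomic
flag primitives, a first-match-wins probe could look like this, where
match_flag is a hypothetical per-tuple pg_atomic_flag:
    /* only the worker that first sets the flag emits the tuple; the
     * real match bit lives in the tuple header and would need a CAS
     * on the infomask instead */
    if (pg_atomic_test_set_flag(&inner_tuple->match_flag))
        emit_tuple(inner_tuple);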
Backpatch to v18, where RIGHT_SEMI join was introduced.
Bug: #19094
Reported-by: Lori Corbani <Lori.Corbani@jax.org>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19094-6ed410eb5b256abd@postgresql.org
Backpatch-through: 18
Because long is 32-bit on 64-bit Windows, it isn't a good datatype to
store the difference between 2 pointers. The under-sized type could
overflow and lead to scary warnings in MEMORY_CONTEXT_CHECKING builds,
such as:
WARNING: problem in alloc set ExecutorState: bad single-chunk %p in block %p
However, the problem lies only in the code running the check, not from
an actual memory accounting bug.
Fix by using "Size" instead of "long". This means using an unsigned
type rather than the previous signed type. If the block's freeptr was
corrupted, we'd still catch that if the unsigned type wrapped. Unsigned
allows us to avoid further needless complexities around comparing signed
and unsigned types.
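A condensed illustration (variable names assumed):
    char   *start = block->data;
    char   *end = block->freeptr;
    long    bad = end - start;      /* long is 32-bit on 64-bit Windows */
    Size    good = end - start;     /* Size matches pointer width everywhere */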
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Discussion: https://postgr.es/m/CAApHDvo-RmiT4s33J=aC9C_-wPZjOXQ232V-EZFgKftSsNRi4w@mail.gmail.com
CRLF translation is already handled by the text mode, so there should be
no need for any specific logic. See also 1c6d462939, msys perl being
one case where the translation mattered.
Note: An equivalent change was first applied on HEAD as 8767b449a3.
The same change is backpatched to v18 as all the Windows buildfarm
members have reported green.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/aPsh39bxwYKvUlAf@paquier.xyz
Backpatch-through: 18
Some recent changes were made to remove the explicit dependency on
btree indexes in some parts of the code. One of these changes was
made in commit 9ef1851685, which allows non-btree indexes to be used
in get_actual_variable_range(). A follow-up commit ee1ae8b99f fixes
the cases where an index doesn't have a sortopfamily as this is a
prerequisite to be used in get_actual_variable_range().
However, it was found that indexes that have amcanorder = true but do
not allow index-only-scans (amcanreturn returns false or is NULL) will
pass all of the conditions, while they should be rejected since
get_actual_variable_range() uses the index-only-scan machinery in
get_actual_variable_endpoint(). Such an index might cause errors like
ERROR: no data returned for index-only scan
during query planning.
The fix is to add a check in get_actual_variable_range() to reject
indexes that do not allow index-only scans.
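Conceptually, the added check looks something like this (field names
assumed; a sketch, not the committed code):
    /* reject indexes whose AM cannot return index tuples, since
     * get_actual_variable_endpoint() relies on index-only scans */
    if (!index->canreturn[indexcol])
        continue;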
Author: Maxime Schoemans <maxime.schoemans@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/20ED852A-C2D9-41EB-8671-8C8B9D418BE9%40enterprisedb.com
Previously, the check_hook for synchronized_standby_slots attempted to
validate that each specified slot existed and was physical. However, these
checks were not performed during server startup. As a result, if users
configured non-existent slots before startup, the misconfiguration would
go undetected initially. This could later cause parallel query failures,
as newly launched workers would detect the issue and raise an ERROR.
This patch improves the check_hook by validating the syntax and format of
slot names. Validation of slot existence and type is deferred to the WAL
sender process, aligning with the behavior of the check_hook for
primary_slot_name.
Reported-by: Fabrice Chapuis <fabrice636861@gmail.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAA5-nLCeO4MQzWipCXH58qf0arruiw0OeUc1+Q=Z=4GM+=v1NQ@mail.gmail.com
When dealing with ResultRelInfos for partitions, there are cases with
mixed requirements for ri_RootResultRelInfo: in the same query, a
partition may require a ResultRelInfo with a NULL ri_RootResultRelInfo
as well as one with its parent set in ri_RootResultRelInfo. This could
cause the column mapping between the partitioned table and the
partition not to be done, which could result in crashes if the column
attnums didn't match exactly.
The fix is simple. We now check that the ri_RootResultRelInfo matches
what the caller passed to ExecGetTriggerResultRel() and only return a
cached ResultRelInfo when the ri_RootResultRelInfo matches what the
caller wants, otherwise we'll make a new one.
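A hedged sketch of the adjusted lookup (names assumed):
    /* reuse a cached ResultRelInfo only if its root matches what the
     * caller asked for; otherwise build a fresh one */
    if (rInfo->ri_RelationDesc == rel &&
        rInfo->ri_RootResultRelInfo == rootRelInfo)
        return rInfo;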
Author: David Rowley <dgrowleyml@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reported-by: Dmitry Fomin <fomin.list@gmail.com>
Discussion: https://postgr.es/m/7DCE78D7-0520-4207-822B-92F60AEA14B4@gmail.com
Backpatch-through: 15
These two functions expect there to be room to insert another item
in the FreePageBtree's array, but their assertions were too weak
to guarantee that. This has little practical effect granting that
the callers are not buggy, but it seems to be misleading late-model
Coverity into complaining about possible array overrun.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/799984.1761150474@sss.pgh.pa.us
Backpatch-through: 13
Commit c6f7f11d8 intended to prevent leaking any PyObject reference
counts in edge cases (such as out-of-memory during string
construction), but actually it introduced a leak in the normal case.
Repeating an error-trapping operation often enough would lead to
session-lifespan memory bloat. The problem is that I failed to
think about the fact that PyObject_GetAttrString() increments the
refcount of the returned PyObject, so that simply walking down the
list of error frame objects causes all but the first one to have
their refcount incremented.
I experimented with several more-or-less-complex ways around that,
and eventually concluded that the right fix is simply to drop the
newly-obtained refcount as soon as we walk to the next frame
object in PLy_traceback. This sounds unsafe, but it's perfectly
okay because the caller holds a refcount on the first frame object
and each frame object holds a refcount on the next one; so the
current frame object can't disappear underneath us.
By the same token, we can simplify the caller's cleanup back to
simply dropping its refcount on the first object. Cleanup of
each frame object will lead in turn to the refcount of the next
one going to zero.
I also added a couple of comments explaining why PLy_elog_impl()
doesn't try to free the strings acquired from PLy_get_spi_error_data()
or PLy_get_error_data(). That's because I got here by looking at a
Coverity complaint about how those strings might get leaked. They
are not leaked, but in testing that I discovered this other leak.
Back-patch, as c6f7f11d8 was. It's a bit nervous-making to be
putting such a fix into v13, which is only a couple weeks from its
final release; but I can't see that leaving a recently-introduced
leak in place is a better idea.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1203918.1761184159@sss.pgh.pa.us
Backpatch-through: 13
Commit 883a95646a introduced overflow entries in the replication lag tracker
to fix an issue where lag columns in pg_stat_replication could stall when
the replay LSN stopped advancing.
This commit adds comments clarifying the purpose and behavior of overflow
entries to improve code readability and understanding.
Since commit 883a95646a was recently applied and backpatched to all
supported branches, this follow-up commit is also backpatched accordingly.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VxqQA_DePxyZ7Y8V+ErYyXkmwJ1P6NC+YC+cvxMipWKw@mail.gmail.com
Backpatch-through: 13
When JIT deformed tuples (controlled via the jit_tuple_deforming GUC),
types narrower than sizeof(Datum) would be zero-extended up to Datum
width. This wasn't the same as what fetch_att() does in the standard
tuple deforming code. Logically the values are the same when fetching
via the DatumGet*() macros, but negative numbers are not the same in
binary form.
In the report, the problem was manifesting itself with:
ERROR: could not find memoization table entry
in a query which had a "Cache Mode: binary" Memoize node. However, it's
currently unclear what else is affected. Anything that uses
datum_image_eq() or datum_image_hash() on a Datum from a tuple deformed by
JIT could be affected, but it may not be limited to that.
The fix for this is simple: use sign extension instead of zero
extension.
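The difference matters for the binary image of the Datum, as this
illustration shows:
    int16   v = -1;                     /* bit pattern 0xFFFF */
    Datum   zext = (Datum) (uint16) v;  /* zero-extended: 0x000000000000FFFF */
    Datum   sext = (Datum) (int64) v;   /* sign-extended: 0xFFFFFFFFFFFFFFFF */
    /* DatumGetInt16() yields -1 for both, but datum_image_eq() and
     * datum_image_hash() see two different values */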
Many thanks to Emmanuel Touzery for reporting this issue and providing
steps and a backup which allowed the problem to be easily recreated.
Reported-by: Emmanuel Touzery <emmanuel.touzery@plandela.si>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/DB8P194MB08532256D5BAF894F241C06393F3A@DB8P194MB0853.EURP194.PROD.OUTLOOK.COM
Backpatch-through: 13
One code path forgot to free the separately-malloc'd filename
part of a struct rfile. Another place freed the filename but
forgot the struct rfile itself. These seem worth fixing because
with a large backup we could be dealing with many files.
Coverity found the bug in make_rfile(). I found the other one
by manual inspection.
Previously, if primary_slot_name was set to an invalid slot name and
the configuration file was reloaded, both the postmaster and all other
backend processes reported a WARNING. With many processes running,
this could produce a flood of duplicate messages. The problem was that
the GUC check hook for primary_slot_name reported errors at WARNING
level via ereport().
This commit changes the check hook to use GUC_check_errdetail() and
GUC_check_errhint() for error reporting. As with other GUC parameters,
this causes non-postmaster processes to log the message at DEBUG3,
so by default, only the postmaster's message appears in the log file.
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com
Backpatch-through: 13
Previously, when the replay LSN reported in feedback messages from a standby
stopped advancing, for example, due to a recovery conflict, the write_lag and
flush_lag columns in pg_stat_replication would initially update but then stop
progressing. This prevented users from correctly monitoring replication lag.
The problem occurred because when any LSN stopped updating, the lag tracker's
cyclic buffer became full (the write head reached the slowest read head).
In that state, the lag tracker could no longer compute round-trip lag values
correctly.
This commit fixes the issue by handling the slowest read entry (the one
causing the buffer to fill) as a separate overflow entry and freeing space
so the write and other read heads can continue advancing in the buffer.
As a result, write_lag and flush_lag now continue updating even if the reported
replay LSN remains stalled.
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGdGQ=1-X-71Caee-LREBUXSzyohkoQJd4yZZCMt24C0g@mail.gmail.com
Backpatch-through: 13
A BlockNumber (32-bit) might not be large enough to hold the result of
adding bo_pagesPerRange when the table contains close to 2^32 pages. At
worst, this could
result in a cancellable infinite loop during the BRIN index scan with
power-of-2 pagesPerRange, and slow (inefficient) BRIN index scans and
scanning of unneeded heap blocks for non power-of-2 pagesPerRange.
Backpatch to all supported versions.
Author: sunil s <sunilfeb26@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOG6S4-tGksTQhVzJM19NzLYAHusXsK2HmADPZzGQcfZABsvpA@mail.gmail.com
Backpatch-through: 13
The comment fixed in this commit described the function as dealing with
database blocks, but in reality it processes shared memory allocations.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aH4DDhdiG9Gi0rG7@ip-10-97-1-34.eu-west-3.compute.internal
Backpatch-through: 18
67a54b9e8 taught the planner to push down HAVING clauses even when
grouping sets are present, as long as the clause does not reference
any columns that are nullable by the grouping sets. However, there
was an oversight: if any empty grouping sets are present, the
aggregation node can produce a row that did not come from the input,
and pushing down a HAVING clause in this case may cause us to fail to
filter out that row.
Currently, non-degenerate HAVING clauses are not pushed down when
empty grouping sets are present, since the empty grouping sets would
nullify the vars they reference. However, degenerate (variable-free)
HAVING clauses are not subject to this restriction and may be
incorrectly pushed down.
To fix, explicitly check for the presence of empty grouping sets and
retain degenerate clauses in HAVING when they are present. This
ensures that we don't emit a bogus aggregated row. A copy of each
such clause is also put in WHERE so that query_planner() can use it in
a gating Result node.
To facilitate this check, this patch expands the groupingSets tree of
the query to a flat list of grouping sets before applying the HAVING
pushdown optimization. This does not add any additional planning
overhead, since we need to do this expansion anyway.
In passing, make a small tweak to preprocess_grouping_sets() by
reordering its initial operations a bit.
Backpatch to v18, where this issue was introduced.
Reported-by: Yuhang Qiu <iamqyh@gmail.com>
Author: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/0879D9C9-7FE2-4A20-9593-B23F7A0B5290@gmail.com
Backpatch-through: 18
pgwin32_unsetenv() (compatibility routine of unsetenv() on Windows)
lacks the input validation that its sibling pgwin32_setenv() has.
Without these checks, calling unsetenv() with incorrect names crashes on
WIN32. However, invalid names should be handled, failing on EINVAL.
This commit adds the same checks as setenv() to fail with EINVAL for a
"name" set to NULL, an empty string, or if '=' is included in the name,
per POSIX requirements.
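The added validation amounts to the following checks (a sketch):
    if (name == NULL || name[0] == '\0' || strchr(name, '=') != NULL)
    {
        errno = EINVAL;
        return -1;
    }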
Like 7ca37fb040, backpatch down to v14. pgwin32_unsetenv() is defined
on REL_13_STABLE, but with the branch going EOL soon and the lack of
setenv() there for WIN32, nothing is done for v13.
Author: Bryan Green <dbryan.green@gmail.com>
Discussion: https://postgr.es/m/b6a1e52b-d808-4df7-87f7-2ff48d15003e@gmail.com
Backpatch-through: 14
The revised logic in 001_ssltests.pl would fail if openssl
doesn't work or if Perl is a 32-bit build, because it had
already overwritten $serialno with something inappropriate
to use in the eventual match. We could go back to the
previous code layout, but it seems best to introduce a
separate variable for the output of openssl.
Per failure on buildfarm member mamba, which has a 32-bit Perl.
It emerges that zlib's configuration logic is not robust enough
to guarantee that the macro will have the same ideas about struct
field layout as the library itself does, leading to corruption of
zlib's state struct followed by unintelligible failure messages.
This hazard has existed for a long time, but we'd not noticed
for several reasons:
(1) We only use gzgetc() when trying to read a manually-compressed
TOC file within a directory-format dump, which is a rarely-used
scenario that we weren't even testing before 20ec99589.
(2) No corruption actually occurs unless sizeof(long) is different
from sizeof(off_t) and the platform is big-endian.
(3) Some platforms have already fixed the configuration instability,
at least sufficiently for their environments.
Despite (3), it seems foolish to assume that the problem isn't
going to be present in some environments for a long time to come.
Hence, avoid relying on this macro. We can just #undef it and
fall back on the underlying function of the same name.
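The workaround is mechanical (a sketch; callers keep using gzgetc() as
before):
    #ifdef gzgetc
    #undef gzgetc               /* drop zlib's struct-peeking macro... */
    #endif
    /* ...so gzgetc(gz) now resolves to the function exported by the
     * library, which cannot disagree with itself about struct layout */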
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2122679.1760846783@sss.pgh.pa.us
Backpatch-through: 13
It is possible to have a non-inherited not-null constraint on an
inherited column, but we were failing to preserve such constraints
during pg_upgrade where the source is 17 or older, because of a bug in
the pg_dump query for it. Oversight in commit 14e87ffa5c. Fix that
query. In passing, touch-up a bogus nearby comment introduced by the
same commit.
In version 17, make the regression tests leave a table in this
situation, so that this scenario is tested in the cross-version upgrade
tests of 18 and up.
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reported-by: Andrew Bille <andrewbille@gmail.com>
Bug: #19074
Backpatch-through: 18
Discussion: https://postgr.es/m/19074-ae2548458cf0195c@postgresql.org
This fixes an unlikely issue when fetching GROUPING SET results from
their internally stored hash tables. It was possible in rare cases that
the hash iterator would be set up incorrectly which could result in a
crash.
This was introduced in 4d143509c, so backpatch to v18.
Many thanks to Yuri Zamyatin for reporting and helping to debug this
issue.
Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18
Commit a1b4f289be improved the hashjoin sizing to also consider the
memory used by BufFiles for batches. The code however had multiple
issues, making it ineffective or not working as expected in some cases.
* The amount of memory needed by buffers was calculated using uint32,
so it would overflow for nbatch >= 262144. If this happened the loop
would exit prematurely and the memory usage would not be reduced.
The nbatch overflow is fixed by reworking the condition to not use a
multiplication at all, so there's no risk of overflow; see the sketch
after this list. An explicit cast was added to a similar calculation in
ExecHashIncreaseBatchSize.
* The loop adjusting the nbatch value used hash_table_bytes to calculate
the old/new size, but then updated only space_allowed. The consequence
is the total memory usage was not reduced, but all the memory saved by
reducing the number of batches was used for the internal hash table.
This was fixed by using only space_allowed. This is also more correct,
because hash_table_bytes does not account for skew buckets.
* The code was also doubling multiple parameters (e.g. the number of
buckets for hash table), but was missing overflow protections.
The loop now checks for overflow, and terminates if needed. It'd be
possible to cap the value and continue the loop, but it's not worth
the complexity. And the overflow implies the in-memory hash table is
already very large anyway.
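As an illustration of the first fix, a limit check written with a
multiplication can be rewritten without one, removing the overflow risk
entirely (names assumed; not the committed code):
    /* can overflow when computed in 32 bits: */
    bool over_bad = (uint32) nbatch * buffer_size > memory_limit;
    /* same test under integer division, no multiplication: */
    bool over_good = (size_t) nbatch > memory_limit / buffer_size;
For positive integers the two conditions are equivalent:
nbatch * buffer_size > limit exactly when nbatch > limit / buffer_size.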
While at it, rework the comment explaining how the memory balancing
works, to make it more concise and easier to understand.
The initial nbatch overflow issue was reported by Vaibhav Jain. The
other issues were noticed by me and Melanie Plageman. Fix by me, with a
lot of review and feedback by Melanie.
Backpatch to 18, where the hashjoin memory balancing was introduced.
Reported-by: Vaibhav Jain <jainva@google.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CABa-Az174YvfFq7rLS+VNKaQyg7inA2exvPWmPWqnEn6Ditr_Q@mail.gmail.com
The SSL tests for pg_stat_ssl try to exactly match the serial
from the certificate by extracting it with the openssl binary.
If that fails due to the binary not being available, a fallback
match is used, but the attempt to execute a missing binary adds
a warning to the output which can be mistaken for a failure in
the test. Fix by only attempting the extraction if the openssl
binary was found by autoconf/meson.
Backpatch down to v16 where commit c8e4030d1b made the test
use the OPENSSL variable from autoconf/meson instead of a hard-
coded value.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aNPSp1-RIAs3skZm@msg.df7cb.de
Backpatch-through: 16
042_low_level_backup compared the result of a query two times using an
integer comparison operator, while the result should have been compared
as a string.
The outcome of the tests is currently not impacted by this change.
However, it could be possible that the tests fail to detect future
issues if the query results become different, for some reason.
Oversight in 99b4a63bef.
Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Backpatch-through: 17
040_pg_createsubscriber has been calling safe_psql(), which returns the
result of a SQL query, with ok() without checking the result generated
(in this case 't', for a number of publications).
The outcome of the tests is currently not impacted by this change.
However, it could be possible that the test fails to detect future
issues if the query results become different.
The test is rewritten so that the number of publications is checked. This
is not the fix suggested originally by the author, but this is more
reliable in the long run.
Oversight in e5aeed4b80.
Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Backpatch-through: 18
The original formulation failed to take into account the fact that for
the PGXS case, the source dir is not $(top_srcdir), so it ended up not
doing anything. Handle it explicitly.
Author: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Bryan Green <dbryan.green@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/TYCPR01MB113164770FB0B0BE6ED21E68EE8DCA@TYCPR01MB11316.jpnprd01.prod.outlook.com
EvalPlanQualStart() failed to propagate es_partition_directory into
the child EState used for EPQ rechecks. When execution time partition
pruning ran during the EPQ scan, executor code dereferenced a NULL
partition directory and crashed.
Previously, propagating es_partition_directory into the EPQ EState was
unnecessary because CreatePartitionPruneState(), which sets it on
demand, also initialized the exec-pruning context. After commit
d47cbf474, CreatePartitionPruneState() now initializes only the init-
time pruning context, leaving exec-pruning context initialization to
ExecInitNode(). Since EvalPlanQualStart() runs only ExecInitNode() and
not CreatePartitionPruneState(), it can encounter a NULL
es_partition_directory. Other executor fields initialized during
CreatePartitionPruneState() are already copied into the child EState
thanks to commit 8741e48e5d, but es_partition_directory was missed.
Fix by borrowing the parent estate's es_partition_directory in
EvalPlanQualStart(), and by clearing that field in EvalPlanQualEnd()
so the parent remains responsible for freeing the directory.
Add an isolation test permutation that triggers EPQ with execution-
time partition pruning, the case that reproduces this crash.
Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18
Commit c8af5019be added a forward declaration for this typedef that
caused redefinitions, which are not valid in C99.
Per buildfarm members longfin and sifaka.
Discussion: https://postgr.es/m/aO_fzfnKVXMd_RUM%40nathan
Backpatch-through: 18 only
Presently, these functions look up the relation's OID, lock it, and
then check privileges. Not only does this approach provide no
guarantee that the locked relation matches the arguments of the
lookup, but it also allows users to briefly lock relations for
which they do not have privileges, which might enable
denial-of-service attacks. This commit adjusts these functions to
use RangeVarGetRelidExtended(), which is purpose-built to avoid
both of these issues. The new RangeVarGetRelidCallback function is
somewhat complicated because it must handle both tables and
indexes, and for indexes, we must check privileges on the parent
table and lock it first. Also, it needs to handle a couple of
extremely unlikely race conditions involving concurrent OID reuse.
A downside of this change is that the coding doesn't allow for
locking indexes in AccessShare mode anymore; everything is locked
in ShareUpdateExclusive mode. Per discussion, the original choice
of lock levels was intended for a now defunct implementation that
used in-place updates, so we believe this change is okay.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan
Backpatch-through: 18
If inside an EPQ recheck, ExecScanFetch would run the recheck method
function for foreign/custom joins even if they aren't descendant nodes
in the EPQ recheck plan tree. This is problematic at least in the
foreign-join case, because such a foreign join isn't guaranteed to have
an alternative local-join plan required for running the recheck method
function; in the postgres_fdw case this could lead to a segmentation
fault or an assert failure in an assert-enabled build when running the
recheck method function.
Even inside an EPQ recheck, any scan nodes that aren't descendant
ones in the EPQ recheck plan tree should be processed normally by using
the access method function. Fix by modifying ExecScanFetch so that if
inside an EPQ recheck, it runs the recheck method function for
foreign/custom joins that are descendant nodes in the EPQ recheck plan
tree as before, and runs the access method function for foreign/custom
joins that aren't.
This fix also adds to postgres_fdw an isolation test for an EPQ recheck
that caused the issues stated above.
Oversight in commit 385f337c9.
Reported-by: Kristian Lejao <kristianlejao@gmail.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com
Backpatch-through: 13
The version number calculated by read_pg_version_file() is multiplied
once by 10000, to be able to do comparisons based on PG_VERSION_NUM or
equivalents with a minor version included. However, the version number
given to sync_pgdata() was multiplied by 10000 a second time, leading to an
overestimated number.
This issue was harmless (still incorrect) as pg_combinebackup does not
support versions of Postgres older than v10, and sync_pgdata() only
includes a version check due to the rename of pg_xlog/ to pg_wal/. This
folder rename happened in the development cycle of v10. This would
become a problem if in the future sync_pgdata() is changed to have more
version-specific checks.
Oversight in dc21234005, so backpatch down to v17.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aOil5d0y87ZM_wsZ@paquier.xyz
Backpatch-through: 17
log_error() would probably fail completely if used, and would
certainly print garbage for anything that needed to be interpolated
into the message, because it was failing to use the correct printing
subroutine for a va_list argument.
This bug likely went undetected because the error cases this code
is used for are rarely exercised - they only occur when Windows
security API calls fail catastrophically (out of memory, security
subsystem corruption, etc).
The FRONTEND variant can be fixed just by calling vfprintf()
instead of fprintf(). However, there was no va_list variant
of write_stderr(), so create one by refactoring that function.
Following the usual naming convention for such things, call
it vwrite_stderr().
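A hedged sketch of the refactoring (simplified; the real write_stderr()
has extra logic, e.g. for the Windows event log):
    #include <stdarg.h>
    #include <stdio.h>
    static void
    vwrite_stderr(const char *fmt, va_list ap)
    {
        vfprintf(stderr, fmt, ap);      /* va_list-aware printing */
    }
    void
    write_stderr(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vwrite_stderr(fmt, ap);
        va_end(ap);
    }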
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAF+pBj8goe4fRmZ0V3Cs6eyWzYLvK+HvFLYEYWG=TzaM+tWPnw@mail.gmail.com
Backpatch-through: 13
pg_dump expects a read request of zero bytes to be a no-op; see for
example ReadStr(). Gzip_read got this wrong and falsely supposed
that the resulting gzret == 0 indicated an error. We could complicate
that error-checking logic some more, but it seems best to just fall
out immediately when passed size == 0.
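A sketch of the short-circuit (names assumed):
    /* treat a zero-byte read as a successful no-op; otherwise gzread()
     * returning 0 would be mistaken for an error */
    if (size == 0)
    {
        *rsize = 0;
        return true;
    }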
This bug breaks the nominally-supported case of manually gzip'ing
the toc.dat file within a directory-style dump, so back-patch to v16
where this code came in. (Prior branches already have a short-circuit
for size == 0 before their only gzread call.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 16
Remove a variable that is no longer in use following commit 9a2e2a28.
It's not immediately clear why there were no compiler warnings about
this oversight.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
In commit a45c78e32 I removed the only regression test case that
reaches this function, because it turns out that we only use it
if reading an LZ4-compressed blobs.toc file in a directory dump,
and that is a state that has to be created manually. That seems
like a bad thing to not test, not so much for LZ4Stream_gets()
itself as because it means the squirrelly eol_flag logic in
LZ4Stream_read_internal() is not tested.
The reason for the change was that I thought the lz4 program did not
have any way to perform compression without explicit specification
of the output file name. However, it turns out that the syntax
synopsis in its man page is a lie, and if you read enough of the
man page you find out that with "-m" it will do what's needful.
So restore the manual compression step in that test case.
Noted while testing some proposed changes in pg_dump's compression
logic.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 17
Commit 71f4c8c6f7 (which implemented DETACH CONCURRENTLY) added code
to create a separate table constraint when a table is detached
concurrently, identical to the partition constraint, on the theory that
such a constraint was needed in case the optimizer had constructed any
query plans that depended on the constraint being there. However, that
theory was apparently bogus because any such plans would be invalidated.
For hash partitioning, those constraints are problematic, because their
expressions reference the OID of the parent partitioned table, to which
the detached table is no longer related; this causes all sorts of
problems (such as inability of restoring a pg_dump of that table, and
the table no longer working properly if the partitioned table is later
dropped).
We'd like to get rid of all those constraints. In fact, for branch
master, do that -- no longer create any substitute constraints.
However, out of fear that some users might somehow depend on these
constraints for other partitioning strategies, for stable branches
(back to 14, which added DETACH CONCURRENTLY), only do it for hash
partitioning.
(If you repeatedly DETACH CONCURRENTLY and then ATTACH a partition, then
with this constraint addition you don't need to scan the table in the
ATTACH step, which presumably is good. But if users really valued this
feature, they would have requested that it worked for non-concurrent
DETACH also.)
Author: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reported-by: Fei Changhong <feichanghong@qq.com>
Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/18371-7fef49f63de13f02@postgresql.org
Discussion: https://postgr.es/m/19070-781326347ade7c57@postgresql.org
Introduced by my (Álvaro's) commit 9e4f914b5e, which was itself
backpatched to pg10, though only pg15 and up contain the problem
because of commit 9c08aea6a3.
This isn't a particularly significant leak, but given the fix is
trivial, we might as well backpatch to all branches where it applies, so
do that.
Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/x4odfdlrwvsjawscnqsqjpofvauxslw7b4oyvxgt5owoyf4ysn@heafjusodrz7
An assertion in _bt_killitems expected the scan's currPos state to
contain a valid LSN, saved from when currPos's page was initially read.
The assertion failed to account for the fact that even logged relations
can have leaf pages with an invalid LSN when built with wal_level set to
"minimal". Remove the faulty assertion.
Oversight in commit e6eed40e (though note that the assertion was
backpatched to stable branches before 18 by commit 7c319f54).
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Matthijs van der Vleuten <postgresql@zr40.nl>
Bug: #19082
Discussion: https://postgr.es/m/19082-628e62160dbbc1c1@postgresql.org
Backpatch-through: 13
An error happening while a slot data is saved on disk in
SaveSlotToPath() could cause a state.tmp file (temporary file holding
the slot state data, renamed to its permanent name at the end of the
function) to remain around after it has been created. This temporary
file is created with O_EXCL, meaning that if an existing state.tmp is
found, its creation would fail. This would prevent the slot data from
being saved, requiring manual intervention to remove state.tmp before
being able to save a slot again. Possible scenarios where this
temporary file could remain on disk include an ENOSPC case (no disk
space) while writing, syncing, or renaming it. The bug reports point to
a write
failure as the principal cause of the problems.
Using O_TRUNC has been argued back in 2019 as a potential solution to
discard any temporary file that could exist. This solution was rejected
as O_EXCL can also act as a safety measure when saving the slot state,
crash recovery offering cleanup guarantees post-crash. This commit uses
the alternative approach that has been suggested by Andres Freund back
in 2019. When the temporary state file cannot be written, synced,
closed or renamed (note: not when created!), an unlink() is used to
remove the temporary state file while holding the in-progress I/O
LWLock, so as any follow-up attempts to save a slot's data would not
choke on an existing file that remained around because of a previous
failure.
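In outline, each error path now looks like this (a hedged sketch, not
the literal patch):
    if (write(fd, buf.data, buf.len) != buf.len)
    {
        int     save_errno = errno;
        CloseTransientFile(fd);
        unlink(tmppath);        /* don't leave state.tmp behind */
        LWLockRelease(&slot->io_in_progress_lock);
        errno = save_errno;
        ereport(elevel,
                (errcode_for_file_access(),
                 errmsg("could not write file \"%s\": %m", tmppath)));
        return;
    }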
This problem has been reported a few times across the years, going back
to 2019, but for some reason I have never come back to do something
about it and it has been forgotten. A recent report has reminded me
that this was still a problem.
Reported-by: Kevin K Biju <kevinkbiju@gmail.com>
Reported-by: Sergei Kornilov <sk@zsrv.org>
Reported-by: Grigory Smolkin <g.smolkin@postgrespro.ru>
Discussion: https://postgr.es/m/CAM45KeHa32soKL_G8Vk38CWvTBeOOXcsxAPAs7Jt7yPRf2mbVA@mail.gmail.com
Discussion: https://postgr.es/m/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net
Discussion: https://postgr.es/m/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru
Backpatch-through: 13
While pgoutput caches relation synchronization information in
RelationSyncCache that resides in CacheMemoryContext, each entry's
information (such as row filter expressions and column lists) is
stored in the entry's private memory context (entry_cxt in
RelationSyncEntry), which is a descendant memory context of the
decoding context. If a logical decoding invoked via SQL functions like
pg_logical_slot_get_binary_changes fails with an error, subsequent
logical decoding executions could access already-freed memory of the
entry's cache, resulting in a crash.
With this change, it's ensured that RelationSyncCache is cleaned up
even in error cases by using a memory context reset callback function.
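A hedged sketch of the mechanism (callback name hypothetical):
    MemoryContextCallback *cb =
        MemoryContextAlloc(ctx->context, sizeof(MemoryContextCallback));
    cb->func = rel_sync_cache_cleanup;  /* hypothetical cleanup routine */
    cb->arg = NULL;
    /* runs when the decoding context is reset or deleted, even on error */
    MemoryContextRegisterResetCallback(ctx->context, cb);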
Backpatch to 15, where entry_cxt was introduced for column filtering
and row filtering.
While the backbranches v13 and v14 have a similar issue where
RelationSyncCache persists even after an error when pgoutput is used
via SQL API, we decided not to backport this fix. This decision was
made because v13 is approaching its final minor release, and we won't
have an chance to fix any new issues that might arise. Additionally,
since using pgoutput via SQL API is not a common use case, the risk
outwights the benefit. If we receive bug reports, we can consider
backporting the fixes then.
Author: vignesh C <vignesh21@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CALDaNm0x-aCehgt8Bevs2cm=uhmwS28MvbYq1=s2Ekf0aDPkOA@mail.gmail.com
Backpatch-through: 15
SQL/JSON functions such as JSON_VALUE could fail with "unrecognized
node type" errors when a DEFAULT clause contained an explicit COLLATE
expression. That happened because assign_collations_walker() could
invoke exprSetCollation() on a JsonBehavior expression whose DEFAULT
still contained a CollateExpr, which exprSetCollation() does not
handle.
For example:
SELECT JSON_VALUE('{"a":1}', '$.c' RETURNING text
DEFAULT 'A' COLLATE "C" ON EMPTY);
Fix by validating in transformJsonBehavior() that the DEFAULT
expression's collation matches the enclosing JSON expression's
collation. In exprSetCollation(), replace the recursive call on the
JsonBehavior expression with an assertion that its collation already
matches the target, since the parser now enforces that condition.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxHVwYYSyiVQ6o+PsRX6zQ7rAFinh_fv1kCfTsT1xG4Zeg@mail.gmail.com
Backpatch-through: 17
On Windows, this code did not handle error conditions correctly at
all, since it looked at "errno" which is not used for socket-related
errors on that platform. This resulted, for example, in failure
to connect to a PostgreSQL server with GSSAPI enabled.
We have a convention for dealing with this within libpq, which is to
use SOCK_ERRNO and SOCK_ERRNO_SET rather than touching errno directly;
but the GSSAPI code is a relative latecomer and did not get that memo.
(The equivalent backend code continues to use errno, because the
backend does this differently. Maybe libpq's approach should be
rethought someday.)
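For example, a retry loop on the libpq side follows this pattern
(sketch only; SOCK_ERRNO maps to WSAGetLastError() on Windows and to
plain errno elsewhere):
    do
        n = recv(conn->sock, buf, len, 0);
    while (n < 0 && SOCK_ERRNO == EINTR);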
Apparently nobody tries to build libpq with GSSAPI support on Windows,
or we'd have heard about this before, because it's been broken all
along. Back-patch to all supported branches.
Author: Ning Wu <ning94803@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFGqpvg-pRw=cdsUpKYfwY6D3d-m9tw8WMcAEE7HHWfm-oYWvw@mail.gmail.com
Backpatch-through: 13
In a similar vein to commit ccc8194e42, a reset instance of a shared
memory TID store happened to occupy the same private memory as the old
one for the entry point, since the chunk freed after the last round
of index vacuuming was put on the context's freelist. The failure
to update the vacrel->dead_items pointer was evident by nudging the
system to allocate memory in a different area. This was not discovered
at the time of the earlier commit since our regression tests didn't
cover multiple index passes with parallel vacuum.
Backpatch to v17, when TidStore came in.
Author: Kevin Oommen Anish <kevin.o@zohocorp.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Tested-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/199a07cbdfc.7a1c4aac25838.1675074408277594551%40zohocorp.com
Backpatch-through: 17
Currently, pgbench aborts when a COPY response is received in
readCommandResponse(). However, since PQgetResult() returns an empty
result (via getCopyResult()) when there is no asynchronous result, the
error-path logic at the end of readCommandResponse() leads to an
infinite loop.
This commit forcefully exits the COPY state with PQendcopy() before
moving to the error handler when finding a COPY state, avoiding the
infinite loop. The COPY protocol is not supported by pgbench anyway, as
an error is assumed in this case, so giving up is better than having the
tool be stuck forever. pgbench was not interruptible in this state.
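A rough sketch of the control flow (simplified; the error label and
variable names are assumed, not the exact patch):

    case PGRES_COPY_IN:
    case PGRES_COPY_OUT:
    case PGRES_COPY_BOTH:
        /* leave COPY state so the PQgetResult() loop can terminate */
        PQendcopy(st->con);
        goto error;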
A TAP test is added to check that an error happens if trying to use
COPY.
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_XqpHyF2m73ifV5a=5jhXxH2chk=XrgefY+eWWPe2Eft3=A@mail.gmail.com
Backpatch-through: 13
Some macOS machines are having trouble with 002_inline, which executes
the JSON parser test executables hundreds of times in a nested loop.
Both developer machines and buildfarm critters have shown excessive test
durations, upwards of 20 seconds.
Push the innermost loop of 002_inline, which iterates through differing
chunk sizes, down into the test executable. (I'd eventually like to push
all of the JSON unit tests down into C, but this is an easy win in the
short term.) Testers have reported speedups of 4x to 9x.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Tested-by: Andrew Dunstan <andrew@dunslane.net>
Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BTgmobKoG%2BgKzH9qB7uE4MFo-z1hn7UngqAe9b0UqNbn3_XGQ%40mail.gmail.com
Backpatch-through: 17
pgbench uses readCommandResponse() to process server responses.
When readCommandResponse() encounters an error during a call to
PQgetResult() to fetch the current result, it attempts to report it
with an additional error message from PQerrorMessage(). However,
previously, this extra error message could be lost or become incorrect.
The cause was that after fetching the current result (and detecting
an error), readCommandResponse() called PQgetResult() again to
peek at the next result. This second call could overwrite the libpq
connection's error message before the original error was reported,
causing the error message retrieved from PQerrorMessage() to be
lost or overwritten.
This commit fixes the issue by updating readCommandResponse()
to use PQresultErrorMessage() instead of PQerrorMessage()
to retrieve the error message generated when the PQgetResult()
for the current result causes an error, ensuring the correct message
is reported.
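A sketch of the idea (simplified):

    PGresult *res = PQgetResult(st->con);

    if (res != NULL && PQresultStatus(res) == PGRES_FATAL_ERROR)
    {
        /* the message tied to the result survives later PQgetResult() calls */
        pg_log_error("%s", PQresultErrorMessage(res));
    }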
Backpatch to all supported versions.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20250925110940.ebacc31725758ec47d5432c6@sraoss.co.jp
Backpatch-through: 13
Contrary to its siblings for the archiver, the bgwriter and the
checkpointer stats, pgstat_report_inj_fixed() can be called
concurrently. This was causing an assertion failure, while messing up
the stats.
This code is aimed at being a template for extension developers, so it
is not a critical issue, but let's be correct. This module has also
been useful for some benchmarking, at least for me, and that is how I
discovered this issue.
Oversight in f68cd847fa.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/aNnXbAXHPFUWPIz2@paquier.xyz
Backpatch-through: 18
Neighbor get_statistics_object_oid() ignores objects in pg_temp, as has
been the standard for non-relation, non-type namespace searches since
CVE-2007-2138. Hence, most operations that name a statistics object
correctly decline to map an unqualified name to a statistics object in
pg_temp. StatisticsObjIsVisibleExt() did not. Consequently,
pg_statistics_obj_is_visible() wrongly returned true for such objects,
psql \dX wrongly listed them, and getObjectDescription()-based ereport()
and pg_describe_object() wrongly omitted namespace qualification. Any
malfunction beyond that would depend on how a human or application acts
on those wrong indications. Commit
d99d58cdc8 introduced this. Back-patch to
v13 (all supported versions).
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/20250920162116.2e.nmisch@google.com
Backpatch-through: 13
Because we failed to do this, DISTINCT in GROUP BY DISTINCT would be
ignored in PL/pgSQL assignment statements. It's not surprising that
no one noticed, since such statements will throw an error if the query
produces more than one row. That eliminates most scenarios where
advanced forms of GROUP BY could be useful, and indeed makes it hard
even to find a simple test case. Nonetheless it's wrong.
This is directly the fault of be45be9c3 which added the groupDistinct
field, but I think much of the blame has to fall on c9d529848, in
which I incautiously supposed that we'd manage to keep two copies of
a big chunk of parse-analysis logic in sync. As a follow-up, I plan
to refactor so that there's only one copy. But that seems useful
only in master, so let's use this one-line fix for the back branches.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/31027.1758919078@sss.pgh.pa.us
Backpatch-through: 14
When running pgbench with --verbose-errors option and a custom script that
triggered retriable errors (e.g., serialization errors) in pipeline mode,
an assertion failure could occur:
Assertion failed: (sql_script[st->use_file].commands[st->command]->type == 1), function commandError, file pgbench.c, line 3062.
The failure happened because pgbench assumed these errors would only occur
during SQL commands, but in pipeline mode they can also happen during
\endpipeline meta command.
This commit fixes the assertion failure by adjusting the assertion check to
allow such errors during either SQL commands or \endpipeline.
Backpatch to v15, where the assertion check was introduced.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGWQMOzNkQs-LmpDHdNC0h8dmAuUMRvZrEntQi5a-b=Kg@mail.gmail.com
The functions test_stat_func() and test_stat_func2() had empty
function bodies, so that they took very little time to run. This made
it possible that on machines with relatively low timer resolution the
functions could return before the clock advanced, making the test fail
(as seen on buildfarm members fruitcrow and hamerkop).
To avoid that, pg_sleep for 10us during the functions. As far as we
can tell, all current hardware has clock resolution much less than
that. (The current implementation of pg_sleep will round it up to
1ms anyway, but someday that might get improved.)
Author: Michael Banck <mbanck@gmx.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/68d413a3.a70a0220.24c74c.8be9@mx.google.com
Backpatch-through: 15
If we already have an extension_state array but see a new extension_id
much larger than the highest extension_id we've previously seen,
the old code might have failed to expand the array to a large enough
size, leading to disaster. Also, if we don't have an extension array
at all and need to create one, we should make sure that it's big enough
that we don't have to resize it instantly.
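The sizing rules amount to something like this (a sketch; the variable
names are illustrative):

    if (extension_state == NULL)
        nallocated = Max(16, extension_id + 1);   /* big enough from the start */
    else if (extension_id >= nallocated)
        nallocated = Max(nallocated * 2, extension_id + 1);   /* never under-grow */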
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/2949591.1758570711@sss.pgh.pa.us
Backpatch-through: 18
When defining an injection point there is no need to wrap the definition
with USE_INJECTION_POINTS guards; the INJECTION_POINT macro is available
in all builds. Remove the guards to make the code consistent.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/OSCPR01MB14966C8015DEB05ABEF2CE077F51FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 17
... which silently propagates a lot of headers into many places
via pgstat.h, as evidenced by the variety of headers that this patch
needs to add to seemingly random places. Add a minimum of typedefs to
conflict.h to be able to remove execnodes.h, and fix the fallout.
Backpatch to 18, where conflict.h first appeared.
Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
Previously, the parallel apply worker used SIGINT to receive a graceful
shutdown signal from the leader apply worker. However, SIGINT is also used
by the LOCK_TIMEOUT handler to trigger a query-cancel interrupt. This
overlap caused the parallel apply worker to miss LOCK_TIMEOUT signals,
leading to incorrect behavior during lock wait/contention.
This patch resolves the conflict by switching the graceful shutdown signal
from SIGINT to SIGUSR2.
Reported-by: Zane Duffield <duffieldzane@gmail.com>
Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/CACMiCkXyC4au74kvE2g6Y=mCEF8X6r-Ne_ty4r7qWkUjRE4+oQ@mail.gmail.com
Initially this was to fix the "catched" typo, but I (David) wasn't quite
clear on what the previous comment meant about being "effective". I
expect this means efficiency, so I've reworded the comment to indicate
that.
While this is only a comment fixup, for the sake of minimizing possible
future backpatching pain, I've opted to backpatch to 18 since
this code is new to that version and the release isn't out the door yet.
Author: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNmSYWPud1sfBvpKbCJeRkWeZYuqatxtV9U9LvAFXBEiBw@mail.gmail.com
Backpatch-through: 18
Commit bb3ec16e14 moved partition pruning metadata into PlannedStmt.
At executor startup this metadata is used to initialize the EState
fields es_part_prune_infos, es_part_prune_states, and
es_part_prune_results. EvalPlanQualStart() failed to copy those
fields into the child EState, causing NULL dereference when Append
ran partition pruning during a recheck. This can occur with DELETE
or UPDATE on partitioned tables that use runtime pruning, e.g. with
generic plans.
Fix by copying all partition pruning state into the EPQ estate.
Add an isolation test that reproduces the crash with concurrent
UPDATE and DELETE on a partitioned table, where the DELETE session
hits the crash during its EPQ recheck after the UPDATE commits.
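The fix is essentially a straight copy of the three fields into the EPQ
estate (variable names assumed):

    /* in EvalPlanQualStart(), alongside the other copied EState fields */
    rcestate->es_part_prune_infos = parentestate->es_part_prune_infos;
    rcestate->es_part_prune_states = parentestate->es_part_prune_states;
    rcestate->es_part_prune_results = parentestate->es_part_prune_results;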
Bug: #19056
Reported-by: Fei Changhong <feichanghong@qq.com>
Diagnosed-by: Fei Changhong <feichanghong@qq.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19056-a677cef9b54d76a0%40postgresql.org
Previously, pg_restore did not skip security labels on publications or
subscriptions even when --no-publications or --no-subscriptions was specified.
As a result, it could issue SECURITY LABEL commands for objects that were
never created, causing those commands to fail.
This commit fixes the issue by ensuring that security labels on publications
and subscriptions are also skipped when the corresponding options are used.
Backpatch to all supported versions.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13
If an aggregate function call contains a sub-select that has
an RTE referencing a CTE outside the aggregate, we must treat
that reference like a Var referencing the CTE's query level
for purposes of determining the aggregate's level. Otherwise
we might reach the nonsensical conclusion that the aggregate
should be evaluated at some query level higher than the CTE,
ending in a planner error or a broken plan tree that causes
executor failures.
Bug: #19055
Reported-by: BugForge <dllggyx@outlook.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19055-6970cfa8556a394d@postgresql.org
Backpatch-through: 13
The pending entry was not used when incrementing its data; the code
directly manipulated the shared memory pointer, without even locking it.
This could mean losing statistics under concurrent activity. The flush
callback was a no-op.
This code serves as a base template for extensions implementing custom
cumulative statistics, so let's be clean and use a pending entry for the
incrementations, whose data is then flushed to the corresponding entry
in the shared hashtable when all the stats are reported, in its own
flush callback.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0v0U0yhPbY+bqChomkPbyUrRQ3rQXnZf_SB-svDiQOpgQ@mail.gmail.com
Backpatch-through: 18
The shared memory size was calculated based on the offset of io_handles,
which is itself a pointer included in the structure. We tend to
overestimate the shared memory size overall, so this was unlikely an
issue in practice, but let's be correct and use the full size of the
structure in the calculation, so that the pointer for io_handles is
included.
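In effect (struct name hypothetical):

    /* was: sz = add_size(sz, offsetof(AioCtlData, io_handles)); */
    sz = add_size(sz, sizeof(AioCtlData));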
Oversight in da7226993f.
Author: Madhukar Prasad <madhukarprasad@google.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAKi+wrbC2dTzh_vKJoAZXV5wqTbhY0n4wRNpCjJ=e36aoo0kFw@mail.gmail.com
Backpatch-through: 18
The EvalPlanQual recheck for TID Range Scan wasn't rechecking the TID qual
still passed after following update chains. This could result in tuples
being updated or deleted by plans using TID Range Scans where the ctid of
the new (updated) tuple no longer matches the clause of the scan. This
isn't desired behavior, and isn't consistent with what would happen if
the chosen plan had used an Index or Seq Scan. It could also lead to
hard-to-predict behavior for scans that contain TID quals and other
quals, as the planner has freedom to choose TID Range or some other
non-TID scan method for such queries, and the chosen plan could change
at any moment.
Here we fix this by properly implementing the recheck function for TID
Range Scans.
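A sketch of what such a recheck looks like (the field holding the
recheck-ready quals is assumed):

    static bool
    TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
    {
        ExprContext *econtext = node->ss.ps.ps_ExprContext;

        /* re-evaluate the TID quals against the updated tuple's ctid */
        ResetExprContext(econtext);
        econtext->ecxt_scantuple = slot;
        return ExecQual(node->trss_RecheckQual, econtext);
    }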
Backpatch to 14, where TID Range Scans were added.
Reported-by: Sophie Alpert <pg@sophiebits.com>
Author: Sophie Alpert <pg@sophiebits.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com
Backpatch-through: 14
The EvalPlanQual recheck for TID Scan wasn't rechecking the TID qual
still passed after following update chains. This could result in tuples
being updated or deleted by plans using TID Scans where the ctid of the
new (updated) tuple no longer matches the clause of the scan. This
isn't desired behavior, and isn't consistent with what would happen if
the chosen plan had used an Index or Seq Scan. It could also lead to
hard-to-predict behavior for scans that contain TID quals and other
quals, as the planner has freedom to choose TID or some other scan
method for such queries, and the chosen plan could change at any moment.
Here we fix this by properly implementing the recheck function for TID
Scans.
Backpatch to 13, the oldest supported version.
Reported-by: Sophie Alpert <pg@sophiebits.com>
Author: Sophie Alpert <pg@sophiebits.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com
Backpatch-through: 13
This reverts commit 98fc31d649.
That change allowed DROP OWNED BY to drop grants of the target
role to other roles, arguing that nobody would need those
privileges anymore. But that's not so: if you're not superuser,
you still need admin privilege on the target role so you can
drop it.
It's not clear whether or how the dependency-based approach
to solving the original problem can be adapted to keep these
grants. Since v18 release is fast approaching, the sanest
thing to do seems to be to revert this patch for now. The
race-condition problem is low severity and not worth taking
risks for.
I didn't force a catversion bump in 98fc31d64, so I won't do
so here either.
Reported-by: Dipesh Dhameliya <dipeshdhameliya125@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABgZEgczOFicCJoqtrH9gbYMe_BV3Hq8zzCBRcMgmU6LRsihUA@mail.gmail.com
Backpatch-through: 18
The COMMENT should depend on the separately-dumped constraint, not the
domain. Sufficient restore parallelism might fail the COMMENT command
by issuing it before the constraint exists. Back-patch to v13, like
commit 0858f0f96e.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/20250913020233.fa.nmisch@google.com
Backpatch-through: 13
In our previous discussions around making our regression tests
pass in FIPS mode, we concluded that we didn't need to support
the different error message wording observed with pre-3.0 OpenSSL.
However there are still a few LTS distributions soldiering along
with such versions, and now we have some in the buildfarm.
So let's add the variant expected-files needed to make them happy.
This commit only covers the core regression tests. Previous
discussion suggested that we might need some adjustments in
contrib as well, but it's not totally clear to me what those
would be. Rather than work it out from first principles,
I'll wait to see what the buildfarm shows.
Back-patch to v17 which is the oldest branch that claims
to support this case.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/443709.1757876535@sss.pgh.pa.us
Backpatch-through: 17
JsonConstructorExpr can produce non-NULL output with a NULL input, so
it should be treated as a non-strict construct. Failing to do so can
lead to incorrect query behavior.
For example, in the reported case, when pulling up a subquery that is
under an outer join, if the subquery's target list contains a
JsonConstructorExpr that uses subquery variables and it is mistakenly
treated as strict, it will be pulled up without being wrapped in a
PlaceHolderVar. As a result, the expression will be evaluated at the
wrong place and will not be forced to null when the outer join should
do so.
Back-patch to v16 where JsonConstructorExpr was introduced.
Bug: #19046
Reported-by: Runyuan He <runyuan@berkeley.edu>
Author: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19046-765b6602b0a8cfdf@postgresql.org
Backpatch-through: 16
Fix for commit 3c86223c99. That commit moved the typedef of pg_int64
from postgres_ext.h to libpq-fe.h, because the only remaining place
where it might be used is libpq users, and since the type is obsolete,
the intent was to limit its scope.
The problem is that if someone builds an extension against an
older (pre-PG18) server version and a new (PG18) libpq, they might get
two typedefs, depending on include file order. This is not allowed
under C99, so they might get warnings or errors, depending on the
compiler and options. The underlying types might also be
different (e.g., long int vs. long long int), which would also lead to
errors. This scenario is plausible when using the standard Debian
packaging, which provides only the newest libpq but per-major-version
server packages.
The fix is to undo that part of commit 3c86223c99. That way, the
typedef is in the same header file across versions. At least, this is
the safest fix doable before PostgreSQL 18 releases.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/25144219-5142-4589-89f8-4e76948b32db%40eisentraut.org
Previously, pg_dump incorrectly queried pg_seclabel to retrieve security labels
for subscriptions, which are stored in pg_shseclabel as they are global objects.
This could result in security labels for subscriptions not being dumped.
This commit fixes the issue by updating pg_dump to query the pg_seclabels view,
which aggregates entries from both pg_seclabel and pg_shseclabel.
While querying pg_shseclabel directly for subscriptions was an alternative,
using pg_seclabels is simpler and sufficient.
In addition, pg_dump is updated to dump security labels on event triggers,
which were previously omitted.
Backpatch to all supported versions.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13
Commit 7202d72787 added in passing some const qualifiers, but the one
on the postmaster_child_launch() startup_data argument was incorrect,
because the function itself modifies the pointed-to data. This is
hidden from the compiler because of casts. The qualifiers on the
functions called by postmaster_child_launch() are still correct.
Previously, pg_restore did not skip comments on policies even when
--no-policies was specified. As a result, it could issue COMMENT commands
for policies that were never created, causing those commands to fail.
This commit fixes the issue by ensuring that comments on policies
are also skipped when --no-policies is used.
Backpatch to v18, where --no-policies was added in pg_restore.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 18
Previously, pg_restore did not skip comments on publications or subscriptions
even when --no-publications or --no-subscriptions was specified. As a result,
it could issue COMMENT commands for objects that were never created,
causing those commands to fail.
This commit fixes the issue by ensuring that comments on publications and
subscriptions are also skipped when the corresponding options are used.
Backpatch to all supported versions.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13
If extensions of equal names were installed in different directories
in the path, the views pg_available_extensions and
pg_available_extension_versions would show all of them, even though
only the first one was actually reachable by CREATE EXTENSION. To
fix, have those views skip extensions found later in the path if they
have names already found earlier.
Also add a bit of documentation that only the first extension in the
path can be used.
Reported-by: Pierrick <pierrick.chovelon@dalibo.com>
Discussion: https://www.postgresql.org/message-id/flat/8f5a0517-1cb8-4085-ae89-77e7454e27ba%40dalibo.com
The TimescaleDB extension expects to be able to change an nbtree scan's
keys across rescans. The issue arises in the extension's implementation
of loose index scan. This is arguably a misuse of the index AM API,
though apparently it worked until recently. It stopped working when the
skipScan flag was added to BTScanOpaqueData by commit 8a510275, though.
The flag wouldn't reliably track whether the scan (actually, the current
rescan) has any skip arrays, leading to confusion in _bt_set_startikey.
nbtree preprocessing will now defensively initialize the scan's skipScan
flag in all cases, including the case where _bt_preprocess_array_keys
returns early due to the (re)scan not using arrays. While nbtree isn't
obligated to support this use case (at least not according to my reading
of the index AM API), it still seems like a good idea to be consistent
here, on general robustness grounds.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Natalya Aksman <natalya@timescale.com>
Discussion: https://postgr.es/m/CAJumhcirfMojbk20+W0YimbNDkwdECvJprQGQ-XqK--ph09nQw@mail.gmail.com
Backpatch-through: 18
Commit e3ffc3e91 fixed the translation of character classes in
SIMILAR TO regular expressions. Unfortunately the fix broke a corner
case: if there is an escape character right after the opening bracket
(for example in "[\q]"), a closing bracket right after the escape
sequence would not be seen as closing the character class.
There were two more oversights: a backslash or a nested opening bracket
right at the beginning of a character class should remove the special
meaning from any following caret or closing bracket.
This bug suggests that this code needs to be more readable, so also
rename the variables "charclass_depth" and "charclass_start" to
something more meaningful, rewrite an "if" cascade to be more
consistent, and improve the commentary.
Reported-by: Dominique Devienne <ddevienne@gmail.com>
Reported-by: Stephan Springl <springl-psql@bfw-online.de>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFCRh-8NwJd0jq6P=R3qhHyqU7hw0BTor3W0SvUcii24et+zAw@mail.gmail.com
Backpatch-through: 13
Commit a0b99fc12 caused pg_event_trigger_dropped_objects()
to not fill the object_name field for schemas, which it
should have; and caused it to fill the object_name field
for default values, which it should not have.
In addition, triggers and RLS policies really should behave
the same way as we're making column defaults do; that is,
they should have is_temporary = true if they belong to a
temporary table.
Fix those things, and upgrade event_trigger.sql's woefully
inadequate test coverage of these secondary output columns.
As before, back-patch only to v15.
Reported-by: Sergey Shinderuk <s.shinderuk@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/bd7b4651-1c26-4d30-832b-f942fabcb145@postgrespro.ru
Backpatch-through: 15
A recently added nbtree preprocessing step failed to account for the
fact that DESC columns already had their B-Tree strategy number commuted
at this point in preprocessing. As a result, preprocessing could output
a set of scan keys where one or more keys had the correct strategy
number, but used the wrong comparison routine.
To fix, make the faulty code path that looks up a more restrictive
replacement operator/comparison routine commute its requested inequality
strategy (while outputting the transformed strategy number as before).
This makes the final transformed scan key comport with the approach
preprocessing has always used to deal with DESC columns (which is
described by comments above _bt_fix_scankey_strategy).
Oversight in commit b3f1a13f, which made nbtree preprocessing
perform transformations on skip array inequalities that can reduce the
total number of index searches.
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Natalya Aksman <natalya@timescale.com>
Discussion: https://postgr.es/m/19049-b7df801e71de41b2@postgresql.org
Backpatch-through: 18
Its size was ~3.8GB before, which sometimes was not enough. The OpenBSD
CI task often failed due to no space left on device. Increase the RAM
disk size to ~4.6GB.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ2XVVPJRJmGB2DsL3gOrOinWh=HWvj6GO1cHzJ=6LwTag@mail.gmail.com
Backpatch-through: 18, where openbsd was added to CI
Clang 21 shows some new compiler warnings, for example:
warning: variable 'dstsize' is uninitialized when passed as a const pointer argument here [-Wuninitialized-const-pointer]
The fix is to initialize the variables when they are defined. This is
similar to, for example, the existing situation in gistKeyIsEQ().
Discussion: https://www.postgresql.org/message-id/flat/6604ad6e-5934-43ac-8590-15113d6ae4b1%40eisentraut.org
pg_event_trigger_dropped_objects() would report a column default
object with is_temporary = false, even if it belongs to a
temporary table. This seems clearly wrong, so adjust it to
report the table's temp-ness.
While here, refactor EventTriggerSQLDropAddObject to make its
handling of namespace objects less messy and avoid duplication
of the schema-lookup code. And add some explicit test coverage
of dropped-object reports for dependencies of temp tables.
Back-patch to v15. The bug exists further back, but the
GetAttrDefaultColumnAddress function this patch depends on does not,
and it doesn't seem worth adjusting it to cope with the older code.
Author: Antoine Violin <violin.antuan@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFjUV9x3-hv0gihf+CtUc-1it0hh7Skp9iYFhMS7FJjtAeAptA@mail.gmail.com
Backpatch-through: 15
The Sun Studio compiler complains about an empty declaration here.
Note for future historians: This does not mean that this compiler is
still of current interest for anyone using PostgreSQL. But we can let
this small fix be its parting gift.
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/a0f817ee-fb86-483a-8a14-b6f7f5991b6e%40eisentraut.org
hash_xlog.h included descriptions for the blocks used in WAL records
that were not completely consistent with how the records are
generated, with one block missing for SQUEEZE_PAGE, and inconsistent
descriptions used for block 0 in VACUUM_ONE_PAGE and MOVE_PAGE_CONTENTS.
This information has been incorrect since c11453ce0a; fix it after
cross-checking the logic for the record generation.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPj1j=a1d1hVA3oabRFz0hSU3KKrYtZPijw4UPUM7LY9zw@mail.gmail.com
Backpatch-through: 13
GucSource_Names was documented as being in guc.c, but since 0a20ff54f5
it is located in guc_tables.c. The reference to the location of
GucSource_Names is important, as GucSource needs to be kept in sync with
GucSource_Names.
Author: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/CAKFQuwYPgAHWPYjPzK7iXzhSZ6MKR8w20_Nz7ZXpOvx=kZbs7A@mail.gmail.com
Backpatch-through: 16
If the hash functions used for hashing tuples leaked any memory,
we failed to clean that up, resulting in query-lifespan memory
leakage in queries using hashed subplans. One way that could
happen is if the values being hashed require de-toasting, since
most of our hash functions don't trouble to clean up de-toasted
inputs.
Prior to commit bf6c614a2, this leakage was largely masked
because TupleHashTableMatch would reset hashtable->tempcxt
(via execTuplesMatch). But it doesn't do that anymore, and
that's not really the right place for this anyway: doing it
there could reset the tempcxt many times per hash lookup,
or not at all. Instead put reset calls into ExecHashSubPlan
and buildSubPlanHash. Along the way to that, rearrange
ExecHashSubPlan so that there's just one place to call
MemoryContextReset instead of several.
This amounts to accepting the de-facto API spec that the caller
of the TupleHashTable routines is responsible for resetting the
tempcxt adequately often. Although the other callers seem to
get this right, it was not documented anywhere, so add a comment
about it.
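Under that spec, a caller does roughly this around each lookup cycle
(a sketch, not the exact patch):

    bool    isnew;
    uint32  hash;

    /* free de-toasted hash inputs etc. left over from the previous cycle */
    MemoryContextReset(hashtable->tempcxt);
    entry = LookupTupleHashEntry(hashtable, slot, &isnew, &hash);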
Bug: #19040
Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Author: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reviewed-by: Fei Changhong <feichanghong@qq.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19040-c9b6073ef814f48c@postgresql.org
Backpatch-through: 13
autoconf builds have compiled this file with -ftree-vectorize since
commit 8870917623, but meson builds seem to have missed the memo.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/aL85CeasM51-0D1h%40nathan
Backpatch-through: 16
Use -funroll-loops and -ftree-vectorize when building checksum.c to
match what autoconf does.
Missed backport of 9af672bcb2, noticed by Nathan Bossart.
Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/a81f2f7ef34afc24a89c613671ea017e3651329c.camel@j-davis.com
Reviewed-by: Andres Freund <andres@anarazel.de>
Backpatch-through: 16
The startup process does not process shared invalidation messages, only
sending them, and never calls AtEOXact_SMgr(), which cleans up any
unpinned SMgrRelations. Hence, it is never able to free SMgrRelations
on a periodic basis, bloating its hashtable over time.
Like the checkpointer and the bgwriter, this commit takes a conservative
approach by periodically freeing SMgrRelations when replaying a
checkpoint record, either online or shutdown, so that the startup process
has a way to perform a periodic cleanup.
Issue caused by 21d9c3ee4e, so backpatch down to v17.
Author: Jingtang Zhang <mrdrivingduck@gmail.com>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Discussion: https://postgr.es/m/28C687D4-F335-417E-B06C-6612A0BD5A10@gmail.com
Backpatch-through: 17
A new pgstats entry is created as a two-step process:
- The entry is looked up in the shared hashtable of pgstats, and is
inserted if not found.
- When not found and inserted, its fields are then initialized. This
part includes a DSA chunk allocation for the stats data of the new entry.
As currently coded, if the DSA chunk allocation fails due to an
out-of-memory failure, an ERROR is generated, leaving an inconsistent
entry in the pgstats shared hashtable, as the entry has already been
inserted there by the first step. These broken entries can then be
found by other backends, crashing them.
There are only two callers of pgstat_init_entry(), when loading the
pgstats file at startup and when creating a new pgstats entry. This
commit changes pgstat_init_entry() so that it uses dsa_allocate_extended()
with DSA_ALLOC_NO_OOM, making it return NULL on allocation failure
instead of failing. This way, a backend failing an entry creation can
take appropriate cleanup actions in the shared hashtable before throwing
an error. Currently, this means removing the entry from the shared
hashtable before throwing the error for the allocation failure.
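A sketch of the allocation change (names approximate):

    dsa_pointer chunkp;

    chunkp = dsa_allocate_extended(area, size,
                                   DSA_ALLOC_NO_OOM | DSA_ALLOC_ZERO);
    if (!DsaPointerIsValid(chunkp))
        return NULL;   /* caller drops the hashtable entry, then errors out */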
Out-of-memory errors rarely happen in the wild, and we usually do not
bother with back-patches when these are fixed. However, the problem
dealt with here is a degree worse as it breaks the shared memory state
of pgstats, impacting other processes that may look at an inconsistent
entry that a different process has failed to create.
Author: Mikhail Kot <mikhail.kot@databricks.com>
Discussion: https://postgr.es/m/CAAi9E7jELo5_-sBENftnc2E8XhW2PKZJWfTC3i2y-GMQd2bcqQ@mail.gmail.com
Backpatch-through: 15
When executing a MERGE UPDATE action, if there is more than one
concurrent update of the target row, the lock-and-retry code would
sometimes incorrectly identify the latest version of the target tuple,
leading to incorrect results.
This was caused by using the ctid field from the TM_FailureData
returned by table_tuple_lock() in a case where the result was TM_Ok,
which is unsafe because the TM_FailureData struct is not guaranteed to
be fully populated in that case. Instead, it should use the tupleid
passed to (and updated by) table_tuple_lock().
To reduce the chances of similar errors in the future, improve the
commentary for table_tuple_lock() and TM_FailureData to make it
clearer that table_tuple_lock() updates the tid passed to it, and most
fields of TM_FailureData should not be relied on in non-failure cases.
An exception to this is the "traversed" field, which is set in both
success and failure cases.
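Sketched, the safe pattern is (argument list abbreviated):

    result = table_tuple_lock(rel, tupleid, snapshot, slot, cid,
                              lockmode, LockWaitBlock, lockflags, &tmfd);
    if (result == TM_Ok)
    {
        /*
         * tupleid has been updated in place to point at the latest row
         * version; do not consult tmfd.ctid here (of the TM_FailureData
         * fields, only tmfd.traversed is meaningful on success).
         */
    }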
Reported-by: Dmitry <dsy.075@yandex.ru>
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1570d30e-2b95-4239-b9c3-f7bf2f2f8556@yandex.ru
Backpatch-through: 15
SlruRecentlyUsed() is an inline function since 53c2a97a92, not a
macro. The description of long_segment_names was missing at the top of
SimpleLruInit(), a part forgotten in 4ed8f0913b.
Author: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://postgr.es/m/aLpBLMOYwEQkaleF@jrouhaud
Backpatch-through: 17
When executing a MERGE, check that the target relation supports all
actions mentioned in the MERGE command. Specifically, check that it
has a REPLICA IDENTITY if it publishes updates or deletes and the
MERGE command contains update or delete actions. Failing to do this
can silently break replication.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 15
If an INSERT has an ON CONFLICT DO UPDATE clause, the executor must
check that the target relation supports UPDATE as well as INSERT. In
particular, it must check that the target relation has a REPLICA
IDENTITY if it publishes updates. Formerly, it was not doing this
check, which could lead to silently breaking replication.
Fix by adding such a check to CheckValidResultRel(), which requires
adding a new onConflictAction argument. In back-branches, preserve ABI
compatibility by introducing a wrapper function with the original
signature.
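Conceptually, the back-branch shim looks like this (the extended
function's name and its third argument are assumed here):

    void
    CheckValidResultRel(ResultRelInfo *resultRelInfo, CmdType operation)
    {
        /* preserve the old ABI; no ON CONFLICT action implied */
        CheckValidResultRelExtended(resultRelInfo, operation, ONCONFLICT_NONE);
    }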
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 13
The counters saved from pgWalUsage, used for the difference calculations
when flushing the backend WAL stats, are updated when calling
pgstat_flush_backend() under PGSTAT_BACKEND_FLUSH_WAL, and not
pgstat_report_wal(). The comment updated in this commit referenced the
latter, but it is perfectly OK to flush the backend stats independently
of the WAL stats.
Noticed while looking at this area of the code, introduced by
76def4cdd7 as a copy-pasto.
Backpatch-through: 18
SubPlan nodes are typically built very early, before any RelOptInfos
have been constructed for the parent query level. As a result, the
simple_rel_array in the parent root has not yet been initialized.
Currently, during cost estimation of a SubPlan's testexpr, we may call
examine_variable() to look up statistical data about the expressions.
This can lead to "no relation entry for relid" errors.
To fix, pass root as NULL to cost_qual_eval() in cost_subplan(), since
the root does not yet contain enough information to safely consult
statistics.
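That is, roughly (assuming the existing call shape in cost_subplan()):

    /*
     * root may lack a valid simple_rel_array at this point, so pass NULL
     * and let cost_qual_eval() skip statistics lookups.
     */
    cost_qual_eval(&sp_cost,
                   make_ands_implicit((Expr *) subplan->testexpr),
                   NULL);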
One exception is SubPlan nodes built for the initplans of MIN/MAX
aggregates from indexes. In this case, having a NULL root is safe
because testexpr will be NULL. Additionally, an initplan will by
definition not consult anything from the parent plan.
Backpatch to all supported branches. Although the reported call path
that triggers this error is not reachable prior to v17, there's no
guarantee that other code paths -- especially in extensions -- could
not encounter the same issue when cost_qual_eval() is called with a
root that lacks a valid simple_rel_array. The test case is not
included in pre-v17 branches though.
Bug: #19037
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19037-3d1c7bb553c7ce84@postgresql.org
Backpatch-through: 13
PQtrace() was generating its output for non-printable characters without
casting the characters printed to unsigned char, leading to some extra
"\xffffff" sequences in the output due to the fact that char may be
signed.
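The underlying C pitfall, and its conventional fix:

    char c = '\xe9';                        /* negative if char is signed */
    printf("\\x%02x", c);                   /* may print \xffffffe9 */
    printf("\\x%02x", (unsigned char) c);   /* prints \xe9 */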
Oversights introduced by commit 198b3716db, so backpatch down to v14.
Author: Ran Benita <ran@unusedvar.com>
Discussion: https://postgr.es/m/a3383211-4539-459b-9d51-95c736ef08e0@app.fastmail.com
Backpatch-through: 14
SLRU bank locks are referred to as "bank locks" or "SLRU bank locks" in the
code comments. The comments updated in this commit use the latter term.
Oversight in 53c2a97a92, which replaced the single ControlLock with
the bank control locks.
Author: Julien Rouhaud <julien.rouhaud@free.fr>
Discussion: https://postgr.es/m/aLUT2UO8RjJOzZNq@jrouhaud
Backpatch-through: 17
Compression in pg_dump is abstracted using an API with multiple
implementations which can be selected at runtime by the user.
The API and its implementations have evolved over time, notable
commits include bf9aa490db, e9960732a9, 84adc8e20, and 0da243fed.
The error handling defined by the API was however problematic and
the implementations had a few bugs and/or were not following the
API specification. This commit modifies the API to ensure that
callers can perform error handling efficiently and fixes all the
implementations such that they all implement the API in the same
way. A full list of the changes can be seen below.
* write_func:
- Make write_func throw an error on all error conditions. All
callers of write_func were already checking for success and
calling pg_fatal on all errors, so we might as well make the
API support that case directly with simpler error handling as
a result.
* open_func:
- zstd: move stream initialization from the open function to
the read and write functions as they can have fatal errors.
Also ensure to dup the file descriptor like none and gzip.
- lz4: Ensure to dup the file descriptor like none and gzip.
* close_func:
- zstd: Ensure to close the file descriptor even if closing
down the compressor fails, and clean up state allocation on
fclose failures. Make sure to capture errors set by fclose.
- lz4: Ensure to close the file descriptor even if closing
down the compressor fails, and instead of calling pg_fatal
log the failures using pg_log_error. Make sure to capture
errors set by fclose.
- none: Make sure to catch errors set by fclose.
* read_func / gets_func:
- Make read_func unconditionally return the number of read
bytes instead of making it optional per implementation.
- lz4: Make sure to throw an error and not return -1.
- gzip: gzread returning zero cannot be assumed to indicate
EOF as it is documented to return zero for some types of
errors.
- lz4, zstd: Convert the _read_internal helper functions to
not call pg_fatal on errors to be able to handle gets_func
returning NULL on error.
* getc_func:
- zstd: Use an unsigned char rather than an int to read char
into.
* LZ4Stream_init:
- Make sure to not switch to inited state until we know that
initialization succeeded and reset errno just in case.
On top of these changes there are minor comment cleanups and
improvements as well as an attempt to consistently reset errno
in codepaths where it is inspected.
This work was initiated by a report of API misuse, which turned
into a larger body of work. As this is an internal API these
changes can be backpatched into all affected branches.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Evgeniy Gorbanev <gorbanyoves@basealt.ru>
Discussion: https://postgr.es/m/517794.1750082166@sss.pgh.pa.us
Backpatch-through: 16
It's possible, if the only live partition is concurrently dropped and
try_table_open() fails, that bms_del_member() will pfree the
live_parts Bitmapset. Since the bms_del_member() call does not assign
the result back to the live_parts local variable, the while loop could
segfault as that variable would still reference the pfree'd Bitmapset.
Backpatch to 15. 52f3de874 was backpatched to 14, but there's no
bms_del_member() there due to live_parts not yet existing in RelOptInfo in
that version. Technically there's no bug in version 15 as
bms_del_member() didn't pfree when the set became empty prior to
00b41463c (from v16). Applied to v15 anyway to keep the code similar and
to avoid the bad coding pattern.
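The coding rule at issue, sketched (variable names assumed):

    /* bms_del_member() may pfree its input; always assign the result back */
    rel->live_parts = bms_del_member(rel->live_parts, i);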
Author: Bernd Reiß <bd_reiss@gmx.at>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/6b88f27a-c45c-4826-8e37-d61a04d90182@gmx.at
Backpatch-through: 15
I think the error message for a different condition was inadvertently
copied.
This problem seems to have been introduced by commit a4d75c86bf.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reported-by: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 14
Discussion: https://postgr.es/m/CACJufxEZ48toGH0Em_6vdsT57Y3L8pLF=DZCQ_gCii6=C3MeXw@mail.gmail.com
For a child relation, we should not assume that its parent's
unique-ified relation (or unique-ified path in v18) always exists. In
cases where all RHS columns that need to be unique-ified are equated
to constants, the unique-ified relation/path for the parent table is
not built, as there are no columns left to unique-ify. Failing to
account for this can result in a SIGSEGV crash during planning.
This patch checks whether the parent's unique-ified relation or path
exists and skips unique-ification of the child relation if it does
not.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49MOdLW2c+qbLHHBt8VBu=4ONpM91D19=AWeW93eFUF6A@mail.gmail.com
Backpatch-through: 18
Previously, we used LW_EXCLUSIVE in several places despite only reading
WalSummarizerCtl fields. This patch reduces the lock level to LW_SHARED
where we are only reading the shared fields.
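The pattern applied, sketched (the field read here is illustrative):

    LWLockAcquire(WALSummarizerLock, LW_SHARED);   /* was LW_EXCLUSIVE */
    lsn = WalSummarizerCtl->summarized_lsn;
    LWLockRelease(WALSummarizerLock);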
Backpatch to 17, where wal summarization was introduced.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDdKhf_9oriEYxY-JCdF+Oe_muhca3pcdkMEdBMzyHyKw@mail.gmail.com
Backpatch-through: 17
One mechanism we have for implementing semi-joins is to de-duplicate
the output of the RHS and then treat the join as a plain inner join.
Initial construction of the join's SpecialJoinInfo identifies the
RHS columns that need to be de-duplicated, but later we may find that
some of those don't need to be handled explicitly, either because
they're known to be constant or because they are redundant with some
previous column.
Up to now, while sort-based de-duplication handled such cases well,
hash-based de-duplication didn't: we'd still hash on all of the
originally-identified columns. This is probably not a very big
deal performance-wise, but in the wake of commit a3179ab69 it can
cause planner errors. That happens when join elimination causes
recalculation of variables' attr_needed bitmapsets, and we decide
that a variable mentioned in a semijoin clause doesn't need to be
propagated up to the join level anymore.
There are a number of ways we could slice the blame for this, but the
only fix that doesn't result in pessimizing plans for loosely-related
cases is to be more careful about not hashing columns we don't
actually need to de-duplicate. We can install that consideration
into create_unique_paths in master, or the predecessor code in
create_unique_path in v18, without much refactoring.
(As follow-up work, it might be a good idea to look at more-invasive
refactoring, in hopes of preventing other bugs in this area. But
with v18 release so close, there's not time for that now, nor would
we be likely to want to put such refactoring into v18 anyway.)
Reported-by: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Diagnosed-by: Richard Guo <guofenglinux@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/1fd1a421-4609-4d46-a1af-ab74d5de504a@tantorlabs.ru
Backpatch-through: 18
During an investigation into rather odd aio-related errors on macOS,
observed by Alexander and Konstantin, we started to wonder if bitfield
access is related to the error. At the moment it looks like it is
related; we cannot reproduce the failures when replacing the bitfields.
In addition, the problem can only be reproduced with some compiler
[versions], and not everyone has been able to reproduce the issue.
The observed problem is that, very rarely, PgAioHandle->{state,target} are in
an inconsistent state, after having been checked to be in a valid state not
long before, triggering an assertion failure. Unfortunately, this could be
caused by wrong compiler code generation or somehow by missing memory barriers
- we don't really know. In theory there should not be any concurrent write
access to the handle in the state the bug is triggered, as the handle was idle
and is just being initialized.
Separately from the bug, we observed that at least gcc and clang generate
rather terrible code for the bitfield access. Even if it's not clear if the
observed assertion failure is actually caused by the bitfield somehow, the bad
code generation alone is sufficient reason to stop using bitfields.
Therefore, replace the enum bitfields with uint8s and instead cast in each
switch statement.
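Sketched (member layout simplified):

    typedef struct PgAioHandle
    {
        uint8   state;    /* really a PgAioHandleState; was an enum bitfield */
        uint8   target;   /* really a PgAioTargetID; was an enum bitfield */
        /* ... */
    } PgAioHandle;

    /* and at each read, cast back to the enum: */
    switch ((PgAioHandleState) ioh->state)
    {
        /* ... */
    }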
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/1500090.1745443021@sss.pgh.pa.us
Backpatch-through: 18
Commit d31bbfb659 lost some test coverage, because the situation
being tested, a concurrent DROP, cannot happen anymore. Put the test
coverage back with a bit of a trick, by deleting directly from the
catalog table.
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/bf72b82c-124d-4efa-a484-bb928e9494e4@eisentraut.org
Commit d31bbfb659 removed the comment at objectNamesToOids() that
there is no locking, because that commit added locking. But to fix
all the problems, we'd still need a stronger lock. So put the comment
back with a more detailed explanation.
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/bf72b82c-124d-4efa-a484-bb928e9494e4@eisentraut.org
This patch fixes a bug in how 'load_external_function' handles
'$libdir/' prefixes in module paths.
Previously, 'load_external_function' would unconditionally strip
'$libdir/' from the beginning of the 'filename' string. This caused
an issue when the path was nested, such as "$libdir/nested/my_lib".
Stripping the prefix resulted in a path of "nested/my_lib", which
would fail to be found by the expand_dynamic_library_name function
because the original '$libdir' macro was removed.
To fix this, the code now checks for the presence of an additional
directory separator ('/' or '\') after the '$libdir/' prefix. The
prefix is only stripped if the remaining string does not contain a
separator. This ensures that simple filenames like '"$libdir/my_lib"'
are correctly handled, while nested paths are left intact for
'expand_dynamic_library_name' to process correctly.
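Sketched (simplified relative to the actual code):

    if (strncmp(name, "$libdir/", strlen("$libdir/")) == 0)
    {
        const char *rest = name + strlen("$libdir/");

        if (first_dir_separator(rest) == NULL)
            name = rest;    /* plain filename: strip the macro */
        /* else keep "$libdir/" for expand_dynamic_library_name() */
    }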
Reported-by: Dilip Kumar <dilipbalaut@gmail.com>
Co-authored-by: Matheus Alcantara <matheusssilv97@gmail.com>
Co-authored-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFiTN-uKNzAro4tVwtJhF1UqcygfJ%2BR%2BRL%3Db-_ZMYE3LdHoGhA%40mail.gmail.com
When checking for expression indexes that may be affected by a Unicode
update during upgrade, check for a few more functions. Specifically,
check for documented regexp functions, as well as the new CASEFOLD()
function.
Also, fully-qualify references to pg_catalog.text and
pg_catalog.regtype.
Discussion: https://postgr.es/m/399b656a3abb0c9283538a040f72199c0601525c.camel@j-davis.com
Backpatch-through: 18
Followup to 4e1e41733 and 52ecd05ae. oauth-utils.c uses
pthread_sigmask(), requiring -pthread on Debian bullseye at minimum.
Reported-by: Christoph Berg <myon@debian.org>
Tested-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aK4PZgC0wuwQ5xSK%40msg.df7cb.de
Backpatch-through: 18
Commit 4b754d6c1 introduced the concept of an excludeOnly scan key,
which cannot select matching index entries but can reject
non-matching tuples, for example a tsquery such as '!term'. There are
poorly-documented assumptions that such scan keys do not appear as the
first scan key. ginNewScanKey did nothing to ensure that, however,
with the result that certain GIN index searches could go into an
infinite loop while apparently-equivalent queries with the clauses in
a different order were fine.
Fix by teaching ginNewScanKey to place all excludeOnly scan keys
after all not-excludeOnly ones. So far as we know at present,
it might be sufficient to avoid the case where the very first
scan key is excludeOnly; but I'm not very convinced that there
aren't other dependencies on the ordering.
Bug: #19031
Reported-by: Tim Wood <washwithcare@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19031-0638148643d25548@postgresql.org
Backpatch-through: 13
The CHECK_FOR_INTERRUPTS call in gingetbitmap turns out to be
inadequate to prevent a long uninterruptible loop, because
we now know a case where looping occurs within scanGetItem.
While the next patch will fix the bug that caused that, it
seems foolish to assume that no similar patterns are possible.
Let's do the CFI within scanGetItem's retry loop, instead.
This demonstrably allows canceling out of the loop exhibited
in bug #19031.
Bug: #19031
Reported-by: Tim Wood <washwithcare@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19031-0638148643d25548@postgresql.org
Backpatch-through: 13
The Self-Join Elimination (SJE) feature messes up keeping and removing
RowMarks in remove_self_joins_one_group(). That didn't lead to a
user-level error, because the planned RowMark is only used to reference
an rtable entry in later execution stages. The RTE entries for the kept
and the removed relation are identical and refer to the same relation
OID.
To reduce confusion and prevent future issues, this commit cleans up the code
and fixes the incorrect behaviour. Furthermore, it adds sanity checks in
setrefs.c that non-null RTE and RelOptInfo entries exist for each RowMark.
Discussion: https://postgr.es/m/18c6bd6c-6d2a-419a-b0da-dfedef34b585%40gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Backpatch-through: 18
Rename inner and outer to rrel and krel, respectively, to highlight their
connection to r and k indexes. For the same reason, rename imark and omark
to rmark and kmark.
Discussion: https://postgr.es/m/18c6bd6c-6d2a-419a-b0da-dfedef34b585%40gmail.com
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Backpatch-through: 18
libpq-oauth uses floor() but did not link against libm. Since libpq
itself uses -lm, nothing in the buildfarm has had problems with
libpq-oauth yet, and it seems difficult to hit a failure in practice.
But commit 1443b6c0e attempted to add an executable based on
libpq-oauth, which ran into link-time failures with Clang due to this
omission. It seems prudent to fix this for both the module and the
executable simultaneously so that no one trips over it in the future.
This is a Makefile-only change. The Meson side already pulls in libm,
through the os_deps dependency.
Discussion: https://postgr.es/m/CAOYmi%2Bn6ORcmV10k%2BdAs%2Bp0b9QJ4bfpk0WuHQaF5ODXxM8Y36A%40mail.gmail.com
Backpatch-through: 18
v17 introduced the MAINTAIN ON TABLES privilege. That changed the
applicable "baseacls" reaching buildACLCommands(). That yielded
spurious TestUpgradeXversion diffs. Change to use a TYPES privilege.
Types have the same one privilege in all supported versions, so they
avoid the problem. Per buildfarm. Back-patch to v13, like that commit.
Discussion: https://postgr.es/m/20250823144505.88.nmisch@google.com
Backpatch-through: 13
Commit 0decd5e89d missed DO_DEFAULT_ACL,
leading to assertion failures, potential dump order instability, and
spurious schema diffs. Back-patch to v13, like that commit.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/d32aaa8d-df7c-4f94-bcb3-4c85f02bea21@gmail.com
Backpatch-through: 13
This reverts commit bc22dc0e0d.
It appears that condition variables are not suitable for use inside
critical sections. If WaitLatch()/WaitEventSetWaitBlock() face postmaster
death, they exit, releasing all locks instead of PANIC. In certain
situations, this leads to data corruption.
Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/B3C69B86-7F82-4111-B97F-0005497BB745%40yandex-team.ru
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Yura Sokolov <y.sokolov@postgrespro.ru>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: 18
Statistics aren't created for virtual generated columns, so
"vacuumdb --missing-stats-only" always chooses to analyze tables
that have them. To fix, modify vacuumdb's query for retrieving
relations that are missing statistics to exclude those columns.
Oversight in commit edba754f05.
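The gist of the query change, as a hypothetical fragment (the real
catalog query in vacuumdb.c is more involved):

    /* Skip virtual generated columns: they never get statistics, so
     * they must not make a table look like it is missing stats. */
    appendPQExpBufferStr(&catalog_query,
                         " AND a.attgenerated <> 'v'\n");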
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Discussion: https://postgr.es/m/20250820104226.8ba51e43164cd590b863ce41%40sraoss.co.jp
Backpatch-through: 18
The protocol documentation states that the maximum length of a cancel
key is 256 bytes. This commit starts enforcing that limit in libpq;
otherwise third-party backend implementations would probably start
using more bytes anyway. We also start requiring that a protocol 3.0
connection not send a longer cancel key, to make sure that servers
don't start breaking old 3.0-only clients by accident. Finally,
this also restricts the minimum key length to 4 bytes (both in the
protocol spec and in the libpq implementation).
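Sketched with hypothetical names (the real check lives in libpq's
BackendKeyData handling):

    #define MIN_CANCEL_KEY_LENGTH   4
    #define MAX_CANCEL_KEY_LENGTH   256

    if (cancel_key_len < MIN_CANCEL_KEY_LENGTH ||
        cancel_key_len > MAX_CANCEL_KEY_LENGTH)
    {
        libpq_append_conn_error(conn,
                                "received invalid cancel key of length %d",
                                cancel_key_len);
        /* fail the connection */
    }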
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jacob Champion <jchampion@postgresql.org>
Discussion: https://www.postgresql.org/message-id/df892f9f-5923-4046-9d6f-8c48d8980b50@iki.fi
Backpatch-through: 18
In most cases, if an out-of-memory situation happens, we attach the
error message to the connection and report it at the next
PQgetResult() call. However, there are a few cases, while processing
messages that are not associated with any particular query, where we
handled failed allocations differently and not very nicely:
- If we ran out of memory while processing an async notification,
getNotify() either returned EOF, which stopped processing any
further data until more data was received from the server, or
silently dropped the notification. Returning EOF is problematic
because if more data never arrives, e.g. because the connection was
used just to wait for the notification, or because the next
ReadyForQuery was already received and buffered, it would get stuck
forever. Silently dropping a notification is not nice either.
- (New in v18) If we ran out of memory while receiving a BackendKeyData
message, getBackendKeyData() returned EOF, which has the same issues
as in getNotify().
- If we ran out of memory while saving a received ParameterStatus
message, we just skipped it. A later call to PQparameterStatus()
would return NULL, even though the server did send the status.
Change all those cases to terminate the connection instead. Our
options for reporting those errors are limited, but it seems better to
terminate than try to soldier on. Applications should handle
connection loss gracefully, whereas silently missing a notification,
parameter status, or cancellation key could cause much weirder
problems.
This also changes the error message on OOM while expanding the input
buffer. It used to report "cannot allocate memory for input buffer",
followed by "lost synchronization with server: got message type ...".
The "lost synchronization" message seems unnecessary, so remove that
and report only "cannot allocate memory for input buffer". (The
comment speculated that the out-of-memory condition could indeed be
caused by loss of sync, but that seems highly unlikely.)
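For instance, the getNotify() path now does something like this (a
sketch with hypothetical error handling, not the exact code):

    newNotify = (PGnotify *) malloc(nbytes);
    if (newNotify == NULL)
    {
        /* Terminate the connection instead of silently dropping the
         * notification or returning EOF and risking getting stuck. */
        libpq_append_conn_error(conn, "out of memory");
        pqDropConnection(conn, true);
        conn->status = CONNECTION_BAD;
        return -1;
    }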
This evolved from a more narrow patch by Jelte Fennema-Nio, which was
reviewed by Jacob Champion.
Somewhat arbitrarily, backpatch to v18 but no further. These are
long-standing issues, but we haven't received any complaints from the
field. We can backpatch more later, if needed.
Co-authored-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Jacob Champion <jchampion@postgresql.org>
Discussion: https://www.postgresql.org/message-id/df892f9f-5923-4046-9d6f-8c48d8980b50@iki.fi
Backpatch-through: 18
Commit 1585ff7387 changed GetTransactionSnapshot() to throw an error
if it's called during logical decoding, instead of returning the
historic snapshot. I made that change for extra protection, because a
historic snapshot can only be used to access catalog tables while
GetTransactionSnapshot() is usually called when you're executing
arbitrary queries. You might get very subtle visibility problems if
you tried to use the historic snapshot for arbitrary queries.
There's no built-in code in PostgreSQL that calls
GetTransactionSnapshot() during logical decoding, but it turns out
that the pglogical extension does just that, to evaluate row filter
expressions. You would get weird results if the row filter runs
arbitrary queries, but it is sane as long as you don't access any
non-catalog tables. Even though there are no checks to enforce that in
pglogical, a typical row filter expression does not access any tables
and works fine. Accessing tables marked with the user_catalog_table =
true option is also OK.
To fix pglogical with row filters, and any other extensions that might
do similar things, revert GetTransactionSnapshot() to return a
historic snapshot during logical decoding.
To try to still catch the unsafe usage of historic snapshots, add
checks in heap_beginscan() and index_beginscan() to complain if you
try to use a historic snapshot to scan a non-catalog table. We're very
close to the version 18 release however, so add those new checks only
in master.
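The new checks look roughly like this (IsCatalogOrUserCatalogRelation()
is a hypothetical stand-in for the actual test):

    if (snapshot->snapshot_type == SNAPSHOT_HISTORIC_MVCC &&
        !IsCatalogOrUserCatalogRelation(relation))
        elog(ERROR, "cannot use a historic snapshot to scan non-catalog table \"%s\"",
             RelationGetRelationName(relation));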
Backpatch-through: 18
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/20250809222338.cc.nmisch@google.com
Temporary relations may share the same RelFileNumber with a permanent
relation, or with other temporary relations associated with other
sessions.
Being able to uniquely identify a temporary relation would require
RelidByRelfilenumber() to know about the proc number of the temporary
relation it wants to identify, something it is not designed for since
its introduction in f01d1ae3a1.
There are currently three callers of RelidByRelfilenumber():
- autoprewarm.
- Logical decoding, reorder buffer.
- pg_filenode_relation(), that attempts to find a relation OID based on
a tablespace OID and a RelFileNumber.
This makes the situation problematic particularly for the first two
cases, leading to the possibility of random ERRORs due to
inconsistencies that temporary relations can create in the cache
maintained by RelidByRelfilenumber(). The third case should be less of
an issue, as I suspect that there are few direct callers of
pg_filenode_relation().
The window where the ERRORs can happen is very narrow, requiring an OID
wraparound to create a lookup conflict in RelidByRelfilenumber() with a
temporary table reusing the same OID as another relation already cached.
The problem is easier to reach in workloads with a high OID consumption
rate, especially with a higher number of temporary relations created.
We could get pg_filenode_relation() and RelidByRelfilenumber() to work
with temporary relations if provided the means to identify them, say an
optional proc number given as input, but the years have also shown that
we do not have a use case for that, yet. Note that such a fix could not
be backpatched anyway if pg_filenode_relation() needed changes. It is
simpler to ignore temporary relations.
Reported-by: Shenhao Wang <wangsh.fnst@fujitsu.com>
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-By: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Takamichi Osumi <osumi.takamichi@fujitsu.com>
Reviewed-By: Michael Paquier <michael@paquier.xyz>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/bbaaf9f9-ebb2-645f-54bb-34d6efc7ac42@fujitsu.com
Backpatch-through: 13
The result of pgaio_io_get_id() was being assigned to a mix of int and
uint32 variables. This fixes it to use int consistently, which seems
the most correct. Also change the queue empty special value in
method_worker.c to -1 from UINT32_MAX.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/70c784b3-f60b-4652-b8a6-75e5f051243e%40eisentraut.org
If we error out during execution of a SQL-language function, we will
often leave behind non-null pointers in its SQLFunctionCache's cplan
and eslist fields. This is problematic if the SQLFunctionCache is
re-used, because those pointers will point at resources that were
released during error cleanup. This problem escaped detection so far
because ordinarily we won't re-use an FmgrInfo+SQLFunctionCache struct
after a query error. However, in the rather improbable case that
someone implements an opclass support function in SQL language, there
will be long-lived FmgrInfos for it in the relcache, and then the
problem is reachable after the function throws an error.
To fix, add a flag to SQLFunctionCache that tracks whether execution
escapes out of fmgr_sql, and clear out the relevant fields during
init_sql_fcache if so. (This is going to need more thought if we ever
try to share FmgrInfos across threads; but it's very far from being
the only problem such a project will encounter, since many functions
regard fn_extra as being query-local state.)
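A minimal sketch of the idea, with a hypothetical flag name:

    /* In SQLFunctionCache (sketch): */
    bool        executing;      /* set while control is inside fmgr_sql */

    /* In init_sql_fcache() (sketch): */
    if (fcache->executing)
    {
        /* The previous call errored out of fmgr_sql, so cplan and
         * eslist may point at resources already released by cleanup. */
        fcache->cplan = NULL;
        fcache->eslist = NULL;
        fcache->executing = false;
    }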
This broke at commit 0313c5dc6; before that we did not try to re-use
SQLFunctionCache state across calls. Hence, back-patch to v18.
Bug: #19026
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19026-90aed5e71d0c8af3@postgresql.org
Backpatch-through: 18
Some replication slot manipulations (logical decoding via SQL,
advancing) were failing an assertion when releasing a slot in
single-user mode, because active_pid was not set in the ReplicationSlot
when the slot was acquired.
ReplicationSlotAcquire() has some logic to be able to work with
single-user mode. This commit sets ReplicationSlot->active_pid to
MyProcPid, to let the slot-related logic fall through, treating the
single process as the one holding the slot.
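In rough C terms (a sketch, not the exact code):

    if (!IsUnderPostmaster)
    {
        /* Single-user mode: this process is the one holding the slot. */
        slot->active_pid = MyProcPid;
    }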
While on it, some TAP tests are added for various replication slot
functions in single-user mode, covering slot creation, drop, advancing,
copy, and logical decoding with multiple slot types (temporary, physical
vs logical). These tests are skipped on Windows, as direct calls of
"postgres --single" would fail on permission errors. There is no
platform-specific behavior that needs to be checked, so living with this
restriction should be fine. The CI is OK with that; now let's see what
the buildfarm tells us.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Discussion: https://postgr.es/m/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
The DROP SUBSCRIPTION command performs several operations: it stops the
subscription workers, removes subscription-related entries from system
catalogs, and deletes the replication slot on the publisher server.
Previously, this command acquired an AccessExclusiveLock on
pg_subscription before initiating these steps.
However, while holding this lock, the command attempts to connect to the
publisher to remove the replication slot. In cases where the connection is
made to a newly created database on the same server as the subscriber, the
cache-building process during connection tries to acquire an
AccessShareLock on pg_subscription, resulting in a self-deadlock.
To resolve this issue, we reduce the lock level on pg_subscription during
DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier,
the higher lock level was used to prevent the launcher from starting a new
worker during the drop operation, as a restarted worker could become
orphaned.
Now, instead of relying on a strict lock, we acquire an AccessShareLock on
the specific subscription being dropped and re-validate its existence
after acquiring the lock. If the subscription is no longer valid, the
worker exits gracefully. This approach avoids the deadlock while still
ensuring that orphan workers are not created.
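The worker-side re-validation amounts to something like this (sketch):

    /* Lock the subscription itself, then re-check that it still exists. */
    LockSharedObject(SubscriptionRelationId, subid, 0, AccessShareLock);
    if (!SearchSysCacheExists1(SUBSCRIPTIONOID, ObjectIdGetDatum(subid)))
        proc_exit(0);   /* concurrently dropped: exit gracefully */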
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/18988-7312c868be2d467f@postgresql.org
Apparently the only Test::More function this script uses is
BAIL_OUT, so this omission just results in the wrong error
output appearing in the cases where it bails out.
Seems to have been an oversight in commit 9f899562d which
split Kerberos.pm out of another script.
Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACG=ezY1Dp-S94b78nN0ZuaBGGcMUB6_nF-VyYUwPt1ArFqmGA@mail.gmail.com
Backpatch-through: 17
Commit c5b7ba4e6 changed things so that the ri_RootResultRelInfo field
of this struct is set for both partitions and inheritance children and
used for tuple routing and transition capture (before that commit, it
was only set for partitions to route tuples into), but failed to update
these comments.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAPmGK14NF5CcdCmTZpxrvpvBiT0y4EqKikW1r_wAu1CEHeOmUA%40mail.gmail.com
Backpatch-through: 14
When extracting a timestamp from a UUIDv7, a conversion from
milliseconds to microseconds was using the incorrect constant
NS_PER_US instead of US_PER_MS. Although both constants have the same
value, this fix improves code clarity by using the semantically
correct constant.
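That is (with hypothetical variable names):

    timestamp_us = unix_ts_ms * NS_PER_US;  /* before: right value (1000),
                                             * wrong name */
    timestamp_us = unix_ts_ms * US_PER_MS;  /* after: same value, correct
                                             * semantics */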
Backpatch to v18, where UUIDv7 was introduced.
Author: Erik Nordström <erik@tigerdata.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CACAa4V+i07eaP6h4MHNydZeX47kkLPwAg0sqe67R=M5tLdxNuQ@mail.gmail.com
Backpatch-through: 18
We do not want to trigger some tasks by default, to avoid using too many
compute credits. These tasks have to be manually triggered to be run. But
for cfbot, for example, we do have sufficient resources, so we always want
to start those tasks.
With this commit, an individual repository can be configured to trigger
them automatically using an environment variable defined under
"Repository Settings", for example:
REPO_CI_AUTOMATIC_TRIGGER_TASKS="mingw netbsd openbsd"
This will enable cfbot to turn them on by default when running tests for the
Commitfest app.
Backpatch this back to PG 15, even though PG 15 does not have any manually
triggered tasks. Keeping the CI infrastructure the same across branches
seems advantageous.
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/20240413021221.hg53rvqlvldqh57i%40awork3.anarazel.de
Backpatch-through: 15
This function has an argument named "maxsize" that is used only in
assertions, being assigned once outside the assertion area. Recent gcc
versions emit a -Wunused-but-set-parameter warning because of that when
building without assertions enabled.
In order to fix this issue, PG_USED_FOR_ASSERTS_ONLY is added to the
function argument of SerializeClientConnectionInfo(), the first time we
are doing so for a function argument in the tree. The CI is fine with
the change, but let's see what the buildfarm has to say on the matter.
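The change looks like this (the body here is illustrative only):

    void
    SerializeClientConnectionInfo(Size maxsize PG_USED_FOR_ASSERTS_ONLY,
                                  char *start_address)
    {
        SerializedClientConnectionInfo serialized;

        Assert(maxsize >= sizeof(serialized));
        /* ... fill in "serialized" and copy it to start_address ... */
    }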
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Jacob Champion <jchampion@postgresql.org>
Discussion: https://postgr.es/m/pevajesswhxafjkivoq3yvwxga77tbncghlf3gq5fvchsvfuda@6uivg25sb3nx
Backpatch-through: 16
It turns out that on some platforms (at least current macOS, NetBSD,
OpenBSD) semget(2) will return EINVAL if there is a pre-existing
semaphore set with the same key and too few semaphores. Our code
expects EEXIST in that case and treats EINVAL as a hard failure,
resulting in failure during initdb or postmaster start.
POSIX does document EINVAL for too-few-semaphores-in-set, and is
silent on its priority relative to EEXIST, so this behavior arguably
conforms to spec. Nonetheless it's quite problematic because EINVAL
is also documented to mean that nsems is greater than the system's
limit on the number of semaphores per set (SEMMSL). If that is
where the problem lies, retrying would just become an infinite loop.
To resolve this contradiction, retry after EINVAL, but also install a
loop limit that will make us give up regardless of the specific errno
after trying 1000 different keys. (1000 is a pretty arbitrary number,
but it seems like it should be sufficient.) I like this better than
the previous infinite-looping behavior, since it will also keep us out
of trouble if (say) we get EACCES due to a system-level permissions
problem rather than anything to do with a specific semaphore set.
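A minimal standalone sketch of the retry policy (not the actual
sysv_sema.c code):

    #include <errno.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    static int
    create_sem_set(key_t start_key, int nsems)
    {
        for (int i = 0; i < 1000; i++)  /* cap attempts regardless of errno */
        {
            int     semId = semget(start_key + i, nsems,
                                   IPC_CREAT | IPC_EXCL | 0600);

            if (semId >= 0)
                return semId;
            /* EEXIST: key already in use.  EINVAL: pre-existing set with
             * too few semaphores (macOS, NetBSD, OpenBSD).  Retry with
             * the next key; the cap also bounds errors such as EACCES. */
        }
        return -1;              /* give up */
    }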
This problem has only been observed in the field in PG 17, which uses
a higher nsems value than other branches (cf. 38da05346, 810a8b1c8).
That makes it possible to get the failure if a new v17 postmaster
has a key collision with an existing postmaster of another branch.
In principle though, we might see such a collision against a semaphore
set created by some other application, in which case all branches are
vulnerable on these platforms. Hence, backpatch.
Reported-by: Gavin Panella <gavinpanella@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CALL7chmzY3eXHA7zHnODUVGZLSvK3wYCSP0RmcDFHJY8f28Q3g@mail.gmail.com
Backpatch-through: 13
A malicious server could inject psql meta-commands into plain-text
dump output (i.e., scripts created with pg_dump --format=plain,
pg_dumpall, or pg_restore --file) that are run at restore time on
the machine running psql. To fix, introduce a new "restricted"
mode in psql that blocks all meta-commands (except for \unrestrict
to exit the mode), and teach pg_dump, pg_dumpall, and pg_restore to
use this mode in plain-text dumps.
While at it, encourage users to only restore dumps generated from
trusted servers or to inspect them beforehand, since restoring causes
the destination to execute arbitrary code of the source superusers'
choice. However, the client running the dump and restore needn't
trust the source or destination superusers.
Reported-by: Martin Rakhmanov
Reported-by: Matthieu Denais <litezeraw@gmail.com>
Reported-by: RyotaK <ryotak.mail@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Security: CVE-2025-8714
Backpatch-through: 13
Maliciously-crafted object names could achieve SQL injection during
restore. CVE-2012-0868 fixed this class of problem at the time, but
later work reintroduced three cases. Commit
bc8cd50fef (back-patched to v11+ in
2023-05 releases) introduced the pg_dump case. Commit
6cbdbd9e8d (v12+) introduced the two
pg_dumpall cases. Move sanitize_line(), unchanged, to dumputils.c so
pg_dumpall has access to it in all supported versions. Back-patch to
v13 (all supported versions).
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Backpatch-through: 13
Security: CVE-2025-8715
Commit e2d4ef8de8 (the fix for CVE-2017-7484) added security checks
to the selectivity estimation functions to prevent them from running
user-supplied operators on data obtained from pg_statistic if the user
lacks privileges to select from the underlying table. In cases
involving inheritance/partitioning, those checks were originally
performed against the child RTE (which for plain inheritance might
actually refer to the parent table). Commit 553d2ec271 then extended
that to also check the parent RTE, allowing access if the user had
permissions on either the parent or the child. It turns out, however,
that doing any checks using the child RTE is incorrect, since
securityQuals is set to NULL when creating an RTE for an inheritance
child (whether it refers to the parent table or the child table), and
therefore such checks do not correctly account for any RLS policies or
security barrier views. Therefore, do the security checks using only
the parent RTE. This is consistent with how RLS policies are applied,
and the executor's ACL checks, both of which use only the parent
table's permissions/policies. Similar checks are performed in the
extended stats code, so update that in the same way, centralizing all
the checks in a new function.
In addition, note that these checks by themselves are insufficient to
ensure that the user has access to the table's data because, in a
query that goes via a view, they only check that the view owner has
permissions on the underlying table, not that the current user has
permissions on the view itself. In the selectivity estimation
functions, there is no easy way to navigate from underlying tables to
views, so add permissions checks for all views mentioned in the query
to the planner startup code. If the user lacks permissions on a view,
a permissions error will now be reported at planner-startup, and the
selectivity estimation functions will not be run.
Checking view permissions at planner-startup in this way is a little
ugly, since the same checks will be repeated at executor-startup.
Longer-term, it might be better to move all the permissions checks
from the executor to the planner so that permissions errors can be
reported sooner, instead of creating a plan that won't ever be run.
However, such a change seems too far-reaching to be back-patched.
Back-patch to all supported versions. In v13, there is the added
complication that UPDATEs and DELETEs on inherited target tables are
planned using inheritance_planner(), which plans each inheritance
child table separately, so that the selectivity estimation functions
do not know that they are dealing with a child table accessed via its
parent. Handle that by checking access permissions on the top parent
table at planner-startup, in the same way as we do for views. Any
securityQuals on the top parent table are moved down to the child
tables by inheritance_planner(), so they continue to be checked by the
selectivity estimation functions.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-8713
Commit 0decd5e89d recently added the
assertion to confirm dump order remains independent of OID values. The
assertion remained reachable via DO_DEFAULT_ACL. Given the release wrap
tomorrow, make the assertion master-only.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/d32aaa8d-df7c-4f94-bcb3-4c85f02bea21@gmail.com
Backpatch-through: 13
The internal queue of buffers could become corrupted in a rare edge case
that failed to invalidate an entry, causing a stale buffer to be
"forwarded" to StartReadBuffers(). This is a simple fix for the
immediate problem.
A small API change might be able to remove this and related fragility
entirely, but that will have to wait a bit.
Defect in commit ed0b87ca.
Bug: #19006
Backpatch-through: 18
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/19006-80fcaaf69000377e%40postgresql.org
Tracking down the bugs that led to the addition of comb_multiplexer()
and drain_timer_events() was difficult, because an inefficient flow is
not visibly different from one that is working properly. To help
maintainers notice when something has gone wrong, track the number of
calls into the flow as part of debug mode, and print the total when the
flow finishes.
A new test makes sure the total count is less than 100. (We expect
something on the order of 10.) This isn't foolproof, but it is able to
catch several regressions in the logic of the prior two commits, and
future work to add TLS support to the oauth_validator test server should
strengthen it as well.
Backpatch-through: 18
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
In a case similar to the previous commit, an expired timer can remain
permanently readable if Curl does not remove the timeout itself. Since
that removal isn't guaranteed to happen in real-world situations,
implement drain_timer_events() to reset the timer before calling into
drive_request().
Moving to drain_timer_events() happens to fix a logic bug in the
previous caller of timer_expired(), which treated an error condition as
if the timer were expired instead of bailing out.
The previous implementation of timer_expired() gave differing results
for epoll and kqueue if the timer was reset. (For epoll, a reset timer
was considered to be expired, and for kqueue it was not.) This didn't
previously cause problems, since timer_expired() was only called while
the timer was known to be set, but both implementations now use the
kqueue logic.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
If Curl needs to switch the direction of a socket's registration (e.g.
from CURL_POLL_IN to CURL_POLL_OUT), it expects the old registration to
be discarded. For epoll, this happened via EPOLL_CTL_MOD, but for
kqueue, the old registration would remain if it was not explicitly
removed by Curl.
Explicitly remove the opposite-direction event during registrations. (If
that event doesn't exist, we'll just get an ENOENT, which will be
ignored by the same code that handles CURL_POLL_REMOVE.) A few
assertions are also added to strengthen the relationship between the
number of events added, the number of events pulled off the queue, and
the lengths of the kevent arrays.
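A sketch of the registration logic, simplified from the real code:

    #include <stdbool.h>
    #include <sys/event.h>

    static void
    set_socket_direction(int kq, int sock, bool want_write)
    {
        struct kevent ev[2];

        /* Drop any opposite-direction registration first; a resulting
         * ENOENT just means there was none, and is ignored like for
         * CURL_POLL_REMOVE. */
        EV_SET(&ev[0], sock, want_write ? EVFILT_READ : EVFILT_WRITE,
               EV_DELETE, 0, 0, NULL);
        EV_SET(&ev[1], sock, want_write ? EVFILT_WRITE : EVFILT_READ,
               EV_ADD, 0, 0, NULL);

        (void) kevent(kq, ev, 2, NULL, 0, NULL);    /* error checks elided */
    }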
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
If a socket is added to the kqueue, becomes readable/writable, and
subsequently becomes non-readable/writable again, the kqueue itself will
remain readable until either the socket registration is removed, or the
stale event is cleared via a call to kevent().
In many simple cases, Curl itself will remove the socket registration
quickly, but in real-world usage, this is not guaranteed to happen. The
kqueue can then remain stuck in a permanently readable state until the
request ends, which results in pointless wakeups for the client and
wasted CPU time.
Implement comb_multiplexer() to call kevent() and unstick any stale
events that would cause unnecessary callbacks. This is called right
after drive_request(), before we return control to the client to wait.
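On the kqueue side, the combing step boils down to a zero-timeout
kevent() call (a sketch, with a hypothetical helper name):

    #include <sys/event.h>

    static void
    comb_kqueue(int kq)
    {
        struct kevent   ev[4];
        struct timespec zero = {0, 0};

        /* A zero-timeout poll makes the kqueue re-evaluate its events,
         * clearing stale ones so its fd stops appearing readable. */
        (void) kevent(kq, NULL, 0, ev, 4, &zero);
    }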
Suggested-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CAOYmi+nDZxJHaWj9_jRSyf8uMToCADAmOfJEggsKW-kY7aUwHA@mail.gmail.com
This function is called from ATExecAttachPartition/ATExecAddInherit,
which prevent tables that have row-level triggers with transition tables
from becoming partitions or inheritance children, to check whether there
is such a trigger on the given table. However, it failed to check whether
a found trigger is row-level, causing the caller functions to needlessly
prevent a table with only a statement-level trigger with transition
tables from becoming a partition or inheritance child. Repair.
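The missing test amounts to this (sketch, assuming the usual Trigger
fields):

    /* Only row-level triggers with transition tables matter here. */
    if ((trigger->tgoldtable || trigger->tgnewtable) &&
        TRIGGER_FOR_ROW(trigger->tgtype))
        return true;    /* caller must reject ATTACH / inheritance */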
Oversight in commit 501ed02cf.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAPmGK167mXzwzzmJ_0YZ3EZrbwiCxtM1vogH_8drqsE6PtxRYw%40mail.gmail.com
Backpatch-through: 13
Previously, pg_dump --filter could misinterpret invalid object types
in the filter file as valid ones. For example, the invalid object type
"table-data" (likely a typo for the valid "table_data") could be
mistakenly recognized as "table", causing pg_dump to succeed
when it should have failed.
This happened because pg_dump identified keywords as sequences of
ASCII alphabetic characters, treating non-alphabetic characters
(like hyphens) as keyword boundaries. As a result, "table-data" was
parsed as "table".
To fix this, pg_dump --filter now treats keywords as strings of
non-whitespace characters, ensuring invalid types like "table-data"
are correctly rejected.
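A standalone sketch of the new rule:

    #include <ctype.h>
    #include <stddef.h>

    /* A keyword now runs to the next whitespace character, instead of
     * stopping at the first non-alphabetic one. */
    static size_t
    keyword_length(const char *s)
    {
        size_t  len = 0;

        while (s[len] != '\0' && !isspace((unsigned char) s[len]))
            len++;
        return len;
    }

Under this rule, "table-data" is a single token that matches no valid
object type, rather than being truncated to "table".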
Back-patch to v17, where the --filter option was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHGQGwFzPKUwiV5C-NLBqz1oK1+z9K8cgrF+LcxFem-p3_Ftug@mail.gmail.com
Backpatch-through: 17
Commit 9e6104c66 disallowed transition tables on foreign tables, but
failed to account for cases where a foreign table is a child table of a
partitioned/inherited table on which transition tables exist, leading to
incorrect transition tuples being collected from such foreign tables
when queries on the parent table trigger transition capture. This
occurred not only for inherited UPDATE/DELETE but also for partitioned
INSERT, later supported by commit 3d956d956, which should have handled
it at least for the INSERT case, but didn't.
To fix, modify ExecAR*Triggers to throw an error if the given relation
is a foreign table requesting transition capture. Also, this commit
fixes make_modifytable so that in case of an inherited UPDATE/DELETE
triggering transition capture, FDWs choose normal operations to modify
child foreign tables, not DirectModify; which is needed because they
would otherwise skip the calls to ExecAR*Triggers at execution, causing
unexpected behavior.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAPmGK14QJYikKzBDCe3jMbpGENnQ7popFmbEgm-XTNuk55oyHg%40mail.gmail.com
Backpatch-through: 13
Dropping a pgstats entry twice should not happen, and the error report
generated was missing the "generation" counter (tracking when an entry
is reused) that was added in 818119afcc.
Like d92573adcb, backpatch down to v15 where this information is
useful to have, to gather more information from instances where the
problem shows up. A report has shown that this error path has been
reached on a standby based on 17.3, for a relation stats entry and an
OID close to wraparound.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/CAN4RuQvYth942J2+FcLmJKgdpq6fE5eqyFvb_PuskxF2eL=Wzg@mail.gmail.com
Backpatch-through: 15
Use Min(NBuffers, MAX_CHECKPOINT_REQUESTS) instead of NBuffers in
CheckpointerShmemSize() to match the actual array size limit set in
CheckpointerShmemInit(). This prevents wasting shared memory when
NBuffers > MAX_CHECKPOINT_REQUESTS. Also, fix the comment.
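Sketched, using the structure names from checkpointer.c (treat the exact
shape as illustrative):

    Size
    CheckpointerShmemSize(void)
    {
        Size    size;

        /* Match the cap that CheckpointerShmemInit() applies to the
         * request array, instead of sizing for all of NBuffers. */
        size = offsetof(CheckpointerShmemStruct, requests);
        size = add_size(size,
                        mul_size(Min(NBuffers, MAX_CHECKPOINT_REQUESTS),
                                 sizeof(CheckpointerRequest)));
        return size;
    }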
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1439188.1754506714%40sss.pgh.pa.us
Author: Xuneng Zhou <xunengzhou@gmail.com>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Recent ICU versions have added U_SHOW_CPLUSPLUS_HEADER_API, and we need
to set this to zero as well to hide the ICU C++ APIs from pg_locale.h.
Per discussion, we want cpluspluscheck to work cleanly in back branches,
so backpatch both this and its predecessor commit ed26c4e25a to all
supported versions.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1115793.1754414782%40sss.pgh.pa.us
Backpatch-through: 13
Commit d85ce012f9 has added some new error handling code to
date_trunc() of timestamp, timestamptz, and interval with infinite
values.
However, the new test cases added by that commit did not actually test
all of the new code, missing coverage for the following cases:
1) For timestamp without time zone:
1-1) infinite value with valid unit
1-2) infinite value with unsupported unit
1-3) finite value with unsupported unit, for a code path older than
d85ce012f9.
2) For timestamp with time zone, without a time zone specified for the
truncation:
2-1) infinite value with valid unit
2-2) infinite value with unsupported unit
2-3) finite value with unsupported unit, for a code path older than
d85ce012f9.
3) For timestamp with time zone, with a time zone specified for the
truncation:
3-1) infinite value with valid unit.
3-2) infinite value with unsupported unit.
This commit also provides coverage for the bug fixed in 2242b26ce4,
through cases 2-1) and 3-1), when using an infinite value with a valid
unit, with[out] the optional time zone parameter used for the
truncation.
Author: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/2d320b6f-b4af-4fbc-9eec-5d0fa15d187b@eisentraut.org
Discussion: https://postgr.es/m/4bf60a84-2862-4a53-acd5-8eddf134a60e@eisentraut.org
Backpatch-through: 18
The code used PG_RETURN_TIMESTAMPTZ() on a value that was already a
Datum, not a TimestampTz.
On 64-bit systems, there is no effect, since this just ends up casting
64-bit integers back and forth. On 32-bit systems, timestamptz is
pass-by-reference: PG_RETURN_TIMESTAMPTZ() allocates new memory and
returns its address, so the caller ends up interpreting a stored
pointer value as the timestamp.
The effect is that date_trunc(..., 'infinity'::timestamptz) will
return random values (instead of the correct return value 'infinity').
Bug introduced in commit d85ce012f9.
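In sketch form (surrounding context hypothetical):

    /* 'result' is already a Datum here */
    Datum   result = DirectFunctionCall2(timestamptz_trunc, units, ts);

    /* before (wrong): re-wraps the Datum; on 32-bit this returns the
     * address of a palloc'd copy of the pointer, read as a timestamp */
    PG_RETURN_TIMESTAMPTZ(result);

    /* after (correct) */
    PG_RETURN_DATUM(result);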
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2d320b6f-b4af-4fbc-9eec-5d0fa15d187b@eisentraut.org
Discussion: https://postgr.es/m/4bf60a84-2862-4a53-acd5-8eddf134a60e@eisentraut.org
Backpatch-through: 18
These were introduced (commit efdc7d7475) at the same time as we were
moving to using the standard inttypes.h format macros (commit
a0ed19e0a9). It doesn't seem useful to keep a new already-deprecated
interface like this with only a few users, so remove the new symbols
again and have the callers use PRIx64.
(Also, INT64_HEX_FORMAT was kind of a misnomer, since hex formats all
use unsigned types.)
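Callers now just use the standard macro directly, e.g.:

    /* instead of INT64_HEX_FORMAT: */
    snprintf(buf, sizeof(buf), "%" PRIx64, value);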
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0ac47b5d-e5ab-4cac-98a7-bdee0e2831e4%40eisentraut.org
The result of "DirectFunctionCall1(numeric_float8, d)" is already in
Datum form, but the code was incorrectly applying PG_RETURN_FLOAT8()
to it. On machines where float8 is pass-by-reference, this would
result in complete garbage, since an unpredictable pointer value
would be treated as an integer and then converted to float. It's not
entirely clear how much of a problem would ensue on 64-bit hardware,
but certainly interpreting a float8 bitpattern as uint64 and then
converting that to float isn't the intended behavior.
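In sketch form:

    Datum   d = DirectFunctionCall1(numeric_float8, num);

    /* before (wrong): PG_RETURN_FLOAT8() expects a C double */
    PG_RETURN_FLOAT8(d);

    /* after (correct): the value is already in Datum form */
    PG_RETURN_DATUM(d);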
As luck would have it, even the complete-garbage case doesn't break
BRIN indexes, since the results are only used to make choices about
how to merge values into ranges: at worst, we'd make poor choices
resulting in an inefficient index. Doubtless that explains the lack
of field complaints. However, users with BRIN indexes that use the
numeric_minmax_multi_ops opclass may wish to reindex in hopes of
making their indexes more efficient.
Author: Peter Eisentraut <peter@eisentraut.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2093712.1753983215@sss.pgh.pa.us
Backpatch-through: 14