The SSL tests for pg_stat_ssl try to exactly match the serial
from the certificate by extracting it with the openssl binary.
If that fails because the binary is not available, a fallback
match is used, but the attempt to execute a missing binary adds
a warning to the output, which readers can mistake for a failure
in the test. Fix by only attempting the exact match if the
openssl binary was found by autoconf/meson.
Backpatch down to v16 where commit c8e4030d1b made the test
use the OPENSSL variable from autoconf/meson instead of a hard-
coded value.
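For reference, the value being matched comes from the pg_stat_ssl
view, along the lines of (a minimal sketch, to be run over an SSL
connection using a client certificate):
    SELECT client_serial FROM pg_stat_ssl WHERE pid = pg_backend_pid();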
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aNPSp1-RIAs3skZm@msg.df7cb.de
Backpatch-through: 16
The URL for "Sorting and Searching Algorithms: A Cookbook"
by Thomas Niemann has started returning 404, and since we
refer to the page for license terms this replaces the now
defunct link with one to the copy on archive.org.
Author: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/6DED3DEF-875E-4D1D-8F8F-7353D5AF7B79@gmail.com
The TAP tests whose ok() calls are changed in this commit were relying
on Perl operators rather than the equivalents available in Test::More.
For example, rather than the following:
    ok($data =~ qr/expr/m, "expr matching");
    ok($data !~ qr/expr/m, "expr not matching");
The new test code uses these equivalents:
    like($data, qr/expr/m, "expr matching");
    unlike($data, qr/expr/m, "expr not matching");
A huge benefit of the new formulation is that when a failure happens,
the values being checked are reported, making debugging easier whether
the test runs happen in the buildfarm, in the CI, or locally.
This change leads to more test code overall, as perltidy likes to make
the code pretty the way it is in this commit.
Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
042_low_level_backup compared the result of a query two times with a
comparison operator meant for integers, while the result should be
compared as a string.
The outcome of the tests is currently not impacted by this change.
However, the tests could fail to detect future issues if the query
results change for some reason.
Oversight in 99b4a63bef.
Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Backpatch-through: 17
040_pg_createsubscriber has been calling safe_psql(), which returns
the result of a SQL query, with ok(), without checking the result
generated (in this case 't', for a number of publications).
The outcome of the tests is currently not impacted by this change.
However, the test could fail to detect future issues if the query
results change.
The test is rewritten so that the number of publications is checked.
This is not the fix originally suggested by the author, but it is more
reliable in the long run.
Oversight in e5aeed4b80.
Author: Sadhuprasad Patro <b.sadhu@gmail.com>
Discussion: https://postgr.es/m/CAFF0-CHhwNx_Cv2uy7tKjODUbeOgPrJpW4Rpf1jqB16_1bU2sg@mail.gmail.com
Backpatch-through: 18
The original formulation failed to take into account the fact that for
the PGXS case, the source dir is not $(top_srcdir), so it ended up not
doing anything. Handle it explicitly.
Author: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Bryan Green <dbryan.green@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/TYCPR01MB113164770FB0B0BE6ED21E68EE8DCA@TYCPR01MB11316.jpnprd01.prod.outlook.com
Add a test case to cover pg_dump with --compress=none. This
brings the coverage of compress_none.c up from about 64% to 90%,
in particular covering the new code added in a previous patch.
Include compression of toc.dat in manually-compressed test cases.
We would have found the bug fixed in commit a239c4a0c much sooner
if we'd done this. As far as I can tell, this doesn't reduce test
coverage at all, since there are other tests of directory format
that still use an uncompressed toc.dat.
Widen the wide row used to verify correct (de)compression.
Commit 1a05c1d25 advises us (not without reason) to ensure that
this test case fully fills DEFAULT_IO_BUFFER_SIZE, so that loops
within the compression logic will iterate completely. To follow
that advice with the proposed DEFAULT_IO_BUFFER_SIZE of 128K,
we need something close to this. This does indeed increase the
reported code coverage by a few lines.
While here, fix a glitch that I noticed in testing: the
$glob_patterns tests were incapable of failing, because glob()
will return 'foo' as 'foo' whether there is a matching file or
not. (Indeed, the stanza just above that one relies on that.)
In my testing, this patch adds approximately as much runtime
as was saved by the previous patch, so that it's about a wash
compared to the old code. However, we get better test coverage.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Add a new test script 006_pg_dump_compress.pl, containing just
the pg_dump tests specifically concerned with compression, and
remove those tests from 002_pg_dump.pl. We can also drop some
infrastructure in 002_pg_dump.pl that was used only for these tests.
The point of this is to avoid the cost of running these test
cases over and over in all the scenarios (runs) that 002_pg_dump.pl
exercises. We don't learn anything more about the behavior of the
compression code that way, and we expend significant amounts of
time, since one of these test cases is quite large and due to get
larger.
The intent of this specific patch is to provide exactly the same
coverage as before, except that I went back to using --no-sync
in all the test runs moved over to 006_pg_dump_compress.pl.
I think that avoiding --no-sync had basically been cargo-culted into
these test cases as a result of modeling them on the
defaults_custom_format test case; again, doing that over and over
isn't going to teach us anything new.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
After commit fe8192a95, compress_zstd.c tends to produce data block
sizes around 128K, and we don't really have any control over that
unless we want to overrule ZSTD_CStreamOutSize(). Which seems like
a bad idea. But let's try to align the other compression modes to
produce block sizes roughly comparable to that, so that pg_restore's
skip-data performance isn't enormously different for different modes.
gzip compression can be brought in line simply by setting
DEFAULT_IO_BUFFER_SIZE = 128K, which this patch does. That
increases some unrelated buffer sizes, but none of them seem
problematic for modern platforms.
lz4's idea of appropriate block size is highly nonlinear:
if we just increase DEFAULT_IO_BUFFER_SIZE then the output
blocks end up around 200K. I found that adjusting the slop
factor in LZ4State_compression_init was a not-too-ugly way
of bringing that number roughly into line.
With compress = none you get data blocks the same sizes as the
table rows, which seems potentially problematic for narrow tables.
Introduce a layer of buffering to make that case match the others.
Comments in compress_io.h and 002_pg_dump.pl suggest that if
we increase DEFAULT_IO_BUFFER_SIZE then we need to increase the
amount of data fed through the tests in order to improve coverage.
I've not done that here, leaving it for a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
This information appears to have been unused since commit
c5b7ba4e67. We could not find any references in third-party code,
either.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO_CyFRpbVMtgJWM%40nathan
To support the upcoming addition of a sequence synchronization worker,
this patch extracts common synchronization logic shared by table sync
workers and the new sequence sync worker into a dedicated file. This
modularization improves code reuse, maintainability, and clarity in the
logical workers framework.
Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
EvalPlanQualStart() failed to propagate es_partition_directory into
the child EState used for EPQ rechecks. When execution time partition
pruning ran during the EPQ scan, executor code dereferenced a NULL
partition directory and crashed.
Previously, propagating es_partition_directory into the EPQ EState was
unnecessary because CreatePartitionPruneState(), which sets it on
demand, also initialized the exec-pruning context. After commit
d47cbf474, CreatePartitionPruneState() now initializes only the init-
time pruning context, leaving exec-pruning context initialization to
ExecInitNode(). Since EvalPlanQualStart() runs only ExecInitNode() and
not CreatePartitionPruneState(), it can encounter a NULL
es_partition_directory. Other executor fields initialized during
CreatePartitionPruneState() are already copied into the child EState
thanks to commit 8741e48e5d, but es_partition_directory was missed.
Fix by borrowing the parent estate's es_partition_directory in
EvalPlanQualStart(), and by clearing that field in EvalPlanQualEnd()
so the parent remains responsible for freeing the directory.
Add an isolation test permutation that triggers EPQ with execution-
time partition pruning, the case that reproduces this crash.
Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Diagnosed-by: David Rowley <dgrowleyml@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18
Per report from buildfarm member prion. The CI does not use this
parameter, and this buildfarm member sets log_error_verbosity to
"verbose". This would generate extra LOCATION entries in the logs,
causing the regexps of the test to fail.
Trying to support log_error_verbosity=verbose in the test would mean
tweaking all the regexps used in the test to detect an optional set of
LOCATION lines, at least. This would not improve the coverage, and
forcing the GUC value is simpler.
Oversight in 76bba03312.
Discussion: https://postgr.es/m/aPBaNNGiYT3xMBN1@paquier.xyz
Temporary file usage is sometimes attributed to the wrong query in the
log output. One identified reason is that unnamed portal cleanup (and
consequently temp file logging) happens during the next BIND message,
after debug_query_string has already been updated to the new query.
Dropping an unnamed portal in the next BIND message is a rather old
protocol behavior (fe19e56c57, also mentioned in the docs).
log_temp_files is a bit newer than that, as of be8a431881.
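For context, this logging is driven by log_temp_files; setting it to
zero logs all temporary files regardless of size:
    SET log_temp_files = 0;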
This commit adds tests to track which query is displayed next to the
temporary file(s) removed when a portal is dropped, and in some cases
whether a query is displayed at all. We have not concluded how to
improve the situation yet; these tests will at least help in checking
how the logs change depending on the proposals discussed and how they
affect the scenarios tracked by this new test.
Author: Sami Imseih <samimseih@gmail.com>
Author: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3d07ee43-8855-42db-97e0-bad5db82d972@dalibo.com
This commit adjusts RangeVarCallbackForReindexIndex() to handle an
extremely unlikely race condition involving concurrent OID reuse.
In short, if REINDEX INDEX is executed at the same time that the
index is re-created with the same name and OID but a different
parent table OID, we might lock the wrong parent table. To fix,
simply detect when this happens and emit an ERROR. Unfortunately,
we can't gracefully handle this situation because we will have
already locked the index, and we must lock the parent table before
the index to avoid deadlocks.
While at it, I've replaced all but one early return in this
callback function with ERRORs that should be unreachable. While I
haven't verified the presence of a live bug, the checks in question
appear to be unnecessary, and the early returns seem prone to
breaking the parent table locking code in subtle ways. If nothing
else, this simplifies the code a bit.
This is a bug fix and could be back-patched, but given the presumed
rarity of the race condition and the lack of reports, I'm not going
to bother.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan
Per-character pg_locale_t APIs. Useful for tsearch parsing and
potentially other places.
Significant overlap with the regc_wc_isalpha() and related functions
in regc_pg_locale.c, but this change leaves those intact for
now.
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
Presently, these functions look up the relation's OID, lock it, and
then check privileges. Not only does this approach provide no
guarantee that the locked relation matches the arguments of the
lookup, but it also allows users to briefly lock relations for
which they do not have privileges, which might enable
denial-of-service attacks. This commit adjusts these functions to
use RangeVarGetRelidExtended(), which is purpose-built to avoid
both of these issues. The new RangeVarGetRelidCallback function is
somewhat complicated because it must handle both tables and
indexes, and for indexes, we must check privileges on the parent
table and lock it first. Also, it needs to handle a couple of
extremely unlikely race conditions involving concurrent OID reuse.
A downside of this change is that the coding doesn't allow for
locking indexes in AccessShare mode anymore; everything is locked
in ShareUpdateExclusive mode. Per discussion, the original choice
of lock levels was intended for a now defunct implementation that
used in-place updates, so we believe this change is okay.
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z8zwVmGzXyDdkAXj%40nathan
Backpatch-through: 18
The log output controlled by log_autovacuum_min_duration applies to
both VACUUM and ANALYZE, so it is not possible to set separate VACUUM
and ANALYZE log output thresholds. Since ANALYZE typically completes
much faster than VACUUM, logs are likely to be output only for VACUUM
and not for ANALYZE.
Therefore, we decided to separate the threshold for log output of
VACUUM by autovacuum (log_autovacuum_min_duration) from a new threshold
for log output of ANALYZE by autovacuum (log_autoanalyze_min_duration).
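A hypothetical configuration with the two thresholds set independently
(GUC names per this commit; the values are illustrative):
    ALTER SYSTEM SET log_autovacuum_min_duration = '250ms';
    ALTER SYSTEM SET log_autoanalyze_min_duration = '10s';
    SELECT pg_reload_conf();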
Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kasahara Tatsuhito <kasaharatt@oss.nttdata.com>
Discussion: https://www.postgresql.org/message-id/flat/CAOzEurQtfV4MxJiWT-XDnimEeZAY+rgzVSLe8YsyEKhZcajzSA@mail.gmail.com
If inside an EPQ recheck, ExecScanFetch would run the recheck method
function for foreign/custom joins even if they aren't descendant nodes
in the EPQ recheck plan tree, which is problematic at least in the
foreign-join case, because such a foreign join isn't guaranteed to have
an alternative local-join plan required for running the recheck method
function; in the postgres_fdw case this could lead to a segmentation
fault or an assert failure in an assert-enabled build when running the
recheck method function.
Even if inside an EPQ recheck, any scan nodes that aren't descendant
ones in the EPQ recheck plan tree should be processed normally by using
the access method function. Fix by modifying ExecScanFetch so that,
inside an EPQ recheck, it runs the recheck method function for
foreign/custom joins that are descendant nodes in the EPQ recheck plan
tree as before, and runs the access method function for foreign/custom
joins that aren't.
This fix also adds to postgres_fdw an isolation test for an EPQ recheck
that caused the issues stated above.
Oversight in commit 385f337c9.
Reported-by: Kristian Lejao <kristianlejao@gmail.com>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBpo6Gx55FBOW+9s5X=nUw3Xpq64v35fpDEKsTERnc4TQ@mail.gmail.com
Backpatch-through: 13
This essentially reverts commit 866566a690, which installed
safeguards against loading plpython2 and plpython3 into the same
process. We don't support plpython2 anymore, so this is obsolete.
The Python and PL/Python initialization now happens again in
_PG_init() rather than the first time a PL/Python call handler is
invoked. (Often, these will be very close together.)
I kept the separate PLy_initialize() function introduced by
866566a690 to keep _PG_init() a bit modular.
Reviewed-by: Mario González Troncoso <gonzalemario@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/9eb9feb6-1df3-4f0c-a0dc-9bcf35273111%40eisentraut.org
This patch replaces ALTER SUBSCRIPTION REFRESH with
ALTER SUBSCRIPTION REFRESH PUBLICATION in comments and error messages to
improve clarity and support future extensibility. The change aligns
with the upcoming addition of REFRESH SEQUENCES for sequence
synchronization.
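For example, messages now consistently spell out the full command
(subscription name illustrative):
    ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;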
Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
pg_createsubscriber is documented as requiring the same major version as
the target clusters. Attempting to use this tool on a cluster whose
control file version does not match the version the tool was compiled
with would lead to the following error message:
    pg_createsubscriber: error: control file appears to be corrupt
This is confusing as the control file is correct: only the expected
version does not match. This commit integrates pg_createsubscriber
with the facility added by cd0be131ba, where the contents of
PG_VERSION are read and compared with the value of PG_MAJORVERSION_NUM
expected by the tool. This puts pg_createsubscriber in line with the
documentation, with a better error message when the version does not
match.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aONDWig0bIGilixs@paquier.xyz
pg_resetwal's custom logic to retrieve the version number of a data
folder's PG_VERSION can be replaced by the facility introduced in
cd0be131ba. This removes some code.
One thing specific to pg_resetwal is that the first line of PG_VERSION
is read and reported in the error report generated when the major
version read does not match the version pg_resetwal has been compiled
with. The new logic preserves this property, with no changes to either
the error message or the data used in the error report.
Note that since a chdir() into the data folder is done before checking
the contents of PG_VERSION, get_pg_version() needs to be tweaked to
look for PG_VERSION in the current folder.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
pg_combinebackup's custom logic to retrieve the version number of a data
folder's PG_VERSION can be replaced by the facility introduced in
cd0be131ba. This removes some code.
One thing specific to this tool is that backend versions older than v10
are not supported. The new code does the same checks as the previous
code.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
This reverts commit 74ac377d75.
The previous change contained a misconception about how publications
are cleaned up on the subscriber. The newly added log message could
confuse users, particularly when running pg_createsubscriber with
--dry-run - users would see a "dropping publication" message
immediately followed by a "no publications found" message.
Discussion: https://postgr.es/m/CAHut+Pu7xz1LqNvyQyvSHrV0Sw6D=e6T-Jm=gh1MRJrkuWGyBQ@mail.gmail.com
This function only requires a few fields from LVRelState, so pass them
in individually.
This change allows calling heap_page_is_all_visible() from code such as
pruneheap.c, which does not have access to an LVRelState.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
These functions appeared prominently in a profile of a patch that sets
the visibility map on-access. Inline them to remove call overhead and
make them cheaper to use in hot paths.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
After scanning the line pointers on a heap page during the first phase
of vacuum, we use the information collected to decide whether to use
the assembled freeze plans.
Move this decision logic into a helper function to improve readability.
While here, rename a PruneState member and disambiguate some local
variables in heap_page_prune_and_freeze().
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl%40mdyyqpqzxjek
When specifying --clean=publication to pg_createsubscriber, it drops
all existing publications with a log message "dropping all existing
publications in database "testdb"". Add a new log message "no
publications found" when there are no publications to drop, making the
progress more transparent to users.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAHut+Ptm+WJwbbYXhC0s6FP_98KzZCR=5CPu8F8N5uV8P7BpqA@mail.gmail.com
add323da40 altered xl_heap_prune, changing the WAL format, but
neglected to bump XLOG_PAGE_MAGIC. Do so now.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reported-by: Kirill Reshke <reshkekirill@gmail.com>
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO3Gw6hCAZFUd5ab%40paquier.xyz
Previously, WinCheckAndInitializeNullTreatment() used elog() to emit an
error message. ereport() should be used instead because it's a
user-facing error. Also use the existing get_func_name() to get a
function's name, rather than a homegrown implementation.
Moreover, add an assertion to validate the winobj parameter, just like
the other window function APIs do.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/2952409.1760023154%40sss.pgh.pa.us
Unsurprisingly, this shaves code. get_major_server_version() can be
replaced by the new routine added by cd0be131ba, with the contents of
PG_VERSION stored in an allocated buffer instead of a fixed-sized one.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
get_pg_version() is able to return a version number that can be used
for comparisons based on PG_VERSION_NUM. A macro is added to convert
the result to a major version number, to work with PG_MAJORVERSION_NUM.
It is possible to pass the routine an optional argument, where the
contents retrieved from PG_VERSION are saved. This matters for some of
the frontend code (one example: pg_upgrade wants that for tablespace
paths with a version number strictly older than v10).
This will be used by a set of follow-up patches, to be consumed in
various frontend tools that duplicate logic similar to what this new
routine does, like:
- pg_resetwal
- pg_combinebackup
- pg_createsubscriber
- pg_upgrade
This routine supports both the post-v10 version number and the older
flavor (aka 9.6), as required at least by pg_upgrade.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/aOiirvWJzwdVCXph@paquier.xyz
The version number calculated by read_pg_version_file() is multiplied
once by 10000, to be able to do comparisons based on PG_VERSION_NUM or
equivalents with a minor version included. However, the version number
given to sync_pgdata() was multiplied by 10000 a second time, leading
to an overestimated number.
This issue was harmless (still incorrect) as pg_combinebackup does not
support versions of Postgres older than v10, and sync_pgdata() only
includes a version check due to the rename of pg_xlog/ to pg_wal/. This
folder rename happened in the development cycle of v10. This would
become a problem if in the future sync_pgdata() is changed to have more
version-specific checks.
Oversight in dc21234005, so backpatch down to v17.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aOil5d0y87ZM_wsZ@paquier.xyz
Backpatch-through: 17
Instead of emitting a separate XLOG_HEAP2_VISIBLE WAL record for each
page that becomes all-visible in vacuum's third phase, specify the VM
changes in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP record.
Visibility checks are now performed before marking dead items unused.
This is safe because the heap page is held under exclusive lock for the
entire operation.
This reduces the number of WAL records generated by VACUUM phase III by
up to 50%.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
log_error() would probably fail completely if used, and would
certainly print garbage for anything that needed to be interpolated
into the message, because it was failing to use the correct printing
subroutine for a va_list argument.
This bug likely went undetected because the error cases this code
is used for are rarely exercised - they only occur when Windows
security API calls fail catastrophically (out of memory, security
subsystem corruption, etc).
The FRONTEND variant can be fixed just by calling vfprintf()
instead of fprintf(). However, there was no va_list variant
of write_stderr(), so create one by refactoring that function.
Following the usual naming convention for such things, call
it vwrite_stderr().
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAF+pBj8goe4fRmZ0V3Cs6eyWzYLvK+HvFLYEYWG=TzaM+tWPnw@mail.gmail.com
Backpatch-through: 13
I was distressed to find that reading an LZ4-compressed toc.dat
file was hundreds of times slower than it ought to be. On
investigation, the blame mostly affixes to LZ4Stream_read_overflow's
habit of memmove'ing all the remaining buffered data after each read
operation. Since reading a TOC file tends to involve a lot of small
(even one-byte) decompression calls, that amounts to an O(N^2) cost.
This could have been fixed with a minimal patch, but to my
eyes LZ4Stream_read_internal and LZ4Stream_read_overflow are
badly-written spaghetti code; in particular the eol_flag logic
is inefficient and duplicative. I chose to throw the code
away and rewrite from scratch. This version is about sixty
lines shorter as well as not having the performance issue.
Fortunately, AFAICT the only way to get to this problem is to
manually LZ4-compress the toc.dat and/or blobs.toc files within a
directory-style archive; in the main data files, we read blocks
that are large enough that the O(N^2) behavior doesn't manifest.
Few people do that, which likely explains the lack of field
complaints. Otherwise this performance bug might be considered
bad enough to warrant back-patching.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Both of these modules dumped each bit of output that they got from
the underlying compression library as a separate "data block" in
the emitted archive file. In the case of zstd this'd frequently
result in block sizes well under 100 bytes; lz4 is a little better
but still produces blocks around 300 bytes, at least in the test
case I tried. This bloats the archive file a little bit compared
to larger block sizes, but the real problem is that when pg_restore
has to skip each data block rather than seeking directly to some
target data, tiny block sizes are enormously inefficient.
Fix both modules so that they fill their allocated buffer reasonably
well before dumping a data block. In the case of lz4, also delete
some redundant logic that caused the lz4 frame header to be emitted
as a separate data block. (That saves little, but I see no reason
to expend extra code to get worse results.)
I fixed the "stream API" code too. In those cases, feeding small
amounts of data to fwrite() probably doesn't have any meaningful
performance consequences. But it seems like a bad idea to leave
the two sets of code doing the same thing in two different ways.
In passing, remove unnecessary "extra paranoia" check in
_ZstdWriteCommon. _CustomWriteFunc (the only possible referent
of cs->writeF) already protects itself against zero-length writes,
and it's really a modularity violation for _ZstdWriteCommon to know
that the custom format disallows empty data blocks. Also, fix
Zstd_read_internal to do less work when passed size == 0.
Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
pg_dump expects a read request of zero bytes to be a no-op; see for
example ReadStr(). Gzip_read got this wrong and falsely supposed
that the resulting gzret == 0 indicated an error. We could complicate
that error-checking logic some more, but it seems best to just fall
out immediately when passed size == 0.
This bug breaks the nominally-supported case of manually gzip'ing
the toc.dat file within a directory-style dump, so back-patch to v16
where this code came in. (Prior branches already have a short-circuit
for size == 0 before their only gzread call.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 16
Remove a variable that is no longer in use following commit 9a2e2a28.
It's not immediately clear why there were no compiler warnings about
this oversight.
Author: Peter Geoghegan <pg@bowt.ie>
Backpatch-through: 18
In commit a45c78e32 I removed the only regression test case that
reaches this function, because it turns out that we only use it
if reading an LZ4-compressed blobs.toc file in a directory dump,
and that is a state that has to be created manually. That seems
like a bad thing to not test, not so much for LZ4Stream_gets()
itself as because it means the squirrely eol_flag logic in
LZ4Stream_read_internal() is not tested.
The reason for the change was that I thought the lz4 program did not
have any way to perform compression without explicit specification
of the output file name. However, it turns out that the syntax
synopsis in its man page is a lie, and if you read enough of the
man page you find out that with "-m" it will do what's needful.
So restore the manual compression step in that test case.
Noted while testing some proposed changes in pg_dump's compression
logic.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3515357.1760128017@sss.pgh.pa.us
Backpatch-through: 17
Commit 71f4c8c6f7 (which implemented DETACH CONCURRENTLY) added code
to create a separate table constraint when a table is detached
concurrently, identical to the partition constraint, on the theory that
such a constraint was needed in case the optimizer had constructed any
query plans that depended on the constraint being there. However, that
theory was apparently bogus because any such plans would be invalidated.
For hash partitioning, those constraints are problematic, because their
expressions reference the OID of the parent partitioned table, to which
the detached table is no longer related; this causes all sorts of
problems (such as inability of restoring a pg_dump of that table, and
the table no longer working properly if the partitioned table is later
dropped).
We'd like to get rid of all those constraints. In fact, for branch
master, do that -- no longer create any substitute constraints.
However, out of fear that some users might somehow depend on these
constraints for other partitioning strategies, for stable branches
(back to 14, which added DETACH CONCURRENTLY), only do it for hash
partitioning.
(If you repeatedly DETACH CONCURRENTLY and then ATTACH a partition, then
with this constraint addition you don't need to scan the table in the
ATTACH step, which presumably is good. But if users really valued this
feature, they would have requested that it work for non-concurrent
DETACH also.)
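For reference, the operation in question (table names illustrative):
    ALTER TABLE parted DETACH PARTITION part_h1 CONCURRENTLY;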
Author: Haiyang Li <mohen.lhy@alibaba-inc.com>
Reported-by: Fei Changhong <feichanghong@qq.com>
Reported-by: Haiyang Li <mohen.lhy@alibaba-inc.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/18371-7fef49f63de13f02@postgresql.org
Discussion: https://postgr.es/m/19070-781326347ade7c57@postgresql.org
Introduced by my (Álvaro's) commit 9e4f914b5e, which was itself
backpatched to pg10, though only pg15 and up contain the problem
because of commit 9c08aea6a3.
This isn't a particularly significant leak, but given the fix is
trivial, we might as well backpatch to all branches where it applies, so
do that.
Author: Nathan Bossart <nathandbossart@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/x4odfdlrwvsjawscnqsqjpofvauxslw7b4oyvxgt5owoyf4ysn@heafjusodrz7
An assertion in _bt_killitems expected the scan's currPos state to
contain a valid LSN, saved from when currPos's page was initially read.
The assertion failed to account for the fact that even logged relations
can have leaf pages with an invalid LSN when built with wal_level set to
"minimal". Remove the faulty assertion.
Oversight in commit e6eed40e (though note that the assertion was
backpatched to stable branches before 18 by commit 7c319f54).
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Matthijs van der Vleuten <postgresql@zr40.nl>
Bug: #19082
Discussion: https://postgr.es/m/19082-628e62160dbbc1c1@postgresql.org
Backpatch-through: 13
An error happening while a slot's data is saved on disk in
SaveSlotToPath() could cause a state.tmp file (temporary file holding
the slot state data, renamed to its permanent name at the end of the
function) to remain around after it has been created. This temporary
file is created with O_EXCL, meaning that if an existing state.tmp is
found, its creation would fail. This would prevent the slot data from
being saved, requiring manual intervention to remove state.tmp before
being able to save the slot again. Possible scenarios where this
temporary file could remain on disk are, for example, an ENOSPC case
(no disk space) while writing, syncing or renaming it. The bug reports
point to a write failure as the principal cause of the problems.
Using O_TRUNC has been argued back in 2019 as a potential solution to
discard any temporary file that could exist. This solution was
rejected, as O_EXCL can also act as a safety measure when saving the
slot state, with crash recovery offering cleanup guarantees post-crash.
This commit uses the alternative approach suggested by Andres Freund
back in 2019. When the temporary state file cannot be written, synced,
closed or renamed (note: not when created!), an unlink() is used to
remove the temporary state file while holding the in-progress I/O
LWLock, so that any follow-up attempts to save a slot's data do not
choke on an existing file that remained around because of a previous
failure.
This problem has been reported a few times across the years, going back
to 2019, but for some reason I have never come back to do something
about it and it has been forgotten. A recent report has reminded me
that this was still a problem.
Reported-by: Kevin K Biju <kevinkbiju@gmail.com>
Reported-by: Sergei Kornilov <sk@zsrv.org>
Reported-by: Grigory Smolkin <g.smolkin@postgrespro.ru>
Discussion: https://postgr.es/m/CAM45KeHa32soKL_G8Vk38CWvTBeOOXcsxAPAs7Jt7yPRf2mbVA@mail.gmail.com
Discussion: https://postgr.es/m/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net
Discussion: https://postgr.es/m/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru
Backpatch-through: 13
test_random_operations() did not check the result returned by
bms_is_member() in its last phase, when checking that the contents of
the bitmap match with what is expected. This was impacting the
reliability of the function and the coverage it could provide.
This commit improves the whole function, adding more checks based on
bms_is_member(), using a bitmap and a secondary array that tracks the
members added by random additions and deletions.
While on it, more comments are added to document the internals of the
function.
Reported-by: Ranier Vilela <ranier.vf@gmail.com>
Author: Greg Burd <greg@burd.me>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEudQAq_zOSA2NUQSWePTGV_=90Uw0WcXxGOWnN-vwF046OOqA@mail.gmail.com
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the VM block changes in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
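A sketch of the affected operation (FREEZE requires the table to have
been created or truncated in the same transaction; names illustrative):
    BEGIN;
    TRUNCATE t;
    COPY t FROM '/tmp/t.dat' (FREEZE);
    COMMIT;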
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
The processing of the PARALLEL option for VACUUM was not quite
following what the DefElem code had intended. defGetInt32() already has
code to handle missing parameters and returns a perfectly good error
message for when that happens.
Here we get rid of the ExecVacuum() error:
    ERROR: parallel option requires a value between 0 and N
and let defGetInt32() handle it, which will give:
    ERROR: parallel requires an integer value
defGetInt32() was already handling the non-integer parameter case, so it
may as well handle the missing parameter case too.
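Both behaviors, per the messages above (table name illustrative):
    VACUUM (PARALLEL) t;    -- now reports the defGetInt32() error
    VACUUM (PARALLEL 2) t;  -- accepted, as before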
Additionally, parameterize the option name to make the translators'
work easier, and also use errhint_internal() rather than errhint() for
the BUFFER_USAGE_LIMIT option, since there isn't any work for a
translator to do for "%s".
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAApHDvovH14tNWB+WvP6TSbfi7-=TysQ9h5tQ5AgavwyWRWKHA@mail.gmail.com
An error context callback function might leak some memory into
ErrorContext, since those functions are run with ErrorContext as
current context. In the case where the elevel is ERROR, this is
no problem since the code level that catches the error should do
FlushErrorState to clean up, and that will reset ErrorContext.
However, if the elevel is less than ERROR then no such cleanup occurs.
In principle, repeated leaks while emitting log messages or client
notices could accumulate arbitrarily much leaked data, if no ERROR
occurs in the session.
To fix, let errfinish() perform an ErrorContext reset if it is
at the outermost error nesting level. (If it isn't, we'll delay
cleanup until the outermost nesting level is exited.)
The only actual leakage of this sort that I've been able to observe
within our regression tests was recently introduced by commit
f727b63e8. While it seems plausible that there are other such
leaks not reached in the regression tests, the lack of field
reports suggests that they're not a big problem. Accordingly,
I won't take the risk of back-patching this now. We can always
back-patch later if we get field reports of leaks.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/jngsjonyfscoont4tnwi2qoikatpd5hifsg373vmmjvugwiu6g@m6opxh7uisgd
While pgoutput caches relation synchronization information in
RelationSyncCache that resides in CacheMemoryContext, each entry's
information (such as row filter expressions and column lists) is
stored in the entry's private memory context (entry_cxt in
RelationSyncEntry), which is a descendant memory context of the
decoding context. If a logical decoding invoked via SQL functions like
pg_logical_slot_get_binary_changes fails with an error, subsequent
logical decoding executions could access already-freed memory of the
entry's cache, resulting in a crash.
With this change, it's ensured that RelationSyncCache is cleaned up
even in error cases by using a memory context reset callback function.
Backpatch to 15, where entry_cxt was introduced for column filtering
and row filtering.
While the back-branches v13 and v14 have a similar issue where
RelationSyncCache persists even after an error when pgoutput is used
via the SQL API, we decided not to backport this fix. This decision was
made because v13 is approaching its final minor release, and we won't
have a chance to fix any new issues that might arise. Additionally,
since using pgoutput via the SQL API is not a common use case, the risk
outweighs the benefit. If we receive bug reports, we can consider
backporting the fixes then.
Author: vignesh C <vignesh21@gmail.com>
Co-authored-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CALDaNm0x-aCehgt8Bevs2cm=uhmwS28MvbYq1=s2Ekf0aDPkOA@mail.gmail.com
Backpatch-through: 15
Some of the buildfarm is still unhappy with WinGetFuncArgInPartition
even after 2273fa32b. While it seems to be just very old compilers,
we can suppress the warnings and arguably make the code more readable
by not initializing these variables till closer to where they are
used. While at it, make a couple of cosmetic comment improvements.
The creation of a replication slot done in a specific database on a
publisher was logged twice, with the second log not mentioning the
database where the slot creation happened. This commit removes the
information logged after a slot has been successfully created, moving
the information about the publisher from the second to the first log.
Note that a failed slot creation is also logged, so there is no loss
of information.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pv7qDvLbDgc9PQGhULT3rPXTxdu_=w+iW-kMs+zPADR+w@mail.gmail.com
This patch adds support for the ALL SEQUENCES clause in publications,
enabling synchronization/replication of all sequences, which is useful
for upgrades.
Publications can now include all sequences via FOR ALL SEQUENCES.
psql enhancements:
    \d shows publications for a given sequence.
    \dRp indicates if a publication includes all sequences.
ALL SEQUENCES can be combined with ALL TABLES, but not with other options
like TABLE or TABLES IN SCHEMA. We can extend support for more granular
clauses in the future.
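A sketch of the new syntax (assuming the comma-separated combination
with ALL TABLES; publication names are illustrative):
    CREATE PUBLICATION seq_pub FOR ALL SEQUENCES;
    CREATE PUBLICATION all_pub FOR ALL TABLES, ALL SEQUENCES;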
The view pg_publication_sequences provides information about the mapping
between publications and sequences.
This patch enables publishing of sequences; subscriber-side support will
be added in upcoming patches.
Author: vignesh C <vignesh21@gmail.com>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
SQL/JSON functions such as JSON_VALUE could fail with "unrecognized
node type" errors when a DEFAULT clause contained an explicit COLLATE
expression. That happened because assign_collations_walker() could
invoke exprSetCollation() on a JsonBehavior expression whose DEFAULT
still contained a CollateExpr, which exprSetCollation() does not
handle.
For example:
SELECT JSON_VALUE('{"a":1}', '$.c' RETURNING text
DEFAULT 'A' COLLATE "C" ON EMPTY);
Fix by validating in transformJsonBehavior() that the DEFAULT
expression's collation matches the enclosing JSON expression's
collation. In exprSetCollation(), replace the recursive call on the
JsonBehavior expression with an assertion that its collation already
matches the target, since the parser now enforces that condition.
Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxHVwYYSyiVQ6o+PsRX6zQ7rAFinh_fv1kCfTsT1xG4Zeg@mail.gmail.com
Backpatch-through: 17
truncate_useless_pathkeys() seems to have neglected to account for
PathKeys that might be useful for WindowClause evaluation. Modify it so
that it properly accounts for that.
Making this work required adjusting two things:
1. Change from checking query_pathkeys to check sort_pathkeys instead.
2. Add explicit check for window_pathkeys
For #1, query_pathkeys gets set in standard_qp_callback() according to the
sort order requirements for the first operation to be applied after the
join planner is finished, so this changes depending on which upper
planner operations a particular query needs. If the query has window
functions and no GROUP BY, then query_pathkeys gets set to
window_pathkeys. Before this change, this meant PathKeys useful for the
ORDER BY were not accounted for in queries with window functions.
Because of #1, #2 is now required so that we explicitly check to ensure
we don't truncate away PathKeys useful for window functions.
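A hypothetical query shape that benefits: here the query_pathkeys only
reflect the window's ORDER BY x, so the PathKeys useful for the final
ORDER BY x, y could previously be truncated away:
    SELECT x, y, row_number() OVER (ORDER BY x)
    FROM t1 JOIN t2 USING (x)
    ORDER BY x, y;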
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrj3HTKmXoLMbUjTO=_MNMxM=cnuCSyBKidAVibmYPnrg@mail.gmail.com
Previously StrategyGetBuffer() acquired the buffer header spinlock for every
buffer, whether it was reusable or not. If reusable, it'd be returned, with
the lock held, to GetVictimBuffer(), which then would pin the buffer with
PinBuffer_Locked(). That's somewhat violating the spirit of the guidelines for
holding spinlocks (i.e. that they are only held for a few lines of consecutive
code) and necessitates using PinBuffer_Locked(), which scales worse than
PinBuffer() due to holding the spinlock. This alone makes it worth changing
the code.
However, the main reason to change this is that a future commit will make
PinBuffer_Locked() slower (due to making UnlockBufHdr() slower), to gain
scalability for the much more common case of pinning a pre-existing buffer. By
pinning the buffer with a single atomic operation, iff the buffer is reusable,
we avoid any potential regression for miss-heavy workloads. There are
strictly fewer atomic operations for each potential buffer after this change.
The price for this improvement is that freelist.c needs two CAS loops and
needs to be able to set up the resource accounting for pinned buffers. The
latter is achieved by exposing a new function for that purpose from bufmgr.c,
that seems better than exposing the entire private refcount infrastructure.
The improvement seems worth the complexity.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
We're planning to merge buffer content locks into BufferDesc.state. To reduce
the size of that patch, centralize calls to BufferDescriptorGetContentLock().
The biggest part of the change is in assertions, by introducing
BufferIsLockedByMe[InMode]() (and removing BufferIsExclusiveLocked()). This
seems like an improvement even without aforementioned plans.
Additionally replace some direct calls to LWLockAcquire() with calls to
LockBuffer().
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
BM_PERMANENT is defined as 1U<<31, which is a negative number when interpreted
as a signed integer. Unfortunately the mask variable in BufferSync() was
signed. This has been wrong for a long time, but failed to fail, due to
integer conversion rules.
However, in an upcoming patch the width of the state variable will be
increased, with the wrong signedness leading to never flushing permanent
buffers - luckily caught in a test.
It seems better to fix this separately, instead of doing so as part of a
large, otherwise mechanical, patch.
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
While testing a new potential use for ReadRecentBuffer(), Andres
reported that it scales badly when called concurrently for the same
buffer by many backends. Instead of a naive (but wrong) coding with
PinBuffer(), it used the spinlock, so that it could be careful to pin
only if the buffer was valid and holding the expected block, to avoid
breaking invariants in eg GetVictimBuffer(). Unfortunately that made it
less scalable than PinBuffer(), which uses compare-exchange instead.
We can fix that by giving PinBuffer() a new skip_if_not_valid mode that
doesn't pin invalid buffers. It might occasionally skip when it
shouldn't due to the unlocked read of the header flags, but that's
unlikely and perfectly acceptable for an opportunistic optimisation
routine, and it can only succeed when it really should due to the
compare-exchange loop.
Note that this fixes ReadRecentBuffer()'s failure to bump the usage
count. While this could be seen as a bug, there currently aren't cases
affected by this in core, so it doesn't seem worth backpatching that portion.
Author: Thomas Munro <thomas.munro@gmail.com>
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/20230627020546.t6z4tntmj7wmjrfh%40awork3.anarazel.de
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
This commit introduces a new column mem_exceeded_count to the
pg_stat_replication_slots view. This counter tracks how often the
memory used by logical decoding exceeds the logical_decoding_work_mem
limit. The new statistic helps users determine whether exceeding the
logical_decoding_work_mem limit is a rare occurrence or a frequent
issue, information that wasn't available through existing statistics.
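For example:
    SELECT slot_name, mem_exceeded_count
    FROM pg_stat_replication_slots;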
Bumps catversion.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
In the same spirit as 3bf905692, assume that all compilers we still
support provide the NAN macro, and get rid of workarounds for that.
The C standard allows implementations to omit NAN if the underlying
float arithmetic lacks quiet (non-signaling) NaNs. However, we've
required that feature for years: the workarounds only supported
lack of the macro, not lack of the functionality. I put in a
compile-time #error if there's no macro, just for clarity.
Also fix up the copies of these functions in ecpglib, and leave
a breadcrumb for the next hacker who touches them.
History of the hacks being removed here can be found in commits
1bc2d544b, 4d17a2146, cec8394b5.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1952095.1759764279@sss.pgh.pa.us
Extensions can stash data computed at plan time into this list using
planner_shutdown_hook (or perhaps other mechanisms) and then access
it from any code that has access to the PlannedStmt (such as explain
hooks), allowing for extensible debugging and instrumentation of
plans.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
These hooks allow plugins to get control at the earliest point at
which the PlannerGlobal object is fully initialized, and then just
before it gets destroyed. This is useful in combination with the
extendable plan state facilities (see extendplan.h) and perhaps for
other purposes as well.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
This allows extensions to have access to any data they've stored
in the ExplainState during planning. Unfortunately, it won't help
when EXPLAIN EXECUTE is used, but since that case is less common,
this still seems like an improvement.
Since planner() has quite a few arguments now, also add some
documentation of those arguments and the return value.
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
Per idea from Robert Haas, though applied differently than originally
suggested.
Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com
The following information is added in the description of some GIN
records:
- In INSERT_LISTPAGE, the number of tuples and the right link block.
- In UPDATE_META_PAGE, the number of tuples, the previous tail block,
and the right link block.
- In SPLIT, the left and right children blocks.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgnAt5L=D_xGXRXLYO5FK1H31_eYEESxdU1n-r4g+6GqA@mail.gmail.com
It is possible to call pg_stat_reset_single_function_counters() for a
single function, but the reset time was missing from the system view showing
its statistics. Like all the fields of pg_stat_user_functions, the GUC
track_functions needs to be enabled to show the statistics about
function executions.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to
PgStat_StatFuncEntry.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aONjnsaJSx-nEdfU@paquier.xyz
Fix several issues pointed out by Coverity (reported by Tom Lane).
- In row_is_in_frame(), return value of window_gettupleslot() was not
checked.
- WinGetFuncArgInPartition() tried to dereference the "isout" pointer in
some places even though it could be NULL.
Besides the issues, I also fixed a compiler warning reported by Álvaro
Herrera.
Moreover, refactor the do...while loop in WinGetFuncArgInPartition() so
that the code inside the loop is simpler. Also simplify the case when
abs_pos < 0.
Author: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Paul Ramsey <pramsey@cleverelephant.ca>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/1686755.1759679957%40sss.pgh.pa.us
Discussion: https://postgr.es/m/202510051612.gw67jlc2iqpw%40alvherre.pgsql
The INFINITY macro is always defined per the C99 standard, so we can now
get rid of the workaround code for when that macro isn't defined.
Also, delete the (now unneeded) #pragma code which was disabling a
compiler warning in MSVC. There was a comment explaining why the #pragma
was placed outside the function body to work around a MSVC compiler bug,
but the link explaining that was dead, as reported by jian he.
Author: David Rowley <dgrowleyml@gmail.com>
Reported-by: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxGARYETnNwtCK7QC0zE_7gq-tfN0mME=gT5rTNtC=VSHQ@mail.gmail.com
Instead, use the new mechanism that allows planner extensions to store
private state inside a PlannerInfo, treating GEQO as an in-core planner
extension. This is a useful test of the new facility, and also buys
back a few bytes of storage.
To make this work, we must remove innerrel_is_unique_ext's hack of
testing whether join_search_private is set as a proxy for whether
the join search might be retried. Add a flag that extensions can
use to explicitly signal their intentions instead.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Extensions that make extensive use of planner hooks may want to
coordinate their efforts, for example to avoid duplicate computation,
but that's currently difficult because there's no really good way to
pass data between different hooks.
To make that easier, allow for storage of extension-managed private
state in PlannerGlobal, PlannerInfo, and RelOptInfo, along very
similar lines to what we have permitted for ExplainState since commit
c65bc2e1d1.
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/CA+TgmoYWKHU2hKr62Toyzh-kTDEnMDeLw7gkOOnjL-TnOUq0kQ@mail.gmail.com
Seems Apple's version of "wc -l" puts spaces before the number.
(I wonder why the cfbot didn't find this.) While here, make
the failure case log what it got, to aid debugging future issues.
Per buildfarm.
We try to use the pager only when more than a screenful's worth of
data is to be printed. However, the code in print.c that's concerned
with counting the number of lines that will be needed missed a lot of
edge cases:
* While plain aligned mode accounted for embedded newlines in column
headers and table cells, unaligned and vertical output modes did not.
* In particular, since vertical mode repeats the headers for each
record, we need to account for embedded newlines in the headers for
each record.
* Multi-line table titles were not accounted for.
* tuples_only mode (where headers aren't printed) wasn't accounted
for.
* Footers were accounted for as one line per footer, again missing
the possibility of multi-line footers. (In some cases such as
"\d+" on a view, there can be many lines in a footer.) Also,
we failed to account for the default footer.
To fix, move the entire responsibility for counting lines into
IsPagerNeeded (or actually, into a new subroutine count_table_lines),
and then expand the logic as appropriate. Also restructure to make it
perhaps a bit easier to follow. It's still only completely accurate
for ALIGNED/WRAPPED/UNALIGNED formats, but the other formats are not
typically used with interactive output.
Arrange to not run count_table_lines at all unless we will use
its result, and teach it to quit early as soon as it's proven
that the output is long enough to require use of the pager.
When dealing with large tables this should save a noticeable
amount of time, since pg_wcssize() isn't exactly cheap.
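As a sketch of the early-exit shape (simplified and with illustrative
names; the real code also accounts for headers, footers, titles, and
width-wrapping via pg_wcssize()):

#include <stdbool.h>

static bool
output_needs_pager(const char *const *cells, int ncells, int threshold)
{
	int			lines = 0;

	for (int i = 0; i < ncells; i++)
	{
		for (const char *p = cells[i]; *p; p++)
			if (*p == '\n')
				lines++;
		lines++;				/* the line the cell itself ends */
		if (lines >= threshold)
			return true;		/* proven long enough; quit early */
	}
	return false;
}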
In passing, move the "flog" output step to the bottom of printTable(),
rather than running it when we've already opened the pager in some
modes. In principle it shouldn't interfere with the pager because
flog should always point to a non-interactive file; but it seems silly
to risk any interference, especially when the existing positioning
seems to have been chosen with the aid of a dartboard.
Also add a TAP test to exercise pager mode. Up to now, we have had
zero test coverage of these code paths, because they aren't reached
unless isatty(stdout). We do have the test infrastructure to improve
that situation, though. Following the lead of 010_tab_completion.pl,
set up an interactive psql and feed it some test cases. To detect
whether it really did invoke the pager, point PSQL_PAGER to "wc -l".
The test is skipped if that utility isn't available.
Author: Erik Wienhold <ewie@ewie.name>
Test-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/2dd2430f-dd20-4c89-97fd-242616a3d768@ewie.name
Previously, subqueries were given names only after they were planned,
which makes it difficult to use information from a previous execution of
the query to guide future planning. If, for example, you know something
about how you want "InitPlan 2" to be planned, you won't know whether
the subquery you're currently planning will end up being "InitPlan 2"
until after you've finished planning it, by which point it's too late to
use the information that you had.
To fix this, assign each subplan a unique name before we begin planning
it. To improve consistency, use textual names for all subplans, rather
than, as we did previously, a mix of numbers (such as "InitPlan 1") and
names (such as "CTE foo"), and make sure that the same name is never
assigned more than once.
We adopt the somewhat arbitrary convention of using the type of sublink
to set the plan name; for example, a query that previously had two
expression sublinks shown as InitPlan 2 and InitPlan 1 will now end up
named expr_1 and expr_2. Because names are assigned before rather than
after planning, some of the regression test outputs show the numerical
part of the name switching positions: what was previously SubPlan 2 was
actually the first one encountered, but we finished planning it later.
We assign names even to subqueries that aren't shown as such within the
EXPLAIN output. These include subqueries that are a FROM clause item or
a branch of a set operation, rather than something that will be turned
into an InitPlan or SubPlan. The purpose of this is to make sure that,
below the topmost query level, there's always a name for each subquery
that is stable from one planning cycle to the next (assuming no changes
to the query or the database schema).
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: http://postgr.es/m/3641043.1758751399@sss.pgh.pa.us
When either input of an INTERSECT [ALL] operator is proven not to return
any results (a dummy rel), mark the entire INTERSECT operation as
dummy.
Likewise, if an EXCEPT [ALL] operation's left input is proven empty, then
mark the entire operation as dummy.
With EXCEPT ALL, we can easily handle the right input being dummy as
we can return the left input without any processing. That can lead to
significant performance gains during query execution. We can't easily
handle dummy right inputs for EXCEPT (without ALL), as that would require
deduplication of the left input. Wiring up those Paths is likely more
complex than it's worth as the gains during execution aren't that great,
so let's leave that one to be handled by the normal Path generation code.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAApHDvri53PPF76c3M94_QNWbJfXjyCnjXuj_2=LYM-0m8WZtw@mail.gmail.com
The prior code, added in 03d40e4b5, attempted to use the targetlist of the
first UNION child when all UNION children were proven as dummy rels.
That's not going to work when some operation atop of the Result node must
find target entries within the Result's targetlist. This could have been
something as simple as trying to sort the results of the UNION operation,
which would lead to:
ERROR: could not find pathkey item to sort
Instead, use the top-level UNION's targetlist and fix the varnos in
setrefs.c. Because set operation targetlists always use varno==0, we
can rewrite those to become varno==1, i.e. use the Vars from the first
UNION child. This does result in showing Vars from relations that are
not present in the final plan, but that's no different to what we see
when normal base relations are proven dummy.
Without this fix it would be possible to see the following error in
EXPLAIN VERBOSE when all UNION inputs were proven empty.
ERROR: bogus varno: 0
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrUASy9sfULMEsM2udvZJP6AoBRCZvHYXYxZTy2tX9FYw@mail.gmail.com
Previously, we attempted to form a posting list tuple even when
ginCompressPostingList() failed to compress the posting list due to
its size. While there was no functional failure, it always wasted one
GinFormTuple() call when item pointers didn't fit in a posting list
tuple.
This commit ensures that a GIN index tuple is formed only when all
item pointers in the posting list are successfully compressed.
Author: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAE7r3M+C=jcpTD93f_RBHrQp3C+=TAXFs+k4tTuZuuxboK8AvA@mail.gmail.com
The hex_encode() and hex_decode() functions serve as the workhorses
for hexadecimal data for bytea's text format conversion functions,
and some workloads are sensitive to their performance. This commit
adds new implementations that use routines from port/simd.h, which
testing indicates are much faster for larger inputs. For small or
invalid inputs, we fall back on the existing scalar versions.
Since we are using port/simd.h, these optimizations apply to both
x86-64 and AArch64.
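For reference, the scalar fallback follows the familiar nibble-lookup
shape (a simplified sketch, not the exact in-tree code):

#include <stddef.h>

static const char hextbl[] = "0123456789abcdef";

static void
hex_encode_scalar(const char *src, size_t len, char *dst)
{
	for (size_t i = 0; i < len; i++)
	{
		unsigned char c = (unsigned char) src[i];

		*dst++ = hextbl[c >> 4];	/* high nibble */
		*dst++ = hextbl[c & 0xF];	/* low nibble */
	}
}

The SIMD variants process 16 or more input bytes per iteration instead
of one.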
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Chiranmoy Bhattacharya <chiranmoy.bhattacharya@fujitsu.com>
Co-authored-by: Susmitha Devanga <devanga.susmitha@fujitsu.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aLhVWTRy0QPbW2tl%40nathan
This patch enhances the pg_get_sequence_data function to include the
page-level LSN (Log Sequence Number) of the sequence. This additional
metadata will be used by upcoming patches to support synchronization
of sequences during logical replication.
By exposing the LSN, we enable more accurate tracking of sequence
changes, which is essential for maintaining consistency across
replicated nodes.
Author: vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
All the other structures describe the list of blocks used, and in the
case of a GIN_INSERT_LISTPAGE record, block 0 refers to a list page with
the items added to it.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com
The WAL records XLOG_GIN_INSERT and XLOG_GIN_VACUUM_DATA_LEAF_PAGE
included some information about the blocks added to the record.
This information is already provided by XLogRecGetBlockRefInfo() with
much more details about the blocks included in each record, like the
compression information, for example. This commit removes the block
information that existed in the record descriptions specific to GIN.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgk=9WRoXhZy5fdk+T1hiau7qbL_vn94w_L1N=gtEdbsg@mail.gmail.com
It is possible to call pg_stat_reset_single_table_counters() on a
relation (index or table) but the reset time was missing from the system
views showing their statistics.
This commit adds the reset time as an attribute of pg_stat_all_tables,
pg_stat_all_indexes, and other relations related to them.
Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, as a result of the new field added to
PgStat_StatTabEntry.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aN8l182jKxEq1h9f@paquier.xyz
stats.sql is already doing some test coverage on index statistics, by
retrieving for example idx_scan and friends in pg_stat_all_tables.
pg_stat_reset_single_table_counters() has been supported for indexes for
a long time, but the case was never covered.
This commit closes the gap, by using this reset function on an index,
cross-checking the contents of pg_stat_all_indexes.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aN8l182jKxEq1h9f@paquier.xyz
On Windows, this code did not handle error conditions correctly at
all, since it looked at "errno" which is not used for socket-related
errors on that platform. This resulted, for example, in failure
to connect to a PostgreSQL server with GSSAPI enabled.
We have a convention for dealing with this within libpq, which is to
use SOCK_ERRNO and SOCK_ERRNO_SET rather than touching errno directly;
but the GSSAPI code is a relative latecomer and did not get that memo.
(The equivalent backend code continues to use errno, because the
backend does this differently. Maybe libpq's approach should be
rethought someday.)
Apparently nobody tries to build libpq with GSSAPI support on Windows,
or we'd have heard about this before, because it's been broken all
along. Back-patch to all supported branches.
Author: Ning Wu <ning94803@gmail.com>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFGqpvg-pRw=cdsUpKYfwY6D3d-m9tw8WMcAEE7HHWfm-oYWvw@mail.gmail.com
Backpatch-through: 13
This adjusts UNION planning so that the planner produces more optimal
plans when one or more of the UNION's subqueries have been proven to be
empty (a dummy rel).
If any of the inputs are empty, then that input can be removed from the
Append / MergeAppend. Previously, a const-false "Result" node would
appear to represent this. Removing empty inputs has a few extra
benefits when only 1 union child remains as it means the Append or
MergeAppend can be removed in setrefs.c, making the plan slightly faster
to execute. Also, we can provide better n_distinct estimates by looking
at the sole remaining input rel's statistics.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAApHDvri53PPF76c3M94_QNWbJfXjyCnjXuj_2=LYM-0m8WZtw@mail.gmail.com
bms_union() causes a new set to be allocated. What this caller needs is
members added to an existing set. bms_add_members() is the tool for
that job.
This is just a matter of fixing an inefficiency due to surplus memory
allocations. No bugs being fixed.
The only other place I found that might be valid to apply this change is
in markNullableIfNeeded(), but I opted not to do that due to the risk to
reward ratio not looking favorable. The risk being that there *could* be
another pointer pointing to the Bitmapset.
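For illustration, rather than the following:

result = bms_union(result, extra);		/* allocates a whole new set */

the code now does the equivalent of:

result = bms_add_members(result, extra);	/* extends "result" in place */

("result" and "extra" are illustrative variable names, not the actual
ones in the patched function.)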
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAApHDvoCcoS-p5tZNJLTxFOKTYNjqVh7Dwf+5ikDUBwnvWftRw@mail.gmail.com
Presently, the SIMD implementation of this function uses unsigned
saturating subtraction to find bytes less than or equal to the
given value, which is a workaround for the lack of unsigned
comparison instructions on some architectures. However, Neon
offers vminvq_u8(), which returns the minimum (unsigned) value in
the vector. This commit adds a Neon-specific implementation that
uses vminvq_u8() to optimize vector8_has_le() on AArch64.
In passing, adjust the SSE2 implementation to use vector8_min() and
vector8_eq() to find values less than or equal to the given value.
This was the only use of vector8_ssub(), so it has been removed.
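A minimal sketch of the Neon idea (simplified from the committed
implementation):

#include <arm_neon.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
vector8_has_le_neon(uint8x16_t v, uint8_t c)
{
	/* some byte is <= c iff the unsigned minimum of the vector is <= c */
	return vminvq_u8(v) <= c;
}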
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/aNHDNDSHleq0ogC_%40nathan
Make some use of anonymous unions, which are allowed as of C11, as
examples and encouragement for future code, and to test compilers.
This commit changes the DSMRegistryEntry struct.
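For example, a struct along these lines (illustrative, not the actual
DSMRegistryEntry definition):

typedef struct Entry
{
	int			kind;
	union						/* anonymous union, allowed as of C11 */
	{
		long		ival;
		double		fval;
	};
} Entry;

Members of the anonymous union are accessed directly, e.g. e.ival
rather than e.u.ival.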
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/aNKsDg0fJwqhZdXX%40nathan
This function is marked as strict, so we can safely remove the checks
checking for NULL input parameters.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAApHDvqiN0+mbooUOSCDALc=GoM8DmTbCdvwnCwak6Wb2O1ZJA@mail.gmail.com
In similar vein to commit ccc8194e42, a reset instance of a shared
memory TID store happened to occupy the same private memory as the old
one for the entry point, since the chunk freed after the last round
of index vacuuming was put on the context's freelist. The failure
to update the vacrel->dead_items pointer was made evident by nudging the
system to allocate memory in a different area. This was not discovered
at the time of the earlier commit since our regression tests didn't
cover multiple index passes with parallel vacuum.
Backpatch to v17, when TidStore came in.
Author: Kevin Oommen Anish <kevin.o@zohocorp.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Tested-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/199a07cbdfc.7a1c4aac25838.1675074408277594551%40zohocorp.com
Backpatch-through: 17
The comment incorrectly references the defunct function
BufFileOpenShared(), which was replaced in commit dcac5e7ac.
This patch updates the comment to refer to the current function
BufFileOpenFileSet().
Author: Zhang Mingli <zmlpostgres@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/1cb48b4c-54ab-40cc-b355-0b3c2af6d3f7@Spark
Currently, pgbench aborts when a COPY response is received in
readCommandResponse(). However, as PQgetResult() (through
getCopyResult()) returns an empty result when there is no asynchronous
result, the logic done at the end of readCommandResponse() for the error
path leads to an infinite loop.
This commit forcefully exits the COPY state with PQendcopy() before
moving to the error handler when finding a COPY state, avoiding the
infinite loop. The COPY protocol is not supported by pgbench anyway, as
an error is assumed in this case, so giving up is better than having the
tool be stuck forever; pgbench was not interruptible in this state.
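The shape of the fix, as a sketch (simplified; st is pgbench's per-client
state, whose "con" field is the libpq connection):

PGresult   *res = PQgetResult(st->con);

if (PQresultStatus(res) == PGRES_COPY_IN ||
	PQresultStatus(res) == PGRES_COPY_OUT)
{
	PQendcopy(st->con);		/* force the connection out of COPY state */
	/* ... then fall through to the usual error handling ... */
}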
A TAP test is added to check that an error happens when trying to use
COPY.
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://postgr.es/m/CAO6_XqpHyF2m73ifV5a=5jhXxH2chk=XrgefY+eWWPe2Eft3=A@mail.gmail.com
Backpatch-through: 13
Add IGNORE NULLS/RESPECT NULLS option (null treatment clause) to lead,
lag, first_value, last_value and nth_value window functions. If
unspecified, the default is RESPECT NULLS which includes NULL values
in any result calculation. IGNORE NULLS ignores NULL values.
Built-in window functions are modified to call new API
WinCheckAndInitializeNullTreatment() to indicate whether they accept
IGNORE NULLS/RESPECT NULLS option or not (the API can be called by
user defined window functions as well). If WinGetFuncArgInPartition's
allowNullTreatment argument is true and IGNORE NULLS option is given,
WinGetFuncArgInPartition() or WinGetFuncArgInFrame() will return
evaluated function's argument expression on specified non NULL row (if
it exists) in the partition or the frame.
When the IGNORE NULLS option is given, window functions need to visit
and evaluate the same rows over and over again to look for non-null
rows. To mitigate the issue, a 2-bit not-null information array is
created while executing window functions to remember whether a row has
already been evaluated to NULL or NOT NULL. If it has, we can skip the
evaluation work and get better performance.
Author: Oliver Ford <ojford@gmail.com>
Co-authored-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Krasiyan Andreev <krasiyan@gmail.com>
Reviewed-by: Andrew Gierth <andrew@tao11.riddles.org.uk>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Fetter <david@fetter.org>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/flat/CAGMVOdsbtRwE_4+v8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A@mail.gmail.com
test_bms_make_singleton is defined as STRICT and only takes a single
parameter, so there is no need to check that parameter for NULL as a
NULL input won't ever reach there.
Author: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/BC483901-9587-4076-B20F-9A414C66AB78@yesql.se
This fixes a typo in the sql/expected test files and removes a
leftover comment from test_bitmapset.c from when the functions
invoked bms_free.
Author: Daniel Gustafsson <daniel@yesql.se>
Reported-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/978D21E8-9D3B-40EA-A4B1-F87BABE7868C@yesql.se
Oversight in 2c03216d83, when the redo code of GIN got refactored for
the new WAL format where block information has been standardized, as the
payload data got tracked for each block after the change, and not in the
whole record. This is just a cleanup.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPgnAt5L=D_xGXRXLYO5FK1H31_eYEESxdU1n-r4g+6GqA@mail.gmail.com
Some macOS machines are having trouble with 002_inline, which executes
the JSON parser test executables hundreds of times in a nested loop.
Both developer machines and buildfarm critters have shown excessive test
durations, upwards of 20 seconds.
Push the innermost loop of 002_inline, which iterates through differing
chunk sizes, down into the test executable. (I'd eventually like to push
all of the JSON unit tests down into C, but this is an easy win in the
short term.) Testers have reported speedups between 4x and 9x.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Tested-by: Andrew Dunstan <andrew@dunslane.net>
Tested-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BTgmobKoG%2BgKzH9qB7uE4MFo-z1hn7UngqAe9b0UqNbn3_XGQ%40mail.gmail.com
Backpatch-through: 17
Newer compilers warned about
extern int _CRT_glob = 0;
which is indeed a mysterious C construction, as it combines "extern"
and an initialization. It turns out that according to the C standard,
the "extern" is ignored here, so we can remove it to resolve the
warnings. But then we also need to add a real extern
declaration (without initializer) to satisfy
-Wmissing-variable-declarations.
(Note that this code is only active on MinGW.)
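That is, instead of:

extern int	_CRT_glob = 0;

we now effectively have:

extern int	_CRT_glob;		/* declaration, without initializer */
int			_CRT_glob = 0;	/* definition */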
Discussion: https://www.postgresql.org/message-id/1053279b-da01-4eb4-b7a3-da6b5d8f73d1%40eisentraut.org
Two macros are added in this module, to cut duplicated patterns:
- PG_ARG_GETBITMAPSET(), for input argument handling, with knowledge
about NULL.
- PG_RETURN_BITMAPSET_AS_TEXT(), that generates a text result from a
Bitmapset.
These changes trim the code so that the SQL functions are now mostly
wrappers of the equivalent C function. Functions that use integer input
arguments still need some NULL handling, like bms_make_singleton().
A NULL input is translated to "<>", which is what nodeToString()
generates. Some of the tests are able to generate this result.
Per discussion, the calls of bms_free() are removed. These may be
justified if the functions are used in a rather long-lived memory
context, but let's keep the code minimal for now. These calls used NULL
checks, which were also not necessary as NULL is an input authorized by
bms_free().
Some of the tests existed to cover behaviors related to the SQL
functions for NULL inputs. Most of them are still relevant, as the
routines of bitmapset.c are able to handle such cases.
The coverage reports of bitmapset.c and test_bitmapset.c remain the
same after these changes, with 300 lines of C code removed.
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/CAApHDvqghMnm_zgSNefto9oaEJ0S-3Cgb3gdsV7XvLC-hMS02Q@mail.gmail.com
pgbench uses readCommandResponse() to process server responses.
When readCommandResponse() encounters an error during a call to
PQgetResult() to fetch the current result, it attempts to report it
with an additional error message from PQerrorMessage(). However,
previously, this extra error message could be lost or become incorrect.
The cause was that after fetching the current result (and detecting
an error), readCommandResponse() called PQgetResult() again to
peek at the next result. This second call could overwrite the libpq
connection's error message before the original error was reported,
causing the error message retrieved from PQerrorMessage() to be
lost or overwritten.
This commit fixes the issue by updating readCommandResponse()
to use PQresultErrorMessage() instead of PQerrorMessage()
to retrieve the error message generated when the PQgetResult()
for the current result causes an error, ensuring the correct message
is reported.
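In sketch form, the difference is:

PGresult   *res = PQgetResult(conn);

if (PQresultStatus(res) == PGRES_FATAL_ERROR)
{
	/*
	 * This message is tied to the PGresult, so a later PQgetResult(conn)
	 * used to peek at the next result cannot clobber it, unlike the
	 * connection-level PQerrorMessage(conn).
	 */
	fprintf(stderr, "%s", PQresultErrorMessage(res));
}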
Backpatch to all supported versions.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/20250925110940.ebacc31725758ec47d5432c6@sraoss.co.jp
Backpatch-through: 13
This reverts commit efcd5199d8.
I rebased my patch series incorrectly. This patch contained unrelated
parts from another patch, which made the overall build fail. Revert
for now and reconsider.
Stop including utils/relcache.h in access/genam.h, and stop including
htup_details.h in nodes/tidbitmap.h. Both these files (genam.h and
tidbitmap.h) are widely used in other header files, so it's in our best
interest that they remain as lean as reasonable.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202509291356.o5t6ny2hoa3q@alvherre.pgsql
During recovery, XLogNeedsFlush() checks the minimum recovery LSN point
instead of the flush LSN point. The same condition checks are used when
updating the minimum recovery point in UpdateMinRecoveryPoint(), but are
written in reverse order.
This commit makes the order of the checks consistent between
XLogNeedsFlush() and UpdateMinRecoveryPoint(), improving the code
clarity. Note that the second check (as ordered by this commit) relies
on InRecovery, which is true only in the startup process. So this makes
XLogNeedsFlush() cheaper in the startup process with the first check
acting as a shortcut while doing crash recovery, where
LocalMinRecoveryPoint is an invalid LSN.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Discussion: https://postgr.es/m/aMIHNRTP6Wj6vw1s%40paquier.xyz
Contrary to its siblings for the archiver, the bgwriter and the
checkpointer stats, pgstat_report_inj_fixed() can be called
concurrently. This was causing an assertion failure, while messing up
the stats.
This code is aimed at being a template for extension developers, so it
is not a critical issue, but let's be correct. This module has also
been useful for some benchmarking, at least for me, and that is how I
discovered this issue.
Oversight in f68cd847fa.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/aNnXbAXHPFUWPIz2@paquier.xyz
Backpatch-through: 18
GROUP BY ALL is a form of GROUP BY that adds any TargetExpr that does
not contain an aggregate or window function into the groupClause of
the query, making it exactly equivalent to specifying those same
expressions in an explicit GROUP BY list.
This feature is useful for certain kinds of data exploration. It's
already present in some other DBMSes, and the SQL committee recently
accepted it into the standard, so we can be reasonably confident in
the syntax being stable. We do have to invent part of the semantics,
as the standard doesn't allow for expressions in GROUP BY, so it
doesn't specify what to do with window functions. We assume that
those should be treated like aggregates, i.e., left out of the
constructed GROUP BY list.
In passing, wordsmith some existing documentation about GROUP BY,
and update some neglected synopsis entries in select_into.sgml.
Author: David Christensen <david@pgguru.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAHM0NXjz0kDwtzoe-fnHAqPB1qA8_VJN0XAmCgUZ+iPnvP5LbA@mail.gmail.com
Neighbor get_statistics_object_oid() ignores objects in pg_temp, as has
been the standard for non-relation, non-type namespace searches since
CVE-2007-2138. Hence, most operations that name a statistics object
correctly decline to map an unqualified name to a statistics object in
pg_temp. StatisticsObjIsVisibleExt() did not. Consequently,
pg_statistics_obj_is_visible() wrongly returned true for such objects,
psql \dX wrongly listed them, and getObjectDescription()-based ereport()
and pg_describe_object() wrongly omitted namespace qualification. Any
malfunction beyond that would depend on how a human or application acts
on those wrong indications. Commit
d99d58cdc8 introduced this. Back-patch to
v13 (all supported versions).
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/20250920162116.2e.nmisch@google.com
Backpatch-through: 13
This commit expands the set of tests added by 00c3d87a5c, to bring the
coverage of bitmapset.c close to 100% by addressing a lot of corner
cases (most of these relate to word counts and reallocations).
Some of the functions of this module also have their own idea of the
result to return depending on the input values given. These behaviors
are specific to the module, but let's add more coverage for all of them
anyway.
Some comments are made more consistent in the tests, while on it.
Author: Greg Burd <greg@burd.me>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aNR-gsGmLnMaNT5i@paquier.xyz
For UNION, EXCEPT and INTERSECT, we were not very good at estimating the
PathTarget.width for the set operation. Since the targetlist of the set
operation is made up of Vars with varno==0, this would result in
get_expr_width() applying a default estimate based on the Var's type
rather than taking width estimates from any relation's statistics.
Here we attempt to improve the situation by looking at the width estimates
for the set operation child paths and calculating the average width of the
relevant child paths weighted over the estimated number of rows. For
UNION and INTERSECT, the relevant paths to look at are *all* child paths.
For EXCEPT, since we don't return rows from the right-hand child (only
possibly remove left-hand rows matching those), we use only the left-hand
child for width estimates.
This also adjusts the hashed-UNION Path's PathTarget to use the same
PathTarget as its Append subpath. Both PathTargets will be the same and
are void of any resjunk columns, per generate_append_tlist(). Making
the AggPath use the same PathTarget saves having to adjust the "width"
of the AggPath's PathTarget too.
This was reported as a bug by sunw.fnst, but it's not something we ever
claimed to do properly. Plus, if we were to adjust this in back
branches, plans could change as the estimated input sizes to Sorts and
Hash Aggregates could go up or down. Plan choices aren't something we
want to destabilize in stable versions.
Reported-by: sunw.fnst <936739278@qq.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/tencent_34CF8017AB81944A4C08DD089D410AB6C306@qq.com
This serves as coverage for the tracking of entry count added by
7bd2975fa9 as built-in variable-sized stats kinds have no need for it,
at least not yet.
A new function, called injection_points_stats_count(), is added to the
module. It is able to return the number of entries. This has been
useful when doing some benchmarking to check the sanity of the counts.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aMPKWR81KT5UXvEr@paquier.xyz
Stats kinds can set a new option called "track_entry_count" (disabled by
default, available for variable-numbered stats) that will make pgstats
track the number of entries that exist in its shared hashtable.
As there is only one code path where a new entry is added, and one code
path where entries are freed, the count tracking is straightforward in
its implementation. Reads of these counters are optimistic, and may
change across two calls. The counter is incremented when an entry is
created (not when reused), and is decremented when an entry is freed
from the hashtable (marked for drop with its refcount reaching 0), which
is something that pgstats decides internally.
A first use case of this facility would be pg_stat_statements, where we
need to be able to cap the number of entries that would be stored in the
shared hashtable, based on its "max" GUC. The module currently relies
on hash_get_num_entries(), which offers a cheap way to count how many
entries are in its hash table, but we cannot do that in pgstats for
variable-sized stats kinds as a single hashtable is used for all the
stats kinds. Independently of PGSS, this is useful for other custom
stats kinds that want to cap, control, or track the number of entries
they have, without depending on a potentially expensive sequential scan
to know the number of entries while holding an extra exclusive lock.
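A custom stats kind would opt in via its PgStat_KindInfo, roughly as
below (a sketch: only track_entry_count is the new field, and the other
fields shown are preexisting ones; remaining sizes and callbacks are
elided):

static const PgStat_KindInfo custom_stats = {
	.name = "custom_stats",
	.fixed_amount = false,		/* variable-numbered stats kind */
	.track_entry_count = true,	/* new: count entries in shared hash */
	/* ... sizes and callbacks ... */
};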
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Keisuke Kuroda <keisuke.kuroda.3862@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aMPKWR81KT5UXvEr@paquier.xyz
transformPLAssignStmt contained many lines cribbed directly from
transformSelectStmt. I had supposed that we could manage to keep
the two copies in sync, but the bug just fixed in 7504d2be9 shows
that that hope was foolish. Let's refactor so there's just one copy.
The main stumbling block to doing this is that transformPLAssignStmt
has a chunk of custom code that has to run after transformTargetList
but before we potentially modify the tlist further during analysis
of ORDER BY and GROUP BY. Rather than make transformSelectStmt fully
aware of PLAssignStmt processing, I put that code into a callback
function. It still feels a little bit ugly, but it's not too awful,
and surely it's better than a hundred lines of duplicated code.
The steps involved in processing a PLAssignStmt remain exactly
the same as before, just in different places.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/31027.1758919078@sss.pgh.pa.us
Because we failed to do this, DISTINCT in GROUP BY DISTINCT would be
ignored in PL/pgSQL assignment statements. It's not surprising that
no one noticed, since such statements will throw an error if the query
produces more than one row. That eliminates most scenarios where
advanced forms of GROUP BY could be useful, and indeed makes it hard
even to find a simple test case. Nonetheless it's wrong.
This is directly the fault of be45be9c3 which added the groupDistinct
field, but I think much of the blame has to fall on c9d529848, in
which I incautiously supposed that we'd manage to keep two copies of
a big chunk of parse-analysis logic in sync. As a follow-up, I plan
to refactor so that there's only one copy. But that seems useful
only in master, so let's use this one-line fix for the back branches.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/31027.1758919078@sss.pgh.pa.us
Backpatch-through: 14
It had always been intended that this already works correctly as
pg_unreachable() uses __assume(0) on MSVC, and that directs the compiler
so that it knows that a given function won't return. However, with
ereport_domain(), it didn't work...
It's now understood that the failure to determine that elog(ERROR) does not
return comes from the inability of the MSVC compiler to detect the "const
int elevel_" is the same as the "elevel" macro parameter. MSVC seems to be
unable to make the "if (elevel_ >= ERROR)" branch constantly-true when
the macro is used with any elevel >= ERROR, therefore the
pg_unreachable() is
seen to only be present in a *conditional* branch rather than present
*unconditionally*.
While there seems to be no way to force the compiler into knowing that
elevel_ is equal to elevel within the ereport_domain() macro, there is a
way in C11 to determine if the elevel parameter is a compile-time
constant or not. This is done via some hackery using the _Generic()
intrinsic function, which gives us functionality similar to GCC's
__builtin_constant_p(), albeit only for integers.
Here we define pg_builtin_integer_constant_p() for this purpose.
Callers can check for availability via HAVE_PG_BUILTIN_INTEGER_CONSTANT_P.
ereport_domain() has been adjusted to use
pg_builtin_integer_constant_p() instead of __builtin_constant_p().
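The underlying trick looks roughly like this (one well-known C11
formulation, not necessarily the exact committed macro):

#define pg_builtin_integer_constant_p(x) \
	_Generic(1 ? ((void *) ((x) * 0L)) : ((char *) 0), \
			 char *: 1, \
			 void *: 0)

When "x" is an integer constant expression, "(x) * 0L" is a zero
constant, making the void * cast a null pointer constant, so the
conditional expression has type char *; otherwise it has type void *,
and _Generic selects the corresponding association.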
It's not quite clear at this stage if this now allows us to forego doing
the likes of "return NULL; /* keep compiler quiet */" as there may be other
compilers in use that have similar struggles. It's just a matter of time
before someone commits a function that does not "return" a value after
an elog(ERROR). Let's make time and the lack of complaints about said
commit be the judge of whether we need to continue the "/* keep compiler
quiet */" palaver.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CAApHDvom02B_XNVSkvxznVUyZbjGAR+5myA89ZcbEd3=PA9UcA@mail.gmail.com
This allows these routines to be reused by a future utility heavily
based on vacuumdb.
I made a few relatively minor changes from the original, most notably:
- objfilter was declared as an enum but the values are bit-or'ed, and
individual bits are tested throughout the code. We've discussed this
coding pattern in other contexts and stayed away from it, on the
grounds that the values so generated aren't really true values of the
enum. This commit changes it to be a bits32 with a few #defines for
the flag definitions, the way we do elsewhere (see the sketch after
this list). Also, instead of being a global variable, it's now in the
vacuumingOptions struct.
- Two booleans, analyze_only (in vacuumingOptions) and analyze_in_stages
(passed around as a separate boolean argument), are really determining
what "mode" the program runs in -- it's either vacuum, or one of those
two modes. I have three adjectives for them: inconsistent,
unergonomic, unorthodox. Removing these and replacing them with a
mode enum to be kept in vacuumingOptions makes the code structure easier
to understand in a couple of places, and it'll be useful for the new
mode we add next, so do that.
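A sketch of the reworked objfilter shape (flag names as in vacuumdb;
values and the helper are illustrative):

#include <stdbool.h>

typedef unsigned int bits32;	/* as in c.h */

#define OBJFILTER_ALL_DBS	0x01
#define OBJFILTER_TABLE		0x02
#define OBJFILTER_SCHEMA	0x04

typedef struct vacuumingOptions
{
	bits32		objfilter;		/* OR'ed OBJFILTER_* bits */
	/* ... */
} vacuumingOptions;

static bool
filters_tables(const vacuumingOptions *opts)
{
	return (opts->objfilter & OBJFILTER_TABLE) != 0;
}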
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/202508301750.cbohxyy2pcce@alvherre.pgsql
Use our established coding pattern to reduce maintenance pain when
adding other per-process-type characteristics.
Like PG_KEYWORD, PG_CMDTAG, PG_RMGR.
To keep the strings translatable, the relevant makefile now also scans
src/include for this specific file. I didn't want to have it scan all
.h files, as then gettext would have to scan all header files. I didn't
find any way to affect the meson behavior in this respect though.
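The pattern, in sketch form (file and macro names here are illustrative,
not the committed ones):

/* proctypelist.h: one line per process type, no include guard */
PG_PROCTYPE(B_BACKEND, "client backend")
PG_PROCTYPE(B_AUTOVAC_WORKER, "autovacuum worker")

/* each consumer defines the macro to extract the column it wants: */
#define PG_PROCTYPE(symbol, description) description,
static const char *const proctype_descriptions[] = {
#include "proctypelist.h"
};
#undef PG_PROCTYPE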
Author: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
Discussion: https://postgr.es/m/202507151830.dwgz5nmmqtdy@alvherre.pgsql
When running pgbench with --verbose-errors option and a custom script that
triggered retriable errors (e.g., serialization errors) in pipeline mode,
an assertion failure could occur:
Assertion failed: (sql_script[st->use_file].commands[st->command]->type == 1), function commandError, file pgbench.c, line 3062.
The failure happened because pgbench assumed these errors would only occur
during SQL commands, but in pipeline mode they can also happen during
\endpipeline meta command.
This commit fixes the assertion failure by adjusting the assertion check to
allow such errors during either SQL commands or \endpipeline.
Backpatch to v15, where the assertion check was introduced.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwGWQMOzNkQs-LmpDHdNC0h8dmAuUMRvZrEntQi5a-b=Kg@mail.gmail.com
This improves the stability of VACUUM when processing btree indexes,
which was previously able to trigger an assertion failure in
_bt_lock_subtree_parent() when an error was previously thrown outside
the scope of _bt_split() when splitting a btree page. VACUUM would
consider the index as in a corrupted state as the right page would not
be zeroed for the error thrown (allocation failure is one pattern).
In a non-assert build, VACUUM is able to succeed, reporting what it sees
as a corruption while attempting to fix the index. This would manifest
as a LOG message, such as:
LOG: failed to re-find parent key in index "idx" for deletion target
page N
CONTEXT: while vacuuming index "idx" of relation "public.tab"
This commit improves the code to rely on two PGAlignedBlocks that are
used as a temporary space for the left and right pages. The main change
concerns the right page, whose contents are now copied into the
"temporary" PGAlignedBlock page while its original space is zeroed. Its
contents are moved from the PGAlignedBlock page back to the page once we
enter in the critical section used for the split. This simplifies the
split logic, as it is not necessary to zero the right page before
throwing an error anymore. Hence errors can now be thrown outside the
split code. For the left page, this shaves one allocation, with
PageGetTempPage() being previously used.
The previous logic originates from commit 8fa30f906b, at a point where
PGAlignedBlock did not exist yet. This could be argued as something
that should be backpatched, but the lack of complaints indicates that it
may not be necessary.
Author: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/566dacaf-5751-47e4-abc6-73de17a5d42a@garret.ru
The comment claimed that a TABLESPACE reference was added to the
resulting string, but that's not true. Looks like the comment was
copied from pg_get_indexdef_string() without being adjusted correctly.
Reported-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxHwVPgeu8o9D8oUeDQYEHTAZGt-J5uaJNgYMzkAW7MiCA@mail.gmail.com
... and find_window_run_conditions.
This seems to have been around and unused ever since the Run Condition
feature was added in 9d9c02ccd. Let's remove it to clean things up a
bit.
Author: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/DD26NJ0Y34ZS.2ZOJPHSY12PFI@gmail.com
While most tab completions in match_previous_words() use
COMPLETE_WITH* macros to wrap rl_completion_matches(), some direct
calls to rl_completion_matches() still remained.
This commit introduces COMPLETE_WITH_FILES and COMPLETE_WITH_GENERATOR
macros to replace these direct calls, enhancing both code consistency
and readability.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/20250605100835.b396f9d656df1018f65a4556@sraoss.co.jp
I noticed the surprising behavior that pg_sleep(0.001) will sleep
for 2ms not the expected 1ms. Apparently the float8 calculation of
time-to-sleep is managing to produce something a hair over 1, which
ceil() rounds up to 2, and then WaitLatch() faithfully waits 2ms.
It could be that this works as-expected for some ranges of current
timestamp but not others, which would account for not having seen
it before. In any case, let's try to avoid it by removing the
float arithmetic in the delay calculation. We're stuck with the
declared input type being float8, but we can convert that to integer
microseconds right away, and then work strictly with integral values.
There might still be roundoff surprises for certain input values,
but at least the behavior won't be time-varying.
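In sketch form (assuming rint-style rounding; the committed arithmetic
may differ in detail):

#include <math.h>
#include <stdint.h>

static int64_t
sleep_target_us(double secs)
{
	/*
	 * Convert to integral microseconds exactly once; remaining-time and
	 * millisecond-rounding arithmetic then stays integral, so float
	 * roundoff can no longer turn a 1ms request into 2ms.
	 */
	return (int64_t) rint(secs * 1000000.0);
}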
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/3879137.1758825752@sss.pgh.pa.us
The functions test_stat_func() and test_stat_func2() had empty
function bodies, so that they took very little time to run. This made
it possible that on machines with relatively low timer resolution the
functions could return before the clock advanced, making the test fail
(as seen on buildfarm members fruitcrow and hamerkop).
To avoid that, pg_sleep for 10us during the functions. As far as we
can tell, all current hardware has clock resolution much less than
that. (The current implementation of pg_sleep will round it up to
1ms anyway, but someday that might get improved.)
Author: Michael Banck <mbanck@gmx.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/68d413a3.a70a0220.24c74c.8be9@mx.google.com
Backpatch-through: 15
If we already have an extension_state array but see a new extension_id
much larger than the highest extension_id we've previously seen,
the old code might have failed to expand the array to a large enough
size, leading to disaster. Also, if we don't have an extension array
at all and need to create one, we should make sure that it's big enough
that we don't have to resize it instantly.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: http://postgr.es/m/2949591.1758570711@sss.pgh.pa.us
Backpatch-through: 18
When defining an injection point there is no need to wrap the definition
with USE_INJECTION_POINT guards; the INJECTION_POINT macro is available
in all builds. Remove to make the code consistent.
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/OSCPR01MB14966C8015DEB05ABEF2CE077F51FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 17
... which silently propagates a lot of headers into many places
via pgstat.h, as evidenced by the variety of headers that this patch
needs to add to seemingly random places. Add a minimum of typedefs to
conflict.h to be able to remove execnodes.h, and fix the fallout.
Backpatch to 18, where conflict.h first appeared.
Discussion: https://postgr.es/m/202509191927.uj2ijwmho7nv@alvherre.pgsql
Previously, vacuumdb --analyze-only issued VACUUM (ONLY_DATABASE_STATS)
at the end. Since --analyze-only is meant to update optimizer statistics only,
this extra VACUUM command is unnecessary.
This commit prevents vacuumdb --analyze-only from running that redundant
VACUUM command.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwEqHGa-k=wbRMucUVihHVXk4NQkK94GNN=ym9cQ5HBSHg@mail.gmail.com
f83d709760 incorrectly refers to an XLOG_HEAP2_PRUNE_FREEZE WAL record
opcode. No such code exists. The relevant opcodes are
XLOG_HEAP2_PRUNE_ON_ACCESS, XLOG_HEAP2_PRUNE_VACUUM_SCAN, and
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP. Correct it.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
Without this, rebuilds can malfunction unless --enable-depend is used.
Historically we've expected that you can get away without
--enable-depend as long as you manually clean after changing *.h
files; the makefiles are supposed to handle other sorts of
dependencies. So add this one.
Follow-on to 635998965, so no need for back-patch.
Discussion: https://postgr.es/m/3121329.1758650878@sss.pgh.pa.us
We were already doing a short (1-second) pg_test_timing run during
check-world and buildfarm runs. But we weren't doing anything
with the result except for a basic regex-based sanity check.
Collecting that output from buildfarm runs is seeming very
attractive though, because it would help us determine what sort
of timing resolution is available on supported platforms.
It's not very long, so let's just note it verbatim in the TAP log.
Discussion: https://postgr.es/m/3321785.1758728271@sss.pgh.pa.us
This commit corrects several issues in function comments:
* The parameter "rel" was incorrectly referred to as "relation" in the comments
for table_tuple_delete(), table_tuple_update(), and table_tuple_lock().
* In table_tuple_delete(), "changingPart" was listed as an output parameter
in the comments but is actually input.
* In table_tuple_update(), "slot" was listed as an input parameter
in the comments but is actually output.
* The comment for "update_indexes" in table_tuple_update() was mis-indented.
* The comments for heap_lock_tuple() incorrectly referenced a non-existent
"tid" parameter.
Author: Chao Li <lic@highgo.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAEoWx2nB6Ay8g=KEn7L3qbYX_4+sLk9XOMkV0XZqHR4cTY8ZvQ@mail.gmail.com
Format validation and element extraction for intermediate line
strings were inconsistent in their handling of tab delimiters,
which resulted in an unclear error when multiple tab characters
were used as a delimiter. This fixes it by using captures from
the validation regex instead of a separate split() to avoid the
inconsistency. Also, it ensures that \t+ is used consistently
when inspecting the strings.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20250729.135638.1148639539103758555.horikyota.ntt@gmail.com
Mappings for 18 characters have changed, affecting 36 code points. This
is a break in compatibility, but these characters are rarely used.
U+E5E5 (Private Use Area) was previously mapped to \xA3A0. This code
point now maps to \x65356535. Attempting to convert \xA3A0 will now
raise an error.
Separate from the 2022 update, the following mappings were previously
swapped, and subsequently corrected in 2000 and later versions:
* U+E7C7 (Private Use Area) now maps to \x8135F437
* U+1E3F (Latin Small Letter M with Acute) now maps to \xA8BC
The 2022 standard mentions the following policy changes, but they
have no effect in our implementation:
66 new ideographs are now required, but these are mapped
algorithmically so were already handled by utf8_and_gb18030.c.
Nine CJK compatibility ideographs are no longer required, but
implementations may retain them, as does the source we use from
the Unicode Consortium.
Release notes: Compatibility section
For further details, see:
https://www.unicode.org/L2/L2022/22274-disruptive-changes.pdf
https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
Author: Chao Li <lic@highgo.com>
Author: Zheng Tao <taoz@highgo.com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com
Previously, the parallel apply worker used SIGINT to receive a graceful
shutdown signal from the leader apply worker. However, SIGINT is also used
by the LOCK_TIMEOUT handler to trigger a query-cancel interrupt. This
overlap caused the parallel apply worker to miss LOCK_TIMEOUT signals,
leading to incorrect behavior during lock wait/contention.
This patch resolves the conflict by switching the graceful shutdown signal
from SIGINT to SIGUSR2.
Reported-by: Zane Duffield <duffieldzane@gmail.com>
Diagnosed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/CACMiCkXyC4au74kvE2g6Y=mCEF8X6r-Ne_ty4r7qWkUjRE4+oQ@mail.gmail.com
The macros doing conversions of/from "text" from/to Bitmapset were using
arbitrary casts with Datum, something that is not fine since
2a600a93c7.
These macros do not actually need casts with Datum, as they are
already given "text" and Bitmapset data as input. They are updated to use
cstring_to_text() and text_to_cstring(), fixing the compiler warnings
reported by the buildfarm. Note that appending a -m32 to gcc to trigger
32-bit builds was enough to reproduce the warnings here.
While on it, outer parentheses are added to TEXT_TO_BITMAPSET(), and
inner parentheses are removed from BITMAPSET_TO_TEXT(), to make these
macros more consistent with the style used in the tree, based on
suggestions by Tom Lane.
Oversights in commit 00c3d87a5c.
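A minimal sketch of the corrected shape, assuming the module converts
through Bitmapset's node support (the exact helper usage may differ):

#define TEXT_TO_BITMAPSET(t) ((Bitmapset *) stringToNode(text_to_cstring(t)))
#define BITMAPSET_TO_TEXT(b) cstring_to_text(nodeToString(b))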
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Greg Burd <greg@burd.me>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/3027069.1758606227@sss.pgh.pa.us
Result nodes now include an RTI set, which is only non-NULL when they
have no subplan, and is taken from the relid set of the RelOptInfo that
the Result is generating. ExplainPreScanNode now takes notice of these
RTIs, which means that a few things get schema-qualified in the
regression tests that previously did not. This makes the output more
consistent between cases where some part of the plan tree is replaced by
a Result node and those where this does not happen.
Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs
stored in a Result node just as it already does for other RTI-bearing
node types.
Result nodes also now include a result_reason, which tells us something
about why the Result node was inserted. Using that information, EXPLAIN
now emits, where relevant, a "Replaces" line describing the origin of
a Result node.
The purpose of these changes is to allow code that inspects a Plan
tree to understand the origin of Result nodes that appear therein.
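A rough sketch of the additions (the field names here are assumptions
based on the description above, not the committed definitions):

typedef struct Result
{
	Plan		plan;
	Node	   *resconstantqual;
	Bitmapset  *relids;			/* RTIs of the replaced rel; NULL if there is a subplan */
	int			result_reason;	/* why this Result was inserted; drives "Replaces" */
} Result;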
Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Bitmapset has a complex set of APIs, defined in bitmapset.h, and it can
be hard to test edge cases with the backend core code only.
This test module is aimed at closing the gap, and implements a set of
SQL functions that act as wrappers of the low-level C functions of the
same names. These functions use text as the data type for input and
output, which works since Bitmapset, being a node, has support for
these conversions. An extra function, named test_random_operations(),
can be used to stress bitmaps with random member values and a defined
number of operations, which is potentially useful for purposes beyond
testing alone.
The coverage increases from 85.2% to 93.4%. It should be possible to
cover more code paths, but at least it's a beginning.
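For example, a wrapper in such a module could look roughly like this
(a sketch; the function naming is illustrative):

PG_FUNCTION_INFO_V1(test_bms_add_member);
Datum
test_bms_add_member(PG_FUNCTION_ARGS)
{
	Bitmapset  *bms = (Bitmapset *) stringToNode(text_to_cstring(PG_GETARG_TEXT_PP(0)));

	bms = bms_add_member(bms, PG_GETARG_INT32(1));
	PG_RETURN_TEXT_P(cstring_to_text(nodeToString(bms)));
}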
Author: Greg Burd <greg@burd.me>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/7BD1ABDB-B03A-464A-9BA9-A73B55AD8A1F@getmailspring.com
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.
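The alphabet substitution at the heart of this is tiny; a hedged
sketch of the idea in C:

/* illustrative only: rewrite a standard base64 string into base64url */
static void
base64_to_base64url(char *s)
{
	for (; *s; s++)
	{
		if (*s == '+')
			*s = '-';
		else if (*s == '/')
			*s = '_';
	}
}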
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li (Evan) <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
The lossy-counting algorithm that ANALYZE uses to identify most-common
array elements has a notion of cutoff frequency: elements with
frequency greater than that are guaranteed to be collected, elements
with smaller frequencies are not. In cases where we find fewer MCEs
than the stats target would permit us to store, the cutoff frequency
provides valuable additional information, to wit that there are no
non-MCEs with frequency greater than that. What the selectivity
estimation functions actually use the "minfreq" entry for is as a
ceiling on the possible frequency of non-MCEs, so using the cutoff
rather than the lowest stored MCE frequency provides a tighter bound
and more accurate estimates.
Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want. (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the
minimum stored frequency in that case.)
Notably, this works even when none of the values are common enough
to be called MCEs. In such cases we previously stored nothing in
the STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates. So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines. A small obstacle is that
update_attstats() has traditionally stored a null, not an empty array,
when passed zero "values" for a slot. That gives rise to an MCELEM
entry that get_attstatsslot() will spit up on. The least risky
solution seems to be to adjust update_attstats() so that it will emit
a non-null (but possibly empty) array when the passed stavalues array
pointer isn't NULL, rather than conditioning that on numvalues > 0.
In other existing cases I don't believe that that changes anything.
For consistency, handle the stanumbers array the same way.
In passing, improve the comments in routines that use
STATISTIC_KIND_MCELEM data. Particularly, explain why we use
minfreq / 2 not minfreq as the estimate for non-MCE values.
Thanks to Matt Long for the suggestion that we could apply this
idea even when there are more than zero MCEs.
Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
Commit a391ff3c3, which added the ability for a function's support
function to provide a custom selectivity estimate for "WHERE f(...)",
unintentionally removed the possibility of applying expression
statistics after finding there's no applicable support function.
That happened because we no longer fell through to boolvarsel()
as before. Refactor to do so again, putting the 0.3333333 default
back into boolvarsel() where it had been (cf. commit 39df0f150).
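In outline, the restored control flow is (a sketch; the surrounding
variable names are placeholders, not the committed code):

/* no applicable support function: fall back to generic boolean selectivity */
if (!found_support_estimate)
	s1 = boolvarsel(root, (Node *) clause, varRelid);	/* 0.3333333 default lives here */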
I surely wouldn't have made this error if 39df0f150 had included
a test case, so add one now. At the time we did not have the
"extended statistics" infrastructure, but we do now, and it is
also unable to work in this scenario because of this error.
So make use of that for the test case.
This is very clearly a bug fix, but I'm afraid to put it into
released branches because of the likelihood of altering plan
choices, which we avoid doing in minor releases. So, master only.
Reported-by: Frédéric Yhuel <frederic.yhuel@dalibo.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/a8b99dce-1bfb-4d97-af73-54a32b85c916@dalibo.com
Initially this was to fix the "catched" typo, but I (David) wasn't quite
clear on what the previous comment meant about being "effective". I
expect this means efficiency, so I've reworded the comment to indicate
that.
While this is only a comment fixup, for the sake of minimizing
possible future backpatching pain, I've opted to backpatch to 18,
since this code is new to that version and the release isn't out the
door yet.
Author: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAHewXNmSYWPud1sfBvpKbCJeRkWeZYuqatxtV9U9LvAFXBEiBw@mail.gmail.com
Backpatch-through: 18
Commit 216a784829 introduced parallel apply workers, allowing multiple
processes to share a replication origin. To support this,
replorigin_session_setup() was extended to accept a pid argument
identifying the process using the origin.
This commit exposes that capability through the SQL interface function
pg_replication_origin_session_setup() by adding an optional pid parameter.
This enables multiple processes to coordinate replication using the same
origin when using SQL-level replication functions.
This change allows non-builtin logical replication solutions to
implement parallel apply for large transactions.
Additionally, an existing internal error was made user-facing, as it can
now be triggered via the exposed SQL API.
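At the C level this builds on the API extended by 216a784829, roughly
(variable names illustrative):

RepOriginId	originid;
int			acquired_by = 0;	/* 0 keeps the old exclusive, single-process behavior */

originid = replorigin_by_name("my_origin", false);	/* origin name illustrative */
replorigin_session_setup(originid, acquired_by);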
Author: Doruk Yilmaz <doruk@mixrank.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/CAMPB6wfe4zLjJL8jiZV5kjjpwBM2=rTRme0UCL7Ra4L8MTVdOg@mail.gmail.com
Discussion: https://postgr.es/m/CAE2gYzyTSNvHY1+iWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA@mail.gmail.com
When deciding which code path to use depending on the state of recovery,
XLogFlush() and XLogNeedsFlush() have been relying on different
criteria:
- XLogFlush() relied on XLogInsertAllowed().
- XLogNeedsFlush() relied on RecoveryInProgress().
Currently, the checkpointer is allowed to insert WAL records while
RecoveryInProgress() returns true for an end-of-recovery checkpoint,
where XLogInsertAllowed() matters. Using RecoveryInProgress() in
XLogNeedsFlush() did not really matter for its existing callers, as the
checkpointer only called XLogFlush(). However, a feature under
discussion, by Melanie Plageman, needs XLogNeedsFlush() to be able to
work in more contexts, the end-of-recovery checkpoint being one.
This commit changes XLogNeedsFlush() to use XLogInsertAllowed() instead
of RecoveryInProgress(), making the checks in both routines more
consistent. While on it, an assertion based on XLogNeedsFlush() is
added at the end of XLogFlush(), triggered when flushing a physical
position (not for the normal recovery path that checks for updates of
the minimum recovery point). This assertion would fail for example in
the recovery test 015_promotion_pages if XLogNeedsFlush() is changed to
use RecoveryInProgress(). This should hopefully be enough to ensure
that the checks done in both routines remain consistent.
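Conceptually, the routine now looks like this (a simplified sketch,
not the exact code; the recovery-side accessor is hypothetical):

bool
XLogNeedsFlush(XLogRecPtr record)
{
	if (!XLogInsertAllowed())
	{
		/* recovery-side check against the minimum recovery point */
		return record > GetMinRecoveryPoint();	/* hypothetical accessor */
	}
	return record > GetFlushRecPtr(NULL);
}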
Author: Melanie Plageman <melanieplageman@gmail.com>
Co-authored-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAAKRu_a1vZRZRWO3_jv_X13RYoqLRVipGO0237g5PKzPa2YX6g@mail.gmail.com
Commit bb3ec16e14 moved partition pruning metadata into PlannedStmt.
At executor startup this metadata is used to initialize the EState
fields es_part_prune_infos, es_part_prune_states, and
es_part_prune_results. EvalPlanQualStart() failed to copy those
fields into the child EState, causing NULL dereference when Append
ran partition pruning during a recheck. This can occur with DELETE
or UPDATE on partitioned tables that use runtime pruning, e.g. with
generic plans.
Fix by copying all partition pruning state into the EPQ estate.
Add an isolation test that reproduces the crash with concurrent
UPDATE and DELETE on a partitioned table, where the DELETE session
hits the crash during its EPQ recheck after the UPDATE commits.
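The fix amounts to carrying these fields over when building the child
EState, roughly:

/* in EvalPlanQualStart(), a sketch of the added copies */
rcestate->es_part_prune_infos = parentestate->es_part_prune_infos;
rcestate->es_part_prune_states = parentestate->es_part_prune_states;
rcestate->es_part_prune_results = parentestate->es_part_prune_results;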
Bug: #19056
Reported-by: Fei Changhong <feichanghong@qq.com>
Diagnosed-by: Fei Changhong <feichanghong@qq.com>
Author: David Rowley <dgrowleyml@gmail.com>
Co-authored-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/19056-a677cef9b54d76a0%40postgresql.org
This change is a tighter rework of 7d85d87f4d, which tried to improve
the code so that it would work should PgStat_HashKey gain new fields
that create padding bytes. However, the previous change is proving not
to be enough, as some code paths of pgstats do not pass PgStat_HashKey
by reference (valgrind would warn when padding is added to the
structure through a new field).
Per discussion, let's document and check that PgStat_HashKey has no
padding rather than complicate the pgstats code so that it can work
around that.
This removes a couple of memset(0) calls that should not be required.
While on it, this commit adds a static assertion checking that no
padding is introduced in the structure, by checking that the size of
PgStat_HashKey matches with the sum of the size of all its fields.
The object ID part of the hash key is already 8 bytes, which should
be plenty. A comment is added to discourage the addition of new
fields.
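The assertion boils down to something like the following (the field
types are assumptions about the structure's layout):

StaticAssertDecl(sizeof(PgStat_HashKey) ==
				 sizeof(PgStat_Kind) + sizeof(Oid) + sizeof(uint64),
				 "PgStat_HashKey must not contain padding");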
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0t9omat+HVSakJXwTMWvhpYFcAZb41RPWKwrKFUgmAFBQ@mail.gmail.com
When shared memory is re-initialized after a crash, the named
LWLock tranche request array that was copied to shared memory will
no longer be accessible. To fix, save the pointer to the original
array in postmaster's local memory, and switch to it when
re-initializing the LWLock-related shared memory.
Oversight in commit ed1aad15e0. Per buildfarm member batta.
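In outline (the local variable naming here is assumed):

/* postmaster-local pointer, kept across shared memory re-initialization */
static NamedLWLockTrancheRequest *LocalTrancheRequests = NULL;

if (LocalTrancheRequests == NULL)
	LocalTrancheRequests = NamedLWLockTrancheRequestArray;	/* first init */
else
	NamedLWLockTrancheRequestArray = LocalTrancheRequests;	/* re-init after crash */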
Reported-by: Michael Paquier <michael@paquier.xyz>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aMoejB3iTWy1SxfF%40paquier.xyz
Discussion: https://postgr.es/m/f8ca018f-3479-49f6-a92c-e31db9f849d7%40gmail.com
Previously, pg_restore did not skip security labels on publications or
subscriptions even when --no-publications or --no-subscriptions was specified.
As a result, it could issue SECURITY LABEL commands for objects that were
never created, causing those commands to fail.
This commit fixes the issue by ensuring that security labels on publications
and subscriptions are also skipped when the corresponding options are used.
Backpatch to all supported versions.
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13
StrategyInitialize() calls InitBufTable() with the maximum number of
entries that the buffer lookup table can ever have, so there should not
be any need to allocate more elements after initialization. Hence mark
the hash table as fixed-sized.
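In spirit this is a one-flag change (a sketch of the pattern, not the
exact call site):

SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
							  size, size,	/* init_size == max_size */
							  &info,
							  HASH_ELEM | HASH_BLOBS | HASH_PARTITION |
							  HASH_FIXED_SIZE);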
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAExHW5v0jh3F_wj86yC=qBfWk0uiT94qy=Z41uzAHLHh0SerRA@mail.gmail.com
If an aggregate function call contains a sub-select that has
an RTE referencing a CTE outside the aggregate, we must treat
that reference like a Var referencing the CTE's query level
for purposes of determining the aggregate's level. Otherwise
we might reach the nonsensical conclusion that the aggregate
should be evaluated at some query level higher than the CTE,
ending in a planner error or a broken plan tree that causes
executor failures.
Bug: #19055
Reported-by: BugForge <dllggyx@outlook.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19055-6970cfa8556a394d@postgresql.org
Backpatch-through: 13
Commit 2a600a93 made Datum 8 bytes wide everywhere. It was no longer
appropriate to use TypeSizeT on 32 bit systems, and JIT compilation
would fail with various type check errors. Introduce a separate
LLVMTypeRef with the name TypeDatum. TypeSizeT is still used in some
places for actual size_t values.
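Roughly speaking (a sketch, not the committed initialization):

/* Datum is now 8 bytes on every platform, so its JIT type must be too */
TypeDatum = LLVMInt64TypeInContext(LLVMGetGlobalContext());
/* TypeSizeT stays pointer-sized, for genuine size_t values */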
Reported-by: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Dmitry Mityugov <d.mityugov@postgrespro.ru>
Discussion: https://postgr.es/m/0a9f0be59171c2e8f1b3bc10f4fcf267%40postgrespro.ru
The pending entry was not used when incrementing its data; the code
manipulated the shared memory pointer directly, without even locking
it. This could mean losing statistics under concurrent activity. The flush
callback was a no-op.
This code serves as a base template for extensions for the custom
cumulative statistics, so let's be clean and use a pending entry for the
increments, whose data is then flushed to the corresponding entry
in the shared hashtable when all the stats are reported, in its own
flush callback.
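The corrected pattern looks roughly like this (a sketch following the
description above; the type and callback names are illustrative, being
the module's own definitions):

/* flush callback: move pending counts into the shared entry, under lock */
static bool
template_flush_pending_cb(PgStat_EntryRef *entry_ref, bool nowait)
{
	TemplatePendingStats *pending = (TemplatePendingStats *) entry_ref->pending;
	TemplateSharedStats *shared = (TemplateSharedStats *) entry_ref->shared_stats;

	if (!pgstat_lock_entry(entry_ref, nowait))
		return false;
	shared->numcalls += pending->numcalls;
	pgstat_unlock_entry(entry_ref);
	return true;
}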
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0v0U0yhPbY+bqChomkPbyUrRQ3rQXnZf_SB-svDiQOpgQ@mail.gmail.com
Backpatch-through: 18
The shared memory size was calculated based on the offset of
io_handles, which is itself a pointer included in the structure. We
tend to overestimate the shared memory size overall, so this was
unlikely to be an issue in practice, but let's be correct and use the
full size of the structure in the calculation, so that the pointer for
io_handles is included.
Oversight in da7226993f.
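In essence (a sketch):

/* before: stopped at the pointer member, leaving it out of the size */
sz = offsetof(PgAioCtl, io_handles);
/* after: account for the whole structure, io_handles pointer included */
sz = sizeof(PgAioCtl);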
Author: Madhukar Prasad <madhukarprasad@google.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAKi+wrbC2dTzh_vKJoAZXV5wqTbhY0n4wRNpCjJ=e36aoo0kFw@mail.gmail.com
Backpatch-through: 18
The EvalPlanQual recheck for TID Range Scan wasn't rechecking that the
TID qual still passed after following update chains. This could result
in tuples being updated or deleted by plans using TID Range Scans where
the ctid of the new (updated) tuple no longer matches the clause of the
scan. That isn't desired behavior, and it isn't consistent with what
would happen if the chosen plan had used an Index or Seq Scan. It could
also lead to hard-to-predict behavior for scans that contain TID quals
and other quals, as the planner has freedom to choose TID Range or some
other non-TID scan method for such queries, and the chosen plan could
change at any moment.
Here we fix this by properly implementing the recheck function for TID
Range Scans.
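Conceptually the new recheck does something like this (a simplified
sketch, not the committed function):

static bool
TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot)
{
	/* the updated tuple's ctid must still fall within the TID range */
	return ItemPointerCompare(&slot->tts_tid, &node->trss_mintid) >= 0 &&
		   ItemPointerCompare(&slot->tts_tid, &node->trss_maxtid) <= 0;
}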
Backpatch to 14, where TID Range Scans were added.
Reported-by: Sophie Alpert <pg@sophiebits.com>
Author: Sophie Alpert <pg@sophiebits.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com
Backpatch-through: 14
The EvalPlanQual recheck for TID Scan wasn't rechecking that the TID
qual still passed after following update chains. This could result in
tuples being updated or deleted by plans using TID Scans where the ctid
of the new (updated) tuple no longer matches the clause of the scan.
That isn't desired behavior, and it isn't consistent with what would
happen if the chosen plan had used an Index or Seq Scan. It could also
lead to hard-to-predict behavior for scans that contain TID quals and
other quals, as the planner has freedom to choose TID or some other
scan method for such queries, and the chosen plan could change at any
moment.
Here we fix this by properly implementing the recheck function for TID
Scans.
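The shape of the fix mirrors the TID Range Scan one above; a
simplified sketch:

static bool
TidRecheck(TidScanState *node, TupleTableSlot *slot)
{
	/* the updated tuple's ctid must still be one of the scan's TIDs */
	for (int i = 0; i < node->tss_NumTids; i++)
	{
		if (ItemPointerEquals(&slot->tts_tid, &node->tss_TidList[i]))
			return true;
	}
	return false;
}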
Backpatch to 13, the oldest supported version.
Reported-by: Sophie Alpert <pg@sophiebits.com>
Author: Sophie Alpert <pg@sophiebits.com>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/4a6268ff-3340-453a-9bf5-c98d51a6f729@app.fastmail.com
Backpatch-through: 13
This reverts commit 98fc31d649.
That change allowed DROP OWNED BY to drop grants of the target
role to other roles, arguing that nobody would need those
privileges anymore. But that's not so: if you're not superuser,
you still need admin privilege on the target role so you can
drop it.
It's not clear whether or how the dependency-based approach
to solving the original problem can be adapted to keep these
grants. Since v18 release is fast approaching, the sanest
thing to do seems to be to revert this patch for now. The
race-condition problem is low severity and not worth taking
risks for.
I didn't force a catversion bump in 98fc31d64, so I won't do
so here either.
Reported-by: Dipesh Dhameliya <dipeshdhameliya125@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CABgZEgczOFicCJoqtrH9gbYMe_BV3Hq8zzCBRcMgmU6LRsihUA@mail.gmail.com
Backpatch-through: 18
The COMMENT should depend on the separately-dumped constraint, not the
domain. Sufficient restore parallelism might fail the COMMENT command
by issuing it before the constraint exists. Back-patch to v13, like
commit 0858f0f96e.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/20250913020233.fa.nmisch@google.com
Backpatch-through: 13
Up to now we've contented ourselves with a one-size-fits-all error
hint when we fail to find any match to a function or procedure call.
That was mostly okay in the beginning, but it was never great, and
since the introduction of named arguments it's really not adequate.
We at least ought to distinguish "function name doesn't exist" from
"function name exists, but not with those argument names". And the
rules for named-argument matching are arcane enough that some more
detail seems warranted if we match the argument names but the call
still doesn't work.
This patch creates a framework for dealing with these problems:
FuncnameGetCandidates and related code will now pass back a bitmask of
flags showing how far the match succeeded. This allows a considerable
amount of granularity in the reports. The set-bits-in-a-bitmask
approach means that when there are multiple candidate functions, the
report will reflect the match(es) that got the furthest, which seems
correct. Also, we can avoid mentioning "maybe add casts" unless
failure to match argument types is actually the issue.
Extend the same return-a-bitmask approach to OpernameGetCandidates.
The issues around argument names don't apply to operator syntax,
but it still seems worth distinguishing between "there is no
operator of that name" and "we couldn't match the argument types".
While at it, adjust these messages and related ones to more strictly
separate "detail" from "hint", following our message style guidelines'
distinction between those.
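A plausible shape for the bitmask (the flag names here are
illustrative, not the committed ones):

/* how far did matching get against the best candidate? */
#define MATCH_NAME_FOUND	0x01	/* a routine of that name exists */
#define MATCH_ARG_NAMES		0x02	/* supplied argument names matched */
#define MATCH_ARG_COUNT		0x04	/* argument count matched */
#define MATCH_ARG_TYPES		0x08	/* argument types matched or coercible */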
Reported-by: Dominique Devienne <ddevienne@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/1756041.1754616558@sss.pgh.pa.us
In our previous discussions around making our regression tests
pass in FIPS mode, we concluded that we didn't need to support
the different error message wording observed with pre-3.0 OpenSSL.
However there are still a few LTS distributions soldiering along
with such versions, and now we have some in the buildfarm.
So let's add the variant expected-files needed to make them happy.
This commit only covers the core regression tests. Previous
discussion suggested that we might need some adjustments in
contrib as well, but it's not totally clear to me what those
would be. Rather than work it out from first principles,
I'll wait to see what the buildfarm shows.
Back-patch to v17 which is the oldest branch that claims
to support this case.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/443709.1757876535@sss.pgh.pa.us
Backpatch-through: 17
JsonConstructorExpr can produce non-NULL output with a NULL input, so
it should be treated as a non-strict construct. Failing to do so can
lead to incorrect query behavior.
For example, in the reported case, when pulling up a subquery that is
under an outer join, if the subquery's target list contains a
JsonConstructorExpr that uses subquery variables and it is mistakenly
treated as strict, it will be pulled up without being wrapped in a
PlaceHolderVar. As a result, the expression will be evaluated at the
wrong place and will not be forced to null when the outer join should
do so.
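The fix essentially teaches the non-strictness check about this node,
along these lines (a sketch):

/* in contain_nonstrict_functions_walker(), roughly */
if (IsA(node, JsonConstructorExpr))
	return true;	/* may yield non-NULL output for NULL inputs */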
Back-patch to v16 where JsonConstructorExpr was introduced.
Bug: #19046
Reported-by: Runyuan He <runyuan@berkeley.edu>
Author: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/19046-765b6602b0a8cfdf@postgresql.org
Backpatch-through: 16
Previously we built the .map files for GB18030 (version 2000) from an
XML file. The 2022 version for this encoding is only available as a
Unicode Character Mapping (UCM) file, so as preparatory refactoring
switch to this format as the source for building version 2000.
As we do with most input files for the conversion mappings, download
the file on demand. In order to generate the same mappings we have
now, we must download from a previous upstream commit, rather than
the head, since the latter contains a correction not present in our
current .map files.
The XML file is still used by EUC_CN, so we cannot delete it from our
repository. GB18030 is a superset of EUC_CN, so it may be possible to
build EUC_CN from the same UCM file, but that is left for future work.
Author: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com
Fix for commit 3c86223c99. That commit moved the typedef of pg_int64
from postgres_ext.h to libpq-fe.h, because the only remaining place
where it might be used is libpq users, and since the type is obsolete,
the intent was to limit its scope.
The problem is that if someone builds an extension against an
older (pre-PG18) server version and a new (PG18) libpq, they might get
two typedefs, depending on include file order. This is not allowed
under C99, so they might get warnings or errors, depending on the
compiler and options. The underlying types might also be
different (e.g., long int vs. long long int), which would also lead to
errors. This scenario is plausible when using the standard Debian
packaging, which provides only the newest libpq but per-major-version
server packages.
The fix is to undo that part of commit 3c86223c99. That way, the
typedef is in the same header file across versions. At least, this is
the safest fix doable before PostgreSQL 18 releases.
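The failure mode in miniature, using the example types mentioned
above (illustrative):

/* postgres_ext.h from a pre-18 server installation */
typedef long int pg_int64;
/* libpq-fe.h from an 18 libpq, before this fix */
typedef long long int pg_int64;	/* C99 forbids the redefinition; the types differ too */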
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/25144219-5142-4589-89f8-4e76948b32db%40eisentraut.org
Previously, pg_dump incorrectly queried pg_seclabel to retrieve security labels
for subscriptions, which are stored in pg_shseclabel as they are global objects.
This could result in security labels for subscriptions not being dumped.
This commit fixes the issue by updating pg_dump to query the pg_seclabels view,
which aggregates entries from both pg_seclabel and pg_shseclabel.
While querying pg_shseclabel directly for subscriptions was an alternative,
using pg_seclabels is simpler and sufficient.
In addition, pg_dump is updated to dump security labels on event triggers,
which were previously omitted.
Backpatch to all supported versions.
Author: Jian He <jian.universality@gmail.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHCt00pR9h51AVu6+yPD5J7JQn=7dQXxqacj0XyDhc-fA@mail.gmail.com
Backpatch-through: 13