Commit graph

12688 commits

Author SHA1 Message Date
Bruce Momjian
451c43974f Update copyright for 2026
Backpatch-through: 14
2026-01-01 13:24:10 -05:00
Andrew Dunstan
f3c9e341cd Add paths of extensions to pg_available_extensions
Add a new "location" column to the pg_available_extensions and
pg_available_extension_versions views, exposing the directory where
the extension is located.

The default system location is shown as '$system', the same value
that can be used to configure the extension_control_path GUC.

User-defined locations are only visible to superusers; otherwise
'<insufficient privilege>' is returned as a column value, the same
behaviour that we already use in pg_stat_activity.

I failed to resist the temptation to do a little extra editorializing of
the TAP test script.

Catalog version bumped.

Author: Matheus Alcantara <mths.dev@pm.me>
Reviewed-By: Chao Li <li.evan.chao@gmail.com>
Reviewed-By: Rohit Prasad <rohit.prasad@arm.com>
Reviewed-By: Michael Banck <mbanck@gmx.net>
Reviewed-By: Manni Wood <manni.wood@enterprisedb.com>
Reviewed-By: Euler Taveira <euler@eulerto.com>
Reviewed-By: Quan Zongliang <quanzongliang@yeah.net>
2026-01-01 12:13:59 -05:00
Tom Lane
bc6374cd76 Change IndexAmRoutines to be statically-allocated structs.
Up to now, index amhandlers were expected to produce a new, palloc'd
struct on each call.  That requires palloc/pfree overhead, and creates
a risk of memory leaks if the caller fails to pfree, and the time
taken to fill such a large structure isn't nil.  Moreover, we were
storing these things in the relcache, eating several hundred bytes for
each cached index.  There is not anything in these structs that needs
to vary at runtime, so let's change the definition so that an
amhandler can return a pointer to a "static const" struct of which
there's only one copy per index AM.  Mark all the core code's
IndexAmRoutine pointers const so that we catch anyplace that might
still try to change or pfree one.
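
A minimal sketch of the pattern described above, not the actual
PostgreSQL definitions: DemoAmRoutine, demo_build() and demo_amhandler()
are hypothetical stand-ins for IndexAmRoutine and an index AM handler.
The handler hands back a pointer to a single "static const" struct
instead of allocating and filling a new struct on each call.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for IndexAmRoutine. */
typedef struct DemoAmRoutine
{
	int			amstrategies;
	bool		amcanorder;
	int			(*ambuild) (int ntuples);
} DemoAmRoutine;

/* Hypothetical build callback; simply echoes its argument. */
static int
demo_build(int ntuples)
{
	return ntuples;
}

/*
 * Hypothetical handler: one read-only copy exists for the whole program,
 * and every call returns the same address.  Nothing is allocated, so
 * there is nothing for the caller to free.
 */
const DemoAmRoutine *
demo_amhandler(void)
{
	static const DemoAmRoutine amroutine = {
		.amstrategies = 5,
		.amcanorder = true,
		.ambuild = demo_build,
	};

	return &amroutine;
}
```

Because the returned pointer is const-qualified, a caller that tries to
modify the struct through it is rejected at compile time.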

(This is similar to the way we were already handling TableAmRoutine
structs.  This commit does fix one comment that was infelicitously
copied-and-pasted into tableamapi.c.)

This commit needs to be called out in the v19 release notes as an API
change for extension index AMs.  An un-updated AM will still work
(as of now, anyway) but it risks memory leaks and will be slower than
necessary.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com
2025-12-30 18:26:23 -05:00
Thomas Munro
1a28b4b455 jit: Drop redundant LLVM configure probes.
We currently require LLVM 14, so these probes for LLVM 9 functions
always succeeded.  Even when the features aren't enabled in an LLVM
build, dummy functions are defined (a problem for a later commit).

The whole PGAC_CHECK_LLVM_FUNCTIONS macro and Meson equivalent are
removed, because we switched to testing LLVM_VERSION_MAJOR at compile
time in subsequent work and these were the last holdouts.  That suits
the nature of LLVM API evolution better, and also allows for strictly
mechanical pruning in future commits like 820b5af7 and 972c2cd2.  They
advanced the minimum LLVM version but failed to spot these.

Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA%2BhUKGJgB6gvrdDohgwLfCwzVQm%3DVMtb9m0vzQn%3DCwWn-kwG9w%40mail.gmail.com
2025-12-30 20:24:42 +13:00
Michael Paquier
97b101776c Add pg_get_multixact_stats()
This new function exposes at SQL level some information related to
multixacts, not available until now.  This data is useful for monitoring
purposes, especially for workloads that make heavy use of multixacts:
- num_mxids, number of MultiXact IDs in use.
- num_members, number of member entries in use.
- members_size, bytes used by num_members in pg_multixact/members/.
- oldest_multixact, oldest MultiXact still needed.

This patch was originally proposed when MultiXactOffset was still
32 bits, to monitor wraparound.  This part is not relevant anymore since
bd8d9c9bdf, which widened MultiXactOffset to 64 bits.  The monitoring
of disk space usage for the members is still relevant.

Some tests are added to check this function, in the shape of one
isolation test with concurrent transactions that take a ROW SHARE lock,
and some SQL tests for pg_read_all_stats.  Some documentation is added
to explain some patterns that can come from the information provided by
the function.

Bump catalog version.

Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Discussion: https://postgr.es/m/CA+QeY+AAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg@mail.gmail.com
2025-12-30 15:38:50 +09:00
Michael Paquier
0e3ad4b96a Add MultiXactOffsetStorageSize() to multixact_internal.h
This function calculates in bytes the storage taken between two
multixact offsets.  This will be used in an upcoming patch, introduced
separately here as this piece can be useful on its own.

Author: Naga Appani <nagnrik@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aUyTvZMq2CLgNEB4@paquier.xyz
2025-12-30 14:13:40 +09:00
Michael Paquier
9cf746a453 Change GetMultiXactInfo() to return the next multixact offset
This routine returned a number of members as a MultiXactOffset,
calculated based on the difference between the next-to-be-assigned
offset and the oldest offset.  However, this value is not actually an
offset but a count of members.

This type confusion comes from the original implementation of
MultiXactMemberFreezeThreshold(), in 53bb309d2d.  The number of
members is now defined as a uint64, large enough for MultiXactOffset.
This change will be used in a follow-up patch.

Reviewed-by: Naga Appani <nagnrik@gmail.com>
Discussion: https://postgr.es/m/aUyTvZMq2CLgNEB4@paquier.xyz
2025-12-30 14:03:49 +09:00
Richard Guo
ad66f705fa Strip PlaceHolderVars from index operands
When pulling up a subquery, we may need to wrap its targetlist items
in PlaceHolderVars to enforce separate identity or as a result of
outer joins.  However, this causes any upper-level WHERE clauses
referencing these outputs to contain PlaceHolderVars, which prevents
indxpath.c from recognizing that they could be matched to index
columns or index expressions, potentially affecting the planner's
ability to use indexes.

To fix, explicitly strip PlaceHolderVars from index operands.  A
PlaceHolderVar appearing in a relation-scan-level expression is
effectively a no-op.  Nevertheless, to play it safe, we strip only
PlaceHolderVars that are not marked nullable.

The stripping is performed recursively to handle cases where
PlaceHolderVars are nested or interleaved with other node types.  To
minimize performance impact, we first use a lightweight walker to
check for the presence of strippable PlaceHolderVars.  The expensive
mutator is invoked only if a candidate is found, avoiding unnecessary
memory allocation and tree copying in the common case where no
PlaceHolderVars are present.

Back-patch to v18.  Although this issue exists before that, changes in
this version made it common enough to notice.  Given the lack of field
reports for older versions, I am not back-patching further.

Reported-by: Haowu Ge <gehaowu@bitmoe.com>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/62af586c-c270-44f3-9c5e-02c81d537e3d.gehaowu@bitmoe.com
Backpatch-through: 18
2025-12-29 11:38:49 +09:00
Peter Eisentraut
b7057e4346 Change some Datum to void * for opaque pass-through pointer
Here, Datum was used to pass around an opaque pointer between a group
of functions.  But one might as well use void * for that; the use of
Datum doesn't achieve anything here and is just distracting.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/1c5d23cb-288b-4154-b1cd-191fe2301707%40eisentraut.org
2025-12-28 14:34:12 +01:00
Michael Paquier
9adf32da6b Split some long Makefile lists
This change makes code diffs more readable when adding new items or
removing old items, while ensuring that lines do not get excessively
long.  Some SUBDIRS, PROGRAMS and REGRESS lists are split.

Note that there are a few more REGRESS lists that could be split,
particularly in contrib/.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-Authored-By: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/DF6HDGB559U5.3MPRFCWPONEAE@jeltef.nl
2025-12-28 09:17:42 +09:00
Peter Eisentraut
b63443718a Remove MsgType type
Presumably, the C type MsgType was meant to hold the protocol message
type in the pre-version-3 era, but this was never fully developed even
then, and the name is pretty confusing nowadays.  It has only one
vestigial use for cancel requests that we can get rid of.  Since a
cancel request is indicated by a special protocol version number, we
can use the ProtocolVersion type, which MsgType was based on.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/505e76cb-0ca2-4e22-ba0f-772b5dc3f230%40eisentraut.org
2025-12-27 23:46:28 +01:00
Michael Paquier
213a1b8952 Move attribute statistics functions to stat_utils.c
Many of the operations done for attribute stats in attribute_stats.c
share the same logic as extended stats, as done by a patch under
discussion to add support for extended stats import and export.  All the
pieces necessary for extended statistics are moved to stat_utils.c,
which is the file where common facilities are shared for stats files.

The following renames are done:
* get_attr_stat_type() -> statatt_get_type()
* init_empty_stats_tuple() -> statatt_init_empty_tuple()
* set_stats_slot() -> statatt_set_slot()
* get_elem_stat_type() -> statatt_get_elem_type()

While at it, this commit adds more documentation for all these
functions, describing their internals in more detail along with the
dependencies implied for attribute statistics.  The same concepts apply
to extended statistics, to some degree.

Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2025-12-25 15:13:39 +09:00
Richard Guo
325808cac9 Fix planner error with SRFs and grouping sets
If there are any SRFs in a PathTarget, we must separate it into
SRF-computing and SRF-free targets.  This is because the executor can
only handle SRFs that appear at the top level of the targetlist of a
ProjectSet plan node.

If we find a subexpression that matches an expression already computed
in the previous plan level, we should treat it like a Var and should
not split it again.  setrefs.c will later replace the expression with
a Var referencing the subplan output.

However, when processing the grouping target for grouping sets, the
planner can fail to recognize that an expression is already computed
in the scan/join phase.  The root cause is a mismatch in the
nullingrels bits.  Expressions in the grouping target carry the
grouping nulling bit in their nullingrels to indicate that they can be
nulled by the grouping step.  However, the corresponding expressions
in the scan/join target do not have these bits.

As a result, the exact match check in list_member() fails, leading the
planner to incorrectly believe that the expression needs to be
re-evaluated from its arguments, which are often not available in the
subplan.  This can lead to planner errors such as "variable not found
in subplan target list".

To fix, ignore the grouping nulling bit when checking whether an
expression from the grouping target is available in the pre-grouping
input target.  This aligns with the matching logic in setrefs.c.
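
The mismatch and the fix can be sketched with a toy model.  This is an
assumption-laden illustration, not planner code: DemoExpr, the bitmask
representation and both functions are hypothetical, and real
nullingrels are Bitmapsets attached to Vars and PlaceHolderVars.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: an expression is an id plus a mask of nulling rels. */
typedef struct DemoExpr
{
	int			expr_id;
	uint32_t	nullingrels;
} DemoExpr;

/* Exact comparison, as a plain list-membership test would perform. */
bool
demo_exact_match(DemoExpr a, DemoExpr b)
{
	return a.expr_id == b.expr_id && a.nullingrels == b.nullingrels;
}

/* Comparison that masks out the grouping step's bit on both sides. */
bool
demo_match_ignoring_group_bit(DemoExpr a, DemoExpr b, uint32_t group_bit)
{
	return a.expr_id == b.expr_id &&
		(a.nullingrels & ~group_bit) == (b.nullingrels & ~group_bit);
}
```

With a scan/join expression carrying mask 0x2 and its grouping-target
twin carrying 0x2 | 0x8, the exact test fails while the masked test with
group_bit 0x8 succeeds.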

Backpatch to v18, where this issue was introduced.

Bug: #19353
Reported-by: Marian MULLER REBEYROL <marian.muller@serli.com>
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/19353-aaa179bba986a19b@postgresql.org
Backpatch-through: 18
2025-12-25 12:12:52 +09:00
Masahiko Sawada
67c20979ce Toggle logical decoding dynamically based on logical slot presence.
Previously logical decoding required wal_level to be set to 'logical'
at server start. This meant that users had to incur the overhead of
logical-level WAL logging even when no logical replication slots were
in use.

This commit adds functionality to automatically control logical
decoding availability based on logical replication slot presence. The
newly introduced module logicalctl.c allows logical decoding to be
dynamically activated when needed when wal_level is set to
'replica'.

When the first logical replication slot is created, the system
automatically increases the effective WAL level to maintain
logical-level WAL records. Conversely, after the last logical slot is
dropped or invalidated, it decreases back to 'replica' WAL level.

While activation occurs synchronously right after creating the first
logical slot, deactivation happens asynchronously through the
checkpointer process. This design avoids a race condition at the end
of recovery; a concurrent deactivation could happen while the startup
process enables logical decoding at the end of recovery, but WAL
writes are still not permitted until recovery fully completes. The
checkpointer will handle it after recovery is done. Asynchronous
deactivation also avoids excessive toggling of the logical decoding
status in workloads that repeatedly create and drop a single logical
slot. On the other hand, this lazy approach can delay changes to
effective_wal_level and the disabling of logical decoding, especially
when the checkpointer is busy with other tasks. We chose this lazy
approach in all deactivation paths to keep the implementation simple,
even though laziness is strictly required only for end-of-recovery
cases. Future work might address this limitation either by using a
dedicated worker instead of the checkpointer, or by implementing
synchronous waiting during slot drops if workloads are significantly
affected by the lazy deactivation of logical decoding.
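
The activation and deactivation rules above can be modeled as a small
state machine.  This is a sketch under stated assumptions, not the code
in logicalctl.c: DemoState and the demo_* functions are hypothetical,
and locking, WAL records and recovery handling are all omitted.

```c
#include <stdbool.h>

typedef enum DemoWalLevel
{
	DEMO_WAL_REPLICA,
	DEMO_WAL_LOGICAL
} DemoWalLevel;

typedef struct DemoState
{
	int			logical_slots;		/* number of live logical slots */
	DemoWalLevel effective;			/* level currently in effect */
	bool		disable_pending;	/* deactivation requested, not done */
} DemoState;

/* Creating a logical slot raises the effective level synchronously. */
void
demo_create_slot(DemoState *s)
{
	s->logical_slots++;
	s->effective = DEMO_WAL_LOGICAL;
	s->disable_pending = false;
}

/* Dropping the last slot only records a request to deactivate. */
void
demo_drop_slot(DemoState *s)
{
	if (s->logical_slots > 0 && --s->logical_slots == 0)
		s->disable_pending = true;
}

/* Stand-in for the checkpointer, which carries out a pending request. */
void
demo_checkpointer_cycle(DemoState *s)
{
	if (s->disable_pending && s->logical_slots == 0)
		s->effective = DEMO_WAL_REPLICA;
	s->disable_pending = false;
}
```

In this model a create/drop/create sequence between two checkpointer
cycles never lowers the level, which mirrors the stated goal of avoiding
excessive toggling.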

The effective WAL level, determined internally by XLogLogicalInfo, is
allowed to change within a transaction until an XID is assigned. Once
an XID is assigned, the value becomes fixed for the remainder of the
transaction. This behavior ensures that the logging mode remains
consistent within a writing transaction, similar to the behavior of
GUC parameters.

A new read-only GUC parameter effective_wal_level is introduced to
monitor the actual WAL level in effect. This parameter reflects the
current operational WAL level, which may differ from the configured
wal_level setting.

Bump PG_CONTROL_VERSION as it adds a new field to CheckPoint struct.

Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com
2025-12-23 10:13:16 -08:00
Michael Paquier
e5f3839af6 Switch buffile.c/h to use pgoff_t instead of off_t
off_t, previously used for offsets, is 4 bytes on Windows, which
imposes a hard 2GB limit on file size in the backend code.  The switch
leads to some simplification in these files, removing some casts based
on long, which is also 4 bytes on Windows.

This commit removes one comment introduced in db3c4c3a2d, not relevant
anymore as pgoff_t is a safe 8-byte alternative on Windows.
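
The size limit can be illustrated with fixed-width stand-ins.  This is
only a sketch: demo_off32 and demo_off64 are hypothetical names that
model a 4-byte and an 8-byte signed offset, not the real off_t or
pgoff_t definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins: a 4-byte signed offset versus an 8-byte one. */
typedef int32_t demo_off32;
typedef int64_t demo_off64;

/* Largest byte position the 4-byte type can represent: 2GB - 1. */
#define DEMO_OFF32_MAX INT32_MAX

/* Does a file position fit in the 4-byte offset type? */
bool
demo_fits_off32(demo_off64 pos)
{
	return pos >= 0 && pos <= DEMO_OFF32_MAX;
}
```

A position of exactly 2GB (2147483648) is already out of range for the
4-byte type, while the 8-byte type represents it without trouble.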

This change is surprisingly not invasive, as the callers of
BufFileTell(), BufFileSeek() and BufFileTruncateFileSet() (worker.c,
tuplestore.c, etc.) track offsets in local structures that just need to
switch from off_t to pgoff_t for the most part.

The file is still relying on a maximum file size of
MAX_PHYSICAL_FILESIZE (1GB).  This change allows the code to make this
maximum potentially larger in the future, or larger on a per-demand
basis.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz
2025-12-23 07:41:34 +09:00
Heikki Linnakangas
47a9f61fca Use proper type for RestoreTransactionSnapshot's PGPROC arg
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/08cbaeb5-aaaf-47b6-9ed8-4f7455b0bc4b@iki.fi
2025-12-19 13:40:02 +02:00
Michael Paquier
167cb26718 Fix const correctness in pgstat data serialization callbacks
4ba012a8ed defined the "header" (pointer to the stats data) of
from_serialized_data() as a const, even though it is fine (and
expected!) for the callback to modify the shared memory entry when
loading the stats at startup.

While at it, this commit updates the callback to_serialized_data() in
the test module test_custom_stats to make the data extracted from the
"header" parameter a const since it should never be modified: the stats
are written to disk and no modifications are expected in the shared
memory entry.

This clarifies the API contract of these new callbacks.

Reported-By: Peter Eisentraut <peter@eisentraut.org>
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/d87a93b0-19c7-4db6-b9c0-d6827e7b2da1@eisentraut.org
2025-12-18 07:33:40 +09:00
Michael Paquier
f4e797171e Change pgstat_report_vacuum() to use Relation
This change makes pgstat_report_vacuum() more consistent with
pgstat_report_analyze(), which also uses a Relation.  This enforces a
policy that callers of this routine should open and lock the relation
whose statistics are updated before calling this routine.  We are
unlikely to have many callers of this routine in the tree, but it seems
like a good idea to imply this requirement in the long run.

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal
2025-12-17 11:26:17 +09:00
Jeff Davis
0a90df58cf Avoid global LC_CTYPE dependency in pg_locale_icu.c.
ICU still depends on libc for compatibility with certain historical
behavior for single-byte encodings. Make the dependency explicit by
holding a locale_t object when required.

We should consider a better solution in the future, such as decoding
the text to UTF-32 and using u_tolower(). That would be a behavior
change and require additional infrastructure though; so for now, just
avoid the global LC_CTYPE dependency.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
2025-12-16 15:32:57 -08:00
Jeff Davis
87b2968df0 downcase_identifier(): use method table from locale provider.
Previously, libc's tolower() was always used for lowercasing
identifiers, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
downcasing.

For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().

One minor behavior change is that, before the database default locale
is initialized, it uses ASCII semantics to downcase the
identifiers. Previously, it would use the postmaster's LC_CTYPE
setting from the environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
2025-12-16 15:32:41 -08:00
Jeff Davis
24bf379cb1 Clarify a #define introduced in 8d299052fe.
The value is the same, but use the right symbol for clarity.
2025-12-16 12:48:53 -08:00
Nathan Bossart
48d4a1423d Allow passing a pointer to GetNamedDSMSegment()'s init callback.
This commit adds a new "void *arg" parameter to
GetNamedDSMSegment() that is passed to the initialization callback
function.  This is useful for reusing an initialization callback
function for multiple DSM segments.
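
The benefit of the extra argument can be sketched as follows.  Note the
assumptions: demo_get_segment() and demo_init_cb are hypothetical and do
not reproduce the real GetNamedDSMSegment() signature; the caller
supplies plain memory instead of a DSM segment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: the segment plus a caller-supplied arg. */
typedef void (*demo_init_cb) (void *segment, void *arg);

/*
 * Hypothetical stand-in: zeroes the caller's storage, then runs the
 * init callback with the opaque argument passed through untouched.
 */
void *
demo_get_segment(void *storage, size_t size, demo_init_cb init, void *arg)
{
	memset(storage, 0, size);
	if (init != NULL)
		init(storage, arg);
	return storage;
}

typedef struct DemoCounter
{
	int			start;
} DemoCounter;

/* One callback serves several segments; arg carries the per-segment value. */
void
demo_counter_init(void *segment, void *arg)
{
	((DemoCounter *) segment)->start = *(int *) arg;
}
```

Calling demo_get_segment() twice with demo_counter_init and pointers to
two different integers initializes two segments differently without
writing a second callback.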

Author: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAN4CZFMjh8TrT9ZhWgjVTzBDkYZi2a84BnZ8bM%2BfLPuq7Cirzg%40mail.gmail.com
2025-12-15 14:27:16 -06:00
Noah Misch
64bf53dd61 Revisit cosmetics of "For inplace update, send nontransactional invalidations."
This removes a never-used CacheInvalidateHeapTupleInplace() parameter.
It adds README content about inplace update visibility in logical
decoding.  It rewrites other comments.

Back-patch to v18, where commit 243e9b40f1
first appeared.  Since this removes a CacheInvalidateHeapTupleInplace()
parameter, expect a v18 ".abi-compliance-history" edit to follow.  PGXN
contains no calls to that function.

Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reported-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com
Backpatch-through: 18
2025-12-15 12:19:49 -08:00
Jeff Davis
95a19fefdc Remove incorrect declarations in pg_wchar.h.
Oversight in commit 9acae56ce0.

Discussion: https://postgr.es/m/541F240E-94AD-4D65-9794-7D6C316BC3FF@gmail.com
2025-12-15 10:38:55 -08:00
Jeff Davis
54c41a6deb Remove unused single-byte char_is_cased() API.
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
2025-12-15 10:24:57 -08:00
Peter Eisentraut
17f446784d Refactor static_assert() support.
HAVE__STATIC_ASSERT was really a test for GCC statement expressions,
as needed for StaticAssertExpr() now that _Static_assert could be
assumed to be available through our C11 requirement.  This
artificially prevented Visual Studio from being able to use
static_assert() in other contexts.

Instead, make a new test for HAVE_STATEMENT_EXPRESSIONS, and use that
to control only whether StaticAssertExpr() uses fallback code, not the
other variants.  This improves the quality of failure messages in the
(much more common) other variants under Visual Studio.

Also get rid of the two separate implementations for C++, since the C
implementation is also valid as C++11.  While it is a stretch to
apply HAVE_STATEMENT_EXPRESSIONS tested with $CC to a C++ compiler,
the previous C++ coding assumed that the C++ compiler had them
unconditionally, so it isn't a new stretch.  In practice, the C and
C++ compilers are very likely to agree, and if a combination is ever
reported that falsifies this assumption we can always reconsider that.

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
2025-12-15 11:54:23 +01:00
Michael Paquier
4ba012a8ed Allow cumulative statistics to read/write auxiliary data from/to disk
Cumulative stats kinds gain the capability to write additional per-entry
data when flushing the stats at shutdown, and read this data when
loading back the stats at startup.  This can be useful, for example,
for variable-length data (like normalized query strings), making it
possible to link the shared memory stats entries to data that is
stored in a different area, like a DSA segment.

Three new optional callbacks are added to PgStat_KindInfo, available to
variable-numbered stats kinds:
* to_serialized_data: writes auxiliary data for an entry.
* from_serialized_data: reads auxiliary data for an entry.
* finish: performs actions after read/write/discard operations.  This is
invoked after processing all the entries of a kind, allowing extensions
to close file handles and clean up resources.

Stats kinds have the option to store this data in the existing pgstats
file, but can as well store it in one or more additional files whose
names can be built upon the entry keys.  The new serialized callbacks
are called once an entry key is read or written from the main stats
file.  A file descriptor to the main pgstats file is available in the
arguments of the callbacks.

Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
2025-12-15 09:40:56 +09:00
Tom Lane
58dad7f349 Update typedefs.list to match what the buildfarm currently reports.
The current list from the buildfarm includes quite a few typedef
names that it used to miss.  The reason is a bit obscure, but it
seems likely to have something to do with our recent increased
use of palloc_object and palloc_array.  In any case, this makes
the relevant struct declarations be much more nicely formatted,
so I'll take it.  Install the current list and re-run pgindent
to update affected code.

Syncing with the current list also removes some obsolete
typedef names and fixes some alphabetization errors.

Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
2025-12-14 17:03:53 -05:00
Tom Lane
66b2282b0c Make "pgoff_t" be a typedef not a #define.
There doesn't seem to be any great reason why this has been a macro
rather than a typedef.  But doing it like that means our buildfarm
typedef tooling doesn't capture the name as a typedef.  That would
result in pgindent glitches, except that we've seemingly kept it
in typedefs.list manually.  That's obviously error-prone, so let's
convert it to a typedef now.

Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us
2025-12-14 16:53:34 -05:00
Alexander Korotkov
b27e48213f Refactor WaitLSNType enum to use a macro for type count
Change WAIT_LSN_TYPE_COUNT from an enum sentinel to a macro definition,
in a similar way to IOObject, IOContext, and BackendType enums.  Remove
explicit enum value assignments as well.
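
A minimal sketch of the pattern, assuming hypothetical names throughout
(DemoWaitType and its members are not the real WaitLSNType values): the
count lives in a macro derived from the last member, so it is not itself
a value of the enum.

```c
/* Hypothetical enum; values are assigned implicitly, starting at 0. */
typedef enum DemoWaitType
{
	DEMO_WAIT_REPLAY,
	DEMO_WAIT_FLUSH,
} DemoWaitType;

/* Count kept outside the enum, derived from the last member. */
#define DEMO_WAIT_TYPE_COUNT (DEMO_WAIT_FLUSH + 1)

/* Arrays can still be sized by the count. */
static const char *const demo_wait_names[DEMO_WAIT_TYPE_COUNT] = {
	[DEMO_WAIT_REPLAY] = "replay",
	[DEMO_WAIT_FLUSH] = "flush",
};

/* A switch over the enum needs no case for a sentinel member. */
const char *
demo_wait_name(DemoWaitType t)
{
	switch (t)
	{
		case DEMO_WAIT_REPLAY:
		case DEMO_WAIT_FLUSH:
			return demo_wait_names[t];
	}
	return "unknown";
}
```

With a sentinel member inside the enum, a compiler that warns about
unhandled enum values in a switch would ask for a case covering the
sentinel as well; the macro form avoids that.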

Author: Xuneng Zhou <xunengzhou@gmail.com>
2025-12-14 17:18:32 +02:00
Alexander Korotkov
4b3d173629 Implement ALTER TABLE ... SPLIT PARTITION ... command
This new DDL command splits a single partition into several partitions.  Just
like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions are
created using the createPartitionTable() function with the parent partition
as the template.

This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing.  This is why the new DDL command
can't be recommended for large, partitioned tables under high load.  However,
this implementation comes in handy in certain cases, even as it is.  Also, it
could serve as a foundation for future implementations with less locking and
possibly parallelism.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
2025-12-14 13:29:38 +02:00
Alexander Korotkov
f2e4cc4279 Implement ALTER TABLE ... MERGE PARTITIONS ... command
This new DDL command merges several partitions into a single partition of the
target table.  The target partition is created using the new
createPartitionTable() function with the parent partition as the template.

This commit comprises a quite naive implementation which works in a single
process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all
the operations, including the tuple routing.  This is why this new DDL
command can't be recommended for large partitioned tables under a high load.
However, this implementation comes in handy in certain cases, even as it is.
Also, it could serve as a foundation for future implementations with less
locking and possibly parallelism.

Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval <d.koval@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Dagfinn Ilmari Mannsaker <ilmari@ilmari.org>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Co-authored-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Robert Haas <rhaas@postgresql.org>
Reviewed-by: Stephane Tachoires <stephane.tachoires@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Daniel Gustafsson <dgustafsson@postgresql.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
2025-12-14 13:29:17 +02:00
Peter Eisentraut
315342ffed Use correct preprocessor conditional in relptr.h
When relptr.h was added (commit fbc1c12a94), there was no check for
HAVE_TYPEOF, so it used HAVE__BUILTIN_TYPES_COMPATIBLE_P, which
already existed (commit ea473fb2de) and which was thought to cover
approximately the same compilers.  But the guarded code can also work
without HAVE__BUILTIN_TYPES_COMPATIBLE_P, and we now have a check for
HAVE_TYPEOF (commit 4cb824699e), so let's fix this up to use the
correct logic.

Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGL7trhWiJ4qxpksBztMMTWDyPnP1QN%2BLq341V7QL775DA%40mail.gmail.com
2025-12-13 19:56:09 +01:00
Peter Eisentraut
493eb0da31 Replace most StaticAssertStmt() with StaticAssertDecl()
Similar to commit 75f49221c2, it is preferable to use
StaticAssertDecl() instead of StaticAssertStmt() when possible.

Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
2025-12-12 10:06:40 +01:00
Nathan Bossart
b4cbc106a6 Fix some comments.
Like commit 123661427b, these were discovered while reviewing
Aleksander Alekseev's proposed changes to pgindent.
2025-12-11 15:13:04 -06:00
Peter Eisentraut
795e94c70c Make <assert.h> consistently available in frontend and backend
Previously, c.h made <assert.h> only available in frontends (#ifdef
FRONTEND), which was probably reasonable, because the only thing it
would give you is assert(), which you generally shouldn't use in the
backend.  But with C11, <assert.h> also makes available
static_assert(), which would be useful everywhere.  So this patch
moves <assert.h> to the commonly available header files in c.h and
fixes a small complication in regcustom.h that resulted from that.

Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BhUKGKvr0x_oGmQTUkx%3DODgSksT2EtgCA6LmGx_jQFG%3DsDUpg%40mail.gmail.com
2025-12-11 09:56:57 +01:00
Tom Lane
0909380e4c Allow PG_PRINTF_ATTRIBUTE to be different in C and C++ code.
Although clang claims to be compatible with gcc's printf format
archetypes, this appears to be a falsehood: it likes __syslog__
(which gcc does not, on most platforms) and doesn't accept
gnu_printf.  This means that if you try to use gcc with clang++
or clang with g++, you get compiler warnings when compiling
printf-like calls in our C++ code.  This has been true for quite
a while, but it's gotten more annoying with the recent appearance
of several buildfarm members that are configured like this.

To fix, run separate probes for the format archetype to use with the
C and C++ compilers, and conditionally define PG_PRINTF_ATTRIBUTE
depending on __cplusplus.

(We could alternatively insist that you not mix-and-match C and
C++ compilers; but if the case works otherwise, this is a poor
reason to insist on that.)

No back-patch for now, but we may want to do that if this
patch survives buildfarm testing.

Discussion: https://postgr.es/m/986485.1764825548@sss.pgh.pa.us
2025-12-10 17:09:10 -05:00
Jeff Davis
630706ced0 Add pg_iswcased().
True if character has multiple case forms. Will be a useful
multibyte-aware replacement for char_is_cased().
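
One way to picture "has multiple case forms" is the sketch below, which relies on the standard towlower() and towupper() from <wctype.h>. It is an assumption made for illustration, not the implementation added by this commit; has_multiple_case_forms() is a hypothetical name, and the result for non-ASCII characters depends on the active libc locale.

```c
#include <stdbool.h>
#include <wctype.h>

/*
 * Hypothetical helper, not PostgreSQL's pg_iswcased(): a character is
 * treated as "cased" if mapping it to lower or upper case changes it.
 */
static bool
has_multiple_case_forms(wint_t wc)
{
    return towlower(wc) != wc || towupper(wc) != wc;
}
```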

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
2025-12-10 11:56:11 -08:00
Jeff Davis
1e493158d3 Remove char_tolower() API.
It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
2025-12-10 11:55:59 -08:00
Thomas Munro
c507ba55f5 Fix O_CLOEXEC flag handling in Windows port.
PostgreSQL's src/port/open.c has always set bInheritHandle = TRUE
when opening files on Windows, making all file descriptors inheritable
by child processes.  This meant the O_CLOEXEC flag, added to many call
sites by commit 1da569ca1f (v16), was silently ignored.

The original commit included a comment suggesting that our open()
replacement doesn't create inheritable handles, but it was a
misunderstanding of the code path.  In practice, the code was creating
inheritable handles in all cases.

This hasn't caused widespread problems because most child processes
(archive_command, COPY PROGRAM, etc.) operate on file paths passed as
arguments rather than inherited file descriptors.  Even if a child
wanted to use an inherited handle, it would need to learn the numeric
handle value, which isn't passed through our IPC mechanisms.

Nonetheless, the current behavior is wrong.  It violates documented
O_CLOEXEC semantics, contradicts our own code comments, and makes
PostgreSQL behave differently on Windows than on Unix.  It also creates
potential issues with future code or security auditing tools.

To fix, define O_CLOEXEC to _O_NOINHERIT in master, previously used by
O_DSYNC.  We use different values in the back branches to preserve
existing values.  In pgwin32_open_handle() we set bInheritHandle
according to whether O_CLOEXEC is specified, for the same atomic
semantics as POSIX in multi-threaded programs that create processes.

Backpatch-through: 16
Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Thomas Munro <thomas.munro@gmail.com> (minor adjustments)
Discussion: https://postgr.es/m/e2b16375-7430-4053-bda3-5d2194ff1880%40gmail.com
2025-12-10 09:01:35 +13:00
Nathan Bossart
750816971b Add ParallelSlotSetIdle().
This commit refactors the code for marking a ParallelSlot as idle
to a new static inline function.  This can be used to mark a slot
that was obtained via ParallelSlotGetIdle() but that we don't
intend to actually use for a query as idle again.

This is preparatory work for a follow-up commit that will add a
--dry-run option to vacuumdb.

Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DckHkX7Of5SrK7g0LokPUwJ%3Dkk8JU1GXGF5pZ1eBVr0%3DQ%40mail.gmail.com
2025-12-09 13:34:22 -06:00
Masahiko Sawada
ab40db3852 Add started_by column to pg_stat_progress_analyze view.
The new column, started_by, indicates the initiator of the
analyze ('manual' or 'autovacuum'), helping users and monitoring tools
to better understand ANALYZE behavior.

Bump catalog version.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAA5RZ0suoicwxFeK_eDkUrzF7s0BVTaE7M%2BehCpYcCk5wiECpw%40mail.gmail.com
2025-12-09 11:23:45 -08:00
Masahiko Sawada
0d78952061 Add mode and started_by columns to pg_stat_progress_vacuum view.
The new columns, mode and started_by, indicate the vacuum
mode ('normal', 'aggressive', or 'failsafe') and the initiator of the
vacuum ('manual', 'autovacuum', or 'autovacuum_wraparound'),
respectively. This allows users and monitoring tools to better
understand VACUUM behavior.

Bump catalog version.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yu Wang <wangyu_runtime@163.com>
Discussion: https://postgr.es/m/CAOzEurQcOY-OBL_ouEVfEaFqe_md3vB5pXjR_m6L71Dcp1JKCQ@mail.gmail.com
2025-12-09 10:51:14 -08:00
Tom Lane
f8715ec866 Support "j" length modifier in snprintf.c.
POSIX has for a long time defined the "j" length modifier for
printf conversions as meaning the size of intmax_t or uintmax_t.
We got away without supporting that so far, because we were not
using intmax_t anywhere.  However, commit e6be84356 re-introduced
upstream's use of intmax_t and PRIdMAX into zic.c.  It emerges
that on some platforms (at least FreeBSD and macOS), <inttypes.h>
defines PRIdMAX as "jd", so that snprintf.c falls over if that is
used.  (We hadn't noticed yet because it would only be apparent
if bad data is fed to zic, resulting in an error report, and even
then the only visible symptom is a missing line number in the
error message.)

We could revert that decision from our copy of zic.c, but
on the whole it seems better to update snprintf.c to support
this standard modifier.  There might well be extensions,
now or in future, that expect it to work.

I did this in the lazy man's way of translating "j" to either
"l" or "ll" depending on a compile-time sizeof() check, just
as was done long ago to support "z" for size_t.  One could
imagine promoting intmax_t to have full support in snprintf.c,
for example converting fmtint()'s value argument and internal
arithmetic to use [u]intmax_t not [unsigned] long long.  But
that'd be more work and I'm hesitant to do it anyway: if there
are any platforms out there where intmax_t is actually wider
than "long long", this would doubtless result in a noticeable
speed penalty to snprintf().  Let's not go there until we have
positive evidence that there's a reason to, and some way to
measure what size of penalty we're taking.
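
The compile-time sizeof() dispatch described above can be sketched roughly as follows. This is a simplified illustration rather than the code in snprintf.c; format_intmax() is a hypothetical helper that picks the "l" or "ll" conversion for an intmax_t value and formats it with the platform's snprintf().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format an intmax_t without using the "j" modifier,
 * by choosing "l" or "ll" from a sizeof() comparison that the compiler can
 * resolve at build time.
 */
static int
format_intmax(char *buf, size_t buflen, intmax_t value)
{
    assert(buf != NULL && buflen > 0);

    if (sizeof(intmax_t) == sizeof(long))
        return snprintf(buf, buflen, "%ld", (long) value);

    /* Otherwise assume intmax_t is no wider than long long. */
    return snprintf(buf, buflen, "%lld", (long long) value);
}
```

The second branch carries the same assumption the message mentions: a platform where intmax_t is wider than long long would need real support rather than this translation.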

Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/3210703.1765236740@sss.pgh.pa.us
2025-12-09 11:43:25 -05:00
Heikki Linnakangas
bd8d9c9bdf Widen MultiXactOffset to 64 bits
This eliminates MultiXactOffset wraparound and the 2^32 limit on the
total number of multixid members. Multixids are still limited to 2^31,
but this is a nice improvement because 'members' can grow much faster
than the number of multixids. On such systems, you can now run longer
before hitting hard limits or triggering anti-wraparound vacuums.

Not having to deal with MultiXactOffset wraparound also simplifies the
code and removes some gnarly corner cases.

We no longer need to perform emergency anti-wraparound freezing
because of running out of 'members' space, so the offset stop limit is
gone. But you might still not want 'members' to consume huge amounts
of disk space. For that reason, I kept the logic for lowering vacuum's
multixid freezing cutoff if a large amount of 'members' space is
used. The thresholds for that are roughly the same as the "safe" and
"danger" thresholds used before, 2 billion transactions and 4 billion
transactions. This keeps the behavior for the freeze cutoff roughly
the same as before. It might make sense to make this smarter or
configurable, now that the threshold is only needed to manage disk
usage, but that's left for the future.

Add code to pg_upgrade to convert multitransactions from the old to
the new format, rewriting the pg_multixact SLRU files. Because
pg_upgrade now rewrites the files, we can get rid of some hacks we had
put in place to deal with old bugs and upgraded clusters. Bump catalog
version for the pg_multixact/offsets format change.

Author: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezaWg7_nt-8ey4aKv2w9LcuLthHknwCawmBgEeTnJrJTcw@mail.gmail.com
2025-12-09 13:53:03 +02:00
Heikki Linnakangas
bb3b1c4f64 Move pg_multixact SLRU page format definitions to a separate header
This makes them accessible from pg_upgrade, needed by the next commit.
I'm doing this mechanical move as a separate commit to make the next
commit's changes to these definitions more obvious.

Author: Maxim Orlov <orlovmg@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACG%3DezbZo_3_fnx%3DS5BfepwRftzrpJ%2B7WET4EkTU6wnjDTsnjg@mail.gmail.com
2025-12-09 13:45:01 +02:00
Michael Paquier
0c3c5c3b06 Use palloc_object() and palloc_array() in more areas of the tree
The idea is to further encourage the use of these new routines across the
tree, as these offer stronger type safety guarantees than palloc().

The following paths are included in this batch, treating all the areas
proposed by the author for the most trivial changes, except src/backend
(by far the largest batch):
src/bin/
src/common/
src/fe_utils/
src/include/
src/pl/
src/test/
src/tutorial/

Similar work has been done in 31d3847a37.

The code compiles the same before and after this commit, with the
following exceptions due to changes in line numbers because some of the
new allocation formulas are shorter:
blkreftable.c
pgfnames.c
pl_exec.c

Author: David Geier <geidav.pg@gmail.com>
Discussion: https://postgr.es/m/ad0748d4-3080-436e-b0bc-ac8f86a3466a@gmail.com
2025-12-09 14:53:17 +09:00
Andres Freund
aa749bde32 Improve documentation for pg_atomic_unlocked_write_u32()
After my recent commit 7902a47c20, Nathan noticed that
pg_atomic_unlocked_write_u64() was not accurately described by the comments
for the 32bit version. Turns out the 32bit version has suffered from
copy-and-paste-itis since its introduction. Fix.

Reported-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://postgr.es/m/aTGt7q4Jvn97uGAx@nathan
2025-12-08 23:11:19 -05:00
Peter Geoghegan
65d6acbc56 Relocate _bt_readpage and related functions.
Quite a bit of code within nbtutils.c is only called by _bt_readpage.
Move _bt_readpage and all of the nbtutils.c functions it depends on into
a new .c file, nbtreadpage.c.  Also reorder some of the functions within
the new file for clarity.

This commit has no functional impact.  It is strictly mechanical.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzmwMwcwKFgaf+mYPwiz3iL4AqpXnwtW_O0vqpWPXRom9Q@mail.gmail.com
2025-12-08 13:15:00 -05:00
Tom Lane
0986e95161 Revise APIs for pushJsonbValue() and associated routines.
Instead of passing "JsonbParseState **" to pushJsonbValue(),
pass a pointer to a JsonbInState, which will contain the
parseState stack pointer as well as other useful fields.
Also, instead of returning a JsonbValue pointer that is often
meaningless/ignored, return the top-level JsonbValue pointer
in the "result" field of the JsonbInState.

This involves a lot of (mostly mechanical) edits, but I think
the results are notationally cleaner and easier to understand.
Certainly the business with sometimes capturing the result of
pushJsonbValue() and sometimes not was bug-prone and incapable of
mechanical verification.  In the new arrangement, JsonbInState.result
remains null until we've completed a valid sequence of pushes, so
that an incorrect sequence will result in a null-pointer dereference,
not mistaken use of a partial result.

However, this isn't simply an exercise in prettier notation.
The real reason for doing it is to provide a mechanism whereby
pushJsonbValue() can be told to construct the JsonbValue tree
in a context that is not CurrentMemoryContext.  That happens
when a non-null "outcontext" is specified in the JsonbInState.
No callers exercise that option in this patch, but the next
patch in the series will make use of it.

I tried to improve the comments in this area too.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/1060917.1753202222@sss.pgh.pa.us
2025-12-07 11:51:33 -05:00
Tom Lane
3628af4210 Add a macro for the declared typlen of type timetz.
pg_type.typlen says 12 for the size of timetz, but sizeof(TimeTzADT)
will be 16 on most platforms due to alignment padding.  Using the
sizeof number is no problem for usages such as palloc'ing a result
datum, but in usages such as datumCopy we really ought to match
what pg_type says.  Add a macro TIMETZ_TYPLEN so that we have a
symbolic way to write that rather than hard-coding "12".
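
The gap between the two sizes can be reproduced with a stand-in struct. The names ExampleTimeTz and EXAMPLE_TIMETZ_TYPLEN below are hypothetical, and the way the real TIMETZ_TYPLEN macro is spelled is not shown in this message; the sketch only assumes an 8-byte time field followed by a 4-byte zone field.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the timetz layout: 8 bytes of time plus 4 bytes of zone. */
typedef struct ExampleTimeTz
{
    int64_t     time;       /* time of day */
    int32_t     zone;       /* time zone offset */
} ExampleTimeTz;

/* Declared length: only the meaningful bytes, excluding any tail padding. */
#define EXAMPLE_TIMETZ_TYPLEN \
    (offsetof(ExampleTimeTz, zone) + sizeof(int32_t))

/* Number of padding bytes the compiler appends to the struct, if any. */
static size_t
example_timetz_padding(void)
{
    return sizeof(ExampleTimeTz) - EXAMPLE_TIMETZ_TYPLEN;
}
```

On platforms that align int64_t to 8 bytes the helper returns 4, which is the difference between the declared 12 and the sizeof() result of 16.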

I cannot find any place where we've needed this so far, but an
upcoming patch requires it.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2329959.1765047648@sss.pgh.pa.us
2025-12-07 11:33:35 -05:00
Tom Lane
6498287696 Handle constant inputs to corr() and related aggregates more precisely.
The SQL standard says that corr() and friends should return NULL in
the mathematically-undefined case where all the inputs in one of
the columns have the same value.  We were checking that by seeing
if the sums Sxx and Syy were zero, but that approach is very
vulnerable to roundoff error: if a sum is close to zero but not
exactly that, we'd come out with a pretty silly non-NULL result.

Instead, directly track whether the inputs are all equal by
remembering the common value in each column.  Once we detect
that a new input is different from before, represent that by
storing NaN for the common value.  (An objection to this scheme
is that if the inputs are all NaN, we will consider that they
were not all equal.  But under IEEE float arithmetic rules,
one NaN is never equal to another, so this behavior is arguably
correct.  Moreover it matches what we did before in such cases.)
Then, leave the sums at their exact value of zero for as long
as we haven't detected different input values.
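
A single-column sketch of that scheme is shown below. It is a simplified assumption for illustration, not the aggregate code itself: ExampleAccum and its functions are hypothetical names, only one column is tracked, and the sum of squares uses a Youngs-Cramer style update chosen here for the example.

```c
#include <math.h>
#include <stdbool.h>

typedef struct ExampleAccum
{
    double      n;          /* number of inputs seen */
    double      sx;         /* sum of inputs */
    double      sxx;        /* sum of squared deviations from the mean */
    double      common_x;   /* shared input value, or NaN once inputs differ */
} ExampleAccum;

static void
example_accum_init(ExampleAccum *st)
{
    st->n = 0.0;
    st->sx = 0.0;
    st->sxx = 0.0;
    st->common_x = 0.0;
}

static void
example_accum_add(ExampleAccum *st, double x)
{
    if (st->n == 0.0)
        st->common_x = x;       /* remember the first value */
    else if (x != st->common_x)
        st->common_x = NAN;     /* inputs differ (also true once NaN is stored) */

    st->n += 1.0;
    st->sx += x;

    /* Leave sxx at exactly zero while every input so far is identical. */
    if (st->n > 1.0 && isnan(st->common_x))
    {
        double      tmp = x * st->n - st->sx;

        st->sxx += tmp * tmp / (st->n * (st->n - 1.0));
    }
}

static bool
example_inputs_all_equal(const ExampleAccum *st)
{
    return st->n > 0.0 && !isnan(st->common_x);
}
```

Feeding the same value repeatedly, for example 0.1 three times, leaves sxx at exactly 0.0 instead of a tiny roundoff residue, so the "all inputs equal" case no longer hinges on a floating-point comparison of the sum with zero.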

This solution requires the aggregate transition state to contain
8 float values not 6, which is not problematic, and it seems to add
less than 1% to the aggregates' runtime, which seems acceptable.

While we're here, improve corr()'s final function to cope with
overflow/underflow in the final calculation, and to clamp its
result to [-1, 1] in case of roundoff error.

Although this is arguably a bug fix, it requires a catversion bump
due to the change in aggregates' initial states, so it can't be
back-patched.

Patch written by me, but many of the ideas are due to Dean Rasheed,
who also did a deal of testing.

Bug: #19340
Reported-by: Oleg Ivanov <o15611@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19340-6fb9f6637f562092@postgresql.org
2025-12-06 18:31:26 -05:00
Amit Kapila
5db6a344ab Rename column slotsync_skip_at to slotsync_last_skip.
Commit 76b78721ca introduced two new columns in pg_stat_replication_slots
to improve monitoring of slot synchronization. One of these columns was
named slotsync_skip_at, which is inconsistent with the naming convention
used for similar columns in other system views.

Columns that store timestamps of the most recent event typically include
'last_' in the column name (e.g., last_autovacuum, checksum_last_failure).
Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern,
making the purpose of the column clearer and improving overall consistency
across the views.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Michael Banck <mbanck@gmx.net>
Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
2025-12-05 04:12:55 +00:00
Peter Eisentraut
c6be3daa05 Remove no longer needed casts to Pointer
These casts used to be required when Pointer was char *, but now it's
void * (commit 1b2bb5077e), so they are not needed anymore.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
2025-12-04 19:40:08 +01:00
Andres Freund
6c5c393b74 Rename BUFFERPIN wait event class to BUFFER
In an upcoming patch more wait events will be added to the wait event
class (for buffer locking), making the current name too
specific. Alternatively we could introduce a dedicated wait event class for
those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait
event class.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-12-03 18:38:20 -05:00
Andres Freund
7902a47c20 Add pg_atomic_unlocked_write_u64
The 64bit equivalent of pg_atomic_unlocked_write_u32(), to be used in an
upcoming patch converting BufferDesc.state into a 64bit atomic.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-12-03 18:38:20 -05:00
Andres Freund
156680055d bufmgr: Turn BUFFER_LOCK_* into an enum
It seems cleaner to use an enum to tie the different values together. It also
helps to have a more descriptive type in the argument to various functions.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-12-03 18:38:20 -05:00
Heikki Linnakangas
789d65364c Set next multixid's offset when creating a new multixid
With this commit, the next multixid's offset will always be set on the
offsets page, by the time that a backend might try to read it, so we
no longer need the waiting mechanism with the condition variable. In
other words, this eliminates "corner case 2" mentioned in the
comments.

The waiting mechanism was broken in a few scenarios:

- When nextMulti was advanced without WAL-logging the next
  multixid. For example, if a later multixid was already assigned and
  WAL-logged before the previous one was WAL-logged, and then the
  server crashed. In that case the next offset would never be set in
  the offsets SLRU, and a query trying to read it would get stuck
  waiting for it. Same thing could happen if pg_resetwal was used to
  forcibly advance nextMulti.

- In hot standby mode, a deadlock could happen where one backend waits
  for the next multixid assignment record, but WAL replay is not
  advancing because of a recovery conflict with the waiting backend.

The old TAP test used carefully placed injection points to exercise
the old waiting code, but now that the waiting code is gone, much of
the old test is no longer relevant. Rewrite the test to reproduce the
IPC/MultixactCreation hang after crash recovery instead, and to verify
that previously recorded multixids stay readable.

Backpatch to all supported versions. In back-branches, we still need
to be able to read WAL that was generated before this fix, so in the
back-branches this includes a hack to initialize the next offsets page
when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a
page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the
WAL is not compatible.

Author: Andrey Borodin <amborodin@acm.org>
Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru
Backpatch-through: 14
2025-12-03 19:15:08 +02:00
Peter Eisentraut
9790affcce Fix stray references to SubscriptRef
This type never existed.  SubscriptingRef was meant instead.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org
2025-12-03 14:44:14 +01:00
Peter Eisentraut
1b2bb5077e Change Pointer to void *
The comment for the Pointer type said 'XXX Pointer arithmetic is done
with this, so it can't be void * under "true" ANSI compilers.'.  This
has been fixed in the previous commit 756a436893.  This now changes
the definition of the type from char * to void *, as envisaged by that
comment.

Extension code that relies on using Pointer for pointer arithmetic
will need to make changes similar to commit 756a436893, but those
changes would be backward compatible.
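
For extension authors, the kind of adjustment meant here is typically a cast to char * before doing byte arithmetic. The helper below is a hypothetical sketch, not code taken from either commit; it compiles the same way whether the incoming pointer type is char * or void *.

```c
#include <stddef.h>

/*
 * Hypothetical helper: advance an untyped pointer by nbytes.  Arithmetic
 * directly on void * is a compiler extension, so go through char *.
 */
static void *
example_advance_bytes(void *ptr, size_t nbytes)
{
    return (char *) ptr + nbytes;
}
```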

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org
2025-12-03 10:22:17 +01:00
Nathan Bossart
f894acb24a Show size of DSAs and dshashes in pg_dsm_registry_allocations.
Presently, this view reports NULL for the size of DSAs and dshash
tables because 1) the current backend might not be attached to them
and 2) the registry doesn't save the pointers to the dsa_area or
dshash_table in local memory.  Also, the view doesn't show
partially-initialized entries to avoid ambiguity, since those
entries would report a NULL size as well.

This commit introduces a function that looks up the size of a DSA
given its handle (transiently attaching to the control segment if
needed) and teaches pg_dsm_registry_allocations to use it to show
the size of successfully-initialized DSA and dshash entries.
Furthermore, the view now reports partially-initialized entries
with a NULL size.

Reviewed-by: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan
2025-12-02 10:29:45 -06:00
Michael Paquier
713d9a847e Update some timestamp[tz] functions to use soft-error reporting
This commit updates two functions that convert "timestamptz" to
"timestamp", and vice-versa, to use the soft error reporting rather than
their own logic to do the same.  These are now named as follows:
- timestamp2timestamptz_safe()
- timestamptz2timestamp_safe()

These functions were suffixed with "_opt_overflow", previously.

This shaves some code, as it is possible to detect how a timestamp[tz]
overflowed based on the returned value rather than a custom state.  It
is optionally possible for the callers of these functions to rely on the
error generated internally by these functions, depending on the error
context.

Similar work has been done in d03668ea05 and 4246a977ba.

Reviewed-by: Amul Sul <sulamul@gmail.com>
Discussion: https://postgr.es/m/aS09YF2GmVXjAxbJ@paquier.xyz
2025-12-02 09:30:23 +09:00
Jeff Davis
19b966243c Make regex "max_chr" depend on encoding, not provider.
The regex mechanism scans through the first "max_chr" character values
to cache character property ranges (isalpha, etc.). For single-byte
encodings, there's no sense in scanning beyond UCHAR_MAX; but for
UTF-8 it makes sense to cache higher code point values (though not all
of them; only up to MAX_SIMPLE_CHR).

Prior to 5a38104b36, the logic about how many character values to scan
was based on the pg_regex_strategy, which was dependent on the
provider. Commit 5a38104b36 preserved that logic exactly, allowing
different providers to define the "max_chr".

Now, change it to depend only on the encoding and whether
ctype_is_c. For this specific calculation, distinguishing between
providers creates more complexity than it's worth.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
2025-12-01 11:06:17 -08:00
Michael Paquier
a87987cafc Move WAL sequence code into its own file
This split exists for most of the other RMGRs.  For the sequence RMGR,
it gives a cleaner separation between the WAL code, the redo code and
the record description code (already in its own file).  The redo and
masking routines are moved to a new file,
sequence_xlog.c.  All the RMGR routines are now located in a new header,
sequence_xlog.h.

This separation is useful for a different patch related to sequences
that I have been working on, where it makes a refactoring of sequence.c
easier if its RMGR routines and its core routines are split.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/aSfTxIWjiXkTKh1E@paquier.xyz
2025-12-01 16:21:41 +09:00
Michael Paquier
d03668ea05 Switch some date/timestamp functions to use the soft error reporting
This commit changes some functions related to the data types date and
timestamp to use the soft error reporting rather than a custom boolean
flag called "overflow", used to let the callers of these functions know
if an overflow happens.

This results in the removal of some boilerplate code, as it is possible
to rely on an error context rather than a custom state, with the
possibility to use the error generated inside the functions updated
here, if necessary.

These functions were suffixed with "_opt_overflow".  They are now
renamed to use "_safe" as suffix.

This work is similar to 4246a977ba.

Author: Amul Sul <sulamul@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAAJ_b95HEmFyzHZfsdPquSHeswcopk8MCG1Q_vn4tVkZ+xxofw@mail.gmail.com
2025-12-01 15:22:20 +09:00
Peter Eisentraut
8b3e2c622a Fix pg_isblank()
There was a pg_isblank() function that claimed to be a replacement for
the standard isblank() function, which was thought to be "not very
portable yet".  We can now assume that it's portable (it's in C99).

But pg_isblank() actually diverged from the standard isblank() by also
accepting '\r', while the standard one only accepts space and tab.
This was added to support parsing pg_hba.conf under Windows.  But the
hba parsing code now works completely differently and already handles
line endings before we get to pg_isblank().  The other user of
pg_isblank() is for ident protocol message parsing, which also handles
'\r' separately.  So this behavior is now obsolete and confusing.

To improve clarity, I separated those concerns.  The ident parsing now
gets its own function that hardcodes the whitespace characters
mentioned by the relevant RFC.  pg_isblank() is now static in hba.c
and is a wrapper around the standard isblank(), with some extra logic
to ensure robust treatment of non-ASCII characters.
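
A wrapper of that general shape might look like the sketch below. The exact "extra logic" in hba.c is not spelled out in this message, so the high-bit check is an assumption, and example_isblank() is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical wrapper around the standard isblank(): bytes with the high
 * bit set are treated as non-blank, and a negative char value is never
 * passed to isblank().
 */
static bool
example_isblank(char c)
{
    unsigned char uc = (unsigned char) c;

    return uc < 0x80 && isblank(uc);
}
```

With this definition a space or tab is blank, while '\r' is not, matching the standard behavior described above.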

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org
2025-11-28 08:33:07 +01:00
Amit Kapila
e68b6adad9 Add slotsync_skip_reason column to pg_replication_slots view.
Introduce a new column, slotsync_skip_reason, in the pg_replication_slots
view. This column records the reason why the last slot synchronization was
skipped. It is primarily relevant for logical replication slots on standby
servers where the 'synced' field is true. The value is NULL when
synchronization succeeds.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
2025-11-28 05:21:35 +00:00
Michael Paquier
9660906dbd Add routines for marking buffers dirty efficiently
This commit introduces new internal bufmgr routines for marking shared
buffers as dirty:
* MarkDirtyUnpinnedBuffer()
* MarkDirtyRelUnpinnedBuffers()
* MarkDirtyAllUnpinnedBuffers()

These functions provide an efficient mechanism to respectively mark one
buffer, all the buffers of a relation, or the entire shared buffer pool
as dirty, something that can be useful to force patterns for the
checkpointer.  MarkDirtyUnpinnedBufferInternal(), an extra routine, is
used by these three, to mark as dirty an unpinned buffer.

They are intended as developer tools to manipulate buffer dirtiness in
bulk, and will be used in a follow-up commit.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com
2025-11-28 07:39:33 +09:00
Peter Eisentraut
e7075a3405 Use C11 alignas in pg_atomic_uint64 definitions
They were already using pg_attribute_aligned.  This replaces that with
alignas and moves that into the required syntactic position.  This
ends up making these three atomics implementations appear a bit more
consistent, but shouldn't change anything otherwise.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
2025-11-27 07:53:34 +01:00
David Rowley
0ca3b16973 Add parallelism support for TID Range Scans
In v14, bb437f995 added support for scanning for ranges of TIDs using a
dedicated executor node for the purpose.  Here, we allow these scans to
be parallelized.  The range of blocks to scan is divvied up similarly to
how a Parallel Seq Scan does that, where 'chunks' of blocks are
allocated to each worker and the size of those chunks is slowly reduced
down to 1 block per worker by the time we're nearing the end of the
scan.  Doing that means workers finish at roughly the same time.
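
The chunk ramp-down described above can be sketched with a small Python
model.  This is only an illustration under stated assumptions, not the
executor's C code: the name allocate_chunks, the starting chunk size of
64 blocks and the ramp_down_factor of 4 are all hypothetical.

```python
def allocate_chunks(start_block, end_block, initial_chunk=64, ramp_down_factor=4):
    """Yield (first_block, n_blocks) chunks covering [start_block, end_block).

    Hypothetical policy: halve the chunk size whenever fewer than
    chunk * ramp_down_factor blocks remain, down to a minimum of 1 block.
    """
    next_block = start_block
    chunk = initial_chunk
    while next_block < end_block:
        remaining = end_block - next_block
        # Shrink the chunk as the end of the block range approaches.
        while chunk > 1 and remaining < chunk * ramp_down_factor:
            chunk //= 2
        n_blocks = min(chunk, remaining)
        yield next_block, n_blocks
        next_block += n_blocks


if __name__ == "__main__":
    # Each worker would take the next chunk whenever it runs out of blocks.
    for first_block, n_blocks in allocate_chunks(100, 1100):
        print(first_block, n_blocks)
```

Because the last chunks are a single block each, no worker is left
holding a large chunk while the others sit idle at the end of the scan.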

Allowing TID Range Scans to be parallelized removes the dilemma from the
planner as to whether a Parallel Seq Scan will cost less than a
non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan
(disk costs are not divided by the number of workers).  It was possible
the planner could choose the Parallel Seq Scan, which would result in
reading more blocks during execution than the TID Scan would have.
Allowing Parallel TID Range Scans removes the trade-off the planner
makes when choosing between reduced CPU costs due to parallelism vs
additional I/O from the Parallel Seq Scan due to it scanning blocks from
outside of the required TID range.  There are also, of course, the
traditional parallelism performance benefits to be gained, which
likely don't need to be explained here.

Author: Cary Huang <cary.huang@highgo.ca>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca
2025-11-27 14:05:04 +13:00
David Rowley
42473b3b31 Have the planner replace COUNT(ANY) with COUNT(*), when possible
This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport
functions to receive an Aggref and allow them to determine if there is a
way that the Aggref call can be optimized.

Also added is a support function to allow transformation of COUNT(ANY)
into COUNT(*).  This is possible to do when the given "ANY" cannot be
NULL and there are no ORDER BY / DISTINCT clauses within the
Aggref.  This is a useful transformation to do as it is common that
people write COUNT(1), which until now has added unneeded overhead.
When counting a NOT NULL column, the overheads can be worse as that
might mean deforming more of the tuple, which for large fact tables may
be many columns in.
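
The conditions for the rewrite can be summarised in a few lines of
Python.  This is a sketch of the rule as stated in this message, not the
planner's support function; the CountCall class and its field names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CountCall:
    """Hypothetical stand-in for an Aggref describing a COUNT(arg) call."""
    arg_nullable: bool          # could the argument ever evaluate to NULL?
    has_order_by: bool = False  # COUNT(arg ORDER BY ...)
    has_distinct: bool = False  # COUNT(DISTINCT arg)


def can_rewrite_to_count_star(call):
    """True when COUNT(arg) would count exactly the same rows as COUNT(*)."""
    return (not call.arg_nullable
            and not call.has_order_by
            and not call.has_distinct)


if __name__ == "__main__":
    # COUNT(1): a non-NULL constant, so the rewrite applies.
    print(can_rewrite_to_count_star(CountCall(arg_nullable=False)))
    # COUNT(nullable_col): NULLs must be skipped, so it does not.
    print(can_rewrite_to_count_star(CountCall(arg_nullable=True)))
```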

It may be possible to add prosupport functions for other aggregates.  We
could consider if ORDER BY could be dropped for some calls, e.g. the
ORDER BY is quite useless in MAX(c ORDER BY c).

There is a little bit of passing fallout from adjusting
expr_is_nonnullable() to handle Const which results in a plan change in
the aggregates.out regression test.  Previously, nothing was able to
determine that "One-Time Filter: (100 IS NOT NULL)" was always true,
therefore useless to include in the plan.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com
2025-11-27 10:43:28 +13:00
Jeff Davis
8d299052fe Add #define for UNICODE_CASEMAP_BUFSZ.
Useful for mapping a single codepoint at a time into a
statically-allocated buffer.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
2025-11-26 10:05:11 -08:00
Jeff Davis
ec4997a9d7 Inline pg_ascii_tolower() and pg_ascii_toupper().
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
2025-11-26 10:04:32 -08:00
Peter Eisentraut
8fe4aef829 Replace internal C function pg_hypot() by standard hypot()
The code comment said, "It is expected that this routine will
eventually be replaced with the C99 hypot() function.", so let's do
that now.
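
For context, a hypot() routine exists because the textbook formula can
overflow in the intermediate squares even when the final result is
representable.  The sketch below uses Python's math.hypot to show the
difference; naive_hypot is a hypothetical helper, and nothing here is
taken from the removed pg_hypot() code.

```python
import math


def naive_hypot(x, y):
    # Squaring first can overflow to infinity even when the true
    # result fits comfortably in a double.
    return math.sqrt(x * x + y * y)


if __name__ == "__main__":
    big_x, big_y = 3e200, 4e200
    print(naive_hypot(big_x, big_y))   # intermediate squares overflow
    print(math.hypot(big_x, big_y))    # stays finite
```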

This function is tested via the geometry regression test, so if it is
faulty on any platform, it will show up there.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org
2025-11-26 07:48:29 +01:00
Amit Kapila
76b78721ca Add slotsync skip statistics.
This patch adds two new columns to the pg_stat_replication_slots view:
slotsync_skip_count - the total number of times a slotsync operation was
skipped.
slotsync_skip_at - the timestamp of the most recent skip.

These additions provide better visibility into replication slot
synchronization behavior.

A future patch will introduce the slotsync_skip_reason column in
pg_replication_slots to capture the reason for skip.

Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com
2025-11-25 07:06:02 +00:00
Michael Paquier
ed823da128 Rename routines for write/read of pgstats file
This commit renames write_chunk and read_chunk to respectively
pgstat_write_chunk() and pgstat_read_chunk(), along with the *_s
convenience macros.

These are made available for plug-ins, so that any code that decides to
write and/or read stats data can rely on a single code path for this
work.

Extracted from a larger patch by the same author.

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
2025-11-25 10:55:40 +09:00
Tom Lane
698fa924b1 Improve detection of implicitly-temporary views.
We've long had a practice of making views temporary by default if they
reference any temporary tables.  However the implementation was pretty
incomplete, in that it only searched for RangeTblEntry references to
temp relations.  Uses of temporary types, regclass constants, etc
were not detected even though the dependency mechanism considers them
grounds for dropping the view.  Thus a view not believed to be temp
could silently go away at session exit anyhow.

To improve matters, replace the ad-hoc isQueryUsingTempRelation()
logic with use of the dependency-based infrastructure introduced by
commit 572c40ba9.  This is complete by definition, and it's less code
overall.

While we're at it, we can also extend the warning NOTICE (or ERROR
in the case of a materialized view) to mention one of the temp
objects motivating the classification of the view as temp, as was
done for functions in 572c40ba9.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de
2025-11-24 17:00:16 -05:00
Jacob Champion
0664aa4ff8 Reorganize pqcomm.h a bit
Group the PG_PROTOCOL() codes, add a comment to AuthRequest now that the
AUTH_REQ codes live in a different header, and make some small
adjustments to spacing and comment style for the sake of scannability.

Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAOYmi%2B%3D6zg4oXXOQtifrVao_YKiujTDa3u6bxnU08r0FsSig4g%40mail.gmail.com
2025-11-24 10:01:30 -08:00
Jacob Champion
8934f2136c Add pg_add_size_overflow() and friends
Commit 600086f47 added (several bespoke copies of) size_t addition with
overflow checks to libpq. Move this to common/int.h, along with
its subtraction and multiplication counterparts.
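
The idea behind overflow-checked size_t arithmetic can be modelled in
Python as follows.  This is a sketch only: it assumes a 64-bit size_t,
and the function names and the (overflowed, result) return convention
are hypothetical rather than the C signatures in common/int.h.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def add_size_overflow(a, b):
    """Return (overflowed, result) for a + b in size_t arithmetic."""
    if a > SIZE_MAX - b:
        return True, None
    return False, a + b


def sub_size_overflow(a, b):
    """Return (overflowed, result) for a - b; size_t cannot go below zero."""
    if b > a:
        return True, None
    return False, a - b


def mul_size_overflow(a, b):
    """Return (overflowed, result) for a * b in size_t arithmetic."""
    if a != 0 and b > SIZE_MAX // a:
        return True, None
    return False, a * b


if __name__ == "__main__":
    print(add_size_overflow(SIZE_MAX, 1))
    print(mul_size_overflow(2**32, 2**32 - 1))
```

Each check is made before the operation, using only values that are
themselves in range, which mirrors how such checks have to be written
when the arithmetic itself would wrap.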

pg_neg_size_overflow() is intentionally omitted; I'm not sure we should
add SSIZE_MAX to win32_port.h for the sake of a function with no
callers.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOYmi%2B%3D%2BpqUd2MUitvgW1pAJuXgG_TKCVc3_Ek7pe8z9nkf%2BAg%40mail.gmail.com
2025-11-24 09:59:38 -08:00
Peter Eisentraut
d4c0f91f7d C11 alignas instead of unions -- extended alignments
This replaces some uses of pg_attribute_aligned() with the standard
alignas() for cases where extended alignment (larger than max_align_t)
is required.

This patch stipulates that all supported compilers must support
alignments up to PG_IO_ALIGN_SIZE, but that seems pretty likely.

We can then also desupport the case where direct I/O is disabled
because pg_attribute_aligned is not supported.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
2025-11-24 07:39:37 +01:00
David Rowley
07d1dc3aeb Fix incorrect IndexOptInfo header comment
The comment incorrectly indicated that indexcollations[] stored
collations for both key columns and INCLUDE columns, but in reality it
only has elements for the key columns.  canreturn[] didn't get a mention,
so add that while we're here.

Author: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3LwbZgMKOQ9CmZarX5DEipKivdHp5PZMOO-riL0w%3DL%3D4A%40mail.gmail.com
Backpatch-through: 14
2025-11-24 17:00:01 +13:00
Tom Lane
572c40ba94 Issue a NOTICE if a created function depends on any temp objects.
We don't have an official concept of temporary functions.  (You can
make one explicitly in pg_temp, but then you have to explicitly
schema-qualify it on every call.)  However, until now we were quite
laissez-faire about whether a non-temporary function could depend on
a temporary object, such as a temp table or view.  If one does,
it will silently go away at end of session, due to the automatic
DROP ... CASCADE on the session's temporary objects.  People have
complained that that's surprising; however, we can't really forbid
it because other people (including our own regression tests) rely
on being able to do it.  Let's compromise by emitting a NOTICE
at CREATE FUNCTION time.  This is somewhat comparable to our
ancient practice of emitting a NOTICE when forcing a view to
become temp because it depends on temp tables.

Along the way, refactor recordDependencyOnExpr() so that the
dependencies of an expression can be combined with other
dependencies, instead of being emitted separately and perhaps
duplicatively.

We should probably make the implementation of temp-by-default
views use the same infrastructure used here, but that's for
another patch.  It's unclear whether there are any other object
classes that deserve similar treatment.

Author: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de
2025-11-23 15:02:55 -05:00
Tom Lane
b140c8d7a3 Add SupportRequestInlineInFrom planner support request.
This request allows a support function to replace a function call
appearing in FROM (typically a set-returning function) with an
equivalent SELECT subquery.  The subquery will then be subject
to the planner's usual optimizations, potentially allowing a much
better plan to be generated.  While the planner has long done this
automatically for simple SQL-language functions, it's now possible
for extensions to do it for functions outside that group.
Notably, this could be useful for functions that are presently
implemented in PL/pgSQL and work by generating and then EXECUTE'ing
a SQL query.

Author: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com
2025-11-22 19:33:34 -05:00
Peter Eisentraut
5eed8ce50c Add range_minus_multi and multirange_minus_multi functions
The existing range_minus function raises an exception when the range is
"split", because then the result can't be represented by a single range.
For example '[0,10)'::int4range - '[4,5)' would be '[0,4)' and '[5,10)'.

This commit adds new set-returning functions so that callers can get
results even in the case of splits. There is no risk of an exception for
multiranges, but a set-returning function lets us handle them the same
way we handle ranges.

Both functions return zero results if the subtraction would give an
empty range/multirange.
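
The behaviour described above can be illustrated with a small Python
model over half-open integer ranges [lo, hi).  It is a sketch of the
semantics only, not the C implementation; representing a range as a
(lo, hi) tuple is an assumption made for the example.

```python
def range_minus_multi(r, s):
    """Return the non-empty pieces of range r that are not covered by s.

    Ranges are half-open (lo, hi) tuples; lo >= hi means "empty".
    """
    lo, hi = r
    s_lo, s_hi = s
    if lo >= hi:
        return []        # nothing to subtract from
    if s_lo >= s_hi:
        return [r]       # subtracting an empty range changes nothing
    pieces = []
    left = (lo, min(hi, s_lo))     # part of r before s starts
    right = (max(lo, s_hi), hi)    # part of r after s ends
    for piece in (left, right):
        if piece[0] < piece[1]:
            pieces.append(piece)
    return pieces


if __name__ == "__main__":
    # The "split" case from above: two results instead of an exception.
    print(range_minus_multi((0, 10), (4, 5)))
    # Full coverage: zero results.
    print(range_minus_multi((0, 10), (0, 10)))
```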

The main use-case for these functions is to implement UPDATE/DELETE FOR
PORTION OF, which must compute the application-time of "temporal
leftovers": the part of history in an updated/deleted row that was not
changed. To preserve the untouched history, we will implicitly insert
one record for each result returned by range/multirange_minus_multi.
Using a set-returning function will also let us support user-defined
types for application-time update/delete in the future.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com
2025-11-22 09:42:03 +01:00
Peter Eisentraut
97e04c74be C11 alignas instead of unions
This changes a few union members that only existed to ensure
alignments and replaces them with the C11 alignas specifier.

This change only uses fundamental alignments (meaning approximately
alignments of basic types), which all C11 compilers must support.
There are opportunities for similar changes using extended alignments,
for example in PGIOAlignedBlock, but these are not necessarily
supported by all compilers, so they are kept as a separate change.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
2025-11-21 10:08:24 +01:00
Melanie Plageman
1937ed7062 Refactor heap_page_prune_and_freeze() parameters into a struct
heap_page_prune_and_freeze() had accumulated an unwieldy number of input
parameters and upcoming work to handle VM updates in this function will
add even more.

Introduce a new PruneFreezeParams struct to group the function’s input
parameters, improving readability and maintainability.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7
2025-11-20 10:32:14 -05:00
Peter Eisentraut
300c8f5324 Add <stdalign.h> to c.h
This allows using the C11 constructs alignas and alignof (not done in
this patch).

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org
2025-11-19 08:18:25 +01:00
Alexander Korotkov
75e82b2f5a Optimize shared memory usage for WaitLSNProcInfo
We need separate pairing heaps for different WaitLSNType's, because there
might be waiters for different LSN's at the same time.  However, one process
can wait only for one type of LSN at a time.  So, no need for inHeap
and heapNode fields to be arrays.

Discussion: https://postgr.es/m/CAPpHfdsBR-7sDtXFJ1qpJtKiohfGoj%3DvqzKVjWxtWsWidx7G_A%40mail.gmail.com
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-18 09:50:12 +02:00
Amit Kapila
3edaf29fa5 Rename two columns in pg_stat_subscription_stats.
This patch renames the sync_error_count column to sync_table_error_count
in the pg_stat_subscription_stats view. The new name makes the purpose
explicit now that a separate column exists to track sequence
synchronization errors.

Additionally, the column seq_sync_error_count is renamed to
sync_seq_error_count to maintain a consistent naming pattern, making it
easier for users to group and query synchronization-related counters.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CALDaNm3WwJmz=-4ybTkhniB-Nf3qmFG9Zx1uKjyLLoPF5NYYXA@mail.gmail.com
2025-11-18 03:58:55 +00:00
Michael Paquier
e76defbcf0 Rework output format of pg_dependencies
The existing format of pg_dependencies uses a single-object JSON
structure, with each key value embedding all the knowledge about the
set attributes tracked, like:
{"1 => 5": 1.000000, "5 => 1": 0.423130}

While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object, particularly when
tracking multiple attributes.

The new output format introduced in this commit is a JSON array of
objects, with:
- A key named "degree", with a float value.
- A key named "attributes", with an array of attribute numbers.
- A key named "dependency", with an attribute number.

The values use the same underlying type as previously when printed, with
a new output format that shows now as follows:
[{"degree": 1.000000, "attributes": [1], "dependency": 5},
 {"degree": 0.423130, "attributes": [5], "dependency": 1}]

This new format will become handy for a follow-up set of changes, so that
it becomes possible to inject extended statistics rather than require an
ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new
cluster.
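
As an illustration of why the array-of-objects format is easier to
manipulate, the sample value shown above can be filtered with ordinary
JSON tooling.  This is a Python sketch; the helper strong_dependencies
and its 0.9 threshold are hypothetical.

```python
import json

# The sample value from this commit message, in the new format.
NEW_FORMAT = """
[{"degree": 1.000000, "attributes": [1], "dependency": 5},
 {"degree": 0.423130, "attributes": [5], "dependency": 1}]
"""


def strong_dependencies(text, min_degree=0.9):
    """Return (attributes, dependency) pairs whose degree is at least min_degree."""
    return [
        (tuple(item["attributes"]), item["dependency"])
        for item in json.loads(text)
        if item["degree"] >= min_degree
    ]


if __name__ == "__main__":
    print(strong_dependencies(NEW_FORMAT))
```

With the old single-object format, the same query would first have to
parse attribute numbers out of key strings such as "1 => 5".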

This format has been suggested by Tomas Vondra.  The key names are
defined in the header introduced by 1f927cce44, to ease the
integration of frontend-specific changes that are still under
discussion.  (Again a personal note: if anybody comes up with better
names for the keys, of course feel free.)

The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.

Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2025-11-17 10:44:26 +09:00
Michael Paquier
1f927cce44 Rework output format of pg_ndistinct
The existing format of pg_ndistinct uses a single-object JSON structure
where each key is itself a comma-separated list of attnums, like:
{"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}

While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object.

The new output format introduced in this commit is an array of objects,
with:
- A key named "attributes", that contains an array of attribute numbers.
- A key named "ndistinct", represented as an integer.

The values use the same underlying type as previously when printed, with
a new output format that shows now as follows:
[{"ndistinct": 11, "attributes": [3,4]},
 {"ndistinct": 11, "attributes": [3,6]},
 {"ndistinct": 11, "attributes": [4,6]},
 {"ndistinct": 11, "attributes": [3,4,6]}]

This new format will become handy for a follow-up set of changes, so that
it becomes possible to inject extended statistics rather than require an
ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new
cluster.
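
The relationship between the two formats can be made concrete with a
short Python converter, using the sample values shown above.  This is a
sketch for illustration; convert_ndistinct is a hypothetical helper and
not part of PostgreSQL.

```python
import json

# The sample value from this commit message, in the old format.
OLD_FORMAT = '{"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}'


def convert_ndistinct(old_text):
    """Convert the old single-object format to the new array-of-objects format."""
    return [
        {
            "ndistinct": ndistinct,
            # Old keys are comma-separated attribute numbers, e.g. "3, 4".
            "attributes": [int(attnum) for attnum in key.split(",")],
        }
        for key, ndistinct in json.loads(old_text).items()
    ]


if __name__ == "__main__":
    print(json.dumps(convert_ndistinct(OLD_FORMAT)))
```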

This format has been suggested by Tomas Vondra.  The key names are
defined in a new header, to ease the integration of
frontend-specific changes that are still under discussion.  (Personal
note: I am not specifically wedded to these key names, but if there are
better name suggestions for this release, feel free.)

The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.

Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com
2025-11-17 09:52:20 +09:00
David Rowley
586d63214e Adjust MemSet macro to use size_t rather than long
Likewise for MemSetAligned.

"long" wasn't the most suitable type for these macros as with MSVC in
64-bit builds, sizeof(long) == 4, which is narrower than the processor's
word size, therefore these macros had to perform twice as many loops as
they otherwise might.
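
The "twice as many loops" figure follows directly from the word size.
A back-of-the-envelope Python model, where word_stores is a hypothetical
helper and the 1024-byte length is just an example:

```python
def word_stores(nbytes, word_size):
    """Number of word-sized stores needed to zero nbytes.

    Assumes nbytes is a multiple of word_size, as an aligned
    word-at-a-time loop would require.
    """
    assert nbytes % word_size == 0
    return nbytes // word_size


if __name__ == "__main__":
    print(word_stores(1024, 4))  # 4-byte long, as with MSVC on 64-bit
    print(word_stores(1024, 8))  # 8-byte size_t on the same platform
```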

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com
2025-11-17 12:27:00 +13:00
David Rowley
9c047da51f Get rid of long datatype in CATCACHE_STATS enabled builds
"long" is 32 bits on Windows 64-bit.  Switch to a datatype that's 64-bit
on all platforms.  While we're there, use an unsigned type as these
fields count things that have occurred, so they can never be
negative.

Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com
2025-11-17 12:26:41 +13:00
Alexander Korotkov
23792d7381 Fix incorrect function name in comments
Update comments to reference WaitForLSN() instead of the outdated
WaitForLSNReplay() function name.

Discussion: https://postgr.es/m/CABPTF7UieOYbOgH3EnQCasaqcT1T4N6V2wammwrWCohQTnD_Lw%40mail.gmail.com
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-11-15 12:27:42 +02:00
Michael Paquier
84fb27511d Replace off_t by pgoff_t in I/O routines
PostgreSQL's Windows port has never been able to handle files larger
than 2GB due to the use of off_t for file offsets, only 32-bit on
Windows.  This causes signed integer overflow at exactly 2^31 bytes when
trying to handle files larger than 2GB, for the routines touched by this
commit.
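
The 2^31 boundary can be demonstrated with fixed-width integers.  The
Python sketch below uses ctypes to model a 32-bit and a 64-bit offset;
it shows the typical wraparound result only (signed overflow is
undefined behaviour in C), and the helper names are hypothetical.

```python
import ctypes


def as_int32(value):
    """Model storing a file offset in a 32-bit signed integer."""
    return ctypes.c_int32(value).value


def as_int64(value):
    """Model storing a file offset in a 64-bit signed integer."""
    return ctypes.c_int64(value).value


if __name__ == "__main__":
    two_gb = 2**31
    print(as_int32(two_gb - 1))  # last offset that still fits in 32 bits
    print(as_int32(two_gb))      # wraps to a negative offset
    print(as_int64(two_gb))      # fits without trouble
```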

Note that large files are forbidden by ./configure (3c6248a828) and
meson (recent change, see 79cd66f28c).  This restriction also exists
in v16 and older versions for the now-dead MSVC scripts.

The code base already defines pgoff_t as __int64 (64-bit) on Windows for
this purpose, and some function declarations in headers use it, but many
internals still rely on off_t.  This commit switches more routines to
use pgoff_t, offering more portability, for areas mainly related to file
extensions and storage.

These are not critical for WAL segments yet, which currently have a
maximum size allowed of 1GB (well, this opens the door to allowing a
larger size for them).  This matters more for segment files if we want
to lift the large file restriction in ./configure and meson in the
future, which would make sense to remove once/if all traces of off_t are
gone from the tree.  This can additionally matter for out-of-core code
that may want files larger than 2GB in places where off_t is four bytes
in size.

Note that off_t is still used in other parts of the tree like
buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc.
These other code paths can be addressed separately, and their update
will be required if we want to remove the large file restriction in the
future.  This commit is a good first cut in itself towards more
portability, hopefully.

On Unix-like systems, pgoff_t is defined as off_t, so this change only
affects Windows behavior.

Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com
2025-11-13 12:41:40 +09:00
Heikki Linnakangas
8eeb4a0f7c Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY
The async notification queue contains the XID of the sender, and when
processing notifications we call TransactionIdDidCommit() on the
XID. But we had no safeguards to prevent the CLOG segments containing
those XIDs from being truncated away. As a result, if a backend didn't
for some reason process its notifications for a long time, or when a
new backend issued LISTEN, you could get an error like:

test=# listen c21;
ERROR:  58P01: could not access status of transaction 14279685
DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:1087

To fix, make VACUUM "freeze" the XIDs in the async notification queue
before truncating the CLOG. Old XIDs are replaced with
FrozenTransactionId or InvalidTransactionId.
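
The freezing step can be pictured with the following Python sketch.  It
is a simplified model under loud assumptions, not the actual fix: XIDs
are compared as plain integers (ignoring wraparound), and mapping
committed XIDs to FrozenTransactionId and all others to
InvalidTransactionId is an assumption about how the two values are used.
The function and parameter names are hypothetical.

```python
INVALID_XID = 0  # InvalidTransactionId
FROZEN_XID = 2   # FrozenTransactionId


def freeze_queue_xids(queue_xids, cutoff_xid, did_commit):
    """Replace XIDs older than cutoff_xid so their CLOG is no longer needed.

    did_commit is a callback that answers from the CLOG while it still exists.
    """
    frozen = []
    for xid in queue_xids:
        if xid in (INVALID_XID, FROZEN_XID) or xid >= cutoff_xid:
            frozen.append(xid)          # already frozen, or CLOG is kept
        elif did_commit(xid):
            frozen.append(FROZEN_XID)   # assumption: committed -> frozen
        else:
            frozen.append(INVALID_XID)  # assumption: otherwise -> invalid
    return frozen


if __name__ == "__main__":
    committed = {100}
    print(freeze_queue_xids([100, 150, 900], 500, lambda xid: xid in committed))
```

After this pass, a reader of the queue never has to consult CLOG pages
older than the cutoff, which is what makes truncating them safe in this
model.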

Note: This commit is not a full fix. A race condition remains, where a
backend is executing asyncQueueReadAllNotifications() and has just
made a local copy of an async SLRU page which contains old XIDs, while
vacuum concurrently truncates the CLOG covering those XIDs. When the
backend then calls TransactionIdDidCommit() on those XIDs from the
local copy, you still get the error. The next commit will fix that
remaining race condition.

This was first reported by Sergey Zhuravlev in 2021, with many other
people hitting the same issue later. Thanks to:
- Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink
  for investigating and providing reproducible test cases,
- Matheus Alcantara and Arseniy Mukhin for review and earlier proposed
  patches to fix this,
- Álvaro Herrera and Masahiko Sawada for reviews,
- Yura Sokolov aka funny-falcon for the idea of marking transactions
  as committed in the notification queue, and
- Joel Jacobson for the final patch version. I hope I didn't forget
  anyone.

Backpatch to all supported versions. I believe the bug goes back all
the way to commit d1e027221d, which introduced the SLRU-based async
notification queue.

Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org
Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com
Backpatch-through: 14
2025-11-12 20:59:36 +02:00
Álvaro Herrera
877a024902 Split out innards of pg_tablespace_location()
This creates a src/backend/catalog/pg_tablespace.c supporting file
containing a new function get_tablespace_location(), which lets the code
underlying pg_tablespace_location() be reused for other purposes.

Author: Manni Wood <manni.wood@enterprisedb.com>
Author: Nishant Sharma <nishant.sharma@enterprisedb.com>
Reviewed-by: Vaibhav Dalvi <vaibhav.dalvi@enterprisedb.com>
Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com
2025-11-12 16:39:55 +01:00
Peter Eisentraut
d2f24df19b Clean up qsort comparison function for GUC entries
guc_var_compare() is invoked from qsort() on an array of struct
config_generic, but the function accesses these directly as
strings (char *).  This relies on the name being the first field, so
this works.  But we can write this more clearly by using the struct
and then accessing the field through the struct.  Before the
reorganization of the GUC structs (commit a13833c35f), the old code
was probably more convenient, but now we can write this more clearly
and correctly.

After this change, it is no longer required that the name is the first
field in struct config_generic, so remove that comment.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/2c961fa1-14f6-44a2-985c-e30b95654e8d%40eisentraut.org
2025-11-11 07:55:10 +01:00
Heikki Linnakangas
e510378358 Bump PG_CONTROL_VERSION for commit 3e0ae46d90
Commit 3e0ae46d90 added a field to ControlFileData and bumped
CATALOG_VERSION_NO, but CATALOG_VERSION_NO is not the right version
number for ControlFileData changes. Bumping either one will force an
initdb, but PG_CONTROL_VERSION is more accurate. Bump
PG_CONTROL_VERSION now.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/1874404.1762787779@sss.pgh.pa.us
2025-11-10 19:12:43 +02:00
Nathan Bossart
5e4fcbe531 Check for CREATE privilege on the schema in CREATE STATISTICS.
This omission allowed table owners to create statistics in any
schema, potentially leading to unexpected naming conflicts.  For
ALTER TABLE commands that require re-creating statistics objects,
skip this check in case the user has since lost CREATE on the
schema.  The addition of a second parameter to CreateStatistics()
breaks ABI compatibility, but we are unaware of any impacted
third-party code.

Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Security: CVE-2025-12817
Backpatch-through: 13
2025-11-10 09:00:00 -06:00
Heikki Linnakangas
3e0ae46d90 Move SLRU_PAGES_PER_SEGMENT to pg_config_manual.h
It seems plausible that someone might want to experiment with
different values. The pressing reason though is that I'm reviewing a
patch that requires pg_upgrade to manipulate SLRU files. That patch
needs to access SLRU_PAGES_PER_SEGMENT from pg_upgrade code, and
slru.h, where SLRU_PAGES_PER_SEGMENT is currently defined, cannot be
included from frontend code. Moving it to pg_config_manual.h makes it
accessible.

Now that it's a little more likely that someone might change
SLRU_PAGES_PER_SEGMENT, add a cluster compatibility check for it.

Bump catalog version because of the new field in the control file.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://www.postgresql.org/message-id/c7a4ea90-9f7b-4953-81be-b3fcb47db057@iki.fi
2025-11-10 16:11:41 +02:00
Thomas Munro
c5d34f4a55 Fix generic read and write barriers for Clang.
generic-gcc.h maps our read and write barriers to C11 acquire and
release fences using compiler builtins, for platforms where we don't
have our own hand-rolled assembler.  This is apparently enough for GCC,
but the C11 memory model is only defined in terms of atomic accesses,
and our barriers for non-atomic, non-volatile accesses were not always
respected under Clang's stricter interpretation of the standard.

This explains the occasional breakage observed on new RISC-V + Clang
animal greenfly in lock-free PgAioHandle manipulation code containing a
repeating pattern of loads and read barriers.  The problem can also be
observed in code generated for MIPS and LoongArch, though we aren't
currently testing those with Clang, and on x86, though we use our own
assembler there.  The scariest aspect is that we use the generic version
on very common ARM systems, but it doesn't seem to reorder the relevant
code there (or we'd have debugged this long ago).

Fix by inserting an explicit compiler barrier.  It expands to an empty
assembler block declared to have memory side-effects, so registers are
flushed and reordering is prevented.  In those respects this is like the
architecture-specific assembler versions, but the compiler is still in
charge of generating the appropriate fence instruction.  Done for write
barriers on principle, though concrete problems have only been observed
with read barriers.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/d79691be-22bd-457d-9d90-18033b78c40a%40gmail.com
Backpatch-through: 13
2025-11-08 12:26:43 +13:00
Amit Kapila
f6a4c498dc Add seq_sync_error_count to subscription statistics.
This commit adds a new column, seq_sync_error_count, to the
pg_stat_subscription_stats view. This counter tracks the number of errors
encountered by the sequence synchronization worker during operation.

Since a single worker handles the synchronization of all sequences, this
value may reflect errors from multiple sequences. This addition improves
observability of sequence synchronization behavior and helps monitor
potential issues during replication.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-11-07 08:05:08 +00:00
Andres Freund
c75ebc657f bufmgr: Allow some buffer state modifications while holding header lock
Until now BufferDesc.state was not allowed to be modified while the buffer
header spinlock was held. This meant that operations like unpinning buffers
needed to use a CAS loop, waiting for the buffer header spinlock to be
released before updating.

The benefit of that restriction is that it allowed us to unlock the buffer
header spinlock with just a write barrier and an unlocked write (instead of a
full atomic operation). That was important to avoid regressions in
48354581a4. However, since then the hottest buffer header spinlock uses have
been replaced with atomic operations (in particular, the most common use of
PinBuffer_Locked(), in GetVictimBuffer() (formerly in BufferAlloc()), has been
removed in 5e89985928).

This change will make it possible, in a subsequent commit, to release buffer
pins with a single atomic-sub operation.  That was not possible as long as
state modifications were disallowed while the buffer header spinlock was
held, because an atomic-sub would not have allowed a race-free check for the
buffer header lock being held.

Using atomic-sub to unpin buffers is a nice scalability win; however, it is
not the primary motivation for this change (although it would be sufficient). The
primary motivation is that we would like to merge the buffer content lock into
BufferDesc.state, which will result in more frequent changes of the state
variable, which in some situations can cause a performance regression, due to
an increased CAS failure rate when unpinning buffers.  The regression entirely
vanishes when using atomic-sub.

Naively implementing this would require putting CAS loops in every place
modifying the buffer state while holding the buffer header lock. To avoid
that, introduce UnlockBufHdrExt(), which can set/add flags as well as the
refcount, together with releasing the lock.
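
The idea of releasing the header lock and applying flag and refcount changes
in one step can be sketched roughly as follows.  This is an assumption-laden
illustration using C11 atomics, not the real UnlockBufHdrExt(); the bit
layout and all sketch_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: top bit is the header lock, next bit a flag. */
#define SKETCH_LOCKED	(UINT32_C(1) << 31)
#define SKETCH_DIRTY	(UINT32_C(1) << 30)

/* Spin until the lock bit is acquired; returns the state seen while locked. */
static uint32_t
sketch_lock_hdr(_Atomic uint32_t *state)
{
	for (;;)
	{
		uint32_t	old = atomic_fetch_or(state, SKETCH_LOCKED);

		if (!(old & SKETCH_LOCKED))
			return old | SKETCH_LOCKED;
	}
}

/*
 * Release the lock while setting/clearing flags and adjusting the refcount.
 * A CAS loop is needed because other processes may change the state (for
 * example, unpin with an atomic subtraction) while we hold the lock.
 */
static uint32_t
sketch_unlock_hdr_ext(_Atomic uint32_t *state, uint32_t old_state,
					  uint32_t set_bits, uint32_t unset_bits,
					  int refcount_change)
{
	for (;;)
	{
		uint32_t	new_state = old_state;

		new_state &= ~SKETCH_LOCKED;
		new_state &= ~unset_bits;
		new_state |= set_bits;
		new_state += (uint32_t) refcount_change;

		/* On failure old_state is refreshed with the current value. */
		if (atomic_compare_exchange_weak(state, &old_state, new_state))
			return new_state;
	}
}

static uint32_t
sketch_buf_demo(void)
{
	_Atomic uint32_t state = 0;
	uint32_t	locked = sketch_lock_hdr(&state);

	/* Set the flag and add one pin while dropping the lock. */
	return sketch_unlock_hdr_ext(&state, locked, SKETCH_DIRTY, 0, 1);
}
```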

Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff
2025-11-06 16:42:10 -05:00
Álvaro Herrera
06edbed478 Introduce XLogRecPtrIsValid()
XLogRecPtrIsInvalid() is inconsistent with the affirmative form of
macros used for other datatypes, and leads to awkward double negatives
in a few places.  This commit introduces XLogRecPtrIsValid(), which
allows code to be written more naturally.

This patch only adds the new macro.  XLogRecPtrIsInvalid() is left in
place, and all existing callers remain untouched.  This means all
supported branches can accept hypothetical bug fixes that use the new
macro, and at the same time any code that compiled with the original
formulation will continue to silently compile just fine.
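
A minimal sketch of the difference in call sites, assuming the macros are
simple comparisons against InvalidXLogRecPtr (the typedef and definitions
below are stand-ins written for this example, and the sketch_* functions are
hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the real definitions, assumed for this sketch. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr		((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r)	((r) == InvalidXLogRecPtr)
#define XLogRecPtrIsValid(r)	((r) != InvalidXLogRecPtr)

/* Old style: a double negative at the call site. */
static bool
sketch_have_lsn_old(XLogRecPtr lsn)
{
	return !XLogRecPtrIsInvalid(lsn);
}

/* New style: the affirmative form reads naturally. */
static bool
sketch_have_lsn_new(XLogRecPtr lsn)
{
	return XLogRecPtrIsValid(lsn);
}
```

Both functions return the same result for every input; only readability
changes.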

Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
2025-11-06 19:08:29 +01:00
Heikki Linnakangas
aa9c5fd3e3 Refactor shared memory allocation for semaphores
Before commit e25626677f, spinlocks were implemented using semaphores
on some platforms (--disable-spinlocks). That made it necessary to
initialize semaphores early, before any spinlocks could be used. Now
that we don't support --disable-spinlocks anymore, we can allocate the
shared memory needed for semaphores the same way as other shared
memory structures. Since the semaphores are used only in the PGPROC
array, move the semaphore shmem size estimation and initialization
calls to ProcGlobalShmemSize() and InitProcGlobal().

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5seSZpPx-znjidVZNzdagGHOk06F+Ds88MpPUbxd1kTaA@mail.gmail.com
2025-11-06 14:45:00 +02:00
Peter Eisentraut
01a985c3c4 Re-run autoheader
Some of the changes in pg_config.h.in from commit 3853a6956c didn't
match the order that a fresh run would produce.
2025-11-06 07:37:22 +01:00
Alexander Korotkov
447aae13b0 Implement WAIT FOR command
WAIT FOR is to be used on a standby and waits for a specific WAL location
to be replayed.  This is useful when the user makes some data changes on
the primary and needs a guarantee that these changes are visible on the
standby.

WAIT FOR needs to wait without any snapshot held.  Otherwise, the snapshot
could prevent the replay of WAL records, implying a kind of self-deadlock.
This is why a separate utility command appears to be the most robust
way to implement this functionality.  It's not possible to implement this as
a function.  Previous experience shows that stored procedures also have
limitations in this respect.

Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05 11:44:13 +02:00
Alexander Korotkov
3b4e53a075 Add infrastructure for efficient LSN waiting
Implement a new facility that allows processes to wait for WAL to reach
specific LSNs, both on primary (waiting for flush) and standby (waiting
for replay) servers.

The implementation uses shared memory with per-backend information
organized into pairing heaps, allowing O(1) access to the minimum
waited LSN. This enables fast-path checks: after replaying or flushing
WAL, the startup process or WAL writer can quickly determine if any
waiters need to be awakened.

Key components:
- New xlogwait.c/h module with WaitForLSNReplay() and WaitForLSNFlush()
- Separate pairing heaps for replay and flush waiters
- WaitLSN lightweight lock for coordinating shared state
- Wait events WAIT_FOR_WAL_REPLAY and WAIT_FOR_WAL_FLUSH for monitoring

This infrastructure can be used by features that need to wait for WAL
operations to complete.

Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05 11:44:13 +02:00
Alexander Korotkov
8af3ae0d4b Add pairingheap_initialize() for shared memory usage
The existing pairingheap_allocate() uses palloc(), which allocates
from process-local memory. For shared memory use cases, the pairingheap
structure must be allocated via ShmemAlloc() or embedded in a shared
memory struct. Add pairingheap_initialize() to initialize an already-
allocated pairingheap structure in-place, enabling shared memory usage.
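
The allocate-versus-initialize split can be sketched like this.  The types
and sketch_* names are hypothetical stand-ins rather than the real
pairingheap API, and malloc() stands in for palloc():

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct SketchNode SketchNode;
typedef int (*sketch_comparator) (const SketchNode *a, const SketchNode *b,
								  void *arg);

typedef struct SketchHeap
{
	sketch_comparator compare;
	void	   *arg;
	SketchNode *root;
} SketchHeap;

/* Initialize caller-provided storage in place; no allocation happens. */
static void
sketch_heap_initialize(SketchHeap *heap, sketch_comparator compare, void *arg)
{
	heap->compare = compare;
	heap->arg = arg;
	heap->root = NULL;
}

/* Allocate from process-local memory, then reuse the initializer. */
static SketchHeap *
sketch_heap_allocate(sketch_comparator compare, void *arg)
{
	SketchHeap *heap = malloc(sizeof(SketchHeap));

	if (heap != NULL)
		sketch_heap_initialize(heap, compare, arg);
	return heap;
}

/* A struct that would live in shared memory embeds the heap directly. */
typedef struct SketchSharedState
{
	int			nwaiters;
	SketchHeap	waiters;
} SketchSharedState;

static int
sketch_heap_demo(void)
{
	static SketchSharedState shared;	/* stand-in for a shmem struct */
	SketchHeap *local = sketch_heap_allocate(NULL, NULL);
	int			ok;

	sketch_heap_initialize(&shared.waiters, NULL, NULL);
	ok = (local != NULL && local->root == NULL &&
		  shared.waiters.root == NULL);
	free(local);
	return ok;
}
```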

Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com
Author: Kartyshov Ivan <i.kartyshov@postgrespro.ru>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
2025-11-05 11:44:13 +02:00
Amit Kapila
5509055d69 Add sequence synchronization for logical replication.
This patch introduces sequence synchronization. Sequences that are synced
will have 2 states:
   - INIT (needs [re]synchronizing)
   - READY (is already synchronized)

A new sequencesync worker is launched as needed to synchronize sequences.
A single sequencesync worker is responsible for synchronizing all
sequences. It begins by retrieving the list of sequences that are flagged
for synchronization, i.e., those in the INIT state. These sequences are
then processed in batches, allowing multiple entries to be synchronized
within a single transaction. The worker fetches the current sequence
values and page LSNs from the remote publisher, updates the corresponding
sequences on the local subscriber, and finally marks each sequence as
READY upon successful synchronization.

Sequence synchronization occurs in 3 places:
1) CREATE SUBSCRIPTION
    - The command syntax remains unchanged.
    - The subscriber retrieves sequences associated with publications.
    - Published sequences are added to pg_subscription_rel with INIT
      state.
    - Initiate the sequencesync worker to synchronize all sequences.

2) ALTER SUBSCRIPTION ... REFRESH PUBLICATION
    - The command syntax remains unchanged.
    - Dropped published sequences are removed from pg_subscription_rel.
    - Newly published sequences are added to pg_subscription_rel with INIT
      state.
    - Initiate the sequencesync worker to synchronize only newly added
      sequences.

3) ALTER SUBSCRIPTION ... REFRESH SEQUENCES
    - A new command introduced for PG19 by f0b3573c3a.
    - All sequences in pg_subscription_rel are reset to INIT state.
    - Initiate the sequencesync worker to synchronize all sequences.
    - Unlike the "ALTER SUBSCRIPTION ... REFRESH PUBLICATION" command,
      addition and removal of missing sequences will not be done in this
      case.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-11-05 05:59:58 +00:00
Michael Paquier
e0ca61e7c4 Add WalRcvGetState() to retrieve the state of a WAL receiver
This has come up as a useful alternative to WalRcvStreaming(), to be
able to do sanity checks based on the state of a WAL receiver.  This
will be used in a follow-up commit.

Author: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
2025-11-04 12:57:36 +09:00
Michael Paquier
17b2d5ec75 Fix unconditional WAL receiver shutdown during stream-archive transition
Commit b4f584f9d2 (affecting v15~, later backpatched down to 13 as of
3635a0a35a) introduced an unconditional WAL receiver shutdown when
switching from streaming to archive WAL sources.  This causes problems
during a timeline switch, when a WAL receiver enters WALRCV_WAITING
state but remains alive, waiting for instructions.

The unconditional shutdown can break some monitoring scenarios as the
WAL receiver gets repeatedly terminated and re-spawned, causing
pg_stat_wal_receiver.status to show a "streaming" status instead of
"waiting", masking the fact that the WAL receiver is waiting for a new TLI
and a new LSN to be able to continue streaming.

This commit changes the WAL receiver behavior so that the shutdown becomes
conditional, with InstallXLogFileSegmentActive being always reset to
prevent the regression fixed by b4f584f9d2: only terminate the WAL
receiver when it is actively streaming (WALRCV_STREAMING,
WALRCV_STARTING, or WALRCV_RESTARTING).  When in WALRCV_WAITING state,
just reset the InstallXLogFileSegmentActive flag to allow archive
restoration without killing the process.  WALRCV_STOPPED and
WALRCV_STOPPING are not reachable states in this code path.  For the
latter, the startup process is the one in charge of setting
WALRCV_STOPPING via ShutdownWalRcv(), waiting for the WAL receiver to
reach a WALRCV_STOPPED state after switching walRcvState, so
WaitForWALToBecomeAvailable() cannot be reached while a WAL receiver is
in a WALRCV_STOPPING state.

A regression test is added to check that a WAL receiver is not stopped
on timeline jump; it fails when the fix of this commit is reverted.

Reported-by: Ryan Bird <ryanzxg@gmail.com>
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/19093-c4fff49a608f82a0@postgresql.org
Backpatch-through: 13
2025-11-04 10:47:38 +09:00
Álvaro Herrera
645cb44c54 Add \pset options for boolean value display
New \pset variables display_true and display_false allow the user to
change how true and false values are displayed.

Author: David G. Johnston <David.G.Johnston@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAKFQuwYts3vnfQ5AoKhEaKMTNMfJ443MW2kFswKwzn7fiofkrw@mail.gmail.com
Discussion: https://postgr.es/m/56308F56.8060908@joh.to
2025-11-03 17:40:39 +01:00
Tom Lane
8f29467c57 Change "long" numGroups fields to be Cardinality (i.e., double).
We've been nibbling away at removing uses of "long" for a long time,
since its width is platform-dependent.  Here's one more: change the
remaining "long" fields in Plan nodes to Cardinality, since the three
surviving examples all represent group-count estimates.  The upstream
planner code was converted to Cardinality some time ago; for example
the corresponding fields in Path nodes are type Cardinality, as are
the arguments of the make_foo_path functions.  Downstream in the
executor, it turns out that these all feed to the table-size argument
of BuildTupleHashTable.  Change that to "double" as well, and fix it
so that it safely clamps out-of-range values to the uint32 limit of
simplehash.h, as was not being done before.
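
A minimal sketch of such a clamp, assuming the estimate arrives as a double
and must fit a uint32 table size (the function name and the lower bound of 1
are assumptions made for this example, not the actual executor code):

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a planner-style double estimate to a uint32
 * entry count without undefined behavior on out-of-range input.
 */
static uint32_t
sketch_clamp_nentries(double nentries)
{
	/* Written so that NaN also falls into the lower-bound branch. */
	if (!(nentries >= 1.0))
		return 1;
	if (nentries >= (double) UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t) nentries;
}
```

Doing the comparison in double before casting matters, because converting an
out-of-range double directly to an integer type is undefined in C.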

Essentially, this is removing all the artificial datatype-dependent
limitations on these values from upstream processing, and applying
just one clamp at the moment where we're forced to do so by the
datatype choices of simplehash.h.

Also, remove BuildTupleHashTable's misguided attempt to enforce
work_mem/hash_mem_limit.  It doesn't have enough information
(particularly not the expected tuple width) to do that accurately,
and it has no real business second-guessing the caller's choice.
For all these plan types, it's really the planner's responsibility
to not choose a hashed implementation if the hashtable is expected
to exceed hash_mem_limit.  The previous patch improved the
accuracy of those estimates, and even if BuildTupleHashTable had
more information it should arrive at the same conclusions.

Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
2025-11-02 16:57:43 -05:00
Tom Lane
1ea5bdb00b Improve planner's estimates of tuple hash table sizes.
For several types of plan nodes that use TupleHashTables, the
planner estimated the expected size of the table as basically
numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)).
This is pretty far off, especially for small data widths, because
it doesn't account for the overhead of the simplehash.h hash table
nor for any per-tuple "additional space" the plan node may request.
Jeff Janes noted a case where the estimate was off by about a factor
of three, even though the obvious hazards such as inaccurate estimates
of numEntries or dataWidth didn't apply.

To improve matters, create functions provided by the relevant executor
modules that can estimate the required sizes with reasonable accuracy.
(We're still not accounting for effects like allocator padding, but
this at least gets the first-order effects correct.)

I added functions that can estimate the tuple table sizes for
nodeSetOp and nodeSubplan; these rely on an estimator for
TupleHashTables in general, and that in turn relies on one for
simplehash.h hash tables.  That feels like kind of a lot of mechanism,
but if we take any short-cuts we're violating modularity boundaries.

The other places that use TupleHashTables are nodeAgg, which took
pains to get its numbers right already, and nodeRecursiveunion.
I did not try to improve the situation for nodeRecursiveunion because
there's nothing to improve: we are not making an estimate of the hash
table size, and it wouldn't help us to do so because we have no
non-hashed alternative implementation.  On top of that, our estimate
of the number of entries to be hashed in that module is so suspect
that we'd likely often choose the wrong implementation if we did have
two ways to do it.

Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
2025-11-02 16:57:26 -05:00
Tom Lane
645c1e2752 Avoid mixing void and integer in a conditional expression.
The C standard says that the second and third arguments of a
conditional operator shall be both void type or both not-void
type.  The Windows version of INTERRUPTS_PENDING_CONDITION()
got this wrong.  It's pretty harmless because the result of
the operator is ignored anyway, but apparently recent versions
of MSVC have started issuing a warning about it.  Silence the
warning by casting the dummy zero to void.
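
The shape of the problem can be sketched as follows.  All sketch_* names are
hypothetical and only mimic the structure of such a macro:

```c
static int	sketch_signal_queued = 0;
static int	sketch_interrupt_pending = 0;
static int	sketch_dispatch_count = 0;

/* Returns void, like a signal-dispatch routine would. */
static void
sketch_dispatch(void)
{
	sketch_signal_queued = 0;
	sketch_dispatch_count++;
}

/*
 * Problematic form: one arm is void, the other is int.
 *
 *   (sketch_signal_queued ? sketch_dispatch() : 0, sketch_interrupt_pending)
 *
 * Fixed form: cast the dummy zero to void so both arms are void.  The comma
 * operator then yields the value of the second operand.
 */
#define SKETCH_PENDING_CONDITION() \
	(sketch_signal_queued ? sketch_dispatch() : (void) 0, \
	 sketch_interrupt_pending)

static int
sketch_cond_demo(void)
{
	sketch_signal_queued = 1;
	sketch_interrupt_pending = 1;
	if (SKETCH_PENDING_CONDITION())
		return sketch_dispatch_count;
	return -1;
}
```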

Reported-by: Christian Ullrich <chris@chrullrich.net>
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/cc4ef8db-f8dc-4347-8a22-e7ebf44c0308@chrullrich.net
Backpatch-through: 13
2025-11-02 12:30:44 -05:00
Peter Eisentraut
8a27d418f8 Mark function arguments of type "Datum *" as "const Datum *" where possible
Several functions in the codebase accept "Datum *" parameters but do
not modify the pointed-to data.  These have been updated to take
"const Datum *" instead, improving type safety and making the
interfaces clearer about their intent.  This change helps the compiler
catch accidental modifications and better documents immutability of
arguments.

Most "Datum *" parameters are paired with a "bool *isnull" parameter;
those are constified as well.

No functional behavior is changed by this patch.

Author: Chao Li <lic@highgo.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2msfT0knvzUa72ZBwu9LR_RLY4on85w2a9YpE-o2By5HQ@mail.gmail.com
2025-10-31 10:47:25 +01:00
Tom Lane
c106ef0807 Use BumpContext contexts in TupleHashTables, and do some code cleanup.
For all extant uses of TupleHashTables, execGrouping.c itself does
nothing with the "tablecxt" except to allocate new hash entries in it,
and the callers do nothing with it except to reset the whole context.
So this is an ideal use-case for a BumpContext, and the hash tables
are frequently big enough for the savings to be significant.

(Commit cc721c459 already taught nodeAgg.c this idea, but neglected
the other callers of BuildTupleHashTable.)

While at it, let's clean up some ill-advised leftovers from rebasing
TupleHashTables on simplehash.h:

* Many comments and variable names were based on the idea that the
tablecxt holds the whole TupleHashTable, whereas now it only holds the
hashed tuples (plus any caller-defined "additional storage").  Rename
to names like tuplescxt and tuplesContext, and adjust the comments.
Also adjust the memory context names to be like "<Foo> hashed tuples".

* Make ResetTupleHashTable() reset the tuplescxt rather than relying
on the caller to do so; that was fairly bizarre and seems like a
recipe for leaks.  This is less efficient in the case where nodeAgg.c
uses the same tuplescxt for several different hashtables, but only
microscopically so because mcxt.c will short-circuit the extra resets
via its isReset flag.  I judge the extra safety and intellectual
cleanliness well worth those few cycles.

* Remove the long-obsolete "allow_jit" check added by ac88807f9;
instead, just Assert that metacxt and tuplescxt are different.
We need that anyway for this definition of ResetTupleHashTable() to
be safe.

There is a side issue of the extent to which this change invalidates
the planner's estimates of hashtable memory consumption.  However,
those estimates are already pretty bad, so improving them seems like
it can be a separate project.  This change is useful to do first to
establish consistent executor behavior that the planner can expect.

A loose end not addressed here is that the "entrysize" calculation
in BuildTupleHashTable seems wrong: "sizeof(TupleHashEntryData) +
additionalsize" corresponds neither to the size of the simplehash
entries nor to the total space needed per tuple.  It's questionable
why BuildTupleHashTable is second-guessing its caller's nbuckets
choice at all, since the original source of the number should have had
more information.  But that all seems wrapped up with the planner's
estimation logic, so let's leave it for the planned followup patch.

Reported-by: Jeff Janes <jeff.janes@gmail.com>
Reported-by: David Rowley <dgrowleyml@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
Discussion: https://postgr.es/m/2268409.1761512111@sss.pgh.pa.us
2025-10-30 11:21:22 -04:00
Peter Eisentraut
e1ac846f3d Mark ItemPointer arguments as const throughout
This is a follow-up to 991295f.  I searched over src/ and made
ItemPointer arguments const wherever possible.

Note: We cut out from the original patch the pieces that would have
created incompatibilities in the index or table AM APIs.  Those could
be considered separately.

Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw%40mail.gmail.com
2025-10-30 14:12:06 +01:00
Peter Eisentraut
8ce795fcb7 Fix some confusing uses of const
There are a few places where we have

    typedef struct FooData { ... } FooData;
    typedef FooData *Foo;

and then function declarations with

    bar(const Foo x)

which isn't incorrect but probably meant

    bar(const FooData *x)

meaning that the thing x points to is immutable, not x itself.

This patch makes those changes where appropriate.  In one
case (execGrouping.c), the thing being pointed to was not immutable,
so in that case remove the const altogether, to avoid further
confusion.
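
To make the distinction concrete, here is a small self-contained sketch (the
field name and the function names are made up for this example):

```c
typedef struct FooData
{
	int			value;
} FooData;
typedef FooData *Foo;

/*
 * "const Foo x" makes the pointer x itself const, not the struct it points
 * to, so writing through x compiles without complaint.
 */
static void
bar_const_pointer(const Foo x)
{
	x->value = 42;
}

/*
 * "const FooData *x" makes the pointed-to struct read-only; an assignment
 * to x->value here would be rejected by the compiler.
 */
static int
bar_const_pointee(const FooData *x)
{
	return x->value;
}

static int
sketch_const_demo(void)
{
	FooData		d = {1};

	bar_const_pointer(&d);		/* silently modifies d */
	return bar_const_pointee(&d);
}
```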

Co-authored-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAEoWx2m2E0xE8Kvbkv31ULh_E%2B5zph-WA_bEdv3UR9CLhw%2B3vg%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAEoWx2kTDz%3Db6T2xHX78vy_B_osDeCC5dcTCi9eG0vXHp5QpdQ%40mail.gmail.com
2025-10-30 11:20:04 +01:00
Peter Eisentraut
3479a0f823 const-qualify ItemPointer comparison functions
Add const qualifiers to ItemPointerEquals() and ItemPointerCompare().
This will allow further changes up the stack.  It also complements
commit aeb767ca0b, as we now have all of itemptr.h appropriately
const-qualified.

Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAEoWx2nBaypg16Z5ciHuKw66pk850RFWw9ACS2DqqJ_AkKeRsw@mail.gmail.com
2025-10-30 10:13:47 +01:00
Jeff Davis
3853a6956c Use C11 char16_t and char32_t for Unicode code points.
Reviewed-by: Tatsuo Ishii <ishii@postgresql.org>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/bedcc93d06203dfd89815b10f815ca2de8626e85.camel%40j-davis.com
2025-10-29 14:17:13 -07:00
Peter Eisentraut
a13833c35f Reorganize GUC structs
Instead of having five separate GUC structs, one for each type, with
the generic part contained in each of them, flip it around and have
one common struct, with the type-specific part as a subfield.

The very original GUC design had type-specific structs and
type-specific lists, and the membership in one of the lists defined
the type.  But now the structs themselves know the type (from the
.vartype field), and they are all loaded into a common hash table at
run time, and so this original separation no longer makes sense.  It
creates a bunch of inconsistencies in the code about whether the
type-specific or the generic struct is the primary struct, and a lot
of casting in between, which makes certain assumptions about the
struct layouts.

After the change, all these casts are gone and all the data is
accessed via normal field references.  Also, various code is
simplified because only one kind of struct needs to be processed.
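
The "one common struct with a type-specific subfield" layout can be sketched
roughly like this.  The struct, field, and function names are hypothetical
and much simpler than the real GUC definitions:

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_type
{
	SKETCH_BOOL,
	SKETCH_INT
};

/* Type-specific parts. */
struct sketch_bool
{
	bool	   *variable;
	bool		boot_val;
};

struct sketch_int
{
	int		   *variable;
	int			boot_val;
	int			min;
	int			max;
};

/* One common struct; vartype says which union member is in use. */
struct sketch_guc
{
	const char *name;
	enum sketch_type vartype;
	union
	{
		struct sketch_bool b;
		struct sketch_int i;
	}			u;
};

/* No casts between struct types: plain field references throughout. */
static void
sketch_reset(const struct sketch_guc *guc)
{
	switch (guc->vartype)
	{
		case SKETCH_BOOL:
			*guc->u.b.variable = guc->u.b.boot_val;
			break;
		case SKETCH_INT:
			*guc->u.i.variable = guc->u.i.boot_val;
			break;
	}
}

static int
sketch_guc_demo(void)
{
	bool		flag = false;
	int			limit = 7;
	struct sketch_guc gucs[] = {
		{.name = "sketch_flag", .vartype = SKETCH_BOOL, .u.b = {&flag, true}},
		{.name = "sketch_limit", .vartype = SKETCH_INT, .u.i = {&limit, 100, 1, 1000}},
	};

	for (size_t k = 0; k < sizeof(gucs) / sizeof(gucs[0]); k++)
		sketch_reset(&gucs[k]);
	return flag ? limit : -1;
}
```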

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org
2025-10-29 09:52:29 +01:00
Peter Eisentraut
f0f2c0c1ae Replace pg_restrict by standard restrict
MSVC in C11 mode supports the standard restrict qualifier, so we don't
need the workaround naming pg_restrict anymore.

Even though restrict is in C99 and should be supported by all
supported compilers, we keep the configure test and the hardcoded
redirection to __restrict, because that will also work in C++ in all
supported compilers.  (restrict is not part of the C++ standard.)

For backward compatibility for extensions, we keep a #define of
pg_restrict around, but our own code doesn't use it anymore.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/0e3d8644-c01d-4374-86ea-9f0a987981f0%40eisentraut.org
2025-10-29 07:52:58 +01:00
Peter Eisentraut
c094be259b Remove obsolete comment
The comment "type prefixes (const, signed, volatile, inline) are
handled in pg_config.h." has been mostly not true for a long time.
2025-10-29 07:32:21 +01:00
Michael Paquier
d3111cb753 Fix correctness issue with computation of FPI size in WAL stats
XLogRecordAssemble() may be called multiple times before inserting a
record in XLogInsertRecord(), and the amount of FPIs generated inside
a record whose insertion is attempted multiple times may vary.

The logic added in f9a09aa295 touched directly pgWalUsage in
XLogRecordAssemble(), meaning that it could be possible for pgWalUsage
to be incremented multiple times for a single record.  This commit
changes the code to use the same logic as the number of FPIs added to a
record, where XLogRecordAssemble() returns this information and feeds it
to XLogInsertRecord(), updating pgWalUsage only when a record is
inserted.

Reported-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/CAOzEurSiSr+rusd0GzVy8Bt30QwLTK=ugVMnF6=5WhsSrukvvw@mail.gmail.com
2025-10-29 09:13:31 +09:00
Michael Paquier
f9a09aa295 Add wal_fpi_bytes to pg_stat_wal and pg_stat_get_backend_wal()
This new counter, called "wal_fpi_bytes", tracks the total amount in
bytes of full page images (FPIs) generated in WAL.  This data becomes
available globally via pg_stat_wal, and for backend statistics via
pg_stat_get_backend_wal().

Previously, this information could only be retrieved with pg_waldump or
pg_walinspect, which may not be available depending on the environment,
and are expensive to execute.  The new counter offers hints about how much
FPIs contribute to the WAL generated, which could be a large percentage for
some workloads, as well as the effects of wal_compression or page holes.

Bump catalog version.
Bump PGSTAT_FILE_FORMAT_ID, due to the addition of wal_fpi_bytes in
PgStat_WalCounters.

Author: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOzEurQtZEAfg6P0kU3Wa-f9BWQOi0RzJEMPN56wNTOmJLmfaQ@mail.gmail.com
2025-10-28 16:21:51 +09:00
Amit Kapila
3e8e05596a Add worker type argument to logical replication worker functions.
Extend logicalrep_worker_stop, logicalrep_worker_wakeup, and
logicalrep_worker_find to accept a worker type argument. This change
enables differentiation between logical replication worker types, such as
apply workers and table sync workers. While preserving existing behavior,
it lays the groundwork for an upcoming patch to add sequence
synchronization workers.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-28 05:47:50 +00:00
Peter Eisentraut
10b5bb3bff Add some const qualifications
Add some const qualifications afforded by the previous change that
added a const qualification to PageAddItemExtended().

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
2025-10-27 09:55:59 +01:00
Peter Eisentraut
76acf4b722 Remove Item type
This type is just char * underneath, it provides no real value, no
type safety, and just makes the code one level more mysterious.  It is
more idiomatic to refer to blobs of memory by a combination of void *
and size_t, so change it to that.

Also, since this type hides the pointerness, we can't apply qualifiers
to what is pointed to, which requires some unconstify nonsense.  This
change allows fixing that.

Extension code that uses the Item type can change its code to use
void * to be backward compatible.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/c75cccf5-5709-407b-a36a-2ae6570be766@eisentraut.org
2025-10-27 09:55:59 +01:00
Peter Eisentraut
64d2b0968e Remove meaningless restrict qualifiers
The use of the restrict qualifier in casts is meaningless, so remove
it there.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/0e3d8644-c01d-4374-86ea-9f0a987981f0%40eisentraut.org
2025-10-27 08:53:09 +01:00
Jeff Davis
371a302eec Comment typo fixes: pg_wchar_t should be pg_wchar.
Reported-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA+hUKGJ5Xh0KxLYXDZuPvw1_fHX=yuzb4xxtam1Cr6TPZZ1o+w@mail.gmail.com
2025-10-26 12:31:50 -07:00
Amit Kapila
f0b3573c3a Introduce "REFRESH SEQUENCES" for subscriptions.
This patch adds support for a new SQL command:
ALTER SUBSCRIPTION ... REFRESH SEQUENCES
This command updates the sequence entries present in the
pg_subscription_rel catalog table with the INIT state to trigger
resynchronization.

In addition to the new command, the following subscription commands have
been enhanced to automatically refresh sequence mappings:
ALTER SUBSCRIPTION ... REFRESH PUBLICATION
ALTER SUBSCRIPTION ... ADD PUBLICATION
ALTER SUBSCRIPTION ... DROP PUBLICATION
ALTER SUBSCRIPTION ... SET PUBLICATION

These commands will perform the following actions:
Add newly published sequences that are not yet part of the subscription.
Remove sequences that are no longer included in the publication.

This ensures that sequence replication remains aligned with the current
state of the publication on the publisher side.

Note that the actual synchronization of sequence data/values will be
handled in a subsequent patch that introduces a dedicated sequence sync
worker.

Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-23 08:30:27 +00:00
Nathan Bossart
d10866f1fd Fix type of infomask parameter in htup_details.h functions.
Oversight in commit 34694ec888.  Since there aren't any known live
bugs related to this, no back-patch.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/aPk4u955ZPPZ_nYw%40nathan
2025-10-22 16:47:38 -05:00
Fujii Masao
f33e60a53a Make invalid primary_slot_name follow standard GUC error reporting.
Previously, if primary_slot_name was set to an invalid slot name and
the configuration file was reloaded, both the postmaster and all other
backend processes reported a WARNING. With many processes running,
this could produce a flood of duplicate messages. The problem was that
the GUC check hook for primary_slot_name reported errors at WARNING
level via ereport().

This commit changes the check hook to use GUC_check_errdetail() and
GUC_check_errhint() for error reporting. As with other GUC parameters,
this causes non-postmaster processes to log the message at DEBUG3,
so by default, only the postmaster's message appears in the log file.

Backpatch to all supported versions.

Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com
Backpatch-through: 13
2025-10-22 20:09:43 +09:00
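The reporting pattern described in f33e60a53a can be sketched in standalone C as follows. This is a mock, not server code: every demo_ name is hypothetical, the validity rule is a simplified assumption, and the real check hook uses GUC_check_errdetail() and GUC_check_errhint() from the server's GUC machinery. The idea shown is that the hook only records why a value is bad, and the caller chooses the log level by process type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log levels standing in for the server's own. */
enum demo_level
{
    DEMO_DEBUG3,
    DEMO_WARNING
};

/* Set by the check hook instead of logging directly (assumed mechanism). */
static const char *demo_check_errdetail = NULL;

/*
 * Hypothetical check hook.  Assumed rule: only lowercase letters, digits
 * and underscores are accepted; an empty name is treated as valid.
 */
static bool
demo_check_slot_name(const char *name)
{
    for (const char *p = name; *p != '\0'; p++)
    {
        bool        ok = (*p >= 'a' && *p <= 'z') ||
            (*p >= '0' && *p <= '9') ||
            *p == '_';

        if (!ok)
        {
            demo_check_errdetail = "name contains an invalid character";
            return false;
        }
    }
    return true;
}

/* The caller, not the hook, decides how loudly to report a failure. */
static enum demo_level
demo_report_level(bool is_postmaster)
{
    return is_postmaster ? DEMO_WARNING : DEMO_DEBUG3;
}
```

With this split, a reload that hits an invalid value produces one visible message from the postmaster, while the other processes log at a level that is hidden by default.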
Michael Paquier
2519fa8362 Bump catalog version for new function error_on_null()
Oversight in 2b75c38b70.  No comments.

Discussion: https://postgr.es/m/aPgu7kwiT4iGo6Ya@paquier.xyz
2025-10-22 10:08:47 +09:00
Michael Paquier
2b75c38b70 Add error_on_null(), checking if the input is the null value
This polymorphic function produces an error if the input value is
detected as being the null value; otherwise it returns the input value
unchanged.

This function can be handy, for example, in SQL function bodies, to
enforce that exactly one row was returned.

Author: Joel Jacobson <joel@compiler.org>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ece8c6d1-2ab1-45d5-ba12-8dec96fc8886@app.fastmail.com
Discussion: https://postgr.es/m/de94808d-ed58-4536-9e28-e79b09a534c7@app.fastmail.com
2025-10-22 09:55:17 +09:00
Jeff Davis
ff53907c35 Make char2wchar() static.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
2025-10-21 09:32:12 -07:00
Jeff Davis
844385d12e Remove obsolete global database_ctype_is_c.
Now that tsearch uses the database default locale, there's no need to
track the database CTYPE separately.

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
2025-10-21 09:32:04 -07:00
Álvaro Herrera
b7cc6474e9 Make smgr access for a BufferManagerRelation safer in relcache inval
Currently there's no bug, because we have no code path where we
invalidate relcache entries where it'd cause a problem.  But it's more
robust to do it this way in case we introduce such a path later, as some
Postgres forks reportedly already have.

Author: Daniil Davydov <3danissimo@gmail.com>
Reviewed-by: Stepan Neretin <slpmcf@gmail.com>
Discussion: https://postgr.es/m/CAJDiXgj3FNzAhV+jjPqxMs3jz=OgPohsoXFj_fh-L+nS+13CKQ@mail.gmail.com
2025-10-21 10:51:55 +03:00
Jeff Davis
e533524b23 Add pg_database_locale() to retrieve database default locale.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
2025-10-18 16:25:23 -07:00
Jeff Davis
67a8b49e96 Add pg_iswxdigit(), useful for tsearch.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
2025-10-18 16:25:11 -07:00
David Rowley
5c0a20003b Fix reset of incorrect hash iterator in GROUPING SETS queries
This fixes an unlikely issue when fetching GROUPING SET results from
their internally stored hash tables.  It was possible in rare cases that
the hash iterator would be set up incorrectly which could result in a
crash.

This was introduced in 4d143509c, so backpatch to v18.

Many thanks to Yuri Zamyatin for reporting and helping to debug this
issue.

Bug: #19078
Reported-by: Yuri Zamyatin <yuri@yrz.am>
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/19078-dfd62f840a2c0766@postgresql.org
Backpatch-through: 18
2025-10-18 16:07:04 +13:00
David Rowley
86d118f9a6 Englishify comment wording
Switch to using the English word here rather than using a verbified
function name.

The full word still fits within a single comment line, so it's probably
better just to use that instead of trying to shorten it, which might
cause confusion.

Author: Rafia Sabih <rafia.pghackers@gmail.com>
Discussion: https://postgr.es/m/CA+FpmFe7LnRF2NA_QfARjkSWme4mNt+Udwbh2Yb=zZm35Ji31w@mail.gmail.com
2025-10-18 12:50:14 +13:00
Masahiko Sawada
fd53065013 Remove unused data_bufsz from DecodedBkpBlock struct.
Author: Mikhail Gribkov <youzhick@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAMEv5_sxuaiAfSy1ZyN%3D7UGbHg3C10cwHhEk8nXEjiCsBVs4vQ%40mail.gmail.com
2025-10-17 11:28:54 -07:00
Peter Eisentraut
e1a912c86d Change config_generic.vartype to be initialized at compile time
Previously, this was initialized at run time so that it did not have
to be maintained by hand in guc_tables.c.  But since that table is now
generated anyway, we might as well generate this bit as well.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/8fdfb91e-60fb-44fa-8df6-f5dea47353c9@eisentraut.org
2025-10-17 10:33:54 +02:00
Nathan Bossart
812221b204 Remove partColsUpdated.
This information appears to have been unused since commit
c5b7ba4e67.  We could not find any references in third-party code,
either.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aO_CyFRpbVMtgJWM%40nathan
2025-10-16 11:31:38 -05:00
Amit Kapila
41c674d2e3 Refactor logical worker synchronization code into a separate file.
To support the upcoming addition of a sequence synchronization worker,
this patch extracts common synchronization logic shared by table sync
workers and the new sequence sync worker into a dedicated file. This
modularization improves code reuse, maintainability, and clarity in the
logical workers framework.

Author: vignesh C <vignesh21@gmail.com>
Author: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com
2025-10-16 05:10:50 +00:00
Jeff Davis
af164f31b9 Add pg_iswalpha() and related functions.
Per-character pg_locale_t APIs. Useful for tsearch parsing and
potentially other places.

Significant overlap with the regc_wc_isalpha() and related functions
in regc_pg_locale.c, but this change leaves those intact for
now.

Discussion: https://postgr.es/m/0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
2025-10-15 12:54:01 -07:00