The error message added in 379695d3cc referred to the public key being
too long. This is confusing, as it is in fact the session key included
in a PGP message that is too long. This is harmless, but let's be
precise about what is wrong.
Per offline report.
Reported-by: Zsolt Parragi <zsolt.parragi@percona.com>
Backpatch-through: 14
Commit 1e7fe06c10 changed
pg_mbstrlen_with_len() to ereport(ERROR) if the input ends in an
incomplete character. Most callers want that. text_substring() does
not. It detoasts the most bytes it could possibly need to get the
requested number of characters. For example, to extract up to 2 chars
from UTF8, it needs to detoast 8 bytes. In a string of 3-byte UTF8
chars, 8 bytes spans 2 complete chars and 1 partial char.
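For illustration (hypothetical table and column names; a sketch of the
failing pattern, not the reporter's exact query):
-- col holds 3-byte UTF8 chars; before this fix, the 8-byte slice
-- detoasted for a 2-char substring could end mid-character and raise
-- an encoding error even though the requested chars were all complete
SELECT substring(col FROM 1 FOR 2) FROM mytable;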
Fix this by replacing that pg_mbstrlen_with_len() call with a string
traversal that stops upon finding as many chars as the substring could
need. This also makes SUBSTRING() stop raising an encoding error when
the incomplete char is past the end of the substring.
This is consistent with the general philosophy of the above commit,
which was to raise errors on a just-in-time basis. Before the above
commit, SUBSTRING() never raised an encoding error.
SUBSTRING() has long detoasted enough bytes for one more char than
needed, because it did not distinguish exclusive and inclusive end
positions. For the avoidance of doubt, stop detoasting the extra char.
Back-patch to v14, like the above commit. For applications using
SUBSTRING() on non-ASCII column values, consider applying this to your
copy of any of the February 12, 2026 releases.
Reported-by: SATŌ Kentarō <ranvis@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Bug: #19406
Discussion: https://postgr.es/m/19406-9867fddddd724fca@postgresql.org
Backpatch-through: 14
The prior order caused spurious Valgrind errors. They're spurious
because the ereport(ERROR) non-local exit discards the pointer in
question. pg_mblen_cstr() ordered the checks correctly, but these other
two did not. Back-patch to v14, like commit
1e7fe06c10.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20260214053821.fa.noahmisch@microsoft.com
Backpatch-through: 14
Radix sort can be much faster than quicksort, but for our purposes it
is limited to sequences of unsigned bytes. To make tuples with other
types amenable to this technique, several features of tuple comparison
must be accounted for, i.e. the sort key must be "normalized" (see the
sketch after this list):
1. Signedness -- It's possible to modify a signed integer such that
it can be compared as unsigned. For example, a signed char has range
-128 to 127. If we cast that to unsigned char and add 128, the range
of values becomes 0 to 255 while preserving order.
2. Direction -- SQL allows specification of ASC or DESC. The
descending case is easily handled by taking the complement of the
unsigned representation.
3. NULL values -- NULLS FIRST and NULLS LAST must work correctly.
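To illustrate the first two points, here is a SQL sketch of the
mapping (the real normalization operates on Datums in C):
-- Point 1: shifting a signed range into an unsigned one keeps order
SELECT i AS signed, i + 128 AS normalized
FROM (VALUES (-128), (-1), (0), (127)) AS t(i)
ORDER BY i;
-- normalized comes out as 0, 127, 128, 255: the same relative order
-- Point 2: for DESC, compare the complement, i.e. 255 - normalized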
This commit only handles the case where datum1 is a pass-by-value
Datum (possibly abbreviated) that compares like an ordinary
integer. (Abbreviations of values of type "numeric" are a convenient
counterexample.) First, tuples are partitioned by nullness in the
correct NULL ordering. Then the NOT NULL tuples are sorted with radix
sort on datum1. For tiebreaks on subsequent sortkeys (including the
first sort key if abbreviated), we divert to the usual qsort.
With radix sort, ORDER BY queries on pre-warmed buffers are up to 2x
faster on high-cardinality inputs than with the sort specializations
added by commit 697492434, so get rid of them. It's sufficient to fall
back to qsort_tuple() for small arrays. Moderately low cardinality
inputs show more modest improvements. Our qsort is strongly optimized
for very low cardinality inputs, but radix sort is usually equal or
very close in those cases.
The changes to the regression tests are caused by under-specified sort
orders, e.g. "SELECT a, b from mytable order by a;". For unstable
sorts, such as our qsort and this in-place radix sort, there is no
guarantee of the order of "b" within each group of "a".
The implementation is taken from ska_byte_sort() (Boost licensed),
which is similar to American flag sort (an in-place radix sort) with
modifications to make it better suited for modern pipelined CPUs.
The technique of normalization described above can also be extended
to the case of multiple keys. That is left for future work (Thanks
to Peter Geoghegan for the suggestion to look into this area).
Reviewed-by: Chengpeng Yan <chengpeng_yan@outlook.com>
Reviewed-by: zengman <zengman@halodbtech.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Chao Li <li.evan.chao@gmail.com> (earlier version)
Discussion: https://postgr.es/m/CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com
Commit dfd79e2d added a TODO comment to update this paragraph
when support for PASSING was added. Commit 6185c9737c added
PASSING but missed resolving this TODO. Fix by expanding the
paragraph with a reference to PASSING.
Author: Aditya Gollamudi <adigollamudi@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/20260117051406.sx6pss4ryirn2x4v@pgs
The URL for Ditaa linked to the old Sourceforge version, which is
too old for what we need; the fork over on Github is the correct
version to use for re-generating the SVG files for the docs. The
required Ditaa version is 0.11.0, as that is when SVG support was
added. Running the version found on Sourceforge produces the error
below:
$ ditaa -E -S --svg in.txt out.txt
Unrecognized option: --svg
usage: ditaa <INPFILE> [OUTFILE] [-A] [-b <BACKGROUND>] [-d] [-E] [-e
<ENCODING>] [-h] [--help] [-o] [-r] [-S] [-s <SCALE>] [-T] [-t
<TABS>] [-v] [-W]
While there, also mention that meson rules exist for building
images.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://postgr.es/m/CAN55FZ2O-23xERF2NYcvv9DM_1c9T16y6mi3vyP=O1iuXS0ASA@mail.gmail.com
This adds an 'images' target to the meson build system in order to
be able to regenerate the images used in the docs.
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reported-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAN55FZ0c0Tcjx9=e-YibWGHa1-xmdV63p=THH4YYznz+pYcfig@mail.gmail.com
The idea is to further encourage the use of these allocation routines
across the tree, as these offer stronger type safety guarantees than
pg_malloc() & co (type cast in the result, sizeof() embedded). This set
of changes is dedicated to the pg_dump code.
Similar work was done in 31d3847a37, as one example.
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@tigerdata.com>
Discussion: https://postgr.es/m/CAHut+PvpGPDLhkHAoxw_g3jdrYxA1m16a8uagbgH3TGWSKtXNQ@mail.gmail.com
Use BackgroundPsql's published API for automatically restarting
its timer for each query, rather than manually reaching into it
to achieve the same thing.
010_tab_completion.pl's logic for this predates the invention
of BackgroundPsql (and 664d75753 missed the opportunity to
make it cleaner). 030_pager.pl copied-and-pasted the code.
Author: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1100715.1712265845@sss.pgh.pa.us
This log message was referring to conflicts, but it is about checksum
failures. The log message improved in this commit should never show up,
because pgstat_prepare_report_checksum_failure() should always be
called before pgstat_report_checksum_failures_in_db(), with a stats
entry already created in the pgstats shared hash table. The three
code paths able to report database-level checksum failures already
follow this requirement.
Oversight in b96d3c3897.
Author: Wang Peng <215722532@qq.com>
Discussion: https://postgr.es/m/tencent_9B6CD6D9D34AE28CDEADEC6188DB3BA1FE07@qq.com
Backpatch-through: 18
It's currently only used in the server, but it was placed in src/port
with the idea that it might be useful in client programs too. However,
it will currently fail to link if used in a client program, because
CHECK_FOR_INTERRUPTS() is not usable in client programs. Fix that by
wrapping it in "#ifndef FRONTEND".
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://www.postgresql.org/message-id/21cc7a48-99d9-4f69-9a3f-2c2de61ac8e5%40iki.fi
Backpatch-through: 18
The uses of these functions do not justify the level of
micro-optimization we've done and may even hurt performance in some
cases (e.g., due to using function pointers). This commit removes
all architecture-specific implementations of pg_popcount{32,64} and
converts the portable ones to inlined functions in pg_bitutils.h.
These inlined versions should produce the same code as before (but
inlined), so in theory this is a net gain for many machines. A
follow-up commit will replace the remaining loops over these
word-length popcount functions with calls to pg_popcount(), further
reducing the need for architecture-specific implementations.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
Over the past few releases, we've added a huge amount of complexity
to our popcount implementations. Commits fbe327e5b4, 79e232ca01,
8c6653516c, and 25dc485074 did some preliminary refactoring, but
many opportunities remain. In particular, if we disclaim interest
in micro-optimizing this code for 32-bit builds and in unnecessary
alignment checks on x86-64, we can remove a decent chunk of code.
I cannot find public discussion or benchmarks for the code this
commit removes, but it seems unlikely that this change will
noticeably impact performance on affected systems.
Suggested-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
This adds a new ON CONFLICT action DO SELECT [FOR UPDATE/SHARE], which
returns the pre-existing rows when conflicts are detected. The INSERT
statement must have a RETURNING clause when DO SELECT is specified.
The optional FOR UPDATE/SHARE clause allows the rows to be locked
before they are returned. As with a DO UPDATE conflict action, an
optional WHERE clause may be used to prevent rows from being selected
for return (but as with a DO UPDATE action, rows filtered out by the
WHERE clause are still locked).
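A sketch of the new syntax (hypothetical table and columns):
-- return the existing row on conflict, locking it before returning;
-- an optional WHERE clause could filter (but still lock) the rows
INSERT INTO users (id, name) VALUES (1, 'alice')
ON CONFLICT (id) DO SELECT FOR UPDATE
RETURNING id, name;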
Bumps catversion as stored rules change.
Author: Andreas Karlsson <andreas@proxel.se>
Author: Marko Tiikkaja <marko@joh.to>
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/d631b406-13b7-433e-8c0b-c6040c4b4663@Spark
Discussion: https://postgr.es/m/5fca222d-62ae-4a2f-9fcb-0eca56277094@Spark
Discussion: https://postgr.es/m/2b5db2e6-8ece-44d0-9890-f256fdca9f7e@proxel.se
Discussion: https://postgr.es/m/CAL9smLCdV-v3KgOJX3mU19FYK82N7yzqJj2HAwWX70E=P98kgQ@mail.gmail.com
Following e68b6adad9, the reason for skipping slot synchronization is
stored as a slot property. This commit removes redundant function
parameters that previously tracked this state, instead relying directly on
the slot property.
Additionally, this change centralizes the logic for skipping
synchronization when required WAL has not yet been received or flushed. By
consolidating this check, we reduce code duplication and the risk of
inconsistent state updates across different code paths.
In passing, add an assertion to ensure a slot is marked as temporary if a
consistent point has not been reached during synchronization.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/TY4PR01MB16907DD16098BE3B20486D4569463A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/CAFPTHDZAA+gWDntpa5ucqKKba41=tXmoXqN3q4rpjO9cdxgQrw@mail.gmail.com
The only place that used p_is_insert was transformAssignedExpr(),
which used it to distinguish INSERT from UPDATE when handling
indirection on assignment target columns -- see commit c1ca3a19df.
However, this information is already available to
transformAssignedExpr() via its exprKind parameter, which is always
either EXPR_KIND_INSERT_TARGET or EXPR_KIND_UPDATE_TARGET.
As noted in the commit message for c1ca3a19df, this use of
p_is_insert isn't particularly pretty, so have transformAssignedExpr()
use the exprKind parameter instead. This then allows p_is_insert to be
removed entirely, which simplifies state management in a few other
places across the parser.
Author: Viktor Holmberg <v@viktorh.net>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/badc3b4c-da73-4000-b8d3-638a6f53a769@Spark
For a LEFT JOIN, if any var from the right-hand side (RHS) is forced
to null by upper-level quals but is known to be non-null for any
matching row, the only way the upper quals can be satisfied is if the
join fails to match, producing a null-extended row. Thus, we can
treat this left join as an anti-join.
Previously, this transformation was limited to cases where the join's
own quals were strict for the var forced to null by upper qual levels.
This patch extends the logic to check table constraints, leveraging
the NOT NULL attribute information already available thanks to the
infrastructure introduced by e2debb643. If a forced-null var belongs
to the RHS and is defined as NOT NULL in the schema (and is not
nullable due to lower-level outer joins), we know that the left join
can be reduced to an anti-join.
Note that to ensure the var is not nullable by any lower-level outer
joins within the current subtree, we collect the relids of base rels
that are nullable within each subtree during the first pass of the
reduce-outer-joins process. This allows us to verify in the second
pass that a NOT NULL var is indeed safe to treat as non-nullable.
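For example (hypothetical tables, with b.aid declared NOT NULL and b
not nullable by any lower-level outer join):
-- the join qual is not strict for b.aid, so the old logic could not
-- reduce this; given the schema's NOT NULL on b.aid, the WHERE clause
-- can only pass for null-extended rows, making this an anti-join
SELECT a.* FROM a LEFT JOIN b ON a.id = b.bid WHERE b.aid IS NULL;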
Based on a proposal by Nicolas Adenis-Lamarre, but this is not the
original patch.
Suggested-by: Nicolas Adenis-Lamarre <nicolas.adenis.lamarre@gmail.com>
Author: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CACPGbctKMDP50PpRH09in+oWbHtZdahWSroRstLPOoSDKwoFsw@mail.gmail.com
If the variable's value is null, exec_stmt_return() missed filling
in estate->rettype. This is a pretty old bug, but we'd managed not
to notice because that value isn't consulted for a null result ...
unless we have to cast it to a domain. That case led to a failure
with "cache lookup failed for type 0".
The correct way to assign the data type is known by exec_eval_datum.
While we could copy-and-paste that logic, it seems like a better
idea to just invoke exec_eval_datum, as the ROW case already does.
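A minimal sketch of the failure (hypothetical names, along the lines
of the report):
CREATE DOMAIN d_int AS int;
CREATE FUNCTION f() RETURNS d_int LANGUAGE plpgsql AS $$
DECLARE v int;   -- v is NULL, and estate->rettype was left unset
BEGIN
  RETURN v;      -- formerly: cache lookup failed for type 0
END $$;
SELECT f();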
Reported-by: Pavel Stehule <pavel.stehule@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAFj8pRBT_ahexDf-zT-cyH8bMR_qcySKM8D5nv5MvTWPiatYGA@mail.gmail.com
Backpatch-through: 14
The pg_stat_activity view shows information for aux processes, but the
pg_stat_get_backend_wait_event() and
pg_stat_get_backend_wait_event_type() functions did not. To fix, call
AuxiliaryPidGetProc(pid) if BackendPidGetProc(pid) returns NULL, like
we do in pg_stat_get_activity().
In version 17 and above, it's a little silly to use those functions
when we already have the ProcNumber at hand, but it was necessary
before v17 because the backend ID was different from ProcNumber. I
have other plans for wait_event_info on master, so it doesn't seem
worth applying a different fix on different versions now.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/c0320e04-6e85-4c49-80c5-27cfb3a58108@iki.fi
Backpatch-through: 14
This commit adds a new parameter called
password_expiration_warning_threshold that controls when the server
begins emitting imminent-password-expiration warnings upon
successful password authentication. By default, this parameter is
set to 7 days, but this functionality can be disabled by setting it
to 0. This patch also introduces a new "connection warning"
infrastructure that can be reused elsewhere. For example, we may
want to warn about the use of MD5 passwords for a couple of
releases before removing MD5 password support.
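For example (a sketch, assuming the parameter accepts the usual GUC
time units and can be set via ALTER SYSTEM):
-- warn at authentication starting 14 days before password expiry;
-- setting the threshold to 0 disables the warning entirely
ALTER SYSTEM SET password_expiration_warning_threshold = '14d';
SELECT pg_reload_conf();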
Author: Gilles Darold <gilles@darold.net>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: songjinzhou <tsinghualucky912@foxmail.com>
Reviewed-by: liu xiaohui <liuxh.zj.cn@gmail.com>
Reviewed-by: Yuefei Shi <shiyuefei1004@gmail.com>
Reviewed-by: Steven Niu <niushiji@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/129bcfbf-47a6-e58a-190a-62fc21a17d03%40migops.com
The buildfarm occasionally shows a variant row order in the output
of this UPDATE ... RETURNING, implying that the preceding INSERT
dropped one of the rows into some free space within the table rather
than appending them all at the end. It's not entirely clear why that
happens some times and not other times, but we have established that
it's affected by concurrent activity in other databases of the
cluster. In any case, the behavior is not wrong; the test is at fault
for presuming that a seqscan will give deterministic row ordering.
Add an ORDER BY atop the update to stop the buildfarm noise.
The buildfarm seems to have shown this only in v18 and master
branches, but just in case the cause is older, back-patch to
all supported branches.
Discussion: https://postgr.es/m/3866274.1770743162@sss.pgh.pa.us
Backpatch-through: 14
* Remove an unused variable
* Use "default log level" consistently (instead of "generic")
* Keep the process types in alphabetical order (missed one place in the
SGML docs)
* Since the type of log_min_messages was changed from enum to string,
it is a good idea to add single quotes when printing it out. Otherwise
it fails if the user copies and pastes from the SHOW output to SET,
except in the simplest case (see the sketch after this list). Using
single quotes reduces confusion.
* Use lowercase string for the burned-in default value, to keep the same
output as previous versions.
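A sketch of the copy-and-paste hazard (the per-process syntax shown
here is illustrative):
SET log_min_messages = warning;                        -- fine unquoted
SET log_min_messages = checkpointer:debug1, warning;   -- syntax error
SET log_min_messages = 'checkpointer:debug1, warning'; -- works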
Author: Euler Taveira <euler@eulerto.com>
Author: Man Zeng <zengman@halodbtech.com>
Author: Noriyoshi Shinoda <noriyoshi.shinoda@hpe.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/202602091250.genyflm2d5dw@alvherre.pgsql
It protects the freeProcs and some other fields in ProcGlobal, so
let's move it there. It's good for cache locality to have it next to
the thing it protects, and just makes more sense anyway. I believe it
was allocated as a separate shared memory area just for historical
reasons.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/b78719db-0c54-409f-b185-b0d59261143f@iki.fi
On the INSERT page, mention that SELECT privileges are also required
for any columns mentioned in the arbiter clause, including those
referred to by the constraint, and clarify that this applies to all
forms of ON CONFLICT, not just ON CONFLICT DO UPDATE.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Viktor Holmberg <v@viktorh.net>
Discussion: https://postgr.es/m/CAEZATCXGwMQ+x00YY9XYG46T0kCajH=21QaYL9Xatz0dLKii+g@mail.gmail.com
Backpatch-through: 14
On the CREATE POLICY page, the description of per-command policies
stated that SELECT policies are applied when an INSERT has an ON
CONFLICT DO NOTHING clause. However, that is only the case if it
includes an arbiter clause, so clarify that.
While at it, also clarify the comment in the regression tests that
cover this.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Viktor Holmberg <v@viktorh.net>
Discussion: https://postgr.es/m/CAEZATCXGwMQ+x00YY9XYG46T0kCajH=21QaYL9Xatz0dLKii+g@mail.gmail.com
Backpatch-through: 14
An extension (or core code) might want to reconstruct the planner's
decisions about whether and where to perform partitionwise joins from
the final plan. To do so, it must be possible to find all of the RTIs
of partitioned tables appearing in the plan. But when an AppendPath
or MergeAppendPath pulls up child paths from a subordinate AppendPath
or MergeAppendPath, the RTIs of the subordinate path do not appear
in the final plan, making this kind of reconstruction impossible.
To avoid this, propagate the RTI sets that would have been present
in the 'apprelids' field of the subordinate Append or MergeAppend
nodes that would have been created into the surviving Append or
MergeAppend node, using a new 'child_append_relid_sets' field for
that purpose. The value of this field is a list of Bitmapsets,
because each relation whose append-list was pulled up had its own
set of RTIs: just one, if it was a partitionwise scan, or more than
one, if it was a partitionwise join. Since our goal is to see where
partitionwise joins were done, it is essential to avoid losing the
information about how the RTIs were grouped in the pulled-up
relations.
This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE)
will display the saved RTI sets.
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
This commit changes the definition of varlena to a typedef, so that it
becomes possible to remove "struct" markers from various declarations in
the code base. Historically, "struct" markers are not the project style
for variable declarations, so this update simplifies the code and makes
it more consistent across the board.
This change has an impact on the following structures, simplifying
declarations using them:
- varlena
- varatt_indirect
- varatt_external
This cleanup came up in a different patch set that played with
TOAST and varatt.h, and is independently worth doing on its own.
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aW8xvVbovdhyI4yo@paquier.xyz
An extension (or core code) might want to reconstruct the planner's
choice of join order from the final plan. To do so, it must be possible
to find all of the RTIs that were part of the join problem in that plan.
Commit adbad833f3, together with the
earlier work in 8c49a484e8, is enough to
let us match up RTIs we see in the final plan with RTIs that we see
during the planning cycle, but we still have a problem if the planner
decides to drop some RTIs out of the final plan altogether.
To fix that, when setrefs.c removes a SubqueryScan, single-child Append,
or single-child MergeAppend from the final Plan tree, record the type of
the removed node and the RTIs that the removed node would have scanned
in the final plan tree. It would be natural to record this information
on the child of the removed plan node, but that would require adding an
additional pointer field to type Plan, which seems undesirable. So,
instead, store the information in a separate list that the executor need
never consult, and use the plan_node_id to identify the plan node with
which the removed node is logically associated.
Also, update pg_overexplain to display these details.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Suppose that we're currently planning a query and, when that same
query was previously planned and executed, we learned something about
how a certain table within that query should be planned. We want to
take note when that same table is being planned during the current
planning cycle, but this is difficult to do, because the RTI of the
table from the previous plan won't necessarily be equal to the RTI
that we see during the current planning cycle. This is because each
subquery has a separate range table during planning, but these are
flattened into one range table when constructing the final plan,
changing RTIs.
Commit 8c49a484e8 allows us to match up
subqueries seen in the previous planning cycles with the subqueries
currently being planned just by comparing textual names, but that's
not quite enough to let us deduce anything about individual tables,
because we don't know where each subquery's range table appears in
the final, flattened range table.
To fix that, store a list of SubPlanRTInfo objects in the final
planned statement, each including the name of the subplan, the offset
at which it begins in the flattened range table, and whether or not
it was a dummy subplan -- if it was, some RTIs may have been dropped
from the final range table, but also there's no need to control how
a dummy subquery gets planned. The toplevel subquery has no name and
always begins at rtoffset 0, so we make no entry for it.
This commit teaches pg_overexplain's RANGE_TABLE option to make use
of this new data to display the subquery name for each range table
entry.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Commit 94f3ad3961 failed to do this
because I couldn't think of a use for the information, but this has
proven to be short-sighted. Best to fix it before this code is
officially released.
Now, the only argument to standard_planner that isn't passed to
planner_setup_hook is boundParams, but that is accessible via
glob->boundParams, and so doesn't need to be passed separately.
Discussion: https://www.postgresql.org/message-id/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com
Commit 4020b370f2 supposed that it
would be a good idea to handle testing PGS_CONSIDER_NONPARTIAL within
cost_material() to save callers the trouble, but that turns out not to be
a very good idea. One concern is that it makes cost_material() dependent
on the caller having initialized certain fields in the MaterialPath,
which is a bit awkward for materialize_finished_plan, which wants to use
a dummy path.
Another problem is that it can result in generating materialized nested
loops where the Materialize node is disabled, contrary to the intention
of joinpath.c's logic in match_unsorted_outer() and
consider_parallel_nestloop(), which aims to consider such paths only
when they would not need to be disabled. In the previous coding, it was
possible for the pgs_mask on the joinrel to have PGS_CONSIDER_NONPARTIAL
set, while the inner rel had the same bit clear. In that case, we'd
generate and then disable a Materialize path.
That seems wrong, so instead, pull up the logic to test the
PGS_CONSIDER_NONPARTIAL bit into joinpath.c, restoring the historical
behavior that either we don't generate a given materialized nested loop
in the first place, or we don't disable it.
Discussion: http://postgr.es/m/CA+TgmoawzvCoZAwFS85tE5+c8vBkqgcS8ZstQ_ohjXQ9wGT9sw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com
Two changes here:
1. Introduce a separate RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK flag to
indicate a suspected deadlock that involves a buffer pin. Previously
the startup process used the same flag for a deadlock involving just
regular locks, and to check for deadlocks involving the buffer
pin. The cases are handled separately in the startup process, but the
receiving backend had to deduce which one it was based on
HoldingBufferPinThatDelaysRecovery(). With a separate flag, the
receiver doesn't need to guess.
2. Rewrite the ProcessRecoveryConflictInterrupt() function to not rely
on fall-through between the cases of its switch statement. That was
difficult to read.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
The log messages used in this file applied too much quoting logic:
- No need for quote_identifier(), which it is fine not to use in the
context of a log entry.
- The usual project style is to group the namespace and object together
in a quoted string when mentioned in a log message. This code quoted
the namespace name and the extended statistics object name separately,
which was confusing.
Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20260210.143752.1113524465620875233.horikyota.ntt@gmail.com
This commit adds three attributes to the system view pg_stats_ext_exprs,
whose data can exist when an expression involves a range type:
range_length_histogram
range_empty_frac
range_bounds_histogram
These statistics fields have existed since 918eee0c49, and later became
viewable in pg_stats in bc3c8db8ae. This puts the definition of
pg_stats_ext_exprs on par with pg_stats.
This issue showed up during the discussion about restoring extended
statistics for expressions, which requires being able to query the
stats data to restore from the catalogs. Having access to this data
is useful on its own, without the restore part.
Some documentation and some tests are added, written by me. Corey has
authored the part in system_views.sql.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aYmCUx9VvrKiZQLL@paquier.xyz
In the spirit of 8d19d0e13, this patch teaches the planner about the
principle that NullTest with !argisrow is fully equivalent to SQL's IS
[NOT] DISTINCT FROM NULL.
The parser already performs this transformation for literal NULLs.
However, a DistinctExpr expression with one input evaluating to NULL
during planning (e.g., via const-folding of "1 + NULL" or parameter
substitution in custom plans) currently remains as a DistinctExpr
node.
This patch closes the gap for const-folded NULLs. It specifically
targets the case where one input is a constant NULL and the other is a
nullable non-constant expression. (If the other input were otherwise,
the DistinctExpr node would have already been simplified to a constant
TRUE or FALSE.)
This transformation can be beneficial because NullTest is much more
amenable to optimization than DistinctExpr, since the planner knows a
good deal about the former and next to nothing about the latter.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
The BooleanTest construct (IS [NOT] TRUE/FALSE/UNKNOWN) treats a NULL
input as the logical value "unknown". However, when the input is
proven to be non-nullable, this special handling becomes redundant.
In such cases, the construct can be simplified directly to a boolean
expression or a constant.
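For example (hypothetical table, with flag declared boolean NOT NULL):
-- with flag provably non-null, "flag IS TRUE" simplifies to just
-- "flag", and "flag IS UNKNOWN" folds to constant false
SELECT * FROM t WHERE flag IS TRUE;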
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
The IS DISTINCT FROM construct compares values acting as though NULL
were a normal data value, rather than "unknown". Semantically, "x IS
DISTINCT FROM y" yields true if the values differ or if exactly one is
NULL, and false if they are equal or both NULL. Unlike ordinary
comparison operators, it never returns NULL.
Previously, the planner only simplified this construct if all inputs
were constants, folding it to a constant boolean result. This patch
extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable. For cases
where both inputs are guaranteed not to be NULL, the expression
becomes semantically equivalent to "x <> y", and the DistinctExpr is
converted into an inequality OpExpr.
This transformation provides several benefits. It converts the
comparison into a standard operator, allowing the use of partial
indexes and constraint exclusion. Furthermore, if the clause is
negated (i.e., "IS NOT DISTINCT FROM"), it simplifies to an equality
operator. This enables the planner to generate better plans using
index scans, merge joins, hash joins, and EC-based qual deduction.
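For example (hypothetical table, with x and y both declared NOT NULL):
-- "x IS DISTINCT FROM NULL" folds to constant TRUE, and the clause
-- below becomes "x = y", enabling index scans, merge/hash joins, and
-- EC-based qual deduction
SELECT * FROM t WHERE x IS NOT DISTINCT FROM y;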
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
For binary upgrades from v16 or newer, pg_upgrade transfers the
files for pg_largeobject_metadata from the old cluster, as opposed
to using COPY or ordinary SQL commands to reconstruct its contents.
While this approach adds complexity, it can greatly reduce
pg_upgrade's runtime when there are many large objects.
Large objects with comments or security labels are one source of
complexity for this approach. During pg_upgrade, schema
restoration happens before files are transferred. Comments and
security labels are transferred in the former step, but the COMMENT
and SECURITY LABEL commands will fail if their corresponding large
objects do not exist. To deal with this, pg_upgrade first copies
only the rows of pg_largeobject_metadata that are needed to avoid
failures. Later, pg_upgrade overwrites those rows by replacing
pg_largeobject_metadata's files with its files in the old cluster.
Unfortunately, there's a subtle problem here. Simply put, there's
no guarantee that pg_upgrade will overwrite all of
pg_largeobject_metadata's files on the new cluster. For example,
the new cluster's version might more aggressively extend relations
or create visibility maps, and pg_upgrade's file transfer code is
not sophisticated enough to remove files that lack counterparts in
the old cluster. These extra files could cause problems
post-upgrade.
More fortunately, we can simultaneously fix the aforementioned
problem and further optimize binary upgrades for clusters with many
large objects. If we teach the COMMENT and SECURITY LABEL commands
to allow nonexistent large objects during binary upgrades,
pg_upgrade no longer needs to transfer pg_largeobject_metadata's
contents beforehand. This approach also allows us to remove the
associated dependency tracking from pg_dump, even for upgrades from
v12-v15 that use COPY to transfer pg_largeobject_metadata's
contents.
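For instance, during a binary upgrade a restored command along these
lines (hypothetical OID and comment) no longer requires the large
object's metadata row to exist yet:
-- the row arrives later, when pg_largeobject_metadata's files are
-- transferred from the old cluster
COMMENT ON LARGE OBJECT 16401 IS 'scanned contract';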
In addition to the above, this
commit modifies the query in getLOs() to only retrieve LOs with
comments or security labels for upgrades from v12 or newer. We
have long assumed that such usage is rare, so this should reduce
pg_upgrade's memory usage and runtime in many cases. We might also
be able to remove the "upgrades from v12 or newer" restriction on
the recent batch of optimizations by adding special handling for
pg_largeobject_metadata's hidden OID column on older versions
(since this catalog previously used the now-removed WITH OIDS
feature), but that is left as a future exercise.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/3yd2ss6n7xywo6pmhd7jjh3bqwgvx35bflzgv3ag4cnzfkik7m%40hiyadppqxx6w
Clean up a few leftovers from when the deadlock checker was called
from a signal handler. We stopped doing that in commit 6753333f55, in
the year 2015.
- CheckDeadLock can return its result directly to the caller;
there's no need to use a global variable for that.
- Remove outdated comments that claimed that CheckDeadLock "signals
ProcSleep".
- It should be OK to ereport() from DeadLockCheck now. I considered
getting rid of InitDeadLockChecking() and moving the workspace
allocations into DeadLockCheck, but it's still good to avoid doing
the allocations while we're holding all the partition locks. So just
update the comment to give that as the reason we do the allocations
up front.
They are not and never have been used by any known code -- apparently we
just cargo-culted them in commit 37484ad2aa (or their ancestor macros
anyway, which begat these functions in commit 34694ec888). Allegedly
they're also potentially dangerous; users are better off going through
HeapTupleSetHintBits instead.
Author: Andy Fan <zhihuifan1213@163.com>
Discussion: https://postgr.es/m/87sejogt4g.fsf@163.com
While the preceding commit prevented such attachments from occurring
in future, this one aims to prevent further abuse of any already-
created operator that exposes _int_matchsel to the wrong data types.
(No other contrib module has a vulnerable selectivity estimator.)
We need only check that the Const we've found in the query is indeed
of the type we expect (query_int), but there's a difficulty: as an
extension type, query_int doesn't have a fixed OID that we could
hard-code into the estimator.
Therefore, the bulk of this patch consists of infrastructure to let
an extension function securely look up the OID of a datatype
belonging to the same extension. (Extension authors have requested
such functionality before, so we anticipate that this code will
have additional non-security uses, and may soon be extended to allow
looking up other kinds of SQL objects.)
This is done by first finding the extension that owns the calling
function (there can be only one), and then thumbing through the
objects owned by that extension to find a type that has the desired
name. This is relatively expensive, especially for large extensions,
so a simple cache is put in front of these lookups.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
Selectivity estimators come in two flavors: those that make specific
assumptions about the data types they are working with, and those
that don't. Most of the built-in estimators are of the latter kind
and are meant to be safely attachable to any operator. If the
operator does not behave as the estimator expects, you might get a
poor estimate, but it won't crash.
However, estimators that do make datatype assumptions can malfunction
if they are attached to the wrong operator, since then the data they
get from pg_statistic may not be of the type they expect. This can
rise to the level of a security problem, even permitting arbitrary
code execution by a user who has the ability to create SQL objects.
To close this hole, establish a rule that built-in estimators are
required to protect themselves against being called on the wrong type
of data. It does not seem practical however to expect estimators in
extensions to reach a similar level of security, at least not in the
near term. Therefore, also establish a rule that superuser privilege
is required to attach a non-built-in estimator to an operator.
We expect that this restriction will have little negative impact on
extensions, since estimators generally have to be written in C and
thus superuser privilege is required to create them in the first
place.
This commit changes the privilege checks in CREATE/ALTER OPERATOR
to enforce the rule about superuser privilege, and fixes a couple
of built-in estimators that were making datatype assumptions without
sufficiently checking that they're valid.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
These data types are represented like full-fledged arrays, but
functions that deal specifically with these types assume that the
array is 1-dimensional and contains no nulls. However, there are
cast pathways that allow general oid[] or int2[] arrays to be cast
to these types, allowing these expectations to be violated. This
can be exploited to cause server memory disclosure or SIGSEGV.
Fix by installing explicit checks in functions that accept these
types.
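For illustration, a sketch of one such pathway (per the cast pathways
mentioned above):
-- a general int2[] containing a NULL can reach int2vector via casts;
-- functions consuming int2vector/oidvector now verify a 1-D,
-- null-free array before using such values
SELECT array[1, NULL]::int2[]::int2vector;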
Reported-by: Altan Birler <altan.birler@tum.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2003
Backpatch-through: 14
pgp_sym_decrypt() and pgp_pub_decrypt() will raise such errors, while
bytea variants will not. The existing "dat3" test decrypted to non-UTF8
text, so switch that query to bytea.
The long-term intent is for type "text" to always be valid in the
database encoding. pgcrypto has long been known as a source of
exceptions to that intent, but a report about exploiting invalid values
of type "text" brought this module to the forefront. This particular
exception is straightforward to fix, with reasonable effect on user
queries. Back-patch to v14 (all supported versions).
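For example (hypothetical table and key):
-- pgp_sym_decrypt() now raises an error if the plaintext is not valid
-- in the database encoding; use the bytea variant for raw byte strings
SELECT pgp_sym_decrypt_bytea(msg, 'seCr3t') FROM vault;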
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
Author: shihao zhong <zhong950419@gmail.com>
Reviewed-by: cary huang <hcary328@gmail.com>
Discussion: https://postgr.es/m/CAGRkXqRZyo0gLxPJqUsDqtWYBbgM14betsHiLRPj9mo2=z9VvA@mail.gmail.com
Backpatch-through: 14
Security: CVE-2026-2006