Add a new FDW callback routine that allows importing remote statistics
for a foreign table directly to the local server, instead of collecting
statistics locally. The new callback routine is called at the beginning
of the ANALYZE operation on the table, and if the FDW failed to import
the statistics, the existing callback routine is called on the table to
collect statistics locally.
Also implement this for postgres_fdw. It is enabled by "restore_stats"
option both at the server and table level. Currently, it is the user's
responsibility to ensure remote statistics to import are up-to-date, so
the default is false.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Etsuro Fujita <etsuro.fujita@gmail.com>
Discussion: https://postgr.es/m/CADkLM%3DchrYAx%3DX2KUcDRST4RLaRLivYDohZrkW4LLBa0iBhb5w%40mail.gmail.com
Until now extensions that wanted to measure overall query execution could
create QueryDesc->totaltime, which the core executor would then start and
stop. That's a bit odd and composes badly, e.g. extensions always had to use
INSTRUMENT_ALL, because otherwise another extension might not get what they
need.
Instead this introduces a new field, QueryDesc->query_instr_options, that
extensions can use to indicate whether they need query level instrumentation
populated, and with which instrumentation options. Extensions should take care
to only add options they need, instead of replacing the options of others.
The prior name of the field, totaltime, sounded like it would only measure
time, but these days the instrumentation infrastructure can track more
resources. The secondary benefit is that this will make it obvious to
extensions that they may not create the Instrumentation struct themselves
anymore (often extensions build only against a postgres build without
assertions).
Adjust pg_stat_statements and auto_explain to match, and lower the
requested instrumentation level for auto_explain to INSTRUMENT_TIMER,
since the summary instrumentation it needs is only runtime.
The reason to push this now, rather in the PG 20 cycle, is that 5a79e78501
already required extensions using query level instrumentations to adjust their
code, and it seemed undesirable to require them to do so again for 20.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53Pkyqsht+exJQYRsjhSWYKu+vFGHhPub7m6PmFD6Or0=p1g@mail.gmail.com
If pg_stash_advice.persist = true, stashed advice will be written to
pg_stash_advice.tsv in the data directory, periodically and at
shutdown. On restart, stash modifications are locked out until this
file has been reloaded, but queries will not be, so there may be a
short window after startup during which previously-stashed advice is
not automatically applied.
Author: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/CA+Tgmob87qsWa-VugofU6epuV0H5XjWZGMbQas4Q-ADKmvSyBg@mail.gmail.com
Use templated qsort() so that the comparison function can be
inlined. To speed up qunique(), use a specialized comparison function
that only checks for equality.
Author: David Geier <geidav.pg@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://www.postgresql.org/message-id/2a76b5ef-4b12-4023-93a1-eed6e64968f3@gmail.com
By default, the logical decoding assumes access to shared catalogs, so
the snapshot builder needs to consider cluster-wide XIDs during startup.
That in turn means that, if any transaction is already running (and has
XID assigned), the snapshot builder needs to wait for its completion, as
it does not know if that transaction performed catalog changes earlier.
A possible problem with this concept is that if REPACK (CONCURRENTLY) is
running in some database, backends running the same command in other
databases get stuck until the first one has committed. Thus only a
single backend in the cluster can run REPACK (CONCURRENTLY) at any time.
Likewise, REPACK (CONCURRENTLY) can block walsenders starting on behalf
of subscriptions throughout the cluster.
This patch adds a new option to logical replication output plugin, to
declare that it does not use shared catalogs (i.e. catalogs that can be
changed by transactions running in other databases in the cluster). In
that case, no snapshot the backend will use during the decoding needs to
contain information about transactions running in other databases. Thus
the snapshot builder only needs to wait for completion of transactions
in the current database.
Currently we only use this option in the REPACK background worker. It
could possibly be used in the plugin for logical replication too,
however that would need thorough analysis of that plugin.
Bump WAL version number, due to a new field in xl_running_xacts.
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/90475.1775218118@localhost
This commit changes the post_parse_analyze_hook_type() hook to take a
const JumbleState, to tell external modules that they are not allowed to
touch the JumbleState that has been compiled by the core code. This
fixes a pretty old problem with pg_stat_statements, that had always the
idea of modifying the lengths of the constants stored in the
JumbleState. The previous state could confuse extensions that need to
look at a JumbleState depending on the loading order, if
pg_stat_statements is part of the stack loaded.
Another piece included in this commit is the move of the routine
fill_in_constant_lengths() to queryjumblefuncs.c, to give an option to
extensions to compile the lengths of the constants, if necessary. I was
surprised by the number of external code that carries a copy of this
routine (see the thread for details). Previously, this routine modified
JumbleState. It now copies the set of LocationLens from JumbleState,
and fills the constant lengths for separate use.
pg_stat_statements is updated to use the new ComputeConstantLengths().
JumbleState is now marked with a const in the module, where relevant.
Author: Sami Imseih <samimseih@gmail.com>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Discussion: https://postgr.es/m/CAA5RZ0tZp5qU0ikZEEqJnxvdSNGh1DWv80sb-k4QAUmiMoOp_Q@mail.gmail.com
The associated value should look like something that could be
part of an EXPLAIN options list, but restricted to EXPLAIN options
added by extensions.
For example, if pg_overexplain is loaded, you could set
auto_explain.log_extension_options = 'DEBUG, RANGE_TABLE'.
You can also specify arguments to these options in the same manner
as normal e.g. 'DEBUG 1, RANGE_TABLE false'.
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
contrib/pg_stash_advice and src/test/modules/test_shmem
missed these, leading to complaints from git after an
in-tree check-world run.
Use our standard boilerplate list of ignorable subdirectories,
although the two modules presently create different subsets
of that.
It would be useful to be able to tell auto_explain to set a custom
EXPLAIN option, but it would be bad if it tried to do so and the
option name or value wasn't valid, because then every query would fail
with a complaint about the EXPLAIN option. So add a guc_check_handler
that auto_explain will be able to use to only try to set option
name/value/type combinations that have been determined to be legal,
and to emit useful messages about ones that aren't.
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CA+Tgmob-0W8306mvrJX5Urtqt1AAasu8pi4yLrZ1XfwZU-Uj1w@mail.gmail.com
This module allows plan advice strings to be provided automatically
from an in-memory advice stash. Advice stashes are stored in dynamic
shared memory and must be recreated and repopulated after a server
restart. If pg_stash_advice.stash_name is set to the name of an advice
stash, and if query identifiers are enabled, the query identifier
for each query will be looked up in the advice stash and the
associated advice string, if any, will be used each time that query
is planned.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Discussion: http://postgr.es/m/CA+TgmoaeNuHXQ60P3ZZqJLrSjP3L1KYokW9kPfGbWDyt+1t=Ng@mail.gmail.com
As part of this, embed the LWLock it needs in the shared memory struct
itself, so that we don't need to use RequestNamedLWLockTranche()
anymore. LWLockNewTrancheId() + LWLockInitialize() is more convenient
to use in extensions.
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com
Previously, different places (e.g. query "total time") were repurposing the
Instrumentation struct initially introduced for capturing per-node statistics
during execution. This overuse of the same struct is confusing, e.g. by
cluttering calls of InstrStartNode/InstrStopNode in unrelated code paths, and
prevents future refactorings.
Instead, simplify the Instrumentation struct to only track time and WAL/buffer
usage. Similarly, drop the use of InstrEndLoop outside of per-node
instrumentation - these calls were added without any apparent benefit since
the relevant fields were never read.
Introduce the NodeInstrumentation struct to carry forward the per-node
instrumentation information. WorkerInstrumentation is renamed to
WorkerNodeInstrumentation for clarity.
In passing, clarify that InstrAggNode is expected to only run after
InstrEndLoop (as it does in practice), and drop unused code.
This also fixes a consequence-less bug: Previously ->async_mode was only set
when a non-zero instrument_option was passed. That turns out to be harmless
right now, as ->async_mode only affects a timing related field.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
Introduce TriggerInstrumentation to capture trigger timing and firings
(previously counted in "ntuples"), to aid a future refactoring that
splits out all Instrumentation fields beyond timing and WAL/buffers into
more specific structs.
In passing, drop the "n" argument to InstrAlloc, as all remaining callers need
exactly one Instrumentation struct. The duplication between InstrAlloc() and
InstrInit(), as well as the conditional initialization of async_mode will be
addressed in a subsequent commit.
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/CAP53PkzdBK8VJ1fS4AZ481LgMN8f9mJiC39ZRHqkFUSYq6KWmg@mail.gmail.com
READ ONLY transactions should prevent modifications to foreign data as
well as local data, but postgres_fdw transactions declared as READ ONLY
that reference foreign tables mapped to a remote view executing volatile
functions would modify data on remote servers, as it would open remote
transactions in READ WRITE mode.
Similarly, DEFERRABLE transactions should not abort due to a
serialization failure even when accessing foreign data, but postgres_fdw
transactions declared as DEFERRABLE would abort due to that failure in a
remote server, as it would open remote transactions in NOT DEFERRABLE
mode.
To fix, modify postgres_fdw to open remote transactions in the same
access/deferrable modes as the local transaction. This commit also
modifies it to open remote subtransactions in the same access mode as
the local subtransaction.
This commit changes the behavior of READ ONLY/DEFERRABLE transactions
using postgres_fdw; in particular, it doesn't allow the READ ONLY
transactions to modify data on remote servers anymore, so such
transactions should be redeclared as READ WRITE or rewritten using other
tools like dblink. The release notes should note this as an
incompatibility.
These issues exist since the introduction of postgres_fdw, but to avoid
the incompatibility in the back branches, fix them in master only.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAPmGK16n_hcUUWuOdmeUS%2Bw4Q6dZvTEDHb%3DOP%3D5JBzo-M3QmpQ%40mail.gmail.com
Discussion: https://postgr.es/m/E1uLe9X-000zsY-2g%40gemulon.postgresql.org
Replace the separate init and max size options with a single size
option. We didn't make much use of the feature, all callers except the
ones in wait_event.c already used the same size for both, and the hash
tables in wait_event.c are small so there's little harm in just
allocating them to the max size.
The only reason why you might want to not reserve the max size upfront
is to make the memory available for other hash tables to grow beyond
their max size. Letting hash tables grow much beyond their max size is
bad for performance, however, because we cannot resize the directory,
and we never had very much "wiggle room" to grow to anyway so you
couldn't really rely on it. We recently marked the LOCK and PROCLOCK
tables with HAS_FIXED_SIZE, so there's nothing left in core that would
benefit from more unallocated shared memory.
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://www.postgresql.org/message-id/01ab1d41-3eda-4705-8bbd-af898f5007f1@iki.fi
This is an extension of the UPDATE and DELETE commands to do a
"temporal update/delete" based on a range or multirange column. The
user can say UPDATE t FOR PORTION OF valid_at FROM '2001-01-01' TO
'2002-01-01' SET ... (or likewise with DELETE) where valid_at is a
range or multirange column.
The command is automatically limited to rows overlapping the targeted
portion, and only history within those bounds is changed. If a row
represents history partly inside and partly outside the bounds, then
the command truncates the row's application time to fit within the
targeted portion, then it inserts one or more "temporal leftovers":
new rows containing all the original values, except with the
application-time column changed to only represent the untouched part
of history.
To compute the temporal leftovers that are required, we use the *_minus_multi
set-returning functions defined in 5eed8ce50c.
- Added bison support for FOR PORTION OF syntax. The bounds must be
constant, so we forbid column references, subqueries, etc. We do
accept functions like NOW().
- Added logic to executor to insert new rows for the "temporal
leftover" part of a record touched by a FOR PORTION OF query.
- Documented FOR PORTION OF.
- Added tests.
Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com
Commit 2252fcd427 modified some function prototypes in tableam.h
and heapam.h to take a VacuumParams argument instead of a pointer,
which required including vacuum.h in those headers. vacuum.h has a
reasonably large dependency tree, and headers like tableam.h are
widely included, so this is not ideal. To fix, change the
functions in question to accept a "const VacuumParams *" argument
instead. That allows us to use a forward declaration for
VacuumParams and avoid including vacuum.h. Since vacuum_rel()
needs to scribble on the params argument, we still pass it by value
to that function so that the original struct is not modified.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/rzxpxod4c4la62yvutyrvgoyilrl2fx55djaf2suidy7np5m6c%403l2ln476eadh
In addition to removing the bits8, bits16, and bits32 typedefs,
this commit replaces all uses with uint8, uint16, or uint32. bits*
provided little benefit beyond establishing the intent of the
variable, and they were inconsistently used for that purpose.
Third-party code should instead use the corresponding uint*
typedef.
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/absbX33E4eaA0Ity%40nathan
Add an AM user-settable flags parameter to several of the table scan
functions, one table AM callback, and index_beginscan(). This allows
users to pass additional context to be used when building the scan
descriptors.
For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM saves the caller-provided flags there.
This introduces an extension point for follow-up work to pass per-scan
information (such as whether the relation is read-only for the current
query) from the executor to the AM layer.
Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/2be31f17-5405-4de9-8d73-90ebc322f7d8%40vondra.me
An Append node that is part of a partitionwise aggregate has no
apprelids. If such a node was elided, the previous coding would
attempt to call unique_nonjoin_rtekind() on a NULL pointer, which
leads to an assertion failure. Insert a NULL check to prevent that.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/0afba1ce-c946-4131-972d-191d9a1c097c@gmail.com
PlannedStmt->resultRelations was an integer list of range table indexes
because at the time it was added (to Query), the Bitmapset data type did
not yet exist in Postgres.
0f4c170cf3 added a Bitmapset of result relations, so remove the integer
list of RTIs and use the more compact resultRelationRelids.
Discussion: https://postgr.es/m/CAApHDvqAOeOwCKh9g0gfxWa040%3DHyc7_oA%3DC59rjod8kXJDWyw%40mail.gmail.com
Previously, \d+ <table> displayed partitions differently from other object
lists: the first partition appeared on the same line as the "Partitions"
header. For example:
Partitions: pt12 FOR VALUES IN (1, 2),
pt34 FOR VALUES IN (3, 4)
This commit updates the output so that partitions are listed consistently
with other objects, with each entry on its own line starting below the header:
Partitions:
pt12 FOR VALUES IN (1, 2)
pt34 FOR VALUES IN (3, 4)
Author: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Neil Chen <carpenter.nail.cz@gmail.com>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu1puO00C-OhgLnAcECzww8MB3Q8DCsvx0cZWHRfs4gBQ@mail.gmail.com
An upcoming commit will make UnlockReleaseBuffer() considerably faster and
more scalable than doing LockBuffer(BUFFER_LOCK_UNLOCK); ReleaseBuffer();. But
it's a small performance benefit even as-is.
Most of the callsites changed in this patch are not performance sensitive,
however some, like the nbtree ones, are in critical paths.
This patch changes all the easily convertible places over to
UnlockReleaseBuffer() mainly because I needed to check all of them anyway, and
reducing cases where the operations are done separately makes the checking
easier.
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
The premise of src/test/modules/test_plan_advice is that if we plan
a query once, generate plan advice, and then replan it using that
same advice, all of that advice should apply cleanly, since the
settings and everything else are the same. Unfortunately, that's
not the case: the test suite is the main regression tests, and
concurrent activity can change the statistics on tables involved
in the query, especially system catalogs. That's OK as long as it
only affects costing, but in a few cases, it affects which relations
appear in the final plan at all.
In the buildfarm failures observed to date, this happens because
we consider alternative subplans for the same portion of the query;
in theory, MinMaxAggPath is vulnerable to a similar hazard. In both
cases, the planner clones an entire subquery, and the clone has a
different plan name, and therefore different range table identifiers,
than the original. If a cost change results in flipping between one
of these plans and the other, the test_plan_advice tests will fail,
because the range table identifiers to which advice was applied won't
even be present in the output of the second planning cycle.
To fix, invent a new DO_NOT_SCAN advice tag. When generating advice,
emit it for relations that should not appear in the final plan at
all, because some alternative version of that relation was used
instead. When DO_NOT_SCAN is supplied, disable all scan methods for
that relation.
To make this work, we reuse a bunch of the machinery that previously
existed for the purpose of ensuring that we build the same set of
relation identifiers during planning as we do from the final
PlannedStmt. In the process, this commit slightly weakens the
cross-check mechanism: before this commit, it would fire whenever
the pg_plan_advice module was loaded, even if pg_plan_advice wasn't
actually doing anything; now, it will only engage when we have some
other reason to create a pgpa_planner_state. The old way was complex
and didn't add much useful test coverage, so this seems like an
acceptable sacrifice.
Discussion: http://postgr.es/m/CA+TgmoYuWmN-00Ec5pY7zAcpSFQUQLbgAdVWGR9kOR-HM-fHrA@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
pg_plan_advice tracks two pieces of per-PlannerInfo data: (1) for each
RTI, the corresponding relation identifier, for purposes of
cross-checking those calculations against the final plan; and (2) the
set of semijoins seen during planning for which the strategy of making
one side unique was considered. The former is tracked using a hash
table that uses <plan_name, RTI> as the key, and the latter is
tracked using a List of <plan_name, relids>.
It seems better to track both of these things in the same way and
to try to reuse some code instead of having everything be completely
separate, so invent pgpa_planner_info; we'll create one every time we
see a new PlannerInfo and need to associate some data with it, and
we'll use the plan_name field to distinguish between PlannerInfo
objects, as it should always be unique. Then, refactor the two
systems mentioned above to use this new infrastructure.
(Note that the adjustment in pgpa_plan_walker is necessary in order
to avoid spuriously triggering the sanity check in that function,
in the case where a pgpa_planner_info is created for a purpose not
related to sj_unique_rels.)
Discussion: https://postgr.es/m/CA+TgmoaK=4w7-qknUo3QhUJ53pXZq=c=KgZmRyD+k7ytqfmgSg@mail.gmail.com
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Since storage/locktags.h was added by commit 322bab7974, many headers
can be made leaner by depending on that instead of on storage/lock.h,
which has many other dependencies.
(In fact, some of these changes were possible even before that.)
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/abvrRZo52Yx9ZzWQ@ip-10-97-1-34.eu-west-3.compute.internal
check_backtrace_functions() and check_archive_directory() were doing an
empty-string check this way:
*newval[0] == '\0'
which, because of operator precedence, is interpreted as *(newval[0])
instead of (*newval)[0] -- but these variables are pointers to C-strings
and we want to check the first character therein, rather than check the
first pointer of the array, so that interpretation is wrong. This would
be wrong for any index element other than 0, as evidenced by every other
dereference of the same variable in check_backtrace_functions, which use
parentheses.
Add parentheses to make the intended dereference explicit.
This is just cosmetic at this stage, so no backpatch, although it's been
"wrong" for a long time.
Author: Zhang Hu <kongbaik228@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/CAB5m2QssN6UO+ckr6ZCcV0A71mKUB6WdiTw1nHo43v4DTW1Dfg@mail.gmail.com
Previously, XLogFindNextRecord() did not return detailed error information
when it failed to find a valid WAL record. As a result, callers such as
the WAL summarizer, pg_waldump, and pg_walinspect could only report generic
errors (e.g., "could not find a valid record after ..."), making
troubleshooting difficult.
This commit fix the issue by extending XLogFindNextRecord() to return
detailed error information on failure, and updating its callers to include
those details in their error messages.
For example, when pg_waldump is run on a WAL file with an invalid magic number,
it now reports not only the generic error but also the specific cause
(e.g., "invalid magic number").
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Mircea Cadariu <cadariu.mircea@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqoxJXddcT4wkd9Xd+cD6Sz-fyspRGuV4Bq-wbXG4pVNzA@mail.gmail.com
Previously, we always fed SELECT ... INTO to the SPI machinery.
While that works for all cases, it's a great deal slower than
the otherwise-equivalent "var := expression" if the expression
is "simple" and the INTO target is a single variable. Users
coming from MSSQL or T_SQL are likely to be surprised by this;
they are used to writing SELECT ... INTO since there is no
"var := expression" syntax in those dialects. Hence, check for
a simple expression and use the faster code path if possible.
(Here, "simple" means whatever exec_is_simple_query accepts,
which basically means "SELECT scalar-expression" without any
input tables, aggregates, qual clauses, etc.)
This optimization is not entirely transparent. Notably, one of
the reasons it's faster is that the hooks that pg_stat_statements
uses aren't called in this path, so that the evaluated expression
no longer appears in pg_stat_statements output as it did before.
There may be some other minor behavioral changes too, although
I tried hard to make error reporting look the same. Hopefully,
none of them are significant enough to not be acceptable as
routine changes in a PG major version.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRDieSQOPDHD_svvR75875uRejS9cN87FoAC3iXMXS1saQ@mail.gmail.com
genericcostestimate() estimates the number of index leaf pages to
be visited as a pro-rata fraction of the total number of leaf pages.
Or at least that was the intention. What it actually used in the
calculation was the total number of index pages, so that non-leaf
pages were also counted. In a decent-sized index the error is
probably small, since we expect upper page fanout to be high.
But in a small index that's not true; in the worst case with one
data-bearing page plus a metapage, we had 100% relative error.
This led to surprising planning choices such as not using a small
partial index.
To fix, ask genericcostestimate's caller to supply an estimate of
the number of non-leaf pages, and subtract that. For the built-in
index AMs, it seems sufficient to count the index metapage (if the
AM uses one) as non-leaf. Per the above argument, counting upper
index pages shouldn't change the estimate much, and in most cases
we don't have any easy way of estimating the number of upper pages.
This might be an area for further research in future.
Any external genericcostestimate callers that do not set the new field
GenericCosts.numNonLeafPages will see the same behavior as before,
assuming they followed the advice to zero out that whole struct.
Unsurprisingly, this change affects a number of plans seen in the
core regression tests. I hacked up the existing tests to keep the
tests' plans the same, since in each case it appeared that the
test's intent was to test exactly that plan. Also add one new
test case demonstrating that a better index choice is now made.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://postgr.es/m/870521.1745860752@sss.pgh.pa.us
The second half of this file is meant to test feedback, not
generated advice, and is meant to use the statements that it
prepares, not leftover prepared statements from earlier in the
file.
These mistakes resulted in failures under debug_discard_caches = 1,
because re-executing pt2 instead of executing pt4 for the first
time resulted in different output depending on whether the query
was replanned.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (per BF member avocet)
The previous code could allocate pgpa_sj_unique_rel objects in a context
that had too short a lifespan. Fix by allocating them (and any
associated List-related allocations) in the same context as the
pgpa_planner_state to which they are attached. We also need to copy
uniquerel->relids, because the associated RelOptInfo may also be
allocated within a short-lived context.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/a6e6d603-e847-44dc-acd5-879fb4570062@gmail.com
The Makefile failed to set HEADERS_pg_plan_advice, so the header wasn't
installed. Fixing that reveals another problem: since this is just a
loadable module, not an extension, the header file is installed into
$(includedir_server)/contrib rather than $(includedir_server)/extension.
While we have no existing cases of installing header files there, it
appears to be the intent of pgxs.mk. However, this is inconsistent with
meson.build, which was using dir_include_extension. Changing that to
dir_include_server / 'contrib' makes the install locations consistent
across the two builds.
Author: Zsolt Parragi <zsolt.parragi@percona.com>
Discussion: http://postgr.es/m/CAN4CZFP6NOjv__4Mx+iQD8StdpbHvzDAatEQn2n15UKJ=MySSQ@mail.gmail.com
TOK_IDENT allows only non-keywords; identifier should be used
any place where either keywords or non-keywords should be accepted.
Hence, without this commit, any string that happens to be a keyword
can't be used as a partition schema, partition name, or plan name,
which is incorrect.
Author: Lukas Fittl <lukas@fittl.com>
Discussion: http://postgr.es/m/CAP53PkzKeD=t90OfeMsniYrcRe2THQbUx3g6wV17Y=ZtiwmWTQ@mail.gmail.com
Remove a bunch of #include lines from execnodes.h. Most of these
requier suitable typedefs to be added, so that it still compiles
standalone. In one case, the fix is to move a struct definition to the
one .c file where it is needed.
Also some light clean up in plannodes.h and genam.h, though not as
extensive as in execnodes.h.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/202603131240.ihwqdxnj7w2o@alvherre.pgsql
Implementation of SQL property graph queries, according to SQL/PGQ
standard (ISO/IEC 9075-16:2023).
This adds:
- GRAPH_TABLE table function for graph pattern matching
- DDL commands CREATE/ALTER/DROP PROPERTY GRAPH
- several new system catalogs and information schema views
- psql \dG command
- pg_get_propgraphdef() function for pg_dump and psql
A property graph is a relation with a new relkind RELKIND_PROPGRAPH.
It acts like a view in many ways. It is rewritten to a standard
relational query in the rewriter. Access privileges act similar to a
security invoker view. (The security definer variant is not currently
implemented.)
Starting documentation can be found in doc/src/sgml/ddl.sgml and
doc/src/sgml/queries.sgml.
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
Commit 2a525cc97e introduced the ON_ERROR = 'set_null' option for COPY,
allowing it to be used with foreign tables backed by file_fdw. However,
unlike ON_ERROR = 'ignore', no regression test was added to verify
this behavior for file_fdw.
This commit adds a regression test to ensure that foreign tables using
file_fdw work correctly with ON_ERROR = 'set_null', improving test coverage.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Yi Ding <dingyi_yale@163.com>
Discussion: https://postgr.es/m/CAHGQGwGmPc6aHpA5=WxKreiDePiOEitfOFsW2dSo5m81xWXgRA@mail.gmail.com
As of this commit all TupleDescs must have TupleDescFinalize() called on
them once the TupleDesc is set up and before BlessTupleDesc() is called.
In this commit, TupleDescFinalize() does nothing. This change has only
been separated out from the commit that properly implements this function
to make the change more obvious. Any extension which makes its own
TupleDesc will need to be modified to call the new function.
The follow-up commit which properly implements TupleDescFinalize() will
cause any code which forgets to do this to fail in assert-enabled builds in
BlessTupleDesc(). It may still be worth mentioning this change in the
release notes so that extension authors update their code.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAApHDvpoFjaj3%2Bw_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ%40mail.gmail.com
This commit plugs into pgstattuple_approx(), the SQL function faster
than pgstattuple() that returns approximate results, the streaming read
APIs. A callback is used to be able to skip all-visible pages via VM
lookup, to match with the logic prior to this commit.
Under test conditions similar to 6c228755ad (some dm_delay and
debug_io_direct=data), this can substantially improve the execution time
of the function, particularly for large relations.
Author: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CABPTF7VrqfbcDXqGrdLQ2xaQ=K0RzExNuw6U_GGqzSJu32wfdQ@mail.gmail.com
Since commit 5883ff30b0, some compilers have been warning that the
rtekind variable in unique_nonjoin_rtekind() may be used
uninitialized. There doesn't appear to be any actual risk, so
let's just initialize it to something to silence the compiler
warnings.
Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0sieVNfniCKMDdDjuXGd1OuzMQfTS5%3D9vX3sa-iiujKUA%40mail.gmail.com