Previously, LOCK TABLE command through a parent table checked
the permissions on not only the parent table but also the children
tables inherited from it. This was a bug and inherited queries should
perform access permission checks on the parent table only. This
commit fixes LOCK TABLE so that it does not check the permission
on children tables.
This patch is applied only in the master branch. We decided not to
back-patch because it's not hard to imagine that there are some
applications expecting the old behavior and the change breaks their
security.
Author: Amit Langote
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAHGQGwE+GauyG7POtRfRwwthAGwTjPQYdFHR97+LzA4RHGnJxA@mail.gmail.com
Looks like guaibasaurus had a autovacuum running during the
controller_print_speculative_locks step (just added in
43e0841970). Which does indeed seem quite possible.
Avoid the problem by only looking for the backends participating in
the test.
Previously, the speculative insert tests did not cover the case when a
tuple t is inserted into a table with a unique index on a column but
before it can insert into the index, a concurrent transaction has
inserted a conflicting value into the index and the insertion of tuple t
must be aborted.
The basic permutation is one session successfully inserts into the table
and an associated unique index while a concurrent session successfully
inserts into the table but discovers a conflict before inserting into
the index and must abort the insertion.
Several variants on this include:
- swap which session is successful
- first session insert transaction does not commit, so second session
must wait on a transaction lock
- first session insert does not "complete", so second session must wait
on a speculative insertion lock
Also, refactor the existing TOAST table upsert test to be in the same
spec and reuse the steps.
Author: Melanie Plageman, Ashwin Agrawal, Andres Freund
Reviewed-by: Andres Freund, Taylor Vesely
Discussion: https://postgr.es/m/CAAKRu_ZRmxy_OEryfY3G8Zp01ouhgw59_-_Cm8n7LzRH5BAvng@mail.gmail.com
On a multi-level partioned table, when adding a partition not directly
connected to the root table, foreign key constraints referencing the
root were not cloned to the new partition, leading to the FK being
possibly inadvertently violated later on.
This was caused by fuzzy thinking in CloneFkReferenced (commit
f56f8f8da6): it was skipping constraints marked as having parents on
the theory that cloning those would create duplicates; but that's only
correct for the top level of the partitioning hierarchy. For levels
below that one, such constraints must still be considered and only
skipped if later on we see that we'd create duplicates. Apparently, I
(Álvaro) wrote the comments right but the code implemented something
slightly different.
Author: Jehan-Guillaume de Rorthais
Discussion: https://postgr.es/m/20200206004948.238352db@firost
When running TRUNCATE CASCADE on a child of a partitioned table
referenced by another partitioned table, the truncate was not applied to
partitions of the referencing table; this could leave rows violating the
constraint in the referencing partitioned table. Repair by walking the
pg_constraint chain all the way up to the topmost referencing table.
Note: any partitioned tables containing FKs that reference other
partitioned tables should be checked for possible violating rows, if
TRUNCATE has occurred in partitions of the referenced table.
Reported-by: Christophe Courtois
Author: Jehan-Guillaume de Rorthais
Discussion: https://postgr.es/m/20200204183906.115f693e@firost
Commit 147e3722f7 changed Tid scan so that it calls table_beginscan()
and uses the scan option for seq scan. This change caused two issues.
(1) The change caused Tid scan to take a predicate lock on the entire
relation in serializable transaction even when relation-level
lock is not necessary. This could lead to an unexpected
serialization error.
(2) The change caused Tid scan to increment the number of seq_scan
in pg_stat_*_tables views even though it's not seq scan. This
could confuse the users.
This commit adds the scan option for Tid scan and makes Tid scan
use it, to avoid those issues.
Back-patch to v12, where the bug was introduced.
Author: Tatsuhito Kasahara
Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/CAP0=ZVKy+gTbFmB6X_UW0pP3WaeJ-fkUWHoD-pExS=at3CY76g@mail.gmail.com
This reverts commit 41aadee, as the GUC checks could run on older values
with the new values used, and result in incorrect errors if both
parameters are changed at the same time.
Per complaint from Tom Lane.
Discussion: https://postgr.es/m/27574.1581015893@sss.pgh.pa.us
Backpatch-through: 12
This new field tracks the PID of the group leader used with parallel
query. For parallel workers and the leader, the value is set to the
PID of the group leader. So, for the group leader, the value is the
same as its own PID. Note that this reflects what PGPROC stores in
shared memory, so as leader_pid is NULL if a backend has never been
involved in parallel query. If the backend is using parallel query or
has used it at least once, the value is set until the backend exits.
Author: Julien Rouhaud
Reviewed-by: Sergei Kornilov, Guillaume Lelarge, Michael Paquier, Tomas
Vondra
Discussion: https://postgr.es/m/CAOBaU_Yy5bt0vTPZ2_LUM6cUcGeqmYNoJ8-Rgto+c2+w3defYA@mail.gmail.com
Tuple conversion incorrectly concluded that no conversion was needed
as long as all the attributes lined up. But if the source tuple has a
missing attribute (from addition of a column with default), then the
destination tupdesc might not reflect the same default. The typical
symptom was that the affected columns would be unexpectedly NULL.
Repair by always forcing conversion if the source has missing
attributes, which will be filled in by the deform operation. (In
theory we could optimize for when the destination has the same
default, but that seemed overkill.)
Backpatch to 11 where missing attributes were added.
Per bug #16242.
Vik Fearing (discovery, code, testing) and me (analysis, testcase).
Discussion: https://postgr.es/m/16242-d1c9fca28445966b@postgresql.org
PostgresNode already retained base directories in such cases. Stop
using $SIG{__DIE__}, which is redundant with the exit status check, in
lieu of proliferating it to TestLib. Back-patch to 9.6, where commit
88802e0680 introduced retention on
failure.
Reviewed by Daniel Gustafsson.
Discussion: https://postgr.es/m/20200202170155.GA3264196@rfd.leadboat.com
Commit 499be013d added this field in a rather poorly-thought-through
manner, with the result being that rather than being a field of the
Append or MergeAppend plan node as intended (and as it seems to be,
in text format), it was actually an element of the "Plans" subgroup.
At least in JSON format, that's flat out invalid syntax, because
"Plans" is an array not an object.
While it's not hard to move the generation of the field so that it
appears where it's supposed to, this does result in a visible change
in field order in text format, in cases where a Append or MergeAppend
plan node has any InitPlans attached. That's slightly annoying to
do in stable branches; but the alternative of continuing to emit
broken non-text formats seems worse.
Also, since the set of fields emitted is not supposed to be
data-dependent in non-text formats, make sure that "Subplans Removed"
appears in Append and MergeAppend nodes even when it's zero, in those
formats. (The previous coding made it look like it could appear in
some other node types such as BitmapAnd, but we don't actually support
runtime pruning there, so don't emit it in those cases.)
Per bug #16171 from Mahadevan Ramachandran. Fix by Daniel Gustafsson
and Tom Lane, reviewed by Hamid Akhtar. Back-patch to v11 where this
code came in.
Discussion: https://postgr.es/m/16171-b72259ab75505fa2@postgresql.org
Commit fc7695891 changed CheckAttributeType to recurse into ranges,
but made it pass down the wrong collation (always InvalidOid, since
ranges as such have no collation). This would result in guaranteed
failure when considering a range type whose subtype is collatable.
Embarrassingly, we lack any regression tests that would expose such
a problem (but fortunately, somebody noticed before we shipped this
bug in any release).
Fix it to pass down the range's subtype collation property instead,
and add some regression test cases to exercise collatable-subtype
ranges a bit more. Back-patch to all supported branches, as the
previous patch was.
Report and patch by Julien Rouhaud, test cases tweaked by me
Discussion: https://postgr.es/m/CAOBaU_aBWqNweiGUFX0guzBKkcfJ8mnnyyGC_KBQmO12Mj5f_A@mail.gmail.com
Previously, TRUNCATE command through a parent table checked the
permissions on not only the parent table but also the children tables
inherited from it. This was a bug and inherited queries should perform
access permission checks on the parent table only. This commit fixes
that bug.
Back-patch to all supported branches.
Author: Amit Langote
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAHGQGwFHdSvifhJE+-GSNqUHSfbiKxaeQQ7HGcYz6SC2n_oDcg@mail.gmail.com
Advancing a physical replication slot with pg_replication_slot_advance()
did not mark the slot as dirty if any advancing was done, preventing the
follow-up checkpoint to flush the slot data to disk. This caused the
advancing to be lost even on clean restarts. This does not happen for
logical slots as any advancing marked the slot as dirty. Per
discussion, the original feature has been implemented so as in the event
of a crash the slot may move backwards to a past LSN. This property is
kept and more documentation is added about that.
This commit adds some new TAP tests to check the persistency of physical
and logical slots after advancing across clean restarts.
Author: Alexey Kondratov, Michael Paquier
Reviewed-by: Andres Freund, Kyotaro Horiguchi, Craig Ringer
Discussion: https://postgr.es/m/059cc53a-8b14-653a-a24d-5f867503b0ee@postgrespro.ru
Backpatch-through: 11
This patch creates a new extension property, "trusted". An extension
that's marked that way in its control file can be installed by a
non-superuser who has the CREATE privilege on the current database,
even if the extension contains objects that normally would have to be
created by a superuser. The objects within the extension will (by
default) be owned by the bootstrap superuser, but the extension itself
will be owned by the calling user. This allows replicating the old
behavior around trusted procedural languages, without all the
special-case logic in CREATE LANGUAGE. We have, however, chosen to
loosen the rules slightly: formerly, only a database owner could take
advantage of the special case that allowed installation of a trusted
language, but now anyone who has CREATE privilege can do so.
Having done that, we can delete the pg_pltemplate catalog, moving the
knowledge it contained into the extension script files for the various
PLs. This ends up being no change at all for the in-core PLs, but it is
a large step forward for external PLs: they can now have the same ease
of installation as core PLs do. The old "trusted PL" behavior was only
available to PLs that had entries in pg_pltemplate, but now any
extension can be marked trusted if appropriate.
This also removes one of the stumbling blocks for our Python 2 -> 3
migration, since the association of "plpythonu" with Python 2 is no
longer hard-wired into pg_pltemplate's initial contents. Exactly where
we go from here on that front remains to be settled, but one problem
is fixed.
Patch by me, reviewed by Peter Eisentraut, Stephen Frost, and others.
Discussion: https://postgr.es/m/5889.1566415762@sss.pgh.pa.us
Before, if a recovery target is configured, but the archive ended
before the target was reached, recovery would end and the server would
promote without further notice. That was deemed to be pretty wrong.
With this change, if the recovery target is not reached, it is a fatal
error.
Based-on-patch-by: Leif Gunnar Erlandsen <leif@lako.no>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@lako.no
EvalPlanQualStart() supposed that it could re-use the relsubs_rowmark
and relsubs_done arrays from a prior instantiation. But since they are
allocated in the es_query_cxt of the recheckestate, that's just wrong;
EvalPlanQualEnd() will blow away that storage. Therefore we were using
storage that could have been reallocated to something else, causing all
sorts of havoc.
I think this was modeled on the old code's handling of es_epqTupleSlot,
but since the code was anyway clearing the arrays at re-use, there's
clearly no expectation of importing any outside state. So it's just
a dubious savings of a couple of pallocs, which is negligible compared
to setting up a new planstate tree. Therefore, just allocate the
arrays always. (I moved the allocations slightly for readability.)
In principle this bug could cause a problem whenever EPQ rechecks are
needed in more than one target table of a ModifyTable plan node.
In practice it seems not quite so easy to trigger as that; I couldn't
readily duplicate a crash with a partitioned target table, for instance.
That's probably down to incidental choices about when to free or
reallocate stuff. The added isolation test case does seem to reliably
show an assertion failure, though.
Per report from Oleksii Kliukin. Back-patch to v12 where the bug was
introduced (evidently by commit 3fb307bc4).
Discussion: https://postgr.es/m/EEF05F66-2871-4786-992B-5F45C92FEE2E@hintbits.com
This gives more information to the user about the error and it makes such
messages consistent with the other similar messages in the code.
Reported-by: Simon Riggs
Author: Mahendra Singh and Simon Riggs
Reviewed-by: Beena Emerson and Amit Kapila
Discussion: https://postgr.es/m/CANP8+j+7YUvQvGxTrCiw77R23enMJ7DFmyA3buR+fa2pKs4XhA@mail.gmail.com
These two new parameters, named sslminprotocolversion and
sslmaxprotocolversion, allow to respectively control the minimum and the
maximum version of the SSL protocol used for the SSL connection attempt.
The default setting is to allow any version for both the minimum and the
maximum bounds, causing libpq to rely on the bounds set by the backend
when negotiating the protocol to use for an SSL connection. The bounds
are checked when the values are set at the earliest stage possible as
this makes the checks independent of any SSL implementation.
Author: Daniel Gustafsson
Reviewed-by: Michael Paquier, Cary Huang
Discussion: https://postgr.es/m/4F246AE3-A7AE-471E-BD3D-C799D3748E03@yesql.se
In non-TEXT output formats, the "Settings" field should appear when
requested, even if it would be empty.
Also, get rid of the premature optimization of counting all the
GUC_EXPLAIN variables at startup. Since there was no provision for
adjusting that count later, all it'd take would be some extension marking
a parameter as GUC_EXPLAIN to risk an assertion failure or memory stomp.
We could make get_explain_guc_options() count those variables on-the-fly,
or dynamically resize its array ... but TBH I do not think that making a
transient array of pointers a bit smaller is worth any extra complication,
especially when you consider all the other transient space EXPLAIN eats.
So just allocate that array at the max possible size.
In HEAD, also add some regression test coverage for this feature.
Because of the memory-stomp hazard, back-patch to v12 where this
feature was added.
Discussion: https://postgr.es/m/19416.1580069629@sss.pgh.pa.us
Previously, it was possible for EXPLAIN ANALYZE of a parallel query
to produce several different "Workers" fields for a single plan node,
because different portions of explain.c independently generated
per-worker data and wrapped that output in separate fields. This
is pretty bogus, especially for the structured output formats: even
if it's not technically illegal, most programs would have a hard time
dealing with such data.
To improve matters, add infrastructure that allows redirecting
per-worker values into a side data structure, and then collect that
data into a single "Workers" field after we've finished running all
the relevant code for a given plan node.
There are a few visible side-effects:
* In text format, instead of something like
Sort Method: external merge Disk: 4920kB
Worker 0: Sort Method: external merge Disk: 5880kB
Worker 1: Sort Method: external merge Disk: 5920kB
Buffers: shared hit=682 read=10188, temp read=1415 written=2101
Worker 0: actual time=130.058..130.324 rows=1324 loops=1
Buffers: shared hit=337 read=3489, temp read=505 written=739
Worker 1: actual time=130.273..130.512 rows=1297 loops=1
Buffers: shared hit=345 read=3507, temp read=505 written=744
you get
Sort Method: external merge Disk: 4920kB
Buffers: shared hit=682 read=10188, temp read=1415 written=2101
Worker 0: actual time=130.058..130.324 rows=1324 loops=1
Sort Method: external merge Disk: 5880kB
Buffers: shared hit=337 read=3489, temp read=505 written=739
Worker 1: actual time=130.273..130.512 rows=1297 loops=1
Sort Method: external merge Disk: 5920kB
Buffers: shared hit=345 read=3507, temp read=505 written=744
* When JIT is enabled, any relevant per-worker JIT stats are attached
to the child node of the Gather or Gather Merge node, which is where
the other per-worker output has always been. Previously, that info
was attached directly to a Gather node, or missed entirely for Gather
Merge.
* A query's summary JIT data no longer includes a bogus
"Worker Number: -1" field.
A notable code-level change is that indenting for lines of text-format
output should now be handled by calling "ExplainIndentText(es)",
instead of hard-wiring how much space to emit. This seems a good deal
cleaner anyway.
This patch also adds a new "explain.sql" regression test script that's
dedicated to testing EXPLAIN. There is more that can be done in that
line, certainly, but for now it just adds some coverage of the XML and
YAML output formats, which had been completely untested.
Although this is surely a bug fix, it's not clear that people would
be happy with rearranging EXPLAIN output in a minor release, so apply
to HEAD only.
Maciek Sakrejda and Tom Lane, based on an idea of Andres Freund's;
reviewed by Georgios Kokolatos
Discussion: https://postgr.es/m/CAOtHd0AvAA8CLB9Xz0wnxu1U=zJCKrr1r4QwwXi_kcQsHDVU=Q@mail.gmail.com
I had supposed that the from_char_seq_search() call sites were
all passing the constant arrays you'd expect them to pass ...
but on looking closer, the one for DY format was passing the
days[] array not days_short[]. This accidentally worked because
the day abbreviations in English are all the same as the first
three letters of the full day names. However, once we took out
the "maximum comparison length" logic, it stopped working.
As penance for that oversight, add regression test cases covering
this, as well as every other switch case in DCH_from_char() that
was not reached according to the code coverage report.
Also, fold the DCH_RM and DCH_rm cases into one --- now that
seq_search is case independent, there's no need to pass different
comparison arrays for those cases.
Back-patch, as the previous commit was.
seq_search(), which is used to match input substrings to constants
such as month and day names, had a lot of bizarre and unnecessary
behaviors. It was mostly possible to avert our eyes from that before,
but we don't want to duplicate those behaviors in the upcoming patch
to allow recognition of non-English month and day names. So it's time
to clean this up. In particular:
* seq_search scribbled on the input string, which is a pretty dangerous
thing to do, especially in the badly underdocumented way it was done here.
Fortunately the input string is a temporary copy, but that was being made
three subroutine levels away, making it something easy to break
accidentally. The behavior is externally visible nonetheless, in the form
of odd case-folding in error reports about unrecognized month/day names.
The scribbling is evidently being done to save a few calls to pg_tolower,
but that's such a cheap function (at least for ASCII data) that it's
pretty pointless to worry about. In HEAD I switched it to be
pg_ascii_tolower to ensure it is cheap in all cases; but there are corner
cases in Turkish where this'd change behavior, so leave it as pg_tolower
in the back branches.
* seq_search insisted on knowing the case form (all-upper, all-lower,
or initcap) of the constant strings, so that it didn't have to case-fold
them to perform case-insensitive comparisons. This likewise seems like
excessive micro-optimization, given that pg_tolower is certainly very
cheap for ASCII data. It seems unsafe to assume that we know the case
form that will come out of pg_locale.c for localized month/day names, so
it's better just to define the comparison rule as "downcase all strings
before comparing". (The choice between downcasing and upcasing is
arbitrary so far as English is concerned, but it might not be in other
locales, so follow citext's lead here.)
* seq_search also had a parameter that'd cause it to report a match
after a maximum number of characters, even if the constant string were
longer than that. This was not actually used because no caller passed
a value small enough to cut off a comparison. Replicating that behavior
for localized month/day names seems expensive as well as useless, so
let's get rid of that too.
* from_char_seq_search used the maximum-length parameter to truncate
the input string in error reports about not finding a matching name.
This leads to rather confusing reports in many cases. Worse, it is
outright dangerous if the input string isn't all-ASCII, because we
risk truncating the string in the middle of a multibyte character.
That'd lead either to delivering an illegible error message to the
client, or to encoding-conversion failures that obscure the actual
data problem. Get rid of that in favor of truncating at whitespace
if any (a suggestion due to Alvaro Herrera).
In addition to fixing these things, I const-ified the input string
pointers of DCH_from_char and its subroutines, to make sure there
aren't any other scribbling-on-input problems.
The risk of generating a badly-encoded error message seems like
enough of a bug to justify back-patching, so patch all supported
branches.
Discussion: https://postgr.es/m/29432.1579731087@sss.pgh.pa.us
This test case was sketched in commit message 4c87010981 to explain an
ancient bug; it translates to a coverage increase, so add it to the BRIN
regression tests.
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work. Here is for example one confusing error when using
ON COMMIT DELETE ROWS:
ERROR: index "foo" already contains data
Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user. Using a non-concurrent path does not matter in practice as locks
cannot be taken on a temporary relation by a session different than the
one owning the relation, and the non-concurrent operation is more
effective.
The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands. In all supported versions, this caused only confusing
error messages to be generated. Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.
The idea to enforce transparently the non-concurrent code path for
temporary relations comes originally from Andres Freund.
Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas
Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
The behavior of something like
ALTER TABLE transactions
ADD COLUMN status varchar(30) DEFAULT 'old',
ALTER COLUMN status SET default 'current';
is to fill existing table rows with 'old', not 'current'. That's
intentional and desirable for a couple of reasons:
* It makes the behavior the same whether you merge the sub-commands
into one ALTER command or give them separately;
* If we applied the new default while filling the table, there would
be no way to get the existing behavior in one SQL command.
The same reasoning applies in cases that add a column and then
manipulate its GENERATED/IDENTITY status in a second sub-command,
since the generation expression is really just a kind of default.
However, that wasn't very obvious (at least not to me; earlier in
the referenced discussion thread I'd thought it was a bug to be
fixed). And it certainly wasn't documented.
Hence, add documentation, code comments, and a test case to clarify
that this behavior is all intentional.
In passing, adjust ATExecAddColumn's defaults-related relkind check
so that it matches up exactly with ATRewriteTables, instead of being
effectively (though not literally) the negated inverse condition.
The reasoning can be explained a lot more concisely that way, too
(not to mention that the comment now matches the code, which it
did not before).
Discussion: https://postgr.es/m/10365.1558909428@sss.pgh.pa.us
Some buildfarm members were still warning about this, because in
9c679a08f I'd missed decorating one of the ereport() code paths
with a dummy return.
Also, adjust the error messages to be more in line with project
style guide.
This feature allows the vacuum to leverage multiple CPUs in order to
process indexes. This enables us to perform index vacuuming and index
cleanup with background workers. This adds a PARALLEL option to VACUUM
command where the user can specify the number of workers that can be used
to perform the command which is limited by the number of indexes on a
table. Specifying zero as a number of workers will disable parallelism.
This option can't be used with the FULL option.
Each index is processed by at most one vacuum process. Therefore parallel
vacuum can be used when the table has at least two indexes.
The parallel degree is either specified by the user or determined based on
the number of indexes that the table has, and further limited by
max_parallel_maintenance_workers. The index can participate in parallel
vacuum iff it's size is greater than min_parallel_index_scan_size.
Author: Masahiko Sawada and Amit Kapila
Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra,
Mahendra Singh and Sergei Kornilov
Tested-by: Mahendra Singh and Prabhat Sahu
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.comhttps://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com
Mixing incorrect bounds set in the SSL context leads to confusing error
messages generated by OpenSSL which are hard to act on. New checks are
added within the GUC machinery to improve the user experience as they
apply to any SSL implementation, not only OpenSSL, and doing the checks
beforehand avoids the creation of a SSL during a reload (or startup)
which we know will never be used anyway.
Backpatch down to 12, as those parameters have been introduced by
e73e67c.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200114035420.GE1515@paquier.xyz
Backpatch-through: 12
The strategy of GIN index scan is driven by opclass-specific extract_query
method. This method that needed search mode is GIN_SEARCH_MODE_ALL. This
mode means that matching tuple may contain none of extracted entries. Simple
example is '!term' tsquery, which doesn't need any term to exist in matching
tsvector.
In order to handle such scan key GIN calculates virtual entry, which contains
all TIDs of all entries of attribute. In fact this is full scan of index
attribute. And typically this is very slow, but allows to handle some queries
correctly in GIN. However, current algorithm calculate such virtual entry for
each GIN_SEARCH_MODE_ALL scan key even if they are multiple for the same
attribute. This is clearly not optimal.
This commit improves the situation by introduction of "exclude only" scan keys.
Such scan keys are not capable to return set of matching TIDs. Instead, they
are capable only to filter TIDs produced by normal scan keys. Therefore,
each attribute should contain at least one normal scan key, while rest of them
may be "exclude only" if search mode is GIN_SEARCH_MODE_ALL.
The same optimization might be applied to the whole scan, not per-attribute.
But that leads to NULL values elimination problem. There is trade-off between
multiple possible ways to do this. We probably want to do this later using
some cost-based decision algorithm.
Discussion: https://postgr.es/m/CAOBaU_YGP5-BEt5Cc0%3DzMve92vocPzD%2BXiZgiZs1kjY0cj%3DXBg%40mail.gmail.com
Author: Nikita Glukhov, Alexander Korotkov, Tom Lane, Julien Rouhaud
Reviewed-by: Julien Rouhaud, Tomas Vondra, Tom Lane
Commit 9b63c13f0 turns out to have been fundamentally misguided:
the parent node's subPlan list is by no means the only way in which
a child SubPlan node can be hooked into the outer execution state.
As shown in bug #16213 from Matt Jibson, we can also get short-lived
tuple table slots added to the outer es_tupleTable list. At this point
I have little faith that there aren't other possible connections as
well; the long time it took to notice this problem shows that this
isn't a heavily-exercised situation.
Therefore, revert that fix, returning to the coding that passed a
NULL parent plan pointer down to the transiently-built subexpressions.
That gives us a pretty good guarantee that they won't hook into the
outer executor state in any way. But then we need some other solution
to make SubPlans work. Adopt the solution speculated about in the
previous commit's log message: do expression initialization at plan
startup for just those VALUES rows containing SubPlans, abandoning the
goal of reclaiming memory intra-query for those rows. In practice it
seems unlikely that queries containing a vast number of VALUES rows
would be using SubPlans in them, so this should not give up much.
(BTW, this test case also refutes my claim in connection with the prior
commit that the issue only arises with use of LATERAL. That was just
wrong: some variants of SubLink always produce SubPlans.)
As with previous patch, back-patch to all supported branches.
Discussion: https://postgr.es/m/16213-871ac3bc208ecf23@postgresql.org
jsonb_set_lax() is the same as jsonb_set, except that it takes and extra
argument that specifies what to do if the value argument is NULL. The
default is 'use_json_null'. Other possibilities are 'raise_exception',
'return_target' and 'delete_key', all these behaviours having been
suggested as reasonable by various users.
Discussion: https://postgr.es/m/375873e2-c957-3a8d-64f9-26c43c2b16e7@2ndQuadrant.com
Reviewed by: Pavel Stehule
We've had numerous bug reports about how (1) IF NOT EXISTS clauses in
ALTER TABLE don't behave as-expected, and (2) combining certain actions
into one ALTER TABLE doesn't work, though executing the same actions as
separate statements does. This patch cleans up all of the cases so far
reported from the field, though there are still some oddities associated
with identity columns.
The core problem behind all of these bugs is that we do parse analysis
of ALTER TABLE subcommands too soon, before starting execution of the
statement. The root of the bugs in group (1) is that parse analysis
schedules derived commands (such as a CREATE SEQUENCE for a serial
column) before it's known whether the IF NOT EXISTS clause should cause
a subcommand to be skipped. The root of the bugs in group (2) is that
earlier subcommands may change the catalog state that later subcommands
need to be parsed against.
Hence, postpone parse analysis of ALTER TABLE's subcommands, and do
that one subcommand at a time, during "phase 2" of ALTER TABLE which
is the phase that does catalog rewrites. Thus the catalog effects
of earlier subcommands are already visible when we analyze later ones.
(The sole exception is that we do parse analysis for ALTER COLUMN TYPE
subcommands during phase 1, so that their USING expressions can be
parsed against the table's original state, which is what we need.
Arguably, these bugs stem from falsely concluding that because ALTER
COLUMN TYPE must do early parse analysis, every other command subtype
can too.)
This means that ALTER TABLE itself must deal with execution of any
non-ALTER-TABLE derived statements that are generated by parse analysis.
Add a suitable entry point to utility.c to accept those recursive
calls, and create a struct to pass through the information needed by
the recursive call, rather than making the argument lists of
AlterTable() and friends even longer.
Getting this to work correctly required a little bit of fiddling
with the subcommand pass structure, in particular breaking up
AT_PASS_ADD_CONSTR into multiple passes. But otherwise it's mostly
a pretty straightforward application of the above ideas.
Fixing the residual issues for identity columns requires refactoring of
where the dependency link from an identity column to its sequence gets
set up. So that seems like suitable material for a separate patch,
especially since this one is pretty big already.
Discussion: https://postgr.es/m/10365.1558909428@sss.pgh.pa.us
This uses the progress reporting infrastructure added by c16dc1aca5,
adding support for ANALYZE.
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Co-authored-by: Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
Reviewed-by: Julien Rouhaud, Robert Haas, Anthony Nowocien, Kyotaro Horiguchi,
Vignesh C, Amit Langote
Introduce new fields amusemaintenanceworkmem and amparallelvacuumoptions
in IndexAmRoutine for parallel vacuum. The amusemaintenanceworkmem tells
whether a particular IndexAM uses maintenance_work_mem or not. This will
help in controlling the memory used by individual workers as otherwise,
each worker can consume memory equal to maintenance_work_mem. The
amparallelvacuumoptions tell whether a particular IndexAM participates in
a parallel vacuum and if so in which phase (bulkdelete, vacuumcleanup) of
vacuum.
Author: Masahiko Sawada and Amit Kapila
Reviewed-by: Dilip Kumar, Amit Kapila, Tomas Vondra and Robert Haas
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.comhttps://postgr.es/m/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com
A view with conditional INSTEAD rules and no unconditional INSTEAD
rules or INSTEAD OF triggers is not auto-updatable. Previously we
relied on a check in the executor to catch this, but that's
problematic since the planner may fail to properly handle such a query
and thus return a particularly unhelpful error to the user, before
reaching the executor check.
Instead, trap this in the rewriter and report the correct error there.
Doing so also allows us to include more useful error detail than the
executor check can provide. This doesn't change the existing behaviour
of updatable views; it merely ensures that useful error messages are
reported when a view isn't updatable.
Per report from Pengzhou Tang, though not adopting that suggested fix.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAG4reAQn+4xB6xHJqWdtE0ve_WqJkdyCV4P=trYr4Kn8_3_PEA@mail.gmail.com
This test was trying to test the mechanism to release kernel FDs as needed
to get us under the max_safe_fds limit in case of spill files. To do that,
it needs to set max_files_per_process to a very low value which doesn't
even permit starting of the server in the case when there are a few already
opened files. This test also won't work on platforms where we use one FD
per semaphore.
Backpatch-through: 10, till where this test was added
Discussion:
https://postgr.es/m/CAA4eK1LHhERi06Q+MmP9qBXBBboi+7WV3910J0aUgz71LcnKAw@mail.gmail.comhttps://postgr.es/m/6485.1578583522@sss.pgh.pa.us
Previously, the core scanner's yy_transition[] array had 37045 elements.
Since that number is larger than INT16_MAX, Flex generated the array to
contain 32-bit integers. By reimplementing some of the bulkier scanner
rules, this patch reduces the array to 20495 elements. The much smaller
total length, combined with the consequent use of 16-bit integers for
the array elements reduces the binary size by over 200kB. This was
accomplished in two ways:
1. Consolidate handling of quote continuations into a new start condition,
rather than duplicating that logic for five different string types.
2. Treat Unicode strings and identifiers followed by a UESCAPE sequence
as three separate tokens, rather than one. The logic to de-escape
Unicode strings is moved to the filter code in parser.c, which already
had the ability to provide special processing for token sequences.
While we could have implemented the conversion in the grammar, that
approach was rejected for performance and maintainability reasons.
Performance in microbenchmarks of raw parsing seems equal or slightly
faster in most cases, and it's reasonable to expect that in real-world
usage (with more competition for the CPU cache) there will be a larger
win. The exception is UESCAPE sequences; lexing those is about 10%
slower, primarily because the scanner now has to be called three times
rather than one. This seems acceptable since that feature is very
rarely used.
The psql and epcg lexers are likewise modified, primarily because we
want to keep them all in sync. Since those lexers don't use the
space-hogging -CF option, the space savings is much less, but it's
still good for perhaps 10kB apiece.
While at it, merge the ecpg lexer's handling of C-style comments used
in SQL and in C. Those have different rules regarding nested comments,
but since we already have the ability to keep track of the previous
start condition, we can use that to handle both cases within a single
start condition. This matches the core scanner more closely.
John Naylor
Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com
Until now we've only used a single multivariate MCV list per relation,
covering the largest number of clauses. So for example given a query
SELECT * FROM t WHERE a = 1 AND b =1 AND c = 1 AND d = 1
and extended statistics on (a,b) and (c,d), we'd only pick and use one
of them. This commit improves this by repeatedly picking and applying
the best statistics (matching the largest number of remaining clauses)
until no additional statistics is applicable.
This greedy algorithm is simple, but may not be optimal. A different
choice of statistics may leave fewer clauses unestimated and/or give
better estimates for some other reason.
This can however happen only when there are overlapping statistics, and
selecting one makes it impossible to use the other. E.g. with statistics
on (a,b), (c,d), (b,c,d), we may pick either (a,b) and (c,d) or (b,c,d).
But it's not clear which option is the best one.
We however assume cases like this are rare, and the easiest solution is
to define statistics covering the whole group of correlated columns. In
the future we might support overlapping stats, using some of the clauses
as conditions (in conditional probability sense).
Author: Tomas Vondra
Reviewed-by: Mark Dilger, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
When considering functional dependencies during selectivity estimation,
it's not necessary to bother with selecting the best extended statistic
object and then use just dependencies from it. We can simply consider
all applicable functional dependencies at once.
This means we need to deserialie all (applicable) dependencies before
applying them to the clauses. This is a bit more expensive than picking
the best statistics and deserializing dependencies for it. To minimize
the additional cost, we ignore statistics that are not applicable.
Author: Tomas Vondra
Reviewed-by: Mark Dilger
Discussion: https://postgr.es/m/20191028152048.jc6pqv5hb7j77ocp@development
On the publisher, it was assumed that an INSERT change cannot happen for
a relation with no replica identity. However this is true only for a
change that needs references to old rows, aka UPDATE or DELETE, so
trying to use logical replication with a relation that has no replica
identity led to an assertion failure in the publisher when issuing an
INSERT. This commit removes the incorrect assertion, and adds more
regression tests to provide coverage for relations without replica
identity.
Reported-by: Neha Sharma
Author: Dilip Kumar, Michael Paquier
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CANiYTQsL1Hb8_Km08qd32svrqNumXLJeoGo014O7VZymgOhZEA@mail.gmail.com
Backpatch-through: 10
Fix assorted bugs in handling of non-blocking I/O when using GSSAPI
encryption. The encryption layer could return the wrong status
information to its caller, resulting in effectively dropping some data
(or possibly in aborting a not-broken connection), or in a "livelock"
situation where data remains to be sent but the upper layers think
transmission is done and just go to sleep. There were multiple small
thinkos contributing to that, as well as one big one (failure to think
through what to do when a send fails after having already transmitted
data). Note that these errors could cause failures whether the client
application asked for non-blocking I/O or not, since both libpq and
the backend always run things in non-block mode at this level.
Also get rid of use of static variables for GSSAPI inside libpq;
that's entirely not okay given that multiple connections could be
open at once inside a single client process.
Also adjust a bunch of random small discrepancies between the frontend
and backend versions of the send/receive functions -- except for error
handling, they should be identical, and now they are.
Also extend the Kerberos TAP tests to exercise cases where nontrivial
amounts of data need to be pushed through encryption. Before, those
tests didn't provide any useful coverage at all for the cases of
interest here. (They still might not, depending on timing, but at
least there's a chance.)
Per complaint from pmc@citylink and subsequent investigation.
Back-patch to v12 where this code was introduced.
Discussion: https://postgr.es/m/20200109181822.GA74698@gate.oper.dinoex.org
This tells you about allocations that have been made from the main
shared memory segment. The original patch also tried to show information
about dynamic shared memory allocation as well, but I decided to
leave that problem for another time.
Andres Freund and Robert Haas, reviewed by Michael Paquier, Marti
Raudsepp, Tom Lane, Álvaro Herrera, and Kyotaro Horiguchi.
Discussion: http://postgr.es/m/20140504114417.GM12715@awork2.anarazel.de
Use the parser's standard type coercion machinery to convert the
output column(s) of a SQL function's final SELECT or RETURNING
to the type(s) they should have according to the function's declared
result type. We'll allow any case where an assignment-level
coercion is available. Previously, we failed unless the required
coercion was a binary-compatible one (and the documentation ignored
this, falsely claiming that the types must match exactly).
Notably, the coercion now accounts for typmods, so that cases where
a SQL function is declared to return a composite type whose columns
are typmod-constrained now behave as one would expect. Arguably
this aspect is a bug fix, but the overall behavioral change here
seems too large to consider back-patching.
A nice side-effect is that functions can now be inlined in a
few cases where we previously failed to do so because of type
mismatches.
Discussion: https://postgr.es/m/18929.1574895430@sss.pgh.pa.us