Commit graph

5150 commits

Author SHA1 Message Date
Michael Paquier
41a033b505 Preserve index data in pg_statistic across REINDEX CONCURRENTLY
Statistics associated to an index got lost after running REINDEX
CONCURRENTLY, while the non-concurrent case preserves these correctly.
The concurrent and non-concurrent operations need to be consistent for
the end-user, and missing statistics would force to wait for a new
analyze to happen, which could take some time depending on the activity
of the existing autovacuum workers.  This issue is fixed by copying any
existing entries in pg_statistic associated to the old index to the new
one.  Note that this copy is already done with the data of the index in
the stats collector.

Reported-by: Fabrízio de Royes Mello
Author: Michael Paquier, Fabrízio de Royes Mello
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAFcNs+qpFPmiHd1oTXvcPdvAHicJDA9qBUSujgAhUMJyUMb+SA@mail.gmail.com
Backpatch-through: 12
2020-11-01 21:24:15 +09:00
Tom Lane
25b587f03a Stabilize timetz test across DST transitions.
The timetz test cases I added in commit a9632830b were unintentionally
sensitive to whether or not DST is active in the PST8PDT time zone.
Thus, they'll start failing this coming weekend, as reported by
Bernhard M. Wiedemann in bug #16689.  Fortunately, DST-awareness is
not significant to the purpose of these test cases, so we can just
force them all to PDT (DST hours) to preserve stability of the
results.

Back-patch to v10, as the prior patch was.

Discussion: https://postgr.es/m/16689-57701daa23b377bf@postgresql.org
2020-10-29 15:28:35 -04:00
Tom Lane
43330cdd40 Calculate extraUpdatedCols in query rewriter, not parser.
It's unsafe to do this at parse time because addition of generated
columns to a table would not invalidate stored rules containing
UPDATEs on the table ... but there might now be dependent generated
columns that were not there when the rule was made.  This also fixes
an oversight that rewriteTargetView failed to update extraUpdatedCols
when transforming an UPDATE on an updatable view.  (Since the new
calculation is downstream of that, rewriteTargetView doesn't actually
need to do anything; but before, there was a demonstrable bug there.)

In v13 and HEAD, this leads to easily-visible bugs because (since
commit c6679e4fc) we won't recalculate generated columns that aren't
listed in extraUpdatedCols.  In v12 this bitmap is mostly just used
for trigger-firing decisions, so you'd only notice a problem if a
trigger cared whether a generated column had been updated.

I'd complained about this back in May, but then forgot about it
until bug #16671 from Michael Paul Killian revived the issue.

Back-patch to v12 where this field was introduced.  If existing
stored rules contain any extraUpdatedCols values, they'll be
ignored because the rewriter will overwrite them, so the bug will
be fixed even for existing rules.  (But note that if someone were
to update to 13.1 or 12.5, store some rules with UPDATEs on tables
having generated columns, and then downgrade to a prior minor version,
they might observe issues similar to what this patch fixes.  That
seems unlikely enough to not be worth going to a lot of effort to fix.)

Discussion: https://postgr.es/m/10206.1588964727@sss.pgh.pa.us
Discussion: https://postgr.es/m/16671-2fa55851859fb166@postgresql.org
2020-10-28 13:47:02 -04:00
Alvaro Herrera
7ffead21a4
Accept relations of any kind in LOCK TABLE
The restriction that only tables and views can be locked by LOCK TABLE
is quite arbitrary, since the underlying mechanism can lock any relation
type.  Drop the restriction so that programs such as pg_dump can lock
all relations they're interested in, preventing schema changes that
could cause a dump to fail after expending much effort.

Backpatch to 9.5.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Wells Oliver <wells.oliver@gmail.com>
Discussion: https://postgr.es/m/20201021200659.GA32358@alvherre.pgsql
2020-10-27 13:49:19 -03:00
Tom Lane
de78c10072 Fix corner case for a BEFORE ROW UPDATE trigger returning OLD.
If the old row has any "missing" attributes that are supposed to
be retrieved from an associated tuple descriptor, the wrong things
happened because the trigger result is shoved directly into an
executor slot that lacks the missing-attribute data.  Notably,
CHECK-constraint verification would incorrectly see those columns
as NULL, and so would RETURNING-list evaluation.

Band-aid around this by forcibly expanding the tuple before passing
it to the trigger function.  (IMO it was a fundamental misdesign to
put the missing-attribute data into tuple constraints, which so
much of the system considers to be optional.  But we're probably
stuck with that now, and will have to continue to apply band-aids
as we find other places with similar issues.)

Back-patch to v12.  v11 would also have the issue, except that
commit 920311ab1 already applied a similar band-aid.  That forced
expansion in more cases than seem really necessary, though, so
this isn't a directly equivalent fix.

Amit Langote, with some cosmetic changes by me

Discussion: https://postgr.es/m/16644-5da7ef98a7ac4545@postgresql.org
2020-10-25 13:57:46 -04:00
Alvaro Herrera
a818000f6a
Use fast checkpoint in PostgresNode::backup()
Should cause tests to be a bit faster
2020-10-21 14:37:25 -03:00
Alvaro Herrera
0e6b6f8c71
Fix ALTER TABLE .. ENABLE/DISABLE TRIGGER recursion
More precisely, correctly handle the ONLY flag indicating not to
recurse.  This was implemented in 86f575948c by recursing in
trigger.c, but that's the wrong place; use ATSimpleRecursion instead,
which behaves properly.  However, because legacy inheritance has never
recursed in that situation, make sure to do that only for new-style
partitioning.

I noticed this problem while testing a fix for another bug in the
vicinity.

This has been wrong all along, so backpatch to 11.

Discussion: https://postgr.es/m/20201016235925.GA29829@alvherre.pgsql
2020-10-20 19:22:09 -03:00
Tom Lane
c7e2364a5f Paper over regression failures in infinite_recurse() on PPC64 Linux.
Our infinite_recurse() test to verify sane stack-overrun behavior
is affected by a bug of the Linux kernel on PPC64: it will get SIGSEGV
if it receives a signal when the stack depth is (a) over 1MB and
(b) within a few kB of filling the current physical stack allocation.
See https://bugzilla.kernel.org/show_bug.cgi?id=205183.

Since this test is a bit time-consuming and we run it in parallel with
test scripts that do a lot of DDL, it can be expected to get an sinval
catchup interrupt at some point, leading to failure if the timing is
wrong.  This has caused more than 100 buildfarm failures over the
past year or so.

While a fix exists for the kernel bug, it might be years before that
propagates into all production kernels, particularly in some of the
older distros we have in the buildfarm.  For now, let's just back off
and not run this test on Linux PPC64; that loses nothing in test
coverage so far as our own code is concerned.

To do that, split this test into a new script infinite_recurse.sql
and skip the test when the platform name is powerpc64...-linux-gnu.

Back-patch to v12.  Branches before that have not been seen to get
this failure.  No doubt that's because the "errors" test was not
run in parallel with other tests before commit 798070ec0, greatly
reducing the odds of an sinval catchup being necessary.

I also back-patched 3c8553547 into v12, just so the new regression
script would look the same in all branches having it.

Discussion: https://postgr.es/m/3479046.1602607848@sss.pgh.pa.us
Discussion: https://postgr.es/m/20190723162703.GM22387%40telsasoft.com
2020-10-13 17:44:56 -04:00
Tom Lane
d8c2a21998 Rethink recent fix for pg_dump's handling of extension config tables.
Commit 3eb3d3e78 was a few bricks shy of a load: while it correctly
set the table's "interesting" flag when deciding to dump the data of
an extension config table, it was not correct to clear that flag
if we concluded we shouldn't dump the data.  This led to the crash
reported in bug #16655, because in fact we'll traverse dumpTableSchema
anyway for all extension tables (to see if they have user-added
seclabels or RLS policies).

The right thing to do is to force "interesting" true in makeTableDataInfo,
and otherwise leave the flag alone.  (Doing it there is more future-proof
in case additional calls are added, and it also avoids setting the flag
unnecessarily if that function decides the table is non-dumpable.)

This investigation also showed that while only the --inserts code path
had an obvious failure in the case considered by 3eb3d3e78, the COPY
code path also has a problem with not having loaded table subsidiary
data.  That causes fmtCopyColumnList to silently return an empty string
instead of the correct column list.  That accidentally mostly works,
which perhaps is why we didn't notice this before.  It would only fail
if the restore column order is different from the dump column order,
which only happens in weird inheritance cases, so it's not surprising
nobody had hit the case with an extension config table.  Nonetheless,
it's a bug, and it goes a long way back, not just to v12 where the
--inserts code path started to have a problem with this.

In hopes of catching such cases a bit sooner in future, add some
Asserts that "interesting" has been set in both dumpTableData and
dumpTableSchema.  Adjust the test case added by 3eb3d3e78 so that it
checks the COPY rather than INSERT form of that bug, allowing it to
detect the longer-standing symptom.

Per bug #16655 from Cameron Daniel.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/16655-5c92d6b3a9438137@postgresql.org
Discussion: https://postgr.es/m/18048b44-3414-b983-8c7c-9165b177900d@2ndQuadrant.com
2020-10-07 12:51:04 -04:00
Tom Lane
5856ed1099 Improve stability of identity.sql regression test.
I noticed while trying to run the regression tests under a low
geqo_threshold that one query on information_schema.columns had
unstable (as in, variable from one run to the next) output order.
This is pretty unsurprising given the complexity of the underlying
plan.  Interestingly, of this test's three nigh-identical queries on
information_schema.columns, the other two already had ORDER BY clauses
guaranteeing stable output.  Let's make this one look the same.

Back-patch to v10 where this test was added.  We've not heard field
reports of the test failing, but this experience shows that it can
happen when testing under even slightly unusual conditions.
2020-10-04 20:45:36 -04:00
Tom Lane
6854c45b36 Put back explicit setting of replication values within TAP tests.
Commit 151c0c5f7 neglected the possibility that a TEMP_CONFIG file
would explicitly set max_wal_senders=0; as indeed buildfarm member
thorntail does, so that it can test wal_level=minimal in other test
suites.  Hence, rather than assuming that max_wal_senders=10 will
prevail if we say nothing, set it explicitly.

Set max_replication_slots=10 explicitly too, just to be safe.

Back-patch to v10, like the previous patch.

Discussion: https://postgr.es/m/723911.1601417626@sss.pgh.pa.us
2020-10-01 10:59:20 -04:00
Alvaro Herrera
f669ba7bdb
Reword partitioning error message
The error message about columns in the primary key not including all of
the partition key was unclear; reword it.

Backpatch all the way to pg11, where it appeared.

Reported-by: Nagaraj Raj <nagaraj.sf@yahoo.com>
Discussion: https://postgr.es/m/64062533.78364.1601415362244@mail.yahoo.com
2020-09-30 18:25:23 -03:00
Tom Lane
c5232dca8d Fix handling of BC years in to_date/to_timestamp.
Previously, a conversion such as
	to_date('-44-02-01','YYYY-MM-DD')
would result in '0045-02-01 BC', as the code attempted to interpret
the negative year as BC, but failed to apply the correction needed
for our internal handling of BC years.  Fix the off-by-one problem.

Also, arrange for the combination of a negative year and an
explicit "BC" marker to cancel out and produce AD.  This is how
the negative-century case works, so it seems sane to do likewise.

Continue to read "year 0000" as 1 BC.  Oracle would throw an error,
but we've accepted that case for a long time so I'm hesitant to
change it in a back-patch.

Per bug #16419 from Saeed Hubaishan.  Back-patch to all supported
branches.

Dar Alathar-Yemen and Tom Lane

Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
2020-09-30 15:40:23 -04:00
Tom Lane
09b29ca82b Remove obsolete replication settings within TAP tests.
PostgresNode.pm set "max_wal_senders = 5" for replication testing,
but this seems to be slightly too low for our current test suite.
Slower buildfarm members frequently report "number of requested standby
connections exceeds max_wal_senders" failures, due to old walsenders
not exiting instantaneously.  Usually, the test does not fail overall
because of automatic walreceiver restart, but sometimes the failure
becomes visible; and in any case such retries slow down the test.

That value came in with commit 89ac7004d, but was soon obsoleted by
f6d6d2920, which raised the built-in default from zero to 10; so that
PostgresNode.pm is actually setting it to less than the conservative
built-in default.  That seems pretty pointless, so let's remove the
special setting and let the default prevail, in hopes of making
the TAP tests more robust.

Likewise, the setting "max_replication_slots = 5" is obsolete and
can be removed.

While here, reverse-engineer a comment about why we're choosing
less-than-default values for some other settings.

(Note: before v12, max_wal_senders counted against max_connections
so that the latter setting also needs some fiddling with.)

Back-patch to v10 where the subscription tests were added.
It's likely that the older branches aren't pushing the boundaries
of max_wal_senders, but I'm disinclined to spend time trying to
figure out exactly when it started to be a problem.

Discussion: https://postgr.es/m/723911.1601417626@sss.pgh.pa.us
2020-09-29 20:02:58 -04:00
Tom Lane
29f20db85e Assign collations in partition bound expressions.
Failure to do this can result in errors during evaluation of
the bound expression, as illustrated by the new regression test.

Back-patch to v12 where the ability for partition bounds to be
expressions was added.

Discussion: https://postgr.es/m/CAJV4CdrZ5mKuaEsRSbLf2URQ3h6iMtKD=hik8MaF5WwdmC9uZw@mail.gmail.com
2020-09-28 14:12:38 -04:00
Tom Lane
1af91dc032 Use factorial rather than numeric_fac in create_operator.sql.
These two SQL functions are aliases for the same C function, so this
change has no semantic effect.  However, because we dropped the
numeric_fac alias in HEAD (commit 76f412ab3), operator definitions
based on that one don't port forward, causing problems for cross-version
upgrade tests based on the regression database.

Patch all active back branches to dodge the problem.

Discussion: https://postgr.es/m/449144.1600439950@sss.pgh.pa.us
2020-09-18 18:03:44 -04:00
Noah Misch
af2d09fa1c Fix interpolation in test name.
A pre-commit review had reported the problem, but the fix reached only
v10 and earlier.  Back-patch to v11.

Discussion: https://postgr.es/m/20200423.140546.1055476118690602079.horikyota.ntt@gmail.com
2020-09-13 23:29:55 -07:00
Tom Lane
1371a1e416 Use the properly transformed RangeVar for expandTableLikeClause().
transformCreateStmt() adjusts the transformed statement's RangeVar
to specify the target schema explicitly, for the express reason
of making sure that auxiliary statements derived by parse
transformation operate on the right table.  But the refactoring
I did in commit 502898192 got this wrong and passed the untransformed
RangeVar to expandTableLikeClause().  This could lead to assertion
failures or weird misbehavior if the wrong table was accessed.

Per report from Alexander Lakhin.  Like the previous patch, back-patch
to all supported branches.

Discussion: https://postgr.es/m/05051f9d-b32b-cb35-6735-0e9f2ab86b5f@gmail.com
2020-09-13 12:51:21 -04:00
Alvaro Herrera
ef1e1250e7
Check default partitions constraints while descending
Partitioning tuple route code assumes that the partition chosen while
descending the partition hierarchy is always the correct one.  This is
true except when the partition is the default partition and another
partition has been added concurrently: the partition constraint changes
and we don't recheck it.  This can lead to tuples mistakenly being added
to the default partition that should have been rejected.

Fix by rechecking the default partition constraint while descending the
hierarchy.

An isolation test based on the reproduction steps described by Hao Wu
(with tweaks for extra coverage) is included.

Backpatch to 12, where this bug came in with 898e5e3290.

Reported by: Hao Wu <hawu@vmware.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/CA+HiwqFqBmcSSap4sFnCBUEL_VfOMmEKaQ3gwUhyfa4c7J_-nA@mail.gmail.com
Discussion: https://postgr.es/m/DM5PR0501MB3910E97A9EDFB4C775CF3D75A42F0@DM5PR0501MB3910.namprd05.prod.outlook.com
2020-09-08 19:35:15 -03:00
Tom Lane
fc37c6f610 Make new authentication test case more robust.
I happened to notice that the new test case I added in b55b4dad9
falls over if one runs "make check" repeatedly; though not in branches
after v10.  That's because it was assuming that tmp_check/pgpass
wouldn't exist already.  However, it's only been since v11 that the
Makefiles forcibly remove all of tmp_check/ before starting a TAP run.
This fix to unlink the file is therefore strictly necessary only in
v10 ... but it seems wisest to do it across the board, rather than
let the test rely on external logic to get the conditions right.
2020-09-04 21:01:59 -04:00
Andrew Dunstan
616110eac3 Collect attribute data on extension owned tables being dumped
If this data is not collected, pg_dump segfaults if asked for column
inserts.

Fix by Fabrízio de Royes Mello

Backpatch to release 12 where the bug was introduced.
2020-09-04 13:55:03 -04:00
Tom Lane
aa4eeb38f3 Fix rare deadlock failure in create_am regression test.
The "DROP ACCESS METHOD gist2" test will require locking the index
to be dropped and then its table; while most ordinary operations
lock a table first then its index.  While no concurrent test scripts
should be touching fast_emp4000, autovacuum might chance to be
processing that table when the DROP runs, resulting in a deadlock
failure.  This is pretty rare but we see it in the buildfarm from
time to time.

To fix, acquire a lock on fast_emp4000 before issuing the DROP.

Since the point of the exercise is mostly to prevent buildfarm
failures, back-patch to 9.6 where this test was introduced.

Discussion: https://postgr.es/m/839004.1599185607@sss.pgh.pa.us
2020-09-04 12:40:28 -04:00
Tom Lane
82dd373f2c Avoid lockup of a parallel worker when reporting a long error message.
Because sigsetjmp() will restore the initial state with signals blocked,
the code path in bgworker.c for reporting an error and exiting would
execute that way.  Usually this is fairly harmless; but if a parallel
worker had an error message exceeding the shared-memory communication
buffer size (16K) it would lock up, because it would wait for a
resume-sending signal from its parallel leader which it would never
detect.

To fix, just unblock signals at the appropriate point.

This can be shown to fail back to 9.6.  The lack of parallel query
infrastructure makes it difficult to provide a simple test case for
9.5; but I'm pretty sure the issue exists in some form there as well,
so apply the code change there too.

Vignesh C, reviewed by Bharath Rupireddy, Robert Haas, and myself

Discussion: https://postgr.es/m/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA+-2cJcLpw-cePZ=GgDVfA@mail.gmail.com
2020-09-03 16:52:09 -04:00
Alvaro Herrera
7067ba1b4b
Raise error on concurrent drop of partitioned index
We were already raising an error for DROP INDEX CONCURRENTLY on a
partitioned table, albeit a different and confusing one:
  ERROR:  DROP INDEX CONCURRENTLY must be first action in transaction

Change that to throw a more comprehensible error:
  ERROR:  cannot drop partitioned index \"%s\" concurrently

Michael Paquier authored the test case for indexes on temporary
partitioned tables.

Backpatch to 11, where indexes on partitioned tables were added.

Reported-by: Jan Mussler <jan.mussler@zalando.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/16594-d2956ca909585067@postgresql.org
2020-09-01 13:40:43 -04:00
Tom Lane
55aea0c706 Teach libpq to handle arbitrary-length lines in .pgpass files.
Historically there's been a hard-wired assumption here that no line of
a .pgpass file could be as long as NAMEDATALEN*5 bytes.  That's a bit
shaky to start off with, because (a) there's no reason to suppose that
host names fit in NAMEDATALEN, and (b) this figure fails to allow for
backslash escape characters.  However, it fails completely if someone
wants to use a very long password, and we're now hearing reports of
people wanting to use "security tokens" that can run up to several
hundred bytes.  Another angle is that the file is specified to allow
comment lines, but there's no reason to assume that long comment lines
aren't possible.

Rather than guessing at what might be a more suitable limit, let's
replace the fixed-size buffer with an expansible PQExpBuffer.  That
adds one malloc/free cycle to the typical use-case, but that's surely
pretty cheap relative to the I/O this code has to do.

Also, add TAP test cases to exercise this code, because there was no
test coverage before.

This reverts most of commit 2eb3bc588, as there's no longer a need for
a warning message about overlength .pgpass lines.  (I kept the explicit
check for comment lines, though.)

In HEAD and v13, this also fixes an oversight in 74a308cf5: there's not
much point in explicit_bzero'ing the line buffer if we only do so in two
of the three exit paths.

Back-patch to all supported branches, except that the test case only
goes back to v10 where src/test/authentication/ was added.

Discussion: https://postgr.es/m/4187382.1598909041@sss.pgh.pa.us
2020-09-01 13:14:44 -04:00
Tom Lane
6b701eaaa9 Avoid pushing quals down into sub-queries that have grouping sets.
The trouble with doing this is that an apparently-constant subquery
output column isn't really constant if it is a grouping column that
appears in only some of the grouping sets.  A qual using such a
column would be subject to incorrect const-folding after push-down,
as seen in bug #16585 from Paul Sivash.

To fix, just disable qual pushdown altogether if the sub-query has
nonempty groupingSets.  While we could imagine far less restrictive
solutions, there is not much point in working harder right now,
because subquery_planner() won't move HAVING clauses to WHERE within
such a subquery.  If the qual stays in HAVING it's not going to be
a lot more useful than if we'd kept it at the outer level.

Having said that, this restriction could be removed if we used a
parsetree representation that distinguished such outputs from actual
constants, which is something I hope to do in future.  Hence, make
the patch a minimal addition rather than integrating it more tightly
(e.g. by renumbering the existing items in subquery_is_pushdown_safe's
comment).

Back-patch to 9.5 where grouping sets were introduced.

Discussion: https://postgr.es/m/16585-9d8c340d23ade8c1@postgresql.org
2020-08-22 14:46:40 -04:00
Tom Lane
d9253df12e Fix handling of CREATE TABLE LIKE with inheritance.
If a CREATE TABLE command uses both LIKE and traditional inheritance,
Vars in CHECK constraints and expression indexes that are absorbed
from a LIKE parent table tended to get mis-numbered, resulting in
wrong answers and/or bizarre error messages (though probably not any
actual crashes, thanks to validation occurring in the executor).

In v12 and up, the same could happen to Vars in GENERATED expressions,
even in cases with no LIKE clause but multiple traditional-inheritance
parents.

The cause of the problem for LIKE is that parse_utilcmd.c supposed
it could renumber such Vars correctly during transformCreateStmt(),
which it cannot since we have not yet accounted for columns added via
inheritance.  Fix that by postponing processing of LIKE INCLUDING
CONSTRAINTS, DEFAULTS, GENERATED, INDEXES till after we've performed
DefineRelation().

The error with GENERATED and multiple inheritance is a simple oversight
in MergeAttributes(); it knows it has to renumber Vars in inherited
CHECK constraints, but forgot to apply the same processing to inherited
GENERATED expressions (a/k/a defaults).

Per bug #16272 from Tom Gottfried.  The non-GENERATED variants of the
issue are ancient, presumably dating right back to the addition of
CREATE TABLE LIKE; hence back-patch to all supported branches.

Discussion: https://postgr.es/m/16272-6e32da020e9a9381@postgresql.org
2020-08-21 15:00:43 -04:00
Alvaro Herrera
4f47c8e7d4
Disable autovacuum for BRIN test table
This should improve stability in the tests.

Per buildfarm member hyrax (CLOBBER_CACHE_ALWAYS) via Tom Lane.

Discussion: https://postgr.es/m/871534.1597503261@sss.pgh.pa.us
2020-08-17 16:20:06 -04:00
Tom Lane
912fb290c5 Be more careful about the shape of hashable subplan clauses.
nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan
has the form of one or more OpExprs whose LHS is an expression of the
outer query's, while the RHS is an expression over Params representing
output columns of the subquery.  However, the planner only went as far
as verifying that the clauses were all binary OpExprs.  This works
99.99% of the time, because the clauses have the right shape when
emitted by the parser --- but it's possible for function inlining to
break that, as reported by PegoraroF10.  To fix, teach the planner
to check that the LHS and RHS contain the right things, or more
accurately don't contain the wrong things.  Given that this has been
broken for years without anyone noticing, it seems sufficient to just
give up hashing when it happens, rather than go to the trouble of
commuting the clauses back again (which wouldn't necessarily work
anyway).

While poking at that, I also noticed that nodeSubplan.c had a baked-in
assumption that the number of hash clauses is identical to the number
of subquery output columns.  Again, that's fine as far as parser output
goes, but it's not hard to break it via function inlining.  There seems
little reason for that assumption though --- AFAICS, the only thing
it's buying us is not having to store the number of hash clauses
explicitly.  Adding code to the planner to reject such cases would take
more code than getting nodeSubplan.c to cope, so I fixed it that way.

This has been broken for as long as we've had hashable SubPlans,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com
2020-08-14 22:14:03 -04:00
Heikki Linnakangas
f81167adbf Fix typo in test comment. 2020-08-14 10:41:19 +03:00
Noah Misch
64a71062e0 Empty search_path in logical replication apply worker and walsender.
This is like CVE-2018-1058 commit
582edc369c.  Today, a malicious user of a
publisher or subscriber database can invoke arbitrary SQL functions
under an identity running replication, often a superuser.  This fix may
cause "does not exist" or "no schema has been selected to create in"
errors in a replication process.  After upgrading, consider watching
server logs for these errors.  Objects accruing schema qualification in
the wake of the earlier commit are unlikely to need further correction.
Back-patch to v10, which introduced logical replication.

Security: CVE-2020-14349
2020-08-10 09:22:58 -07:00
Etsuro Fujita
4f26932296 Fix yet another issue with step generation in partition pruning.
Commit 13838740f fixed some issues with step generation in partition
pruning, but there was yet another one: get_steps_using_prefix() assumes
that clauses in the passed-in prefix list are sorted in ascending order
of their partition key numbers, but the caller failed to ensure this for
range partitioning, which led to an assertion failure in debug builds.
Adjust the caller function to arrange the clauses in the prefix list in
the required order for range partitioning.

Back-patch to v11, like the previous commit.

Patch by me, reviewed by Amit Langote.

Discussion: https://postgr.es/m/CAPmGK16jkXiFG0YqMbU66wte-oJTfW6D1HaNvQf%3D%2B5o9%3Dm55wQ%40mail.gmail.com
2020-08-07 14:45:02 +09:00
Thomas Munro
76b2b3e724 Fix rare failure in LDAP tests.
Instead of writing a query to psql's stdin, use -c.  This avoids a
failure where psql exits before we write, seen a few times on the build
farm.  Thanks to Tom Lane for the suggestion.

Back-patch to 11, where the LDAP tests arrived.

Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/CA%2BhUKGLFmW%2BHQYPeKiwSp5sdFFHtFViCpw4Mh6yAgEx74r5-Cw%40mail.gmail.com
2020-08-03 12:44:27 +12:00
Etsuro Fujita
62c4a77295 Fix some issues with step generation in partition pruning.
In the case of range partitioning, get_steps_using_prefix() assumes that
the passed-in prefix list contains at least one clause for each of the
partition keys earlier than one specified in the passed-in
step_lastkeyno, but the caller (ie, gen_prune_steps_from_opexps())
didn't take it into account, which led to a server crash or incorrect
results when the list contained no clauses for such partition keys, as
reported in bug #16500 and #16501 from Kobayashi Hisanori.  Update the
caller to call that function only when the list created there contains
at least one clause for each of the earlier partition keys in the case
of range partitioning.

While at it, fix some other issues:

* The list to pass to get_steps_using_prefix() is allowed to contain
  multiple clauses for the same partition key, as described in the
  comment for that function, but that function actually assumed that the
  list contained just a single clause for each of middle partition keys,
  which led to an assertion failure when the list contained multiple
  clauses for such partition keys.  Update that function to match the
  comment.
* In the case of hash partitioning, partition keys are allowed to be
  NULL, in which case the list to pass to get_steps_using_prefix()
  contains no clauses for NULL partition keys, but that function treats
  that case as like the case of range partitioning, which led to the
  assertion failure.  Update the assertion test to take into account
  NULL partition keys in the case of hash partitioning.
* Fix a typo in a comment in get_steps_using_prefix_recurse().
* gen_partprune_steps() failed to detect self-contradiction from
  strict-qual clauses and an IS NULL clause for the same partition key
  in some cases, producing incorrect partition-pruning steps, which led
  to incorrect results of partition pruning, but didn't cause any
  user-visible problems fortunately, as the self-contradiction is
  detected later in the query planning.  Update that function to detect
  the self-contradiction.

Per bug #16500 and #16501 from Kobayashi Hisanori.  Patch by me, initial
diagnosis for the reported issue and review by Dmitry Dolgov.
Back-patch to v11, where partition pruning was introduced.

Discussion: https://postgr.es/m/16500-d1613f2a78e1e090%40postgresql.org
Discussion: https://postgr.es/m/16501-5234a9a0394f6754%40postgresql.org
2020-07-28 11:00:02 +09:00
Tom Lane
798b4faefd Kluge slot_compile_deform() to ignore incorrect attnotnull markings.
Since we mustn't force an initdb in released branches, there is no
simple way to correct the markings of pg_subscription.subslotname
and pg_subscription_rel.srsublsn as attnotnull in existing pre-v13
installations.

Fortunately, released branches don't rely on attnotnull being correct
for much.  The planner looks at it in relation_excluded_by_constraints,
but it'd be difficult to get that to matter for a query on a system
catalog.  The only place where it's really problematic is in JIT's
slot_compile_deform(), which can produce incorrect code that crashes
if there are NULLs in an allegedly not-null column.

Hence, hack up slot_compile_deform() to be specifically aware of
these two incorrect markings and not trust them.

This applies to v11 and v12; the JIT code didn't exist before that,
and we've fixed the markings in v13.

Discussion: https://postgr.es/m/229396.1595191345@sss.pgh.pa.us
2020-07-20 15:54:24 -04:00
Tom Lane
71e561bd4b Fix construction of updated-columns bitmap in logical replication.
Commit b9c130a1f failed to apply the publisher-to-subscriber column
mapping while checking which columns were updated.  Perhaps less
significantly, it didn't exclude dropped columns either.  This could
result in an incorrect updated-columns bitmap and thus wrong decisions
about whether to fire column-specific triggers on the subscriber while
applying updates.  In HEAD (since commit 9de77b545), it could also
result in accesses off the end of the colstatus array, as detected by
buildfarm member skink.  Fix the logic, and adjust 003_constraints.pl
so that the problem is exposed in unpatched code.

In HEAD, also add some assertions to check that we don't access off
the ends of these newly variable-sized arrays.

Back-patch to v10, as b9c130a1f was.

Discussion: https://postgr.es/m/CAH2-Wz=79hKQ4++c5A060RYbjTHgiYTHz=fw6mptCtgghH2gJA@mail.gmail.com
2020-07-20 13:40:16 -04:00
Tom Lane
de797e8235 Fix bitmap AND/OR scans on the inside of a nestloop partition-wise join.
reparameterize_path_by_child() failed to reparameterize BitmapAnd
and BitmapOr paths.  This matters only if such a path is chosen as
the inside of a nestloop partition-wise join, where we have to pass
in parameters from the outside of the nestloop.  If that did happen,
we generated a bad plan that would likely lead to crashes at execution.

This is not entirely reparameterize_path_by_child()'s fault though;
it's the victim of an ancient decision (my ancient decision, I think)
to not bother filling in param_info in BitmapAnd/Or path nodes.  That
caused the function to believe that such nodes and their children
contain no parameter references and so need not be processed.

In hindsight that decision looks pretty penny-wise and pound-foolish:
while it saves a few cycles during path node setup, we do commonly
need the information later.  In particular, by reversing the decision
and requiring valid param_info data in all nodes of a bitmap path
tree, we can get rid of indxpath.c's get_bitmap_tree_required_outer()
function, which computed the data on-demand.  It's not unlikely that
that nets out as a savings of cycles in many scenarios.  A couple
of other things in indxpath.c can be simplified as well.

While here, get rid of some cases in reparameterize_path_by_child()
that are visibly dead or useless, given that we only care about
reparameterizing paths that can be on the inside of a parameterized
nestloop.  This case reminds one of the maxim that untested code
probably does not work, so I'm unwilling to leave unreachable code
in this function.  (I did leave the T_Gather case in place even
though it's not reached in the regression tests.  It's not very
clear to me when the planner might prefer to put Gather below
rather than above a nestloop, but at least in principle the case
might be interesting.)

Per bug #16536, originally from Arne Roland but with a test case
by Andrew Gierth.  Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/16536-2213ee0b3aad41fd@postgresql.org
2020-07-14 18:56:49 -04:00
David Rowley
1231a0b0ea Fix timing issue with ALTER TABLE's validate constraint
An ALTER TABLE to validate a foreign key in which another subcommand
already caused a pending table rewrite could fail due to ALTER TABLE
attempting to validate the foreign key before the actual table rewrite
takes place.  This situation could result in an error such as:

ERROR:  could not read block 0 in file "base/nnnnn/nnnnn": read only 0 of 8192 bytes

The failure here was due to the SPI call which validates the foreign key
trying to access an index which is yet to be rebuilt.

Similarly, we also incorrectly tried to validate CHECK constraints before
the heap had been rewritten.

The fix for both is to delay constraint validation until phase 3, after
the table has been rewritten.  For CHECK constraints this means a slight
behavioral change.  Previously ALTER TABLE VALIDATE CONSTRAINT on
inheritance tables would be validated from the bottom up.  This was
different from the order of evaluation when a new CHECK constraint was
added.  The changes made here aligns the VALIDATE CONSTRAINT evaluation
order for inheritance tables to be the same as ADD CONSTRAINT, which is
generally top-down.

Reported-by: Nazli Ugur Koyluoglu, using SQLancer
Discussion: https://postgr.es/m/CAApHDvp%3DZXv8wiRyk_0rWr00skhGkt8vXDrHJYXRMft3TjkxCA%40mail.gmail.com
Backpatch-through: 9.5 (all supported versions)
2020-07-14 17:03:12 +12:00
Tom Lane
d3b642ad99 Cope with lateral references in the quals of a subquery RTE.
The qual pushdown logic assumed that all Vars in a restriction clause
must be Vars referencing subquery outputs; but since we introduced
LATERAL, it's possible for such a Var to be a lateral reference instead.
This led to an assertion failure in debug builds.  In a non-debug
build, there might be no ill effects (if qual_is_pushdown_safe decided
the qual was unsafe anyway), or we could get failures later due to
construction of an invalid plan.  I've not gone to much length to
characterize the possible failures, but at least segfaults in the
executor have been observed.

Given that this has been busted since 9.3 and it took this long for
anybody to notice, I judge that the case isn't worth going to great
lengths to optimize.  Hence, fix by just teaching qual_is_pushdown_safe
that such quals are unsafe to push down, matching the previous behavior
when it accidentally didn't fail.

Per report from Tom Ellis.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/20200713175124.GQ8220@cloudinit-builder
2020-07-13 20:38:21 -04:00
Alexander Korotkov
f4ae676e31 Forbid numeric NaN in jsonpath
SQL standard doesn't define numeric Inf or NaN values.  It appears even more
ridiculous to support then in jsonpath assuming JSON doesn't support these
values as well.  This commit forbids returning NaN from .double(), which was
previously allowed.  NaN can't be result of inner-jsonpath computation over
non-NaNs.  So, we can not expect NaN in the jsonpath output.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/203949.1591879542%40sss.pgh.pa.us
Author: Alexander Korotkov
Reviewed-by: Tom Lane
Backpatch-through: 12
2020-07-11 03:21:57 +03:00
Alexander Korotkov
3ec5f6b53d Improve error reporting for jsonpath .double() method
When jsonpath .double() method detects that numeric or string can't be
converted to double precision, it throws an error.  This commit makes these
errors explicitly express the reason of failure.

Discussion: https://postgr.es/m/CAPpHfdtqJtiSXkP7tOXez18NxhLUH_-75bL8%3DOce4Ki%2Bbv7V6Q%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Tom Lane
Backpatch-through: 12
2020-07-11 03:21:53 +03:00
Tom Lane
a3cfb10764 Tighten up Windows CRLF conversion in our TAP test scripts.
Back-patch commits 91bdf499b and ffb4cee43, so that all branches
agree on when and how to do Windows CRLF conversion.

This should close the referenced thread.  Thanks to Andrew Dunstan
for discussion/review.

Discussion: https://postgr.es/m/412ae8da-76bb-640f-039a-f3513499e53d@gmx.net
2020-07-09 17:38:52 -04:00
Tom Lane
b22ca7648b Future-proof regression tests against possibly-missing posixrules file.
The IANA time zone folk have deprecated use of a "posixrules" file in
the tz database.  While for now it's our choice whether to keep
supplying one in our own builds, installations built with
--with-system-tzdata will soon be needing to cope with that file not
being present, at least on some platforms.

This causes a problem for the horology test, which expected the
nonstandard POSIX zone spec "CST7CDT" to apply pre-2007 US daylight
savings rules.  That does happen if the posixrules file supplies such
information, but otherwise the test produces undesired results.
To fix, add an explicit transition date rule that matches 2005 practice.
(We could alternatively have switched the test to use some real time
zone, but it seems useful to have coverage of this type of zone spec.)

While at it, update a documentation example that also relied on
"CST7CDT"; use a real-world zone name instead.  Also, document why
the zone names EST5EDT, CST6CDT, MST7MDT, PST8PDT aren't subject to
similar failures when "posixrules" is missing.

Back-patch to all supported branches, since the hazard is the same
for all.

Discussion: https://postgr.es/m/1665379.1592581287@sss.pgh.pa.us
2020-06-19 13:55:21 -04:00
Andres Freund
6cc2866c4c Fix deadlock danger when atomic ops are done under spinlock.
This was a danger only for --disable-spinlocks in combination with
atomic operations unsupported by the current platform.

While atomics.c was careful to signal that a separate semaphore ought
to be used when spinlock emulation is active, spin.c didn't actually
implement that mechanism. That's my (Andres') fault, it seems to have
gotten lost during the development of the atomic operations support.

Fix that issue and add test for nesting atomic operations inside a
spinlock.

Author: Andres Freund
Discussion: https://postgr.es/m/20200605023302.g6v3ydozy5txifji@alap3.anarazel.de
Backpatch: 9.5-
2020-06-18 14:13:06 -07:00
Andres Freund
008c119928 Add basic spinlock tests to regression tests.
As s_lock_test, the already existing test for spinlocks, isn't run in
an automated fashion (and doesn't test a normal backend environment),
adding tests that are run as part of a normal regression run is a good
idea. Particularly in light of several recent and upcoming spinlock
related fixes.

Currently the new tests are run as part of the pre-existing
test_atomic_ops() test. That perhaps can be quibbled about, but for
now seems ok.

The only operations that s_lock_test tests but the new tests don't are
the detection of a stuck spinlock and S_LOCK_FREE (which is otherwise
unused, not implemented on all platforms, and will be removed).

This currently contains a test for more than INT_MAX spinlocks (only
run with --disable-spinlocks), to ensure the recent commit fixing a
bug with more than INT_MAX spinlock initializations is correct. That
test is somewhat slow, so we might want to disable it after a few
days.

It might be worth retiring s_lock_test after this. The added coverage
of a stuck spinlock probably isn't worth the added complexity?

Author: Andres Freund
Discussion: https://postgr.es/m/20200606023103.avzrctgv7476xj7i@alap3.anarazel.de
2020-06-18 14:06:26 -07:00
Michael Paquier
16f43122d4 Fix oldest xmin and LSN computation across repslots after advancing
Advancing a replication slot did not recompute the oldest xmin and LSN
values across replication slots, preventing resource removal like
segments not recycled at checkpoint time.  The original commit that
introduced the slot advancing in 9c7d06d never did the update of those
oldest values, and b0afdca removed this code.

This commit adds a TAP test to check segment recycling with advancing
for physical slots, enforcing an extra segment switch before advancing
to check if the segment gets correctly recycled after a checkpoint.

Reported-by: Andres Freund
Reviewed-by: Alexey Kondratov, Kyptaro Horiguchi
Discussion: https://postgr.es/m/20200609171904.kpltxxvjzislidks@alap3.anarazel.de
Backpatch-through: 11
2020-06-18 16:35:32 +09:00
Tom Lane
874372a941 Fix behavior of float aggregates for single Inf or NaN inputs.
When there is just one non-null input value, and it is infinity or NaN,
aggregates such as stddev_pop and covar_pop should produce a NaN
result, because the calculation is not well-defined.  They used to do
so, but since we adopted Youngs-Cramer aggregation in commit e954a727f,
they produced zero instead.  That's an oversight, so fix it.  Add tests
exercising these edge cases.

Affected aggregates are

 var_pop(double precision)
 stddev_pop(double precision)
 var_pop(real)
 stddev_pop(real)
 regr_sxx(double precision,double precision)
 regr_syy(double precision,double precision)
 regr_sxy(double precision,double precision)
 regr_r2(double precision,double precision)
 regr_slope(double precision,double precision)
 regr_intercept(double precision,double precision)
 covar_pop(double precision,double precision)
 corr(double precision,double precision)

Back-patch to v12 where the behavior change was accidentally introduced.

Report and patch by me; thanks to Dean Rasheed for review.

Discussion: https://postgr.es/m/353062.1591898766@sss.pgh.pa.us
2020-06-13 13:43:24 -04:00
Tom Lane
4284e11846 Fix mishandling of NaN counts in numeric_[avg_]combine.
When merging two NumericAggStates, the code missed adding the new
state's NaNcount unless its N was also nonzero; since those counts
are independent, this is wrong.

This would only have visible effect if some partial aggregate scans
found only NaNs while earlier ones found only non-NaNs; then we could
end up falsely deciding that there were no NaNs and fail to return a
NaN final result as expected.  That's pretty improbable, so it's no
surprise this hasn't been reported from the field.  Still, it's a bug.

I didn't try to produce a regression test that would show the bug,
but I did notice that these functions weren't being reached at all
in our regression tests, so I improved the tests to at least
exercise them.  With these additions, I see pretty complete code
coverage on the aggregation-related functions in numeric.c.

Back-patch to 9.6 where this code was introduced.  (I only added
the improved test case as far back as v10, though, since the
relevant part of aggregates.sql isn't there at all in 9.6.)
2020-06-11 17:38:42 -04:00
Peter Geoghegan
e620a38c23 Avoid update conflict out serialization anomalies.
SSI's HeapCheckForSerializableConflictOut() test failed to correctly
handle conditions involving a concurrently inserted tuple which is later
concurrently updated by a separate transaction .  A SELECT statement
that called HeapCheckForSerializableConflictOut() could end up using the
same XID (updater's XID) for both the original tuple, and the successor
tuple, missing the XID of the xact that created the original tuple
entirely.  This only happened when neither tuple from the chain was
visible to the transaction's MVCC snapshot.

The observable symptoms of this bug were subtle.  A pair of transactions
could commit, with the later transaction failing to observe the effects
of the earlier transaction (because of the confusion created by the
update to the non-visible row).  This bug dates all the way back to
commit dafaa3ef, which added SSI.

To fix, make sure that we check the xmin of concurrently inserted tuples
that happen to also have been updated concurrently.

Author: Peter Geoghegan
Reported-By: Kyle Kingsbury
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io
Backpatch: All supported versions
2020-06-11 10:09:43 -07:00
Amit Kapila
b7ed1d9944 Fix typos.
Reported-by: John Naylor
Author: John Naylor
Backpatch-through: 9.5
Discussion: https://postgr.es/m/CACPNZCtRuvs6G+EYqejhVJgBq2AKeZdXRVJsbX4syhO9gn5SNQ@mail.gmail.com
2020-06-11 14:26:17 +05:30