Commit graph

1426 commits

Author SHA1 Message Date
Alvaro Herrera
c801c3a1ff
Revert "Avoid creating archive status ".ready" files too early"
This reverts commit 515e3d84a0 and equivalent commits in back
branches.  That solution turned out to have a number of problems of its
own, so we'll try again with a different approach.

Per note from Andres Freund

Discussion: https://postgr.es/m/20210831042949.52eqp5xwbxgrfank@alap3.anarazel.de
2021-09-04 12:14:30 -04:00
Alvaro Herrera
cf69f7109c
Avoid creating archive status ".ready" files too early
WAL records may span multiple segments, but XLogWrite() does not
wait for the entire record to be written out to disk before
creating archive status files.  Instead, as soon as the last WAL page of
the segment is written, the archive status file is created, and the
archiver may process it.  If PostgreSQL crashes before it is able to
write and flush the rest of the record (in the next WAL segment), the
wrong version of the first segment file lingers in the archive, which
causes operations such as point-in-time restores to fail.

To fix this, keep track of records that span across segments and ensure
that segments are only marked ready-for-archival once such records have
been completely written to disk.

This has always been wrong, so backpatch all the way back.

Author: Nathan Bossart <bossartn@amazon.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com
2021-08-23 15:50:35 -04:00
Tom Lane
6fcbaea7af Be more careful about barriers when releasing BackgroundWorkerSlots.
ForgetBackgroundWorker lacked any memory barrier at all, while
BackgroundWorkerStateChange had one but unaccountably did
additional manipulation of the slot after the barrier.  AFAICS,
the rule must be that the barrier is immediately before setting
or clearing slot->in_use.

It looks like back in 9.6 when ForgetBackgroundWorker was first
written, there might have been some case for not needing a
barrier there, but I'm not very convinced of that --- the fact
that the load of bgw_notify_pid is in the caller doesn't seem
to guarantee no memory ordering problem.  So patch 9.6 too.

It's likely that this doesn't fix any observable bug on Intel
hardware, but machines with weaker memory ordering rules could
have problems here.

Discussion: https://postgr.es/m/4046084.1620244003@sss.pgh.pa.us
2021-05-15 12:21:06 -04:00
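
A minimal standalone sketch of the release pattern described above,
using C11 atomics in place of PostgreSQL's pg_write_barrier(); the
WorkerSlot struct and forget_worker() names are illustrative, not the
actual bgworker internals:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <sys/types.h>

    /* Illustrative stand-in for BackgroundWorkerSlot, not the real layout. */
    typedef struct
    {
        bool   in_use;
        pid_t  notify_pid;      /* plays the role of bgw_notify_pid */
    } WorkerSlot;

    /*
     * Release a slot.  All stores into the slot must be complete before
     * in_use is cleared, so the release fence sits immediately before
     * that final store, with no further slot manipulation after it.
     */
    static void
    forget_worker(WorkerSlot *slot)
    {
        slot->notify_pid = 0;                       /* last write to slot data */
        atomic_thread_fence(memory_order_release);  /* barrier ... */
        slot->in_use = false;                       /* ... then publish "free" */
    }
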
Fujii Masao
b59d3abaf4 Shut down transaction tracking at startup process exit.
Maxim Orlov reported that shutting down a standby server could result in
the following assertion failure.  The cause was that, when the shutdown
made the startup process exit, recovery-time transaction tracking was
not shut down even if it had already been initialized, so some locks
held by the tracked transactions were never released.  If another
process was then started and assigned the PGPROC entry that the startup
process had used, it found those unreleased locks during its
initialization and hit the assertion failure.

    TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))"

This commit fixes the issue by making the startup process shut down
transaction tracking and release all locks when it exits.

Back-patch to all supported branches.

Reported-by: Maxim Orlov
Author: Fujii Masao
Reviewed-by: Maxim Orlov
Discussion: https://postgr.es/m/ad4ce692cc1d89a093b471ab1d969b0b@postgrespro.ru
2021-04-06 02:27:30 +09:00
Tom Lane
b99b6b9d6c Fix race condition between shutdown and unstarted background workers.
If a database shutdown (smart or fast) is commanded between the time
some process decides to request a new background worker and the time
that the postmaster can launch that worker, then nothing happens
because the postmaster won't launch any bgworkers once it's exited
PM_RUN state.  This is fine ... unless the requesting process is
waiting for that worker to finish (or even for it to start); in that
case the requestor is stuck, and only manual intervention will get us
to the point of being able to shut down.

To fix, cancel pending requests for workers when the postmaster sends
shutdown (SIGTERM) signals, and similarly cancel any new requests that
arrive after that point.  (We can optimize things slightly by only
doing the cancellation for workers that have waiters.)  To fit within
the existing bgworker APIs, the "cancel" is made to look like the
worker was started and immediately stopped, causing deregistration of
the bgworker entry.  Waiting processes would have to deal with
premature worker exit anyway, so this should introduce no bugs that
weren't there before.  We do have a side effect that registration
records for restartable bgworkers might disappear when theoretically
they should have remained in place; but since we're shutting down,
that shouldn't matter.

Back-patch to v10.  There might be value in putting this into 9.6
as well, but the management of bgworkers is a bit different there
(notably see 8ff518699) and I'm not convinced it's worth the effort
to validate the patch for that branch.

Discussion: https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
2020-12-24 17:00:43 -05:00
Tom Lane
85834023a9 In the postmaster, rely on the signal infrastructure to block signals.
POSIX sigaction(2) can be told to block a set of signals while a
signal handler executes.  Make use of that instead of manually
blocking and unblocking signals in the postmaster's signal handlers.
This should save a few cycles, but more importantly it prevents
recursive invocation of signal handlers when many signals arrive in
close succession.  (Assuming that the platform's signal infrastructure
is designed to avoid consuming stack space in that case, but this is
demonstrably true at least on Linux.)  The existing code has been seen
to recurse to the point of stack overflow, either in the postmaster
or in a forked-off child.

Back-patch of commit 9abb2bfc0.  At the time, we'd only seen excess
postmaster stack consumption in the buildfarm; but we now have a
user report of it, and that commit has aged enough to have a fair
amount of confidence that it doesn't break anything.

This still doesn't change anything about the way that it works on
Windows.  Perhaps someone else would like to fix that?

Per bug #16673 from David Geier.  Back-patch to 9.6.  Although
the problem exists in principle before that, we've only seen it
actually materialize in connection with heavy use of parallel
workers, so it doesn't seem necessary to do anything in 9.5;
and the relevant code is different there, too.

Discussion: https://postgr.es/m/16673-d278c604f8e34ec0@postgresql.org
Discussion: https://postgr.es/m/14878.1570820201@sss.pgh.pa.us
2020-10-15 12:50:57 -04:00
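
For reference, the POSIX idiom the commit relies on, as a standalone
sketch (the handler body and the SA_RESTART choice are placeholders):
filling sa_mask makes the kernel block the listed signals for the
duration of the handler, so no manual sigprocmask() calls are needed
inside it.

    #include <signal.h>
    #include <string.h>

    static volatile sig_atomic_t got_signal = 0;

    static void
    handler(int signo)
    {
        got_signal = signo;     /* async-signal-safe work only */
    }

    /*
     * Install a handler that runs with all other signals blocked.
     * sigaction() applies sa_mask on handler entry and restores the
     * previous mask on exit, so recursive handler invocation under a
     * burst of signals is prevented without extra syscalls.
     */
    static void
    install_handler(int signo)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = handler;
        sigfillset(&sa.sa_mask);    /* block everything while handler runs */
        sa.sa_flags = SA_RESTART;
        sigaction(signo, &sa, NULL);
    }
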
Tom Lane
93871b693c Use _exit(2) for SIGQUIT during ProcessStartupPacket, too.
Bring the signal handling for startup-packet collection into line
with the policy established in commits bedadc732 and 8e19a8264,
namely don't risk running atexit callbacks when handling SIGQUIT.

Ideally, we'd not do so for SIGTERM or timeout interrupts either,
but that change seems a bit too risky for the back branches.
For now, just improve the comments in this area to describe the risk.

Also relocate where BackendInitialize re-disables these interrupts,
to minimize the code span where they're active.  This doesn't buy
a whole lot of safety, but it can't hurt.

In passing, rename startup_die() to remove confusion about whether
it is for the startup process.

Like the previous commits, back-patch to all supported branches.

Discussion: https://postgr.es/m/1850884.1599601164@sss.pgh.pa.us
2020-09-10 12:06:26 -04:00
Tom Lane
67dde49a3d Make archiver's SIGQUIT handler exit via _exit().
Commit 8e19a8264 changed the SIGQUIT handlers of almost all server
processes not to run atexit callbacks.  The archiver process was
skipped, perhaps because it's not connected to shared memory; but
it's just as true here that running atexit callbacks in a signal
handler is unsafe.  So let's make it work like the rest.

In HEAD and v13, we can use the common SignalHandlerForCrashExit
handler.  Before that, just tweak pgarch_exit to use _exit(2)
explicitly.

Like the previous commit, back-patch to all supported branches.

Kyotaro Horiguchi, back-patching by me

Discussion: https://postgr.es/m/1850884.1599601164@sss.pgh.pa.us
2020-09-09 15:32:34 -04:00
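
The last two commits apply the same small pattern; a standalone sketch
follows (the handler and installer names are illustrative).  The point
is that exit() runs atexit() callbacks, which are not async-signal-safe,
while _exit() terminates immediately.

    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * SIGQUIT handler in the spirit of the commits above: terminate at
     * once, skipping atexit() callbacks.  Status 2 conventionally marks
     * a crash-style exit.
     */
    static void
    quickdie(int signo)
    {
        _exit(2);
    }

    static void
    install_quickdie(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = quickdie;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGQUIT, &sa, NULL);
    }
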
Tom Lane
526df0a236 Avoid lockup of a parallel worker when reporting a long error message.
Because sigsetjmp() will restore the initial state with signals blocked,
the code path in bgworker.c for reporting an error and exiting would
execute that way.  Usually this is fairly harmless; but if a parallel
worker had an error message exceeding the shared-memory communication
buffer size (16K) it would lock up, because it would wait for a
resume-sending signal from its parallel leader which it would never
detect.

To fix, just unblock signals at the appropriate point.

This can be shown to fail back to 9.6.  The lack of parallel query
infrastructure makes it difficult to provide a simple test case for
9.5; but I'm pretty sure the issue exists in some form there as well,
so apply the code change there too.

Vignesh C, reviewed by Bharath Rupireddy, Robert Haas, and myself

Discussion: https://postgr.es/m/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA+-2cJcLpw-cePZ=GgDVfA@mail.gmail.com
2020-09-03 16:52:09 -04:00
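
The mechanism is worth seeing concretely.  A minimal sketch, not the
actual bgworker.c code: sigsetjmp(buf, 1) saves the signal mask in
effect at setup time (when signals were still blocked), and siglongjmp()
restores it, so the error path must explicitly unblock before doing
anything that waits on a signal.

    #include <setjmp.h>
    #include <signal.h>

    static sigjmp_buf err_jmp;

    static void
    worker_body(void)
    {
        if (sigsetjmp(err_jmp, 1) != 0)
        {
            /*
             * Error path: siglongjmp restored the mask saved above,
             * which still had signals blocked.  Unblock before sending
             * the error report, or waiting for the leader's
             * resume-sending signal can hang forever.
             */
            sigset_t mask;

            sigemptyset(&mask);
            sigprocmask(SIG_SETMASK, &mask, NULL);
            /* ... report error and exit ... */
            return;
        }
        /* ... unblock signals, then do normal work; on error,
         * siglongjmp(err_jmp, 1) lands in the branch above ... */
    }
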
Tom Lane
250aaa2de9 Fix postmaster's behavior during smart shutdown.
Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
postmaster has immediately killed all "optional" background processes,
and subsequently refused to launch new ones while it's waiting for
foreground client processes to exit.  No doubt this seemed like an OK
policy at some point; but it's a pretty bad one now, because it makes
for a seriously degraded environment for the remaining clients:

* Parallel queries are killed, and new ones fail to launch. (And our
parallel-query infrastructure utterly fails to deal with the case
in a reasonable way --- it just hangs waiting for workers that are
not going to arrive.  There is more work needed in that area IMO.)

* Autovacuum ceases to function.  We can tolerate that for awhile,
but if bulk-update queries continue to run in the surviving client
sessions, there's eventually going to be a mess.  In the worst case
the system could reach a forced shutdown to prevent XID wraparound.

* The bgwriter and walwriter are also stopped immediately, likely
resulting in performance degradation.

Hence, let's rearrange things so that the only immediate change in
behavior is refusing to let in new normal connections.  Once the last
normal connection is gone, shut everything down as though we'd received
a "fast" shutdown.  To implement this, remove the PM_WAIT_BACKUP and
PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
while normal connections remain.  A subsidiary state variable tracks
whether or not we're letting in new connections in those states.

This also allows having just one copy of the logic for killing child
processes in smart and fast shutdown modes.  I moved that logic into
PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.

Back-patch to 9.6 where parallel query was added.  In principle
this'd be a good idea in 9.5 as well, but the risk/reward ratio
is not as good there, since lack of autovacuum is not a problem
during typical uses of smart shutdown.

Per report from Bharath Rupireddy.

Patch by me, reviewed by Thomas Munro

Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
2020-08-14 13:26:57 -04:00
Tom Lane
c6d43ffab3 Replace use of sys_siglist[] with strsignal().
This commit back-patches the v12-era commits a73d08319, cc92cca43,
and 7570df0f3 into supported pre-v12 branches.  The net effect is to
eliminate our former dependency on the never-standard sys_siglist[]
array, instead using POSIX-standard strsignal(3).

What motivates doing this now is that glibc just removed sys_siglist[]
from the set of symbols available to newly-built programs.  While our
code can survive without sys_siglist[], it then fails to print any
description of the signal that killed a child process, which is a
non-negligible loss of friendliness.  We can expect that people will
be wanting to build the back branches on platforms that include this
change, so we need to do something.

Since strsignal(3) has existed for quite a long time, and we've not
had any trouble with these patches so far in v12, it seems safe to
back-patch into older branches.

Discussion: https://postgr.es/m/3179114.1594853308@sss.pgh.pa.us
2020-07-15 22:05:12 -04:00
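
A quick standalone illustration of the replacement: strsignal(3) yields
the same human-readable description without relying on the non-standard
array.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        /* Formerly: sys_siglist[SIGSEGV] (never standard, now removed
         * from glibc for newly-built programs).  The POSIX equivalent: */
        printf("child killed by signal %d: %s\n",
               SIGSEGV, strsignal(SIGSEGV));
        return 0;
    }
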
Tom Lane
7ea20a2bc6 Avoid failure if autovacuum tries to access a just-dropped temp namespace.
Such an access became possible when commit 246a6c8f7 added more
aggressive cleanup of orphaned temp relations by autovacuum.
Since autovacuum's snapshot might be slightly stale, it could
attempt to access an already-dropped temp namespace, resulting in
an assertion failure or null-pointer dereference.  (In practice,
since we don't drop temp namespaces automatically but merely
recycle them, this situation could only arise if a superuser does
a manual drop of a temp namespace.  Still, that should be allowed.)

The core of the bug, IMO, is that isTempNamespaceInUse and its callers
failed to think hard about whether to treat "temp namespace isn't there"
differently from "temp namespace isn't in use".  In hopes of forestalling
future mistakes of the same ilk, replace that function with a new one
checkTempNamespaceStatus, which makes the same tests but returns a
three-way enum rather than just a bool.  isTempNamespaceInUse is gone
entirely in HEAD; but just in case some external code is relying on it,
keep it in the back branches, as a bug-compatible wrapper around the
new function.

Per report originally from Prabhat Kumar Sahu, investigated by Mahendra
Singh and Michael Paquier; the final form of the patch is my fault.
This replaces the failed fix attempt in a052f6cbb.

Backpatch as far as v11, as 246a6c8f7 was.

Discussion: https://postgr.es/m/CAKYtNAr9Zq=1-ww4etHo-VCC-k120YxZy5OS01VkaLPaDbv2tg@mail.gmail.com
2020-02-28 20:28:34 -05:00
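
The three-way result looks roughly like the sketch below (enum and
constant names per the commit message; the comments are paraphrased, so
consult namespace.h for the authoritative definition):

    /*
     * Result of checkTempNamespaceStatus(), distinguishing "temp
     * namespace isn't there" from "temp namespace isn't in use" --
     * the distinction the old bool conflated.
     */
    typedef enum TempNamespaceStatus
    {
        TEMP_NAMESPACE_NOT_TEMP,  /* nonexistent, or not a temp namespace */
        TEMP_NAMESPACE_IDLE,      /* exists, but no session is attached */
        TEMP_NAMESPACE_IN_USE     /* belongs to some active session */
    } TempNamespaceStatus;
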
Michael Paquier
669feabfb7 Clean up properly error_context_stack in autovacuum worker on exception
Any callback set would have no meaning in the context of an exception.
As an autovacuum worker exits quickly in this context, this could only
be an issue within EmitErrorReport(), where for example the elog hook is
called.  That's unlikely to be a problem, but let's be clean and
consistent with other code paths handling exceptions.  This has been
present since 2909419, which introduced autovacuum.

Author: Ashwin Agrawal
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CALfoeisM+_+dgmAdAOHAu0k-ZpEHHqSSG=GRf3pKJGm8OqWX0w@mail.gmail.com
Backpatch-through: 9.4
2019-10-23 10:25:50 +09:00
Tom Lane
021065aac6 Check for too many postmaster children before spawning a bgworker.
The postmaster's code path for spawning a bgworker neglected to check
whether we already have the max number of live child processes.  That's
a bit hard to hit, since it would necessarily be a transient condition;
but if we do, AssignPostmasterChildSlot() fails causing a postmaster
crash, as seen in a report from Bhargav Kamineni.

To fix, invoke canAcceptConnections() in the bgworker code path, as we
do in the other code paths that spawn children.  Since we don't want
the same pmState tests in this case, add a child-process-type parameter
to canAcceptConnections() so that it can know what to do.

Back-patch to 9.5.  In principle the same hazard exists in 9.4, but the
code is enough different that this patch wouldn't quite fix it there.
Given the tiny usage of bgworkers in that branch it doesn't seem worth
creating a variant patch for it.

Discussion: https://postgr.es/m/18733.1570382257@sss.pgh.pa.us
2019-10-07 12:39:09 -04:00
Michael Paquier
af28744288 Fix thinko when ending progress report for a backend
The logic for ending progress reporting on a backend entry, introduced
by b6fb647, made callers of pgstat_progress_end_command() do some extra
work when track_activities is enabled: the progress fields in the
backend entry were reset even if no command had been started for
reporting.

This commit resets the fields only if a command was registered for
progress reporting, and only if track_activities is enabled.

Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCry_vJ0E-m5oxJXGL3pnos-xYGCzF95rK5Bbi3Uf-rpA@mail.gmail.com
Backpatch-through: 9.6
2019-09-04 15:46:49 +09:00
Tom Lane
eb97242c2f Rearrange pgstat_bestart() to avoid failures within its critical section.
We long ago decided to design the shared PgBackendStatus data structure to
minimize the cost of writing status updates, which means that writers just
have to increment the st_changecount field twice.  That isn't hooked into
any sort of resource management mechanism, which means that if something
were to throw error between the two increments, the st_changecount field
would be left odd indefinitely.  That would cause readers to lock up.
Now, since it's also a bad idea to leave the field odd for longer than
absolutely necessary (because readers will spin while we have it set),
the expectation was that we'd treat these segments like spinlock critical
sections, with only short, more or less straight-line, code in them.

That was fine as originally designed, but commit 9029f4b37 broke it
by inserting a significant amount of non-straight-line code into
pgstat_bestart(), code that is very capable of throwing errors, not to
mention taking a significant amount of time during which readers will spin.
We have a report from Neeraj Kumar of readers actually locking up, which
I suspect was due to an encoding conversion error in X509_NAME_to_cstring,
though conceivably it was just a garden-variety OOM failure.

Subsequent commits have loaded even more dubious code into pgstat_bestart's
critical section (and commit fc70a4b0d deserves some kind of booby prize
for managing to miss the critical section entirely, although the negative
consequences seem minimal given that the PgBackendStatus entry should be
seen by readers as inactive at that point).

The right way to fix this mess seems to be to compute all these values
into a local copy of the process' PgBackendStatus struct, and then just
copy the data back within the critical section proper.  This plan can't
be implemented completely cleanly because of the struct's heavy reliance
on out-of-line strings, which we must initialize separately within the
critical section.  But still, the critical section is far smaller and
safer than it was before.

In hopes of forestalling future errors of the same ilk, rename the
macros for st_changecount management to make it more apparent that
the writer-side macros create a critical section.  And to prevent
the worst consequences if we nonetheless manage to mess it up anyway,
adjust those macros so that they really are a critical section, ie
they now bump CritSectionCount.  That doesn't add much overhead, and
it guarantees that if we do somehow throw an error while the counter
is odd, it will lead to PANIC and a database restart to reset shared
memory.

Back-patch to 9.5 where the problem was introduced.

In HEAD, also fix an oversight in commit b0b39f72b: it failed to teach
pgstat_read_current_status to copy st_gssstatus data from shared memory to
local memory.  Hence, subsequent use of that data within the transaction
would potentially see changing data that it shouldn't see.

Discussion: https://postgr.es/m/CAPR3Wj5Z17=+eeyrn_ZDG3NQGYgMEOY6JV6Y-WRRhGgwc16U3Q@mail.gmail.com
2019-05-11 21:27:13 -04:00
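
The writer-side discipline is easier to see in miniature.  A sketch of
the protocol with simplified types (the real code uses the renamed
PGSTAT_BEGIN/END_WRITE_ACTIVITY macros, per-field handling, and memory
barriers, all omitted here): everything fallible or slow happens before
the counter goes odd, and readers retry whenever they observe an odd or
changing count.

    #include <stdio.h>
    #include <string.h>

    /* Simplified stand-in for PgBackendStatus. */
    typedef struct
    {
        volatile int st_changecount;    /* odd while an update is in flight */
        char         st_activity[64];
    } Status;

    static void
    report_activity(Status *st, const char *activity)
    {
        char local[sizeof(st->st_activity)];

        /* Compute anything that can fail OUTSIDE the critical section. */
        snprintf(local, sizeof(local), "%s", activity);

        st->st_changecount++;           /* begin: count becomes odd */
        /* short, straight-line copying only; nothing here may throw */
        memcpy(st->st_activity, local, sizeof(local));
        st->st_changecount++;           /* end: count is even again */
    }
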
Tom Lane
f8642fb178 Fix some minor postmaster-state-machine issues.
In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based
on pmState.  The restriction is unnecessary (PostmasterStateMachine
should work in any state), not future-proof (since it makes too many
assumptions about why the signal might be sent), and broken even today
because a race condition can make it necessary to respond to the signal
in PM_WAIT_READONLY state.  The race condition seems unlikely, but
if it did happen, a hot-standby postmaster could fail to shut down
after receiving a smart-shutdown request.

In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag
if the fork attempt fails.  Leaving it set allows us to try
again in future iterations of the postmaster idle loop.  (The startup
process would eventually send a fresh request signal, but this change
may allow us to retry the fork sooner.)

Remove an obsolete comment and unnecessary test in
PostmasterStateMachine's handling of PM_SHUTDOWN_2 state.  It's not
possible to have a live walreceiver in that state, and AFAICT has not
been possible since commit 5e85315ea.  This isn't a live bug, but the
false comment is quite confusing to readers.

In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that
we don't uselessly perform stat() calls that we're going to ignore the
results of.

Add some comments clarifying the behavior of MaybeStartWalReceiver;
I very nearly rearranged it in a way that'd reintroduce the race
condition fixed in e5d494d78.  Mea culpa for not commenting that
properly at the time.

Back-patch to all supported branches.  The PMSIGNAL_ADVANCE_STATE_MACHINE
change is the only one of even minor significance, but we might as well
keep this code in sync across branches.

Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us
2019-04-24 14:15:44 -04:00
Noah Misch
7ef2b313e6 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer stop if shmat() of an old segment fails with EACCES.  A
postmaster will no longer recycle segments pertaining to other data
directories.  That's good for production, but it's bad for integration
tests that crash a postmaster and immediately delete its data directory.
Such a test now leaks a segment indefinitely.  No "make check-world"
test does that.  win32_shmem.c already avoided all these problems.  In
9.6 and later, enhance PostgresNode to facilitate testing.  Back-patch
to 9.4 (all supported versions).

Reviewed (in earlier versions) by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20190408064141.GA2016666@rfd.leadboat.com
2019-04-12 22:36:42 -07:00
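
The core "attached process" probe can be sketched with the System V API
(standalone and much simplified; the real check also verifies that the
segment belongs to the current data directory):

    #include <stdbool.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    /*
     * Is the segment recorded in postmaster.pid still in use?  A nonzero
     * attach count means some process -- possibly an orphaned backend --
     * is still mapped into it, so starting a new postmaster against the
     * same data directory would risk corruption.
     */
    static bool
    segment_in_use(int shmid)
    {
        struct shmid_ds ds;

        if (shmctl(shmid, IPC_STAT, &ds) < 0)
            return false;       /* segment gone, or not ours to inspect */
        return ds.shm_nattch > 0;
    }
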
Amit Kapila
036f7d3782 Avoid counting transaction stats for parallel worker cooperating transaction.

The transaction that a parallel worker initiates to cooperate with the
actual transaction started by the main backend to complete query
execution should not be counted as a separate transaction.  The other
internal transactions started and committed by the parallel worker are
still counted as separate transactions, as that is what we do in other
places like autovacuum.

This will partially fix the bloat in transaction stats due to
additional transactions performed by parallel workers.  For a complete
fix, we need to decide how we want to show all the transactions that are
started internally for various operations, and that is a matter for a
separate patch.

Reported-by: Haribabu Kommi
Author: Haribabu Kommi
Reviewed-by: Amit Kapila, Jamison Kirk and Rahila Syed
Backpatch-through: 9.6
Discussion: https://postgr.es/m/CAJrrPGc9=jKXuScvNyQ+VNhO0FZk7LLAShAJRyZjnedd2D61EQ@mail.gmail.com
2019-04-10 08:36:42 +05:30
Noah Misch
e45a8ff871 Avoid "could not reattach" by providing space for concurrent allocation.
We've long had reports of intermittent "could not reattach to shared
memory" errors on Windows.  Buildfarm member dory fails that way when
PGSharedMemoryReAttach() execution overlaps with creation of a thread
for the process's "default thread pool".  Fix that by providing a second
region to receive asynchronous allocations that would otherwise intrude
into UsedShmemSegAddr.  In pgwin32_ReserveSharedMemoryRegion(), stop
trying to free reservations landing at incorrect addresses; the caller's
next step has been to terminate the affected process.  Back-patch to 9.4
(all supported versions).

Reviewed by Tom Lane.  He also did much of the prerequisite research;
see commit bcbf2346d6.

Discussion: https://postgr.es/m/20190402135442.GA1173872@rfd.leadboat.com
2019-04-08 21:39:03 -07:00
Noah Misch
392ea22e9b Revert "Consistently test for in-use shared memory."
This reverts commits 2f932f71d9,
16ee6eaf80 and
6f0e190056.  The buildfarm has revealed
several bugs.  Back-patch like the original commits.

Discussion: https://postgr.es/m/20190404145319.GA1720877@rfd.leadboat.com
2019-04-05 00:00:55 -07:00
Noah Misch
b2307f8e31 Consistently test for in-use shared memory.
postmaster startup scrutinizes any shared memory segment recorded in
postmaster.pid, exiting if that segment matches the current data
directory and has an attached process.  When the postmaster.pid file was
missing, a starting postmaster used weaker checks.  Change to use the
same checks in both scenarios.  This increases the chance of a startup
failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1
postmaster.pid` && rm postmaster.pid && pg_ctl -w start".  A postmaster
will no longer recycle segments pertaining to other data directories.
That's good for production, but it's bad for integration tests that
crash a postmaster and immediately delete its data directory.  Such a
test now leaks a segment indefinitely.  No "make check-world" test does
that.  win32_shmem.c already avoided all these problems.  In 9.6 and
later, enhance PostgresNode to facilitate testing.  Back-patch to 9.4
(all supported versions).

Reviewed by Daniel Gustafsson and Kyotaro HORIGUCHI.

Discussion: https://postgr.es/m/20130911033341.GD225735@tornado.leadboat.com
2019-04-03 17:03:50 -07:00
Michael Paquier
7d7435c5c5 Make current_logfiles use permissions assigned to files in data directory
Since its introduction in 19dc233c, current_logfiles has been assigned
the same permissions as a log file, which can be enforced with
log_file_mode.  This setup can lead to incompatibility problems with
group access permissions, as current_logfiles is not located in the log
directory but at the root of the data folder.  Hence, if group
permissions are used but log_file_mode is more restrictive, a backup
performed by a user in the group with read access could fail even if the
log directory is located outside of the data folder.

Per discussion with the folks mentioned below, we have concluded that
current_logfiles should not be treated as a log file, since it only
stores metadata related to log files, and that it should use the same
permissions as all other files in the data directory.  This solution has
the merit of being simple and fixes all the interaction problems between
group access and log_file_mode.

Author: Haribabu Kommi
Reviewed-by: Stephen Frost, Robert Haas, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/CAJrrPGcEotF1P7AWoeQyD3Pqr-0xkQg_Herv98DjbaMj+naozw@mail.gmail.com
Backpatch-through: 11, where group access has been added.
2019-03-24 21:01:10 +09:00
Tom Lane
cba8fc6882 Make checkpoint requests more robust.
Commit 6f6a6d8b1 introduced a delay of up to 2 seconds if we're trying
to request a checkpoint but the checkpointer hasn't started yet (or,
much less likely, our kill() call fails).  However buildfarm experience
shows that that's not quite enough for slow or heavily-loaded machines.
There's no good reason to assume that the checkpointer won't start
eventually, so we may as well make the timeout much longer, say 60 sec.

However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad
idea to be waiting at all, much less for as long as 60 sec.  We can
remove the need for that, and make this whole thing more robust, by
adjusting the code so that the existence of a pending checkpoint
request is clear from the contents of shared memory, and making sure
that the checkpointer process will notice it at startup even if it did
not get a signal.  In this way there's no need for a non-CHECKPOINT_WAIT
call to wait at all; if it can't send the signal, it can nonetheless
assume that the checkpointer will eventually service the request.

A potential downside of this change is that "kill -INT" on the checkpointer
process is no longer enough to trigger a checkpoint, should anyone be
relying on something so hacky.  But there's no obvious reason to do it
like that rather than issuing a plain old CHECKPOINT command, so we'll
assume that nobody is.  There doesn't seem to be a way to preserve this
undocumented quasi-feature without introducing race conditions.

Since a principal reason for messing with this is to prevent intermittent
buildfarm failures, back-patch to all supported branches.

Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us
2019-03-19 12:49:27 -04:00
Alvaro Herrera
630de1131d Report correct name in autovacuum "work items" activity
We were reporting the database name instead of the relation name to
pg_stat_activity.  Repair.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20190220185552.GR28750@telsasoft.com
2019-02-22 13:00:15 -03:00
Michael Paquier
6afea53c30 Fix typos in documentation and for one wait event
These were found while cross-checking for the use of unique words in
the documentation; also, a wait event was not being generated in a way
consistent with what the documentation described.

Author: Alexander Lakhin
Discussion: https://postgr.es/m/9b5a3a85-899a-ae62-dbab-1e7943aa5ab1@gmail.com
2019-01-15 08:47:08 +09:00
Michael Paquier
a016f59d59 Prioritize history files when archiving
At the end of recovery, the post-promotion process creates a new
history file, followed by the last partial segment of the previous
timeline.  Depending on the timing, the archiver would first try to
archive the last partial segment and only then the history file.  This
can delay the detection of a newly taken timeline, particularly if the
last partial segment takes a long time to transfer, since that delays
the moment the history file of the new timeline gets archived.  If
multiple instances look at archives in the same location, this can cause
a promoted standby to pick a timeline that is already taken.

This commit changes the order of archiving so that history files are
archived with priority over other file types.  This reduces the
likelihood of the same timeline being taken twice (though it does not
shrink the window to zero), and it makes the archiver behave more
consistently with the startup process doing its post-promotion business.

Author: David Steele
Reviewed-by: Michael Paquier, Kyotaro Horiguchi
Discussion: https://postgr.es/m/929068cf-69e1-bba2-9dc0-e05986aed471@pgmasters.net
Backpatch-through: 9.5
2018-12-24 20:25:49 +09:00
Tom Lane
b1894a6076 Improve detection of child-process SIGPIPE failures.
Commit ffa4cbd62 added logic to detect SIGPIPE failure of a COPY child
process, but it only worked correctly if the SIGPIPE occurred in the
immediate child process.  Depending on the shell in use and the
complexity of the shell command string, we might instead get back
an exit code of 128 + SIGPIPE, representing a shell error exit
reporting SIGPIPE in the child process.

We could just hack up ClosePipeToProgram() to add the extra case,
but it seems like this is a fairly general issue deserving a more
general and better-documented solution.  I chose to add a couple
of functions in src/common/wait_error.c, which is a natural place
to know about wait-result encodings, that will test for either a
specific child-process signal type or any child-process signal failure.
Then, adjust other places that were doing ad-hoc tests of this type
to use the common functions.

In RestoreArchivedFile, this fixes a race condition affecting whether
the process will report an error or just silently proc_exit(1): before,
that depended on whether the intermediate shell got SIGTERM'd itself
or reported a child process failing on SIGTERM.

Like the previous patch, back-patch to v10; we could go further
but there seems no real need to.

Per report from Erik Rijkers.

Discussion: https://postgr.es/m/f3683f87ab1701bea5d86a7742b22432@xs4all.nl
2018-12-16 14:32:14 -05:00
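
A sketch of the general test (the function name is illustrative;
PostgreSQL's actual helpers live in src/common/wait_error.c): both
encodings of a SIGPIPE death must be recognized.

    #include <signal.h>
    #include <stdbool.h>
    #include <sys/wait.h>

    /*
     * Does this wait(2)/pclose(3) status represent death by SIGPIPE?
     * Case 1: the immediate child was killed by the signal itself.
     * Case 2: an intermediate shell exited with 128 + SIGPIPE to report
     *         that a process of its own died of the signal.
     */
    static bool
    exited_on_sigpipe(int exitstatus)
    {
        if (WIFSIGNALED(exitstatus) && WTERMSIG(exitstatus) == SIGPIPE)
            return true;
        if (WIFEXITED(exitstatus) && WEXITSTATUS(exitstatus) == 128 + SIGPIPE)
            return true;
        return false;
    }
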
Michael Paquier
35622f7d32 Stop bgworkers during fast shutdown with postmaster in startup phase
When the postmaster enters its PM_STARTUP phase, it immediately starts
background workers that use BgWorkerStart_PostmasterStart mode.  This
caused problems for a fast shutdown, because the postmaster forgot to
send SIGTERM to the already-started background workers.  Smart and
immediate shutdowns handled this correctly; fast shutdown was the only
mode that missed it.

Author: Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mvnD8+DZUfzpi50DoaDfZRDfd7S=gwj5vU9GYn8UvHkA@mail.gmail.com
Backpatch-through: 9.5
2018-08-29 17:10:13 -07:00
Tom Lane
f8fc5f5f50 Make syslogger more robust against failures in opening CSV log files.
The previous coding figured it'd be good enough to postpone opening
the first CSV log file until we got a message we needed to write there.
This is unsafe, though, because if the open fails we end up in infinite
recursion trying to report the failure.  Instead make the CSV log file
management code look as nearly as possible like the longstanding logic
for the stderr log file.  In particular, open it immediately at postmaster
startup (if enabled), or when we get a SIGHUP in which we find that
log_destination has been changed to enable CSV logging.

It seems OK to fail if a postmaster-start-time open attempt fails, as
we've long done for the stderr log file.  But we can't die if we fail
to open a CSV log file during SIGHUP, so we're still left with a problem.
In that case, write any output meant for the CSV log file to the stderr
log file.  (This will also cover race-condition cases in which backends
send CSV log data before or after we have the CSV log file open.)

This patch also fixes an ancient oversight that, if CSV logging was
turned off during a SIGHUP, we never actually closed the last CSV
log file.

In passing, remember to reset whereToSendOutput = DestNone during syslogger
start, since (unlike all other postmaster children) it's forked before the
postmaster has done that.  This made for a platform-dependent difference
in error reporting behavior between the syslogger and other children:
except on Windows, it'd report problems to the original postmaster stderr
as well as the normal error log file(s).  It's barely possible that that
was intentional at some point; but it doesn't seem likely to be desirable
in production, and the platform dependency definitely isn't desirable.

Per report from Alexander Kukushkin.  It's been like this for a long time,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/CAFh8B==iLUD_gqC-dAENS0V+kVrCeGiKujtKqSQ7++S-caaChw@mail.gmail.com
2018-08-26 14:21:55 -04:00
Tom Lane
d7ed4eea53 Clean up assorted misuses of snprintf()'s result value.
Fix a small number of places that were testing the result of snprintf()
but doing so incorrectly.  The right test for buffer overrun, per C99,
is "result >= bufsize" not "result > bufsize".  Some places were also
checking for failure with "result == -1", but the standard only says
that a negative value is delivered on failure.

(Note that this only makes these places correct if snprintf() delivers
C99-compliant results.  But at least now these places are consistent
with all the other places where we assume that.)

Also, make psql_start_test() and isolation_start_test() check for
buffer overrun while constructing their shell commands.  There seems
like a higher risk of overrun, with more severe consequences, here
than there is for the individual file paths that are made elsewhere
in the same functions, so this seemed like a worthwhile change.

Also fix guc.c's do_serialize() to initialize errno = 0 before
calling vsnprintf.  In principle, this should be unnecessary because
vsnprintf should have set errno if it returns a failure indication ...
but the other two places this coding pattern is cribbed from don't
assume that, so let's be consistent.

These errors are all very old, so back-patch as appropriate.  I think
that only the shell command overrun cases are even theoretically
reachable in practice, but there's not much point in erroneous error
checks.

Discussion: https://postgr.es/m/17245.1534289329@sss.pgh.pa.us
2018-08-15 16:29:32 -04:00
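
The correct and incorrect tests, side by side in a small standalone
sketch (the function name is illustrative):

    #include <stdio.h>

    /* Returns 0 on success, -1 on failure or truncation. */
    static int
    build_path(char *buf, size_t bufsize, const char *dir, const char *file)
    {
        int n = snprintf(buf, bufsize, "%s/%s", dir, file);

        if (n < 0)                  /* any negative value means failure, */
            return -1;              /* not just -1                        */
        if ((size_t) n >= bufsize)  /* ">= bufsize" catches truncation;   */
            return -1;              /* "> bufsize" misses n == bufsize    */
        return 0;
    }
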
Michael Paquier
943576bddc Make autovacuum more aggressive to remove orphaned temp tables
Commit dafa084, added in 10, made the removal of orphaned temporary
tables more aggressive.  This commit goes one step further by adding a
flag in each backend's MyProc that tracks the temporary namespace
currently in use.  The flag is set when the namespace gets created and
can be reset if the temporary namespace was created in a transaction or
sub-transaction that is aborted.  The flag assignment is assumed to be
atomic, so it can be done in a lock-less fashion like other flags
already present in PGPROC, such as databaseId or backendId.  In
addition, the fact that the temporary namespace and tables created
remain locked until the creating transaction commits acts as a barrier
for other backends.

Autovacuum uses this new flag to discard orphaned tables more
aggressively: it additionally checks the database a backend is connected
to as well as whether its temporary namespace is in use, removing
orphaned temporary relations even if a backend reuses the same slot as
one that created temporary relations in a past session.

The base idea of this patch comes from Robert Haas; the first version
was written by Tsunakawa Takayuki and then heavily reviewed by me.

Author: Tsunakawa Takayuki
Reviewed-by: Michael Paquier, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8A4DC6@G01JPEXMBYT05
Backpatch: 11-, as PGPROC gains a new flag and we don't want silent ABI
breakages on already released versions.
2018-08-13 11:49:12 +02:00
Heikki Linnakangas
79f17d45e8 Don't run atexit callbacks in quickdie signal handlers.
exit() is not async-signal safe. Even if the libc implementation is, 3rd
party libraries might have installed unsafe atexit() callbacks. After
receiving SIGQUIT, we really just want to exit as quickly as possible, so
we don't really want to run the atexit() callbacks anyway.

The original report by Jimmy Yih was a self-deadlock in startup_die().
However, this patch doesn't address that scenario; the signal handling
while waiting for the startup packet is more complicated. But at least this
alleviates similar problems in the SIGQUIT handlers, like that reported
by Asim R P later in the same thread.

Backpatch to 9.3 (all supported versions).

Discussion: https://www.postgresql.org/message-id/CAOMx_OAuRUHiAuCg2YgicZLzPVv5d9_H4KrL_OFsFP%3DVPekigA%40mail.gmail.com
2018-08-08 19:10:35 +03:00
Tom Lane
1b5438ec2a Fix incorrect initialization of BackendActivityBuffer.
Since commit c8e8b5a6e, this has been zeroed out using the wrong length.
In practice the length would always be too small, leading to not zeroing
the whole buffer rather than clobbering additional memory; and that's
pretty harmless, both because shmem would likely start out as zeroes
and because we'd reinitialize any given entry before use.  Still,
it's bogus, so fix it.

Reported by Petru-Florin Mihancea (bug #15312)

Discussion: https://postgr.es/m/153363913073.1303.6518849192351268091@wrigleys.postgresql.org
2018-08-07 16:00:55 -04:00
Tom Lane
a7a7387575 Further improve code for probing the availability of ARM CRC instructions.
Andrew Gierth pointed out that commit 1c72ec6f4 would yield the wrong
answer on big-endian ARM systems, because the data being CRC'd would be
different.  To fix that, and avoid the rather unsightly hard-wired
constant, simply compare the hardware and software implementations'
results.

While we're at it, also log the resulting decision at DEBUG1, and error
out if the hw and sw results unexpectedly differ.  Also, since this
file must compile for both frontend and backend, avoid incorrect
dependencies on backend-only headers.

In passing, add a comment to postmaster.c about when the CRC function
pointer will get initialized.

Thomas Munro, based on complaints from Andrew Gierth and Tom Lane

Discussion: https://postgr.es/m/HE1PR0801MB1323D171938EABC04FFE7FA9E3110@HE1PR0801MB1323.eurprd08.prod.outlook.com
2018-05-03 11:32:57 -04:00
Tom Lane
9cb7db3f0c In AtEOXact_Files, complain if any files remain unclosed at commit.
This change makes this module act more like most of our other low-level
resource management modules.  It's a caller error if something is not
explicitly closed by the end of a successful transaction, so issue
a WARNING about it.  This would not actually have caught the file leak
bug fixed in commit 231bcd080, because that was in a transaction-abort
path; but it still seems like a good, and pretty cheap, cross-check.

Discussion: https://postgr.es/m/152056616579.4966.583293218357089052@wrigleys.postgresql.org
2018-04-28 17:45:02 -04:00
Heikki Linnakangas
811969b218 Allocate enough shared string memory for stats of auxiliary processes.
This fixes a bug whereby the st_appname, st_clienthostname, and
st_activity_raw fields for auxiliary processes point beyond the end of
their respective shared memory segments. As a result, the application_name
of a backend might show up as the client hostname of an auxiliary process.

Backpatch to v10, where this bug was introduced, when the auxiliary
processes were added to the array.

Author: Edmund Horner
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com
2018-04-11 23:39:49 +03:00
Heikki Linnakangas
a820b4c329 Make local copy of client hostnames in backend status array.
The other strings, application_name and query string, were snapshotted to
local memory in pgstat_read_current_status(), but we forgot to do that for
client hostnames. As a result, the client hostname would appear to change in
the local copy, if the client disconnected.

Backpatch to all supported versions.

Author: Edmund Horner
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com
2018-04-11 23:39:48 +03:00
Magnus Hagander
a228cc13ae Revert "Allow on-line enabling and disabling of data checksums"
This reverts the backend sides of commit 1fde38beaa.
I have, at least for now, left the pg_verify_checksums tool in place,
as this tool can be very valuable even without the rest of the patch,
and since it's a read-only tool that only runs when the cluster is down,
it should be a lot safer.
2018-04-09 19:03:42 +02:00
Stephen Frost
2b74022473 Fix EXEC BACKEND + Windows builds for group privs
Under EXEC_BACKEND we also need to go through the group privileges
setup, since we do support that on Unix-like systems, so add that to
SubPostmasterMain().

Under Windows, we need to simply return true from
GetDataDirectoryCreatePerm(), but that wasn't happening due to a missing
#else clause.

Per buildfarm.
2018-04-07 19:01:43 -04:00
Stephen Frost
c37b3d08ca Allow group access on PGDATA
Allow the cluster to be optionally init'd with read access for the
group.

This means a relatively non-privileged user can perform a backup of the
cluster without requiring write privileges, which enhances security.

The mode of PGDATA is used to determine whether group permissions are
enabled for directory and file creates.  This method was chosen as it's
simple and works well for the various utilities that write into PGDATA.

Changing the mode of PGDATA manually will not automatically change the
mode of all the files contained therein.  If the user would like to
enable group access on an existing cluster then changing the mode of all
the existing files will be required.  Note that pg_upgrade will
automatically change the mode of all migrated files if the new cluster
is init'd with the -g option.

Tests are included for the backend and all the utilities which operate
on the PG data directory to ensure that the correct mode is set based on
the data directory permissions.

Author: David Steele <david@pgmasters.net>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
Stephen Frost
da9b580d89 Refactor dir/file permissions
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).

Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).

Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.

Authors: David Steele <david@pgmasters.net>,
         Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
2018-04-07 17:45:39 -04:00
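
In outline, the new module amounts to the following sketch (variable
names and mode values per the commit message; MakePGDirectory's real
signature may differ):

    #include <sys/stat.h>
    #include <sys/types.h>

    /* Create modes consolidated in one place (cf. common/file_perm.c). */
    static const mode_t pg_file_create_mode = 0600;     /* rw------- */
    static const mode_t pg_dir_create_mode  = 0700;     /* rwx------ */

    /*
     * Wrapper in the spirit of MakePGDirectory(): every data-directory
     * mkdir() that previously used a hard-coded default mode now goes
     * through one auditable function.
     */
    static int
    make_pg_directory(const char *directoryName)
    {
        return mkdir(directoryName, pg_dir_create_mode);
    }
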
Magnus Hagander
1fde38beaa Allow on-line enabling and disabling of data checksums
This makes it possible to turn checksums on (and off again) in a live
cluster, without the previous need for a dump/reload or logical
replication.

Enabling checksums starts a background process, in the form of a
launcher/worker combination, that goes through the entire database and
recalculates checksums on each and every page.  Only when all pages have
been checksummed are checksums fully enabled in the cluster.  Any
failure of the process will revert to checksums off, and the process has
to be restarted.

This adds a new WAL record that indicates the state of checksums, so
the process works across replicated clusters.

Authors: Magnus Hagander and Daniel Gustafsson
Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin
2018-04-05 22:04:48 +02:00
Magnus Hagander
eed1ce72e1 Allow background workers to bypass datallowconn
This adds a "flags" argument to BackgroundWorkerInitializeConnection()
and BackgroundWorkerInitializeConnectionByOid().  For now only one flag,
BGWORKER_BYPASS_ALLOWCONN, is defined, which allows the worker to ignore
datallowconn.
2018-04-05 19:02:45 +02:00
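
Usage looks roughly like this in a worker's entry point (a sketch
against the in-server API; registration boilerplate and error handling
omitted, and the worker and database names are hypothetical):

    #include "postgres.h"
    #include "miscadmin.h"
    #include "postmaster/bgworker.h"

    /* Hypothetical worker that must reach a database with
     * datallowconn = false. */
    void
    my_worker_main(Datum main_arg)
    {
        BackgroundWorkerUnblockSignals();

        /* The new third argument: bypass the datallowconn check. */
        BackgroundWorkerInitializeConnection("template0",  /* dbname */
                                             NULL,         /* default user */
                                             BGWORKER_BYPASS_ALLOWCONN);

        /* ... perform maintenance work, then exit ... */
    }
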
Alvaro Herrera
484a4a08ab Log when a BRIN autosummarization request fails
Autovacuum's 'workitem' request queue is of limited size, so requests
can fail if they arrive more quickly than autovacuum can process them.
Emit a log message when this happens, to provide better visibility of
this.

Backpatch to 10.  While this represents an API change for
AutoVacuumRequestWork, that function is not yet prepared to deal with
external modules calling it, so there doesn't seem to be any risk (other
than log spam, that is).

Author: Masahiko Sawada
Reviewed-by: Fabrízio Mello, Ildar Musin, Álvaro Herrera
Discussion: https://postgr.es/m/CAD21AoB1HrQhp6_4rTyHN5kWEJCEsG8YzsjZNt-ctoXSn5Uisw@mail.gmail.com
2018-03-14 11:59:40 -03:00
Tom Lane
38f7831d70 Avoid holding AutovacuumScheduleLock while rechecking table statistics.
In databases with many tables, re-fetching the statistics takes some time,
so that this behavior seriously decreases the available concurrency for
multiple autovac workers.  There's discussion afoot about more complete
fixes, but a simple and back-patchable amelioration is to claim the table
and release the lock before rechecking stats.  If we find out there's no
longer a reason to process the table, re-taking the lock to un-claim the
table is cheap enough.

(This patch is quite old, but got lost amongst a discussion of more
aggressive fixes.  It's not clear when or if such a fix will be
accepted, but in any case it'd be unlikely to get back-patched.
Let's do this now so we have some improvement for the back branches.)

In passing, make the normal un-claim step take AutovacuumScheduleLock
not AutovacuumLock, since that is what is documented to protect the
wi_tableoid field.  This wasn't an actual bug in view of the fact that
readers of that field hold both locks, but it creates some concurrency
penalty against operations that need only AutovacuumLock.

Back-patch to all supported versions.

Jeff Janes

Discussion: https://postgr.es/m/26118.1520865816@sss.pgh.pa.us
2018-03-13 12:28:35 -04:00
Tom Lane
4e0c743c18 Fix cross-checking of ReservedBackends/max_wal_senders/MaxConnections.
We were independently checking ReservedBackends < MaxConnections and
max_wal_senders < MaxConnections, but because walsenders aren't allowed
to use superuser-reserved connections, that's really the wrong thing.
Correct behavior is to insist on ReservedBackends + max_wal_senders being
less than MaxConnections.  Fix the code and associated documentation.

This has been wrong for a long time, but since the situation probably
hardly ever arises in the field (especially pre-v10, when the default
for max_wal_senders was zero), no back-patch.

Discussion: https://postgr.es/m/28271.1520195491@sss.pgh.pa.us
2018-03-08 11:25:26 -05:00
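
The corrected check, in schematic form (a sketch; variable names follow
the commit message, and the real test lives in the server's startup
code):

    #include <stdbool.h>

    /*
     * Walsenders cannot consume superuser-reserved slots, so the two
     * pools must jointly fit below MaxConnections; checking each limit
     * against MaxConnections independently is not sufficient.
     */
    static bool
    connection_limits_ok(int reserved_backends, int max_wal_senders,
                         int max_connections)
    {
        return reserved_backends + max_wal_senders < max_connections;
    }
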
Noah Misch
582edc369c Empty search_path in Autovacuum and non-psql/pgbench clients.
This makes the client programs behave as documented regardless of the
connect-time search_path and regardless of user-created objects.  Today,
a malicious user with CREATE permission on a search_path schema can take
control of certain of these clients' queries and invoke arbitrary SQL
functions under the client identity, often a superuser.  This is
exploitable in the default configuration, where all users have CREATE
privilege on schema "public".

This changes behavior of user-defined code stored in the database, like
pg_index.indexprs and pg_extension_config_dump().  If they reach code
bearing unqualified names, "does not exist" or "no schema has been
selected to create in" errors might appear.  Users may fix such errors
by schema-qualifying affected names.  After upgrading, consider watching
server logs for these errors.

The --table arguments of src/bin/scripts clients have been lax; for
example, "vacuumdb -Zt pg_am\;CHECKPOINT" performed a checkpoint.  That
now fails, but for now, "vacuumdb -Zt 'pg_am(amname);CHECKPOINT'" still
performs a checkpoint.

Back-patch to 9.3 (all supported versions).

Reviewed by Tom Lane, though this fix strategy was not his first choice.
Reported by Arseniy Sharoglazov.

Security: CVE-2018-1058
2018-02-26 07:39:44 -08:00
Robert Haas
9da0cc3528 Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.

The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.

Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
2018-02-02 13:32:44 -05:00
Peter Eisentraut
c1869542b3 Use abstracted SSL API in server connection log messages
The existing "connection authorized" server log messages used OpenSSL
API calls directly, even though similar abstracted API calls exist.
Change to use the latter instead.

Change the function prototype for the functions that return the TLS
version and the cipher to return const char * directly instead of
copying into a buffer.  That makes them slightly easier to use.

Add bits= to the message.  psql shows that, so we might as well show the
same information on the client and server.

Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
2018-01-26 09:50:46 -05:00