Commit graph

1076 commits

Author SHA1 Message Date
Tom Lane
97122b8a8e Fix postmaster's handling of a startup-process crash.
Ordinarily, a failure (unexpected exit status) of the startup subprocess
should be considered fatal, so the postmaster should just close up shop
and quit.  However, if we sent the startup process a SIGQUIT or SIGKILL
signal, the failure is hardly "unexpected", and we should attempt restart;
this is necessary for recovery from ordinary backend crashes in hot-standby
scenarios.  I attempted to implement the latter rule with a two-line patch
in commit 442231d7f7, but it now emerges that
that patch was a few bricks shy of a load: it failed to distinguish the
case of a signaled startup process from the case where the new startup
process crashes before reaching database consistency.  That resulted in
infinitely respawning a new startup process only to have it crash again.

To handle this properly, we really must track whether we have sent the
*current* startup process a kill signal.  Rather than add yet another
ad-hoc boolean to the postmaster's state, I chose to unify this with the
existing RecoveryError flag into an enum tracking the startup process's
state.  That seems more consistent with the postmaster's general state
machine design.

Back-patch to 9.0, like the previous patch.
2015-07-09 13:22:23 -04:00
Alvaro Herrera
ab232db2c2 Fix thinko in comment (launcher -> worker) 2015-06-20 11:45:58 -03:00
Alvaro Herrera
41acde2df6 Clamp autovacuum launcher sleep time to 5 minutes
This avoids the problem that it might go to sleep for an unreasonable
amount of time in unusual conditions like the server clock moving
backwards an unreasonable amount of time.

(Simply moving the server clock forward again doesn't solve the problem
unless you wake up the autovacuum launcher manually, say by sending it
SIGHUP).

Per trouble report from Prakash Itnal in
https://www.postgresql.org/message-id/CAHC5u79-UqbapAABH2t4Rh2eYdyge0Zid-X=Xz-ZWZCBK42S0Q@mail.gmail.com

Analyzed independently by Haribabu Kommi and Tom Lane.
2015-06-19 12:44:34 -03:00
Noah Misch
439ff9b6b9 Prevent a double free by not reentering be_tls_close().
Reentering this function with the right timing caused a double free,
typically crashing the backend.  By synchronizing a disconnection with
the authentication timeout, an unauthenticated attacker could achieve
this somewhat consistently.  Call be_tls_close() solely from within
proc_exit_prepare().  Back-patch to 9.0 (all supported versions).

Benkocs Norbert Attila

Security: CVE-2015-3165
2015-05-18 10:02:37 -04:00
Alvaro Herrera
37dc228e8d Fix autovacuum launcher shutdown sequence
It was previously possible to have the launcher re-execute its main loop
before shutting down if some other signal was received or an error
occurred after getting SIGTERM, as reported by Qingqing Zhou.

While investigating, Tom Lane further noticed that if autovacuum had
been disabled in the config file, it would misbehave by trying to start
a new worker instead of bailing out immediately -- it would consider
itself as invoked in emergency mode.

Fix both problems by checking the shutdown flag in a few more places.
These problems have existed since autovacuum was introduced, so
backpatch all the way back.
2015-04-08 13:19:49 -03:00
Heikki Linnakangas
289592b23e Be more careful to not lose sync in the FE/BE protocol.
If any error occurred while we were in the middle of reading a protocol
message from the client, we could lose sync, and incorrectly try to
interpret a part of another message as a new protocol message. That will
usually lead to an "invalid frontend message" error that terminates the
connection. However, this is a security issue because an attacker might
be able to deliberately cause an error, inject a Query message in what's
supposed to be just user data, and have the server execute it.

We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other
operations that could ereport(ERROR) in the middle of processing a message,
but a query cancel interrupt or statement timeout could nevertheless cause
it to happen. Also, the V2 fastpath and COPY handling were not so careful.
It's very difficult to recover in the V2 COPY protocol, so we will just
terminate the connection on error. In practice, that's what happened
previously anyway, as we lost protocol sync.

To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set
whenever we're in the middle of reading a message. When it's set, we cannot
safely ERROR out and continue running, because we might've read only part
of a message. PqCommReadingMsg acts somewhat similarly to critical sections
in that if an error occurs while it's set, the error handler will force the
connection to be terminated, as if the error was FATAL. It's not
implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted
to PANIC in critical sections, because we want to be able to use
PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes
advantage of that to prevent an OOM error from terminating the connection.

To prevent unnecessary connection terminations, add a holdoff mechanism
similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel
interrupts, but still allow die interrupts. The rules on which interrupts
are processed when are now a bit more complicated, so refactor
ProcessInterrupts() and the calls to it in signal handlers so that the
signal handlers always call it if ImmediateInterruptOK is set, and
ProcessInterrupts() can decide to not do anything if the other conditions
are not met.

Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund.
Backpatch to all supported versions.

Security: CVE-2015-0244
2015-02-02 17:09:35 +02:00
Tom Lane
33b723538d Adjust "pgstat wait timeout" message to be a translatable LOG message.
Per discussion, change the log level of this message to be LOG not WARNING.
The main point of this change is to avoid causing buildfarm run failures
when the stats collector is exceptionally slow to respond, which it not
infrequently is on some of the smaller/slower buildfarm members.

This change does lose notice to an interactive user when his stats query
is looking at out-of-date stats, but the majority opinion (not necessarily
that of yours truly) is that WARNING messages would probably not get
noticed anyway on heavily loaded production systems.  A LOG message at
least ensures that the problem is recorded somewhere where bulk auditing
for the issue is possible.

Also, instead of an untranslated "pgstat wait timeout" message, provide
a translatable and hopefully more understandable message "using stale
statistics instead of current ones because stats collector is not
responding".  The original text was written hastily under the assumption
that it would never really happen in practice, which we now know to be
unduly optimistic.

Back-patch to all active branches, since we've seen the buildfarm issue
in all branches.
2015-01-19 23:01:41 -05:00
Andres Freund
7a70b0d31f Make logging_collector=on work with non-windows EXEC_BACKEND again.
Commit b94ce6e80 reordered postmaster's startup sequence so that the
tempfile directory is only cleaned up after all the necessary state
for pg_ctl is collected.  Unfortunately the chosen location is after
the syslogger has been started; which normally is fine, except for
!WIN32 EXEC_BACKEND builds, which pass information to children via
files in the temp directory.

Move the call to RemovePgTempFiles() to just before the syslogger has
started. That's the first child we fork.

Luckily EXEC_BACKEND is pretty much only used by endusers on windows,
which has a separate method to pass information to children. That
means the real world impact of this bug is very small.

Discussion: 20150113182344.GF12272@alap3.anarazel.de

Backpatch to 9.1, just as the previous commit was.
2015-01-14 00:14:53 +01:00
Noah Misch
5ca4e444cb On Darwin, detect and report a multithreaded postmaster.
Darwin --enable-nls builds use a substitute setlocale() that may start a
thread.  Buildfarm member orangutan experienced BackendList corruption
on account of different postmaster threads executing signal handlers
simultaneously.  Furthermore, a multithreaded postmaster risks undefined
behavior from sigprocmask() and fork().  Emit LOG messages about the
problem and its workaround.  Back-patch to 9.0 (all supported versions).
2015-01-07 22:41:49 -05:00
Alvaro Herrera
769d6815e5 Don't balance vacuum cost delay when per-table settings are in effect
When there are cost-delay-related storage options set for a table,
trying to make that table participate in the autovacuum cost-limit
balancing algorithm produces undesirable results: instead of using the
configured values, the global values are always used,
as illustrated by Mark Kirkwood in
http://www.postgresql.org/message-id/52FACF15.8020507@catalyst.net.nz

Since the mechanism is already complicated, just disable it for those
cases rather than trying to make it cope.  There are undesirable
side-effects from this too, namely that the total I/O impact on the
system will be higher whenever such tables are vacuumed.  However, this
is seen as less harmful than slowing down vacuum, because that would
cause bloat to accumulate.  Anyway, in the new system it is possible to
tweak options to get the precise behavior one wants, whereas with the
previous system one was simply hosed.

This has been broken forever, so backpatch to all supported branches.
This might affect systems where cost_limit and cost_delay have been set
for individual tables.
2014-10-03 13:01:27 -03:00
Tom Lane
c83aed34be Avoid wholesale autovacuuming when autovacuum is nominally off.
When autovacuum is nominally off, we will still launch autovac workers
to vacuum tables that are at risk of XID wraparound.  But after we'd done
that, an autovac worker would proceed to autovacuum every table in the
targeted database, if they meet the usual thresholds for autovacuuming.
This is at best pretty unexpected; at worst it delays response to the
wraparound threat.  Fix it so that if autovacuum is nominally off, we
*only* do forced vacuums and not any other work.

Per gripe from Andrey Zhidenkov.  This has been like this all along,
so back-patch to all supported branches.
2014-07-30 14:41:53 -04:00
Tom Lane
9601cb7bcc Fix unportable setvbuf() usage in initdb.
In yesterday's commit 2dc4f011fd, I tried
to force buffering of stdout/stderr in initdb to be what it is by
default when the program is run interactively on Unix (since that's how
most manual testing is done).  This tripped over the fact that Windows
doesn't support _IOLBF mode.  We dealt with that a long time ago in
syslogger.c by falling back to unbuffered mode on Windows.  Export that
solution in port.h and use it in initdb.

Back-patch to 8.4, like the previous commit.
2014-05-15 15:58:01 -04:00
Bruce Momjian
0b44914c21 Remove tabs after spaces in C comments
This was not changed in HEAD, but will be done later as part of a
pgindent run.  Future pgindent runs will also do this.

Report by Tom Lane

Backpatch through all supported branches, but not HEAD
2014-05-06 11:26:27 -04:00
Tom Lane
bac05d622d Use AF_UNSPEC not PF_UNSPEC in getaddrinfo calls.
According to the Single Unix Spec and assorted man pages, you're supposed
to use the constants named AF_xxx when setting ai_family for a getaddrinfo
call.  In a few places we were using PF_xxx instead.  Use of PF_xxx
appears to be an ancient BSD convention that was not adopted by later
standardization.  On BSD and most later Unixen, it doesn't matter much
because those constants have equivalent values anyway; but nonetheless
this code is not per spec.

In the same vein, replace PF_INET by AF_INET in one socket() call, which
wasn't even consistent with the other socket() call in the same function
let alone the remainder of our code.

Per investigation of a Cygwin trouble report from Marco Atzeri.  It's
probably a long shot that this will fix his issue, but it's wrong in
any case.
2014-04-16 13:21:31 -04:00
Bruce Momjian
966f015b60 check socket creation errors against PGINVALID_SOCKET
Previously, in some places, socket creation errors were checked for
negative values, which is not true for Windows because sockets are
unsigned.  This masked socket creation errors on Windows.

Backpatch through 9.0.  8.4 doesn't have the infrastructure to fix this.
2014-04-16 10:45:48 -04:00
Tom Lane
53463e2479 Block signals earlier during postmaster startup.
Formerly, we set up the postmaster's signal handling only when we were
about to start launching subprocesses.  This is a bad idea though, as
it means that for example a SIGINT arriving before that will kill the
postmaster instantly, perhaps leaving lockfiles, socket files, shared
memory, etc laying about.  We'd rather that such a signal caused orderly
postmaster termination including releasing of those resources.  A simple
fix is to move the PostmasterMain stanza that initializes signal handling
to an earlier point, before we've created any such resources.  Then, an
early-arriving signal will be blocked until we're ready to deal with it
in the usual way.  (The only part that really needs to be moved up is
blocking of signals, but it seems best to keep the signal handler
installation calls together with that; for one thing this ensures the
kernel won't drop any signals we wished to get.  The handlers won't get
invoked in any case until we unblock signals in ServerLoop.)

Per a report from MauMau.  He proposed changing the way "pg_ctl stop"
works to deal with this, but that'd just be masking one symptom not
fixing the core issue.

It's been like this since forever, so back-patch to all supported branches.
2014-04-05 18:16:14 -04:00
Tom Lane
029decfec6 Fix assorted issues in client host name lookup.
The code for matching clients to pg_hba.conf lines that specify host names
(instead of IP address ranges) failed to complain if reverse DNS lookup
failed; instead it silently didn't match, so that you might end up getting
a surprising "no pg_hba.conf entry for ..." error, as seen in bug #9518
from Mike Blackwell.  Since we don't want to make this a fatal error in
situations where pg_hba.conf contains a mixture of host names and IP
addresses (clients matching one of the numeric entries should not have to
have rDNS data), remember the lookup failure and mention it as DETAIL if
we get to "no pg_hba.conf entry".  Apply the same approach to forward-DNS
lookup failures, too, rather than treating them as immediate hard errors.

Along the way, fix a couple of bugs that prevented us from detecting an
rDNS lookup error reliably, and make sure that we make only one rDNS lookup
attempt; formerly, if the lookup attempt failed, the code would try again
for each host name entry in pg_hba.conf.  Since more or less the whole
point of this design is to ensure there's only one lookup attempt not one
per entry, the latter point represents a performance bug that seems
sufficient justification for back-patching.

Also, adjust src/port/getaddrinfo.c so that it plays as well as it can
with this code.  Which is not all that well, since it does not have actual
support for rDNS lookup, but at least it should return the expected (and
required by spec) error codes so that the main code correctly perceives the
lack of functionality as a lookup failure.  It's unlikely that PG is still
being used in production on any machines that require our getaddrinfo.c,
so I'm not excited about working harder than this.

To keep the code in the various branches similar, this includes
back-patching commits c424d0d105 and
1997f34db4 into 9.2 and earlier.

Back-patch to 9.1 where the facility for hostnames in pg_hba.conf was
introduced.
2014-04-02 17:11:31 -04:00
Tom Lane
e83bee8ddc Fix bugs in manipulation of PgBackendStatus.st_clienthostname.
Initialization of this field was not being done according to the
st_changecount protocol (it has to be done within the changecount increment
range, not outside).  And the test to see if the value should be reported
as null was wrong.  Noted while perusing uses of Port.remote_hostname.

This was wrong from the introduction of this code (commit 4a25bc145),
so back-patch to 9.1.
2014-04-01 21:30:14 -04:00
Tom Lane
2de9051868 Fix possible crashes due to using elog/ereport too early in startup.
Per reports from Andres Freund and Luke Campbell, a server failure during
set_pglocale_pgservice results in a segfault rather than a useful error
message, because the infrastructure needed to use ereport hasn't been
initialized; specifically, MemoryContextInit hasn't been called.
One known cause of this is starting the server in a directory it
doesn't have permission to read.

We could try to prevent set_pglocale_pgservice from using anything that
depends on palloc or elog, but that would be messy, and the odds of future
breakage seem high.  Moreover there are other things being called in main.c
that look likely to use palloc or elog too --- perhaps those things
shouldn't be there, but they are there today.  The best solution seems to
be to move the call of MemoryContextInit to very early in the backend's
real main() function.  I've verified that an elog or ereport occurring
immediately after that is now capable of sending something useful to
stderr.

I also added code to elog.c to print something intelligible rather than
just crashing if MemoryContextInit hasn't created the ErrorContext.
This could happen if MemoryContextInit itself fails (due to malloc
failure), and provides some future-proofing against someone trying to
sneak in new code even earlier in server startup.

Back-patch to all supported branches.  Since we've only heard reports of
this type of failure recently, it may be that some recent change has made
it more likely to see a crash of this kind; but it sure looks like it's
broken all the way back.
2014-01-11 16:35:34 -05:00
Tom Lane
a0c2492b95 Avoid updating our PgBackendStatus entry when track_activities is off.
The point of turning off track_activities is to avoid this reporting
overhead, but a thinko in commit 4f42b546fd
caused pgstat_report_activity() to perform half of its updates anyway.
Fix that, and also make sure that we clear all the now-disabled fields
when transitioning to the non-reporting state.
2013-04-03 14:13:34 -04:00
Tom Lane
a6e0cd7b76 Fix insecure parsing of server command-line switches.
An oversight in commit e710b65c1c allowed
database names beginning with "-" to be treated as though they were secure
command-line switches; and this switch processing occurs before client
authentication, so that even an unprivileged remote attacker could exploit
the bug, needing only connectivity to the postmaster's port.  Assorted
exploits for this are possible, some requiring a valid database login,
some not.  The worst known problem is that the "-r" switch can be invoked
to redirect the process's stderr output, so that subsequent error messages
will be appended to any file the server can write.  This can for example be
used to corrupt the server's configuration files, so that it will fail when
next restarted.  Complete destruction of database tables is also possible.

Fix by keeping the database name extracted from a startup packet fully
separate from command-line switches, as had already been done with the
user name field.

The Postgres project thanks Mitsumasa Kondo for discovering this bug,
Kyotaro Horiguchi for drafting the fix, and Noah Misch for recognizing
the full extent of the danger.

Security: CVE-2013-1899
2013-04-01 14:00:59 -04:00
Tom Lane
21ce40c8ea Reset OpenSSL randomness state in each postmaster child process.
Previously, if the postmaster initialized OpenSSL's PRNG (which it will do
when ssl=on in postgresql.conf), the same pseudo-random state would be
inherited by each forked child process.  The problem is masked to a
considerable extent if the incoming connection uses SSL encryption, but
when it does not, identical pseudo-random state is made available to
functions like contrib/pgcrypto.  The process's PID does get mixed into any
requested random output, but on most systems that still only results in 32K
or so distinct random sequences available across all Postgres sessions.
This might allow an attacker who has database access to guess the results
of "secure" operations happening in another session.

To fix, forcibly reset the PRNG after fork().  Each child process that has
need for random numbers from OpenSSL's generator will thereby be forced to
go through OpenSSL's normal initialization sequence, which should provide
much greater variability of the sequences.  There are other ways we might
do this that would be slightly cheaper, but this approach seems the most
future-proof against SSL-related code changes.

This has been assigned CVE-2013-1900, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.

Back-patch to all supported branches.

Marko Kreen
2013-03-27 18:50:25 -04:00
Tom Lane
89b81f0d8b Fix possible failure to send final transaction counts to stats collector.
Normally, we suppress sending a tabstats message to the collector unless
there were some actual table stats to send.  However, during backend exit
we should force out the message if there are any transaction commit/abort
counts to send, else the session's last few commit/abort counts will never
get reported at all.  We had logic for this, but the short-circuit test
at the top of pgstat_report_stat() ignored the "force" flag, with the
consequence that session-ending transactions that touched no database-local
tables would not get counted.  Seems to be an oversight in my commit
641912b4d1, which added the "force" flag.
That was back in 8.3, so back-patch to all supported versions.
2013-02-07 14:44:10 -05:00
Heikki Linnakangas
3cef2179e0 Also fix rotation of csvlog on Windows.
Backpatch to 9.2, like the previous fix.
2013-01-24 11:43:22 +02:00
Tom Lane
14ba9b11ea Fix failure to rotate postmaster log file for size reasons on Windows.
When we eliminated "unnecessary" wakeups of the syslogger process, we
broke size-based logfile rotation on Windows, because on that platform
data transfer is done in a separate thread.  While non-Windows platforms
would recheck the output file size after every log message, Windows only
did so when the control thread woke up for some other reason, which might
be quite infrequent.  Per bug #7814 from Tsunezumi.  Back-patch to 9.2
where the problem was introduced.

Jeff Janes
2013-01-23 22:08:14 -05:00
Tom Lane
8af60da9dd Don't launch new child processes after we've been told to shut down.
Once we've received a shutdown signal (SIGINT or SIGTERM), we should not
launch any more child processes, even if we get signals requesting such.
The normal code path for spawning backends has always understood that,
but the postmaster's infrastructure for hot standby and autovacuum didn't
get the memo.  As reported by Hari Babu in bug #7643, this could lead to
failure to shut down at all in some cases, such as when SIGINT is received
just before the startup process sends PMSIGNAL_RECOVERY_STARTED: we'd
launch a bgwriter and checkpointer, and then those processes would have no
idea that they ought to quit.  Similarly, launching a new autovacuum worker
would result in waiting till it finished before shutting down.

Also, switch the order of the code blocks in reaper() that detect startup
process crash versus shutdown termination.  Once we've sent it a signal,
we should not consider that exit(1) is surprising.  This is just a cosmetic
fix since shutdown occurs correctly anyway, but better not to log a phony
complaint about startup process crash.

Back-patch to 9.0.  Some parts of this might be applicable before that,
but given the lack of prior complaints I'm not going to worry too much
about older branches.
2012-11-21 15:18:43 -05:00
Tom Lane
89067bc16a Fix syslogger to not fail when log_rotation_age exceeds 2^31 milliseconds.
We need to avoid calling WaitLatch with timeouts exceeding INT_MAX.
Fortunately a simple clamp will do the trick, since no harm is done if
the wait times out before it's really time to rotate the log file.
Per bug #7670 (probably bug #7545 is the same thing, too).

In passing, fix bogus definition of log_rotation_age's maximum value in
guc.c --- it was numerically right, but only because MINS_PER_HOUR and
SECS_PER_MINUTE have the same value.

Back-patch to 9.2.  Before that, syslogger wasn't using WaitLatch.
2012-11-18 16:16:47 -05:00
Tom Lane
d7598aeea9 Close un-owned SMgrRelations at transaction end.
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to
live past transaction end.  This design allows the same SMgrRelation to be
used for blind writes of multiple blocks during a transaction, but ensures
that we don't hold onto such an SMgrRelation indefinitely.  Because an
SMgrRelation typically corresponds to open file descriptors at the fd.c
level, leaving it open when there's no corresponding relcache entry can
mean that we prevent the kernel from reclaiming deleted disk space.
(While CacheInvalidateSmgr messages usually fix that, there are cases
where they're not issued, such as DROP DATABASE.  We might want to add
some more sinval messaging for that, but I'd be inclined to keep this
type of logic anyway, since allowing VFDs to accumulate indefinitely
for blind-written relations doesn't seem like a good idea.)

This code replaces a previous attempt towards the same goal that proved
to be unreliable.  Back-patch to 9.1 where the previous patch was added.
2012-10-17 12:38:28 -04:00
Tom Lane
4d4005cb48 Split up process latch initialization for more-fail-soft behavior.
In the previous coding, new backend processes would attempt to create their
self-pipe during the OwnLatch call in InitProcess.  However, pipe creation
could fail if the kernel is short of resources; and the system does not
recover gracefully from a FATAL error right there, since we have armed the
dead-man switch for this process and not yet set up the on_shmem_exit
callback that would disarm it.  The postmaster then forces an unnecessary
database-wide crash and restart, as reported by Sean Chittenden.

There are various ways we could rearrange the code to fix this, but the
simplest and sanest seems to be to split out creation of the self-pipe into
a new function InitializeLatchSupport, which must be called from a place
where failure is allowed.  For most processes that gets called in
InitProcess or InitAuxiliaryProcess, but processes that don't call either
but still use latches need their own calls.

Back-patch to 9.1, which has only a part of the latch logic that 9.2 and
HEAD have, but nonetheless includes this bug.
2012-10-14 23:00:01 -04:00
Magnus Hagander
63e719ec51 Fix upper limit of superuser_reserved_connections, add limit for wal_senders
Should be limited to the maximum number of connections excluding
autovacuum workers, not including.

Add similar check for max_wal_senders, which should never be higher than
max_connections.
2012-08-10 14:51:57 +02:00
Tom Lane
63aba79c7f Fix syslogger so that log_truncate_on_rotation works in the first rotation.
In the original coding of the log rotation stuff, we did not bother to make
the truncation logic work for the very first rotation after postmaster
start (or after a syslogger crash and restart).  It just always appended
in that case.  It did not seem terribly important at the time, but we've
recently had two separate complaints from people who expected it to work
unsurprisingly.  (Both users tend to restart the postmaster about as often
as a log rotation is configured to happen, which is maybe not typical use,
but still...)  Since the initial log file is opened in the postmaster,
fixing this requires passing down some more state to the syslogger child
process.

It's always been like this, so back-patch to all supported branches.
2012-07-31 14:36:58 -04:00
Tom Lane
1e9326d6a3 Fix statistics breakage from bgwriter/checkpointer process split.
ForwardFsyncRequest() supposed that it could only be called in regular
backends, which used to be true; but since the splitup of bgwriter and
checkpointer, it is also called in the bgwriter.  We do not want to count
such calls in pg_stat_bgwriter.buffers_backend statistics, so fix things
so that they aren't.

(It's worth noting here that this implies an alarmingly large increase in
the expected amount of cross-process fsync request traffic, which may well
mean that the process splitup was not such a hot idea.)
2012-07-18 15:40:35 -04:00
Tom Lane
d843589e5a Fix management of pendingOpsTable in auxiliary processes.
mdinit() was misusing IsBootstrapProcessingMode() to decide whether to
create an fsync pending-operations table in the current process.  This led
to creating a table not only in the startup and checkpointer processes as
intended, but also in the bgwriter process, not to mention other auxiliary
processes such as walwriter and walreceiver.  Creation of the table in the
bgwriter is fatal, because it absorbs fsync requests that should have gone
to the checkpointer; instead they just sit in bgwriter local memory and are
never acted on.  So writes performed by the bgwriter were not being fsync'd
which could result in data loss after an OS crash.  I think there is no
live bug with respect to walwriter and walreceiver because those never
perform any writes of shared buffers; but the potential is there for
future breakage in those processes too.

To fix, make AuxiliaryProcessMain() export the current process's
AuxProcType as a global variable, and then make mdinit() test directly for
the types of aux process that should have a pendingOpsTable.  Having done
that, we might as well also get rid of the random bool flags such as
am_walreceiver that some of the aux processes had grown.  (Note that we
could not have fixed the bug by examining those variables in mdinit(),
because it's called from BaseInit() which is run by AuxiliaryProcessMain()
before entering any of the process-type-specific code.)

Back-patch to 9.2, where the problem was introduced by the split-up of
bgwriter and checkpointer processes.  The bogus pendingOpsTable exists
in walwriter and walreceiver processes in earlier branches, but absent
any evidence that it causes actual problems there, I'll leave the older
branches alone.
2012-07-18 15:28:17 -04:00
Tom Lane
4abcce8cab Improve coding around the fsync request queue.
In all branches back to 8.3, this patch fixes a questionable assumption in
CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are
no uninitialized pad bytes in the request queue structs.  This would only
cause trouble if (a) there were such pad bytes, which could happen in 8.4
and up if the compiler makes enum ForkNumber narrower than 32 bits, but
otherwise would require not-currently-planned changes in the widths of
other typedefs; and (b) the kernel has not uniformly initialized the
contents of shared memory to zeroes.  Still, it seems a tad risky, and we
can easily remove any risk by pre-zeroing the request array for ourselves.
In addition to that, we need to establish a coding rule that struct
RelFileNode can't contain any padding bytes, since such structs are copied
into the request array verbatim.  (There are other places that are assuming
this anyway, it turns out.)

In 9.1 and up, the risk was a bit larger because we were also effectively
assuming that struct RelFileNodeBackend contained no pad bytes, and with
fields of different types in there, that would be much easier to break.
However, there is no good reason to ever transmit fsync or delete requests
for temp files to the bgwriter/checkpointer, so we can revert the request
structs to plain RelFileNode, getting rid of the padding risk and saving
some marginal number of bytes and cycles in fsync queue manipulation while
we are at it.  The savings might be more than marginal during deletion of
a temp relation, because the old code transmitted an entirely useless but
nonetheless expensive-to-process ForgetRelationFsync request to the
background process, and also had the background process perform the file
deletion even though that can safely be done immediately.

In addition, make some cleanup of nearby comments and small improvements to
the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
2012-07-17 16:57:10 -04:00
Tom Lane
93f4d7f806 Support Linux's oom_score_adj API as well as the older oom_adj API.
The simplest way to handle this is just to copy-and-paste the relevant
code block in fork_process.c, so that's what I did. (It's possible that
something more complicated would be useful to packagers who want to work
with either the old or the new API; but at this point the number of such
people is rapidly approaching zero, so let's just get the minimal thing
done.)  Update relevant documentation as well.
2012-06-13 15:35:52 -04:00
Bruce Momjian
927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Simon Riggs
055c352abb After any checkpoint, close all smgr files handles in bgwriter 2012-06-01 09:24:53 +01:00
Simon Riggs
a297d64d92 Checkpointer starts before bgwriter to avoid missing fsync requests.
Noted while testing Hot Standby startup.
2012-06-01 08:25:17 +01:00
Simon Riggs
1ec6a2bbc9 Provide interim statistics while in mid-checkpoint.
Re-implements similar functionality in 9.1 and previously which
was removed during split of checkpointer and bgwriter.

Requested/spotted by Magnus Hagander
2012-06-01 08:19:06 +01:00
Peter Eisentraut
27314d32a8 Suppress -Wunused-result warning about write()
This is related to aa90e148ca, but this
code is only used under -DLINUX_OOM_ADJ, so it was apparently
overlooked then.
2012-05-27 22:35:01 +03:00
Tom Lane
b94ce6e807 Move postmaster's RemovePgTempFiles call to a less randomly chosen place.
There is no reason to do this as early as possible in postmaster startup,
and good reason not to do it until we have completely created the
postmaster's lock file, namely that it might contribute to pg_ctl thinking
that postmaster startup has timed out.  (This would require a rather
unusual amount of time to be spent scanning temp file directories, but we
have at least one field report of it happening reproducibly.)

Back-patch to 9.1.  Before that, pg_ctl didn't wait for additional info to
be added to the lock file, so it wasn't a problem.

Note that this is not a complete fix to the slow-start issue in 9.1,
because we still had identify_system_timezone being run during postmaster
start in 9.1.  But that's at least a reasonably well-defined delay, with
an easy workaround if needed, whereas the temp-files scan is not so
predictable and cannot be avoided.
2012-05-21 22:50:30 -04:00
Tom Lane
9b63e9869f In pgstat.c, use a timeout in WaitLatchOrSocket only on Windows.
We have no need for a timeout here really, but some broken products from
Redmond seem to lose FD_READ events occasionally, and waking up and
retrying the recv() is the only known way to work around that.  Perhaps
somebody will be motivated to figure out a better answer here; but not I.
2012-05-14 23:51:34 -04:00
Tom Lane
5a2bb06012 Revert "Add some temporary instrumentation to pgstat.c."
This reverts commit 7d88bb73f7.
That instrumentation has served its purpose.
2012-05-14 23:08:10 -04:00
Tom Lane
d461d0502b For testing purposes, reinsert a timeout in pgstat.c's wait call.
Test results from buildfarm members mastodon/narwhal (Windows Server 2003)
make it look like that platform just plain loses FD_READ events
occasionally, and the only reason our previous coding seemed to work was
that it timed out every couple of seconds and retried the whole operation.
Try to verify this by reinserting a finite timeout into the pgstat loop.
This isn't meant to be a permanent patch either, just to confirm or
disprove a theory.
2012-05-14 15:03:14 -04:00
Tom Lane
f1ca51549e Force pgwin32_recv into nonblock mode when called from pgstat.c.
This should get rid of the usage of pgwin32_waitforsinglesocket entirely,
and perhaps thereby remove the race condition that's evidently still
present on some versions of Windows.  The previous arrangement was a bit
unsafe anyway, since waiting at the recv() would not allow pgstat to notice
postmaster death.
2012-05-14 10:57:07 -04:00
Heikki Linnakangas
9e4637bf89 Update comments that became out-of-date with the PGXACT struct.
When the "hot" members of PGPROC were split off to separate PGXACT structs,
many PGPROC fields referred to in comments were moved to PGXACT, but the
comments were neglected in the commit. Mostly this is just a search/replace
of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for
prepared transactions changed more, making some of the comments totally
bogus.

Noah Misch
2012-05-14 10:28:55 +03:00
Tom Lane
7d88bb73f7 Add some temporary instrumentation to pgstat.c.
Log main-loop blocking events and the results of inquiry messages.
This is to get some clarity as to what's happening on those Windows
buildfarm members that still don't like the latch-ified stats collector.
This bulks up the postmaster log a tad, so I won't leave it in place for
long.
2012-05-13 21:11:31 -04:00
Tom Lane
966970ed63 Re-revert stats collector latch changes.
This reverts commit cb2f2873d6, restoring
the latch-ified stats collector logic.  We'll soon see if this works any
better on the Windows buildfarm machines.
2012-05-13 14:44:39 -04:00
Tom Lane
398b240151 Avoid unnecessary process wakeups in the log collector.
syslogger was coded to wake up once per second whether there was anything
useful to do or not.  As part of our campaign to reduce the server's idle
power consumption, change it to use a latch for waiting.  Now, in the
absence of any data to log or any signals to service, it will only wake up
at the programmed logfile rotation times (if any).
2012-05-12 19:21:54 -04:00
Tom Lane
d0c231d132 Cosmetic adjustments for postmaster's handling of checkpointer.
Correct some comments, order some operations a bit more consistently.
No functional changes.
2012-05-11 17:46:37 -04:00