false positives during Hot Standby conflict processing. Simple
patch to enhance conflict processing, following previous discussions.
Controlled by parameter minimize_standby_conflicts = on | off, with
default off allows measurement of performance impact to see whether
it should be set on all the time.
records for heap and btree. Minor change, mostly API changes to
pass through the required values. This is a simple change though
also provides the refactoring required for further enhancements
to conflict processing using the relOid. Changes only have effect
during Hot Standby.
restore_command, if the connection to the primary server is lost. This
ensures that the standby can recover automatically, if the connection is
lost for a long time and standby falls behind so much that the required
WAL segments have been archived and deleted in the master.
This also makes standby_mode useful without streaming replication; the
server will keep retrying restore_command every few seconds until the
trigger file is found. That's the same basic functionality pg_standby
offers, but without the bells and whistles.
To implement that, refactor the ReadRecord/FetchRecord functions. The
FetchRecord() function introduced in the original streaming replication
patch is removed, and all the retry logic is now in a new function called
XLogReadPage(). XLogReadPage() is now responsible for executing
restore_command, launching walreceiver, and waiting for new WAL to arrive
from primary, as required.
This also changes the life cycle of walreceiver. When launched, it now only
tries to connect to the master once, and exits if the connection fails, or
is lost during streaming for any reason. The startup process detects the
death, and re-launches walreceiver if necessary.
default of "plpgsql". This is more reasonable than it was when the DO patch
was written, because we have since decided that plpgsql should be installed
by default. Per discussion, having a parameter for this doesn't seem useful
enough to justify the risk of application breakage if the value is changed
unexpectedly.
and implement OVERLAY() for bit strings and bytea.
In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.
Leonardo F, reviewed by Kevin Grittner
woken by alarm we send SIGUSR1 to all backends requesting that they
check to see if they are blocking Startup process. If so, they throw
ERROR/FATAL as for other conflict resolutions. Deadlock stop gap
removed. max_standby_delay = -1 option removed to prevent deadlock.
Attributes can now have options, just as relations and tablespaces do, and
the reloptions code is used to parse, validate, and store them. For
simplicity and because these options are not performance critical, we store
them in a separate cache rather than the main relcache.
Thanks to Alex Hunsaker for the review.
that would've been WAL-logged if archiving was enabled. If we encounter
such records in archive recovery anyway, we know that some data is
missing from the log. A WARNING is emitted in that case.
Original patch by Fujii Masao, with changes by me.
binary, revert PGDLLIMPORT decoration of global variables. I'm not sure
if there's any real harm from unnecessary PGDLLIMPORTs, but these are all
internal variables that external modules really shouldn't be messing
with. ThisTimeLineID still needs PGDLLIMPORT.
walreceiver as whole into a dynamically loaded module, split the
libpq-specific parts of it into dynamically loaded module and keep the rest
in the main backend binary.
Although Tom fixed the Windows compilation problems with the old walreceiver
module already, this is a cleaner division of labour and makes the code
more readable. There's also the prospect of adding new transport methods
as pluggable modules in the future, which this patch makes easier, though for
now the API between libpqwalreceiver and walreceiver process should be
considered private.
The libpq-specific module is now in src/backend/replication/libpqwalreceiver,
and the part linked with postgres binary is in
src/backend/replication/walreceiver.c.
pg_constraint before searching pg_trigger. This allows saner handling of
corner cases; in particular we now say "constraint is not deferrable"
rather than "constraint does not exist" when the command is applied to
a constraint that's inherently non-deferrable. Per a gripe several months
ago from hubert depesz lubaczewski.
To make this work without breaking user-defined constraint triggers,
we have to add entries for them to pg_constraint. However, in return
we can remove the pgconstrname column from pg_constraint, which represents
a fairly sizable space savings. I also replaced the tgisconstraint column
with tgisinternal; the old meaning of tgisconstraint can now be had by
testing for nonzero tgconstraint, while there is no other way to get
the old meaning of nonzero tgconstraint, namely that the trigger was
internally generated rather than being user-created.
In passing, fix an old misstatement in the docs and comments, namely that
pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable.
Actually, we mark RI action triggers as nondeferrable even when they belong to
a nominally deferrable FK constraint. The SET CONSTRAINTS code now relies on
that instead of hard-coding a list of exception OIDs.
Conflict reason is passed through directly to the backend, so we can
take decisions about the effect of the conflict based upon the local
state. No specific changes, as yet, though this prepares for later work.
CancelVirtualTransaction() sends signals while holding ProcArrayLock.
Introduce errdetail_abort() to give message detail explaining that the
abort was caused by conflict processing. Remove CONFLICT_MODE states
in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
parse analysis phase, rather than at execution time. This makes parameter
handling work the same as it does in ordinary plannable queries, and in
particular fixes the incompatibility that Pavel pointed out with plpgsql's
new handling of variable references. plancache.c gets a little bit
grottier, but the alternatives seem worse.
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.
Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.
Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.
Fujii Masao, with additional hacking by me
of this are to centralise the conflict code to allow further change,
as well as to allow passing through the full reason for the conflict
through to the conflicting backends. Backend state alters how we
can handle different types of conflict so this is now required.
As originally suggested by Heikki, no longer optional.
can upgrade clusters without renaming the tablespace directories. New
directory structure format is, e.g.:
$PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814
Previously we only cancelled sessions that were in-transaction.
Simple fix is to just cancel all sessions without waiting. Doing
it this way avoids complicating common code paths, which would
not be worth the trouble to cover this rare case.
Problem report and fix by Andres Freund, edited somewhat by me
This silences some warnings on Win64. Not using the proper SOCKET datatype
was actually wrong on Win32 as well, but didn't cause any warnings there.
Also create define PGINVALID_SOCKET to indicate an invalid/non-existing
socket, instead of using a hardcoded -1 value.
Previously, fastgetattr() and heap_getattr() tested their fourth argument
against a null pointer, but any attempt to use them with a literal-NULL
fourth argument evaluated to *(void *)0, resulting in a compiler error.
Remove these NULL tests to avoid leading future readers of this code to
believe that this has a chance of working. Also clean up related legacy
code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr().
The new coding standard is that any code which calls a getattr-type
function or macro which takes an isnull argument MUST pass a valid
boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro
Herrera.
deletion, so that we attempt to unlink the correct filepath. unlink()
errors are ignorable there, so lack of a DatabasePath initialization step
did not cause visible problems until a related bug showed up on Solaris.
Code refactored from xact_redo_commit() to
ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay
shared invalidation messages for many databases, so we cannot
SetDatabasePath() once as we do in normal backends. Read the databaseid
from the shared invalidation messages, then set DatabasePath
temporarily before calling RelationCacheInitFileInvalidate().
Problem report by Robert Treat, analysis and fix by me.
we're not going to support that anymore.
I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a
performance excuse to live. It's a bit moot since that's all ifdef'd
out, of course.
Add missing varlena header to TableSpaceOpts structure. And, per
Tom Lane, instead of calling tablespace_reloptions in CacheMemoryContext,
call it in the caller's memory context and copy the value over
afterwards, to reduce the chances of a session-lifetime memory leak.
provide a working 64-bit integer datatype. As recently noted, we've been
broken on such platforms since early in the 8.4 development cycle. Since
it took nearly two years for anyone to even notice, it seems that the
rationale for continuing to support such platforms has reached the point
of non-existence. Rather than thrashing around to try to make it work
again, we'll just admit up front that this no longer works.
Back-patch to 8.4 since that branch is also broken.
We should go around to remove INT64_IS_BUSTED support, but just in HEAD,
so that seems like material for a separate commit.
VACUUM FULL was renamed to VACUUM FULL INPLACE. Also added a new
option -i, --inplace for vacuumdb to perform FULL INPLACE vacuuming.
Since the new VACUUM FULL uses CLUSTER infrastructure, we cannot
use it for system tables. VACUUM FULL for system tables always
fall back into VACUUM FULL INPLACE silently.
Itagaki Takahiro, reviewed by Jeff Davis and Simon Riggs.
peculiar variant of UNION ALL, and so wouldn't likely get written directly
as-is, it's possible for it to arise as a result of simplification of
less-obviously-silly queries. In particular, now that we can do flattening
of subqueries that have constant outputs and are underneath an outer join,
it's possible for the case to result from simplification of queries of the
type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality
regression for this type of query.
This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.
Thanks to Tom Lane for design help and Alvaro Herrera for the review.
pg_attribute, by having genbki.pl derive the information from the various
catalog header files. This greatly simplifies modification of the
"bootstrapped" catalogs.
This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on
Perl scripts for those build steps. To avoid creating a Perl build dependency
where there was not one before, the output files generated by these scripts
are now treated as distprep targets, ie, they will be built and shipped in
tarballs. But you will need a reasonably modern Perl (probably at least
5.6) if you want to build from a CVS pull.
The changes to the MSVC build process are untested, and may well break ---
we'll soon find out from the buildfarm.
John Naylor, based on ideas from Robert Haas and others
recovery instead of reading the backup history file. This is more robust,
as it stops you from prematurely starting up an inconsisten cluster if the
backup history file is lost for some reason, or if the base backup was
never finished with pg_stop_backup().
This also paves the way for a simpler streaming replication patch, which
doesn't need to care about backup history files anymore.
The backup history file is still created and archived as before, but it's
not used by the system anymore. It's just for informational purposes now.
Bump PG_CONTROL_VERSION as the location of the backup startpoint is now
written to a new field in pg_control, and catversion because initdb is
required
Original patch by Fujii Masao per Simon's idea, with further fixes by me.
"column < constant", and the comparison value is in the first or last
histogram bin or outside the histogram entirely, try to fetch the actual
column min or max value using an index scan (if there is an index on the
column). If successful, replace the lower or upper histogram bound with
that value before carrying on with the estimate. This limits the
estimation error caused by moving min/max values when the comparison
value is close to the min or max. Per a complaint from Josh Berkus.
It is tempting to consider using this mechanism for mergejoinscansel as well,
but that would inject index fetches into main-line join estimation not just
endpoint cases. I'm refraining from that until we can get a better handle
on the costs of doing this type of lookup.