Commit graph

2504 commits

Author SHA1 Message Date
Tom Lane
d20eec82a0 Add HOLD/RESUME_INTERRUPTS in HandleCatchupInterrupt/HandleNotifyInterrupt.
This prevents a possible longjmp out of the signal handler if a timeout
or SIGINT occurs while something within the handler has transiently set
ImmediateInterruptOK.  For safety we must hold off the timeout or cancel
error until we're back in mainline, or at least till we reach the end of
the signal handler when ImmediateInterruptOK was true at entry.  This
syncs these functions with the logic now present in handle_sig_alarm.

AFAICT there is no live bug here in 9.0 and up, because I don't think we
currently can wait for any heavyweight lock inside these functions, and
there is no other code (except read-from-client) that will turn on
ImmediateInterruptOK.  However, that was not true pre-9.0: in older
branches ProcessIncomingNotify might block trying to lock pg_listener, and
then a SIGINT could lead to undesirable control flow.  It might be all
right anyway given the relatively narrow code ranges in which NOTIFY
interrupts are enabled, but for safety's sake I'm back-patching this.
2013-12-13 14:05:19 -05:00
Heikki Linnakangas
0b132b9042 Don't update relfrozenxid if any pages were skipped.
Vacuum recognizes that it can update relfrozenxid by checking whether it has
processed all pages of a relation. Unfortunately it performed that check
after truncating the dead pages at the end of the relation, and used the new
number of pages to decide whether all pages have been scanned. If the new
number of pages happened to be smaller or equal to the number of pages
scanned, it incorrectly decided that all pages were scanned.

This can lead to relfrozenxid being updated, even though some pages were
skipped that still contain old XIDs. That can lead to data loss due to xid
wraparounds with some rows suddenly missing. This likely has escaped notice
so far because it takes a large number (~2^31) of xids being used to see the
effect, while a full-table vacuum before that would fix the issue.

The incorrect logic was introduced by commit
b4b6923e03. Backpatch this fix down to 8.4,
like that commit.

Andres Freund, with some modifications by me.
2013-11-27 13:28:47 +02:00
Heikki Linnakangas
3379263b6d Count locked pages that don't need vacuuming as scanned.
Previously, if VACUUM skipped vacuuming a page because it's pinned, it
didn't count that page as scanned. However, that meant that relfrozenxid
was not bumped up either, which prevented anti-wraparound vacuum from
doing its job.

Report by Миша Тюрин, analysis and patch by Sergey Burladyn and Jeff Janes.
Backpatch to 9.2, where the skip-locked-pages behavior was introduced.
2013-11-18 10:12:22 +02:00
Tom Lane
733e49ecff Fix subtly-wrong volatility checking in BeginCopyFrom().
contain_volatile_functions() is best applied to the output of
expression_planner(), not its input, so that insertion of function
default arguments and constant-folding have been done.  (See comments
at CheckMutability, for instance.)  It's perhaps unlikely that anyone
will notice a difference in practice, but still we should do it properly.

In passing, change variable type from Node* to Expr* to reduce the net
number of casts needed.

Noted while perusing uses of contain_volatile_functions().
2013-11-08 08:59:49 -05:00
Tom Lane
2e2134954a Fix some odd behaviors when using a SQL-style simple GMT offset timezone.
Formerly, when using a SQL-spec timezone setting with a fixed GMT offset
(called a "brute force" timezone in the code), the session_timezone
variable was not updated to match the nominal timezone; rather, all code
was expected to ignore session_timezone if HasCTZSet was true.  This is
of course obviously fragile, though a search of the code finds only
timeofday() failing to honor the rule.  A bigger problem was that
DetermineTimeZoneOffset() supposed that if its pg_tz parameter was
pointer-equal to session_timezone, then HasCTZSet should override the
parameter.  This would cause datetime input containing an explicit zone
name to be treated as referencing the brute-force zone instead, if the
zone name happened to match the session timezone that had prevailed
before installing the brute-force zone setting (as reported in bug #8572).
The same malady could affect AT TIME ZONE operators.

To fix, set up session_timezone so that it matches the brute-force zone
specification, which we can do using the POSIX timezone definition syntax
"<abbrev>offset", and get rid of the bogus lookaside check in
DetermineTimeZoneOffset().  Aside from fixing the erroneous behavior in
datetime parsing and AT TIME ZONE, this will cause the timeofday() function
to print its result in the user-requested time zone rather than some
previously-set zone.  It might also affect results in third-party
extensions, if there are any that make use of session_timezone without
considering HasCTZSet, but in all cases the new behavior should be saner
than before.

Back-patch to all supported branches.
2013-11-01 12:13:26 -04:00
Heikki Linnakangas
4da24f12e6 Fix two bugs in setting the vm bit of empty pages.
Use a critical section when setting the all-visible flag on an empty page,
and WAL-logging it. log_newpage_buffer() contains an assertion that it
must be called inside a critical section, and it's the right thing to do
when modifying a buffer anyway.

Also, the page should be marked dirty before calling log_newpage_buffer(),
per the comment in log_newpage_buffer() and src/backend/access/transam/README.

Patch by Andres Freund, in response to my report. Backpatch to 9.2, like
the patch that introduced these bugs (a6370fd9).
2013-10-23 14:25:50 +03:00
Heikki Linnakangas
e046194530 Fix spurious warning after vacuuming a page on a table with no indexes.
There is a rare race condition, when a transaction that inserted a tuple
aborts while vacuum is processing the page containing the inserted tuple.
Vacuum prunes the page first, which normally removes any dead tuples, but
if the inserting transaction aborts right after that, the loop after
pruning will see a dead tuple and remove it instead. That's OK, but if the
page is on a table with no indexes, and the page becomes completely empty
after removing the dead tuple (or tuples) on it, it will be immediately
marked as all-visible. That's OK, but the sanity check in vacuum would
throw a warning because it thinks that the page contains dead tuples and
was nevertheless marked as all-visible, even though it just vacuumed away
the dead tuples and so it doesn't actually contain any.

Spotted this while reading the code. It's difficult to hit the race
condition otherwise, but can be done by putting a breakpoint after the
heap_page_prune() call.

Backpatch all the way to 8.4, where this code first appeared.
2013-09-26 11:38:55 +03:00
Noah Misch
b3ddd10635 Restore REINDEX constraint validation.
Refactoring as part of commit 8ceb245680
had the unintended effect of making REINDEX TABLE and REINDEX DATABASE
no longer validate constraints enforced by the indexes in question;
REINDEX INDEX still did so.  Indexes marked invalid remained so, and
constraint violations arising from data corruption went undetected.
Back-patch to 9.0, like the causative commit.
2013-07-30 18:39:34 -04:00
Tom Lane
7e0b9ed6c5 Only install a portal's ResourceOwner if it actually has one.
In most scenarios a portal without a ResourceOwner is dead and not subject
to any further execution, but a portal for a cursor WITH HOLD remains in
existence with no ResourceOwner after the creating transaction is over.
In this situation, if we attempt to "execute" the portal directly to fetch
data from it, we were setting CurrentResourceOwner to NULL, leading to a
segfault if the datatype output code did anything that required a resource
owner (such as trying to fetch system catalog entries that weren't already
cached).  The case appears to be impossible to provoke with stock libpq,
but psqlODBC at least is able to cause it when working with held cursors.

Simplest fix is to just skip the assignment to CurrentResourceOwner, so
that any resources used by the data output operations will be managed by
the transaction-level resource owner instead.  For consistency I changed
all the places that install a portal's resowner as current, even though
some of them are probably not reachable with a held cursor's portal.

Per report from Joshua Berry (with thanks to Hiroshi Inoue for developing
a self-contained test case).  Back-patch to all supported versions.
2013-06-13 13:11:35 -04:00
Robert Haas
17fa4c321c Ensure that XLOG_HEAP2_VISIBLE always targets an initialized page.
Andres Freund
2013-06-06 10:21:44 -04:00
Tom Lane
2ff74efa40 Provide better message when CREATE EXTENSION can't find a target schema.
The new message (and SQLSTATE) matches the corresponding error cases in
namespace.c.

This was thought to be a "can't happen" case when extension.c was written,
so we didn't think hard about how to report it.  But it definitely can
happen in 9.2 and later, since we no longer require search_path to contain
any valid schema names.  It's probably also possible in 9.1 if search_path
came from a noninteractive source.  So, back-patch to all releases
containing this code.

Per report from Sean Chittenden, though this isn't exactly his patch.
2013-06-04 17:22:43 -04:00
Heikki Linnakangas
fcf91c06e0 Print line number correctly in COPY.
When COPY uses the multi-insert method to insert a batch of tuples into the
heap at a time, incorrect line number was printed if something went wrong in
inserting the index tuples (primary key failure, for exampl), or processing
after row triggers.

Fixes bug #8173 reported by Lloyd Albin. Backpatch to 9.2, where the multi-
insert code was added.
2013-05-23 07:53:01 -04:00
Kevin Grittner
95909f3be1 Ensure ANALYZE phase is not skipped because of canceled truncate.
Patch b19e4250b4 attempted to
preserve existing behavior regarding statistics generation in the
case that a truncation attempt was canceled due to lock conflicts.
It failed to do this accurately in two regards: (1) autovacuum had
previously generated statistics if the truncate attempt failed to
initially get the lock rather than having started the attempt, and
(2) the VACUUM ANALYZE command had always generated statistics.

Both of these changes were unintended, and are reverted by this
patch.  On review, there seems to be consensus that the previous
failure to generate statistics when the truncate was terminated
was more an unfortunate consequence of how that effort was
previously terminated than a feature we want to keep; so this
patch generates statistics even when an autovacuum truncation
attempt terminates early.  Another unintended change which is kept
on the basis that it is an improvement is that when a VACUUM
command is truncating, it will the new heuristic for avoiding
blocking other processes, rather than keeping an
AccessExclusiveLock on the table for however long the truncation
takes.

Per multiple reports, with some renaming per patch by Jeff Janes.

Backpatch to 9.0, where problem was created.
2013-04-29 13:05:56 -05:00
Tom Lane
9e859a9150 Avoid deadlock between concurrent CREATE INDEX CONCURRENTLY commands.
There was a high probability of two or more concurrent C.I.C. commands
deadlocking just before completion, because each would wait for the others
to release their reference snapshots.  Fix by releasing the snapshot
before waiting for other snapshots to go away.

Per report from Paul Hinze.  Back-patch to all active branches.
2013-04-25 16:58:10 -04:00
Peter Eisentraut
c16a9be8d2 Correct tense in log message 2013-02-23 23:31:10 -05:00
Tom Lane
a25ccd615f Fix bogus when-to-deregister-from-listener-array logic.
Since a backend adds itself to the global listener array during
Exec_ListenPreCommit, it's inappropriate for it to remove itself during
Exec_UnlistenCommit or Exec_UnlistenAllCommit --- that leads to failure
when committing a transaction that did UNLISTEN then LISTEN, since we end
up not registered though we should be.  (This leads to missing later
notifications, or to Assert failures in assert-enabled builds.)  Instead
deal with deregistering at the bottom of AtCommit_Notify, when we know the
final state of the listenChannels list.

Also, simplify the representation of registration status by replacing the
transient backendHasExecutedInitialListen flag with an amRegisteredListener
flag.

Per report from Greg Sabino Mullane.  Back-patch to 9.0, where the problem
was introduced during the LISTEN/NOTIFY rewrite.
2013-02-13 12:48:11 -05:00
Alvaro Herrera
231dbb3c9b Fix typo in freeze_table_age implementation
The original code used freeze_min_age instead of freeze_table_age.  The
main consequence of this mistake is that lowering freeze_min_age would
cause full-table scans to occur much more frequently, which causes
serious issues because the number of writes required is much larger.
That feature (freeze_min_age) is supposed to affect only how soon tuples
are frozen; some pages should still be skipped due to the visibility
map.

Backpatch to 8.4, where the freeze_table_age feature was introduced.

Report and patch from Andres Freund
2013-02-01 12:10:10 -03:00
Kevin Grittner
a79ae0bc0d Fix performance problems with autovacuum truncation in busy workloads.
In situations where there are over 8MB of empty pages at the end of
a table, the truncation work for trailing empty pages takes longer
than deadlock_timeout, and there is frequent access to the table by
processes other than autovacuum, there was a problem with the
autovacuum worker process being canceled by the deadlock checking
code. The truncation work done by autovacuum up that point was
lost, and the attempt tried again by a later autovacuum worker. The
attempts could continue indefinitely without making progress,
consuming resources and blocking other processes for up to
deadlock_timeout each time.

This patch has the autovacuum worker checking whether it is
blocking any other thread at 20ms intervals. If such a condition
develops, the autovacuum worker will persist the work it has done
so far, release its lock on the table, and sleep in 50ms intervals
for up to 5 seconds, hoping to be able to re-acquire the lock and
try again. If it is unable to get the lock in that time, it moves
on and a worker will try to continue later from the point this one
left off.

While this patch doesn't change the rules about when and what to
truncate, it does cause the truncation to occur sooner, with less
blocking, and with the consumption of fewer resources when there is
contention for the table's lock.

The only user-visible change other than improved performance is
that the table size during truncation may change incrementally
instead of just once.

Backpatched to 9.0 from initial master commit at
b19e4250b4 -- before that the
differences are too large to be clearly safe.

Jan Wieck
2013-01-23 13:39:28 -06:00
Tom Lane
d98547eb43 Protect against SnapshotNow race conditions in pg_tablespace scans.
Use of SnapshotNow is known to expose us to race conditions if the tuple(s)
being sought could be updated by concurrently-committing transactions.
CREATE DATABASE and DROP DATABASE are particularly exposed because they do
heavyweight filesystem operations during their scans of pg_tablespace,
so that the scans run for a very long time compared to most.  Furthermore,
the potential consequences of a missed or twice-visited row are nastier
than average:

* createdb() could fail with a bogus "file already exists" error, or
  silently fail to copy one or more tablespace's worth of files into the
  new database.

* remove_dbtablespaces() could miss one or more tablespaces, thus failing
  to free filesystem space for the dropped database.

* check_db_file_conflict() could likewise miss a tablespace, leading to an
  OID conflict that could result in data loss either immediately or in
  future operations.  (This seems of very low probability, though, since a
  duplicate database OID would be unlikely to start with.)

Hence, it seems worth fixing these three places to use MVCC snapshots, even
though this will someday be superseded by a generic solution to SnapshotNow
race conditions.

Back-patch to all active branches.

Stephen Frost and Tom Lane
2013-01-18 18:06:27 -05:00
Tom Lane
bfe738d7f3 Fix pg_extension_config_dump() to handle update cases more sanely.
If pg_extension_config_dump() is executed again for a table already listed
in the extension's extconfig, the code was blindly making a new array entry.
This does not seem useful.  Fix it to replace the existing array entry
instead, so that it's possible for extension update scripts to alter the
filter conditions for configuration tables.

In addition, teach ALTER EXTENSION DROP TABLE to check for an extconfig
entry for the target table, and remove it if present.  This is not a 100%
solution because it's allowed for an extension update script to just
summarily DROP a member table, and that code path doesn't go through
ExecAlterExtensionContentsStmt.  We could probably make that case clean
things up if we had to, but it would involve sticking a very ugly wart
somewhere in the guts of dependency.c.  Since on the whole it seems quite
unlikely that extension updates would want to remove pre-existing
configuration tables, making the case possible with an explicit command
seems sufficient.

Per bug #7756 from Regina Obe.  Back-patch to 9.1 where extensions were
introduced.
2012-12-20 16:31:58 -05:00
Tom Lane
fe2ef429a1 Fix failure to ignore leftover temp tables after a server crash.
During crash recovery, we remove disk files belonging to temporary tables,
but the system catalog entries for such tables are intentionally not
cleaned up right away.  Instead, the first backend that uses a temp schema
is expected to clean out any leftover objects therein.  This approach
requires that we be careful to ignore leftover temp tables (since any
actual access attempt would fail), *even if their BackendId matches our
session*, if we have not yet established use of the session's corresponding
temp schema.  That worked fine in the past, but was broken by commit
debcec7dc3 which incorrectly removed the
rd_islocaltemp relcache flag.  Put it back, and undo various changes
that substituted tests like "rel->rd_backend == MyBackendId" for use
of a state-aware flag.  Per trouble report from Heikki Linnakangas.

Back-patch to 9.1 where the erroneous change was made.  In the back
branches, be careful to add rd_islocaltemp in a spot in the struct that
was alignment padding before, so as not to break existing add-on code.
2012-12-17 20:15:39 -05:00
Simon Riggs
17ee02ea1c Avoid holding vmbuffer pin after VACUUM.
During VACUUM if we pause to perform a cycle
of index cleanup we drop the vmbuffer pin,
so we should do the same thing when heap
scan completes. This avoids holding vmbuffer
pin across the main index cleanup in VACUUM,
which could be minutes or hours longer than
necessary for correctness.

Bug report and suggested fix from Pavan Deolasee
2012-12-03 18:54:52 +00:00
Tom Lane
1c316e3a02 Add missing buffer lock acquisition in GetTupleForTrigger().
If we had not been holding buffer pin continuously since the tuple was
initially fetched by the UPDATE or DELETE query, it would be possible for
VACUUM or a page-prune operation to move the tuple while we're trying to
copy it.  This would result in a garbage "old" tuple value being passed to
an AFTER ROW UPDATE or AFTER ROW DELETE trigger.  The preconditions for
this are somewhat improbable, and the timing constraints are very tight;
so it's not so surprising that this hasn't been reported from the field,
even though the bug has been there a long time.

Problem found by Andres Freund.  Back-patch to all active branches.
2012-11-30 13:56:00 -05:00
Tom Lane
94c014b532 Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.
Commit 8cb53654db, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation.  The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY.  This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions.  Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.

To fix, make the final state be indisvalid = true and indisready = false,
which is otherwise nonsensical.  This is pretty ugly but we can't add
another column without forcing initdb, and it's too late for that in 9.2.
(There's a cleaner fix in HEAD.)

In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update.  The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.

In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate.  These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.

Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation.  Previously we
could have been left with stale values of some fields in an index relcache
entry.  It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.

In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.

Portions of this need to be back-patched even further, but I'll work
on that separately.

Problem reported by Amit Kapila, diagnosis by Pavan Deolasee,
fix by Tom Lane and Andres Freund.
2012-11-29 10:37:13 -05:00
Tom Lane
786afc1ce5 Revert patch for taking fewer snapshots.
This reverts commit d573e239f0, "Take fewer
snapshots".  While that seemed like a good idea at the time, it caused
execution to use a snapshot that had been acquired before locking any of
the tables mentioned in the query.  This created user-visible anomalies
that were not present in any prior release of Postgres, as reported by
Tomas Vondra.  While this whole area could do with a redesign (since there
are related cases that have anomalies anyway), it doesn't seem likely that
any future patch would be reasonably back-patchable; and we don't want 9.2
to exhibit a behavior that's subtly unlike either past or future releases.
Hence, revert to prior code while we rethink the problem.
2012-11-26 15:55:51 -05:00
Tom Lane
03787f6392 Don't trash input list structure in does_not_exist_skipping().
The trigger and rule cases need to split up the input name list, but
they mustn't corrupt the passed-in data structure, since it could be part
of a cached utility-statement parsetree.  Per bug #7641.
2012-11-08 11:34:37 -05:00
Tom Lane
329057fd8f Fix handling of inherited check constraints in ALTER COLUMN TYPE.
This case got broken in 8.4 by the addition of an error check that
complains if ALTER TABLE ONLY is used on a table that has children.
We do use ONLY for this situation, but it's okay because the necessary
recursion occurs at a higher level.  So we need to have a separate
flag to suppress recursion without making the error check.

Reported and patched by Pavan Deolasee, with some editorial adjustments by
me.  Back-patch to 8.4, since this is a regression of functionality that
worked in earlier branches.
2012-11-05 13:36:21 -05:00
Alvaro Herrera
fb3590dde0 Fix ALTER EXTENSION / SET SCHEMA
In its original conception, it was leaving some objects into the old
schema, but without their proper pg_depend entries; this meant that the
old schema could be dropped, causing future pg_dump calls to fail on the
affected database.  This was originally reported by Jeff Frost as #6704;
there have been other complaints elsewhere that can probably be traced
to this bug.

To fix, be more consistent about altering a table's subsidiary objects
along the table itself; this requires some restructuring in how tables
are relocated when altering an extension -- hence the new
AlterTableNamespaceInternal routine which encapsulates it for both the
ALTER TABLE and the ALTER EXTENSION cases.

There was another bug lurking here, which was unmasked after fixing the
previous one: certain objects would be reached twice via the dependency
graph, and the second attempt to move them would cause the entire
operation to fail.  Per discussion, it seems the best fix for this is to
do more careful tracking of objects already moved: we now maintain a
list of moved objects, to avoid attempting to do it twice for the same
object.

Authors: Alvaro Herrera, Dimitri Fontaine
Reviewed by Tom Lane
2012-10-31 10:48:41 -03:00
Tom Lane
819c4129ca Fix a couple other leftover uses of 'conisonly' terminology. 2012-09-12 15:12:43 -04:00
Tom Lane
c3ca786df6 Fix issues with checks for unsupported transaction states in Hot Standby.
The GUC check hooks for transaction_read_only and transaction_isolation
tried to check RecoveryInProgress(), so as to disallow setting read/write
mode or serializable isolation level (respectively) in hot standby
sessions.  However, GUC check hooks can be called in many situations where
we're not connected to shared memory at all, resulting in a crash in
RecoveryInProgress().  Among other cases, this results in EXEC_BACKEND
builds crashing during child process start if default_transaction_isolation
is serializable, as reported by Heikki Linnakangas.  Protect those calls
by silently allowing any setting when not inside a transaction; which is
okay anyway since these GUCs are always reset at start of transaction.

Also, add a check to GetSerializableTransactionSnapshot() to complain
if we are in hot standby.  We need that check despite the one in
check_XactIsoLevel() because default_transaction_isolation could be
serializable.  We don't want to complain any sooner than this in such
cases, since that would prevent running transactions at all in such a
state; but a transaction can be run, if SET TRANSACTION ISOLATION is done
before setting a snapshot.  Per report some months ago from Robert Haas.

Back-patch to 9.1, since these problems were introduced by the SSI patch.

Kevin Grittner and Tom Lane, with ideas from Heikki Linnakangas
2012-08-24 13:09:12 -04:00
Tom Lane
82634a88d1 Disallow extensions from owning the schema they are assigned to.
This situation creates a dependency loop that confuses pg_dump and probably
other things.  Moreover, since the mental model is that the extension
"contains" schemas it owns, but "is contained in" its extschema (even
though neither is strictly true), having both true at once is confusing for
people too.  So prevent the situation from being set up.

Reported and patched by Thom Brown.  Back-patch to 9.1 where extensions
were added.
2012-08-15 11:27:00 -04:00
Tom Lane
ca07d7ebe0 Fix dependencies generated during ALTER TABLE ADD CONSTRAINT USING INDEX.
This command generated new pg_depend entries linking the index to the
constraint and the constraint to the table, which match the entries made
when a unique or primary key constraint is built de novo.  However, it did
not bother to get rid of the entries linking the index directly to the
table.  We had considered the issue when the ADD CONSTRAINT USING INDEX
patch was written, and concluded that we didn't need to get rid of the
extra entries.  But this is wrong: ALTER COLUMN TYPE wasn't expecting such
redundant dependencies to exist, as reported by Hubert Depesz Lubaczewski.
On reflection it seems rather likely to break other things as well, since
there are many bits of code that crawl pg_depend for one purpose or
another, and most of them are pretty naive about what relationships they're
expecting to find.  Fortunately it's not that hard to get rid of the extra
dependency entries, so let's do that.

Back-patch to 9.1, where ALTER TABLE ADD CONSTRAINT USING INDEX was added.
2012-08-11 12:51:30 -04:00
Alvaro Herrera
7c055d64a6 Fix typo in comment 2012-08-08 17:40:50 -04:00
Tom Lane
a4a7eb37d4 Fix longstanding crash-safety bug with newly-created-or-reset sequences.
If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.

More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence.  (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.)  The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record.  However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false).  Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record.  A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.

To fix, get rid of the idea of logging an is_called status different from
reality.  This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible.  In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results.  This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.

In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details.  It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.

Back-patch to all supported branches.
2012-07-25 17:42:42 -04:00
Alvaro Herrera
68043258ac Change syntax of new CHECK NO INHERIT constraints
The initially implemented syntax, "CHECK NO INHERIT (expr)" was not
deemed very good, so switch to "CHECK (expr) NO INHERIT" instead.  This
way it looks similar to SQL-standards compliant constraint attribute.

Backport to 9.2 where the new syntax and feature was introduced.

Per discussion.
2012-07-24 16:02:18 -04:00
Alvaro Herrera
d721f208af connoinherit may be true only for CHECK constraints
The code was setting it true for other constraints, which is
bogus.  Doing so caused bogus catalog entries for such constraints, and
in particular caused an error to be raised when trying to drop a
constraint of types other than CHECK from a table that has children,
such as reported in bug #6712.

In 9.2, additionally ignore connoinherit=true for other constraint
types, to avoid having to force initdb; existing databases might already
contain bogus catalog entries.

Includes a catversion bump (in HEAD only).

Bug report from Miroslav Šulc
Analysis from Amit Kapila and Noah Misch; Amit also contributed the patch.
2012-07-20 14:07:09 -04:00
Tom Lane
3727240d01 Avoid pre-determining index names during CREATE TABLE LIKE parsing.
Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE
had to pre-assign names to indexes that had comments, because it made up an
explicit CommentStmt command to apply the comment and so it had to know the
name for the index.  This creates bad interactions with other indexes, as
shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't
take any other indexes into account so it could choose a conflicting name.

To fix, add a field to IndexStmt that allows it to carry a comment to be
assigned to the new index.  (This isn't a user-exposed feature of CREATE
INDEX, only an internal option.)  Now we don't need preassignment of index
names in any situation.

I also took the opportunity to refactor DefineIndex to accept the IndexStmt
as such, rather than passing all its fields individually in a mile-long
parameter list.

Back-patch to 9.2, but no further, because it seems too dangerous to change
IndexStmt or DefineIndex's API in released branches.  The bug exists back
to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given
the lack of prior complaints we'll just let it go unfixed before 9.2.
2012-07-16 13:25:26 -04:00
Alvaro Herrera
6416895cc6 Have REASSIGN OWNED work on extensions, too
Per bug #6593, REASSIGN OWNED fails when the affected role has created
an extension.  Even though the user related to the extension is not
nominally the owner, its OID appears on pg_shdepend and thus causes
problems when the user is to be dropped.

This commit adds code to change the "ownership" of the extension itself,
not of the contained objects.  This is fine because it's currently only
called from REASSIGN OWNED, which would also modify the ownership of the
contained objects.  However, this is not sufficient for a working ALTER
OWNER implementation extension.

Back-patch to 9.1, where extensions were introduced.

Bug #6593 reported by Emiliano Leporati.
2012-07-03 15:18:40 -04:00
Tom Lane
52336a4702 Prevent CREATE TABLE LIKE/INHERITS from (mis) copying whole-row Vars.
If a CHECK constraint or index definition contained a whole-row Var (that
is, "table.*"), an attempt to copy that definition via CREATE TABLE LIKE or
table inheritance produced incorrect results: the copied Var still claimed
to have the rowtype of the source table, rather than the created table.

For the LIKE case, it seems reasonable to just throw error for this
situation, since the point of LIKE is that the new table is not permanently
coupled to the old, so there's no reason to assume its rowtype will stay
compatible.  In the inheritance case, we should ideally allow such
constraints, but doing so will require nontrivial refactoring of CREATE
TABLE processing (because we'd need to know the OID of the new table's
rowtype before we adjust inherited CHECK constraints).  In view of the lack
of previous complaints, that doesn't seem worth the risk in a back-patched
bug fix, so just make it throw error for the inheritance case as well.

Along the way, replace change_varattnos_of_a_node() with a more robust
function map_variable_attnos(), which is capable of being extended to
handle insertion of ConvertRowtypeExpr whenever we get around to fixing
the inheritance case nicely, and in the meantime it returns a failure
indication to the caller so that a helpful message with some context can be
thrown.  Also, this code will do the right thing with subselects (if we
ever allow them in CHECK or indexes), and it range-checks varattnos before
using them to index into the map array.

Per report from Sergey Konoplev.  Back-patch to all supported branches.
2012-06-30 16:45:27 -04:00
Tom Lane
f374098e8c Fix NOTIFY to cope with I/O problems, such as out-of-disk-space.
The LISTEN/NOTIFY subsystem got confused if SimpleLruZeroPage failed,
which would typically happen as a result of a write() failure while
attempting to dump a dirty pg_notify page out of memory.  Subsequently,
all attempts to send more NOTIFY messages would fail with messages like
"Could not read from file "pg_notify/nnnn" at offset nnnnn: Success".
Only restarting the server would clear this condition.  Per reports from
Kevin Grittner and Christoph Berg.

Back-patch to 9.0, where the problem was introduced during the
LISTEN/NOTIFY rewrite.
2012-06-29 00:51:40 -04:00
Peter Eisentraut
0b847cba69 Improve reporting of permission errors for array types
Because permissions are assigned to element types, not array types,
complaining about permission denied on an array type would be
misleading to users.  So adjust the reporting to refer to the element
type instead.

In order not to duplicate the required logic in two dozen places,
refactor the permission denied reporting for types a bit.

pointed out by Yeb Havinga during the review of the type privilege
feature
2012-06-15 23:03:50 +03:00
Peter Eisentraut
ccc65b710e Add more message pluralization
Even though we can't do much about the case with multiple plurals in
one sentence, we can fix the other cases.
2012-06-15 02:01:00 +03:00
Bruce Momjian
927d61eeff Run pgindent on 9.2 source tree in preparation for first 9.3
commit-fest.
2012-06-10 15:20:04 -04:00
Robert Haas
b50991eedb Fix more crash-safe visibility map bugs, and improve comments.
In lazy_scan_heap, we could issue bogus warnings about incorrect
information in the visibility map, because we checked the visibility
map bit before locking the heap page, creating a race condition.  Fix
by rechecking the visibility map bit before we complain.  Rejigger
some related logic so that we rely on the possibly-outdated
all_visible_according_to_vm value as little as possible.

In heap_multi_insert, it's not safe to clear the visibility map bit
before beginning the critical section.  The visibility map is not
crash-safe unless we treat clearing the bit as a critical operation.
Specifically, if the transaction were to error out after we set the
bit and before entering the critical section, we could end up writing
the heap page to disk (with the bit cleared) and crashing before the
visibility map page made it to disk.  That would be bad.  heap_insert
has this correct, but somehow the order of operations got rearranged
when heap_multi_insert was added.

Also, add some more comments to visibilitymap_test, lazy_scan_heap,
and IndexOnlyNext, expounding on concurrency issues.

Per extensive code review by Andres Freund, and further review by Tom
Lane, who also made the original report about the bogus warnings.
2012-06-07 12:48:13 -04:00
Tom Lane
ad0009e7be Force PL and range-type support functions to be owned by a superuser.
We allow non-superusers to create procedural languages (with restrictions)
and range datatypes.  Previously, the automatically-created support
functions for these objects ended up owned by the creating user.  This
represents a rather considerable security hazard, because the owning user
might be able to alter a support function's definition in such a way as to
crash the server, inject trojan-horse SQL code, or even execute arbitrary
C code directly.  It appears that right now the only actually exploitable
problem is the infinite-recursion bug fixed in the previous patch for
CVE-2012-2655.  However, it's not hard to imagine that future additions of
more ALTER FUNCTION capability might unintentionally open up new hazards.
To forestall future problems, cause these support functions to be owned by
the bootstrap superuser, not the user creating the parent object.
2012-05-30 23:47:57 -04:00
Tom Lane
488c6dd170 Improve error message for ALTER COLUMN TYPE coercion failure.
Per recent discussion, the error message for this was actually a trifle
inaccurate, since it said "cannot be cast" which might be incorrect.
Adjust that wording, and add a HINT suggesting that a USING clause might
be needed.
2012-05-16 07:28:25 -04:00
Heikki Linnakangas
9e4637bf89 Update comments that became out-of-date with the PGXACT struct.
When the "hot" members of PGPROC were split off to separate PGXACT structs,
many PGPROC fields referred to in comments were moved to PGXACT, but the
comments were neglected in the commit. Mostly this is just a search/replace
of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for
prepared transactions changed more, making some of the comments totally
bogus.

Noah Misch
2012-05-14 10:28:55 +03:00
Tom Lane
b8347138e9 Fix DROP TABLESPACE to unlink symlink when directory is not there.
If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation.  However, we forgot that the pg_tblspc
symlink might still be there.  We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.

There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.

Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).
2012-05-13 18:06:52 -04:00
Robert Haas
1331cc6c1a Prevent loss of init fork when truncating an unlogged table.
Fixes bug #6635, reported by Akira Kurosawa.
2012-05-11 09:48:56 -04:00
Magnus Hagander
916d589a10 Make "unexpected EOF" messages DEBUG1 unless in an open transaction
"Unexpected EOF on client connection" without an open transaction
is mostly noise, so turn it into DEBUG1. With an open transaction it's
still indicating a problem, so keep those as ERROR, and change the message
to indicate that it happened in a transaction.
2012-05-07 18:50:44 +02:00