Commit graph

25913 commits

Author SHA1 Message Date
Robert Haas
3ecab37d97 Teach autovacuum about multixact member wraparound.
The logic introduced in commit b69bf30b9b
and repaired in commits 669c7d20e6 and
7be47c56af helps to ensure that we don't
overwrite old multixact member information while it is still needed,
but a user who creates many large multixacts can still exhaust the
member space (and thus start getting errors) while autovacuum stands
idly by.

To fix this, progressively ramp down the effective value (but not the
actual contents) of autovacuum_multixact_freeze_max_age as member space
utilization increases.  This makes autovacuum more aggressive and also
reduces the threshold for a manual VACUUM to perform a full-table scan.

This patch leaves unsolved the problem of ensuring that emergency
autovacuums are triggered even when autovacuum=off.  We'll need to fix
that via a separate patch.

Thomas Munro and Robert Haas
2015-05-08 12:53:30 -04:00
Robert Haas
32c50af4cf Fix incorrect math in DetermineSafeOldestOffset.
The old formula didn't have enough parentheses, so it would do the wrong
thing, and it used / rather than % to find a remainder.  The effect of
these oversights is that the stop point chosen by the logic introduced in
commit b69bf30b9b might be rather
meaningless.

Thomas Munro, reviewed by Kevin Grittner, with a whitespace tweak by me.
2015-05-07 11:13:55 -04:00
Magnus Hagander
43ed06816f Properly send SCM status updates when shutting down service on Windows
The Service Control Manager should be notified regularly during a shutdown
that takes a long time. Previously we would increaes the counter, but forgot
to actually send the notification to the system. The loop counter was also
incorrectly initalized in the event that the startup of the system took long
enough for it to increase, which could cause the shutdown process not to wait
as long as expected.

Krystian Bigaj, reviewed by Michael Paquier
2015-05-07 15:09:21 +02:00
Robert Haas
603fe0181a Fix some problems with patch to fsync the data directory.
pg_win32_is_junction() was a typo for pgwin32_is_junction().  open()
was used not only in a two-argument form, which breaks on Windows,
but also where BasicOpenFile() should have been used.

Per reports from Andrew Dunstan and David Rowley.
2015-05-05 09:16:39 -04:00
Robert Haas
d8ac77ab17 Recursively fsync() the data directory after a crash.
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk.  This could lead to data corruption.

Back-patch to all supported versions.

Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
2015-05-04 14:19:32 -04:00
Andrew Dunstan
997066f445 Fix two small bugs in json's populate_record_worker
The first bug is not releasing a tupdesc when doing an early return out
of the function. The second bug is a logic error in choosing when to do
an early return if given an empty jsonb object.

Bug reports from Pavel Stehule and Tom Lane respectively.

Backpatch to 9.4 where these were introduced.
2015-05-04 12:43:16 -04:00
Tom Lane
79edb29812 Fix overlooked relcache invalidation in ALTER TABLE ... ALTER CONSTRAINT.
When altering the deferredness state of a foreign key constraint, we
correctly updated the catalogs and then invalidated the relcache state for
the target relation ... but that's not the only relation with relevant
triggers.  Must invalidate the other table as well, or the state change
fails to take effect promptly for operations triggered on the other table.
Per bug #13224 from Christian Ullrich.

In passing, reorganize regression test case for this feature so that it
isn't randomly injected into the middle of an unrelated test sequence.

Oversight in commit f177cbfe67.  Back-patch
to 9.4 where the faulty code was added.
2015-05-03 11:30:24 -04:00
Bruce Momjian
70fac48446 Mark views created from tables as replication identity 'nothing'
pg_dump turns tables into views using a method that was not setting
pg_class.relreplident properly.

Patch by Marko Tiikkaja

Backpatch through 9.4
2015-05-01 13:03:23 -04:00
Alvaro Herrera
7140e11d8a Fix pg_upgrade's multixact handling (again)
We need to create the pg_multixact/offsets file deleted by pg_upgrade
much earlier than we originally were: it was in TrimMultiXact(), which
runs after we exit recovery, but it actually needs to run earlier than
the first call to SetMultiXactIdLimit (before recovery), because that
routine already wants to read the first offset segment.

Per pg_upgrade trouble report from Jeff Janes.

While at it, silence a compiler warning about a pointless assert that an
unsigned variable was being tested non-negative.  This was a signed
constant in Thomas Munro's patch which I changed to unsigned before
commit.  Pointed out by Andres Freund.
2015-04-30 13:55:06 -03:00
Alvaro Herrera
9ae6c11f97 Code review for multixact bugfix
Reword messages, rename a confusingly named function.

Per Robert Haas.
2015-04-28 14:52:29 -03:00
Alvaro Herrera
942542cbba Protect against multixact members wraparound
Multixact member files are subject to early wraparound overflow and
removal: if the average multixact size is above a certain threshold (see
note below) the protections against offset overflow are not enough:
during multixact truncation at checkpoint time, some
pg_multixact/members files would be removed because the server considers
them to be old and not needed anymore.  This leads to loss of files that
are critical to interpret existing tuples's Xmax values.

To protect against this, since we don't have enough info in pg_control
and we can't modify it in old branches, we maintain shared memory state
about the oldest value that we need to keep; we use this during new
multixact creation to abort if an old still-needed file would get
overwritten.  This value is kept up to date by checkpoints, which makes
it not completely accurate but should be good enough.  We start emitting
warnings sometime earlier, so that the eventual multixact-shutdown
doesn't take DBAs completely by surprise (more precisely: once 20
members SLRU segments are remaining before shutdown.)

On troublesome average multixact size: The threshold size depends on the
multixact freeze parameters. The oldest age is related to the greater of
multixact_freeze_table_age and multixact_freeze_min_age: anything
older than that should be removed promptly by autovacuum.  If autovacuum
is keeping up with multixact freezing, the troublesome multixact average
size is
	(2^32-1) / Max(freeze table age, freeze min age)
or around 28 members per multixact.  Having an average multixact size
larger than that will eventually cause new multixact data to overwrite
the data area for older multixacts.  (If autovacuum is not able to keep
up, or there are errors in vacuuming, the actual maximum is
multixact_freeeze_max_age instead, at which point multixact generation
is stopped completely.  The default value for this limit is 400 million,
which means that the multixact size that would cause trouble is about 10
members).

Initial bug report by Timothy Garnett, bug #12990
Backpatch to 9.3, where the problem was introduced.

Authors: Álvaro Herrera, Thomas Munro
Reviews: Thomas Munro, Amit Kapila, Robert Haas, Kevin Grittner
2015-04-28 11:32:53 -03:00
Andres Freund
fd3dfc236c Use a fd opened for read/write when syncing slots during startup.
Some operating systems, including the reporter's windows, return EBADFD
or similar when fsync() is invoked on a O_RDONLY file descriptor.
Unfortunately RestoreSlotFromDisk() does exactly that; which causes
failures after restarts in at least some scenarios.

If you hit the bug the error message will be something like
ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor

Simply use O_RDWR instead of O_RDONLY when opening the relevant file
descriptor to fix the bug.  Unfortunately I have no way of verifying the
fix, but we've seen similar problems in the past.

This bug goes back to 9.4 where slots were introduced. Backpatch
accordingly.

Reported-By: Patrice Drolet
Bug: #13143:
Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org
2015-04-28 00:18:04 +02:00
Tom Lane
5f3d1909c5 Prevent improper reordering of antijoins vs. outer joins.
An outer join appearing within the RHS of an antijoin can't commute with
the antijoin, but somehow I missed teaching make_outerjoininfo() about
that.  In Teodor Sigaev's recent trouble report, this manifests as a
"could not find RelOptInfo for given relids" error within eqjoinsel();
but I think silently wrong query results are possible too, if the planner
misorders the joins and doesn't happen to trigger any internal consistency
checks.  It's broken as far back as we had antijoins, so back-patch to all
supported branches.
2015-04-25 16:44:27 -04:00
Noah Misch
0a5570e368 Build every ECPG library with -DFRONTEND.
Each of the libraries incorporates src/port files, which often check
FRONTEND.  Build systems disagreed on whether to build libpgtypes this
way.  Only libecpg incorporates files that rely on it today.  Back-patch
to 9.0 (all supported versions) to forestall surprises.
2015-04-24 19:29:21 -04:00
Tom Lane
d3398d085d Fix obsolete comment in set_rel_size().
The cross-reference to set_append_rel_pathlist() was obsoleted by
commit e2fa76d80b, which split what
had been set_rel_pathlist() and child routines into two sets of
functions.  But I (tgl) evidently missed updating this comment.

Back-patch to 9.2 to avoid unnecessary divergence among branches.

Amit Langote
2015-04-24 15:18:37 -04:00
Heikki Linnakangas
438a062d54 Fix deadlock at startup, if max_prepared_transactions is too small.
When the startup process recovers transactions by scanning pg_twophase
directory, it should clear MyLockedGxact after it's done processing each
transaction. Like we do during normal operation, at PREPARE TRANSACTION.
Otherwise, if the startup process exits due to an error, it will try to
clear the locking_backend field of the last recovered transaction. That's
usually harmless, but if the error happens in MarkAsPreparing, while
holding TwoPhaseStateLock, the shmem-exit hook will try to acquire
TwoPhaseStateLock again, and deadlock with itself.

This fixes bug #13128 reported by Grant McAlister. The bug was introduced
by commit bb38fb0d, so backpatch to all supported versions like that
commit.
2015-04-23 21:35:10 +03:00
Alvaro Herrera
292ea10573 Fix typo in comment
SLRU_SEGMENTS_PER_PAGE -> SLRU_PAGES_PER_SEGMENT

I introduced this ancient typo in subtrans.c and later propagated it to
multixact.c.  I fixed the latter in f741300c, but only back to 9.3;
backpatch to all supported branches for consistency.
2015-04-14 12:12:18 -03:00
Heikki Linnakangas
d72792d021 Don't archive bogus recycled or preallocated files after timeline switch.
After a timeline switch, we would leave behind recycled WAL segments that
are in the future, but on the old timeline. After promotion, and after they
become old enough to be recycled again, we would notice that they don't have
a .ready or .done file, create a .ready file for them, and archive them.
That's bogus, because the files contain garbage, recycled from an older
timeline (or prealloced as zeros). We shouldn't archive such files.

This could happen when we're following a timeline switch during replay, or
when we switch to new timeline at end-of-recovery.

To fix, whenever we switch to a new timeline, scan the data directory for
WAL segments on the old timeline, but with a higher segment number, and
remove them. Those don't belong to our timeline history, and are most
likely bogus recycled or preallocated files. They could also be valid files
that we streamed from the primary ahead of time, but in any case, they're
not needed to recover to the new timeline.
2015-04-13 17:22:21 +03:00
Heikki Linnakangas
131fb7fc66 Remove duplicated words in comments.
David Rowley
2015-04-12 10:49:27 +03:00
Alvaro Herrera
ec01c1c0a4 Fix autovacuum launcher shutdown sequence
It was previously possible to have the launcher re-execute its main loop
before shutting down if some other signal was received or an error
occurred after getting SIGTERM, as reported by Qingqing Zhou.

While investigating, Tom Lane further noticed that if autovacuum had
been disabled in the config file, it would misbehave by trying to start
a new worker instead of bailing out immediately -- it would consider
itself as invoked in emergency mode.

Fix both problems by checking the shutdown flag in a few more places.
These problems have existed since autovacuum was introduced, so
backpatch all the way back.
2015-04-08 13:19:49 -03:00
Tom Lane
f97a0a2cc4 Fix assorted inconsistent function declarations.
While gcc doesn't complain if you declare a function "static" and then
define it not-static, other compilers do; and in any case the code is
highly misleading this way.  Add the missing "static" keywords to a
couple of recent patches.  Per buildfarm member pademelon.
2015-04-07 16:56:21 -04:00
Tom Lane
1d71d36ffe Fix incorrect matching of subexpressions in outer-join plan nodes.
Previously we would re-use input subexpressions in all expression trees
attached to a Join plan node.  However, if it's an outer join and the
subexpression appears in the nullable-side input, this is potentially
incorrect for apparently-matching subexpressions that came from above
the outer join (ie, targetlist and qpqual expressions), because the
executor will treat the subexpression value as NULL when maybe it should
not be.

The case is fairly hard to hit because (a) you need a non-strict
subexpression (else NULL is correct), and (b) we don't usually compute
expressions in the outputs of non-toplevel plan nodes.  But we might do
so if the expressions are sort keys for a mergejoin, for example.

Probably in the long run we should make a more explicit distinction between
Vars appearing above and below an outer join, but that will be a major
planner redesign and not at all back-patchable.  For the moment, just hack
set_join_references so that it will not match any non-Var expressions
coming from nullable inputs to expressions that came from above the join.
(This is somewhat overkill, in that a strict expression could still be
matched, but it doesn't seem worth the effort to check that.)

Per report from Qingqing Zhou.  The added regression test case is based
on his example.

This has been broken for a very long time, so back-patch to all active
branches.
2015-04-04 19:55:15 -04:00
Tom Lane
406b2f0b4c Fix TAP tests to use only standard command-line argument ordering.
Some of the TAP tests were supposing that PG programs would accept switches
after non-switch arguments on their command lines.  While GNU getopt_long()
does allow that, our own implementation does not, and it's nowhere
suggested in our documentation that such cases should work.  Adjust the
tests to use only the documented syntax.

Back-patch to 9.4, since without this the TAP tests fail when run with
src/port's getopt_long() implementation.

Michael Paquier
2015-04-04 13:34:23 -04:00
Tom Lane
c419321061 Remove unnecessary variables in _hash_splitbucket().
Commit ed9cc2b5df made it unnecessary to pass
start_nblkno to _hash_splitbucket(), and for that matter unnecessary to
have the internal nblkno variable either.  My compiler didn't complain
about that, but some did.  I also rearranged the use of oblkno a bit to
make that case more parallel.

Report and initial patch by Petr Jelinek, rearranged a bit by me.
Back-patch to all branches, like the previous patch.
2015-04-03 16:49:11 -04:00
Tom Lane
ee0d06c0b9 Fix rare startup failure induced by MVCC-catalog-scans patch.
While a new backend nominally participates in sinval signaling starting
from the SharedInvalBackendInit call near the top of InitPostgres, it
cannot recognize sinval messages for unshared catalogs of its database
until it has set up MyDatabaseId.  This is not problematic for the catcache
or relcache, which by definition won't have loaded any data from or about
such catalogs before that point.  However, commit 568d4138c6
introduced a mechanism for re-using MVCC snapshots for catalog scans, and
made invalidation of those depend on recognizing relevant sinval messages.
So it's possible to establish a catalog snapshot to read pg_authid and
pg_database, then before we set MyDatabaseId, receive sinval messages that
should result in invalidating that snapshot --- but do not, because we
don't realize they are for our database.  This mechanism explains the
intermittent buildfarm failures we've seen since commit 31eae6028e.
That commit was not itself at fault, but it introduced a new regression
test that does reconnections concurrently with the "vacuum full pg_am"
command in vacuum.sql.  This allowed the pre-existing error to be exposed,
given just the right timing, because we'd fail to update our information
about how to access pg_am.  In principle any VACUUM FULL on a system
catalog could have created a similar hazard for concurrent incoming
connections.  Perhaps there are more subtle failure cases as well.

To fix, force invalidation of the catalog snapshot as soon as we've
set MyDatabaseId.

Back-patch to 9.4 where the error was introduced.
2015-04-03 00:07:29 -04:00
Robert Haas
a1f4ade01c After a crash, don't restart workers with BGW_NEVER_RESTART.
Amit Khandekar
2015-04-02 14:39:18 -04:00
Simon Riggs
3ebe6d8956 Correct comment to use RS_EPHEMERAL 2015-04-02 07:49:48 -04:00
Alvaro Herrera
a44e54cf4b psql: fix \connect with URIs and conninfo strings
psql was already accepting conninfo strings as the first parameter in
\connect, but the way it worked wasn't sane; some of the other
parameters would get the previous connection's values, causing it to
connect to a completely unexpected server or, more likely, not finding
any server at all because of completely wrong combinations of
parameters.

Fix by explicitely checking for a conninfo-looking parameter in the
dbname position; if one is found, use its complete specification rather
than mix with the other arguments.  Also, change tab-completion to not
try to complete conninfo/URI-looking "dbnames" and document that
conninfos are accepted as first argument.

There was a weak consensus to backpatch this, because while the behavior
of using the dbname as a conninfo is nowhere documented for \connect, it
is reasonable to expect that it works because it does work in many other
contexts.  Therefore this is backpatched all the way back to 9.0.

To implement this, routines previously private to libpq have been
duplicated so that psql can decide what looks like a conninfo/URI
string.  In back branches, just duplicate the same code all the way back
to 9.2, where URIs where introduced; 9.0 and 9.1 have a simpler version.
In master, the routines are moved to src/common and renamed.

Author: David Fetter, Andrew Dunstan.  Some editorialization by me
(probably earning a Gierth's "Sloppy" badge in the process.)
Reviewers: Andrew Gierth, Erik Rijkers, Pavel Stěhule, Stephen Frost,
Robert Haas, Andrew Dunstan.
2015-04-01 20:00:07 -03:00
Heikki Linnakangas
4637123907 Remove spurious semicolons.
Petr Jelinek
2015-03-31 15:13:35 +03:00
Andrew Dunstan
2366761bf9 Run pg_upgrade and pg_resetxlog with restricted token on Windows
As with initdb these programs need to run with a restricted token, and
if they don't pg_upgrade will fail when run as a user with Adminstrator
privileges.

Backpatch to all live branches. On the development branch the code is
reorganized so that the restricted token code is now in a single
location. On the stable bramches a less invasive change is made by
simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c.

Patches and bug report from Muhammad Asif Naeem, reviewed by Michael
Paquier, slightly edited by me.
2015-03-30 17:16:57 -04:00
Tom Lane
a6a8bf5cdd Fix bogus concurrent use of _hash_getnewbuf() in bucket split code.
_hash_splitbucket() obtained the base page of the new bucket by calling
_hash_getnewbuf(), but it held no exclusive lock that would prevent some
other process from calling _hash_getnewbuf() at the same time.  This is
contrary to _hash_getnewbuf()'s API spec and could in fact cause failures.
In practice, we must only call that function while holding write lock on
the hash index's metapage.

An additional problem was that we'd already modified the metapage's bucket
mapping data, meaning that failure to extend the index would leave us with
a corrupt index.

Fix both issues by moving the _hash_getnewbuf() call to just before we
modify the metapage in _hash_expandtable().

Unfortunately there's still a large problem here, which is that we could
also incur ENOSPC while trying to get an overflow page for the new bucket.
That would leave the index corrupt in a more subtle way, namely that some
index tuples that should be in the new bucket might still be in the old
one.  Fixing that seems substantially more difficult; even preallocating as
many pages as we could possibly need wouldn't entirely guarantee that the
bucket split would complete successfully.  So for today let's just deal
with the base case.

Per report from Antonin Houska.  Back-patch to all active branches.
2015-03-30 16:40:05 -04:00
Tom Lane
2897e069c1 Fix rare core dump in BackendIdGetTransactionIds().
BackendIdGetTransactionIds() neglected the possibility that the PROC
pointer in a ProcState array entry is null.  In current usage, this could
only crash if the other backend had exited since pgstat_read_current_status
saw it as active, which is a pretty narrow window.  But it's reachable in
the field, per bug #12918 from Vladimir Borodin.

Back-patch to 9.4 where the faulty code was introduced.
2015-03-30 13:05:35 -04:00
Tom Lane
f444de5e34 Add vacuum_delay_point call in compute_index_stats's per-sample-row loop.
Slow functions in index expressions might cause this loop to take long
enough to make it worth being cancellable.  Probably it would be enough
to call CHECK_FOR_INTERRUPTS here, but for consistency with other
per-sample-row loops in this file, let's use vacuum_delay_point.

Report and patch by Jeff Janes.  Back-patch to all supported branches.
2015-03-29 15:04:18 -04:00
Tatsuo Ishii
3740bb8347 Make SyncRepWakeQueue to a static function
It is only used in src/backend/replication/syncrep.c.

Back-patch to all supported branches except 9.1 which declares the
function as static.
2015-03-26 10:38:11 +09:00
Tom Lane
3171edddc4 Fix ExecOpenScanRelation to take a lock on a ROW_MARK_COPY relation.
ExecOpenScanRelation assumed that any relation listed in the ExecRowMark
list has been locked by InitPlan; but this is not true if the rel's
markType is ROW_MARK_COPY, which is possible if it's a foreign table.

In most (possibly all) cases, failure to acquire a lock here isn't really
problematic because the parser, planner, or plancache would have taken the
appropriate lock already.  In principle though it might leave us vulnerable
to working with a relation that we hold no lock on, and in any case if the
executor isn't depending on previously-taken locks otherwise then it should
not do so for ROW_MARK_COPY relations.

Noted by Etsuro Fujita.  Back-patch to all active versions, since the
inconsistency has been there a long time.  (It's almost certainly
irrelevant in 9.0, since that predates foreign tables, but the code's
still wrong on its own terms.)
2015-03-24 15:53:06 -04:00
Andres Freund
16be9737cc Don't delay replication for less than recovery_min_apply_delay's resolution.
Recovery delays are implemented by waiting on a latch, and latches take
milliseconds as a parameter. The required amount of waiting was computed
using microsecond resolution though and the wait loop's abort condition
was checking the delay in microseconds as well.  This could lead to
short spurts of busy looping when the overall wait time was below a
millisecond, but above 0 microseconds.

Instead just formulate the wait loop's abort condition in millisecond
granularity as well. Given that that's recovery_min_apply_delay
resolution, it seems harmless to not wait for less than a millisecond.

Backpatch to 9.4 where recovery_min_apply_delay was introduced.

Discussion: 20150323141819.GH26995@alap3.anarazel.de
2015-03-23 16:52:17 +01:00
Robert Haas
76d07a2a06 Fix status reporting for terminated bgworkers that were never started.
Previously, GetBackgroundWorkerPid() would return BGWH_NOT_YET_STARTED
if the slot used for the worker registration had not been reused by
unrelated activity, and BGWH_STOPPED if it had.  Either way, a process
that had requested notification when the state of one of its
background workers changed did not receive such notifications.  Fix
things so that GetBackgroundWorkerPid() always returns BGWH_STOPPED in
this situation, so that we do not erroneously give waiters the
impression that the worker will eventually be started; and send
notifications just as we would if the process terminated after having
been started, so that it's possible to wait for the postmaster to
process a worker termination request without polling.

Discovered by Amit Kapila during testing of parallel sequential scan.
Analysis and fix by me.  Back-patch to 9.4; there may not be anyone
relying on this interface yet, but if anyone is, the new behavior is a
clear improvement.
2015-03-19 11:08:54 -04:00
Tom Lane
c415c13b7e Build src/port/dirmod.c only on Windows.
Since commit ba7c5975ad, port/dirmod.c
has contained only Windows-specific functions.  Most platforms don't
seem to mind uselessly building an empty file, but OS X for one issues
warnings.  Hence, treat dirmod.c as a Windows-specific file selected
by configure rather than one that's always built.  We can revert this
change if dirmod.c ever gains any non-Windows functionality again.

Back-patch to 9.4 where the mentioned commit appeared.
2015-03-14 14:08:45 -04:00
Tom Lane
f50b5c7d0d Remove workaround for ancient incompatibility between readline and libedit.
GNU readline defines the return value of write_history() as "zero if OK,
else an errno code".  libedit's version of that function used to have a
different definition (to wit, "-1 if error, else the number of lines
written to the file").  We tried to work around that by checking whether
errno had become nonzero, but this method has never been kosher according
to the published API of either library.  It's reportedly completely broken
in recent Ubuntu releases: psql bleats about "No such file or directory"
when saving ~/.psql_history, even though the write worked fine.

However, libedit has been following the readline definition since somewhere
around 2006, so it seems all right to finally break compatibility with
ancient libedit releases and trust that the return value is what readline
specifies.  (I'm not sure when the various Linux distributions incorporated
this fix, but I did find that OS X has been shipping fixed versions since
10.5/Leopard.)

If anyone is still using such an ancient libedit, they will find that psql
complains it can't write ~/.psql_history at exit, even when the file was
written correctly.  This is no worse than the behavior we're fixing for
current releases.

Back-patch to all supported branches.
2015-03-14 13:43:08 -04:00
Tatsuo Ishii
4324785979 Fix integer overflow in debug message of walreceiver
The message tries to tell the replication apply delay which fails if
the first WAL record is not applied yet. Fix is, instead of telling
overflowed minus numeric, showing "N/A" which indicates that the delay
data is not yet available. Problem reported by me and patch by
Fabrízio de Royes Mello.

Back patched to 9.4, 9.3 and 9.2 stable branches (9.1 and 9.0 do not
have the debug message).
2015-03-14 08:21:45 +09:00
Tom Lane
32269be59e Ensure tableoid reads correctly in EvalPlanQual-manufactured tuples.
The ROW_MARK_COPY path in EvalPlanQualFetchRowMarks() was just setting
tableoid to InvalidOid, I think on the assumption that the referenced
RTE must be a subquery or other case without a meaningful OID.  However,
foreign tables also use this code path, and they do have meaningful
table OIDs; so failure to set the tuple field can lead to user-visible
misbehavior.  Fix that by fetching the appropriate OID from the range
table.

There's still an issue about whether CTID can ever have a meaningful
value in this case; at least with postgres_fdw foreign tables, it does.
But that is a different problem that seems to require a significantly
different patch --- it's debatable whether postgres_fdw really wants to
use this code path at all.

Simplified version of a patch by Etsuro Fujita, who also noted the
problem to begin with.  The issue can be demonstrated in all versions
having FDWs, so back-patch to 9.1.
2015-03-12 13:39:10 -04:00
Heikki Linnakangas
d810720262 Fix memory leaks in GIN index vacuum.
Per bug #12850 by Walter Nordmann. Backpatch to 9.4 where the leak was
introduced.
2015-03-12 15:40:07 +01:00
Tom Lane
3d0d3c02ff Cast to (void *) rather than (int *) when passing int64's to PQfn().
This is a possibly-vain effort to silence a Coverity warning about
bogus endianness dependency.  The code's fine, because it takes care
of endianness issues for itself, but Coverity sees an int64 being
passed to an int* argument and not unreasonably suspects something's
wrong.  I'm not sure if putting the void* cast in the way will shut it
up; but it can't hurt and seems better from a documentation standpoint
anyway, since the pointer is not used as an int* in this code path.

Just for a bit of additional safety, verify that the result length
is 8 bytes as expected.

Back-patch to 9.3 where the code in question was added.
2015-03-08 13:58:36 -04:00
Tom Lane
089e5ab1f8 Fix documentation for libpq's PQfn().
The SGML docs claimed that 1-byte integers could be sent or received with
the "isint" options, but no such behavior has ever been implemented in
pqGetInt() or pqPutInt().  The in-code documentation header for PQfn() was
even less in tune with reality, and the code itself used parameter names
matching neither the SGML docs nor its libpq-fe.h declaration.  Do a bit
of additional wordsmithing on the SGML docs while at it.

Since the business about 1-byte integers is a clear documentation bug,
back-patch to all supported branches.
2015-03-08 13:35:37 -04:00
Tom Lane
629f8613f0 Rethink function argument sorting in pg_dump.
Commit 7b583b20b1 created an unnecessary
dump failure hazard by applying pg_get_function_identity_arguments()
to every function in the database, even those that won't get dumped.
This could result in snapshot-related problems if concurrent sessions are,
for example, creating and dropping temporary functions, as noted by Marko
Tiikkaja in bug #12832.  While this is by no means pg_dump's only such
issue with concurrent DDL, it's unfortunate that we added a new failure
mode for cases that used to work, and even more so that the failure was
created for basically cosmetic reasons (ie, to sort overloaded functions
more deterministically).

To fix, revert that patch and instead sort function arguments using
information that pg_dump has available anyway, namely the names of the
argument types.  This will produce a slightly different sort ordering for
overloaded functions than the previous coding; but applying strcmp
directly to the output of pg_get_function_identity_arguments really was
a bit odd anyway.  The sorting will still be name-based and hence
independent of possibly-installation-specific OID assignments.  A small
additional benefit is that sorting now works regardless of server version.

Back-patch to 9.3, where the previous commit appeared.
2015-03-06 13:27:46 -05:00
Alvaro Herrera
7499776348 Fix user mapping object description
We were using "user mapping for user XYZ" as description for user mappings, but
that's ambiguous because users can have mappings on multiple foreign
servers; therefore change it to "for user XYZ on server UVW" instead.
Object identities for user mappings are also updated in the same way, in
branches 9.3 and above.

The incomplete description string was introduced together with the whole
SQL/MED infrastructure by commit cae565e503 of 8.4 era, so backpatch all
the way back.
2015-03-05 18:03:16 -03:00
Alvaro Herrera
2570e28827 Add comment for "is_internal" parameter
This was missed in my commit f4c4335 of 9.3 vintage, so backpatch to
that.
2015-03-03 14:04:01 -03:00
Stephen Frost
c05fa34333 Fix pg_dump handling of extension config tables
Since 9.1, we've provided extensions with a way to denote
"configuration" tables- tables created by an extension which the user
may modify.  By marking these as "configuration" tables, the extension
is asking for the data in these tables to be pg_dump'd (tables which
are not marked in this way are assumed to be entirely handled during
CREATE EXTENSION and are not included at all in a pg_dump).

Unfortunately, pg_dump neglected to consider foreign key relationships
between extension configuration tables and therefore could end up
trying to reload the data in an order which would cause FK violations.

This patch teaches pg_dump about these dependencies, so that the data
dumped out is done so in the best order possible.  Note that there's no
way to handle circular dependencies, but those have yet to be seen in
the wild.

The release notes for this should include a caution to users that
existing pg_dump-based backups may be invalid due to this issue.  The
data is all there, but restoring from it will require extracting the
data for the configuration tables and then loading them in the correct
order by hand.

Discussed initially back in bug #6738, more recently brought up by
Gilles Darold, who provided an initial patch which was further reworked
by Michael Paquier.  Further modifications and documentation updates
by me.

Back-patch to 9.1 where we added the concept of extension configuration
tables.
2015-03-02 14:12:28 -05:00
Stephen Frost
99ccda339a Fix targetRelation initializiation in prepsecurity
In 6f9bd50eab, we modified
expand_security_quals() to tell expand_security_qual() about when the
current RTE was the targetRelation.  Unfortunately, that commit
initialized the targetRelation variable used outside of the loop over
the RTEs instead of at the start of it.

This patch moves the variable and the initialization of it into the
loop, where it should have been to begin with.

Pointed out by Dean Rasheed.

Back-patch to 9.4 as the original commit was.
2015-03-01 15:27:40 -05:00
Noah Misch
22dd465d34 Unlink static libraries before rebuilding them.
When the library already exists in the build directory, "ar" preserves
members not named on its command line.  This mattered when, for example,
a "configure" rerun dropped a file from $(LIBOBJS).  libpgport carried
the obsolete member until "make clean".  Back-patch to 9.0 (all
supported versions).
2015-03-01 13:06:33 -05:00
Tom Lane
fdacbf9e89 Fix planning of star-schema-style queries.
Part of the intent of the parameterized-path mechanism was to handle
star-schema queries efficiently, but some overly-restrictive search
limiting logic added in commit e2fa76d80b
prevented such cases from working as desired.  Fix that and add a
regression test about it.  Per gripe from Marc Cousin.

This is arguably a bug rather than a new feature, so back-patch to 9.2
where parameterized paths were introduced.
2015-02-28 12:43:04 -05:00
Tom Lane
60ff5de22a Suppress uninitialized-variable warning from less-bright compilers.
The type variable must get set on first iteration of the while loop,
but there are reasonably modern gcc versions that don't realize that.
Initialize it with a dummy value.  This undoes a removal of initialization
in commit 654809e770.
2015-02-28 10:46:10 -05:00
Alvaro Herrera
ecfe1a1889 Fix a couple of trivial issues in jsonb.c
Typo "aggreagate" appeared three times, and the return value of function
JsonbIteratorNext() was being assigned to an int variable in a bunch of
places.
2015-02-27 18:54:49 -03:00
Andrew Dunstan
79afe6e66f Render infinite date/timestamps as 'infinity' for json/jsonb
Commit ab14a73a6c raised an error in these cases and later the
behaviour was copied to jsonb. This is what the XML code, which we
then adopted, does, as the XSD types don't accept infinite values.
However, json dates and timestamps are just strings as far as json is
concerned, so there is no reason not to render these values as
'infinity'.

The json portion of this is backpatched to 9.4 where the behaviour was
introduced. The jsonb portion only affects the development branch.

Per gripe on pgsql-general.
2015-02-26 12:34:43 -05:00
Andres Freund
d721151127 Reconsider when to wait for WAL flushes/syncrep during commit.
Up to now RecordTransactionCommit() waited for WAL to be flushed (if
synchronous_commit != off) and to be synchronously replicated (if
enabled), even if a transaction did not have a xid assigned. The primary
reason for that is that sequence's nextval() did not assign a xid, but
are worthwhile to wait for on commit.

This can be problematic because sometimes read only transactions do
write WAL, e.g. HOT page prune records. That then could lead to read only
transactions having to wait during commit. Not something people expect
in a read only transaction.

This lead to such strange symptoms as backends being seemingly stuck
during connection establishment when all synchronous replicas are
down. Especially annoying when said stuck connection is the standby
trying to reconnect to allow syncrep again...

This behavior also is involved in a rather complicated <= 9.4 bug where
the transaction started by catchup interrupt processing waited for
syncrep using latches, but didn't get the wakeup because it was already
running inside the same overloaded signal handler. Fix the issue here
doesn't properly solve that issue, merely papers over the problems. In
9.5 catchup interrupts aren't processed out of signal handlers anymore.

To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records.  If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.

The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
anyway, for another only usage of nextval() without using the result in
an insert or similar is affected.

Discussion: 20150223165359.GF30784@awork2.anarazel.de,
    369698E947874884A77849D8FE3680C2@maumau,
    5CF4ABBA67674088B3941894E22A0D25@maumau

Per complaint from maumau and Thom Brown

Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
better to be consistent behavior across all maintained branches.
2015-02-26 12:50:07 +01:00
Noah Misch
38930e473a Free SQLSTATE and SQLERRM no earlier than other PL/pgSQL variables.
"RETURN SQLERRM" prompted plpgsql_exec_function() to read from freed
memory.  Back-patch to 9.0 (all supported versions).  Little code ran
between the premature free and the read, so non-assert builds are
unlikely to witness user-visible consequences.
2015-02-25 23:48:49 -05:00
Stephen Frost
f16270aded Add locking clause for SB views for update/delete
In expand_security_qual(), we were handling locking correctly when a
PlanRowMark existed, but not when we were working with the target
relation (which doesn't have any PlanRowMarks, but the subquery created
for the security barrier quals still needs to lock the rows under it).

Noted by Etsuro Fujita when working with the Postgres FDW, which wasn't
properly issuing a SELECT ... FOR UPDATE to the remote side under a
DELETE.

Back-patch to 9.4 where updatable security barrier views were
introduced.

Per discussion with Etsuro and Dean Rasheed.
2015-02-25 21:36:40 -05:00
Tom Lane
2164a0de29 Fix dumping of views that are just VALUES(...) but have column aliases.
The "simple" path for printing VALUES clauses doesn't work if we need
to attach nondefault column aliases, because there's noplace to do that
in the minimal VALUES() syntax.  So modify get_simple_values_rte() to
detect nondefault aliases and treat that as a non-simple case.  This
further exposes that the "non-simple" path never actually worked;
it didn't produce valid syntax.  Fix that too.  Per bug #12789 from
Curtis McEnroe, and analysis by Andrew Gierth.

Back-patch to all supported branches.  Before 9.3, this also requires
back-patching the part of commit 092d7ded29
that created get_simple_values_rte() to begin with; inserting the extra
test into the old factorization of that logic would've been too messy.
2015-02-25 12:01:12 -05:00
Andres Freund
89629f2895 Guard against spurious signals in LockBufferForCleanup.
When LockBufferForCleanup() has to wait for getting a cleanup lock on a
buffer it does so by setting a flag in the buffer header and then wait
for other backends to signal it using ProcWaitForSignal().
Unfortunately LockBufferForCleanup() missed that ProcWaitForSignal() can
return for other reasons than the signal it is hoping for. If such a
spurious signal arrives the wait flags on the buffer header will still
be set. That then triggers "ERROR: multiple backends attempting to wait
for pincount 1".

The fix is simple, unset the flag if still set when retrying. That
implies an additional spinlock acquisition/release, but that's unlikely
to matter given the cost of waiting for a cleanup lock.  Alternatively
it'd have been possible to move responsibility for maintaining the
relevant flag to the waiter all together, but that might have had
negative consequences due to possible floods of signals. Besides being
more invasive.

This looks to be a very longstanding bug. The relevant code in
LockBufferForCleanup() hasn't changed materially since its introduction
and ProcWaitForSignal() was documented to return for unrelated reasons
since 8.2.  The master only patch series removing ImmediateInterruptOK
made it much easier to hit though, as ProcSendSignal/ProcWaitForSignal
now uses a latch shared with other tasks.

Per discussion with Kevin Grittner, Tom Lane and me.

Backpatch to all supported branches.

Discussion: 11553.1423805224@sss.pgh.pa.us
2015-02-23 16:14:14 +01:00
Heikki Linnakangas
0214a61e06 Fix potential deadlock with libpq non-blocking mode.
If libpq output buffer is full, pqSendSome() function tries to drain any
incoming data. This avoids deadlock, if the server e.g. sends a lot of
NOTICE messages, and blocks until we read them. However, pqSendSome() only
did that in blocking mode. In non-blocking mode, the deadlock could still
happen.

To fix, take a two-pronged approach:

1. Change the documentation to instruct that when PQflush() returns 1, you
should wait for both read- and write-ready, and call PQconsumeInput() if it
becomes read-ready. That fixes the deadlock, but applications are not going
to change overnight.

2. In pqSendSome(), drain the input buffer before returning 1. This
alleviates the problem for applications that only wait for write-ready. In
particular, a slow but steady stream of NOTICE messages during COPY FROM
STDIN will no longer cause a deadlock. The risk remains that the server
attempts to send a large burst of data and fills its output buffer, and at
the same time the client also sends enough data to fill its output buffer.
The application will deadlock if it goes to sleep, waiting for the socket
to become write-ready, before the server's data arrives. In practice,
NOTICE messages and such that the server might be sending are usually
short, so it's highly unlikely that the server would fill its output buffer
so quickly.

Backpatch to all supported versions.
2015-02-23 13:32:39 +02:00
Tom Lane
9c15a778a0 Fix misparsing of empty value in conninfo_uri_parse_params().
After finding an "=" character, the pointer was advanced twice when it
should only advance once.  This is harmless as long as the value after "="
has at least one character; but if it doesn't, we'd miss the terminator
character and include too much in the value.

In principle this could lead to reading off the end of memory.  It does not
seem worth treating as a security issue though, because it would happen on
client side, and besides client logic that's taking conninfo strings from
untrusted sources has much worse security problems than this.

Report and patch received off-list from Thomas Fanghaenel.
Back-patch to 9.2 where the faulty code was introduced.
2015-02-21 12:59:35 -05:00
Alvaro Herrera
66463a3cf1 Fix object identities for pg_conversion objects
We were neglecting to schema-qualify them.

Backpatch to 9.3, where object identities were introduced as a concept
by commit f8348ea32e.
2015-02-18 14:28:12 -03:00
Tom Lane
a75dfb73e1 Fix failure to honor -Z compression level option in pg_dump -Fd.
cfopen() and cfopen_write() failed to pass the compression level through
to zlib, so that you always got the default compression level if you got
any at all.

In passing, also fix these and related functions so that the correct errno
is reliably returned on failure; the original coding supposes that free()
cannot change errno, which is untrue on at least some platforms.

Per bug #12779 from Christoph Berg.  Back-patch to 9.1 where the faulty
code was introduced.

Michael Paquier
2015-02-18 11:43:00 -05:00
Tom Lane
a271c9260f Remove code to match IPv4 pg_hba.conf entries to IPv4-in-IPv6 addresses.
In investigating yesterday's crash report from Hugo Osvaldo Barrera, I only
looked back as far as commit f3aec2c7f5 where the breakage occurred
(which is why I thought the IPv4-in-IPv6 business was undocumented).  But
actually the logic dates back to commit 3c9bb8886d and was simply
broken by erroneous refactoring in the later commit.  A bit of archives
excavation shows that we added the whole business in response to a report
that some 2003-era Linux kernels would report IPv4 connections as having
IPv4-in-IPv6 addresses.  The fact that we've had no complaints since 9.0
seems to be sufficient confirmation that no modern kernels do that, so
let's just rip it all out rather than trying to fix it.

Do this in the back branches too, thus essentially deciding that our
effective behavior since 9.0 is correct.  If there are any platforms on
which the kernel reports IPv4-in-IPv6 addresses as such, yesterday's fix
would have made for a subtle and potentially security-sensitive change in
the effective meaning of IPv4 pg_hba.conf entries, which does not seem like
a good thing to do in minor releases.  So let's let the post-9.0 behavior
stand, and change the documentation to match it.

In passing, I failed to resist the temptation to wordsmith the description
of pg_hba.conf IPv4 and IPv6 address entries a bit.  A lot of this text
hasn't been touched since we were IPv4-only.
2015-02-17 12:49:18 -05:00
Robert Haas
5e49c98e0a Improve pg_check_dir code and comments.
Avoid losing errno if readdir() fails and closedir() works.  Consistently
return 4 rather than 3 if both a lost+found directory and other files are
found, rather than returning one value or the other depending on the
order of the directory listing.  Update comments to match the actual
behavior.

These oversights date to commits 6f03927fce
and 17f1523932.

Marco Nenciarini
2015-02-17 10:50:49 -05:00
Tom Lane
23291a796f Fix misuse of memcpy() in check_ip().
The previous coding copied garbage into a local variable, pretty much
ensuring that the intended test of an IPv6 connection address against a
promoted IPv4 address from pg_hba.conf would never match.  The lack of
field complaints likely indicates that nobody realized this was supposed
to work, which is unsurprising considering that no user-facing docs suggest
it should work.

In principle this could have led to a SIGSEGV due to reading off the end of
memory, but since the source address would have pointed to somewhere in the
function's stack frame, that's quite unlikely.  What led to discovery of
the bug is Hugo Osvaldo Barrera's report of a crash after an OS upgrade,
which is probably because he is now running a system in which memcpy raises
abort() upon detecting overlapping source and destination areas.  (You'd
have to additionally suppose some things about the stack frame layout to
arrive at this conclusion, but it seems plausible.)

This has been broken since the code was added, in commit f3aec2c7f5,
so back-patch to all supported branches.
2015-02-16 16:17:59 -05:00
Tom Lane
1bf32972e6 Fix null-pointer-deref crash while doing COPY IN with check constraints.
In commit bf7ca15875 I introduced an
assumption that an RTE referenced by a whole-row Var must have a valid eref
field.  This is false for RTEs constructed by DoCopy, and there are other
places taking similar shortcuts.  Perhaps we should make all those places
go through addRangeTableEntryForRelation or its siblings instead of having
ad-hoc logic, but the most reliable fix seems to be to make the new code in
ExecEvalWholeRowVar cope if there's no eref.  We can reasonably assume that
there's no need to insert column aliases if no aliases were provided.

Add a regression test case covering this, and also verifying that a sane
column name is in fact available in this situation.

Although the known case only crashes in 9.4 and HEAD, it seems prudent to
back-patch the code change to 9.2, since all the ingredients for a similar
failure exist in the variant patch applied to 9.3 and 9.2.

Per report from Jean-Pierre Pelletier.
2015-02-15 23:26:45 -05:00
Peter Eisentraut
9a165aa0b4 pg_regress: Write processed input/*.source into output dir
Before, it was writing the processed files into the input directory,
which is incorrect in a vpath build.
2015-02-15 01:20:44 -05:00
Heikki Linnakangas
56a23a83fd Fix broken #ifdef for __sparcv8
Rob Rowan. Backpatch to all supported versions, like the patch that added
the broken #ifdef.
2015-02-13 23:56:57 +02:00
Bruce Momjian
c7bc5be11d pg_upgrade: preserve freeze info for postgres/template1 dbs
pg_database.datfrozenxid and pg_database.datminmxid were not preserved
for the 'postgres' and 'template1' databases.  This could cause missing
clog file errors on access to user tables and indexes after upgrades in
these databases.

Backpatch through 9.0
2015-02-11 21:02:36 -05:00
Tom Lane
0ac1fedab9 Fix missing PQclear() in libpqrcv_endstreaming().
This omission leaked one PGresult per WAL streaming cycle, which possibly
would never be enough to notice in the real world, but it's still a leak.

Per Coverity.  Back-patch to 9.3 where the error was introduced.
2015-02-11 19:20:49 -05:00
Tom Lane
e44735c488 Fix minor memory leak in ident_inet().
We'd leak the ident_serv data structure if the second pg_getaddrinfo_all
(the one for the local address) failed.  This is not of great consequence
because a failure return here just leads directly to backend exit(), but
if this function is going to try to clean up after itself at all, it should
not have such holes in the logic.  Try to fix it in a future-proof way by
having all the failure exits go through the same cleanup path, rather than
"optimizing" some of them.

Per Coverity.  Back-patch to 9.2, which is as far back as this patch
applies cleanly.
2015-02-11 19:09:54 -05:00
Tom Lane
ee8d377831 Fix more memory leaks in failure path in buildACLCommands.
We already had one go at this issue in commit d73b7f973d, but we
failed to notice that buildACLCommands also leaked several PQExpBuffers
along with a simply malloc'd string.  This time let's try to make the
fix a bit more future-proof by eliminating the separate exit path.

It's still not exactly critical because pg_dump will curl up and die on
failure; but since the amount of the potential leak is now several KB,
it seems worth back-patching as far as 9.2 where the previous fix landed.

Per Coverity, which evidently is smarter than clang's static analyzer.
2015-02-11 18:35:23 -05:00
Michael Meskes
66c4ea8cb6 Fixed array handling in ecpg.
When ecpg was rewritten to the new protocol version not all variable types
were corrected. This patch rewrites the code for these types to fix that. It
also fixes the documentation to correctly tell the status of array handling.
2015-02-11 10:57:02 +01:00
Tom Lane
a592e58835 Fix pg_dump's heuristic for deciding which casts to dump.
Back in 2003 we had a discussion about how to decide which casts to dump.
At the time pg_dump really only considered an object's containing schema
to decide what to dump (ie, dump whatever's not in pg_catalog), and so
we chose a complicated idea involving whether the underlying types were to
be dumped (cf commit a6790ce857).  But users
are allowed to create casts between built-in types, and we failed to dump
such casts.  Let's get rid of that heuristic, which has accreted even more
ugliness since then, in favor of just looking at the cast's OID to decide
if it's a built-in cast or not.

In passing, also fix some really ancient code that supposed that it had to
manufacture a dependency for the cast on its cast function; that's only
true when dumping from a pre-7.3 server.  This just resulted in some wasted
cycles and duplicate dependency-list entries with newer servers, but we
might as well improve it.

Per gripes from a number of people, most recently Greg Sabino Mullane.
Back-patch to all supported branches.
2015-02-10 22:38:17 -05:00
Tom Lane
433c79d2c8 Fix GEQO to not assume its join order heuristic always works.
Back in commit 400e2c9344 I rewrote GEQO's
gimme_tree function to improve its heuristic for modifying the given tour
into a legal join order.  In what can only be called a fit of hubris,
I supposed that this new heuristic would *always* find a legal join order,
and ripped out the old logic that allowed gimme_tree to sometimes fail.

The folly of this is exposed by bug #12760, in which the "greedy" clumping
behavior of merge_clump() can lead it into a dead end which could only be
recovered from by un-clumping.  We have no code for that and wouldn't know
exactly what to do with it if we did.  Rather than try to improve the
heuristic rules still further, let's just recognize that it *is* a
heuristic and probably must always have failure cases.  So, put back the
code removed in the previous commit to allow for failure (but comment it
a bit better this time).

It's possible that this code was actually fully correct at the time and
has only been broken by the introduction of LATERAL.  But having seen this
example I no longer have much faith in that proposition, so back-patch to
all supported branches.
2015-02-10 20:37:22 -05:00
Tom Lane
ccf655c446 Minor cleanup/code review for "indirect toast" stuff.
Fix some issues I noticed while fooling with an extension to allow an
additional kind of toast pointer.  Much of this is just comment
improvement, but there are a couple of actual bugs, which might or might
not be reachable today depending on what can happen during logical
decoding.  An example is that toast_flatten_tuple() failed to cover the
possibility of an indirection pointer in its input.  Back-patch to 9.4
just in case that is reachable now.

In HEAD, also correct some really minor issues with recent compression
reorganization, such as dangerously underparenthesized macros.
2015-02-09 12:30:55 -05:00
Heikki Linnakangas
3bc4c69427 Report WAL flush, not insert, position in replication IDENTIFY_SYSTEM
When beginning streaming replication, the client usually issues the
IDENTIFY_SYSTEM command, which used to return the current WAL insert
position. That's not suitable for the intended purpose of that field,
however. pg_receivexlog uses it to start replication from the reported
point, but if it hasn't been flushed to disk yet, it will fail. Change
IDENTIFY_SYSTEM to report the flush position instead.

Backpatch to 9.1 and above. 9.0 doesn't report any WAL position.
2015-02-06 11:27:12 +02:00
Heikki Linnakangas
48a565d78b Fix reference-after-free when waiting for another xact due to constraint.
If an insertion or update had to wait for another transaction to finish,
because there was another insertion with conflicting key in progress,
we would pass a just-free'd item pointer to XactLockTableWait().

All calls to XactLockTableWait() and MultiXactIdWait() had similar issues.
Some passed a pointer to a buffer in the buffer cache, after already
releasing the lock. The call in EvalPlanQualFetch had already released the
pin too. All but the call in execUtils.c would merely lead to reporting a
bogus ctid, however (or an assertion failure, if enabled).

All the callers that passed HeapTuple->t_data->t_ctid were slightly bogus
anyway: if the tuple was updated (again) in the same transaction, its ctid
field would point to the next tuple in the chain, not the tuple itself.

Backpatch to 9.4, where the 'ctid' argument to XactLockTableWait was added
(in commit f88d4cfc)
2015-02-04 16:07:27 +02:00
Andres Freund
742734d2b5 Add missing float.h include to snprintf.c.
On windows _isnan() (which isnan() is redirected to in port/win32.h)
is declared in float.h, not math.h.

Per buildfarm animal currawong.

Backpatch to all supported branches.
2015-02-04 13:31:36 +01:00
Tom Lane
900679245d Fix breakage in GEODEBUG debug code.
LINE doesn't have an "m" field (anymore anyway).  Also fix unportable
assumption that %x can print the result of pointer subtraction.

In passing, improve single_decode() in minor ways:
* Remove unnecessary leading-whitespace skip (strtod does that already).
* Make GEODEBUG message more intelligible.
* Remove entirely-useless test to see if strtod returned a silly pointer.
* Don't bother computing trailing-whitespace skip unless caller wants
  an ending pointer.

This has been broken since 261c7d4b65.
Although it's only debug code, might as well fix the 9.4 branch too.
2015-02-03 15:20:48 -05:00
Tom Lane
d0f83327d3 Stamp 9.4.1. 2015-02-02 15:42:55 -05:00
Heikki Linnakangas
57ec87c6b8 Be more careful to not lose sync in the FE/BE protocol.
If any error occurred while we were in the middle of reading a protocol
message from the client, we could lose sync, and incorrectly try to
interpret a part of another message as a new protocol message. That will
usually lead to an "invalid frontend message" error that terminates the
connection. However, this is a security issue because an attacker might
be able to deliberately cause an error, inject a Query message in what's
supposed to be just user data, and have the server execute it.

We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other
operations that could ereport(ERROR) in the middle of processing a message,
but a query cancel interrupt or statement timeout could nevertheless cause
it to happen. Also, the V2 fastpath and COPY handling were not so careful.
It's very difficult to recover in the V2 COPY protocol, so we will just
terminate the connection on error. In practice, that's what happened
previously anyway, as we lost protocol sync.

To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set
whenever we're in the middle of reading a message. When it's set, we cannot
safely ERROR out and continue running, because we might've read only part
of a message. PqCommReadingMsg acts somewhat similarly to critical sections
in that if an error occurs while it's set, the error handler will force the
connection to be terminated, as if the error was FATAL. It's not
implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted
to PANIC in critical sections, because we want to be able to use
PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes
advantage of that to prevent an OOM error from terminating the connection.

To prevent unnecessary connection terminations, add a holdoff mechanism
similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel
interrupts, but still allow die interrupts. The rules on which interrupts
are processed when are now a bit more complicated, so refactor
ProcessInterrupts() and the calls to it in signal handlers so that the
signal handlers always call it if ImmediateInterruptOK is set, and
ProcessInterrupts() can decide to not do anything if the other conditions
are not met.

Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund.
Backpatch to all supported versions.

Security: CVE-2015-0244
2015-02-02 17:09:46 +02:00
Bruce Momjian
2ac95c83ce port/snprintf(): fix overflow and do padding
Prevent port/snprintf() from overflowing its local fixed-size
buffer and pad to the desired number of digits with zeros, even
if the precision is beyond the ability of the native sprintf().
port/snprintf() is only used on systems that lack a native
snprintf().

Reported by Bruce Momjian. Patch by Tom Lane.	Backpatch to all
supported versions.

Security: CVE-2015-0242
2015-02-02 10:00:49 -05:00
Bruce Momjian
56d2bee9db to_char(): prevent writing beyond the allocated buffer
Previously very long localized month and weekday strings could
overflow the allocated buffers, causing a server crash.

Reported and patch reviewed by Noah Misch.  Backpatch to all
supported versions.

Security: CVE-2015-0241
2015-02-02 10:00:49 -05:00
Bruce Momjian
1628a0bbfa to_char(): prevent accesses beyond the allocated buffer
Previously very long field masks for floats could access memory
beyond the existing buffer allocated to hold the result.

Reported by Andres Freund and Peter Geoghegan.	Backpatch to all
supported versions.

Security: CVE-2015-0241
2015-02-02 10:00:49 -05:00
Peter Eisentraut
1de0702d92 Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 19c72ea8d856d7b1d4f5d759a766c8206bf9ce53
2015-02-01 23:18:42 -05:00
Tom Lane
c2b06ab177 Update time zone data files to tzdata release 2015a.
DST law changes in Chile and Mexico (state of Quintana Roo).
Historical changes for Iceland.
2015-01-30 22:45:58 -05:00
Tom Lane
4cbf390d5f Fix jsonb Unicode escape processing, and in consequence disallow \u0000.
We've been trying to support \u0000 in JSON values since commit
78ed8e03c6, and have introduced increasingly worse hacks to try to
make it work, such as commit 0ad1a81632.  However, it fundamentally
can't work in the way envisioned, because the stored representation looks
the same as for \\u0000 which is not the same thing at all.  It's also
entirely bogus to output \u0000 when de-escaped output is called for.

The right way to do this would be to store an actual 0x00 byte, and then
throw error only if asked to produce de-escaped textual output.  However,
getting to that point seems likely to take considerable work and may well
never be practical in the 9.4.x series.

To preserve our options for better behavior while getting rid of the nasty
side-effects of 0ad1a81632, revert that commit in toto and instead
throw error if \u0000 is used in a context where it needs to be de-escaped.
(These are the same contexts where non-ASCII Unicode escapes throw error
if the database encoding isn't UTF8, so this behavior is by no means
without precedent.)

In passing, make both the \u0000 case and the non-ASCII Unicode case report
ERRCODE_UNTRANSLATABLE_CHARACTER / "unsupported Unicode escape sequence"
rather than claiming there's something wrong with the input syntax.

Back-patch to 9.4, where we have to do something because 0ad1a81632
broke things for many cases having nothing to do with \u0000.  9.3 also has
bogus behavior, but only for that specific escape value, so given the lack
of field complaints it seems better to leave 9.3 alone.
2015-01-30 14:44:49 -05:00
Tom Lane
b6a164e5cb Fix assorted oversights in range selectivity estimation.
calc_rangesel() failed outright when comparing range variables to empty
constant ranges with < or >=, as a result of missing cases in a switch.
It also produced a bogus estimate for > comparison to an empty range.

On top of that, the >= and > cases were mislabeled throughout.  For
nonempty constant ranges, they managed to produce the right answers
anyway as a result of counterbalancing typos.

Also, default_range_selectivity() omitted cases for elem <@ range,
range &< range, and range &> range, so that rather dubious defaults
were applied for these operators.

In passing, rearrange the code in rangesel() so that the elem <@ range
case is handled in a less opaque fashion.

Report and patch by Emre Hasegeli, some additional work by me
2015-01-30 12:31:08 -05:00
Heikki Linnakangas
dc40ca6963 Fix query-duration memory leak with GIN rescans.
The requiredEntries / additionalEntries arrays were not freed in
freeScanKeys() like other per-key stuff.

It's not obvious, but startScanKey() was only ever called after the keys
have been initialized with ginNewScanKey(). That's why it doesn't need to
worry about freeing existing arrays. The ginIsNewKey() test in gingetbitmap
was never true, because ginrescan free's the existing keys, and it's not OK
to call gingetbitmap twice in a row without calling ginrescan in between.
To make that clear, remove the unnecessary ginIsNewKey(). And just to be
extra sure that nothing funny happens if there is an existing key after all,
call freeScanKeys() to free it if it exists. This makes the code more
straightforward.

(I'm seeing other similar leaks in testing a query that rescans an GIN index
scan, but that's a different issue. This just fixes the obvious leak with
those two arrays.)

Backpatch to 9.4, where GIN fast scan was added.
2015-01-30 17:59:17 +01:00
Kevin Grittner
cb01685282 Allow pg_dump to use jobs and serializable transactions together.
Since 9.3, when the --jobs option was introduced, using it together
with the --serializable-deferrable option generated multiple
errors.  We can get correct behavior by allowing the connection
which acquires the snapshot to use SERIALIZABLE, READ ONLY,
DEFERRABLE and pass that to the workers running the other
connections using REPEATABLE READ, READ ONLY.  This is a bit of a
kluge since the SERIALIZABLE behavior is achieved by running some
of the participating connections at a different isolation level,
but it is a simple and safe change, suitable for back-patching.

This will be followed by a proposal for a more invasive fix with
some slight behavioral changes on just the master branch, based on
suggestions from Andres Freund, but the kluge will be applied to
master until something is agreed along those lines.

Back-patched to 9.3, where the --jobs option was added.

Based on report from Alexander Korotkov
2015-01-30 08:57:53 -06:00
Stephen Frost
b1700055c2 Fix BuildIndexValueDescription for expressions
In 804b6b6db4 we modified
BuildIndexValueDescription to pay attention to which columns are visible
to the user, but unfortunatley that commit neglected to consider indexes
which are built on expressions.

Handle error-reporting of violations of constraint indexes based on
expressions by not returning any detail when the user does not have
table-level SELECT rights.

Backpatch to 9.0, as the prior commit was.

Pointed out by Tom.
2015-01-29 21:59:43 -05:00
Andres Freund
76e962750f Properly terminate the array returned by GetLockConflicts().
GetLockConflicts() has for a long time not properly terminated the
returned array. During normal processing the returned array is zero
initialized which, while not pretty, is sufficient to be recognized as
a invalid virtual transaction id. But the HotStandby case is more than
aesthetically broken: The allocated (and reused) array is neither
zeroed upon allocation, nor reinitialized, nor terminated.

Not having a terminating element means that the end of the array will
not be recognized and that recovery conflict handling will thus read
ahead into adjacent memory. Only terminating when hitting memory
content that looks like a invalid virtual transaction id.  Luckily
this seems so far not have caused significant problems, besides making
recovery conflict more expensive.

Discussion: 20150127142713.GD29457@awork2.anarazel.de

Backpatch into all supported branches.
2015-01-29 22:48:45 +01:00
Heikki Linnakangas
28a37deab3 Fix bug where GIN scan keys were not initialized with gin_fuzzy_search_limit.
When gin_fuzzy_search_limit was used, we could jump out of startScan()
without calling startScanKey(). That was harmless in 9.3 and below, because
startScanKey()() didn't do anything interesting, but in 9.4 it initializes
information needed for skipping entries (aka GIN fast scans), and you
readily get a segfault if it's not done. Nevertheless, it was clearly wrong
all along, so backpatch all the way to 9.1 where the early return was
introduced.

(AFAICS startScanKey() did nothing useful in 9.3 and below, because the
fields it initialized were already initialized in ginFillScanKey(), but I
don't dare to change that in a minor release. ginFillScanKey() is always
called in gingetbitmap() even though there's a check there to see if the
scan keys have already been initialized, because they never are; ginrescan()
free's them.)

In the passing, remove unnecessary if-check from the second inner loop in
startScan(). We already check in the first loop that the condition is true
for all entries.

Reported by Olaf Gawenda, bug #12694, Backpatch to 9.1 and above, although
AFAICS it causes a live bug only in 9.4.
2015-01-29 19:37:29 +02:00
Stephen Frost
b40fe42781 Clean up range-table building in copy.c
Commit 804b6b6db4 added the build of a
range table in copy.c to initialize the EState es_range_table since it
can be needed in error paths.  Unfortunately, that commit didn't
appreciate that some code paths might end up not initializing the rte
which is used to build the range table.

Fix that and clean up a couple others things along the way- build it
only once and don't explicitly set it on the !is_from path as it
doesn't make any sense there (cstate is palloc0'd, so this isn't an
issue from an initializing standpoint either).

The prior commit went back to 9.0, but this only goes back to 9.1 as
prior to that the range table build happens immediately after building
the RTE and therefore doesn't suffer from this issue.

Pointed out by Robert.
2015-01-28 17:42:51 -05:00
Stephen Frost
3cc74a3d61 Fix column-privilege leak in error-message paths
While building error messages to return to the user,
BuildIndexValueDescription, ExecBuildSlotValueDescription and
ri_ReportViolation would happily include the entire key or entire row in
the result returned to the user, even if the user didn't have access to
view all of the columns being included.

Instead, include only those columns which the user is providing or which
the user has select rights on.  If the user does not have any rights
to view the table or any of the columns involved then no detail is
provided and a NULL value is returned from BuildIndexValueDescription
and ExecBuildSlotValueDescription.  Note that, for key cases, the user
must have access to all of the columns for the key to be shown; a
partial key will not be returned.

Back-patch all the way, as column-level privileges are now in all
supported versions.

This has been assigned CVE-2014-8161, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
2015-01-28 12:32:06 -05:00
Tom Lane
3d5e857ab1 Fix NUMERIC field access macros to treat NaNs consistently.
Commit 145343534c arranged to store numeric
NaN values as short-header numerics, but the field access macros did not
get the memo: they thought only "SHORT" numerics have short headers.

Most of the time this makes no difference because we don't access the
weight or dscale of a NaN; but numeric_send does that.  As pointed out
by Andrew Gierth, this led to fetching uninitialized bytes.

AFAICS this could not have any worse consequences than that; in particular,
an unaligned stored numeric would have been detoasted by PG_GETARG_NUMERIC,
so that there's no risk of a fetch off the end of memory.  Still, the code
is wrong on its own terms, and it's not hard to foresee future changes that
might expose us to real risks.  So back-patch to all affected branches.
2015-01-27 12:06:36 -05:00
Tom Lane
dd9d78f595 Fix volatile-safety issue in pltcl_SPI_execute_plan().
The "callargs" variable is modified within PG_TRY and then referenced
within PG_CATCH, which is exactly the coding pattern we've now found
to be unsafe.  Marking "callargs" volatile would be problematic because
it is passed by reference to some Tcl functions, so fix the problem
by not modifying it within PG_TRY.  We can just postpone the free()
till we exit the PG_TRY construct, as is already done elsewhere in this
same file.

Also, fix failure to free(callargs) when exiting on too-many-arguments
error.  This is only a minor memory leak, but a leak nonetheless.

In passing, remove some unnecessary "volatile" markings in the same
function.  Those doubtless are there because gcc 2.95.3 whinged about
them, but we now know that its algorithm for complaining is many bricks
shy of a load.

This is certainly a live bug with compilers that optimize similarly
to current gcc, so back-patch to all active branches.
2015-01-26 12:18:36 -05:00
Tom Lane
df923be03d Fix volatile-safety issue in asyncQueueReadAllNotifications().
The "pos" variable is modified within PG_TRY and then referenced
within PG_CATCH, so for strict POSIX conformance it must be marked
volatile.  Superficially the code looked safe because pos's address
was taken, which was sufficient to force it into memory ... but it's
not sufficient to ensure that the compiler applies updates exactly
where the program text says to.  The volatility marking has to extend
into a couple of subroutines too, but I think that's probably a good
thing because the risk of out-of-order updates is mostly in those
subroutines not asyncQueueReadAllNotifications() itself.  In principle
the compiler could have re-ordered operations such that an error could
be thrown while "pos" had an incorrect value.

It's unclear how real the risk is here, but for safety back-patch
to all active branches.
2015-01-26 11:57:36 -05:00