Commit graph

7052 commits

Author SHA1 Message Date
Tom Lane
72eee410d4 Move pg_constraint.h function declarations to new file pg_constraint_fn.h.
A pending patch requires exporting a function returning Bitmapset from
catalog/pg_constraint.c.  As things stand, that would mean including
nodes/bitmapset.h in pg_constraint.h, which might be hazardous for the
client-side includability of that header.  It's not entirely clear whether
any client-side code needs to include pg_constraint.h, but it seems prudent
to assume that there is some such code somewhere.  Therefore, split off the
function definitions into a new file pg_constraint_fn.h, similarly to what
we've done for some other catalog header files.
2016-02-11 15:51:28 -05:00
Robert Haas
c319991bca Use separate lwlock tranches for buffer, lock, and predicate lock managers.
This finishes the work - spread across many commits over the last
several months - of putting each type of lock other than the named
individual locks into a separate tranche.

Amit Kapila
2016-02-11 14:07:33 -05:00
Robert Haas
a455878d99 Rename PGPROC fields related to group XID clearing again.
Commit 0e141c0fbb introduced a new
facility to reduce ProcArrayLock contention by clearing several XIDs
from the ProcArray under a single lock acquisition.  The names
initially chosen were deemed not to be very good choices, so commit
4aec49899e renamed them.  But now it
seems like we still didn't get it right.  A pending patch wants to
add similar infrastructure for batching CLOG updates, so the names
need to be clear enough to allow a new set of structure members with
a related purpose.

Amit Kapila
2016-02-11 08:55:24 -05:00
Tom Lane
c5e9b77127 Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier."
This reverts commit 3971f64843 and a
couple of followon debugging commits; I think we've learned what we can
from them.
2016-02-10 16:01:04 -05:00
Robert Haas
79a7ff0fe5 Code cleanup in the wake of recent LWLock refactoring.
As of commit c1772ad922, there's no
longer any way of requesting additional LWLocks in the main tranche,
so we don't need NumLWLocks() or LWLockAssign() any more.  Also,
some of the allocation counters that we had previously aren't needed
any more either.

Amit Kapila
2016-02-10 09:58:09 -05:00
Tom Lane
3971f64843 Temporarily make pg_ctl and server shutdown a whole lot chattier.
This is a quick hack, due to be reverted when its purpose has been served,
to try to gather information about why some of the buildfarm critters
regularly fail with "postmaster does not shut down" complaints.  Maybe they
are just really overloaded, but maybe something else is going on.  Hence,
instrument pg_ctl to print the current time when it starts waiting for
postmaster shutdown and when it gives up, and add a lot of logging of the
current time in the server's checkpoint and shutdown code paths.

No attempt has been made to make this pretty.  I'm not even totally sure
if it will build on Windows, but we'll soon find out.
2016-02-08 18:43:11 -05:00
Tom Lane
3bb3f42f37 Fix some regex issues with out-of-range characters and large char ranges.
Previously, our regex code defined CHR_MAX as 0xfffffffe, which is a
bad choice because it is outside the range of type "celt" (int32).
Characters approaching that limit could lead to infinite loops in logic
such as "for (c = a; c <= b; c++)" where c is of type celt but the
range bounds are chr.  Such loops will work safely only if CHR_MAX+1
is representable in celt, since c must advance to beyond b before the
loop will exit.

Fortunately, there seems no reason not to restrict CHR_MAX to 0x7ffffffe.
It's highly unlikely that Unicode will ever assign codes that high, and
none of our other backend encodings need characters beyond that either.

In addition to modifying the macro, we have to explicitly enforce character
range restrictions on the values of \u, \U, and \x escape sequences, else
the limit is trivially bypassed.

Also, the code for expanding case-independent character ranges in bracket
expressions had a potential integer overflow in its calculation of the
number of characters it could generate, which could lead to allocating too
small a character vector and then overwriting memory.  An attacker with the
ability to supply arbitrary regex patterns could easily cause transient DOS
via server crashes, and the possibility for privilege escalation has not
been ruled out.

Quite aside from the integer-overflow problem, the range expansion code was
unnecessarily inefficient in that it always produced a result consisting of
individual characters, abandoning the knowledge that we had a range to
start with.  If the input range is large, this requires excessive memory.
Change it so that the original range is reported as-is, and then we add on
any case-equivalent characters that are outside that range.  With this
approach, we can bound the number of individual characters allowed without
sacrificing much.  This patch allows at most 100000 individual characters,
which I believe to be more than the number of case pairs existing in
Unicode, so that the restriction will never be hit in practice.

It's still possible for range() to take awhile given a large character code
range, so also add statement-cancel detection to its loop.  The downstream
function dovec() also lacked cancel detection, and could take a long time
given a large output from range().

Per fuzz testing by Greg Stark.  Back-patch to all supported branches.

Security: CVE-2016-0773
2016-02-08 10:25:40 -05:00
Robert Haas
d89f06f048 Fix parallel-safety markings for pg_upgrade functions.
These establish backend-local state which will not be copied to
parallel workers, so they must be marked parallel-restricted, not
parallel-safe.
2016-02-07 11:45:21 -05:00
Robert Haas
7c944bd903 Introduce a new GUC force_parallel_mode for testing purposes.
When force_parallel_mode = true, we enable the parallel mode restrictions
for all queries for which this is believed to be safe.  For the subset of
those queries believed to be safe to run entirely within a worker, we spin
up a worker and run the query there instead of running it in the
original process.  When force_parallel_mode = regress, make additional
changes to allow the regression tests to run cleanly even though parallel
workers have been injected under the hood.

Taken together, this facilitates both better user testing and better
regression testing of the parallelism code.

Robert Haas, with help from Amit Kapila and Rushabh Lathia.
2016-02-07 11:41:33 -05:00
Robert Haas
a1c1af2a1f Introduce group locking to prevent parallel processes from deadlocking.
For locking purposes, we now regard heavyweight locks as mutually
non-conflicting between cooperating parallel processes.  There are some
possible pitfalls to this approach that are not to be taken lightly,
but it works OK for now and can be changed later if we find a better
approach.  Without this, it's very easy for parallel queries to
silently self-deadlock if the user backend holds strong relation locks.

Robert Haas, with help from Amit Kapila.  Thanks to Noah Misch and
Andres Freund for extensive discussion of possible issues with this
approach.
2016-02-07 10:16:13 -05:00
Tom Lane
aa2387e2fd Improve speed of timestamp/time/date output functions.
It seems that sprintf(), at least in glibc's version, is unreasonably slow
compared to hand-rolled code for printing integers.  Replacing most uses of
sprintf() in the datetime.c output functions with special-purpose code
turns out to give more than a 2X speedup in COPY of a table with a single
timestamp column; which is pretty impressive considering all the other
logic in that code path.

David Rowley and Andres Freund, reviewed by Peter Geoghegan and myself
2016-02-06 23:11:28 -05:00
Robert Haas
78bea62ab0 Fix typo.
Amit Kapila
2016-02-05 07:56:59 -05:00
Tom Lane
6819514fca Add num_nulls() and num_nonnulls() to count NULL arguments.
An example use-case is "CHECK(num_nonnulls(a,b,c) = 1)" to assert that
exactly one of a,b,c isn't NULL.  The functions are variadic, so they
can also be pressed into service to count the number of null or nonnull
elements in an array.

Marko Tiikkaja, reviewed by Pavel Stehule
2016-02-04 23:03:37 -05:00
Robert Haas
a104a017fc Add some additional core functions to support join pushdown for FDWs.
GetExistingLocalJoinPath() is useful for handling EvalPlanQual rechecks
properly, and GetUserMappingById() is needed to make sure you're using
the right credentials.

Shigeru Hanada, Etsuro Fujita, Ashutosh Bapat, Robert Haas
2016-02-04 17:05:09 -05:00
Robert Haas
c1772ad922 Change the way that LWLocks for extensions are allocated.
The previous RequestAddinLWLocks() method had several disadvantages.
First, the locks would be in the main tranche; we've recently decided
that it's useful for LWLocks used for separate purposes to have
separate tranche IDs.  Second, there wasn't any correlation between
what code called RequestAddinLWLocks() and what code called
LWLockAssign(); when multiple modules are in use, it could become
quite difficult to troubleshoot problems where LWLockAssign() ran out
of locks.  To fix, create a concept of named LWLock tranches which
can be used either by extension or by core code.

Amit Kapila and Robert Haas
2016-02-04 16:43:04 -05:00
Robert Haas
b47b4dbf68 Extend sortsupport for text to more opclasses.
Have varlena.c expose an interface that allows the char(n), bytea, and
bpchar types to piggyback on a now-generalized SortSupport for text.
This pushes a little more knowledge of the bpchar/char(n) type into
varlena.c than might be preferred, but that seems like the approach
that creates least friction.  Also speed things up for index builds
that use text_pattern_ops or varchar_pattern_ops.

This patch does quite a bit of renaming, but it seems likely to be
worth it, so as to avoid future confusion about the fact that this code
is now more generally used than the old names might have suggested.

Peter Geoghegan, reviewed by Álvaro Herrera and Andreas Karlsson,
with small tweaks by me.
2016-02-03 14:29:53 -05:00
Robert Haas
69d34408e5 Allow parallel custom and foreign scans.
This patch doesn't put the new infrastructure to use anywhere, and
indeed it's not clear how it could ever be used for something like
postgres_fdw which has to send an SQL query and wait for a reply,
but there might be FDWs or custom scan providers that are CPU-bound,
so let's give them a way to join club parallel.

KaiGai Kohei, reviewed by me.
2016-02-03 12:49:46 -05:00
Robert Haas
f2305d40ec Remove CustomPath's TextOutCustomPath method.
You can't really do anything useful with this in the form it currently
exists; among other problems, there's no way to reread whatever
information might be produced when the path is output.  Work is
underway to replace this with a more useful and more general system of
extensible nodes, but let's start by getting rid of this bit.

Extracted from a larger patch by KaiGai Kohei.
2016-02-03 10:38:50 -05:00
Peter Eisentraut
7d17e683fc Add support for systemd service notifications
Insert sd_notify() calls at server start and stop for integration with
systemd.  This allows the use of systemd service units of type "notify",
which greatly simplifies the systemd configuration.

Reviewed-by: Pavel Stěhule <pavel.stehule@gmail.com>
2016-02-02 21:04:29 -05:00
Tom Lane
2ad83fff22 Remove unnecessary "implementation of FOO operator" DESCR() entries.
Apparently at least one committer hasn't gotten the word that these do not
need to be maintained by hand, since initdb will create them automatically.
Noted while fixing bug #13905.

No catversion bump since the post-initdb state is exactly the same either
way.  I don't see a need for back-patch, either.
2016-02-02 11:52:27 -05:00
Tom Lane
a4627e8fd4 Fix pg_description entries for jsonb_to_record() and jsonb_to_recordset().
All the other jsonb function descriptions refer to the arguments as being
"jsonb", but these two said "json".  Make it consistent.  Per bug #13905
from Petru Florin Mihancea.

No catversion bump --- we can't force one in the back branches, and this
isn't very critical anyway.
2016-02-02 11:39:50 -05:00
Robert Haas
7191ce8bea Make all built-in lwlock tranche IDs fixed.
This makes the values more stable, which seems like a good thing for
anybody who needs to look at at them.

Alexander Korotkov and Amit Kapila
2016-02-02 06:45:55 -05:00
Robert Haas
2251179e6a Migrate replication slot I/O locks into a separate tranche.
This is following in a long train of similar changes and for the same
reasons - see b319356f0e and
fe702a7b3f inter alia.

Author: Amit Kapila
Reviewed-by: Alexander Korotkov, Robert Haas
2016-01-29 09:45:38 -05:00
Robert Haas
b319356f0e Migrate PGPROC's backendLock into PGPROC itself, using a new tranche.
Previously, each PGPROC's backendLock was part of the main tranche,
and the PGPROC just contained a pointer.  Now, the actual LWLock is
part of the PGPROC.

As with previous, similar patches, this makes it significantly easier
to identify these lwlocks in LWLOCK_STATS or Trace_lwlocks output
and improves modularity.

Author: Ildus Kurbangaliev
Reviewed-by: Amit Kapila, Robert Haas
2016-01-29 08:14:28 -05:00
Robert Haas
fbe5a3fb73 Only try to push down foreign joins if the user mapping OIDs match.
Previously, the foreign join pushdown infrastructure left the question
of security entirely up to individual FDWs, but it would be easy for
a foreign data wrapper to inadvertently open up subtle security holes
that way.  So, make it the core code's job to determine which user
mapping OID is relevant, and don't attempt join pushdown unless it's
the same for all relevant relations.

Per a suggestion from Tom Lane.  Shigeru Hanada and Ashutosh Bapat,
reviewed by Etsuro Fujita and KaiGai Kohei, with some further
changes by me.
2016-01-28 14:05:36 -05:00
Robert Haas
96198d94cb Avoid multiple foreign server connections when all use same user mapping.
Previously, postgres_fdw's connection cache was keyed by user OID and
server OID, but this can lead to multiple connections when it's not
really necessary.  In particular, if all relevant users are mapped to
the public user mapping, then their connection options are certainly
the same, so one connection can be used for all of them.

While we're cleaning things up here, drop the "server" argument to
GetConnection(), which isn't really needed.  This saves a few cycles
because callers no longer have to look this up; the function itself
does, but only when establishing a new connection, not when reusing
an existing one.

Ashutosh Bapat, with a few small changes by me.
2016-01-28 12:05:19 -05:00
Fujii Masao
7f46eaf035 Add gin_clean_pending_list function to clean up GIN pending list
This function cleans up the pending list of the GIN index by
moving entries in it to the main GIN data structure in bulk.
It returns the number of pages cleaned up from the pending list.

This function is useful, for example, when the pending list
needs to be cleaned up *quickly* to improve the performance of
the search using GIN index. VACUUM can do the same thing, too,
but it may take days to run on a large table.

Jeff Janes,
reviewed by Julien Rouhaud, Jaime Casanova, Alvaro Herrera and me.

Discussion: CAMkU=1x8zFkpfnozXyt40zmR3Ub_kHu58LtRmwHUKRgQss7=iQ@mail.gmail.com
2016-01-28 12:57:52 +09:00
Fujii Masao
e09507a272 Fix volatility marking of pg_size_pretty function
pg_size_pretty function should be marked immutable rather than volatile
because it always returns the same result given the same argument.

Pavel Stehule
2016-01-27 11:13:31 +09:00
Tom Lane
e1bd684a34 Add trigonometric functions that work in degrees.
The implementations go to some lengths to deliver exact results for values
where an exact result can be expected, such as sind(30) = 0.5 exactly.

Dean Rasheed, reviewed by Michael Paquier
2016-01-22 15:46:22 -05:00
Tom Lane
a396144ac0 Remove new coupling between NAMEDATALEN and MAX_LEVENSHTEIN_STRLEN.
Commit e529cd4ffa introduced an Assert requiring NAMEDATALEN to be
less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time.
Since up to that instant we had always allowed NAMEDATALEN to be
substantially more than that, this was ill-advised.

It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus
putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be
so tight; but this patch takes the narrower approach of just not applying
the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser.

Trusting the parser for this seems reasonable, first because the strings
are limited to NAMEDATALEN which is unlikely to be hugely more than 256,
and second because the maximum distance is tightly constrained by
MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one
place).  That means the cost is not really O(mn) but more like O(max(m,n)).

Relaxing the limit for user-supplied calls is left for future research;
given the lack of complaints to date, it doesn't seem very high priority.

In passing, fix confusion between lengths-in-bytes and lengths-in-chars
in comments and error messages.

Per gripe from Kevin Day; solution suggested by Robert Haas.  Back-patch
to 9.5 where the unwanted restriction was introduced.
2016-01-22 11:53:06 -05:00
Tom Lane
be44ed27b8 Improve index AMs' opclass validation procedures.
The amvalidate functions added in commit 65c5fcd353 were on the
crude side.  Improve them in a few ways:

* Perform signature checking for operators and support functions.

* Apply more thorough checks for missing operators and functions,
where possible.

* Instead of reporting problems as ERRORs, report most problems as INFO
messages and make the amvalidate function return FALSE.  This allows
more than one problem to be discovered per run.

* Report object names rather than OIDs, and work a bit harder on making
the messages understandable.

Also, remove a few more opr_sanity regression test queries that are
now superseded by the amvalidate checks.
2016-01-21 19:47:15 -05:00
Tom Lane
b99551832e Add defenses against putting expanded objects into Const nodes.
Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons.  It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime.  Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself.  This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.

I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.

The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems.  Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b665.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)

The test added in commit 65c3d05e18 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum().  That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.

Back-patch to 9.5 where the expanded-object facility was added.  It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.
2016-01-21 12:56:08 -05:00
Fujii Masao
38710a374e Remove unused argument from ginInsertCleanup()
It's an oversight in commit dc943ad.
2016-01-22 01:22:56 +09:00
Simon Riggs
c80b31d557 Refactor headers to split out standby defs
Jeff Janes
2016-01-20 18:51:34 -08:00
Simon Riggs
978b2f65aa Speedup 2PC by skipping two phase state files in normal path
2PC state info is written only to WAL at PREPARE, then read back from WAL at
COMMIT PREPARED/ABORT PREPARED. Prepared transactions that live past one bufmgr
checkpoint cycle will be written to disk in the same form as previously. Crash
recovery path is not altered. Measured performance gains of 50-100% for short
2PC transactions by completely avoiding writing files and fsyncing. Other
optimizations still available, further patches in related areas expected.

Stas Kelvich and heavily edited by Simon Riggs

Based upon earlier ideas and patches by Michael Paquier and Heikki Linnakangas,
a concrete example of how Postgres-XC has fed back ideas into PostgreSQL.

Reviewed by Michael Paquier, Jeff Janes and Andres Freund
Performance testing by Jesper Pedersen
2016-01-20 18:40:44 -08:00
Simon Riggs
422a55a687 Refactor to create generic WAL page read callback
Previously we didn’t have a generic WAL page read callback function,
surprisingly. Logical decoding has logical_read_local_xlog_page(), which was
actually generic, so move that to xlogfunc.c and rename to
read_local_xlog_page().
Maintain logical_read_local_xlog_page() so existing callers still work.

As requested by Michael Paquier, Alvaro Herrera and Andres Freund
2016-01-20 17:18:58 -08:00
Robert Haas
45be99f8cd Support parallel joins, and make related improvements.
The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash joins paths are generated.

This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.

Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.

There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.

Patch by me, reviewed by Dilip Kumar and Amit Kapila.
2016-01-20 14:40:26 -05:00
Robert Haas
a7de3dc5c3 Support multi-stage aggregation.
Aggregate nodes now have two new modes: a "partial" mode where they
output the unfinalized transition state, and a "finalize" mode where
they accept unfinalized transition states rather than individual
values as input.

These new modes are not used anywhere yet, but they will be necessary
for parallel aggregation.  The infrastructure also figures to be
useful for cases where we want to aggregate local data and remote
data via the FDW interface, and want to bring back partial aggregates
from the remote side that can then be combined with locally generated
partial aggregates to produce the final value.  It may also be useful
even when neither FDWs nor parallelism are in play, as explained in
the comments in nodeAgg.c.

David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
Linnakangas, Haribabu Kommi, and me.
2016-01-20 13:46:50 -05:00
Tom Lane
dbe2328959 Fix assorted inconsistencies in GIN opclass support function declarations.
GIN had some minor issues too, mostly using "internal" where something
else would be more appropriate.  I went with the same approach as in
9ff60273e3, namely preferring the opclass' indexed datatype for
arguments that receive an operator RHS value, even if that's not
necessarily what they really are.

Again, this is with an eye to having a uniform rule for ginvalidate()
to check support function signatures.
2016-01-19 22:32:22 -05:00
Alvaro Herrera
948c97958b Add two HyperLogLog functions
New functions initHyperLogLogError() and freeHyperLogLog() simplify
using this module from elsewhere.

Author: Tomáš Vondra
Review: Peter Geoghegan
2016-01-19 17:40:15 -03:00
Tom Lane
9ff60273e3 Fix assorted inconsistencies in GiST opclass support function declarations.
The conventions specified by the GiST SGML documentation were widely
ignored.  For example, the strategy-number argument for "consistent" and
"distance" functions is specified to be a smallint, but most of the
built-in support functions declared it as an integer, and for that matter
the core code passed it using Int32GetDatum not Int16GetDatum.  None of
that makes any real difference at runtime, but it's quite confusing for
newcomers to the code, and it makes it very hard to write an amvalidate()
function that checks support function signatures.  So let's try to instill
some consistency here.

Another similar issue is that the "query" argument is not of a single
well-defined type, but could have different types depending on the strategy
(corresponding to search operators with different righthand-side argument
types).  Some of the functions threw up their hands and declared the query
argument as being of "internal" type, which surely isn't right ("any" would
have been more appropriate); but the majority position seemed to be to
declare it as being of the indexed data type, corresponding to a search
operator with both input types the same.  So I've specified a convention
that that's what to do always.

Also, the result of the "union" support function actually must be of the
index's storage type, but the documentation suggested declaring it to
return "internal", and some of the functions followed that.  Standardize
on telling the truth, instead.

Similarly, standardize on declaring the "same" function's inputs as
being of the storage type, not "internal".

Also, somebody had forgotten to add the "recheck" argument to both
the documentation of the "distance" support function and all of their
SQL declarations, even though the C code was happily using that argument.
Clean that up too.

Fix up some other omissions in the docs too, such as documenting that
union's second input argument is vestigial.

So far as the errors in core function declarations go, we can just fix
pg_proc.h and bump catversion.  Adjusting the erroneous declarations in
contrib modules is more debatable: in principle any change in those
scripts should involve an extension version bump, which is a pain.
However, since these changes are purely cosmetic and make no functional
difference, I think we can get away without doing that.
2016-01-19 12:04:36 -05:00
Tom Lane
65c5fcd353 Restructure index access method API to hide most of it at the C level.
This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
2016-01-17 19:36:59 -05:00
Tom Lane
8d290c8ec6 Re-pgindent a few files.
In preparation for landing index AM interface changes.
2016-01-17 19:13:18 -05:00
Magnus Hagander
cf7dfbf2d6 Fix minor typo in comment
Tatsuro Yamada
2016-01-15 10:24:37 +01:00
Simon Riggs
e63bb4549a Add new user fn pg_current_xlog_flush_location()
Tomas Vondra, reviewed by Michael Paquier and Amit Kapila
Minor edits by me
2016-01-12 07:54:52 +00:00
Tom Lane
26d538dc93 Clean up some lack-of-STRICT issues in the core code, too.
A scan for missed proisstrict markings in the core code turned up
these functions:

brin_summarize_new_values
pg_stat_reset_single_table_counters
pg_stat_reset_single_function_counters
pg_create_logical_replication_slot
pg_create_physical_replication_slot
pg_drop_replication_slot

The first three of these take OID, so a null argument will normally look
like a zero to them, resulting in "ERROR: could not open relation with OID
0" for brin_summarize_new_values, and no action for the pg_stat_reset_XXX
functions.  The other three will dump core on a null argument, though this
is mitigated by the fact that they won't do so until after checking that
the caller is superuser or has rolreplication privilege.

In addition, the pg_logical_slot_get/peek[_binary]_changes family was
intentionally marked nonstrict, but failed to make nullness checks on all
the arguments; so again a null-pointer-dereference crash is possible but
only for superusers and rolreplication users.

Add the missing ARGISNULL checks to the latter functions, and mark the
former functions as strict in pg_proc.  Make that change in the back
branches too, even though we can't force initdb there, just so that
installations initdb'd in future won't have the issue.  Since none of these
bugs rise to the level of security issues (and indeed the pg_stat_reset_XXX
functions hardly misbehave at all), it seems sufficient to do this.

In addition, fix some order-of-operations oddities in the slot_get_changes
family, mostly cosmetic, but not the part that moves the function's last
few operations into the PG_TRY block.  As it stood, there was significant
risk for an error to exit without clearing historical information from
the system caches.

The slot_get_changes bugs go back to 9.4 where that code was introduced.
Back-patch appropriate subsets of the pg_proc changes into all active
branches, as well.
2016-01-09 16:58:32 -05:00
Simon Riggs
687f2cd7a0 Avoid pin scan for replay of XLOG_BTREE_VACUUM
Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require
complex interlocking that matched the requirements on the master. This required
an O(N) operation that became a significant problem with large indexes, causing
replication delays of seconds or in some cases minutes while the
XLOG_BTREE_VACUUM was replayed.

This commit skips the “pin scan” that was previously required, by observing in
detail when and how it is safe to do so, with full documentation. The pin scan
is skipped only in replay; the VACUUM code path on master is not touched here.

The current commit still performs the pin scan for toast indexes, though this
can also be avoided if we recheck scans on toast indexes. Later patch will
address this.

No tests included. Manual tests using an additional patch to view WAL records
and their timing have shown the change in WAL records and their handling has
successfully reduced replication delay.
2016-01-09 10:10:08 +00:00
Magnus Hagander
2650486ebc Fix typo in comment
Tatsuro Yamada
2016-01-08 08:54:40 +01:00
Alvaro Herrera
b1a9bad9e7 pgstat: add WAL receiver status view & SRF
This new view provides insight into the state of a running WAL receiver
in a HOT standby node.
The information returned includes the PID of the WAL receiver process,
its status (stopped, starting, streaming, etc), start LSN and TLI, last
received LSN and TLI, timestamp of last message send and receipt, latest
end-of-WAL LSN and time, and the name of the slot (if any).

Access to the detailed data is only granted to superusers; others only
get the PID.

Author: Michael Paquier
Reviewer: Haribabu Kommi
2016-01-07 16:21:19 -03:00
Alvaro Herrera
a967613911 Windows: Make pg_ctl reliably detect service status
pg_ctl is using isatty() to verify whether the process is running in a
terminal, and if not it sends its output to Windows' Event Log ... which
does the wrong thing when the output has been redirected to a pipe, as
reported in bug #13592.

To fix, make pg_ctl use the code we already have to detect service-ness:
in the master branch, move src/backend/port/win32/security.c to src/port
(with suitable tweaks so that it runs properly in backend and frontend
environments); pg_ctl already has access to pgport so it Just Works.  In
older branches, that's likely to cause trouble, so instead duplicate the
required code in pg_ctl.c.

Author: Michael Paquier
Bug report and diagnosis: Egon Kocjan
Backpatch: all supported branches
2016-01-07 11:59:08 -03:00