postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-04-23 15:19:08 -04:00

Author	SHA1	Message	Date
Heikki Linnakangas	25600c42e0	Fix misc typos. Oskari Saarenmaa. Backpatch to stable branches where applicable.	2015-09-05 11:36:18 +03:00
Tom Lane	a2538da89f	Fix subtransaction cleanup after an outer-subtransaction portal fails. Formerly, we treated only portals created in the current subtransaction as having failed during subtransaction abort. However, if the error occurred while running a portal created in an outer subtransaction (ie, a cursor declared before the last savepoint), that has to be considered broken too. To allow reliable detection of which ones those are, add a bookkeeping field to struct Portal that tracks the innermost subtransaction in which each portal has actually been executed. (Without this, we'd end up failing portals containing functions that had called the subtransaction, thereby breaking plpgsql exception blocks completely.) In addition, when we fail an outer-subtransaction Portal, transfer its resources into the subtransaction's resource owner, so that they're released early in cleanup of the subxact. This fixes a problem reported by Jim Nasby in which a function executed in an outer-subtransaction cursor could cause an Assert failure or crash by referencing a relation created within the inner subtransaction. The proximate cause of the Assert failure is that AtEOSubXact_RelationCache assumed it could blow away a relcache entry without first checking that the entry had zero refcount. That was a bad idea on its own terms, so add such a check there, and to the similar coding in AtEOXact_RelationCache. This provides an independent safety measure in case there are still ways to provoke the situation despite the Portal-level changes. This has been broken since subtransactions were invented, so back-patch to all supported branches. Tom Lane and Michael Paquier	2015-09-04 13:37:16 -04:00
Tom Lane	ffbc387bfc	Fix s_lock.h PPC assembly code to be compatible with native AIX assembler. On recent AIX it's necessary to configure gcc to use the native assembler (because the GNU assembler hasn't been updated to handle AIX 6+). This caused PG builds to fail with assembler syntax errors, because we'd try to compile s_lock.h's gcc asm fragment for PPC, and that assembly code relied on GNU-style local labels. We can't substitute normal labels because it would fail in any file containing more than one inlined use of tas(). Fortunately, that code is stable enough, and the PPC ISA is simple enough, that it doesn't seem like too much of a maintenance burden to just hand-code the branch offsets, removing the need for any labels. Note that the AIX assembler only accepts "$" for the location counter pseudo-symbol. The usual GNU convention is "."; but it appears that all versions of gas for PPC also accept "$", so in theory this patch will not break any other PPC platforms. This has been reported by a few people, but Steve Underwood gets the credit for being the first to pursue the problem far enough to understand why it was failing. Thanks also to Noah Misch for additional testing.	2015-08-29 16:09:25 -04:00
Tom Lane	4022f94c35	Avoid use of float arithmetic in bipartite_match.c. Since the distances used in this algorithm are small integers (not more than the size of the U set, in fact), there is no good reason to use float arithmetic for them. Use short ints instead: they're smaller, faster, and require no special portability assumptions. Per testing by Greg Stark, which disclosed that the code got into an infinite loop on VAX for lack of IEEE-style float infinities. We don't really care all that much whether Postgres can run on a VAX anymore, but there seems sufficient reason to change this code anyway. In passing, make a few other small adjustments to make the code match usual Postgres coding style a bit better.	2015-08-23 13:02:19 -04:00
Stephen Frost	49f9a2831b	Rename 'cmd' to 'cmd_name' in CreatePolicyStmt To avoid confusion, rename CreatePolicyStmt's 'cmd' to 'cmd_name', parse_policy_command's 'cmd' to 'polcmd', and AlterPolicy's 'cmd_datum' to 'polcmd_datum', per discussion with Noah and as a follow-up to his correction of copynodes/equalnodes handling of the CreatePolicyStmt 'cmd' field. Back-patch to 9.5 where the CreatePolicyStmt was introduced, as we are still only in alpha.	2015-08-21 08:22:29 -04:00
Alvaro Herrera	fc0a640230	Close some holes in BRIN page assignment In some corner cases, it is possible for the BRIN index relation to be extended by brin_getinsertbuffer but the new page not be used immediately for anything by its callers; when this happens, the page is initialized and the FSM is updated (by brin_getinsertbuffer) with the info about that page, but these actions are not WAL-logged. A later index insert/update can use the page, but since the page is already initialized, the initialization itself is not WAL-logged then either. Replay of this sequence of events causes recovery to fail altogether. There is a related corner case within brin_getinsertbuffer itself, in which we extend the relation to put a new index tuple there, but later find out that we cannot do so, and do not return the buffer; the page obtained from extension is not even initialized. The resulting page is lost forever. To fix, shuffle the code so that initialization is not the responsibility of brin_getinsertbuffer anymore, in normal cases; instead, the initialization is done by its callers (brin_doinsert and brin_doupdate) once they're certain that the page is going to be used. When either those functions determine that the new page cannot be used, before bailing out they initialize the page as an empty regular page, enter it in FSM and WAL-log all this. This way, the page is usable for future index insertions, and WAL replay doesn't find trying to insert tuples in pages whose initialization didn't make it to the WAL. The same strategy is used in brin_getinsertbuffer when it cannot return the new page. Additionally, add a new step to vacuuming so that all pages of the index are scanned; whenever an uninitialized page is found, it is initialized as empty and WAL-logged. This closes the hole that the relation is extended but the system crashes before anything is WAL-logged about it. We also take this opportunity to update the FSM, in case it has gotten out of date. Thanks to Heikki Linnakangas for finding the problem that kicked some additional analysis of BRIN page assignment code. Backpatch to 9.5, where BRIN was introduced. Discussion: https://www.postgresql.org/message-id/20150723204810.GY5596@postgresql.org	2015-08-12 14:20:38 -03:00
Tom Lane	f8e4e0e3f3	Accept alternate spellings of __sparcv7 and __sparcv8. Apparently some versions of gcc prefer __sparc_v7__ and __sparc_v8__. Per report from Waldemar Brodkorb.	2015-08-10 17:34:51 -04:00
Tom Lane	a8725c2bad	Fix eclass_useful_for_merging to give valid results for appendrel children. Formerly, this function would always return "true" for an appendrel child relation, because it would think that the appendrel parent was a potential join target for the child. In principle that should only lead to some inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed that it could lead to "could not find pathkey item to sort" planner errors in odd corner cases. Specifically, we would think that all columns of a child table's multicolumn index were interesting pathkeys, causing us to generate a MergeAppend path that sorts by all the columns. However, if any of those columns weren't actually used above the level of the appendrel, they would not get added to that rel's targetlist, which would result in being unable to resolve the MergeAppend's sort keys against its targetlist during createplan.c. Backpatch to 9.3. In older versions, columns of an appendrel get added to its targetlist even if they're not mentioned above the scan level, so that the failure doesn't occur. It might be worth back-patching this fix to older versions anyway, but I'll refrain for the moment.	2015-08-06 20:14:54 -04:00
Andres Freund	449b13313c	Improve includes introduced in the replication origins patch. pg_resetxlog.h contained two superfluous includes, origin.h superfluously depended on logical.h, and pg_xlogdump's rmgrdesc.h only indirectly included origin.h. Backpatch: 9.5, where replication origins were introduced.	2015-08-06 12:41:59 +02:00
Noah Misch	4877281f3a	Reconcile nodes/*funcs.c with recent work. A few of the discrepancies had semantic significance, but I did not track down the resulting user-visible bugs, if any. Back-patch to 9.5, where all but one discrepancy appeared. The _equalCreateEventTrigStmt() situation dates to 9.3 but does not affect semantics. catversion bump due to readfuncs.c field order changes.	2015-08-05 20:45:48 -04:00
Alvaro Herrera	94a8b45feb	Fix BRIN to use SnapshotAny during summarization For correctness of summarization results, it is critical that the snapshot used during the summarization scan is able to see all tuples that are live to all transactions -- including tuples inserted or deleted by in-progress transactions. Otherwise, it would be possible for a transaction to insert a tuple, then idle for a long time while a concurrent transaction executes summarization of the range: this would result in the inserted value not being considered in the summary. Previously we were trying to use a MVCC snapshot in conjunction with adding a "placeholder" tuple in the index: the snapshot would see all committed tuples, and the placeholder tuple would catch insertions by any new inserters. The hole is that prior insertions by transactions that are still in progress by the time the MVCC snapshot was taken were ignored. Kevin Grittner reported this as a bogus error message during vacuum with default transaction isolation mode set to repeatable read (because the error report mentioned a function name not being invoked during), but the problem is larger than that. To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves the way we need using SnapshotAny visibility rules. This change simplifies the BRIN code a bit, mainly by removing large comments that were mistaken. Instead, rely on the SnapshotAny semantics to provide what it needs. (The business about a placeholder tuple needs to remain: that covers the case that a transaction inserts a a tuple in a page that summarization already scanned.) Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org In passing, remove a couple of unused declarations from brin.h and reword a comment to be proper English. This part submitted by Kevin Grittner. Backpatch to 9.5, where BRIN was introduced.	2015-08-05 16:20:50 -03:00
Andres Freund	04521eba3f	Fix comment atomics.h. I appear to accidentally have switched the comments for pg_atomic_write_u32 and pg_atomic_read_u32 around. Also fix some minor typos I found while fixing. Noticed-By: Amit Kapila Backpatch: 9.5	2015-08-05 13:07:44 +02:00
Tom Lane	e2035dc9a7	Fix bogus "out of memory" reports in tuplestore.c. The tuplesort/tuplestore memory management logic assumed that the chunk allocation overhead for its memtuples array could not increase when increasing the array size. This is and always was true for tuplesort, but we (I, I think) blindly copied that logic into tuplestore.c without noticing that the assumption failed to hold for the much smaller array elements used by tuplestore. Given rather small work_mem, this could result in an improper complaint about "unexpected out-of-memory situation", as reported by Brent DeSpain in bug #13530. The easiest way to fix this is just to increase tuplestore's initial array size so that the assumption holds. Rather than relying on magic constants, though, let's export a #define from aset.c that represents the safe allocation threshold, and make tuplestore's calculation depend on that. Do the same in tuplesort.c to keep the logic looking parallel, even though tuplesort.c isn't actually at risk at present. This will keep us from breaking it if we ever muck with the allocation parameters in aset.c. Back-patch to all supported versions. The error message doesn't occur pre-9.3, not so much because the problem can't happen as because the pre-9.3 tuplestore code neglected to check for it. (The chance of trouble is a great deal larger as of 9.3, though, due to changes in the array-size-increasing strategy.) However, allowing LACKMEM() to become true unexpectedly could still result in less-than-desirable behavior, so let's patch it all the way back.	2015-08-04 18:18:46 -04:00
Tom Lane	6bd01f082b	Stamp 9.5alpha2.	2015-08-03 16:34:55 -04:00
Tom Lane	72697d2ba7	Fix incorrect order of lock file removal and failure to close() sockets. Commit `c9b0cbe98b` accidentally broke the order of operations during postmaster shutdown: it resulted in removing the per-socket lockfiles after, not before, postmaster.pid. This creates a race-condition hazard for a new postmaster that's started immediately after observing that postmaster.pid has disappeared; if it sees the socket lockfile still present, it will quite properly refuse to start. This error appears to be the explanation for at least some of the intermittent buildfarm failures we've seen in the pg_upgrade test. Another problem, which has been there all along, is that the postmaster has never bothered to close() its listen sockets, but has just allowed them to close at process death. This creates a different race condition for an incoming postmaster: it might be unable to bind to the desired listen address because the old postmaster is still incumbent. This might explain some odd failures we've seen in the past, too. (Note: this is not related to the fact that individual backends don't close their client communication sockets. That behavior is intentional and is not changed by this patch.) Fix by adding an on_proc_exit function that closes the postmaster's ports explicitly, and (in 9.3 and up) reshuffling the responsibility for where to unlink the Unix socket files. Lock file unlinking can stay where it is, but teach it to unlink the lock files in reverse order of creation.	2015-08-02 14:55:04 -04:00
Andres Freund	27b7191735	Fix issues around the "variable" support in the lwlock infrastructure. The lwlock scalability work introduced two race conditions into the lwlock variable support provided for xlog.c. First, and harmlessly on most platforms, it set/read the variable without the spinlock in some places. Secondly, due to the removal of the spinlock, it was possible that a backend missed changes to the variable's state if it changed in the wrong moment because checking the lock's state, the variable's state and the queuing are not protected by a single spinlock acquisition anymore. To fix first move resetting the variable's from LWLockAcquireWithVar to WALInsertLockRelease, via a new function LWLockReleaseClearVar. That prevents issues around waiting for a variable's value to change when a new locker has acquired the lock, but not yet set the value. Secondly re-check that the variable hasn't changed after enqueing, that prevents the issue that the lock has been released and already re-acquired by the time the woken up backend checks for the lock's state. Reported-By: Jeff Janes Analyzed-By: Heikki Linnakangas Reviewed-By: Heikki Linnakangas Discussion: 5592DB35.2060401@iki.fi Backpatch: 9.5, where the lwlock scalability went in	2015-08-02 18:41:36 +02:00
Alvaro Herrera	71b66e78e4	Fix volatility marking of commit timestamp functions They are marked stable, but since they act on instantaneous state and it is possible to consult state of transactions as they commit, the results could change mid-query. They need to be marked volatile, and this commit does so. There would normally be a catversion bump here, but this is so much a niche feature and I don't believe there's real damage from the incorrect marking, that I refrained. Backpatch to 9.5, where commit timestamps where introduced. Per note from Fujii Masao.	2015-07-30 15:19:49 -03:00
Joe Conway	43797ed42a	Create new ParseExprKind for use by policy expressions. Policy USING and WITH CHECK expressions were using EXPR_KIND_WHERE for parse analysis, which results in inappropriate ERROR messages when the expression contains unsupported constructs such as aggregates. Create a new ParseExprKind called EXPR_KIND_POLICY and tailor the related messages to fit. Reported by Noah Misch. Reviewed by Dean Rasheed, Alvaro Herrera, and Robert Haas. Back-patch to 9.5 where RLS was introduced.	2015-07-29 15:41:00 -07:00
Joe Conway	344703bcc4	Disallow converting a table to a view if row security is present. When DefineQueryRewrite() is about to convert a table to a view, it checks the table for features unavailable to views. For example, it rejects tables having triggers. It omits to reject tables having relrowsecurity or a pg_policy record. Fix that. To faciliate the repair, invent relation_has_policies() which indicates the presence of policies on a relation even when row security is disabled for that relation. Reported by Noah Misch. Patch by me, review by Stephen Frost. Back-patch to 9.5 where RLS was introduced.	2015-07-28 16:24:09 -07:00
Joe Conway	992c9d345f	Create a pg_shdepend entry for each role in TO clause of policies. CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for each role in the TO clause. Fix this by creating a new shared dependency type called SHARED_DEPENDENCY_POLICY and assigning it to each role. Reported by Noah Misch. Patch by me, reviewed by Alvaro Herrera. Back-patch to 9.5 where RLS was introduced.	2015-07-28 16:01:56 -07:00
Joe Conway	cfa928ff6f	Plug RLS related information leak in pg_stats view. The pg_stats view is supposed to be restricted to only show rows about tables the user can read. However, it sometimes can leak information which could not otherwise be seen when row level security is enabled. Fix that by not showing pg_stats rows to users that would be subject to RLS on the table the row is related to. This is done by creating/using the newly introduced SQL visible function, row_security_active(). Along the way, clean up three call sites of check_enable_rls(). The second argument of that function should only be specified as other than InvalidOid when we are checking as a different user than the current one, as in when querying through a view. These sites were passing GetUserId() instead of InvalidOid, which can cause the function to return incorrect results if the current user has the BYPASSRLS privilege and row_security has been set to OFF. Additionally fix a bug causing RI Trigger error messages to unintentionally leak information when RLS is enabled, and other minor cleanup and improvements. Also add WITH (security_barrier) to the definition of pg_stats. Bumped CATVERSION due to new SQL functions and pg_stats view definition. Back-patch to 9.5 where RLS was introduced. Reported by Yaroslav. Patch by Joe Conway and Dean Rasheed with review and input by Michael Paquier and Stephen Frost.	2015-07-28 13:21:37 -07:00
Andres Freund	6087d952b3	Remove ssl renegotiation support. While postgres' use of SSL renegotiation is a good idea in theory, it turned out to not work well in practice. The specification and openssl's implementation of it have lead to several security issues. Postgres' use of renegotiation also had its share of bugs. Additionally OpenSSL has a bunch of bugs around renegotiation, reported and open for years, that regularly lead to connections breaking with obscure error messages. We tried increasingly complex workarounds to get around these bugs, but we didn't find anything complete. Since these connection breakages often lead to hard to debug problems, e.g. spuriously failing base backups and significant latency spikes when synchronous replication is used, we have decided to change the default setting for ssl renegotiation to 0 (disabled) in the released backbranches and remove it entirely in 9.5 and master. Author: Andres Freund Discussion: 20150624144148.GQ4797@alap3.anarazel.de Backpatch: 9.5 and master, 9.0-9.4 get a different patch	2015-07-28 22:06:31 +02:00
Heikki Linnakangas	1b7f125bf7	Don't assume that 'char' is signed. On some platforms, notably ARM and PowerPC, 'char' is unsigned by default. This fixes an assertion failure at WAL replay on such platforms. Reported by Noah Misch. Backpatch to 9.5, where this was broken.	2015-07-27 21:51:49 +03:00
Heikki Linnakangas	2fa8ba3480	Fix handling of all-zero pages in SP-GiST vacuum. SP-GiST initialized an all-zeros page at vacuum, but that was not WAL-logged, which is not safe. You might get a torn page write, when it gets flushed to disk, and end-up with a half-initialized index page. To fix, leave it in the all-zeros state, and add it to the FSM. It will be initialized when reused. Also don't set the page-deleted flag when recycling an empty page. That was also not WAL-logged, and a torn write of that would cause the page to have an invalid checksum. Backpatch to 9.2, where SP-GiST indexes were added.	2015-07-27 12:31:05 +03:00
Tom Lane	6fcb337fa5	Redesign tablesample method API, and do extensive code review. The original implementation of TABLESAMPLE modeled the tablesample method API on index access methods, which wasn't a good choice because, without specialized DDL commands, there's no way to build an extension that can implement a TSM. (Raw inserts into system catalogs are not an acceptable thing to do, because we can't undo them during DROP EXTENSION, nor will pg_upgrade behave sanely.) Instead adopt an API more like procedural language handlers or foreign data wrappers, wherein the only SQL-level support object needed is a single handler function identified by having a special return type. This lets us get rid of the supporting catalog altogether, so that no custom DDL support is needed for the feature. Adjust the API so that it can support non-constant tablesample arguments (the original coding assumed we could evaluate the argument expressions at ExecInitSampleScan time, which is undesirable even if it weren't outright unsafe), and discourage sampling methods from looking at invisible tuples. Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable within and across queries, as required by the SQL standard, and deal more honestly with methods that can't support that requirement. Make a full code-review pass over the tablesample additions, and fix assorted bugs, omissions, infelicities, and cosmetic issues (such as failure to put the added code stanzas in a consistent ordering). Improve EXPLAIN's output of tablesample plans, too. Back-patch to 9.5 so that we don't have to support the original API in production.	2015-07-25 14:39:00 -04:00
Andres Freund	bb0203f26f	Fix bug around assignment expressions containing indirections. Handling of assigned-to expressions with indirection (e.g. set f1[1] = 3) was broken for ON CONFLICT DO UPDATE. The problem was that ParseState was consulted to determine if an INSERT-appropriate or UPDATE-appropriate behavior should be used when transforming expressions with indirections. When the wrong path was taken the old row was substituted with NULL, leading to wrong results.. To fix remove p_is_update and only use p_is_insert to decide how to transform the assignment expression, and uset p_is_insert while parsing the on conflict statement. This isn't particularly pretty, but it's not any worse than before. Author: Peter Geoghegan, slightly edited by me Discussion: CAM3SWZS8RPvA=KFxADZWw3wAHnnbxMxDzkEC6fNaFc7zSm411w@mail.gmail.com Backpatch: 9.5, where the feature was introduced	2015-07-24 11:52:22 +02:00
Tom Lane	35ac618a7c	Fix some oversights in BRIN patch. Remove HeapScanDescData.rs_initblock, which wasn't being used for anything in the final version of the patch. Fix IndexBuildHeapScan so that it supports syncscan again; the patch broke synchronous scanning for index builds by forcing rs_startblk to zero even when the caller did not care about that and had asked for syncscan. Add some commentary and usage defenses to heap_setscanlimits(). Fix heapam so that asking for rs_numblocks == 0 does what you would reasonably expect. As coded it amounted to requesting a whole-table scan, because those "--x <= 0" tests on an unsigned variable would behave surprisingly.	2015-07-21 13:38:24 -04:00
Alvaro Herrera	d6ec181cf1	Fix omission of OCLASS_TRANSFORM in object_classes[] This was forgotten in `cac7658205` (and its fixup `ad89a5d115`). Since it seems way too easy to miss this, this commit also introduces a mechanism to enforce that the array is consistent with the enum. Problem reported independently by Robert Haas and Jaimin Pan. Patches proposed by Jaimin Pan, Jim Nasby, Michael Paquier and myself, though I didn't use any of these and instead went with a cleaner approach suggested by Tom Lane. Backpatch to 9.5. Discussion: https://www.postgresql.org/message-id/CA+Tgmoa6SgDaxW_n_7SEhwBAc=mniYga+obUj5fmw4rU9_mLvA@mail.gmail.com https://www.postgresql.org/message-id/29788.1437411581@sss.pgh.pa.us	2015-07-21 13:20:53 +02:00
Heikki Linnakangas	e66e31958f	Handle AT_ReAddComment in test_ddl_deparse, and add a catch-all default. In the passing, also move AT_ReAddComment to more logical position in the enum, after all the Constraint-related subcommands. This fixes a compiler warning, added by commit `e42375fc`. Backpatch to 9.5, like that patch.	2015-07-20 10:27:01 +03:00
Andrew Dunstan	89ddd29bbd	Support JSON negative array subscripts everywhere Previously, there was an inconsistency across json/jsonb operators that operate on datums containing JSON arrays -- only some operators supported negative array count-from-the-end subscripting. Specifically, only a new-to-9.5 jsonb deletion operator had support (the new "jsonb - integer" operator). This inconsistency seemed likely to be counter-intuitive to users. To fix, allow all places where the user can supply an integer subscript to accept a negative subscript value, including path-orientated operators and functions, as well as other extraction operators. This will need to be called out as an incompatibility in the 9.5 release notes, since it's possible that users are relying on certain established extraction operators changed here yielding NULL in the event of a negative subscript. For the json type, this requires adding a way of cheaply getting the total JSON array element count ahead of time when parsing arrays with a negative subscript involved, necessitating an ad-hoc lex and parse. This is followed by a "conversion" from a negative subscript to its equivalent positive-wise value using the count. From there on, it's as if a positive-wise value was originally provided. Note that there is still a minor inconsistency here across jsonb deletion operators. Unlike the aforementioned new "-" deletion operator that accepts an integer on its right hand side, the new "#-" path orientated deletion variant does not throw an error when it appears like an array subscript (input that could be recognized by as an integer literal) is being used on an object, which is wrong-headed. The reason for not being stricter is that it could be the case that an object pair happens to have a key value that looks like an integer; in general, these two possibilities are impossible to differentiate with rhs path text[] argument elements. However, we still don't allow the "#-" path-orientated deletion operator to perform array-style subscripting. Rather, we just return the original left operand value in the event of a negative subscript (which seems analogous to how the established "jsonb/json #> text[]" path-orientated operator may yield NULL in the event of an invalid subscript). In passing, make SetArrayPath() stricter about not accepting cases where there is trailing non-numeric garbage bytes rather than a clean NUL byte. This means, for example, that strings like "10e10" are now not accepted as an array subscript of 10 by some new-to-9.5 path-orientated jsonb operators (e.g. the new #- operator). Finally, remove dead code for jsonb subscript deletion; arguably, this should have been done in commit `b81c7b409`. Peter Geoghegan and Andrew Dunstan	2015-07-17 21:06:39 -04:00
Heikki Linnakangas	9dee48c94b	Retain comments on indexes and constraints at ALTER TABLE ... TYPE ... When a column's datatype is changed, ATExecAlterColumnType() rebuilds all the affected indexes and constraints, and the comments from the old indexes/constraints were not carried over. To fix, create a synthetic COMMENT ON command in the work queue, to re-add any comments on constraints. For indexes, there's a comment field in IndexStmt that is used. This fixes bug #13126, reported by Kirill Simonov. Original patch by Michael Paquier, reviewed by Petr Jelinek and me. This bug is present in all versions, but only backpatch to 9.5. Given how minor the issue is, it doesn't seem worth the work and risk to backpatch further than that.	2015-07-14 11:42:26 +03:00
Noah Misch	c0d7342f16	Given a gcc-compatible xlc compiler, prefer xlc-style atomics. This evades a ppc64le "IBM XL C/C++ for Linux" compiler bug. Back-patch to 9.5, where the atomics facility was introduced.	2015-07-08 20:44:26 -04:00
Noah Misch	abf5190c07	Finish generic-xlc.h draft atomics implementation. Back-patch to 9.5, where commit `b64d92f1a5` introduced this file.	2015-07-08 20:44:25 -04:00
Noah Misch	aaf15ee33a	Revoke support for strxfrm() that write past the specified array length. This formalizes a decision implicit in commit `4ea51cdfe8` and adds clean detection of affected systems. Vendor updates are available for each such known bug. Back-patch to 9.5, where the aforementioned commit first appeared.	2015-07-08 20:44:25 -04:00
Heikki Linnakangas	00ccea9e9d	Fix name of argument to pg_stat_file. It's called "missing_ok" in the docs and in the C code. I refrained from doing a catversion bump for this, because the name of an input argument is just documentation, it has no effect on any callers. Michael Paquier	2015-07-02 12:15:30 +03:00
Fujii Masao	163e29dc38	Make use of xlog_internal.h's macros in WAL-related utilities. Commit `179cdd09` added macros to check if a filename is a WAL segment or other such file. However there were still some instances of the strlen + strspn combination to check for that in WAL-related utilities like pg_archivecleanup. Those checks can be replaced with the macros. This patch makes use of the macros in those utilities and which would make the code a bit easier to read. Back-patch to 9.5. Michael Paquier	2015-07-02 10:36:18 +09:00
Heikki Linnakangas	302ac7f271	Add assertion to check the special size is sane before dereferencing it. This seems useful to catch errors of the sort I just fixed, where PageGetSpecialPointer is called before initializing the page.	2015-06-30 13:44:04 +03:00
Tom Lane	f78329d594	Stamp 9.5alpha1.	2015-06-29 15:42:18 -04:00
Tom Lane	cbc8d65639	Code + docs review for escaping of option values (commit `11a020eb6`). Avoid memory leak from incorrect choice of how to free a StringInfo (resetStringInfo doesn't do it). Now that pg_split_opts doesn't scribble on the optstr, mark that as "const" for clarity. Attach the commentary in protocol.sgml to the right place, and add documentation about the user-visible effects of this change on postgres' -o option and libpq's PGOPTIONS option.	2015-06-29 12:42:52 -04:00
Andres Freund	07cb8b02ab	Replace ia64 S_UNLOCK compiler barrier with a full memory barrier. _Asm_sched_fence() is just a compiler barrier, not a memory barrier. But spinlock release on IA64 needs, at the very least, release semantics. Use a full barrier instead. This might be the cause for the occasional failures on buildfarm member anole. Discussion: 20150629101108.GB17640@alap3.anarazel.de	2015-06-29 14:53:32 +02:00
Tom Lane	62d16c7fc5	Improve design and implementation of pg_file_settings view. As first committed, this view reported on the file contents as they were at the last SIGHUP event. That's not as useful as reporting on the current contents, and what's more, it didn't work right on Windows unless the current session had serviced at least one SIGHUP. Therefore, arrange to re-read the files when pg_show_all_settings() is called. This requires only minor refactoring so that we can pass changeVal = false to set_config_option() so that it won't actually apply any changes locally. In addition, add error reporting so that errors that would prevent the configuration files from being loaded, or would prevent individual settings from being applied, are visible directly in the view. This makes the view usable for pre-testing whether edits made in the config files will have the desired effect, before one actually issues a SIGHUP. I also added an "applied" column so that it's easy to identify entries that are superseded by later entries; this was the main use-case for the original design, but it seemed unnecessarily hard to use for that. Also fix a 9.4.1 regression that allowed multiple entries for a PGC_POSTMASTER variable to cause bogus complaints in the postmaster log. (The issue here was that commit `bf007a27ac` unintentionally reverted `3e3f65973a`, which suppressed any duplicate entries within ParseConfigFp. However, since the original coding of the pg_file_settings view depended on such suppression not happening, we couldn't have fixed this issue now without first doing something with pg_file_settings. Now we suppress duplicates by marking them "ignored" within ProcessConfigFileInternal, which doesn't hide them in the view.) Lesser changes include: Drive the view directly off the ConfigVariable list, instead of making a basically-equivalent second copy of the data. There's no longer any need to hang onto the data permanently, anyway. Convert show_all_file_settings() to do its work in one call and return a tuplestore; this avoids risks associated with assuming that the GUC state will hold still over the course of query execution. (I think there were probably latent bugs here, though you might need something like a cursor on the view to expose them.) Arrange to run SIGHUP processing in a short-lived memory context, to forestall process-lifespan memory leaks. (There is one known leak in this code, in ProcessConfigDirectory; it seems minor enough to not be worth back-patching a specific fix for.) Remove mistaken assignment to ConfigFileLineno that caused line counting after an include_dir directive to be completely wrong. Add missed failure check in AlterSystemSetConfigFile(). We don't really expect ParseConfigFp() to fail, but that's not an excuse for not checking.	2015-06-28 18:06:14 -04:00
Heikki Linnakangas	cb2acb1081	Add missing_ok option to the SQL functions for reading files. This makes it possible to use the functions without getting errors, if there is a chance that the file might be removed or renamed concurrently. pg_rewind needs to do just that, although this could be useful for other purposes too. (The changes to pg_rewind to use these functions will come in a separate commit.) The read_binary_file() function isn't very well-suited for extensions.c's purposes anymore, if it ever was. So bite the bullet and make a copy of it in extension.c, tailored for that use case. This seems better than the accidental code reuse, even if it's a some more lines of code. Michael Paquier, with plenty of kibitzing by me.	2015-06-28 21:35:46 +03:00
Kevin Grittner	604e99396d	Add opaque declaration of HTAB to tqual.h. Commit `b89e151054` added the ResolveCminCmaxDuringDecoding declaration to tqual.h, which uses an HTAB parameter, without declaring HTAB. It accidentally fails to fail to build with current sources because a declaration happens to be included, directly or indirectly, in all source files that currently use tqual.h before tqual.h is first included, but we shouldn't count on that. Since an opaque declaration is enough here, just use that, as was done in snapmgr.h. Backpatch to 9.4, where the HTAB reference was added to tqual.h.	2015-06-27 09:55:06 -05:00
Alvaro Herrera	7d60b2af34	Fix DDL command collection for TRANSFORM Commit `b488c580ae`, which added the DDL command collection feature, neglected to update the code that commit `cac7658205` had previously added two weeks earlier for the TRANSFORM feature. Reported by Michael Paquier.	2015-06-26 18:17:54 -03:00
Robert Haas	8f15f74a44	Be more conservative about removing tablespace "symlinks". Don't apply rmtree(), which will gleefully remove an entire subtree, and don't even apply unlink() unless it's symlink or a directory, the only things that we expect to find. Amit Kapila, with minor tweaks by me, per extensive discussions involving Andrew Dunstan, Fujii Masao, and Heikki Linnakangas, at least some of whom also reviewed the code.	2015-06-26 15:53:13 -04:00
Robert Haas	5ca611841b	Improve handling of CustomPath/CustomPlan(State) children. Allow CustomPath to have a list of paths, CustomPlan a list of plans, and CustomPlanState a list of planstates known to the core system, so that custom path/plan providers can more reasonably use this infrastructure for nodes with multiple children. KaiGai Kohei, per a design suggestion from Tom Lane, with some further kibitzing by me.	2015-06-26 09:40:47 -04:00
Tom Lane	5d1ff6bd55	Fix the logic for putting relations into the relcache init file. Commit `f3b5565dd4` was a couple of bricks shy of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index into the relcache init file, because that index is not used by any syscache. However, we have historically nailed that index into cache for performance reasons. The upshot was that load_relcache_init_file always decided that the init file was busted and silently ignored it, resulting in a significant hit to backend startup speed. To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around RelationSupportsSysCache(), which can know about additional relations that should be in the init file despite being unknown to syscache.c. Also install some guards against future mistakes of this type: make write_relcache_init_file Assert that all nailed relations get written to the init file, and make load_relcache_init_file emit a WARNING if it takes the "wrong number of nailed relations" exit path. Now that we remove the init files during postmaster startup, that case should never occur in the field, even if we are starting a minor-version update that added or removed rels from the nailed set. So the warning shouldn't ever be seen by end users, but it will show up in the regression tests if somebody breaks this logic. Back-patch to all supported branches, like the previous commit.	2015-06-25 14:39:05 -04:00
Andrew Dunstan	41d798a139	Fix comment in fmgr.h to refer to actual function used. FunctionLookup() is long gone if it ever existed, and fmgr_info() is what's now used, so the comments now reflect that.	2015-06-15 23:21:03 -04:00
Fujii Masao	b5fe62038f	Make postmaster restart archiver soon after it dies, even during recovery. After the archiver dies, postmaster tries to start a new one immediately. But previously this could happen only while server was running normally even though archiving was enabled always (i.e., archive_mode was set to always). So the archiver running during recovery could not restart soon after it died. This is an oversight in commit `ffd3774`. This commit changes reaper(), postmaster's signal handler to cleanup after a child process dies, so that it tries to a new archiver even during recovery if necessary. Patch by me. Review by Alvaro Herrera.	2015-06-12 23:11:51 +09:00
Andrew Dunstan	908e234733	Rename jsonb - text[] operator to #- to avoid ambiguity. Following recent discussion on -hackers. The underlying function is also renamed to jsonb_delete_path. The regression tests now don't need ugly type casts to avoid the ambiguity, so they are also removed. Catalog version bumped.	2015-06-11 10:06:58 -04:00
Kevin Grittner	870681017a	Fix typo in comment. Backpatch to 9.4 to minimize possible conflicts.	2015-06-10 17:03:56 -05:00
Fujii Masao	ea9c4c1e4a	Fix typo in comment. David Rowley	2015-06-10 15:26:02 +09:00
Tom Lane	f3b5565dd4	Use a safer method for determining whether relcache init file is stale. When we invalidate the relcache entry for a system catalog or index, we must also delete the relcache "init file" if the init file contains a copy of that rel's entry. The old way of doing this relied on a specially maintained list of the OIDs of relations present in the init file: we made the list either when reading the file in, or when writing the file out. The problem is that when writing the file out, we included only rels present in our local relcache, which might have already suffered some deletions due to relcache inval events. In such cases we correctly decided not to overwrite the real init file with incomplete data --- but we still used the incomplete initFileRelationIds list for the rest of the current session. This could result in wrong decisions about whether the session's own actions require deletion of the init file, potentially allowing an init file created by some other concurrent session to be left around even though it's been made stale. Since we don't support changing the schema of a system catalog at runtime, the only likely scenario in which this would cause a problem in the field involves a "vacuum full" on a catalog concurrently with other activity, and even then it's far from easy to provoke. Remarkably, this has been broken since 2002 (in commit `7863404417`), but we had never seen a reproducible test case until recently. If it did happen in the field, the symptoms would probably involve unexpected "cache lookup failed" errors to begin with, then "could not open file" failures after the next checkpoint, as all accesses to the affected catalog stopped working. Recovery would require manually removing the stale "pg_internal.init" file. To fix, get rid of the initFileRelationIds list, and instead consult syscache.c's list of relations used in catalog caches to decide whether a relation is included in the init file. This should be a tad more efficient anyway, since we're replacing linear search of a list with ~100 entries with a binary search. It's a bit ugly that the init file contents are now so directly tied to the catalog caches, but in practice that won't make much difference. Back-patch to all supported branches.	2015-06-07 15:32:09 -04:00
Tom Lane	3f59be836c	Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans. When the inner side of a nestloop SEMI or ANTI join is an indexscan that uses all the join clauses as indexquals, it can be presumed that both matched and unmatched outer rows will be processed very quickly: for matched rows, we'll stop after fetching one row from the indexscan, while for unmatched rows we'll have an indexscan that finds no matching index entries, which should also be quick. The planner already knew about this, but it was nonetheless charging for at least one full run of the inner indexscan, as a consequence of concerns about the behavior of materialized inner scans --- but those concerns don't apply in the fast case. If the inner side has low cardinality (many matching rows) this could make an indexscan plan look far more expensive than it actually is. To fix, rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we don't add the inner scan cost until we've inspected the indexquals, and then we can add either the full-run cost or just the first tuple's cost as appropriate. Experimentation with this fix uncovered another problem: add_path and friends were coded to disregard cheap startup cost when considering parameterized paths. That's usually okay (and desirable, because it thins the path herd faster); but in this fast case for SEMI/ANTI joins, it could result in throwing away the desired plain indexscan path in favor of a bitmap scan path before we ever get to the join costing logic. In the many-matching-rows cases of interest here, a bitmap scan will do a lot more work than required, so this is a problem. To fix, add a per-relation flag consider_param_startup that works like the existing consider_startup flag, but applies to parameterized paths, and set it for relations that are the inside of a SEMI or ANTI join. To make this patch reasonably safe to back-patch, care has been taken to avoid changing the planner's behavior except in the very narrow case of SEMI/ANTI joins with inner indexscans. There are places in compare_path_costs_fuzzily and add_path_precheck that are not terribly consistent with the new approach, but changing them will affect planner decisions at the margins in other cases, so we'll leave that for a HEAD-only fix. Back-patch to 9.3; before that, the consider_startup flag didn't exist, meaning that the second aspect of the patch would be too invasive. Per a complaint from Peter Holzer and analysis by Tomas Vondra.	2015-06-03 11:59:10 -04:00
Andrew Dunstan	37def42245	Rename jsonb_replace to jsonb_set and allow it to add new values The function is given a fourth parameter, which defaults to true. When this parameter is true, if the last element of the path is missing in the original json, jsonb_set creates it in the result and assigns it the new value. If it is false then the function does nothing unless all elements of the path are present, including the last. Based on some original code from Dmitry Dolgov, heavily modified by me. Catalog version bumped.	2015-05-31 20:34:10 -04:00
Tom Lane	1c8c656b3c	Check that all aliases of a built-in function have same leakproof property. opr_sanity.sql has a test checking that relevant properties of built-in functions match when the same C function is referenced by multiple pg_proc entries. The test neglected to check proleakproof, though, and when I added that condition it exposed that xideqint4 hadn't been updated to match xideq. So fix that as well, and in consequence bump catversion. This isn't very critical, so no need to worry about fixing back branches.	2015-05-29 13:26:21 -04:00
Tom Lane	da33a3894e	Revert exporting of internal GUC variable "data_directory". This undoes a poorly-thought-out choice in commit `970a18687f`, namely to export guc.c's internal variable data_directory. The authoritative variable so far as C code is concerned is DataDir; there is no reason for anything except specific bits of GUC code to look at the GUC variable. After yesterday's commits fixing the fsync-on-restart patch, the only remaining misuse of data_directory was in AlterSystemSetConfigFile(), which would be much better off just using a relative path anyhow: it's less code and it doesn't break if the DBA moves the data directory of a running system, which is a case we've taken some pains over in the past. This is mostly cosmetic, so no need for a back-patch (and I'd be hesitant to remove a global variable in stable branches anyway).	2015-05-29 11:57:33 -04:00
Tom Lane	d8179b001a	Fix fsync-at-startup code to not treat errors as fatal. Commit `2ce439f337` introduced a rather serious regression, namely that if its scan of the data directory came across any un-fsync-able files, it would fail and thereby prevent database startup. Worse yet, symlinks to such files also caused the problem, which meant that crash restart was guaranteed to fail on certain common installations such as older Debian. After discussion, we agreed that (1) failure to start is worse than any consequence of not fsync'ing is likely to be, therefore treat all errors in this code as nonfatal; (2) we should not chase symlinks other than those that are expected to exist, namely pg_xlog/ and tablespace links under pg_tblspc/. The latter restriction avoids possibly fsync'ing a much larger part of the filesystem than intended, if the user has left random symlinks hanging about in the data directory. This commit takes care of that and also does some code beautification, mainly moving the relevant code into fd.c, which seems a much better place for it than xlog.c, and making sure that the conditional compilation for the pre_sync_fname pass has something to do with whether pg_flush_data works. I also relocated the call site in xlog.c down a few lines; it seems a bit silly to be doing this before ValidateXLOGDirectoryStructure(). The similar logic in initdb.c ought to be made to match this, but that change is noncritical and will be dealt with separately. Back-patch to all active branches, like the prior commit. Abhijit Menon-Sen and Tom Lane	2015-05-28 17:33:03 -04:00
Bruce Momjian	befa3e648c	Revert 9.5 pgindent changes to atomics directory files This is because there are many __asm__ blocks there that pgindent messes up. Also configure pgindent to skip that directory in the future.	2015-05-24 21:45:01 -04:00
Tom Lane	23116d5437	Add a bit more commentary about regex's colormap tree data structure. Per an off-list question from Piotr Stefaniak.	2015-05-24 12:40:38 -04:00
Tom Lane	91e79260f6	Remove no-longer-required function declarations. Remove a bunch of "extern Datum foo(PG_FUNCTION_ARGS);" declarations that are no longer needed now that PG_FUNCTION_INFO_V1(foo) provides that. Some of these were evidently missed in commit `e7128e8dbb`, but others were cargo-culted in in code added since then. Possibly that can be blamed in part on the fact that we'd not fixed relevant documentation examples, which I've now done.	2015-05-24 12:20:23 -04:00
Bruce Momjian	807b9e0dff	pgindent run for 9.5	2015-05-23 21:35:49 -04:00
Tom Lane	821b821a24	Still more fixes for lossy-GiST-distance-functions patch. Fix confusion in documentation, substantial memory leakage if float8 or float4 are pass-by-reference, and assorted comments that were obsoleted by commit `98edd617f3`.	2015-05-23 15:22:25 -04:00
Andres Freund	631d749007	Remove the new UPSERT command tag and use INSERT instead. Previously, INSERT with ON CONFLICT DO UPDATE specified used a new command tag -- UPSERT. It was introduced out of concern that INSERT as a command tag would be a misrepresentation for ON CONFLICT DO UPDATE, as some affected rows may actually have been updated. Alvaro Herrera noticed that the implementation of that new command tag was incomplete; in subsequent discussion we concluded that having it doesn't provide benefits that are in line with the compatibility breaks it requires. Catversion bump due to the removal of PlannedStmt->isUpsert. Author: Peter Geoghegan Discussion: 20150520215816.GI5885@postgresql.org	2015-05-23 00:58:45 +02:00
Andrew Dunstan	5302760a50	Unpack jbvBinary objects passed to pushJsonbValue pushJsonbValue was accepting jbvBinary objects passed as WJB_ELEM or WJB_VALUE data. While this succeeded, when those objects were later encountered in attempting to convert the result to Jsonb, errors occurred. With this change we ghuarantee that a JSonbValue constructed from calls to pushJsonbValue does not contain any jbvBinary objects. This cures a problem observed with jsonb_delete. This means callers of pushJsonbValue no longer need to perform this unpacking themselves. A subsequent patch will perform some cleanup in that area. The error was not triggered by any 9.4 code, but this is a publicly visible routine, and so the error could be exercised by third party code, therefore backpatch to 9.4. Bug report from Peter Geoghegan, fix by me.	2015-05-22 10:21:41 -04:00
Heikki Linnakangas	7cbee7c0a1	At promotion, don't leave behind a partial segment on the old timeline. With commit `de768844`, a copy of the partial segment was archived with the .partial suffix, but the original file was still left in pg_xlog, so it didn't actually solve the problems with archiving the partial segment that it was supposed to solve. With this patch, the partial segment is renamed rather than copied, so we only archive it with the .partial suffix. Also be more robust in detecting if the last segment is already being archived. Previously I used XLogArchiveIsBusy() for that, but that's not quite right. With archive_mode='always', there might be a .ready file for it, and we don't want to rename it to .partial in that case. The old segment is needed until we're fully committed to the new timeline, i.e. until we've written the end-of-recovery WAL record and updated the min recovery point and timeline in the control file. So move the renaming later in the startup sequence, after all that's been done.	2015-05-22 11:04:33 +03:00
Tom Lane	c5dd8ead40	More fixes for lossy-GiST-distance-functions patch. Paul Ramsey reported that commit `35fcb1b3d0` induced a core dump on commuted ORDER BY expressions, because it was assuming that the indexorderby expression could be found verbatim in the relevant equivalence class, but it wasn't there. We really don't need anything that complicated anyway; for the data types likely to be used for index ORDER BY operators in the foreseeable future, the exprType() of the ORDER BY expression will serve fine. (The case where we'd have to work harder is where the ORDER BY expression's result is only binary-compatible with the declared input type of the ordering operator; long before worrying about that, one would need to get rid of GiST's hard-wired assumption that said datatype is float8.) Aside from fixing that crash and adding a regression test for the case, I did some desultory code review: nodeIndexscan.c was likewise overthinking how hard it ought to work to identify the datatype of the ORDER BY expressions. Add comments explaining how come nodeIndexscan.c can get away with simplifying assumptions about NULLS LAST ordering and no backward scan. Revert no-longer-needed changes of find_ec_member_for_tle(); while the new definition was no worse than the old, it wasn't better either, and it might cause back-patching pain. Revert entirely bogus additions to genam.h.	2015-05-21 19:47:48 -04:00
Tom Lane	d4b538ea36	Improve packing/alignment annotation for ItemPointerData. We want this struct to be exactly a series of 3 int16 words, no more and no less. Historically, at least, some ARM compilers preferred to pad it to 8 bytes unless coerced. Our old way of doing that was just to use __attribute__((packed)), but as pointed out by Piotr Stefaniak, that does too much: it also licenses the compiler to give the struct only byte-alignment. We don't want that because it adds access overhead, possibly quite significant overhead. According to the GCC manual, what we want requires also specifying __attribute__((align(2))). It's not entirely clear if all the relevant compilers accept this pragma as well, but we can hope the buildfarm will tell us if not. We can also add a static assertion that should fire if the compiler padded the struct. Since the combination of these pragmas should define exactly what we want on any compiler that accepts them, let's try using them wherever we think they exist, not only for __arm__. (This is likely to expose that the conditional definitions in c.h are inadequate, but finding that out would be a good thing.) The immediate motivation for this is that the current definition of ExecRowMark allows its curCtid field to be misaligned. It is not clear whether there are any other uses of ItemPointerData with a similar hazard. We could change the definition of ExecRowMark if this doesn't work, but it would be far better to have a future-proof fix. Piotr Stefaniak, some further hacking by me	2015-05-21 17:21:46 -04:00
Heikki Linnakangas	fa60fb63e5	Fix more typos in comments. Patch by CharSyam, plus a few more I spotted with grep.	2015-05-20 19:45:43 +03:00
Heikki Linnakangas	4fc72cc7bb	Collection of typo fixes. Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.	2015-05-20 16:56:22 +03:00
Tom Lane	0c071936e9	Revert error-throwing wrappers for the printf family of functions. This reverts commit `16304a0134`, except for its changes in src/port/snprintf.c; as well as commit `cac18a76bb` which is no longer needed. Fujii Masao reported that the previous commit caused failures in psql on OS X, since if one exits the pager program early while viewing a query result, psql sees an EPIPE error from fprintf --- and the wrapper function thought that was reason to panic. (It's a bit surprising that the same does not happen on Linux.) Further discussion among the security list concluded that the risk of other such failures was far too great, and that the one-size-fits-all approach to error handling embodied in the previous patch is unlikely to be workable. This leaves us again exposed to the possibility of the type of failure envisioned in CVE-2015-3166. However, that failure mode is strictly hypothetical at this point: there is no concrete reason to believe that an attacker could trigger information disclosure through the supposed mechanism. In the first place, the attack surface is fairly limited, since so much of what the backend does with format strings goes through stringinfo.c or psprintf(), and those already had adequate defenses. In the second place, even granting that an unprivileged attacker could control the occurrence of ENOMEM with some precision, it's a stretch to believe that he could induce it just where the target buffer contains some valuable information. So we concluded that the risk of non-hypothetical problems induced by the patch greatly outweighs the security risks. We will therefore revert, and instead undertake closer analysis to identify specific calls that may need hardening, rather than attempt a universal solution. We have kept the portion of the previous patch that improved snprintf.c's handling of errors when it calls the platform's sprintf(). That seems to be an unalloyed improvement. Security: CVE-2015-3166	2015-05-19 18:19:38 -04:00
Andres Freund	0740cbd759	Refactor ON CONFLICT index inference parse tree representation. Defer lookup of opfamily and input type of a of a user specified opclass until the optimizer selects among available unique indexes; and store the opclass in the parse analyzed tree instead. The primary reason for doing this is that for rule deparsing it's easier to use the opclass than the previous representation. While at it also rename a variable in the inference code to better fit it's purpose. This is separate from the actual fixes for deparsing to make review easier.	2015-05-19 21:21:27 +02:00
Tom Lane	0b28ea79c0	Avoid collation dependence in indexes of system catalogs. No index in template0 should have collation-dependent ordering, especially not indexes on shared catalogs. For most textual columns we avoid this issue by using type "name" (which sorts per strcmp()). However there are a few indexed columns that we'd prefer to use "text" for, and for that, the default opclass text_ops is unsafe. Fortunately, text_pattern_ops is safe (it sorts per memcmp()), and it has no real functional disadvantage for our purposes. So change the indexes on pg_seclabel.provider and pg_shseclabel.provider to use text_pattern_ops. In passing, also mark pg_replication_origin.roname as using text_pattern_ops --- for some reason it was labeled varchar_pattern_ops which is just wrong, even though it accidentally worked. Add regression test queries to catch future errors of these kinds. We still can't do anything about the misdeclared pg_seclabel and pg_shseclabel indexes in back branches :-(	2015-05-19 11:47:42 -04:00
Tom Lane	afee04352b	Revert "Change pg_seclabel.provider and pg_shseclabel.provider to type "name"." This reverts commit `b82a7be603`. There is a better (less invasive) way to fix it, which I will commit next.	2015-05-19 10:40:04 -04:00
Tom Lane	b82a7be603	Change pg_seclabel.provider and pg_shseclabel.provider to type "name". These were "text", but that's a bad idea because it has collation-dependent ordering. No index in template0 should have collation-dependent ordering, especially not indexes on shared catalogs. There was general agreement that provider names don't need to be longer than other identifiers, so we can fix this at a small waste of table space by changing from text to name. There's no way to fix the problem in the back branches, but we can hope that security labels don't yet have widespread-enough usage to make it urgent to fix. There needs to be a regression sanity test to prevent us from making this same mistake again; but before putting that in, we'll need to get rid of similar brain fade in the recently-added pg_replication_origin catalog. Note: for lack of a suitable testing environment, I've not really exercised this change. I trust the buildfarm will show up any mistakes.	2015-05-18 20:07:53 -04:00
Tom Lane	4db485e75b	Put back a backwards-compatible version of sampling support functions. Commit `83e176ec18` removed the longstanding support functions for block sampling without any consideration of the impact this would have on third-party FDWs. The new API is not notably more functional for FDWs than the old, so forcing them to change doesn't seem like a good thing. We can provide the old API as a wrapper (more or less) around the new one for a minimal amount of extra code.	2015-05-18 18:34:37 -04:00
Noah Misch	16304a0134	Add error-throwing wrappers for the printf family of functions. All known standard library implementations of these functions can fail with ENOMEM. A caller neglecting to check for failure would experience missing output, information exposure, or a crash. Check return values within wrappers and code, currently just snprintf.c, that bypasses the wrappers. The wrappers do not return after an error, so their callers need not check. Back-patch to 9.0 (all supported versions). Popular free software standard library implementations do take pains to bypass malloc() in simple cases, but they risk ENOMEM for floating point numbers, positional arguments, large field widths, and large precisions. No specification demands such caution, so this commit regards every call to a printf family function as a potential threat. Injecting the wrappers implicitly is a compromise between patch scope and design goals. I would prefer to edit each call site to name a wrapper explicitly. libpq and the ECPG libraries would, ideally, convey errors to the caller rather than abort(). All that would be painfully invasive for a back-patched security fix, hence this compromise. Security: CVE-2015-3166	2015-05-18 10:02:31 -04:00
Noah Misch	cac18a76bb	Permit use of vsprintf() in PostgreSQL code. The next commit needs it. Back-patch to 9.0 (all supported versions).	2015-05-18 10:02:31 -04:00
Tom Lane	424661913c	Fix failure to copy IndexScan.indexorderbyops in copyfuncs.c. This oversight results in a crash at executor startup if the plan has been copied. outfuncs.c was missed as well. While we could probably have taught both those files to cope with the originally chosen representation of an Oid array, it would have been painful, not least because there'd be no easy way to verify the array length. An Oid List is far easier to work with. And AFAICS, there is no particular notational benefit to using an array rather than a list in the existing parts of the patch either. So just change it to a list. Error in commit `35fcb1b3d0`, which is new, so no need for back-patch.	2015-05-17 21:22:12 -04:00
Andres Freund	f3d3118532	Support GROUPING SETS, CUBE and ROLLUP. This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com	2015-05-16 03:46:31 +02:00
Alvaro Herrera	b0b7be6133	Add BRIN infrastructure for "inclusion" opclasses This lets BRIN be used with R-Tree-like indexing strategies. Also provided are operator classes for range types, box and inet/cidr. The infrastructure provided here should be sufficient to create operator classes for similar datatypes; for instance, opclasses for PostGIS geometries should be doable, though we didn't try to implement one. (A box/point opclass was also submitted, but we ripped it out before commit because the handling of floating point comparisons in existing code is inconsistent and would generate corrupt indexes.) Author: Emre Hasegeli. Cosmetic changes by me Review: Andreas Karlsson	2015-05-15 18:05:22 -03:00
Alvaro Herrera	26df7066cc	Move strategy numbers to include/access/stratnum.h For upcoming BRIN opclasses, it's convenient to have strategy numbers defined in a single place. Since there's nothing appropriate, create it. The StrategyNumber typedef now lives there, as well as existing strategy numbers for B-trees (from skey.h) and R-tree-and-friends (from gist.h). skey.h is forced to include stratnum.h because of the StrategyNumber typedef, but gist.h is not; extensions that currently rely on gist.h for rtree strategy numbers might need to add a new A few .c files can stop including skey.h and/or gist.h, which is a nice side benefit. Per discussion: https://www.postgresql.org/message-id/20150514232132.GZ2523@alvh.no-ip.org Authored by Emre Hasegeli and Álvaro. (It's not clear to me why bootscanner.l has any #include lines at all.)	2015-05-15 17:03:16 -03:00
Simon Riggs	f6d208d6e5	TABLESAMPLE, SQL Standard and extensible Add a TABLESAMPLE clause to SELECT statements that allows user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows SQLStandard exactly. Usable concrete use cases for the sampling API follow in later commits. Petr Jelinek Reviewed by Michael Paquier and Simon Riggs	2015-05-15 14:37:10 -04:00
Heikki Linnakangas	ffd37740ee	Add archive_mode='always' option. In 'always' mode, the standby independently archives all files it receives from the primary. Original patch by Fujii Masao, docs and review by me.	2015-05-15 18:55:24 +03:00
Heikki Linnakangas	98edd617f3	Fix datatype confusion with the new lossy GiST distance functions. We can only support a lossy distance function when the distance function's datatype is comparable with the original ordering operator's datatype. The distance function always returns a float8, so we are limited to float8, and float4 (by a hard-coded cast of the float8 to float4). In light of this limitation, it seems like a good idea to have a separate 'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy distance function, it still works with lossy quals. There are cases like that with the build-in or contrib opclasses, but it's plausible. There was a hidden assumption that the ORDER BY values returned by GiST match the original ordering operator's return type, but there are plenty of examples where that's not true, e.g. in btree_gist and pg_trgm. As long as the distance function is not lossy, we can tolerate that and just not return the distance to the executor (or rather, always return NULL). The executor doesn't need the distances if there are no lossy results. There was another little bug: the recheck variable was not initialized before calling the distance function. That revealed the bigger issue, as the executor tried to reorder tuples that didn't need reordering, and that failed because of the datatype mismatch.	2015-05-15 18:09:31 +03:00
Heikki Linnakangas	35fcb1b3d0	Allow GiST distance function to return merely a lower-bound. The distance function can now set *recheck = false, like index quals. The executor will then re-check the ORDER BY expressions, and use a queue to reorder the results on the fly. This makes it possible to do kNN-searches on polygons and circles, which don't store the exact value in the index, but just a bounding box. Alexander Korotkov and me	2015-05-15 14:26:51 +03:00
Fujii Masao	ecd222e770	Support VERBOSE option in REINDEX command. When this option is specified, a progress report is printed as each index is reindexed. Per discussion, we agreed on the following syntax for the extensibility of the options. REINDEX (flexible options) { INDEX \| ... } name Sawada Masahiko. Reviewed by Robert Haas, Fabrízio Mello, Alvaro Herrera, Kyotaro Horiguchi, Jim Nasby and me. Discussion: CAD21AoA0pK3YcOZAFzMae+2fcc3oGp5zoRggDyMNg5zoaWDhdQ@mail.gmail.com	2015-05-15 20:09:57 +09:00
Tom Lane	7730f48ede	Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions. Until now, these functions have only supported encoding conversions using lookup tables, which is fine as long as there's not too many code points to convert. However, GB18030 expects all 1.1 million Unicode code points to be convertible, which would require a ridiculously-sized lookup table. Fortunately, a large fraction of those conversions can be expressed through arithmetic, ie the conversions are one-to-one in certain defined ranges. To support that, provide a callback function that is used after consulting the lookup tables. (This patch doesn't actually change anything about the GB18030 conversion behavior, just provide infrastructure for fixing it.) Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway, take the opportunity to rearrange their argument lists into what seems to me a saner order. And beautify the call sites by using lengthof() instead of error-prone sizeof() arithmetic. In passing, also mark all the lookup tables used by these calls "const". This moves an impressive amount of stuff into the text segment, at least on my machine, and is safer anyhow.	2015-05-14 22:27:12 -04:00
Simon Riggs	83e176ec18	Separate block sampling functions Refactoring ahead of tablesample patch Requested and reviewed by Michael Paquier Petr Jelinek	2015-05-15 04:02:54 +02:00
Peter Eisentraut	a486e35706	Add pg_settings.pending_restart column with input from David G. Johnston, Robert Haas, Michael Paquier	2015-05-14 20:08:51 -04:00
Tom Lane	1dc5ebc907	Support "expanded" objects, particularly arrays, for better performance. This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array. The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well. In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.) Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win. Tom Lane, reviewed by Andres Freund	2015-05-14 12:08:49 -04:00
Andrew Dunstan	5c7df74204	Fix some errors from jsonb functions patch. The catalog version should have been bumped, and the alternative regression result file was not up to date with the name of jsonb_pretty.	2015-05-12 16:54:38 -04:00
Andrew Dunstan	c6947010ce	Additional functions and operators for jsonb jsonb_pretty(jsonb) produces nicely indented json output. jsonb \|\| jsonb concatenates two jsonb values. jsonb - text removes a key and its associated value from the json jsonb - int removes the designated array element jsonb - text[] removes a key and associated value or array element at the designated path jsonb_replace(jsonb,text[],jsonb) replaces the array element designated by the path or the value associated with the key designated by the path with the given value. Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.	2015-05-12 15:52:45 -04:00
Tom Lane	afb9249d06	Add support for doing late row locking in FDWs. Previously, FDWs could only do "early row locking", that is lock a row as soon as it's fetched, even though local restriction/join conditions might discard the row later. This patch adds callbacks that allow FDWs to do late locking in the same way that it's done for regular tables. To make use of this feature, an FDW must support the "ctid" column as a unique row identifier. Currently, since ctid has to be of type TID, the feature is of limited use, though in principle it could be used by postgres_fdw. We may eventually allow FDWs to specify another data type for ctid, which would make it possible for more FDWs to use this feature. This commit does not modify postgres_fdw to use late locking. We've tested some prototype code for that, but it's not in committable shape, and besides it's quite unclear whether it actually makes sense to do late locking against a remote server. The extra round trips required are likely to outweigh any benefit from improved concurrency. Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me	2015-05-12 14:10:17 -04:00
Andrew Dunstan	72d422a522	Map basebackup tablespaces using a tablespace_map file Windows can't reliably restore symbolic links from a tar format, so instead during backup start we create a tablespace_map file, which is used by the restoring postgres to create the correct links in pg_tblspc. The backup protocol also now has an option to request this file to be included in the backup stream, and this is used by pg_basebackup when operating in tar mode. This is done on all platforms, not just Windows. This means that pg_basebackup will not not work in tar mode against 9.4 and older servers, as this protocol option isn't implemented there. Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.	2015-05-12 09:29:10 -04:00
Alvaro Herrera	b488c580ae	Allow on-the-fly capture of DDL event details This feature lets user code inspect and take action on DDL events. Whenever a ddl_command_end event trigger is installed, DDL actions executed are saved to a list which can be inspected during execution of a function attached to ddl_command_end. The set-returning function pg_event_trigger_ddl_commands can be used to list actions so captured; it returns data about the type of command executed, as well as the affected object. This is sufficient for many uses of this feature. For the cases where it is not, we also provide a "command" column of a new pseudo-type pg_ddl_command, which is a pointer to a C structure that can be accessed by C code. The struct contains all the info necessary to completely inspect and even reconstruct the executed command. There is no actual deparse code here; that's expected to come later. What we have is enough infrastructure that the deparsing can be done in an external extension. The intention is that we will add some deparsing code in a later release, as an in-core extension. A new test module is included. It's probably insufficient as is, but it should be sufficient as a starting point for a more complete and future-proof approach. Authors: Álvaro Herrera, with some help from Andres Freund, Ian Barwick, Abhijit Menon-Sen. Reviews by Andres Freund, Robert Haas, Amit Kapila, Michael Paquier, Craig Ringer, David Steele. Additional input from Chris Browne, Dimitri Fontaine, Stephen Frost, Petr Jelínek, Tom Lane, Jim Nasby, Steven Singer, Pavel Stěhule. Based on original work by Dimitri Fontaine, though I didn't use his code. Discussion: https://www.postgresql.org/message-id/m2txrsdzxa.fsf@2ndQuadrant.fr https://www.postgresql.org/message-id/20131108153322.GU5809@eldon.alvh.no-ip.org https://www.postgresql.org/message-id/20150215044814.GL3391@alvh.no-ip.org	2015-05-11 19:14:31 -03:00
Tom Lane	1a8a4e5cde	Code review for foreign/custom join pushdown patch. Commit `e7cb7ee145` included some design decisions that seem pretty questionable to me, and there was quite a lot of stuff not to like about the documentation and comments. Clean up as follows: * Consider foreign joins only between foreign tables on the same server, rather than between any two foreign tables with the same underlying FDW handler function. In most if not all cases, the FDW would simply have had to apply the same-server restriction itself (far more expensively, both for lack of caching and because it would be repeated for each combination of input sub-joins), or else risk nasty bugs. Anyone who's really intent on doing something outside this restriction can always use the set_join_pathlist_hook. * Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist to better reflect what they're for, and allow these custom scan tlists to be used even for base relations. * Change make_foreignscan() API to include passing the fdw_scan_tlist value, since the FDW is required to set that. Backwards compatibility doesn't seem like an adequate reason to expect FDWs to set it in some ad-hoc extra step, and anyway existing FDWs can just pass NIL. * Change the API of path-generating subroutines of add_paths_to_joinrel, and in particular that of GetForeignJoinPaths and set_join_pathlist_hook, so that various less-used parameters are passed in a struct rather than as separate parameter-list entries. The objective here is to reduce the probability that future additions to those parameter lists will result in source-level API breaks for users of these hooks. It's possible that this is even a small win for the core code, since most CPU architectures can't pass more than half a dozen parameters efficiently anyway. I kept root, joinrel, outerrel, innerrel, and jointype as separate parameters to reduce code churn in joinpath.c --- in particular, putting jointype into the struct would have been problematic because of the subroutines' habit of changing their local copies of that variable. * Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all right for it to know about IndexOnlyScan, but if the list is to grow we should refactor the knowledge out to the callers. * Restore nodeForeignscan.c's previous use of the relcache to avoid extra GetFdwRoutine lookups for base-relation scans. * Lots of cleanup of documentation and missed comments. Re-order some code additions into more logical places.	2015-05-10 14:36:36 -04:00
Andrew Dunstan	cb9fa802b3	Add new OID alias type regnamespace Catalog version bumped Kyotaro HORIGUCHI	2015-05-09 13:36:52 -04:00
Andrew Dunstan	0c90f6769d	Add new OID alias type regrole The new type has the scope of whole the database cluster so it doesn't behave the same as the existing OID alias types which have database scope, concerning object dependency. To avoid confusion constants of the new type are prohibited from appearing where dependencies are made involving it. Also, add a note to the docs about possible MVCC violation and optimization issues, which are general over the all reg* types. Kyotaro Horiguchi	2015-05-09 13:06:49 -04:00
Stephen Frost	4b342fb591	Bump catversion for pg_file_settings Pointed out by Andres (thanks!) Apologies for not including it in the initial patch.	2015-05-08 19:14:32 -04:00
Stephen Frost	a97e0c3354	Add pg_file_settings view and function The function and view added here provide a way to look at all settings in postgresql.conf, any #include'd files, and postgresql.auto.conf (which is what backs the ALTER SYSTEM command). The information returned includes the configuration file name, line number in that file, sequence number indicating when the parameter is loaded (useful to see if it is later masked by another definition of the same parameter), parameter name, and what it is set to at that point. This information is updated on reload of the server. This is unfiltered, privileged, information and therefore access is restricted to superusers through the GRANT system. Author: Sawada Masahiko, various improvements by me. Reviewers: David Steele	2015-05-08 19:09:26 -04:00
Heikki Linnakangas	de7688442f	At promotion, archive last segment from old timeline with .partial suffix. Previously, we would archive the possible-incomplete WAL segment with its normal filename, but that causes trouble if the server owning that timeline is still running, and tries to archive the same segment later. It's not nice for the standby to trip up the master's archival like that. And it's pretty confusing, anyway, to have an incomplete segment in the archive that's indistinguishable from a normal, complete segment. To avoid such confusion, add a .partial suffix to the file. Or to be more precise, make a copy of the old segment under the .partial suffix, and archive that instead of the original file. pg_receivexlog also uses the .partial suffix for the same purpose, to tell apart incompletely streamed files from complete ones. There is no automatic mechanism to use the .partial files at recovery, so they will go unused, unless the administrator manually copies to them to the pg_xlog directory (and removes the .partial suffix). Recovery won't normally need the WAL - when recovering to the new timeline, it will find the same WAL on the first segment on the new timeline instead - but it nevertheless feels better to archive the file with the .partial suffix, for debugging purposes if nothing else.	2015-05-08 21:59:01 +03:00
Heikki Linnakangas	179cdd0981	Add macros to check if a filename is a WAL segment or other such file. We had many instances of the strlen + strspn combination to check for that. This makes the code a bit easier to read.	2015-05-08 21:58:57 +03:00
Andres Freund	e8898e9169	Minor ON CONFLICT related comments and doc fixes. Geoff Winkless, Stephen Frost, Peter Geoghegan and me.	2015-05-08 19:24:14 +02:00
Robert Haas	53bb309d2d	Teach autovacuum about multixact member wraparound. The logic introduced in commit `b69bf30b9b` and repaired in commits `669c7d20e6` and `7be47c56af` helps to ensure that we don't overwrite old multixact member information while it is still needed, but a user who creates many large multixacts can still exhaust the member space (and thus start getting errors) while autovacuum stands idly by. To fix this, progressively ramp down the effective value (but not the actual contents) of autovacuum_multixact_freeze_max_age as member space utilization increases. This makes autovacuum more aggressive and also reduces the threshold for a manual VACUUM to perform a full-table scan. This patch leaves unsolved the problem of ensuring that emergency autovacuums are triggered even when autovacuum=off. We'll need to fix that via a separate patch. Thomas Munro and Robert Haas	2015-05-08 12:53:00 -04:00
Andres Freund	168d5805e4	Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE. The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.	2015-05-08 05:43:10 +02:00
Andres Freund	2c8f4836db	Represent columns requiring insert and update privileges indentently. Previously, relation range table entries used a single Bitmapset field representing which columns required either UPDATE or INSERT privileges, despite the fact that INSERT and UPDATE privileges are separately cataloged, and may be independently held. As statements so far required either insert or update privileges but never both, that was sufficient. The required permission could be inferred from the top level statement run. The upcoming INSERT ... ON CONFLICT UPDATE feature needs to independently check for both privileges in one statement though, so that is not sufficient anymore. Bumps catversion as stored rules change. Author: Peter Geoghegan Reviewed-By: Andres Freund	2015-05-08 00:20:46 +02:00
Alvaro Herrera	db5f98ab4f	Improve BRIN infra, minmax opclass and regression test The minmax opclass was using the wrong support functions when cross-datatypes queries were run. Instead of trying to fix the pg_amproc definitions (which apparently is not possible), use the already correct pg_amop entries instead. This requires jumping through more hoops (read: extra syscache lookups) to obtain the underlying functions to execute, but it is necessary for correctness. Author: Emre Hasegeli, tweaked by Álvaro Review: Andreas Karlsson Also change BrinOpcInfo to record each stored type's typecache entry instead of just the OID. Turns out that the full type cache is necessary in brin_deform_tuple: the original code used the indexed type's byval and typlen properties to extract the stored tuple, which is correct in Minmax; but in other implementations that want to store something different, that's wrong. The realization that this is a bug comes from Emre also, but I did not use his patch. I also adopted Emre's regression test code (with smallish changes), which is more complete.	2015-05-07 13:02:22 -03:00
Robert Haas	1998261034	Avoid using a C++ keyword as a structure member name. Per request from Peter Eisentraut.	2015-05-05 22:41:03 -04:00
Alvaro Herrera	3b6db1f445	Add geometry/range functions to support BRIN inclusion This commit adds the following functions: box(point) -> box bound_box(box, box) -> box inet_same_family(inet, inet) -> bool inet_merge(inet, inet) -> cidr range_merge(anyrange, anyrange) -> anyrange The first of these is also used to implement a new assignment cast from point to box. These functions are the first part of a base to implement an "inclusion" operator class for BRIN, for multidimensional data types. Author: Emre Hasegeli Reviewed by: Andreas Karlsson	2015-05-05 15:22:24 -03:00
Tom Lane	2503982be4	Improve procost estimates for some text search functions. The text search functions that involve parsing raw text into lexemes are remarkably CPU-intensive, so estimating them at the same cost as most other built-in functions seems like a mistake; moreover, doing so turns out to discourage the optimizer from using functional indexes on these functions. After some debate, we've agreed to raise procost from 1 to 100 for to_tsvector(), plainto_tsvector(), to_tsquery(), ts_headline(), ts_match_tt(), and ts_match_tq(), which are all the text search functions that parse raw text. Also increase procost for the 2-argument form of ts_rewrite() (tsquery_rewrite_query); while this function doesn't do text parsing, it does execute a user-supplied SQL query, so its previous procost of 1 is clearly a drastic underestimate. It seems reasonable to assign it the same cost we assign to PL functions by default, so 100 is the number here too. I did not bother bumping catversion for this change, since it does not break catalog compatibility with the server executable nor result in any regression test changes. Per complaint from Andrew Gierth and subsequent discussion.	2015-05-04 15:38:57 -04:00
Robert Haas	2ce439f337	Recursively fsync() the data directory after a crash. Otherwise, if there's another crash, some writes from after the first crash might make it to disk while writes from before the crash fail to make it to disk. This could lead to data corruption. Back-patch to all supported versions. Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised by me.	2015-05-04 14:13:53 -04:00
Robert Haas	e7cb7ee145	Allow FDWs and custom scan providers to replace joins with scans. Foreign data wrappers can use this capability for so-called "join pushdown"; that is, instead of executing two separate foreign scans and then joining the results locally, they can generate a path which performs the join on the remote server and then is scanned locally. This commit does not extend postgres_fdw to take advantage of this capability; it just provides the infrastructure. Custom scan providers can use this in a similar way. Previously, it was only possible for a custom scan provider to scan a single relation. Now, it can scan an entire join tree, provided of course that it knows how to produce the same results that the join would have produced if executed normally. KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.	2015-05-01 08:50:35 -04:00
Andres Freund	2b22795b32	Copy editing of the replication origins patch. Michael Paquier and myself.	2015-05-01 12:22:13 +02:00
Andres Freund	1db12da85b	Fix unaligned memory access in xlog parsing due to replication origin patch. ParseCommitRecord() accessed xl_xact_origin directly. But the chunks in the commit record's data only have 4 byte alignment, whereas xl_xact_origin's members require 8 byte alignment on some platforms. Update comments to make not of that and copy the record to stack local storage before reading. With help from Stefan Kaltenbrunner in pinning down the buildfarm and verifying the fix.	2015-05-01 11:36:14 +02:00
Robert Haas	924bcf4f16	Create an infrastructure for parallel computation in PostgreSQL. This does four basic things. First, it provides convenience routines to coordinate the startup and shutdown of parallel workers. Second, it synchronizes various pieces of state (e.g. GUCs, combo CID mappings, transaction snapshot) from the parallel group leader to the worker processes. Third, it prohibits various operations that would result in unsafe changes to that state while parallelism is active. Finally, it propagates events that would result in an ErrorResponse, NoticeResponse, or NotifyResponse message being sent to the client from the parallel workers back to the master, from which they can then be sent on to the client. Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke. Suggestions and review from Andres Freund, Heikki Linnakangas, Noah Misch, Simon Riggs, Euler Taveira, and Jim Nasby.	2015-04-30 15:02:14 -04:00
Andres Freund	e0f26fc765	Correct replication origin's use of UINT16_MAX to PG_UINT16_MAX. We can't rely on UINT16_MAX being present, which is why we introduced PG_UINT16_MAX... Buildfarm animal bowerbird via Andrew Gierth.	2015-04-30 00:19:36 +02:00
Andres Freund	5aa2350426	Introduce replication progress tracking infrastructure. When implementing a replication solution ontop of logical decoding, two related problems exist: * How to safely keep track of replication progress * How to change replication behavior, based on the origin of a row; e.g. to avoid loops in bi-directional replication setups The solution to these problems, as implemented here, consist out of three parts: 1) 'replication origins', which identify nodes in a replication setup. 2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in a efficient and crash safe manner. 3) The ability to filter out changes performed on the behest of a replication origin during logical decoding; this allows complex replication topologies. E.g. by filtering all replayed changes out. Most of this could also be implemented in "userspace", e.g. by inserting additional rows contain origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable. This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there's only 2^16 different origins; but now they integrate with logical decoding. Additionally more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit `73c986add`) changing the API is not a problem. For now the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one. Bumps both catversion and wal page magic. Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de	2015-04-29 19:30:53 +02:00
Stephen Frost	dcbf5948e1	Improve qual pushdown for RLS and SB views The original security barrier view implementation, on which RLS is built, prevented all non-leakproof functions from being pushed down to below the view, even when the function was not receiving any data from the view. This optimization improves on that situation by, instead of checking strictly for non-leakproof functions, it checks for Vars being passed to non-leakproof functions and allows functions which do not accept arguments or whose arguments are not from the current query level (eg: constants can be particularly useful) to be pushed down. As discussed, this does mean that a function which is pushed down might gain some idea that there are rows meeting a certain criteria based on the number of times the function is called, but this isn't a particularly new issue and the documentation in rules.sgml already addressed similar covert-channel risks. That documentation is updated to reflect that non-leakproof functions may be pushed down now, if they meet the above-described criteria. Author: Dean Rasheed, with a bit of rework to make things clearer, along with comment and documentation updates from me.	2015-04-27 12:29:42 -04:00
Andres Freund	6aab1f45ac	Fix various typos and grammar errors in comments. Author: Dmitriy Olshevskiy Discussion: 553D00A6.4090205@bk.ru	2015-04-26 18:42:31 +02:00
Peter Eisentraut	cac7658205	Add transforms feature This provides a mechanism for specifying conversions between SQL data types and procedural languages. As examples, there are transforms for hstore and ltree for PL/Perl and PL/Python. reviews by Pavel Stěhule and Andres Freund	2015-04-26 10:33:14 -04:00
Stephen Frost	e89bd02f58	Perform RLS WITH CHECK before constraints, etc The RLS capability is built on top of the WITH CHECK OPTION system which was added for auto-updatable views, however, unlike WCOs on views (which are mandated by the SQL spec to not fire until after all other constraints and checks are done), it makes much more sense for RLS checks to happen earlier than constraint and uniqueness checks. This patch reworks the structure which holds the WCOs a bit to be explicitly either VIEW or RLS checks and the RLS-related checks are done prior to the constraint and uniqueness checks. This also allows better error reporting as we are now reporting when a violation is due to a WITH CHECK OPTION and when it's due to an RLS policy violation, which was independently noted by Craig Ringer as being confusing. The documentation is also updated to include a paragraph about when RLS WITH CHECK handling is performed, as there have been a number of questions regarding that and the documentation was previously silent on the matter. Author: Dean Rasheed, with some kabitzing and comment changes by me.	2015-04-24 20:34:26 -04:00
Heikki Linnakangas	62420ae7d6	Move functions related to index maintenance to separate source file. There is enough code here to deserve a file of their own, not be buried in the middle of execUtils.c.	2015-04-24 09:33:23 +03:00
Stephen Frost	0bf22e0c8b	RLS fixes, new hooks, and new test module In prepend_row_security_policies(), defaultDeny was always true, so if there were any hook policies, the RLS policies on the table would just get discarded. Fixed to start off with defaultDeny as false and then properly set later if we detect that only the default deny policy exists for the internal policies. The infinite recursion detection in fireRIRrules() didn't properly manage the activeRIRs list in the case of WCOs, so it would incorrectly report infinite recusion if the same relation with RLS appeared more than once in the rtable, for example "UPDATE t ... FROM t ...". Further, the RLS expansion code in fireRIRrules() was handling RLS in the main loop through the rtable, which lead to RTEs being visited twice if they contained sublink subqueries, which prepend_row_security_policies() attempted to handle by exiting early if the RTE already had securityQuals. That doesn't work, however, since if the query involved a security barrier view on top of a table with RLS, the RTE would already have securityQuals (from the view) by the time fireRIRrules() was invoked, and so the table's RLS policies would be ignored. This is fixed in fireRIRrules() by handling RLS in a separate loop at the end, after dealing with any other sublink subqueries, thus ensuring that each RTE is only visited once for RLS expansion. The inheritance planner code didn't correctly handle non-target relations with RLS, which would get turned into subqueries during planning. Thus an update of the form "UPDATE t1 ... FROM t2 ..." where t1 has inheritance and t2 has RLS quals would fail. Fix by making sure to copy in and update the securityQuals when they exist for non-target relations. process_policies() was adding WCOs to non-target relations, which is unnecessary, and could lead to a lot of wasted time in the rewriter and the planner. Fix by only adding WCO policies when working on the result relation. Also in process_policies, we should be copying the USING policies to the WITH CHECK policies on a per-policy basis, fix by moving the copying up into the per-policy loop. Lastly, as noted by Dean, we were simply adding policies returned by the hook provided to the list of quals being AND'd, meaning that they would actually restrict records returned and there was no option to have internal policies and hook-based policies work together permissively (as all internal policies currently work). Instead, explicitly add support for both permissive and restrictive policies by having a hook for each and combining the results appropriately. To ensure this is all done correctly, add a new test module (test_rls_hooks) to test the various combinations of internal, permissive, and restrictive hook policies. Largely from Dean Rasheed (thanks!): CAEZATCVmFUfUOwwhnBTcgi6AquyjQ0-1fyKd0T3xBWJvn+xsFA@mail.gmail.com Author: Dean Rasheed, though I added the new hooks and test module.	2015-04-22 12:01:06 -04:00
Andres Freund	cef939c347	Rename pg_replication_slot's new active_in to active_pid. In `d811c037ce` active_in was added but discussion since showed that active_pid is preferred as a name. Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com	2015-04-22 09:43:40 +02:00
Andres Freund	d811c037ce	Add 'active_in' column to pg_replication_slots. Right now it is visible whether a replication slot is active in any session, but not in which. Adding the active_in column, containing the pid of the backend having acquired the slot, makes it much easier to associate pg_replication_slots entries with the corresponding pg_stat_replication/pg_stat_activity row. This should have been done from the start, but I (Andres) dropped the ball there somehow. Author: Craig Ringer, revised by me Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com	2015-04-21 11:51:06 +02:00
Bruce Momjian	f92fc4c95d	pg_upgrade: binary_upgrade_create_empty_extension() is strict Was broken by commit `30982be4e5`. Patch by Jeff Janes	2015-04-17 20:08:42 -04:00
Peter Eisentraut	30982be4e5	Integrate pg_upgrade_support module into backend Previously, these functions were created in a schema "binary_upgrade", which was deleted after pg_upgrade was finished. Because we don't want to keep that schema around permanently, move them to pg_catalog but rename them with a binary_upgrade_... prefix. The provided functions are only small wrappers around global variables that were added specifically for pg_upgrade use, so keeping the module separate does not create any modularity. The functions still check that they are only called in binary upgrade mode, so it is not possible to call these during normal operation. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>	2015-04-14 19:26:37 -04:00
Heikki Linnakangas	b73e7a0716	Oops, fix misspelled #endif I hope this fixes the Windows builfarm failures.	2015-04-14 22:00:52 +03:00
Heikki Linnakangas	3dc2d62d04	Use Intel SSE 4.2 CRC instructions where available. Modern x86 and x86-64 processors with SSE 4.2 support have special instructions, crc32b and crc32q, for calculating CRC-32C. They greatly speed up CRC calculation. Whether the instructions can be used or not depends on the compiler and the target architecture. If generation of SSE 4.2 instructions is allowed for the target (-msse4.2 flag on gcc and clang), use them. If they are not allowed by default, but the compiler supports the -msse4.2 flag to enable them, compile just the CRC-32C function with -msse4.2 flag, and check at runtime whether the processor we're running on supports it. If it doesn't, fall back to the slicing-by-8 algorithm. (With the common defaults on current operating systems, the runtime-check variant is what you get in practice.) Abhijit Menon-Sen, heavily modified by me, reviewed by Andres Freund.	2015-04-14 17:05:03 +03:00
Heikki Linnakangas	4f700bcd20	Reorganize our CRC source files again. Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash. Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.	2015-04-14 17:03:42 +03:00
Heikki Linnakangas	b2a5545bd6	Don't archive bogus recycled or preallocated files after timeline switch. After a timeline switch, we would leave behind recycled WAL segments that are in the future, but on the old timeline. After promotion, and after they become old enough to be recycled again, we would notice that they don't have a .ready or .done file, create a .ready file for them, and archive them. That's bogus, because the files contain garbage, recycled from an older timeline (or prealloced as zeros). We shouldn't archive such files. This could happen when we're following a timeline switch during replay, or when we switch to new timeline at end-of-recovery. To fix, whenever we switch to a new timeline, scan the data directory for WAL segments on the old timeline, but with a higher segment number, and remove them. Those don't belong to our timeline history, and are most likely bogus recycled or preallocated files. They could also be valid files that we streamed from the primary ahead of time, but in any case, they're not needed to recover to the new timeline.	2015-04-13 16:53:49 +03:00
Magnus Hagander	9029f4b374	Add system view pg_stat_ssl This view shows information about all connections, such as if the connection is using SSL, which cipher is used, and which client certificate (if any) is used. Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier	2015-04-12 19:07:46 +02:00
Alvaro Herrera	27846f02c1	Optimize locking a tuple already locked by another subxact Locking and updating the same tuple repeatedly led to some strange multixacts being created which had several subtransactions of the same parent transaction holding locks of the same strength. However, once a subxact of the current transaction holds a lock of a given strength, it's not necessary to acquire the same lock again. This made some coding patterns much slower than required. The fix is twofold. First we change HeapTupleSatisfiesUpdate to return HeapTupleBeingUpdated for the case where the current transaction is already a single-xid locker for the given tuple; it used to return HeapTupleMayBeUpdated for that case. The new logic is simpler, and the change to pgrowlocks is a testament to that: previously we needed to check for the single-xid locker separately in a very ugly way. That test is simpler now. As fallout from the HTSU change, some of its callers need to be amended so that tuple-locked-by-own-transaction is taken into account in the BeingUpdated case rather than the MayBeUpdated case. For many of them there is no difference; but heap_delete() and heap_update now check explicitely and do not grab tuple lock in that case. The HTSU change also means that routine MultiXactHasRunningRemoteMembers introduced in commit `11ac4c73cb` is no longer necessary and can be removed; the case that used to require it is now handled naturally as result of the changes to heap_delete and heap_update. The second part of the fix to the performance issue is to adjust heap_lock_tuple to avoid the slowness: 1. Previously we checked for the case that our own transaction already held a strong enough lock and returned MayBeUpdated, but only in the multixact case. Now we do it for the plain Xid case as well, which saves having to LockTuple. 2. If the current transaction is the only locker of the tuple (but with a lock not as strong as what we need; otherwise it would have been caught in the check mentioned above), we can skip sleeping on the multixact, and instead go straight to create an updated multixact with the additional lock strength. 3. Most importantly, make sure that both the single-xid-locker case and the multixact-locker case optimization are applied always. We do this by checking both in a single place, rather than them appearing in two separate portions of the routine -- something that is made possible by the HeapTupleSatisfiesUpdate API change. Previously we would only check for the single-xid case when HTSU returned MayBeUpdated, and only checked for the multixact case when HTSU returned BeingUpdated. This was at odds with what HTSU actually returned in one case: if our own transaction was locker in a multixact, it returned MayBeUpdated, so the optimization never applied. This is what led to the large multixacts in the first place. Per bug report #8470 by Oskari Saarenmaa.	2015-04-10 13:47:15 -03:00
Alvaro Herrera	e9a077cad3	pg_event_trigger_dropped_objects: add is_temp column It now also reports temporary objects dropped that are local to the backend. Previously we weren't reporting any temp objects because it was deemed unnecessary; but as it turns out, it is necessary if we want to keep close track of DDL command execution inside one session. Temp objects are reported as living in schema pg_temp, which works because such a schema-qualification always refers to the temp objects of the current session.	2015-04-06 11:40:55 -03:00
Alvaro Herrera	4ff695b17d	Add log_min_autovacuum_duration per-table option This is useful to control autovacuum log volume, for situations where monitoring only a set of tables is necessary. Author: Michael Paquier Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa, Taiki Kondo, Huong Dangminh), Fujii Masao.	2015-04-03 11:55:50 -03:00
Fujii Masao	8c8a886268	Add palloc_extended for frontend and backend. This commit also adds pg_malloc_extended for frontend. These interfaces can be used to control at a lower level memory allocation using an interface similar to MemoryContextAllocExtended. For example, the callers can specify MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while allocating the memory and handle a NULL return value. Michael Paquier, reviewed by me.	2015-04-03 17:36:12 +09:00
Robert Haas	abd94bcac4	Use abbreviated keys for faster sorting of numeric datums. Andrew Gierth, reviewed by Peter Geoghegan, with further tweaks by me.	2015-04-02 14:04:26 -04:00
Andres Freund	62e2a8dc2c	Define integer limits independently from the system definitions. In `83ff1618` we defined integer limits iff they're not provided by the system. That turns out not to be the greatest idea because there's different ways some datatypes can be represented. E.g. on OSX PG's 64bit datatype will be a 'long int', but OSX unconditionally uses 'long long'. That disparity then can lead to warnings, e.g. around printf formats. One way to fix that would be to back int64 using stdint.h's int64_t. While a good idea it's not that easy to implement. We would e.g. need to include stdint.h in our external headers, which we don't today. Also computing the correct int64 printf formats in that case is nontrivial. Instead simply prefix the integer limits with PG_ and define them unconditionally. I've adjusted all the references to them in code, but not the ones in comments; the latter seems unnecessary to me. Discussion: 20150331141423.GK4878@alap3.anarazel.de	2015-04-02 17:43:35 +02:00
Robert Haas	4cd639baf4	Revert "psql: fix \connect with URIs and conninfo strings" This reverts commit `fcef161729`, about which both the buildfarm and my local machine are very unhappy.	2015-04-02 10:10:22 -04:00
Alvaro Herrera	fcef161729	psql: fix \connect with URIs and conninfo strings psql was already accepting conninfo strings as the first parameter in \connect, but the way it worked wasn't sane; some of the other parameters would get the previous connection's values, causing it to connect to a completely unexpected server or, more likely, not finding any server at all because of completely wrong combinations of parameters. Fix by explicitely checking for a conninfo-looking parameter in the dbname position; if one is found, use its complete specification rather than mix with the other arguments. Also, change tab-completion to not try to complete conninfo/URI-looking "dbnames" and document that conninfos are accepted as first argument. There was a weak consensus to backpatch this, because while the behavior of using the dbname as a conninfo is nowhere documented for \connect, it is reasonable to expect that it works because it does work in many other contexts. Therefore this is backpatched all the way back to 9.0. To implement this, routines previously private to libpq have been duplicated so that psql can decide what looks like a conninfo/URI string. In back branches, just duplicate the same code all the way back to 9.2, where URIs where introduced; 9.0 and 9.1 have a simpler version. In master, the routines are moved to src/common and renamed. Author: David Fetter, Andrew Dunstan. Some editorialization by me (probably earning a Gierth's "Sloppy" badge in the process.) Reviewers: Andrew Gierth, Erik Rijkers, Pavel Stěhule, Stephen Frost, Robert Haas, Andrew Dunstan.	2015-04-01 20:00:07 -03:00
Heikki Linnakangas	f770870d9e	Move inet/cidr GiST opclass functions to correct place in header file. They were accidentally placed under the GIN heading. Andreas Karlsson	2015-04-01 19:20:45 +03:00
Andrew Dunstan	fa1e5afa8a	Run pg_upgrade and pg_resetxlog with restricted token on Windows As with initdb these programs need to run with a restricted token, and if they don't pg_upgrade will fail when run as a user with Adminstrator privileges. Backpatch to all live branches. On the development branch the code is reorganized so that the restricted token code is now in a single location. On the stable bramches a less invasive change is made by simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c. Patches and bug report from Muhammad Asif Naeem, reviewed by Michael Paquier, slightly edited by me.	2015-03-30 17:07:52 -04:00
Alvaro Herrera	97690ea6e8	Change array_offset to return subscripts, not offsets ... and rename it and its sibling array_offsets to array_position and array_positions, to account for the changed behavior. Having the functions return subscripts better matches existing practice, and is better suited to using the result value as a subscript into the array directly. For one-based arrays, the new definition is identical to what was originally committed. (We use the term "subscript" in the documentation, which is what we use whenever we talk about arrays; but the functions themselves are named using the word "position" to match the standard-defined POSITION() functions.) Author: Pavel Stěhule Behavioral problem noted by Dean Rasheed.	2015-03-30 16:13:21 -03:00
Heikki Linnakangas	0633a60f4d	Add index-only scan support to range type GiST opclass. Andreas Karlsson	2015-03-30 13:22:38 +03:00
Heikki Linnakangas	3a20b0e7b6	Add index-only scan support to inet GiST opclass. Andreas Karlsson	2015-03-28 15:11:53 +02:00
Heikki Linnakangas	55b59eda13	Fix GiST index-only scans for opclasses with different storage type. We cannot use the index's tuple descriptor directly to describe the index tuples returned in an index-only scan. That's because the index might use a different datatype for the values stored on disk than the type originally indexed. As long as they were both pass-by-ref, it worked, but will not work for pass-by-value types of different sizes. I noticed this as a crash when I started hacking a patch to add fetch methods to btree_gist.	2015-03-26 23:07:52 +02:00
Tom Lane	785941cdc3	Tweak __attribute__-wrapping macros for better pgindent results. This improves on commit `bbfd7edae5` by making two simple changes: * pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn(). Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces pgindent's tendency to misformat declarations involving them. * attributes are now always attached to function declarations, not definitions. Previously some places were taking creative shortcuts, which were not merely candidates for bad misformatting by pgindent but often were outright wrong anyway. (It does little good to put a noreturn annotation where callers can't see it.) In any case, if we would like to believe that these macros can be used with non-gcc compilers, we should avoid gratuitous variance in usage patterns. I also went through and manually improved the formatting of a lot of declarations, and got rid of excessively repetitive (and now obsolete anyway) comments informing the reader what pg_attribute_printf is for.	2015-03-26 14:03:25 -04:00
Heikki Linnakangas	d04c8ed904	Add support for index-only scans in GiST. This adds a new GiST opclass method, 'fetch', which is used to reconstruct the original Datum from the value stored in the index. Also, the 'canreturn' index AM interface function gains a new 'attno' argument. That makes it possible to use index-only scans on a multi-column index where some of the opclasses support index-only scans but some do not. This patch adds support in the box and point opclasses. Other opclasses can added later as follow-on patches (btree_gist would be particularly interesting). Anastasia Lubennikova, with additional fixes and modifications by me.	2015-03-26 19:12:00 +02:00
Heikki Linnakangas	8fa393a6d7	Minor cleanup of GiST code, for readability. Remove the gistcentryinit function, inlining the relevant part of it into the only caller.	2015-03-26 19:11:54 +02:00

1 2 3 4 5 ...

6973 commits