This is a followup to commit 43ac12c6e6,
which added regression tests checking that I/O functions of built-in
types are not marked volatile. Complaining in CREATE TYPE should push
developers of add-on types to fix any misdeclared functions in their
types. It's just a warning not an error, to avoid creating upgrade
problems for what might be just cosmetic mis-markings.
Aside from adding the warning code, fix a number of types that were
sloppily created in the regression tests.
VACUUM and ANALYZE update the target table's pg_class row in-place, that is
nontransactionally. This is OK, more or less, for the statistical columns,
which are mostly nontransactional anyhow. It's not so OK for the DDL hint
flags (relhasindex etc), which might get changed in response to
transactional changes that could still be rolled back. This isn't a
problem for VACUUM, since it can't be run inside a transaction block nor
in parallel with DDL on the table. However, we allow ANALYZE inside a
transaction block, so if the transaction had earlier removed the last
index, rule, or trigger from the table, and then we roll back the
transaction after ANALYZE, the table would be left in a corrupted state
with the hint flags not set though they should be.
To fix, suppress the hint-flag updates if we are InTransactionBlock().
This is safe enough because it's always OK to postpone hint maintenance
some more; the worst-case consequence is a few extra searches of pg_index
et al. There was discussion of instead using a transactional update,
but that would change the behavior in ways that are not all desirable:
in most scenarios we're better off keeping ANALYZE's statistical values
even if the ANALYZE itself rolls back. In any case we probably don't want
to change this behavior in back branches.
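For illustration, a sequence of roughly this shape (names hypothetical)
could leave the flags wrong before this fix:

    BEGIN;
    DROP INDEX t_only_idx;  -- drops the table's last index
    ANALYZE t;              -- nontransactionally clears relhasindex
    ROLLBACK;               -- the index comes back, but relhasindex stays false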
Per bug #11638 from Casey Shobe. This has been broken for a good long
time, so back-patch to all supported branches.
Tom Lane and Michael Paquier, initial diagnosis by Andres Freund
Since we taught btree to handle ScalarArrayOpExpr quals natively (commit
9e8da0f757), the planner has always included
ScalarArrayOpExpr quals in index conditions if possible. However, if the
qual is for a non-first index column, this could result in an inferior plan
because we can no longer take advantage of index ordering (cf. commit
807a40c551). It can be better to omit the
ScalarArrayOpExpr qual from the index condition and let it be done as a
filter, so that the output doesn't need to get sorted. Indeed, this is
true for the query introduced as a test case by the latter commit.
To fix, restructure get_index_paths and build_index_paths so that we
consider paths both with and without ScalarArrayOpExpr quals in non-first
index columns. Redesign the API of build_index_paths so that it reports
what it found, saving useless second or third calls.
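For instance, given an index on (a, b), a query of roughly this shape
(names hypothetical) can now get a plan that applies the second-column
qual as a filter and preserves index ordering, avoiding a sort:

    SELECT * FROM t
    WHERE a = 1 AND b IN (1, 2, 3)
    ORDER BY a, b LIMIT 10;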
Report and patch by Andrew Gierth (though rather heavily modified by me).
Back-patch to 9.2 where this code was introduced, since the issue can
result in significant performance regressions compared to plans produced
by 9.1 and earlier.
We have a project policy that I/O functions must not be volatile, as per
commit aab353a60b, but we weren't doing
anything to enforce that. In most usage the marking of the function
doesn't matter as long as its behavior is sane --- but I/O casts can
expose the marking as user-visible behavior, as per today's complaint
from Joe Van Dyk about contrib/ltree.
This test as such will only protect us against future errors in built-in
data types. To catch the same error in contrib or third-party types,
perhaps we should make CREATE TYPE complain? But that's a separate
issue from enforcing the policy for built-in types.
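The committed test amounts to a catalog query of roughly this shape
(a sketch, not the exact regression script), expected to return no rows:

    SELECT t.typname, p.proname
    FROM pg_type t JOIN pg_proc p ON p.oid = t.typinput
    WHERE p.provolatile = 'v';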
If an inline-able SQL function taking a composite argument is used in a
LATERAL subselect, and the composite argument is a lateral reference,
the planner could fail with "variable not found in subplan target list",
as seen in bug #11703 from Karl Bartel. (The outer function call used in
the bug report and in the committed regression test is not really necessary
to provoke the bug --- you can get it if you manually expand the outer
function into "LATERAL (SELECT inner_function(outer_relation))", too.)
The cause of this is that we generate the reltargetlist for the referenced
relation before doing eval_const_expressions() on the lateral sub-select's
expressions (cf find_lateral_references()), so what's scheduled to be
emitted by the referenced relation is a whole-row Var, not the simplified
single-column Var produced by optimizing the function's FieldSelect on the
whole-row Var. Then setrefs.c fails to match up that lateral reference to
what's available from the outer scan.
Preserving the FieldSelect optimization in such cases would require either
major planner restructuring (to recursively do expression simplification
on sub-selects much earlier) or some amazingly ugly kluge to change the
reltargetlist of a possibly-already-planned relation. It seems better
just to skip the optimization when the Var is from an upper query level;
the case is not so common that it's likely anyone will notice a few
wasted cycles.
AFAICT this problem only occurs for uplevel LATERAL references, so
back-patch to 9.3 where LATERAL was added.
Interval precision can only be specified after the "interval" keyword if
no units are specified.
Previously we incorrectly checked the units to see if the precision was
legal, causing confusion.
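For example (illustrative):

    SELECT INTERVAL(2) '10.1234 seconds';   -- precision after the keyword: OK
    SELECT INTERVAL '10.1234' SECOND(2);    -- units given: precision goes there
    SELECT INTERVAL(2) '10.1234' SECOND;    -- rejected: units are specified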
Report by Alvaro Herrera
Up to now, PG has assumed that any given timezone abbreviation (such as
"EDT") represents a constant GMT offset in the usage of any particular
region; we had a way to configure what that offset was, but not for it
to be changeable over time. But, as with most things horological, this
view of the world is too simplistic: there are numerous regions that have
at one time or another switched to a different GMT offset but kept using
the same timezone abbreviation. Almost the entire Russian Federation did
that a few years ago, and later this month they're going to do it again.
And there are similar examples all over the world.
To cope with this, invent the notion of a "dynamic timezone abbreviation",
which is one that is referenced to a particular underlying timezone
(as defined in the IANA timezone database) and means whatever it currently
means in that zone. For zones that use or have used daylight-savings time,
the standard and DST abbreviations continue to have the property that you
can specify standard or DST time and get that time offset whether or not
DST was theoretically in effect at the time. However, the abbreviations
mean what they meant at the time in question (or most recently before that
time) rather than being absolutely fixed.
The standard abbreviation-list files have been changed to use this behavior
for abbreviations that have actually varied in meaning since 1970. The
old simple-numeric definitions are kept for abbreviations that have not
changed, since they are a bit faster to resolve.
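In an abbreviation-list file that looks roughly like this (a sketch of
the format, not verbatim file contents):

    MSK   Europe/Moscow   # dynamic: meaning tracks the IANA zone's history
    EST   -18000          # simple numeric: fixed offset, unchanged since 1970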
While this is clearly a new feature, it seems necessary to back-patch it
into all active branches, because otherwise use of Russian zone
abbreviations is going to become even more problematic than it already was.
This change supersedes the changes in commit 513d06ded et al to modify the
fixed meanings of the Russian abbreviations; since we've not shipped that
yet, this will avoid an undesirably incompatible (not to mention incorrect)
change in behavior for timestamps between 2011 and 2014.
This patch makes some cosmetic changes in ecpglib to keep its usage of
datetime lookup tables as similar as possible to the backend code, but
doesn't do anything about the increasingly obsolete set of timezone
abbreviation definitions that are hard-wired into ecpglib. Whatever we
do about that will likely not be appropriate material for back-patching.
Also, a potential free() of a garbage pointer after an out-of-memory
failure in ecpglib has been fixed.
This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that
caused it to produce unexpected results near a timezone transition, if
both the "before" and "after" states are marked as standard time. We'd
only ever thought about or tested transitions between standard and DST
time, but that's not what's happening when a zone simply redefines its
base GMT offset.
In passing, update the SGML documentation to refer to the Olson/zoneinfo/
zic timezone database as the "IANA" database, since it's now being
maintained under the auspices of IANA.
When determining whether one JSONB object contains another, it's okay to
make a quick exit if the first object has fewer pairs than the second:
because we de-duplicate keys within objects, it is impossible that the
first object has all the keys the second does. However, the code was
applying this rule to JSONB arrays as well, where it does *not* hold
because arrays can contain duplicate entries. The test was really in
the wrong place anyway; we should do it within JsonbDeepContains, where
it can be applied to nested objects, not only to top-level ones.
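For example:

    SELECT '[1, 3]'::jsonb @> '[1, 3, 1, 1]'::jsonb;
    -- true: the duplicates defeat any element-count shortcut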
Report and test cases by Alexander Korotkov; fix by Peter Geoghegan and
Tom Lane.
As of commit a87c72915 (which later got backpatched as far as 9.1),
we're explicitly supporting the notion that append relations can be
nested; this can occur when UNION ALL constructs are nested, or when
a UNION ALL contains a table with inheritance children.
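Such nesting arises from queries of roughly this shape (names
hypothetical):

    SELECT a FROM t1
    UNION ALL
    (SELECT a FROM t2 UNION ALL SELECT a FROM inheritance_parent);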
Bug #11457 from Nelson Page, as well as an earlier report from Elvis
Pranskevichus, showed that there were still nasty bugs associated with such
cases: in particular the EquivalenceClass mechanism could try to generate
"join" clauses connecting an appendrel child to some grandparent appendrel,
which would result in assertion failures or bogus plans.
Upon investigation I concluded that all current callers of
find_childrel_appendrelinfo() need to be fixed to explicitly consider
multiple levels of parent appendrels. The most complex fix was in
processing of "broken" EquivalenceClasses, which are ECs for which we have
been unable to generate all the derived equality clauses we would like to
because of missing cross-type equality operators in the underlying btree
operator family. That code path is more or less entirely untested by
the regression tests to date, because no standard opfamilies have such
holes in them. So I wrote a new regression test script to try to exercise
it a bit, which turned out to be quite a worthwhile activity as it exposed
existing bugs in all supported branches.
The present patch is essentially the same as far back as 9.2, which is
where parameterized paths were introduced. In 9.0 and 9.1, we only need
to back-patch a small fragment of commit 5b7b5518d, which fixes failure to
propagate out the original WHERE clauses when a broken EC contains constant
members. (The regression test case results show that these older branches
are noticeably stupider than 9.2+ in terms of the quality of the plans
generated; but we don't really care about plan quality in such cases,
only that the plan not be outright wrong. A more invasive fix in the
older branches would not be a good idea anyway from a plan-stability
standpoint.)
Per discussion, revert the commit which added 'ignore_nulls' to
row_to_json. This capability would be better added as an independent
function rather than being bolted on to row_to_json. Additionally,
the implementation didn't address complex JSON objects, and so was
incomplete anyway.
Pointed out by Tom and discussed with Andrew and Robert.
Andres pointed out that there was an extra ';' in equalPolicies, which
made me realize that my prior testing with CLOBBER_CACHE_ALWAYS was
insufficient (it didn't always catch the issue, just most of the time).
Thanks to that, a different issue was discovered, specifically in
equalRSDescs. This change corrects equalRSDescs to return 'true' once
all policies have been confirmed logically identical. After stepping
through both functions to ensure correct behavior, I ran this for
about 12 hours of CLOBBER_CACHE_ALWAYS runs of the regression tests
with no failures.
In addition, correct a few typos in the documentation which were pointed
out by Thom Brown (thanks!) and improve the policy documentation further
by adding a fleshed-out usage example based on a unix passwd file.
Lastly, clean up a few comments in the regression tests and pg_dump.h.
Several upcoming performance/scalability improvements require atomic
operations. This new API avoids the need to splatter compiler and
architecture dependent code over all the locations employing atomic
ops.
For several of the potential usages it'd be problematic to maintain
both an atomics-based implementation and one using spinlocks or
similar. In all likelihood one of the implementations would not get
tested regularly under concurrency. To avoid that scenario the new API
provides an automatic fallback of atomic operations to spinlocks. All
properties of atomic operations are maintained. This fallback -
obviously - isn't as fast as just using atomic ops, but it's not bad
either. For one of the future users the atomics-on-top-of-spinlocks
implementation was actually slightly faster than the old purely
spinlock-based implementation. That's important because it reduces the
fear of regressing older platforms when improving the scalability for
new ones.
The API, loosely modeled after the C11 atomics support, currently
provides 'atomic flags' and 32 bit unsigned integers. If the platform
efficiently supports atomic 64 bit unsigned integers those are also
provided.
To implement atomics support for a platform/architecture/compiler,
only 32-bit compare-and-exchange needs to be implemented. If available,
more efficient native support for flags, 32-bit atomic addition, and
the corresponding 64-bit operations may also be provided. Additional
useful atomic operations are implemented generically on top of these.
The implementations for various versions of gcc, msvc and sun studio have
been tested. Additional existing stub implementations for
* Intel icc
* HP-UX aCC
* IBM xlc
are included but have never been tested. These will likely require
fixes based on buildfarm and user feedback.
As atomic operations also require barriers for some operations, the
existing barrier support has been moved into the atomics code.
Author: Andres Freund with contributions from Oskari Saarenmaa
Reviewed-By: Amit Kapila, Robert Haas, Heikki Linnakangas and Álvaro Herrera
Discussion: CA+TgmoYBW+ux5-8Ja=Mcyuy8=VXAnVRHp3Kess6Pn3DMXAPAEA@mail.gmail.com,
20131015123303.GH5300@awork2.anarazel.de,
20131028205522.GI20248@awork2.anarazel.de
When the number of allowed iterations is limited (either a "?" quantifier
or a bound expression), the last sub-match has to reach to the end of the
target string. The previous coding here first tried the shortest possible
match (one character, usually) and then gave up and back-tracked if that
didn't work, typically leading to failure to match overall, as shown in
bug #11478 from Christoph Berg. The minimum change to fix that would be to
not decrement k before "goto backtrack"; but that would be a pretty stupid
solution, because we'd laboriously try each possible sub-match length
before finally discovering that only ending at the end can work. Instead,
force the sub-match endpoint limit up to the end for even the first
shortest() call if we cannot have any more sub-matches after this one.
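As an illustration of the affected class of cases (not the reported
query itself), the final iteration of the non-greedy bound here must
reach the end of the string for the match to be found:

    SELECT regexp_matches('abcd', '^(\w+){1,2}?$');  -- should find a match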
Bug introduced in my rewrite that added the iterdissect logic, commit
173e29aa5d. The shortest-first search code
was too closely modeled on the longest-first code, which hasn't got this
issue since it tries a match reaching to the end to start with anyway.
Back-patch to all affected branches.
While withCheckOption exprs had been handled in many cases by
happenstance, they need to be handled during set_plan_references and
more specifically down in set_plan_refs for ModifyTable plan nodes.
This is to ensure that the opfuncids are set for operators referenced
in the withCheckOption exprs.
Identified as an issue by Thom Brown
Patch by Dean Rasheed
Back-patch to 9.4, where withCheckOption was introduced.
Building on the updatable security-barrier views work, add the
ability to define policies on tables to limit the set of rows
which are returned from a query and which are allowed to be added
to a table. Expressions defined by the policy for filtering are
added to the security barrier quals of the query, while expressions
defined to check records being added to a table are added to the
with-check options of the query.
New top-level commands are CREATE/ALTER/DROP POLICY and are
controlled by the table owner. Row Security is able to be enabled
and disabled by the owner on a per-table basis using
ALTER TABLE .. ENABLE/DISABLE ROW SECURITY.
Per discussion, ROW SECURITY is disabled on tables by default and
must be enabled for policies on the table to be used. If no
policies exist on a table with ROW SECURITY enabled, a default-deny
policy is used and no records will be visible.
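A minimal sketch (names illustrative):

    CREATE TABLE passwd (username text, data text);
    ALTER TABLE passwd ENABLE ROW SECURITY;
    -- each user sees and can modify only his own row
    CREATE POLICY user_rows ON passwd
        USING (username = current_user)
        WITH CHECK (username = current_user);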
By default, row security is applied at all times except for the
table owner and the superuser. A new GUC, row_security, is added
which can be set to ON, OFF, or FORCE. When set to FORCE, row
security will be applied even for the table owner and superusers.
When set to OFF, row security will be disabled when allowed and an
error will be thrown if the user does not have rights to bypass row
security.
Per discussion, pg_dump sets row_security = OFF by default to ensure
that exports and backups will have all data in the table or will
error if there are insufficient privileges to bypass row security.
A new option has been added to pg_dump, --enable-row-security, to
ask pg_dump to export with row security enabled.
A new role capability, BYPASSRLS, which can only be set by the
superuser, is added to allow other users to bypass row security
using row_security = OFF.
Many thanks to the various individuals who have helped with the
design, particularly Robert Haas for his feedback.
Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean
Rasheed, with additional changes and rework by me.
Reviewers have included all of the above, Greg Smith,
Jeff McCormick, and Robert Haas.
The code for raising a NUMERIC value to an integer power wasn't very
careful about large powers. It got an outright wrong answer for an
exponent of INT_MIN, due to failure to consider overflow of the Abs(exp)
operation; which is fixable by using an unsigned rather than signed
exponent value after that point. Also, even though the number of
iterations of the power-computation loop is pretty limited, it's easy for
the repeated squarings to result in ridiculously enormous intermediate
values, which can take unreasonable amounts of time/memory to process,
or even overflow the internal "weight" field and so produce a wrong answer.
We can forestall misbehaviors of that sort by bailing out as soon as the
weight value exceeds what will fit in int16, since then the final answer
must overflow (if exp > 0) or underflow (if exp < 0) the packed numeric
format.
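For example (illustrative; the exact error wording aside):

    SELECT 10::numeric ^ 2147483647;
    -- now fails quickly with a numeric overflow error instead of
    -- grinding through enormous intermediate values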
Per off-list report from Pavel Stehule. Back-patch to all supported
branches.
Provide an option to skip NULL values in a row when generating a JSON
object from that row with row_to_json. This can reduce the size of the
JSON object in cases where columns are NULL without really reducing the
information in the JSON object.
This also makes row_to_json into a single function with default values,
rather than having multiple functions. In passing, change array_to_json
to also be a single function with default values (we don't add an
'ignore_nulls' option yet --- it's not clear that there is a sensible
use-case there, and it hasn't been asked for in any case).
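A hypothetical invocation (the exact argument order is illustrative):

    SELECT row_to_json(r, false, true)   -- pretty = false, ignore_nulls = true
    FROM (SELECT 1 AS a, NULL::int AS b) r;
    -- yields {"a":1} rather than {"a":1,"b":null}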
Pavel Stehule
The code I added in commit f343a880d5 was
careless about preserving AND/OR flatness: it could create a structure with
an OR node directly underneath another one. That breaks an assumption
that's fairly important for planning efficiency, not to mention triggering
various Asserts (as reported by Benjamin Smith). Add a trifle more logic
to handle the case properly.
This provides a convenient method of classifying input values into buckets
that are not necessarily equal-width. It works on any sortable data type.
The choice of function name is a bit debatable, perhaps, but showing that
there's a relationship to the SQL standard's width_bucket() function seems
more attractive than the other proposals.
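For example:

    SELECT width_bucket(5.35, ARRAY[1, 3, 4, 6]::numeric[]);
    -- 3: three thresholds are less than or equal to the operand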
Petr Jelinek, reviewed by Pavel Stehule
The xml type previously rejected "content" that is empty or consists
only of spaces. But the SQL/XML standard allows that, so change that.
The accepted values for XML "documents" are not changed.
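For example, these are now accepted:

    SELECT xmlparse(content '');
    SELECT ''::xml;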
Reviewed-by: Ali Akbar <the.apaan@gmail.com>
The number of % parameter markers in a RAISE statement should match the number
of parameters given. We used to check that at execution time, but we have
all the information needed at compile time, so let's check it at compile
time instead. It's generally better to find mistakes earlier.
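For example (the exact error wording is illustrative):

    DO $$ BEGIN RAISE NOTICE '% of %', 42; END $$;
    -- now fails at compile time: too few parameters specified for RAISE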
Marko Tiikkaja, reviewed by Fabien Coelho
This reverts commit e23014f3d4.
As a side effect of the reverted commit, when a unit was
specified, the reloption was stored in the catalog with the unit.
This broke pg_dump (specifically, it prevented pg_dump from
producing a restorable backup with respect to the reloption) and
turned the buildfarm red. Revert the commit until a fixed
version is ready.
This introduces infrastructure which allows specifying units such as
ms (milliseconds) in an integer relation option, as is already possible
for GUC parameters. Currently only the autovacuum_vacuum_cost_delay
reloption accepts units.
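For example, with this in place (table name hypothetical):

    ALTER TABLE t SET (autovacuum_vacuum_cost_delay = '10ms');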
Reviewed by Michael Paquier
Use SECURITY_LOCAL_USERID_CHANGE while building temporary tables;
only escalate to SECURITY_RESTRICTED_OPERATION while potentially
running user-supplied code. The more secure mode was preventing
temp table creation. Add regression tests to cover this problem.
This fixes Bug #11208 reported by Bruno Emanuel de Andrade Silva.
Backpatch to 9.4, where the bug was introduced.
This enables changing permanent (logged) tables to unlogged and
vice-versa.
(Docs for ALTER TABLE / SET TABLESPACE got shuffled in an order that
hopefully makes more sense than the original.)
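For example:

    ALTER TABLE t SET UNLOGGED;   -- logged to unlogged
    ALTER TABLE t SET LOGGED;     -- and back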
Author: Fabrízio de Royes Mello
Reviewed by: Christoph Berg, Andres Freund, Thom Brown
Some tweaking by Álvaro Herrera
Cause the path extraction operators to return their lefthand input,
not NULL, if the path array has no elements. This seems more consistent
since the case ought to correspond to applying the simple extraction
operator (->) zero times.
Cause other corner cases in field/element/path extraction to return NULL
rather than failing. This behavior is arguably more useful than throwing
an error, since it allows an expression index using these operators to be
built even when not all values in the column are suitable for the
extraction being indexed. Moreover, we already had multiple
inconsistencies between the path extraction operators and the simple
extraction operators, as well as inconsistencies between the JSON and
JSONB code paths. Adopt a uniform rule of returning NULL rather than
throwing an error when the JSON input does not have a structure that
permits the request to be satisfied.
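For example, after this change (illustrative):

    SELECT '{"a": 1}'::jsonb #> '{}';      -- the lefthand input, {"a": 1}
    SELECT '{"a": 1}'::jsonb #> '{b,c}';   -- NULL rather than an error
    SELECT '[1, 2]'::jsonb -> 9;           -- NULL for an out-of-range subscript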
Back-patch to 9.4. Update the release notes to list this as a behavior
change since 9.3.
Cover some cases I omitted before, such as null and empty-string
elements in the path array. This exposes another inconsistency:
json_extract_path complains about empty path elements but
jsonb_extract_path does not.
jsonb's #> operator segfaulted (dereferencing a null pointer) if the RHS
was a zero-length array, as reported in bug #11207 from Justin Van Winkle.
json's #> operator returns NULL in such cases, so for the moment let's
make jsonb act likewise.
Also add a bunch of regression test queries memorializing the -> and #>
operators' behavior for this and other corner cases.
There is a good argument for changing some of these behaviors, as they
are not very consistent with each other, and throwing an error isn't
necessarily a desirable behavior for operators that are likely to be
used in indexes. However, everybody can agree that a core dump is the
Wrong Thing, and we need test cases even if we decide to change their
expected output later.
Make lists of the names of all operators that are claimed to be commutator
pairs or negator pairs. This is analogous to the existing queries that
make lists of all operator names appearing in particular opclass strategy
slots. Unexpected additions to these lists are likely to be mistakes; had
we had these queries in place before, bug #11178 might've been prevented.
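The new queries are of roughly this shape (a sketch, not the exact
script):

    SELECT DISTINCT o1.oprname AS op
    FROM pg_operator o1
    WHERE o1.oprcom <> 0
    ORDER BY 1;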
When a view has a function-returning-composite in FROM, and there are
some dropped columns in the underlying composite type, ruleutils.c
printed junk in the column alias list for the reconstructed FROM entry.
Before 9.3, this was prevented by doing get_rte_attribute_is_dropped
tests while printing the column alias list; but that solution is not
currently available to us for reasons I'll explain below. Instead,
check for empty-string entries in the alias list, which can only exist
if that column position had been dropped at the time the view was made.
(The parser fills in empty strings to preserve the invariant that the
aliases correspond to physical column positions.)
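The problem case looks roughly like this (names hypothetical):

    CREATE TABLE tt (f1 int, f2 int, f3 int);
    ALTER TABLE tt DROP COLUMN f2;
    CREATE FUNCTION ftt() RETURNS SETOF tt
        AS 'SELECT * FROM tt' LANGUAGE sql;
    CREATE VIEW v AS SELECT * FROM ftt() AS f;
    -- pg_get_viewdef() previously printed junk in f's column alias list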
While this is sufficient to handle the case of columns dropped before
the view was made, we have still got issues with columns dropped after
the view was made. In particular, the view could contain Vars that
explicitly reference such columns! The dependency machinery really
ought to refuse the column drop attempt in such cases, as it would do
when trying to drop a table column that's explicitly referenced in
views. However, we currently neglect to store dependencies on columns
of composite types, and fixing that is likely to be too big to be
back-patchable (not to mention that existing views in existing databases
would not have the needed pg_depend entries anyway). So I'll leave that
for a separate patch.
Pre-9.3, ruleutils would print such Vars normally (with their original
column names) even though it suppressed their entries in the RTE's
column alias list. This is certainly bogus, since the printed view
definition would fail to reload, but at least it didn't crash. However,
as of 9.3 the printed column alias list is tightly tied to the names
printed for Vars; so we can't treat columns as dropped for one purpose
and not dropped for the other. This is why we can't just put back the
get_rte_attribute_is_dropped test: it results in an assertion failure
if the view in fact contains any Vars referencing the dropped column.
Once we've got dependencies preventing such cases, we'll probably want
to do it that way instead of relying on the empty-string test used here.
This fix turned up a very ancient bug in outfuncs/readfuncs, namely
that T_String nodes containing empty strings were not dumped/reloaded
correctly: the node was printed as "<>" which is read as a string
value of <>. Since (per SQL) we disallow empty-string identifiers,
such nodes don't occur normally, which is why we'd not noticed.
(Such nodes aren't used for literal constants, just identifiers.)
Per report from Marc Schablewski. Back-patch to 9.3 which is where
the rule printing behavior changed. The dangling-variable case is
broken all the way back, but that's not what his complaint is about.
We can remove a left join to a relation if the relation's output is
provably distinct for the columns involved in the join clause (considering
only equijoin clauses) and the relation supplies no variables needed above
the join. Previously, the join removal logic could only prove distinctness
by reference to unique indexes of a table. This patch extends the logic
to consider subquery relations, wherein distinctness might be proven by
reference to GROUP BY, DISTINCT, etc.
We actually already had some code to check that a subquery's output was
provably distinct, but it was hidden inside pathnode.c; which was a pretty
bad place for it really, since that file is mostly boilerplate Path
construction and comparison. Move that code to analyzejoins.c, which is
arguably a more appropriate location, and is certainly the site of the
new usage for it.
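For example, the join below can now be removed, since s is provably
distinct on x and supplies no output columns (names hypothetical):

    SELECT t1.* FROM t1
    LEFT JOIN (SELECT x FROM t2 GROUP BY x) s ON s.x = t1.x;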
David Rowley, reviewed by Simon Riggs
ExecEvalWholeRowVar incorrectly supposed that it could "bless" the source
TupleTableSlot just once per query. But if the input is coming from an
Append (or, perhaps, other cases?) more than one slot might be returned
over the query run. This led to "record type has not been registered"
errors when a composite datum was extracted from a non-blessed slot.
This bug has been there a long time; I guess it escaped notice because when
dealing with subqueries the planner tends to expand whole-row Vars into
RowExprs, which don't have the same problem. It is possible to trigger
the problem in all active branches, though, as illustrated by the added
regression test.
This command provides an automated way to create foreign table definitions
that match remote tables, thereby reducing tedium and chances for error.
In this patch, we provide the necessary core-server infrastructure and
implement the feature fully in the postgres_fdw foreign-data wrapper.
Other wrappers will throw a "feature not supported" error until/unless
they are updated.
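For example (names hypothetical):

    IMPORT FOREIGN SCHEMA remote_schema
        LIMIT TO (orders, customers)
        FROM SERVER pgserver INTO local_schema;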
Ronan Dunklau and Michael Paquier, additional work by me
While the x output of "select x from t group by x" can be presumed unique,
this does not hold for "select x, generate_series(1,10) from t group by x",
because we may expand the set-returning function after the grouping step.
(Perhaps that should be re-thought; but considering all the other oddities
involved with SRFs in targetlists, it seems unlikely we'll change it.)
Put a check in query_is_distinct_for() so it's not fooled by such cases.
Back-patch to all supported branches.
David Rowley
The "false" case was really quite useless since all it did was to throw
an error; a definition not helped in the least by making it the default.
Instead let's just have the "true" case, which emits nested objects and
arrays in JSON syntax. We might later want to provide the ability to
emit sub-objects in Postgres record or array syntax, but we'd be best off
to drive that off a check of the target field datatype, not a separate
argument.
For the functions newly added in 9.4, we can just remove the flag arguments
outright. We can't do that for json_populate_record[set], which already
existed in 9.3, but we can ignore the argument and always behave as if it
were "true". It helps that the flag arguments were optional and not
documented in any useful fashion anyway.
We can allow this even without any specific knowledge of the semantics
of the window function, so long as pushed-down quals will either accept
every row in a given window partition, or reject every such row. Because
window functions act only within a partition, such a case can't result
in changing the window functions' outputs for any surviving row.
Eliminating entire partitions in this way obviously can reduce the cost
of the window-function computations substantially.
The fly in the ointment is that it's hard to be entirely sure whether
this is true for an arbitrary qual condition. This patch allows pushdown
if (a) the qual references only partitioning columns, and (b) the qual
contains no volatile functions. We are at risk of incorrect results if
the qual can produce different answers for values that the partitioning
equality operator sees as equal. While it's not hard to invent cases
for which that can happen, it seems to seldom be a problem in practice,
since no one has complained about a similar assumption that we've had
for many years with respect to DISTINCT. The potential performance
gains seem to be worth the risk.
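For example, the outer restriction here can now be pushed down, since
it references only the partitioning column and is not volatile (names
hypothetical):

    SELECT * FROM
      (SELECT depname, empno,
              rank() OVER (PARTITION BY depname ORDER BY salary DESC) AS rk
       FROM empsalary) emp
    WHERE depname = 'sales';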
David Rowley, reviewed by Vik Fearing; some credit is due also to
Thomas Mayer who did considerable preliminary investigation.
A WHERE clause applied to the output of a subquery with DISTINCT should
theoretically be applied only once per distinct row; but if we push it
into the subquery then it will be evaluated at each row before duplicate
elimination occurs. If the qual is volatile this can give rise to
observably wrong results, so don't do that.
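For example (illustrative):

    SELECT * FROM (SELECT DISTINCT x FROM t) s
    WHERE s.x > random();
    -- random() must be evaluated after duplicate elimination,
    -- once per distinct row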
While at it, refactor a little bit to allow subquery_is_pushdown_safe
to report more than one kind of restrictive condition without indefinitely
expanding its argument list.
Although this is a bug fix, it seems unwise to back-patch it into released
branches, since it might de-optimize plans for queries that aren't giving
any trouble in practice. So apply to 9.4 but not further back.
Commit a87c729153 already fixed the bug this
is checking for, but the regression test case it added didn't cover this
scenario. Since we managed to miss the fact that there was a bug at all,
it seems like a good idea to propagate the extra test case forward to HEAD.
populate_recordset_object_start() improperly created a new hash table
(overwriting the link to the existing one) if called at nest levels
greater than one. This resulted in previous fields not appearing in
the final output, as reported by Matti Hameister in bug #10728.
In 9.4 the problem also affects json_to_recordset.
This perhaps missed detection earlier because the default behavior is to
throw an error for nested objects: you have to pass use_json_as_text = true
to see the problem.
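A case of roughly this shape (names hypothetical) showed the problem:

    SELECT * FROM json_populate_recordset(NULL::myrow,
        '[{"a": 1, "b": {"x": 2}, "c": 3}]', true);
    -- fields seen before the nested object were lost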
In addition, fix query-lifespan leakage of the hashtable created by
json_populate_record(). This is pretty much the same problem recently
fixed in dblink: creating an intended-to-be-temporary context underneath
the executor's per-tuple context isn't enough to make it go away at the
end of the tuple cycle, because MemoryContextReset is not
MemoryContextResetAndDeleteChildren.
Michael Paquier and Tom Lane
This SQL-standard feature allows a sub-SELECT yielding multiple columns
(but only one row) to be used to compute the new values of several columns
to be updated. While the same results can be had with an independent
sub-SELECT per column, such a workaround can require a great deal of
duplicated computation.
The standard actually says that the source for a multi-column assignment
could be any row-valued expression. The implementation used here is
tightly tied to our existing sub-SELECT support and can't handle other
cases; the Bison grammar would have some issues with them too. However,
I don't feel too bad about this since other cases can be converted into
sub-SELECTs. For instance, "SET (a,b,c) = row_valued_function(x)" could
be written "SET (a,b,c) = (SELECT * FROM row_valued_function(x))".
Memorialize the expected output of the query that libpq has been using for
many years to get the OIDs of large-object support functions. Although
we really ought to change the way libpq does this, we must expect that
this query will remain in use in the field for the foreseeable future,
so until we're ready to break compatibility with old libpq versions
we'd better check the results stay the same. See the recent lo_create()
fiasco.
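The query in question is of roughly this shape (a sketch, not libpq's
exact text):

    SELECT proname, oid FROM pg_catalog.pg_proc
    WHERE proname IN ('lo_open', 'lo_close', 'lo_creat', 'lo_create',
                      'lo_unlink', 'lo_lseek', 'lo_tell',
                      'loread', 'lowrite', 'lo_truncate');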