MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows
a task that would other require multiple PL statements.
e.g.
MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
DO NOTHING;
MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.
MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.
MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows SQL Standard per the most recent SQL:2016.
Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.
This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.
Various issues reported via sqlsmith by Andreas Seltenreich
Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs
Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.comhttps://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
Previously, committing or aborting inside a cursor loop was prohibited
because that would close and remove the cursor. To allow that,
automatically convert such cursors to holdable cursors so they survive
commits or rollbacks. Portals now have a new state "auto-held", which
means they have been converted automatically from pinned. An auto-held
portal is kept on transaction commit or rollback, but is still removed
when returning to the main loop on error.
This supports all languages that have cursor loop constructs: PL/pgSQL,
PL/Python, PL/Perl.
Reviewed-by: Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>
So far, a nested CALL or DO in PL/pgSQL would not establish a context
where transaction control statements were allowed. This fixes that by
handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic
execution context through and doing the required management around
transaction boundaries.
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Originally, we treated memory context names as potentially variable in
all cases, and therefore always copied them into the context header.
Commit 9fa6f00b1 rethought this a little bit and invented a distinction
between fixed and variable names, skipping the copy step for the former.
But we can make things both simpler and more useful by instead allowing
there to be two parts to a context's identification, a fixed "name" and
an optional, variable "ident". The name supplied in the context create
call is now required to be a compile-time-constant string in all cases,
as it is never copied but just pointed to. The "ident" string, if
wanted, is supplied later. This is needed because typically we want
the ident to be stored inside the context so that it's cleaned up
automatically on context deletion; that means it has to be copied into
the context before we can set the pointer.
The cost of this approach is basically just an additional pointer field
in struct MemoryContextData, which isn't much overhead, and is bought
back entirely in the AllocSet case by not needing a headerSize field
anymore, since we no longer have to cope with variable header length.
In addition, we can simplify the internal interfaces for memory context
creation still further, saving a few cycles there. And it's no longer
true that a custom identifier disqualifies a context from participating
in aset.c's freelist scheme, so possibly there's some win on that end.
All the places that were using non-compile-time-constant context names
are adjusted to put the variable info into the "ident" instead. This
allows more effective identification of those contexts in many cases;
for example, subsidary contexts of relcache entries are now identified
by both type (e.g. "index info") and relname, where before you got only
one or the other. Contexts associated with PL function cache entries
are now identified more fully and uniformly, too.
I also arranged for plancache contexts to use the query source string
as their identifier. This is basically free for CachedPlanSources, as
they contained a copy of that string already. We pay an extra pstrdup
to do it for CachedPlans. That could perhaps be avoided, but it would
make things more fragile (since the CachedPlanSource is sometimes
destroyed first). I suspect future improvements in error reporting will
require CachedPlans to have a copy of that string anyway, so it's not
clear that it's worth moving mountains to avoid it now.
This also changes the APIs for context statistics routines so that the
context-specific routines no longer assume that output goes straight
to stderr, nor do they know all details of the output format. This
is useful immediately to reduce code duplication, and it also allows
for external code to do something with stats output that's different
from printing to stderr.
The reason for pushing this now rather than waiting for v12 is that
it rethinks some of the API changes made by commit 9fa6f00b1. Seems
better for extension authors to endure just one round of API changes
not two.
Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
In the case that PostgreSQL uses stdbool.h but Perl doesn't, we need to
prevent Perl from defining bool, to prevent compiler warnings about
redefinition.
For years, our makefiles have correctly observed that "there is no correct
way to write a rule that generates two files". However, what we did is to
provide empty rules that "generate" the secondary output files from the
primary one, and that's not right either. Depending on the details of
the creating process, the primary file might end up timestamped later than
one or more secondary files, causing subsequent make runs to consider the
secondary file(s) out of date. That's harmless in a plain build, since
make will just re-execute the empty rule and nothing happens. But it's
fatal in a VPATH build, since make will expect the secondary file to be
rebuilt in the build directory. This would manifest as "file not found"
failures during VPATH builds from tarballs, if we were ever unlucky enough
to ship a tarball with apparently out-of-date secondary files. (It's not
clear whether that has ever actually happened, but it definitely could.)
To ensure that secondary output files have timestamps >= their primary's,
change our makefile convention to be that we provide a "touch $@" action
not an empty rule. Also, make sure that this rule actually gets invoked
during a distprep run, else the hazard remains.
It's been like this a long time, so back-patch to all supported branches.
In HEAD, I skipped the changes in src/backend/catalog/Makefile, because
those rules are due to get replaced soon in the bootstrap data format
patch, and there seems no need to create a merge issue for that patch.
If for some reason we fail to land that patch in v11, we'll need to
back-fill the changes in that one makefile from v10.
Discussion: https://postgr.es/m/18556.1521668179@sss.pgh.pa.us
Revert the PL/Perl-specific change in
9a95a77d9d. We must not prevent Perl from
using stdbool.h when it has been built to do so, even if it uses an
incompatible size. Otherwise, we would be imposing our bool on Perl,
which will lead to crashes because of the size mismatch.
Instead, we undef bool after including the Perl headers, as we did
previously, but now only if we are not using stdbool.h ourselves.
Record that choice in c.h as USE_STDBOOL. This will also make it easier
to apply that coding pattern elsewhere if necessary.
Using the standard bool type provided by C allows some recent compilers
and debuggers to give better diagnostics. Also, some extension code and
third-party headers are increasingly pulling in stdbool.h, so it's
probably saner if everyone uses the same definition.
But PostgreSQL code is not prepared to handle bool of a size other than
1, so we keep our own old definition if we encounter a stdbool.h with a
bool of a different size. (Among current build farm members, this only
applies to old macOS versions on PowerPC.)
To check that the used bool is of the right size, add a static
assertions about size of GinTernaryValue vs bool. This is currently the
only place that assumes that bool and char are of the same size.
Discussion: https://www.postgresql.org/message-id/flat/3a0fe7e1-5ed1-414b-9230-53bbc0ed1f49@2ndquadrant.com
The test to exit the loop if the integer control value would overflow
an int32 turns out not to work on some ICC versions, as it's dependent
on the assumption that the compiler will execute the code as written
rather than "optimize" it. ICC lacks any equivalent of gcc's -fwrapv
switch, so it was optimizing on the assumption of no integer overflow,
and that breaks this. Rewrite into a form that in fact does not
do any overflowing computations.
Per Tomas Vondra and buildfarm member fulmar. It's been like this
for a long time, although it was not till we added a regression test
case covering the behavior (in commit dd2243f2a) that the problem
became apparent. Back-patch to all supported versions.
Discussion: https://postgr.es/m/50562fdc-0876-9843-c883-15b8566c7511@2ndquadrant.com
Fix the warnings created by the compiler warning options
-Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7. This
is a more aggressive variant of the fixes in
6275f5d28a, which GCC 7 warned about by
default.
The issues are all harmless, but some dubious coding patterns are
cleaned up.
One issue that is of external interest is that BGW_MAXLEN is increased
from 64 to 96. Apparently, the old value would cause the bgw_name of
logical replication workers to be truncated in some circumstances.
But this doesn't actually add those warning options. It appears that
the warnings depend a bit on compilation and optimization options, so it
would be annoying to have to keep up with that. This is more of a
once-in-a-while cleanup.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
In a top-level CALL, the values of INOUT arguments will be returned as a
result row. In PL/pgSQL, the values are assigned back to the input
arguments. In other languages, the same convention as for return a
record from a function is used. That does not require any code changes
in the PL implementations.
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
The new column distinguishes normal functions, procedures, aggregates,
and window functions. This replaces the existing columns proisagg and
proiswindow, and replaces the convention that procedures are indicated
by prorettype == 0. Also change prorettype to be VOIDOID for procedures.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
To support parameters in CALL, move the parse analysis of the procedure
and arguments into the global transformation phase, so that the parser
hooks can be applied. And then at execution time pass the parameters
from ProcessUtility on to ExecuteCallStmt.
Other header files should never #include postgres.h (nor postgres_fe.h,
nor c.h), per project policy. Also, there's no need for any backend .c
file to explicitly include elog.h or palloc.h, because postgres.h pulls
those in already.
Extracted from a larger patch by Kyotaro Horiguchi. The rest of the
removals he suggests require more study, but these are no-brainers.
Discussion: https://postgr.es/m/20180215.200447.209320006.horiguchi.kyotaro@lab.ntt.co.jp
If a plpgsql function is declared to return a domain type, and the domain's
constraints forbid a null value, it was nonetheless possible to return
NULL, because we didn't bother to check the constraints for a null result.
I'd noticed this while fooling with domains-over-composite, but had not
gotten around to fixing it immediately.
Add a regression test script exercising this and various other domain
cases, largely borrowed from the plpython_types test.
Although this is clearly a bug fix, I'm not sure whether anyone would
thank us for changing the behavior in stable branches, so I'm inclined
not to back-patch.
This effectively reverts commit 9edc97b71 (although the test is now
in a different place and has different contents). We don't need that
hack anymore, because since commit 4b93f5799, this test case no longer
throws an error and so there's no displayed CONTEXT that could vary
depending on CLOBBER_CACHE_ALWAYS. The underlying unstable-output
problem isn't really gone, of course, but it no longer manifests here.
The buildfarm's CLOBBER_CACHE_ALWAYS animals aren't happy with some
of the test cases added in commit 4b93f5799. There are two different
problems:
* In two places, a different CONTEXT stack is shown because the error
is detected in a different place, due to recompiling an expression
from scratch rather than re-using a previously cached plan for it.
I fixed these via the expedient of hiding the CONTEXT stack altogether.
* In one place, a test expected to fail (because a cached plan hadn't
been updated) actually succeeds (because the forced recompile makes
it good). I couldn't think of a simple workaround for this, so I've
just commented out that test step altogether.
I have hopes of improving things enough that both of these kluges can
be reverted eventually. The first one is the same kind of problem
previously discussed at
https://postgr.es/m/31545.1512924904@sss.pgh.pa.us
but there was insufficient agreement about how to fix it, so we
just hacked around the output instability (commit 9edc97b71).
The second issue should be fixed by allowing the plan to be rebuilt
when a type conflict is detected. But for today, let's just make the
buildfarm green again.
plpython_error_callback() reported the name of the function associated
with the topmost PL/Python execution context. This was not merely
wrong if there were nested PL/Python contexts, but it risked a core
dump if the topmost one is an inline code block rather than a named
function. That will have proname = NULL, and so we were passing a NULL
pointer to snprintf("%s"). It seems that none of the PL/Python-testing
machines in the buildfarm will dump core for that, but some platforms do,
as reported by Marina Polyakova.
Investigation finds that there actually is an existing regression test
that used to prove that the behavior was wrong, though apparently no one
had noticed that it was printing the wrong function name. It stopped
showing the problem in 9.6 when we adjusted psql to not print CONTEXT
by default for NOTICE messages. The problem is masked (if your platform
avoids the core dump) in error cases, because PL/Python will throw away
the originally generated error info in favor of a new traceback produced
at the outer level.
Repair by using ErrorContextCallback.arg to pass the correct context to
the error callback. Add a regression test illustrating correct behavior.
Back-patch to all supported branches, since they're all broken this way.
Discussion: https://postgr.es/m/156b989dbc6fe7c4d3223cf51da61195@postgrespro.ru
These features were never implemented previously for composite or record
variables ... not that the documentation admitted it, so there's no doc
updates here.
This also fixes some issues concerning enforcing DOMAIN NOT NULL
constraints against plpgsql variables, although I'm not sure that
that topic is completely dealt with.
I created a new plpgsql test file for these features, and moved the
one relevant existing test case into that file.
Tom Lane, reviewed by Daniel Gustafsson
Discussion: https://postgr.es/m/18362.1514605650@sss.pgh.pa.us
Over the years we've accreted quite a few special variables that are
predefined in plpgsql trigger functions. The cost of initializing these
variables to their defined values turns out to be a significant part of
the runtime of simple triggers; but, undoubtedly, most real-world triggers
never examine the values of most of these variables.
To improve matters, invent the notion of a variable that has a "promise"
attached to it, specifying which of the predetermined values should be
assigned to the variable if anything ever reads it. This eliminates all
the unneeded startup overhead, in return for a small penalty on accesses
to these variables.
Tom Lane, reviewed by Pavel Stehule
Discussion: https://postgr.es/m/11986.1514407114@sss.pgh.pa.us
Previously, copy_plpgsql_datum did a separate palloc for each variable
needing instance-local storage. In simple benchmarks this made for a
noticeable fraction of the total runtime. Improve it by precalculating
the space needed for all of a function's variables and doing just one
palloc for all of them.
In passing, remove PLPGSQL_DTYPE_EXPR from the list of plpgsql "datum"
types, since in fact it has nothing in common with the others, and there
is noplace that needs to discriminate on the basis of dtype between an
expression and any type of datum. And add comments clarifying which
datum struct fields are generic and which aren't.
Tom Lane, reviewed by Pavel Stehule
Discussion: https://postgr.es/m/11986.1514407114@sss.pgh.pa.us
Formerly, DTYPE_REC was used only for variables declared as "record";
variables of named composite types used DTYPE_ROW, which is faster for
some purposes but much less flexible. In particular, the ROW code paths
are entirely incapable of dealing with DDL-caused changes to the number
or data types of the columns of a row variable, once a particular plpgsql
function has been parsed for the first time in a session. And, since the
stored representation of a ROW isn't a tuple, there wasn't any easy way
to deal with variables of domain-over-composite types, since the domain
constraint checking code would expect the value to be checked to be a
tuple. A lesser, but still real, annoyance is that ROW format cannot
represent a true NULL composite value, only a row of per-field NULL
values, which is not exactly the same thing.
Hence, switch to using DTYPE_REC for all composite-typed variables,
whether "record", named composite type, or domain over named composite
type. DTYPE_ROW remains but is used only for its native purpose, to
represent a fixed-at-compile-time list of variables, for instance the
targets of an INTO clause.
To accomplish this without taking significant performance losses, introduce
infrastructure that allows storing composite-type variables as "expanded
objects", similar to the "expanded array" infrastructure introduced in
commit 1dc5ebc90. A composite variable's value is thereby kept (most of
the time) in the form of separate Datums, so that field accesses and
updates are not much more expensive than they were in the ROW format.
This holds the line, more or less, on performance of variables of named
composite types in field-access-intensive microbenchmarks, and makes
variables declared "record" perform much better than before in similar
tests. In addition, the logic involved with enforcing composite-domain
constraints against updates of individual fields is in the expanded
record infrastructure not plpgsql proper, so that it might be reusable
for other purposes.
In further support of this, introduce a typcache feature for assigning a
unique-within-process identifier to each distinct tuple descriptor of
interest; in particular, DDL alterations on composite types result in a new
identifier for that type. This allows very cheap detection of the need to
refresh tupdesc-dependent data. This improves on the "tupDescSeqNo" idea
I had in commit 687f096ea: that assigned identifying sequence numbers to
successive versions of individual composite types, but the numbers were not
unique across different types, nor was there support for assigning numbers
to registered record types.
In passing, allow plpgsql functions to accept as well as return type
"record". There was no good reason for the old restriction, and it
was out of step with most of the other PLs.
Tom Lane, reviewed by Pavel Stehule
Discussion: https://postgr.es/m/8962.1514399547@sss.pgh.pa.us
Commit 8561e4840c neglected to handle
older Python versions that don't support the "with" statement. So write
the tests in a way that older versions can handle as well.
In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language. Add similar underlying functions to SPI. Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call. Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.
- SPI
Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags. The only option flag right now is
SPI_OPT_NONATOMIC. A nonatomic SPI connection can execute transaction
control commands, otherwise it's not allowed. This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called. A nonatomic SPI connection uses different
memory management. A normal SPI connection allocates its memory in
TopTransactionContext. For nonatomic connections we use PortalContext
instead. As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.
SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.
- portalmem.c
Some adjustments were made in the code that cleans up portals at
transaction abort. The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.
In AtAbort_Portals(), remove the code that marks an active portal as
failed. As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort. And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception. So the code in AtAbort_Portals() is never used
anyway.
In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much. This mirrors similar code in
PreCommit_Portals().
- PL/Perl
Gets new functions spi_commit() and spi_rollback()
- PL/pgSQL
Gets new commands COMMIT and ROLLBACK.
Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.
- PL/Python
Gets new functions plpy.commit and plpy.rollback.
- PL/Tcl
Gets new commands commit and rollback.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
The previous code converted SPI_processed to a Python float if it didn't
fit into a Python int. But Python longs have unlimited precision, so
use that instead in all cases.
As in eee50a8d4c, we use the Python
LongLong API unconditionally for simplicity.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
We don't actually need two code paths, one for 32 bits and one for 64
bits. Since the existing code already assumed that "long long" is
available, we can just use PyLong_FromLongLong() for 64 bits as well.
In Python 2.5 and later, PyLong_FromLong() and PyLong_FromLongLong() use
the same code, so there will be no difference for 64-bit platforms. In
Python 2.4, the code is different, but performance testing showed no
noticeable difference in PL/Python, and that Python version is ancient
anyway.
Discussion: https://www.postgresql.org/message-id/0a02203c-e157-55b2-464e-6087066a1849@2ndquadrant.com
AclObjectKind was basically just another enumeration for object types,
and we already have a preferred one for that. It's only used in
aclcheck_error. By using ObjectType instead, we can also give some more
precise error messages, for example "index" instead of "relation".
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
PL/pgSQL "pins" internally generated portals so that user code cannot
close them by guessing their names. Add this functionality to PL/Perl
and PL/Python as well, preventing users from manually closing cursors
created by spi_query and plpy.cursor, respectively. (PL/Tcl does not
currently offer any cursor functionality.)
PL/pgSQL "pins" internally generated (unnamed) portals so that user code
cannot close them by guessing their names. This logic is also useful in
other languages and really for any code. So move that logic into SPI.
An unnamed portal obtained through SPI_cursor_open() and related
functions is now automatically pinned, and SPI_cursor_close()
automatically unpins a portal that is pinned.
In the core distribution, this affects PL/Perl and PL/Python, preventing
users from manually closing cursors created by spi_query and
plpy.cursor, respectively. (PL/Tcl does not currently offer any cursor
functionality.)
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
plpgsql's five different loop control statements contained three distinct
implementations of the same (or what ought to be the same, at least)
logic for handling return/exit/continue result codes from their child
statements. At best, that's trouble waiting to happen, and there seems
no very good reason for the coding to be so different. Refactor so that
all the common logic is expressed in a single macro.
Discussion: https://postgr.es/m/26314.1514670401@sss.pgh.pa.us
I noticed that our code coverage report showed considerable deficiency
in test coverage for PL/pgSQL control statements. Notably, both
exec_stmt_block and most of the loop control statements had very poor
coverage of handling of return/exit/continue result codes from their
child statements; and exec_stmt_fori was seriously lacking in feature
coverage, having no test that exercised its BY or REVERSE features,
nor verification that its overflow defenses work.
Now that we have some infrastructure for plpgsql-specific test scripts,
the natural thing to do is make a new script rather than further extend
plpgsql.sql. So I created a new script plpgsql_control.sql with the
charter to test plpgsql control structures, and moved a few existing
tests there because they fell entirely under that charter. I then
added new test cases that exercise the bits of code complained of above.
Of the five kinds of loop statements, only exec_stmt_while's result code
handling is fully exercised by these tests. That would be a deficiency
as things stand, but a follow-on commit will merge the loop statements'
result code handling into one implementation. So testing each usage of
that implementation separately seems redundant.
In passing, also add a couple test cases to plpgsql.sql to more fully
exercise plpgsql's code related to expanded arrays --- I had thought
that area was sufficiently covered already, but the coverage report
showed a couple of un-executed code paths.
Discussion: https://postgr.es/m/26314.1514670401@sss.pgh.pa.us
This patch does three interrelated things:
* Create a new expression execution step type EEOP_PARAM_CALLBACK
and add the infrastructure needed for add-on modules to generate that.
As discussed, the best control mechanism for that seems to be to add
another hook function to ParamListInfo, which will be called by
ExecInitExpr if it's supplied and a PARAM_EXTERN Param is found.
For stand-alone expressions, we add a new entry point to allow the
ParamListInfo to be specified directly, since it can't be retrieved
from the parent plan node's EState.
* Redesign the API for the ParamListInfo paramFetch hook so that the
ParamExternData array can be entirely virtual. This also lets us get rid
of ParamListInfo.paramMask, instead leaving it to the paramFetch hook to
decide which param IDs should be accessible or not. plpgsql_param_fetch
was already doing the identical masking check, so having callers do it too
seemed redundant. While I was at it, I added a "speculative" flag to
paramFetch that the planner can specify as TRUE to avoid unwanted failures.
This solves an ancient problem for plpgsql that it couldn't provide values
of non-DTYPE_VAR variables to the planner for fear of triggering premature
"record not assigned yet" or "field not found" errors during planning.
* Rework plpgsql to get rid of the need for "unshared" parameter lists,
by dint of turning the single ParamListInfo per estate into a nearly
read-only data structure that doesn't instantiate any per-variable data.
Instead, the paramFetch hook controls access to per-variable data and can
make the right decisions on the fly, replacing the cases that we used to
need multiple ParamListInfos for. This might perhaps have been a
performance loss on its own, but by using a paramCompile hook we can
bypass plpgsql_param_fetch entirely during normal query execution.
(It's now only called when, eg, we copy the ParamListInfo into a cursor
portal. copyParamList() or SerializeParamList() effectively instantiate
the virtual parameter array as a simple physical array without a
paramFetch hook, which is what we want in those cases.) This allows
reverting most of commit 6c82d8d1f, though I kept the cosmetic
code-consolidation aspects of that (eg the assign_simple_var function).
Performance testing shows this to be at worst a break-even change,
and it can provide wins ranging up to 20% in test cases involving
accesses to fields of "record" variables. The fact that values of
such variables can now be exposed to the planner might produce wins
in some situations, too, but I've not pursued that angle.
In passing, remove the "parent" pointer from the arguments to
ExecInitExprRec and related functions, instead storing that pointer in a
transient field in ExprState. The ParamListInfo pointer for a stand-alone
expression is handled the same way; we'd otherwise have had to add
yet another recursively-passed-down argument in expression compilation.
Discussion: https://postgr.es/m/32589.1513706441@sss.pgh.pa.us
Various Perl scripts we use to generate files were in the habit of
printing things like "generated by $0" into their output files.
That looks like a fine idea at first glance, but it results in
non-reproducible output, because in VPATH builds $0 won't be just
the name of the script file, but a full path for it. We'd prefer
that you get identical results whether using VPATH or not, so this
is a bad thing.
Some of these places also printed their input file name(s), causing
an additional hazard of the same type.
Hence, establish a policy that thou shalt not print $0, nor input file
pathnames, into output files (they're still allowed in error messages,
though). Instead just write the script name verbatim. While we are at
it, we can make these annotations more useful by giving the script's
full relative path name within the PG source tree, eg instead of
Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.
Not all of the changes made here actually affect any files shipped
in finished tarballs today, but it seems best to apply the policy
everyplace so that nobody copies unsafe code into places where it
could matter.
Christoph Berg and Tom Lane
Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de
Fix a couple of minor oversights in commit 632b03da3: the tests
should be run in database "pl_regression" like the other PLs do,
and we should clean up the tests' output cruft during "make clean".
This patch makes a number of interrelated changes to reduce the overhead
involved in creating/deleting memory contexts. The key ideas are:
* Include the AllocSetContext header of an aset.c context in its first
malloc request, rather than allocating it separately in TopMemoryContext.
This means that we now always create an initial or "keeper" block in an
aset, even if it never receives any allocation requests.
* Create freelists in which we can save and recycle recently-destroyed
asets (this idea is due to Robert Haas).
* In the common case where the name of a context is a constant string,
just store a pointer to it in the context header, rather than copying
the string.
The first change eliminates a palloc/pfree cycle per context, and
also avoids bloat in TopMemoryContext, at the price that creating
a context now involves a malloc/free cycle even if the context never
receives any allocations. That would be a loser for some common
usage patterns, but recycling short-lived contexts via the freelist
eliminates that pain.
Avoiding copying constant strings not only saves strlen() and strcpy()
overhead, but is an essential part of the freelist optimization because
it makes the context header size constant. Currently we make no
attempt to use the freelist for contexts with non-constant names.
(Perhaps someday we'll need to think harder about that, but in current
usage, most contexts with custom names are long-lived anyway.)
The freelist management in this initial commit is pretty simplistic,
and we might want to refine it later --- but in common workloads that
will never matter because the freelists will never get full anyway.
To create a context with a non-constant name, one is now required to
call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME
option. AllocSetContextCreate becomes a wrapper macro, and it includes
a test that will complain about non-string-literal context name
parameters on gcc and similar compilers.
An unfortunate side effect of making AllocSetContextCreate a macro is
that one is now *required* to use the size parameter abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of
writing out individual size parameters no longer works unless you
switch to AllocSetContextCreateExtended.
Internally to the memory-context-related modules, the context creation
APIs are simplified, removing the rather baroque original design whereby
a context-type module called mcxt.c which then called back into the
context-type module. That saved a bit of code duplication, but not much,
and it prevented context-type modules from exercising control over the
allocation of context headers.
In passing, I converted the test-and-elog validation of aset size
parameters into Asserts to save a few more cycles. The original thought
was that callers might compute size parameters on the fly, but in practice
nobody does that, so it's useless to expend cycles on checking those
numbers in production builds.
Also, mark the memory context method-pointer structs "const",
just for cleanliness.
Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
The plpgsql.sql test file in the main regression tests is now by far the
largest after numeric_big, making editing and managing the test cases
very cumbersome. The other PLs have their own test suites split up into
smaller files by topic. It would be nice to have that for plpgsql as
well. So, to get that started, set up test infrastructure in
src/pl/plpgsql/src/ and split out the recently added procedure test
cases into a new file there. That file now mirrors the test cases added
to the other PLs, making managing those matching tests a bit easier too.
msvc build system changes with help from Michael Paquier
After d0aa965c0a, one error path in
PLy_spi_execute_fetch_result() could result in the variable "result"
being dereferenced after being set to NULL. Rearrange the code a bit to
fix that.
Also add another SPI_freetuptable() call so that that is cleared in all
error paths.
discovered by John Naylor <jcnaylor@gmail.com> via scan-build
ideas and review by Tom Lane
If one exits and re-enters a DECLARE ... BEGIN ... END block within a
single execution of a plpgsql function, perhaps due to a surrounding loop,
the declared variables are supposed to get re-initialized to null (or
whatever their initializer is). But this failed to happen for variables
of type "record", because while exec_stmt_block() expected such variables
to be included in the block's initvarnos list, plpgsql_add_initdatums()
only adds DTYPE_VAR variables to that list. This bug appears to have
been there since the aboriginal addition of plpgsql to our tree.
Fix by teaching plpgsql_add_initdatums() to include DTYPE_REC variables
as well. (We don't need to consider other DTYPEs because they don't
represent separately-stored values.) I failed to resist the temptation
to make some nearby cosmetic adjustments, too.
No back-patch, because there have not been field complaints, and it
seems possible that somewhere out there someone has code depending
on the incorrect behavior. In any case this change would have no
impact on correctly-written code.
Discussion: https://postgr.es/m/22994.1512800671@sss.pgh.pa.us
plpgsql's function exec_move_row() handles assignment of a composite
source value to either a PLpgSQL_rec or PLpgSQL_row target variable.
Oddly, rather than taking a single target argument which it could do
run-time type detection on, it was coded to take two separate arguments
(only one of which is allowed to be non-NULL). This choice had then
back-propagated into storing two separate target variables in various
plpgsql statement nodes, with lots of duplicative coding and awkward
interface logic to support that. Simplify matters by folding those
pairs down to single variables, distinguishing the two cases only
where we must ... which turns out to be only in exec_move_row itself.
This is purely refactoring and should not change any behavior.
In passing, remove unused field PLpgSQL_stmt_open.returntype.
Discussion: https://postgr.es/m/11787.1512713374@sss.pgh.pa.us
I'm a little bit astonished that anyone's compiler would have failed to
complain about this. The compiler surely does not know that is_procedure
means the function return value will be ignored.
This adds a new object type "procedure" that is similar to a function
but does not have a return type and is invoked by the new CALL statement
instead of SELECT or similar. This implementation is aligned with the
SQL standard and compatible with or similar to other SQL implementations.
This commit adds new commands CALL, CREATE/ALTER/DROP PROCEDURE, as well
as ALTER/DROP ROUTINE that can refer to either a function or a
procedure (or an aggregate function, as an extension to SQL). There is
also support for procedures in various utility commands such as COMMENT
and GRANT, as well as support in pg_dump and psql. Support for defining
procedures is available in all the languages supplied by the core
distribution.
While this commit is mainly syntax sugar around existing functionality,
future features will rely on having procedures as a separate object
type.
Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndquadrant.com>