This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
Instead of having planner.c compute a groupColIdx array and store it in
GroupingSetsPaths, make create_groupingsets_plan() find the grouping
columns by searching in the child plan node's tlist. Although that's
probably a bit slower for create_groupingsets_plan(), it's more like
the way every other plan node type does this, and it provides positive
confirmation that we know which child output columns we're supposed to be
grouping on. (Indeed, looking at this now, I'm not at all sure that it
wasn't broken before, because create_groupingsets_plan() isn't demanding
an exact tlist match from its child node.) Also, this allows substantial
simplification in planner.c, because it no longer needs to compute the
groupColIdx array at all; no other cases were using it.
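A hedged fragment sketching the search-the-child-tlist approach (the
helpers are from optimizer/tlist.h, though exact signatures have varied
across versions; grpColIdx and subplan stand in for local state):

    ListCell   *lc;
    int         ix = 0;

    /* For each grouping clause, find the matching column in the child
     * plan's targetlist instead of trusting a precomputed groupColIdx
     * array; failure here is positive proof that the child tlist does
     * not supply the grouping columns. */
    foreach(lc, parse->groupClause)
    {
        SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
        TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
        TargetEntry *subtle = tlist_member((Node *) tle->expr,
                                           subplan->targetlist);

        if (subtle == NULL)
            elog(ERROR, "grouping column not found in child targetlist");
        grpColIdx[ix++] = subtle->resno;
    }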
I'd intended to put off this refactoring until later (like 9.7), but
in view of the likely bug fix and the need to rationalize planner.c's
tlist handling so we can do something sane with Konstantin Knizhnik's
function-evaluation-postponement patch, I think it can't wait.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
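A hedged sketch of the new pattern, using path-creation functions this
patch introduces (current_rel stands in for the scan/join result rel):

    RelOptInfo *final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
    ListCell   *lc;

    foreach(lc, current_rel->pathlist)
    {
        Path *path = (Path *) lfirst(lc);

        /* add an explicit Sort step only where the input isn't sorted */
        if (!pathkeys_contained_in(root->sort_pathkeys, path->pathkeys))
            path = (Path *) create_sort_path(root, final_rel, path,
                                             root->sort_pathkeys, -1.0);
        add_path(final_rel, path);      /* add_path keeps the winners */
    }
    set_cheapest(final_rel);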
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist). However, there are good
reasons to remove that assumption. For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free". While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node). Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next. Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit. It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.
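In its initial form the struct looks roughly like this (a sketch
matching the description above; later versions add fields):

    typedef struct PathTarget
    {
        List       *exprs;      /* expressions the plan step must emit */
        QualCost    cost;       /* cost of evaluating those expressions */
        int         width;      /* estimated avg width of result tuples */
    } PathTarget;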
While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist. To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)
I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation. Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths. This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more. This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
An extensible node is always tagged T_Extensible, but the extnodename
field identifies it more specifically; it may also include arbitrary
private data. Extensible nodes can be copied, tested for equality,
serialized, and deserialized, but the core system doesn't know
anything about them otherwise. Some extensions may find it useful to
include these nodes in fdw_private or custom_private lists in lieu of
arm-wrestling their data into a format that the core code can
understand.
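A hedged sketch of the resulting API (per nodes/extensible.h; details
abbreviated):

    typedef struct ExtensibleNode
    {
        NodeTag     type;           /* tag identifying an extensible node */
        const char *extnodename;    /* identifies the specific node type */
    } ExtensibleNode;

    typedef struct ExtensibleNodeMethods
    {
        const char *extnodename;
        Size        node_size;
        void      (*nodeCopy) (struct ExtensibleNode *newnode,
                               const struct ExtensibleNode *oldnode);
        bool      (*nodeEqual) (const struct ExtensibleNode *a,
                                const struct ExtensibleNode *b);
        void      (*nodeOut) (struct StringInfoData *str,
                              const struct ExtensibleNode *node);
        void      (*nodeRead) (struct ExtensibleNode *node);
    } ExtensibleNodeMethods;

    extern void RegisterExtensibleNodeMethods(const ExtensibleNodeMethods *m);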
Along the way, so as not to burden the authors of such extensible
node types too much, expose the functions for writing serialized
tokens, and for serializing and deserializing bitmapsets.
KaiGai Kohei, per a design suggested by me. Reviewed by Andres Freund
and by me, and further edited by me.
When force_parallel_mode = true, we enable the parallel mode restrictions
for all queries for which this is believed to be safe. For the subset of
those queries believed to be safe to run entirely within a worker, we spin
up a worker and run the query there instead of running it in the
original process. When force_parallel_mode = regress, make additional
changes to allow the regression tests to run cleanly even though parallel
workers have been injected under the hood.
Taken together, this facilitates both better user testing and better
regression testing of the parallelism code.
Robert Haas, with help from Amit Kapila and Rushabh Lathia.
This patch doesn't put the new infrastructure to use anywhere, and
indeed it's not clear how it could ever be used for something like
postgres_fdw which has to send an SQL query and wait for a reply,
but there might be FDWs or custom scan providers that are CPU-bound,
so let's give them a way to join club parallel.
KaiGai Kohei, reviewed by me.
You can't really do anything useful with this in the form it currently
exists; among other problems, there's no way to reread whatever
information might be produced when the path is output. Work is
underway to replace this with a more useful and more general system of
extensible nodes, but let's start by getting rid of this bit.
Extracted from a larger patch by KaiGai Kohei.
Previously, the foreign join pushdown infrastructure left the question
of security entirely up to individual FDWs, but it would be easy for
a foreign data wrapper to inadvertently open up subtle security holes
that way. So, make it the core code's job to determine which user
mapping OID is relevant, and don't attempt join pushdown unless it's
the same for all relevant relations.
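A hedged sketch of the rule (field names here are illustrative, not
necessarily the committed ones): the core code records a user mapping
for a joinrel only when both inputs agree, and pushdown is considered
only when one is recorded:

    /* illustrative: performed while building the joinrel */
    if (OidIsValid(outer_rel->umid) && inner_rel->umid == outer_rel->umid)
        joinrel->umid = outer_rel->umid;   /* pushdown may be considered */
    else
        joinrel->umid = InvalidOid;        /* never push this join down */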
Per a suggestion from Tom Lane. Shigeru Hanada and Ashutosh Bapat,
reviewed by Etsuro Fujita and KaiGai Kohei, with some further
changes by me.
Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons. It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime. Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself. This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.
I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.
The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems. Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b665.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)
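A hedged sketch of applying the rule (the wrapper name is hypothetical;
makeConst and PG_DETOAST_DATUM are the real primitives):

    /* Hypothetical helper: flatten varlena datums before Const-ifying. */
    static Const *
    make_flattened_const(Oid typid, int32 typmod, Oid collid,
                         int16 typlen, Datum value, bool isnull)
    {
        if (!isnull && typlen == -1)
            value = PointerGetDatum(PG_DETOAST_DATUM(value));
        return makeConst(typid, typmod, collid, typlen,
                         value, isnull, false /* typbyval */);
    }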
The test added in commit 65c3d05e18 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum(). That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.
Back-patch to 9.5 where the expanded-object facility was added. It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.
The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process. Gathering a
partial path produces an ordinary (complete) path. This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side. This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash join paths are generated.
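A hedged sketch of the flow (function names per the 9.6-era planner;
partial_seqscan stands in for a previously built path):

    /* A partial path yields only a subset of the rows per worker ... */
    add_partial_path(rel, (Path *) partial_seqscan);

    /* ... and Gather-ing the cheapest partial path produces an ordinary,
     * complete path that competes in rel->pathlist like any other. */
    generate_gather_paths(root, rel);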
This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.
Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good. So this
patch tries to make some modest improvements in that area.
There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.
Patch by me, reviewed by Dilip Kumar and Amit Kapila.
Aggregate nodes now have two new modes: a "partial" mode where they
output the unfinalized transition state, and a "finalize" mode where
they accept unfinalized transition states rather than individual
values as input.
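A hedged sketch of how the Agg plan node might carry the modes (flag
names as in the 9.6-era sources; unrelated fields elided):

    typedef struct Agg
    {
        Plan        plan;
        AggStrategy aggstrategy;
        bool        combineStates;  /* input is partial transition states */
        bool        finalizeAggs;   /* false => emit unfinalized states */
        /* ... grouping-column fields elided ... */
    } Agg;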
These new modes are not used anywhere yet, but they will be necessary
for parallel aggregation. The infrastructure also figures to be
useful for cases where we want to aggregate local data and remote
data via the FDW interface, and want to bring back partial aggregates
from the remote side that can then be combined with locally generated
partial aggregates to produce the final value. It may also be useful
even when neither FDWs nor parallelism are in play, as explained in
the comments in nodeAgg.c.
David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
Linnakangas, Haribabu Kommi, and me.
This patch reduces pg_am to just two columns, a name and a handler
function. All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function. This is similar to
the designs we've adopted for FDWs and tablesample methods. There
are multiple advantages. For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.
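A hedged sketch of the new convention (IndexAmRoutine per amapi.h; the
callbacks are assumed names, and most fields are elided):

    PG_FUNCTION_INFO_V1(myam_handler);

    /* Handler returning the C struct that replaces the old pg_am columns. */
    Datum
    myam_handler(PG_FUNCTION_ARGS)
    {
        IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

        amroutine->amstrategies = 5;
        amroutine->amcanorder = true;
        amroutine->ambuild = myam_build;        /* plain C functions, */
        amroutine->aminsert = myam_insert;      /* checked by the compiler */
        amroutine->amvalidate = myam_validate;  /* replaces opr_sanity checks */
        /* ... remaining fields elided ... */

        PG_RETURN_POINTER(amroutine);
    }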
A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL. We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.
Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
Omitted boundaries represent the upper or lower limit of the corresponding
array subscript. This allows simpler specification of many common
use-cases.
(Revised version of commit 9246af6799)
Yuriy Zhuravlev
I originally modeled this data structure on SpecialJoinInfo, but after
commit acfcd45cac that looks like a pretty poor decision.
All we really need is relid sets identifying laterally-referenced rels;
and most of the time, what we want to know about includes indirect lateral
references, a case the LateralJoinInfo data was unsuited to compute with
any efficiency. The previous commit redefined RelOptInfo.lateral_relids
as the transitive closure of lateral references, so that it easily supports
checking indirect references. For the places where we really do want just
direct references, add a new RelOptInfo field direct_lateral_relids, which
is easily set up as a copy of lateral_relids before we perform the
transitive closure calculation. Then we can just drop lateral_info_list
and LateralJoinInfo and the supporting code. This makes the planner's
handling of lateral references noticeably more efficient, and shorter too.
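A hedged sketch of the setup order described above:

    /* capture only the direct references first ... */
    rel->direct_lateral_relids = bms_copy(rel->lateral_relids);
    /* ... then lateral_relids is expanded to its transitive closure,
     * so indirect-reference checks become simple bitmap tests */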
Such a change can't be back-patched into stable branches for fear of
breaking extensions that might be looking at the planner's data structures;
but it seems not too late to push it into 9.5, so I've done so.
More fuzz testing by Andreas Seltenreich exposed that the planner did not
cope well with chains of lateral references. If relation X references Y
laterally, and Y references Z laterally, then we will have to scan X on the
inside of a nestloop with Z, so for all intents and purposes X is laterally
dependent on Z too. The planner did not understand this and would generate
intermediate joins that could not be used. While that was usually harmless
except for wasting some planning cycles, under the right circumstances it
would lead to "failed to build any N-way joins" or "could not devise a
query plan" planner failures.
To fix that, convert the existing per-relation lateral_relids and
lateral_referencers relid sets into their transitive closures; that is,
they now show all relations on which a rel is directly or indirectly
laterally dependent. This not only fixes the chained-reference problem
but allows some of the relevant tests to be made substantially simpler
and faster, since they can be reduced to simple bitmap manipulations
instead of searches of the LateralJoinInfo list.
Also, when a PlaceHolderVar that is due to be evaluated at a join contains
lateral references, we should treat those references as indirect lateral
dependencies of each of the join's base relations. This prevents us from
trying to join any individual base relations to the lateral reference
source before the join is formed, which again cannot work.
Andreas' testing also exposed another oversight in the "dangerous
PlaceHolderVar" test added in commit 85e5e222b1. Simply rejecting
unsafe join paths in joinpath.c is insufficient, because in some cases
we will end up rejecting *all* possible paths for a particular join, again
leading to "could not devise a query plan" failures. The restriction has
to be known also to join_is_legal and its cohort functions, so that they
will not select a join for which that will happen. I chose to move the
supporting logic into joinrels.c where the latter functions are.
Back-patch to 9.3 where LATERAL support was introduced.
The original parallel sequential scan commit included only very limited
changes to the EXPLAIN output. Aggregated totals from all workers were
displayed, but there was no way to see what each individual worker did
or to distinguish the effort made by the workers from the effort made by
the leader.
Per a gripe by Thom Brown (and maybe others). Patch by me, reviewed
by Amit Kapila.
The single linked list of memory contexts could result in O(N^2)
performance to free a set of contexts if they were not freed in
reverse order of creation. In many cases the reverse order was
used, but there were some significant exceptions that caused real-
world performance problems. Rather than requiring all callers to
care about the order in which contexts were freed, and hunting down
and changing all existing cases where the wrong order was used, we
add one pointer per memory context so that the implementation
details are not so visible.
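A hedged sketch of the change (per memnodes.h; unrelated fields elided):

    typedef struct MemoryContextData
    {
        /* ... */
        MemoryContext parent;       /* NULL if no parent (toplevel context) */
        MemoryContext firstchild;   /* head of linked list of children */
        MemoryContext prevchild;    /* NEW: previous child of same parent,
                                     * making unlink O(1) instead of O(N) */
        MemoryContext nextchild;    /* next child of same parent */
        /* ... */
    } MemoryContextData;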
Jan Wieck
Commit e7cb7ee145 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join. To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of the scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.
To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method. Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).
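A hedged sketch of the callback's shape (per fdwapi.h):

    /* Return true if the tuple this callback stores into the slot still
     * satisfies the pushed-down quals under EPQ; the FDW may use the
     * attached child plan to recompute the join tuple. */
    typedef bool (*RecheckForeignScan_function) (ForeignScanState *node,
                                                 TupleTableSlot *slot);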
KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.
While convincing myself that commit 7e19db0c09 would solve both of
the problems recently reported by Andreas Seltenreich, I realized that
add_paths_to_joinrel's handling of LATERAL restrictions could be made
noticeably simpler and faster if we were to retain the minimum possible
parameterization for each joinrel (that is, the set of relids supplying
unsatisfied lateral references in it). We already retain that for
baserels, in RelOptInfo.lateral_relids, so we can use that field for
joinrels too.
I re-pgindent'd the files touched here, which affects some unrelated
comments.
This is, I believe, just a minor optimization, not a bug fix, so no
back-patch.
Commit a0d9f6e434 added this support for
all other plan node types; this fills in the gap.
Since TextOutCustomScan complicates this and is pretty well useless,
remove it.
KaiGai Kohei, with some modifications by me.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker. Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation. That's probably
the right decision in most cases, anyway.
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL. The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now. Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
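A hedged sketch of the gating (function name approximate for this era
of the code):

    /* Plain rels deemed parallel-safe get parallel seqscan paths
     * considered alongside the ordinary ones. */
    if (rel->consider_parallel && required_outer == NULL)
        create_parallel_paths(root, rel);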
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0. Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
Patch by me, reviewed in earlier versions by Amit Kapila.
In addition, this patch fills in a number of missing bits and pieces in
the parallel infrastructure. Paths and plans now have a parallel_aware
flag indicating whether whatever parallel-aware logic they have should
be engaged. It is believed that we will need this flag for a number of
path/plan types, not just sequential scans, which is why the flag is
generic rather than part of the SeqScan structures specifically.
Also, execParallel.c now gives parallel nodes a chance to initialize
their PlanState nodes from the DSM during parallel worker startup.
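A hedged sketch of the per-node hooks as they exist for sequential
scans (signatures per the 9.6-era executor):

    /* estimate DSM space, fill it in the leader, attach in each worker */
    void ExecSeqScanEstimate(SeqScanState *node, ParallelContext *pcxt);
    void ExecSeqScanInitializeDSM(SeqScanState *node, ParallelContext *pcxt);
    void ExecSeqScanInitializeWorker(SeqScanState *node, shm_toc *toc);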
Amit Kapila, with a fair amount of adjustment by me. Review of previous
patch versions by Haribabu Kommi and others.
Commit 4a4e6893aa, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. Typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
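A hedged sketch of the reader-side usage (per the 9.6-era tqueue.h; one
reader per queue, with typmod remapping handled inside the reader):

    TupleQueueReader *reader;
    HeapTuple   tup;
    bool        done;

    reader = CreateTupleQueueReader(mq_handle, tupledesc);
    /* nowait = false: block until a tuple arrives or the queue is done */
    tup = TupleQueueReaderNext(reader, false, &done);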
Commit d1b7c1ffe7 introduced a mechanism
for serializing a ParamListInfo structure to be passed to a parallel
worker. However, this mechanism failed to handle external expanded
values, as pointed out by Noah Misch. Repair.
Moreover, plpgsql_param_fetch requires adjustment because the
serialization mechanism needs it to skip evaluating unused parameters
just as we would do when it is called from copyParamList, but params
== estate->paramLI in that case. To fix, make the bms_is_member test
in that function unconditional.
Finally, have setup_param_list set a new ParamListInfo field,
paramMask, to the parameters actually used in the expression, so that
we don't try to fetch those that are not needed when serializing a
parameter list. This isn't necessary for correctness, but it makes
the performance of the parallel executor code comparable to what we
do for cases involving cursors.
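A hedged sketch of that last piece, in plpgsql's setup_param_list (an
expression's paramnos is the set of parameter IDs it actually uses):

    /* serialization will skip any parameter not in paramMask */
    paramLI->paramMask = expr->paramnos;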
Design suggestions and extensive review by Noah Misch. Patch by me.
The original Gather code failed to mark a Gather node as not able to
do projection, yet in fact it could not, even though it did initialize
its projection info via ExecAssignProjectionInfo. There doesn't seem to
be any good reason for this node not to have projection capability,
so clean things up so that it does. Without this, plans using Gather
nodes might need to carry extra Result nodes to do projection.
In the previous coding, before returning from ExecutorRun, we'd shut
down all parallel workers. This was dead wrong if ExecutorRun was
called with a non-zero tuple count; it had the effect of truncating
the query output. To fix, give ExecutePlan control over whether to
enter parallel mode, and have it refuse to do so if the tuple count
is non-zero. Rewrite the Gather logic so that it can cope with being
called outside parallel mode.
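A hedged sketch of the new gating in ExecutePlan, per the description
above (variable names illustrative):

    /* A nonzero tuple count means we may stop early, so parallel mode
     * must be refused; otherwise workers would be shut down before the
     * caller has fetched the whole result. */
    if (numberTuples != 0)
        use_parallel_mode = false;
    if (use_parallel_mode)
        EnterParallelMode();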
Commit 7aea8e4f2d is largely to blame
for this problem, though this patch modifies some subsequently-committed
code which relied on the guarantees it purported to make.
This fixes a long-standing bug which was discovered while investigating
the interaction between the new join pushdown code and the EvalPlanQual
machinery: if a ForeignScan appears on the inner side of a parameterized
nestloop, an EPQ recheck would re-return the original tuple even if
it no longer satisfied the pushed-down quals due to changed parameter
values.
This fix adds a new member to ForeignScan and ForeignScanState and a
new argument to make_foreignscan, and requires changes to FDWs which
push down quals to populate that new argument with a list of quals they
have chosen to push down. Therefore, I'm only back-patching to 9.5,
even though the bug is not new in 9.5.
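A hedged sketch of the resulting call as an FDW's GetForeignPlan might
now issue it (argument names illustrative; fdw_recheck_quals is the new
argument):

    /* quals handled remotely must be passed back so EPQ can recheck them */
    scan_plan = make_foreignscan(tlist,
                                 local_quals,    /* checked locally */
                                 scan_relid,
                                 fdw_exprs,
                                 fdw_private,
                                 fdw_scan_tlist,
                                 remote_quals,   /* fdw_recheck_quals */
                                 outer_plan);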
Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
The WithCheckOptions list in Query is only populated during rewrite and
do not need to be written out or read in as part of a Query structure.
Further, move WithCheckOptions to the bottom and add comments to clarify
that it is only populated during rewrite.
Back-patch to 9.5 with a catversion bump, as we are still in alpha.
To allow users to force RLS to always be applied, even for table owners,
add ALTER TABLE .. FORCE ROW LEVEL SECURITY.
row_security=off overrides FORCE ROW LEVEL SECURITY, to ensure pg_dump
output is complete (by default).
Also add SECURITY_NOFORCE_RLS context to avoid data corruption when
ALTER TABLE .. FORCE ROW LEVEL SECURITY is being used. The
SECURITY_NOFORCE_RLS security context is used only during referential
integrity checks and is only considered in check_enable_rls() after we
have already checked that the current user is the owner of the relation
(which should always be the case during referential integrity checks).
Back-patch to 9.5 where RLS was added.
A Gather executor node runs any number of copies of a plan in an equal
number of workers and merges all of the results into a single tuple
stream. It can also run the plan itself, if the workers are
unavailable or haven't started up yet. It is intended to work with
the Partial Seq Scan node which will be added in future commits.
It could also be used to implement parallel query of a different sort
by itself, without help from Partial Seq Scan, if the single_copy mode
is used. In that mode, a worker executes the plan, and the parallel
leader does not, merely collecting the worker's results. So, a Gather
node could be inserted into a plan to split the execution of that plan
across two processes. Nested Gather nodes aren't currently supported,
but we might want to add support for that in the future.
There's nothing in the planner to actually generate Gather nodes yet,
so it's not quite time to break out the champagne. But we're getting
close.
Amit Kapila. Some design suggestions were provided by me, and I also
reviewed the patch. Single-copy mode, documentation, and other minor
changes also by me.
This code provides infrastructure for a parallel leader to start up
parallel workers to execute subtrees of the plan tree being executed
in the master. User-supplied parameters from ParamListInfo are passed
down, but PARAM_EXEC parameters are not. Various other constructs,
such as initplans, subplans, and CTEs, are also not currently shared.
Nevertheless, there's enough here to support a basic implementation of
parallel query, and we can lift some of the current restrictions as
needed.
Amit Kapila and Robert Haas
ExplainPreScanNode knows how to iterate over a generic tree of plan
states; factor that logic out into a separate walker function so that
other code, such as upcoming patches for parallel query, can also use
it.
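A hedged usage sketch of the factored-out walker:

    /* count every PlanState node in a (possibly parallel) plan tree */
    static bool
    count_planstates(PlanState *planstate, void *context)
    {
        (*(int *) context)++;
        return planstate_tree_walker(planstate, count_planstates, context);
    }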
Patch by me, reviewed by Tom Lane.
Commit 924bcf4f16 introduced a framework
for parallel computation in PostgreSQL that makes most but not all
built-in functions safe to execute in parallel mode. In order to have
parallel query, we'll need to be able to determine whether that query
contains functions (either built-in or user-defined) that cannot be
safely executed in parallel mode. This requires those functions to be
labeled, so this patch introduces an infrastructure for that. Some
functions currently labeled as safe may need to be revised depending on
how pending issues related to heavyweight locking under parallelism
are resolved.
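A hedged sketch of how the labels get consulted (func_parallel() is the
lsyscache accessor for the new pg_proc column; the surrounding walker
is elided):

    /* treat anything not explicitly labeled safe as a parallel hazard */
    if (func_parallel(expr->funcid) != PROPARALLEL_SAFE)
        return true;            /* query can't run in parallel mode */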
Parallel plans can't be used except for the case where the query will
run to completion. If portal execution were suspended, the parallel
mode restrictions would need to remain in effect during that time, but
that might make other queries fail. Therefore, this patch introduces
a framework that enables consideration of parallel plans only when it
is known that the plan will be run to completion. This probably needs
some refinement; for example, at bind time, we do not know whether a
query run via the extended protocol will be run to completion or
run with a limited fetch count. Having the client indicate its
intentions at bind time would constitute a wire protocol break. Some
contexts in which parallel mode would be safe are not adjusted by this
patch; the default is not to try parallel plans except from call sites
that have been updated to say that such plans are OK.
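A hedged sketch of the opt-in from an updated call site:

    /* only sites that know the plan runs to completion pass this flag */
    plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);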
This commit doesn't introduce any parallel paths or plans; it just
provides a way to determine whether they could potentially be used.
I'm committing it on the theory that the remaining parallel sequential
scan patches will also get committed to this release, hopefully in the
not-too-distant future.
Robert Haas and Amit Kapila. Reviewed (in earlier versions) by Noah
Misch.
This refactors rewrite/rowsecurity.c to simplify the handling of the
default deny case (reducing the number of places where we check for and
add the default deny policy from three to one) by splitting up the
retrieval of the policies from the application of them.
This also allowed us to do away with the policy_id field. A policy_name
field was added for WithCheckOption policies and is used in error
reporting, when available.
Patch by Dean Rasheed, with various mostly cosmetic changes by me.
Back-patch to 9.5 where RLS was introduced to avoid unnecessary
differences, since we're still in alpha, per discussion with Robert.
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others. Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.
Author: Julien Rouhaud
Reviewed by: Andres Freund
Since 6fcd885 it is possible to immediately reserve WAL when creating a
slot via pg_create_physical_replication_slot(). Extend the replication
protocol to allow that as well.
Although, in contrast to the SQL interface, it is possible to update the
reserved location via the replication interface, it is still useful
to be able to reserve WAL upon creation there. Otherwise the logic in
ReplicationSlotReserveWal() has to be repeated in slot-employing
clients.
Author: Michael Paquier
Discussion: CAB7nPqT0Wc1W5mdYGeJ_wbutbwNN+3qgrFR64avXaQCiJMGaYA@mail.gmail.com
We had a report from Stefan Kaltenbrunner of a case in which postmaster
log files overran available disk space because multiple backends spewed
enormous context stats dumps upon hitting an out-of-memory condition.
Given the lack of similar reports, this isn't a common problem, but it
still seems worth doing something about. However, we don't want to just
blindly truncate the output, because that might prevent diagnosis of OOM
problems. What seems like a workable compromise is to limit the dump to
100 child contexts per parent, and summarize the space used within any
additional child contexts. That should help because practical cases where
the dump gets long will typically be huge numbers of siblings under the
same parent context; while the additional debugging value from seeing
details about individual siblings beyond 100 will not be large, we hope.
Anyway it doesn't take much code or memory space to do this, so let's try
it like this and see how things go.
Since the summarization mechanism requires passing totals back up anyway,
I took the opportunity to add a "grand total" line to the end of the
printout.
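A hedged sketch of the entry points (the 100-child cap is per the
description above):

    /* existing entry point now dumps at most 100 children per parent,
     * summarizing the rest, and ends with a grand total line */
    MemoryContextStats(TopMemoryContext);
    /* callers wanting a different cap can say so explicitly */
    MemoryContextStatsDetail(TopMemoryContext, 100);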
To avoid confusion, rename CreatePolicyStmt's 'cmd' to 'cmd_name',
parse_policy_command's 'cmd' to 'polcmd', and AlterPolicy's 'cmd_datum'
to 'polcmd_datum', per discussion with Noah and as a follow-up to his
correction of copynodes/equalnodes handling of the CreatePolicyStmt
'cmd' field.
Back-patch to 9.5 where the CreatePolicyStmt was introduced, as we
are still only in alpha.
Until now we computed these Param ID sets at the end of subquery_planner,
but that approach depends on subquery_planner returning a concrete Plan
tree. We would like to switch over to returning one or more Paths for a
subquery, and in that representation the necessary details aren't fully
fleshed out (not to mention that we don't really want to do this work for
Paths that end up getting discarded). Hence, refactor so that we can
compute the param ID sets at the end of planning, just before
set_plan_references is run.
The main change necessary to make this work is that we need to capture
the set of outer-level Param IDs available to the current query level
before exiting subquery_planner, since the outer levels' plan_params lists
are transient. (That's not going to pose a problem for returning Paths,
since all the work involved in producing that data is part of expression
preprocessing, which will continue to happen before Paths are produced.)
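A hedged sketch of where the capture happens (function name per the
9.6-era subselect.c):

    /* near the end of subquery_planner(): record which outer-level
     * Param IDs are visible here, while the outer levels' transient
     * plan_params lists are still valid */
    SS_identify_outer_params(root);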
On the plus side, this change gets rid of several existing kluges.
Eventually I'd like to get rid of SS_finalize_plan altogether in favor of
doing this work during set_plan_references, but that will require some
complex rejiggering because SS_finalize_plan needs to visit subplans and
initplans before the main plan. So leave that idea for another day.
So far we have worked around the fact that some very old compilers do
not support 'inline' functions by only using inline functions
conditionally (or not at all). Since such compilers are very rare by
now, we have decided to rely on inline functions from 9.6 onwards.
To avoid breaking these old compilers, inline is defined away when not
supported. That'll cause "function x defined but not used" type of
warnings, but since nobody develops on such compilers anymore that's
ok.
This change in policy will allow us to more easily employ inline
functions.
I chose to remove code previously conditional on PG_USE_INLINE as it
seemed confusing to have code dependent on a define that's always
defined.
Blacklisting of compilers, like in c53f73879f, now has to be done
differently. A platform template can define PG_FORCE_DISABLE_INLINE to
force inline to be defined empty.
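A hedged sketch of the fallback, c.h-style:

    /* on platforms whose template sets PG_FORCE_DISABLE_INLINE, define
     * inline away; unused static inlines then merely produce "defined
     * but not used" warnings */
    #ifdef PG_FORCE_DISABLE_INLINE
    #undef inline
    #define inline
    #endif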
Discussion: 20150701161447.GB30708@awork2.anarazel.de
If there are two different aggregates in the query with same inputs, and
the aggregates have the same initial condition and transition function,
only calculate the state value once, and only call the final functions
separately. For example, AVG(x) and SUM(x) aggregates have the same
transition function, which accumulates the sum and number of input tuples.
For a query like "SELECT AVG(x), SUM(x) FROM x", we can therefore
calculate the state value only once, which gives a nice speedup.
David Rowley, reviewed and edited by me.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
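A hedged sketch of the handler convention (TsmRoutine per tsmapi.h; the
callbacks are assumed names, and several fields are elided):

    PG_FUNCTION_INFO_V1(mysample_handler);

    /* The only SQL-visible object: a function returning a TsmRoutine. */
    Datum
    mysample_handler(PG_FUNCTION_ARGS)
    {
        TsmRoutine *tsm = makeNode(TsmRoutine);

        tsm->parameterTypes = list_make1_oid(FLOAT4OID);
        tsm->repeatable_across_queries = true;   /* per SQL standard */
        tsm->SampleScanGetSampleSize = mysample_samplesize;
        tsm->NextSampleBlock = mysample_nextblock;   /* assumed callbacks */
        tsm->NextSampleTuple = mysample_nexttuple;

        PG_RETURN_POINTER(tsm);
    }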
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
In passing, also move AT_ReAddComment to a more logical position in the
enum, after all the Constraint-related subcommands.
This fixes a compiler warning, added by commit e42375fc. Backpatch to 9.5,
like that patch.