1996-07-09 02:22:35 -04:00
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
*
|
1999-02-13 18:22:53 -05:00
|
|
|
* planmain.h
|
1996-07-09 02:22:35 -04:00
|
|
|
* prototypes for various files in optimizer/plan
|
|
|
|
|
*
|
|
|
|
|
*
|
2025-01-01 11:21:55 -05:00
|
|
|
* Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
|
2000-01-26 00:58:53 -05:00
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
1996-07-09 02:22:35 -04:00
|
|
|
*
|
2010-09-20 16:08:53 -04:00
|
|
|
* src/include/optimizer/planmain.h
|
1996-07-09 02:22:35 -04:00
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
#ifndef PLANMAIN_H
|
|
|
|
|
#define PLANMAIN_H
|
|
|
|
|
|
2019-01-29 16:49:25 -05:00
|
|
|
#include "nodes/pathnodes.h"
|
1997-11-25 20:14:33 -05:00
|
|
|
#include "nodes/plannodes.h"
|
1996-07-09 02:22:35 -04:00
|
|
|
|
2008-05-02 17:26:10 -04:00
|
|
|
/* GUC parameters */
|
|
|
|
|
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
|
2022-04-08 08:16:38 -04:00
|
|
|
extern PGDLLIMPORT double cursor_tuple_fraction;
|
Implement Self-Join Elimination
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result. Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures. Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses. Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.
This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;
In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.
SELECT * FROM t1 WHERE x IN
(SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);
Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.
The SJE proof is based on innerrel_is_unique() machinery.
We can remove a self-join when for each outer row:
1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must
match exactly.
The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins. Tests covering this feature
were added to join.sql. Some of the existing regression tests changed due
to self-join removal logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-02-12 17:56:03 -05:00
|
|
|
extern PGDLLIMPORT bool enable_self_join_elimination;
|
2008-05-02 17:26:10 -04:00
|
|
|
|
Postpone creation of pathkeys lists to fix bug #8049.
This patch gets rid of the concept of, and infrastructure for,
non-canonical PathKeys; we now only ever create canonical pathkey lists.
The need for non-canonical pathkeys came from the desire to have
grouping_planner initialize query_pathkeys and related pathkey lists before
calling query_planner. However, since query_planner didn't actually *do*
anything with those lists before they'd been made canonical, we can get rid
of the whole mess by just not creating the lists at all until the point
where we formerly canonicalized them.
There are several ways in which we could implement that without making
query_planner itself deal with grouping/sorting features (which are
supposed to be the province of grouping_planner). I chose to add a
callback function to query_planner's API; other alternatives would have
required adding more fields to PlannerInfo, which while not bad in itself
would create an ABI break for planner-related plugins in the 9.2 release
series. This still breaks ABI for anything that calls query_planner
directly, but it seems somewhat unlikely that there are any such plugins.
I had originally conceived of this change as merely a step on the way to
fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes
that bug all by itself, as per the added regression test. The reason is
that now get_eclass_for_sort_expr is adding the ORDER BY expression at the
end of EquivalenceClass creation not the start, and so anything that is in
a multi-member EquivalenceClass has already been created with correct
em_nullable_relids. I am suspicious that there are related scenarios in
which we still need to teach get_eclass_for_sort_expr to compute correct
nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3
to fix bugs that are only hypothetical. So for the moment, do this and
stop here.
Back-patch to 9.2 but not to earlier branches, since they don't exhibit
this bug for lack of join-clause-movement logic that depends on
em_nullable_relids being correct. (We might have to revisit that choice
if any related bugs turn up.) In 9.2, don't change the signature of
make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as
not to risk more plugin breakage than we have to.
2013-04-29 14:49:01 -04:00
|
|
|
/* query_planner callback to compute query_pathkeys */
|
|
|
|
|
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
|
|
|
|
|
|
1996-07-09 02:22:35 -04:00
|
|
|
/*
|
|
|
|
|
* prototypes for plan/planmain.c
|
|
|
|
|
*/
|
Avoid passing query tlist around separately from root->processed_tlist.
In the dim past, the planner kept the fully-processed version of the query
targetlist (the result of preprocess_targetlist) in grouping_planner's
local variable "tlist", and only grudgingly passed it to individual other
routines as needed. Later we discovered a need to still have it available
after grouping_planner finishes, and invented the root->processed_tlist
field for that purpose, but it wasn't used internally to grouping_planner;
the tlist was still being passed around separately in the same places as
before.
Now comes a proposed patch to allow appendrel expansion to add entries
to the processed tlist, well after preprocess_targetlist has finished
its work. To avoid having to pass around the tlist explicitly, it's
proposed to allow appendrel expansion to modify root->processed_tlist.
That makes aliasing the tlist with assorted parameters and local
variables really scary. It would accidentally work as long as the
tlist is initially nonempty, because then the List header won't move
around, but it's not exactly hard to think of ways for that to break.
Aliased values are poor programming practice anyway.
Hence, get rid of local variables and parameters that can be identified
with root->processed_tlist, in favor of just using that field directly.
And adjust comments to match. (Some of the new comments speak as though
it's already possible for appendrel expansion to modify the tlist; that's
not true yet, but will happen in a later patch.)
Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-27 12:57:41 -04:00
|
|
|
extern RelOptInfo *query_planner(PlannerInfo *root,
|
Simplify query_planner's API by having it return the top-level RelOptInfo.
Formerly, query_planner returned one or possibly two Paths for the topmost
join relation, so that grouping_planner didn't see the join RelOptInfo
(at least not directly; it didn't have any hesitation about examining
cheapest_path->parent, though). However, correct selection of the Paths
involved a significant amount of coupling between query_planner and
grouping_planner, a problem which has gotten worse over time. It seems
best to give up on this API choice and instead return the topmost
RelOptInfo explicitly. Then grouping_planner can pull out the Paths it
wants from the rel's path list. In this way we can remove all knowledge
of grouping behaviors from query_planner.
The only real benefit of the old way is that in the case of an empty
FROM clause, we never made any RelOptInfos at all, just a Path. Now
we have to gin up a dummy RelOptInfo to represent the empty FROM clause.
That's not a very big deal though.
While at it, simplify query_planner's API a bit more by having the caller
set up root->tuple_fraction and root->limit_tuples, rather than passing
those values as separate parameters. Since query_planner no longer does
anything with either value, requiring it to fill the PlannerInfo fields
seemed pretty arbitrary.
This patch just rearranges code; it doesn't (intentionally) change any
behaviors. Followup patches will do more interesting things.
2013-08-05 15:00:57 -04:00
|
|
|
query_pathkeys_callback qp_callback, void *qp_extra);
|
1996-07-09 02:22:35 -04:00
|
|
|
|
2005-04-11 19:06:57 -04:00
|
|
|
/*
|
|
|
|
|
* prototypes for plan/planagg.c
|
|
|
|
|
*/
|
Avoid passing query tlist around separately from root->processed_tlist.
In the dim past, the planner kept the fully-processed version of the query
targetlist (the result of preprocess_targetlist) in grouping_planner's
local variable "tlist", and only grudgingly passed it to individual other
routines as needed. Later we discovered a need to still have it available
after grouping_planner finishes, and invented the root->processed_tlist
field for that purpose, but it wasn't used internally to grouping_planner;
the tlist was still being passed around separately in the same places as
before.
Now comes a proposed patch to allow appendrel expansion to add entries
to the processed tlist, well after preprocess_targetlist has finished
its work. To avoid having to pass around the tlist explicitly, it's
proposed to allow appendrel expansion to modify root->processed_tlist.
That makes aliasing the tlist with assorted parameters and local
variables really scary. It would accidentally work as long as the
tlist is initially nonempty, because then the List header won't move
around, but it's not exactly hard to think of ways for that to break.
Aliased values are poor programming practice anyway.
Hence, get rid of local variables and parameters that can be identified
with root->processed_tlist, in favor of just using that field directly.
And adjust comments to match. (Some of the new comments speak as though
it's already possible for appendrel expansion to modify the tlist; that's
not true yet, but will happen in a later patch.)
Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-27 12:57:41 -04:00
|
|
|
extern void preprocess_minmax_aggregates(PlannerInfo *root);
|
2005-04-11 19:06:57 -04:00
|
|
|
|
1996-07-09 02:22:35 -04:00
|
|
|
/*
|
|
|
|
|
* prototypes for plan/createplan.c
|
|
|
|
|
*/
|
2005-06-05 18:32:58 -04:00
|
|
|
extern Plan *create_plan(PlannerInfo *root, Path *best_path);
|
Revise FDW planning API, again.
Further reflection shows that a single callback isn't very workable if we
desire to let FDWs generate multiple Paths, because that forces the FDW to
do all work necessary to generate a valid Plan node for each Path. Instead
split the former PlanForeignScan API into three steps: GetForeignRelSize,
GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking
the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain,
and it's substantially more flexible for complex FDWs.
Add an fdw_private field to RelOptInfo so that the new functions can save
state there rather than possibly having to recalculate information two or
three times.
In addition, we'd not thought through what would be needed to allow an FDW
to set up subexpressions of its choice for runtime execution. We could
treat ForeignScan.fdw_private as an executable expression but that seems
likely to break existing FDWs unnecessarily (in particular, it would
restrict the set of node types allowable in fdw_private to those supported
by expression_tree_walker). Instead, invent a separate field fdw_exprs
which will receive the postprocessing appropriate for expression trees.
(One field is enough since it can be a list of expressions; also, we assume
the corresponding expression state tree(s) will be held within fdw_state,
so we don't need to add anything to ForeignScanState.)
Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this
further as we continue to work on that patch, but to me it feels a lot
closer to being right now.
2012-03-09 12:48:48 -05:00
|
|
|
extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
|
Code review for foreign/custom join pushdown patch.
Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 included some design
decisions that seem pretty questionable to me, and there was quite a lot
of stuff not to like about the documentation and comments. Clean up
as follows:
* Consider foreign joins only between foreign tables on the same server,
rather than between any two foreign tables with the same underlying FDW
handler function. In most if not all cases, the FDW would simply have had
to apply the same-server restriction itself (far more expensively, both for
lack of caching and because it would be repeated for each combination of
input sub-joins), or else risk nasty bugs. Anyone who's really intent on
doing something outside this restriction can always use the
set_join_pathlist_hook.
* Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist
to better reflect what they're for, and allow these custom scan tlists
to be used even for base relations.
* Change make_foreignscan() API to include passing the fdw_scan_tlist
value, since the FDW is required to set that. Backwards compatibility
doesn't seem like an adequate reason to expect FDWs to set it in some
ad-hoc extra step, and anyway existing FDWs can just pass NIL.
* Change the API of path-generating subroutines of add_paths_to_joinrel,
and in particular that of GetForeignJoinPaths and set_join_pathlist_hook,
so that various less-used parameters are passed in a struct rather than
as separate parameter-list entries. The objective here is to reduce the
probability that future additions to those parameter lists will result in
source-level API breaks for users of these hooks. It's possible that this
is even a small win for the core code, since most CPU architectures can't
pass more than half a dozen parameters efficiently anyway. I kept root,
joinrel, outerrel, innerrel, and jointype as separate parameters to reduce
code churn in joinpath.c --- in particular, putting jointype into the
struct would have been problematic because of the subroutines' habit of
changing their local copies of that variable.
* Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all
right for it to know about IndexOnlyScan, but if the list is to grow
we should refactor the knowledge out to the callers.
* Restore nodeForeignscan.c's previous use of the relcache to avoid
extra GetFdwRoutine lookups for base-relation scans.
* Lots of cleanup of documentation and missed comments. Re-order some
code additions into more logical places.
2015-05-10 14:36:30 -04:00
|
|
|
Index scanrelid, List *fdw_exprs, List *fdw_private,
|
Allow foreign and custom joins to handle EvalPlanQual rechecks.
Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join. To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.
To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method. Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).
KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.
2015-12-08 12:31:03 -05:00
|
|
|
List *fdw_scan_tlist, List *fdw_recheck_quals,
|
|
|
|
|
Plan *outer_plan);
|
2018-12-12 16:08:30 -05:00
|
|
|
extern Plan *change_plan_targetlist(Plan *subplan, List *tlist,
|
|
|
|
|
bool tlist_parallel_safe);
|
2003-03-09 22:53:52 -05:00
|
|
|
extern Plan *materialize_finished_plan(Plan *subplan);
|
Make the upper part of the planner work by generating and comparing Paths.
I've been saying we needed to do this for more than five years, and here it
finally is. This patch removes the ever-growing tangle of spaghetti logic
that grouping_planner() used to use to try to identify the best plan for
post-scan/join query steps. Now, there is (nearly) independent
consideration of each execution step, and entirely separate construction of
Paths to represent each of the possible ways to do that step. We choose
the best Path or set of Paths using the same add_path() logic that's been
used inside query_planner() for years.
In addition, this patch removes the old restriction that subquery_planner()
could return only a single Plan. It now returns a RelOptInfo containing a
set of Paths, just as query_planner() does, and the parent query level can
use each of those Paths as the basis of a SubqueryScanPath at its level.
This allows finding some optimizations that we missed before, wherein a
subquery was capable of returning presorted data and thereby avoiding a
sort in the parent level, making the overall cost cheaper even though
delivering sorted output was not the cheapest plan for the subquery in
isolation. (A couple of regression test outputs change in consequence of
that. However, there is very little change in visible planner behavior
overall, because the point of this patch is not to get immediate planning
benefits but to create the infrastructure for future improvements.)
There is a great deal left to do here. This patch unblocks a lot of
planner work that was basically impractical in the old code structure,
such as allowing FDWs to implement remote aggregation, or rewriting
plan_set_operations() to allow consideration of multiple implementation
orders for set operations. (The latter will likely require a full
rewrite of plan_set_operations(); what I've done here is only to fix it
to return Paths not Plans.) I have also left unfinished some localized
refactoring in createplan.c and planner.c, because it was not necessary
to get this patch to a working state.
Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
2016-03-07 15:58:22 -05:00
|
|
|
extern bool is_projection_capable_path(Path *path);
|
2004-01-17 19:50:03 -05:00
|
|
|
extern bool is_projection_capable_plan(Plan *plan);
|
2016-04-06 11:34:02 -04:00
|
|
|
|
2016-03-12 12:12:59 -05:00
|
|
|
/* External use of these functions is deprecated: */
|
|
|
|
|
extern Sort *make_sort_from_sortclauses(List *sortcls, Plan *lefttree);
|
2016-06-26 14:33:38 -04:00
|
|
|
extern Agg *make_agg(List *tlist, List *qual,
|
|
|
|
|
AggStrategy aggstrategy, AggSplit aggsplit,
|
2019-03-22 07:09:32 -04:00
|
|
|
int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
|
2020-02-27 13:46:58 -05:00
|
|
|
List *groupingSets, List *chain, double dNumGroups,
|
|
|
|
|
Size transitionSpace, Plan *lefttree);
|
2020-04-07 16:22:13 -04:00
|
|
|
extern Limit *make_limit(Plan *lefttree, Node *limitOffset, Node *limitCount,
|
|
|
|
|
LimitOption limitOption, int uniqNumCols,
|
|
|
|
|
AttrNumber *uniqColIdx, Oid *uniqOperators,
|
|
|
|
|
Oid *uniqCollations);
|
1996-07-09 02:22:35 -04:00
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* prototypes for plan/initsplan.c
|
|
|
|
|
*/
|
2022-04-08 08:16:38 -04:00
|
|
|
extern PGDLLIMPORT int from_collapse_limit;
|
|
|
|
|
extern PGDLLIMPORT int join_collapse_limit;
|
2005-12-19 21:30:36 -05:00
|
|
|
|
2005-06-05 18:32:58 -04:00
|
|
|
extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
|
Build "other rels" of appendrel baserels in a separate step.
Up to now, otherrel RelOptInfos were built at the same time as baserel
RelOptInfos, thanks to recursion in build_simple_rel(). However,
nothing in query_planner's preprocessing cares at all about otherrels,
only baserels, so we don't really need to build them until just before
we enter make_one_rel. This has two benefits:
* create_lateral_join_info did a lot of extra work to propagate
lateral-reference information from parents to the correct children.
But if we delay creation of the children till after that, it's
trivial (and much harder to break, too).
* Since we have all the restriction quals correctly assigned to
parent appendrels by this point, it'll be possible to do plan-time
pruning and never make child RelOptInfos at all for partitions that
can be pruned away. That's not done here, but will be later on.
Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen,
Yoshikazu Imai, and David Rowley
Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
2019-03-26 18:21:10 -04:00
|
|
|
extern void add_other_rels_to_query(PlannerInfo *root);
|
2005-06-05 18:32:58 -04:00
|
|
|
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
|
2007-01-20 15:45:41 -05:00
|
|
|
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
|
2022-08-17 15:52:53 -04:00
|
|
|
Relids where_needed);
|
Recalculate where-needed data accurately after a join removal.
Up to now, remove_rel_from_query() has done a pretty shoddy job
of updating our where-needed bitmaps (per-Var attr_needed and
per-PlaceHolderVar ph_needed relid sets). It removed direct mentions
of the to-be-removed baserel and outer join, which is the minimum
amount of effort needed to keep the data structures self-consistent.
But it didn't account for the fact that the removed join ON clause
probably mentioned Vars of other relations, and those Vars might now
not be needed as high up in the join tree as before. It's easy to
show cases where this results in failing to remove a lower outer join
that could also have been removed.
To fix, recalculate the where-needed bitmaps from scratch after
each successful join removal. This sounds expensive, but it seems
to add only negligible planner runtime. (We cheat a little bit
by preserving "relation 0" entries in the bitmaps, allowing us to
skip re-scanning the targetlist and HAVING qual.)
The submitted test case drew attention because we had successfully
optimized away the lower join prior to v16. I suspect that that's
somewhat accidental and there are related cases that were never
optimized before and now can be. I've not tried to come up with
one, though.
Perhaps we should back-patch this into v16 and v17 to repair the
performance regression. However, since it took a year for anyone
to notice the problem, it can't be affecting too many people. Let's
let the patch bake awhile in HEAD, and see if we get more complaints.
Per bug #18627 from Mikaël Gourlaouen. No back-patch for now.
Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org
2024-09-27 16:04:04 -04:00
|
|
|
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
|
|
|
|
|
Relids where_needed);
|
2024-12-11 20:22:15 -05:00
|
|
|
extern void remove_useless_groupby_columns(PlannerInfo *root);
|
2012-08-26 22:48:55 -04:00
|
|
|
extern void find_lateral_references(PlannerInfo *root);
|
Recalculate where-needed data accurately after a join removal.
Up to now, remove_rel_from_query() has done a pretty shoddy job
of updating our where-needed bitmaps (per-Var attr_needed and
per-PlaceHolderVar ph_needed relid sets). It removed direct mentions
of the to-be-removed baserel and outer join, which is the minimum
amount of effort needed to keep the data structures self-consistent.
But it didn't account for the fact that the removed join ON clause
probably mentioned Vars of other relations, and those Vars might now
not be needed as high up in the join tree as before. It's easy to
show cases where this results in failing to remove a lower outer join
that could also have been removed.
To fix, recalculate the where-needed bitmaps from scratch after
each successful join removal. This sounds expensive, but it seems
to add only negligible planner runtime. (We cheat a little bit
by preserving "relation 0" entries in the bitmaps, allowing us to
skip re-scanning the targetlist and HAVING qual.)
The submitted test case drew attention because we had successfully
optimized away the lower join prior to v16. I suspect that that's
somewhat accidental and there are related cases that were never
optimized before and now can be. I've not tried to come up with
one, though.
Perhaps we should back-patch this into v16 and v17 to repair the
performance regression. However, since it took a year for anyone
to notice the problem, it can't be affecting too many people. Let's
let the patch bake awhile in HEAD, and see if we get more complaints.
Per bug #18627 from Mikaël Gourlaouen. No back-patch for now.
Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org
2024-09-27 16:04:04 -04:00
|
|
|
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
|
2012-08-26 22:48:55 -04:00
|
|
|
extern void create_lateral_join_info(PlannerInfo *root);
|
2005-12-19 21:30:36 -05:00
|
|
|
extern List *deconstruct_jointree(PlannerInfo *root);
|
2024-01-23 00:09:18 -05:00
|
|
|
extern bool restriction_is_always_true(PlannerInfo *root,
|
|
|
|
|
RestrictInfo *restrictinfo);
|
|
|
|
|
extern bool restriction_is_always_false(PlannerInfo *root,
|
|
|
|
|
RestrictInfo *restrictinfo);
|
2007-01-20 15:45:41 -05:00
|
|
|
extern void distribute_restrictinfo_to_rels(PlannerInfo *root,
|
|
|
|
|
RestrictInfo *restrictinfo);
|
Fix foreign-key selectivity estimation in the presence of constants.
get_foreign_key_join_selectivity() looks for join clauses that equate
the two sides of the FK constraint. However, if we have a query like
"WHERE fktab.a = pktab.a and fktab.a = 1", it won't find any such join
clause, because equivclass.c replaces the given clauses with "fktab.a
= 1 and pktab.a = 1", which can be enforced at the scan level, leaving
nothing to be done for column "a" at the join level.
We can fix that expectation without much trouble, but then a new problem
arises: applying the foreign-key-based selectivity rule produces a
rowcount underestimate, because we're effectively double-counting the
selectivity of the "fktab.a = 1" clause. So we have to cancel that
selectivity out of the estimate.
To fix, refactor process_implied_equality() so that it can pass back the
new RestrictInfo to its callers in equivclass.c, allowing the generated
"fktab.a = 1" clause to be saved in the EquivalenceClass's ec_derives
list. Then it's not much trouble to dig out the relevant RestrictInfo
when we need to adjust an FK selectivity estimate. (While at it, we
can also remove the expensive use of initialize_mergeclause_eclasses()
to set up the new RestrictInfo's left_ec and right_ec pointers.
The equivclass.c code can set those basically for free.)
This seems like clearly a bug fix, but I'm hesitant to back-patch it,
first because there's some API/ABI risk for extensions and second because
we're usually loath to destabilize plan choices in stable branches.
Per report from Sigrid Ehrenreich.
Discussion: https://postgr.es/m/1019549.1603770457@sss.pgh.pa.us
Discussion: https://postgr.es/m/AM6PR02MB5287A0ADD936C1FA80973E72AB190@AM6PR02MB5287.eurprd02.prod.outlook.com
2020-10-28 11:15:47 -04:00
|
|
|
extern RestrictInfo *process_implied_equality(PlannerInfo *root,
|
|
|
|
|
Oid opno,
|
|
|
|
|
Oid collation,
|
|
|
|
|
Expr *item1,
|
|
|
|
|
Expr *item2,
|
|
|
|
|
Relids qualscope,
|
|
|
|
|
Index security_level,
|
|
|
|
|
bool both_const);
|
Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins. This could result in a malformed plan. The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.
The right value to use is the join level we actually intend to evaluate
the PHV at. We can get that from the ph_eval_at field of the associated
PlaceHolderInfo. However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level. (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.) Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.
The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos. We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter. (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)
Back-patch to v12. The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.
Per report from Stephan Springl.
Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
2021-01-21 15:37:23 -05:00
|
|
|
extern RestrictInfo *build_implied_join_equality(PlannerInfo *root,
|
|
|
|
|
Oid opno,
|
2011-03-19 20:29:08 -04:00
|
|
|
Oid collation,
|
2007-01-20 15:45:41 -05:00
|
|
|
Expr *item1,
|
|
|
|
|
Expr *item2,
|
Fix planning of non-strict equivalence clauses above outer joins.
If a potential equivalence clause references a variable from the nullable
side of an outer join, the planner needs to take care that derived clauses
are not pushed to below the outer join; else they may use the wrong value
for the variable. (The problem arises only with non-strict clauses, since
if an upper clause can be proven strict then the outer join will get
simplified to a plain join.) The planner attempted to prevent this type
of error by checking that potential equivalence clauses aren't
outerjoin-delayed as a whole, but actually we have to check each side
separately, since the two sides of the clause will get moved around
separately if it's treated as an equivalence. Bugs of this type can be
demonstrated as far back as 7.4, even though releases before 8.3 had only
a very ad-hoc notion of equivalence clauses.
In addition, we neglected to account for the possibility that such clauses
might have nonempty nullable_relids even when not outerjoin-delayed; so the
equivalence-class machinery lacked logic to compute correct nullable_relids
values for clauses it constructs. This oversight was harmless before 9.2
because we were only using RestrictInfo.nullable_relids for OR clauses;
but as of 9.2 it could result in pushing constructed equivalence clauses
to incorrect places. (This accounts for bug #7604 from Bill MacArthur.)
Fix the first problem by adding a new test check_equivalence_delay() in
distribute_qual_to_rels, and fix the second one by adding code in
equivclass.c and called functions to set correct nullable_relids for
generated clauses. Although I believe the second part of this is not
currently necessary before 9.2, I chose to back-patch it anyway, partly to
keep the logic similar across branches and partly because it seems possible
we might find other reasons why we need valid values of nullable_relids in
the older branches.
Add regression tests illustrating these problems. In 9.0 and up, also
add test cases checking that we can push constants through outer joins,
since we've broken that optimization before and I nearly broke it again
with an overly simplistic patch for this problem.
2012-10-18 12:28:45 -04:00
|
|
|
Relids qualscope,
|
Improve RLS planning by marking individual quals with security levels.
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows. The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.
To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation. Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so. This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.
This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only. Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.
Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.
Patch by me, reviewed by Stephen Frost and Dean Rasheed.
Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
2017-01-18 12:58:20 -05:00
|
|
|
Index security_level);
|
Recalculate where-needed data accurately after a join removal.
Up to now, remove_rel_from_query() has done a pretty shoddy job
of updating our where-needed bitmaps (per-Var attr_needed and
per-PlaceHolderVar ph_needed relid sets). It removed direct mentions
of the to-be-removed baserel and outer join, which is the minimum
amount of effort needed to keep the data structures self-consistent.
But it didn't account for the fact that the removed join ON clause
probably mentioned Vars of other relations, and those Vars might now
not be needed as high up in the join tree as before. It's easy to
show cases where this results in failing to remove a lower outer join
that could also have been removed.
To fix, recalculate the where-needed bitmaps from scratch after
each successful join removal. This sounds expensive, but it seems
to add only negligible planner runtime. (We cheat a little bit
by preserving "relation 0" entries in the bitmaps, allowing us to
skip re-scanning the targetlist and HAVING qual.)
The submitted test case drew attention because we had successfully
optimized away the lower join prior to v16. I suspect that that's
somewhat accidental and there are related cases that were never
optimized before and now can be. I've not tried to come up with
one, though.
Perhaps we should back-patch this into v16 and v17 to repair the
performance regression. However, since it took a year for anyone
to notice the problem, it can't be affecting too many people. Let's
let the patch bake awhile in HEAD, and see if we get more complaints.
Per bug #18627 from Mikaël Gourlaouen. No back-patch for now.
Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org
2024-09-27 16:04:04 -04:00
|
|
|
extern void rebuild_joinclause_attr_needed(PlannerInfo *root);
|
2016-06-18 15:22:34 -04:00
|
|
|
extern void match_foreign_keys_to_quals(PlannerInfo *root);
|
1996-07-09 02:22:35 -04:00
|
|
|
|
2010-03-28 18:59:34 -04:00
|
|
|
/*
|
|
|
|
|
* prototypes for plan/analyzejoins.c
|
|
|
|
|
*/
|
|
|
|
|
extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
|
2017-05-01 14:53:42 -04:00
|
|
|
extern void reduce_unique_semijoins(PlannerInfo *root);
|
2014-07-15 21:12:43 -04:00
|
|
|
extern bool query_supports_distinctness(Query *query);
|
|
|
|
|
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
|
2017-04-07 22:20:03 -04:00
|
|
|
extern bool innerrel_is_unique(PlannerInfo *root,
|
2018-04-20 16:00:47 -04:00
|
|
|
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
|
2017-05-01 14:53:42 -04:00
|
|
|
JoinType jointype, List *restrictlist, bool force_cache);
|
Implement Self-Join Elimination
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result. Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures. Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses. Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.
This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;
In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.
SELECT * FROM t1 WHERE x IN
(SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);
Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.
The SJE proof is based on innerrel_is_unique() machinery.
We can remove a self-join when for each outer row:
1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must
match exactly.
The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins. Tests covering this feature
were added to join.sql. Some of the existing regression tests changed due
to self-join removal logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-02-12 17:56:03 -05:00
|
|
|
extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
|
|
|
|
|
Relids outerrelids, RelOptInfo *innerrel,
|
|
|
|
|
JoinType jointype, List *restrictlist,
|
|
|
|
|
bool force_cache, List **uclauses);
|
|
|
|
|
extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
|
2010-03-28 18:59:34 -04:00
|
|
|
|
1996-07-09 02:22:35 -04:00
|
|
|
/*
|
|
|
|
|
* prototypes for plan/setrefs.c
|
|
|
|
|
*/
|
2011-09-03 15:35:12 -04:00
|
|
|
extern Plan *set_plan_references(PlannerInfo *root, Plan *plan);
|
2022-04-06 02:45:00 -04:00
|
|
|
extern bool trivial_subqueryscan(SubqueryScan *plan);
|
Account for optimized MinMax aggregates during SS_finalize_plan.
We are capable of optimizing MIN() and MAX() aggregates on indexed
columns into subqueries that exploit the index, rather than the normal
thing of scanning the whole table. When we do this, we replace the
Aggref node(s) with Params referencing subquery outputs. Such Params
really ought to be included in the per-plan-node extParam/allParam
sets computed by SS_finalize_plan. However, we've never done so
up to now because of an ancient implementation choice to perform
that substitution during set_plan_references, which runs after
SS_finalize_plan, so that SS_finalize_plan never sees these Params.
This seems like clearly a bug, yet there have been no field reports
of problems that could trace to it. This may be because the types
of Plan nodes that could contain Aggrefs do not have any of the
rescan optimizations that are controlled by extParam/allParam.
Nonetheless it seems certain to bite us someday, so let's fix it
in a self-contained patch that can be back-patched if we find a
case in which there's a live bug pre-v17.
The cleanest fix would be to perform a separate tree walk to do
these substitutions before SS_finalize_plan runs. That seems
unattractive, first because a whole-tree mutation pass is expensive,
and second because we lack infrastructure for visiting expression
subtrees in a Plan tree, so that we'd need a new function knowing
as much as SS_finalize_plan knows about that. I also considered
swapping the order of SS_finalize_plan and set_plan_references,
but that fell foul of various assumptions that seem tricky to fix.
So the approach adopted here is to teach SS_finalize_plan itself
to check for such Aggrefs. I refactored things a bit in setrefs.c
to avoid having three copies of the code that does that.
Given the lack of any currently-known bug, no test case here.
Discussion: https://postgr.es/m/2391880.1689025003@sss.pgh.pa.us
2023-07-13 16:50:13 -04:00
|
|
|
extern Param *find_minmax_agg_replacement_param(PlannerInfo *root,
|
|
|
|
|
Aggref *aggref);
|
2011-09-03 15:35:12 -04:00
|
|
|
extern void record_plan_function_dependency(PlannerInfo *root, Oid funcid);
|
2019-01-10 14:07:01 -05:00
|
|
|
extern void record_plan_type_dependency(PlannerInfo *root, Oid typid);
|
2022-09-20 16:09:30 -04:00
|
|
|
extern bool extract_query_dependencies_walker(Node *node, PlannerInfo *context);
|
1999-08-20 23:49:17 -04:00
|
|
|
|
1996-07-09 02:22:35 -04:00
|
|
|
#endif /* PLANMAIN_H */
|