Mirror of https://github.com/postgres/postgres.git (synced 2026-03-11 10:45:17 -04:00)
Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that, in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been cached previously.

For certain data sets, this can significantly improve the performance of joins. The best cases for this new node type are join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side. In such cases, a hash join would have to hash values that are never looked up, bloating the hash table and possibly causing it to multi-batch, while a merge join would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large.

Also, hash probes to look up the cache can be much faster than the hash probe in a hash join, as it's common for the result cache's hash table to be much smaller than the hash join's, since the result cache stores only useful tuples rather than all tuples from the inner side of the join. This difference in hash probe performance is most significant when the hash join's hash table no longer fits into the CPU's L3 cache but the result cache's hash table does: the apparently random access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables, and smaller hash tables generally perform better.
The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and, when adding new tuples, if we realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache.

For parameterized nested loop joins, we now consider placing one of these result cache nodes between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven by the expected cache hit ratio. Estimating the cache hit ratio relies on having good distinct estimates for the nested loop's parameters.

For now, the planner will only consider using a result cache for parameterized nested loop joins. This works both for normal joins and for LATERAL-type joins to subqueries. It would be possible to use this new node for other purposes in the future, for example to cache results from correlated subqueries. However, that's not done here due to some difficulties in obtaining a distinct estimate on the outer plan to calculate the estimated cache hit ratio: we currently plan the inner plan before the outer plan, so there is no good way to know whether a result cache would be useful, since we can't estimate the number of times the subplan will be called until the outer plan is generated.

The functionality added here newly introduces a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need estimate_num_groups() in order to estimate the hit ratio of the result cache. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%.
Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, this can significantly reduce the planner's cost for the join. However, it's fairly easy to see that things will go bad when estimate_num_groups() incorrectly returns a value significantly lower than the actual number of distinct values. If that happens, we may use a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimates have been a known source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did before this feature existed. Distinct estimates are also fairly hard to make accurately when several tables have already been joined, or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct values for.

For now, the costing we perform during query planning for result caches puts quite a bit of faith in the distinct estimates being accurate. When they are, we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need either to change the costing to put less trust in the distinct estimates or perhaps even to disable this feature by default. There's always an element of risk when we teach the query planner a new trick: it may decide to use that trick at the wrong time and cause a regression. Users may opt for the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default; it remains to be seen whether we'll keep that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name.
A few people did suggest other names, but no other name seemed to dominate in the brief discussion there was about names. Let's use the beta period to see if the current name pleases enough people. If there's some consensus on a better name, we can change it before the release. Please see the second discussion link below for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
661 lines
16 KiB
C
/*-------------------------------------------------------------------------
 *
 * execAmi.c
 *    miscellaneous executor access method routines
 *
 * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/backend/executor/execAmi.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "access/amapi.h"
#include "access/htup_details.h"
#include "executor/execdebug.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
#include "executor/nodeBitmapAnd.h"
#include "executor/nodeBitmapHeapscan.h"
#include "executor/nodeBitmapIndexscan.h"
#include "executor/nodeBitmapOr.h"
#include "executor/nodeCtescan.h"
#include "executor/nodeCustom.h"
#include "executor/nodeForeignscan.h"
#include "executor/nodeFunctionscan.h"
#include "executor/nodeGather.h"
#include "executor/nodeGatherMerge.h"
#include "executor/nodeGroup.h"
#include "executor/nodeHash.h"
#include "executor/nodeHashjoin.h"
#include "executor/nodeIncrementalSort.h"
#include "executor/nodeIndexonlyscan.h"
#include "executor/nodeIndexscan.h"
#include "executor/nodeLimit.h"
#include "executor/nodeLockRows.h"
#include "executor/nodeMaterial.h"
#include "executor/nodeMergeAppend.h"
#include "executor/nodeMergejoin.h"
#include "executor/nodeModifyTable.h"
#include "executor/nodeNamedtuplestorescan.h"
#include "executor/nodeNestloop.h"
#include "executor/nodeProjectSet.h"
#include "executor/nodeRecursiveunion.h"
#include "executor/nodeResult.h"
#include "executor/nodeResultCache.h"
#include "executor/nodeSamplescan.h"
#include "executor/nodeSeqscan.h"
#include "executor/nodeSetOp.h"
#include "executor/nodeSort.h"
#include "executor/nodeSubplan.h"
#include "executor/nodeSubqueryscan.h"
#include "executor/nodeTableFuncscan.h"
#include "executor/nodeTidrangescan.h"
#include "executor/nodeTidscan.h"
#include "executor/nodeUnique.h"
#include "executor/nodeValuesscan.h"
#include "executor/nodeWindowAgg.h"
#include "executor/nodeWorktablescan.h"
#include "nodes/extensible.h"
#include "nodes/nodeFuncs.h"
#include "nodes/pathnodes.h"
#include "utils/rel.h"
#include "utils/syscache.h"

static bool IndexSupportsBackwardScan(Oid indexid);

/*
 * ExecReScan
 *    Reset a plan node so that its output can be re-scanned.
 *
 * Note that if the plan node has parameters that have changed value,
 * the output might be different from last time.
 */
void
ExecReScan(PlanState *node)
{
    /* If collecting timing stats, update them */
    if (node->instrument)
        InstrEndLoop(node->instrument);

    /*
     * If we have changed parameters, propagate that info.
     *
     * Note: ExecReScanSetParamPlan() can add bits to node->chgParam,
     * corresponding to the output param(s) that the InitPlan will update.
     * Since we make only one pass over the list, that means that an InitPlan
     * can depend on the output param(s) of a sibling InitPlan only if that
     * sibling appears earlier in the list.  This is workable for now given
     * the limited ways in which one InitPlan could depend on another, but
     * eventually we might need to work harder (or else make the planner
     * enlarge the extParam/allParam sets to include the params of depended-on
     * InitPlans).
     */
    if (node->chgParam != NULL)
    {
        ListCell   *l;

        foreach(l, node->initPlan)
        {
            SubPlanState *sstate = (SubPlanState *) lfirst(l);
            PlanState  *splan = sstate->planstate;

            if (splan->plan->extParam != NULL)  /* don't care about child
                                                 * local Params */
                UpdateChangedParamSet(splan, node->chgParam);
            if (splan->chgParam != NULL)
                ExecReScanSetParamPlan(sstate, node);
        }
        foreach(l, node->subPlan)
        {
            SubPlanState *sstate = (SubPlanState *) lfirst(l);
            PlanState  *splan = sstate->planstate;

            if (splan->plan->extParam != NULL)
                UpdateChangedParamSet(splan, node->chgParam);
        }
        /* Well. Now set chgParam for left/right trees. */
        if (node->lefttree != NULL)
            UpdateChangedParamSet(node->lefttree, node->chgParam);
        if (node->righttree != NULL)
            UpdateChangedParamSet(node->righttree, node->chgParam);
    }

    /* Call expression callbacks */
    if (node->ps_ExprContext)
        ReScanExprContext(node->ps_ExprContext);

    /* And do node-type-specific processing */
    switch (nodeTag(node))
    {
        case T_ResultState:
            ExecReScanResult((ResultState *) node);
            break;

        case T_ProjectSetState:
            ExecReScanProjectSet((ProjectSetState *) node);
            break;

        case T_ModifyTableState:
            ExecReScanModifyTable((ModifyTableState *) node);
            break;

        case T_AppendState:
            ExecReScanAppend((AppendState *) node);
            break;

        case T_MergeAppendState:
            ExecReScanMergeAppend((MergeAppendState *) node);
            break;

        case T_RecursiveUnionState:
            ExecReScanRecursiveUnion((RecursiveUnionState *) node);
            break;

        case T_BitmapAndState:
            ExecReScanBitmapAnd((BitmapAndState *) node);
            break;

        case T_BitmapOrState:
            ExecReScanBitmapOr((BitmapOrState *) node);
            break;

        case T_SeqScanState:
            ExecReScanSeqScan((SeqScanState *) node);
            break;

        case T_SampleScanState:
            ExecReScanSampleScan((SampleScanState *) node);
            break;

        case T_GatherState:
            ExecReScanGather((GatherState *) node);
            break;

        case T_GatherMergeState:
            ExecReScanGatherMerge((GatherMergeState *) node);
            break;

        case T_IndexScanState:
            ExecReScanIndexScan((IndexScanState *) node);
            break;

        case T_IndexOnlyScanState:
            ExecReScanIndexOnlyScan((IndexOnlyScanState *) node);
            break;

        case T_BitmapIndexScanState:
            ExecReScanBitmapIndexScan((BitmapIndexScanState *) node);
            break;

        case T_BitmapHeapScanState:
            ExecReScanBitmapHeapScan((BitmapHeapScanState *) node);
            break;

        case T_TidScanState:
            ExecReScanTidScan((TidScanState *) node);
            break;

        case T_TidRangeScanState:
            ExecReScanTidRangeScan((TidRangeScanState *) node);
            break;

        case T_SubqueryScanState:
            ExecReScanSubqueryScan((SubqueryScanState *) node);
            break;

        case T_FunctionScanState:
            ExecReScanFunctionScan((FunctionScanState *) node);
            break;

        case T_TableFuncScanState:
            ExecReScanTableFuncScan((TableFuncScanState *) node);
            break;

        case T_ValuesScanState:
            ExecReScanValuesScan((ValuesScanState *) node);
            break;

        case T_CteScanState:
            ExecReScanCteScan((CteScanState *) node);
            break;

        case T_NamedTuplestoreScanState:
            ExecReScanNamedTuplestoreScan((NamedTuplestoreScanState *) node);
            break;

        case T_WorkTableScanState:
            ExecReScanWorkTableScan((WorkTableScanState *) node);
            break;

        case T_ForeignScanState:
            ExecReScanForeignScan((ForeignScanState *) node);
            break;

        case T_CustomScanState:
            ExecReScanCustomScan((CustomScanState *) node);
            break;

        case T_NestLoopState:
            ExecReScanNestLoop((NestLoopState *) node);
            break;

        case T_MergeJoinState:
            ExecReScanMergeJoin((MergeJoinState *) node);
            break;

        case T_HashJoinState:
            ExecReScanHashJoin((HashJoinState *) node);
            break;

        case T_MaterialState:
            ExecReScanMaterial((MaterialState *) node);
            break;

        case T_ResultCacheState:
            ExecReScanResultCache((ResultCacheState *) node);
            break;

        case T_SortState:
            ExecReScanSort((SortState *) node);
            break;

        case T_IncrementalSortState:
            ExecReScanIncrementalSort((IncrementalSortState *) node);
            break;

        case T_GroupState:
            ExecReScanGroup((GroupState *) node);
            break;

        case T_AggState:
            ExecReScanAgg((AggState *) node);
            break;

        case T_WindowAggState:
            ExecReScanWindowAgg((WindowAggState *) node);
            break;

        case T_UniqueState:
            ExecReScanUnique((UniqueState *) node);
            break;

        case T_HashState:
            ExecReScanHash((HashState *) node);
            break;

        case T_SetOpState:
            ExecReScanSetOp((SetOpState *) node);
            break;

        case T_LockRowsState:
            ExecReScanLockRows((LockRowsState *) node);
            break;

        case T_LimitState:
            ExecReScanLimit((LimitState *) node);
            break;

        default:
            elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
            break;
    }

    if (node->chgParam != NULL)
    {
        bms_free(node->chgParam);
        node->chgParam = NULL;
    }
}

/*
 * ExecMarkPos
 *
 * Marks the current scan position.
 *
 * NOTE: mark/restore capability is currently needed only for plan nodes
 * that are the immediate inner child of a MergeJoin node.  Since MergeJoin
 * requires sorted input, there is never any need to support mark/restore in
 * node types that cannot produce sorted output.  There are some cases in
 * which a node can pass through sorted data from its child; if we don't
 * implement mark/restore for such a node type, the planner compensates by
 * inserting a Material node above that node.
 */
void
ExecMarkPos(PlanState *node)
{
    switch (nodeTag(node))
    {
        case T_IndexScanState:
            ExecIndexMarkPos((IndexScanState *) node);
            break;

        case T_IndexOnlyScanState:
            ExecIndexOnlyMarkPos((IndexOnlyScanState *) node);
            break;

        case T_CustomScanState:
            ExecCustomMarkPos((CustomScanState *) node);
            break;

        case T_MaterialState:
            ExecMaterialMarkPos((MaterialState *) node);
            break;

        case T_SortState:
            ExecSortMarkPos((SortState *) node);
            break;

        case T_ResultState:
            ExecResultMarkPos((ResultState *) node);
            break;

        default:
            /* don't make hard error unless caller asks to restore... */
            elog(DEBUG2, "unrecognized node type: %d", (int) nodeTag(node));
            break;
    }
}

/*
 * ExecRestrPos
 *
 * restores the scan position previously saved with ExecMarkPos()
 *
 * NOTE: the semantics of this are that the first ExecProcNode following
 * the restore operation will yield the same tuple as the first one following
 * the mark operation.  It is unspecified what happens to the plan node's
 * result TupleTableSlot.  (In most cases the result slot is unchanged by
 * a restore, but the node may choose to clear it or to load it with the
 * restored-to tuple.)  Hence the caller should discard any previously
 * returned TupleTableSlot after doing a restore.
 */
void
ExecRestrPos(PlanState *node)
{
    switch (nodeTag(node))
    {
        case T_IndexScanState:
            ExecIndexRestrPos((IndexScanState *) node);
            break;

        case T_IndexOnlyScanState:
            ExecIndexOnlyRestrPos((IndexOnlyScanState *) node);
            break;

        case T_CustomScanState:
            ExecCustomRestrPos((CustomScanState *) node);
            break;

        case T_MaterialState:
            ExecMaterialRestrPos((MaterialState *) node);
            break;

        case T_SortState:
            ExecSortRestrPos((SortState *) node);
            break;

        case T_ResultState:
            ExecResultRestrPos((ResultState *) node);
            break;

        default:
            elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
            break;
    }
}

/*
 * ExecSupportsMarkRestore - does a Path support mark/restore?
 *
 * This is used during planning and so must accept a Path, not a Plan.
 * We keep it here to be adjacent to the routines above, which also must
 * know which plan types support mark/restore.
 */
bool
ExecSupportsMarkRestore(Path *pathnode)
{
    /*
     * For consistency with the routines above, we do not examine the nodeTag
     * but rather the pathtype, which is the Plan node type the Path would
     * produce.
     */
    switch (pathnode->pathtype)
    {
        case T_IndexScan:
        case T_IndexOnlyScan:

            /*
             * Not all index types support mark/restore.
             */
            return castNode(IndexPath, pathnode)->indexinfo->amcanmarkpos;

        case T_Material:
        case T_Sort:
            return true;

        case T_CustomScan:
            {
                CustomPath *customPath = castNode(CustomPath, pathnode);

                if (customPath->flags & CUSTOMPATH_SUPPORT_MARK_RESTORE)
                    return true;
                return false;
            }
        case T_Result:

            /*
             * Result supports mark/restore iff it has a child plan that does.
             *
             * We have to be careful here because there is more than one Path
             * type that can produce a Result plan node.
             */
            if (IsA(pathnode, ProjectionPath))
                return ExecSupportsMarkRestore(((ProjectionPath *) pathnode)->subpath);
            else if (IsA(pathnode, MinMaxAggPath))
                return false;   /* childless Result */
            else if (IsA(pathnode, GroupResultPath))
                return false;   /* childless Result */
            else
            {
                /* Simple RTE_RESULT base relation */
                Assert(IsA(pathnode, Path));
                return false;   /* childless Result */
            }

        case T_Append:
            {
                AppendPath *appendPath = castNode(AppendPath, pathnode);

                /*
                 * If there's exactly one child, then there will be no Append
                 * in the final plan, so we can handle mark/restore if the
                 * child plan node can.
                 */
                if (list_length(appendPath->subpaths) == 1)
                    return ExecSupportsMarkRestore((Path *) linitial(appendPath->subpaths));
                /* Otherwise, Append can't handle it */
                return false;
            }

        case T_MergeAppend:
            {
                MergeAppendPath *mapath = castNode(MergeAppendPath, pathnode);

                /*
                 * Like the Append case above, single-subpath MergeAppends
                 * won't be in the final plan, so just return the child's
                 * mark/restore ability.
                 */
                if (list_length(mapath->subpaths) == 1)
                    return ExecSupportsMarkRestore((Path *) linitial(mapath->subpaths));
                /* Otherwise, MergeAppend can't handle it */
                return false;
            }

        default:
            break;
    }

    return false;
}

/*
 * ExecSupportsBackwardScan - does a plan type support backwards scanning?
 *
 * Ideally, all plan types would support backwards scan, but that seems
 * unlikely to happen soon.  In some cases, a plan node passes the backwards
 * scan down to its children, and so supports backwards scan only if its
 * children do.  Therefore, this routine must be passed a complete plan tree.
 */
bool
ExecSupportsBackwardScan(Plan *node)
{
    if (node == NULL)
        return false;

    /*
     * Parallel-aware nodes return a subset of the tuples in each worker, and
     * in general we can't expect to have enough bookkeeping state to know
     * which ones we returned in this worker as opposed to some other worker.
     */
    if (node->parallel_aware)
        return false;

    switch (nodeTag(node))
    {
        case T_Result:
            if (outerPlan(node) != NULL)
                return ExecSupportsBackwardScan(outerPlan(node));
            else
                return false;

        case T_Append:
            {
                ListCell   *l;

                /* With async, tuples may be interleaved, so can't back up. */
                if (((Append *) node)->nasyncplans > 0)
                    return false;

                foreach(l, ((Append *) node)->appendplans)
                {
                    if (!ExecSupportsBackwardScan((Plan *) lfirst(l)))
                        return false;
                }
                /* need not check tlist because Append doesn't evaluate it */
                return true;
            }

        case T_SampleScan:
            /* Simplify life for tablesample methods by disallowing this */
            return false;

        case T_Gather:
            return false;

        case T_IndexScan:
            return IndexSupportsBackwardScan(((IndexScan *) node)->indexid);

        case T_IndexOnlyScan:
            return IndexSupportsBackwardScan(((IndexOnlyScan *) node)->indexid);

        case T_SubqueryScan:
            return ExecSupportsBackwardScan(((SubqueryScan *) node)->subplan);

        case T_CustomScan:
            {
                uint32      flags = ((CustomScan *) node)->flags;

                if (flags & CUSTOMPATH_SUPPORT_BACKWARD_SCAN)
                    return true;
            }
            return false;

        case T_SeqScan:
        case T_TidScan:
        case T_TidRangeScan:
        case T_FunctionScan:
        case T_ValuesScan:
        case T_CteScan:
        case T_Material:
        case T_Sort:
            /* these don't evaluate tlist */
            return true;

        case T_IncrementalSort:

            /*
             * Unlike full sort, incremental sort keeps only a single group of
             * tuples in memory, so it can't scan backwards.
             */
            return false;

        case T_LockRows:
        case T_Limit:
            return ExecSupportsBackwardScan(outerPlan(node));

        default:
            return false;
    }
}

/*
 * An IndexScan or IndexOnlyScan node supports backward scan only if the
 * index's AM does.
 */
static bool
IndexSupportsBackwardScan(Oid indexid)
{
    bool        result;
    HeapTuple   ht_idxrel;
    Form_pg_class idxrelrec;
    IndexAmRoutine *amroutine;

    /* Fetch the pg_class tuple of the index relation */
    ht_idxrel = SearchSysCache1(RELOID, ObjectIdGetDatum(indexid));
    if (!HeapTupleIsValid(ht_idxrel))
        elog(ERROR, "cache lookup failed for relation %u", indexid);
    idxrelrec = (Form_pg_class) GETSTRUCT(ht_idxrel);

    /* Fetch the index AM's API struct */
    amroutine = GetIndexAmRoutineByAmId(idxrelrec->relam, false);

    result = amroutine->amcanbackward;

    pfree(amroutine);
    ReleaseSysCache(ht_idxrel);

    return result;
}

/*
 * ExecMaterializesOutput - does a plan type materialize its output?
 *
 * Returns true if the plan node type is one that automatically materializes
 * its output (typically by keeping it in a tuplestore).  For such plans,
 * a rescan without any parameter change will have zero startup cost and
 * very low per-tuple cost.
 */
bool
ExecMaterializesOutput(NodeTag plantype)
{
    switch (plantype)
    {
        case T_Material:
        case T_FunctionScan:
        case T_TableFuncScan:
        case T_CteScan:
        case T_NamedTuplestoreScan:
        case T_WorkTableScan:
        case T_Sort:
            return true;

        default:
            break;
    }

    return false;
}