node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
but the code is basically working. Along the way, rewrite the entire
approach to processing OR index conditions, and make it work in join
cases for the first time ever. orindxpath.c is now basically obsolete,
but I left it in for the time being to allow easy comparison testing
against the old implementation.
logic operations during planning. Seems cleaner to create two new Path
node types, instead --- this avoids duplication of cost-estimation code.
Also, create an enable_bitmapscan GUC parameter to control use of bitmap
plans.
scans, using in-memory tuple ID bitmaps as the intermediary. The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed. I have tested it using some hacked planner
code that is far too ugly to see the light of day, however. Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
structs. There are many places in the planner where we were passing
both a rel and an index to subroutines, and now need only pass the
index struct. Notationally simpler, and perhaps a tad faster.
Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...
a relation's number of blocks, rather than the possibly-obsolete value
in pg_class.relpages. Scale the value in pg_class.reltuples correspondingly
to arrive at a hopefully more accurate number of rows. When pg_class
contains 0/0, estimate a tuple width from the column datatypes and divide
that into current file size to estimate number of rows. This improved
methodology allows us to jettison the ancient hacks that put bogus default
values into pg_class when a table is first created. Also, per a suggestion
from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value
it puts into pg_class.reltuples to try to represent the mean tuple density
instead of the minimal density that actually prevails just after VACUUM.
These changes alter the plans selected for certain regression tests, so
update the expected files accordingly. (I removed join_1.out because
it's not clear if it still applies; we can add back any variant versions
as they are shown to be needed.)
1. Solve the problem of not having TOAST references hiding inside composite
values by establishing the rule that toasting only goes one level deep:
a tuple can contain toasted fields, but a composite-type datum that is
to be inserted into a tuple cannot. Enforcing this in heap_formtuple
is relatively cheap and it avoids a large increase in the cost of running
the tuptoaster during final storage of a row.
2. Fix some interesting problems in expansion of inherited queries that
reference whole-row variables. We never really did this correctly before,
but it's now relatively painless to solve by expanding the parent's
whole-row Var into a RowExpr() selecting the proper columns from the
child.
If you dike out the preventive check in CheckAttributeType(),
composite-type columns now seem to actually work. However, we surely
cannot ship them like this --- without I/O for composite types, you
can't get pg_dump to dump tables containing them. So a little more
work still to do.
with index qual clauses in the Path representation. This saves a little
work during createplan and (probably more importantly) allows reuse of
cached selectivity estimates during indexscan planning. Also fix latent
bug: wrong plan would have been generated for a 'special operator' used
in a nestloop-inner-indexscan join qual, because the special operator
would not have gotten into the list of quals to recheck. This bug is
only latent because at present the special-operator code could never
trigger on a join qual, but sooner or later someone will want to do it.
join conditions in which each OR subclause includes a constraint on
the same relation. This implements the other useful side-effect of
conversion to CNF format, without its unpleasant side-effects. As
per pghackers discussion of a few weeks ago.
teaching the latter to accept either RestrictInfo nodes or bare
clause expressions; and cache the selectivity result in the RestrictInfo
node when possible. This extends the caching behavior of approx_selectivity
to many more contexts, and should reduce duplicate selectivity
calculations.
first time generate an OR indexscan for a two-column index when the WHERE
condition is like 'col1 = foo AND (col2 = bar OR col2 = baz)' --- before,
the OR had to be on the first column of the index or we'd not notice the
possibility of using it. Some progress towards extracting OR indexscans
from subclauses of an OR that references multiple relations, too, although
this code is #ifdef'd out because it needs more work.
fields: now they are valid whenever the clause is a binary opclause,
not only when it is a potential join clause (there is a new boolean
field canjoin to signal the latter condition). This lets us avoid
recomputing the relid sets over and over while examining indexes.
Still more work to do to make this as useful as it could be, because
there are places that could use the info but don't have access to the
RestrictInfo node.
about whether it is applied before or after eval_const_expressions().
I believe there were some corner cases where the system would fail to
recognize that a partial index is applicable because of the previous
inconsistency. Store normal rather than 'implicit AND' representations
of constraints and index predicates in the catalogs.
initdb forced due to representation change of constraints/predicates.
node emits only those vars that are actually needed above it in the
plan tree. (There were comments in the code suggesting that this was
done at some point in the dim past, but for a long time we have just
made join nodes emit everything that either input emitted.) Aside from
being marginally more efficient, this fixes the problem noted by Peter
Eisentraut where a join above an IN-implemented-as-join might fail,
because the subplan targetlist constructed in the latter case didn't
meet the expectation of including everything.
Along the way, fix some places that were O(N^2) in the targetlist
length. This is not all the trouble spots for wide queries by any
means, but it's a step forward.
some cases of redundant clauses that were formerly not caught. We have
to special-case this because the clauses involved never get attached to
the same join restrictlist and so the existing logic does not notice
that they are redundant.
of an index can now be a computed expression instead of a simple variable.
Restrictions on expressions are the same as for predicates (only immutable
functions, no sub-selects). This fixes problems recently introduced with
inlining SQL functions, because the inlining transformation is applied to
both expression trees so the planner can still match them up. Along the
way, improve efficiency of handling index predicates (both predicates and
index expressions are now cached by the relcache) and fix 7.3 oversight
that didn't record dependencies of predicate expressions.
nodes where it's not really necessary. In many cases where the scan node
is not the topmost plan node (eg, joins, aggregation), it's possible to
just return the table tuple directly instead of generating an intermediate
projection tuple. In preliminary testing, this reduced the CPU time
needed for 'SELECT COUNT(*) FROM foo' by about 10%.
There are two implementation techniques: the executor understands a new
JOIN_IN jointype, which emits at most one matching row per left-hand row,
or the result of the IN's sub-select can be fed through a DISTINCT filter
and then joined as an ordinary relation.
Along the way, some minor code cleanup in the optimizer; notably, break
out most of the jointree-rearrangement preprocessing in planner.c and
put it in a new file prep/prepjointree.c.
containing a volatile function), rather than only on 'Var = Var' clauses
as before. This makes it practical to do flatten_join_alias_vars at the
start of planning, which in turn eliminates a bunch of klugery inside the
planner to deal with alias vars. As a free side effect, we now detect
implied equality of non-Var expressions; for example in
SELECT ... WHERE a.x = b.y and b.y = 42
we will deduce a.x = 42 and use that as a restriction qual on a. Also,
we can remove the restriction introduced 12/5/02 to prevent pullup of
subqueries whose targetlists contain sublinks.
Still TODO: make statistical estimation routines in selfuncs.c and costsize.c
smarter about expressions that are more complex than plain Vars. The need
for this is considerably greater now that we have to be able to estimate
the suitability of merge and hash join techniques on such expressions.
costs for expression evaluation, not only per-tuple cost as before.
This extension is needed in order to deal realistically with hashed or
materialized sub-selects.
so that all executable expression nodes inherit from a common supertype
Expr. This is somewhat of an exercise in code purity rather than any
real functional advance, but getting rid of the extra Oper or Func node
formerly used in each operator or function call should provide at least
a little space and speed improvement.
initdb forced by changes in stored-rules representation.
to plan nodes, not vice-versa. All executor state nodes now inherit from
struct PlanState. Copying of plan trees has been simplified by not
storing a list of SubPlans in Plan nodes (eliminating duplicate links).
The executor still needs such a list, but it can build it during
ExecutorStart since it has to scan the plan tree anyway.
No initdb forced since no stored-on-disk structures changed, but you
will need a full recompile because of node-numbering changes.
instead of only one. This should speed up planning (only one hash path
to consider for a given pair of relations) as well as allow more effective
hashing, when there are multiple hashable joinclauses.
joinclauses is determined accurately for each join. Formerly, the code only
considered joinclauses that used all of the rels from the outer side of the
join; thus for example
FROM (a CROSS JOIN b) JOIN c ON (c.f1 = a.x AND c.f2 = b.y)
could not exploit a two-column index on c(f1,f2), since neither of the
qual clauses would be in the joininfo list it looked in. The new code does
this correctly, and also is able to eliminate redundant clauses, thus fixing
the problem noted 24-Oct-02 by Hans-Jürgen Schönig.
node now does its own grouping of the input rows, and has no need for a
preceding GROUP node in the plan pipeline. This allows elimination of
the misnamed tuplePerGroup option for GROUP, and actually saves more code
in nodeGroup.c than it costs in nodeAgg.c, as well as being presumably
faster. Restructure the API of query_planner so that we do not commit to
using a sorted or unsorted plan in query_planner; instead grouping_planner
makes the decision. (Right now it isn't any smarter than query_planner
was, but that will change as soon as it has the option to select a hash-
based aggregation step.) Despite all the hackery, no initdb needed since
only in-memory node types changed.
latent wrong-struct-type bugs and makes the coding style more uniform,
since the majority of places working with lists of column names were
already using Strings not Idents. While at it, remove vestigial
support for Stream node type, and otherwise-unreferenced nodes.h entries
for T_TupleCount and T_BaseNode.
NB: full recompile is recommended due to changes of Node type numbers.
This shouldn't force an initdb though.
some kibitzing from Tom Lane. Not everything works yet, and there's
no documentation or regression test, but let's commit this so Joe
doesn't need to cope with tracking changes in so many files ...
now has an RTE of its own, and references to its outputs now are Vars
referencing the JOIN RTE, rather than CASE-expressions. This allows
reverse-listing in ruleutils.c to use the correct alias easily, rather
than painfully reverse-engineering the alias namespace as it used to do.
Also, nested FULL JOINs work correctly, because the result of the inner
joins are simple Vars that the planner can cope with. This fixes a bug
reported a couple times now, notably by Tatsuo on 18-Nov-01. The alias
Vars are expanded into COALESCE expressions where needed at the very end
of planning, rather than during parsing.
Also, beginnings of support for showing plan qualifier expressions in
EXPLAIN. There are probably still cases that need work.
initdb forced due to change of stored-rule representation.
pgsql-hackers. pg_opclass now has a row for each opclass supported by each
index AM, not a row for each opclass name. This allows pg_opclass to show
directly whether an AM supports an opclass, and furthermore makes it possible
to store additional information about an opclass that might be AM-dependent.
pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we
previously expected the user to remember to provide in CREATE INDEX commands.
Lossiness is no longer an index-level property, but is associated with the
use of a particular operator in a particular index opclass.
Along the way, IndexSupportInitialize now uses the syscaches to retrieve
pg_amop and pg_amproc entries. I find this reduces backend launch time by
about ten percent, at the cost of a couple more special cases in catcache.c's
IndexScanOK.
Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane.
initdb forced.
of costsize.c routines to pass Query root, so that costsize can figure
more things out by itself and not be so dependent on its callers to tell
it everything it needs to know. Use selectivity of hash or merge clause
to estimate number of tuples processed internally in these joins
(this is more useful than it would've been before, since eqjoinsel is
somewhat more accurate than before).
create_index_paths are not immediately discarded, but are available for
subsequent planner work. This allows avoiding redundant syscache lookups
in several places. Change interface to operator selectivity estimation
procedures to allow faster and more flexible estimation.
Initdb forced due to change of pg_proc entries for selectivity functions!
a separate statement (though it can still be invoked as part of VACUUM, too).
pg_statistic redesigned to be more flexible about what statistics are
stored. ANALYZE now collects a list of several of the most common values,
not just one, plus a histogram (not just the min and max values). Random
sampling is used to make the process reasonably fast even on very large
tables. The number of values and histogram bins collected is now
user-settable via an ALTER TABLE command.
There is more still to do; the new stats are not being used everywhere
they could be in the planner. But the remaining changes for this project
should be localized, and the behavior is already better than before.
A not-very-related change is that sorting now makes use of btree comparison
routines if it can find one, rather than invoking '<' twice.
comparison does not consider paths different when they differ only in
uninteresting aspects of sort order. (We had a special case of this
consideration for indexscans already, but generalize it to apply to
ordered join paths too.) Be stricter about what is a canonical pathkey
to allow faster pathkey comparison. Cache canonical pathkeys and
dispersion stats for left and right sides of a RestrictInfo's clause,
to avoid repeated computation. Total speedup will depend on number of
tables in a query, but I see about 4x speedup of planning phase for
a sample seven-table query.
avoid repeated evaluations in cost_qual_eval(). This turns out to save
a useful fraction of planning time. No change to external representation
of RestrictInfo --- although that node type doesn't appear in stored
rules anyway.
joins, and clean things up a good deal at the same time. Append plan node
no longer hacks on rangetable at runtime --- instead, all child tables are
given their own RT entries during planning. Concept of multiple target
tables pushed up into execMain, replacing bug-prone implementation within
nodeAppend. Planner now supports generating Append plans for inheritance
sets either at the top of the plan (the old way) or at the bottom. Expanding
at the bottom is appropriate for tables used as sources, since they may
appear inside an outer join; but we must still expand at the top when the
target of an UPDATE or DELETE is an inheritance set, because we actually need
a different targetlist and junkfilter for each target table in that case.
Fortunately a target table can't be inside an outer join... Bizarre mutual
recursion between union_planner and prepunion.c is gone --- in fact,
union_planner doesn't really have much to do with union queries anymore,
so I renamed it grouping_planner.
(Don't forget that an alias is required.) Views reimplemented as expanding
to subselect-in-FROM. Grouping, aggregates, DISTINCT in views actually
work now (he says optimistically). No UNION support in subselects/views
yet, but I have some ideas about that. Rule-related permissions checking
moved out of rewriter and into executor.
INITDB REQUIRED!
costs using the inner path's parent->rows count as the number of tuples
processed per inner scan iteration. This is wrong when we are using an
inner indexscan with indexquals based on join clauses, because the rows
count in a Relation node reflects the selectivity of the restriction
clauses for that rel only. Upshot was that if join clause was very
selective, we'd drastically overestimate the true cost of the join.
Fix is to calculate correct output-rows estimate for an inner indexscan
when the IndexPath node is created and save it in the path node.
Change of path node doesn't require initdb, since path nodes don't
appear in saved rules.
accesses versus sequential accesses, a (very crude) estimate of the
effects of caching on random page accesses, and cost to evaluate WHERE-
clause expressions. Export critical parameters for this model as SET
variables. Also, create SET variables for the planner's enable flags
(enable_seqscan, enable_indexscan, etc) so that these can be controlled
more conveniently than via PGOPTIONS.
Planner now estimates both startup cost (cost before retrieving
first tuple) and total cost of each path, so it can optimize queries
with LIMIT on a reasonable basis by interpolating between these costs.
Same facility is a win for EXISTS(...) subqueries and some other cases.
Redesign pathkey representation to achieve a major speedup in planning
(I saw as much as 5X on a 10-way join); also minor changes in planner
to reduce memory consumption by recycling discarded Path nodes and
not constructing unnecessary lists.
Minor cleanups to display more-plausible costs in some cases in
EXPLAIN output.
Initdb forced by change in interface to index cost estimation
functions.
fields in JoinPaths --- turns out that we do need that after all :-(.
Also, rearrange planner so that only one RelOptInfo is created for a
particular set of joined base relations, no matter how many different
subsets of relations it can be created from. This saves memory and
processing time compared to the old method of making a bunch of RelOptInfos
and then removing the duplicates. Clean up the jointree iteration logic;
not sure if it's better, but I sure find it more readable and plausible
now, particularly for the case of 'bushy plans'.
pghackers discussion of 5-Jan-2000. The amopselect and amopnpages
estimators are gone, and in their place is a per-AM amcostestimate
procedure (linked to from pg_am, not pg_amop).
store all ordering information in pathkeys lists (which are now lists of
lists of PathKeyItem nodes, not just lists of lists of vars). This was
a big win --- the code is smaller and IMHO more understandable than it
was, even though it handles more cases. I believe the node changes will
not force an initdb for anyone; planner nodes don't show up in stored
rules.
optimizer rather than parser. This has many advantages, such as not
getting fooled by chance uses of operator names ~ and ~~ (the operators
are identified by OID now), and not creating useless comparison operations
in contexts where the comparisons will not actually be used as indexquals.
The new code also recognizes exact-match LIKE and regex patterns, and
produces an = indexqual instead of >= and <=.
This change does NOT fix the problem with non-ASCII locales: the code
still doesn't know how to generate an upper bound indexqual for non-ASCII
collation order. But it's no worse than before, just the same deficiency
in a different place...
Also, dike out loc_restrictinfo fields in Plan nodes. These were doing
nothing useful in the absence of 'expensive functions' optimization,
and they took a considerable amount of processing to fill in.
The only place it was being used was as temporary storage in indxpath.c,
and the logic was wrong: the same restrictinfo node could get chosen to
carry the info for two different joins. Right fix is to return a second
list of unjoined-relids parallel to the list of clause groups.
identified by Hiroshi (incorrect cost attributed to OR clauses
after multiple passes through set_rest_selec()). I think the code
was trying to allow selectivities of OR subclauses to be passed in
from outside, but noplace was actually passing any useful data, and
set_rest_selec() was passing wrong data.
Restructure representation of "indexqual" in IndexPath nodes so that
it is the same as for indxqual in completed IndexScan nodes: namely,
a toplevel list with an entry for each pass of the index scan, having
sublists that are implicitly-ANDed index qual conditions for that pass.
You don't want to know what the old representation was :-(
Improve documentation of OR-clause indexscan functions.
Remove useless 'notclause' field from RestrictInfo nodes. (This might
force an initdb for anyone who has stored rules containing RestrictInfos,
but I do not think that RestrictInfo ever appears in completed plans.)