postgresql/src/include/utils
Peter Geoghegan 92fe23d93a Add nbtree skip scan optimization.
Teach nbtree multi-column index scans to opportunistically skip over
irrelevant sections of the index given a query with no "=" conditions on
one or more prefix index columns.  When nbtree is passed input scan keys
derived from a predicate "WHERE b = 5", new nbtree preprocessing steps
output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
That is, preprocessing generates a "skip array" (and an output scan key)
for the omitted prefix column "a", which makes it safe to mark the scan
key on "b" as required to continue the scan.  The scan is therefore able
to repeatedly reposition itself by applying both the "a" and "b" keys.
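
As a rough illustration of the repositioning idea described above (not the
nbtree implementation), the following minimal sketch treats a sorted array of
(a, b) pairs as a stand-in for a two-column index and a binary search as a
stand-in for an index descent. The names IdxEntry, idx_lower_bound and
skip_scan_count are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for an index tuple on columns (a, b). */
typedef struct IdxEntry
{
	int			a;
	int			b;
} IdxEntry;

/* Return the first position whose (a, b) is >= (key_a, key_b). */
static size_t
idx_lower_bound(const IdxEntry *idx, size_t n, int key_a, int key_b)
{
	size_t		lo = 0;
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (idx[mid].a < key_a ||
			(idx[mid].a == key_a && idx[mid].b < key_b))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Count entries with b = target_b without reading every entry: for each
 * distinct "a" value, reposition with both keys, read the matches, then
 * jump past the rest of that "a" group.  *descents counts repositionings.
 */
static size_t
skip_scan_count(const IdxEntry *idx, size_t n, int target_b, size_t *descents)
{
	size_t		matches = 0;
	size_t		pos = 0;

	assert(descents != NULL);
	*descents = 0;

	while (pos < n)
	{
		int			cur_a = idx[pos].a; /* next "a" value actually stored */

		/* Reposition using "a = cur_a AND b = target_b" */
		pos = idx_lower_bound(idx, n, cur_a, target_b);
		(*descents)++;

		while (pos < n && idx[pos].a == cur_a && idx[pos].b == target_b)
		{
			matches++;
			pos++;
		}

		if (cur_a == INT_MAX)
			break;				/* no larger "a" value can exist */

		/* Skip the remainder of this group: first entry with a > cur_a */
		pos = idx_lower_bound(idx, n, cur_a + 1, INT_MIN);
		(*descents)++;
	}
	return matches;
}
```

In this sketch the number of repositionings grows with the number of distinct
"a" values rather than with the number of entries, which is the property the
skip array is meant to exploit.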

A skip array has "elements" that are generated procedurally and on
demand, but otherwise works just like a regular ScalarArrayOp array.
Preprocessing can freely add a skip array before or after any input
ScalarArrayOp arrays.  Index scans with a skip array decide when and
where to reposition the scan using the same approach as any other scan
with array keys.  This design builds on the design for array advancement
and primitive scan scheduling added to Postgres 17 by commit 5bf748b8.

Testing has shown that skip scans of an index with a low cardinality
skipped prefix column can be multiple orders of magnitude faster than an
equivalent full index scan (or sequential scan).  In general, the
cardinality of the scan's skipped column(s) limits the number of leaf
pages that can be skipped over.

The core B-Tree operator classes on most discrete types generate their
array elements with the help of their own custom skip support routine.
This infrastructure gives nbtree a way to generate the next required
array element by incrementing (or decrementing) the current array value.
It can reduce the number of index descents in cases where the next
possible indexable value frequently turns out to be the next value
stored in the index.  Opclasses that lack a skip support routine fall
back on having nbtree "increment" (or "decrement") a skip array's
current element by setting the NEXT (or PRIOR) scan key flag, without
directly changing the scan key's sk_argument.  These sentinel values
behave just like any other value from an array -- though they can never
locate equal index tuples (they can only locate the next group of index
tuples containing the next set of non-sentinel values that the scan's
arrays need to advance to).
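
The two ways of advancing a skip array described above can be sketched
roughly as follows. This is a simplified model under stated assumptions, not
the interface declared in skipsupport.h: the SkipElem struct, its "next"
flag, and the int-only increment callback are hypothetical and stand in for
the scan key, its NEXT flag, and an opclass skip support routine.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a skip array's current element. */
typedef struct SkipElem
{
	int			value;			/* plays the role of sk_argument */
	bool		next;			/* sentinel: "whatever comes after value" */
} SkipElem;

/* Hypothetical skip support callback for a discrete type. */
typedef int (*skip_increment_fn) (int existing, bool *overflow);

static int
int_skip_increment(int existing, bool *overflow)
{
	if (existing == INT_MAX)
	{
		*overflow = true;		/* no larger value exists for the type */
		return existing;
	}
	*overflow = false;
	return existing + 1;
}

/*
 * Advance the element to the next candidate value.  Returns false when the
 * array is exhausted.  A NULL callback models an opclass that lacks skip
 * support.
 */
static bool
skip_elem_advance(SkipElem *elem, skip_increment_fn increment)
{
	assert(elem != NULL);

	if (increment != NULL)
	{
		bool		overflow;
		int			incremented = increment(elem->value, &overflow);

		if (overflow)
			return false;
		elem->value = incremented;	/* a real value; can match index tuples */
		elem->next = false;
		return true;
	}

	/*
	 * No skip support: leave the value alone and only set the sentinel flag.
	 * The element now sorts just after "value" and can never equal a stored
	 * tuple; it can only locate the next group of tuples.
	 */
	elem->next = true;
	return true;
}
```

With the callback, advancing from 41 yields the real value 42, which may be
found directly if 42 is stored in the index. Without it, the element stays at
41 with the sentinel flag set, so the scan must first locate whatever value
follows 41.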

A skip array's range is constrained by "contradictory" inequality keys.
For example, a skip array on "x" will only generate the values 1 and 2
given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66".  Such a skip
array qual usually has near-identical performance characteristics to a
comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66".  However,
improved performance isn't guaranteed.  Much depends on physical index
characteristics.
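
To make the range constraint concrete, here is a small sketch of how
inequality bounds might narrow the elements a skip array generates for an
integer column. It assumes a hypothetical helper, skip_array_range, that
materializes the elements into a caller-supplied buffer. The real skip array
generates its elements on demand and does not materialize them.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the integers allowed by the given lower and
 * upper bounds into out[] (at most cap of them) and return how many were
 * written.  Exclusive bounds are first converted to inclusive ones.
 */
static size_t
skip_array_range(int low, bool low_incl, int high, bool high_incl,
				 int *out, size_t cap)
{
	size_t		n = 0;

	assert(out != NULL || cap == 0);

	if (!low_incl)
	{
		if (low == INT_MAX)
			return 0;			/* "x > INT_MAX" matches nothing */
		low++;
	}
	if (!high_incl)
	{
		if (high == INT_MIN)
			return 0;			/* "x < INT_MIN" matches nothing */
		high--;
	}
	if (low > high)
		return 0;				/* bounds exclude every value */

	for (int v = low; n < cap; v++)
	{
		out[n++] = v;
		if (v == high)
			break;				/* stop before v can pass the upper bound */
	}
	return n;
}
```

For "x BETWEEN 1 AND 2" (both bounds inclusive) this yields the elements 1
and 2, the same set a SAOP qual "x = ANY('{1, 2}')" would supply explicitly.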

B-Tree preprocessing is optimistic that skipping will pay off: it
applies static, generic rules when determining where to generate skip
arrays, assuming that the runtime overhead of maintaining skip arrays
will pay for itself -- or lead to only a modest performance loss.
As things stand, these assumptions are much too optimistic: skip array
maintenance will lead to unacceptable regressions with unsympathetic
queries (queries whose scan can't skip over many irrelevant leaf pages).
An upcoming commit will address the problems in this area by enhancing
_bt_readpage's approach to saving cycles on scan key evaluation, making
it work in a way that directly considers the needs of = array keys
(particularly = skip array keys).

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiro Ikeda <masahiro.ikeda@nttdata.com>
Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Reviewed-By: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-By: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
2025-04-04 12:27:04 -04:00
.gitignore Generate automatically code and documentation related to wait events 2023-07-05 10:53:11 +09:00
acl.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
aclchk_internal.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
array.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
arrayaccess.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
ascii.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
attoptcache.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
backend_progress.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
backend_status.h Allow plugins to set a 64-bit plan identifier in PlannedStmt 2025-03-24 13:23:42 +09:00
builtins.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
bytea.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
cash.h Convert *GetDatum() and DatumGet*() macros to inline functions 2022-09-27 20:50:21 +02:00
catcache.h Fix catcache invalidation of a list entry that's being built 2025-01-14 14:28:49 +02:00
combocid.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
conffiles.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
date.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
datetime.h Seek zone abbreviations in the IANA data before timezone_abbreviations. 2025-01-16 14:11:19 -05:00
datum.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
dsa.h Get rid of our dependency on type "long" for memory size calculations. 2025-01-31 13:52:40 -05:00
dynahash.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
elog.h Add errhint_internal() 2025-03-30 16:10:51 -04:00
evtcache.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
expandeddatum.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
expandedrecord.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
float.h pg_noreturn to replace pg_attribute_noreturn() 2025-03-13 12:37:26 +01:00
fmgrtab.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
formatting.h Add SQL function CASEFOLD(). 2025-01-24 14:56:22 -08:00
freepage.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
funccache.h Change SQL-language functions to use the plan cache. 2025-04-02 14:06:02 -04:00
geo_decls.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
guc.h aio: Basic subsystem initialization 2025-03-17 18:51:33 -04:00
guc_hooks.h Enable IO concurrency on all systems 2025-03-30 19:16:47 -04:00
guc_tables.h Add vacuum_truncate configuration parameter. 2025-03-20 10:16:50 -05:00
help_config.h pg_noreturn to replace pg_attribute_noreturn() 2025-03-13 12:37:26 +01:00
hsearch.h Revert "Improve accounting for memory used by shared hash tables" 2025-04-04 04:43:50 +02:00
index_selfuncs.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
inet.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
injection_point.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
inval.h Avoid invalidating all RelationSyncCache entries on publication rename. 2025-03-13 09:16:33 +05:30
json.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
jsonb.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
jsonfuncs.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
jsonpath.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
logtape.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
lsyscache.h Add some opfamily support functions to lsyscache.c 2025-03-18 11:17:43 +01:00
memdebug.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
memutils.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
memutils_internal.h pg_noreturn to replace pg_attribute_noreturn() 2025-03-13 12:37:26 +01:00
memutils_memorychunk.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
meson.build Update copyright for 2025 2025-01-01 11:21:55 -05:00
multirangetypes.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
numeric.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
palloc.h Swap order of extern/static and pg_nodiscard 2025-03-14 07:18:07 +01:00
partcache.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
pg_crc.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
pg_locale.h Use thread-safe strftime_l() instead of strftime(). 2025-03-28 07:13:43 +01:00
pg_lsn.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
pg_rusage.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
pgstat_internal.h Add WAL data to backend statistics 2025-03-11 09:04:11 +09:00
pgstat_kind.h Move information about pgstats kinds into its own header pgstat_kind.h 2025-01-14 12:43:07 +09:00
pidfile.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
plancache.h Change SQL-language functions to use the plan cache. 2025-04-02 14:06:02 -04:00
portal.h Don't lock partitions pruned by initial pruning 2025-02-20 17:09:48 +09:00
ps_status.h Speedup and increase usability of set proc title functions 2023-02-20 16:18:27 +13:00
queryenvironment.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
rangetypes.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
regproc.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
rel.h Add support for NOT ENFORCED in foreign key constraints 2025-04-02 13:36:44 +02:00
relcache.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
relfilenumbermap.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
relmapper.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
relptr.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
reltrigger.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
resowner.h aio: Basic subsystem initialization 2025-03-17 18:51:33 -04:00
rls.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
ruleutils.h Improve EXPLAIN's display of window functions. 2025-03-11 11:19:54 -04:00
sampling.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
selfuncs.h Use extended stats for precise estimation of bucket size in hash join 2025-03-10 13:42:01 +02:00
sharedtuplestore.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
skipsupport.h Add nbtree skip scan optimization. 2025-04-04 12:27:04 -04:00
snapmgr.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
snapshot.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
sortsupport.h Simplify and generalize PrepareSortSupportFromIndexRel() 2025-03-14 10:34:08 +01:00
spccache.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
syscache.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
timeout.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
timestamp.h Add connection establishment duration logging 2025-03-12 11:35:27 -04:00
tuplesort.h Allow parallel CREATE INDEX for GIN indexes 2025-03-03 16:53:06 +01:00
tuplestore.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
typcache.h Introduce a SQL-callable function array_sort(anyarray). 2025-04-01 18:03:55 -04:00
tzparser.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
usercontext.h Perform logical replication actions as the table owner. 2023-04-04 11:25:23 -04:00
uuid.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
varbit.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
varlena.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
wait_event.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
xid8.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
xml.h Update copyright for 2025 2025-01-01 11:21:55 -05:00