Implementation of SQL property graph queries, according to SQL/PGQ
standard (ISO/IEC 9075-16:2023).
This adds:
- GRAPH_TABLE table function for graph pattern matching
- DDL commands CREATE/ALTER/DROP PROPERTY GRAPH
- several new system catalogs and information schema views
- psql \dG command
- pg_get_propgraphdef() function for pg_dump and psql
A property graph is a relation with a new relkind RELKIND_PROPGRAPH.
It acts like a view in many ways. It is rewritten to a standard
relational query in the rewriter. Access privileges act similar to a
security invoker view. (The security definer variant is not currently
implemented.)
Starting documentation can be found in doc/src/sgml/ddl.sgml and
doc/src/sgml/queries.sgml.
Author: Peter Eisentraut <peter@eisentraut.org>
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a855795d-e697-4fa5-8698-d20122126567@eisentraut.org
An extension (or core code) might want to reconstruct the planner's
decisions about whether and where to perform partitionwise joins from
the final plan. To do so, it must be possible to find all of the RTIs
of partitioned tables appearing in the plan. But when an AppendPath
or MergeAppendPath pulls up child paths from a subordinate AppendPath
or MergeAppendPath, the RTIs of the subordinate path do not appear
in the final plan, making this kind of reconstruction impossible.
To avoid this, propagate the RTI sets that would have been present
in the 'apprelids' field of the subordinate Append or MergeAppend
nodes that would have been created into the surviving Append or
MergeAppend node, using a new 'child_append_relid_sets' field for
that purpose. The value of this field is a list of Bitmapsets,
because each relation whose append-list was pulled up had its own
set of RTIs: just one, if it was a partitionwise scan, or more than
one, if it was a partitionwise join. Since our goal is to see where
partitionwise joins were done, it is essential to avoid losing the
information about how the RTIs were grouped in the pulled-up
relations.
This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE)
will display the saved RTI sets.
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
An extension (or core code) might want to reconstruct the planner's
choice of join order from the final plan. To do so, it must be possible
to find all of the RTIs that were part of the join problem in that plan.
Commit adbad833f3, together with the
earlier work in 8c49a484e8, is enough to
let us match up RTIs we see in the final plan with RTIs that we see
during the planning cycle, but we still have a problem if the planner
decides to drop some RTIs out of the final plan altogether.
To fix that, when setrefs.c removes a SubqueryScan, single-child Append,
or single-child MergeAppend from the final Plan tree, record the type of
the removed node and the RTIs that the removed node would have scanned
in the final plan tree. It would be natural to record this information
on the child of the removed plan node, but that would require adding an
additional pointer field to type Plan, which seems undesirable. So,
instead, store the information in a separate list that the executor need
never consult, and use the plan_node_id to identify the plan node with
which the removed node is logically associated.
Also, update pg_overexplain to display these details.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Suppose that we're currently planning a query and, when that same
query was previously planned and executed, we learned something about
how a certain table within that query should be planned. We want to
take note when that same table is being planned during the current
planning cycle, but this is difficult to do, because the RTI of the
table from the previous plan won't necessarily be equal to the RTI
that we see during the current planning cycle. This is because each
subquery has a separate range table during planning, but these are
flattened into one range table when constructing the final plan,
changing RTIs.
Commit 8c49a484e8 allows us to match up
subqueries seen in the previous planning cycles with the subqueries
currently being planned just by comparing textual names, but that's
not quite enough to let us deduce anything about individual tables,
because we don't know where each subquery's range table appears in
the final, flattened range table.
To fix that, store a list of SubPlanRTInfo objects in the final
planned statement, each including the name of the subplan, the offset
at which it begins in the flattened range table, and whether or not
it was a dummy subplan -- if it was, some RTIs may have been dropped
from the final range table, but also there's no need to control how
a dummy subquery gets planned. The toplevel subquery has no name and
always begins at rtoffset 0, so we make no entry for it.
This commit teaches pg_overexplain's RANGE_TABLE option to make use
of this new data to display the subquery name for each range table
entry.
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Haibo Yan <tristan.yim@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Result nodes now include an RTI set, which is only non-NULL when they
have no subplan, and is taken from the relid set of the RelOptInfo that
the Result is generating. ExplainPreScanNode now takes notice of these
RTIs, which means that a few things get schema-qualified in the
regression tests that previously did not. This makes the output more
consistent between cases where some part of the plan tree is replaced by
a Result node and those where this does not happen.
Likewise, pg_overexplain's EXPLAIN (RANGE_TABLE) now displays the RTIs
stored in a Result node just as it already does for other RTI-bearing
node types.
Result nodes also now include a result_reason, which tells us something
about why the Result node was inserted. Using that information, EXPLAIN
now emits, where relevant, a "Replaces" line describing the origin of
a Result node.
The purpose of these changes is to allow code that inspects a Plan
tree to understand the origin of Result nodes that appear therein.
Discussion: http://postgr.es/m/CA+TgmoYeUZePZWLsSO+1FAN7UPePT_RMEZBKkqYBJVCF1s60=w@mail.gmail.com
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
This commit reverts the two following commits:
- 499edb0974, track more precisely query locations for nested
statements.
- 06450c7b8c, a follow-up fix of 499edb0974 with query locations.
The test introduced in this commit is not reverted. This is proving
useful to track a problem that only pgaudit was able to detect.
These prove to have issues with the tracking of SELECT statements, when
these use multiple parenthesis which is something supported by the
grammar. Incorrect location and lengths are causing pg_stat_statements
to become confused, failing its job in query normalization with
potential out-of-bound writes because the location and the length may
not match with what can be handled. A lot of the query patterns
discussed when this issue was reported have no test coverage in the main
regression test suite, or the recovery test 027_stream_regress.pl would
have caught the problems as pg_stat_statements is loaded by the node
running the regression tests. A first step would be to improve the test
coverage to stress more the query normalization logic.
A different portion of this work was done in 45e0ba30fc, with the
addition of tests for nested queries. These can be left in the tree.
They are useful to track the way inner queries are currently tracked by
PGSS with non-top-level entries, and will be useful when reconsidering
in the future the work reverted here.
Reported-by: Alexander Kozhemyakin <a.kozhemyakin@postgrespro.ru>
Discussion: https://postgr.es/m/18947-cdd2668beffe02bf@postgresql.org
The statement location calculated for some nested query cases was wrong
when multiple queries are sent as a single string, these being separated
by semicolons. As pointed by Sami Imseih, the location calculation was
incorrect when the last query of nested statement with multiple queries
does **NOT** finish with a semicolon for the last statement. In this
case, the statement length tracked by RawStmt is 0, which is equivalent
to say that the string should be used until its end. The code
previously discarded this case entirely, causing the location to remain
at 0, the same as pointing at the beginning of the string. This caused
pg_stat_statements to store incorrect query strings.
This issue has been introduced in 499edb0974. I have looked at the
diffs generated by pgaudit back then, and noticed the difference
generated for this nested query case, but I have missed the point that
it was an actual regression with an existing case. A test case is added
in pg_stat_statements to provide some coverage, restoring the pre-17
behavior for the calculation of the query locations. Special thanks to
David Steele, who, through an analysis of the test diffs generated by
pgaudit with the new v18 logic, has poked me about the fact that my
original analysis of the matter was wrong.
The test output of pg_overexplain is updated to reflect the new logic,
as the new locations refer to the beginning of the argument passed to
the function explain_filter(). When the module was introduced in
8d5ceb113e, which was after 499edb0974 (for the new calculation
method), the locations of the test were not actually right: the plan
generated for the query string given in input of the function pointed to
the top-level query, not the nested one.
Reported-by: David Steele <david@pgbackrest.org>
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: David Steele <david@pgbackrest.org>
Discussion: https://postgr.es/m/844a3b38-bbf1-4fb2-9fd6-f58c35c09917@pgbackrest.org
It added bogus whitespace at the end of a line in the documentation.
It should not have done that.
The pg_overexplain tests must SET debug_parallel_query = false,
not just RESET debug_parallel_query, or we get failures on test
machines that make debug_parallel_query = true the defualt.
There's a fair amount of information in the Plan and PlanState trees
that isn't printed by any existing EXPLAIN option. This means that,
when working on the planner, it's often necessary to rely on facilities
such as debug_print_plan, which produce excessively voluminous
output. Hence, use the new EXPLAIN extension facilities to implement
EXPLAIN (DEBUG) and EXPLAIN (RANGE_TABLE) as extensions to the core
EXPLAIN facility.
A great deal more could be done here, and the specific choices about
what to print and how are definitely arguable, but this is at least
a starting point for discussion and a jumping-off point for possible
future improvements.
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviweed-by: Andrei Lepikhov <lepihov@gmail.com> (who didn't like it)
Discussion: http://postgr.es/m/CA+TgmoZfvQUBWQ2P8iO30jywhfEAKyNzMZSR+uc2xr9PZBw6eQ@mail.gmail.com