Commit graph

481 commits

Author SHA1 Message Date
Daniel Gustafsson
390b3cbbb2 Protect against small overread in SASLprep validation
In case of torn UTF8 in the input data we might end up going
past the end of the string since we don't account for length.
While validation won't be performed on a sequence with a NULL
byte it's better to avoid going past the end to beging with.
Fix by taking the length into consideration.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAOYmi+mTnmM172g=_+Yvc47hzzeAsYPy2C4UBY3HK9p-AXNV0g@mail.gmail.com
2024-09-10 11:02:28 +02:00
Michael Paquier
c7cd2d6ed0 Define PG_TBLSPC_DIR for path pg_tblspc/ in data folder
Similarly to 2065ddf5e3, this introduces a define for "pg_tblspc".
This makes the style more consistent with the existing PG_STAT_TMP_DIR,
for example.

There is a difference with the other cases with the introduction of
PG_TBLSPC_DIR_SLASH, required in two places for recovery and backups.

Author: Bertrand Drouvot
Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael
Paquier
Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal
2024-09-03 09:11:54 +09:00
Daniel Gustafsson
a70e01d430 Remove support for OpenSSL older than 1.1.0
OpenSSL 1.0.2 has been EOL from the upstream OpenSSL project for
some time, and is no longer the default OpenSSL version with any
vendor which package PostgreSQL. By retiring support for OpenSSL
1.0.2 we can remove a lot of no longer required complexity for
managing state within libcrypto which is now handled by OpenSSL.

Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz
Discussion: https://postgr.es/m/CA+hUKGKh7QrYzu=8yWEUJvXtMVm_CNWH1L_TLWCbZMwbi1XP2Q@mail.gmail.com
2024-09-02 13:51:48 +02:00
Peter Eisentraut
edee0c621d Message style improvements 2024-08-29 14:43:34 +02:00
Peter Eisentraut
f1976df5ea Remove fe_memutils from libpgcommon_shlib
libpq must not use palloc/pfree. It's not allowed to exit on allocation
failure, and mixing the frontend pfree with malloc is architecturally
unsound.

Remove fe_memutils from the shlib build entirely, to keep devs from
accidentally depending on it in the future.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+=pg=W5L1h=3MEP_EB24jaBu2FyATrLXqQHGe7cpuvwyg@mail.gmail.com
2024-08-12 08:30:39 +02:00
Peter Eisentraut
94980c4567 Remove support for old realpath() API
The now preferred way to call realpath() is by passing NULL as the
second argument and get a malloc'ed result.  We still supported the
old way of providing our own buffer as a second argument, for some
platforms that didn't support the new way yet.  Those were only
Solaris less than version 11 and some older AIX versions (7.1 and
newer appear to support the new variant).  We don't support those
platforms versions anymore, so we can remove this extra code.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/9e638b49-5c3f-470f-a392-2cbedb2f7855%40eisentraut.org
2024-08-12 08:04:35 +02:00
Heikki Linnakangas
2676040df0 Make fallback MD5 implementation thread-safe on big-endian systems
Replace a static scratch buffer with a local variable, because a
static buffer makes the function not thread-safe. This function is
used in client-code in libpq, so it needs to be thread-safe. It was
until commit b67b57a966, which replaced the implementation with the
one from pgcrypto.

Backpatch to v14, where we switched to the new implementation.

Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://www.postgresql.org/message-id/dfa2015d-ad21-4802-a4cc-3850fc5fff3f@iki.fi
2024-08-07 10:43:52 +03:00
Heikki Linnakangas
fe8dd65bf2 Mark misc static global variables as const
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
2024-08-06 23:04:48 +03:00
Heikki Linnakangas
85829c973c Make nullSemAction const, add 'const' decorators to related functions
To make it more clear that these should never be modified.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
2024-08-06 23:04:22 +03:00
Peter Eisentraut
c27090bd60 Convert some extern variables to static, Windows code
Similar to 720b0eaae9, discovered by MinGW.
2024-08-01 13:28:32 +02:00
Peter Eisentraut
5d2e1cc117 Replace some strtok() with strsep()
strtok() considers adjacent delimiters to be one delimiter, which is
arguably the wrong behavior in some cases.  Replace with strsep(),
which has the right behavior: Adjacent delimiters create an empty
token.

Affected by this are parsing of:

- Stored SCRAM secrets
  ("SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>")

- ICU collation attributes
  ("und@colStrength=primary;colCaseLevel=yes") for ICU older than
  version 54

- PG_COLORS environment variable
  ("error=01;31:warning=01;35:note=01;36:locus=01")

- pg_regress command-line options with comma-separated list arguments
  (--dbname, --create-role) (currently only used pg_regress_ecpg)

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
2024-07-22 15:45:46 +02:00
Amit Langote
519d710720 Typo fix
Reported-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3KPi=LayiTwJ11ikF7bcqnZUrcj8NgX0V8nO1mQKZ9GfQ@mail.gmail.com
Backpatch-through: 17
2024-07-08 22:12:55 +09:00
David Rowley
1029bdec2d Improve enlargeStringInfo's ERROR message
Until now, when an enlargeStringInfo() call would cause the StringInfo to
exceed its maximum size, we reported an "out of memory" error.  This is
misleading as it's no such thing.

Here we remove the "out of memory" text and replace it with something
more relevant to better indicate that it's a program limitation that's
been reached.

Reported-by: Michael Banck
Reviewed-by: Daniel Gustafsson, Tom Lane
Discussion: https://postgr.es/m/18484-3e357ade5fe50e61@postgresql.org
2024-07-01 12:11:10 +12:00
Peter Eisentraut
02bbc3c83a parse_manifest: Use const char *
This adapts the manifest parsing code to take advantage of the
const-ified jsonapi.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
2024-06-21 07:53:30 +02:00
Peter Eisentraut
15cd9a3881 jsonapi: Use const char *
Apply const qualifiers to char * arguments and fields throughout the
jsonapi.  This allows the top-level APIs such as
pg_parse_json_incremental() to declare their input argument as const.
It also reduces the number of unconstify() calls.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
2024-06-21 07:53:30 +02:00
Peter Eisentraut
0b06bf9fa9 jsonapi: Use size_t
Use size_t instead of int for object sizes in the jsonapi.  This makes
the API better self-documenting.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
2024-06-21 07:53:30 +02:00
Peter Eisentraut
da32f5c4bc jsonapi: Some message style fixes
Add missing punctuation, and un-gettext-mark an internal error.
2024-05-23 09:22:28 +02:00
Peter Eisentraut
cc70e170c0 Make all Perl warnings fatal, catch-up
Apply c538592959 to new Perl files that had missed the note.
2024-05-15 10:10:19 +02:00
David Rowley
8cb653b245 Fix incorrect year in some copyright notices added this year
A few patches that were written <= 2023 and committed in 2024 still
contained 2023 copyright year.  Fix that.

Discussion: https://postgr.es/m/CAApHDvr5egyW3FmHbAg-Uq2p_Aizwco1Zjs6Vbq18KqN64-hRA@mail.gmail.com
2024-05-15 15:01:21 +12:00
Tom Lane
da256a4a7f Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

The pgindent part of this is pretty small, consisting mainly of
fixing up self-inflicted formatting damage from patches that
hadn't bothered to add their new typedefs to typedefs.list.
In order to keep it from making anything worse, I manually added
a dozen or so typedefs that appeared in the existing typedefs.list
but not in the buildfarm's list.  Perhaps we should formalize that,
or better find a way to get those typedefs into the automatic list.

pgperltidy is as opinionated as always, and reformat-dat-files too.
2024-05-14 16:34:50 -04:00
Peter Eisentraut
3ddbac368c Add missing gettext triggers
Commit d6607016c7 moved all the jsonapi.c error messages into
token_error().  This needs to be added to the various nls.mk files
that use this.  Since that makes token_error() effectively a globally
known symbol, the name seems a bit too general, so rename to
json_token_error() for more clarity.
2024-05-14 12:57:22 +02:00
Michael Paquier
855517307d Fix overread in JSON parsing errors for incomplete byte sequences
json_lex_string() relies on pg_encoding_mblen_bounded() to point to the
end of a JSON string when generating an error message, and the input it
uses is not guaranteed to be null-terminated.

It was possible to walk off the end of the input buffer by a few bytes
when the last bytes consist of an incomplete multi-byte sequence, as
token_terminator would point to a location defined by
pg_encoding_mblen_bounded() rather than the end of the input.  This
commit switches token_terminator so as the error uses data up to the
end of the JSON input.

More work should be done so as this code could rely on an equivalent of
report_invalid_encoding() so as incorrect byte sequences can show in
error messages in a readable form.  This requires work for at least two
cases in the JSON parsing API: an incomplete token and an invalid escape
sequence.  A more complete solution may be too invasive for a backpatch,
so this is left as a future improvement, taking care of the overread
first.

A test is added on HEAD as test_json_parser makes this issue
straight-forward to check.

Note that pg_encoding_mblen_bounded() no longer has any callers.  This
will be removed on HEAD with a separate commit, as this is proving to
encourage unsafe coding.

Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+ncM7pwLS3AnKCSmoqqtpjvA8wmCdoBtKA3ZrB2hZG6zA@mail.gmail.com
Backpatch-through: 13
2024-05-09 12:45:37 +09:00
Andrew Dunstan
e00b4f79e7 Remove redundant JSON parser typedefs
JsonNonTerminal and JsonParserSem were added in commit 3311ea86ed

These names of these two enums are not actually used, so there is no
need for typedefs. Instead use plain enums to declare the constants.

Noticed by Alvaro Herera.
2024-04-27 07:02:57 -04:00
Peter Eisentraut
7e44ac3741 Update src/common/unicode/.gitignore
for new downloaded files and new build results.
2024-04-22 09:16:33 +02:00
Daniel Gustafsson
950d4a2cb1 Fix typos and duplicate words
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
2024-04-18 21:28:07 +02:00
Andrew Dunstan
42fa4b6601 Assorted minor cleanups in the test_json_parser module
Per gripes from Michael Paquier

Discussion: https://postgr.es/m/ZhTQ6_w1vwOhqTQI@paquier.xyz

Along the way, also clean up a handful of typos in 3311ea86ed and
ea7b4e9a2a, found by Alexander Lakhin, and a couple of stylistic
snafus noted by Daniel Westermann and Daniel Gustafsson.
2024-04-12 10:32:30 -04:00
Andrew Dunstan
661ab4e185 Fix some memory leaks associated with parsing json and manifests
Coverity complained about not freeing some memory associated with
incrementally parsing backup manifests. To fix that, provide and use a new
shutdown function for the JsonManifestParseIncrementalState object, in
line with a suggestion from Tom Lane.

While analysing the problem, I noticed a buglet in freeing memory for
incremental json lexers. To fix that remove a bogus condition on
freeing the memory allocated for them.
2024-04-12 10:32:30 -04:00
Masahiko Sawada
810f64a015 Revert indexed and enlargable binary heap implementation.
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.

Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
2024-04-11 17:18:05 +09:00
Andrew Dunstan
c3e60f3d7e Silence some compiler warnings in commit 3311ea86ed
Per report from Nathan Bossart
2024-04-05 16:08:40 -04:00
Robert Haas
55a5ee30cd Fix incorrect calculation in BlockRefTableEntryGetBlocks.
The previous formula was incorrect in the case where the function's
nblocks argument was a multiple of BLOCKS_PER_CHUNK, which happens
whenever a relation segment file is exactly 512MB or exactly 1GB in
length. In such cases, the formula would calculate a stop_offset of
0 rather than 65536, resulting in modified blocks in the second half
of a 1GB file, or all the modified blocks in a 512MB file, being
omitted from the incremental backup.

Reported off-list by Tomas Vondra and Jakub Wartak.

Discussion: http://postgr.es/m/CA+TgmoYwy_KHp1-5GYNmVa=zdeJWhNH1T0SBmEuvqQNJEHj1Lw@mail.gmail.com
2024-04-05 13:41:19 -04:00
Andrew Dunstan
88620824c2 Tidy up after incremental JSON parser patch
Remove junk left over from non-vpath builds.

Try to remedy gettext error on some platforms.
2024-04-04 12:41:55 -04:00
Andrew Dunstan
1b00fe30a6 Fix warnings re typedef redefinition in ea7b4e9a2a and 3311ea86ed
Per gripe from Tom Lane and the buildfarm
2024-04-04 11:36:26 -04:00
Andrew Dunstan
ea7b4e9a2a Add support for incrementally parsing backup manifests
This adds the infrastructure for using the new non-recursive JSON parser
in processing manifests. It's important that callers make sure that the
last piece of json handed to the incremental manifest parser contains
the entire last few lines of the manifest, including the checksum.

Author: Andrew Dunstan
Reviewed-By: Jacob Champion

Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
2024-04-04 06:46:40 -04:00
Andrew Dunstan
3311ea86ed Introduce a non-recursive JSON parser
This parser uses an explicit prediction stack, unlike the present
recursive descent parser where the parser state is represented on the
call stack. This difference makes the new parser suitable for use in
incremental parsing of huge JSON documents that cannot be conveniently
handled piece-wise by the recursive descent parser. One potential use
for this will be in parsing large backup manifests associated with
incremental backups.

Because this parser is somewhat slower than the recursive descent
parser, it  is not replacing that parser, but is an additional parser
available to callers.

For testing purposes, if the build is done with -DFORCE_JSON_PSTACK, all
JSON parsing is done with the non-recursive parser, in which case only
trivial regression differences in error messages should be observed.

Author: Andrew Dunstan
Reviewed-By: Jacob Champion

Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
2024-04-04 06:46:40 -04:00
Masahiko Sawada
b840508644 Add functions to binaryheap for efficient key removal and update.
Previously, binaryheap didn't support updating a key and removing a
node in an efficient way. For example, in order to remove a node from
the binaryheap, the caller had to pass the node's position within the
array that the binaryheap internally has. Removing a node from the
binaryheap is done in O(log n) but searching for the key's position is
done in O(n).

This commit adds a hash table to binaryheap in order to track the
position of each nodes in the binaryheap. That way, by using newly
added functions such as binaryheap_update_up() etc., both updating a
key and removing a node can be done in O(1) on an average and O(log n)
in worst case. This is known as the indexed binary heap. The caller
can specify to use the indexed binaryheap by passing indexed = true.

The current code does not use the new indexing logic, but it will be
used by an upcoming patch.

Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
2024-04-03 10:44:21 +09:00
Masahiko Sawada
bcb14f4abc Make binaryheap enlargeable.
The node array space of the binaryheap is doubled when there is no
available space.

Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
2024-04-03 10:27:43 +09:00
Tom Lane
38698dd38e Unwind #if spaghetti in hmac_openssl.c a bit.
Make this code a little less confusing by defining a separate macro
that controls whether we'll use ResourceOwner facilities to track
the existence of a pg_hmac_ctx context.

The proximate reason to touch this is that since b8bff07da, we got
"unused function" warnings if building with older OpenSSL, because
the #if guards around the ResourceOwner wrapper function definitions
were different from those around the calls of those functions.
Pulling the ResourceOwner machinations outside of the #ifdef HAVE_xxx
guards fixes that and makes the code clearer too.

Discussion: https://postgr.es/m/1394271.1712016101@sss.pgh.pa.us
2024-04-02 10:41:44 -04:00
Jeff Davis
46e5441fa5 Add unicode_strtitle() for Unicode Default Case Conversion.
This brings the titlecasing implementation for the builtin provider
out of formatting.c and into unicode_case.c, along with
unicode_strlower() and unicode_strupper(). Accepts an arbitrary word
boundary callback.

Simple for now, but can be extended to support the Unicode Default
Case Conversion algorithm with full case mapping.

Discussion: https://postgr.es/m/3bc653b5d562ae9e2838b11cb696816c328a489a.camel@j-davis.com
Reviewed-by: Peter Eisentraut
2024-03-29 17:35:07 -07:00
Jeff Davis
b0be28761e Run perltidy on generate-unicode_version.pl. 2024-03-27 13:21:29 -07:00
Dean Rasheed
e6341323a8 Add functions to generate random numbers in a specified range.
This adds 3 new variants of the random() function:

    random(min integer, max integer) returns integer
    random(min bigint, max bigint) returns bigint
    random(min numeric, max numeric) returns numeric

Each returns a random number x in the range min <= x <= max.

For the numeric function, the number of digits after the decimal point
is equal to the number of digits that "min" or "max" has after the
decimal point, whichever has more.

The main entry points for these functions are in a new C source file.
The existing random(), random_normal(), and setseed() functions are
moved there too, so that they can all share the same PRNG state, which
is kept private to that file.

Dean Rasheed, reviewed by Jian He, David Zhang, Aleksander Alekseev,
and Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCV89Vxuq93xQdmc0t-0Y2zeeNQTdsjbmV7dyFBPykbV4Q@mail.gmail.com
2024-03-27 10:12:39 +00:00
Jeff Davis
503c0ad976 Fix convert_case(), introduced in 5c40364dd6.
Check source length before checking for NUL terminator to avoid
reading one byte past the string end. Also fix unreachable bug when
caller does not expect NUL-terminated result.

Add unit test coverage of convert_case() in case_test.c, which makes
it easier to reproduce the valgrind failure.

Discussion: https://postgr.es/m/7a9fd36d-7a38-4dc2-e676-fc939491a95a@gmail.com
Reported-by: Alexander Lakhin
2024-03-24 16:28:34 -07:00
Jeff Davis
9acae56ce0 Inline basic UTF-8 functions.
Shows a measurable speedup when processing UTF-8 data, such as with
the new builtin collation provider.

Discussion: https://postgr.es/m/163f4e2190cdf67f67016044e503c5004547e5a9.camel@j-davis.com
Reviewed-by: Peter Eisentraut
2024-03-20 09:40:57 -07:00
Jeff Davis
f9f3fb1cb7 Update src/common/unicode/README.
Change the test description to include the case mapping
test. Oversight in 5c40364dd6.
2024-03-18 16:39:29 -07:00
Daniel Gustafsson
d6607016c7 Support json_errdetail in FRONTEND code
Allocate memory for the error message inside memory owned by the
JsonLexContext and move responsibility away from the caller for
freeing it.  This means that we can partially revert b44669b2ca
as this is now safe to use in FRONTEND code.  The motivation for
this comes from the OAuth and incremental JSON patchsets but it
also adds value on its own.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com
2024-03-17 23:56:15 +01:00
Daniel Gustafsson
b783186515 Add destroyStringInfo function for cleaning up StringInfos
destroyStringInfo() is a counterpart to makeStringInfo(), freeing a
palloc'd StringInfo and its data. This is a convenience function to
align the StringInfo API with the PQExpBuffer API. Originally added
in the OAuth patchset, it was extracted and committed separately in
order to aid upcoming JSON work.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com
2024-03-16 23:18:28 +01:00
Daniel Gustafsson
6b41ef0330 Fix documentation comment for pg_md5_hash
Commit b69aba7457 added the errstr parameter to pg_md5_hash but
missed updating the synopsis in the documentation comment.  The
follow-up commit 587de223f0 added the parameter to the list of
outputs.  The returnvalue had been changed from integer to bool
before that but remained in the synopsis.  This fixes both.

Author: Tatsuro Yamada <tatsuro.yamada@ntt.com>
Discussion: https://postgr.es/m/TYYPR01MB82313576150CC86084A122CD9E292@TYYPR01MB8231.jpnprd01.prod.outlook.com
2024-03-14 09:23:37 +01:00
Robert Haas
2041bc4276 Add the system identifier to backup manifests.
Before this patch, if you took a full backup on server A and then
tried to use the backup manifest to take an incremental backup on
server B, it wouldn't know that the manifest was from a different
server and so the incremental backup operation could potentially
complete without error. When you later tried to run pg_combinebackup,
you'd find out that your incremental backup was and always had been
invalid. That's poor timing, because nobody likes finding out about
backup problems only at restore time.

With this patch, you'll get an error when trying to take the (invalid)
incremental backup, which seems a lot nicer.

Amul Sul, revised by me. Review by Michael Paquier.

Discussion: http://postgr.es/m/CA+TgmoYLZzbSAMM3cAjV4Y+iCRZn-bR9H2+Mdz7NdaJFU1Zb5w@mail.gmail.com
2024-03-13 15:12:33 -04:00
Robert Haas
dbfc447165 Expose new function get_controlfile_by_exact_path().
This works just like get_controlfile(), but expects the path to the
control file rather than the path to the data directory that contains
the control file. This makes more sense in cases where the caller
has already constructed the path to the control file itself.

Amul Sul and Robert Haas, reviewed by Michael Paquier
2024-03-13 12:06:44 -04:00
Michael Paquier
2c8118ee5d Use printf's %m format instead of strerror(errno) in more places
Most callers of strerror() are removed from the backend code.  The
remaining callers require special handling with a saved errno from a
previous system call.  The frontend code still needs strerror() where
error states need to be handled outside of fprintf.

Note that pg_regress is not changed to use %m as the TAP output may
clobber errno, since those functions call fprintf() and friends before
evaluating the format string.

Support for %m in src/port/snprintf.c has been added in d6c55de1f9,
hence all the stable branches currently supported include it.

Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87sf13jhuw.fsf@wibble.ilmari.org
2024-03-12 10:02:54 +09:00
Jeff Davis
33ee2550d3 Fix type signedness error in commit 5c40364dd6.
Use ssize_t instead of size_t.

Discussion: https://postgr.es/m/b20d6d97-7338-48ea-ba33-837a1c8ef98e@iki.fi
Reported-by: Heikki Linnakangas
2024-03-08 16:00:46 -08:00
Daniel Gustafsson
be41a9b038 Fix errorhandling for reading from a pipe
When reading a line from a pipe failed on no data being read, the
errorhandling was erroneously logging with %m even thoug no error
description is available for %m to print.  This flaw accidentally
introduced in 5c7038d70b.

Reported-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/baa34329-f431-46af-bf74-1a78fdc90e4f@eisentraut.org
2024-03-08 22:53:06 +01:00
Daniel Gustafsson
6929e133b3 Replace perror with custom postgres logging
perror() is not used in postgres anymore out of policy, this replaces
the final callsites with the custom postgres logging framework.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/89B00F63-40F7-4D82-8353-DC9CABBAC1D1@yesql.se
2024-03-08 22:50:20 +01:00
Jeff Davis
5c40364dd6 Unicode case mapping tables and functions.
Implements Unicode simple case mapping, in which all code points map
to exactly one other code point unconditionally.

These tables are generated from UnicodeData.txt, which is already
being used by other infrastructure in src/common/unicode. The tables
are checked into the source tree, so they only need to be regenerated
when we update the Unicode version.

In preparation for the builtin collation provider, and possibly useful
for other callers.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Peter Eisentraut, Daniel Verite, Jeremy Schneider
2024-03-07 11:15:06 -08:00
Jeff Davis
ad49994538 Add Unicode property tables.
Provide functions to test for Unicode properties, such as Alphabetic
or Cased. These functions use tables derived from Unicode data files,
similar to the tables for Unicode normalization or general category,
and those tables can be updated with the 'update-unicode' build
target.

Use Unicode properties to provide functions to test for regex
character classes, like 'punct' or 'alnum'.

Infrastructure in preparation for a builtin collation provider, and
may also be useful for other callers.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Verite, Peter Eisentraut, Jeremy Schneider
2024-03-06 12:50:01 -08:00
Heikki Linnakangas
024c521117 Replace BackendIds with 0-based ProcNumbers
Now that BackendId was just another index into the proc array, it was
redundant with the 0-based proc numbers used in other places. Replace
all usage of backend IDs with proc numbers.

The only place where the term "backend id" remains is in a few pgstat
functions that expose backend IDs at the SQL level. Those IDs are now
in fact 0-based ProcNumbers too, but the documentation still calls
them "backend ids". That term still seems appropriate to describe what
the numbers are, so I let it be.

One user-visible effect is that pg_temp_0 is now a valid temp schema
name, for backend with ProcNumber 0.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
2024-03-03 19:38:22 +02:00
Michael Paquier
655dc31046 Simplify pg_enc2gettext_tbl[] with C99-designated initializer syntax
This commit switches pg_enc2gettext_tbl[] in encnames.c to use a
C99-designated initializer syntax.

pg_bind_textdomain_codeset() is simplified so as it is possible to do
a direct lookup at the gettext() array with a value of the enum pg_enc
rather than doing a loop through all its elements, as long as the
encoding value provided by GetDatabaseEncoding() is in the correct range
of supported encoding values.  Note that PG_MULE_INTERNAL gains a value
in the array, pointing to NULL.

Author: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-03-01 18:03:48 +09:00
Michael Paquier
ada87a4d95 Use C99-designated initializer syntax for arrays related to encodings
This updates the following lookup arrays to use C99-designated
initializer syntax, indexed based on the enum pg_enc:
pg_enc2icu_tbl[]
pg_enc2name_tbl[]
pg_wchar_table[]

This is more readable, and removes problems with ordering mistakes as
this removes dependencies between the arrays and their lookup index in
the enum pg_enc.  So, adding new encodings becomes easier, even if this
does not happen often.

Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Japin Li
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-02-29 09:54:25 +09:00
Michael Paquier
afd8ef3909 Use C99-designated initializer syntax for more arrays
This is in the same spirit as ef5e2e9085, updating this time some
arrays in parser.c, relpath.c, guc_tables.c and pg_dump_sort.c so as the
order of their elements has no need to match the enum structures they
are based on anymore.

Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Japin Li
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-02-28 08:42:36 +09:00
Daniel Gustafsson
5c7038d70b Refactor pipe_read_line to return the full line
Commit 5b2f4afffe refactored find_other_exec() and in the process
created pipe_read_line() into a static routine for reading a single
line of output, aimed at reading version numbers.  Commit a7e8ece41
later exposed it externally in order to read a postgresql.conf GUC
using "postgres -C ..".  Further, f06b1c598 also made use of it for
reading a version string much like find_other_exec().  The internal
variable remained "pgver", even when used for other purposes.

Since the function requires passing a buffer and its size, and at
most size - 1 bytes will be read via fgets(), there is a truncation
risk when using this for reading GUCs (like how pg_rewind does,
though the risk in this case is marginal).

To keep this as generic functionality for reading a line from a pipe,
this refactors pipe_read_line() into returning an allocated buffer
containing all of the line to remove the risk of silent truncation.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/DEDF73CE-D528-49A3-9089-B3592FD671A9@yesql.se
2024-02-09 15:03:16 +01:00
Nathan Bossart
97287bdfae Move is_valid_ascii() to ascii.h.
This function requires simd.h, which is a rather large dependency
for a widely-used header file like pg_wchar.h.  Furthermore, there
is a report of a third-party tool that is struggling to use
pg_wchar.h due to its dependence on simd.h (presumably because
simd.h uses several intrinsics).  Moving the function to the much
less popular ascii.h resolves these issues for now.

This commit is back-patched for the benefit of the aforementioned
third-party tool.  The simd.h dependency was only added in v16,
but we've opted to back-patch to v15 so that is_valid_ascii() lives
in the same file for all versions where it exists.  This could
break existing third-party code that uses the function, but we
couldn't find any examples of such code.  It should be possible to
fix any code that this commit breaks by including ascii.h in the
file that uses is_valid_ascii().

Author: Jubilee Young
Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge
Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com
Backpatch-through: 15
2024-01-29 12:08:57 -06:00
Jeff Davis
cf64d4e99f Cleanup for unicode-update build target and test.
In preparation for adding more Unicode tables.

Discussion: https://postgr.es/m/63cd8625-68fa-4760-844a-6b7f643336f2@ardentperf.com
Reviewed-by: Jeremy Schneider
2024-01-11 12:35:29 -08:00
Robert Haas
3d5c332a3d Repair various defects in dc21234005.
pg_combinebackup had various problems:

* strncpy was used in various places where strlcpy should be used
  instead, to avoid any possibility of the result not being
  \0-terminated.
* scan_for_existing_tablespaces() failed to close the directory,
  and an error when opening the directory was reported with the
  wrong pathname.
* write_reconstructed_file() contained some redundant and therefore
  dead code.
* flush_manifest() didn't check the result of pg_checksum_update()
  as we do in other places, and misused a local pathname variable
  that shouldn't exist at all.

In pg_basebackup, the wrong variable name was used in one place,
due to a copy and paste that was not properly adjusted.

In blkreftable.c, the loop incorrectly doubled chunkno instead of
max_chunks. Fix that. Also remove a nearby assertion per repeated
off-list complaints from Tom Lane.

Per Coverity and subsequent code inspection by me and by Tom Lane.

Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com
2024-01-11 13:06:10 -05:00
Bruce Momjian
29275b1d17 Update copyright for 2024
Reported-by: Michael Paquier

Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz

Backpatch-through: 12
2024-01-03 20:49:05 -05:00
Peter Eisentraut
c538592959 Make all Perl warnings fatal
There are a lot of Perl scripts in the tree, mostly code generation
and TAP tests.  Occasionally, these scripts produce warnings.  These
are probably always mistakes on the developer side (true positives).
Typical examples are warnings from genbki.pl or related when you make
a mess in the catalog files during development, or warnings from tests
when they massage a config file that looks different on different
hosts, or mistakes during merges (e.g., duplicate subroutine
definitions), or just mistakes that weren't noticed because there is a
lot of output in a verbose build.

This changes all warnings into fatal errors, by replacing

    use warnings;

by

    use warnings FATAL => 'all';

in all Perl files.

Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org
2023-12-29 18:20:00 +01:00
Tom Lane
bad0763a4d Fix erroneous -Werror=missing-braces on old GCC.
In the same spirit as 5e0c761d0 and some earlier commits,
suppress a chorus of buildfarm warnings about braces in
these initializers.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs48GzM-Ff7vr=_CeqaXxFBB9UntqtaW1cjU8hOo62AbOOg@mail.gmail.com
2023-12-24 23:36:33 -05:00
Robert Haas
49f2194ed5 Fix numerous typos in incremental backup commits.
Apparently, spell check would have been a really good idea.

Alexander Lakhin, with a few additions as per an off-list report
from Andres Freund.

Discussion: http://postgr.es/m/f08f7c60-1ad3-0b57-d580-54b11f07cddf@gmail.com
2023-12-21 15:36:17 -05:00
Robert Haas
174c480508 Add a new WAL summarizer process.
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.

In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.

A new parameter summarize_wal enables or disables this new background
process.  The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.

Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.

Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
2023-12-20 08:42:28 -05:00
Michael Paquier
1301c80b21 Remove MSVC scripts
This commit removes all the scripts located in src/tools/msvc/ to build
PostgreSQL with Visual Studio on Windows, meson becoming the recommended
way to achieve that.  The scripts held some information that is still
relevant with meson, information kept and moved to better locations.
Comments that referred directly to the scripts are removed.

All the documentation still relevant that was in install-windows.sgml
has been moved to installation.sgml under a new subsection for Visual.
All the content specific to the scripts is removed.  Some adjustments
for the documentation are planned in a follow-up set of changes.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://postgr.es/m/ZQzp_VMJcerM1Cs_@paquier.xyz
2023-12-20 09:44:37 +09:00
Robert Haas
aafc07c7a1 Move src/bin/pg_verifybackup/parse_manifest.c into src/common.
This makes it possible for the code to be easily reused by other
client-side tools, and/or by the server.

Patch by me. Review of this patch in particular by at least Peter
Eisentraut; reviewers for the patch series in general include Dilip
Kumar, Andres Fruend, David Steele, Álvaro Herrera, and Jakub Wartak.

Discussion: http://postgr.es/m/CA+TgmoZ6UGZVnSy5iak6s6+AXu_DewXovDjhLs3-su6nmU_x_g@mail.gmail.com
2023-12-19 15:24:31 -05:00
Thomas Munro
0c6be59f5e Provide helper for retrying partial vectored I/O.
compute_remaining_iovec() is a re-usable routine for retrying after
pg_readv() or pg_writev() reports a short transfer.  This will gain new
users in a later commit, but can already replace the open-coded
equivalent code in the existing pg_pwritev_with_retry() function.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
2023-12-12 10:57:18 +13:00
Jeff Davis
719b342d36 Shrink Unicode category table.
Missing entries can implicitly be considered "unassigned".

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com
2023-12-07 15:44:03 -08:00
Michael Paquier
14f2f9eb1a Add CHECK_FOR_INTERRUPTS() in scram_SaltedPassword() for the backend
scram_SaltedPassword() could take a long time to compute when the number
of iterations used is large enough, and this code uses a tight loop to
compute a salted password.

Note that the same issue exists in libpq when using \password and a
large iteration number, but this cannot be interrupted.  A CFI in the
backend is useful for server-side computations, at least.

Backpatch down to 16, where the user-settable GUC scram_iterations has
been added.

Author: Bowen Shi
Reviewed-by: Aleksander Alekseev, Daniel Gustafsson
Discussion: https://postgr.es/m/CAM_vCueV6xfr08KczfaCEk5J_qeTZtgqN7+orkNLx=g+phE82Q@mail.gmail.com
Backpatch-through: 16
2023-11-28 08:35:50 +09:00
Heikki Linnakangas
b8bff07daa Make ResourceOwners more easily extensible.
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.

The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.

All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.

The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.

Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget.  There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.

Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.

There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.

Add tests in src/test/modules/test_resowner.

Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
2023-11-08 13:30:50 +02:00
Peter Eisentraut
721856ff24 Remove distprep
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, and well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distprep.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org
2023-11-06 15:18:04 +01:00
Peter Eisentraut
2c7c6c417f More consistent behavior of GetDataDirectoryCreatePerm on Windows
On Windows, GetDataDirectoryCreatePerm() just did nothing.  The way
the code in some callers is structured, this is the first function
that tries to access the data directory.  So it also ends up the place
that is responsible for reporting that a data directory does not exist
or similar.  Therefore, on Windows, these scenarios end up on
potentially completely different code paths.

To unify this, to make testing more consistent across platforms, have
GetDataDirectoryCreatePerm() run the stat() call on Windows as well,
even though it won't do anything with the result.  That way, file
system errors are reporting to callers in the same way as on
non-Windows.

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/15a59bca-0383-183c-9383-0446da9b87e1%40eisentraut.org
2023-11-05 21:59:04 +01:00
Jeff Davis
a02b37fc08 Additional unicode primitive functions.
Introduce unicode_version(), icu_unicode_version(), and
unicode_assigned().

The latter requires introducing a new lookup table for the Unicode
General Category, which is generated along with the other Unicode
lookup tables.

Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com
Reviewed-by: Peter Eisentraut
2023-11-01 22:47:06 -07:00
Peter Eisentraut
611806cd72 Add trailing commas to enum definitions
Since C99, there can be a trailing comma after the last value in an
enum definition.  A lot of new code has been introducing this style on
the fly.  Some new patches are now taking an inconsistent approach to
this.  Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one.  We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.

I omitted a few places where there was a fixed "last" value that will
always stay last.  I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers.  There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.

Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-26 09:20:54 +02:00
David Rowley
f0efa5aec1 Introduce the concept of read-only StringInfos
There were various places in our codebase which conjured up a StringInfo
by manually assigning the StringInfo fields and setting the data field
to point to some existing buffer.  There wasn't much consistency here as
to what fields like maxlen got set to and in one location we didn't
correctly ensure that the buffer was correctly NUL terminated at len
bytes, as per what was documented as required in stringinfo.h

Here we introduce 2 new functions to initialize StringInfos.  One allows
callers to initialize a StringInfo passing along a buffer that is
already allocated by palloc.  Here the StringInfo code uses this buffer
directly rather than doing any memcpying into a new allocation.  Having
this as a function allows us to verify the buffer is correctly NUL
terminated.  StringInfos initialized this way can be appended to and
reset just like any other normal StringInfo.

The other new initialization function also accepts an existing buffer,
but the given buffer does not need to be a pointer to a palloc'd chunk.
This buffer could be a pointer pointing partway into some palloc'd chunk
or may not even be palloc'd at all.  StringInfos initialized this way
are deemed as "read-only".  This means that it's not possible to
append to them or reset them.

For the latter of the two new initialization functions mentioned above,
we relax the requirement that the data buffer must be NUL terminated.
Relaxing this requirement is convenient in a few places as it can save
us from having to allocate an entire new buffer just to add the NUL
terminator or save us from having to temporarily add a NUL only to have to
put the original char back again later.

Incompatibility note:

Here we also forego adding the NUL in a few places where it does not
seem to be required.  These locations are passing the given StringInfo
into a type's receive function.  It does not seem like any of our
built-in receive functions require this, but perhaps there's some UDT
out there in the wild which does require this.  It is likely worthy of
a mention in the release notes that a UDT's receive function mustn't rely
on the input StringInfo being NUL terminated.

Author: David Rowley
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com
2023-10-26 16:31:48 +13:00
Tom Lane
9b103f861e Improve pglz_decompress's defenses against corrupt compressed data.
When processing a match tag, check to see if the claimed "off"
is more than the distance back to the output buffer start.
If it is, then the data is corrupt, and what's more we would
fetch from outside the buffer boundaries and potentially incur
a SIGSEGV.  (Although the odds of that seem relatively low, given
that "off" can't be more than 4K.)

Back-patch to v13; before that, this function wasn't really
trying to protect against bad data.

Report and fix by Flavien Guedez.

Discussion: https://postgr.es/m/01fc0593-e31e-463d-902c-dd43174acee2@oopacity.net
2023-10-18 20:43:27 -04:00
Thomas Munro
63a582222c Try to handle torn reads of pg_control in frontend.
Some of our src/bin tools read the control file without any kind of
interlocking against concurrent writes from the server.  At least ext4
and ntfs can expose partially modified contents when you do that.

For now, we'll try to tolerate this by retrying up to 10 times if the
checksum doesn't match, until we get two reads in a row with the same
bad checksum.  This is not guaranteed to reach the right conclusion, but
it seems very likely to.  Thanks to Tom Lane for this suggestion.

Various ideas for interlocking or atomicity were considered too
complicated, unportable or expensive given the lack of field reports,
but remain open for future reconsideration.

Back-patch as far as 12.  It doesn't seem like a good idea to put a
heuristic change for a very rare problem into the final release of 11.

Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru>
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
2023-10-16 17:33:08 +13:00
Tom Lane
b6c7cfac88 Restore proper linkage of pg_char_to_encoding() and friends.
Back in the 8.3 era we discovered that it was problematic if
libpq.so had encoding ID assignments different from the backend,
which is possible because on some platforms libpq.so might be
of a different major version from the calling programs.
psql should use libpq's assignments, but initdb has to use the
backend's, else it will put wrong values into pg_database.
The solution devised in commit 8468146b0 relied on giving initdb
its own copy of encnames.c rather than relying on the functions
exported by libpq.  Later, that metamorphosed into ensuring that
libpgcommon got linked before libpq -- which made things OK for
initdb but broke psql.  We didn't notice for lack of any changes
in enum pg_enc since then.  Commit 06843df4a reversed that, fixing
the latent bug in psql but adding one in initdb.  The meson build
infrastructure is also not being sufficiently careful about link
order, and trying to make it so would be equally fragile.

Hence, let's use a new scheme based on giving the libpq-exported
symbols different real names than the same functions exported from
libpgcommon.a or libpgcommon_srv.a.  (We could distinguish those
two cases as well, but there seems no need to.)  libpq gets the
official names to avoid an ABI break for libpq clients, while the
other cases use #define's to make the real names "xxx_private"
rather than "xxx".  By controlling where the #define's are
applied, we can force any particular client program to use one
set or the other of the encnames.c functions.

We cannot back-patch this, since it'd be an ABI break for backend
loadable modules, but there seems little need to.  We're just
trying to ensure that the world is safe for hypothetical future
additions to enum pg_enc.

In passing this should fix "duplicate symbol" linker warnings
that we've been seeing on AIX buildfarm members since commit
06843df4a.  It's not very clear why that linker is complaining
now, when there were strictly *more* duplicates visible before,
but in any case this should remove the reason for complaint.

Patch by me; thanks to Andres Freund for review.

Discussion: https://postgr.es/m/2385119.1696354473@sss.pgh.pa.us
2023-10-07 12:08:10 -04:00
Alvaro Herrera
1c99cde2f3
Improve JsonLexContext's freeability
Previously, the JSON code didn't have to worry too much about freeing
JsonLexContext, because it was never too long-lived.  With new features
being added for SQL/JSON this is no longer the case.  Add a routine
that knows how to free this struct and apply that to a few places, to
prevent this from becoming problematic.

At the same time, we change the API of makeJsonLexContextCstringLen to
make it receive a pointer to JsonLexContext for callers that want it to
be stack-allocated; it can also be passed as NULL to get the original
behavior of a palloc'ed one.

This also causes an ABI break due to the addition of flags to
JsonLexContext, so we can't easily backpatch it.  AFAICS that's not much
of a problem; apparently some leaks might exist in JSON usage of
text-search, for example via json_to_tsvector, but I haven't seen any
complaints about that.

Per Coverity complaint about datum_to_jsonb_internal().

Discussion: https://postgr.es/m/20230808174110.oq3iymllsv6amkih@alvherre.pgsql
2023-10-05 10:59:08 +02:00
Nathan Bossart
c103d07381 Add function for removing arbitrary nodes in binaryheap.
This commit introduces binaryheap_remove_node(), which can be used
to remove any node from a binary heap.  The implementation is
straightforward.  The target node is replaced with the last node in
the heap, and then we sift as needed to preserve the heap property.
This new function is intended for use in a follow-up commit that
will improve the performance of pg_restore.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us
2023-09-18 14:06:08 -07:00
Nathan Bossart
5af0263afd Make binaryheap available to frontend code.
There are a couple of places in frontend code that could make use
of this simple binary heap implementation.  This commit makes
binaryheap usable in frontend code, much like commit 26aaf97b68 did
for StringInfo.  Like StringInfo, the header file is left in lib/
to reduce the likelihood of unnecessary breakage.

The frontend version of binaryheap exposes a void *-based API since
frontend code does not have access to the Datum definitions.  This
seemed like a better approach than switching all existing uses to
void * or making the Datum definitions available to frontend code.

Reviewed-by: Tom Lane, Alvaro Herrera
Discussion: https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us
2023-09-18 12:18:33 -07:00
Peter Eisentraut
9d17e5f16f Update Unicode data to Unicode 15.1.0 2023-09-18 07:26:34 +02:00
Peter Eisentraut
5c08927d36 Make Unicode script fit for future versions
Between Unicode 15.0.0 and 15.1.0, the whitespace in
EastAsianWidth.txt has changed a bit, such as from

0020;Na          # Zs         SPACE

to

0020           ; Na # Zs         SPACE

with space around the semicolon.  Adjust the script to be able to
parse that.
2023-09-18 07:25:46 +02:00
Nathan Bossart
cccc6cdeb3 Add support for syncfs() in frontend support functions.
This commit adds support for using syncfs() in fsync_pgdata() and
fsync_dir_recurse() (which have been renamed to sync_pgdata() and
sync_dir_recurse()).  Like recovery_init_sync_method,
sync_pgdata() calls syncfs() for the data directory, each
tablespace, and pg_wal (if it is a symlink).  For now, all of the
frontend utilities that use these support functions are hard-coded
to use fsync(), but a follow-up commit will allow specifying
syncfs().

Co-authored-by: Justin Pryzby
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20210930004340.GM831%40telsasoft.com
2023-09-06 16:27:00 -07:00
Michael Paquier
f1e9f6bbfa Avoid memory leak in rmtree() when path cannot be opened
An allocation done for the directory names to recurse into for their
deletion was done before OPENDIR(), so, assuming that a failure happens,
this could leak a bit of memory.

Author: Ranier Vilela
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/CAEudQAoN3-2ZKBALThnEk_q2hu8En5A0WG9O+5siJTQKVZzoWQ@mail.gmail.com
2023-07-31 11:36:44 +09:00
Michael Paquier
fa88928470 Generate automatically code and documentation related to wait events
The documentation and the code is generated automatically from a new
file called wait_event_names.txt, formatted in sections dedicated to
each wait event class (Timeout, Lock, IO, etc.) with three tab-separated
fields:
- C symbol in enums
- Format in the system views
- Description in the docs

Using this approach has several advantages, as we have proved to be
rather bad in maintaining this area of the tree across the years:
- The order of each item in the documentation and the code, which should
be alphabetical, has become incorrect multiple times, and the script
generating the code and documentation has a few rules to enforce that,
making the maintenance a no-brainer.
- Some wait events were added to the code, but not documented, so this
cannot be missed now.
- The order of the tables for each wait event class is enforced in the
documentation (the input .txt file does so as well for clarity, though
this is not mandatory).
- Less code, shaving 1.2k lines from the tree, with 1/3 of the savings
coming from the code, the rest from the documentation.

The wait event types "Lock" and "LWLock" still have their own code path
for their code, hence only the documentation is created for them.  These
classes are listed with a special marker called WAIT_EVENT_DOCONLY in
the input file.

Adding a new wait event now requires only an update of
wait_event_names.txt, with "Lock" and "LWLock" treated as exceptions.

This commit has been tested with configure/Makefile, the CI and VPATH
build.  clean, distclean and maintainer-clean were working fine.

Author: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/77a86b3a-c4a8-5f5d-69b9-d70bbf2e9b98@gmail.com
2023-07-05 10:53:11 +09:00
Andres Freund
a1cd982098 meson: Add dependencies to perl modules to various script invocations
Eventually it is likely worth trying to deal with this in a more expansive
way, by generating dependency files generated within the scripts. But it's not
entirely obvious how to do that in perl and is work more suitable for 17
anyway.

Reported-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Discussion: https://postgr.es/m/87v8g7s6bf.fsf@wibble.ilmari.org
2023-06-09 20:12:16 -07:00
Tom Lane
d98ed080bb Fix small overestimation of base64 encoding output length.
pg_base64_enc_len() and its clones overestimated the output
length by up to 2 bytes, as a result of sloppy thinking about
where to divide.  No callers require a precise estimate, so
this has no consequences worse than palloc'ing a byte or two
more than necessary.  We might as well get it right though.

This bug is very ancient, dating to commit 79d78bb26 which
added encode.c.  (The other instances were presumably copied
from there.)  Still, it doesn't quite seem worth back-patching.

Oleg Tselebrovskiy

Discussion: https://postgr.es/m/f94da55286a63022150bc266afdab754@postgrespro.ru
2023-06-08 11:24:31 -04:00
Tom Lane
0245f8db36 Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

This set of diffs is a bit larger than typical.  We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop).  We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up.  Going
forward, that should make for fewer random-seeming changes to existing
code.

Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
2023-05-19 17:24:48 -04:00
Peter Eisentraut
8e7912e73d Message style improvements 2023-05-19 18:45:29 +02:00
Peter Eisentraut
009bd237bf Fix error message wordings
The original patch for percentrepl.c c96de2ce17 adopted the error
messages from basebackup_to_shell, but that uses terminology that
doesn't really fit with the new API naming.
2023-05-17 21:33:47 +02:00
Thomas Munro
faeedbcefd Introduce PG_IO_ALIGN_SIZE and align all I/O buffers.
In order to have the option to use O_DIRECT/FILE_FLAG_NO_BUFFERING in a
later commit, we need the addresses of user space buffers to be well
aligned.  The exact requirements vary by OS and file system (typically
sectors and/or memory pages).  The address alignment size is set to
4096, which is enough for currently known systems: it matches modern
sectors and common memory page size.  There is no standard governing
O_DIRECT's requirements so we might eventually have to reconsider this
with more information from the field or future systems.

Aligning I/O buffers on memory pages is also known to improve regular
buffered I/O performance.

Three classes of I/O buffers for regular data pages are adjusted:
(1) Heap buffers are now allocated with the new palloc_aligned() or
MemoryContextAllocAligned() functions introduced by commit 439f6175.
(2) Stack buffers now use a new struct PGIOAlignedBlock to respect
PG_IO_ALIGN_SIZE, if possible with this compiler.  (3) The buffer
pool is also aligned in shared memory.

WAL buffers were already aligned on XLOG_BLCKSZ.  It's possible for
XLOG_BLCKSZ to be configured smaller than PG_IO_ALIGNED_SIZE and thus
for O_DIRECT WAL writes to fail to be well aligned, but that's a
pre-existing condition and will be addressed by a later commit.

BufFiles are not yet addressed (there's no current plan to use O_DIRECT
for those, but they could potentially get some incidental speedup even
in plain buffered I/O operations through better alignment).

If we can't align stack objects suitably using the compiler extensions
we know about, we disable the use of O_DIRECT by setting PG_O_DIRECT to
0.  This avoids the need to consider systems that have O_DIRECT but
can't align stack objects the way we want; such systems could in theory
be supported with more work but we don't currently know of any such
machines, so it's easier to pretend there is no O_DIRECT support
instead.  That's an existing and tested class of system.

Add assertions that all buffers passed into smgrread(), smgrwrite() and
smgrextend() are correctly aligned, unless PG_O_DIRECT is 0 (= stack
alignment tricks may be unavailable) or the block size has been set too
small to allow arrays of buffers to be all aligned.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKGK1X532hYqJ_MzFWt0n1zt8trz980D79WbjwnT-yYLZpg@mail.gmail.com
2023-04-08 16:34:50 +12:00
Tomas Vondra
2820adf775 Support long distance matching for zstd compression
zstd compression supports a special mode for finding matched in distant
past, which may result in better compression ratio, at the expense of
using more memory (the window size is 128MB).

To enable this optional mode, use the "long" keyword when specifying the
compression method (--compress=zstd:long).

Author: Justin Pryzby
Reviewed-by: Tomas Vondra, Jacob Champion
Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com
Discussion: https://postgr.es/m/20220327205020.GM28503@telsasoft.com
2023-04-06 17:18:42 +02:00
Daniel Gustafsson
b577743000 Make SCRAM iteration count configurable
Replace the hardcoded value with a GUC such that the iteration
count can be raised in order to increase protection against
brute-force attacks.  The hardcoded value for SCRAM iteration
count was defined to be 4096, which is taken from RFC 7677, so
set the default for the GUC to 4096 to match.  In RFC 7677 the
recommendation is at least 15000 iterations but 4096 is listed
as a SHOULD requirement given that it's estimated to yield a
0.5s processing time on a mobile handset of the time of RFC
writing (late 2015).

Raising the iteration count of SCRAM will make stored passwords
more resilient to brute-force attacks at a higher computational
cost during connection establishment.  Lowering the count will
reduce computational overhead during connections at the tradeoff
of reducing strength against brute-force attacks.

There are however platforms where even a modest iteration count
yields a too high computational overhead, with weaker password
encryption schemes chosen as a result.  In these situations,
SCRAM with a very low iteration count still gives benefits over
weaker schemes like md5, so we allow the iteration count to be
set to one at the low end.

The new GUC is intentionally generically named such that it can
be made to support future SCRAM standards should they emerge.
At that point the value can be made into key:value pairs with
an undefined key as a default which will be backwards compatible
with this.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Discussion: https://postgr.es/m/F72E7BC7-189F-4B17-BF47-9735EB72C364@yesql.se
2023-03-27 09:46:29 +02:00
Tom Lane
11a0a8b529 Implement find_my_exec()'s path normalization using realpath(3).
Replace the symlink-chasing logic in find_my_exec with realpath(3),
which has been required by POSIX since SUSv2.  (Windows lacks
realpath(), but there we can use _fullpath() which is functionally
equivalent.)  The main benefit of this is that -- on all modern
platforms at least -- realpath() avoids the chdir() shenanigans
we used to perform while interpreting symlinks.  That had various
corner-case failure modes so it's good to get rid of it.

There is still ongoing discussion about whether we could skip the
replacement of symlinks in some cases, but that's really matter
for a separate patch.  Meanwhile I want to push this before we get
too close to feature freeze, so that we can find out if there are
showstopper portability issues.

Discussion: https://postgr.es/m/797232.1662075573@sss.pgh.pa.us
2023-03-23 18:17:49 -04:00
Tom Lane
b0d8f2d983 Add SHELL_ERROR and SHELL_EXIT_CODE magic variables to psql.
These are set after a \! command or a backtick substitution.
SHELL_ERROR is just "true" for error (nonzero exit status) or "false"
for success, while SHELL_EXIT_CODE records the actual exit status
following standard shell/system(3) conventions.

Corey Huinker, reviewed by Maxim Orlov and myself

Discussion: https://postgr.es/m/CADkLM=cWao2x2f+UDw15W1JkVFr_bsxfstw=NGea7r9m4j-7rQ@mail.gmail.com
2023-03-21 13:03:56 -04:00
Andres Freund
2b7259f855 Silence pedantic compiler warning introduced in ce340e530d
.../src/common/file_utils.c: In function ‘pg_pwrite_zeros’:
.../src/common/file_utils.c:543:9: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
  543 |         const static PGAlignedBlock zbuffer = {{0}};    /* worth BLCKSZ */
2023-03-16 09:41:13 -07:00