Improve performance and reduce table sizes for case mapping.
The main case mapping table stores only 16-bit offsets, which can be
used to look up the mapped code point in any of the case tables (fold,
lower, upper, or title case). Simple case pairs point to the same
offsets.
Generate a function in generate-unicode_case_table.pl that consists of
a nested branches to test for specific codepoint ranges that determine
the offset in the main table.
Other approaches were considered, such as representing these ranges as
another structure (rather than branches in a generated function), or a
different approach such as a radix tree, or perfect hashing. The
author implemented and tested these alternatives and settled on the
generated branches.
Author: Alexander Borisov <lex.borisov@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com
We want to support a "noreturn" decoration on more compilers besides
just GCC-compatible ones, but for that we need to move the decoration
in front of the function declaration instead of either behind it or
wherever, which is the current style afforded by GCC-style attributes.
Also rename the macro to "pg_noreturn" to be similar to the C11
standard "noreturn".
pg_noreturn is now supported on all compilers that support C11 (using
_Noreturn), as well as GCC-compatible ones (using __attribute__, as
before), as well as MSVC (using __declspec). (When PostgreSQL
requires C11, the latter two variants can be dropped.)
Now, all supported compilers effectively support pg_noreturn, so the
extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped.
This also fixes a possible problem if third-party code includes
stdnoreturn.h, because then the current definition of
#define pg_attribute_noreturn() __attribute__((noreturn))
would cause an error.
Note that the C standard does not support a noreturn attribute on
function pointer types. So we have to drop these here. There are
only two instances at this time, so it's not a big loss. In one case,
we can make up for it by adding the pg_noreturn to a wrapper function
and adding a pg_unreachable(), in the other case, the latter was
already done before.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
Upcoming optimizations will add complexity to convert_case(). This
patch reorganizes slightly so that the complexity can be contained
within the logic to convert the case of a single character, rather
than mixing it in with logic to iterate through the string.
Reviewed-by: Alexander Borisov <lex.borisov@gmail.com>
Discussion: https://postgr.es/m/44005c3d-88f4-4a26-981f-fd82dfa8e313@gmail.com
For AIO, and also some other recent patches, we need the ability to call
relpath() in a critical section. Until now that was not feasible, as it
allocated memory.
The fact that relpath() allocated memory also made it awkward to use in log
messages because we had to take care to free the memory afterwards. Which we
e.g. didn't do for when zeroing out an invalid buffer.
We discussed other solutions, e.g. filling a pre-allocated buffer that's
passed to relpath(), but they all came with plenty downsides or were larger
projects. The easiest fix seems to be to make relpath() return the path by
value.
To be able to return the path by value we need to determine the maximum length
of a relation path. This patch adds a long #define that computes the exact
maximum, which is verified to be correct in a regression test.
As this change the signature of relpath(), extensions using it will need to
adapt their code. We discussed leaving a backward-compat shim in place, but
decided it's not worth it given the use of relpath() doesn't seem widespread.
Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
There are cases where we cannot / do not want to error out for invalidly
encoded input. In such cases it can be useful to replace e.g. an incomplete
multi-byte characters with bytes that will trigger an error when getting
validated as part of a larger string.
Unfortunately, until now, for some encoding no such sequence existed. For
those encodings this commit removes one previously accepted input combination
- we consider that to be ok, as the chosen bytes are outside of the valid
ranges for the encodings, we just previously failed to detect that.
As we cannot add a new field to pg_wchar_table without breaking ABI, this is
implemented "in-line" in the newly added function.
Author: Noah Misch <noah@leadboat.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Backpatch-through: 13
Security: CVE-2025-1094
Previously StringInfo APIs allocated buffers with fixed initial
allocation size of 1024 bytes. This may be too large and inappropriate
for some callers that can do with smaller memory buffers. To fix this,
introduce new APIs that allow callers to specify initial buffer size.
extern StringInfo makeStringInfoExt(int initsize);
extern void initStringInfoExt(StringInfo str, int initsize);
Existing APIs (makeStringInfo() and initStringInfo()) are changed to
call makeStringInfoExt and initStringInfoExt respectively (via inline
helper functions makeStringInfoInternal and initStringInfoInternal),
with the default buffer size of 1024.
Reviewed-by: Nathan Bossart, David Rowley, Michael Paquier, Gurjeet Singh
Discussion: https://postgr.es/m/20241225.123704.1194662271286702010.ishii%40postgresql.org
This routine documented that "iterations" would use a default value if
set to 0 by the caller. However, the iteration should always be set by
the caller to a value strictly more than 0, as documented by an
assertion.
Oversight in b577743000, that has made the iteration count of SCRAM
configurable.
Author: Matheus Alcantara
Discussion: https://postgr.es/m/ac858943-4743-44cd-b4ad-08a0c10cbbc8@gmail.com
Backpatch-through: 16
Commit 0785d1b8b adds support for libpq as a JSON client, but
allocations for string tokens can still be leaked during parsing
failures. This is tricky to fix for the object_field semantic callbacks:
the field name must remain valid until the end of the object, but if a
parsing error is encountered partway through, object_field_end() won't
be invoked and the client won't get a chance to free the field name.
This patch adds a flag to switch the ownership of parsed tokens to the
lexer. When this is enabled, the client must make a copy of any tokens
it wants to persist past the callback lifetime, but the lexer will
handle necessary cleanup on failure.
Backend uses of the JSON parser don't need to use this flag, since the
parser's allocations will occur in a short lived memory context.
A -o option has been added to test_json_parser_incremental to exercise
the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP
tests make use of it. (The test program now cleans up allocated memory,
so that tests can be usefully run under leak sanitizers.)
Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+kb38EciwyBQOf9peApKGwraHqA7pgzBkvoUnw5BRfS1g@mail.gmail.com
Most came in during the 17 cycle, so backpatch there. Some
(particularly reorderbuffer.h) are very old, but backpatching doesn't
seem useful.
Like commits c9d2977519, c4f113e8fe.
Presently, each iteration of the loop in sift_down() will perform
3 comparisons if both children are larger than the parent node (2
for comparing each child to the parent node, and a third to compare
the children to each other). By first comparing the children to
each other and then comparing the larger child to the parent node,
we can accomplish the same thing with just 2 comparisons (while
also not affecting the number of comparisons in any other case).
Author: ChangAo Chen
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/tencent_0142D8DA90940B9930BCC08348BBD6D0BB07%40qq.com
As threatened in the previous patch, define MaxAllocSize in
src/include/common/fe_memutils.h rather than having several
copies of it in different src/common/*.c files. This also
provides an opportunity to document it better.
While this would probably be safe to back-patch, I'll refrain
(for now anyway).
Coverity complained that pg_saslprep() could suffer integer overflow,
leading to under-allocation of the output buffer, if the input string
exceeds SIZE_MAX/4. This hazard seems largely hypothetical, but it's
easy enough to defend against, so let's do so.
This patch creates a third place in src/common/ where we are locally
defining MaxAllocSize so that we can test against that in the same way
in backend and frontend compiles. That seems like about two places
too many, so the next patch will move that into common/fe_memutils.h.
I'm hesitant to do that in back branches however.
Back-patch to v14. The code looks similar in older branches, but
before commit 67a472d71 there was a separate test on the input string
length that prevented this hazard.
Per Coverity report.
Commit 5d2e1cc117 introduced some strsep() uses, but it did the
memory management wrong in some cases. We need to keep a separate
pointer to the allocate memory so that we can free it later, because
strsep() advances the pointer we pass to it, and it at the end it
will be NULL, so any free() calls won't do anything.
(This fixes two of the four places changed in commit 5d2e1cc117. The
other two don't have this problem.)
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
Valgrind reports that checks on lex->inc_state are undefined for the
"dummy lexer" used for incremental parsing, since it's only partially
initialized on the stack. This was introduced in 0785d1b8b2.
Zero-initialize the whole struct.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+n9QWr4gsAADZc6qFQjFViXQYVk=gBy_EvxuqsgPJcb_g@mail.gmail.com
Coverity pointed out that inc_lex_level() would leak memory
(not to mention corrupt the pstack data structure) if some
but not all of its three REALLOC's failed. To fix, store
successfully-updated pointers back into the pstack struct
immediately.
Oversight in 0785d1b8b, so no need for back-patch.
Based on a patch by Michael Paquier.
For libpq, use PQExpBuffer instead of StringInfo. This requires us to
track allocation failures so that we can return JSON_OUT_OF_MEMORY as
needed rather than exit()ing.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
In case of torn UTF8 in the input data we might end up going
past the end of the string since we don't account for length.
While validation won't be performed on a sequence with a NULL
byte it's better to avoid going past the end to beging with.
Fix by taking the length into consideration.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAOYmi+mTnmM172g=_+Yvc47hzzeAsYPy2C4UBY3HK9p-AXNV0g@mail.gmail.com
Similarly to 2065ddf5e3, this introduces a define for "pg_tblspc".
This makes the style more consistent with the existing PG_STAT_TMP_DIR,
for example.
There is a difference with the other cases with the introduction of
PG_TBLSPC_DIR_SLASH, required in two places for recovery and backups.
Author: Bertrand Drouvot
Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael
Paquier
Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal
OpenSSL 1.0.2 has been EOL from the upstream OpenSSL project for
some time, and is no longer the default OpenSSL version with any
vendor which package PostgreSQL. By retiring support for OpenSSL
1.0.2 we can remove a lot of no longer required complexity for
managing state within libcrypto which is now handled by OpenSSL.
Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz
Discussion: https://postgr.es/m/CA+hUKGKh7QrYzu=8yWEUJvXtMVm_CNWH1L_TLWCbZMwbi1XP2Q@mail.gmail.com
libpq must not use palloc/pfree. It's not allowed to exit on allocation
failure, and mixing the frontend pfree with malloc is architecturally
unsound.
Remove fe_memutils from the shlib build entirely, to keep devs from
accidentally depending on it in the future.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+=pg=W5L1h=3MEP_EB24jaBu2FyATrLXqQHGe7cpuvwyg@mail.gmail.com
The now preferred way to call realpath() is by passing NULL as the
second argument and get a malloc'ed result. We still supported the
old way of providing our own buffer as a second argument, for some
platforms that didn't support the new way yet. Those were only
Solaris less than version 11 and some older AIX versions (7.1 and
newer appear to support the new variant). We don't support those
platforms versions anymore, so we can remove this extra code.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/9e638b49-5c3f-470f-a392-2cbedb2f7855%40eisentraut.org
Replace a static scratch buffer with a local variable, because a
static buffer makes the function not thread-safe. This function is
used in client-code in libpq, so it needs to be thread-safe. It was
until commit b67b57a966, which replaced the implementation with the
one from pgcrypto.
Backpatch to v14, where we switched to the new implementation.
Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://www.postgresql.org/message-id/dfa2015d-ad21-4802-a4cc-3850fc5fff3f@iki.fi
strtok() considers adjacent delimiters to be one delimiter, which is
arguably the wrong behavior in some cases. Replace with strsep(),
which has the right behavior: Adjacent delimiters create an empty
token.
Affected by this are parsing of:
- Stored SCRAM secrets
("SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>")
- ICU collation attributes
("und@colStrength=primary;colCaseLevel=yes") for ICU older than
version 54
- PG_COLORS environment variable
("error=01;31:warning=01;35:note=01;36:locus=01")
- pg_regress command-line options with comma-separated list arguments
(--dbname, --create-role) (currently only used pg_regress_ecpg)
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
Until now, when an enlargeStringInfo() call would cause the StringInfo to
exceed its maximum size, we reported an "out of memory" error. This is
misleading as it's no such thing.
Here we remove the "out of memory" text and replace it with something
more relevant to better indicate that it's a program limitation that's
been reached.
Reported-by: Michael Banck
Reviewed-by: Daniel Gustafsson, Tom Lane
Discussion: https://postgr.es/m/18484-3e357ade5fe50e61@postgresql.org
Apply const qualifiers to char * arguments and fields throughout the
jsonapi. This allows the top-level APIs such as
pg_parse_json_incremental() to declare their input argument as const.
It also reduces the number of unconstify() calls.
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
Run pgindent, pgperltidy, and reformat-dat-files.
The pgindent part of this is pretty small, consisting mainly of
fixing up self-inflicted formatting damage from patches that
hadn't bothered to add their new typedefs to typedefs.list.
In order to keep it from making anything worse, I manually added
a dozen or so typedefs that appeared in the existing typedefs.list
but not in the buildfarm's list. Perhaps we should formalize that,
or better find a way to get those typedefs into the automatic list.
pgperltidy is as opinionated as always, and reformat-dat-files too.
Commit d6607016c7 moved all the jsonapi.c error messages into
token_error(). This needs to be added to the various nls.mk files
that use this. Since that makes token_error() effectively a globally
known symbol, the name seems a bit too general, so rename to
json_token_error() for more clarity.
json_lex_string() relies on pg_encoding_mblen_bounded() to point to the
end of a JSON string when generating an error message, and the input it
uses is not guaranteed to be null-terminated.
It was possible to walk off the end of the input buffer by a few bytes
when the last bytes consist of an incomplete multi-byte sequence, as
token_terminator would point to a location defined by
pg_encoding_mblen_bounded() rather than the end of the input. This
commit switches token_terminator so as the error uses data up to the
end of the JSON input.
More work should be done so as this code could rely on an equivalent of
report_invalid_encoding() so as incorrect byte sequences can show in
error messages in a readable form. This requires work for at least two
cases in the JSON parsing API: an incomplete token and an invalid escape
sequence. A more complete solution may be too invasive for a backpatch,
so this is left as a future improvement, taking care of the overread
first.
A test is added on HEAD as test_json_parser makes this issue
straight-forward to check.
Note that pg_encoding_mblen_bounded() no longer has any callers. This
will be removed on HEAD with a separate commit, as this is proving to
encourage unsafe coding.
Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+ncM7pwLS3AnKCSmoqqtpjvA8wmCdoBtKA3ZrB2hZG6zA@mail.gmail.com
Backpatch-through: 13
JsonNonTerminal and JsonParserSem were added in commit 3311ea86ed
These names of these two enums are not actually used, so there is no
need for typedefs. Instead use plain enums to declare the constants.
Noticed by Alvaro Herera.