The code made a subtle assumption that the lower-cased version of a
string never has more characters than the original. That is not always
true. For example, in a database with the latin9 encoding:
latin9db=# select lower(U&'\00CC' COLLATE "lt-x-icu");
lower
-----------
i\x1A\x1A
(1 row)
In this example, lower-casing expands the single input character into
three characters.
The generate_trgm_only() function relied on that assumption in two
ways:
- It used "slen * pg_database_encoding_max_length() + 4" to allocate
the buffer to hold the lowercased and blank-padded string. That
formula accounts for expansion if the lower-case characters are
longer (in bytes) than the originals, but it's still not enough if
the lower-cased string contains more *characters* than the original.
- Its callers sized the output array to hold the trigrams extracted
from the input string with the formula "(slen / 2 + 1) * 3", where
'slen' is the input string length in bytes. (The formula was
generous to account for the possibility that RPADDING was set to 2.)
That's also not enough if one input byte can turn into multiple
characters.
To fix, introduce a growable trigram array and give up on trying to
choose the correct max buffer sizes ahead of time.
Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit
|
||
|---|---|---|
| .github | ||
| config | ||
| contrib | ||
| doc | ||
| src | ||
| .cirrus.star | ||
| .cirrus.tasks.yml | ||
| .cirrus.yml | ||
| .dir-locals.el | ||
| .editorconfig | ||
| .git-blame-ignore-revs | ||
| .gitattributes | ||
| .gitignore | ||
| .mailmap | ||
| aclocal.m4 | ||
| configure | ||
| configure.ac | ||
| COPYRIGHT | ||
| GNUmakefile.in | ||
| HISTORY | ||
| Makefile | ||
| meson.build | ||
| meson_options.txt | ||
| README.md | ||
PostgreSQL Database Management System
This directory contains the source code distribution of the PostgreSQL database management system.
PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. This distribution also contains C language bindings.
Copyright and license information can be found in the file COPYRIGHT.
General documentation about this version of PostgreSQL can be found at https://www.postgresql.org/docs/devel/. In particular, information about building PostgreSQL from the source code can be found at https://www.postgresql.org/docs/devel/installation.html.
The latest version of this software, and related software, may be obtained at https://www.postgresql.org/download/. For more information look at our web site located at https://www.postgresql.org/.