There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.
This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.
Along the way, this fixes a few things in pg_basebackup so that the
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.
Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker.
Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
The AF_UNIX macro was being used unprotected by HAVE_UNIX_SOCKETS,
apparently since 2008. So the redirection through IS_AF_UNIX() is
apparently no longer necessary. (More generally, all supported
platforms are now HAVE_UNIX_SOCKETS, but even if there were a new
platform in the future, it seems plausible that it would define the
AF_UNIX symbol even without kernel support.) So remove the
IS_AF_UNIX() macro and make the code a bit more consistent.
Discussion: https://www.postgresql.org/message-id/flat/f2d26815-9832-e333-d52d-72fbc0ade896%40enterprisedb.com
This is similar to b69aba7, except that this completes the work for
HMAC with a new routine called pg_hmac_error() that would provide more
context about the type of error that happened during a HMAC computation:
- The fallback HMAC implementation in hmac.c relies on cryptohashes, so
in some code paths it is necessary to return back the error generated by
cryptohashes.
- For the OpenSSL implementation (hmac_openssl.c), the logic is very
similar to cryptohash_openssl.c, where the error context comes from
OpenSSL if one of its internal routines failed, with different error
codes if something internal to hmac_openssl.c failed or was incorrect.
Any in-core code paths that use the centralized HMAC interface are
related to SCRAM, for errors that are unlikely going to happen, with
only SHA-256. It would be possible to see errors when computing some
HMACs with MD5 for example and OpenSSL FIPS enabled, and this commit
would help in reporting the correct errors but nothing in core uses
that. So, at the end, no backpatch to v14 is done, at least for now.
Errors in SCRAM related to the computation of the server key, stored
key, etc. need to pass down the potential error context string across
more layers of their respective call stacks for the frontend and the
backend, so each surrounding routine is adapted for this purpose.
Reviewed-by: Sergey Shinderuk
Discussion: https://postgr.es/m/Yd0N9tSAIIkFd+qi@paquier.xyz
The existing cryptohash facility was causing problems in some code paths
related to MD5 (frontend and backend) that relied on the fact that the
only type of error that could happen would be an OOM, as the MD5
implementation used in PostgreSQL ~13 (the in-core implementation is
used when compiling with or without OpenSSL in those older versions),
could fail only under this circumstance.
The new cryptohash facilities can fail for reasons other than OOMs, like
attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to
1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this
would cause incorrect reports to show up.
This commit extends the cryptohash APIs so as callers of those routines
can fetch more context when an error happens, by using a new routine
called pg_cryptohash_error(). The error states are stored within each
implementation's internal context data, so as it is possible to extend
the logic depending on what's suited for an implementation. The default
implementation requires few error states, but OpenSSL could report
various issues depending on its internal state so more is needed in
cryptohash_openssl.c, and the code is shaped so as we are always able to
grab the necessary information.
The core code is changed to adapt to the new error routine, painting
more "const" across the call stack where the static errors are stored,
particularly in authentication code paths on variables that provide
log details. This way, any future changes would warn if attempting to
free these strings. The MD5 authentication code was also a bit blurry
about the handling of "logdetail" (LOG sent to the postmaster), so
improve the comments related that, while on it.
The origin of the problem is 87ae969, that introduced the centralized
cryptohash facility. Extra changes are done for pgcrypto in v14 for the
non-OpenSSL code path to cope with the improvements done by this
commit.
Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
This reverts commits c2d1eea9e and 11b500072, as well as similar hacks
elsewhere, in favor of setting up the PGDLLIMPORT macro so that it can
just be used unconditionally. That can work because in frontend code,
we need no marking in either the defining or consuming files for a
variable exported from these libraries; and frontend code has no need
to access variables exported from the core backend, either.
While at it, write some actual documentation about the PGDLLIMPORT
and PGDLLEXPORT macros.
Patch by me, based on a suggestion from Robert Haas.
Discussion: https://postgr.es/m/1160385.1638165449@sss.pgh.pa.us
PGDLLIMPORT is only appropriate for variables declared in the backend,
not when the variable is coming from a library included in frontend code.
(This isn't a particularly nice fix, but for now, use the same method
employed elsewhere.)
Discussion: https://postgr.es/m/E1mrWUD-000235-Hq@gemulon.postgresql.org
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code. In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.
xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy. It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random(). It is not cryptographically strong, but neither
are the functions it replaces.
Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself
Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
Up to now, you couldn't escape out of psql's \password command
by typing control-C (or other local spelling of SIGINT). This
is pretty user-unfriendly, so improve it. To do so, we have to
modify the functions provided by pg_get_line.c; but we don't
want to mess with psql's SIGINT handler setup, so provide an
API that lets that handler cause the cancel to occur.
This relies on the assumption that we won't do any major harm by
longjmp'ing out of fgets(). While that's obviously a little shaky,
we've long had the same assumption in the main input loop, and few
issues have been reported.
psql has some other simple_prompt() calls that could usefully
be improved the same way; for now, just deal with \password.
Nathan Bossart, minor tweaks by me
Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
The hardcoded "wide character" set in ucs_wcwidth() was last updated
around the Unicode 5.0 era. This led to misalignment when printing
emojis and other codepoints that have since been designated
wide or full-width.
To fix and keep up to date, extend update-unicode to download the list
of wide and full-width codepoints from the offical sources.
In passing, remove some comments about non-spacing characters that
haven't been accurate since we removed the former hardcoded logic.
Jacob Champion
Reported and reviewed by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/. This code was cleaned up by aef8948, as it originally
included no overflow checks in the same way as the base64 routines in
src/common/ used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding. As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are. Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.
The reverts of all those commits bring back the performance of hex
decoding and encoding back to what it was in ~13. Fow now and
post-beta3, this is the simplest option.
Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
libraries PostgreSQL is built with (only OpenSSL currently). If there
is no such libraries, a fallback implementation is available. Those new
APIs are designed similarly to the existing cryptohash layer, so there
is no real new design here, with the same logic around buffer bound
checks and memory handling.
HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC. This
refactoring is an advantage mainly for SCRAM, that included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries even if PostgreSQL was built with their support.
This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1. I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.
Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was. Simplify the logic a bit. Add test cases.
Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself
Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
With its current design, a careless use of pg_cryptohash_final() could
would result in an out-of-bound write in memory as the size of the
destination buffer to store the result digest is not known to the
cryptohash internals, without the caller knowing about that. This
commit adds a new argument to pg_cryptohash_final() to allow such sanity
checks, and implements such defenses.
The internals of SCRAM for HMAC could be tightened a bit more, but as
everything is based on SCRAM_KEY_LEN with uses particular to this code
there is no need to complicate its interface more than necessary, and
this comes back to the refactoring of HMAC in core. Except that, this
minimizes the uses of the existing DIGEST_LENGTH variables, relying
instead on sizeof() for the result sizes. In ossp-uuid, this also makes
the code more defensive, as it already relied on dce_uuid_t being at
least the size of a MD5 digest.
This is in philosophy similar to cfc40d3 for base64.c and aef8948 for
hex.c.
Reported-by: Ranier Vilela
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
With this commit, SHA1 goes through the implementation provided by
OpenSSL via EVP when building the backend with it, and uses as fallback
implementation KAME which was located in pgcrypto and already shaped for
an integration with a set of init, update and final routines.
Structures and routines have been renamed to make things consistent with
the fallback implementations of MD5 and SHA2.
uuid-ossp has used for ages a shortcut with pgcrypto to fetch a copy of
SHA1 if needed. This was built depending on the build options within
./configure, so this cleans up some code and removes the build
dependency between pgcrypto and uuid-ossp.
Note that this will help with the refactoring of HMAC, as pgcrypto
offers the option to use MD5, SHA1 or SHA2, so only the second option
was missing to make that possible.
Author: Michael Paquier
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected. Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing. This is moved to src/common/ and
it gains the same overflow checks as the decoding part.
On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries. Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.
While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection. The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code. Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks. The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.
Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
This commit addresses two issues:
- In pgcrypto, MD5 computation called pg_cryptohash_{init,update,final}
without checking for the result status.
- Simplify pg_checksum_raw_context to use only one variable for all the
SHA2 options available in checksum manifests.
Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/f62f26bb-47a5-8411-46e5-4350823e06a5@iki.fi
The allocation of the cryptohash context data when building with OpenSSL
was happening in the memory context of the caller of
pg_cryptohash_create(), which could lead to issues with resowner cleanup
if cascading resources are cleaned up on an error. Like other
facilities using resowners, move the base allocation to TopMemoryContext
to ensure a correct cleanup on failure.
The resulting code gets simpler with this commit as the context data is
now hold by a unique opaque pointer, so as there is only one single
allocation done in TopMemoryContext.
After discussion, also change the cryptohash subroutines to return an
error if the caller provides NULL for the context data to ease error
detection on OOM.
Author: Heikki Linnakangas
Discussion: https://postgr.es/m/X9xbuEoiU3dlImfa@paquier.xyz
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.
Reported-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits. The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode. A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start. New parameters allow a file descriptor open to the terminal to
be passed. pg_upgrade support has also been added.
Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us
Author: Masahiko Sawada, me, Stephen Frost
The same logic was present for collation commands, SASLprep and
pgcrypto, so this removes some code.
Author: Michael Paquier
Reviewed-by: Stephen Frost, Heikki Linnakangas
Discussion: https://postgr.es/m/X9womIn6rne6Gud2@paquier.xyz
This adjusts some code related to recent changes for cryptohash
functions:
- Add a variable in md5.h to track down the size of a computed result,
moved from pgcrypto. Note that pg_md5_hash() assumed a result of this
size already.
- Call explicit_bzero() on the hashed data when freeing the context for
fallback implementations. For MD5, particularly, it would be annoying
to leave some non-zeroed data around.
- Clean up some code related to recent changes of uuid-ossp. .gitignore
still included md5.c and a comment was incorrect.
Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
This commit heavily reorganizes the MD5 implementations that exist in
the tree in various aspects.
First, MD5 is added to the list of options available in cryptohash.c and
cryptohash_openssl.c. This means that if building with OpenSSL, EVP is
used for MD5 instead of the fallback implementation that Postgres had
for ages. With the recent refactoring work for cryptohash functions,
this change is straight-forward. If not building with OpenSSL, a
fallback implementation internal to src/common/ is used.
Second, this reduces the number of MD5 implementations present in the
tree from two to one, by moving the KAME implementation from pgcrypto to
src/common/, and by removing the implementation that existed in
src/common/. KAME was already structured with an init/update/final set
of routines by pgcrypto (see original pgcrypto/md5.h) for compatibility
with OpenSSL, so moving it to src/common/ has proved to be a
straight-forward move, requiring no actual manipulation of the internals
of each routine. Some benchmarking has not shown any performance gap
between both implementations.
Similarly to the fallback implementation used for SHA2, the fallback
implementation of MD5 is moved to src/common/md5.c with an internal
header called md5_int.h for the init, update and final routines. This
gets then consumed by cryptohash.c.
The original routines used for MD5-hashed passwords are moved to a
separate file called md5_common.c, also in src/common/, aimed at being
shared between all MD5 implementations as utility routines to keep
compatibility with any code relying on them.
Like the SHA2 changes, this commit had its round of tests on both Linux
and Windows, across all versions of OpenSSL supported on HEAD, with and
even without OpenSSL.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20201106073434.GA4961@paquier.xyz
Two new routines to allocate a hash context and to free it are created,
as these become necessary for the goal behind this refactoring: switch
the all cryptohash implementations for OpenSSL to use EVP (for FIPS and
also because upstream does not recommend the use of low-level cryptohash
functions for 20 years). Note that OpenSSL hides the internals of
cryptohash contexts since 1.1.0, so it is necessary to leave the
allocation to OpenSSL itself, explaining the need for those two new
routines. This part is going to require more work to properly track
hash contexts with resource owners, but this not introduced here.
Still, this refactoring makes the move possible.
This reduces the number of routines for all SHA2 implementations from
twelve (SHA{224,256,386,512} with init, update and final calls) to five
(create, free, init, update and final calls) by incorporating the hash
type directly into the hash context data.
The new cryptohash routines are moved to a new file, called cryptohash.c
for the fallback implementations, with SHA2 specifics becoming a part
internal to src/common/. OpenSSL specifics are part of
cryptohash_openssl.c. This infrastructure is usable for more hash
types, like MD5 or HMAC.
Any code paths using the internal SHA2 routines are adapted to report
correctly errors, which are most of the changes of this commit. The
zones mostly impacted are checksum manifests, libpq and SCRAM.
Note that e21cbb4 was a first attempt to switch SHA2 to EVP, but it
lacked the refactoring needed for libpq, as done here.
This patch has been tested on Linux and Windows, with and without
OpenSSL, and down to 1.0.1, the oldest version supported on HEAD.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
This replaces the existing binary search with two perfect hash functions
for the composition and the decomposition in the backend code, at the
cost of slightly-larger binaries there (35kB in libpgcommon_srv.a). Per
the measurements done, this improves the speed of the recomposition and
decomposition by up to 30~40 times for the NFC and NFKC conversions,
while all other operations get at least 40% faster. This is not as
"good" as what libicu has, but it closes the gap a lot as per the
feedback from Daniel Verite.
The decomposition table remains the same, getting used for the binary
search in the frontend code, where we care more about the size of the
libraries like libpq over performance as this gets involved only in code
paths related to the SCRAM authentication. In consequence, note that
the perfect hash function for the recomposition needs to use a new
inverse lookup array back to to the existing decomposition table.
The size of all frontend deliverables remains unchanged, even with
--enable-debug, including libpq.
Author: John Naylor
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/CAFBsxsHUuMFCt6-pU+oG-F1==CmEp8wR+O+bRouXWu6i8kXuqA@mail.gmail.com
80f8eb7 has added to the normalization quick check headers some code
generated by PerfectHash.pm that is incompatible with the settings of
gitattributes for this repository, as whitespaces followed a set of tabs
for the first element of a line in the table. Instead of adding a new
exception to gitattributes, rework the format generated so as a right
padding with spaces is used instead of a left padding. This keeps the
table generated in a readable shape with its set of columns, making
unnecessary an update of gitattributes.
Reported-by: Peter Eisentraut
Author: John Naylor
Discussion: https://postgr.es/m/d601b3b5-a3c7-5457-2f84-3d6513d690fc@2ndquadrant.com
This makes the normalization quick check about 30% faster for NFC and
50% faster for NFKC than the binary search used previously. The hash
lookup reuses the existing array of bit fields used for the binary
search to get the quick check property and is generated as part of "make
update-unicode" in src/common/unicode/.
Author: John Naylor
Reviewed-by: Mark Dilger, Michael Paquier
Discussion: https://postgr.es/m/CACPNZCt4fbJ0_bGrN5QPt34N4whv=mszM0LMVQdoa2rC9UMRXA@mail.gmail.com
This reverts commit e21cbb4, as the switch to EVP routines requires a
more careful design where we would need to have at least our wrapper
routines return a status instead of issuing an error by themselves to
let the caller do the error handling. The memory handling was also
incorrect and could cause leaks in the backend if a failure happened,
requiring most likely a callback to do the necessary cleanup as the only
clean way to be able to allocate an EVP context requires the use of an
allocation within OpenSSL. The potential rework of the wrappers also
impacts the fallback implementation when not building with OpenSSL.
Originally, prairiedog has reported a compilation failure, but after
discussion with Tom Lane this needs a better design.
Discussion: https://postgr.es/m/20200928073330.GC2316@paquier.xyz
The use of low-level hash routines is not recommended by upstream
OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67.
Note that this also fixes a failure with SCRAM authentication when using
FIPS in OpenSSL, but as there have been few complaints about this
problem and as this causes an ABI breakage, no backpatch is done.
Author: Michael Paquier, Alessandro Gherardi
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz
Further experience says that the appending behavior offered by
pg_get_line_append is useful to only a very small minority of callers.
For most, the requirement to reset the buffer after each line is just
an error-prone nuisance. Hence, invent another alternative call
pg_get_line_buf, which takes care of that detail.
Noted while reviewing a patch from Daniel Gustafsson.
Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se
Up to now, if you tried to omit "AS" before a column label in a SELECT
list, it would only work if the column label was an IDENT, that is not
any known keyword. This is rather unfriendly considering that we have
so many keywords and are constantly growing more. In the wake of commit
1ed6b8956 it's possible to improve matters quite a bit.
We'd originally tried to make this work by having some of the existing
keyword categories be allowed without AS, but that didn't work too well,
because each category contains a few special cases that don't work
without AS. Instead, invent an entirely orthogonal keyword property
"can be bare column label", and mark all keywords that way for which
we don't get shift/reduce errors by doing so.
It turns out that of our 450 current keywords, all but 39 can be made
bare column labels, improving the situation by over 90%. This number
might move around a little depending on future grammar work, but it's
a pretty nice improvement.
Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor
Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
Instead of hard-wiring specific verbosity levels into the option
processing of client applications, invent pg_logging_increase_verbosity()
and encourage clients to implement --verbose by calling that. Then,
the common convention that more -v's gets you more verbosity just works.
In particular, this allows resurrection of the debug-grade messages that
have long existed in pg_dump and its siblings. They were unreachable
before this commit due to lack of a way to select PG_LOG_DEBUG logging
level. (It appears that they may have been unreachable for some time
before common/logging.c was introduced, too, so I'm not specifically
blaming cc8d41511 for the oversight. One reason for thinking that is
that it's now apparent that _allocAH()'s message needs a null-pointer
guard. Testing might have failed to reveal that before 96bf88d52.)
Discussion: https://postgr.es/m/1173106.1600116625@sss.pgh.pa.us
Some kernels can tell us the type of a "dirent", so we can avoid a call
to stat() or lstat() in many cases. Define a new function
get_dirent_type() to contain that logic, for use by the backend and
frontend versions of walkdir(), and perhaps other callers in future.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BFzxupGGN4GpUdbzZN%2Btn6FQPHo8w0Q%2BAPH5Wz8RG%2Bww%40mail.gmail.com
Letting the caller provide a StringInfo to read into is helpful when
the caller needs to merge lines or otherwise modify the data after
it's been read. Notably, now the code added by commit 8f8154a50
can use pg_get_line_append() instead of having its own copy of that
logic. A follow-on commit will also make use of this.
Also, since StringInfo buffers are a minimum of 1KB long, blindly
using pg_get_line() in a loop can eat a lot more memory than one would
expect. I discovered for instance that commit e0f05cd5b caused initdb
to consume circa 10MB to read postgres.bki, even though that's under
1MB worth of data. A less memory-hungry alternative is to re-use the
same StringInfo for all lines and pg_strdup the results.
Discussion: https://postgr.es/m/1315832.1599345736@sss.pgh.pa.us
This patch started out with the goal of harmonizing various arbitrary
limits on password length, but after awhile a better idea emerged:
let's just get rid of those fixed limits.
recv_password_packet() has an arbitrary limit on the packet size,
which we don't really need, so just drop it. (Note that this doesn't
really affect anything for MD5 or SCRAM password verification, since
those will hash the user's password to something shorter anyway.
It does matter for auth methods that require a cleartext password.)
Likewise remove the arbitrary error condition in pg_saslprep().
The remaining limits are mostly in client-side code that prompts
for passwords. To improve those, refactor simple_prompt() so that
it allocates its own result buffer that can be made as big as
necessary. Actually, it proves best to make a separate routine
pg_get_line() that has essentially the semantics of fgets(), except
that it allocates a suitable result buffer and hence will never
return a truncated line. (pg_get_line has a lot of potential
applications to replace randomly-sized fgets buffers elsewhere,
but I'll leave that for another patch.)
I built pg_get_line() atop stringinfo.c, which requires moving
that code to src/common/; but that seems fine since it was a poor
fit for src/port/ anyway.
This patch is mostly mine, but it owes a good deal to Nathan Bossart
who pressed for a solution to the password length problem and
created a predecessor patch. Also thanks to Peter Eisentraut and
Stephen Frost for ideas and discussion.
Discussion: https://postgr.es/m/09512C4F-8CB9-4021-B455-EF4C4F0D55A0@amazon.com
Any libpq client can use the header. Clients include backend components
postgres_fdw, dblink, and logical replication apply worker. Back-patch
to v10, because another fix needs this. In released branches, just copy
the header and keep the original.
OpenSSL's native reports about problems related to protocol version
restrictions are pretty opaque and inconsistent. When we get an
SSL error that is plausibly due to this, emit a hint message that
includes the range of SSL protocol versions we (think we) are
allowing. This should at least get the user thinking in the right
direction to resolve the problem, even if the hint isn't totally
accurate, which it might not be for assorted reasons.
Back-patch to v13 where we increased the default minimum protocol
version, thereby increasing the risk of this class of failure.
Patch by me, reviewed by Daniel Gustafsson
Discussion: https://postgr.es/m/a9408304-4381-a5af-d259-e55d349ae4ce@2ndquadrant.com
fe_archive.c was compiled only for the frontend in src/common/, but as
it will never share anything with the backend, it makes most sense to
move this file to src/fe_utils/.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/e9766d71-8655-ac86-bdf6-77e0e7169977@2ndquadrant.com
Backpatch-through: 13
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and
check Unicode normal forms, per SQL standard.
To support fast IS NORMALIZED tests, we pull in a new data file
DerivedNormalizationProps.txt from Unicode and build a lookup table
from that, using techniques similar to ones already used for other
Unicode data. make update-unicode will keep it up to date. We only
build and use these tables for the NFC and NFKC forms, because they
are too big for NFD and NFKD and the improvement is not significant
enough there.
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
pg_rewind needs to copy from the source cluster to the target cluster a
set of relation blocks changed from the previous checkpoint where WAL
forked up to the end of WAL on the target. Building this list of
relation blocks requires a range of WAL segments that may not be present
anymore on the target's pg_wal, causing pg_rewind to fail. It is
possible to work around this issue by copying manually the WAL segments
needed but this may lead to some extra and actually useless work.
This commit introduces a new option allowing pg_rewind to use a
restore_command while doing the rewind by grabbing the parameter value
of restore_command from the target cluster configuration. This allows
the rewind operation to be more reliable, so as only the WAL segments
needed by the rewind are restored from the archives.
In order to be able to do that, a new routine is added to src/common/ to
allow frontend tools to restore files from archives using an
already-built restore command. This version is more simple than the
backend equivalent as there is no need to handle the non-recovery case.
Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
It previously only supported NFKC, for use by SASLprep. This expands
the API to offer the choice of all four normalization forms. Right
now, there are no internal users of the forms other than NFKC.
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
restore_command has only been used until now by the backend, but there
is a pending patch for pg_rewind to make use of that in the frontend.
Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
This also involves renaming src/include/utils/hashutils.h, which
becomes src/include/common/hashfn.h. Perhaps an argument can be
made for keeping the hashutils.h name, but it seemed more
consistent to make it match the name of the file, and also more
descriptive of what is actually going on here.
Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list
advice on how not to break the Windows build from Davinder Singh
and Amit Kapila.
Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
To make this work, (1) makeJsonLexContextCstringLen now takes the
encoding to be used as an argument; (2) check_stack_depth() is made to
do nothing in frontend code, and (3) elog(ERROR, ...) is changed to
pg_log_fatal + exit in frontend code.
Mark Dilger, reviewed and slightly revised by me.
Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
Two routines have been added in OpenSSL 1.1.0 to set the protocol bounds
allowed within a given SSL context:
- SSL_CTX_set_min_proto_version
- SSL_CTX_set_max_proto_version
As Postgres supports OpenSSL down to 1.0.1 (as of HEAD), equivalent
replacements exist in the tree, which are only available for the
backend. A follow-up patch is planned to add control of the SSL
protocol bounds for libpq, so move those routines to src/common/ so as
libpq can use them.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/4F246AE3-A7AE-471E-BD3D-C799D3748E03@yesql.se
We currently have several sets of files generated from data provided
by Unicode. These all have ad hoc rules and instructions for updating
when new Unicode versions appear, and it's not done consistently.
This patch centralizes and automates the process and makes it part of
the release checklist. The Unicode and CLDR versions are specified in
Makefile.global.in. There is a new make target "update-unicode" that
downloads all the relevant files and runs the generation script.
There is also a new script for generating the table of combining
characters for ucs_wcwidth(). That table is now in a separate include
file rather than hardcoded into the middle of other code. This is
based on the script that was used for generating
d8594d123c, but the script itself wasn't
committed at that time.
Reviewed-by: John Naylor <john.naylor@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c8d05f42-443e-6c23-819b-05b31759a37c@2ndquadrant.com
Within the context of SCRAM, "verifier" has a specific meaning in the
protocol, per RFCs. The existing code used "verifier" differently, to
mean whatever is or would be stored in pg_auth.rolpassword.
Fix this by using the term "secret" for this, following RFC 5803.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/be397b06-6e4b-ba71-c7fb-54cae84a7e18%402ndquadrant.com
Commit 4d0e994eed added support for partial TOAST decompression, so the
decompression is interrupted after producing the requested prefix. For
prefix and slices near the beginning of the entry, this may saves a lot
of decompression work.
That however only deals with decompression - the whole compressed entry
was still fetched and re-assembled, even though the compression used
only a small fraction of it. This commit improves that by computing how
much compressed data may be needed to decompress the requested prefix,
and then fetches only the necessary part.
We always need to fetch a bit more compressed data than the requested
(uncompressed) prefix, because the prefix may not be compressible at all
and pglz itself adds a bit of overhead. That means this optimization is
most effective when the requested prefix is much smaller than the whole
compressed entry.
Author: Binguo Bao
Reviewed-by: Andrey Borodin, Tomas Vondra, Paul Ramsey
Discussion: https://www.postgresql.org/message-id/flat/CAL-OGkthU9Gs7TZchf5OWaL-Gsi=hXqufTxKv9qpNG73d5na_g@mail.gmail.com
Similarly to the signed versions added in 4d6ad31, this adds a set of
inline functions for overflow checks with unsigned integers, including
uint16, uint32 and uint64. This relies on compiler built-in overflow
checks by default if available. The behavior of unsigned integers is
well-defined so the fallback implementations checks are simple for
additions and subtractions. Multiplications avoid division-based checks
which are expensive if possible, still this can happen for uint64 if
128-bit integers are not available.
While on it, the code in common/int.h is reorganized to avoid too many
duplicated comments. The new macros will be used in a follow-up patch.
All thanks to Andres Freund for the input provided.
Author: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20190830073423.GB2354@paquier.xyz
b654714 has reworked the way trailing CR/LF characters are removed from
strings. This commit introduces a new routine in common/string.c and
refactors the code so as the logic is in a single place, mostly.
Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/20190801031820.GF29334@paquier.xyz
This is a follow-up refactoring after 09ec55b and b674211, which has
proved that the encoding and decoding routines used by SCRAM have a
poor interface when it comes to check after buffer overflows. This adds
an extra argument in the shape of the length of the result buffer for
each routine, which is used for overflow checks when encoding or
decoding an input string. The original idea comes from Tom Lane.
As a result of that, the encoding routine can now fail, so all its
callers are adjusted to generate proper error messages in case of
problems.
On failure, the result buffer gets zeroed.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20190623132535.GB1628@paquier.xyz
In order to separate OpenSSL's SHA symbols, this header has been using
USE_SSL, which is equivalent to USE_OPENSSL. There is now only one SSL
implementation included in the tree, so this works fine, but when
adding a new SSL implementation this would run into failures.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/0DF29010-CE26-4F51-85A6-9C8ABF5536F9@yesql.se
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
The original placement of this module in src/fe_utils/ is ill-considered,
because several src/common/ modules have dependencies on it, meaning that
libpgcommon and libpgfeutils now have mutual dependencies. That makes it
pointless to have distinct libraries at all. The intended design is that
libpgcommon is lower-level than libpgfeutils, so only dependencies from
the latter to the former are acceptable.
We already have the precedent that fe_memutils and a couple of other
modules in src/common/ are frontend-only, so it's not stretching anything
out of whack to treat logging.c as a frontend-only module in src/common/.
To the extent that such modules help provide a common frontend/backend
environment for the rest of common/ to use, it's a reasonable design.
(logging.c does not yet provide an ereport() emulation, but one can
dream.)
Hence, move these files over, and revert basically all of the build-system
changes made by commit cc8d41511. There are no places that need to grow
new dependencies on libpgcommon, further reinforcing the idea that this
is the right solution.
Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
This commit fixes a couple of issues related to the way password
verifiers hashed with MD5 or SCRAM-SHA-256 are detected, leading to
being able to store in catalogs passwords which do not follow the
supported hash formats:
- A MD5-hashed entry was checked based on if its header uses "md5" and
if the string length matches what is expected. Unfortunately the code
never checked if the hash only used hexadecimal characters, as reported
by Tom Lane.
- A SCRAM-hashed entry was checked based on only its header, which
should be "SCRAM-SHA-256$", but it never checked for any fields
afterwards, as reported by Jonathan Katz.
Backpatch down to v10, which is where SCRAM has been introduced, and
where password verifiers in plain format have been removed.
Author: Jonathan Katz
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/016deb6b-1f0a-8e9f-1833-a8675b170aa9@postgresql.org
Backpatch-through: 10
When asked for a slice of a TOAST entry, decompress enough to return the
slice instead of decompressing the entire object.
For use cases where the slice is at, or near, the beginning of the entry,
this avoids a lot of unnecessary decompression work.
This changes the signature of pglz_decompress() by adding a boolean to
indicate if it's ok for the call to finish before consuming all of the
source or destination buffers.
Author: Paul Ramsey
Reviewed-By: Rafia Sabih, Darafei Praliaskouski, Regina Obe
Discussion: https://postgr.es/m/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
ce6afc6 has begun the refactoring work by plugging pg_rewind into a
central routine to update the control file, and left around two extra
copies, with one in xlog.c for the backend and one in pg_resetwal.c. By
adding an extra option to the central routine in controldata_utils.c to
control if a flush of the control file needs to be done, it is proving
to be straight-forward to make xlog.c and pg_resetwal.c use the central
code path at the condition of moving the wait event tracking there.
Hence, this allows to have only one central code path to update the
control file, shaving the code from the duplicates.
This refactoring actually fixes a problem in pg_resetwal. Previously,
the control file was first removed before being recreated. So if a
crash happened between the moment the file was removed and the moment
the file was created, then it would have been possible to not have a
control file anymore in the database folder.
Author: Fabien Coelho
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/alpine.DEB.2.21.1903170935210.2506@lancre
This adds a new routine to src/common/ which is compatible with both the
frontend and backend code, able to update the control file's contents.
This is now getting used only by pg_rewind, but some upcoming patches
which add more control on checksums for offline instances will make use
of it. This could also get used more by the backend as xlog.c has its
own flavor of the same logic with some wait events and an additional
flush phase before closing the opened file descriptor, but this is let
as separate work.
Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
Previously, floating-point output was done by rounding to a specific
decimal precision; by default, to 6 or 15 decimal digits (losing
information) or as requested using extra_float_digits. Drivers that
wanted exact float values, and applications like pg_dump that must
preserve values exactly, set extra_float_digits=3 (or sometimes 2 for
historical reasons, though this isn't enough for float4).
Unfortunately, decimal rounded output is slow enough to become a
noticable bottleneck when dealing with large result sets or COPY of
large tables when many floating-point values are involved.
Floating-point output can be done much faster when the output is not
rounded to a specific decimal length, but rather is chosen as the
shortest decimal representation that is closer to the original float
value than to any other value representable in the same precision. The
recently published Ryu algorithm by Ulf Adams is both relatively
simple and remarkably fast.
Accordingly, change float4out/float8out to output shortest decimal
representations if extra_float_digits is greater than 0, and make that
the new default. Applications that need rounded output can set
extra_float_digits back to 0 or below, and take the resulting
performance hit.
We make one concession to portability for systems with buggy
floating-point input: we do not output decimal values that fall
exactly halfway between adjacent representable binary values (which
would rely on the reader doing round-to-nearest-even correctly). This
is known to be a problem at least for VS2013 on Windows.
Our version of the Ryu code originates from
https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the
following (significant) modifications:
- Output format is changed to use fixed-point notation for small
exponents, as printf would, and also to use lowercase 'e', a
minimum of 2 exponent digits, and a mandatory sign on the exponent,
to keep the formatting as close as possible to previous output.
- The output of exact midpoint values is disabled as noted above.
- The integer fast-path code is changed somewhat (since we have
fixed-point output and the upstream did not).
- Our project style has been largely applied to the code with the
exception of C99 declaration-after-statement, which has been
retained as an exception to our present policy.
- Most of upstream's debugging and conditionals are removed, and we
use our own configure tests to determine things like uint128
availability.
Changing the float output format obviously affects a number of
regression tests. This patch uses an explicit setting of
extra_float_digits=0 for test output that is not expected to be
exactly reproducible (e.g. due to numerical instability or differing
algorithms for transcendental functions).
Conversions from floats to numeric are unchanged by this patch. These
may appear in index expressions and it is not yet clear whether any
change should be made, so that can be left for another day.
This patch assumes that the only supported floating point format is
now IEEE format, and the documentation is updated to reflect that.
Code by me, adapting the work of Ulf Adams and other contributors.
References:
https://dl.acm.org/citation.cfm?id=3192369
Reviewed-by: Tom Lane, Andres Freund, Donald Dong
Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
We've been speculating for a long time that hash-based keyword lookup
ought to be faster than binary search, but up to now we hadn't found
a suitable tool for generating the hash function. Joerg Sonnenberger
provided the inspiration, and sample code, to show us that rolling our
own generator wasn't a ridiculous idea. Hence, do that.
The method used here requires a lookup table of approximately 4 bytes
per keyword, but that's less than what we saved in the predecessor commit
afb0d0712, so it's not a big problem. The time savings is indeed
significant: preliminary testing suggests that the total time for raw
parsing (flex + bison phases) drops by ~20%.
Patch by me, but it owes its existence to Joerg Sonnenberger;
thanks also to John Naylor for review.
Discussion: https://postgr.es/m/20190103163340.GA15803@britannica.bec.de
Previously, ScanKeywordLookup was passed an array of string pointers.
This had some performance deficiencies: the strings themselves might
be scattered all over the place depending on the compiler (and some
quick checking shows that at least with gcc-on-Linux, they indeed
weren't reliably close together). That led to very cache-unfriendly
behavior as the binary search touched strings in many different pages.
Also, depending on the platform, the string pointers might need to
be adjusted at program start, so that they couldn't be simple constant
data. And the ScanKeyword struct had been designed with an eye to
32-bit machines originally; on 64-bit it requires 16 bytes per
keyword, making it even more cache-unfriendly.
Redesign so that the keyword strings themselves are allocated
consecutively (as part of one big char-string constant), thereby
eliminating the touch-lots-of-unrelated-pages syndrome. And get
rid of the ScanKeyword array in favor of three separate arrays:
uint16 offsets into the keyword array, uint16 token codes, and
uint8 keyword categories. That reduces the overhead per keyword
to 5 bytes instead of 16 (even less in programs that only need
one of the token codes and categories); moreover, the binary search
only touches the offsets array, further reducing its cache footprint.
This also lets us put the token codes somewhere else than the
keyword strings are, which avoids some unpleasant build dependencies.
While we're at it, wrap the data used by ScanKeywordLookup into
a struct that can be treated as an opaque type by most callers.
That doesn't change things much right now, but it will make it
less painful to switch to a hash-based lookup method, as is being
discussed in the mailing list thread.
Most of the change here is associated with adding a generator
script that can build the new data structure from the same
list-of-PG_KEYWORD header representation we used before.
The PG_KEYWORD lists that plpgsql and ecpg used to embed in
their scanner .c files have to be moved into headers, and the
Makefiles have to be taught to invoke the generator script.
This work is also necessary if we're to consider hash-based lookup,
since the generator script is what would be responsible for
constructing a hash table.
Aside from saving a few kilobytes in each program that includes
the keyword table, this seems to speed up raw parsing (flex+bison)
by a few percent. So it's worth doing even as it stands, though
we think we can gain even more with a follow-on patch to switch
to hash-based lookup.
John Naylor, with further hacking by me
Discussion: https://postgr.es/m/CAJVSVGXdFVU2sgym89XPL=Lv1zOS5=EHHQ8XWNzFL=mTXkKMLw@mail.gmail.com
The connection authorized message has quite a bit of useful information
in it, but didn't include the application_name (when provided), so let's
add that as it can be very useful.
Note that at the point where we're emitting the connection authorized
message, we haven't processed GUCs, so it's not possible to get this by
using log_line_prefix (which pulls from the GUC). There's also
something to be said for having this included in the connection
authorized message and then not needing to repeat it for every line, as
having it in log_line_prefix would do.
The GUC cleans the application name to pure-ascii, so do that here too,
but pull out the logic for cleaning up a string into its own function
in common and re-use it from those places, and check_cluster_name which
was doing the same thing.
Author: Don Seiler <don@seiler.us>
Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com
On ELF-based platforms (and maybe others?) it's possible for a shared
library, when dynamically loaded into the backend, to call the backend
versions of src/port and src/common functions rather than the frontend
versions that are actually linked into the shlib. This is definitely
not what we want, because the frontend versions often behave slightly
differently. Up to now it's been "slight" enough that nobody noticed;
but with the addition of SCRAM support functions in src/common, we're
observing crashes due to the difference between palloc and malloc
memory allocation rules, as reported in bug #15367 from Jeremy Evans.
The purpose of this patch is to create a direct test for this type of
mis-linking, so that we know whether any given platform requires extra
measures to prevent using the wrong functions. If the test fails, it
will lead to connection failures in the contrib/postgres_fdw regression
test. At the moment, *BSD platforms using ELF format are known to have
the problem and can be expected to fail; but we need to know whether
anything else does, and we need a reliable ongoing check for future
platforms.
Actually fixing the problem will be the subject of later commit(s).
Discussion: https://postgr.es/m/153626613985.23143.4743626885618266803@wrigleys.postgresql.org
There are some problems with the tls-unique channel binding type. It's not
supported by all SSL libraries, and strictly speaking it's not defined for
TLS 1.3 at all, even though at least in OpenSSL, the functions used for it
still seem to work with TLS 1.3 connections. And since we had no
mechanism to negotiate what channel binding type to use, there would be
awkward interoperability issues if a server only supported some channel
binding types. tls-server-end-point seems feasible to support with any SSL
library, so let's just stick to that.
This removes the scram_channel_binding libpq option altogether, since there
is now only one supported channel binding type.
This also removes all the channel binding tests from the SSL test suite.
They were really just testing the scram_channel_binding option, which
is now gone. Channel binding is used if both client and server support it,
so it is used in the existing tests. It would be good to have some tests
specifically for channel binding, to make sure it really is used, and the
different combinations of a client and a server that support or doesn't
support it. The current set of settings we have make it hard to write such
tests, but I did test those things manually, by disabling
HAVE_BE_TLS_GET_CERTIFICATE_HASH and/or
HAVE_PGTLS_GET_PEER_CERTIFICATE_HASH.
I also removed the SCRAM_CHANNEL_BINDING_TLS_END_POINT constant. This is a
matter of taste, but IMO it's more readable to just use the
"tls-server-end-point" string.
Refactor the checks on whether the SSL library supports the functions
needed for tls-server-end-point channel binding. Now the server won't
advertise, and the client won't choose, the SCRAM-SHA-256-PLUS variant, if
compiled with an OpenSSL version too old to support it.
In the passing, add some sanity checks to check that the chosen SASL
mechanism, SCRAM-SHA-256 or SCRAM-SHA-256-PLUS, matches whether the SCRAM
exchange used channel binding or not. For example, if the client selects
the non-channel-binding variant SCRAM-SHA-256, but in the SCRAM message
uses channel binding anyway. It's harmless from a security point of view,
I believe, and I'm not sure if there are some other conditions that would
cause the connection to fail, but it seems better to be strict about these
things and check explicitly.
Discussion: https://www.postgresql.org/message-id/ec787074-2305-c6f4-86aa-6902f98485a4%40iki.fi
Commit c37b3d08c dropped its added GetDataDirectoryCreatePerm call into
the wrong place in pg_resetwal.c, namely after the chdir to DataDir.
That broke invocations using a relative path, as reported by Tushar Ahuja.
We could have left it where it was and changed the argument to be ".",
but that'd result in a rather confusing error message in event of a
failure, so re-ordering seems like a better solution.
Similarly reorder operations in pg_rewind.c. The issue there is that
it doesn't seem like a good idea to do any actual operations before the
not-root check (on Unix) or the restricted token acquisition (on Windows).
I don't know that this is an actual bug, but I'm definitely not convinced
that it isn't, either.
Assorted other code review for c37b3d08c and da9b580d8: fix some
misspelled or otherwise badly worded comments, put the #include for
<sys/stat.h> where it actually belongs, etc.
Discussion: https://postgr.es/m/aeb9c3a7-3c3f-a57f-1a18-c8d4fcdc2a1f@enterprisedb.com
In commit 9c0a0de4c, I'd failed to notice that catalog/catalog.h
should also be considered a frontend-unsafe header, because it includes
(and needs) the full form of pg_class.h, not to mention relcache.h.
However, various frontend code was depending on it to get
TABLESPACE_VERSION_DIRECTORY, so refactoring of some sort is called for.
The cleanest answer seems to be to move TABLESPACE_VERSION_DIRECTORY,
as well as the OIDCHARS symbol, to common/relpath.h. Do that, and mop up
inclusions as necessary. (I found that quite a few current users of
catalog/catalog.h don't seem to need it at all anymore, apparently as a
result of the refactorings that created common/relpath.[hc]. And
initdb.c needed it only as a route to pg_class_d.h.)
Discussion: https://postgr.es/m/6629.1523294509@sss.pgh.pa.us
Allow the cluster to be optionally init'd with read access for the
group.
This means a relatively non-privileged user can perform a backup of the
cluster without requiring write privileges, which enhances security.
The mode of PGDATA is used to determine whether group permissions are
enabled for directory and file creates. This method was chosen as it's
simple and works well for the various utilities that write into PGDATA.
Changing the mode of PGDATA manually will not automatically change the
mode of all the files contained therein. If the user would like to
enable group access on an existing cluster then changing the mode of all
the existing files will be required. Note that pg_upgrade will
automatically change the mode of all migrated files if the new cluster
is init'd with the -g option.
Tests are included for the backend and all the utilities which operate
on the PG data directory to ensure that the correct mode is set based on
the data directory permissions.
Author: David Steele <david@pgmasters.net>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).
Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).
Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.
Authors: David Steele <david@pgmasters.net>,
Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
Several places used similar code to convert a string to an int, so take
the function that we already had and make it globally available.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Some older compilers otherwise sometimes complain about undefined
values, even though the return value should not be used in the
overflow case. We assume that any decent compiler will optimize away
the unnecessary assignment in performance critical cases.
We do not want to restrain the returned value to a specific value,
e.g. 0 or the wrapped-around value, because some fast ways to
implement overflow detecting math do not easily allow for
that (e.g. msvc intrinsics). As the function documentation already
documents the returned value in case of intrinsics to be
implementation defined, no documentation has to be updated.
Per complaint from Tom Lane and his buildfarm member prairiedog.
Author: Andres Freund
Discussion: https://postgr.es/m/18169.1513958454@sss.pgh.pa.us
pg_hba_file_rules erroneously reported this as scram-sha256. Fix that.
To avoid future errors and confusion, also adjust documentation links
and internal symbols to have a separator between "sha" and "256".
Reported-by: Christophe Courtois <christophe.courtois@dalibo.com>
Author: Michael Paquier <michael.paquier@gmail.com>
This adds a second standard channel binding type for SCRAM. It is
mainly intended for third-party clients that cannot implement
tls-unique, for example JDBC.
Author: Michael Paquier <michael.paquier@gmail.com>
Mechanism names for SCRAM and channel binding names have been included
in scram.h by the libpq frontend code, and this header references a set
of routines which are only used by the backend. scram-common.h is on
the contrary usable by both the backend and libpq, so getting those
names from there seems more reasonable.
Author: Michael Paquier <michael.paquier@gmail.com>
It's not easy to get signed integer overflow checks correct and
fast. Therefore abstract the necessary infrastructure into a common
header providing addition, subtraction and multiplication for 16, 32,
64 bit signed integers.
The new macros aren't yet used, but a followup commit will convert
several open coded overflow checks.
Author: Andres Freund, with some code stolen from Greg Stark
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20171024103954.ztmatprlglz3rwke@alap3.anarazel.de