There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.
RelFileNumbers are now assigned using a separate counter, instead of
being assigned from the OID counter. This counter never wraps around:
if all 2^56 possible RelFileNumbers are used, an internal error
occurs. As the cluster is limited to 2^64 total bytes of WAL, this
limitation should not cause a problem in practice.
If the counter were 64 bits wide rather than 56 bits wide, we would
need to increase the width of the BufferTag, which might adversely
impact buffer lookup performance. Also, this lets us use bigint for
pg_class.relfilenode and other places where these values are exposed
at the SQL level without worrying about overflow.
This should remove the need to keep "tombstone" files around until
the next checkpoint when relations are removed. We do that to keep
RelFileNumbers from being recycled, but now that won't happen
anyway. However, this patch doesn't actually change anything in
this area; it just makes it possible for a future patch to do so.
Dilip Kumar, based on an idea from Andres Freund, who also reviewed
some earlier versions of the patch. Further review and some
wordsmithing by me. Also reviewed at various points by Ashutosh
Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Previously, these were declared in postgres_ext.h, but they are not
needed nearly so widely as the OID declarations, so that doesn't
necessarily make sense. Also, because postgres_ext.h is included
before most of c.h has been processed, the previous location creates
some problems for a pending patch.
Patch by me, reviewed by Dilip Kumar.
Discussion: http://postgr.es/m/CA+TgmoYc8oevMqRokZQ4y_6aRn-7XQny1JBr5DyWR_jiFtONHw@mail.gmail.com
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in optimizer, parser,
utility, libpq, and "commands" code, as well as in remaining library
code. Do the same for all code related to frontend programs (with the
exception of pg_dump/pg_dumpall related code).
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy. Later commits will handle
ecpg and pg_dump/pg_dumpall.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
PG_COMPRESSION_OPTION_LEVEL is removed from the compression
specification logic, and instead the compression level is always
assigned with each library's default if nothing is directly given. This
centralizes the checks on the compression methods supported by a given
build, and always assigns a default compression level when parsing a
compression specification. This results in complaining at an earlier
stage than previously if a build supports a compression method or not,
aka when parsing a specification in the backend or the frontend, and not
when processing it. zstd, lz4 and zlib are able to handle in their
respective routines setting up the compression level the case of a
default value, hence the backend or frontend code (pg_receivewal or
pg_basebackup) has now no need to know what the default compression
level should be if nothing is specified: the logic is now done so as the
specification parsing assigns it. It can also be enforced by passing
down a "level" set to the default value, that the backend will accept
(the replication protocol is for example able to handle a command like
BASE_BACKUP (COMPRESSION_DETAIL 'gzip:level=-1')).
This code simplification fixes an issue with pg_basebackup --gzip
introduced by ffd5365, where the tarball of the streamed WAL segments
would be created as of pg_wal.tar.gz with uncompressed contents, while
the intention is to compress the segments with gzip at a default level.
The origin of the confusion comes from the handling of the default
compression level of gzip (-1 or Z_DEFAULT_COMPRESSION) and the value of
0 was getting assigned, which is what walmethods.c would consider
as equivalent to no compression when streaming WAL segments with its tar
methods. Assigning always the compression level removes the confusion
of some code paths considering a value of 0 set in a specification as
either no compression or a default compression level.
Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests
where the shape of the compression detail string for client and
server-side compression was checked using gzip. This is a result of the
code simplification, as gzip specifications cannot be used if a build
does not support it.
Reported-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 15
Commit d8594d123 updated the list of non-spacing codepoints used
for calculating display width, but in doing so inadvertently removed
some, since the script used for that commit only considered combining
characters.
For complete coverage for zero-width characters, include codepoints in
the category Cf (Format). To reflect the wider purpose, also rename files
and update comments that referred specifically to combining characters.
Some of these ranges have been missing since v12, but due to lack of
field complaints it was determined not important enough to justify adding
special-case logic the backbranches.
Kyotaro Horiguchi
Report by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRBE8yvpQ0FSkPCoe0Ny1jAAsAQ6j3qMgVwWvkqAoaaNmQ%40mail.gmail.com
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.
Specifically, this adds palloc_object(), palloc_array(), and
repalloc_array(), which take the type name of the object to be
allocated as its first argument and cast the return as a pointer to
that type. There are also palloc0_object() and palloc0_array()
variants for initializing with zero, and pg_malloc_*() variants of all
of the above.
Inspired by the talloc library.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
SUSv3, all targeted Unixes and modern Windows have getaddrinfo() and
related interfaces. Drop the replacement implementation, and adjust
some headers slightly to make sure that the APIs are visible everywhere
using standard POSIX headers and names.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
Three variables in pqsignal.h (UnBlockSig, BlockSig and StartupBlockSig)
were not marked with PGDLLIMPORT, as they are declared in a way that
prevents mark_pgdllimport.pl to detect them. These variables are
redefined in a style more consistent with the other headers, allowing
the script to find and mark them.
PGDLLIMPORT was missing for __pg_log_level in logging.h, so add it
back. The marking got accidentally removed in 9a374b77, just after its
addition in 8ec5694.
While on it, add a comment in mark_pgdllimport.pl explaining what are
the arguments needed by the script (aka a list of header paths).
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20220506234924.6mxxotl3xl63db3l@alap3.anarazel.de
Enforce __pg_log_level message filtering centrally in logging.c,
instead of relying on the calling macros to do it. This is more
reliable (e.g. it works correctly for direct calls to pg_log_generic)
and it saves a percent or so of total code size because we get rid of
so many duplicate checks of __pg_log_level.
This does mean that argument expressions in a logging macro will be
evaluated even if we end up not printing anything. That seems of
little concern for INFO and higher levels as those messages are printed
by default, and most of our frontend programs don't even offer a way to
turn them off. I left the unlikely() checks in place for DEBUG
messages, though.
Discussion: https://postgr.es/m/3993549.1649449609@sss.pgh.pa.us
Compression option handling (level, algorithm or even workers) can be
used across several parts of the system and not only base backups.
Structures, objects and routines are renamed in consequence, to remove
the concept of base backups from this part of the code making this
change straight-forward.
pg_receivewal, that has gained support for LZ4 since babbbb5, will make
use of this infrastructure for its set of compression options, bringing
more consistency with pg_basebackup. This cleanup needs to be done
before releasing a beta of 15. pg_dump is a potential future target, as
well, and adding more compression options to it may happen in 16~.
Author: Michael Paquier
Reviewed-by: Robert Haas, Georgios Kokolatos
Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless. This mostly involves
s/pg_log_fatal/pg_log_error/g.
Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1). Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.
Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.
Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.
Patch by me. Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.
Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
Up until now, we've had a policy of only marking certain variables
in the PostgreSQL header files with PGDLLIMPORT, but now we've
decided to mark them all. This means that extensions running on
Windows should no longer operate at a disadvantage as compared to
extensions running on Linux: if the variable is present in a header
file, it should be accessible.
Discussion: http://postgr.es/m/CA+TgmoYanc1_FSfimhgiWSqVyP5KKmh5NP2BWNwDhO8Pg2vGYQ@mail.gmail.com
libzstd allows transparent parallel compression just by setting
an option when creating the compression context, so permit that
for both client and server-side backup compression. To use this,
use something like pg_basebackup --compress WHERE-zstd:workers=N
where WHERE is "client" or "server" and N is an integer.
When compression is performed on the server side, this will spawn
threads inside the PostgreSQL backend. While there is almost no
PostgreSQL server code which is thread-safe, the threads here are used
internally by libzstd and touch only data structures controlled by
libzstd.
Patch by me, based in part on earlier work by Dipesh Pandit
and Jeevan Ladhe. Reviewed by Justin Pryzby.
Discussion: http://postgr.es/m/CA+Tgmobj6u-nWF-j=FemygUhobhryLxf9h-wJN7W-2rSsseHNA@mail.gmail.com
There are more compression parameters that can be specified than just
an integer compression level, so rename the new COMPRESSION_LEVEL
option to COMPRESSION_DETAIL before it gets released. Introduce a
flexible syntax for that option to allow arbitrary options to be
specified without needing to adjust the main replication grammar,
and common code to parse it that is shared between the client and
the server.
This commit doesn't actually add any new compression parameters,
so the only user-visible change is that you can now type something
like pg_basebackup --compress gzip:level=5 instead of writing just
pg_basebackup --compress gzip:5. However, it should make it easy to
add new options. If for example gzip starts offering fries, we can
support pg_basebackup --compress gzip:level=5,fries=true for the
benefit of users who want fries with that.
Along the way, this fixes a few things in pg_basebackup so that the
pg_basebackup can be used with a server-side compression algorithm
that pg_basebackup itself does not understand. For example,
pg_basebackup --compress server-lz4 could still succeed even if
only the server and not the client has LZ4 support, provided that
the other options to pg_basebackup don't require the client to
decompress the archive.
Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker.
Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
The AF_UNIX macro was being used unprotected by HAVE_UNIX_SOCKETS,
apparently since 2008. So the redirection through IS_AF_UNIX() is
apparently no longer necessary. (More generally, all supported
platforms are now HAVE_UNIX_SOCKETS, but even if there were a new
platform in the future, it seems plausible that it would define the
AF_UNIX symbol even without kernel support.) So remove the
IS_AF_UNIX() macro and make the code a bit more consistent.
Discussion: https://www.postgresql.org/message-id/flat/f2d26815-9832-e333-d52d-72fbc0ade896%40enterprisedb.com
This is similar to b69aba7, except that this completes the work for
HMAC with a new routine called pg_hmac_error() that would provide more
context about the type of error that happened during a HMAC computation:
- The fallback HMAC implementation in hmac.c relies on cryptohashes, so
in some code paths it is necessary to return back the error generated by
cryptohashes.
- For the OpenSSL implementation (hmac_openssl.c), the logic is very
similar to cryptohash_openssl.c, where the error context comes from
OpenSSL if one of its internal routines failed, with different error
codes if something internal to hmac_openssl.c failed or was incorrect.
Any in-core code paths that use the centralized HMAC interface are
related to SCRAM, for errors that are unlikely going to happen, with
only SHA-256. It would be possible to see errors when computing some
HMACs with MD5 for example and OpenSSL FIPS enabled, and this commit
would help in reporting the correct errors but nothing in core uses
that. So, at the end, no backpatch to v14 is done, at least for now.
Errors in SCRAM related to the computation of the server key, stored
key, etc. need to pass down the potential error context string across
more layers of their respective call stacks for the frontend and the
backend, so each surrounding routine is adapted for this purpose.
Reviewed-by: Sergey Shinderuk
Discussion: https://postgr.es/m/Yd0N9tSAIIkFd+qi@paquier.xyz
The existing cryptohash facility was causing problems in some code paths
related to MD5 (frontend and backend) that relied on the fact that the
only type of error that could happen would be an OOM, as the MD5
implementation used in PostgreSQL ~13 (the in-core implementation is
used when compiling with or without OpenSSL in those older versions),
could fail only under this circumstance.
The new cryptohash facilities can fail for reasons other than OOMs, like
attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to
1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this
would cause incorrect reports to show up.
This commit extends the cryptohash APIs so as callers of those routines
can fetch more context when an error happens, by using a new routine
called pg_cryptohash_error(). The error states are stored within each
implementation's internal context data, so as it is possible to extend
the logic depending on what's suited for an implementation. The default
implementation requires few error states, but OpenSSL could report
various issues depending on its internal state so more is needed in
cryptohash_openssl.c, and the code is shaped so as we are always able to
grab the necessary information.
The core code is changed to adapt to the new error routine, painting
more "const" across the call stack where the static errors are stored,
particularly in authentication code paths on variables that provide
log details. This way, any future changes would warn if attempting to
free these strings. The MD5 authentication code was also a bit blurry
about the handling of "logdetail" (LOG sent to the postmaster), so
improve the comments related that, while on it.
The origin of the problem is 87ae969, that introduced the centralized
cryptohash facility. Extra changes are done for pgcrypto in v14 for the
non-OpenSSL code path to cope with the improvements done by this
commit.
Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
This reverts commits c2d1eea9e and 11b500072, as well as similar hacks
elsewhere, in favor of setting up the PGDLLIMPORT macro so that it can
just be used unconditionally. That can work because in frontend code,
we need no marking in either the defining or consuming files for a
variable exported from these libraries; and frontend code has no need
to access variables exported from the core backend, either.
While at it, write some actual documentation about the PGDLLIMPORT
and PGDLLEXPORT macros.
Patch by me, based on a suggestion from Robert Haas.
Discussion: https://postgr.es/m/1160385.1638165449@sss.pgh.pa.us
PGDLLIMPORT is only appropriate for variables declared in the backend,
not when the variable is coming from a library included in frontend code.
(This isn't a particularly nice fix, but for now, use the same method
employed elsewhere.)
Discussion: https://postgr.es/m/E1mrWUD-000235-Hq@gemulon.postgresql.org
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code. In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.
xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values not only 48-bit,
and it should be much more trustworthy. It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random(). It is not cryptographically strong, but neither
are the functions it replaces.
Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself
Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
Up to now, you couldn't escape out of psql's \password command
by typing control-C (or other local spelling of SIGINT). This
is pretty user-unfriendly, so improve it. To do so, we have to
modify the functions provided by pg_get_line.c; but we don't
want to mess with psql's SIGINT handler setup, so provide an
API that lets that handler cause the cancel to occur.
This relies on the assumption that we won't do any major harm by
longjmp'ing out of fgets(). While that's obviously a little shaky,
we've long had the same assumption in the main input loop, and few
issues have been reported.
psql has some other simple_prompt() calls that could usefully
be improved the same way; for now, just deal with \password.
Nathan Bossart, minor tweaks by me
Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
The hardcoded "wide character" set in ucs_wcwidth() was last updated
around the Unicode 5.0 era. This led to misalignment when printing
emojis and other codepoints that have since been designated
wide or full-width.
To fix and keep up to date, extend update-unicode to download the list
of wide and full-width codepoints from the offical sources.
In passing, remove some comments about non-spacing characters that
haven't been accurate since we removed the former hardcoded logic.
Jacob Champion
Reported and reviewed by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/. This code was cleaned up by aef8948, as it originally
included no overflow checks in the same way as the base64 routines in
src/common/ used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding. As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are. Simple SQLs
working on large bytea values show a clear difference in perf profile.
- ccf4e27, a cleanup made possible by aef8948.
The reverts of all those commits bring back the performance of hex
decoding and encoding back to what it was in ~13. Fow now and
post-beta3, this is the simplest option.
Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
libraries PostgreSQL is built with (only OpenSSL currently). If there
is no such libraries, a fallback implementation is available. Those new
APIs are designed similarly to the existing cryptohash layer, so there
is no real new design here, with the same logic around buffer bound
checks and memory handling.
HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC. This
refactoring is an advantage mainly for SCRAM, that included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries even if PostgreSQL was built with their support.
This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1. I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.
Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was. Simplify the logic a bit. Add test cases.
Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself
Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
With its current design, a careless use of pg_cryptohash_final() could
would result in an out-of-bound write in memory as the size of the
destination buffer to store the result digest is not known to the
cryptohash internals, without the caller knowing about that. This
commit adds a new argument to pg_cryptohash_final() to allow such sanity
checks, and implements such defenses.
The internals of SCRAM for HMAC could be tightened a bit more, but as
everything is based on SCRAM_KEY_LEN with uses particular to this code
there is no need to complicate its interface more than necessary, and
this comes back to the refactoring of HMAC in core. Except that, this
minimizes the uses of the existing DIGEST_LENGTH variables, relying
instead on sizeof() for the result sizes. In ossp-uuid, this also makes
the code more defensive, as it already relied on dce_uuid_t being at
least the size of a MD5 digest.
This is in philosophy similar to cfc40d3 for base64.c and aef8948 for
hex.c.
Reported-by: Ranier Vilela
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
With this commit, SHA1 goes through the implementation provided by
OpenSSL via EVP when building the backend with it, and uses as fallback
implementation KAME which was located in pgcrypto and already shaped for
an integration with a set of init, update and final routines.
Structures and routines have been renamed to make things consistent with
the fallback implementations of MD5 and SHA2.
uuid-ossp has used for ages a shortcut with pgcrypto to fetch a copy of
SHA1 if needed. This was built depending on the build options within
./configure, so this cleans up some code and removes the build
dependency between pgcrypto and uuid-ossp.
Note that this will help with the refactoring of HMAC, as pgcrypto
offers the option to use MD5, SHA1 or SHA2, so only the second option
was missing to make that possible.
Author: Michael Paquier
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
This commit addresses some issues with c3826f83 that moved the hex
decoding routine to src/common/:
- The decoding function lacked overflow checks, so when used for
security-related features it was an open door to out-of-bound writes if
not carefully used that could remain undetected. Like the base64
routines already in src/common/ used by SCRAM, this routine is reworked
to check for overflows by having the size of the destination buffer
passed as argument, with overflows checked before doing any writes.
- The encoding routine was missing. This is moved to src/common/ and
it gains the same overflow checks as the decoding part.
On failure, the hex routines of src/common/ issue an error as per the
discussion done to make them usable by frontend tools, but not by shared
libraries. Note that this is why ECPG is left out of this commit, and
it still includes a duplicated logic doing hex encoding and decoding.
While on it, this commit uses better variable names for the source and
destination buffers in the existing escape and base64 routines in
encode.c and it makes them more robust to overflow detection. The
previous core code issued a FATAL after doing out-of-bound writes if
going through the SQL functions, which would be enough to detect
problems when working on changes that impacted this area of the
code. Instead, an error is issued before doing an out-of-bound write.
The hex routines were being directly called for bytea conversions and
backup manifests without such sanity checks. The current calls happen
to not have any problems, but careless uses of such APIs could easily
lead to CVE-class bugs.
Author: Bruce Momjian, Michael Paquier
Reviewed-by: Sehrope Sarkuni
Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
This commit addresses two issues:
- In pgcrypto, MD5 computation called pg_cryptohash_{init,update,final}
without checking for the result status.
- Simplify pg_checksum_raw_context to use only one variable for all the
SHA2 options available in checksum manifests.
Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/f62f26bb-47a5-8411-46e5-4350823e06a5@iki.fi
The allocation of the cryptohash context data when building with OpenSSL
was happening in the memory context of the caller of
pg_cryptohash_create(), which could lead to issues with resowner cleanup
if cascading resources are cleaned up on an error. Like other
facilities using resowners, move the base allocation to TopMemoryContext
to ensure a correct cleanup on failure.
The resulting code gets simpler with this commit as the context data is
now hold by a unique opaque pointer, so as there is only one single
allocation done in TopMemoryContext.
After discussion, also change the cryptohash subroutines to return an
error if the caller provides NULL for the context data to ease error
detection on OOM.
Author: Heikki Linnakangas
Discussion: https://postgr.es/m/X9xbuEoiU3dlImfa@paquier.xyz
The patch needs test cases, reorganization, and cfbot testing.
Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive)
and 08db7c63f3..ccbe34139b.
Reported-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
This adds a key management system that stores (currently) two data
encryption keys of length 128, 192, or 256 bits. The data keys are
AES256 encrypted using a key encryption key, and validated via GCM
cipher mode. A command to obtain the key encryption key must be
specified at initdb time, and will be run at every database server
start. New parameters allow a file descriptor open to the terminal to
be passed. pg_upgrade support has also been added.
Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com
Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us
Author: Masahiko Sawada, me, Stephen Frost
The same logic was present for collation commands, SASLprep and
pgcrypto, so this removes some code.
Author: Michael Paquier
Reviewed-by: Stephen Frost, Heikki Linnakangas
Discussion: https://postgr.es/m/X9womIn6rne6Gud2@paquier.xyz
This adjusts some code related to recent changes for cryptohash
functions:
- Add a variable in md5.h to track down the size of a computed result,
moved from pgcrypto. Note that pg_md5_hash() assumed a result of this
size already.
- Call explicit_bzero() on the hashed data when freeing the context for
fallback implementations. For MD5, particularly, it would be annoying
to leave some non-zeroed data around.
- Clean up some code related to recent changes of uuid-ossp. .gitignore
still included md5.c and a comment was incorrect.
Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
This commit heavily reorganizes the MD5 implementations that exist in
the tree in various aspects.
First, MD5 is added to the list of options available in cryptohash.c and
cryptohash_openssl.c. This means that if building with OpenSSL, EVP is
used for MD5 instead of the fallback implementation that Postgres had
for ages. With the recent refactoring work for cryptohash functions,
this change is straight-forward. If not building with OpenSSL, a
fallback implementation internal to src/common/ is used.
Second, this reduces the number of MD5 implementations present in the
tree from two to one, by moving the KAME implementation from pgcrypto to
src/common/, and by removing the implementation that existed in
src/common/. KAME was already structured with an init/update/final set
of routines by pgcrypto (see original pgcrypto/md5.h) for compatibility
with OpenSSL, so moving it to src/common/ has proved to be a
straight-forward move, requiring no actual manipulation of the internals
of each routine. Some benchmarking has not shown any performance gap
between both implementations.
Similarly to the fallback implementation used for SHA2, the fallback
implementation of MD5 is moved to src/common/md5.c with an internal
header called md5_int.h for the init, update and final routines. This
gets then consumed by cryptohash.c.
The original routines used for MD5-hashed passwords are moved to a
separate file called md5_common.c, also in src/common/, aimed at being
shared between all MD5 implementations as utility routines to keep
compatibility with any code relying on them.
Like the SHA2 changes, this commit had its round of tests on both Linux
and Windows, across all versions of OpenSSL supported on HEAD, with and
even without OpenSSL.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20201106073434.GA4961@paquier.xyz
Two new routines to allocate a hash context and to free it are created,
as these become necessary for the goal behind this refactoring: switch
the all cryptohash implementations for OpenSSL to use EVP (for FIPS and
also because upstream does not recommend the use of low-level cryptohash
functions for 20 years). Note that OpenSSL hides the internals of
cryptohash contexts since 1.1.0, so it is necessary to leave the
allocation to OpenSSL itself, explaining the need for those two new
routines. This part is going to require more work to properly track
hash contexts with resource owners, but this not introduced here.
Still, this refactoring makes the move possible.
This reduces the number of routines for all SHA2 implementations from
twelve (SHA{224,256,386,512} with init, update and final calls) to five
(create, free, init, update and final calls) by incorporating the hash
type directly into the hash context data.
The new cryptohash routines are moved to a new file, called cryptohash.c
for the fallback implementations, with SHA2 specifics becoming a part
internal to src/common/. OpenSSL specifics are part of
cryptohash_openssl.c. This infrastructure is usable for more hash
types, like MD5 or HMAC.
Any code paths using the internal SHA2 routines are adapted to report
correctly errors, which are most of the changes of this commit. The
zones mostly impacted are checksum manifests, libpq and SCRAM.
Note that e21cbb4 was a first attempt to switch SHA2 to EVP, but it
lacked the refactoring needed for libpq, as done here.
This patch has been tested on Linux and Windows, with and without
OpenSSL, and down to 1.0.1, the oldest version supported on HEAD.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
This replaces the existing binary search with two perfect hash functions
for the composition and the decomposition in the backend code, at the
cost of slightly-larger binaries there (35kB in libpgcommon_srv.a). Per
the measurements done, this improves the speed of the recomposition and
decomposition by up to 30~40 times for the NFC and NFKC conversions,
while all other operations get at least 40% faster. This is not as
"good" as what libicu has, but it closes the gap a lot as per the
feedback from Daniel Verite.
The decomposition table remains the same, getting used for the binary
search in the frontend code, where we care more about the size of the
libraries like libpq over performance as this gets involved only in code
paths related to the SCRAM authentication. In consequence, note that
the perfect hash function for the recomposition needs to use a new
inverse lookup array back to to the existing decomposition table.
The size of all frontend deliverables remains unchanged, even with
--enable-debug, including libpq.
Author: John Naylor
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/CAFBsxsHUuMFCt6-pU+oG-F1==CmEp8wR+O+bRouXWu6i8kXuqA@mail.gmail.com
80f8eb7 has added to the normalization quick check headers some code
generated by PerfectHash.pm that is incompatible with the settings of
gitattributes for this repository, as whitespaces followed a set of tabs
for the first element of a line in the table. Instead of adding a new
exception to gitattributes, rework the format generated so as a right
padding with spaces is used instead of a left padding. This keeps the
table generated in a readable shape with its set of columns, making
unnecessary an update of gitattributes.
Reported-by: Peter Eisentraut
Author: John Naylor
Discussion: https://postgr.es/m/d601b3b5-a3c7-5457-2f84-3d6513d690fc@2ndquadrant.com
This makes the normalization quick check about 30% faster for NFC and
50% faster for NFKC than the binary search used previously. The hash
lookup reuses the existing array of bit fields used for the binary
search to get the quick check property and is generated as part of "make
update-unicode" in src/common/unicode/.
Author: John Naylor
Reviewed-by: Mark Dilger, Michael Paquier
Discussion: https://postgr.es/m/CACPNZCt4fbJ0_bGrN5QPt34N4whv=mszM0LMVQdoa2rC9UMRXA@mail.gmail.com
This reverts commit e21cbb4, as the switch to EVP routines requires a
more careful design where we would need to have at least our wrapper
routines return a status instead of issuing an error by themselves to
let the caller do the error handling. The memory handling was also
incorrect and could cause leaks in the backend if a failure happened,
requiring most likely a callback to do the necessary cleanup as the only
clean way to be able to allocate an EVP context requires the use of an
allocation within OpenSSL. The potential rework of the wrappers also
impacts the fallback implementation when not building with OpenSSL.
Originally, prairiedog has reported a compilation failure, but after
discussion with Tom Lane this needs a better design.
Discussion: https://postgr.es/m/20200928073330.GC2316@paquier.xyz
The use of low-level hash routines is not recommended by upstream
OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67.
Note that this also fixes a failure with SCRAM authentication when using
FIPS in OpenSSL, but as there have been few complaints about this
problem and as this causes an ABI breakage, no backpatch is done.
Author: Michael Paquier, Alessandro Gherardi
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz
Further experience says that the appending behavior offered by
pg_get_line_append is useful to only a very small minority of callers.
For most, the requirement to reset the buffer after each line is just
an error-prone nuisance. Hence, invent another alternative call
pg_get_line_buf, which takes care of that detail.
Noted while reviewing a patch from Daniel Gustafsson.
Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se
Up to now, if you tried to omit "AS" before a column label in a SELECT
list, it would only work if the column label was an IDENT, that is not
any known keyword. This is rather unfriendly considering that we have
so many keywords and are constantly growing more. In the wake of commit
1ed6b8956 it's possible to improve matters quite a bit.
We'd originally tried to make this work by having some of the existing
keyword categories be allowed without AS, but that didn't work too well,
because each category contains a few special cases that don't work
without AS. Instead, invent an entirely orthogonal keyword property
"can be bare column label", and mark all keywords that way for which
we don't get shift/reduce errors by doing so.
It turns out that of our 450 current keywords, all but 39 can be made
bare column labels, improving the situation by over 90%. This number
might move around a little depending on future grammar work, but it's
a pretty nice improvement.
Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor
Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
Instead of hard-wiring specific verbosity levels into the option
processing of client applications, invent pg_logging_increase_verbosity()
and encourage clients to implement --verbose by calling that. Then,
the common convention that more -v's gets you more verbosity just works.
In particular, this allows resurrection of the debug-grade messages that
have long existed in pg_dump and its siblings. They were unreachable
before this commit due to lack of a way to select PG_LOG_DEBUG logging
level. (It appears that they may have been unreachable for some time
before common/logging.c was introduced, too, so I'm not specifically
blaming cc8d41511 for the oversight. One reason for thinking that is
that it's now apparent that _allocAH()'s message needs a null-pointer
guard. Testing might have failed to reveal that before 96bf88d52.)
Discussion: https://postgr.es/m/1173106.1600116625@sss.pgh.pa.us
Some kernels can tell us the type of a "dirent", so we can avoid a call
to stat() or lstat() in many cases. Define a new function
get_dirent_type() to contain that logic, for use by the backend and
frontend versions of walkdir(), and perhaps other callers in future.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BFzxupGGN4GpUdbzZN%2Btn6FQPHo8w0Q%2BAPH5Wz8RG%2Bww%40mail.gmail.com
Letting the caller provide a StringInfo to read into is helpful when
the caller needs to merge lines or otherwise modify the data after
it's been read. Notably, now the code added by commit 8f8154a50
can use pg_get_line_append() instead of having its own copy of that
logic. A follow-on commit will also make use of this.
Also, since StringInfo buffers are a minimum of 1KB long, blindly
using pg_get_line() in a loop can eat a lot more memory than one would
expect. I discovered for instance that commit e0f05cd5b caused initdb
to consume circa 10MB to read postgres.bki, even though that's under
1MB worth of data. A less memory-hungry alternative is to re-use the
same StringInfo for all lines and pg_strdup the results.
Discussion: https://postgr.es/m/1315832.1599345736@sss.pgh.pa.us
This patch started out with the goal of harmonizing various arbitrary
limits on password length, but after awhile a better idea emerged:
let's just get rid of those fixed limits.
recv_password_packet() has an arbitrary limit on the packet size,
which we don't really need, so just drop it. (Note that this doesn't
really affect anything for MD5 or SCRAM password verification, since
those will hash the user's password to something shorter anyway.
It does matter for auth methods that require a cleartext password.)
Likewise remove the arbitrary error condition in pg_saslprep().
The remaining limits are mostly in client-side code that prompts
for passwords. To improve those, refactor simple_prompt() so that
it allocates its own result buffer that can be made as big as
necessary. Actually, it proves best to make a separate routine
pg_get_line() that has essentially the semantics of fgets(), except
that it allocates a suitable result buffer and hence will never
return a truncated line. (pg_get_line has a lot of potential
applications to replace randomly-sized fgets buffers elsewhere,
but I'll leave that for another patch.)
I built pg_get_line() atop stringinfo.c, which requires moving
that code to src/common/; but that seems fine since it was a poor
fit for src/port/ anyway.
This patch is mostly mine, but it owes a good deal to Nathan Bossart
who pressed for a solution to the password length problem and
created a predecessor patch. Also thanks to Peter Eisentraut and
Stephen Frost for ideas and discussion.
Discussion: https://postgr.es/m/09512C4F-8CB9-4021-B455-EF4C4F0D55A0@amazon.com
Any libpq client can use the header. Clients include backend components
postgres_fdw, dblink, and logical replication apply worker. Back-patch
to v10, because another fix needs this. In released branches, just copy
the header and keep the original.
OpenSSL's native reports about problems related to protocol version
restrictions are pretty opaque and inconsistent. When we get an
SSL error that is plausibly due to this, emit a hint message that
includes the range of SSL protocol versions we (think we) are
allowing. This should at least get the user thinking in the right
direction to resolve the problem, even if the hint isn't totally
accurate, which it might not be for assorted reasons.
Back-patch to v13 where we increased the default minimum protocol
version, thereby increasing the risk of this class of failure.
Patch by me, reviewed by Daniel Gustafsson
Discussion: https://postgr.es/m/a9408304-4381-a5af-d259-e55d349ae4ce@2ndquadrant.com
fe_archive.c was compiled only for the frontend in src/common/, but as
it will never share anything with the backend, it makes most sense to
move this file to src/fe_utils/.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/e9766d71-8655-ac86-bdf6-77e0e7169977@2ndquadrant.com
Backpatch-through: 13
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and
check Unicode normal forms, per SQL standard.
To support fast IS NORMALIZED tests, we pull in a new data file
DerivedNormalizationProps.txt from Unicode and build a lookup table
from that, using techniques similar to ones already used for other
Unicode data. make update-unicode will keep it up to date. We only
build and use these tables for the NFC and NFKC forms, because they
are too big for NFD and NFKD and the improvement is not significant
enough there.
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
pg_rewind needs to copy from the source cluster to the target cluster a
set of relation blocks changed from the previous checkpoint where WAL
forked up to the end of WAL on the target. Building this list of
relation blocks requires a range of WAL segments that may not be present
anymore on the target's pg_wal, causing pg_rewind to fail. It is
possible to work around this issue by copying manually the WAL segments
needed but this may lead to some extra and actually useless work.
This commit introduces a new option allowing pg_rewind to use a
restore_command while doing the rewind by grabbing the parameter value
of restore_command from the target cluster configuration. This allows
the rewind operation to be more reliable, so as only the WAL segments
needed by the rewind are restored from the archives.
In order to be able to do that, a new routine is added to src/common/ to
allow frontend tools to restore files from archives using an
already-built restore command. This version is more simple than the
backend equivalent as there is no need to handle the non-recovery case.
Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
It previously only supported NFKC, for use by SASLprep. This expands
the API to offer the choice of all four normalization forms. Right
now, there are no internal users of the forms other than NFKC.
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com
restore_command has only been used until now by the backend, but there
is a pending patch for pg_rewind to make use of that in the frontend.
Author: Alexey Kondratov
Reviewed-by: Andrey Borodin, Andres Freund, Alvaro Herrera, Alexander
Korotkov, Michael Paquier
Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1@postgrespro.ru
This also involves renaming src/include/utils/hashutils.h, which
becomes src/include/common/hashfn.h. Perhaps an argument can be
made for keeping the hashutils.h name, but it seemed more
consistent to make it match the name of the file, and also more
descriptive of what is actually going on here.
Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list
advice on how not to break the Windows build from Davinder Singh
and Amit Kapila.
Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
To make this work, (1) makeJsonLexContextCstringLen now takes the
encoding to be used as an argument; (2) check_stack_depth() is made to
do nothing in frontend code, and (3) elog(ERROR, ...) is changed to
pg_log_fatal + exit in frontend code.
Mark Dilger, reviewed and slightly revised by me.
Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
Two routines have been added in OpenSSL 1.1.0 to set the protocol bounds
allowed within a given SSL context:
- SSL_CTX_set_min_proto_version
- SSL_CTX_set_max_proto_version
As Postgres supports OpenSSL down to 1.0.1 (as of HEAD), equivalent
replacements exist in the tree, which are only available for the
backend. A follow-up patch is planned to add control of the SSL
protocol bounds for libpq, so move those routines to src/common/ so as
libpq can use them.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/4F246AE3-A7AE-471E-BD3D-C799D3748E03@yesql.se
We currently have several sets of files generated from data provided
by Unicode. These all have ad hoc rules and instructions for updating
when new Unicode versions appear, and it's not done consistently.
This patch centralizes and automates the process and makes it part of
the release checklist. The Unicode and CLDR versions are specified in
Makefile.global.in. There is a new make target "update-unicode" that
downloads all the relevant files and runs the generation script.
There is also a new script for generating the table of combining
characters for ucs_wcwidth(). That table is now in a separate include
file rather than hardcoded into the middle of other code. This is
based on the script that was used for generating
d8594d123c, but the script itself wasn't
committed at that time.
Reviewed-by: John Naylor <john.naylor@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/c8d05f42-443e-6c23-819b-05b31759a37c@2ndquadrant.com
Within the context of SCRAM, "verifier" has a specific meaning in the
protocol, per RFCs. The existing code used "verifier" differently, to
mean whatever is or would be stored in pg_auth.rolpassword.
Fix this by using the term "secret" for this, following RFC 5803.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/be397b06-6e4b-ba71-c7fb-54cae84a7e18%402ndquadrant.com
Commit 4d0e994eed added support for partial TOAST decompression, so the
decompression is interrupted after producing the requested prefix. For
prefix and slices near the beginning of the entry, this may saves a lot
of decompression work.
That however only deals with decompression - the whole compressed entry
was still fetched and re-assembled, even though the compression used
only a small fraction of it. This commit improves that by computing how
much compressed data may be needed to decompress the requested prefix,
and then fetches only the necessary part.
We always need to fetch a bit more compressed data than the requested
(uncompressed) prefix, because the prefix may not be compressible at all
and pglz itself adds a bit of overhead. That means this optimization is
most effective when the requested prefix is much smaller than the whole
compressed entry.
Author: Binguo Bao
Reviewed-by: Andrey Borodin, Tomas Vondra, Paul Ramsey
Discussion: https://www.postgresql.org/message-id/flat/CAL-OGkthU9Gs7TZchf5OWaL-Gsi=hXqufTxKv9qpNG73d5na_g@mail.gmail.com
Similarly to the signed versions added in 4d6ad31, this adds a set of
inline functions for overflow checks with unsigned integers, including
uint16, uint32 and uint64. This relies on compiler built-in overflow
checks by default if available. The behavior of unsigned integers is
well-defined so the fallback implementations checks are simple for
additions and subtractions. Multiplications avoid division-based checks
which are expensive if possible, still this can happen for uint64 if
128-bit integers are not available.
While on it, the code in common/int.h is reorganized to avoid too many
duplicated comments. The new macros will be used in a follow-up patch.
All thanks to Andres Freund for the input provided.
Author: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20190830073423.GB2354@paquier.xyz
b654714 has reworked the way trailing CR/LF characters are removed from
strings. This commit introduces a new routine in common/string.c and
refactors the code so as the logic is in a single place, mostly.
Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/20190801031820.GF29334@paquier.xyz
This is a follow-up refactoring after 09ec55b and b674211, which has
proved that the encoding and decoding routines used by SCRAM have a
poor interface when it comes to check after buffer overflows. This adds
an extra argument in the shape of the length of the result buffer for
each routine, which is used for overflow checks when encoding or
decoding an input string. The original idea comes from Tom Lane.
As a result of that, the encoding routine can now fail, so all its
callers are adjusted to generate proper error messages in case of
problems.
On failure, the result buffer gets zeroed.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20190623132535.GB1628@paquier.xyz
In order to separate OpenSSL's SHA symbols, this header has been using
USE_SSL, which is equivalent to USE_OPENSSL. There is now only one SSL
implementation included in the tree, so this works fine, but when
adding a new SSL implementation this would run into failures.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/0DF29010-CE26-4F51-85A6-9C8ABF5536F9@yesql.se
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
The original placement of this module in src/fe_utils/ is ill-considered,
because several src/common/ modules have dependencies on it, meaning that
libpgcommon and libpgfeutils now have mutual dependencies. That makes it
pointless to have distinct libraries at all. The intended design is that
libpgcommon is lower-level than libpgfeutils, so only dependencies from
the latter to the former are acceptable.
We already have the precedent that fe_memutils and a couple of other
modules in src/common/ are frontend-only, so it's not stretching anything
out of whack to treat logging.c as a frontend-only module in src/common/.
To the extent that such modules help provide a common frontend/backend
environment for the rest of common/ to use, it's a reasonable design.
(logging.c does not yet provide an ereport() emulation, but one can
dream.)
Hence, move these files over, and revert basically all of the build-system
changes made by commit cc8d41511. There are no places that need to grow
new dependencies on libpgcommon, further reinforcing the idea that this
is the right solution.
Discussion: https://postgr.es/m/a912ffff-f6e4-778a-c86a-cf5c47a12933@2ndquadrant.com
This commit fixes a couple of issues related to the way password
verifiers hashed with MD5 or SCRAM-SHA-256 are detected, leading to
being able to store in catalogs passwords which do not follow the
supported hash formats:
- A MD5-hashed entry was checked based on if its header uses "md5" and
if the string length matches what is expected. Unfortunately the code
never checked if the hash only used hexadecimal characters, as reported
by Tom Lane.
- A SCRAM-hashed entry was checked based on only its header, which
should be "SCRAM-SHA-256$", but it never checked for any fields
afterwards, as reported by Jonathan Katz.
Backpatch down to v10, which is where SCRAM has been introduced, and
where password verifiers in plain format have been removed.
Author: Jonathan Katz
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/016deb6b-1f0a-8e9f-1833-a8675b170aa9@postgresql.org
Backpatch-through: 10
When asked for a slice of a TOAST entry, decompress enough to return the
slice instead of decompressing the entire object.
For use cases where the slice is at, or near, the beginning of the entry,
this avoids a lot of unnecessary decompression work.
This changes the signature of pglz_decompress() by adding a boolean to
indicate if it's ok for the call to finish before consuming all of the
source or destination buffers.
Author: Paul Ramsey
Reviewed-By: Rafia Sabih, Darafei Praliaskouski, Regina Obe
Discussion: https://postgr.es/m/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com
This unifies the various ad hoc logging (message printing, error
printing) systems used throughout the command-line programs.
Features:
- Program name is automatically prefixed.
- Message string does not end with newline. This removes a common
source of inconsistencies and omissions.
- Additionally, a final newline is automatically stripped, simplifying
use of PQerrorMessage() etc., another common source of mistakes.
- I converted error message strings to use %m where possible.
- As a result of the above several points, more translatable message
strings can be shared between different components and between
frontends and backend, without gratuitous punctuation or whitespace
differences.
- There is support for setting a "log level". This is not meant to be
user-facing, but can be used internally to implement debug or
verbose modes.
- Lazy argument evaluation, so no significant overhead if logging at
some level is disabled.
- Some color in the messages, similar to gcc and clang. Set
PG_COLOR=auto to try it out. Some colors are predefined, but can be
customized by setting PG_COLORS.
- Common files (common/, fe_utils/, etc.) can handle logging much more
simply by just using one API without worrying too much about the
context of the calling program, requiring callbacks, or having to
pass "progname" around everywhere.
- Some programs called setvbuf() to make sure that stderr is
unbuffered, even on Windows. But not all programs did that. This
is now done centrally.
Soft goals:
- Reduces vertical space use and visual complexity of error reporting
in the source code.
- Encourages more deliberate classification of messages. For example,
in some cases it wasn't clear without analyzing the surrounding code
whether a message was meant as an error or just an info.
- Concepts and terms are vaguely aligned with popular logging
frameworks such as log4j and Python logging.
This is all just about printing stuff out. Nothing affects program
flow (e.g., fatal exits). The uses are just too varied to do that.
Some existing code had wrappers that do some kind of print-and-exit,
and I adapted those.
I tried to keep the output mostly the same, but there is a lot of
historical baggage to unwind and special cases to consider, and I
might not always have succeeded. One significant change is that
pg_rewind used to write all error messages to stdout. That is now
changed to stderr.
Reviewed-by: Donald Dong <xdong@csumb.edu>
Reviewed-by: Arthur Zakirov <a.zakirov@postgrespro.ru>
Discussion: https://www.postgresql.org/message-id/flat/6a609b43-4f57-7348-6480-bd022f924310@2ndquadrant.com
ce6afc6 has begun the refactoring work by plugging pg_rewind into a
central routine to update the control file, and left around two extra
copies, with one in xlog.c for the backend and one in pg_resetwal.c. By
adding an extra option to the central routine in controldata_utils.c to
control if a flush of the control file needs to be done, it is proving
to be straight-forward to make xlog.c and pg_resetwal.c use the central
code path at the condition of moving the wait event tracking there.
Hence, this allows to have only one central code path to update the
control file, shaving the code from the duplicates.
This refactoring actually fixes a problem in pg_resetwal. Previously,
the control file was first removed before being recreated. So if a
crash happened between the moment the file was removed and the moment
the file was created, then it would have been possible to not have a
control file anymore in the database folder.
Author: Fabien Coelho
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/alpine.DEB.2.21.1903170935210.2506@lancre
This adds a new routine to src/common/ which is compatible with both the
frontend and backend code, able to update the control file's contents.
This is now getting used only by pg_rewind, but some upcoming patches
which add more control on checksums for offline instances will make use
of it. This could also get used more by the backend as xlog.c has its
own flavor of the same logic with some wait events and an additional
flush phase before closing the opened file descriptor, but this is let
as separate work.
Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
Previously, floating-point output was done by rounding to a specific
decimal precision; by default, to 6 or 15 decimal digits (losing
information) or as requested using extra_float_digits. Drivers that
wanted exact float values, and applications like pg_dump that must
preserve values exactly, set extra_float_digits=3 (or sometimes 2 for
historical reasons, though this isn't enough for float4).
Unfortunately, decimal rounded output is slow enough to become a
noticable bottleneck when dealing with large result sets or COPY of
large tables when many floating-point values are involved.
Floating-point output can be done much faster when the output is not
rounded to a specific decimal length, but rather is chosen as the
shortest decimal representation that is closer to the original float
value than to any other value representable in the same precision. The
recently published Ryu algorithm by Ulf Adams is both relatively
simple and remarkably fast.
Accordingly, change float4out/float8out to output shortest decimal
representations if extra_float_digits is greater than 0, and make that
the new default. Applications that need rounded output can set
extra_float_digits back to 0 or below, and take the resulting
performance hit.
We make one concession to portability for systems with buggy
floating-point input: we do not output decimal values that fall
exactly halfway between adjacent representable binary values (which
would rely on the reader doing round-to-nearest-even correctly). This
is known to be a problem at least for VS2013 on Windows.
Our version of the Ryu code originates from
https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the
following (significant) modifications:
- Output format is changed to use fixed-point notation for small
exponents, as printf would, and also to use lowercase 'e', a
minimum of 2 exponent digits, and a mandatory sign on the exponent,
to keep the formatting as close as possible to previous output.
- The output of exact midpoint values is disabled as noted above.
- The integer fast-path code is changed somewhat (since we have
fixed-point output and the upstream did not).
- Our project style has been largely applied to the code with the
exception of C99 declaration-after-statement, which has been
retained as an exception to our present policy.
- Most of upstream's debugging and conditionals are removed, and we
use our own configure tests to determine things like uint128
availability.
Changing the float output format obviously affects a number of
regression tests. This patch uses an explicit setting of
extra_float_digits=0 for test output that is not expected to be
exactly reproducible (e.g. due to numerical instability or differing
algorithms for transcendental functions).
Conversions from floats to numeric are unchanged by this patch. These
may appear in index expressions and it is not yet clear whether any
change should be made, so that can be left for another day.
This patch assumes that the only supported floating point format is
now IEEE format, and the documentation is updated to reflect that.
Code by me, adapting the work of Ulf Adams and other contributors.
References:
https://dl.acm.org/citation.cfm?id=3192369
Reviewed-by: Tom Lane, Andres Freund, Donald Dong
Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk