Thanks to previous patches, the request messages are now sanitized to
properly handle Upgrade requests. Now, if a 'connection: upgrade' header
value was found while no 'Upgrade' header, the 'upgrade' values is removed
from the 'connection' header. Conversely the opposite is also performed. If
'Upgrade' header was found, but no "conneciotn: upgrade" header value, all
occurrences of 'Upgrade' header are refused.
This patch depends on following ones:
* MINOR: h1: Add a H1M flag to specify a non-empty 'Upgrade:' header was parsed
* MINOR: http: Add function to remove all occurrences of a value in a header
It should fix the issue 3397. But the H2 part should be reviewed too, and
probably the H1 response parsing, to be consistent with this change.
The series should be backported as far as 2.4.
http_remove_header_value() function was added to parse a header value and
remove all occurrences of a specific value.
This patch is mandatory to fix a bug.
Records with a stream ID set to 0 are reserved to management records.
However there was no check to trigger an error if an application record is
received with a stream ID to 0. This could lead to crash becausqe management
streams (which are static and immutable) can be modified while processing
application records (STDOUT/STDERR/END_REQUEST).
To fix the issue, An error is returned if the stream ID 0 is set on
GET_VALUES_RESULT or UNKNOWN_TYPE records.
This patch must be backported to all stable versions.
do_decrypt_cek_rsa() calls EVP_PKEY_decrypt with RSA_PKCS1_PADDING for
RSA1_5 and returns failure (goto end) on decrypt error. This creates a
measurable timing difference between "padding invalid" (fast exit before
content decryption) and "padding valid + AEAD tag fail" (full AES-GCM/CBC
decryption path), exposing the RSA private key to a Bleichenbacher-style
adaptive attack requiring ~10^4-10^6 queries.
Fix: On RSA_PKCS1_PADDING failure, fill decrypted_cek with random bytes
of the buffer size and return success (retval=0). This forces execution
into decrypt_ciphertext() regardless of padding validity, so the attacker
cannot distinguish valid from invalid padding via timing. The AEAD tag
check in decrypt_ciphertext() will still reject the wrong CEK, but the
timing profile is identical for both branches.
RSA-OAEP variants are not affected (mathematically infeasible to craft
valid ciphertext without the private key).
Introduced by RSA1_5 path lacking constant-time fallback.
Two fixes addressing cryptographic and parsing correctness issues:
1. Enforce 16-byte GCM authentication tag in decrypt_ciphertext()
The base64url-decoded 5th JWE component (authentication tag) was passed
directly to EVP_CTRL_AEAD_SET_TAG with its attacker-controlled length.
OpenSSL accepts 1-16 byte GCM tags and only verifies that many bytes, so
a 1-byte tag reduces forgery work factor to ~256. RFC 7518 mandates 128-bit
(16 byte) tags for A*GCM. The CBC-HMAC path already enforced correct length,
confirming this was an oversight.
Fix: Add (*aead_tag)->data != 16 check before the GCM branch in
decrypt_ciphertext(), rejecting any non-16-byte tag.
Introduced by 416b87d5db (JWE A*GCM support).
2. Enforce 16-byte GCMKW tag in parse_jose() decode_jose_field()
The $.tag field from the attacker-supplied protected header in A*GCMKW
key-wrap was similarly decoded without length enforcement. Fix: Add a
size != 16 check for fields named ".tag" in decode_jose_field() when
called from the GCMKW path.
Introduced by 026652a7eb (GCMKW tag field parsing).
Header protection can only be performed on a packet of a minimal size.
There was already a check for this in qc_do_rm_hp() but it did not use
the correct value.
Fix this by using the correct minimal size which is 20 bytes starting
from the packet number offset. This is enough to decrypt 4 bytes (PN max
size) and 16 bytes of IV. If the packet is not big enough, it is
still silently discarded.
This must be backported up to 2.6.
Ensure frame parsing error does not cause a crash by removing the
associated BUG_ON()/ABORT_NOW().
For now, connection is flagged on error, which ensures that any
send/receive future operations are prevented and connection is closed
asap. In the future, a proper CONNECTION_CLOSE will be required as
defined by QMux protocol.
No need to backport.
hlua_http_add_hdr() passes Lua string values directly to htx_add_header()
without validation. This can be an issue for user-controlled data, but as
well when relying on poorly written scripts. This patch makes sure that
neither the name nor the value may contain any of these forbidden chars.
This should be backported to all versions since the issue has been there
since at least 2.4.
The following patch ensures server MAX_PUSH_ID are rejected as a client.
This has been implemented by extending h3_rcv_buf().
e4a5a64198
BUG/MINOR: h3: reject server MAX_PUSH_ID frame
Case label for MAX_PUSH_ID has been moved in the function, however the
break instruction was removed by error. Fix this by adding the missing
break statement.
This must be backported to every version the above fix is. Currently, it
is scheduled to 3.3.
Fix a case of dereference NULL pointer when trying to use an user from
an userlist which does not have a password configured.
The check_user() function tries to do an strcmp of the password, howver
u->pass is NULL and the strcmp would crash when trying.
Must be backported in every stable branches.
Previously, MAX_PUSH_ID frames were silently ignored both on client and
server sides. However, such frame cannot be emitted by the server.
This patch fixes this by properly issuing connection error
FRAME_UNEXPECTED when receiving a MAX_PUSH_ID frame as a client. This is
implemented by extending h3_check_frame_valid().
This must be backported up to 3.3.
HTTP/3 PUSH_PROMISE frames are systematically rejected with H3 error
FRAME_UNEXPECTED. This is adapted on the server side as a client can
never emit them.
This patch adapts error reporting when haproxy runs as a client. In this
case, server is still forbidden to emit any PUSH_PROMISE as MAX_PUSH_ID
frames are never emitted. In this case, ID_ERROR must be used as an
error code.
This must be backported up to 3.3.
CANCEL_PUSH frames are silently ignored on both client and server sides.
However, as push support is not implemented by haproxy, clients are thus
forbidden to emit any of those frames.
Fix this by closing the connection with ID_ERROR when receiving a client
CANCEL_PUSH as a server. On client side, the frame is still silently
discarded.
This must be backported up to 2.6.
Push streams are not supported by haproxy as a client. Thus, it never
emits any MAX_PUSH_ID frame. In this case, the server is not allowed to
initiate any push stream.
This patch ensures that such stream is closed with error H3_ID_ERROR, as
specified by HTTP/3 RFC.
This must be backported up to 3.3.
HTTP/3 push streams can only be opened by a server instance. The
specification mandates that the connection must be closed if a server
receives a client-initiated push stream.
This patch should ensure that it is not possible to exploit
unidirectional streams for an unexpected usage.
This must be backported up to 2.6.
The current PRNG is xoroshiro128**, it was introduced in 2.2 with
commit 52bf83939 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG"). It features a 2^128 sequence and can perform
2^64 or 2^96 jumps, though only the 2^96 jump is implemented. It
was initially designed to support both processes and threads, and
implements a shared state between threads instead of allocating
distinct sequences based on PID and thread numbers.
Since then, the PRNG's usage grew and processes have disappeared,
but the lock or the DWCAS are still there due to its shared nature,
and it's possible to trigger watchdog warnings by issuing 100 UUIDs
in a single log-format string.
Also, UUID and QUIC retry tokens now consume 128 bits from the PRNG
in two 64-bit calls, and used to weaken the PRNG by rapidly disclosing
its internal state on reasonably idle systems. This indicates that
most of the time we now need 128 bits.
This patch modernizes the internal generator by switching to xoshiro256**,
which has comparable properties (it's even faster), and features even
longer 2^256 periods, still returning 64 bits per call. It can be
initialized with 2^128 and 2^192 jumps. More details here:
https://prng.di.unimi.it/https://prng.di.unimi.it/xoshiro256starstar.c
Here we implement a thread-local state instead of the old shared one,
so there is no more need for synchronization. The state is seeded at
boot, and each thread performs as many 2^192 jumps as their TID is
large. The master process performs a 2^128 jump where it used to
perform a 2^96 jump so that it doesn't overlap with any worker thread.
However a cleaner approach could be to perform a 2^128 jump for each
fork() (here the worker) and 2^192 for each thread. This might be for
a future improvement.
ha_random64_internal() is now the new PRNG, so that everything else
remains totally transparent. _ha_random64_pair_hashed() continues to
hash the first 128 bits of the state.
A simple config generating 100 UUID on 20 threads jumps from 135k to
1.25M req/s, which translates to a bump from 13.5M to 125M UUID/s,
or 9 times faster. And there is no more DWCAS can be seen anymore
in perf top:
Before: 13.5M/s
Overhead Shared Object Symbol
99.04% haproxy [.] ha_random64_internal
0.66% haproxy [.] _ha_random64_pair_hashed
0.03% libc-2.42.so [.] __printf_buffer
0.02% [kernel] [k] _raw_spin_lock
0.01% libc-2.42.so [.] __strchrnul_avx2
0.01% [kernel] [k] ktime_get
0.01% [kernel] [k] lapic_next_deadline
0.01% haproxy [.] sample_process
0.01% haproxy [.] chunk_printf
0.01% libc-2.42.so [.] __printf_buffer_write
0.01% [kernel] [k] hrtimer_active
0.01% libc-2.42.so [.] __memmove_avx_unaligned_erms
0.01% libc-2.42.so [.] _itoa_word
After: 125M/s
18.84% libc-2.42.so [.] __printf_buffer
9.84% haproxy [.] sample_process
8.33% libc-2.42.so [.] __strchrnul_avx2
6.61% libc-2.42.so [.] __memmove_avx_unaligned_erms
6.06% libc-2.42.so [.] __printf_buffer_write
4.43% haproxy [.] strlcpy2
4.09% libc-2.42.so [.] _itoa_word
2.62% haproxy [.] sess_build_logline_orig
2.12% haproxy [.] _ha_random64_pair_hashed
1.28% haproxy [.] pool_put_to_cache
1.06% haproxy [.] __pool_alloc
1.00% haproxy [.] smp_fetch_uuid
0.93% haproxy [.] lf_text_len
0.82% haproxy [.] ha_generate_uuid_v4
The QUIC retry tokens used to directly return ha_random64(), making the
next tokens easily predictable on low-load systems before the XXH64 call.
Let's now switch to the faster and safer ha_random64_pair_hashed() instead.
Instead of using two consecutive calls to ha_random64(), let's use the
cleaner and safer ha_random64_pair_hashed(). This way the internal
PRNG state will not leak into the emitted headers.
The UUID generation used to emit the internal PRNG state, which allows
to predict previous and next ones, or disclose the internal PRNG state.
While not critical, it may eventually become an issue.
This patch uses the new ha_random64_pair_hashed() function that returns
a pair of u64 that are hashed from the internal PRNG state. It's almost
twice as fast on 20 threads (14.1M UUID/s vs 7.8M/s).
The cluster secret, when SSL is not working, used to involve a mix of
calls to ha_random64() and random() to mask the bits that we didn't want
to see leaked. Let's now simply fall back to ha_random64_pair_hashed()
that does a much better job.
A lot of places call two ha_random64() in a row to generate a 128-bit
random. While it's now safe against linear analysis thanks to the XXH64
call, it's still particularly expensive due to the lock.
Here we introduce a new function ha_random64_pair_hashed(), that feeds
two uint64_t with a hash of the PRNG's internal state, and make it
advance. This will cut in half the number of calls to ha_random64()
and should recover a part of the performance lost in the lock. For
now it's not used.
Consuming randoms in pairs directly exposes the internal PRNG's state
on moderately idle system. It can allow to predict next (or previous)
UUIDs, QUIC retry tokens, and WS keys for example. Let's insert an XXH64
call on the ha_random64() output to avoid this. We expand the boot seed
as the secret at boot, and use now_ns as the seed for each call. The
original ha_random64() function was renamed to ha_random64_internal()
for use cases where it's not a problem to directly use the internal
state.
The performance loss is only measurable when single-threaded. It drops
from 7.32M UUID per second to 7.16M. Above that there is no longer any
difference due to the DWCAS loop which reaches up to 98.5% CPU at 20
threads.
This will need to be backported to stable releases after a period of
observation.
_51d_set_device_offsets() passes ctx.value.ptr directly to
fiftyoneDegreesGetDeviceOffset() which expects a null-terminated string.
Let's copy it through the trash first, to avoid possibly surronding
garbage.
This can be backported to all versions.
The PRNG used by the DNS currently is easily predictable once an
observer can collect a few consecutive IDs from the same thread, since
it's a 32-bit xorshift reduced to 16 bits output. Let's switch it to
ha_random32() instead.
This should be backported, however on older releases the ha_random32()
cost is higher due to the lock involved.
In ssl_sock_switchctx_cbk(), the servername is copied into the trash
and null-terminated, but later in the call to strncpy() it's still used
as-is, so anything that follows it will be copied as well, which is not
really expected. Let's make the servername point to the trash after
sanitizing it, like ssl_sock_switchcbk_wolfSSL_cbk() does.
This can be backported to 2.6 since it was introduced with commit
a996763619 ("BUG/MINOR: ssl: Store client SNI in SSL context in case
of ClientHello error").
After reading the handshake length, which is covered by the previous
4 bytes check, the size was not subtracted before being compared to the
retrieved handshake length, making it possible to accept a handshake
that claims to be 4 bytes larger than it really is. Similarly, a few
lines later, data[34] is accessed without checking that it is present,
because the test is made on the second hs_len, which doesn't guarantee
that the data are there.
This fix adds both tests. It can be backported to all stable versions
as it was introduced in 1.6 with commit bb2acf589f ("MINOR: payload:
add support for tls session ticket ext").
Right now no special case is made of size zero and the parser assumes
that it can read the last two chars, which do not exist in this case.
Let's check for this empty string situation and return zero (empty) as
well.
This should be backported to all versions.
http_7239_extract_nodeport() reads the first byte of the passed string
but the caller doesn't check that it's not empty, which can happen if
passed as 'host="127.0.0.1:"'. In that case the function would read and
return garbage that is present in the buffer after the colon. Let's just
check the remaining length before reading.
This can be backported to 2.8 as it was introduced with commit b2bb9257d2
("MINOR: proxy/http_ext: introduce proxy forwarded option").
7 ACME state handlers iterate over hc->res.hdrs, but they can be called
after an error was detected, and the HTTP client will leave res.hdrs NULL
on connection errors before headers are received. Let's check this inside
the loop, like the chkorder handler already does.
Most of them, if not all, need to be backported to 3.2.
In 1.4, Basic authentication support was added by commit f9423ae43a
("[MINOR] acl: add http_auth and http_auth_group"). Interestingly,
a mistake there consisted in taking the length of the comparison from
the input token, so "b" matches "Basic". It was later propagated to
Bearer in 2.5 with commit f5dd337b12 ("MINOR: http:
Add http_auth_bearer sample fetch"). Let's just compare the entire
tokens.
This may be backported though it is very minor.
A dynamic chunk size is now being allocated for output since commit
dfc4085413 ("MEDIUM: sample: Get chunks with a size dependent on input
data when necessary"). However this one missed the need for the trailing
zero when specifying the size, let's add it.
No backport is needed, this is only in 3.4.
Both boundary checks in the authority record parsing loop of
resolv_validate_dns_response() use >= bufend where they should use
> bufend, causing valid DNS responses with exactly enough bytes to be
rejected as invalid.
The first one, "reader + offset + 10 >= bufend" is too strict since it
prevents 10-byte responses from being accepted as valid while they
are. The second one, "reader + len >= bufend" has the same issue, when
exactly len bytes remain, the check rejects it even though dns_max_name()
already validated it. It may be backported though it is unlikely to ever
be noticed.
The caching RFC (9111, but was present since 2616) indicate that
cache-control supports both the "token" and "token=..." forms and that
consumers are supposed to recognize both. In addition, "private=..." is
explicitly mentioned, so servers could very well emit it. However,
haproxy only recognizes the short form without argument, except for
"no-cache" where it also supports it followed by the beginning of a
set-cookie argument. Thus it could miss "private=" or "no-store=".
Let's refine the checks. Now we explicitly recognize the form
no-cache="set-cookie", and all variants of "token" or "token=" as
identical to disable caching. It will more reliably catch such edge
cases and make sure we never cache a response marked like this.
This should be backported, at least to the latest LTS (3.2), maybe
further after some observation.
When checking for secondary entries, the tree is walked within duplicates
of the primary key, only indexed on the first 32 bits, which means that
in case of hash collision, we could start looking for an object and
switch to another one while visiting secondaries. In order to avoid this
we simply need to always check the full primary hash of the entry that
was found.
This should be backported to all stable versions.
By default, HTTP/1 status codes are not limited in the parser. However,
the value is stored in a 16-bit field, meaning that it may be truncated
if too large. Let's just restrict to 3-digits by default, and permit to
relax the check when accept-unsafe-violations is set, provided that the
value still fits in 16 bits.
This could be backported to latest LTS release.
Originally with "option accept-invalid-http-request", we couldn't really
edit the request on the fly to remove offending headers. But since we
have HTX and the headers are indexed one at a time, it has become
trivial. A non-negligible number of violations are conditioned by the
now renamed "option accept-unsafe-violations-in-http-request", and a
controversial one could definitely be reporting and passing invalid
header names containing control chars or spaces. The option was placed
so as not to block requests/responses containing them, but there's no
point in passing them to the other side. Most of the time it will be
totally harmless since the other side will reject them. But in case
haproxy is placed in front of a non-compliant server, it would fail
to protect it.
This patch implements a name check for all headers when a parsing
error was detected. It's cheap enough (especially since only done
after an error), and will skip the header if its name is invalid.
This may also remove some possibilities of confusion in logs, or
when encoding headers names for example.
This should be backported at least till the latest LTS.
Latest commit 04811943b5 ("MINOR: haterm: enable h3 for TCP bindings")
produces a warning when SSL is not enabled due to the addition of
expose-experimental-directives. Let's condition it to the use of SSL.
Passing a size or anything with suffix "r" is supposed to apply a
random factor form 0 to 1. However due to the replacement of random()
with ha_random64(), all 64 bits are random before the divide, so the
end result is a random 32-bit value. In addition, ha_random64() is
slow since shared between threads.
Let's use statistical_prng() which is designed for this purpose and
is much cheaper. No backport is needed, this is only in 3.4.
In resolv_validate_dns_response(), when matching an additional A/AAAA
record to an SRV record, the code checked tmp_record->ar_item == NULL
then called pool_free(resolv_answer_item_pool, tmp_record->ar_item).
This is a copy-paste mistake from similar patterns elsewhere since
the pointer is confirmed to be NULL a few lines above, so let's just
drop the confusing pool_free.
In resolv_validate_dns_response(), the second DNS record parsing path
manually constructs a 32-bit big-endian TTL value from four individual
bytes using the expression:
reader[0] * 16777216 + reader[1] * 65536 + reader[2] * 256 + reader[3]
We have read_n32() to do this, and it's more robust against unexpected
signedness surprises (which should not happen right here since reader is
unsigned char and we use -fwrapv so the result is defined). Also, let's
make the ttl an uint instead of an int. The TTL is only retrieved and not
used for now, so better clean it now.
In 2.5, commit da0264a96 ("MINOR: sample: Add be2hex converter")
introduced the be2hex() converter, which reads input data of a given
chunk size, processes it as a big endian block and turns it to hex
output.
There's an issue if the configured chunk_size (2nd argument) is larger
than tune.bufsize/2, because the max_size calculation will underflow,
and the later loop will always match since it compares a size_t to an
int (BTW, compilers love to annoy us with useless warnings but I never
found how to see some for these ones). This can result in overflowing
the output trash if the input sample is at least as large as half a
buffer.
Let's add an explicit check for this, and change the max_size type to
size_t so that the comparison is always right. While we're at it, let's
ask the trash buffer to be twice as large, just like bin2hex() does, as
it may result in offering a larger buffer in 3.4. thanks to the large
buffers support.
Despite the risk, this is marked as minor because a config with that
large an argument in the converter makes absolutely no sense.
This should be backported to 2.6. The *2 for the trash allocation will
conflict and have to be dropped in stable versions, which is safe.
When not set, the cluster secret is randomly generated by two
consecutive calls to ha_random64(). However, the random64 PRNG may be
partially observed on a fully idle machine (QUIC retry tokens, UUID,
WS key), and it could be rolled back to the initial call that produced
the secret. This is purely theoretical as a normally loaded system
wouldn't reveal meaningful sequences, but better address this while
it's still easy.
The first here consists in isolating the cluster_secret from the PRNG
sequence. When RAND_bytes() is available and works, it's used. Otherwise
ha_random64() is mixed with uncorrelated bits from random().
This could be backported to stable releases.
In dict_insert(), when ebis_insert() returns an existing node n indicating
that another thread inserted the same key concurrently, the code freed its
own newly-allocated entry and returned the winner without bumping its
refcount. Both callers then held a reference with refcount=1 instead of 2,
so when one expires the other becomes a use-after-free or double-free.
The bug likely comes from the fact that new_dict_entry() creates an entry
with a refcount preset to 1 (saves an atomic op) and that because of this
there is no refcount increment upon a successful insertion in the tree,
resulting in requiring different code paths for collision and normal
insertion.
A simple fix consists in bumping the refcount under the lock and unlocking
only at the end, but this would mean performing two free() calls under a
lock, which we always try to avoid. The code was slightly rearranged so
that we can now bump the existing entry's refcount under the lock in case
of duplicate, or unlock immediately in the common case, so that the free()
call is done out of the lock.
The probably of the race is very low (at peers connection setup only),
reason why it's marked low. This should be backported to all versions.
In parse_log_message(), the first loop looks for '>' that finishes the
priority field, and unfortunately it stops once it has checked the first
byte after the end of the buffer. This means that a priority made only
of digits for the whole buffer would read one extra byte. In practice
since pools have a tag at the end this is only detectable when using ASAN,
but this should be fixed nevertheless.
This can be backported to all versions.
It's worth noting that RFC5424 now says that the PRI field is 1..3
digits only, so maybe at some point we could seriously limit the
length as well.
When the PRIORITY flag is present on a HEADERS frame, the frame must
contain a stream dependency and a weight, for a total of 5 bytes. The
length is checked after reading the stream dep field so theoretically
such a frame could cause up to 4-byte OOB read at the end of the buffer,
though in practice buffers allocated from pools never end on a page
boundary (one extra word at the end) and the anomaly is still detected
after reading the stream ID and the connection aborted with the glitch
count incremented. Thus while not technically correct, practically
speaking it's harmless.
This should be backported to all stable releases.
The previous fix 75f72c2eb ("BUG/MEDIUM: resolvers: Fix test on dn label
size in resolv_dn_label_to_str()") may still leave garbage from the input
buffer into the response: if a component length is passed as zero, it
should mark the end, but instead a dot will be emitted, and whatever
follows it in the input buffer would continue to be appended as extra
components. While having no direct consequences beyond the domain not
being properly decoded, it could at least complicate troubleshooting.
This should be backported where the fix above is backported.
The previous fix 75f72c2eb ("BUG/MEDIUM: resolvers: Fix test on dn label
size in resolv_dn_label_to_str()") can still be fooled by an input exactly
the size of str_len, in which case the trailing zero appended at the end
was not being accounted for. Let's add 1 to the condition to prepare for
it.
This needs to be backported wherever the fix above is backported.