This reverts commit fefce297ab.
The commit broke the resolvers. All responses are marked as invalid. The
resolv_read_name() function can return 0 on error, but it seems also
possible to return 0 when no label name was found. And depending on the
caller, it can be an error... or not.
So, let's revert it. This might trigger a watchdog but doesn't seem to and
once fixed it makes things worse.
Must be backported as far as 2.4.
Initial max_record_size is set to 16382. If the first received record
size is larger, abort xprt_qmux layer immediately without having to wait
for the timeout.
No need to backport.
With QMux, each peer has to first emit a transport parameters frame. If
the received frame is different, xprt_qmux handshake cannot proceed.
This patch removes the BUG_ON() in this case, replacing it with a safer
connection closure.
In the future, a graceful close with CONNECTION_CLOSE frame should be
implemented.
No need to backport.
Remove BUG_ON() when reading a QMux record larger than the buffer. It is
now replaced by a safer error handling. In the future, a proper
CONNECTION_CLOSE emission should be implemented for this case.
No need to backport.
When a policy is set, and the number of threads is calculated
dynamically, make sure we enforce thread-hard-limit, and do not create
thread groups based on how many thread we would have created without
the limit.
This should be backported to 3.3 and 3.2. The patch won't apply cleanly
there, because the code has changed since then, but it should be very
similar, only we'll have to check "cpu_count" there, where in 3.4 we
check "thr_count".
commit 72fd357814 ("MEDIUM: mux-h1: Return an error on h2 upgrade
attempts if not allowed") added an h1_report_glitch() call on the new
405 path but exits via "goto no_parsing", which skips the
session_add_glitch_ctr() call at the end of the parse block. As a
result fc_glitches increments correctly but the per-session stick
counters never see it, breaking sc_glitch_cnt-based rate limiting of
the H2-preface-over-H1 abuse pattern.
No backport needed beyond the branches that took 72fd357814.
[cf: Patch was edited to move the goto label instead of duplicating
the call to session_add_glitch_ctr]
ssl_sock_add_san_ext() builds the Subject Alternative Name extension by
concatenating "DNS:" + servername and passing the result to
X509V3_EXT_nconf_nid(). OpenSSL's nconf parser splits the value string on
commas into multiple type:value SAN entries. The SNI comes from unauthenticated
TLS ClientHello data -- an attacker can embed commas and colons (e.g.,
"host,dns:internal.corp,ip:10.0.0.1") to inject arbitrary GENERAL_NAME entries
into certificates signed by HAProxy's configured CA.
This is a CA issuance-policy violation: the operator expects one certificate
per SNI hostname, but an attacker can obtain certificates containing additional
hostnames/IPs/emails without access to the CA private key.
Fix by adding ssl_sock_sni_is_valid() that validates the SNI contains only
DNS-label-legal characters (alphanumeric, hyphens, dots). The check is
performed at the start of ssl_sock_do_create_cert() before any allocation.
Commas, colons, spaces, and other special characters cause certificate
generation to fail, preventing SAN injection while allowing all valid
hostname values.
Must be backported in every maintained branches.
tcpcheck_ldap_expect_bindrsp() parses ASN.1 BER-encoded LDAP responses from
the health check target. After reading the outer message size and validating
protocol fields, it encounters a long-form BER length for the bindResponse
value (high bit set in the length byte). The code reads nbytes = (*ptr &
0x7f) then advances ptr by 1 + nbytes without checking that enough bytes
remain in the receive buffer. So, it is possible to read more data than
available.
Note that it is only possible if the LDAP response was forged because the
message length was already checked. LDAP response remains quite short and it
is not possible to read outside the buffer area. So at worst, garbage are
parsed and a wrong result is reported by the LDAP health-check. Most
probably an error will be reported.
This patch could be backported to all stable versions.
Released version 3.4-dev14 with the following main changes :
- MINOR: config: shm-stats-file is no longer experimental
- BUILD: proxy: unstatify the proxies_del_lock to avoid a warning without threads
- BUG/MEDIUM: net_helper: fix a remaining possibly infinite loop in converters
- MINOR: ssl_sock: remove unneeded check on QMux flags
- MINOR: connection: define xprt_add_l6hs()
- MINOR: xprt_qmux: define default value for get_alpn
- MINOR: connection: define mask CO_FL_WAIT_XPRT_L6
- MINOR: session: support QMux in clear on FE side
- MINOR: backend: support QMux in clear for BE side
- BUG/MINOR: ocsp: Manage date too far away in the future
- MINOR: mux_quic: handle STOP_SENDING in QMux
- MINOR: mux_quic: handle MAX_STREAMS for uni stream in QMux
- MINOR: mux_quic: do not crash on unhandled QMux frame reception
- BUG/MEDIUM: applet: Properly handle receives of size 0
- BUG/MEDIUM: resolvers: Fix test on dn label size in resolv_dn_label_to_str()
- BUG/MEDIUM: ssl-gencert: Unlock LRU cache if failing to generate certificate
- BUG/MINOR: quic: fix ODCID lookup from derived value
- BUG/MEDIUM: dict: hold lock while decrementing refcount in dict_entry_unref
- BUG/MINOR: tcpchecks: Limit parsing of agent-check reply to the buffer
- BUG/MEDIUM: hlua: Fix integer underflow when receiving line from lua cosocket
- BUG/MEDIUM: cli: Fix parsing of pattern finishing a command payload
- BUG/MEDIUM: acme: NUL terminate response buffer before PEM parsing
- BUILD: intops: mask the fail value in array_size_or_fail()
- BUG/MEDIUM: log-forward: make sure the month is unsigned
- BUG/MEDIUM: regex: allocate a large enough pcre2 match for all matches
- BUG/MEDIUM: tcpcheck/spoe: bound the SPOP error code to valid values
- BUG/MEDIUM: cache: fix a refcount leak for missed secondary entries
- BUG/MINOR: log: free logformat expr on compile failure in cfg_parse_log_profile
- BUG/MINOR: resolvers: fix room for trailing zero in resolv_dn_label_to_str()
- BUG/MINOR: resolvers: fix risk of appending garbage past the domain name
- BUG/MINOR: mux-h2: validate HEADERS frame length before reading stream dep
- BUG/MINOR: log: look for the end of priority before the end of the buffer
- BUG/MINOR: dict: fix refcount race on insert collision
- BUG/MINOR: init: use more than ha_random64() for the cluster secret
- BUG/MINOR: sample: limit the be2hex converter's chunk size
- CLEANUP: resolvers: use read_n32() instead of open-coded big-endian read
- CLEANUP: resolvers: remove pool_free(NULL) in SRV additional record matching
- CLEANUP: resolvers: fix comment typos and wrong filenames in file headers
- BUG/MINOR: haterm: fix the random suffix multiplication
- MINOR: haterm: enable h3 for TCP bindings
- MINOR: haterm: do not emit a warning when not using SSL
- BUG/MEDIUM: h1: drop headers whose names contain invalid chars
- BUG/MEDIUM: h1: limit status codes to 3 digits by default
- BUG/MEDIUM: cache: always verify the primary hash in get_secondary_entry()
- BUG/MINOR: cache: also recognize directives in the form "token="
- BUG/MINOR: resolvers: relax size checks in authority record parsing
- BUG/MINOR: sample: request an extra output byte for the url_dec converter
- BUG/MINOR: http-fetch: check against the whole token in get_http_auth()
- BUG/MEDIUM: acme: protect against risk of null-deref on connection failure
- BUG/MINOR: http-ext: always check remaining data when reading rfc7239 nodeport
- BUG/MINOR: base64: return empty string for empty input in base64dec()
- BUG/MINOR: payload: fix the handshake length bounds check smp_client_hello_parse()
- BUG/MINOR: ssl-hello: make use of the null-terminated servername
- BUG/MINOR: resolvers: switch to a better PRNG for query IDs
- BUG/MINOR: addons/51d: NUL-terminate headers before passing them to Trie API
- BUG/MEDIUM: tools: insert an XXH64 layer on the PRNG output
- MINOR: tools: provide a function to generate a hashed random pair
- MEDIUM: init: fall back to ha_random64_pair_hashed() for the cluster secret
- MEDIUM: tools: use the hashed random pair for UUID generation
- MEDIUM: h1: use ha_random64_pair_hashed() for the WebSocket key
- MEDIUM: quic: use ha_random64_pair_hashed() to generate the QUIC retry tokens
- MEDIUM: tools: switch the main PRNG to a thread-local xoshiro256**
- BUG/MEDIUM: h3: reject client push stream
- BUG/MINOR: h3: reject server push stream
- BUG/MINOR: h3: reject client CANCEL_PUSH frame
- BUG/MINOR: h3: adjust error on PUSH_PROMISE frame reception
- BUG/MINOR: h3: reject server MAX_PUSH_ID frame
- BUG/MEDIUM: auth: fix unconfigured password NULL deref
- BUG/MINOR: h3: add missing break on rcv_buf()
- BUG/MINOR: hlua: prevent Lua from passing CR/LF/NUL in HTTP headers
- BUG/MINOR: qmux: do not crash on frame parsing issue
- BUG/MINOR: quic: reject packet too short for HP decryption
- BUG/MINOR: jwe: enforce GCM tag length to 128 bits
- BUG/MEDIUM: jwe: substitute random CEK on RSA1_5 decryption failure per RFC 7516 #11.5
- BUG/MEDIUM: mux-fcgi: reject stream ID 0 for application records
- MINOR: http: Add function to remove all occurrences of a value in a header
- MINOR: h1: Add a H1M flag to specify a non-empty 'Upgrade:' header was parsed
- BUG/MEDIUM: h1-htx: Sanitize parsing to properly handle upgrade requests
- BUG/MINOR: mux-fcgi: Use relative offset to compute contig data in demux buf
- BUG/MINOR: mux-spop: Use relative offset to compute contig data in demux buf
- CLEANUP: mux-fcgi/mux-spop: Remove copy/pasted comment about slow realign
A comment about the condition to perform a slow realign of the demux buffer
was abusively copy/pasted from the FCGI multiplexer at different places in
the FCGI and SPOP multiplexers. Let's remove these comments.
b_contig_data() should be called with a head-relative offset (0 for the
beginning of readable data). However, in the SPOP multiplexer, to get
contiguous data available in the demux buffer, it is called with
b_head_ofs(dbuf) which returns an absolute buffer position (b->head). So
b->head is counted twice. Because of this bug, the demux buffer could be
realigned while it should not and conversely.
Instead, the offset 0 must be used. So let's fix it.
This patch must be backported as far as 3.2.
b_contig_data() should be called with a head-relative offset (0 for the
beginning of readable data). However, in the FCGI multiplexer, to get
contiguous data available in the demux buffer, it is called with
b_head_ofs(dbuf) which returns an absolute buffer position (b->head). So
b->head is counted twice. Because of this bug, the demux buffer could be
realigned while it should not and conversely.
Instead, the offset 0 must be used. So let's fix it.
This patch must be backported as far as 2.4.
Thanks to previous patches, the request messages are now sanitized to
properly handle Upgrade requests. Now, if a 'connection: upgrade' header
value was found while no 'Upgrade' header, the 'upgrade' values is removed
from the 'connection' header. Conversely the opposite is also performed. If
'Upgrade' header was found, but no "conneciotn: upgrade" header value, all
occurrences of 'Upgrade' header are refused.
This patch depends on following ones:
* MINOR: h1: Add a H1M flag to specify a non-empty 'Upgrade:' header was parsed
* MINOR: http: Add function to remove all occurrences of a value in a header
It should fix the issue 3397. But the H2 part should be reviewed too, and
probably the H1 response parsing, to be consistent with this change.
The series should be backported as far as 2.4.
http_remove_header_value() function was added to parse a header value and
remove all occurrences of a specific value.
This patch is mandatory to fix a bug.
Records with a stream ID set to 0 are reserved to management records.
However there was no check to trigger an error if an application record is
received with a stream ID to 0. This could lead to crash becausqe management
streams (which are static and immutable) can be modified while processing
application records (STDOUT/STDERR/END_REQUEST).
To fix the issue, An error is returned if the stream ID 0 is set on
GET_VALUES_RESULT or UNKNOWN_TYPE records.
This patch must be backported to all stable versions.
do_decrypt_cek_rsa() calls EVP_PKEY_decrypt with RSA_PKCS1_PADDING for
RSA1_5 and returns failure (goto end) on decrypt error. This creates a
measurable timing difference between "padding invalid" (fast exit before
content decryption) and "padding valid + AEAD tag fail" (full AES-GCM/CBC
decryption path), exposing the RSA private key to a Bleichenbacher-style
adaptive attack requiring ~10^4-10^6 queries.
Fix: On RSA_PKCS1_PADDING failure, fill decrypted_cek with random bytes
of the buffer size and return success (retval=0). This forces execution
into decrypt_ciphertext() regardless of padding validity, so the attacker
cannot distinguish valid from invalid padding via timing. The AEAD tag
check in decrypt_ciphertext() will still reject the wrong CEK, but the
timing profile is identical for both branches.
RSA-OAEP variants are not affected (mathematically infeasible to craft
valid ciphertext without the private key).
Introduced by RSA1_5 path lacking constant-time fallback.
Two fixes addressing cryptographic and parsing correctness issues:
1. Enforce 16-byte GCM authentication tag in decrypt_ciphertext()
The base64url-decoded 5th JWE component (authentication tag) was passed
directly to EVP_CTRL_AEAD_SET_TAG with its attacker-controlled length.
OpenSSL accepts 1-16 byte GCM tags and only verifies that many bytes, so
a 1-byte tag reduces forgery work factor to ~256. RFC 7518 mandates 128-bit
(16 byte) tags for A*GCM. The CBC-HMAC path already enforced correct length,
confirming this was an oversight.
Fix: Add (*aead_tag)->data != 16 check before the GCM branch in
decrypt_ciphertext(), rejecting any non-16-byte tag.
Introduced by 416b87d5db (JWE A*GCM support).
2. Enforce 16-byte GCMKW tag in parse_jose() decode_jose_field()
The $.tag field from the attacker-supplied protected header in A*GCMKW
key-wrap was similarly decoded without length enforcement. Fix: Add a
size != 16 check for fields named ".tag" in decode_jose_field() when
called from the GCMKW path.
Introduced by 026652a7eb (GCMKW tag field parsing).
Header protection can only be performed on a packet of a minimal size.
There was already a check for this in qc_do_rm_hp() but it did not use
the correct value.
Fix this by using the correct minimal size which is 20 bytes starting
from the packet number offset. This is enough to decrypt 4 bytes (PN max
size) and 16 bytes of IV. If the packet is not big enough, it is
still silently discarded.
This must be backported up to 2.6.
Ensure frame parsing error does not cause a crash by removing the
associated BUG_ON()/ABORT_NOW().
For now, connection is flagged on error, which ensures that any
send/receive future operations are prevented and connection is closed
asap. In the future, a proper CONNECTION_CLOSE will be required as
defined by QMux protocol.
No need to backport.
hlua_http_add_hdr() passes Lua string values directly to htx_add_header()
without validation. This can be an issue for user-controlled data, but as
well when relying on poorly written scripts. This patch makes sure that
neither the name nor the value may contain any of these forbidden chars.
This should be backported to all versions since the issue has been there
since at least 2.4.
The following patch ensures server MAX_PUSH_ID are rejected as a client.
This has been implemented by extending h3_rcv_buf().
e4a5a64198
BUG/MINOR: h3: reject server MAX_PUSH_ID frame
Case label for MAX_PUSH_ID has been moved in the function, however the
break instruction was removed by error. Fix this by adding the missing
break statement.
This must be backported to every version the above fix is. Currently, it
is scheduled to 3.3.
Fix a case of dereference NULL pointer when trying to use an user from
an userlist which does not have a password configured.
The check_user() function tries to do an strcmp of the password, howver
u->pass is NULL and the strcmp would crash when trying.
Must be backported in every stable branches.
Previously, MAX_PUSH_ID frames were silently ignored both on client and
server sides. However, such frame cannot be emitted by the server.
This patch fixes this by properly issuing connection error
FRAME_UNEXPECTED when receiving a MAX_PUSH_ID frame as a client. This is
implemented by extending h3_check_frame_valid().
This must be backported up to 3.3.
HTTP/3 PUSH_PROMISE frames are systematically rejected with H3 error
FRAME_UNEXPECTED. This is adapted on the server side as a client can
never emit them.
This patch adapts error reporting when haproxy runs as a client. In this
case, server is still forbidden to emit any PUSH_PROMISE as MAX_PUSH_ID
frames are never emitted. In this case, ID_ERROR must be used as an
error code.
This must be backported up to 3.3.
CANCEL_PUSH frames are silently ignored on both client and server sides.
However, as push support is not implemented by haproxy, clients are thus
forbidden to emit any of those frames.
Fix this by closing the connection with ID_ERROR when receiving a client
CANCEL_PUSH as a server. On client side, the frame is still silently
discarded.
This must be backported up to 2.6.
Push streams are not supported by haproxy as a client. Thus, it never
emits any MAX_PUSH_ID frame. In this case, the server is not allowed to
initiate any push stream.
This patch ensures that such stream is closed with error H3_ID_ERROR, as
specified by HTTP/3 RFC.
This must be backported up to 3.3.
HTTP/3 push streams can only be opened by a server instance. The
specification mandates that the connection must be closed if a server
receives a client-initiated push stream.
This patch should ensure that it is not possible to exploit
unidirectional streams for an unexpected usage.
This must be backported up to 2.6.
The current PRNG is xoroshiro128**, it was introduced in 2.2 with
commit 52bf83939 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG"). It features a 2^128 sequence and can perform
2^64 or 2^96 jumps, though only the 2^96 jump is implemented. It
was initially designed to support both processes and threads, and
implements a shared state between threads instead of allocating
distinct sequences based on PID and thread numbers.
Since then, the PRNG's usage grew and processes have disappeared,
but the lock or the DWCAS are still there due to its shared nature,
and it's possible to trigger watchdog warnings by issuing 100 UUIDs
in a single log-format string.
Also, UUID and QUIC retry tokens now consume 128 bits from the PRNG
in two 64-bit calls, and used to weaken the PRNG by rapidly disclosing
its internal state on reasonably idle systems. This indicates that
most of the time we now need 128 bits.
This patch modernizes the internal generator by switching to xoshiro256**,
which has comparable properties (it's even faster), and features even
longer 2^256 periods, still returning 64 bits per call. It can be
initialized with 2^128 and 2^192 jumps. More details here:
https://prng.di.unimi.it/https://prng.di.unimi.it/xoshiro256starstar.c
Here we implement a thread-local state instead of the old shared one,
so there is no more need for synchronization. The state is seeded at
boot, and each thread performs as many 2^192 jumps as their TID is
large. The master process performs a 2^128 jump where it used to
perform a 2^96 jump so that it doesn't overlap with any worker thread.
However a cleaner approach could be to perform a 2^128 jump for each
fork() (here the worker) and 2^192 for each thread. This might be for
a future improvement.
ha_random64_internal() is now the new PRNG, so that everything else
remains totally transparent. _ha_random64_pair_hashed() continues to
hash the first 128 bits of the state.
A simple config generating 100 UUID on 20 threads jumps from 135k to
1.25M req/s, which translates to a bump from 13.5M to 125M UUID/s,
or 9 times faster. And there is no more DWCAS can be seen anymore
in perf top:
Before: 13.5M/s
Overhead Shared Object Symbol
99.04% haproxy [.] ha_random64_internal
0.66% haproxy [.] _ha_random64_pair_hashed
0.03% libc-2.42.so [.] __printf_buffer
0.02% [kernel] [k] _raw_spin_lock
0.01% libc-2.42.so [.] __strchrnul_avx2
0.01% [kernel] [k] ktime_get
0.01% [kernel] [k] lapic_next_deadline
0.01% haproxy [.] sample_process
0.01% haproxy [.] chunk_printf
0.01% libc-2.42.so [.] __printf_buffer_write
0.01% [kernel] [k] hrtimer_active
0.01% libc-2.42.so [.] __memmove_avx_unaligned_erms
0.01% libc-2.42.so [.] _itoa_word
After: 125M/s
18.84% libc-2.42.so [.] __printf_buffer
9.84% haproxy [.] sample_process
8.33% libc-2.42.so [.] __strchrnul_avx2
6.61% libc-2.42.so [.] __memmove_avx_unaligned_erms
6.06% libc-2.42.so [.] __printf_buffer_write
4.43% haproxy [.] strlcpy2
4.09% libc-2.42.so [.] _itoa_word
2.62% haproxy [.] sess_build_logline_orig
2.12% haproxy [.] _ha_random64_pair_hashed
1.28% haproxy [.] pool_put_to_cache
1.06% haproxy [.] __pool_alloc
1.00% haproxy [.] smp_fetch_uuid
0.93% haproxy [.] lf_text_len
0.82% haproxy [.] ha_generate_uuid_v4
The QUIC retry tokens used to directly return ha_random64(), making the
next tokens easily predictable on low-load systems before the XXH64 call.
Let's now switch to the faster and safer ha_random64_pair_hashed() instead.
Instead of using two consecutive calls to ha_random64(), let's use the
cleaner and safer ha_random64_pair_hashed(). This way the internal
PRNG state will not leak into the emitted headers.
The UUID generation used to emit the internal PRNG state, which allows
to predict previous and next ones, or disclose the internal PRNG state.
While not critical, it may eventually become an issue.
This patch uses the new ha_random64_pair_hashed() function that returns
a pair of u64 that are hashed from the internal PRNG state. It's almost
twice as fast on 20 threads (14.1M UUID/s vs 7.8M/s).
The cluster secret, when SSL is not working, used to involve a mix of
calls to ha_random64() and random() to mask the bits that we didn't want
to see leaked. Let's now simply fall back to ha_random64_pair_hashed()
that does a much better job.
A lot of places call two ha_random64() in a row to generate a 128-bit
random. While it's now safe against linear analysis thanks to the XXH64
call, it's still particularly expensive due to the lock.
Here we introduce a new function ha_random64_pair_hashed(), that feeds
two uint64_t with a hash of the PRNG's internal state, and make it
advance. This will cut in half the number of calls to ha_random64()
and should recover a part of the performance lost in the lock. For
now it's not used.
Consuming randoms in pairs directly exposes the internal PRNG's state
on moderately idle system. It can allow to predict next (or previous)
UUIDs, QUIC retry tokens, and WS keys for example. Let's insert an XXH64
call on the ha_random64() output to avoid this. We expand the boot seed
as the secret at boot, and use now_ns as the seed for each call. The
original ha_random64() function was renamed to ha_random64_internal()
for use cases where it's not a problem to directly use the internal
state.
The performance loss is only measurable when single-threaded. It drops
from 7.32M UUID per second to 7.16M. Above that there is no longer any
difference due to the DWCAS loop which reaches up to 98.5% CPU at 20
threads.
This will need to be backported to stable releases after a period of
observation.
_51d_set_device_offsets() passes ctx.value.ptr directly to
fiftyoneDegreesGetDeviceOffset() which expects a null-terminated string.
Let's copy it through the trash first, to avoid possibly surronding
garbage.
This can be backported to all versions.
The PRNG used by the DNS currently is easily predictable once an
observer can collect a few consecutive IDs from the same thread, since
it's a 32-bit xorshift reduced to 16 bits output. Let's switch it to
ha_random32() instead.
This should be backported, however on older releases the ha_random32()
cost is higher due to the lock involved.
In ssl_sock_switchctx_cbk(), the servername is copied into the trash
and null-terminated, but later in the call to strncpy() it's still used
as-is, so anything that follows it will be copied as well, which is not
really expected. Let's make the servername point to the trash after
sanitizing it, like ssl_sock_switchcbk_wolfSSL_cbk() does.
This can be backported to 2.6 since it was introduced with commit
a996763619 ("BUG/MINOR: ssl: Store client SNI in SSL context in case
of ClientHello error").
After reading the handshake length, which is covered by the previous
4 bytes check, the size was not subtracted before being compared to the
retrieved handshake length, making it possible to accept a handshake
that claims to be 4 bytes larger than it really is. Similarly, a few
lines later, data[34] is accessed without checking that it is present,
because the test is made on the second hs_len, which doesn't guarantee
that the data are there.
This fix adds both tests. It can be backported to all stable versions
as it was introduced in 1.6 with commit bb2acf589f ("MINOR: payload:
add support for tls session ticket ext").
Right now no special case is made of size zero and the parser assumes
that it can read the last two chars, which do not exist in this case.
Let's check for this empty string situation and return zero (empty) as
well.
This should be backported to all versions.
http_7239_extract_nodeport() reads the first byte of the passed string
but the caller doesn't check that it's not empty, which can happen if
passed as 'host="127.0.0.1:"'. In that case the function would read and
return garbage that is present in the buffer after the colon. Let's just
check the remaining length before reading.
This can be backported to 2.8 as it was introduced with commit b2bb9257d2
("MINOR: proxy/http_ext: introduce proxy forwarded option").
7 ACME state handlers iterate over hc->res.hdrs, but they can be called
after an error was detected, and the HTTP client will leave res.hdrs NULL
on connection errors before headers are received. Let's check this inside
the loop, like the chkorder handler already does.
Most of them, if not all, need to be backported to 3.2.
In 1.4, Basic authentication support was added by commit f9423ae43a
("[MINOR] acl: add http_auth and http_auth_group"). Interestingly,
a mistake there consisted in taking the length of the comparison from
the input token, so "b" matches "Basic". It was later propagated to
Bearer in 2.5 with commit f5dd337b12 ("MINOR: http:
Add http_auth_bearer sample fetch"). Let's just compare the entire
tokens.
This may be backported though it is very minor.
A dynamic chunk size is now being allocated for output since commit
dfc4085413 ("MEDIUM: sample: Get chunks with a size dependent on input
data when necessary"). However this one missed the need for the trailing
zero when specifying the size, let's add it.
No backport is needed, this is only in 3.4.
Both boundary checks in the authority record parsing loop of
resolv_validate_dns_response() use >= bufend where they should use
> bufend, causing valid DNS responses with exactly enough bytes to be
rejected as invalid.
The first one, "reader + offset + 10 >= bufend" is too strict since it
prevents 10-byte responses from being accepted as valid while they
are. The second one, "reader + len >= bufend" has the same issue, when
exactly len bytes remain, the check rejects it even though dns_max_name()
already validated it. It may be backported though it is unlikely to ever
be noticed.
The caching RFC (9111, but was present since 2616) indicate that
cache-control supports both the "token" and "token=..." forms and that
consumers are supposed to recognize both. In addition, "private=..." is
explicitly mentioned, so servers could very well emit it. However,
haproxy only recognizes the short form without argument, except for
"no-cache" where it also supports it followed by the beginning of a
set-cookie argument. Thus it could miss "private=" or "no-store=".
Let's refine the checks. Now we explicitly recognize the form
no-cache="set-cookie", and all variants of "token" or "token=" as
identical to disable caching. It will more reliably catch such edge
cases and make sure we never cache a response marked like this.
This should be backported, at least to the latest LTS (3.2), maybe
further after some observation.
When checking for secondary entries, the tree is walked within duplicates
of the primary key, only indexed on the first 32 bits, which means that
in case of hash collision, we could start looking for an object and
switch to another one while visiting secondaries. In order to avoid this
we simply need to always check the full primary hash of the entry that
was found.
This should be backported to all stable versions.
By default, HTTP/1 status codes are not limited in the parser. However,
the value is stored in a 16-bit field, meaning that it may be truncated
if too large. Let's just restrict to 3-digits by default, and permit to
relax the check when accept-unsafe-violations is set, provided that the
value still fits in 16 bits.
This could be backported to latest LTS release.
Originally with "option accept-invalid-http-request", we couldn't really
edit the request on the fly to remove offending headers. But since we
have HTX and the headers are indexed one at a time, it has become
trivial. A non-negligible number of violations are conditioned by the
now renamed "option accept-unsafe-violations-in-http-request", and a
controversial one could definitely be reporting and passing invalid
header names containing control chars or spaces. The option was placed
so as not to block requests/responses containing them, but there's no
point in passing them to the other side. Most of the time it will be
totally harmless since the other side will reject them. But in case
haproxy is placed in front of a non-compliant server, it would fail
to protect it.
This patch implements a name check for all headers when a parsing
error was detected. It's cheap enough (especially since only done
after an error), and will skip the header if its name is invalid.
This may also remove some possibilities of confusion in logs, or
when encoding headers names for example.
This should be backported at least till the latest LTS.
Latest commit 04811943b5 ("MINOR: haterm: enable h3 for TCP bindings")
produces a warning when SSL is not enabled due to the addition of
expose-experimental-directives. Let's condition it to the use of SSL.
Passing a size or anything with suffix "r" is supposed to apply a
random factor form 0 to 1. However due to the replacement of random()
with ha_random64(), all 64 bits are random before the divide, so the
end result is a random 32-bit value. In addition, ha_random64() is
slow since shared between threads.
Let's use statistical_prng() which is designed for this purpose and
is much cheaper. No backport is needed, this is only in 3.4.
In resolv_validate_dns_response(), when matching an additional A/AAAA
record to an SRV record, the code checked tmp_record->ar_item == NULL
then called pool_free(resolv_answer_item_pool, tmp_record->ar_item).
This is a copy-paste mistake from similar patterns elsewhere since
the pointer is confirmed to be NULL a few lines above, so let's just
drop the confusing pool_free.
In resolv_validate_dns_response(), the second DNS record parsing path
manually constructs a 32-bit big-endian TTL value from four individual
bytes using the expression:
reader[0] * 16777216 + reader[1] * 65536 + reader[2] * 256 + reader[3]
We have read_n32() to do this, and it's more robust against unexpected
signedness surprises (which should not happen right here since reader is
unsigned char and we use -fwrapv so the result is defined). Also, let's
make the ttl an uint instead of an int. The TTL is only retrieved and not
used for now, so better clean it now.
In 2.5, commit da0264a96 ("MINOR: sample: Add be2hex converter")
introduced the be2hex() converter, which reads input data of a given
chunk size, processes it as a big endian block and turns it to hex
output.
There's an issue if the configured chunk_size (2nd argument) is larger
than tune.bufsize/2, because the max_size calculation will underflow,
and the later loop will always match since it compares a size_t to an
int (BTW, compilers love to annoy us with useless warnings but I never
found how to see some for these ones). This can result in overflowing
the output trash if the input sample is at least as large as half a
buffer.
Let's add an explicit check for this, and change the max_size type to
size_t so that the comparison is always right. While we're at it, let's
ask the trash buffer to be twice as large, just like bin2hex() does, as
it may result in offering a larger buffer in 3.4. thanks to the large
buffers support.
Despite the risk, this is marked as minor because a config with that
large an argument in the converter makes absolutely no sense.
This should be backported to 2.6. The *2 for the trash allocation will
conflict and have to be dropped in stable versions, which is safe.
When not set, the cluster secret is randomly generated by two
consecutive calls to ha_random64(). However, the random64 PRNG may be
partially observed on a fully idle machine (QUIC retry tokens, UUID,
WS key), and it could be rolled back to the initial call that produced
the secret. This is purely theoretical as a normally loaded system
wouldn't reveal meaningful sequences, but better address this while
it's still easy.
The first here consists in isolating the cluster_secret from the PRNG
sequence. When RAND_bytes() is available and works, it's used. Otherwise
ha_random64() is mixed with uncorrelated bits from random().
This could be backported to stable releases.
In dict_insert(), when ebis_insert() returns an existing node n indicating
that another thread inserted the same key concurrently, the code freed its
own newly-allocated entry and returned the winner without bumping its
refcount. Both callers then held a reference with refcount=1 instead of 2,
so when one expires the other becomes a use-after-free or double-free.
The bug likely comes from the fact that new_dict_entry() creates an entry
with a refcount preset to 1 (saves an atomic op) and that because of this
there is no refcount increment upon a successful insertion in the tree,
resulting in requiring different code paths for collision and normal
insertion.
A simple fix consists in bumping the refcount under the lock and unlocking
only at the end, but this would mean performing two free() calls under a
lock, which we always try to avoid. The code was slightly rearranged so
that we can now bump the existing entry's refcount under the lock in case
of duplicate, or unlock immediately in the common case, so that the free()
call is done out of the lock.
The probably of the race is very low (at peers connection setup only),
reason why it's marked low. This should be backported to all versions.
In parse_log_message(), the first loop looks for '>' that finishes the
priority field, and unfortunately it stops once it has checked the first
byte after the end of the buffer. This means that a priority made only
of digits for the whole buffer would read one extra byte. In practice
since pools have a tag at the end this is only detectable when using ASAN,
but this should be fixed nevertheless.
This can be backported to all versions.
It's worth noting that RFC5424 now says that the PRI field is 1..3
digits only, so maybe at some point we could seriously limit the
length as well.
When the PRIORITY flag is present on a HEADERS frame, the frame must
contain a stream dependency and a weight, for a total of 5 bytes. The
length is checked after reading the stream dep field so theoretically
such a frame could cause up to 4-byte OOB read at the end of the buffer,
though in practice buffers allocated from pools never end on a page
boundary (one extra word at the end) and the anomaly is still detected
after reading the stream ID and the connection aborted with the glitch
count incremented. Thus while not technically correct, practically
speaking it's harmless.
This should be backported to all stable releases.
The previous fix 75f72c2eb ("BUG/MEDIUM: resolvers: Fix test on dn label
size in resolv_dn_label_to_str()") may still leave garbage from the input
buffer into the response: if a component length is passed as zero, it
should mark the end, but instead a dot will be emitted, and whatever
follows it in the input buffer would continue to be appended as extra
components. While having no direct consequences beyond the domain not
being properly decoded, it could at least complicate troubleshooting.
This should be backported where the fix above is backported.
The previous fix 75f72c2eb ("BUG/MEDIUM: resolvers: Fix test on dn label
size in resolv_dn_label_to_str()") can still be fooled by an input exactly
the size of str_len, in which case the trailing zero appended at the end
was not being accounted for. Let's add 1 to the condition to prepare for
it.
This needs to be backported wherever the fix above is backported.
When lf_expr_compile() fails in cfg_parse_log_profile, the code leaves
without freeing the previously strdup()'d strings in target_lf->str and
target_lf->conf.file. Let's add a call to lf_expr_deinit() there to
release it.
It was harmless anyway since the startup will abort when this happens,
but better clean it because with increasingly dynamic setups, one day
it could become a runtime leak.
No backport is needed.
When a primary cache hit has a Vary secondary_key_signature, the code calls
retain_entry() and shctx_row_detach() before performing the secondary lookup.
If get_secondary_entry() returns NULL (no stored variant matches), res is set
to NULL and the function falls through to return ACT_RET_CONT without calling
release_entry() or shctx_row_reattach(). Each such request leaks one refcount
and pins one shctx row permanently, eventually exhausting the cache if this
happens to all objects. This is visible when requesting a secondary key
covered by vary for an object that is already stored without that key.
"show cache" then shows the object's refcount increasing after each request.
In order to fix this we must do like when no secondary key could be built
and release everything. We only reattach to the row if we previously
detached.
The issue was introduced in 2.4 with commit 1785f3dd9 ("MEDIUM: cache: Add
the Vary header support"). The code changed a bit in 2.9 with commit
48f81ec09 ("MAJOR: cache: Delay cache entry delete in reserve_hot function"),
so in order to backport to 2.8 and older, the patch will have to be manually
applied (no test on detached).
tcpcheck_spop_expect_hello() stores the SPOA agent-supplied status-code
varint directly into check->code (signed short) without range validation.
The code is later used as an index into spop_err_reasons[100]. Let's
just replace invalid status codes with SPOP_ERR_UNKNOWN to avoid any
problem.
The SPOP tcp-check was introduced in 3.1 so this fix must be backported
to 3.2.
In 3.3 with commit fda6dc959 ("MINOR: regex: use a thread-local match
pointer for pcre2") we got a thread-local match that saves us from having
to allocate a match array with each match. However something was clearly
overlooked or misunderstood in the pcre2 API because the local match
array was initialized via pcre2_match_data_create() for MAX_MATCH-1
entries instead of MAX_MATCH, despite the commit message mentioning
MAX_MATCH entries. It was possibly confused with an index. Due to this
there is a risk of crash when matching more than 9 groups in a regex.
This fix must be backported to 3.3.
In 2.3, in preparation for log forwarding, commit 546488559 ("MEDIUM:
log/sink: re-work and merge of build message API.") extended the log
send API to be able to use metadata from an existing header. However
the month number is parsed from the passed meta-data and compared
against 11 but there's no check for negative values which could in
theory cause a negative monthname[] index.
It can be a problem when the date is received as RFC5424 and forced
to RFC3164 because certain characters in the month field could result
in a negative month value. Let's fix it by turning the month to unsigned
to make sure we only accept months 0..11.
This should be backported to all branches.
Cross-compilation on m68k fails in ssl_sock_resize_passphrase_cache()
where the compiler noticed the SIZE_MAX passed to realloc() in the
error path and complained that it's larger than PTRDIFF_MAX. This can
be disabled with -Walloc-size-larger-than=SIZE_MAX but in practice we
can simply hide the value and keep the warning to detect real failures
elsewhere. Let's pass it through DISGUISE() and also take this
opportunity for doing that inside an unlikely() clause since it's never
supposed to happen.
acme_res_certificate() passes the httpclient response buffer to
ssl_sock_load_pem_into_ckch(), which will then call BIO_new_mem_buf(buf, -1).
The "-1" flag will make the OpenSSL PEM parser determine the length by
using strlen(). However, the httpclient populates the response buffer with
__b_putblk() without writing a trailing NUL to it. The byte at area[data]
is whatever data previously resided there in the memory pool.
Thus, a malicious or compromised ACME CA can perform an arbitrary-length
out-of-bounds read until hitting the first NULL byte past the response
body. The OpenSSL PEM loader will try to iterate to load the chain
certificates, thus the PEM-looking garbage found in freed memory chunks
can be erroneously loaded as additional intermediate certificates. The
presence of a single NUL inside the valid response body will result in
silent truncation of the certificate.
Make sure that the area[data] contains a terminating NULL before passing
the buffer to the parser. Fail on insufficient room for the NUL terminator.
No backport required: The ACME client has been added in 3.x and this
code path didn't exist in 2.x.
When the dedidacted buffer to store the command payload was added (c5ae0da62
"MEDIUM: cli: Make a buffer for the command payload"), an bug was
introduced. When the pattern finishing the command payload is found, it is
removed from the buffer. A NULL-bytes is added before it, skipping the
previous newline character.
It worked well in all cases before the commit above, because the commandline
was already parsed and was placed at the beginning of the cmdline buffer.
So, there is always a line before the payload.
Now, the payload is stored in a dedicated buffer. So there is nothing
preceeding it in a buffer. If the payload is empty, we cannot rewind to the
previous line to set the NULL-byte character. We must handle this case to
avoid integer underflow on the payload buffer length.
It is a 3.4-specific bug. No backport needed.
In hlua_socket_receive_yield(), when we try to get a line, the trailing CRLF is
stripped by decrementing the block length. The '\n' is first skipped, then,
possible a preceeding '\r'. But the block lenght is never checked. If an empty
line is returned, this leads to an integer underflow and most probably to a
crash because this length is used to copy data into a LUA string.
To fix the issue, the block length is now properly tested against 0 before
decrementing it.
This patch must be backported to all stable versions.
When parsing the agent-check reply, we first loop on the response to find
the newline character, to add a NULL-byte at the end of the line. However,
this loop is not bounded to the data available in the buffer. So it is
possible to read bytes outside the buffer and eventually write a NULL-byte
ouside the buffer.
So let's check for the end of the buffer when looping on the agent-check
reply.
This patch must be backported to all stable versions.
In dict_entry_unref(), the write lock on d->rwlock was only acquired after
decrementing the refcount. However, between the decrement and the lock,
another thread could increment it by calling dict_insert(). That could lead
to a UAF.
To fix the issue, the call to HA_ATOMIC_SUB_FETCH is moved inside the write
lock.
This patch must be backported to all stable versions.
In haproxy, when an Initial packet is received, a new connection may be
created and a DCID must be attributed. This CID is derived from the
original DCID used by the client in its first packet. This is an
optimization to avoid storing two CIDs values in the CID tree.
On CID lookup, if the DCID used is not found, derivation is performed
again. This should permit to retrieve the DCID node. However, this
operation is not performed as expected in quic_get_cid_tid(), as the
wrong value is used on the second lookup. Fix this function by using
derive CID for it. Note that retrieve_qc_conn_from_cid() performs the
same lookup but the bug was not present there.
The impact of this bug is relatively low as most clients send a single
Initial packet. Even in case of multiple packets in a single datagram,
this does not cause any issue as the current thread is assigned as
default.
This should be backported up to 2.8.
In ssl_sock_generate_certificate(), if the LRU cache for generated
certificates is used, the LRU tree is not unlocked on cache miss if the
certificate generation failed. So let's unlock it on error path.
The bug was introduced by the commit fbc98ebcd ("BUG/MEDIUM: ssl: fix error
path on generate-certificates"). So this patch must be backported with the
commit above, so to all stable versions.
In resolv_dn_label_to_str(), size for a dn label was stored into an integer
from a signed char without a cast to unsigned. So dn label with a size of
128 bytes or more become negative, skipping this way the copy loop and
desynchronizing input vs output.
In addition, the size of the destination string was only checked at the
begining, against the dn string length. But it must also be checked for
every dn label, to be sure. The dn string can be forged to copied more bytes
than expected.
This patch must be backported to all stable versions.
when appctx_rcv_buf() function was called to get data from the applet, but
to get zero bytes, nothing was performed and the function early
returned. However, we must at least take care to set SE_FL_WANT_ROOM if
necessary. Otherwise, if data are still blocked in the applet's output
buffer while the EOI/EOS are pending, the information can be reported to the
upper layer and remaining data can be lost.
Indeed, in such case, SE_FL_WANT_ROOM flag is here to specify the applet has
more data to deliver. Thanks to this flag, the stream will wait before
closing. But when appctx_rcv_buf() function is called, this flag is removed by
the stconn. It is the function responsibility to set it again when necessary.
This patch should fix second part of the issue #3366. It must be backported
to 3.0.
Completes qmux_parse_frm() to ensure every frames allowed by QMux
protocol are listed. For now, nothing is implemented except a CHECK_IF()
to report such events.
This is necessary to prevent a crash on abort. Frames not supported by
QMux should already have been rejected prior via qmux_is_frm_valid().
Handle reception of a MAX_STREAMS frame for unidirectional stream usage
when using QMux. This simply consists in using qcc_recv_max_streams() as
with QUIC protocol.
Ensure reception of STOP_SENDING via QMux protocol is properly handled.
This simply consists in using qcc_recv_stop_sending() which will update
the associated QCS if found.
The check on the OCSP response expire time is based on the "Next Update"
field of the response, converted by my_timegm function that returns a
time_t (signed long). It is then stored in the 'expire' field of the
certificate_ocsp structure which is typed as a signed long.
When loading an OCSP response, if the "Next Update" time is too far in
the future and we are running on a 32 bits machine, we might end up with
negative times ireturned by my_timegm, which make the comparison with
the current date fail and raises the "OCSP single response: no longer
valid." error message.
This problem typically happens in the ocsp_auto_update.vtc regtest since
the loaded OCSP response have a "Next Update" field in 2050.
This patch simply changes the type of the expire field to an unsigned
long since the 'my_timegm' function does not return '-1' in case of
error, contrary to the standard 'timegm' one.
Ths patch can be backported to all stable branches.
Use xprt_add_l6hs() at the end of connect_server() if selected MUX layer
relies on a temporary handshake prior to its initialization. This
functions is noop is SSL layer is active.
This change is necessary to support clear QMux on the backend side.
Recently defined <init_xprt> from mux_proto_list is used to render the
code as generic as possible.
Activates xprt_qmux layer if necessary via session_accept_fd(). This is
necessary to be able to support QMux in clear. This operation is noop if
SSL is active, as in this case xprt_qmux will be activated after the SSL
handshake completion.
To ensure MUX init is delayed when running with clear QMux, mask
CO_FL_WAIT_XPRT_L6 is added to test if the embryonic task must be
started instead.
Define a new connection flag mask CO_FL_WAIT_XPRT_L6. This will be used
to indicate that a XPRT layer is running on top of layer 6. For now,
only xprt_qmux implements this method of operation.
Extend get_alpn() for xprt_qmux layer. If lower layer does not implement
ALPN negotiation, return a statically default protocol value. Currently
this is set to "h3".
This change is required to support QMux in clear without SSL. In the
future, it could be useful to configure the default protocol, for
example by extending the syntax for the "proto" keyword.
When QMux protocol is used, xprt_qmux layer is setup after SSL handshake
completion but prior to the MUX initialization. Once transport
parameters exchange is successful, the layer is removed and the MUX is
started.
The layer setup operation was performed directly on ssl_sock_io_cb().
Simplify the code by extracting it in a dedicated function
xprt_add_l6hs(). The function is generic so the requested XPRT layer
must be passed as argument.
The code is mostly identical. One difference is that a check is
performed to ensure no SSL handshake is pending. If this is the case,
the function is a noop. This will become useful to support QMux
transparently both in clear or on top of SSL.
Another minor addition is that CO_FL_XPRT_READY flag is automatically
resetted by xprt_add_l6hs(). This allows the code to use
conn_xprt_start() standard function after XPRT init.
A recent patch has introduced <init_xprt> mux_proto_list member. This
allows to activate QMux on SSL handshake completion without explicit
"proto qmux" setting.
Thanks to this change, on SSL handshake completion it is not necessary
anymore to check for CO_FL_QMUX_* flags.
The various tcp_option_* converters rely on tcp_fullhdr_find_opt() to
find the option. However, the same bug as fixed in commit dbf471f99a
("BUG/MAJOR: net_helper: ip.fp infinite loop on malformed tcp options")
was also present there, by which an option of length 0 could be looped
over indefinitely. In practice this does not happen since such options
are not valid, but if passed encoded in an HTTP header for example, it
could possibly be passed.
While fixing it, let's check for length >1 in all 3 locations insteead
of only non-zero, since there's no point processing a malformed option
that wouldn't even be properly skipped.
This fix doesn't need to be backported, unless the ip.fp series is.
Thanks to @Vincent55 for reporting this issue.
When threads are disabled, "static __decl_spinlock(foo);" ends up as
"static;", causing a build warning when threads are disabled. We don't
need it to be static so let's drop "static" here. No backport is needed,
this is 3.4-only.
Released version 3.4-dev13 with the following main changes :
- BUG/MINOR: backend: correct parameter value validation in get_server_ph_post()
- BUG/MINOR: config/dns: properly fail on duplicate nameserver name detection
- BUG/MEDIUM: dns: fix long loops in additional records parse on name failure
- BUG/MEDIUM: resolvers: fix name compression pointer validation in resolv_read_name()
- BUG/MEDIUM: dns: fix memory leak of sockaddr in dns_session_init() error path
- CLEANUP: proxy: fix tiny mistakes in parse error messages
- CLEANUP: dns: fix misleading error messages in dns_stream_init()
- BUG/MINOR: server: better handling of OOM in srv_set_fqdn()
- BUG/MINOR: servers: use proper source of pool_conn_name in srv_settings_cpy()
- BUG/MEDIUM: server/cli: unlock server lock on failure in cli_parse_set_server
- BUG/MINOR: resolvers: fix dangling list pointer in resolvers_new() error paths
- BUG/MINOR: dns: fix dangling dgram pointer on dns_dgram_init() failure path
- BUG/MINOR: proxy: use proxy_drop() in parse_new_proxy() error path
- CLEANUP: resolvers: properly initialize the sample in resolv_action_do_resolve()
- BUG/MINOR: resolvers: report the expression error in the do-resolve() action parser
- BUG/MINOR: resolvers: fix leaked dgram and dns_ring struct in parse_resolve_conf()
- BUG/MINOR: resolvers: fix leaked fields on cfg_parse_resolvers() error paths
- BUG/MINOR: resolvers: fix missing task_idle destruction in resolvers_destroy()
- CLEANUP: proxy: fix duplicate declaration of cli_find_frontend in proxy.h
- CLEANUP: address a few typos and copy-paste errors in httpclient and dns
- DOC: internal: add a few rules about internal core principles
- BUG/MINOR: session/trace: use distinct flags for SESS_EV_END and _ERR
- CLEANUP: stick-table: uniformize the different action_inc_gpc*()
- REGTESTS: do not run quic/tls13_ssl_crt-list_filters in quic openssl compat mode
- REGTESTS: quic/issuers_chain_path: do not forget to enable QUIC compat mode
- BUG/MINOR: sock: store the connection error status
- BUG/MINOR: check: properly report errno in chk_report_conn_err()
- CLEANUP: tcpcheck: mention that we're a bit far for a sync errno
- BUG/MINOR: jwt: fix possible memory leak in convert_ecdsa_sig() error path
- CLEANUP: jwe: fix theoretical overflow in AAD length calculation
- DOC: config: further clarify that resolvers "default" exists
- MINOR: proxy: remove the experimental status on dynamic backends
- BUG/MEDIUM: limits: properly account for global.maxpipes in compute_ideal_maxconn()
- BUG/MINOR: jws: fix OpenSSL 3.0 version check from > to >=
- BUG/MINOR: jws: Add missing return value check (EVP_PKEY_get_bn_param)
- BUG/MINOR: server: Properly handle init-state value during haproxy startup
- BUG/MINOR: httpclient-cli: Destroy http-client context if failing to start it
- BUG/MEDIUM: h1: Skip all h2c values from Upgrade headers during parsing
- BUG/MINOR: h1: Don't mask websocket protocol if multiple protocols used
- MINOR: haterm: Don't init haterm master pipe if not used
- CLEANUP: haterm: Remove "(too old kernel)" from warning message during init
- BUG/MINOR: httpclient-cli: fix uninit variable in error label
- MINOR: mux: Rename the "token" from mux_proto_list to mux_proto
- MEDIUM: connections: Use both mux_proto and alpn to pick a mux
- MINOR: connection: define conn_select_mux_fe()
- MINOR: connection: define conn_select_mux_be()
- MINOR: connection/mux_quic: add MUX <init_xprt> field for QMux handshake
- MINOR: proxy/server: reject TCP ALPN h3 without experimental
- MEDIUM: ssl: allow h3/QMux negotiation without explicit proto
- BUG/MINOR: server: accept server IDs above 2^31 and clarify error message
- BUG/MINOR: backend: fix balance hash calculation when using hash-type none
- MINOR: server: support hash-key id32 for a cleaner distribution
- MINOR: backend: support hash-key guid for a stabler distribution
- MINOR: startup: support unprivileged chroot if possible
- MEDIUM: startup: add automatic chroot feature
- MINOR: h2: explain committed_extra_streams dec on h2_init() error
- OPTIM: h2: do not update committed streams if elasticity disabled
- MINOR: mux_quic: implement basic committed_extra_streams accounting
- MINOR: quic: use stream elasticity value for initial advertisement
- MINOR: mux_quic: define ms_bidi_rel QCC member
- MAJOR: mux_quic: support stream elasticity during connection lifetime
- BUG/MEDIUM: servers: Store the connection hash with the parameter cache
- BUG/MINOR: prevent conn leak in case of xprt_qmux init failure
- BUILD: traces: set a few __maybe_unused on vars used only for traces
- BUILD: traces: add USE_TRACE allowing to disable traces
- MINOR: startup: do not execute chroot() when "/"
- MEDIUM: startup: warn when chroot is not set for root
- BUG/MEDIUM: servers: Don't forget to set srv_hash when needed
- DOC: fix typo on QUIC stream.max-concurrent reference
- BUG/MINOR: mux_quic: do not exceed stream.max-concurrent on backend side
- BUG/MINOR: htx: Fix value of HTX_XFER_HDRS_ONLY flag
- MEDIUM: htx: Improve htx_xfer API to not count HTX meta-data
- BUG/MEDIUM: applet: Fix transfer of HTX data to the applet
- BUG/MEDIUM: htx: Alloc a chunk of right size in htx_replace_blk_value()
- MEDIUM: stick-tables: Avoid freeing elements while holding a lock
- MINOR: intops: add a multiply overflow detection for ulong and size_t
- CLEANUP: tree-wide: use array_size_or_fail() in array size for allocations
- DOC: update supported gcc and openssl versions in INSTALL
Instead of relying on malloc(n*size), we now pass array_size_or_fail(n,m)
so that it becomes possible to detect overflow. This is particularly
interesting for global settings that might be set large enough to cause
overflows on 32-bit systems for example, resulting in small values that
then cause trouble. Now the overflow will be detected at allocation time.
Around 25 locations were updated.
Sometimes we'd like to know if some products overflow, so let's add a
pair of functions for this, for ulong and for size_t. For recent enough
compilers (gcc >= 5, clang >= 3.4) we just use __builtin_mul_overflow()
otherwise we rely on a division and a comparison before performing the
operation.
A third function, array_size_or_fail() computes the size of an array
of m elements of n bytes each, and returns the total size if it fits
in a size_t, otherwise ~0 if it does not so that passing this to
malloc() or any other variant would fail by trying to exhaust the
entire memory space.
In stksess_trash_oldest(), and process_tables_expire(), avoid freeing
elements while holding two locks, as it could be very costly.
Instead, build a linked list of elements to be free'd, and do so once we
no longer hold any lock.
This may help with github issue #3380, and may be backported to 3.3.
Since support for large buffers was added, we must be careful when chunks
are allocated. Indeed, depending on the context a large chunks may be
required if data are copied from a large buffer.
In htx_replace_blk_value() function, when a defragmentation is necessary,
the data to be replaced are copied to a chunk before the
defragmentation. However, I forgot to get large chunk when necessary by
calling alloc_trash_chunk_sz() instead of alloc_trash_chunk(). Because of
this issue, it is possible to copy data to a too small chunk, leading to a
crash.
So let's fix the issue.
Thanks to Vincent55 for finding and reporting this.
No backport needed.
appctx_htx_snd_buf() function is relying on htx_xfer() function to transfer
HTX blocks when a swap of buffers is not possible. However, it was not
properly using this function.
Indeed, originally htx_xfer() was designed to transfer blocks with a limit,
the <count> parameter, which included the blocks payload and the
meta-data. It was aligned with all calls, except for the transfer of HTX
data to the applet, in appctx_htx_snd_buf() function. In that case, the
<count> parameter is the amount of data forwarded by the stream to the
applet. So meta-data are not included.
Thanks to the previous commit ("MEDIUM: htx: Improve htx_xfer API to not count
HTX meta-data"), it is now possible to instruct htx_xfer() function that
<count> parameter does not include the meta-data.
Because of this bug, crashes can be experienced when transferring HTX data
to an applet. At first glance, lua HTTP applets and the http client are
concerned.
Stable versions from 3.3 to 3.0 are also affected. But this patch cannot be
backported as is because htx_xfer() function does not exist on these
versions.
Thaks to Yon Harlicaj for finding and reporting this.
(https://x.com/nvmb3r - https://www.linkedin.com/in/eljon-harlicaj/)
This patch add the ability to the htx_xfer() function to transfer data
without acounting the meta-data. By default, the <count> variable includes
the meta-data. But by setting the flag HTX_XFER_NO_METADATA, It is possible
to transfer HTX blocks without count meta-data. In that case, <count> will
not contain the blocks meta-data and the return value will not include them.
HTX_XFER_* flags must be declared as a bitfield. However, value of
HTX_XFER_HDRS_ONLY was set of 0x03 while it should be 0x04. So let's fix it.
This patch must be backported where the htx_xfer() function was backported
(5ead611cc "MEDIUM: htx: Add htx_xfer function to replace htx_xfer_blks").
Fix usage of stream.max-concurrent QUIC setting on the backend side.
Contrary to frontend connections, this limit must be enforced by QUIC
MUX directly. This is necessary as the peer may allow a larger number of
concurrent streams via its flow control.
First, QUIC TP initial max bidi streams value is now set to 0. This is
fine as only the HTTP/3 client is expected to open bidirectional
streams.
The most important changes is performed in qcm_avail_streams(). The
value first depends on the peer flow control. Now, it is further reduced
if necessary to not exceed the configured BE stream.max-concurrent.
Note that this new behavior may further increases current limitation on
QUIC BE reuse when a QCS instance is kept while its upper stream layer
is detached. In this case there is a risk that the connection is not
reinserted in the correct server pool, as an idle or avail one.
This is a breaking change as BE stream.max-concurrent keyword setting
meaning is changed in effect. However, this does not necessitate extra
warnings as the previous usage was in effect useless. Furthermore, QUIC
on the backend side is still considered as experimental.
This can be backported up to 3.3.
Commit 8aa854ab26a7daa613a17548f1fe1d0adb8cf61b made it so we'd store
the hash corresponding to the server parameters, so that we could detect
if we're still talking to the same server, and not use those parameters
if not.
However, when updating those parameters, we forgot to store the new
hash, which would result in the new parameters never be used, and
breakling 0RTT.
Fix that by properly update the hash when needed.
This should be backported when 8aa854ab26a7daa613a17548f1fe1d0adb8cf61b
is backported.
We're still regularly seeing insecure configs where chroot is missing.
Now that we have "chroot auto", there's no excuse for not knowing where
to chroot, so let's detect that we're starting as root, detect that the
process is allowed to chroot (i.e. no capability issue, or some hardened
containers), and if no chroot is set, let's emit a warning explaining how
to silence it, i.e. either "chroot auto" or "chroot /".
Most likely we'll start using "chroot auto" by default in 3.5 if no
usability issue is reported.
We'll recommend to use "chroot /" to explicitly disable chroot, however
there might be configurations where it would cause problems to just issue
the syscall (typically some hardened containers), so let's make sure that
"chroot /" is a nop in this case.
This reduces the total code size by 6-10% and speeds up the build a
bit. It can be further reduced by disabling the trace decoding code
inside certain subsystems like muxes. But at least like this it will
help users on small systems to reduce the footprint when not needed
by explicitly passing USE_TRACE=0 (they remain enabled by default).
In case of XPRT_QMUX init failure on the frontend side, the connection
must immediately be released. This is not the case on the backend side
as a stream can supervize the connection lifetime.
This patch performs the connection free via conn_complete_session(). As
conn is flagged with CO_FL_ERROR, this will automatically fail and
invoke session_kill_embryonic(), which ensures the session and its
connection are both freed as wanted in this case.
No need to backport.
When we store the negociated server parameters, such as the ALPN, also
store the calculated hash with the connection. If it is different, as
can happen because the IP address is different because set-dst was used,
we certainly do not want to reuse the information in the cache,
otherwise we could end up using the wrong ALPN and mux.
That means we already have to calculate the hash in connect_server()
now, while before we would not do it for Websockets, if we could not do
connection reuse, as that's all the hash was used for.
This should fix Github issue #3386
This should be backported as far as 3.2.
qcc_release_remote_stream() is called each time a remote stream is
closed. Flow control accounting is updated and when necessary, a
MAX_STREAMS_BIDI frame is prepared to allow the peer to initiate new
streams.
This patch extends stream elasticity features with the QUIC bidirection
stream flow control mechanism. The announced value can now be possibly
reduced depending on conn_calc_max_streams().
The first step is to decrement closed streams from the global committed
extra streams total. This must be performed conn_calc_max_streams() to
ensure the calculation will be valid.
Then, there is two cases depending on conn_calc_max_streams() result. If
the value is less than the peer still remaining stream window, nothing
more is performed. If the opposite case, flow control must be increased
and a MAX_STREAMS_BIDI frame is prepared, with the value adjusted to not
exceed the stream elasticity limit. Global extra streams total is then
finally incremented.
This calcul also ensures that when all streams are closed, global extra
streams accounting operations are decremented by 1, as a connection
always has access to one stream which is excluded from the global total.
Note that if stream elasticity is not active, flow control increases
principle is unchanged and remains statically performed.
This patch is labelled as major as it complexifies bidirectional stream
flow control mechanisme. This is a sensitive operation as there is a
risk of connection freeze if flow control updates are inadvertently
skipped.
Add a new QCC member <ms_bidi_rel>. This represents the number of
concurrent streams advertised similarly to ms_bidi, but as a relative
value.
This patch does not introduce any functional change. For now,
<ms_bidi_rel> will be equal to <ms_bidi_init>. However, with the
implementation of stream elasticity and dynamic adjustment for
concurrent max-streams-bidi, the former will be required to keep the
last advertised value.
When stream elasticity is active, the maximum number of concurrent bidi
streams advertised via transport parameters is now reduced depending on
the connection load. This is implemented via conn_calc_max_streams()
which returns the value to use.
This is not applied on listeners with enabled 0-RTT. Indeed, for such
connections, clients are expected to reuse the previously seen transport
parameters. The server on the other hand must not decrease several
values on the newly advertised params, in particular for the maximum
number of concurrent bidi streams. The simplest way to prevent 0-RTT
failure is to not mix stream elasticity with it.
Note that the 0-RTT limitation is only applied for the initial value :
during the connection lifetime, stream elasticity can still be used by
the MUX to dynamically reduce the stream window. This will be
implemented in a future patch.
Account QUIC frontend connections into committed_extra_streams when
stream elasticity setting is active. This is performed in QCC init and
release functions.
This patch has no impact on QUIC subsystem for now. Connections will
still allow a static number of concurrent streams based on
tune.quic.fe.stream.max-concurrent. However, this has a direct
repercussion on H2 subsystem, as a higher count of QUIC connections will
reduce the concurrent streams allowed there.
When streams-elasticity is enabled in the configuration, H2 mux is
responsible to update the global committed_extra_streams value.
Adjust these operations to ensure they are skipped if streams-elasticity
is disabled, which is the current default. This prevents unnecessary
atomic operations in this case.
No need to backport unless streams-elasticity feature is picked in older
releases.
h2_init() is now responsible to increment committed_extra_streams for
new frontend connections, in relation to the newly implemented
stream-elasticity feature. In case of an early error, a mirroring
decrement is executed on fail_stream label.
However, for now this error label can only be selected via BE conns. In
fact, it's not yet possible for h2_init() to fail after the extra
streams increment.
However, the decrement operation is kept to prevent any omissions in
case of future evolutions of h2_init() error path. To prevent reporting
of a possible dead code, add an extra comment which summarizes the
situation.
It is now possible to use "chroot auto" in the configuration. This lets
haproxy create an anonymous (cleaned up after the process terminates)
and read-only directory for chroot. This directory is created in /tmp;
we might want to support creating it in a different directory in the
future, either by respecting $TMPDIR or by allowing an optional
directory after the "auto" keyword.
When server fleets are constantly updated, using a stable distribution
across a bunch of load balancers can be convenient. The addr and port
already provide a bit of this but for situations were addresses might
differ between sites or change dynamically this does not work. The guid
is perfect for this because by definition it's supposed to designate a
single server and be unique. So when two servers anywhere have the same,
the tool that provisionned them promises that they are the same server.
So here we introduce "hash-key guid" which performs a 32-bit hash on
the GUID value. When no guid is provided, a fallback is performed on
ID, as is done for other keys.
The "id" hash-key scales the ID by a factor of 16 that tries to leave
room between the nodes on the 32-bit space to permit smooth weight
variations (e.g. during slowstart). However this does not deal well
with overlaps between server IDs. For example, assigning IDs that are
only multiples of 256 million to 16 servers yields traffic only on
one since in practice they all have the same 28 lower bits.
The new "id32" hash key bridges this gap by using the full 32-bit ID
of the server as the key. On the other hand, the user must be careful
not to switch the hash function to "none" when using incremental IDs
because in this case they might be very poorly distributed. But this
can be convenient for automated provisionning systems which assign
IDs themselves, as the full 32 bits are used now.
The "hash-type xxx none" is broken for keys that are not in type string
because the sample fetch call casts them to SMP_T_BIN, that tends to
preserve the original format (integers, IP addresses etc), but the
gen_hash() function in case of BE_LB_HFCN_NONE expects to read a string
representing a number, that it parses to retrieve the value, and just
fails on many binary types. For example, the following just always
returns key 0:
balance hash rand()
hash type consistent none
An ugly workaround is to make sure the expression returns a string, for
example this:
balance hash rand(),concat()
hash type consistent none
In order to fix most cases here, we force the conversion to type string
when using BE_LB_HFCN_NONE, but a better approach would require a larger
rework and split gen_hash() or change it to accept an integer as well,
so that the caller could cast to SMP_T_INT for BE_LB_HFCN_NONE and pass
the resulting number already parsed with the least information loss. In
this case even IPv4 addresses would be preserved.
The current approach at least addresses the initially envisioned use
cases, and the limitations have been added to the doc. This can be
backported to 3.0 though it's not really important.
Due to the check of the stored value instead of the parsed one, it was not
permitted to use server IDs above 2^31 while they are perfectly possible.
Let's refine the parsing and also update the error message to indicate the
range. The doc was also refined to reflect the relation with hash-key.
This may be backported though it wouldn't have any effect on working
configs.
Implements automatic selection of QMux MUX if "h3" ALPN has been
negotiated on top of TCP/SSL.
The first part of this change is to define "alpn" member of
mux_proto_list. This is necessary so that conn_get_best_mux_entry() can
select it when "h3" has been chosen. As a side-effect, this also
automatically sets a default ALPN to "h3" for bind lines with "proto
qmux".
The most important change is to adapt the SSL layer. On handshake
completion, the eligible MUX is retrieved via conn_select_mux_fe/be()
functions. If xprt_qmux is required by it, MUX init is delayed and QMux
handshake is started first.
This last change is necessary as connection flags CO_FL_QMUX_RECV/SEND
are only set if "proto qmux" is explicitely set. In case xprt_qmux is
activated via pure ALPN negotiation, these flags are also set on
xprt_qmux_init(). This is mandatory to ensure emission/reception of QMux
transport parameters will be performed as expected.
Add a postparsing check on TCP ALPN bind and server setting. An error is
reported if the token "h3" is present and expose-experimental-directives
is not globally activated. This ensures that QMux protocol won't be
selected if experimental features are not explicitely requested.
The check is not performed though if "proto qmux" is explicitely
defined, as this setting already checks for experimental support.
Currently, it's not possible to activate QMux without any explicit
"proto qmux" config. However, this will be implemented in a next patch,
so this check will become necessary.
The first part of this patch defines a new mux_proto_list field named
<xprt_init>. This allows to define an extra XPRT layer which should be
activated first prior to the MUX creation both on frontend and backend
sides.
This is immediately used for QMux mux_proto_list to require XPRT_QMUX
handshake. With this change, activation of QMux connection flags in
session_accept_fd() and connect_server() are adjusted to take into
account <init_xprt> field. This approach is much more evolutive than
relying on the previous MUX name.
Change in connect_server() will also be necessary to support QMux
activation on a TCP server with h3 ALPN without explicit "proto qmux".
This guarantees that MUX initialization is delayed after QMux handshake.
This patch is similar to the previous one but this time for backend
connections. The MUX selection code is directly extracted from
conn_install_mux_chk() and conn_install_mux_be().
Define a new function conn_select_mux_fe().
The objective is to have a preliminary function to determine the MUX
which will be used without initializing it. This will be useful for MUX
which relies on a specific XPRT handshake prior to its startup, which is
the case for QMux protocol.
The code of conn_select_mux_fe() is identical to the beginning of
conn_install_mux_fe() with a similar MUX selection logic. However,
connection MUX initialization is not performed in this case. In a future
patch, both functions should be merged together to reduce code
duplication.
In conn_get_best_mux() and conn_get_best_mux_entry(), the mux name was
provided sometimes based on the "proto" directive, sometimes based on
the ALPN, but in any case, it was compared again the mux_proto_list
mux_proto field. This is not correct, as ALPN can be different from the
internal mux_proto. So enhance those functions so that they wll accept
an ALPN as well. If a mux_proto is provided, that will be used, if not,
and if an ALPN is provided, then that will be used, and compared against
the ALPN provided by the mux, if any.
In struct mux_proto_list, rename the "token" field to "mux_proto". That
field should only be used to match the name provided in the "proto"
directive, and it will be soon.
This should be a no-op.
The following patch fixes a leak in case of httpclient_start() failure
in the httpclient_cli code by adding httpclient_destroy() call on error
path.
c53256adbc
BUG/MINOR: httpclient-cli: Destroy http-client context if failing to start it
However, error label may be selected prior to httpclient allocation if
CLI arguments are incorrect. This can cause a crash due to a deferencing
of an uninitialized variable. This has been detected via a compilation
error :
src/httpclient_cli.c: In function 'hc_cli_parse':
src/httpclient_cli.c:162:2: error: 'hc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
162 | httpclient_destroy(hc);
| ^~~~~~~~~~~~~~~~~~~~~~
This must be backported along the above patch, which is scheduled up to
the 2.6 release.
During initialization of the haterm master pipe, If its size is limited
(lower than the configured one * 5/4), a warning is emitted. In this
warning, it is specified this happened because the kernel is too old. But it
is unrelated. So let's remove this part.
There is no reason to initialize the haterm master pipe if haterm is not
used. So now, it is only performed if a non-disabled haterm frontend is
found. To do so, in addition to test the proxy's flags and capabilities, we
also check if "stream_new_from_sc" points on "hstream_new".
During H1 message parsing, the Upgrade header values are checked to detect
"websocket" prototol, to properly handle websocket upgrades between H1 and
H2 and to possibly reject messages if mandatory headers are missing.
However, the flag is reset for each new Upgrade header and the information
may be lost. So never reset it.
This patch must be backported as far as 2.4.
During the H1 message parsing, the Upgrade header values are checked to
detect "h2c" and "h2" tokens and skip them. To do so, we rely on
H1_MF_UPG_H2C flag, set during the parsing. And during the request
post-parsing, if this flag was set, all Upgrade headers are removed.
This was fixed by the commit 7b89aa5b1 ("BUG/MINOR: h1: do not forward h2c
upgrade header token").
However, there are two issues here and the commit above must be refined.
First, the flag is reset for each new Upgrade header. So "h2c" or "h2"
tokens will be properly detected if all tokens are set on the same Upgrade
header. But if splitted on several headers, previously detected tokens will
be hidden by a next ones.
Concretly, the following will be properly caught
Connection: upgrade
Upgrade: foo, h2c, bar
But then following not:
Connection: upgrade:
Upgrade: foo, h2c
Upgrade: bar
Then, when a "h2c" or "h2" token is finally reported, all Upgrade headers
are removed, regardless other tokens.
So, to fix the both issues, everything is now handled during the message
parsing by skipping "h2c" and "h2" tokens, rebuilding the Upgrade header
value without then offending tokens. The same was already performed for the
Connection header, to skip "keep-alive" and "close" value. So it is not a so
fancy change.
Thanks to this change, it is no longer necessary to handle H1_MF_UPG_H2C
during the request post-parsing. And in fact, this flag is no longer
necessary. So let's remove it too.
Thanks to Vincent55 for finding and reporting this.
This patch must be backported as far as 2.4.
When the call to httpclient_start() failed, it is the caller responsibilty
to destroy the http-client context by calling httpclient_destroy(). It is
performed at several places but it was missing in the httpclient_cli
code. So let's fix it.
This patch must be backported as far as 2.6. On 3.2 and lower, it must be
applied on http_client.c.
Unlike stated in the configuration manual, the server 'init-state' parameter
was not evaluated during haproxy startup/reload. After a review, it appeared
there were also issues if combined with the 'track' parameter. In addtition,
this parameter was only evaluated when health-checks were enabled for the
server, leading to unexpected behavior if the serve settings are dynamically
changed via the CLI.
To fix those issues, behavior of the 'init-state' parameter was slightly
adapted. It is always evaluated, even when there is no running health-checks
for the server. An error is reported if the 'track' parameter is also
defined. Both cannot work together.
In addition, the "none" state was introduced to be able to restore the
default behavior. It will be especially useful when the parameter is
inherited from a 'default-server' directive.
This patch should fix the issue #3298. It must be backported as far as 3.2.
Three #if directives used > 0x30000000L which excluded OpenSSL 3.0.0
exactly from the modern code path, treating it as pre-3.0. Changed all
three to >= 0x30000000L to match jwe.c and openssl-compat.h conventions.
This affects EC key thumbprint generation, RSA JWK generation, and
JWS algorithm detection for OpenSSL 3.0.0.
Starting a config with maxpipes and no maxconn always ended up in error
because the number of FDs needed for pipes was not deduced from the total
number of FDs when calculating maxconn, and was later found to exceed the
total number of allocatable FD during final checks.
When global.maxpipes is set, it must be used during compute_ideal_maxconn()
so that it's properly deduced.
Without this, just having "maxpipes 500" in a config prevents it from
starting. With the fix, it properly starts with a maxconn adjusted
depending on the number of splice-enabled proxies.
This should be backported, theoretically everywhere, but preferably
progressively. The following config should fail on affected versions
and load with fixed ones:
global
maxpipes 500
frontend srv1
bind :8001
As initially planned, if no trouble was reported on dynamic backend
commands on the CLI, the experimental status could be dropped before the
release. The feedback was not very broad, but was conclusive in that the
operations work as expected and the current syntax can be preserved even
for future evolutions. So we can drop the experimental status.
It was explained in the general presentation of resolvers but not in
the "resolvers" keyword description itself, which might be where users
could be looking for that info, so let's quickly repeat that info there.
The expression items[JWE_ELT_JOSE].length << 3 performs the shift on an
unsigned int (32-bit) before being cast to uint64_t instead of after.
This means that we don't cover for a possible overflow (which would
never happen as it would need a header length beyond 512MB). At least
fixing it will avoid code check reports.
The allocated ec_R and ec_S were not released in case one of the two
would fail to be allocated/created, and would cause a memory leak. Let's
add the missing BN_free(). This may be backported to 2.4.
The collection of errno in tcpcheck_eval_connect() and tcpcheck_main()
is quite far from the production location, and the risk of having a
zero errno is definitely not null. Tests show that this works, so
better not try to fix something not broken, but at least place a
comment there indicating that it's not necessarily super-reliable.
This would need to be revisited the day we finally store errno in
the connection.
When in 2.2, with commit c8dc20a825 ("BUG/MINOR: checks: refine which
errno values are really errors."), errno reporting was refined, an
extra check was added before calling retrieve_errno_from_socket(), and
by mistake the test on !errno got inverted so that we only call the
function to retrieve the error from the socket when errno is set!
The first test in the function detects it and returns without changing
anything, so this didn't have much effect, however when errno is not
set (certain call places purposely pass zero so that getsockopt() is
used), this wasn't called so the error wasn't reported. Apparently it
only happened when called from process_chk_conn() after an async
error was detected, so probably just cases where POLLERR is reported,
which remains infrequent.
Let's fix the direction of this flag. It can be backported if needed
but it's unlikely anyone really noticed.
When an async connect() fails in sock_conn_check(), it returns an errno
that will not be retrieved later by a subsequent getsockopt(SO_ERROR).
The problem is that this errno is then definitely lost. This is visible
in the 4be_1srv_smtpchk_httpchk_layer47errors regtest that fails on
certain systems (e.g. glibc 2.31 on arm32 running Linux 6.1), where the
connect() error is systematically lost and the "Connection refused" is
never seen in the check status. It also matches a few random reports of
the past indicating that the connection error was sometimes not reported
in the stats page in front of a down server.
Ideally we should store errno in connections as soon as the error is
seen. However this would require significant changes that are not
acceptable yet for 3.4 nor stable releases. A more acceptable fix is to
make use of the extra CO_ER_* flags set by conn_set_errno() as soon as
the error is detected. This will recognize a sufficiently large number
of errors and the check status will report them (here we'll have
"ECONNREFUSED" in the check). Note that on systems where the error is
seen synchronously, we can have "ECONNREFUSED (Connection refused)",
but this is not a problem.
This fix adds the missing conn_set_errno() call to sock_conn_check(),
that is thus sufficient to catch this error. In addition, the two
affected regtests were updated to search for ECONNREFUSED here.
This might be backported to older releases if users request it, but it
is probably not necessary.
This test is compatible with QUIC_OPENSSL_COMPAT but the "limited-quic"
directive was not set, making it fail on older libs with no QUIC support
despite being declared as compatible.
While action_inc_gpc1() explicitly checks if s->stkctr or sess->stkctr
are set since 2.8 with commit 6c0117168 ("MEDIUM: stick-table: set the
track-sc limit at boottime via tune.stick-counters"), action_inc_gpc0()
and the generic action_inc_gpc() still stuck to the old approach of not
checking them, causing confusion when reviewing the code.
Upon closer inspection, the only case where the pointer may be NULL is
when global.tune.nb_stk_ctr is zero, which happens when the global
section contains "tune.stick-counters 0". However in this case, the
config parser "parse_inc_gpc()" will reject any reference to any stick
counter, so in theory there is no problem.
Regardless, the difference of treatment between sibling functions remains
confusing and the check is cheap, so let's generalize it, it will save a
future reader from the need to inspect stream_new() and session_new().
Session traces were brought in 3.1 by commit abb07af67 ("MINOR:
session/trace: enable very minimal session tracing") though there was
an issue, because SESS_EV_END and SESS_EV_ERR have the same value (it's
a copy-paste mistake).
This can be backported to 3.2.
The new file core-principles.txt quickly enumerates a number of rules
and invariants across the project. These can be used as quick reminders
as well as basic rules for reviews. It's still lacking a lot of info but
should be a good start.
The function cli_find_frontend was declared twice identically at lines 98-99
of include/haproxy/proxy.h. The second declaration should have been for
cli_find_backend, which is defined in src/proxy.c and used in several places
but was missing from the header's exported symbols.
This is a simple copy-paste mistake where line 99 duplicated line 98 verbatim
instead of declaring cli_find_backend.
When destroying a stream-based DNS nameserver, task_req and task_rsp
were destroyed but task_idle was missed, causing a task object leak.
This doesn't necessarily have to be backported since it's only upon
exit that it is visible.
cfg_parse_resolvers() has many error paths on allocation failure when
parsing "nameserver". These paths handle their own cleanup instead of
centralizing it. The result is that some errors paths leak some fields.
The most complex ones are the strdup() failures which require to check
for stream or dgram to figure what to free. These can be detected via
ASAN on a dummy strdup() allocation failure:
Indirect leak of 131080 byte(s) in 1 object(s) allocated from:
#0 0x7f0b7ed1f0ab in malloc (/usr/lib64/libasan.so.8+0x11f0ab)
#1 0x000000c73e19 in dns_ring_new src/dns_ring.c:59
#2 0x000000af1848 in dns_dgram_init src/dns.c:480
#3 0x000000922005 in cfg_parse_resolvers src/resolvers.c:3792
#4 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#5 0x0000009e0a39 in read_cfg src/haproxy.c:1142
#6 0x000000447e8c in main src/haproxy.c:3474
#7 0x7f0b7e02ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#8 0x7ffd35f1531c ([stack]+0x2031c)
Indirect leak of 304 byte(s) in 1 object(s) allocated from:
#0 0x7f0b7ed1ea23 in calloc (/usr/lib64/libasan.so.8+0x11ea23)
#1 0x000000af1681 in dns_dgram_init src/dns.c:468
#2 0x000000922005 in cfg_parse_resolvers src/resolvers.c:3792
#3 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#4 0x0000009e0a39 in read_cfg src/haproxy.c:1142
#5 0x000000447e8c in main src/haproxy.c:3474
#6 0x7f0b7e02ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#7 0x7ffd35f1531c ([stack]+0x2031c)
Indirect leak of 104 byte(s) in 1 object(s) allocated from:
#0 0x7f0b7ed1ea23 in calloc (/usr/lib64/libasan.so.8+0x11ea23)
#1 0x000000921f83 in cfg_parse_resolvers src/resolvers.c:3772
#2 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#3 0x0000009e0a39 in read_cfg src/haproxy.c:1142
#4 0x000000447e8c in main src/haproxy.c:3474
#5 0x7f0b7e02ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#6 0x7ffd35f1531c ([stack]+0x2031c)
Indirect leak of 64 byte(s) in 1 object(s) allocated from:
#0 0x7f0b7ed1f0ab in malloc (/usr/lib64/libasan.so.8+0x11f0ab)
#1 0x000000c73e09 in dns_ring_new src/dns_ring.c:55
#2 0x000000af1848 in dns_dgram_init src/dns.c:480
#3 0x000000922005 in cfg_parse_resolvers src/resolvers.c:3792
#4 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#5 0x0000009e0a39 in read_cfg src/haproxy.c:1142
#6 0x000000447e8c in main src/haproxy.c:3474
#7 0x7f0b7e02ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#8 0x7ffd35f1531c ([stack]+0x2031c)
Indirect leak of 15 byte(s) in 1 object(s) allocated from:
#0 0x7f0b7ed18e20 in strdup (/usr/lib64/libasan.so.8+0x118e20)
#1 0x00000092203b in cfg_parse_resolvers src/resolvers.c:3798
#2 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#3 0x0000009e0a39 in read_cfg src/haproxy.c:1142
#4 0x000000447e8c in main src/haproxy.c:3474
#5 0x7f0b7e02ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#6 0x7ffd35f1531c ([stack]+0x2031c)
This should be completely reworked so that the cleanup is performed in
a central place, as the risk to get it wrong remains high.
This patch does the minimal changes to clean this up. It does not need
to be backported since it only triggers on boot OOM.
Some strdup() failures in parse_resolve_conf() do not release everything
due to the way the function is built, resulting in leaks on error that are
caught by ASAN:
Direct leak of 304 byte(s) in 1 object(s) allocated from:
#0 0x7fe74231ea23 in calloc (/usr/lib64/libasan.so.8+0x11ea23)
#1 0x000000af1681 in dns_dgram_init src/dns.c:468
#2 0x00000091cbbf in parse_resolve_conf src/resolvers.c:3559
#3 0x00000092179e in cfg_parse_resolvers src/resolvers.c:3815
#4 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#5 0x0000009e0a39 in read_cfg src/haproxy.c:1142
#6 0x000000447e8c in main src/haproxy.c:3474
#7 0x7fe74162ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#8 0x7ffc0a43e31f ([stack]+0x2031f)
Indirect leak of 131080 byte(s) in 1 object(s) allocated from:
#0 0x7fe74231f0ab in malloc (/usr/lib64/libasan.so.8+0x11f0ab)
#1 0x000000c73e19 in dns_ring_new src/dns_ring.c:59
#2 0x000000af1848 in dns_dgram_init src/dns.c:480
#3 0x00000091cbbf in parse_resolve_conf src/resolvers.c:3559
#4 0x00000092179e in cfg_parse_resolvers src/resolvers.c:3815
#5 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#6 0x0000009e0a39 in read_cfg src/haproxy.c:1142
#7 0x000000447e8c in main src/haproxy.c:3474
#8 0x7fe74162ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#9 0x7ffc0a43e31f ([stack]+0x2031f)
Indirect leak of 64 byte(s) in 1 object(s) allocated from:
#0 0x7fe74231f0ab in malloc (/usr/lib64/libasan.so.8+0x11f0ab)
#1 0x000000c73e09 in dns_ring_new src/dns_ring.c:55
#2 0x000000af1848 in dns_dgram_init src/dns.c:480
#3 0x00000091cbbf in parse_resolve_conf src/resolvers.c:3559
#4 0x00000092179e in cfg_parse_resolvers src/resolvers.c:3815
#5 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#6 0x0000009e0a39 in read_cfg src/haproxy.c:1142
#7 0x000000447e8c in main src/haproxy.c:3474
#8 0x7fe74162ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#9 0x7ffc0a43e31f ([stack]+0x2031f)
SUMMARY: AddressSanitizer: 131448 byte(s) leaked in 3 allocation(s).
Let's free the dgram and the dns ring. This can be backported though it's
not important as it only happens on OOM condition during boot.
When an expression is used for do-resolve(), an error may be reported.
Unfortunately it was scratched and replaced by the do-resolve() error,
leaving no chance to know exactly what was wrong. Let's report the
contents of the error when available. It will indicate identifiers that
are not found or invalid ranges or types being used.
This can be backported to all versions.
The sample used to pass the IP address only had its data, px, sess and
strm fields initialized before being passed to vars_set_by_name(). It
turns out that this latter one doesn't seem to touch ctx, flags nor opt
but nothing guarantees it. Let's at least initialize the fields properly
to avoid passing random garbage.
No backport is needed.
In parse_new_proxy(), when proxy_defproxy_cpy() fails, the error path used
ha_free(&curproxy) to release the partially constructed proxy. However, the
proxy was allocated via alloc_new_proxy() which performs significant setup:
- setup_new_proxy() inserts it into the proxy_by_name tree (proxy_store_name)
- It appends to the global proxies list (LIST_APPEND)
- proxy_take() increments its refcount
Additionally, proxy_defproxy_cpy() may have allocated further resources
(strdup'd strings, compression structures, email alert fields, etc).
Using ha_free() only freed the proxy struct itself, leaving:
- The proxy still registered in the name tree (dangling pointer)
- The proxy still linked in the global proxies list
- All strdup'd strings and other allocations leaked
This is visible with ASAN when causing random allocation errors:
[NOTICE] (27033) : haproxy version is 3.4-dev12-b15468-11
[NOTICE] (27033) : path to executable is ./haproxy
[ALERT] (27033) : config : parsing [/dev/stdin:5015] : proxy 'bk3': failed to duplicate tcpcheck preset-vars
[ALERT] (27033) : config : Error(s) found in configuration file : /dev/stdin
=================================================================
==27033==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x7f113e518e20 in strdup (/usr/lib64/libasan.so.8+0x118e20)
#1 0x000000955410 in setup_new_proxy src/proxy.c:3178
#2 0x000000955816 in alloc_new_proxy src/proxy.c:3221
#3 0x000000956c33 in parse_new_proxy src/proxy.c:3554
#4 0x000000a24d03 in cfg_parse_listen src/cfgparse-listen.c:495
#5 0x00000089d33e in parse_cfg src/cfgparse.c:2202
#6 0x0000009e0bb9 in read_cfg src/haproxy.c:1142
#7 0x000000447e8c in main src/haproxy.c:3474
#8 0x7f113d82ad13 in __libc_start_call_main (/lib64/libc.so.6+0x2ad13)
#9 0x7fff65b4e320 ([stack]+0x20320)
SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).
The fix replaces ha_free(&curproxy) with proxy_drop(curproxy), which
properly calls deinit_proxy() to release all internal resources, removes
the proxy from trees and lists, decrements the refcount, and frees the
struct.
No backport is needed since proxy_drop() is only in 3.4.
In dns_dgram_init(), the newly created dgram is assigned to the name server
before the ring is attached. In case of errors, e.g. due to too many watchers,
the dgram is released but not removed from ns->dgram. Let's only assign the
pointer on success to avoid this, as it's not needed before. The problem
was introduced in 2.4 with commit c943799c86 ("MEDIUM: resolvers/dns: split
dns.c into dns.c and resolvers.c"), and was possibly there before. The fix
may be backported to all stable versions.
The resolver 'r' is appended to the global sec_resolvers list, but upon failure
later, pointers are released but the element remains in the list, corrupting it,
and possibly causing a crash during deinit() when releasing remaining ones.
Adding a LIST_DEL_INIT() on the error unrolling path is sufficient.
Note that the issue will only happen on failure to allocate memory via
strdup() so the risk is low. The bug was introduced in 2.6 by commit
e7f5776800 ("MINOR: resolvers: resolvers_new() create a resolvers with
default values"), so the fix may be backported to several releases, but
does not necessarily have to go that far.
In cli_parse_set_server()'s 'ssl' branch, the server lock is taken,
and not released in case srv_set_ssl() fails, resulting in a dead lock
and a panic the next time an attempt to touch this server is made. The
lock must be released on all error paths.
This was introduced in 3.3 by commit f8f94ffc9 ("BUG/MEDIUM: server:
Use sni as pool connection name for SSL server only") which was marked
for backporting to 3.0, so this must likely be backported that far.
The condition 'if (srv->pool_conn_name)' was checking the destination
instead of the source 'src->pool_conn_name', meaning the strdup() would
never fire (since newly calloc'd servers start with NULL pool_conn_name),
and the pool_conn_name setting from default-server was silently ignored.
Introduced in 3.2 with commit f0f1816f1 ("MINOR: check: implement
check-pool-conn-name srv keyword") when pool_conn_name support was added
to srv_settings_cpy(). The bug caused any 'pool-conn-name' setting in a
'default-server' line to be lost for all servers inheriting from it.
Note that it's not the first time this function induces such a bug due
to the poor choice of "srv" vs "src" that should be renamed to avoid
keyboard mistakes and visual confusion.
This needs to be backported to 3.2.
This function may face an OOM on strdup() in the middle of the hostname
or hostname_dn replacement, leaving NULLs in either or both of the server's
fields, which is definitely not good for other call places.
Let's perform a safe replacement instead: we first allocate the new
values, and only if they are successful, then we release the previous
ones and replace them.
It is not necessary to backport this unless the issue is reported (it
was found via code review).
All task allocation errors report "memory allocation error initializing
the ring" when the actual failure was task_new_anywhere() returning NULL.
This clearly is a copy-paste. Let's fix the error messages to help when
debugging. Since it's only about allocation failures during init, there
is probably no point in backporting this.
One is s/keyworld/keyword in the retry-on parser. The other one is a
wrong argument "len" being printed in case of parse error for
"declare capture" instead of the length itself.
These can be backported though they are not important.
In dns_session_init(), sockaddr_alloc() allocates 'addr' from the sockaddr
pool, but on failure of appctx_finalize_startup() we jump to the error label
without calling sockaddr_free(&addr), leaking the allocation. Let's add the
missing sockaddr_free() on the error branch.
This must be backported to 2.6.
The original DNS code would only use the 8 lower bits of the compression
offset. This was fixed in 2.0 with commit 2fa66c3b9 ("BUG/MEDIUM: dns:
overflowed dns name start position causing invalid dns error") but it was
not sufficient because the anti-loop check continues to use only 8 of the
14 bits, thus a crafted response where the 8 lower bits pass the check and
the 6 higher should fail it would be accepted. The impacts remains limited
thanks to the bounds check and the recursion limits, but such invalid
responses could still cost a lot to process. Let's compute the 14-bit
offset once for all and use it everywhere.
In resolv_validate_dns_response(), the additional records loop calls
resolv_read_name(). When it returns zero due to a bad response, the main
loop does a "continue" without making the "reader" pointer progress, so it
evaluates the exact same field again and again. Fortunately this is limited
by arcount which is 16 bits, but it means it can still iterate 65535 times
there, allocating and releasing an answer_record at each turn. Let's just
jump to the invalid_resp label that handles the cleaning. There was the
same pattern (without the allocation) with nscount a few lines above BTW.
These can possibly explain some situations where a high CPU usage observed
processing responses.
Seems like these were introduced in 2.2 with commit 37950c8d2
("BUG/MEDIUM: dns: improper parsing of aditional records")
This must be backported to stable versions.
In cfg_parse_resolvers(), two duplicate name checks set err_code but lacked
'goto out', allowing execution to fall through and create the duplicate entry.
This would result in new resolvers and nameservers to be created after the
error was displayed, and a leak of the previous one. It's mostly harmless
since we're exiting after such errors. This can be backported if desired.
In the inner while loop that validates each character of a POST parameter
value, the code checks *p via HTTP_IS_TOKEN() and HTTP_IS_LWS() instead
of *end, while the loop condition only advances "end", so only the first
character of each value is validated.
This means spaces or binary data embedded in parameter values after the
first character goes undetected. Fix by replacing both references to *p
with *end to properly scan through all characters as intended.
This bug was introduced in 1.5-dev20 by commit 98634f0c7 ("MEDIUM:
backend: Enhance hash-type directive with an algorithm options") so
the fix must be backported to all versions.
Released version 3.4-dev12 with the following main changes :
- SCRIPTS: announce-release: add a link to the OpenTelemetry filter
- BUG/MEDIUM: servers: Only requeue servers if they are up
- MINOR: tinfo: store the number of committed extra streams in the tgroup
- MINOR: connection: add a function to calculate elastic streams limit
- MINOR: mux-h2: consider the elastic streams limit on frontend
- MINOR: lb: make LB initialization even more declarative
- BUG/MINOR: cfgparse-listen: do not emit extraneous line in rule order warnings
- CLEANUP: tree-wide: fix typos in non user-visible comments in 15 files
- CLEANUP: h1/htx: fix a few typos in warning, debug and trace messages
- BUG/MINOR: mux-h1: only check h1s if not NULL
- BUG/MINOR: http-fetch: fix smp_fetch_hdr_ip()'s handling of brackets for IPv6
- BUG/MINOR: http-fetch: make http_first_req() check for HTTP first
- BUG/MINOR: http-act: set-status() must check the response message, not the request
- BUG/MINOR: tools: fix memory leak in env_expand() error path
- BUG/MINOR: auth: free user groups on error paths in userlist_postinit()
- BUG/MINOR: uri-auth: avoid leaks on initialization error
- BUG/MINOR: cache: fix memory leak in parse_cache_rule error path
- BUG/MINOR: cfgcond: make KQUEUE check for GTUNE_USE_KQUEUE not GTUNE_USE_EPOLL
- BUG/MINOR: mqtt: connack parser returns MQTT_NEED_MORE_DATA on unknown property
- BUG/MINOR: mqtt: connect parser uses wrong bit field for TOPIC_ALIAS_MAXIMUM
- BUG/MINOR: mqtt: connack parser uses wrong bit for SUBSCRIPTION_IDENTIFIERS_AVAILABLE
- BUG/MINOR: mqtt: fix PUBLISH flags validation that want all bits to be set
- CLEANUP: http_htx: rename inner 'type' to 'ptype' to avoid variable shadowing
- CLEANUP: mux-h2: fix minor output debugging format issues
- CLEANUP: http-rules: fix a few '&' vs '&&' checks for clarity
- CLEANUP: auth: remove undeclared auth_resolve_groups() from auth.h
- CLEANUP: cache: remove redundant res_htx assignment in http_cache_io_handler()
- CLEANUP: channel: remove bogus and unused definition of channel_empty()
- CLEANUP: flt_http_comp: remove duplicate rate limit and CPU usage checks
- CLEANUP: mqtt: remove duplicate MQTT_FN_BIT_USER_PROPERTY in CONNECT fields
- BUG/MINOR: uri-auth: fix possible null-deref in latest fix for leaks
- BUG/MEDIUM: tasks: Keep the TASK_RUNNING flag until queued
- CLEANUP: mqtt: fix spelling of shared_subscription_available
- CLEANUP: regex: pre-initialize error variable in regex_comp() to calm analysis
- BUILD: compiler: fix redefinition of __nonstring
- CLEANUP: defaults: adjust MAX_THREADS multiplier number in comment
- CLEANUP: src/cpuset.c: fix missing return in functions returning int
- REGTESTS: Use ${tmpdir} instead of hardcoded /tmp/
- REGTESTS: Don't try to use real nameservers for testcases
- CLEANUP: tree-wide: fix typos in non user-visible comments in 3 more files
- MINOR: cli: improve forward compatibility for show fd
- DOC: management: document the <tgid>/<fd> form of show fd
- CLEANUP: tree-wide: fix more typos and outdated explanations in comments
- BUG/MEDIUM: dict: hold read lock while incrementing refcount in dict_insert
- BUG/MEDIUM: http-client: Only consume input buffer when hc one is empty
- BUG/MINOR: xprt_qstrm: fix conflicting prototype
- REORG: mux_quic: use newer qcm prefix for legacy qmux files
- MINOR: mux_quic: use qcm prefix for mux callbacks
- MINOR: mux_quic: use qcm prefix for mux functions
- MINOR: mux_quic: use qcm prefix for traces functions/structs
- MINOR: mux_quic: rename qstrm files to qmux
- MINOR: mux_quic: remove qstrm naming in QUIC MUX
- MINOR: connection: rename QMux related flags
- MINOR: xprt_qmux: use qmux instead of qstrm naming
- MINOR: trace: implement source alias
- MEDIUM: mux_quic: rename qmux traces to qcm
- MINOR: sample: add a generic reverse converter
- MINOR: sample: add a reverse_dom converter
- DOC: proxy-protocol: clarify UDP usage
- BUILD: 51d.c: cleanup, fix preprocessor ifdefs
- CLEANUP: tree-wide: fix typos in user-invisible files
- CLEANUP: htx: Adjust numbering of HTX blocks' types in the description
Support of pseudo-headers was removed as unused, but mention of it
in the description remains and disrupt the numbering in comment, which
can be confusing.
The ifdef spans over function boundaries, making some combinations produce
code that does not build:
addons/51degrees/51d.c:638:1: error: Unmatched '}'. Configuration: ''. [syntaxError]
the proxy protocol spec didn't specify UDP and therefore most
implementations treat it as a TCP connection and re-use the last send
information for a ip/port pair.
This change makes it more clear.
In domain-based routing and policy rules, suffix matching on hostnames is
often easier to express as a prefix match on reversed labels. A dedicated
converter makes this convenient with existing fetches and matchers.
This also has a performance benefit for large maps. Prefix string matches use
the prefix-tree index (PAT_MATCH_BEG with pat_idx_tree_pfx), while end matches
use the string-list index (PAT_MATCH_END with pat_idx_list_str), so
reversed-label lookups can avoid linear suffix scans.
This patch adds "reverse_dom", a string converter that reverses domain labels,
ignores one optional trailing dot on input, and rejects empty labels. It
intentionally leaves trailing-dot handling to the caller so configurations can
choose between exact matches, subdomain-only matches, or an explicit dotted
form built with "concat(.)" for prefix lookups.
Examples:
example.com -> com.example
mail.example.com -> com.example.mail
The documentation is updated and a reg-test covers the converter itself, the
explicit dotted form for "map_beg()", and the subdomain-only "-m beg" case.
Some use cases benefit from reversing a string before passing it to other
converters or lookups. While reverse_dom addresses domain-specific label
reversal, a generic byte-wise string reversal remains useful on its own and can
also be combined with other converters such as concat().
A common lookup use case is turning a suffix match on the original string into
a prefix match on the reversed string. Prefix string matches use the
prefix-tree index (PAT_MATCH_BEG with pat_idx_tree_pfx), while end matches use
the string-list index (PAT_MATCH_END with pat_idx_list_str), so reversing
before map_beg can avoid linear suffix scans for large maps.
This patch adds a new string converter named "reverse". It reverses the input
string byte by byte and returns the resulting string unchanged otherwise. It
does not apply any domain-specific semantics or character-encoding semantics.
The documentation is updated and a reg-test is added to cover the basic
conversion as well as a simple composition with concat(.).
This patch is part of a renaming affecting mux_quic and related files.
Naming "qcm" will become the new marker for common mux_quic stuffs,
shared both on QUIC and QMux protocols. The final objective is to limit
"qmux" naming for stuffs related to the new experimental protocol of the
same name.
The current patch renames mux_quic traces token to "qcm". This is the
only change with external visible impacts in the whole renaming series.
However, to preserve compatibility, trace source alias is defined with
the older name "qmux". This relies on the previous patch which
introduced "alias" new trace_source member.
Add a new "alias" member in trace_source structure. Its purpose is to be
an alternative to the member "name". This will be used in the next patch
to allow renaming of QUIC mux traces while preserving compatibility.
This new member is only used in trace_find_source() which is the helper
used to retrieve a trace source from its name.
This is a follow-up on the QUIC MUX renaming process.
The current patch performs renaming in xprt_qmux layer. Older "qstrm"
identifier is replaced by the new name "qmux". Every remaining functions
and structures in xprt_qmux are changed. Outside effects are only
present in QUIC MUX which directly uses some of these functions.
This is a follow-up on the QUIC MUX renaming process.
The current patch performs renaming of "qstrm" to "qmux" in connection
flags. These flags are only used in linked with the xprt_qmux layer.
This has an impact on every files which manipulates these flags, namely
backend, session and ssl_sock sources.
Also, internal xprt identifier is renamed from XPRT_QSTRM to XPRT_QMUX,
This is a follow-up on the QUIC MUX renaming process.
The current patch replaces "qstrm" naming with "qmux" in QUIC MUX source
file. Some members are impacted in qcc and qcs structures, as well as
some internal functions used for QMux receive/send. Internal mux_ops is
also rename to qmux_ops. This is not a breaking change as its externally
visible name was already set to "qmux" originally.
This is a follow-up on the QUIC MUX renaming process. Now most of "qmux"
generic usages as been replaced in favor of "qcm" naming. The next part
of the renaming is to replace "qstrm" naming with "qmux" for stuffs
related to the new QMux protocol specifically.
This is first applied on filenames. As with the previous renaming,
Makefile and include statements are updated as well to prevent
compilation issues.
This is a follow-up on the QUIC MUX renaming process.
This patch renames several definitions in QUIC MUX traces source code.
This is only an internal change. Trace source name is kept to "qmux" for
now, but will be changed in a dedicated patch.
This is a follow-up on the QUIC MUX renaming process.
A previous patch already renames MUX ops callbacks. The current patch
proceed to a similar operation for the rest of the MUX functions.
This is a follow-up on the QUIC MUX renaming process.
The current patch renames all MUX functions used as stream ops
callbacks. Also, internally defined mux_ops is also renamed from
"qmux_ops" to "quic_ops". There is no breaking change as mux_ops name
field remain set to "QUIC".
This patch is the first one of the renaming serie, affecting the QUIC
MUX module. The objective is to remove older "qmux" naming which was
used as a generic identifier. Now it should be restricted to the QMux
experimental protocol. A new "qcm" naming will replace the generic
usage.
The current patch renames the files themselves. Token "qmux" is replaced
by the new "qcm" identifier. Makefile and include statements are
adjusted as required.
This patch adds the missing include of xprt_qstrm header into its
companion source file. This helped to detect an incoherence in the
xprt_qstrm_xfer_rxbuf() prototype which is now fixed.
Header files is also updated with mandatory include statements and
forward declaration.
No backport needed.
Since http-client applet uses its own buffers, it is possible to have data
stuck in the applet input buffer while the http-client response buffer is
full, preventing the applet to consume these data. If this happens on the
last part of the response payload, the upper stream can decide to shut the
applet. In this case, the applet using the http client will not be able to
retrieve these last data because they will never be move into the hc
response buffer.
The main reason for this bug is that, for now, the applets cannot survive
the upper stream unlike multiplexers. It could be a good improvement for the
3.5. However, some applets still uses the stream-connector and the upper
stream (peer and stat applets for instance). So it is not an easy task.
In the mean time, to fix the issue on stable branches, the http-client
applet now stops to consume data when the hc response buffer is not empty.
This way, the applet shut will be deferred. Data will be consumed when they
can be fully moved in the httpclient response buffer.
This patch should fix the issue #3366. It must be backported to 3.3.
In dict_insert(), the read lock on d->rwlock was released before
incrementing the entry's refcount. Between the RDUNLOCK and the
HA_ATOMIC_INC, another thread could call dict_entry_unref() to drop
the refcount to zero, acquire the write lock, delete the entry from
the tree, and free it. The subsequent HA_ATOMIC_INC would then be a
use-after-free on freed memory.
The fix moves the HA_ATOMIC_INC inside the read lock, matching the
pattern used in stick_table.c for identical refcount-then-unlock
sequences.
It can be backported to the branches where this is relevant.
Some outdated comments, as well as typos were fixed in the following files:
dgram.h protocol.h queue-t.h cpu_topo.c debug.c dict.c
protocol.c queue.c raw_sock.c trace.c wdt.c
Add the syntax description, including the wildcard forms and the
note that <tgid> is currently parsed but ignored pending future
support for per-thread-group fd tables.
The "<tgid>/" and "/" wildcard forms previously produced no output.
This isn't a bug since they are new, but a script written for future
versions (where the slash form will gain per-thread-group semantics)
would not work the same on 3.4. Make them produce output by dropping
the redundant ctx->fd = -1 wildcard sentinel; also tighten tgid
validation to reject values <= 0.
The test doesn't need a real nameserver and in a isolated, restricted
test environment it might not be able to reach one at all, like with a
network sandbox. So lets just use 127.0.0.1:53. Even if there is none,
that's not a problem for this particular test.
Signed-off-by: Christian Ruppert <idl0r@qasl.de>
Tests may be excuted in sandboxed or minimalistic / restricted
environments, so incosistencies might cause trouble, like missing
permissions. So lets use the tmpdir variable instead, so the user might
define some path
Signed-off-by: Christian Ruppert <idl0r@qasl.de>
Cppcheck found the issue described in github #2124, which can cause these
errors if no CPUSET implementation is supported (and CPUSET_USE_ULONG is
not enabled):
src/cpuset.c:21:11: error: Found an exit path from function with non-void return type that has missing return statement [missingReturn]
src/cpuset.c:36:11: error: Found an exit path from function with non-void return type that has missing return statement [missingReturn]
src/cpuset.c💯1: error: Found an exit path from function with non-void return type that has missing return statement [missingReturn]
src/cpuset.c:124:1: error: Found an exit path from function with non-void return type that has missing return statement [missingReturn]
src/cpuset.c:152:1: error: Found an exit path from function with non-void return type that has missing return statement [missingReturn]
src/cpuset.c:163:1: error: Found an exit path from function with non-void return type that has missing return statement [missingReturn]
This can be backported.
Dmitry Sivachenko reported a build warning on FreeBSD -dev, where
__nonstring is apparently already defined. Let's guard our own
definition to avoid such issues. It could make sense to backport
this to recent stable versions which may soon be exposed to modern
compilers.
In regex_comp(), the error variable is either a const char* (USE_PCRE)
or a a uchar[] (USE_PCRE2), and navigating through the ifdefs is quite a
mess, making it hard to figure if it's always properly initialized when
printing an error message. Let's just preset it to NULL to clarify what
comes from where.
In task_schedule(), it is not enough to get the TASK_RUNNING flag before
setting the expire field, we also have to keep it while queueing the
taks, otherwise the task may run in the meanwhile and set expire to 0,
triggering the BUG_ON() in __task_queue() again. So now, only drop the
running flag once it's done.
This should be backported up to 2.8.
Latest commit 2dfbc311a8 ("BUG/MINOR: uri-auth: avoid leaks on
initialization error") left a possible null-deref case which was
surprisingly only detected by certain compiler combinations. No
backport needed.
In comp_prepare_compress_request(), the compression rate limit and CPU
usage checks were duplicated. The first set runs before selecting the
algorithm, and the second set runs after. That's definitely a copy-paste
issue or a patch being applied twice. Let's just drop one.
In http_re{q,s}_get_intercept_rule(), there are two occurrences of '&'
being used instead of '&&' which fortunately work thanks to the tests
being negations (hence 0/1 on each branch). Let's fix that and take this
opportunity for adding explicit precedence in http_apply_redirect_rule().
In h2_dump_h2s_info(), the tl.calls was being printed as signed instead
of unsigned, which is not correct but harmless (only used with "show
fd"). In the same function, we don't check if h2s->sd is valid while
dereferencing it. In practise it is valid since "show fd" is run under
thread isolation, but it's far from being obvious, and if conditions
would later change, we don't know it could be printed between h2s_new()
and h2s_frt_stream_new(). Finally in h2s_make_data() a wrong set of
H2_EV_RX_* flags were used instead of H2_EV_TX_* to emit traces.
In http_add_header() there are "type" variables of the same type at two
levels, which is a bit confusing. The inner one is for the "prev" block,
so let's rename it "ptype" by analogy with "pblk".
The definition of the PUBLISH message type indicates that the LSB are
independent, but uses a value of 0xF that clearly shows an attempt to
use a mask instead, but it results in all messages not having all flags
set to be rejected. A sane approach would have been to check for a mask
and an expected value. Let's just add a special case for it in function
mqtt_read_fixed_hdr() since that's for a single message type.
This can be backported anywhere.
In mqtt_parse_connack(), the MQTT_PROP_SUBSCRIPTION_IDENTIFIERS_AVAILABLE
case was checking and setting MQTT_FN_BIT_SUBSCRIPTION_IDENTIFIER instead
of MQTT_FN_BIT_SUBSCRIPTION_IDENTIFIERS_AVAILABLE, due to a copy-paste
mistake. This can be backported where needed.
In mqtt_parse_connect(), the MQTT_PROP_TOPIC_ALIAS_MAXIMUM case was checking
and setting MQTT_FN_BIT_TOPIC_ALIAS instead of MQTT_FN_BIT_TOPIC_ALIAS_MAXIMUM.
This means duplicate detection for Topic-Alias-Maximum property was using the
wrong bitmask, and the actual Topic-Alias-Maximum bit was never set, making
duplicate detection ineffective for this property. The CONNACK parser already
had this correct.
In mqtt_parse_connack(), the switch statement's default case for unknown
MQTT properties was using 'return 0' which returns MQTT_NEED_MORE_DATA.
This is misleading: an unknown property should be treated as an invalid
message (MQTT_INVALID_MESSAGE), like other functions do. This branches to
the "end" label without touching the preset return value instead. This can
be backported if needed.
In cfg_eval_cond_enabled(), the "KQUEUE" option incorrectly checks
GTUNE_USE_EPOLL instead of GTUNE_USE_KQUEUE. This is a copy-paste bug
from the preceding EPOLL case. It can be backported though it's harmless.
When the filter config (fconf) allocation fails in parse_cache_rule,
the previously allocated cache_flt_conf (cconf) and its strdup'd name
string are not freed. The error path only freed cconf but not
cconf->c.name, causing a memory leak.
No backport is needed.
When stats_add_scope() and stats_add_auth() fail to initialize a field,
they just leave a partially allocated and initialized structure behind
them that is leaked. The whole architecture doesn't provide clean
unrolling abilities since everything is shared and assigned unconditionally,
but let's at least release what was just allocated. The whole approach would
probably deserve being revisited if one day this becomes more dynamic.
No backport needed.
In userlist_postinit(), when an error occurs (missing group, missing user, or
allocation failure), the function returned immediately without freeing the
auth_groups_list linked lists that were built for all users in the first loop.
Each user's curuser->u.groups pointed to these allocated nodes, which leaked
on every error path.
Fix by replacing direct returns with a goto to a centralized cleanup label
that frees all users' groups lists before returning the error. Also fix a
trailing double space in one error return statement while refactoring.
Note that the impact is very low since we're supposed to fail to boo after
such errors.
When my_realloc2() fails in env_expand(), the code jumps to 'leave:' and
returns NULL, but the original input 'in' is never freed (it's only freed
at line 4919 in the success case). Given that callers typically pass it
the direct return of strdup(), it looks like it is expected to always be
freed. This can be backported everywhere.
action_http_set_status() checks for soft rewrite on the request message
by mistake instead of the response message. This could possibly cause a
rewrite failure when soft rewrite is enabled since it will not be seen
there, though the impact is extremely low. It can be backported.
smp_fetch_http_first_req() reads ->txn.http->flags without first
checking if txn.http is properly allocated. In theory if called from
the wrong context it could crash, even though tests where it's called
from "tcp-request content" don't seem to have any effect. Let's fix
it regardless, at least to dissipate the doubt. It can be backported
everywhere.
IPv6 addresses can be read enclosed in brackets, but the length of the
string is not checked before checking them. If by lack of luck, the
buffer is empty but already contains '[' in the first place, we'd read
the byte at position -1, possibly crashing (even though in practice it
will not since allocated blocks will be precedeed by the malloc meta-
data). At least it could make asan/valgrind unhappy.
This can be backported to all versions.
Since we can emit glitches during an H2 upgrade, we no longer have a
guaranteed h1s, so _h1_report_glitch() must check h1s before
dereferencing it. No backport is needed as this arrived in 3.4-dev11
with commit 72fd357814 ("MEDIUM: mux-h1: Return an error on h2 upgrade
attempts if not allowed").
Just a few minor user visible issues issues found in mux_h1 and http_htx
(traces, warnings and debug output). This may be backported though isn't
important at all.
This fixes typos and spelling mistakes in the following files:
channel-t.h channel.h filters-t.h http_htx.h htx-t.h tools.h
cfgcond.c channel.c flt_http_comp.c http_ana.c htx.c mqtt.c
mux_h1.c regex.c stats-proxy.c
Some functions such as tcp_parse_tcp_req() are able to emit their own
warnings by relying on warnif_misplaced_*() which directly prints the
warning. However when doing so they still increment the warning counter
which makes cfg_parse_listen() try to emit it, except that what's in the
variable is NULL, so we end up with:
[WARNING] (260) : config : parsing [/etc/haproxy/haproxy.cfg:17] : (null)
Let's just check the errmsg variable before printing the error. If it's
NULL, it's because the message was already printed.
This can be backported to all branches.
This lets lb_ops specify the conditions necessary to bind to this set of
ops. The condition is expressed as a list of mask and match fields on
the algorithm flags. This is then used in proxy_finalize() to locate the
lb_ops corresponding to the current configuration, by iterating over
the list of lb_ops structures. This list is implemented using the same
mechanisms used for configuration keywords: an INITCALL1 macro to a
registration function.
This also moves the lookup and property flags into the lb_ops structure
that were previously applied manually on a case by case basis.
Now the streams-elasticity limit applies to h2 frontend connections.
It allows to reduce the number of advertised streams based on the
number of concurrent connections.
This adds a new tune.streams-elasticity parameter. This parameter
indicates, as a percentage, the average number of streams per connection
at full load. It is used to calculate limits of the number of streams to
advertise on new connections. 0 means that no such limit is set.
When a limit is set, the new function conn_calc_max_streams() determines
the optimal number of streams to allow on a connection. It will assign at
least the ratio of streams left to connections left, and at least a fair
share of what's left times the number of desired streams. It will always
ensure that each connection gets at least 1 stream, and everything beyond
this will be evenly distributed. For now the function is not used.
In order to be able to enforce global streams limitations, we'll first
have to be able to account how many streams we promised to serve via
frontend muxes. We'll always need to support at least one stream, which
is why here we're only counting extra streams beyond the first one. It
also has the benefit of leaving H1 out of this, and save it from updating
a variable. Also in order to avoid an important update cost, we're storing
this value per thread group. For now only H2 is implemented, but QUIC
should follow shortly and should only count bidirectional streams.
In init_srv_requeue(), only attempt to run the tasklet if the server is
actually running, otherwise it will end up being queued a second time,
when the server is actually brought up, and that will lead to a
corrupted mt_list.
This can easily be reproduced by adding a dynamic server, as those start
disabled, and then enabling and disabling it a couple of times.
This should fix github issue #3360.
This should be backported up to 3.2.
expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Health check for server be4/srv4 failed.+reason: Layer4 connection problem.+info: \"Connection refused\".+check duration: [[:digit:]]+ms.+status: 0/1 DOWN."
expect ~ "[^:\\[ ]\\[${h1_pid}\\]: Health check for server be4/srv4 failed.+reason: Layer4 connection problem.+info: \"ECONNREFUSED returned by OS.*\".+check duration: [[:digit:]]+ms.+status: 0/1 DOWN."