SC_FL_ABRT_DONE flag should never be set when SC_FL_EOS was already
set. These both flags were introduced to replace the old CF_SHUTR and to
have a flag for shuts driven by the stream and a flag for the read0 received
by the mux. So both flags must not be seen at same time on a SC. It is
espeically important because some processing are performed when these flags
are set. And wrong decisions may be made.
This patch must be backproted as far as 2.8.
The first attempt to fix this issue (c672b2a29 "BUG/MINOR: http-ana:
Properly detect client abort when forwarding the response") was not fully
correct and could be responsible to false report of client abort during the
response forwarding. I guess it is possible to truncate the response.
Instead, we must also take care that the client closed on its side, by
checking SC_FL_EOS flag on the front SC. Indeed, if the client has aborted,
this flag should be set.
This patch should be backported as far as 2.8.
The RX_F_INHERITED flag was ambiguous, as it was used to mark both
listeners inherited from the parent process and listeners duplicated
from another local receiver. This could lead to incorrect behavior
concerning socket unbinding and suspension.
This commit refactors the handling of inherited listeners by splitting
the RX_F_INHERITED flag into two more specific flags:
- RX_F_INHERITED_FD: Indicates a listener inherited from the parent
process via its file descriptor. These listeners should not be unbound
by the master.
- RX_F_INHERITED_SOCK: Indicates a listener that shares a socket with
another one, either by being inherited from the parent or by being
duplicated from another local listener. These listeners should not be
suspended or resumed individually.
Previously, the sharding code was unconditionally using RX_F_INHERITED
when duplicating a file descriptor. In HAProxy versions prior to 3.1,
this led to a file descriptor leak for duplicated unix stats sockets in
the master process. This would eventually cause the master to crash with
a BUG_ON in fd_insert() once the file descriptor limit was reached.
This must be backported as far as 3.0. Branches earlier than 3.0 are
affected but would need a different patch as the logic is different.
Right now we don't get any state trace when receiving an RST_STREAM, and
this is not convenient because RST_STREAM(0) is not visible at all, except
in developer level because the function is entered and left.
Let's extract the RST code first and always log it using TRACE_PRINTF()
(along with h2c/h2s) so that it's possible to detect certain codes being
used.
The previous patch tried to fix access to QCS <sd> member, as the latter
is not always allocated anymore on the frontend side.
a15f0461a0
BUG/MEDIUM: h3: do not access QCS <sd> if not allocated
In particular, access was prevented after HEADERS parsing in case
h3_req_headers_to_htx() returned an error, which indicates that the
stream-endpoint allocation was not performed. However, this still is not
enough when QCS instance is already closed at this step. Indeed, in this
case, h3_req_headers_to_htx() returns OK but stream-endpoint allocation
is skipped as an optimization as no data exchange will be performed.
To definitely fix this kind of problems, add checks on qcs <sd> member
before accessing it in H3 layer. This method is the safest one to ensure
there is no NULL dereferencement.
This should fix github issue #3211.
This must be backported along the above mentionned patch.
- Convert additional cases to use the automatic alignment feature for
the THREAD_ALIGN(ED) macros. This includes some cases that are less
obviously correct where it seems we wanted to align only in the
USE_THREAD case but were not using the thread specific macros.
- Also move some alignment requirements to the structure definition
instead of having it on variable declaration.
- Use the automatic alignment feature instead of hardcoding 64 all over
the code.
- This also converts a few bare __attribute__((aligned(X))) to using the
ALIGNED macro.
Don't attempt to use stored sessions when creating new check
connections, as the check SSL parameters might be different from the
server's ones.
This has not been proven to be a problem yet, but it doesn't mean it
can't be, and this should be backported up to 2.8 along with
dcce936912 if it is.
When establishing check connections, do not store the negociated ALPN
into the server's path_param if the connection is a check connection, as
it may use different SSL parameters than the regular connections. To do
so, only store them if the CO_FL_SSL_NO_CACHED_INFO is not set.
Otherwise, the check ALPN may be stored, and the wrong mux can be used
for regular connections, which will end up generating 502s.
This should fix Github issue #3207
This should be backported to 3.3.
Add a new flag to connections, CO_FL_SSL_NO_CACHED_INFO, and set it for
checks.
It lets the ssl layer know that he should not use cached informations,
such as the ALPN as stored in the server, or cached sessions.
This wlil be used for checks, as checks may target different servers, or
used a different SSL configuration, so we can't assume the stored
informations are correct.
This should be backported to 3.3, and may be backported up to 2.8 if the
attempts to do session resume by checks is proven to be a problem.
Move the code that is responsible for checking the ALPN, and updating
the one stored in the server's path_param, from after we created the
mux, to after we did an handshake. Once we did it once, the mux will not
be created by the ssl code anymore, as when we know which mux to use
thanks to the ALPN, it will be done earlier in connect_server(), so in
the unlikely event it changes, we would not detect it anymore, and we'd
keep on creating the wrong mux.
This can be reproduced by doing a first request, and then changing the
ALPN of the server without haproxy noticing (ie without haproxy noticing
that the server went down).
This should be backported to 3.3.
In ticket #3204, it was reported that "show proc" is not able to display
more than 202 processes. Indeed the bufsize is 16k by default in the
master, and can't be changed anymore since 3.1.
This patch allows the 'show proc' to start again to dump when the buffer
is full, based on the timestamp of the last PID it attempted to dump.
Using pointers or count the number of processes might not be a good idea
since the list can change between calls.
Could be backported in all stable branche.
Since the following commit, allocation of QCS stream-endpoint on FE side
has been delayed. The objective is to allocate it only for QCS attached
to an upper stream object. Stream-endpoint allocation is now performed
on qcs_attach_sc() called during HEADERS parsing.
commit e6064c5616
OPTIM: mux-quic: delay FE sedesc alloc to stream creation
Also, stream-endpoint is accessed through the QCS instance after HEADERS
or DATA frames parsing, to update the known input payload length. The
above patch triggered regressions as in some code paths, <sd> field is
dereferenced while still being NULL.
This patch fixes this by restricting access to <sd> field after newer
conditions.
First, after HEADERS parsing, known input length is only updated if
h3_req_headers_to_htx() previously returned a success value, which
guarantee that qcs_attach_sc() has been executed.
After DATA parsing, <sd> is only accessed after the frame validity
check. This ensures that HEADERS were already parsed, thus guaranteing
that stream-endpoint is allocated.
This should fix github issue #3211.
This must be backported up to 3.3. This is sufficient, unless above
patch is backported to previous releases, in which case the current one
must be picked with it.
It is a very old bug (2012), dating from the introduction of the keep-alive
support to HAProxy. When a request is fully received, the SC on backend side
is switched to NOHALF mode. It means that when the read0 is received from
the server, the server connection is immediately closed. It is expected to
do so at the end of a classical request. However, it must not be performed
if the session is switched to the TUNNEL mode (after an HTTP/1 upgrade or a
CONNECT). The client may still have data to send to the server. And closing
brutally the server connection this way will be handled as an error on
client side.
This bug is especially visible when a H2 connection on client side because a
RST_STREAM is emitted and a "SD--" is reported in logs.
Thanks to @chrisstaite
This patch should fix the issue #3205. It must be backported to all stable
versions.
When per-stream "bytes_in" and "bytes_out" counters where replaced in 3.3,
the wrong counters were used for %B and %U values in logs. In the
configuration manual and the commit message, it was specificed that
"bytes_in" was replaced by "req_in" and "bytes_out" by "res_in", but in the
code, wrong counters were used. It is now fixed.
This patch should fix the issue #3208. It must be backported to 3.3.
Thanks to the previous patch, "BUG/MEDIUM: ssl: Don't reuse TLS session
if the connection's SNI differs", it is no useless to store the SNI of
cached TLS sessions. This SNI is no longer tested and new connections
reusing a session must have the same SNI.
The main change here is for the ssl_sock_set_servername() function. It is no
longer possible to compare the SNI of the reused session with the one of the
new connection. So, the SNI is always set, with no other processing. Mainly,
the session is not destroyed when SNIs don't match. It means the commit
119a4084bf ("BUG/MEDIUM: ssl: for a handshake when server-side SNI changes")
is implicitly reverted.
It is good to note that it is unclear for me when and why the reused session
should be destroyed. Because I'm unable to reproduce any issue fixed by the
commit above.
This patch could be backported as far as 3.0 with the commit above.
When a new SSL server connection is created, if no SNI is set, it is
possible to inherit from the one of the reused TLS session. The bug was
introduced by the commit 95ac5fe4a ("MEDIUM: ssl_sock: always use the SSL's
server name, not the one from the tid"). The mixup is possible between
regular connections but also with health-checks connections.
But it is only the visible part of the bug. If the SNI of the cached TLS
session does not match the one of the new connection, no reuse must be
performed at all.
To fix the bug, hash of the SNI of the reused session is compared with the
one of the new connection. The TLS session is reused only if the hashes are
the same.
This patch should fix the issue #3195. It must be slowly backported as far
as 3.0. it relies on the following series:
* MEDIUM: tcpcheck/backend: Get the connection SNI before initializing SSL ctx
* MINOR: connection/ssl: Store the SNI hash value in the connection itself
* MEDIUM: ssl: Store hash of the SNI for cached TLS sessions
* MINOR: ssl: Add a function to hash SNIs
* MEDIUM: quic: Add connection as argument when qc_new_conn() is called
* BUG/MINOR: ssl: Don't allow to set NULL sni
The SNI of a new connection is now retrieved earlier, before the
initialization of the SSL context. So, concretely, it is now performed
before calling conn_prepare(). The SNI is then set just after.
When a SNI is set on a new connection, its hash is now saved in the
connection itself. To do so, a dedicated field was added into the connection
strucutre, called sni_hash. For now, this value is only used when the TLS
session is cached.
This patch relies on the commit "MINOR: ssl: Store hash of the SNI for
cached TLS sessions". We now use the hash of the SNIs instead of the SNIs
themselves to know if we must update the cached SNI or not.
For cached TLS sessions, in addition to the SNI itself, its hash is now also
saved. No changes are expected here because this hash is not used for now.
This commit relies on:
* MINOR: ssl: Add a function to hash SNIs
This patch only adds the function ssl_sock_sni_hash() that can be used to
get the hash value corresponding to an SNI. A global seed, sni_hash_seed, is
used.
This patch reverts the commit efe60745b ("MINOR: quic: remove connection arg
from qc_new_conn()"). The connection will be mandatory when the QUIC
connection is created on backend side to fix an issue when we try to reuse a
TLS session.
So, the connection is again an argument of qc_new_conn(), the 4th
argument. It is NULL for frontend QUIC connections but there is no special
check on it.
ssl_sock_set_servername() function was documented to support NULL sni to
unset it. However, the man page of SSL_get_servername() does not mentionned
it is supported or not. And it is in fact not supported by WolfSSL and leads
to a crash if we do so.
For now, this function is never called with a NULL sni, so it better and
safer to forbid this case. Now, if the sni is NULL, the function does
nothing.
This patch could be backported to all stable versions.
This patch impacts both the QUIC frontends and listeners.
Note that "ssl-default-bind-ciphersuites", "ssl-default-bind-curves",
are not ignored by QUIC by the frontend. This is also the case for the
backends with "ssl-default-server-ciphersuites" and "ssl-default-server-curves".
These settings are set by ssl_sock_prepare_ctx() for the frontends and
by ssl_sock_prepare_srv_ssl_ctx() for the backends. But ssl_quic_initial_ctx()
first sets the default QUIC frontends (see <quic_ciphers> and <quic_groups>)
before these ssl_sock.c function are called, leading some TLS stack to
refuse them if they do not support them. This is the case for some OpenSSL 3.5
stack with FIPS support. They do not support X25519.
To fix this, set the default QUIC ciphersuites and curves only if not already
set by the settings mentioned above.
Rename <quic_ciphers> global variable to <default_quic_ciphersuites>
and <quic_groups> to <default_quic_curves> to reflect the OpenSSL API naming.
These options are taken into an account by ssl_quic_initial_ctx()
which inspects these four variable before calling SSL_CTX_set_ciphersuites()
with <default_quic_ciphersuites> as parameter and SSL_CTX_set_curves() with
<default_quic_curves> as parameter if needed, that is to say, if no ciphersuites
and curves were set by "ssl-default-bind-ciphersuites", "ssl-default-bind-curves"
as global options or "ciphersuites", "curves" as "bind" line options.
Note that the bind_conf struct is not modified when no "ciphersuites" or
"curves" option are used on "bind" lines.
On backend side, rely on ssl_sock_init_srv() to set the server ciphersuites
and curves. This function is modified to use respectively <default_quic_ciphersuites>
and <default_quic_curves> if no ciphersuites and curves were set by
"ssl-default-server-ciphersuites", "ssl-default-server-curves" as global options
or "ciphersuites", "curves" as "server" line options.
Thank to @rwagoner for having reported this issue in GH #3194 when using
an OpenSSL 3.5.4 stack with FIPS support.
Must be backported as far as 2.6
This is the same issue as the one fixed by this commit:
BUG/MINOR: quic-be: handshake errors without connection stream closure
But this time this is when the client has to send an alert to the server.
The fix consists in creating the mux after having set the handshake connection
error flag and error_code.
This bug was revealed by ssl/set_ssl_cafile.vtc reg test.
Depends on this commit:
MINOR: quic: avoid code duplication in TLS alert callback
Must be backported to 3.3
Both the OpenSSL QUIC API TLS alert callback ha_quic_ossl_alert() does exactly
the same thing than the one for quictls API, even if the parameter have different
types.
Call ha_quic_send_alert() quictls callback from ha_quic_ossl_alert OpenSSL
QUIC API callback to avoid such code duplication.
Traces were missing in this function.
Also add information about the connection struct from qc->conn when
initialized for all the traces.
Should be easily backported as far as 2.6.
This bug was revealed on backend side by reg-tests/ssl/del_ssl_crt-list.vtc when
run wich QUIC connections. As expected by the test, a TLS alert is generated on
servsr side. This latter sands a CONNECTION_CLOSE frame with a CRYPTO error
(>= 0x100). In this case the client closes its QUIC connection. But
the stream connection was not informed. This leads the connection to
be closed after the server timeout expiration. It shouls be closed asap.
This is the reason why reg-tests/ssl/del_ssl_crt-list.vtc could succeeds
or failed, but only after a 5 seconds delay.
To fix this, mimic the ssl_sock_io_cb() for TCP/SSL connections. Call
the same code this patch implements with ssl_sock_handle_hs_error()
to correctly handle the handshake errors. Note that some SSL counters
were not incremented for both the backends and frontends. After such
errors, ssl_sock_io_cb() start the mux after the connection has been
flagged in error. This has as side effect to close the stream
in conn_create_mux().
Must be backported to 3.3 only for backends. This is not sure at this time
if this bug may impact the frontends.
Such crashes may occur for QUIC frontends only when the SSL traces are enabled.
ssl_sock_switchctx_cbk() ClientHello callback may be called without any connection
initialize (<conn>) for QUIC connections leading to crashes when passing
conn->err_code to TRACE_ERROR().
Modify the TRACE_ERROR() statement to pass this parameter only when <conn> is
initialized.
Must be backported as far as 3.2.
As returned by Christian Ruppert in GH issue #3203, we're having an
issue with checks for empty args in skipped blocks: the check is
performed after the line is tokenized, without considering the case
where it's disabled due to outer false .if/.else conditions. Because
of this, a test like this one:
.if defined(SRV1_ADDR)
server srv1 "$SRV1_ADDR"
.endif
will fail when SRV1_ADDR is empty or not set, saying that this will
result in an empty arg on the line.
The solution consists in postponing this check after the conditions
evaluation so that disabled lines are already skipped. And for this
to be possible, we need to move "errptr" one level above so that it
remains accessible there.
This will need to be backported to 3.3 and wherever commit 1968731765
("BUG/MEDIUM: config: solve the empty argument problem again") is
backported. As such it is also related to GH issue #2367.
The keyword was correct in the doc but in the code it was spelled
with a missing 's' after 'settings', making it unavailable. Since
there was no other way to find this but reading the code, it's safe
to simply fix it and assume nobody relied on the wrong spelling.
In the worst case for older backports it can also be duplicated.
This must be backported to 3.0.
Extend QUIC server configuration so that congestion algorithm and
maximum window size can be set on the server line. This can be achieved
using quic-cc-algo keyword with a syntax similar to a bind line.
This should be backported up to 3.3 as this feature is considered as
necessary for full QUIC backend support. Note that this relies on the
serie of previous commits which should be picked first.
Extract code from bind_parse_quic_cc_algo() related to pure parsing of
quic-cc-algo keyword. The objective is to be able to quickly duplicate
this option on the server line.
This may need to be backported to support QUIC congestion control
algorithm support on the server line in version 3.3.
Each QUIC congestion algorithm is defined as a structure with callbacks
in it. Every quic_conn has a member pointing to the configured
algorithm, inherited from the bind-conf keyword or to the default CUBIC
value.
Convert all these definitions to const. This ensures that there never
will be an accidental modification of a globally shared structure. This
also requires to mark quic_cc_algo field in bind_conf and quic_cc as
const.
This reverts commit a6504c9cfb.
Each supported QUIC algo are associated with a set of callbacks defined
in a structure quic_cc_algo. Originally, bind_conf would use a constant
pointer to one of these definitions.
During pacing implementation, this field was transformed into a
dynamically allocated value copied from the original definition. The
idea was to be able to tweak settings at the listener level. However,
this was never used in practice. As such, revert to the original model.
This may need to be backported to support QUIC congestion control
algorithm support on the server line in version 3.3.
Because of missing "case" keyword in front of the values in a switch
case statement, the values were interpreted as goto tags and the switch
statement became useless.
This patch should fix GitHub issue #3200.
The fix should be backported up to 2.8.
As reported in github issue #3192, in certain situations with transparent
listeners, it is possible to get the incoming connection's destination
wrong via SO_ORIGINAL_DST. Two cases were identified thus far:
- incorrect conntrack configuration where NOTRACK is used only on
incoming packets, resulting in reverse connections being created
from response packets. It's then mostly a matter of timing, i.e.
whether or not the connection is confirmed before the source is
retrieved, but in this case the connection's destination address
as retrieved by SO_ORIGINAL_DST is the client's address.
- late outgoing retransmit that recreates a just expired conntrack
entry, in reverse direction as well. It's possible that combinations
of RST or FIN might play a role here in speeding up conntrack eviction,
as well as the rollover of source ports on the client whose new
connection matches an older one and simply refreshes it due to
nf_conntrack_tcp_loose being set by default.
TPROXY doesn't require conntrack, only REDIRECT, DNAT etc do. However
the system doesn't offer any option to know how a conntrack entry was
created (i.e. normally or via a response packet) to let us know that
it's pointless to check the original destination, nor does it permit
to access the local vs peer addresses in opposition to src/dst which
can be wrong in this case.
One alternate approach could consist in only checking SO_ORIGINAL_DST
for listening sockets not configured with the "transparent" option,
but the problem here is that our low-level API only works with FDs
without knowing their purpose, so it's unknown there that the fd
corresponds to a listener, let alone in transparent mode.
A (slightly more expensive) variant of this approach here consists in
checking on the socket itself that it was accepted in transparent mode
using IP_TRANSPARENT, and skip SO_ORIGINAL_DST if this is the case.
This does the job well enough (no more client addresses appearing in
the dst field) and remains a good compromise. A future improvement of
the API could permit to pass the transparent flag down the stack to
that function.
This should be backported to stable versions after some observation
in latest -dev.
For reference, here are some links to older conversations on that topic
that Lukas found during this analysis:
https://lists.openwall.net/netdev/2019/01/12/34https://discourse.haproxy.org/t/send-proxy-not-modifying-some-traffic-with-proxy-ip-port-details/3336/9https://www.mail-archive.com/haproxy@formilux.org/msg32199.htmlhttps://lists.openwall.net/netdev/2019/01/23/114
This reverts commit de29000e60.
The fix was in fact invalid. First it is not supprted by WolfSSL to call
SSL_set_tlsext_host_name with a hostname to NULL. Then, it is not specified
as supported by other SSL libraries.
But, by reviewing the root cause of this bug, it appears there is an issue
with the reuse of TLS sesisons. It must not be performed if the SNI does not
match. A TLS session created with a SNI must not be reused with another
SNI. The side effects are not clear but functionnaly speaking, it is
invalid.
So, for now, the commit above was reverted because it is invalid and it
crashes with WolfSSL. Then the init of the SSL connection must be reworked
to get the SNI earlier, to be able to reuse or not an existing TLS
session.
When a new SSL server connection is created, if no SNI is set, it is
possible to inherit from the one of the reused TLS session. The bug was
introduced by the commit 95ac5fe4a ("MEDIUM: ssl_sock: always use the SSL's
server name, not the one from the tid"). The mixup is possible between
regular connections but also with health-checks connections.
To fix the issue, when no SNI is set, for regular server connections and for
health-check connections, the SNI must explicitly be disabled by calling
ssl_sock_set_servername() with the hostname set to NULL.
Many thanks to Lukas for his detailed bug report.
This patch should fix the issue #3195. It must be backported as far as 3.0.
Replace BUG_ON() for buffer alloc failure on h3_resp_headers_to_htx() by
proper error handling. An error status is reported which should be
sufficient to initiate connection closure.
No need to backport.
h3_resp_headers_to_htx() is the function used to convert an HTTP/3
response into a HTX message. It was introduced on this release for QUIC
backend support.
A BUG_ON() would occur if multiple responses are forwarded
simultaneously on a stream without rcv_buf in between. Fix this by
removing it. Instead, if QCS HTX buffer is not empty when handling with
a new response, prefer to pause demux operation. This is restarted when
the buffer has been read and emptied by the upper stream layer.
No need to backport.
A recent patch has introduced free operation for QUIC tokens stored in a
server. These values are located in <per_thr> server array.
However, a server instance may be released prior to its full
initialization in case of a failure during "add server" CLI command. The
mentionned patch would cause a srv_drop() crash due to an invalid usage
of NULL <per_thr> member.
Fix this by adding a check on <per_thr> prior to dereference it in
srv_drop().
No need to backport.
If quic_connect_server() fails, quic_conn FD will remain unopened as set
to -1. Backend connections do not have a fallback socket for future
exchange, contrary to frontend one which can use the listener FD. As
such, it is better to release these connections early.
This patch adjusts such failure by extending quic_close(). This function
is called by the upper layer immediately after a connect issue. In this
case, release immediately a quic_conn backend instance if the FD is
unset, which means that connect has previously failed.
Also, quic_conn_release() is extended to ensure that such faulty
connections are immediately freed and not converted into a
quic_conn_closed instance.
Prior to this patch, a backend quic_conn without any FD would remain
allocated and possibly active. If its tasklet is executed, this resulted
in a crash due to access to an invalid FD.
No need to backport.
A recent patch has extended "show quic" capability. It is now possible
to list a specific list of connections, either active frontend, closing
frontend or backend connections.
An issue was introduced as the list is local storage. As this command is
reentrant, show quic context must be extended so that the currently
inspected list is also saved.
This issue was reported via GCC which mentions an uninitilized value
depending on branching conditions.
Add an extra "(B)" marker when displaying a backend connection during a
"show quic". This is useful to differentiate them with the frontend side
when displaying all connections.
Add a new "be" filter to "show quic". Its purpose is to be able to
display backend connections. These connections can also be listed using
"all" filter.
Each quic_conn instance is stored in a global list. Its purpose is to be
able to loop over all known connections during "show quic".
Split this into two separate lists for frontend and backend usage.
Another change is that closing backend connections do not move into
quic_conns_clo list. They remain instead in their original list. The
objective of this patch is to reduce the contention between the two
sides.
Note that this prevents backend connections to be listed in "show quic"
now. This will be adjusted in a future patch.
QUIC CIDs are stored in a global tree. Prior to this patch, CIDs used on
both frontend and backend sides were mixed together.
This patch implement CID storage separation between FE and BE sides. The
original tre quic_cid_trees is splitted as
quic_fe_cid_trees/quic_be_cid_trees.
This patch should reduce contention between frontend and backend usages.
Also, it should reduce the risk of random CID collision.
A recent patch has implemented caching of QUIC token received from a
NEW_TOKEN frame into the server cache. This value is stored per thread
into a <quic_retry_token> field.
This field is an ist, first set to an empty string. Via
qc_try_store_new_token(), it is reallocated to fit the size of the newly
stored token. Prior to this patch, the field was never freed so this
causes a memory leak.
Fix this by using istfree() on <quic_retry_token> field during
srv_drop().
No need to backport.
For QUIC client support, a token may be emitted along with INITIAL
packets during the handshake. The token is encoded during emission via
qc_enc_token() called by qc_build_pkt().
The token may be provided from different sources. First, it can be
retrieved via <retry_token> quic_conn member when a Retry packet was
received. If not present, a token may be reused from the server cache,
populated from NEW_TOKEN received from previous a connection.
Prior to this patch, the last method may cause an issue. If the upper
connection instance is released prior to the handshake completion, this
prevents access to a possible server token. This is considered an error
by qc_enc_token(). The error is reported up to calling functions,
preventing any emission to be performed. In the end, this prevented the
either the full quic_conn release or subsizing into quic_conn_closed
until the idle timeout completion (30s by default). With abortonclose
set now by default on HTTP frontends, early client shutdowns can easily
cause excessive memory consumption.
To fix this, change qc_enc_token() so that if connection is closed, no
token is encoded but also no error is reported. This allows to continue
emission and permit early connection release.
No need to backport.
All of the other bandwidth-limiting code stores limits and intermediate
(byte) counters as unsigned integers. The exception here is
freq_ctr_overshoot_period which takes in unsigned values but returns a
signed value. While this has the benefit of letting the caller know how
far away from overshooting they are, this is not currently leveraged
anywhere in the codebase, and it has the downside of halving the positive
range of the result.
More concretely though, returning a signed integer when all intermediate
values are unsigned (and boundaries are not checked) could result in an
overflow, producing values that are at best unexpected. In the case of
flt_bwlim (the only usage of freq_ctr_overshoot_period in the codebase at
the time of writing), an overflow could cause the filter to wait for a
large number of milliseconds when in fact it shouldn't wait at all.
This is a niche possibility, because it requires that a bandwidth limit is
defined in the range [2^31, 2^32). In this case, the raw limit value would
not fit into a signed integer, and close to the end of the period, the
`(elapsed * freq)/period` calculation could produce a value which also
doesn't fit into a signed integer.
If at the same time `curr` (the number of events counted so far in the
current period) is small, then we could get a very large negative value
which overflows. This is undefined behaviour and could produce surprising
results. The most obvious outcome is flt_bwlim sometimes waiting for a
large amount of time in a case where it shouldn't wait at all, thereby
incorrectly slowing down the flow of data.
Converting just the return type from signed to unsigned (and checking for
the overflow) prevents this undefined behaviour. It also makes the range
of valid values consistent between the input and output of
freq_ctr_overshoot_period and with the input and output of other freq_ctr
functions, thereby reducing the potential for surprise in intermediate
calculations: now everything supports the full 0 - 2^32 range.
A new server feature "sni-auto" has been introduced recently. The
objective is to automatically set the SNI value to the host header if no
SNI is explicitely set.
668916c1a2
MEDIUM: server/ssl: Base the SNI value to the HTTP host header by default
There is an issue with it : server SNI is currently always overwritten,
even if explicitely set in the configuration file. Adjust
check_config_validity() to ensure the default value is only used if
<sni_expr> is NULL.
This issue was detected as a memory leak on <sni_expr> was reported when
SNI is explicitely set on a server line.
This patch is related to github feature request #3081.
No need to backport, unless the above patch is.
The httpsclient_log_format variable lacks a few values in the TLS fields
that are now available as fetches.
On the backend side we have:
"%[fc_err]/%[ssl_fc_err,hex]/%[ssl_c_err]/%[ssl_c_ca_err]/%[ssl_fc_is_resumed] %[ssl_fc_sni]/%sslv/%sslc"
We now have enough sample fetches to have this equivalent in the
httpclient:
"%[bc_err]/%[ssl_bc_err,hex]/%[ssl_c_err]/%[ssl_c_ca_err]/%[ssl_bc_is_resumed] %[ssl_bc_sni]/%[ssl_bc_protocol]/%[ssl_bc_cipher]"
Instead of the current:
"%[bc_err]/%[ssl_bc_err,hex]/-/-/%[ssl_bc_is_resumed] -/-/-"
Please compiler with maybe-uninitialized warning
src/acme.c: In function ‘cli_acme_chall_ready_parse’:
include/haproxy/task.h:215:9: error: ‘ctx’ may be used uninitialized [-Werror=maybe-uninitialized]
215 | _task_wakeup(t, f, MK_CALLER(WAKEUP_TYPE_TASK_WAKEUP, 0, 0))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/acme.c:2903:17: note: in expansion of macro ‘task_wakeup’
2903 | task_wakeup(ctx->task, TASK_WOKEN_MSG);
| ^~~~~~~~~~~
src/acme.c:2862:26: note: ‘ctx’ was declared here
2862 | struct acme_ctx *ctx;
| ^~~
Backport to 3.2.
Improve the challenge_ready processing:
- do a lookup directly instead looping in the task tree
- only do a task_wakeup when every challenges are ready to avoid starting
the task and stopping it just after
- Compute the number of remaining challenge to setup
- Output a message giving the number of remaining challenges to setup
and if the task started again.
Backport to 3.2.
In case of the dns-01 challenge, it is possible to have a domain
"example.com" and "*.example.com" in the same request. This will create
2 different auth objects, which need 2 different challenges.
However the associated domain is "example.com" for both auth objects.
When doing a "challenge_ready", the algorithm will break at the first
domain found. But since you can have multiple time the same domain in
this case, breaking at the first one prevent to have all auth objects in
a ready state.
This patch just remove the break so we can loop on every auth objects.
Must be backported to 3.2.
Since the following commit, allocation of stream-endpoint has been
delayed. The objective is to allocate it only for QCS attached to an
upper stream object.
commit e6064c5616
OPTIM: mux-quic: delay FE sedesc alloc to stream creation
However, some MUX functions are unsafe as qcs->sd is dereferenced
without any check on it which will result in a crash. Fix this by
testing that qcs->sd is allocated before using it.
This does not need to be backported, unless the above patch is.
This bug impacts only the backends.
Recent commits have modified quic_rx_pkt_parse() for the QUIC backend to handle the
retry token, and version negotiation. This function is called for the quic_conn
even when is closing state (so for the quic_conn_closed struct). The quic_conn
struct and quic_conn_closed struct share some members thank to the leading
QUIC_CONN_COMMON struct. The recent modification impacts some members which do not
exist for the quic_connn_closed struct, leading to buffer overflows if modified.
For the backends only this patch:
1- silently drops the Retry packet (received/parsed only by backends)
2- silently drops the Initial packets received in closing state
This is safe for the Initial packets because in closing state the datagrams
are entirely skipped thanks to qc_rx_check_closing() in quic_dgram_parse().
No backport needed because the backend support arrived with the current dev.
On frontend side, a stream-endpoint is allocated on every qcs_new()
invokation. However, this is only used for bidirectional request
streams.
This patch delays stream-endpoint allocation to qcs_attach_sc(), just
prior the instantiation of the upper stream object. This does not bring
any behavior change but is a nice optimization.
On backend side, streams are instantiated prior to their QCS MUX
counterpart. Thus, QCS can reuse the stream-endpoint already allocated
with the streams, either on qmux_init() or attach operation.
However, a stream-endpoint is also always allocated in every qcs_new()
invokation. For backend QCS, it is thus overwritten on
qmux_init()/attach operation. This causes a memleak.
Fix this by restricting allocation of stream-endpoint only for frontend
connection.
This does not need to be backported.
A regression was introduced in the commit 2d7e3ddd4 ("BUG/MEDIUM: cli: do
not return ACKs one char at a time"). When the CLI is processing a command
line, we no longer send response immediately. It is especially useful for
clients sending a bunch of commands with very short response.
However, in that state, the CLI applet must state it has no more data to
deliver. Otherwise it will be woken up again and again because data are
found in its output buffer with no blocking conditions. In worst cases, if
the command rate is really high, this can trigger the watchdog.
This patch must be backported where the patch above is, so probably as far
as 3.0.
There was a mixup between read/send events and ability for an applet to
receive and send. The fix seems obvious by reading it. The call-rate must be
incremented when nothing was received from the applet while it was allowed
and nothing was sent to the applet while it was allowed.
This patch must be backported as far as 3.0.
The computed maxconn was only displayed in verbose or debug modes. This
is too bad because lots of users just don't know what they're starting
with and can be trapped when an environment changes. Let's use ha_notice()
instead of a conditional fprintf() so that it gets displayed right after
the other startup messages, hoping that users will get used to seeing it
and more easily spot anomalies. See github issue #3191 for more context.
The fix in commit 9481cef948 ("BUG/MEDIUM: connection: do not reinsert a
purgeable conn in idle list") is also needed for ssl_sock_io_cb() which
can also release an idle connection and must perform the same checks.
This fix must be backported to all stable versions containing the fix
above.
Connection struct defines an handle which can point to either a FD or a
quic_conn. On the latter case, CO_FL_FDLESS must be set. This is already
the case on frontend side.
This patch fixes QUIC backend support. Before setting connection handle
member to a quic_conn instance, ensure that CO_FL_FDLESS flag is set on
the connection.
Prior to this patch, crash can occur in "show sess all".
No need to backport.
quic_conn has a local_addr member which is used to store the connection
source address. On backend side, this member is initialized to NULL as
the address is not yet known prior to connect. With this patch,
quic_connect_server() is extended so that local_addr is updated after
connect() success.
Also, quic_sock_get_src() is completed for the backend side which now
returns local_addr member. This step is necessary to properly support
fetches bc_src/bc_src_port.
Since the commit 5003ac7fe ("MEDIUM: config: set useful ALPN defaults for
HTTPS and QUIC"), the ALPN is set by default to "h2,http/1.1" for HTTPS
listeners. However, it is in conflict with the forced mux protocol, if
any. Indeed, with "proto" keyword, the mux can be forced. In that case, some
combinations with the default ALPN will triggers connections errors.
For instance, by setting "proto h2", it will not be possible to use the H1
multiplexer. So we must take care to not advertise it in the ALPN. Worse,
since the commit above, most modern HTTP clients will try to use the H2
because it is advertised in the ALPN. By setting "proto h1" on the bind line
will make all the traffic rejected in error.
To fix the issue, and thanks to previous commits, if it is defined, we are
now relying on the ALPN defined by the mux protocol by default. The H1
multiplexer (only the one that can be forced) defines it to "http/1.1" while
the H2 multiplexer defines it to "h2". So by default, if one or another of
these muxes is forced, and if no ALPN is set, the mux ALPN is used.
Other multiplexers are not defining any default ALPN for now, because it is
useless. In addition, only the listeners are concerned because there is no
default ALPN on the server side.Finally, there is no tests performed if the
ALPN is forced on the bind line. It is the user responsibility to properly
configure his listeners (at least for now).
This patch depends on:
* MINOR: config: Do proto detection for listeners before checks about ALPN
* MINOR: muxes: Support an optional ALPN string when defining mux protocols
The series must be backported as far as 2.8.
The verification of any forced mux protocol, via the "proto" keyword, for
listeners is now performed before any tests on the ALPN. It will be
mandatory to be able to force the default ALPN, if not forced on the bind
line.
This patch will be mandatory for the next fix.
When a multiplexer protocol is defined, it is now possible to specify the
ALPN it supports, in binary format. This info is optionnal. For now only the
h2 and the h1 multiplexers define an ALPN because this will be mandatory for
a fix. But this could be used in future for different purpose.
This patch will be mandatory for the next fix.
In assign_server_and_queue(), there's a rare case when the server was
full, so we created a pendconn, another server was considered but in the
meanwhile the pendconn was unqueued already, so we just left the
function. We did so, however, while still holding the queue lock, which
will ultimately lead to a deadlock, and ultimately the watchdog would
kill the process.
To fix that, just unlock the queue before leaving.
This should be backported to 3.2.
When configuring an acme section with the 'map' keyword, the user must
use an existing map. If the map doesn't exist, a log will be emitted
when trying to add the challenge to the map.
This patch change the behavior by checking at startup if the map exists,
so haproxy would warn and won't start with a non-existing map.
This must be backported in 3.2.
Contrary to TCP, QUIC does not SSL_free() its SSL * object when its ->close()
XPRT callback is called. This has as side effect to trigger some BUG_ON(!conn)
with <conn> the connection from TLS callbacks registered at configuration
parsing time, so after this <conn> have been released.
This is the case for instance with ssl_sock_srv_verifycbk() whose role is to
add some checks to the built-in server certificate verification process.
This patch prevents the pointer to <conn> dereferencing inside several callbacks
shared between TCP and QUIC.
Thank you to @InputOutputZ for its report in GH #3188.
As the QUIC backend feature arrived with the current 3.3 dev, no need to backport.
As shown in github issue #3191, the error message shown when FD limits
are exceeded is not very useful as-is, since the current hard limit is
not displayed, and no suggestion is made about what to change in the
config. Let's explain about maxconn/ulimit-n/fd-hard-limit, suggest
dropping them or setting them to a context-based value at roughly 49%
of the current limit minus the known used FDs for listeners and checks.
This allows common "large" hard limits to report mostly round maxconns.
Example:
[ALERT] (25330) : [haproxy.main()] Cannot raise FD limit to 4001020,
current limit is 1024 and hard limit is 4096. You may prefer to let
HAProxy adjust the limit by itself; for this, please just drop any
'maxconn' and 'ulimit-n' from the global section, and possibly add
'fd-hard-limit' lower than this hard limit. You may also force a new
'maxconn' value that is a bit lower than half of the hard limit minus
listeners and checks. This results in roughly 1500 here.
It's always a pain to guess the number of FDs that can be needed by
listeners, checks, threads, pollers etc. We have this estimate in
global.maxsock before calling set_global_maxconn(), but we lose it
the line after. Let's copy it into global.est_fd_usage and keep it.
This will be helpful to try to provide more accurate suggestions for
maxconn.
This quic_conn ->xrpt_ctx is passed to qc_send_ppkts(), the quic_conn is retrieved
from this context to be used inside this function and it is not used at all
by this function.
This patch simply directly passes the quic_conn to qc_send_ppkts(). This is only
what this function needs.
On the frontend side, QUIC transfer can be performed either via a
connection owned FD or multiplex on the listener one. When a quic_conn
is freed and converted to quic_conn_closed instance, its FD if open is
closed and all exchanges are now multiplex via the listener FD.
This is different for the backend as connections only has the choice to
use their owned FD. Thus, special care care must be taken when freeing a
connection and converting it to a quic_conn_closed instance. In this
case, qc_release_fd() is delayed to the quic_conn_closed release.
Furthermore, when the FD is transferred, its iocb and owner fields are
updated to the new quic_conn_closed instance. Without it, a crash will
occur when accessing the freed quic_conn tasklet. A newly dedicated
handler quic_conn_closed_sock_fd_iocb is used to ensure access to
quic_conn_closed members only.
jobs is a global counter which serves to account activity through the
whole process. Soft-stop procedure will wait until this counter is
resetted to the nul value.
jobs is not used for backend connections. Thus, it is not incremented
when a QUIC backend connection is instantiated as expected. However,
decrement is performed on all sides during quic_conn_release(). This
causes the counter wrapping.
Fix this by decrementing jobs only for frontend connections. Without
this patch, soft stop procedure will hang indefinitely if QUIC backend
connections were in use.
Properly implement support for max-reuse server keyword. This is done by
adding a total count of streams seen for the whole connection. This
value is used in avail_streams callback.
When haproxy is compiled in -O0, the SSL_get_max_early_data() symbol is
used in the generated assembly, however -O2 seems to remove this symbol
when optimizing the code.
It happens because `if conn_is_back(conn)` and `if
(objt_listener(conn->target))` are opposed conditions, which mean we
never use the branch when objt_listener(conn->target) is true.
This patch removes the dead code. Bonus: SSL_get_max_early_data() is not
implemented in rustls, and that's the only thing preventing to start
with it.
This can be backported in every stable branches.
When trying to use the P-256 curve in the acme configuration with
OpenSSL 3.x, the generation of the account was failing because OpenSSL
doesn't return a NIST or SECG curve name, but a ANSI X9.62 one.
Since the ANSI X9.62 curve names were not in the list, it couldn't match
anything supported.
This patch fixes the issue by adding both prime192v1 and prime256v1 name
in the struct curve array which is used during curve parsing.
Must be backported to 3.2.
Since the new master-worker model in 3.1, signals are registered in
step_init_3(). However, those signals were supposed to be registered
only for the worker or the standalone mode. It would call the wrong
callback in the master even during configuration parsing.
The patch set the signals handler to NULL for the master so it does
nothing until they really are registered.
Must be backported as far as 3.1.
Since haproxy 3.1, the master-worker mode changed to let the worker
parse the configuration instead of the master.
Previously, signals were blocked during configuration parsing and
unblocked before entering the polling loop of the master. This way it
was impossible to start a reload during the configuration parsing.
But with the new model, the polling loop is started in the master before
the configuration parsing is finished, and the signals are still
unblocked at this step. Meaning that it is possible to start a reload
while the configuration is parsing.
This patch reintroduce the behavior of blocking the signals during
configuration parsing adapted to the new model:
- Before the exec() of the reload, signals are blocked.
- When entering the polling loop, the SIGCHLD is unblocked because it is
required to get a failure during configuration parsing in the worker
- Once the configuration is parsed, upon success in _send_status() or
upon failure in run_master_in_recovery_mode() every signals are unblocked.
This patch must be backported as far as 3.1.
Move the char *msg variable declared in main() in a sub-block since
there's already multiple msg variable in other sub-blocks in this
function.
Also make it const.
The QUIC backend crashes when its peer does not support 0-RTT. In this case,
when the sessions are reused, no early-data level secrets are derived by
the TLS stack. This leads to crashes from qc_send_mux() which does not suppose
that both early-data level (qc->eel) and application level (qc->ael) cipher levels
could be non initialized.
To fix this:
- prevent qc_send_mux() to send data if these two encryption level are not
intialized. In this case it returns QUIC_TX_ERR_NONE;
- avoid waking up the MUX from XPRT ->start() callback if the MUX is ready
but without early-data level secrets to send them;
- ensure the MUX is woken up by qc_ssl_do_handshake() after handshake completion
if it is ready calling qc_notify_send()
Thank you to @InputOutputZ for having reported this issue in GH #3188.
No need to backport because QUIC backends is a current 3.3 development feature.
There was no mworker-max-reload value by default, it was set to INT_MAX
so this was impossible to reach.
The default value is now 50, which is still high, but no workers should
undergo that much reloads. Meaning that a worker will be killed with
SIGTERM if it reach this much reloads.
Remove <ipv4> argument from qc_new_conn(). This parameter is unnecessary
as it can be derived from the family type of the addresses also passed
as argument.
The objective of this patch is to streamline qc_new_conn() usage so that
it is similar for frontend and backend sides.
Previously, several parameters were set only for frontend connections.
These arguments are replaced by a single quic_rx_packet argument, which
represents the INITIAL packet triggering the connection allocation on
the server side. For a QUIC client endpoint, it remains NULL. This usage
is consider more explicit.
As a minor change, <target> is moved as the first argument of the
function. This is considered useful as this argument determines whether
the connection is a frontend or backend entry.
Along with these changes, qc_new_conn() documentation has been reworded
so that it is now up-to-date with the newest usage.
When a new backend connection is instantiated, a CID is first randomly
generated. It will serve as the first DCID for incoming packets from the
server. Prior to this patch, if the generated CID caused a collision
with an other entries from another connection, an error is reported and
the connection cannot be allocated.
This patch improves this procedure by implementing retries when a
collision occurs. Now, at most three attemps will be performed before
giving up. This is the same procedure already performed for CIDs
instantiated after RETIRE_CONNECTION_ID frame parsing.
Along with this functional change, qc_new_conn() is refactored for
backend instantiation. The CID generation is extracted from it and the
value is passed as an argument. This is considered cleaner as the code
is more similar between frontend and backend sides.
quic_newcid_from_hash64 is an external callback. If defined, it serves
as a CID method generation, as an alternative to the default random
implementation.
This mechanism was not correctly implemented on the backend side.
Indeed, <hash64> quic_conn member is only setted for frontend
connections. The simplest solution would be to properly define it also
for backend ones. However, quic_newcid_from_hash64 derivation is really
only useful for the frontend side for now. Thus, this patch disables
using it on the backend side in favor of the default random generator.
To implement this, quic_cid_generate() is splitted in two functions, for
both methods of CIDs generation. This is the responsibility of the
caller to select the proper method. On backend side, only random
implementation is now used.
The shard keyword is already used by the peers and on the server lines. And
it is unrelated with the session keys distribution. So instead of talking
about shard for the session key hashing, we now use the term "bucket".
This bug impacts only the QUIC backends when haproxy is compiled against
OpenSSL 3.5 with QUIC API(HAVE_OPENSSL_QUIC).
The QUIC clients could not reuse their SSL session because the TLS tickets
received from the servers could not be provided to the TLS stack. This should
be done when the stack calls ha_quic_ossl_crypto_recv_rcd()
(OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_RECV_RCD callback).
According to OpenSSL team, an SSL_read() call must be done after the handshake
completion. It seems the correct location is at the same level as for
SSL_process_quic_post_handshake() for quictls.
Thank you to @mattcaswell, @Sashan and @vdukhovni for having helped in solving
this issue.
Must be backported to 3.1
This very minor issue impacts only the backend when compiled against OpenSSL 3.5
with QUIC API (HAVE_OPENSSL_QUIC).
The "SSL handshake OK" trace was not dumped by a TRACE() call. This was very
annoying when debugging.
Modify the concerned code section which is a bit ugly and simplify it.
The TRACE() call is done at a unique location for now on.
Should be backported to 3.2 to ease any further backport.