The QUIC code can only handle IPv4 or IPv6 addresses, so using two
sockaddr_storage structs wastes a lot of space in the quic_dgram struct.
This is a very large overhead since this structure is written in the MPSC
ring buffers before every datagram, while many of those datagrams are only
50 bytes or less. Using an union instead saves 200 bytes per datagram,
increasing the capacity of the buffers significantly.
Using an offset instead of a pointer into the datagram buffer is less
error-prone as we do not have to manually fixup that pointer when the
datagram is moved somewhere else in memory.
Use an MPSC ring buffer to hold data for each datagram handler. Holding
this data in a per-handler buffer avoids the HoL blocking we experienced
when we had per-listener buffers with data from all threads mixed up
in them.
This also gets rid of the mt_list contention we were suffering before,
that was causing some threads to be stuck for a significant amount of
time, causing warnings and even crashes in some cases.
This patch refactors parsing in QUIC frame module. Function
qc_parse_frm() has been splitted in three :
* qc_parse_frm_type()
* qc_parse_frm_pkt()
* qc_parse_frm_payload()
No functional change. The main objective of this patch is to facilitate
a QMux implementation. One of the gain is the ability to manipulate QUIC
frames without any reference to a QUIC packet as it is irrelevant for
QMux. Also, quic_set_connection_close() calls are extracted as this
relies on qc type. The caller is now responsible to set the required
error code.
Most calls to mux ops were instrumented with a CALL_MUX_WITH_RET() or
CALL_MUX_NO_RET() macro in order to make the current thread's context
point to the called mux and be able to track its allocations. Only
a bunch of harmless mux_ctl() and ->subscribe/unsubscribe calls were
left untouched since useless. But destroy/detach/shut/init/snd_buf
and rcv_buf are now tracked.
It will not show allocations performed in IO callback via tasklet
wakeups however.
In order to ease reading of the output, cmp_memprof_ctx() knows about
muxes and sorts based on the .subscribe function address instead of
the mux_ops address so as to keep various callers grouped.
The QUIC MUX layer is closed after its transport counterpart. This may
be necessary then to reject any new streams opened by the remote peer.
This operation is dependent however from the application protocol.
Previously, a function qc_h3_request_reject() was directly implemented
in quic_conn source file for use when HTTP/3 was previously negotiated.
However, this solution was not evolutive and broke layering.
This patch introduces a new proper separation with a <strm_reject>
callback defined in quic_conn structure. When set, it will be used to
preemptively close any new stream. QUIC MUX is responsible to set it
just before its closure.
No functional change. This patch is purely a refactoring with a better
architecture design. Especially, H3 specific code from transport layer
is now completely removed.
There's something a bit awkward in the way stats counters are inherited
through the QUIC modules: quic_conn-t includes quic_stats-t.h, which
declares quic_stats_module as extern from a type that's not known from
this file. And anyway externs should not be exported from type defintions
since they're not part of the ABI itself.
This commit moves the declaration to quic_stats.h which now takes care
to include stats-t.h to get the definition of struct stats_module. The
few users who used to learn it through quic_conn-t.h now include it
explicitly. As a bonus this reduces the number of preprocessed lines
by 5000 (~0.1%).
By the way, it looks like struct stats_module could benefit from being
moved off stats-t.h since it's only used at places where the rest of
the stats is not needed. Maybe something to consider for a future
cleanup.
half_open_conn is a proxy counter used to account for quic_conn in
half-open state : this represents a connection whose address is not yet
validated (handshake successful, or via token validation).
This counter only has sense for the frontend side. Currently, code is
safe as access is only performed if quic_conn is not yet flagged with
QUIC_FL_CONN_PEER_VALIDATED_ADDR, which is always set for backend
connections.
To better reflect this, add a BUG_ON() when half_open_conn is
incremented/decremented to ensure this never occurs for backend
connections.
When a duplicated CRYPTO frame is received during handshake, a server
may consider that there was a packet loss and immediately retransmit its
pending CRYPTO data without having to wait for PTO expiration. However,
RFC 9002 indicates that this should only be performed at most once per
connection to avoid excessive packet transmission.
QUIC connection is flagged with QUIC_FL_CONN_HANDSHAKE_SPEED_UP to mark
that a fast retransmit has been performed. However, during the
refactoring on CRYPTO handling with the storage conversion from ncbuf to
ncbmbuf, the check on the flag was accidentely removed. The faulty patch
is the following one :
commit f50425c021
MINOR: quic: remove received CRYPTO temporary tree storage
This patch adds again the check on QUIC_FL_CONN_HANDSHAKE_SPEED_UP
before initiating fast retransmit. This ensures this is only performed
once per connection.
This must be backported up to 3.3.
This patch reverts the commit efe60745b ("MINOR: quic: remove connection arg
from qc_new_conn()"). The connection will be mandatory when the QUIC
connection is created on backend side to fix an issue when we try to reuse a
TLS session.
So, the connection is again an argument of qc_new_conn(), the 4th
argument. It is NULL for frontend QUIC connections but there is no special
check on it.
This bug was revealed on backend side by reg-tests/ssl/del_ssl_crt-list.vtc when
run wich QUIC connections. As expected by the test, a TLS alert is generated on
servsr side. This latter sands a CONNECTION_CLOSE frame with a CRYPTO error
(>= 0x100). In this case the client closes its QUIC connection. But
the stream connection was not informed. This leads the connection to
be closed after the server timeout expiration. It shouls be closed asap.
This is the reason why reg-tests/ssl/del_ssl_crt-list.vtc could succeeds
or failed, but only after a 5 seconds delay.
To fix this, mimic the ssl_sock_io_cb() for TCP/SSL connections. Call
the same code this patch implements with ssl_sock_handle_hs_error()
to correctly handle the handshake errors. Note that some SSL counters
were not incremented for both the backends and frontends. After such
errors, ssl_sock_io_cb() start the mux after the connection has been
flagged in error. This has as side effect to close the stream
in conn_create_mux().
Must be backported to 3.3 only for backends. This is not sure at this time
if this bug may impact the frontends.
QUIC CIDs are stored in a global tree. Prior to this patch, CIDs used on
both frontend and backend sides were mixed together.
This patch implement CID storage separation between FE and BE sides. The
original tre quic_cid_trees is splitted as
quic_fe_cid_trees/quic_be_cid_trees.
This patch should reduce contention between frontend and backend usages.
Also, it should reduce the risk of random CID collision.
This bug impacts only the backends.
Recent commits have modified quic_rx_pkt_parse() for the QUIC backend to handle the
retry token, and version negotiation. This function is called for the quic_conn
even when is closing state (so for the quic_conn_closed struct). The quic_conn
struct and quic_conn_closed struct share some members thank to the leading
QUIC_CONN_COMMON struct. The recent modification impacts some members which do not
exist for the quic_connn_closed struct, leading to buffer overflows if modified.
For the backends only this patch:
1- silently drops the Retry packet (received/parsed only by backends)
2- silently drops the Initial packets received in closing state
This is safe for the Initial packets because in closing state the datagrams
are entirely skipped thanks to qc_rx_check_closing() in quic_dgram_parse().
No backport needed because the backend support arrived with the current dev.
Remove <ipv4> argument from qc_new_conn(). This parameter is unnecessary
as it can be derived from the family type of the addresses also passed
as argument.
The objective of this patch is to streamline qc_new_conn() usage so that
it is similar for frontend and backend sides.
Previously, several parameters were set only for frontend connections.
These arguments are replaced by a single quic_rx_packet argument, which
represents the INITIAL packet triggering the connection allocation on
the server side. For a QUIC client endpoint, it remains NULL. This usage
is consider more explicit.
As a minor change, <target> is moved as the first argument of the
function. This is considered useful as this argument determines whether
the connection is a frontend or backend entry.
Along with these changes, qc_new_conn() documentation has been reworded
so that it is now up-to-date with the newest usage.
quic_newcid_from_hash64 is an external callback. If defined, it serves
as a CID method generation, as an alternative to the default random
implementation.
This mechanism was not correctly implemented on the backend side.
Indeed, <hash64> quic_conn member is only setted for frontend
connections. The simplest solution would be to properly define it also
for backend ones. However, quic_newcid_from_hash64 derivation is really
only useful for the frontend side for now. Thus, this patch disables
using it on the backend side in favor of the default random generator.
To implement this, quic_cid_generate() is splitted in two functions, for
both methods of CIDs generation. This is the responsibility of the
caller to select the proper method. On backend side, only random
implementation is now used.
This bug impacts only the QUIC clients (or backends). The version negotiation
was not supported at all for them. This is an oversight.
Contrary to the QUIC server which choose the negotiated version after having
received the transport parameters (into ClientHello message) the client selects
the negotiated version from the first Initial packet version field. Indeed, the
server transport parameters are inside the ServerHello messages ciphered
into Handshake packets.
This non intrusive patch does not impact the QUIC server implementation.
It only selects the negotiated version from the first Initial packet
received from the server and consequently initializes the TLS cipher context.
Thank you to @InputOutputZ for having reporte this issue in GH #3178.
No need to backport because the QUIC backends support arrives with 3.3.
Add a per thread ist struct to srv_per_thread struct to store the QUIC token to
be reused for subsequent sessions.
Parse at packet level (from qc_parse_ptk_frms()) these tokens and store
them calling qc_try_store_new_token() newly implemented function. This is
this new function which does its best (may fail) to update the tokens.
Modify qc_do_build_pkt() to resend these tokens calling quic_enc_token()
implemented by this patch.
At this time the connection migration is not supported by QUIC backends.
This patch prevents this process to be launched for connections to QUIC backends.
Furthermore, the connection migration process could be started systematically
when connecting a backend to INADDR_ANY, leading to crashes into qc_handle_conn_migration()
(when referencing qc->li).
Thank you to @InputOutputZ for having reported this issue in GH #3178.
This patch simply checks the connection type (listener or not) before checking if
a connection migration must be started.
No need to backport because support for QUIC backends is available from 3.3.
On Initial packet parsing, a new quic_conn instance is allocated via
qc_new_conn(). Then a CID is allocated with its value derivated from
client ODCID. On CID tree insert, a collision can occur if another
thread was already parsing an Initial packet from the same client. In
this case, the connection is released and the packet will be requeued to
the other thread.
Originally, CID collision check was performed prior to quic_conn
allocation. This was changed by the commit below, as this could cause
issue on quic_conn alloc failure.
commit 4ae29be18c
BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr()
However, this procedure is less optimal. Indeed, qc_new_conn() performs
many steps, thus it could be better to skip it on Initial CID collision,
which can happen frequently. This patch restores the older order of
operations, with CID collision check prior to quic_conn allocation.
To ensure this does not cause again the same bug, the CID is removed in
case of quic_conn alloc failure. This should prevent any loop as it
ensures that a CID found in the global tree does not point to a NULL
quic_conn, unless if CID is attach to a foreign thread. When this thread
will parse a re-enqueued packet, either the quic_conn is already
allocated or the CID has been removed, triggering a fresh CID and
quic_conn allocation procedure.
CIDs are provided by haproxy so that the peer can use them as DCID of
its packets. Their value is set via a random generator. It happens on
several occasions during connection lifetime:
* via ODCID derivation if haproxy is the server
* on quic_conn init if haproxy is the client
* during post-handshake if haproxy is the server
* on RETIRE_CONNECTION_ID frame parsing
CIDs are stored in a global tree. On ODCID derivation, a check is
performed to ensure the CID is not a duplicate value. This is mandatory
to properly handle multiple INITIAL packets from the same client on
different thread.
However, for the other cases, no check is performed for CID collision.
As _quic_cid_insert() is silent, the issue is not detected at all. This
results in a CID advertized to the peer but not stored in the global
one. In the end, this may cause two issues. The first one is that
packets from the client which use the new CID will be rejected by
haproxy, most probably with a STATELESS_RESET. The second issue is that
it can cause a crash during quic_conn release. Indeed, the CID is stored
in the quic_conn local tree and thus eb_delete() for the global tree
will be performed. As <leaf_p> member is uninit, this results in a
segfault.
Note that this issue is pretty rare. It can only be observed if running
with a high number of concurrent connections in parallel, so that the
random generator will provide duplicate values. Patch is still labelled
as MEDIUM as this modifies code paths used frequently.
To fix this, _quic_cid_insert() unsafe function is completely removed.
Instead, quic_cid_insert() can be used, which reports an error code if a
collision happens. CID are then stored in the quic_conn tree only after
global tree insert success. Here is the solution for each steps if a
collision occurs :
* on init as client: the connection is completely released
* post-handshake: the CID is immediately released. The connection is
kept, but it will miss an extra CID.
* on RETIRE_CONNECTION_ID parsing: a loop is implemented to retry random
generation. It it fails several times, the connection is closed in
error.
A small convenience change is made to quic_cid_insert(). Output
parameter <new_tid> can now be NULL, which is useful as most of the
times caller do not care about it.
This must be backported up to 2.6.
Split new_quic_cid() function into multiple ones. This patch should not
introduce any visible change. The objective is to render CID allocation
and generation more modular.
The first advantage of this patch is to bring code simplication. In
particular, conn CID sequence number increment and insertion into
connection tree is simpler than before. Another improvment is also that
errors could now be handled easier at each different steps of the CID
init.
This patch is a prerequisite for the fix on CID collision, thus it must
be backported prior to it to every affected version.
During RETIRE_CONNECTION_ID frame parsing, a new connection ID is
immediately reallocated after the release of the previous one. This is
done to ensure that the peer will never run out of DCID.
Prior to this patch, a CID allocation failure was be silently ignored.
This prevent the emission of a new CID, which could prevent the peer to
emit packets if it had no other CIDs available for use. Now, such error
is considered fatal to the connection. This is the safest solution as
it's better to close connections when memory is running low.
It must be backported up to 2.8.
This patch removes <mux_state> field from quic_conn structure. The
purpose of this field was to indicate if MUX layer above quic_conn is
not yet initialized, active, or already released.
It became tedious to properly set it as initialization order of the
various quic_conn/conn/MUX layers now differ between the frontend and
backend sides, and also depending if 0-RTT is used or not. Recently, a
new change introduced in connect_server() will allow to initialize QUIC
MUX earlier if ALPN is cached on the server structure. This had another
level of complexity.
Thus, this patch removes <mux_state> field completely. Instead, a new
flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set at a single place
only on close XPRT callback invokation. It can be mixed with the new
utility functions qc_wait_for_conn()/qc_is_conn_ready() to determine the
status of conn/MUX layers now without an extra quic_conn field.
This patch is similar to the previous one, this time dealing with
qc_new_conn(). This function was asymetric on frontend and backend side,
as connection argument was set only in the latter case.
This was required prior due to qc_alloc_ssl_sock_ctx() signature. This
has changed with the previous patch, thus qc_new_conn() can also be
realigned on both FE and BE sides. <conn> member of quic_conn instance
is always set outside it, in qc_xprt_start() on the backend case.
A QUIC global tune setting is defined to be able to force Retry emission
prior to handshake. By definition, this ability is only supported by
QUIC servers, hence it is a frontend option only.
Rename the option to use "fe" prefix. The old option name is deprecated
and will be removed in 3.5
Various settings can be configured related to QUIC congestion controler.
This patch duplicates them to be able to set independent values on
frontend and backend sides.
As with previous patch, option are renamed to use "fe/be" unified
prefixes. This is part of the current serie of commits which unify QUIC
settings. Older options are deprecated and will be removed on 3.5
release.
The previous commit switch from ncbuf to ncbmbuf as storage for received
CRYPTO frames. The latter ensures that buffering of such frames cannot
fail anymore due to gaps size.
Previously, extra mechanism were implemented on QUIC frames parsing
function to overcome the limitation of ncbuf on gaps size. Before
insertion, CRYPTO frames were stored in a temporary tree to order their
insertion. As this is not necessary anymore, this commit removes the
temporary tree insertion.
This commit is closely associated to the previous bug fix. As it
provides a neat optimization and code simplication, it can be backported
with it, but not in the next immediate release to spot potential
regression.
In QUIC, TLS handshake messages such as ClientHello are encapsulated in
CRYPTO frames. Each QUIC implementation can split the content in several
frames of random sizes. In fact, this feature is now used by several
clients, based on chrome so-called "Chaos protection" mechanism :
https://quiche.googlesource.com/quiche/+/cb6b51054274cb2c939264faf34a1776e0a5bab7
To support this, haproxy uses a ncbuf storage to store received CRYPTO
frames before passing it to the SSL library. However, this storage
suffers from a limitation as gaps between two filled blocks cannot be
smaller than 8 bytes. Thus, depending on the size of received CRYPTO
frames and their order, ncbuf may not be sufficient. Over time, several
mechanisms were implemented in haproxy QUIC frames parsing to overcome
the ncbuf limitation.
However, reports recently highlight that with some clients haproxy is
not able to deal with CRYPTO frames reception. In particular, this is
the case with the latest ngtcp2 release, which implements a similar
chaos protection mechanism via the following patch. It also seems that
this impacts haproxy interaction with firefox.
commit 89c29fd8611d5e6d2f6b1f475c5e3494c376028c
Author: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Date: Mon Aug 4 22:48:06 2025 +0900
Crumble Client Initial CRYPTO (aka chaos protection)
To fix haproxy CRYPTO frames buffering once and for all, an alternative
non-contiguous buffer named ncbmbuf has been recently implemented. This
type does not suffer from gaps size limitation, albeit at the cost of a
small reduction in the size available for data storage.
Thus, the purpose of this current patch is to replace ncbuf with the
newer ncbmbuf for QUIC CRYPTO frames parsing. Now, ncbmb_add() is used
to buffer received frames which is guaranteed to suceed. The only
remaining case of error is if a received frame offset and length exceed
the ncbmbuf data storage, which would result in a CRYPTO_BUFFER_EXCEEDED
error code.
A notable behavior change when switching to ncbmbuf implementation is
that NCB_ADD_COMPARE mode cannot be used anymore during add. Instead,
crypto frame content received at a similar offset will be overwritten.
A final note regarding STREAM frames parsing. For now, it is considered
unnecessary to switch from ncbuf in this case. Indeed, QUIC clients does
not perform aggressive fragmentation for them. Keeping ncbuf ensure that
the data storage size is bigger than the equivalent ncbmbuf area.
This should fix github issue #3141.
This patch must be backported up to 2.6. It is first necessary to pick
the relevant commits for ncbmbuf implementation prior to it.
The ->li (struct listener *) member of quic_conn struct was replaced by a
->target (struct obj_type *) member by this commit:
MINOR: quic-be: get rid of ->li quic_conn member
to abstract the connection type (front or back) when implementing QUIC for the
backends. In these cases, ->target was a pointer to the ojb_type of a server
struct. This could not work with the dynamic servers contrary to the listeners
which are not dynamic.
This patch almost reverts the one mentioned above. ->target pointer to obj_type member
is replaced by ->li pointer to listener struct member. As the listener are not
dynamic, this is easy to do this. All one has to do is to replace the
objt_listener(qc->target) statement by qc->li where applicable.
For the backend connection, when needed, this is always qc->conn->target which is
used only when qc->conn is initialized. The only "problematic" case is for
quic_dgram_parse() which takes a pointer to an obj_type as third argument.
But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function
it is used to access the proxy counters of the connection thanks to qc_counters().
So, this obj_type argument may be null for now on with this patch. This is the
reason why qc_counters() is modified to take this into consideration.
This fix follows this previous one:
BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
which is not sufficient when a client fragments and mixes its CRYPTO frames AND
leaveswith holes by packets. ngtcp2 (and perhaps chrome) splits theire CRYPTO
frames but without hole by packet. In such a case, the CRYPTO parsing leads to
QUIC_RX_RET_FRM_AGAIN errors which cannot be fixed when the peer resends its packets.
Indeed, even if the peer resends its frames in a different order, this does not
help because since the previous commit, the CRYPTO frames are ordered on haproxy side.
This issue was detected thanks to the interopt tests with quic-go as client. This
client fragments its CRYPTO frames, mixes them, and generate holes, and most of
the times with the retry test.
To fix this, when a QUIC_RX_RET_FRM_AGAIN error is encountered, the CRYPTO frames
parsing is not stop. This leaves chances to the next CRYPTO frames to be parsed.
Must be backported as far as 2.6 as the commit mentioned above.
Since this commit:
BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
when they are parsed, the CRYPTO frames are ordered by their offsets into an ebtree.
Then their data are provided to the ncbufs.
But in case of error, when qc_handle_crypto_frm() returns QUIC_RX_RET_FRM_FATAL
or QUIC_RX_RET_FRM_AGAIN), they remain attached to their tree. Then
from <err> label, they are deteleted and deleted (with a while(node) { eb_delete();
qc_frm_free();} loop). But before this loop, these statements directly
free the frame without deleting it from its tree, if this is a CRYPTO frame,
leading to a use after free when running the loop:
if (frm)
qc_frm_free(qc, &frm);
This issue was detected by the interop tests, with quic-go as client. Weirdly, this
client sends CRYPTO frames by packet with holes.
Must be backported as far as 2.6 as the commit mentioned above.
This modification should have arrived with this commit:
MINOR: quic: remove ->offset qf_crypto struct field
Since this commit, the CRYPTO offset node key assignment is done at parsing time
when calling qc_parse_frm() from qc_parse_pkt_frms().
This useless assigment has been reported in GH #3095 by coverity.
This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
This patch follows this previous bug fix:
BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
where a ebtree node has been added to qf_crypto struct. It has the same
meaning and type as ->offset_node.key field with ->offset_node an eb64tree node.
This patch simply removes ->offset which is no more useful.
This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
This issue impacts the QUIC listeners. It is the same as the one fixed by this
commit:
BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO
As chrome, ngtcp2 client decided to fragment its CRYPTO frames but in a much
more agressive way. This could be fixed with a list local to qc_parse_pkt_frms()
to please chrome thanks to the commit above. But this is not sufficient for
ngtcp2 which often splits its ClientHello message into more than 10 fragments
with very small ones. This leads the packet parser to interrupt the CRYPTO frames
parsing due to the ncbuf gap size limit.
To fix this, this patch approximatively proceeds the same way but with an
ebtree to reorder the CRYPTO by their offsets. These frames are directly
inserted into a local ebtree. Then this ebtree is reused to provide the
reordered CRYPTO data to the underlying ncbuf (non contiguous buffer). This way
there are very few less chances for the ncbufs used to store CRYPTO data
to reach a too much fragmented state.
Must be backported as far as 2.6.
This will make the pools size and alignment automatically inherit
the type declaration. It was done like this:
sed -i -e 's:DECLARE_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_POOL src addons)
sed -i -e 's:DECLARE_STATIC_POOL(\([^,]*,[^,]*,\s*\)sizeof(\([^)]*\))):DECLARE_STATIC_TYPED_POOL(\1\2):g' $(git grep -lw DECLARE_STATIC_POOL src addons)
81 replacements were made. The only remaining ones are those which set
their own size without depending on a structure. The few ones with an
extra size were manually handled.
It also means that the requested alignments are now checked against the
type's. Given that none is specified for now, no issue is reported.
It was verified with "show pools detailed" that the definitions are
exactly the same, and that the binaries are similar.
Previously quic_conn <target> member was used to determine if quic_conn
was used on the frontend (as server) or backend side (as client). A new
helper function can now be used to directly check flag
QUIC_FL_CONN_IS_BACK.
This reduces the dependency between quic_conn and their relative
listener/server instances.
Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to
objt_listener() (resp. objt_server()).
Remove qc_is_listener() implement and QUIC_FL_CONN_LISTENER the flag it
relied on.
- Add ->retry_token and ->retry_token_len new quic_conn struct members to store
the retry tokens. These objects are allocated by quic_rx_packet_parse() and
released by quic_conn_release().
- Add <pool_head_quic_retry_token> new pool for these tokens.
- Implement quic_retry_packet_check() to check the integrity tag of these tokens
upon RETRY packets receipt. quic_tls_generate_retry_integrity_tag() is called
by this new function. It has been modified to pass the address where the tag
must be generated
- Add <resend> new parameter to quic_pktns_discard(). This function is called
to discard the packet number spaces where the already TX packets and frames are
attached to. <resend> allows the caller to prevent this function to release
the in flight TX packets/frames. The frames are requeued to be resent.
- Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon
such packets receipt is:
- store the retry token,
- store the new peer SCID as the DCID of the connection. Note that the peer will
modify again its SCID. This is why this SCID is also stored as the ODCID
which must be matched with the peer retry_source_connection_id transport parameter,
- discard the Initial packet number space without flagging it as discarded and
prevent retransmissions calling qc_set_timer(),
- modify the TLS cryptographic cipher contexts (RX/TX),
- wakeup the I/O handler to send new Initial packets asap.
- Modify quic_transport_param_decode() to handle the retry_source_connection_id
transport parameter as a QUIC client. Then its caller is modified to
check this transport parameter matches with the SCID sent by the peer with
the RETRY packet.
Implement support for MAX_STREAMS frame. On frontend, this was mostly
useless as haproxy would never initiate new bidirectional streams.
However, this becomes necessary to control stream flow-control when
using QUIC as a client on the backend side.
Parsing of MAX_STREAMS is implemented via new qcc_recv_max_streams().
This allows to update <ms_uni>/<ms_bidi> QCC fields.
This patch is necessary to achieve QUIC backend connection reuse.
NEW_TOKEN frame is never emitted by a client, hence parsing was not
tested on frontend side.
On backend side, an issue can occur, as expected token length is static,
based on the token length used internally by haproxy. This is not
sufficient for most server implementation which uses larger token. This
causes a parsing error, which may cause skipping of following frames in
the same packet. This issue was detected using ngtcp2 as server.
As for now tokens are unused by haproxy, simply discard test on token
length during NEW_TOKEN frame parsing. The token itself is merely
skipped without being stored. This is sufficient for now to continue on
experimenting with QUIC backend implementation.
This does not need to be backported.
Replace ->li quic_conn pointer to struct listener member by ->target which is
an object type enum and adapt the code.
Use __objt_(listener|server)() where the object type is known. Typically
this is were the code which is specific to one connection type (frontend/backend).
Remove <server> parameter passed to qc_new_conn(). It is redundant with the
<target> parameter.
GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified
to prevent it from building more than an MTU. This has as consequence to prevent
qc_send_ppkts() to use GSO.
ssl_clienthello.c code is run only by listeners. This is why __objt_listener()
is used in place of ->li.
Store the peer connection ID (SCID) as the connection DCID as soon as an Initial
packet is received.
Stop comparing the packet to QUIC_PACKET_TYPE_0RTT is already match as
QUIC_PACKET_TYPE_INITIAL.
A QUIC server must not send too short datagram with ack-eliciting packets inside.
This cannot be done from quic_rx_pkt_parse() because one does not know if
there is ack-eliciting frame into the Initial packets. If the packet must be
dropped, this is after having parsed it!
Modify quic_dgram_parse() to stop passing it a listener as third parameter.
In place the object type address of the connection socket owner is passed
to support the haproxy servers with QUIC as transport protocol.
qc_owner_obj_type() is implemented to return this address.
qc_counters() is also implemented to return the QUIC specific counters of
the proxy of owner of the connection.
quic_rx_pkt_parse() called by quic_dgram_parse() is also modify to use
the object type address used by this latter as last parameter. It is
also modified to send Retry packet only from listeners. A QUIC client
(connection to haproxy QUIC servers) must drop the Initial packets with
non null token length. It is also not supposed to receive O-RTT packets
which are dropped.
Modify qc_alloc_ssl_sock_ctx() to pass the connection object as parameter. It is
NULL for a QUIC listener, not NULL for a QUIC server. This connection object is
set as value for ->conn quic_conn struct member. Initialise the SSL session object from
this function for QUIC servers.
qc_ssl_set_quic_transport_params() is also modified to pass the SSL object as parameter.
This is the unique parameter this function needs. <qc> parameter is used only for
the trace.
SSL_do_handshake() must be calle as soon as the SSL object is initialized for
the QUIC backend connection. This triggers the TLS CRYPTO data delivery.
tasklet_wakeup() is also called to send asap these CRYPTO data.
Modify the QUIC_EV_CONN_NEW event trace to dump the potential errors returned by
SSL_do_handshake().
quic-conn layer has to handle itself STREAM frames after MUX release. If
the stream was already seen, it is probably only a retransmitted frame
which can be safely ignored. For other streams, an active closure may be
needed.
Thus it's necessary that quic-conn layer knows the highest stream ID
already handled by the MUX after its release. Previously, this was done
via <nb_streams> member array in quic-conn structure.
Refactor this by replacing <nb_streams> by two members called
<stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for
quic-conn layer to monitor locally opened uni streams, as the peer
cannot by definition emit a STREAM frame on it. Also, bidirectional
streams are always opened by the remote side.
Previously, <nb_streams> were set by quic-stream layer. Now,
<stream_max_uni>/<stream_max_bidi> members are only set one time, just
prior to QUIC MUX release. This is sufficient as quic-conn do not use
them if the MUX is available.
Note that previously, IDs were used relatively to their type, thus
incremented by 1, after shifting the original value. For simplification,
use the plain stream ID, which is incremented by 4.
Move general function to check if a stream is uni or bidirectional from
QUIC MUX to quic_utils module. This should prevent unnecessary include
of QUIC MUX header file in other sources.