Commit graph

1085 commits

Author SHA1 Message Date
Willy Tarreau
f8385ef165 CLEANUP: tree-wide: fix around 20 mistakes in comments in h2,tools,peers
This cleans up around 12 non-visible typos in h2 and mux-h2, 6 in peers,
3 in tools, and also addresses a leftover after commit 9294e8822f in 2.4
which changed the word fingerprint calculation without updating the
comment about the possible output values. No backport needed.
2026-04-27 14:47:39 +02:00
Willy Tarreau
14a113472d CLEANUP: mux-h2: remove duplicate forward declaration of h2s_rxbuf_{head,tail}()
These were strangely committed together with commit e4cb0ad632 ("MINOR:
mux-h2/traces: add buffer-related info to h2s and h2c").
2026-04-27 14:44:29 +02:00
Willy Tarreau
c6600d7835 CLEANUP: tree-wide: address various spelling mistakes in comments from -dev7
These ones were found in recent patches merged since -dev7. There is no
user-visible change so no backport is needed.
2026-04-27 10:50:12 +02:00
Willy Tarreau
d12edebe4a BUG/MAJOR: mux-h2: detect incomplete transfers on HEADERS frames as well
Checks are already made on H2 to detect inconsistencies between
advertised content-length and transferred data (excess of data or
premature END_STREAM flag on DATA frame). However, as found by
Martino Spagnuolo (r3verii), a subtle case remains: if the END_STREAM
appears on the HEADERS frame (i.e. a regular request for example),
then the check is not made. In this case it is possible to advertise
more contents than will really be transferred. If the other side uses
HTTP/1.1, and the server responds before the end of the transfer,
this means that the number of advertised bytes that will never be
transferred and that the server will drain will be taken from the
next request, effectively hiding a part of the header.

In practice this can be used to force subsequent requests to fail, or
when running with "http-reuse never" or when running with a totally
idle server, to perform a request smuggling by constructing specially
crafted request pairs where the first one is used to trigger an early
response and hide parts of or all headers of the second one, to
instead use a second embedded one that was not subject to analysis.

The risk remains moderate given the low prevalence of "http-reuse never"
in production environments, and of idle servers.

The fix consists in detecting if advertised content-length remains when
processing an END_STREAM flag on a HEADERS frame. It also does it for
trailers, which turn out to be another way to abuse the bug. However it
takes great care not to break bodyless responses (204, 304 and responses
to HEAD requests) that may present a content-length that doesn't reflect
the presence of a body in the response.

A temporary alternative to the fix is to disable HTTP/2 by specifying
"alpn http/1.1" on "bind" lines, and adding "option disable-h2-upgrade"
in HTTP frontends.

This must be backported to all stable versions.
2026-04-23 17:05:24 +02:00
Olivier Houchard
dca4c379ce BUG/MINOR: H2: Don't forget to free shared_rx_bufs on failure
In h2_init(), if we have a failure while creating the h2c, and we
allocated shared_tx_bufs, don't forget to free it, otherwise we'll have
a memory leak.

This was introduced in 3.1 by commit a891534bfd ("MINOR: mux-h2: allocate
the array of shared rx bufs in the h2c"), so the fix should be backported
as far as 3.2.
2026-04-23 08:12:46 +02:00
Olivier Houchard
0963070d4f BUG/MINOR: h2: Don't look at the exclusive bit for PRIORITY frame
When receiving a PRIORITY frame, when checking if the stream id provided
is ours, ignore bit 31, as it is the exclusive bit, and not part of the
stream id, whoever sends a PRIORITY frame with its own id and the
exclusive bit set will not be considered an error, as it should per the
RFC.

The impact is basically non-existent since we don't use PRIORITY frames,
it's only that we would ignore such an invalid frame instead of breaking
the connection.

The bug was introduced in 1.9 with commit 92153fccd3 ("BUG/MINOR: h2:
properly check PRIORITY frames") so the fix must be backported to all
versions.
2026-04-23 08:09:48 +02:00
Olivier Houchard
915a58c3c1 BUG/MINOR: h2: make tune.h2.log-errors actually work
Commit e67e36c9eb introduced
tune.h2.log-errors, that would let you pick if you wanted to know about
stream errors, connection errors, or no error.
However, a logic error made it so no error will be picked for any value
except for "none", in which case connection would be picked. Fix that by
just checking the strcmp() return value correctly.

This should be backported wherever e67e36c9eb
has been backported.
2026-04-23 08:04:43 +02:00
Willy Tarreau
8f7ee0a59f BUG/MINOR: mux-h2: count a proto error when rejecting a stream on parsing error
The proxy error counter was not updated in h2c_frt_handle_headers() in
case of failure to decode a HEADERS frame. Make sure to keep it updated.
This can be backported to all stable versions.
2026-04-22 15:57:20 +02:00
Willy Tarreau
c73a81469e BUG/MINOR: mux-h2: count a protocol error when failing to parse a trailer
Commit aab1a60977 ("BUG/MEDIUM: h2/htx: always fail on too large trailers")
explicitly returned an RST_STREAM on failure to decode some trailers, and
used the code H2_ERR_INTERNAL_ERROR. However there are multiple possible
causes for this failure to happen, and it turns out that it's much more
likely to be related to a protocol error than a decompression error. So
let's change this to PROTOCOL_ERROR, and count a protocol error on the
proxy and in the session.

This can be backported to all stable versions (with adjustments related
to these versions, maybe focusing on 3.2 max is reasonable).
2026-04-22 15:57:20 +02:00
Willy Tarreau
90b2154d93 MEDIUM: muxes: always set conn->owner to the session that owns the connection
When an idle connection is private or considered private, session_add_conn()
is called to add it to the list of connections owned by the session. But
in case of allocation failure, the session is not set, which results in
a long list of possible situations that are all corner cases which are
difficult to test (and debug).

This commit relies on the fact that it is already permitted to have
conn->owner pointing to a session even if the connection couldn't be
added to the session's list, as this was already the case in
conn_backend_get() when dealing with HOL_RISK. Also as seen in commit
3aab17bd56 added in 2.4, it is already possible to have conn->owner
set with the connection not being in a list, and only the list element
is checked for this.

This commit modifies session_add_conn() to always set conn->onwer, even
if the list element couldn't be allocated. This way it's possible to
always refer to conn->owner to find the session owning a private conn
even in case of failure to allocate an entry. This requires to change
the checks on conn->owner to a check of the list element to see if the
connection belongs to a session, the pre-assignment of sess to
conn->owner in conn_backend_get() is no longer needed, same for the
pre-assignment in http_wait_for_response(), and that's all.

The H1 mux remained unchanged because since it cannot multiplex, in
case it fails to allocate a pconn, it instantly kills the connection.
2026-04-21 08:45:46 +02:00
Willy Tarreau
a0541f5d21 BUG/MEDIUM: mux-h2: ignore conn->owner when deciding if a connection is dead
Originally, valid backend connections always used to have conn->owner
pointing to the owner session. In 1.9, commit 93c885 enforced this when
implementing backend H2 support by making sure that no orphaned connection
was left on its own with no remaining stream able to handle it.
Later, idle connections were reworked so that they were no longer
necessarily attached to a stream, but could be directly in the server,
accessed via a hash, so it started to become possible to have conn->owner
left to NULL when picking such a connection. It in fact happens for
http-reuse always, when the second stream picks the connection because
its owner is NULL and it's not changed.

More recently, a case was identified where it could be theoretically
possible to reinsert a dead connection into an idle list, and commit
59c599f3f0 ("BUG/MEDIUM: mux-h2: make sure not to move a dead
connection to idle") addressed that possibility in 3.3 by adding the
h2c_is_dead() test in h2_detach() before deciding to reinsert a
connection into the idle list.

Unfortunately, the combination of changes above results in the following
sequence being possible:
  - a stream requires a connection, connect_server() creates one, sets
    conn->owner to the session, then when the session is being set up,
    the SSL stack calls conn_create_mux() which gets the session from
    conn->owner, passes it to mux->init() (h2_init), which in turn
    creates the backend stream and assigns it this session.

  - when the stream ends, it detaches (h2_detach), and the call to
    h2c_is_dead() returns false because h2c->conn->owner is set. The
    connection is thus added into the server's idle list.

  - a new stream comes, it finds the connection in the server's list,
    which doesn't require to set conn->owner, the stream is added via
    h2_attach() which passes the stream's session, and that one is
    properly set on h2s again, but never on conn->owner.

  - the stream finishes, detaches, and this time the call to h2c_is_dead()
    sees the owner is NULL, thus indicates that the connection seems dead
    so it's not added again to the idle list, and it's destroyed.

Note that this most only happens at low loads (at most one active stream
per connection, so typically at most than one active stream per thread),
where the H2 reuse ratio on a server configured with http-reuse always
or http-reuse aggressive is close to 50%. At high loads, this is much more
rare, though looking at the reuse stats for a server, it's visible that a
sustained load still shows around 1% of the connections being periodically
renewed.

Interestingly, for RHTTP the impact is more important because there
was already a work around for this test in h2c_is_dead() but it uses
conn_is_reverse(), which is never correct in this case (it should be
called conn_to_reverse() because it says the conn must be reversed
and has not yet been), so this extra test doesn't protect against the
NULL check, and connections are closed after each stream is terminated
(if there is no other stream left).

After a long analysis with Amaury and Olivier, it was concluded that:
  - the h2c_is_dead() addition is finally not the best solution and
    could be refined, however in the current state it's a bit tricky.
  - the conn->owner test in h2c_is_dead() is no longer relevant,
    probably since 2.4 when connections were stored using hash_nodes
    in the servers and would no longer depend on a session, so that
    test should be removed.
  - the test conn_is_reverse() on the same line, that was added to
    ignore the former for RHTTP, and which doesn't properly work either
    should be removed as well.

Some further cleanups should be performed to clarify this situation.

This patch implements the points above, and it should be backported
wherever commit 59c599f3f0 was backported.
2026-04-16 18:27:15 +02:00
Willy Tarreau
e375f1061a MINOR: mux-h2: report glitches on early RST_STREAM
We leverage the SE_FL_APP_STARTED flag to detect whether the application
layer had a chance to run or not when an RST_STREAM is received. This
allows us to triage RST_STREAM between regular ones and harmful ones,
and to count glitches for them. It reveals extremely effective at
detecting fast HEADERS+RST pairs.

It could be useful to backport it to 3.2, though it depends on these
two previous patches to be backported first (the first one was already
planned and the second one is harmless, though will require to drop
the haterm changes):

  BUG/MINOR: stconn: Always declare the SC created from healthchecks as a back SC
  MINOR: stconn: flag the stream endpoint descriptor when the app has started
2026-03-30 16:32:21 +02:00
Ilia Shipitsin
b7d1c2f91d CLEANUP: fix typos and spelling in comments and documentation
Corrected multiple spelling mistakes across CLI scripts, documentation,
and source comments (e.g. "Specifiy" → "Specify", "explicitely" → "explicitly",
"transfert" → "transfer", "resetted" → "reset", etc.). These changes
improve readability and consistency without altering functionality.
2026-03-30 09:24:19 +02:00
Christopher Faulet
cd363e0246 MEDIUM: mux-h2: Stop dealing with HTX flags transfer in h2_rcv_buf()
In h2_rcv_buf(), HTX flags are transfer with data when htx_xfer() is
called. There is no reason to continue to deal with them in the H2 mux. In
addition, there is no reason to set SE_FL_EOI flag when a parsing error was
reported. This part was added before the stconn era. Nowadays, when an HTX
parsing error is reported, an error on the sedesc should also be reported.
2026-03-23 14:02:43 +01:00
Christopher Faulet
d257dd4563 Revert "BUG/MEDIUM: mux-h2: make sure to always report pending errors to the stream"
This reverts commit 44932b6c41.

The patch above was only necessary to handle partial headers or trailers
parsing. There was nothing to prevent the H2 multiplexer to start to add
headers or trailers in an HTX message and to stop the processing on error,
leaving the HTX message with no EOH/EOT block.

From the HTX API point of view, it is unexepected. And this was fixed thanks
to the commit ba7dc46a9 ("BUG/MINOR: h2/h3: Never insert partial
headers/trailers in an HTX message").

So this patch can be reverted. It is important to not report a parsign error
too early, when there are still data to transfer to the upper layer.

This patch must be backport where 44932b6c4 was backported but only after
backporting ba7dc46a9 first.
2026-03-23 14:02:43 +01:00
Christopher Faulet
39121ceca6 MEDIUM: tree-wide: Rely on htx_xfer() instead of htx_xfer_blks()
htx_xfer() function replaced htx_xfer_blks(). So let's use it.
2026-03-23 14:02:43 +01:00
Willy Tarreau
5d0f5f8168 MINOR: mux-h2: assign a limited frames processing budget
This introduces 3 new settings: tune.h2.be.max-frames-at-once and
tune.h2.fe.max-frames-at-once, which limit the number of frames that
will be processed at once for backend and frontend side respectively,
and tune.h2.fe.max-rst-at-once which limits the number of RST_STREAM
frames processed at once on the frontend.

We can now yield when reading too many frames at once, which allows to
limit the latency caused by processing too many frames in large buffers.
However if we stop due to the RST budget being depleted, it's most likely
the sign of a protocol abuse, so we make the tasklet go to BULK since
the goal is to punish it.

By limiting the number of RST per loop to 1, the SSL response time drops
from 95ms to 1.6ms during an H2 RST flood attack, and the maximum SSL
connection rate drops from 35.5k to 28.0k instead of 11.8k. A moderate
SSL load that shows 1ms response time and 23kcps increases to 2ms with
15kcps versus 95ms and 800cps before. The average loop time goes down
from 270-280us to 160us, while still doubling the attack absorption
rate with the same CPU capacity.

This patch may usefully be backported to 3.3 and 3.2. Note that to be
effective, this relies on the following patches:

  MEDIUM: sched: do not run a same task multiple times in series
  MINOR: sched: do not requeue a tasklet into the current queue
  MINOR: sched: do not punish self-waking tasklets anymore
  MEDIUM: sched: do not punish self-waking tasklets if TASK_WOKEN_ANY
  MEDIUM: sched: change scheduler budgets to lower TL_BULK
2026-03-23 07:14:22 +01:00
Willy Tarreau
932d77e287 MINOR: mux-h2: permit to fix a minimum value for the advertised streams limit
When using rq-load on tune.h2.fe.max-concurrent-streams, it's easy to
reach a situation where only one stream is allowed. There's nothing
wrong with this but it turns out that slightly higher values do not
necessarily cause significantly higher loads and will improve the user
experience. For this reason the keyword now also supports "min" to
specify a value. Experimentation shows that values from 5 to 15 remain
very effective at protecting the run queue while allowing a great level
of parallelism that keeps a site fluid.
2026-03-19 16:24:32 +01:00
Willy Tarreau
c238965b27 MINOR: mux-h2: permit to moderate the advertised streams limit depending on load
Global setting tune.h2.fe.max-concurrent-streams now supports an optional
"rq-load" option to pass either a target load, or a keyword among "auto"
and "ignore". These are used to quadratically reduce the advertised streams
limit when the thread's run queue size goes beyong the configured value,
and automatically reduce the load on the process from new connections.
With "auto", instead of taking an explicit value, it uses as a target the
"tune.runqueue-depth" setting (which might be automatic). Tests have shown
that values between 50 and 100 are already very effective at reducing the
loads during attacks from 100000 to around 1500. By default, "ignore"
is in effect, which means that the dynamic tuning is not enabled.
2026-03-19 16:24:31 +01:00
Willy Tarreau
b63492e4f4 MINOR: mux-h2: store the concurrent streams hard limit in the h2c
The hard limit on the number of concurrent streams is currently
determined only by configuration and returned by
h2c_max_concurrent_streams(). However this doesn't permit to
change such settings on the fly without risking to break connections,
and it doesn't allow a connection to pick a different value, which
could be desirable for example to try to slow abuse down.

Let's store a copy of h2c_max_concurrent_streams() at connection
creation time into the h2c as streams_hard_limit. This inflates
the h2c size from 1324 to 1328 (0.3%) which is acceptable for the
expected benefits.
2026-03-19 16:24:31 +01:00
Willy Tarreau
e31640368a BUG/MINOR: mux-h2: properly ignore R bit in WINDOW_UPDATE increments
The window size increments are 31 bits and the topmost bit is reserved
and should be ignored, however it was not masked, so a peer sending it
set would emit a negative value which could actually reduce the current
window instead of increasing it. Note that the window cannot reach zero
as there's already a test for this, but transfers could slow down to
the same speed as if an initial window of just a few bytes had been
advertised. Let's just mask the reserved bit before processing.

This should be backported to all stable versions.
2026-03-19 07:21:47 +01:00
Willy Tarreau
0e231bbd7c BUG/MINOR: mux-h2: properly ignore R bit in GOAWAY stream ID
The stream ID indicated in GOAWAY frames must have its bit 31 (R) ignored
and this wasn't the case. The effect is that if this bit was present, the
GOAWAY frame would mark the last acceptable stream as negative, which is
the default situation (unlimited), thus would basically result in this
GOAWAY frame to be ignored since it would replace a negative last_sid
with another negative one. The impact is thus basically that if a peer
would emit anything non-zero in the R bit, the GOAWAY frame would be
ignored and new streams would still be initiated on the backend, before
being rejected by the server.

Thanks to Haruto Kimura (Stella) for finding and reporting this bug.

This fix needs to be backported to all stable versions.
2026-03-19 07:11:54 +01:00
Willy Tarreau
ec7b07b650 MINOR: connection: track mux calls to report their allocation context
Most calls to mux ops were instrumented with a CALL_MUX_WITH_RET() or
CALL_MUX_NO_RET() macro in order to make the current thread's context
point to the called mux and be able to track its allocations. Only
a bunch of harmless mux_ctl() and ->subscribe/unsubscribe calls were
left untouched since useless. But destroy/detach/shut/init/snd_buf
and rcv_buf are now tracked.

It will not show allocations performed in IO callback via tasklet
wakeups however.

In order to ease reading of the output, cmp_memprof_ctx() knows about
muxes and sorts based on the .subscribe function address instead of
the mux_ops address so as to keep various callers grouped.
2026-03-12 18:06:38 +01:00
Christopher Faulet
64d997ebfc MAJOR: muxes: No longer use app_ops .wake() callback function from muxes
Thanks to previous commits, it is now possible to wake the data layer up,
via a tasklet_wakeup, instead of using the app_ops .wake() callback
function.

When a data layer must be notified of a mux event (an error for instance),
we now always perform a tasklet_wakeup(). TASK_WOKEN_MSG state is used by
default. TASK_WOKEN_IO is eventually added if the data layer was subscribed
to receives or sends.

Changes are not trivial at all. We replaced a synchronous call to the
sc_conn_process() function by a tasklet_wakeup().
2026-03-10 15:10:34 +01:00
Christopher Faulet
26a0817c1a MINOR: muxes: Wakup the data layer from a mux stream with TASK_WOKEN_IO state
Now, when a mux stream is waking its data layer up for receives or sends, it
uses the TASK_WOKEN_IO state. The state is not used by the stconn I/O
callback function for now.
2026-03-10 15:10:34 +01:00
Christopher Faulet
aea0d38fdd MINOR: mux-h2: Rely on h2s_notify_send() when resuming h2s for sending
In h2_resume_each_sending_h2s(), there was exactly the same code than
h2s_notify_send(). So let's use h2s_notify_send() instead of duplicating
code.
2026-03-10 15:10:34 +01:00
Willy Tarreau
9019a5db93 MEDIUM: counters: return aggregate extra counters in ->fill_stats()
Now thanks to new macro EXTRA_COUNTERS_AGGR() we can iterate over all
thread groups storages when returning the data for a given metric. This
remains convenient and mostly transparent. The caller continues to pass
the pointer to the metric in the first group, and offsets are calculated
for all other groups and data summed. For now all groups except the
first one contain only zeroes but reported values are nevertheless
correct.
2026-02-26 17:03:53 +01:00
Willy Tarreau
de0eddf512 MINOR: counters: add EXTRA_COUNTERS_BASE() to retrieve extra_counters base storage
The goal is to always retrieve the storage address of the first thread
group for the given module. This will be used to iterate over all thread
groups. For now it returns the same value as EXTRA_COUNTERS_GET().
2026-02-26 17:03:53 +01:00
Willy Tarreau
8dd22a62a4 CLEANUP: counters: only retrieve zeroes for unallocated extra_counters
Since version 2.4 with commit 7f8f6cb926 ("BUG/MEDIUM: stats: prevent
crash if counters not alloc with dummy one") we can afford to always
update extra_counters because we know they're always either allocated
or linked to a dedicated trash. However, the ->fill_stats() callbacks
continue to access such values, making it technically possible to
retrieve random counters from this trash, which is not really clean.
Let's implement an explicit test in the ->fill_stats() functions to
only return 0 for the metric when not allocated like this. It's much
cleaner because it guarantees that we're returning an empty counter
in this case rather than random values.

The situation currently happens for dummy servers like the ones used
in Lua proxies as well as those used by rings (e.g. used for logging
or traces). Normally, none of the objects retrieved via stats or
Prometheus is concerned by this unallocated extra_counters situation,
so this is more about a cleanup than a real fix.
2026-02-26 08:24:03 +01:00
Willy Tarreau
95a9f472d2 MEDIUM: counters: change the fill_stats() API to pass the module and extra_counters
We'll soon need to iterate over thread groups in the fill_stats() functions,
so let's first pass the extra_counters and stats_module pointers to the
fill_stats functions. They now call EXTRA_COUNTERS_GET() themselves with
these elements in order to retrieve the required pointer. Nothing else
changed, and it's getting even a bit more transparent for callers.

This doesn't change anything visible however.
2026-02-26 08:24:03 +01:00
Willy Tarreau
44932b6c41 BUG/MEDIUM: mux-h2: make sure to always report pending errors to the stream
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
Some stream parsing errors that do not affect the connection result in
the parsed block not being transferred from the rx buffer to the channel
and not being reported upstream in rcv_buf(), causing the stconn to time
out. Let's detect this condition, and propagate term flags anyway since
no more progress will be made otherwise.

This should be backported at least till 3.2, probably even 2.8.
2026-02-26 00:30:42 +01:00
Willy Tarreau
e67e36c9eb MINOR: mux-h2: add a new setting, "tune.h2.log-errors" to tweak error logging
The H2 mux currently logs whenever some decoding fails. Most of the errors
happen at the connection level, but some are even at the stream level,
meaning that multiple logs can be emitted for a given connection, which
can quickly use some resource for little value. This new setting allows
to tweak this and decide to only log errors that affect the connection,
or even none at all.

This should be backported at least as far as 3.2.
2026-02-25 22:43:40 +01:00
Willy Tarreau
cad6e0b3da MINOR: mux-h2: also count glitches on invalid trailers
Two cases were not causing glitches to be incremented:
  - invalid trailers
  - trailers on closed streams

This patch addresses this. It could be backported, at least to 3.2.
2026-02-25 22:03:16 +01:00
Christopher Faulet
36282ae348 MEDIUM: mux-h1/mux-h2/mux-fcgi/h3: Disable 0-copy for buffers of different size
Today, it is useless to check the buffers size before performing a 0-copy in
muxes when data are sent, but it will be mandatory when the large buffers
support on channels will be added. Indeed, muxes will still rely on normal
buffers, so we must take care to never swap buffers of different size.
2026-02-18 13:26:21 +01:00
Christopher Faulet
6bf450b7fe MINOR: tree-wide: Use the buffer size instead of global setting when possible
At many places, we rely on global.tune.bufsize value instead of using the buffer
size. For now, it is not a problem. But if we want to be able to deal with
buffers of different sizes, it is good to reduce as far as possible dependencies
on the global value. most of time, we can use b_size() or c_size()
functions. The main change is performed on the error snapshot where the buffer
size was added into the error_snapshot structure.
2026-02-18 13:26:20 +01:00
Christopher Faulet
cda056b9f4 BUG/MEDIUM: mux-h2/quic: Stop sending via fast-forward if stream is closed
If is illegal to send data if the stream is already closed. The case is
properly handled when data are sent via snd_buf(), by draining the data. But
it was still possible to process these data via nego_ff().

So, in this patch, both for the H2 and QUIC multiplexers, the fast-forward
is disabled if the stream is closed and nothing is performed. Doing so, we
will automatically fall back on the regular sending path and be able to
drain data in snd_buf().

Thanks to Mike Walker for his investigation on the subject.

This patch should be backported as far as 3.0.
2026-02-18 09:44:09 +01:00
Ilia Shipitsin
f8a77ecf62 CLEANUP: assorted typo fixes in the code, commits and doc
Some checks failed
Contrib / build (push) Has been cancelled
alpine/musl / gcc (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
2025-12-25 19:45:29 +01:00
Willy Tarreau
0901f60cef MINOR: mux-h2: perform a graceful close at 75% glitches threshold
This avoids hitting the hard wall for connections with non-compliant
peers that would be accumulating errors over long connections. We now
permit to recycle the connection early enough to reset the connection
counter.

This was tested artificially by adding this to h2c_frt_handle_headers():

  h2c_report_glitch(h2c, 1, "new stream");

or this to h2_detach():

  h2c_report_glitch(h2c, 1, "detaching");

and injecting using h2load -c 1 -n 1000 0:4445 on a config featuring
tune.h2.fe.glitches-threshold 1000:

  finished in 8.74ms, 85802.54 req/s, 686.62MB/s
  requests: 1000 total, 751 started, 751 done, 750 succeeded, 250 failed, 250 errored, 0 timeout
  status codes: 750 2xx, 0 3xx, 0 4xx, 0 5xx
  traffic: 6.00MB (6293303) total, 132.57KB (135750) headers (space savings 29.84%), 5.86MB (6144000) data
                       min         max         mean         sd        +/- sd
  time for request:        9us       178us        10us         6us    99.47%
  time for connect:      139us       139us       139us         0us   100.00%
  time to 1st byte:      339us       339us       339us         0us   100.00%
  req/s           :   87477.70    87477.70    87477.70        0.00   100.00%

The failures are due to h2load not supporting reconnection.
2025-12-20 19:26:29 +01:00
Willy Tarreau
52adeef7e1 MINOR: mux-h2: add missing glitch count for non-decodable H2 headers
One rare error case could produce a protocol error on the stream when
not being able to decode response headers wasn't being accounted as a
glitch, so let's fix it.
2025-12-20 19:11:16 +01:00
Willy Tarreau
9a046fc3ad BUG/MEDIUM: mux-h2: synchronize all conditions to create a new backend stream
In H2 the conditions to create a new stream differ for a client and a
server when a GOAWAY was exchanged. While on the server, any stream
whose ID is lower than or equal to the one advertised in GOAWAY is
valid, for a client it's forbidden to create any stream after receipt
of a GOAWAY, even if its ID is lower than or equal to the last one,
despite the server not being able to tell the difference from the
number of streams in flight.

Unfortunately, the logic in the code did not always reflect this
specificity of the client (the backend code in our case), and most
often considered that it was still permitted to create a new stream
until the max_id was greater than or equal to the advertised last_id.
This is for example what h2c_is_dead() and h2c_streams_left() do. In
other places, such as h2_avail_streams(), the rule is properly taken
into account. Very often the advertised last_id is the same, and this
is also what haproxy does (which explains why it's impossible to
reproduce the issue by chaining two haproxy layers), but a server may
wish to advertise any ID including 2^31-1 as mentioned in the spec,
and in this case the functions would behave differently.

This discrepancy results in a corner case where a GOAWAY received on
an idle connection will cause the next stream creation to be initially
accepted but then rejected via h2_avail_streams(), and the connection
left in a bad state, still attached to the session due to http-reuse
safe, but not reinserted into idle list, since the backend code
currently is not able to properly recover from this situation. Worse,
the idle flags are no longer on it but TASK_F_USR1 still is, and this
makes the recently added BUG_ON() rightfully trigger since this case
is not supposed to happen.

Admittedly more of the backend recovery code needs to be reworked,
however the mux must consistently decide whether or not a connection
may be reused or needs to be released.

This commit fixes the affected logic by introducing a new function
"h2c_reached_last_stream()" which says if a connection has reached its
last stream, regardless of the side, and using this one everywhere
max_id was compared to last_id. This is sufficient to address the
corner case that be_reuse_connection() currently cannot recover from.

This is in relation to GH issue #3215 and it should be sufficient to
fix the issue there. Thanks to Chris Staite for reporting the issue
and kudos to Amaury for spotting the events sequence that can lead
to this situation.

This patch must be backported to 3.3 first, then to older versions
later. It's worth noting that it's much more difficult to observe
the issue before 3.3 because the BUG_ON() is not there, and the
possibly non-released connection might end up being killed for other
reasons (timeouts etc). But one possible visible effect might be the
impossibility to delete a server (which Chris observed in 3.3).
2025-12-18 17:01:32 +01:00
Willy Tarreau
3ec5818807 MINOR: h2/trace: emit a trace of the received RST_STREAM type
Right now we don't get any state trace when receiving an RST_STREAM, and
this is not convenient because RST_STREAM(0) is not visible at all, except
in developer level because the function is entered and left.

Let's extract the RST code first and always log it using TRACE_PRINTF()
(along with h2c/h2s) so that it's possible to detect certain codes being
used.
2025-12-10 15:58:56 +01:00
Christopher Faulet
8e08a635eb MINOR: muxes: Support an optional ALPN string when defining mux protocols
When a multiplexer protocol is defined, it is now possible to specify the
ALPN it supports, in binary format. This info is optionnal. For now only the
h2 and the h1 multiplexers define an ALPN because this will be mandatory for
a fix. But this could be used in future for different purpose.

This patch will be mandatory for the next fix.
2025-11-20 16:14:52 +01:00
Willy Tarreau
4a6dec7193 DEBUG: servers: add a few checks for stress-testing idle conns
The latest idle conns fix 9481cef948 ("BUG/MEDIUM: connection: do not
reinsert a purgeable conn in idle list") addresses a very hard-to-hit
case which manifests itself with an attempt to reuse a connection fails
because conn->mux is NULL:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x0000655410b8642c in conn_backend_get (reuse_mode=4, srv=srv@entry=0x6554378a7140,
      sess=sess@entry=0x7cfe140948a0, is_safe=is_safe@entry=0,
      hash=hash@entry=910818338996668161) at src/backend.c:1390
  1390     if (conn->mux->takeover && conn->mux->takeover(conn, i, 0) == 0) {

However the condition that leads to this situation can be detected
earlier, by the presence of the connection in the toremove_list, whose
race window is much larger and easier to detect.

This patch adds a few BUG_ON_STRESS() at selected places that an detect
this condition. When built with -DDEBUG_STRESS and run under stress with
two distinct processes communicating over H2 over SSL, under a stress of
400-500k req/s, the front process usually crashes in the first 10-30s
triggering in _srv_add_idle() if the fix above is reverted (and it does
not crash with the fix).

This is mainly included to serve as an illustration of how to instrument
the code for seamless stress testing.
2025-11-14 17:00:17 +01:00
Amaury Denoyelle
9481cef948 BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list
Some checks failed
Contrib / build (push) Has been cancelled
alpine/musl / gcc (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
A recent patch was introduced to fix a rare race condition in idle
connection code which would result in a crash. The issue is when MUX IO
handler run on top of connection moved in the purgeable list. The
connection would be considered as present in the idle list instead, and
reinserted in it at the end of the handler while still in the purge
list.

  096999ee20
  BUG/MEDIUM: connections: permit to permanently remove an idle conn

This patch solves the described issue. However, it introduces another
bug as it may clear connection flag when removing a connection from its
parent list. However, these flags now serve primarily as a status which
indicate that the connection is accounted by the server. When a backend
connection is freed, server idle/used counters are decremented
accordingly to these flags. With the above patch, an incorrect counter
could be adjusted and thus wrapping would occured.

The first impact of this bug is that it may distort the estimated number
of connections needed by servers, which would result either in poor
reuse rate or too many idle connections kept. Another noticeable impact
is that it may prevent server deletion.

The main problem of the original and current issues is that connection
flags are misinterpreted as telling if a connection is present in the
idle list. As already described here, in fact these flags are solely a
status which indicate that the connection is accounted in server
counters. Thus, here are the definitive conclusion that can be learned
here :

* (conn->flags & CO_FL_LIST_MASK) == 1:
  the connection is accounted by the server
  it may or may not be present in the idle list

* (conn->flags & CO_FL_LIST_MASK) == 0
  the connection is not accounted and not present in idle list

The discussion above does not mention session list, but a similar
pattern can be observed when CO_FL_SESS_IDLE flag is set.

To keep the original issue solved and fix the current one, IO MUX
handlers prologue are rewritten. Now, flags are not checked anymore for
list appartenance and LIST_INLIST macro is used instead. This is
definitely clearer with conn_in_list purpose here.

On IO MUX handlers end, conn idle flags may be checked if conn_in_list
was true, to reinsert the connection either in idle or safe list. This
is considered safe as no function should modify idle flags when a
connection is not stored in a list, except during conn_free() operation.

This patch must be backported to every stable versions after revert of
the above commit. It should be appliable up to 3.0 without any issue. On
2.8 and below, <idle_list> connection member does not exist. It should
be safe to check <leaf_p> tree node as a replacement.
2025-11-14 16:06:34 +01:00
Amaury Denoyelle
d79295d89b Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn"
The target patch fixes a rare race condition which happen when a MUX IO
handler is working on a connection already moved into the purge list. In
this case, the handler will incorrectly moved back the connection into
the idle list.

To fix this, conn_delete_from_tree() was extended to remove flags along
with the connection from the idle list. This was performed when the
connection is moved into the purge list. However, it introduces another
issue related to the idle server connection accounting. Thus it is
necessary to revert it prior to the incoming newer fix.

This patch must be backported to every version where the original commit
is.
2025-11-14 16:06:34 +01:00
Willy Tarreau
5fe4677231 MINOR: server: move the lock inside srv_add_idle()
Almost all callers of _srv_add_idle() lock the list then call the
function. It's not the most efficient and it requires some care from
the caller to take care of that lock. Let's change this a little bit by
having srv_add_idle() that takes the lock and calls _srv_add_idle() that
is now inlined. This way callers don't have to handle the lock themselves
anymore, and the lock is only taken around the sensitive parts, not the
function call+return.

Interestingly, perf tests show a small perf increase from 2.28-2.32M RPS
to 2.32-2.37M RPS on a 128-thread system.
2025-11-06 13:16:24 +01:00
Willy Tarreau
096999ee20 BUG/MEDIUM: connections: permit to permanently remove an idle conn
There's currently a function conn_delete_from_tree() which is used to
detach an idle connection from the tree it's currently attached to so
that it is no longer found. This function is used in three circumstances:
  - when picking a new connection that no longer has any avail stream
  - when temporarily working on the connection from an I/O handler,
    in which case it's re-added at the end
  - when killing a connection

The 2nd case above is quite specific, as it requires to preserve the
CO_FL_LIST_MASK flags so that the connection can be re-inserted into
the proper tree when leaving the handler. However, there's a catch.
When killing a connection, we want to be certain it will not be
reinserted into the tree. The flags preservation is causing a tiny
race if an I/O happens while the connection is in the kill list,
because in this case the I/O handler will note the connection flags,
do its work, then reinsert the connection where it believed it was,
then the connection gets purged, and another user can find it in the
tree.

The issue is very difficult to reproduce. On a 128-thread machine it
happens in H2 around 500k req/s after around 50M requests. In H1 it
happens after around 1 billion requests.

The fix here consists in passing an extra argument to the function to
indicate if the removal is permanent or not. When it's permanent, the
function will clear the associated flags. The callers were adjusted
so that all those dequeuing a connection in order to kill it do it
permanently and all other ones do it only temporarily.

A slightly different approach could have worked: the function could
always remove all flags, and the callers would need to restore them.
But this would require trickier modifications of the various call
places, compared to only passing 0/1 to indicate the permanent status.

This will need to be backported to all stable versions. The issue was
at least reproduced since 3.1 (not tested before). The patch will need
to be adjusted for 3.2 and older, because a 2nd argument "thr" was
added in 3.3, so the patch will not apply to older versions as-is.
2025-11-05 11:08:25 +01:00
Willy Tarreau
59c599f3f0 BUG/MEDIUM: mux-h2: make sure not to move a dead connection to idle
In h2_detach(), it looks possible to place a dead connection back to
the idle list, and to later call h2_release() on it once detected as
dead. It's not certain that it happens but nothing in the code shows
it is not possible, so better make sure it cannot happen.

This should be preventively backported to all versions.
2025-11-05 11:08:25 +01:00
Willy Tarreau
a1f26ca307 BUG/MINOR: mux-h2: send the preface along with the first request if needed
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
Tests involving 0-RTT and H2 on the backend show that 0-RTT is being
partially used but does not work. The analysis shows that only the
preface and settings are sent using early-data and the request is sent
separately. As explained in the previous patch, this is caused by the
fact that a wakeup of the iocb is needed just to send the preface, then
a new call to process_stream is needed to try sending again.

Here with this patch, we're making h2_snd_buf() able to send the preface
if it was not yet sent. Thanks to this, the preface, settings and first
request can now leave as a single TCP segment. In case of TLS with 0-RTT,
it now allows all the block to leave in early data.

Even in clear-text H2, we're now seeing a 15% lower context-switch count,
and the number of calls to process_stream() per connection dropped from 3
to 2. The connection rate increased by an extra 9.5%. Compared to without
the last 3 patches, this is a 22% reduction of context-switches, 33%
reduction of process_stream() calls, and 15.7% increase in connection
rate. And more importantly, 0-RTT now really works with H2 on the
backend, saving one full RTT on the first request.

This fix is only for a missed optimization and a non-functional 0-RTT
on the backend. It's worth backporting it, but it doesn't cause enough
harm to hurry a backport. Better wait for it to live a little bit in
3.3 (till at least a week or two after the final release) before
backporting it. It's not sure that it's worth going beyond 3.2 in any
case. It depends on the these two previous commits:

  MEDIUM: mux-h2: do not needlessly refrain from sending data early
  MINOR: mux-h2: extract the code to send preface+settings into its own function
2025-10-30 18:16:54 +01:00
Willy Tarreau
d5aa3e19cc MINOR: mux-h2: extract the code to send preface+settings into its own function
The code that deals with sending preface + settings and changing the
state currently is in h2_process_mux(), but we'll want to do it as
well from h2_snd_buf(), so let's move it to a dedicate function first.
At this point there is no functional change.
2025-10-30 18:16:54 +01:00