haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-06-11 09:52:56 -04:00

Author	SHA1	Message	Date
Willy Tarreau	f8385ef165	CLEANUP: tree-wide: fix around 20 mistakes in comments in h2,tools,peers This cleans up around 12 non-visible typos in h2 and mux-h2, 6 in peers, 3 in tools, and also addresses a leftover after commit `9294e8822f` in 2.4 which changed the word fingerprint calculation without updating the comment about the possible output values. No backport needed.	2026-04-27 14:47:39 +02:00
Willy Tarreau	14a113472d	CLEANUP: mux-h2: remove duplicate forward declaration of h2s_rxbuf_{head,tail}() These were strangely committed together with commit `e4cb0ad632` ("MINOR: mux-h2/traces: add buffer-related info to h2s and h2c").	2026-04-27 14:44:29 +02:00
Willy Tarreau	c6600d7835	CLEANUP: tree-wide: address various spelling mistakes in comments from -dev7 These ones were found in recent patches merged since -dev7. There is no user-visible change so no backport is needed.	2026-04-27 10:50:12 +02:00
Willy Tarreau	d12edebe4a	BUG/MAJOR: mux-h2: detect incomplete transfers on HEADERS frames as well Checks are already made on H2 to detect inconsistencies between advertised content-length and transferred data (excess of data or premature END_STREAM flag on DATA frame). However, as found by Martino Spagnuolo (r3verii), a subtle case remains: if the END_STREAM appears on the HEADERS frame (i.e. a regular request for example), then the check is not made. In this case it is possible to advertise more contents than will really be transferred. If the other side uses HTTP/1.1, and the server responds before the end of the transfer, this means that the number of advertised bytes that will never be transferred and that the server will drain will be taken from the next request, effectively hiding a part of the header. In practice this can be used to force subsequent requests to fail, or when running with "http-reuse never" or when running with a totally idle server, to perform a request smuggling by constructing specially crafted request pairs where the first one is used to trigger an early response and hide parts of or all headers of the second one, to instead use a second embedded one that was not subject to analysis. The risk remains moderate given the low prevalence of "http-reuse never" in production environments, and of idle servers. The fix consists in detecting if advertised content-length remains when processing an END_STREAM flag on a HEADERS frame. It also does it for trailers, which turn out to be another way to abuse the bug. However it takes great care not to break bodyless responses (204, 304 and responses to HEAD requests) that may present a content-length that doesn't reflect the presence of a body in the response. A temporary alternative to the fix is to disable HTTP/2 by specifying "alpn http/1.1" on "bind" lines, and adding "option disable-h2-upgrade" in HTTP frontends. This must be backported to all stable versions.	2026-04-23 17:05:24 +02:00
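The check described in the commit above can be sketched as follows. This is only an illustration under stated assumptions: the function names and parameters are hypothetical and are not HAProxy's actual code. It shows the idea of flagging an END_STREAM that arrives while advertised content-length bytes are still missing, while leaving bodyless responses (204, 304, responses to HEAD) alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: responses that may carry a content-length
 * header without transferring any body.
 */
static bool demo_bodyless_response(bool is_response, int status, bool head_request)
{
    return is_response && (status == 204 || status == 304 || head_request);
}

/* Hypothetical helper: returns true when END_STREAM is seen (on a
 * HEADERS frame or on trailers) while some of the advertised
 * content-length has not been received yet.
 */
static bool demo_incomplete_on_end_stream(bool has_content_length,
                                          uint64_t body_remaining,
                                          bool is_response, int status,
                                          bool head_request)
{
    if (!has_content_length)
        return false; /* nothing was advertised, nothing to verify */
    if (demo_bodyless_response(is_response, status, head_request))
        return false; /* content-length does not describe a body here */
    return body_remaining > 0;
}
```

For instance, a request advertising 10 bytes whose HEADERS frame carries END_STREAM would be reported as incomplete, whereas a 304 response advertising the same length would not.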
Olivier Houchard	dca4c379ce	BUG/MINOR: H2: Don't forget to free shared_rx_bufs on failure In h2_init(), if we have a failure while creating the h2c, and we allocated shared_rx_bufs, don't forget to free it, otherwise we'll have a memory leak. This was introduced in 3.1 by commit `a891534bfd` ("MINOR: mux-h2: allocate the array of shared rx bufs in the h2c"), so the fix should be backported as far as 3.2.	2026-04-23 08:12:46 +02:00
Olivier Houchard	0963070d4f	BUG/MINOR: h2: Don't look at the exclusive bit for PRIORITY frame When receiving a PRIORITY frame, when checking if the stream id provided is ours, ignore bit 31, as it is the exclusive bit and not part of the stream id. Otherwise, whoever sends a PRIORITY frame with its own id and the exclusive bit set will not be considered in error, as it should be per the RFC. The impact is basically non-existent since we don't use PRIORITY frames, it's only that we would ignore such an invalid frame instead of breaking the connection. The bug was introduced in 1.9 with commit `92153fccd3` ("BUG/MINOR: h2: properly check PRIORITY frames") so the fix must be backported to all versions.	2026-04-23 08:09:48 +02:00
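A minimal sketch of the masking described in the commit above, assuming the standard PRIORITY payload layout (1 exclusive bit followed by a 31-bit stream dependency, then a weight byte). The helper names are hypothetical and this is not HAProxy's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lower 31 bits carry the stream id; bit 31 is the exclusive flag. */
#define DEMO_H2_STREAM_ID_MASK 0x7fffffffU

/* Read a 32-bit big-endian integer from a byte buffer. */
static uint32_t demo_read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns true when the PRIORITY payload makes stream <sid> depend on
 * itself, whatever the value of the exclusive bit.
 */
static bool demo_priority_depends_on_self(const uint8_t *payload, uint32_t sid)
{
    uint32_t dep = demo_read_be32(payload) & DEMO_H2_STREAM_ID_MASK;

    return dep == sid;
}
```

Without the mask, a dependency of 0x80000005 would compare unequal to stream 5 and the self-dependency would go unnoticed.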
Olivier Houchard	915a58c3c1	BUG/MINOR: h2: make tune.h2.log-errors actually work Commit `e67e36c9eb` introduced tune.h2.log-errors, that would let you pick if you wanted to know about stream errors, connection errors, or no error. However, a logic error made it so no error will be picked for any value except for "none", in which case connection would be picked. Fix that by just checking the strcmp() return value correctly. This should be backported wherever `e67e36c9eb` has been backported.	2026-04-23 08:04:43 +02:00
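The kind of strcmp() mistake mentioned in the commit above is easy to reproduce. The sketch below is hypothetical: only "none" is quoted in the commit message, so the "connection" and "stream" keywords, the enum and the function name are assumptions made for illustration.

```c
#include <string.h>

enum demo_log_errors {
    DEMO_LOG_NONE,
    DEMO_LOG_CONN,
    DEMO_LOG_STREAM,
    DEMO_LOG_INVALID
};

/* strcmp() returns 0 when both strings are equal, so writing
 * "if (strcmp(arg, "none"))" matches every value EXCEPT "none".
 * Comparing the result to 0 explicitly avoids the inversion.
 */
static enum demo_log_errors demo_parse_log_errors(const char *arg)
{
    if (strcmp(arg, "none") == 0)
        return DEMO_LOG_NONE;
    if (strcmp(arg, "connection") == 0)
        return DEMO_LOG_CONN;
    if (strcmp(arg, "stream") == 0)
        return DEMO_LOG_STREAM;
    return DEMO_LOG_INVALID;
}
```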
Willy Tarreau	8f7ee0a59f	BUG/MINOR: mux-h2: count a proto error when rejecting a stream on parsing error The proxy error counter was not updated in h2c_frt_handle_headers() in case of failure to decode a HEADERS frame. Make sure to keep it updated. This can be backported to all stable versions.	2026-04-22 15:57:20 +02:00
Willy Tarreau	c73a81469e	BUG/MINOR: mux-h2: count a protocol error when failing to parse a trailer Commit `aab1a60977` ("BUG/MEDIUM: h2/htx: always fail on too large trailers") explicitly returned an RST_STREAM on failure to decode some trailers, and used the code H2_ERR_INTERNAL_ERROR. However there are multiple possible causes for this failure to happen, and it turns out that it's much more likely to be related to a protocol error than a decompression error. So let's change this to PROTOCOL_ERROR, and count a protocol error on the proxy and in the session. This can be backported to all stable versions (with adjustments related to these versions, maybe focusing on 3.2 max is reasonable).	2026-04-22 15:57:20 +02:00
Willy Tarreau	90b2154d93	MEDIUM: muxes: always set conn->owner to the session that owns the connection When an idle connection is private or considered private, session_add_conn() is called to add it to the list of connections owned by the session. But in case of allocation failure, the session is not set, which results in a long list of possible situations that are all corner cases which are difficult to test (and debug). This commit relies on the fact that it is already permitted to have conn->owner pointing to a session even if the connection couldn't be added to the session's list, as this was already the case in conn_backend_get() when dealing with HOL_RISK. Also as seen in commit `3aab17bd56` added in 2.4, it is already possible to have conn->owner set with the connection not being in a list, and only the list element is checked for this. This commit modifies session_add_conn() to always set conn->owner, even if the list element couldn't be allocated. This way it's possible to always refer to conn->owner to find the session owning a private conn even in case of failure to allocate an entry. This requires to change the checks on conn->owner to a check of the list element to see if the connection belongs to a session, the pre-assignment of sess to conn->owner in conn_backend_get() is no longer needed, same for the pre-assignment in http_wait_for_response(), and that's all. The H1 mux remained unchanged because since it cannot multiplex, in case it fails to allocate a pconn, it instantly kills the connection.	2026-04-21 08:45:46 +02:00
Willy Tarreau	a0541f5d21	BUG/MEDIUM: mux-h2: ignore conn->owner when deciding if a connection is dead Originally, valid backend connections always used to have conn->owner pointing to the owner session. In 1.9, commit 93c885 enforced this when implementing backend H2 support by making sure that no orphaned connection was left on its own with no remaining stream able to handle it. Later, idle connections were reworked so that they were no longer necessarily attached to a stream, but could be directly in the server, accessed via a hash, so it started to become possible to have conn->owner left to NULL when picking such a connection. It in fact happens for http-reuse always, when the second stream picks the connection because its owner is NULL and it's not changed. More recently, a case was identified where it could be theoretically possible to reinsert a dead connection into an idle list, and commit `59c599f3f0` ("BUG/MEDIUM: mux-h2: make sure not to move a dead connection to idle") addressed that possibility in 3.3 by adding the h2c_is_dead() test in h2_detach() before deciding to reinsert a connection into the idle list. Unfortunately, the combination of changes above results in the following sequence being possible: - a stream requires a connection, connect_server() creates one, sets conn->owner to the session, then when the session is being set up, the SSL stack calls conn_create_mux() which gets the session from conn->owner, passes it to mux->init() (h2_init), which in turn creates the backend stream and assigns it this session. - when the stream ends, it detaches (h2_detach), and the call to h2c_is_dead() returns false because h2c->conn->owner is set. The connection is thus added into the server's idle list. - a new stream comes, it finds the connection in the server's list, which doesn't require to set conn->owner, the stream is added via h2_attach() which passes the stream's session, and that one is properly set on h2s again, but never on conn->owner. 
- the stream finishes, detaches, and this time the call to h2c_is_dead() sees the owner is NULL, thus indicates that the connection seems dead so it's not added again to the idle list, and it's destroyed. Note that this mostly happens at low loads (at most one active stream per connection, so typically at most one active stream per thread), where the H2 reuse ratio on a server configured with http-reuse always or http-reuse aggressive is close to 50%. At high loads, this is much more rare, though looking at the reuse stats for a server, it's visible that a sustained load still shows around 1% of the connections being periodically renewed. Interestingly, for RHTTP the impact is more important because there was already a work around for this test in h2c_is_dead() but it uses conn_is_reverse(), which is never correct in this case (it should be called conn_to_reverse() because it says the conn must be reversed and has not yet been), so this extra test doesn't protect against the NULL check, and connections are closed after each stream is terminated (if there is no other stream left). After a long analysis with Amaury and Olivier, it was concluded that: - the h2c_is_dead() addition is finally not the best solution and could be refined, however in the current state it's a bit tricky. - the conn->owner test in h2c_is_dead() is no longer relevant, probably since 2.4 when connections were stored using hash_nodes in the servers and would no longer depend on a session, so that test should be removed. - the test conn_is_reverse() on the same line, that was added to ignore the former for RHTTP, and which doesn't properly work either should be removed as well. Some further cleanups should be performed to clarify this situation. This patch implements the points above, and it should be backported wherever commit `59c599f3f0` was backported.	2026-04-16 18:27:15 +02:00
Willy Tarreau	e375f1061a	MINOR: mux-h2: report glitches on early RST_STREAM We leverage the SE_FL_APP_STARTED flag to detect whether the application layer had a chance to run or not when an RST_STREAM is received. This allows us to triage RST_STREAM between regular ones and harmful ones, and to count glitches for them. It proves extremely effective at detecting fast HEADERS+RST pairs. It could be useful to backport it to 3.2, though it depends on these two previous patches to be backported first (the first one was already planned and the second one is harmless, though will require to drop the haterm changes): BUG/MINOR: stconn: Always declare the SC created from healthchecks as a back SC MINOR: stconn: flag the stream endpoint descriptor when the app has started	2026-03-30 16:32:21 +02:00
Ilia Shipitsin	b7d1c2f91d	CLEANUP: fix typos and spelling in comments and documentation Corrected multiple spelling mistakes across CLI scripts, documentation, and source comments (e.g. "Specifiy" → "Specify", "explicitely" → "explicitly", "transfert" → "transfer", "resetted" → "reset", etc.). These changes improve readability and consistency without altering functionality.	2026-03-30 09:24:19 +02:00
Christopher Faulet	cd363e0246	MEDIUM: mux-h2: Stop dealing with HTX flags transfer in h2_rcv_buf() In h2_rcv_buf(), HTX flags are transferred with data when htx_xfer() is called. There is no reason to continue to deal with them in the H2 mux. In addition, there is no reason to set SE_FL_EOI flag when a parsing error was reported. This part was added before the stconn era. Nowadays, when an HTX parsing error is reported, an error on the sedesc should also be reported.	2026-03-23 14:02:43 +01:00
Christopher Faulet	d257dd4563	Revert "BUG/MEDIUM: mux-h2: make sure to always report pending errors to the stream" This reverts commit `44932b6c41`. The patch above was only necessary to handle partial headers or trailers parsing. There was nothing to prevent the H2 multiplexer to start to add headers or trailers in an HTX message and to stop the processing on error, leaving the HTX message with no EOH/EOT block. From the HTX API point of view, it is unexpected. And this was fixed thanks to the commit `ba7dc46a9` ("BUG/MINOR: h2/h3: Never insert partial headers/trailers in an HTX message"). So this patch can be reverted. It is important to not report a parsing error too early, when there are still data to transfer to the upper layer. This patch must be backported where `44932b6c4` was backported but only after backporting `ba7dc46a9` first.	2026-03-23 14:02:43 +01:00
Christopher Faulet	39121ceca6	MEDIUM: tree-wide: Rely on htx_xfer() instead of htx_xfer_blks() htx_xfer() function replaced htx_xfer_blks(). So let's use it.	2026-03-23 14:02:43 +01:00
Willy Tarreau	5d0f5f8168	MINOR: mux-h2: assign a limited frames processing budget This introduces 3 new settings: tune.h2.be.max-frames-at-once and tune.h2.fe.max-frames-at-once, which limit the number of frames that will be processed at once for backend and frontend side respectively, and tune.h2.fe.max-rst-at-once which limits the number of RST_STREAM frames processed at once on the frontend. We can now yield when reading too many frames at once, which allows to limit the latency caused by processing too many frames in large buffers. However if we stop due to the RST budget being depleted, it's most likely the sign of a protocol abuse, so we make the tasklet go to BULK since the goal is to punish it. By limiting the number of RST per loop to 1, the SSL response time drops from 95ms to 1.6ms during an H2 RST flood attack, and the maximum SSL connection rate drops from 35.5k to 28.0k instead of 11.8k. A moderate SSL load that shows 1ms response time and 23kcps increases to 2ms with 15kcps versus 95ms and 800cps before. The average loop time goes down from 270-280us to 160us, while still doubling the attack absorption rate with the same CPU capacity. This patch may usefully be backported to 3.3 and 3.2. Note that to be effective, this relies on the following patches: MEDIUM: sched: do not run a same task multiple times in series MINOR: sched: do not requeue a tasklet into the current queue MINOR: sched: do not punish self-waking tasklets anymore MEDIUM: sched: do not punish self-waking tasklets if TASK_WOKEN_ANY MEDIUM: sched: change scheduler budgets to lower TL_BULK	2026-03-23 07:14:22 +01:00
Willy Tarreau	932d77e287	MINOR: mux-h2: permit to fix a minimum value for the advertised streams limit When using rq-load on tune.h2.fe.max-concurrent-streams, it's easy to reach a situation where only one stream is allowed. There's nothing wrong with this but it turns out that slightly higher values do not necessarily cause significantly higher loads and will improve the user experience. For this reason the keyword now also supports "min" to specify a value. Experimentation shows that values from 5 to 15 remain very effective at protecting the run queue while allowing a great level of parallelism that keeps a site fluid.	2026-03-19 16:24:32 +01:00
Willy Tarreau	c238965b27	MINOR: mux-h2: permit to moderate the advertised streams limit depending on load Global setting tune.h2.fe.max-concurrent-streams now supports an optional "rq-load" option to pass either a target load, or a keyword among "auto" and "ignore". These are used to quadratically reduce the advertised streams limit when the thread's run queue size goes beyond the configured value, and automatically reduce the load on the process from new connections. With "auto", instead of taking an explicit value, it uses as a target the "tune.runqueue-depth" setting (which might be automatic). Tests have shown that values between 50 and 100 are already very effective at reducing the loads during attacks from 100000 to around 1500. By default, "ignore" is in effect, which means that the dynamic tuning is not enabled.	2026-03-19 16:24:31 +01:00
Willy Tarreau	b63492e4f4	MINOR: mux-h2: store the concurrent streams hard limit in the h2c The hard limit on the number of concurrent streams is currently determined only by configuration and returned by h2c_max_concurrent_streams(). However this doesn't permit to change such settings on the fly without risking to break connections, and it doesn't allow a connection to pick a different value, which could be desirable for example to try to slow abuse down. Let's store a copy of h2c_max_concurrent_streams() at connection creation time into the h2c as streams_hard_limit. This inflates the h2c size from 1324 to 1328 (0.3%) which is acceptable for the expected benefits.	2026-03-19 16:24:31 +01:00
Willy Tarreau	e31640368a	BUG/MINOR: mux-h2: properly ignore R bit in WINDOW_UPDATE increments The window size increments are 31 bits and the topmost bit is reserved and should be ignored, however it was not masked, so a peer sending it set would emit a negative value which could actually reduce the current window instead of increasing it. Note that the window cannot reach zero as there's already a test for this, but transfers could slow down to the same speed as if an initial window of just a few bytes had been advertised. Let's just mask the reserved bit before processing. This should be backported to all stable versions.	2026-03-19 07:21:47 +01:00
Willy Tarreau	0e231bbd7c	BUG/MINOR: mux-h2: properly ignore R bit in GOAWAY stream ID The stream ID indicated in GOAWAY frames must have its bit 31 (R) ignored and this wasn't the case. The effect is that if this bit was present, the GOAWAY frame would mark the last acceptable stream as negative, which is the default situation (unlimited), thus would basically result in this GOAWAY frame to be ignored since it would replace a negative last_sid with another negative one. The impact is thus basically that if a peer would emit anything non-zero in the R bit, the GOAWAY frame would be ignored and new streams would still be initiated on the backend, before being rejected by the server. Thanks to Haruto Kimura (Stella) for finding and reporting this bug. This fix needs to be backported to all stable versions.	2026-03-19 07:11:54 +01:00
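Both reserved-bit fixes above (WINDOW_UPDATE increment and GOAWAY last stream ID) come down to masking bit 31 before interpreting the 32-bit field. The sketch below illustrates this; the helper names are hypothetical and the code is not taken from HAProxy.

```c
#include <stdint.h>

/* Bit 31 (R) is reserved in both fields and must be ignored. */
#define DEMO_H2_R_BIT_MASK 0x7fffffffU

/* Read a 32-bit big-endian integer from a byte buffer. */
static uint32_t demo_get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Window size increment of a WINDOW_UPDATE frame. Without the mask, a
 * peer setting the R bit would typically yield a negative value once
 * stored in a signed 32-bit integer, shrinking the window.
 */
static int32_t demo_window_increment(const uint8_t *payload)
{
    return (int32_t)(demo_get_be32(payload) & DEMO_H2_R_BIT_MASK);
}

/* Last stream ID of a GOAWAY frame. Without the mask, a set R bit
 * would look like a negative (i.e. "unlimited") last stream ID.
 */
static int32_t demo_goaway_last_sid(const uint8_t *payload)
{
    return (int32_t)(demo_get_be32(payload) & DEMO_H2_R_BIT_MASK);
}
```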
Willy Tarreau	ec7b07b650	MINOR: connection: track mux calls to report their allocation context Most calls to mux ops were instrumented with a CALL_MUX_WITH_RET() or CALL_MUX_NO_RET() macro in order to make the current thread's context point to the called mux and be able to track its allocations. Only a bunch of harmless mux_ctl() and ->subscribe/unsubscribe calls were left untouched since useless. But destroy/detach/shut/init/snd_buf and rcv_buf are now tracked. It will not show allocations performed in IO callback via tasklet wakeups however. In order to ease reading of the output, cmp_memprof_ctx() knows about muxes and sorts based on the .subscribe function address instead of the mux_ops address so as to keep various callers grouped.	2026-03-12 18:06:38 +01:00
Christopher Faulet	64d997ebfc	MAJOR: muxes: No longer use app_ops .wake() callback function from muxes Thanks to previous commits, it is now possible to wake the data layer up, via a tasklet_wakeup, instead of using the app_ops .wake() callback function. When a data layer must be notified of a mux event (an error for instance), we now always perform a tasklet_wakeup(). TASK_WOKEN_MSG state is used by default. TASK_WOKEN_IO is eventually added if the data layer was subscribed to receives or sends. Changes are not trivial at all. We replaced a synchronous call to the sc_conn_process() function by a tasklet_wakeup().	2026-03-10 15:10:34 +01:00
Christopher Faulet	26a0817c1a	MINOR: muxes: Wake up the data layer from a mux stream with TASK_WOKEN_IO state Now, when a mux stream is waking its data layer up for receives or sends, it uses the TASK_WOKEN_IO state. The state is not used by the stconn I/O callback function for now.	2026-03-10 15:10:34 +01:00
Christopher Faulet	aea0d38fdd	MINOR: mux-h2: Rely on h2s_notify_send() when resuming h2s for sending In h2_resume_each_sending_h2s(), there was exactly the same code as in h2s_notify_send(). So let's use h2s_notify_send() instead of duplicating code.	2026-03-10 15:10:34 +01:00
Willy Tarreau	9019a5db93	MEDIUM: counters: return aggregate extra counters in ->fill_stats() Now thanks to new macro EXTRA_COUNTERS_AGGR() we can iterate over all thread groups storages when returning the data for a given metric. This remains convenient and mostly transparent. The caller continues to pass the pointer to the metric in the first group, and offsets are calculated for all other groups and data summed. For now all groups except the first one contain only zeroes but reported values are nevertheless correct.	2026-02-26 17:03:53 +01:00
Willy Tarreau	de0eddf512	MINOR: counters: add EXTRA_COUNTERS_BASE() to retrieve extra_counters base storage The goal is to always retrieve the storage address of the first thread group for the given module. This will be used to iterate over all thread groups. For now it returns the same value as EXTRA_COUNTERS_GET().	2026-02-26 17:03:53 +01:00
Willy Tarreau	8dd22a62a4	CLEANUP: counters: only retrieve zeroes for unallocated extra_counters Since version 2.4 with commit `7f8f6cb926` ("BUG/MEDIUM: stats: prevent crash if counters not alloc with dummy one") we can afford to always update extra_counters because we know they're always either allocated or linked to a dedicated trash. However, the ->fill_stats() callbacks continue to access such values, making it technically possible to retrieve random counters from this trash, which is not really clean. Let's implement an explicit test in the ->fill_stats() functions to only return 0 for the metric when not allocated like this. It's much cleaner because it guarantees that we're returning an empty counter in this case rather than random values. The situation currently happens for dummy servers like the ones used in Lua proxies as well as those used by rings (e.g. used for logging or traces). Normally, none of the objects retrieved via stats or Prometheus is concerned by this unallocated extra_counters situation, so this is more about a cleanup than a real fix.	2026-02-26 08:24:03 +01:00
Willy Tarreau	95a9f472d2	MEDIUM: counters: change the fill_stats() API to pass the module and extra_counters We'll soon need to iterate over thread groups in the fill_stats() functions, so let's first pass the extra_counters and stats_module pointers to the fill_stats functions. They now call EXTRA_COUNTERS_GET() themselves with these elements in order to retrieve the required pointer. Nothing else changed, and it's getting even a bit more transparent for callers. This doesn't change anything visible however.	2026-02-26 08:24:03 +01:00
Willy Tarreau	44932b6c41	BUG/MEDIUM: mux-h2: make sure to always report pending errors to the stream Some stream parsing errors that do not affect the connection result in the parsed block not being transferred from the rx buffer to the channel and not being reported upstream in rcv_buf(), causing the stconn to time out. Let's detect this condition, and propagate term flags anyway since no more progress will be made otherwise. This should be backported at least till 3.2, probably even 2.8.	2026-02-26 00:30:42 +01:00
Willy Tarreau	e67e36c9eb	MINOR: mux-h2: add a new setting, "tune.h2.log-errors" to tweak error logging The H2 mux currently logs whenever some decoding fails. Most of the errors happen at the connection level, but some are even at the stream level, meaning that multiple logs can be emitted for a given connection, which can quickly use some resource for little value. This new setting allows to tweak this and decide to only log errors that affect the connection, or even none at all. This should be backported at least as far as 3.2.	2026-02-25 22:43:40 +01:00
Willy Tarreau	cad6e0b3da	MINOR: mux-h2: also count glitches on invalid trailers Two cases were not causing glitches to be incremented: - invalid trailers - trailers on closed streams This patch addresses this. It could be backported, at least to 3.2.	2026-02-25 22:03:16 +01:00
Christopher Faulet	36282ae348	MEDIUM: mux-h1/mux-h2/mux-fcgi/h3: Disable 0-copy for buffers of different size Today, it is useless to check the buffers size before performing a 0-copy in muxes when data are sent, but it will be mandatory when the large buffers support on channels will be added. Indeed, muxes will still rely on normal buffers, so we must take care to never swap buffers of different size.	2026-02-18 13:26:21 +01:00
Christopher Faulet	6bf450b7fe	MINOR: tree-wide: Use the buffer size instead of global setting when possible At many places, we rely on global.tune.bufsize value instead of using the buffer size. For now, it is not a problem. But if we want to be able to deal with buffers of different sizes, it is good to reduce as far as possible dependencies on the global value. Most of the time, we can use b_size() or c_size() functions. The main change is performed on the error snapshot where the buffer size was added into the error_snapshot structure.	2026-02-18 13:26:20 +01:00
Christopher Faulet	cda056b9f4	BUG/MEDIUM: mux-h2/quic: Stop sending via fast-forward if stream is closed It is illegal to send data if the stream is already closed. The case is properly handled when data are sent via snd_buf(), by draining the data. But it was still possible to process these data via nego_ff(). So, in this patch, both for the H2 and QUIC multiplexers, the fast-forward is disabled if the stream is closed and nothing is performed. Doing so, we will automatically fall back on the regular sending path and be able to drain data in snd_buf(). Thanks to Mike Walker for his investigation on the subject. This patch should be backported as far as 3.0.	2026-02-18 09:44:09 +01:00
Ilia Shipitsin	f8a77ecf62	CLEANUP: assorted typo fixes in the code, commits and doc	2025-12-25 19:45:29 +01:00
Willy Tarreau	0901f60cef	MINOR: mux-h2: perform a graceful close at 75% glitches threshold This avoids hitting the hard wall for connections with non-compliant peers that would be accumulating errors over long connections. We now permit to recycle the connection early enough to reset the connection counter. This was tested artificially by adding this to h2c_frt_handle_headers(): h2c_report_glitch(h2c, 1, "new stream"); or this to h2_detach(): h2c_report_glitch(h2c, 1, "detaching"); and injecting using h2load -c 1 -n 1000 0:4445 on a config featuring tune.h2.fe.glitches-threshold 1000: finished in 8.74ms, 85802.54 req/s, 686.62MB/s requests: 1000 total, 751 started, 751 done, 750 succeeded, 250 failed, 250 errored, 0 timeout status codes: 750 2xx, 0 3xx, 0 4xx, 0 5xx traffic: 6.00MB (6293303) total, 132.57KB (135750) headers (space savings 29.84%), 5.86MB (6144000) data min max mean sd +/- sd time for request: 9us 178us 10us 6us 99.47% time for connect: 139us 139us 139us 0us 100.00% time to 1st byte: 339us 339us 339us 0us 100.00% req/s : 87477.70 87477.70 87477.70 0.00 100.00% The failures are due to h2load not supporting reconnection.	2025-12-20 19:26:29 +01:00
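The 75% rule from the commit above can be illustrated with a small decision helper. This is a sketch under assumptions: the enum, the function name, the ">=" comparison at the hard limit and the convention that a threshold of 0 disables the check are all hypothetical and not confirmed by the commit message.

```c
enum demo_glitch_action {
    DEMO_KEEP,           /* nothing to do */
    DEMO_GRACEFUL_CLOSE, /* stop accepting new streams, let peer reconnect */
    DEMO_KILL            /* hard limit reached */
};

/* Decide what to do with a connection given its glitch counter. */
static enum demo_glitch_action demo_glitch_decide(unsigned int glitches,
                                                  unsigned int threshold)
{
    if (threshold == 0)
        return DEMO_KEEP; /* assumption: 0 means "no threshold" */
    if (glitches >= threshold)
        return DEMO_KILL;
    /* 75% of the threshold, computed in 64 bits to avoid overflow */
    if ((unsigned long long)glitches * 4 >= (unsigned long long)threshold * 3)
        return DEMO_GRACEFUL_CLOSE;
    return DEMO_KEEP;
}
```

With a threshold of 1000, the graceful close would start at 750 glitches, which is consistent with the 750 successful requests reported by h2load in the test above.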
Willy Tarreau	52adeef7e1	MINOR: mux-h2: add missing glitch count for non-decodable H2 headers One rare error case could produce a protocol error on the stream when not being able to decode response headers wasn't being accounted as a glitch, so let's fix it.	2025-12-20 19:11:16 +01:00
Willy Tarreau	9a046fc3ad	BUG/MEDIUM: mux-h2: synchronize all conditions to create a new backend stream In H2 the conditions to create a new stream differ for a client and a server when a GOAWAY was exchanged. While on the server, any stream whose ID is lower than or equal to the one advertised in GOAWAY is valid, for a client it's forbidden to create any stream after receipt of a GOAWAY, even if its ID is lower than or equal to the last one, despite the server not being able to tell the difference from the number of streams in flight. Unfortunately, the logic in the code did not always reflect this specificity of the client (the backend code in our case), and most often considered that it was still permitted to create a new stream until the max_id was greater than or equal to the advertised last_id. This is for example what h2c_is_dead() and h2c_streams_left() do. In other places, such as h2_avail_streams(), the rule is properly taken into account. Very often the advertised last_id is the same, and this is also what haproxy does (which explains why it's impossible to reproduce the issue by chaining two haproxy layers), but a server may wish to advertise any ID including 2^31-1 as mentioned in the spec, and in this case the functions would behave differently. This discrepancy results in a corner case where a GOAWAY received on an idle connection will cause the next stream creation to be initially accepted but then rejected via h2_avail_streams(), and the connection left in a bad state, still attached to the session due to http-reuse safe, but not reinserted into idle list, since the backend code currently is not able to properly recover from this situation. Worse, the idle flags are no longer on it but TASK_F_USR1 still is, and this makes the recently added BUG_ON() rightfully trigger since this case is not supposed to happen. 
Admittedly more of the backend recovery code needs to be reworked, however the mux must consistently decide whether or not a connection may be reused or needs to be released. This commit fixes the affected logic by introducing a new function "h2c_reached_last_stream()" which says if a connection has reached its last stream, regardless of the side, and using this one everywhere max_id was compared to last_id. This is sufficient to address the corner case that be_reuse_connection() currently cannot recover from. This is in relation to GH issue #3215 and it should be sufficient to fix the issue there. Thanks to Chris Staite for reporting the issue and kudos to Amaury for spotting the events sequence that can lead to this situation. This patch must be backported to 3.3 first, then to older versions later. It's worth noting that it's much more difficult to observe the issue before 3.3 because the BUG_ON() is not there, and the possibly non-released connection might end up being killed for other reasons (timeouts etc). But one possible visible effect might be the impossibility to delete a server (which Chris observed in 3.3).	2025-12-18 17:01:32 +01:00
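The rule that the new h2c_reached_last_stream() function is said to centralize can be sketched as below. This is not the actual function: the name, the parameters and the use of a negative last_sid to mean "no GOAWAY received" are assumptions derived from the descriptions in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when no new stream may be created on the connection.
 *  - is_backend: we act as the HTTP/2 client on this connection
 *  - last_sid:   last stream ID advertised in GOAWAY, negative if none
 *  - max_id:     highest stream ID seen so far on the connection
 */
static bool demo_reached_last_stream(bool is_backend, int32_t last_sid,
                                     int32_t max_id)
{
    if (last_sid < 0)
        return false; /* no GOAWAY exchanged yet */
    if (is_backend)
        return true;  /* a client must not open streams after GOAWAY */
    return max_id >= last_sid;
}
```

The backend case returns true even when the server advertised 2^31-1 as the last ID, which is the situation where comparing max_id to last_id alone gave the wrong answer.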
Willy Tarreau	3ec5818807	MINOR: h2/trace: emit a trace of the received RST_STREAM type Right now we don't get any state trace when receiving an RST_STREAM, and this is not convenient because RST_STREAM(0) is not visible at all, except in developer level because the function is entered and left. Let's extract the RST code first and always log it using TRACE_PRINTF() (along with h2c/h2s) so that it's possible to detect certain codes being used.	2025-12-10 15:58:56 +01:00
Christopher Faulet	8e08a635eb	MINOR: muxes: Support an optional ALPN string when defining mux protocols When a multiplexer protocol is defined, it is now possible to specify the ALPN it supports, in binary format. This info is optional. For now only the h2 and h1 multiplexers define an ALPN because this will be mandatory for a fix. But this could be used in the future for different purposes. This patch will be mandatory for the next fix.	2025-11-20 16:14:52 +01:00
Willy Tarreau	4a6dec7193	DEBUG: servers: add a few checks for stress-testing idle conns The latest idle conns fix `9481cef948` ("BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list") addresses a very hard-to-hit case which manifests itself as a crash when an attempt to reuse a connection fails because conn->mux is NULL: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000655410b8642c in conn_backend_get (reuse_mode=4, srv=srv@entry=0x6554378a7140, sess=sess@entry=0x7cfe140948a0, is_safe=is_safe@entry=0, hash=hash@entry=910818338996668161) at src/backend.c:1390 1390 if (conn->mux->takeover && conn->mux->takeover(conn, i, 0) == 0) { However the condition that leads to this situation can be detected earlier, by the presence of the connection in the toremove_list, whose race window is much larger and easier to detect. This patch adds a few BUG_ON_STRESS() at selected places that can detect this condition. When built with -DDEBUG_STRESS and run under stress with two distinct processes communicating over H2 over SSL, under a stress of 400-500k req/s, the front process usually crashes in the first 10-30s, triggering in _srv_add_idle() if the fix above is reverted (and it does not crash with the fix). This is mainly included to serve as an illustration of how to instrument the code for seamless stress testing.	2025-11-14 17:00:17 +01:00
Amaury Denoyelle	9481cef948	BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list A recent patch was introduced to fix a rare race condition in the idle connection code which would result in a crash. The issue occurs when a MUX IO handler runs on top of a connection that was moved to the purgeable list. The connection would be considered as present in the idle list instead, and reinserted in it at the end of the handler while still in the purge list. `096999ee20` BUG/MEDIUM: connections: permit to permanently remove an idle conn This patch solves the described issue. However, it introduces another bug as it may clear connection flags when removing a connection from its parent list, whereas these flags now serve primarily as a status which indicates that the connection is accounted for by the server. When a backend connection is freed, server idle/used counters are decremented according to these flags. With the above patch, an incorrect counter could be adjusted and thus wrapping would occur. The first impact of this bug is that it may distort the estimated number of connections needed by servers, which would result either in a poor reuse rate or in too many idle connections being kept. Another noticeable impact is that it may prevent server deletion. The main problem of the original and current issues is that connection flags are misinterpreted as telling if a connection is present in the idle list. As already described here, these flags are in fact solely a status which indicates that the connection is accounted for in server counters. 
Thus, here are the definitive conclusions that can be drawn: * (conn->flags & CO_FL_LIST_MASK) != 0: the connection is accounted for by the server; it may or may not be present in the idle list * (conn->flags & CO_FL_LIST_MASK) == 0: the connection is not accounted for and not present in the idle list The discussion above does not mention the session list, but a similar pattern can be observed when the CO_FL_SESS_IDLE flag is set. To keep the original issue solved and fix the current one, the prologues of the IO MUX handlers are rewritten. Now, flags are no longer checked for list membership and the LIST_INLIST macro is used instead. This is definitely clearer with regard to the purpose of conn_in_list here. At the end of IO MUX handlers, conn idle flags may be checked if conn_in_list was true, to reinsert the connection either in the idle or the safe list. This is considered safe as no function should modify idle flags when a connection is not stored in a list, except during the conn_free() operation. This patch must be backported to every stable version after the revert of the above commit. It should be applicable up to 3.0 without any issue. On 2.8 and below, the <idle_list> connection member does not exist. It should be safe to check the <leaf_p> tree node as a replacement.	2025-11-14 16:06:34 +01:00
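The handler pattern described in commit 9481cef948 can be sketched as follows. This is a simplified illustration, not HAProxy code: every sk_ name and SK_ macro is a hypothetical stand-in, the idle and safe lists are collapsed into a single list head, and locking is omitted. The point shown is that list membership is read from the list element itself, while the flags are only consulted at the end to decide on reinsertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CO_FL_LIST_MASK bits. */
#define SK_FL_IDLE_LIST 0x1u
#define SK_FL_SAFE_LIST 0x2u
#define SK_FL_LIST_MASK (SK_FL_IDLE_LIST | SK_FL_SAFE_LIST)

/* Minimal circular doubly-linked list; an element linked to itself is detached. */
struct sk_list { struct sk_list *n, *p; };

struct sk_conn {
    unsigned int   flags;      /* accounting status, NOT list membership */
    struct sk_list idle_list;  /* element used to attach to the idle list */
};

void sk_list_init(struct sk_list *l) { l->n = l->p = l; }

bool sk_list_inlist(const struct sk_list *l) { return l->n != l; }

void sk_list_append(struct sk_list *head, struct sk_list *el)
{
    el->p = head->p;
    el->n = head;
    head->p->n = el;
    head->p = el;
}

void sk_list_del_init(struct sk_list *el)
{
    el->n->p = el->p;
    el->p->n = el->n;
    sk_list_init(el);
}

/* Returns whether the connection was found in the list on entry. */
bool sk_io_handler(struct sk_conn *conn, struct sk_list *idle_head)
{
    assert(conn != NULL && idle_head != NULL);

    /* prologue: membership comes from the list element, not from the flags */
    bool conn_in_list = sk_list_inlist(&conn->idle_list);

    if (conn_in_list)
        sk_list_del_init(&conn->idle_list);

    /* ... the I/O work would happen here ... */

    /* epilogue: reinsert only a connection that was really in the list */
    if (conn_in_list && (conn->flags & SK_FL_LIST_MASK))
        sk_list_append(idle_head, &conn->idle_list);

    return conn_in_list;
}
```

A connection that still carries its accounting flags but was already detached (for example because it was moved to a purge list) is left alone by this handler, so it cannot be put back where another user could find it.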
Amaury Denoyelle	d79295d89b	Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn" The target patch fixes a rare race condition which happens when a MUX IO handler is working on a connection already moved into the purge list. In this case, the handler will incorrectly move the connection back into the idle list. To fix this, conn_delete_from_tree() was extended to remove flags along with the connection from the idle list. This was performed when the connection is moved into the purge list. However, it introduces another issue related to the idle server connection accounting. Thus it is necessary to revert it prior to the incoming newer fix. This patch must be backported to every version where the original commit is.	2025-11-14 16:06:34 +01:00
Willy Tarreau	5fe4677231	MINOR: server: move the lock inside srv_add_idle() Almost all callers of _srv_add_idle() lock the list then call the function. It's not the most efficient and it requires some care from the caller to take care of that lock. Let's change this a little bit by having srv_add_idle() that takes the lock and calls _srv_add_idle() that is now inlined. This way callers don't have to handle the lock themselves anymore, and the lock is only taken around the sensitive parts, not the function call+return. Interestingly, perf tests show a small perf increase from 2.28-2.32M RPS to 2.32-2.37M RPS on a 128-thread system.	2025-11-06 13:16:24 +01:00
Willy Tarreau	096999ee20	BUG/MEDIUM: connections: permit to permanently remove an idle conn There's currently a function conn_delete_from_tree() which is used to detach an idle connection from the tree it's currently attached to so that it is no longer found. This function is used in three circumstances: - when picking a new connection that no longer has any avail stream - when temporarily working on the connection from an I/O handler, in which case it's re-added at the end - when killing a connection The 2nd case above is quite specific, as it requires to preserve the CO_FL_LIST_MASK flags so that the connection can be re-inserted into the proper tree when leaving the handler. However, there's a catch. When killing a connection, we want to be certain it will not be reinserted into the tree. The flags preservation is causing a tiny race if an I/O happens while the connection is in the kill list, because in this case the I/O handler will note the connection flags, do its work, then reinsert the connection where it believed it was, then the connection gets purged, and another user can find it in the tree. The issue is very difficult to reproduce. On a 128-thread machine it happens in H2 around 500k req/s after around 50M requests. In H1 it happens after around 1 billion requests. The fix here consists in passing an extra argument to the function to indicate if the removal is permanent or not. When it's permanent, the function will clear the associated flags. The callers were adjusted so that all those dequeuing a connection in order to kill it do it permanently and all other ones do it only temporarily. A slightly different approach could have worked: the function could always remove all flags, and the callers would need to restore them. But this would require trickier modifications of the various call places, compared to only passing 0/1 to indicate the permanent status. This will need to be backported to all stable versions. 
The issue was at least reproduced since 3.1 (not tested before). The patch will need to be adjusted for 3.2 and older, because a 2nd argument "thr" was added in 3.3, so the patch will not apply to older versions as-is.	2025-11-05 11:08:25 +01:00
Willy Tarreau	59c599f3f0	BUG/MEDIUM: mux-h2: make sure not to move a dead connection to idle In h2_detach(), it looks possible to place a dead connection back to the idle list, and to later call h2_release() on it once detected as dead. It's not certain that it happens but nothing in the code shows it is not possible, so better make sure it cannot happen. This should be preventively backported to all versions.	2025-11-05 11:08:25 +01:00
Willy Tarreau	a1f26ca307	BUG/MINOR: mux-h2: send the preface along with the first request if needed Tests involving 0-RTT and H2 on the backend show that 0-RTT is being partially used but does not work. The analysis shows that only the preface and settings are sent using early-data and the request is sent separately. As explained in the previous patch, this is caused by the fact that a wakeup of the iocb is needed just to send the preface, then a new call to process_stream is needed to try sending again. Here with this patch, we're making h2_snd_buf() able to send the preface if it was not yet sent. Thanks to this, the preface, settings and first request can now leave as a single TCP segment. In case of TLS with 0-RTT, it now allows all the block to leave in early data. Even in clear-text H2, we're now seeing a 15% lower context-switch count, and the number of calls to process_stream() per connection dropped from 3 to 2. The connection rate increased by an extra 9.5%. Compared to without the last 3 patches, this is a 22% reduction of context-switches, 33% reduction of process_stream() calls, and 15.7% increase in connection rate. And more importantly, 0-RTT now really works with H2 on the backend, saving one full RTT on the first request. This fix is only for a missed optimization and a non-functional 0-RTT on the backend. It's worth backporting it, but it doesn't cause enough harm to hurry a backport. Better wait for it to live a little bit in 3.3 (till at least a week or two after the final release) before backporting it. It's not sure that it's worth going beyond 3.2 in any case. 
It depends on these two previous commits: MEDIUM: mux-h2: do not needlessly refrain from sending data early MINOR: mux-h2: extract the code to send preface+settings into its own function	2025-10-30 18:16:54 +01:00
Willy Tarreau	d5aa3e19cc	MINOR: mux-h2: extract the code to send preface+settings into its own function The code that deals with sending preface + settings and changing the state currently is in h2_process_mux(), but we'll want to do it as well from h2_snd_buf(), so let's move it to a dedicated function first. At this point there is no functional change.	2025-10-30 18:16:54 +01:00

1085 commits