Commit graph

21191 commits

Author SHA1 Message Date
Olivier Houchard
915a58c3c1 BUG/MINOR: h2: make tune.h2.log-errors actually work
Commit e67e36c9eb introduced
tune.h2.log-errors, that would let you pick if you wanted to know about
stream errors, connection errors, or no error.
However, a logic error made it so no error will be picked for any value
except for "none", in which case connection would be picked. Fix that by
just checking the strcmp() return value correctly.

This should be backported wherever e67e36c9eb
has been backported.
2026-04-23 08:04:43 +02:00
Emeric Brun
dbf471f99a BUG/MAJOR: net_helper: ip.fp infinite loop on malformed tcp options
Some checks are pending
Contrib / admin/halog/ (push) Waiting to run
Contrib / dev/flags/ (push) Waiting to run
Contrib / dev/haring/ (push) Waiting to run
Contrib / dev/hpack/ (push) Waiting to run
Contrib / dev/poll/ (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
A malformed tcp option with an option length set to 0 can cause
an infinite loop on ip.fp converter.

The patch also forces the computation to use an unsigned char to
avoid a shift back during the parsing.

This fix should be backported on all versions including the ip.fp
converter.
2026-04-22 16:52:30 +02:00
Willy Tarreau
8f7ee0a59f BUG/MINOR: mux-h2: count a proto error when rejecting a stream on parsing error
The proxy error counter was not updated in h2c_frt_handle_headers() in
case of failure to decode a HEADERS frame. Make sure to keep it updated.
This can be backported to all stable versions.
2026-04-22 15:57:20 +02:00
Willy Tarreau
c73a81469e BUG/MINOR: mux-h2: count a protocol error when failing to parse a trailer
Commit aab1a60977 ("BUG/MEDIUM: h2/htx: always fail on too large trailers")
explicitly returned an RST_STREAM on failure to decode some trailers, and
used the code H2_ERR_INTERNAL_ERROR. However there are multiple possible
causes for this failure to happen, and it turns out that it's much more
likely to be related to a protocol error than a decompression error. So
let's change this to PROTOCOL_ERROR, and count a protocol error on the
proxy and in the session.

This can be backported to all stable versions (with adjustments related
to these versions, maybe focusing on 3.2 max is reasonable).
2026-04-22 15:57:20 +02:00
Christopher Faulet
24e05fe33a MINOR: stream: Use a pcli transaction to replace pcli_* members
A new type of transaction was introduced for master-cli streams. So
SF_TXN_PCLI flag and functions to allocate and destroy PCLI transactions
were added.

In the stream structure, all pcli_* members were moved in the pcli
transaction and the txn union was updated accordingly.

When it was ambiguous, a test on the transaction type was performed. For
instance to destroy the transaciton.
2026-04-22 15:19:12 +02:00
Christopher Faulet
9d45929341 MINOR: stream: Add flags to identify the stream tansaction when allocated
To be able to deal with different types of transaction for a stream, new
stream flags was added to know the transaction type when allocated. For now
only HTTP transactions can be allocated, so only SF_TXN_HTTP was
introduced. The mask SF_TXN_MASK must be used to get the transaction type.

The transaction type is set when it is allocated and removed when it is
destroyed.
2026-04-22 15:19:12 +02:00
Christopher Faulet
594753238c MINOR: stream: Move the HTTP txn in an union
The HTTP transaction is moved in an union. For now, it is the only possible
transaction that can be allocated. But that will change. Thanks to this
commit and the next one, it will be possible to deal with different kind of
transactions for a stream.

This patch looks quite huge, but it is more or less a renaming of all
accesses to "txn" field by "txn.http".
2026-04-22 15:19:12 +02:00
Christopher Faulet
80776da297 MEDIUM: cli: increase the payload pattern up to 64 bytes
The maximum size allowed for the payload pattern was increase up to 64 bytes
(65 bytes because of the trailing \0), to be able to use a sha256 of random
data for instance. It could be useful to prevent any data smuggling on the
payload.

Note that on the CLI, it could be possible to have only the buffer size as a
limit, because the command line is only consumed once all commands are
executed. The payload pattern is only a pointer in the buffer where the
command line was copied. However, for the master CLI, the data are streamed
to the worker, so we must keep a copy of he payload pattern. This is why we
must limit its size.
2026-04-22 15:19:12 +02:00
Christopher Faulet
9b1f0a3553 MEDIUM: cli: Add support for dynamically allocated payloads
It is now possible to deal with too big payload to fit in a buffer, without
changing the buffer size. By default, a payload up to 128 KB can be
dynamically allocated. "tune.cli.max-payload-size" global parameter can be
used to change this value, with some caution for huge values.

For CLI command handler functions, there is no change at all. A pointer on
the payload is still passed as parameter. Internally, an area is allocated
for the payload only if it is too big.

The payload pattern used to detect the end of the payload is part from the
allocated area.
2026-04-22 15:19:12 +02:00
Christopher Faulet
c5ae0da622 MEDIUM: cli: Make a buffer for the command payload
The payload is now saved as a buffer in the CLI context instead of a simple
pointer. It is mandatory to be able to reallocate the payload if it is too
big.
2026-04-22 15:19:12 +02:00
Christopher Faulet
337a8dac62 MINOR: cli: Handle the paylod pattern as a pointer in the cmdline buffer
Instead of copying the payload pattern in the CLI context, we now only save
a pointer on this pattern. It is possible because the command line is copied
in the CLI context. Arguments are already handled this way when the command
is processed.
2026-04-22 15:19:12 +02:00
Willy Tarreau
2f61566b03 CLEANUP: mux-h1: remove the unneeded test on conn->owner in h1s_finish_detach()
Some checks are pending
Contrib / admin/halog/ (push) Waiting to run
Contrib / dev/flags/ (push) Waiting to run
Contrib / dev/haring/ (push) Waiting to run
Contrib / dev/hpack/ (push) Waiting to run
Contrib / dev/poll/ (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
There was a test below the "release" label on conn->owner to decide
whether to kill the connection or not. But this test is not needed,
because:
  - for frontends, it's always set so the test never matches
  - for backends, it was NULL on the second stream once a request
    was being reused from an idle pool, so it couldn't be used to
    discriminate between connections. In practice, the goal was to
    try to detect certain dead connections but all cases leading to
    such connections are either already handled in the tests before
    (which don't reach this label), or are handled by the other
    conditions.

Thus, let's remove this confusing test.
2026-04-21 08:45:46 +02:00
Willy Tarreau
2e26e427a2 CLEANUP: mux-h1: avoid using conn->owner in uncertain areas
Some places use conn->owner to retrieve the session. It's valid because
each time it is done, it's on the frontend, though it's not always 100%
obvious and sometimes requires deep code analysis. Let's clarify these
points and even rely on an intermediary variable to make it clearer. One
case where the owner couldn't differ from the session without being NULL
was also eliminated.
2026-04-21 08:45:46 +02:00
Willy Tarreau
d93c53b0df MEDIUM: session: always reset the conn->owner on backend when installing mux
When installing a mux on the backend, unless we have a good reason for
keeping the session set in conn->owner, we must reset it. Having the
session there just hides potential bugs and prevents certain tests from
being properly done.

Now it is much clearer: conn->owner remains set to the session on
frontend connections, is set to the session when the connection is
private or assimilated private and belongs to the session list, or
is NULL.
2026-04-21 08:45:46 +02:00
Willy Tarreau
90b2154d93 MEDIUM: muxes: always set conn->owner to the session that owns the connection
When an idle connection is private or considered private, session_add_conn()
is called to add it to the list of connections owned by the session. But
in case of allocation failure, the session is not set, which results in
a long list of possible situations that are all corner cases which are
difficult to test (and debug).

This commit relies on the fact that it is already permitted to have
conn->owner pointing to a session even if the connection couldn't be
added to the session's list, as this was already the case in
conn_backend_get() when dealing with HOL_RISK. Also as seen in commit
3aab17bd56 added in 2.4, it is already possible to have conn->owner
set with the connection not being in a list, and only the list element
is checked for this.

This commit modifies session_add_conn() to always set conn->onwer, even
if the list element couldn't be allocated. This way it's possible to
always refer to conn->owner to find the session owning a private conn
even in case of failure to allocate an entry. This requires to change
the checks on conn->owner to a check of the list element to see if the
connection belongs to a session, the pre-assignment of sess to
conn->owner in conn_backend_get() is no longer needed, same for the
pre-assignment in http_wait_for_response(), and that's all.

The H1 mux remained unchanged because since it cannot multiplex, in
case it fails to allocate a pconn, it instantly kills the connection.
2026-04-21 08:45:46 +02:00
Willy Tarreau
9141d87830 BUG/MINOR: sample: adjust dependencies for channel output bytes counters
The bytes_in, bytes_out, {req,res}.bytes_{in,out} sample fetch functions
are marked as internal dependencies only. But that's not exact, they are
statistics. Request traffic (bytes_in, req.bytes*) is usable starting
from the request, while response traffic (bytes_out, res.bytes*) is usable
as soon as a response begins to be received, and all are valid till the
end of the transaction.

The impact is that the log-format below:

  log-format "req.bytes_in=%[req.bytes_in] req.bytes_out=%[req.bytes_out] res.bytes_in=%[res.bytes_in] res.bytes_out=%[res.bytes_out]"

is emitted too early and only logs zeroes when uploading 1MB and
downloading 1MB:

  req.bytes_in=0 req.bytes_out=0 res.bytes_in=15288 res.bytes_out=0

This patch marks the request stats RQFIN and the response stats RSFIN,
so that they're valid at any moment and the logs backend knows it must
wait for the latest moment to emit such a line. With this change, the
line above now correctly produces:

  req.bytes_in=1000157 req.bytes_out=1000157 res.bytes_in=1048629 res.bytes_out=1048629

This should be backported as far as the latest LTS probably, along with
these 2 previous patches:

  BUG/MINOR: log: consider format expression dependencies to decide when to log
  MINOR: sample: make RQ/RS stats available everywhere
2026-04-21 08:01:07 +02:00
Willy Tarreau
6df10d0802 MINOR: sample: make RQ/RS stats available everywhere
Sample fetch functions working on the request/response stats were marked
as being only compatible with the log phase. This is a mistake because
by definitions, stats can be consulted anywhere from the moment they
start to appear. It's only that they are valid as far as the logs. At the
moment, no sample fetch function depends on RQFIN, and only res.timer.data
depends on RSFIN. But this will be needed to relax certain sample fetch
functions (and will need to be backported along with a few other patches).
2026-04-21 08:01:07 +02:00
Willy Tarreau
e51be30f78 BUG/MINOR: log: consider format expression dependencies to decide when to log
Log-format properly takes into account the LW_* flags set by the log
aliases, however its consideration for the sample fetch expressions is
very minimalistic (HTTP y/n). It poses a problem because logging some
statistics doesn't work unless some log aliases are involved to force
the log to wait till the end.

Before this change, the following log-format:

    log-format "res.timer.data=%[res.timer.data]"

would log "res.timer.data=0" regardless of the time taken to transfer
data, and the log would be emitted instantly. However, this line:

    log-format "res.timer.data=%[res.timer.data] %B"

would properly log the time taken to transfer the data because %B which
carries the log flag LW_BYTES forces the log to wait till the end.

This patch makes sure that anything requiring response (headers or body)
waits for at least the response, and that anything requiring response body
or end of transfer (req/res) waits till the end (LW_BYTES). Thanks to
this, the log above is now correct even without the "%B" hack.

This should be backported at least till the latest LTS.
2026-04-21 08:01:07 +02:00
William Lallemand
95c400d08e MINOR: acme: allow IP SAN in certificate request
Some checks are pending
Contrib / admin/halog/ (push) Waiting to run
Contrib / dev/flags/ (push) Waiting to run
Contrib / dev/haring/ (push) Waiting to run
Contrib / dev/hpack/ (push) Waiting to run
Contrib / dev/poll/ (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
Implement IP in both requestOrder and CSR so a certificate with SAN IPs
can be generated.
2026-04-20 18:10:47 +02:00
William Lallemand
0d14bb7473 MINOR: acme: implement draft-ietf-acme-profiles
The ACME Profiles extension (draft-ietf-acme-profiles) allows a client
to request a specific certificate profile by including a "profile" field
in the newOrder request. This lets the CA select the appropriate
certificate issuance policy (e.g. "classic", "shortlived") for a given
order.

A new "profile" keyword is added to the acme section. When set, its
value is included in the newOrder JSON payload sent to the CA.
2026-04-20 18:10:35 +02:00
Olivier Houchard
78712c3898 BUG/MEDIUM: checks: Don't forget to set the "alt_proto" field
The target address type has been added to checks in commit
d759e60a32, but as part of that address
type is the "alt_proto" field, that was not properly set for dynamic
servers, That could lead to checks not working for any protocol that use
a non-zero alt_proto, such as QUIC. So set it properly.
2026-04-20 11:59:44 +02:00
William Lallemand
95e9629530 BUILD: ssl/sample: potential null pointer dereference in sample_conv_aes
gcc flags aead_tag_trash as potentially NULL at the chunk_memcpy call
inside the (!dec && gcm) block, because it cannot correlate the
condition with the allocation that only happens in that same branch. Add
an explicit NULL check to silence the warning.

This was caught by cross-zoo.yml:

In file included from include/haproxy/connection.h:28,
                 from src/ssl_sample.c:27:
In function ‘b_orig’,
    inlined from ‘sample_conv_aes’ at src/ssl_sample.c:540:23:
include/haproxy/buf.h:80:17: error: potential null pointer dereference [-Werror=null-dereference]
   80 |         return b->area;
      |                ~^~~~~~
In function ‘b_data’,
    inlined from ‘sample_conv_aes’ at src/ssl_sample.c:540:3:
include/haproxy/buf.h💯17: error: potential null pointer dereference [-Werror=null-dereference]
  100 |         return b->data;
      |                ~^~~~~~
2026-04-20 11:00:24 +02:00
Amaury Denoyelle
1f435f031b BUG/MINOR: xprt_qstrm: reduce max record length check
When trying to read QMux transport parameters frame, the record length
is checked to ensure it is not bigger than the buffer size. The
objective is to detect as soon as possible when receiving data that
cannot be handled and to close the connection.

In fact, this check is not accurate, as it did not take into account the
size of the Record length field itself. This patch fixes the comparison
by substracting with the size of the decoded varint.

No need to backport.
2026-04-20 10:21:30 +02:00
Amaury Denoyelle
0610b4487b BUG/MINOR: xprt_qstrm: read record length in 64bits
QMux record lengths are encoded as a QUIC varint. Thus in theory, it
requires a 64bits integer to be able to read the whole value. In
practice, if the record is bigger than bufsize, read operation cannot be
completed and an error must be reported.

This patch fixes record length decoding both in xprt_qstrm layer, which
is now performed in two steps. The value is first read in a 64bits
integer instead of a size_t whose size is dependent on the architecture.
Result is then checked against bufsize and if inferior stored in the
previously used variable (xprt ctx rxrlen member).

This should partially fix build issue reported on github #3334.

No need to backport.
2026-04-20 09:23:29 +02:00
Willy Tarreau
bb59ba0a98 BUILD: haterm: don't pass size_t to %lu in error messages
It fails on 32-bit systems, let's cast it to ulong like in other places.
No backport needed.
2026-04-18 11:25:30 +02:00
Amaury Denoyelle
1acf147e2a MINOR: mux-quic: release BE idle conn after GOAWAY reception
Some checks failed
Contrib / dev/flags/ (push) Has been cancelled
Contrib / dev/haring/ (push) Has been cancelled
Contrib / dev/hpack/ (push) Has been cancelled
Contrib / dev/poll/ (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
VTest / Alpine+musl, gcc (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
An idle backend connection is useless if a HTTP/3 GOAWAY frame has been
received. Indeed, it is forbid to open new stream on such connection.

Thus, this patch ensures such connections are removed as soon as
possible. This is performed via a new check in qcc_is_dead() on
QC_CF_CONN_SHUT flag for backend connections. This ensures that a shut
connection is released instead of being inserted in idle list on detach
operation.

This commits also completes qcc_recv() with a new call to qcc_is_dead()
on its ending. This is necessary if GOAWAY is received on an idle
connection. For now, this is only checked for backend connections as a
GOAWAY is without any real effect for frontend connections. Thus, this
extra protection ensures that we do not break by incident QUIC frontend
support.

qcc_io_recv() also performs qcc_decode_qcs(). However, an extra
qcc_is_dead() is not necessary in this case as the following
qcc_io_process() already performs it.
2026-04-17 13:28:17 +02:00
Amaury Denoyelle
220b1bf6d9 MEDIUM: h3: prevent new streams on GOAWAY reception
Implement the reception of a HTTP/3 GOAWAY frame. This is performed via
the new function h3_parse_goaway_frm(). The advertised ID is stored in
new <id_shut_r> h3c member. It serves to ensure that a bigger ID is not
advertised when receiving multiple GOAWAY frames.

GOAWAY frame reception is only really useful on the backend side for
haproxy. When this occurs, h3c is now flagged with H3_CF_GOAWAY_RECV.
Also, QCC is also updated with new flag QC_CF_CONN_SHUT. This flag
indicates that no new stream may be opened on the connection. Callback
avail_streams() is thus edited to report 0 in this case.
2026-04-17 13:28:17 +02:00
Amaury Denoyelle
5c8c9fc528 MINOR: h3: simplify GOAWAY local emission
Rework GOAWAY emission handling at the HTTP/3 layer. Previously, h3c
member <id_goaway> were updated during the connection on each new
streams attach. This ID was finally reused when a GOAWAY was emitted.

However, this is unnecessary to keep an updated ID during the connection
lifetime. Indeed, <largest_bidi_r> QCC member can be used for the same
purpose. Note that this is only useful for the frontend side. For a
client connection, GOAWAY contains a PUSH ID, thus 0 can be used for
now.

Thus, <id_goaway> in h3c is renamed <id_shut_l>. Now it is only sent
when the GOAWAY is emitted. This allows to reject any streams with a
greater ID. This approach is considered simpler.

Note that <largest_bidi_r> is not strictly similar to the obsolete
<id_goaway>. Indeed, if an error occurs before the corresponding stream
layer allocation, the former would still be incremented. However,
this is not a real issue as GOAWAY specification is clear that lower IDs
are not guaranteed to being handled well, until either the stream is
closed or resetted, or the whole connection is teared down.
2026-04-17 11:36:01 +02:00
Amaury Denoyelle
143d0034c9 BUG/MINOR: mux_quic: limit avail_streams() to 2^62
QUIC streams ID are encoded as 62-bit integer and cannot reuse an ID
within a connection. This is necessary to take into account this
limitation for backend connections.

This patch implements this via qmux_avail_streams() callback. In the
case where the connection is approaching the encoding limit, reduce the
advertised value until the limit is reached. Note that this is very
unlikely to happen as the value is pretty high.

This should be backported up to 3.3.
2026-04-17 11:36:01 +02:00
Aurelien DARRAGON
4945d02c99 MINOR: compression: prefix compression oriented functions with "comp_"
add comp_ prefix to all compression related functions, in anticipation
of decompression functions that will be integrated in the same file, so
we don't get mixed up between the two.

No change of behavior expected.
2026-04-17 08:26:56 +02:00
Willy Tarreau
a0541f5d21 BUG/MEDIUM: mux-h2: ignore conn->owner when deciding if a connection is dead
Originally, valid backend connections always used to have conn->owner
pointing to the owner session. In 1.9, commit 93c885 enforced this when
implementing backend H2 support by making sure that no orphaned connection
was left on its own with no remaining stream able to handle it.
Later, idle connections were reworked so that they were no longer
necessarily attached to a stream, but could be directly in the server,
accessed via a hash, so it started to become possible to have conn->owner
left to NULL when picking such a connection. It in fact happens for
http-reuse always, when the second stream picks the connection because
its owner is NULL and it's not changed.

More recently, a case was identified where it could be theoretically
possible to reinsert a dead connection into an idle list, and commit
59c599f3f0 ("BUG/MEDIUM: mux-h2: make sure not to move a dead
connection to idle") addressed that possibility in 3.3 by adding the
h2c_is_dead() test in h2_detach() before deciding to reinsert a
connection into the idle list.

Unfortunately, the combination of changes above results in the following
sequence being possible:
  - a stream requires a connection, connect_server() creates one, sets
    conn->owner to the session, then when the session is being set up,
    the SSL stack calls conn_create_mux() which gets the session from
    conn->owner, passes it to mux->init() (h2_init), which in turn
    creates the backend stream and assigns it this session.

  - when the stream ends, it detaches (h2_detach), and the call to
    h2c_is_dead() returns false because h2c->conn->owner is set. The
    connection is thus added into the server's idle list.

  - a new stream comes, it finds the connection in the server's list,
    which doesn't require to set conn->owner, the stream is added via
    h2_attach() which passes the stream's session, and that one is
    properly set on h2s again, but never on conn->owner.

  - the stream finishes, detaches, and this time the call to h2c_is_dead()
    sees the owner is NULL, thus indicates that the connection seems dead
    so it's not added again to the idle list, and it's destroyed.

Note that this most only happens at low loads (at most one active stream
per connection, so typically at most than one active stream per thread),
where the H2 reuse ratio on a server configured with http-reuse always
or http-reuse aggressive is close to 50%. At high loads, this is much more
rare, though looking at the reuse stats for a server, it's visible that a
sustained load still shows around 1% of the connections being periodically
renewed.

Interestingly, for RHTTP the impact is more important because there
was already a work around for this test in h2c_is_dead() but it uses
conn_is_reverse(), which is never correct in this case (it should be
called conn_to_reverse() because it says the conn must be reversed
and has not yet been), so this extra test doesn't protect against the
NULL check, and connections are closed after each stream is terminated
(if there is no other stream left).

After a long analysis with Amaury and Olivier, it was concluded that:
  - the h2c_is_dead() addition is finally not the best solution and
    could be refined, however in the current state it's a bit tricky.
  - the conn->owner test in h2c_is_dead() is no longer relevant,
    probably since 2.4 when connections were stored using hash_nodes
    in the servers and would no longer depend on a session, so that
    test should be removed.
  - the test conn_is_reverse() on the same line, that was added to
    ignore the former for RHTTP, and which doesn't properly work either
    should be removed as well.

Some further cleanups should be performed to clarify this situation.

This patch implements the points above, and it should be backported
wherever commit 59c599f3f0 was backported.
2026-04-16 18:27:15 +02:00
Willy Tarreau
0af603f46f MEDIUM: threads: change the default max-threads-per-group value to 16
A lot of our subsystems start to be shared by thread groups now
(listeners, queues, stick-tables, stats, idle connections, LB algos).
This has allowed to recover the performance that used to be out of
reach on losely shared platforms (typically AMD EPYC systems), but in
parallel other large unified systems (Xeon and large Arm in general)
still suffer from the remaining contention when placing too many
threads in a group.

A first test running on a 64-core Neoverse-N1 processor with a single
backend with one server and no LB algo specifiied shows 1.58 Mrps with
64 threads per group, and 1.71 Mrps with 16 threads per group. The
difference is essentially spent updating stats counters everywhere.

Another test is the connection:close mode, delivering 85 kcps with
64 threads per group, and 172 kcps (202%) with 16 threads per group.
In this case it's mostly the more numerous listeners which improve
the situation as the change is mostly in the kernel:

max-threads-per-group 64:
  # perf top
  Samples: 244K of event 'cycles', 4000 Hz, Event count (approx.): 61065854708 los
  Overhead  Shared Object     Symbol
    10.41%  [kernel]          [k] queued_spin_lock_slowpath
    10.36%  [kernel]          [k] _raw_spin_unlock_irqrestore
     2.54%  [kernel]          [k] _raw_spin_lock
     2.24%  [kernel]          [k] handle_softirqs
     1.49%  haproxy           [.] process_stream
     1.22%  [kernel]          [k] _raw_spin_lock_bh

  # h1load
  time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
     1  1024    84560    83536        4761666      0 84k5 83k5 38M0 11.91m
     2  1024   168736   167713        9559698      0 84k0 84k0 38M3 11.98m
     3  1024   253865   252841       14412165      0 85k0 85k0 38M7 11.84m
     4  1024   339143   338119       19272783      0 85k1 85k1 38M8 11.80m
     5  1024   424204   423180       24121374      0 84k9 84k9 38M7 11.86m

max-threads-per-group 16:
  # perf top
  Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 375998622679 lost
  Overhead  Shared Object     Symbol
    15.20%  [kernel]          [k] queued_spin_lock_slowpath
     4.31%  [kernel]          [k] _raw_spin_unlock_irqrestore
     3.33%  [kernel]          [k] handle_softirqs
     2.54%  [kernel]          [k] _raw_spin_lock
     1.46%  haproxy           [.] process_stream
     1.12%  [kernel]          [k] _raw_spin_lock_bh

  # h1load
      time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
         1  1020   172230   171211        9759255      0 172k 171k 78M0 5.817m
         2  1024   343482   342460       19520277      0 171k 171k 78M0 5.875m
         3  1021   515947   514926       29350953      0 172k 172k 78M5 5.841m
         4  1024   689972   688949       39270207      0 173k 173k 79M2 5.783m
         5  1024   863904   862881       49184274      0 173k 173k 79M2 5.795m

So let's change the default value to 16. It also happens to match what's
used by default on EPYC systems these days.

This change was marked MEDIUM as it will increase the number of listening
sockets on some systems, to match their counter parts from other vendors,
which is easier for capacity planning.
2026-04-16 10:48:43 +02:00
Willy Tarreau
d7c747b572 BUG/MINOR: threads: properly set the number of tgroups when non using policy
When nbthread is set, the CPU policies are not used and do not set
nbthread nor nbtgroups. When back into thread_detect_count(), these
are set respectively to thr_max and 1. The problem which becomes very
visible with max-threads-per-group, is that setting this one in
combination with nbthreads results in only one group with the calculated
number of threads per group. And there's not even a warning. So basically
a configuration having:

   global
       nbthread 64
       max-threads-per-group 8

would only start 8 threads.

In this case, grp_min remains valid and should be used, so let's just
change the assignment so that the number of groups is always correct.

A few ifdefs had to move because the calculations were only made for
the USE_CPU_AFFINITY case. Now these parts have been refined so that
all the logic continues to apply even without USE_CPU_AFFINITY.

One visible side effect is that setting nbthread above 64 will
automatically create the associated number of groups even when
USE_CPU_AFFINITY is not set. Previously it was silently changed
to match the per-group limit.

Ideally this should be backported to 3.2 where the issue was
introduced, though it may change the behavior of configs that were
silently being ignored (e.g. "nbthread 128"), so the backport should
be considered with care. At least 3.3 should have it because it uses
cpu-policy by default so it's only for failing cases that it would be
involved.
2026-04-15 17:47:26 +02:00
William Lallemand
794737cc8d CLEANUP: acme: no need to reset ctx state and http_state before nextreq
The nextreq label already implement setting http_state to ACME_HTTP_REQ
and setting ctx->state to st. It is only needed to set the st variable
before jumping to nextreq.
2026-04-15 16:17:39 +02:00
William Lallemand
69211b869f BUG/MINOR: acme: fix fallback state after failed initial DNS check
When the opportunistic initial DNS check (ACME_INITIAL_RSLV_READY) fails,
the state machine was incorrectly transitioning to ACME_RSLV_RETRY_DELAY
instead of ACME_CLI_WAIT. This caused the challenge to enter the DNS retry
loop rather than falling back to the normal cond_ready flow that waits for
the CLI signal.

Also reorder ACME_CLI_WAIT in the state enum and trace switch to reflect
the actual execution order introduced in the previous commit: it comes after
ACME_INITIAL_RSLV_READY, not before ACME_INITIAL_RSLV_TRIGGER.

No backport needed.
2026-04-15 16:06:59 +02:00
William Lallemand
c295a5c861 MINOR: acme: opportunistic DNS check for dns-persist-01 to skip challenge-ready steps
For dns-persist-01, the "_validation-persist.<domain>" TXT record is set once
and never changes between renewals. Add an initial opportunistic DNS check
(ACME_INITIAL_RSLV_TRIGGER / ACME_INITIAL_RSLV_READY states) that runs before
the challenge-ready conditions are evaluated. If all domains already have the
TXT record, the challenge is submitted immediately without going through the
cli/delay/dns challenge-ready steps, making renewals faster once the record is
in place.

The new ACME_RDY_INITIAL_DNS flag is automatically set for
dns-persist-01 in cond_ready.
2026-04-15 15:57:57 +02:00
Willy Tarreau
5fe0579d49 MEDIUM: threads: start threads by groups
Till now, threads were all started one at a time from thread 1. This
will soon cause us limitations once we want to reduce shared stuff
between thread groups.

Let's slightly change the startup sequence so that the first thread
starts one initial thread for each group, and that each of these
threads then starts all other threads from their group before switching
to the final task. Since it requires an intermediary step, we need to
store that threads' start function to access it from the group, so it
was put into the tgroup_info which still has plenty of room available.

It could also theoretically speed up the boot sequence, though in
practice it doesn't change anything because each thead's initialization
is made one at a time to avoid races during the early boot. However
ther is now a function in charge of starting all extra threads of a
group, and whih is called from this group.
2026-04-15 15:53:56 +02:00
Amaury Denoyelle
e2dbcd20f2 MINOR: mux-quic: close connection when reaching max-total streams
This commit completes the previous one which implements a new setting to
limit the number of streams usable by a client on a QUIC connection.

When the connection becomes idle after reaching this limit, it is
immediately closed. This is implemented by extending checks in
qcc_is_dead(). This results in a CONNECTION_CLOSE emission, which is
useful to free resources as soon as possible.
2026-04-15 15:18:37 +02:00
Amaury Denoyelle
497cabd9e5 MEDIUM: quic: implement fe.stream.max-total
Implement a new setting to limit the total number of bidirectional
streams that the client may use on a single connection. By default, it
is set to 0 which means it is not limited at all.

If a positive value is configured, the client can only open a fixed
number of request streams per QUIC connection. Internally, this is
implemented in two steps :

* First, MAX_STREAMS_BIDI flow control advertizing will be reduced when
  approaching the limit before being completely turned off when reaching
  it. This guarantees that the client cannot exceed the limit without
  violating the flow control.

* Second, when attaching the latest stream with ID matching max-total
  setting, connection graceful shutdown is initiated. In HTTP/3, this
  results in a GOAWAY emission. This allows the remaining streams to be
  completed before the connection becomes completely idle.
2026-04-15 15:18:37 +02:00
Amaury Denoyelle
a7e1c82648 MINOR: mux-quic: perform app init in case of early shutdown
Adds a qcc_app_init() call in qcc_app_shutdown(). This is necessary if
shutdown is performed early, before any invokation of qcc_io_send().

Currently, this should never occur in practice. However, this will
become necessary with the new settings tune.quic.fe.stream.max-total.
Indeed, when using a very small value, app-ops layer may be closed early
in the connection lifetime.
2026-04-15 14:34:13 +02:00
Amaury Denoyelle
1038720675 MINOR: mux-quic: reorganize code for app init/shutdown
Refactor code related to app-layer init/shutdown operations. In short,
qcc_shutdown() is renamed to qcc_app_shutdown(). It is also moved next
to qcc_app_init() to better reflect their link.
2026-04-15 14:34:13 +02:00
Amaury Denoyelle
6c837723bf MINOR: mux-quic: improve documentation for qcs_attach_sc()
Complete function doc for qcs_attach_sc() by using the proper
terminology related to stream/stconn/sedesc. The purpose of this
function should be clearer now.
2026-04-15 14:34:13 +02:00
Emeric Brun
8f06c8fda4 BUG/MEDIUM: peers: trash of expired entries delayed after fullresync
stksess_new has set the entry expire to the table expire delay,
if it is a new entry, set_entry inserts at that position in the expire
tree. There was a touch_remote updating the expire setting but the
tree's re-ordering is not designed to set back in the past resulting
to an entry that will be trashed only after a full table's expire delay
regardless the expire set on the stktsess.

This patch sets the newts expire before the call of 'set_entry'.
This way a new inserted entry is set directly at the right position
in the tree to trash the entry in time.

This patch should be backported on all supported branches and at
least v2.8
2026-04-15 10:03:17 +02:00
Willy Tarreau
90e8ccd9c2 MINOR: sample: add new sample fetch functions reporting current CPU usage
Some checks are pending
Contrib / dev/flags/ (push) Waiting to run
Contrib / dev/haring/ (push) Waiting to run
Contrib / dev/hpack/ (push) Waiting to run
Contrib / dev/poll/ (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
VTest / Alpine+musl, gcc (push) Waiting to run
Windows / Windows, gcc, all features (push) Waiting to run
Some features can automatically turn on or off depending on CPU usage,
but it's not easy to measure it. Let's provide 3 new sample fetch functions
reporting the CPU usage as measured inside haproxy during the previous
polling loop, and reported in "idle" stats header / "show info", or used
by tune.glitches.kill.cpu-usage, or maxcompcpuusage:

  - cpu_usage_thr: CPU usage between 0 and 100 of the current thread, used
    by functions above
  - cpu_usage_grp: CPU usage between 0 and 100, averaged over all threads of
    the same group as the current one.
  - cpu_usage_proc: CPU usage between 0 and 100, averaged over all threads
    of the current process

Note that the value will fluctuate since it only covers a few tens to
hundreds of requests of the last polling loop, but it reports what is
being used to take decisions.

It could also be used to disable some non-essential debugging/processing
under too high loads for example.
2026-04-14 17:47:18 +02:00
Willy Tarreau
630ef96f92 MINOR: sample: return the number of the current thread group
Just like we have a sample fetch function that returns the number of the
current thread, let's have the same with the thread group number. This
can be useful for troubleshooting, given that certain things are currently
per thread-group (e.g. idle backend connections, certain LB algos etc).
2026-04-14 17:05:34 +02:00
Willy Tarreau
b943d2a7eb CLEANUP: sample: fix the comment regarding the range of the thread sample fetch
The comment says "between 1 and nbthread" while it's in fact between 0 and
nbthread-1 and this is also documented like this in the config manual. No
backport needed though it cannot hurt.
2026-04-14 16:59:56 +02:00
Willy Tarreau
9c6e07c43f MINOR: stats: report the number of thread groups in "show info"
Since thread groups were enabled by default in 3.3, it has become an
important element of diagnostic that we're missing in "show info". Let's
add it under "NbThreadGroups".
2026-04-14 16:48:16 +02:00
William Lallemand
f28dd158ed MINOR: ssl: add TLS 1.2 values in HAPROXY_KEYLOG_XX_LOG_FMT
Add the CLIENT_RANDOM line for TLS1.2 in HAPROXY_KEYLOG_FC_LOG_FMT and
HAPROXY_KEY_LOG_BC_FMT. These are useful to produce a keylog file
compatible with both TLS1.3 and TLS1.2.
2026-04-14 16:03:25 +02:00
Christopher Faulet
7270bfcff5 BUG/MEDIUM: htx: Don't count delta twice when block value is replaced
A regression was introduced by the commit a8887e55a ("BUG/MEDIUM: htx: Fix
function used to change part of a block value when defrag").

When a block value was replaced and a defragmentation was performed, the
delta between the old value and the new one was counted twice. htx_defrag()
already is responsible to set the new size for the HTX message. So it must
not be performed in htx_replace_blk_value().

This patch must be backported with the commit above. So theorically to all
stable versions.
2026-04-14 14:07:21 +02:00
Christopher Faulet
d899f23017 BUG/MEDIUM: htx: Properly handle block modification during defragmentation
A regression was introcuded by the commit 0c6f2207f ("MEDIUM: htx: Refactor
htx defragmentation to merge data blocks").

When a defragmentation is performed, it is possible to alter a block
size. The main usage is to prepare a block value replacement. However, since
the commit above, the change is no longer handled. The block info are
changed but the size of the message is not modified accordingly.

This patch depends on the commit "MINOR: htx: Add helper function to get
type and size from the block info field"

No backport needed.
2026-04-14 14:07:21 +02:00
William Lallemand
3415abe56d MINOR: mjson: reintroduce mjson_next()
The lack of mjson_next() prevents to iterate easily and need to hack by
iterating on a loop of snprintf + $.field[XXX] combined with
mjson_find().

This reintroduce mjson_next() so we could iterate without having to
build the string.

The patch does not reintroduce MJSON_ENABLE_NEXT so it could be used
without having to define it.
2026-04-14 10:57:21 +02:00
William Lallemand
cf72132f22 MINOR: acme: display the type of challenge in ACME_INITIAL_DELAY
Some checks failed
Contrib / build (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
alpine/musl / gcc (push) Has been cancelled
The ACME_INITIAL_DELAY state displays a message about 'dns-01', but this
state is also used for 'dns-persist-01'.

This patch displays the challenge that was configured instead of dns-01
2026-04-14 10:16:11 +02:00
Tim Duesterhus
ed0c51d2c0 MINOR: http_fetch: Add support for checks to unique-id fetch
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
This allows to use the `unique-id` fetch within `tcp-check` or `http-check`
ruleset. The format is taken from the checked server's backend (which is
naturally inherited from the corresponding `defaults` section).

This is particularly useful with

    http-check send ... hdr request-id %[unique-id]

to ensure all requests sent by HAProxy have a unique ID header attached.

This resolves GitHub Issue #3307.

Reviewed-by: Volker Dusch <github@wallbash.com>
2026-04-13 20:02:21 +02:00
Tim Duesterhus
2c748125f5 MINOR: check: Support generating a unique_id for checks
This implementation is directly modeled after `stream_generate_unique_id()` and
the corresponding `unique_id` field on `struct stream`.

It will be used in a future commit to enable the use of the `%[unique-id]`
fetch in check rules.
2026-04-13 20:01:42 +02:00
Tim Duesterhus
7ff2627112 CLEANUP: log: Stop touching struct stream internals for %ID
Use the return value of `stream_generate_unique_id()` instead of relying on the
`unique_id` field of `struct stream` when handling the `%ID` log placeholder.
This also allowed to unify the "stream available" and "stream not available"
paths.

Reviewed-by: Volker Dusch <github@wallbash.com>
2026-04-13 20:01:42 +02:00
Tim Duesterhus
38796d4c06 MINOR: Allow inlining of stream_generate_unique_id()
With the introduction of the `generate_unique_id()` helper, the actual
complicated logic is sitting in a different file. Allow inlining of
`stream_generate_unique_id()`, so that callers can benefit from an abstraction
without hiding away the access of `strm->unique_id` behind a function call.
2026-04-13 20:01:42 +02:00
Tim Duesterhus
73040e3a8e MINOR: Add generate_unique_id() helper
This new function will handle the actual generation of the unique ID according
to a format. The caller is responsible to check that no unique ID is stored
yet.
2026-04-13 20:01:02 +02:00
Tim Duesterhus
4cf06a7d23 CLEANUP: Make lf_expr parameter of sess_build_logline_orig() const
Since this is safely possible without making any changes, we can provide this
hint to the compiler.
2026-04-13 19:59:12 +02:00
Willy Tarreau
9a5db56a36 BUG/MINOR: haterm: don't apply the default pipe size margin twice
Commit 6d16b11022 ("BUG/MINOR: haterm: preserve the pipe size margin
for splicing") solved the issue of pipe size being sufficient for the
vmsplice() call, but as Christopher pointed out, the ratio was applied
to the default size of 64k, so now it's applied twice, giving 100k
instead of 80k. Let's drop it from there.

No backport needed.
2026-04-13 19:38:48 +02:00
Egor Shestakov
79c54d28b0 BUG/MINOR: acme: don't pass NULL into format string
Printing a "(null)" when NULL passed with the %s format specifier is a
GNU extension, so it must be avoided for portability reasons.

Must be backported as far as 3.2
2026-04-13 18:56:13 +02:00
William Lallemand
53679fe5f6 BUG/MINOR: acme: read the wildcard flag from the authorization response
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
The wildcard field was declared and used when building the dns-persist-01
TXT record value (policy=wildcard suffix), but was never populated from
the server's authorization response. Add the missing mjson_get_bool() call
to read $.wildcard before saving auth->dns.
2026-04-13 18:49:53 +02:00
William Lallemand
39476040ec MINOR: acme: extend resolver-based DNS pre-check to dns-persist-01
Add challenge_type parameter to acme_rslv_start() to select the correct
DNS lookup prefix: _validation-persist.<domain> for dns-persist-01 and
_acme-challenge.<domain> for dns-01.

Default cond_ready to ACME_RDY_DNS|ACME_RDY_DELAY for dns-persist-01.
Extend ACME_CLI_WAIT to cover dns-persist-01 alongside dns-01.

In ACME_RSLV_READY, check only TXT record existence for dns-persist-01
since the resolver cannot parse multiple strings within a single TXT entry.
2026-04-13 18:45:08 +02:00
Mia Kanashi
0d3689959d MEDIUM: acme: implement dns-persist-01 challenge
Implements draft DNS-PERSIST-01 challenge based on
https://datatracker.ietf.org/doc/html/draft-ietf-acme-dns-persist

Blog post: https://letsencrypt.org/2026/02/18/dns-persist-01

This challenge is designed to use preprovisioned DNS records,
unlike DNS-01 challenge it doesn't need per provider API integration.

In short instead of validating order by crafting a custom response
based on input recieved from ACME server, like other challenges do
in particular DNS-01, HTTP-01, TLS-ALPN-01, in this challenge you
authorize domain statically, ACME account key functions similar to
a private key and accounturi in the record functions like a public key,
ACME server verifies that account uri matches account key and authorizes
based on that. You only need to write DNS record one time,
accounturi binds to an account key, and will only change if new account
key is created, although it is possible to rotate account key without
changing account uri.

Main benefits of this challenge in contrast to DNS-01:
1. Security, no need to give reverse proxy write access to the DNS.
2. Simplicity, no complex per provider integrations like Lego needed.
3. Robustness, no worrying about DNS record cache each renewal.

It would be used like this:
1. generate an account key ahead of time
2. add required DNS record manually or automatically using IaC tools
3. start HAProxy with the same account key used

Intended way to use this challenge is with a code that will print
and maybe sets DNS records ahead of time. For example that could
be integrated into the IaC provisioning step. This challenge type
is extremely recent though, so those integrations are yet to be written.

It is possible to do this challenge without extra tools too,
with pebble / challtestsrv steps would be as following:

After starting HAProxy it will print required records in the logs.

With challtestsrv you can then set those records like this:

curl -d '{
  "host":"_validation-persist.localhost.",
  "value": "pebble.letsencrypt.org; accounturi=...; policy=wildcard"}
' http://localhost:8055/set-txt

After setting the records run renew with the name of the certificate:

echo "acme renew @cert/localhost.pem" \
  | socat stdio tcp4-connect:127.0.0.1:9999

Or just restart HAProxy.

Unlike with DNS-01 you don't have to worry about DNS records changing,
if there is any problem with DNS records you can just retry.
2026-04-13 18:45:08 +02:00
Willy Tarreau
6d16b11022 BUG/MINOR: haterm: preserve the pipe size margin for splicing
Originally in httpterm we used to allocate 5/4 of the size of a pipe to
permit to use vmsplice because there's some fragmentation or overhead
internally that requires to use a bit of margin. While this was initially
applied to haterm as well, it was accidentally lost with commit fb82dece47
("BUG/MEDIUM: haterm: Properly initialize the splicing support for haterm"),
resulting in errors about vmsplice() whenever tune.pipesize is set. Let's
enforce the ratio again.

No backport is needed.
2026-04-13 18:37:04 +02:00
Egor Shestakov
61f04d1951 MINOR: errors: remove excessive errmsg checks
I noticed some strange checks for presence of errmsg. Called functions
generate non-empty error message in case of failure, so a non-NULL address
of the error message is enough.

No backport needed.
2026-04-13 15:39:05 +02:00
Christopher Faulet
ebb801d7c8 BUG/MEDIUM: cli: Properly handle too big payload on a command line
When command line is parsed, when the payload was too big the error was not
properly handled. Instead of leaving the parsing function to print the
error, we looped infinitly trying to parse remaining data.

When the command line is too big, we must exit the parsing function in
CLI_ST_PRINT_ERR state. Instead of exiting the function, we only left the
while loop, setting this way the cli applet in CLI_ST_PROMPT state.

This patch must be backported as far as 3.2.
2026-04-13 15:18:47 +02:00
Egor Shestakov
c82b10b8d2 MINOR: tools: memvprintf(): remove <out> check that always true
memvprintf() exits early if the <out> is NULL, so the further NULL check is
redundant.

No backport needed.
2026-04-13 14:36:52 +02:00
Tim Duesterhus
801d028790 CLEANUP: http_fetch: Use local unique_id variable in smp_fetch_uniqueid()
Instead of relying on the implementation detail that
`stream_generate_unique_id()` will store the unique ID in `strm->unique_id` we
should use the returned value, especially since that one is already checked in
the `isttest()`.

Reviewed-by: Volker Dusch <github@wallbash.com>
2026-04-13 14:02:29 +02:00
Tim Duesterhus
44a461a19f CLEANUP: stream: Reduce duplication in stream_generate_unique_id()
The return value of the `if()` and `else` branch is identical. We can just move
it out of conditional paths.

Reviewed-by: Volker Dusch <github@wallbash.com>
2026-04-13 14:02:29 +02:00
Tim Duesterhus
f778e6feb0 CLEANUP: stream: Explain the two-step initialization in stream_generate_unique_id()
This two-step initialization of `strm->unique_id` looks like a refactoring
target. Add a comment to prevent regressions of the fix in
fb7b5c8a53.
2026-04-13 14:02:29 +02:00
Tim Duesterhus
756ad19f04 CLEANUP: log: Return size_t from sess_build_logline_orig()
`sess_build_logline_orig()` takes a `size_t maxsize` as input and accordingly
should also return `size_t` instead of `int` as the resulting length. In
practice most of the callers already stored the result in a `size_t` anyways.
The few places that used an `int` were adjusted.

This Coccinelle patch was used to check for completeness:

    @@
    type T != size_t;
    T var;
    @@

    (
    * var = build_logline(...)
    |
    * var = build_logline_orig(...)
    |
    * var = sess_build_logline(...)
    |
    * var = sess_build_logline_orig(...)
    )

Reviewed-by: Volker Dusch <github@wallbash.com>
2026-04-13 14:02:29 +02:00
Tim Duesterhus
34c17608e7 BUG/MINOR: log: Fix error message when using unavailable fetch in logfmt
The following configuration:

    defaults
    	unique-id-format TEST-%[srv_name]

    frontend fe_http
    	mode http

    	bind :::8080 v4v6

Emitted the following error:

    [ALERT]    (219835) : Parsing [./patch.cfg:2]: failed to parse unique-id : sample fetch <srv_name]> may not be reliably used here because it needs 'server' which is not available here.

The `]` in the name of the sample fetch should not be there.

This bug exists since at least HAProxy 2.4, which is the oldest supported
version. The fix should be backported there.

Reviewed-by: Volker Dusch <github@wallbash.com>
2026-04-13 14:02:29 +02:00
Amaury Denoyelle
34c9ded340 BUG/MINOR: quic: do not use hardcoded values in QMux TP frame builder
Reuse QUIC transport parameters value set in xprt_qstrm layer in frame
builder function. Prior to this patch, mux_quic would use different
values from the advertised ones.

No need to backport.
2026-04-13 13:38:11 +02:00
Amaury Denoyelle
175717f5be MINOR: mux_quic: remove duplicate QMux local transport params
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
When QMux was first implemented, values used for emitted transport
parameters in xprt_qstrm and local flow control in mux_quic were
initialized separately. This is error prone in particular if a value is
change in one layer but not the other.

This patch fixes this by using xprt_qstrm_lparams() in QMux init
function. Mux flow control is then loaded with these values. Thus all
values are now initialized in a single place which is xprt_qstrm_init().
2026-04-13 09:38:46 +02:00
Miroslav Zagorac
cd14abf9f3 MEDIUM: otel: added OpenTelemetry filter skeleton
The OpenTelemetry (OTel) filter enables distributed tracing of requests
across service boundaries, export of metrics such as request rates,
latencies and error counts, and structured logging tied to trace context,
giving operators a unified view of HAProxy traffic through any
OpenTelemetry-compatible backend.

The OTel filter is implemented using the standard HAProxy stream filter
API.  Stream filters attach to proxies and intercept traffic at each stage
of processing: they receive callbacks on stream creation and destruction,
channel analyzer events, HTTP header and payload processing, and TCP data
forwarding.  This allows the filter to collect telemetry data at every
stage of the request/response lifecycle without modifying the core proxy
logic.

This commit added the minimum set of files required for the filter to
compile: the addon Makefile with pkg-config-based detection of the
opentelemetry-c-wrapper library, header files with configuration
constants, utility macros and type definitions, and the source files
containing stub filter operation callbacks registered through
flt_otel_ops and the "opentelemetry" keyword parser entry point.

The filter uses the opentelemetry-c-wrapper library from HAProxy
Technologies, which provides a C interface to the OpenTelemetry C++ SDK.
This wrapper allows HAProxy, a C codebase, to leverage the full
OpenTelemetry observability pipeline without direct C++ dependencies
in the HAProxy source tree.

  https://github.com/haproxytech/opentelemetry-c-wrapper
  https://github.com/open-telemetry/opentelemetry-cpp

Build options:

  USE_OTEL     - enable the OpenTelemetry filter
  OTEL_DEBUG   - compile the filter in debug mode
  OTEL_INC     - force the include path to the C wrapper
  OTEL_LIB     - force the library path to the C wrapper
  OTEL_RUNPATH - add the C wrapper RUNPATH to the executable

Example build with OTel and debug enabled:

  make -j8 USE_OTEL=1 OTEL_DEBUG=1 TARGET=linux-glibc
2026-04-13 09:23:26 +02:00
Amaury Denoyelle
b8145fa5d4 BUG/MINOR: xprt_qstrm: do not parse record length on read again
conn_recv_qstrm() may be called several times per connection if the read
data is too short and a truncated record is received.

Previously, record length was parsed every time the function is invoked.
However, this must only be performed if record length varint is
incomplete. Once read and parsed, data are removed from the buffer via
b_quic_dec_int(). Thus, next conn_recv_qstrm() run will reread an
invalid record length this time.

This patch fixes this by only parsing record length if <rxrlen> member
is null. Prior to it, parsing of QMux transport parameters would fail in
case of a first truncated read, which would prevent the connection
initialization.

No need to backport.
2026-04-13 09:11:08 +02:00
Amaury Denoyelle
b5624a6365 BUG/MINOR: mux_quic: prevent QMux crash on qcc_io_send() error path
A QCC connection may be flagged with QC_CF_ERRL to trigger a
CONNECTION_CLOSE emission. However, for now error reporting is not
functional with QMux, as it relies on quic_conn layer access.

To prevent a crash in qcc_io_send() when using QMux, add a
conn_is_quic() check when QC_CF_ERRL is set to ensure no access will be
performed on quic_conn layer. In the future, this should be extended so
that QMux is also able to emit CONNECTION_CLOSE for connection closure.

No need to backport.
2026-04-13 09:11:08 +02:00
Christopher Faulet
fb82dece47 BUG/MEDIUM: haterm: Properly initialize the splicing support for haterm
Some checks failed
Contrib / build (push) Has been cancelled
alpine/musl / gcc (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
First, we must not emit any warning if splicing is not configured and the
global maxpipes value is 0. Then we must not remove GTUNE_USE_SPLICE flag
when we fail to allocate the haterm master pipe. Instead, we test it when we
negociate with the opposite side, to properly exclude the splicing if it is
not usable.

No backport needed.
2026-04-10 16:32:29 +02:00
Christopher Faulet
313121639e Revert "BUG/MEDIUM: haterm: Move all init functions of haterm in haterm_init.c"
This reverts commit 8056117e98.

Moving haterm init from haproxy is not the right way to fix the issue
because it should be possible to use a haterm configuration in haproxy.

So let's revert the commit above.
2026-04-10 16:32:29 +02:00
Amaury Denoyelle
63febbace7 BUG/MINOR: do not crash on QMux reception of BLOCKED frames
Some checks failed
Contrib / build (push) Has been cancelled
alpine/musl / gcc (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
Add QUIC BLOCKED frames in the list of supported types in
qstrm_parse_frm(). Nothing is really implemented for them as for QUIC,
but this prevents a crash when receiving one of them via QMux.

No need to backport.
2026-04-10 10:30:49 +02:00
Amaury Denoyelle
a993f0c503 MEDIUM: mux-quic/xprt_qstrm: implement QMux record emission
This patch implements emission of the new Record layer for QMux frames.
This handles mux-quic and xprt_qstrm layers as this is performed
similarly in both cases.

Currently, the simplest approach has been prefered : each frame is
encoded in its own record. This is not the most efficient in size but it
is extremely simple to implement for a first interop testing.
2026-04-10 10:20:52 +02:00
Amaury Denoyelle
792e055c7c MEDIUM: xprt_qstrm: implement QMux record parsing
This patch implements the new QMux record layer parsing for xprt_qstrm.
This is mostly similar to the MUX code from the previous patch.

Along with this change, a new xprt_qstrm layer accessor exposes the
possible remaining record length after Transport parameters parsing.
This can only occur when xprt_qstrm Rx buffer is not completely emptied
due to other following frames. If stored in the same record, MUX layer
has to know the remaining record length.

Thus, xprt_qstrm_rxrlen() is now used in qmux_init() to preinitialize
<rx.rlen> QCC field.
2026-04-10 10:20:52 +02:00
Amaury Denoyelle
5271cdaca3 MEDIUM: mux-quic: implement QMux record parsing
This is the first patch of a serie which aims to support the new Record
layer defined by the draft 01 of QMux protocol.

  https://www.ietf.org/archive/id/draft-ietf-quic-qmux-01.html#name-qmux-records

This patch deals with QMux reception at the MUX layer. The function
qcc_qstrm_recv() is adapted to read record headers before frame parsing.
This requires to keep the last record length read in a new QCC field
named <rx.rlen>.

Frames are only parsed once a full record is received. One of the
advantage of the record layer is that it can only contains whole frame
without truncation.
2026-04-10 10:20:52 +02:00
Amaury Denoyelle
10f2867dc2 MINOR: xprt_qstrm: handle connection errors
This patch implements proper connection error handling for xprt_qstrm
layer. Basically, processing is interrupted if CO_FL_ERROR is
encountered after either rcv_buf or snd_buf operations. Connectionn
error is set to the newly defined value CO_ER_QSTRM.
2026-04-10 10:20:52 +02:00
Amaury Denoyelle
47199ce895 MINOR: xprt_qstrm: implement Tx buffering
This commit adds buffering on transmission for xprt_qstrm layer. This is
necessary in the rare case where send syscall only emits partial data.

A new <txbuf> member is defined in xprt_qstrm context. On first send
invokation, buffer is allocated and then the QMux transport parameters
frame is encoded. Then emission is performed via snd_buf and each time
the send function is invoked.
2026-04-10 10:20:52 +02:00
Amaury Denoyelle
fb3b268747 MINOR: xprt_qstrm/mux-quic: handle extra QMux frames after params
Layer xprt_qstrm is responsible to read the initial QMux transport
parameters frame. However, it could receive more data if some other
frames follow it. This extra content can only be handled by the MUX
layer once initialized.

Theorically, it could have been implemented via MSG_PEEK. However, this
flag is currently ignored by SSL layer. Besides, it is tedious to
implement safely. A new approach has been prefered where the MUX layer
is responsible to retrieve remaining data via xprt_qstrm_rxbuf()
accessor function during its initialization.

Thus, qmux_init() now may retrieve the buffer from xprt_qstrm layer.
This is performed via b_xfer() which will result in a zero copy
transfer. If this happens, tasklet is immediately scheduled to start
demuxing.
2026-04-10 10:20:52 +02:00
Amaury Denoyelle
890831f292 MINOR: xprt_qstrm: implement Rx buffering
Implement buffering for reception on xprt_qstrm layer. This is necessary
to handle reception of a truncated QMux transport parameters frame.

This is performed via a new dedicated <rxbuf> member in xprt_qstrm
context. Read is performed by reusing the buffer until a whole frame can
be read.
2026-04-10 10:20:52 +02:00
Amaury Denoyelle
c63e6ecd4b BUG/MINOR: quic: increment pos pointer on QMux transport params parsing
QUIC frame parsers functions take a <pos> pointer as input argument for
the data to be parsed. If parsing is successful, <pos> must be
incremented to point to the next data.

Increment was not performed when parsing QMux transport parameters
frame. This commit fixes this. Note that for now there is no real issue
as xprt_qstrm does not check the QMux frame length.

No need to backport.
2026-04-10 10:20:52 +02:00
Amaury Denoyelle
90d0e8a948 BUG/MINOR: mux-quic: fix potential NULL deref on qcc_release()
In qcc_release(), <conn> may be NULL. Thus every access on it must be
tested.

With recent QMux introduction, a call to conn_is_quic() has been added
prior to registration of the stream rejection callback. It could lead to
NULL deref as <conn> is not tested there. Fix this by adding an extra
check on the pointer validity.

No need to backport.
2026-04-10 10:20:52 +02:00
Greg Kroah-Hartman
4ad200f276 BUG/MINOR: hlua: fix use-after-free of HTTP reason string
hlua_applet_http_status() stored the result of luaL_optlstring()
directly in http_ctx->reason. The pointer references Lua-managed
string storage which is only guaranteed valid until the C function
returns to Lua. If the GC runs between applet:set_status(200, str)
and applet:start_response(), the pointer dangles.

hlua_applet_http_send_response() then calls ist(http_ctx->reason)
which does strlen() on freed memory, followed by memcpy into the
HTX status line. The freed-and-reallocated chunk contents are sent
verbatim to the HTTP client.

Trigger:

    applet:set_status(200, table.concat({"Reason ", str:rep(50)}))
    collectgarbage("collect"); collectgarbage("collect")
    applet:start_response()

With heap grooming, adjacent allocation contents (session data, TLS
material from the same thread) leak into the response status line.

Anchor the Lua string in the registry keyed by the http_ctx field
address so it survives until the applet is done with it. The
registry entry is overwritten on each call (handles repeated
set_status) and naturally cleaned up when the lua_State is closed.

This patch should be backported to all stable versions.
2026-04-10 10:18:27 +02:00
Greg Kroah-Hartman
0aeae23056 BUG/MEDIUM: mux-fcgi: prevent record-length truncation with large bufsize
FCGI content_length is a 16-bit field but fcgi_set_record_size()
is called with size_t/uint32_t arguments. With tune.bufsize >= 65544
(legal; cfgparse-global.c only enforces <= INT_MAX-16), a single
HTX DATA block or accumulated outbuf can exceed 65535 bytes. The
implicit conversion to uint16_t silently truncates the length field
while b_add(mbuf, outbuf.data) writes the full body.

A client posting ~99000 bytes can craft the body so that bytes
after the truncated length are parsed by PHP-FPM as fresh FCGI
records on the connection: a smuggled BEGIN_REQUEST + PARAMS with
arbitrary SCRIPT_FILENAME / PHP_VALUE bypasses all haproxy ACLs.

Fix the zero-copy path by refusing it when the block exceeds 65535
bytes (falls through to copy). Fix the copy path by capping
outbuf.size to 65535 + header so the data-fill loop naturally
stops at the FCGI maximum and emits the rest in a subsequent record.

The PARAMS path at line 2084 is similarly affected but harder to
trigger (requires combined header+param size > 65535) and is
covered by the same outbuf.size cap pattern if applied there.

This patch must be backported to all stable versions.
2026-04-10 09:40:16 +02:00
Greg Kroah-Hartman
e6c3660327 BUG/MINOR: sample: fix info leak in regsub when exp_replace fails
exp_replace() returns int and returns -1 when the back-reference
expansion overflows the output buffer (regex.c:51). output->data is
size_t, so -1 becomes SIZE_MAX. There was no error check.

The subsequent comparisons interpret SIZE_MAX as a huge length:
"output->data > b_room(trash)" tries to grow trash, then
"max > output->data" is false so max stays at trash->size, and
memcpy(trash, output->area, trash->size) copies the full chunk.
output->area is a pool_alloc()'d chunk that is NOT zeroed; the
bytes after the partial exp_replace output are stale data from a
prior pool user (request headers, response bodies from the same
worker thread).

Trigger with a backreference whose expansion exceeds bufsize:

    http-request set-header X %[req.hdr(In),regsub('(.+)','\1\1')]

and a request with In: of ~9000 bytes. The X header sent to the
backend then contains ~9KB of stale heap data.

With tune.bufsize.large set, get_larger_trash_chunk() upgrades trash
and the memcpy reads up to ~50KB past the (smaller) output->area
allocation.

http_ana.c:2728 and http_act.c:551 already check exp_replace() for
-1; this call site was missed when backreferences were added.

This patch must be backported to all stable versions.
2026-04-10 09:33:37 +02:00
Christopher Faulet
b0a9216ca5 BUG/MEDIUM: samples: Fix handling of SMP_T_METH samples
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
Samples of type SMP_T_METH were not properly handled in smp_dup(),
smp_is_safe() and smp_is_rw(). For "other" methods, for instance PATCH, a
fallback was performed on the SMP_T_STR type. Only the buffer considered
changed. "smp->data.u.meth.str" should be used for the SMP_T_METH samples
while smp->data.u.str should be used for SMP_T_STR samples. However, in
smp_dup(), the result was stored in wrong buffer, the string one instead of
the method one. In smp_is_safe() and smp_is_rw(), the method buffer was not
used at all.

We now take care to use the right buffer.

This patch must be backported to all stable versions.
2026-04-09 22:05:12 +02:00
Christopher Faulet
265be7e8cb BUG/MINOR: haterm: Return the good start-line for 100-continue interim message
When "Expect" header was found in request headers, "HTTP/1.1 100-continue"
was returned instead of "HTTP/1.1 100 continue". Let's fix it.

No backport needed.
2026-04-09 22:04:42 +02:00
Greg Kroah-Hartman
0cde3cd4df BUG/MINOR: http-act: validate decoded lengths in *-headers-bin
http_action_set_headers_bin() decodes varint name and value lengths
from a binary sample but never validates that the decoded length
fits in the remaining sample data before constructing the ist.

If the value's varint decodes to a large number with only a few
bytes following, v.len exceeds the buffer and http_add_header()
memcpys past the sample, copying adjacent heap data into a header
sent to the backend (or client, with http-response).

The intended source for this action is the hdrs_bin sample fetch
which produces well-formed output, but nothing prevents an admin
from feeding it req.body or another untrusted source. With:

    http-request set-var(txn.h) req.body
    http-request add-headers-bin var(txn.h)

a POST body of [05]"X-Foo"[c8]"AB" produces v = {ptr="AB", len=200}
and 198 bytes of adjacent heap data go into X-Foo.

http_action_del_headers_bin() was fixed too.

Compare spoe_decode_buffer() which has the equivalent check.
Validate both name and value lengths against remaining data.

No backport needed.
2026-04-09 17:10:56 +02:00
Greg Kroah-Hartman
b63cae7f9b BUG/MINOR: resolvers: fix memory leak on AAAA additional records
Commit c84c15d393 ("BUG/MINOR: resolvers: Apply dns-accept-family
setting on additional records") converted a switch statement to an
if/else chain but left the break; in the AAAA branch. In the new
form, break exits the surrounding for loop instead of a switch case.

For every AAAA additional record in an SRV response:
  - answer_record allocated at line 1460 is never freed and never
    inserted into answer_tree -> ~580 bytes leaked per response
  - all subsequent additional records in the response are silently
    discarded

A DNS server controlling SRV responses for haproxy service discovery
can leak memory at MB/min rates given default resolution intervals.
Also breaks IPv6 SRV target resolution outright since the AAAA record
is leaked rather than attached to its SRV entry.
2026-04-09 16:31:05 +02:00
William Lallemand
591a85e29e MINOR: lua: add tune.lua.openlibs to restrict loaded Lua standard libraries
HAProxy has always called luaL_openlibs() unconditionally, which opens
all standard Lua libraries including io, os, package and debug. This
makes it impossible to prevent Lua scripts from executing binaries
(os.execute, io.popen), loading native C modules (package/require), or
bypassing any Lua-level sandbox via the debug library.

Add a new global directive tune.lua.openlibs that accepts a comma-separated
list of library names to load:

  tune.lua.openlibs none                   # only base + coroutine
  tune.lua.openlibs string,math,table,utf8 # safe libs only
  tune.lua.openlibs all                    # default, same as before

The base and coroutine libraries are always loaded regardless: base provides
core Lua functions that HAProxy relies on, and coroutine is required because
HAProxy overrides coroutine.create() with its own safe implementation.

When all libraries are enabled (the default), the fast path still calls
luaL_openlibs() directly with no overhead. A parse error is returned if
the directive appears after lua-load or lua-load-per-thread (the Lua state
is already initialised at that point), or if 'none' is combined with other
library names. Note that fork() and new thread creation are already blocked
by default regardless of this setting (see "insecure-fork-wanted").
2026-04-09 14:31:10 +02:00
Willy Tarreau
3020fde525 BUG/MAJOR: slz: always make sure to limit fixed output to less than worst case literals
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
Literals are sent in two ways:
  - in EOB state, unencoded and prefixed with their length
  - in FIXED state, huffman-encoded

And references are only sent in FIXED state.

The API promises that the amount of data will not grow by more than
5 bytes every 65535 input bytes (the comment was adjusted to remind
this last point). This is guaranteed by the literal encoding in EOB
state (BT, LEN, NLEN + bytes), which is supposed to be the worst
case by design.

However, as reported by Greg KH, this is currently not true: the test
that decides whether or not to switch to FIXED state to send references
doesn't properly account for the number of bytes needed to roll back
to the *exact* same state in EOB, which means sending EOB, BT,
alignment, LEN and NLEN in addition to the referenced bytes, versus
sending the encoding for the reference. By not taking into account the
cost of returning to the initial state (BT+LEN+NLEN), it was possible
to stay too long in the FIXED state and to consume the extra bytes that
are needed to return to the EOB state, resulting in producing much more
data in case of multiple switchovers (up to 6.25% increase was measured
in tests, or 1/16, which matches worst case estimates based on the code).

And this check is only valid when starting from EOB (in order to restore
the same state that offers this guarantee). When already in FIXED state,
the encoded reference is always smaller than or same size as the data.
The smallest match length we support is 4 bytes, and when encoded this
is no more than 28 bits, so it is safe to stay in FIXED state as long
as needed while checking the possibility of switching back to EOB.

This very slightly reduces the compression ratio (-0.17% on a linux
kernel source) but makes sure we respect the API promise of no more
than 5 extra bytes per 65535 of input. A side effect of the slightly
simpler check is an ~7.5% performance increase in compression speed.

Many thanks to Greg for the detailed report allowing to reproduce
the issue.

This is libslz upstream commit 002e838935bf298d967f670036efa95822b6c84e.

Note: in haproxy's default configuration (tune.bufsize 16384,
tune.maxrewrite 1024), this problem cannot be triggered, because the
reserve limits input to 15360 bytes, and the overflow is maximum
960 bytes resulting in 16320 bytes total, which still fits into the
buffer. However, reducing tune.maxrewrite below 964, or tune.bufsize
above 17408 can result in overflows for specially crafted patterns.

A workaround for larger buffers consists in always setting tune.bufsize
to at least 1/16 of tune.bufsize.

Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://www.mail-archive.com/haproxy@formilux.org/msg46837.html
2026-04-08 19:14:25 +02:00
Olivier Houchard
d759e60a32 MEDIUM: check: Revamp the way the protocol and xprt are determined
Storing the protocol directly into the check was not a good idea,
because the protocol may not be determined until after a DNS resolution
on the server, and may even change at runtime, if the DNS changes.
What we can, however, figure out at start up, is the net_addr_type,
which will contain all that we need to find out which protocol to use
later.
Also revert the changes made by commit 07edaed191
that would not reuse the server xprt if a different alpn is set for
checks. The alpn is just a string, and should not influence the choice
of the xprt.
We'll now make sure to use the server xprt, unless an address is
provided, in which case we'll use whatever xprt matches that address, or
a port, in which case we'll assume we want TCP, and use check_ssl to
know whetver we want the SSL xprt or not.
Now that the check contains all that is needed to know which protocol to
look up, always just use that when creating a new check connection if it
is the default check connection, and for now, always use TCP when a
tcp-check or http-check connect rule is used (which means those can't be
used for QUIC so far).

This should hopefully fix github issue #3324.
2026-04-08 18:41:48 +02:00
William Lallemand
923b4c3a19 Revert "BUG: hlua: fix stack overflow in httpclient headers conversion"
This reverts commit a03120e228.

A WIP version of the patch was applied before the actual patch by
accident. The correct patch is 2db801c ("BUG/MINOR: hlua: fix stack
overflow in httpclient headers conversion")
2026-04-08 11:05:38 +02:00
Christopher Faulet
b7add82f92 BUG/MEDIUM: connection: Wake the stconn on error when failing to create mux
When the app_ops were removed, direct calls to the SC wake callback function
were replaced by tasklet wakeups. However, in conn_create_mux(), it was
replaced by a direct call to sc_conn_process(). However, sc_conn_process()
is only usable when the SC is attach to a stream. A backend mux can be
created for a healcheck. In this context, sc_conn_process() cannot be
called.

Because of this bug, crashes can be experienced when an error is triggered
during a SSL connection attempt from a healthcheck.

To fix the issue, the call to sc_conn_process() was replaced by a tasklet
wakeup.

This patch should fix the issue #3326. No backport needed.
2026-04-08 08:20:59 +02:00
Greg Kroah-Hartman
06673291d7 BUG/MINOR: peers: fix OOB heap write in dictionary cache update
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
When a peer sends a dictionary entry update with a value (the else
branch at line 2109), the entry id decoded from the wire was never
validated against dc->max_entries before being used as an array index
into dc->rx[].

A malicious peer can send id=N where N > 128 (PEER_STKT_CACHE_MAX_ENTRIES)
to:
  - dc->rx[id-1].de at line 2123: OOB read followed by atomic decrement
    and potential free of an attacker-controlled pointer via
    dict_entry_unref()
  - dc->rx[id-1].de = de at line 2124: OOB write of a heap pointer at
    an attacker-controlled offset (16-byte stride, ~64 GiB range)

The bounds check was added to the key-only branch in commit f9e51beec
("BUG/MINOR: peers: Do not ignore a protocol error for dictionary
entries.") but was never added to the with-value branch. The bug has
been present since dictionary support was introduced in commit
8d78fa7def ("MINOR: peers: Make peers protocol support new
"server_name" data type.").

Reachable from any TCP client that knows the configured peer name
(no cryptographic authentication on the peers protocol). Requires a
stick-table with "store server_key" in the configuration.

Fix by hoisting the bounds check above the branch so it covers both
paths.

Must be backported as far as 2.6.
2026-04-07 14:41:46 +02:00
Greg Kroah-Hartman
782a1b5888 BUG/MEDIUM: chunk: fix infinite loop in get_larger_trash_chunk()
When the input chunk is already the large buffer (chk->size ==
large_trash_size), the <= comparison still matched and returned
another large buffer of the same size. Callers that retry on a
non-NULL return value (sample.c:4567 in json_query) loop forever.

The json_query infinite loop is trivially triggered: mjson_unescape()
returns -1 not only when the output buffer is too small but also for
any \uXXYY escape where XX != "00" (mjson.c:305) and for invalid
escapes like \q. The retry loop assumes -1 always means "grow the
buffer", so a 14-byte JSON body of {"k":"\u0100"} hangs the worker
thread permanently. Send N such requests to exhaust all worker
threads.

Use < instead of <= so a chunk that is already large yields NULL.
This also fixes the json converter overflow at sample.c:2869 where
no recheck happens after the "growth" returned a same-size buffer.

Introduced in commit ce912271db ("MEDIUM: chunk: Add support for
large chunks"). No backport needed.
2026-04-07 14:20:38 +02:00
Greg Kroah-Hartman
f712841cf0 BUG/MEDIUM: chunk: fix typo allocating small trash with bufsize_large
A copy-paste error in alloc_trash_buffers_per_thread() passes
global.tune.bufsize_large to alloc_small_trash_buffers() instead of
global.tune.bufsize_small. This sets small_trash_size = bufsize_large.

When tune.bufsize.large is configured, get_larger_trash_chunk() then
incorrectly matches a large buffer against small_trash_size at line
169 and "grows" it to a regular (smaller) buffer. b_xfer() at line
179 attempts to copy the large buffer's contents into the smaller one:

  - Default builds (DEBUG_STRICT=1): BUG_ON in __b_putblk() aborts
    the process -> remote DoS
  - DEBUG_STRICT=0 builds: BUG_ON becomes ASSUME() and the compiler
    elides the check -> heap overflow with attacker-controlled bytes

Reachable via the json converter (sample.c:2862) when escaping
~bufsize_large/6 control characters in attacker-supplied data such
as a request header or body.

Introduced in commit 92a24a4e87 ("MEDIUM: chunk: Add support for
small chunks"). No backport needed.
2026-04-07 14:20:38 +02:00
Greg Kroah-Hartman
d6284470e4 BUG/MINOR: hlua: fix format-string vulnerability in Patref error path
hlua_error() is a printf-family function (calls vsnprintf), but
hlua_patref_set, hlua_patref_add, and _hlua_patref_add_bulk pass
errmsg directly as the format string. errmsg is built by pattern.c
helpers that embed the user-supplied key or value verbatim, e.g.
pat_ref_set_elt() generates "unable to parse '<value>'".

A Lua script calling:

    ref:set("key", "%p.%p.%p.%p.%p.%p.%p.%p")

against a map with an integer output type (where the parse fails)
gets stack/register contents formatted into the (nil, err) return
value -> ASLR/canary leak. With %n and no _FORTIFY_SOURCE this
becomes an arbitrary write primitive.

This must be backported as far as the Patref Lua API exists.
2026-04-07 14:18:13 +02:00
Greg Kroah-Hartman
2db801c635 BUG/MINOR: hlua: fix stack overflow in httpclient headers conversion
hlua_httpclient_table_to_hdrs() declares a VLA of size
global.tune.max_http_hdr (default 101) on the stack but never checks
hdr_num against that bound. A Lua script that supplies a header table
with more than 101 values writes struct http_hdr entries (two ist =
two heap pointers + two lengths) past the end of the VLA, smashing
the stack frame.

Trigger from any Lua action/task/service:

    local hc = core.httpclient()
    local v = {}
    for i = 1, 300 do v[i] = "x" end
    hc:get{ url = "http://127.0.0.1/", headers = { ["X"] = v } }

Each out-of-bounds entry writes a heap pointer (controllable
allocation contents via istdup) plus an attacker-chosen length onto
the stack, overwriting the saved return address.

[wla: this is only reachable if the Lua script passes more than
max_http_hdr header values, which requires access to the script itself]

This must be backported as far as the httpclient Lua API exists.

Signed-off-by: William Lallemand <wlallemand@haproxy.com>
2026-04-07 13:31:39 +02:00
Greg Kroah-Hartman
a03120e228 BUG: hlua: fix stack overflow in httpclient headers conversion
hlua_httpclient_table_to_hdrs() declares a VLA of size
global.tune.max_http_hdr (default 101) on the stack but never checks
hdr_num against that bound. A Lua script that supplies a header table
with more than 101 values writes struct http_hdr entries (two ist =
two heap pointers + two lengths) past the end of the VLA, smashing
the stack frame.

Trigger from any Lua action/task/service:

    local hc = core.httpclient()
    local v = {}
    for i = 1, 300 do v[i] = "x" end
    hc:get{ url = "http://127.0.0.1/", headers = { ["X"] = v } }

Each out-of-bounds entry writes a heap pointer (controllable
allocation contents via istdup) plus an attacker-chosen length onto
the stack, overwriting the saved return address. With no stack
canary, this is direct RCE; with a canary, it requires a leak first.

Reachable from any deployment that loads Lua scripts. While Lua
scripts are nominally trusted, this turns "can edit Lua" into "can
execute arbitrary native code", which is a meaningful boundary in
many setups (Lua sandbox escape).

This must be backported as far as the httpclient Lua API exists.
2026-04-07 11:23:40 +02:00
Greg Kroah-Hartman
5161415653 BUG/MEDIUM: jwe: fix memory leak in jwt_decrypt_secret with var argument
When the secret argument to jwt_decrypt_secret is a variable
(ARGT_VAR) rather than a literal string, alloc_trash_chunk() is
called to hold the base64-decoded secret but the buffer is never
released. The end: label frees input, decrypted_cek, out, and the
decoded_items array but not secret.

Each request leaks one trash chunk (~tune.bufsize, default 16KB).
At ~65000 requests per GiB this allows slow memory exhaustion DoS
against any config of the form:

    http-request set-var(txn.x) req.hdr(...),jwt_decrypt_secret(txn.key)

This must be backported as far as JWE support exists.
2026-04-07 11:17:30 +02:00
Greg Kroah-Hartman
648b0e7bea BUG/MEDIUM: jwt: fix heap overflow in ECDSA signature DER conversion
convert_ecdsa_sig() calls i2d_ECDSA_SIG(ecdsa_sig, &p) where p
points into signature->area, a trash chunk of tune.bufsize bytes
(default 16384). i2d writes with no output bound.

The raw R||S input can be up to bufsize bytes (filled by
base64urldec at jwt.c:520-527), giving bignum_len up to 8192. The
DER encoding adds a SEQUENCE header (2-4 bytes), two INTEGER headers
(2-4 bytes each), and up to two leading-zero sign-padding bytes when
the bignum high bit is set. With two 8192-byte bignums having the
high bit set, the encoding is ~16398 bytes, overflowing the 16384-
byte buffer by ~14 bytes.

Triggered by any JWT with alg=ES256/384/512 and a ~21830-character
base64url signature. The signature does not need to verify
successfully; the overflow happens before verification. Reachable
from any config using jwt_verify with an EC algorithm.

Also fixes the existing wrong check: i2d returns -1 on error which
became SIZE_MAX in the size_t signature->data, defeating the
"== 0" test.

This must be backported as far as JWT support exists.
2026-04-07 11:11:42 +02:00
Greg Kroah-Hartman
717e9aec5f BUG/MEDIUM: jwe: fix NULL deref crash with empty CEK and non-dir alg
In sample_conv_jwt_decrypt_secret(), when a JWE token has an empty
encrypted-key section but the algorithm is not "dir" (e.g. A128KW),
neither branch initializes decrypted_cek. The NULL pointer is then
passed to decrypt_ciphertext() which dereferences it:

  - For GCM encodings: aes_process() calls b_orig(NULL) -> SIGSEGV
  - For CBC encodings: b_data(NULL) at jwe.c:463 -> SIGSEGV

A single HTTP request with a crafted Authorization header crashes the
worker process. Trigger token (JOSE header {"alg":"A128KW","enc":"A128GCM"},
empty CEK section between the two dots):

  eyJhbGciOiJBMTI4S1ciLCJlbmMiOiJBMTI4R0NNIn0..AAAAAAAAAAAAAAAA.AA.AA

Reachable in any configuration using the jwt_decrypt_secret converter.
The other two decrypt converters (jwt_decrypt_jwk, jwt_decrypt_cert)
already have the check.

This must be backported as far as JWE support exists.
2026-04-07 10:57:47 +02:00
Greg Kroah-Hartman
ed267f9bc5 BUG/MEDIUM: payload: validate SNI name_len in req.ssl_sni
The 16-bit name_len field is read directly from the ClientHello and
stored as the sample length without any validation against srv_len,
ext_len, or the channel buffer size. A 65-byte ClientHello with
name_len=0xffff produces a sample claiming 65535 bytes of data when
only ~4 bytes are actually present in the buffer.

Downstream consumers then read tens of kilobytes past the channel
buffer:
  - pattern.c:741 XXH3() hashes 65535 bytes -> ~50KB OOB heap read
  - sample.c smp_dup memcpy if large trash configured
  - log-format %[req.ssl_sni] leaks heap contents to logs/headers

Reachable pre-authentication on any TCP frontend using req.ssl_sni
(req_ssl_sni), which is the documented way to do SNI-based content
switching in TCP mode. No SSL handshake is required; the parser
runs on raw buffer contents in tcp-request content rules.

Bug introduced in commit d4c33c8889 (2013). The ALPN parser in
the same file at line 1044 has the equivalent check; SNI never did.

This must be backported to all supported versions.
2026-04-07 10:50:04 +02:00
Christopher Faulet
41bded8952 BUG/MEDIUM: tcpcheck: Properly retrieve tcpcheck type to install the best mux
Some checks failed
Contrib / build (push) Has been cancelled
alpine/musl / gcc (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
When the healthcheck section support was added, the tcpcheck type was moved
into the tcpcheck ruleset. However, conn_install_mux_chk() function was not
updated accordingly. So the TCP mode was always returned.

No backport needed. This patch is related to #3324 but it is not the root
cause of the issue.
2026-04-05 09:26:46 +02:00
Aurelien DARRAGON
ca53ee17b6 BUG/MINOR: counters: fix unexpected 127 char GUID truncation for shm-stats-file objects
Some checks failed
Contrib / build (push) Has been cancelled
alpine/musl / gcc (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
As reported by GH @phihos on GH #3320, using the shm-stats-file feature
with objects exceeding 127 chars would result in object name being
unexpectedly truncated, while GUID API supports up to 128 chars.

Indeed, with the config below, and shm-stats-file enabled:
   server s1  127.0.0.1:1 guid srv:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:SRV_1 disabled
    server s10 127.0.0.1:1 guid srv:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:SRV_10 disabled

haproxy would store the second server object with the same id as the first
one, but upon reload, only the first one would be restored, which would
eventually cause shm-stats-file slot exhaustion with repetitive reloads.

@phihos, found out the underlying issue, in counters.c we used snprintf()
with sizeof(shm_obj->guid) - 1 as <size> parameter, while we should have
use sizeof(shm_obj->guid) instead since shm_obj->guid already takes the
terminating NULL byte into account.

So we simply apply the fix suggested by @phihos, and hopefully this should
solve the shm-stats-file slot leak that was observed.

Unfortunately, for now, we cannot warn the user that a duplicate
shm-stats-file object was found, because we accept duplicate objects
by design for 2 reasons. The first one is for a new process to be able
to change the object type for a previously known GUID while allowing
previous processes to use the old object as long as they are alive.
The second reason is that upon startup we cannot afford to scan the
whole object list, as soon as we find a match (type + GUID), we bind
the object, and this way we avoid unnecessary lookup time.

Perhaps we have room for improvement in the future, but for now let's
keep it this way.

It should be backported to 3.3

Big thanks to @phihos for the bug description, analysis and
suggestions.
2026-04-04 02:14:50 +02:00
Christopher Faulet
774d0dfe14 BUG/MEDIUM: tcpcheck/server: Fix parsing of healthcheck param for dynamic servers
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
The parsing of the "healthcheck" parameter for dynamic servers was not
finished. The post-config was missing, leading to a crash because the
ruleset pointer was NULL.

To fix the issue, check_server_tcpcheck() function is called in
cli_parse_add_server().

No backport needed.
2026-04-03 16:31:18 +02:00
Christopher Faulet
09c37fb6bd MINOR: tcpcheck: Reject unknown keyword during parsing of healthcheck section
unknown keyword was just ignored. it is not really handy to detect
error. Now an error is reported and the parsing is aborted.
2026-04-03 16:31:17 +02:00
Willy Tarreau
6ed656d691 BUG/MINOR: http-act: fix a typo in the "pause" action error message
It was saying "mause" instead of "pause" :-)
This should be backported to 3.2.
2026-04-03 16:25:49 +02:00
Olivier Houchard
2147e8e368 MINOR: servers: The right parameter for idle-pool.shared is "full"
In documentation, and in an error message, provide the right new keyword for
"idle-pool.shared", it is "full", not "auto".
2026-04-03 15:45:26 +02:00
Christopher Faulet
c010c3924a BUG/MEDIUM: mux-h1: Disable 0-copy forwarding when draining the request
When an early response is sent to the client and the H1 connection is
switched to the draining state, we must take care to disable the 0-copy data
forwarding because the backend side is no longer here. It is an issue
because this prevent any regular receive to be performed.

This patch should fix the issue #3316. It must be backported as far as 3.0.
2026-04-03 15:12:55 +02:00
Christopher Faulet
8056117e98 BUG/MEDIUM: haterm: Move all init functions of haterm in haterm_init.c
Functions used to initialize haterm (the splicing and the response buffers)
were defined and registered in haterm.c. The problem is that this file in
compiled with haproxy. So it may be an issue. And for the splicing part,
warnings may be emitted when haproxy is started.

To avoid any issue during haproxy startup and to avoid to initialize some
part of haterm, all init functions were moved into haterm_init.c file.

No backport needed.
2026-04-03 15:09:44 +02:00
William Lallemand
ed208b7e0f MINOR: tcpcheck: reintroduce proxy_parse_tcpcheck() symbol
Remove the proxy_parse_tcpcheck() static keyword.
2026-04-03 14:37:27 +02:00
Aurelien DARRAGON
8d28c0e37b MEDIUM: filters: add "filter-sequence" directive
This is another pre-requisite work for upcoming decompression filter.

In this patch we implement the "filter-sequence" directive which can be
used in proxy section (frontend,backend,listen) and takes 2 parameters

The first one is the direction (request or response), the second one
is a comma separated list of filter names previously declared on the
proxy using the "filter" keyword.

The main goal of this directive is to be able to instruct haproxy in which
order the filters should be executed on request and response paths,
especially if the ordering between request and response handling must
differ, and without relying on the filter declaration ordering (within
the proxy) which is used by default by haproxy.

Another benefit of this feature is that it becomes possible to "ignore"
a previously declared filter on the proxy. Indeed, when filter-sequence
is defined for a given direction (request/response), then it will be used
over the implicit filter ordering, but if a filter which was previously
declared is not specified in the related filter-sequence, it will not be
executed on purpose. This can be used as a way to temporarily disable a
filter without completely removing its configuration.

Documentation was updated (check examples for more info)
2026-04-03 12:10:27 +02:00
Aurelien DARRAGON
629a5ae531 MINOR: filters: add filter name to flt_conf struct
flt_conf struct stores the filter id, which is used internally to check
match the filter against static pointer identifier, and also used as
descriptive text to describe the filter. But the id is not consistent
with the public name as used in the configuration (for instance when
selecting filter through the 'filter' directive).

What we do in this patch is that we add flt_conf->name member, which
stores the real filter name as seen in the configuration. This will
allow to select filters by their name from other directives in the
configuration.
2026-04-03 12:10:20 +02:00
William Lallemand
e42f381bfc BUG/MINOR: ssl: fix memory leak in ssl_fc_crtname by using SSL_CTX ex_data index
The ssl_crtname_index was registered with SSL_get_ex_new_index() but the
certificate name is stored on a SSL_CTX object via SSL_CTX_set_ex_data().
The free callback is only invoked for the object type matching the index
registration, so the strdup'd name was never freed when the SSL_CTX was
released.

Fix this by using SSL_CTX_get_ex_new_index() instead, which ensures the
free callback fires when the SSL_CTX is destroyed.

No backport needed.
2026-04-03 11:00:36 +02:00
Christopher Faulet
f1bf8dd148 MINOR: haterm: Add an request option to disable splicing
The parameter "?S=0" can now be added on the request URI to disable the
splicing for the response. "?S=1" will only fallback on the haterm settings.
2026-04-03 10:46:15 +02:00
Christopher Faulet
3ab5044c1c MINOR: haterm: Handle boolean request options as flags
Following request options are now handled as flags:

  - ?k=1 => flag HS_ST_OPT_CHUNK_RES is set
  - ?c=0 => flag HS_ST_OPT_NO_CACHE is set
  - ?R=1 => flag HS_ST_OPT_RANDOM_RES is set
  - ?A=A => flag HS_ST_OPT_REQ_AFTER_RES is set.

By default, none is set.
2026-04-03 10:46:15 +02:00
Christopher Faulet
f349d0b113 MEDIUM: haterm: Add support for splicing and option to disable it
The support for the splicing was added and enabled by default, if
supported. The command line option '-dS' was also added to disable the
feature.

When the splicing can be used and the front multiplexer agrees to proceed,
tee() is used to "copy" data from the master pipe to the client pipe.
2026-04-03 10:46:15 +02:00
Christopher Faulet
70a7f8dce0 MEDIUM: haterm: Prepare support for splicing by initializing a master pipe
Now the zero-copy data forwarding is supported, we will add the splicing
support. To do so, we first create a master pipe with vmsplice() during
haterm startup. It is only performed if the splicing is supported. And its
size can be configured by setting "tune.pipesize" global parameter.

This master pipe will be used to fill the pipe with the client.
2026-04-03 10:46:15 +02:00
Christopher Faulet
ecf36f2ca8 MEDIUM: haterm: Add support for 0-copy data forwading and option to disable it
The support for the zero-copy data forwarding was added and enabled by
default. The command line option '-dZ' was also added to disable the
feature.

Concretely, when haterm pushes the response payload, if the zero-copy
forwarding is supported, a dedicated function is used to do so.
hstream_ff_snd() will rely on se_nego_ff() to know how many data can send
and at the end, on se_done_ff() to really send data.

hstream_add_ff_data() function was added to perform the raw copy of the
payload in the sedesc I/O buffer.
2026-04-03 10:46:15 +02:00
Christopher Faulet
2a1afcf39d MINOR: hasterm: Change hstream_add_data() to prepare zero-copy data forwarding
hstream_add_data() function is renamed to hstream_add_htx_data() because
there will be a similar function to add data in zero-copy forwarding
mode. The function was also adapted to take the data length to add in
parameter and to return the number of written bytes.
2026-04-03 10:46:15 +02:00
William Lallemand
d89ae36adc MINOR: ssl: add the ssl_fc_crtname sample fetch
This new sample fetch returns the name of the certificate selected for
an incoming SSL/TLS connection, as it would appear in "show ssl cert".
It may be a filename with its relative or absolute path, or an alias,
depending on how the certificate was declared in the configuration.

The certificate name is stored as ex_data on the SSL_CTX at load time
in ckch_inst_new_load_store(), and freed via a dedicated free callback.
2026-04-03 10:41:00 +02:00
Willy Tarreau
efb1ab57be BUG/MINOR: cfgcond: fail cleanly on missing argument for "feature"
The "feature" predicate takes an argument name. Not passing one will
cause strstr() to always find something, including at the end of the
string, and to read past end that ASAN detects. We need to check that
we didn't reach end before proceeding.

This bug was reported by OSS Fuzz here:
   https://issues.oss-fuzz.com/issues/499133314

The issue is present since 2.4 with commit 58ca706e16 ("MINOR: config:
add predicate "feature" to detect certain built-in features") so this
fix must be backported to all stable versions.
2026-04-03 09:17:35 +02:00
Willy Tarreau
f9ba750fd9 BUG/MINOR: cfgcond: always set the error string on awslc_api checks
Using awslc_api_before() with an invalid argument results in "(null)"
appearing in the error message due to -1 being returned without the
error message being filled. Let's always fill the error message on error.

This was introduced in 3.3 with commit 3d15c07ed0 ("MINOR: cfgcond: add
"awslc_api_atleast" and "awslc_api_before""), and this fix must be
backported to 3.3.
2026-04-03 08:58:49 +02:00
Willy Tarreau
bf04e64f2c BUG/MINOR: cfgcond: always set the error string on openssl_version checks
Using openssl_version_before() with an invalid argument results in "(null)"
appearing in the error message due to -1 being returned without the error
message being filled. Let's always fill the error message on error.

This was introduced in 2.5 with commit 3aeb3f9347 ("MINOR: cfgcond:
implements openssl_version_atleast and openssl_version_before"), and
this fix must be backported to 2.6.
2026-04-03 08:56:54 +02:00
Willy Tarreau
3608374d6d BUG/MINOR: cfgcond: properly set the error pointer on evaluation error
cfg_eval_condition() says that the <errptr> pointer will be set upon
error. However, cfg_eval_cond_expr() can fail (e.g. failure to handle
a dynamic argument) but would branch to "done" and leave errptr unset.
Let's check for this case as well.

This bug was reported by OSS Fuzz here:
   https://issues.oss-fuzz.com/issues/499135825

The bug was introduced in 2.5 around commit ca81887599 ("MINOR:
cfgcond: insert an expression between the condition and the term") so
the fix must be backported as far as 2.6.
2026-04-03 08:51:02 +02:00
William Lallemand
6df3662077 MEDIUM: acme: split the initial delay from the retry DNS delay
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
The previous ACME_RSLV_WAIT state served a dual role: it applied the
initial dns-delay before the first DNS probe and also handled the
delay between retries. There was no way to simply wait a fixed delay
before submitting the challenge without also triggering DNS pre-checks.

Replace ACME_RSLV_WAIT with two distinct states:
  - ACME_INITIAL_DELAY: an optional initial wait before proceeding,
    only applied when "challenge-ready" includes the new "delay" keyword
  - ACME_RSLV_RETRY_DELAY: the delay between resolution retries, always
    applied when DNS pre-checks are in progress

The new "delay" keyword in "challenge-ready" can be used standalone
(wait then submit the challenge directly) or combined with "dns" (wait
then start the DNS pre-checks). When "delay" is not set, the first DNS
probe fires immediately.

Update the documentation accordingly.
2026-04-02 18:29:26 +02:00
William Lallemand
6ca83eb731 MINOR: acme: remove remaining CLI wait in ACME_RSLV_TRIGGER
The TASK_WOKEN_TIMER check that previously handled the case where
RSLV_TRIGGER was reached directly from the CLI command is therefore dead
code and can be removed.
2026-04-02 18:23:35 +02:00
Amaury Denoyelle
22b7da1464 BUG/MINOR: mux_quic: fix uninit for QMux emission
Fix the following build warning from obsolete compilers for <orig_frm>
variable in qcc_qstrm_send_frames() function :

src/mux_quic_qstrm.c:266:17: warning: 'orig_frm' may be used
  uninitialized in this function [-Wmaybe-uninitialized]

The variable is now explicitely initialized to NULL on each loop, which
should prevent this warning. Note that for code clarity, the variable is
renamed <next_frm>.

No need to backport.
2026-04-02 16:58:00 +02:00
William Lallemand
253bf8cbae MEDIUM: acme: initialize the dns timer starting from the first DNS request
Previously the dns timeout timer was initialized in ACME_RSLV_WAIT,
before the initial dns-delay expires. This meant the countdown started
before any DNS request was actually sent, so the effective timeout was
shorter than expected by one dns-delay period.

Move the initialization to ACME_RSLV_TRIGGER so the timer starts only
when the first DNS resolution attempt is triggered. Update the
documentation to clarify this behaviour.
2026-04-02 15:54:02 +02:00
William Lallemand
6228ec6a81 MINOR: acme: split the CLI wait from the resolve wait
Add a new state ACME_CLI_WAIT which allows to split the CLI wait logic
from the resolve wait logic.
2026-04-02 15:54:02 +02:00
Amaury Denoyelle
2457701299 MAJOR: mux-quic: activate QMux on the backend side
Some checks failed
Contrib / build (push) Has been cancelled
alpine/musl / gcc (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
During connect_server(), xprt_qstrm is selected to performed transport
parameters exchange prior to the mux layer initialization.
2026-04-02 14:02:05 +02:00
Amaury Denoyelle
490b465fd1 MAJOR: mux-quic: activate QMux for frontend side
To be able to support QMux protocol, xprt_qstrm is activated by the
session for transport parameters exchange, prior to the mux layer
initialization.
2026-04-02 14:02:05 +02:00
Amaury Denoyelle
b26178396a MINOR: mux-quic: use QMux transport parameters from qstrm xprt
Defines an API for xprt_qstrm so that the QMux transport parameters can
be retrieved by the MUX layer on its initialization. This concerns both
local and remote parameters.

Functions xprt_qstrm_lparams/rparams() are defined and exported for
this. They are both used in qmux_init() if QMux protocol is active.
2026-04-02 14:02:04 +02:00
Amaury Denoyelle
3c42a7e9ac MEDIUM: ssl: load xprt_qstrm after handshake completion
On SSL handshake completion, MUX layer can be initialized if not already
the case. However, for QMux protocol, it is necessary first to perform
transport parameters exchange, via the new xprt_qstrm layer. This patch
ensures this is performed if any flag CO_FL_QSTRM_* is set on the
connection.

Also, SSL layer registers itself via add_xprt. This ensures that it can
be used by xprt_qstrm for the emission/reception of the necessary
frames.
2026-04-02 14:02:04 +02:00
Amaury Denoyelle
f1ed1de317 MINOR: xprt_qstrm: implement sending of transport parameters
This patch implements QMux emission of transport parameters via
xprt_qstrm. Similarly to receive, this is performed in conn_send_qstrm()
which uses lower xprt snd_buf operation. The connection must first be
flagged with CO_FL_QSTRM_SEND to trigger this step.
2026-04-02 14:02:04 +02:00
Amaury Denoyelle
531a2b9f1a MINOR: xprt_qstrm: implement reception of transport parameters
Extend xprt_qstrm to implement the reception of QMux transport
parameters. This is performed via conn_recv_qstrm() which relies on the
lower xprt rcv_buf operation. Once received, parameters are kept in
xprt_qstrm context, so that the MUX can retrieve them on init.

For the reception of parameters to be active, the connection must first
be flagged with CO_FL_QSTRM_RECV.
2026-04-02 14:02:04 +02:00
Amaury Denoyelle
91ea5809e9 MINOR: xprt_qstrm: define callback for ALPN retrieval
Add get_alpn operation support for xprt_qstrm. This simply acts as a
passthrough method to the underlying XPRT layer.

This function is necessary for QMux when running above SSL, as mux-quic
will access ALPN during its initialization in order to instantiate the
proper application protocol layer.
2026-04-02 14:02:04 +02:00
Amaury Denoyelle
4dd224b5ef MINOR: xprt_qstrm: define new xprt module for QMux protocol
Define a new XPRT layer for the new QMux protocol. Its role will be to
perform the initial exchange of transport parameters.

On completion, contrary to XPRT handshake, xprt_qstrm will first init
the MUX and then removes itself. This will be necessary so that the
parameters can be retrieved by the MUX during its initialization.

This patch only declares the new xprt_qstrm along with basic operations.
Future commits will implement the proper reception/emission steps.
2026-04-02 14:02:04 +02:00
Amaury Denoyelle
782894f5b8 MINOR: mux-quic: define Tx connection buffer for QMux
Similarly to reception, a new buffer is defined in QCC connection to
handle emission for QMux protocol. This replaces the trash buffer usage
in qcc_qstrm_send_frames().

This buffer is necessary to handle partial emission. On retry, the
buffer must be completely emitted before starting to send new frames.
2026-04-02 14:02:04 +02:00
Amaury Denoyelle
621f21f6fd MINOR: mux-quic: implement qstream send callback
Each time a QUIC frame is emitted, mux-quic layer is notified via a
callback to update the underlying QCS. For QUIC, this is performed via
qc_stream_desc element.

In QMux protocol, this can be simplified as there is no
qc_stream_desc/quic_conn layer interaction. Instead, each time snd_buf
is called, QCS can be updated immediately using its return value. This
is performed via a new function qstrm_ctrl_send().

Its work is similar to the QUIC equivalent but in a simpler mode. In
particular, sent data can be immediately removed from the Tx buffer as
there is no need for retransmission when running above TCP.
2026-04-02 14:02:04 +02:00
Amaury Denoyelle
e8d9eb4f7a MEDIUM: mux-quic: implement QMux send
This patchs implement mux-quic reception for the new QMux protocol. This
is performed via the new function qcc_qstrm_send_frames(). Its interface
is similar to the QUIC equivalent : it takes a list of frames and
encodes them in a buffer before sending it via snd_buf.

Contrary to QUIC, a check on CO_FL_ERROR flag is performed prior to
every qcc_qstrm_send_frames() invokation to interrupt emission. This is
necessary as the transport layer may set it during snd_buf. This is not
the case currently for quic_conn layer, but maybe a similar mechanism
should be implemented as well for QUIC in the future.
2026-04-02 14:02:04 +02:00