Commit graph

9118 commits

Author SHA1 Message Date
Amaury Denoyelle
42c495f3d7 MINOR: ncbmbuf: implement ncbmb_data()
Implement ncbmb_data() function for the ncbmbuf type. Its purpose is
similar to its ncbuf counterpart : it returns the size in bytes of data
starting at a specific offset until the next gap.

As the previous patch, this commit must be backported prior to the fix
to come on QUIC CRYPTO frames parsing.
2025-10-22 15:04:06 +02:00
Amaury Denoyelle
1e1a3aa6aa MINOR: ncbmbuf: implement add
This patch implements add operation for ncbmbuf type.

This function is simpler than its ncbuf counterpart. Indeed, for now
only NCB_ADD_OVERWRT mode is supported. This compromise has been chosen
as ncbmbuf will be first used for QUIC CRYPTO frames handling, which
does not mandate to compare existing filled blocks during insertion.

As the previous patch, this commit must be backported prior to the fix
to come on QUIC CRYPTO frames parsing.
2025-10-22 15:04:06 +02:00
Amaury Denoyelle
b9f91ad3ff MINOR: ncbmbuf: define new ncbmbuf type
Define ncbmbuf which is an alternative non-contiguous buffer
implementation. "bm" abbreviation stands for bitmap, which reflects how
gaps and filled blocks are encoded. The main purpose of this
implementation is to get rid of the ncbuf limitation regarding the
minimal size for gaps between two blocks of data.

This commit adds the new module ncbmbuf. Along with it, some utility
functions such as ncbmb_make(), ncbmb_init() and ncbmb_is_empty() are
defined. Public API of ncbmbuf will be extended in the following
patches.

This patch is not considered a bug fix. However, it will be required to
fix issue encountered on QUIC CRYPTO frames parsing. Thus, it will be
necessary to backport the current patch prior to the fix to come.
2025-10-22 15:04:06 +02:00
Amaury Denoyelle
59f0bafef2 MINOR: ncbuf: extract common types
ncbuf is a module which provide a non-contiguous buffer type
implementation. This patch extracts some basic types related to it into
a new file ncbuf_common.h.

This patch will be useful to provide a new non-contiguous buffer
alternative implementation based on a bitmap.

This patch is not a bug fix. However, it is necessary for ncbmbuf
implementation which will be required to fix a QUIC issue on CRYPTO
frames parsing. This, it will be necessary to backport the current patch
prior to the fix to come.
2025-10-22 11:11:20 +02:00
Olivier Houchard
d5562e31bd MEDIUM: stick-tables: Remove the table lock
Remove the table lock, it was only protecting the per-table expiration
date, and that task is gone.
2025-10-20 15:04:47 +02:00
Olivier Houchard
8bc8a21b25 MEDIUM: stick-tables: Use a per-shard expiration task
Instead of having per-table expiration tasks, just use one per shard.
The task will now go through all the tables to expire entries. When a
table gets an expiration earlier than the one previously known, it will
be put in a mt-list, and the task will be responsible to put it into an
eb32, ordered based on the next expiration.
Each per-shard task will run on a different thread, so it should lead to
a better load distribution than the per-table tasks.
2025-10-20 15:04:47 +02:00
Olivier Houchard
945aa0ea82 MINOR: initcalls: Add a new initcall stage, STG_INIT_2
Add a new initcall stage, STG_INIT_2, for stuff to be called after
step_init_2() is called, so after we know for sure that global.nbthread
will be set.
Modify stick-tables stkt_late_init() to run at STG_INIT_2 instead of
STG_INIT, in anticipation for it to be enhanced and have a need for
global.nbthread.
2025-10-20 15:04:41 +02:00
Olivier Houchard
7a33b90b3c BUG/MEDIUM: mt_list: Make sure not to unlock the element twice
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
In mt_list_delete(), if the element was not in a list, then n and p will
point to it, and so setting n->prev and n->next will be enough to unlock it.
Don't do it twice, as once it's been done the first time, another thread may
be working with it, and may have added it to a list already, and doing it
a second time can lead to list inconsistencies.

This should be backported up to 2.8.
2025-10-19 23:21:42 +02:00
Frederic Lecaille
51eca5cbce BUG/MINOR: quic: SSL counters not handled
Some checks are pending
Contrib / build (push) Waiting to run
alpine/musl / gcc (push) Waiting to run
VTest / Generate Build Matrix (push) Waiting to run
VTest / (push) Blocked by required conditions
Windows / Windows, gcc, all features (push) Waiting to run
The SSL counters were not handled at all for QUIC connections. This patch
implement ssl_sock_update_counters() extracting the code from ssl_sock.c
and call this function where applicable both in TLS/TCP and QUIC parts.

Must be backported as far as 2.8.
2025-10-17 12:13:43 +02:00
Willy Tarreau
17930edecc MEDIUM: pools: detect() when munmap() fails in UAF mode
Some checks failed
Contrib / build (push) Has been cancelled
alpine/musl / gcc (push) Has been cancelled
VTest / Generate Build Matrix (push) Has been cancelled
Windows / Windows, gcc, all features (push) Has been cancelled
VTest / (push) Has been cancelled
Better check that munmap() always works, otherwise it means we might
have miscalculated an address, and if it fails silently, it will eat
all the memory extremely quickly. Let's add a BUG_ON() on munmap's
return.
2025-10-13 19:22:31 +02:00
Willy Tarreau
0e6a233217 BUG/MEDIUM: pools: fix bad freeing of aligned pools in UAF mode
As reported by Christopher, in UAF mode memory release of aligned
objects as introduced in commit ef915e672a ("MEDIUM: pools: respect
pool alignment in allocations") does not work. The padding calculation
in the freeing code is no longer correct since it now depends on the
alignment, so munmap() fails on EINVAL. Fortunately we don't care much
about it since we know it's the low bits of the passed address, which
is much simpler to compute, since all mmaps are page-aligned.

There's no need to backport this, as this was introduced in 3.3.
2025-10-13 19:19:39 +02:00
Willy Tarreau
fda6dc9597 MINOR: regex: use a thread-local match pointer for pcre2
The pcre2 matching requires an array of matches for grouping, that is
allocated when executing the rule by pre-processing it, and that is
immediately freed after use. This is quite inefficient and results in
annoying patterns in "show profiling" that attribute the allocations
to libpcre2 and the releases to haproxy.

A good suggestion from Dragan is to pre-allocate these per thread,
since the entry is not specific to a regex. In addition we're already
limited to MAX_MATCH matches so we don't even have the problem of
having to grow it while parsing nor processing.

The current patch adds a per-thread pair of init/deinit functions to
allocate a thread-local entry for that, and gets rid of the dynamic
allocations. It will result in cleaner memory management patterns and
slightly higher performance (+2.5%) when using pcre2.
2025-10-13 16:56:43 +02:00
Remi Tricot-Le Breton
bf5b912a62 MINOR: jwt: Add specific error code for known but unavailable certificate
A certificate that does not have the 'jwt' flag enabled cannot be used
for JWT validation. We now raise a specific return value so that such a
case can be identified.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
18ff130e9d MINOR: jwt: Add new "jwt" certificate option
This option can be used to enable the use of a given certificate for JWT
verification. It defaults to 'off' so certificates that are declared in
a crt-store and will be used for JWT verification must have a
"jwt on" option in the configuration.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
53957c50c3 MINOR: jwt: Do not look into ckch_store for jwt_verify converter
We must not try to load full-on certificates for 'jwt_verify' converter
anymore. 'jwt_verify_cert' is the only one that accepts a certificate.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
f5632fd481 MINOR: jwt: Add new jwt_verify_cert converter
This converter will be in charge of performing the same operation as the
'jwt_verify' one except that it takes a full-on pem certificate path
instead of a public key path as parameter.
The certificate path can be either provided directly as a string or via
a variable. This allows to use certificates that are not known during
init to perform token validation.
2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton
c3c0597a34 MEDIUM: jwt: Remove certificate support in jwt_verify converter
The jwt_verify converter will not take full-on certificates anymore
in favor of a new soon to come jwt_verify_cert. We might end up with a
new jwt_verify_hmac in the future as well which would allow to deprecate
the jwt_verify converter and remove the need for a specific internal
tree for public keys.
The logic to always look into the internal jwt tree by default and
resolve to locking the ckch tree as little as possible will also be
removed. This allows to get rid of the duplicated reference to
EVP_PKEYs, the one in the jwt tree entry and the one in the ckch_store.
2025-10-13 10:38:52 +02:00
Christopher Faulet
4145a61101 BUG/MEDIUM: stconn: Properly forward kip to the opposite SE descriptor
By refactoring the HTX to remove the extra field, a bug was introduced in
the stream-connector part. The <kip> (known input payload) value of a sedesc
was moved to <kop> (knwon output payload) using the same sedesc. Of course,
this is totally wrong. <kip> value of a sedesc must be forwarded to the
opposite side.

In addition, the operation is performed in sc_conn_send(). In this function,
we manipulate the stream-connectors. So se_fwd_kip() function was changed to
use the stream-connectors directely.

Now, the function sc_ep_fwd_kip() is now called with the both
stream-connectors to properly forward <kip> from on side to the opposite
side.

The bug is 3.3-specific. No backport needed.
2025-10-10 11:01:21 +02:00
Christopher Faulet
914538cd39 MEDIUM: htx: Remove the HTX extra field
Thanks for previous changes, it is now possible to remove the <extra> field
from the HTX structure. HTX_FL_ALTERED_PAYLOAD flag is also removed because
it is now unsued.
2025-10-08 11:10:42 +02:00
Christopher Faulet
c0b6db2830 MINOR: stconn: Add two fields in sedesc to replace the HTX extra value
For now, the HTX extra value is used to specify the known part, in bytes, of
the HTTP payload we will receive. It may concerne the full payload if a
content-length is specified or the current chunk for a chunk-encoded
message. The main purpose of this value is to be used on the opposite side
to be able to announce chunks bigger than a buffer. It can also be used to
check the validity of the payload on the sending path, to properly detect
too big or too short payload.

However, setting this information in the HTX message itself is not really
appropriate because the information is lost when the HTX message is consumed
and the underlying buffer released. So the producer must take care to always
add it in all HTX messages. it is especially an issue when the payload is
altered by a filter.

So to fix this design issue, the information will be moved in the sedesc. It
is a persistent area to save the information. In addition, to avoid the
ambiguity between what the producer say and what the consumer see, the
information will be splitted in two fields. In this patch, the fields are
added:

 * kip : The known input payload length
 * kop : The known output payload lenght

The producer will be responsible to set <kip> value. The stream will be
responsible to decrement <kip> and increment <kop> accordingly. And the
consumer will be responsible to remove consumed bytes from <kop>.
2025-10-08 11:01:36 +02:00
Willy Tarreau
75103e7701 MINOR: proxy: introduce proxy_abrt_close_def() to pass the desired default
With this function we can now pass the desired default value for the
abortonclose option when neither the option nor its opposite were set.
Let's also take this opportunity for using it directly from the HTTP
analyser since there's no point in re-checking the proxy's mode there.
2025-10-08 10:29:41 +02:00
Willy Tarreau
644b3dc7d8 MAJOR: proxy: enable abortonclose by default on HTTP proxies
As discussed on https://github.com/orgs/haproxy/discussions/3146 and on
the mailing list, there's a marked preference for having abortonclose
enabled by default when relevant. The point being that with todays'
internet, the large majority of requests sent with a closed input
channel are aborted requests, and that it's pointless to waste resources
processing them.

This patch now considers both "option abortonclose" and its opposite
"no option abortonclose" to figure whether abortonclose is enabled or
disabled in a backend. When neither are set (thus not even inherited
from a defaults section), then it considers the proxy's mode, and HTTP
mode implies abortonclose by default.

This may make some legacy services fail starting with 3.3. In this case
it will be sufficient to add "no option abortonclose" in either the
affected backend or the defaults section it derives from. But for
internet-facing proxies it's better to stay with the option enabled.
2025-10-08 10:29:41 +02:00
Willy Tarreau
fe47e8dfc5 MINOR: proxy: only check abortonclose through a dedicated function
In order to prepare for changing the way abortonclose works, let's
replace the direct flag check with a similarly named function
(proxy_abrt_close) which returns the on/off status of the directive
for the proxy. For now it simply reflects the flag's state.
2025-10-08 10:29:41 +02:00
William Lallemand
69bd253b23 CLEANUP: mjson: remove unused defines from mjson.h
This patch removes unused defines from mjson.h.
It also removes unused c++ declarations and includes.

string.h is moved to mjson.c
2025-10-06 09:30:07 +02:00
William Lallemand
8ea8aaace2 CLEANUP: mjson: remove MJSON_ENABLE_BASE64 code
Remove the code used under #if MJSON_ENABLE_BASE64, which is not used
within haproxy, to ease the maintenance of mjson.
2025-10-03 16:09:13 +02:00
William Lallemand
4edb05eb12 CLEANUP: mjson: remove MJSON_ENABLE_NEXT code
Remove the code used under #if MJSON_ENABLE_NEXT, which is not used
within haproxy, to ease the maintenance of mjson.
2025-10-03 16:08:17 +02:00
William Lallemand
a4eeeeeb07 CLEANUP: mjson: remove MJSON_ENABLE_PRINT code
Remove the code used under #if MJSON_ENABLE_PRINT, which is not used
within haproxy, to ease the maintenance of mjson.
2025-10-03 16:07:59 +02:00
William Lallemand
d63dfa34a2 CLEANUP: mjson: remove MJSON_ENABLE_RPC code
Remove the code used under #if MJSON_ENABLE_RPC, which is not used
within haproxy, to ease the maintenance of mjson.
2025-10-03 16:06:33 +02:00
Willy Tarreau
1afaa7b59d MINOR: rawsock: introduce CO_RFL_TRY_HARDER to detect closures on complete reads
Normally, when reading a full buffer, or exactly the requested size, it
is not really possible to know if the peer had closed immediately after,
and usually we don't care. There's a problematic case, though, which is
with SSL: the SSL layer reads in small chunks of a few bytes, and can
consume a client_hello this way, then start computation without knowing
yet that the client has aborted. In order to permit knowing more, we now
introduce a new read flag, CO_RFL_TRY_HARDER, which says that if we've
read up to the permitted limit and the flag is set, then we attempt one
extra byte using MSG_PEEK to detect whether the connection was closed
immediately after that content or not. The first use case will obviously
be related to SSL and client_hello, but it might possibly also make sense
on HTTP responses to detect a pending FIN at the end of a response (e.g.
if a close was already advertised).
2025-10-01 10:23:01 +02:00
Willy Tarreau
25f5f357cc MINOR: sched: pass the thread number to is_sched_alive()
Now it will be possible to query any thread's scheduler state, not
only the current one. This aims at simplifying the watchdog checks
for reported threads. The operation is now a simple atomic xchg.
2025-10-01 10:18:53 +02:00
Olivier Houchard
cf26745857 MINOR: mt_list: Implement MT_LIST_POP_LOCKED()
Implement MT_LIST_POP_LOCKED(), that behaves as MT_LIST_POP() and
removes the first element from the list, if any, but keeps it locked.

This should be backported to 3.2, as it will be use in a bug fix in the
stick tables that affects 3.2 too.
2025-09-30 16:25:07 +02:00
William Lallemand
b70c7f48fa MINOR: acme: implement "reuse-key" option
The new "reuse-key" option in the "acme" section, allows to keep the
private key instead of generating a new one at each renewal.
2025-09-27 21:41:39 +02:00
William Lallemand
3e72a9f618 MINOR: acme: provider-name for dpapi sink
Like "acme-vars", the "provider-name" in the acme section is used in
case of DNS-01 challenge and is sent to the dpapi sink.

This is used to pass the name of a DNS provider in order to chose the
DNS API to use.

This patch implements the cfg_parse_acme_vars_provider() which parses
either acme-vars or provider-name options and escape their strings.

Example:

     $ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock -  | cat -e
     <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
     acme-vars "var1=foobar\"toto\",var2=var2"$
     provider-name "godaddy"$
     {$
       "identifier": {$
         "type": "dns",$
         "value": "example.com"$
       },$
       "status": "pending",$
       "expires": "2025-09-25T14:41:57Z",$
       [...]
2025-09-26 10:23:35 +02:00
William Lallemand
92c31a6fb7 MINOR: acme: acme-vars allow to pass data to the dpapi sink
In the case of the dns-01 challenge, the agent that handles the
challenge might need some extra information which depends on the DNS
provider.

This patch introduces the "acme-vars" option in the acme section, which
allows to pass these data to the dpapi sink. The double quotes will be
escaped when printed in the sink.

Example:

    global
        setenv VAR1 'foobar"toto"'

    acme LE
        directory https://acme-staging-v02.api.letsencrypt.org/directory
        challenge DNS-01
        acme-vars "var1=${VAR1},var2=var2"

Would output:

    $ ( echo "@@1 show events dpapi -w -0"; cat - ) | socat /tmp/master.sock -  | cat -e
    <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$
    acme-vars "var1=foobar\"toto\",var2=var2"$
    {$
      "identifier": {$
        "type": "dns",$
        "value": "example.com"$
      },$
      "status": "pending",$
      "expires": "2025-09-25T14:41:57Z",$
      [...]
2025-09-19 16:40:53 +02:00
Aurelien DARRAGON
5c299dee5a MEDIUM: stats: consider that shared stats pointers may be NULL
This patch looks huge, but it has a very simple goal: protect all
accessed to shared stats pointers (either read or writes), because
we know consider that these pointers may be NULL.

The reason behind this is despite all precautions taken to ensure the
pointers shouldn't be NULL when not expected, there are still corner
cases (ie: frontends stats used on a backend which no FE cap and vice
versa) where we could try to access a memory area which is not
allocated. Willy stumbled on such cases while playing with the rings
servers upon connection error, which eventually led to process crashes
(since 3.3 when shared stats were implemented)

Also, we may decide later that shared stats are optional and should
be disabled on the proxy to save memory and CPU, and this patch is
a step further towards that goal.

So in essence, this patch ensures shared stats pointers are always
initialized (including NULL), and adds necessary guards before shared
stats pointers are de-referenced. Since we already had some checks
for backends and listeners stats, and the pointer address retrieval
should stay in cpu cache, let's hope that this patch doesn't impact
stats performance much.
2025-09-18 16:49:51 +02:00
Willy Tarreau
08c6bbb542 OPTIM: sink: don't waste time calling sink_announce_dropped() if busy
If we see that another thread is already busy trying to announce the
dropped counter, there's no point going there, so let's just skip all
that operation from sink_write() and avoid disturbing the other thread.
This results in a boost from 244 to 262k req/s.
2025-09-18 09:07:35 +02:00
Willy Tarreau
361c227465 MINOR: trace: don't call strlen() on the function's name
Currently there's a small mistake in the way the trace function and
macros. The calling function name is known as a constant until the
macro and passed as-is to the __trace() function. That one needs to
know its length and will call ist() on it, resulting in a real call
to strlen() while that length was known before the call. Let's use
an ist instead of a const char* for __trace() and __trace_enabled()
so that we can now completely avoid calling strlen() during this
operation. This has significantly reduced the importance of
__trace_enabled() in perf top.
2025-09-18 08:31:57 +02:00
Willy Tarreau
8c077c17eb MINOR: server: add the "cc" keyword to set the TCP congestion controller
It is possible on at least Linux and FreeBSD to set the congestion control
algorithm to be used with outgoing connections, among the list of supported
and permitted ones. Let's expose this setting with "cc". Unknown or
forbidden algorithms will be ignored and the default one will continue to
be used.
2025-09-17 17:19:33 +02:00
Willy Tarreau
4ed3cf295d MINOR: listener: add the "cc" bind keyword to set the TCP congestion controller
It is possible on at least Linux and FreeBSD to set the congestion control
algorithm to be used with incoming connections, among the list of supported
and permitted ones. Let's expose this setting with "cc". Permission issues
might be reported (as warnings).
2025-09-17 17:03:42 +02:00
Ben Kallus
31d0695a6a IMPORT: ebtree: replace hand-rolled offsetof to avoid UB
The C standard specifies that it's undefined behavior to dereference
NULL (even if you use & right after). The hand-rolled offsetof idiom
&(((s*)NULL)->f) is thus technically undefined. This clutters the
output of UBSan and is simple to fix: just use the real offsetof when
it's available.

Note that there's no clear statement about this point in the spec,
only several points which together converge to this:

- From N3220, 6.5.3.4:
  A postfix expression followed by the -> operator and an identifier
  designates a member of a structure or union object. The value is
  that of the named member of the object to which the first expression
  points, and is an lvalue.

- From N3220, 6.3.2.1:
  An lvalue is an expression (with an object type other than void) that
  potentially designates an object; if an lvalue does not designate an
  object when it is evaluated, the behavior is undefined.

- From N3220, 6.5.4.4 p3:
  The unary & operator yields the address of its operand. If the
  operand has type "type", the result has type "pointer to type". If
  the operand is the result of a unary * operator, neither that operator
  nor the & operator is evaluated and the result is as if both were
  omitted, except that the constraints on the operators still apply and
  the result is not an lvalue. Similarly, if the operand is the result
  of a [] operator, neither the & operator nor the unary * that is
  implied by the [] is evaluated and the result is as if the & operator
  were removed and the [] operator were changed to a + operator.

=> In short, this is saying that C guarantees these identities:
    1. &(*p) is equivalent to p
    2. &(p[n]) is equivalent to p + n

As a consequence, &(*p) doesn't result in the evaluation of *p, only
the evaluation of p (and similar for []). There is no corresponding
special carve-out for ->.

See also: https://pvs-studio.com/en/blog/posts/cpp/0306/

After this patch, HAProxy can run without crashing after building w/
clang-19 -fsanitize=undefined -fno-sanitize=function,alignment

This is ebtree commit bd499015d908596f70277ddacef8e6fa998c01d5.
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is ebtree commit 5211c2f71d78bf546f5d01c8d3c1484e868fac13.
2025-09-17 14:30:32 +02:00
Willy Tarreau
a31da78685 IMPORT: ebtree: add a definition of offsetof()
We'll use this to improve the definition of container_of(). Let's define
it if it does not exist. We can rely on __builtin_offsetof() on recent
enough compilers.

This is ebtree commit 1ea273e60832b98f552b9dbd013e6c2b32113aa5.
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is ebtree commit 69b2ef57a8ce321e8de84486182012c954380401.
2025-09-17 14:30:32 +02:00
Ben Kallus
ddbff4e235 IMPORT: ebtree: Fix UB from clz(0)
From 'man gcc': passing 0 as the argument to "__builtin_ctz" or
"__builtin_clz" invokes undefined behavior. This triggers UBsan
in HAProxy.

[wt: tested in treebench and verified not to cause any performance
 regression with opstime-u32 nor stress-u32]
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is ebtree commit 8c29daf9fa6e34de8c7684bb7713e93dcfe09029.
Signed-off-by: Willy Tarreau <w@1wt.eu>
This is ebtree commit cf3b93736cb550038325e1d99861358d65f70e9a.
2025-09-17 14:30:32 +02:00
Willy Tarreau
52c6dd773d IMPORT: ebst: use prefetching in lookup() and insert()
While the previous optimizations couldn't be preserved due to the
possibility of out-of-bounds accesses, at least the prefetch is useful.
A test on treebench shows that for 64k short strings, the lookup time
falls from 276 to 199ns per lookup (28% savings), and the insert falls
from 311 to 296ns (4.9% savings), which are pretty respectable, so
let's do this.

This is ebtree commit b44ea5d07dc1594d62c3a902783ed1fb133f568d.
2025-09-17 14:30:32 +02:00
Willy Tarreau
fef4cfbd21 IMPORT: ebtree: only use __builtin_prefetch() when supported
It looks like __builtin_prefetch() appeared in gcc-3.1 as there's no
mention of it in 3.0's doc. Let's replace it with eb_prefetch() which
maps to __builtin_prefetch() on supported compilers and falls back to
the usual do{}while(0) on other ones. It was tested to properly build
with tcc as well as gcc-2.95.

This is ebtree commit 7ee6ede56a57a046cb552ed31302b93ff1a21b1a.
2025-09-17 14:30:32 +02:00
Willy Tarreau
3dda813d54 IMPORT: eb32/64: optimize insert for modern CPUs
Similar to previous patches, let's improve the insert() descent loop to
avoid discovering mandatory data too late. The change here is even
simpler than previous ones, a prefetch was installed and troot is
calculated before last instruction in a speculative way. This was enough
to gain +50% insertion rate on random data.

This is ebtree commit e893f8cc4d44b10f406b9d1d78bd4a9bd9183ccf.
2025-09-17 14:30:32 +02:00
Willy Tarreau
61654c07bd IMPORT: ebmb: optimize the lookup for modern CPUs
This is the same principles as for the latest improvements made on
integer trees. Applying the same recipes made the ebmb_lookup()
function jump from 10.07 to 12.25 million lookups per second on a
10k random values tree (+21.6%).

It's likely that the ebmb_lookup_longest() code could also benefit
from this, though this was neither explored nor tested.

This is ebtree commit a159731fd6b91648a2fef3b953feeb830438c924.
2025-09-17 14:30:32 +02:00
Willy Tarreau
6c54bf7295 IMPORT: eb32/eb64: place an unlikely() on the leaf test
In the loop we can help the compiler build slightly more efficient code
by placing an unlikely() around the leaf test. This shows a consistent
0.5% performance gain both on eb32 and eb64.

This is ebtree commit 6c9cdbda496837bac1e0738c14e42faa0d1b92c4.
2025-09-17 14:30:32 +02:00
Willy Tarreau
384907f4e7 IMPORT: eb32: drop the now useless node_bit variable
This one was previously used to preload from the node and keep a copy
in a register on i386 machines with few registers. With the new more
optimal code it's totally useless, so let's get rid of it. By the way
the 64 bit code didn't use that at all already.

This is ebtree commit 1e219a74cfa09e785baf3637b6d55993d88b47ef.
2025-09-17 14:30:31 +02:00
Willy Tarreau
c9e4adf608 IMPORT: eb32/eb64: use a more parallelizable check for lack of common bits
Instead of shifting the XOR value right and comparing it to 1, which
roughly requires 2 sequential instructions, better test if the XOR has
any bit above the current bit, which means any bit set among those
strictly higher, or in other words that XOR & (-bit << 1) is non-zero.
This is one less instruction in the fast path and gives another nice
performance gain on random keys (in million lookups/s):

    eb32   1k:  33.17 -> 37.30   +12.5%
          10k:  15.74 -> 17.08   +8.51%
         100k:   8.00 ->  9.00   +12.5%
    eb64   1k:  34.40 -> 38.10   +10.8%
          10k:  16.17 -> 17.10   +5.75%
         100k:   8.38 ->  8.87   +5.85%

This is ebtree commit c942a2771758eed4f4584fe23cf2914573817a6b.
2025-09-17 14:30:31 +02:00
Willy Tarreau
6af17d491f IMPORT: eb32/eb64: reorder the lookup loop for modern CPUs
The current code calculates the next troot based on a calculation.
This was efficient when the algorithm was developed many years ago
on K6 and K7 CPUs running at low frequencies with few registers and
limited branch prediction units but nowadays with ultra-deep pipelines
and high latency memory that's no longer efficient, because the CPU
needs to have completed multiple operations before knowing which
address to start fetching from. It's sad because we only have two
branches each time but the CPU cannot know it. In addition, the
calculation is performed late in the loop, which does not help the
address generation unit to start prefetching next data.

Instead we should help the CPU by preloading data early from the node
and calculing troot as soon as possible. The CPU will be able to
postpone that processing until the dependencies are available and it
really needs to dereference it. In addition we must absolutely avoid
serializing instructions such as "(a >> b) & 1" because there's no
way for the compiler to parallelize that code nor for the CPU to pre-
process some early data.

What this patch does is relatively simple:

  - we try to prefetch the next two branches as soon as the
    node is known, which will help dereference the selected node in
    the next iteration; it was shown that it only works with the next
    changes though, otherwise it can reduce the performance instead.
    In practice the prefetching will start a bit later once the node
    is really in the cache, but since there's no dependency between
    these instructions and any other one, we let the CPU optimize as
    it wants.

  - we preload all important data from the node (next two branches,
    key and node.bit) very early even if not immediately needed.
    This is cheap, it doesn't cause any pipeline stall and speeds
    up later operations.

  - we pre-calculate 1<<bit that we assign into a register, so as
    to avoid serializing instructions when deciding which branch to
    take.

  - we assign the troot based on a ternary operation (or if/else) so
    that the CPU knows upfront the two possible next addresses without
    waiting for the end of a calculation and can prefetch their contents
    every time the branch prediction unit guesses right.

Just doing this provides significant gains at various tree sizes on
random keys (in million lookups per second):

  eb32   1k:  29.07 -> 33.17  +14.1%
        10k:  14.27 -> 15.74  +10.3%
       100k:   6.64 ->  8.00  +20.5%
  eb64   1k:  27.51 -> 34.40  +25.0%
        10k:  13.54 -> 16.17  +19.4%
       100k:   7.53 ->  8.38  +11.3%

The performance is now much closer to the sequential keys. This was
done for all variants ({32,64}{,i,le,ge}).

Another point, the equality test in the loop improves the performance
when looking up random keys (since we don't need to reach the leaf),
but is counter-productive for sequential keys, which can gain ~17%
without that test. However sequential keys are normally not used with
exact lookups, but rather with lookup_ge() that spans a time frame,
and which does not have that test for this precise reason, so in the
end both use cases are served optimally.

It's interesting to note that everything here is solely based on data
dependencies, and that trying to perform *less* operations upfront
always ends up with lower performance (typically the original one).

This is ebtree commit 05a0613e97f51b6665ad5ae2801199ad55991534.
2025-09-17 14:30:31 +02:00
Aurelien DARRAGON
644b6b9925 MINOR: counters: document that tg shared counters are tied to shm-stats-file mapping
Let's explicitly mention that fe_counters_shared_tg and
be_counters_shared_tg structs are embedded in shm_stats_file_object
struct so any change in those structs will result in shm stats file
incompatibility between processes, thus extra precaution must be
taken when making changes to them.

Note that the provisionning made in shm_stats_file_object struct could
be used to add members to {fe,be}_counters_shared_tg without changing
shm_stats_file_object struct size if needed in order to preserve
shm stats file version.
2025-09-17 11:31:29 +02:00
Willy Tarreau
4edff4a2cc CLEANUP: vars: use the item API for the variables trees
The variables trees use the immediate cebtree API, better use the
item one which is more expressive and safer. The "node" field was
renamed to "name_node" to avoid any ambiguity.
2025-09-16 10:51:23 +02:00
Willy Tarreau
2d6b5c7a60 MEDIUM: connection: reintegrate conn_hash_node into connection
Previously the conn_hash_node was placed outside the connection due
to the big size of the eb64_node that could have negatively impacted
frontend connections. But having it outside also means that one
extra allocation is needed for each backend connection, and that one
memory indirection is needed for each lookup.

With the compact trees, the tree node is smaller (16 bytes vs 40) so
the overhead is much lower. By integrating it into the connection,
We're also eliminating one pointer from the connection to the hash
node and one pointer from the hash node to the connection (in addition
to the extra object bookkeeping). This results in saving at least 24
bytes per total backend connection, and only inflates connections by
16 bytes (from 240 to 256), which is a reasonable compromise.

Tests on a 64-core EPYC show a 2.4% increase in the request rate
(from 2.08 to 2.13 Mrps).
2025-09-16 09:23:46 +02:00
Willy Tarreau
ceaf8c1220 MEDIUM: connection: move idle connection trees to ceb64
Idle connection trees currently require a 56-byte conn_hash_node per
connection, which can be reduced to 32 bytes by moving to ceb64. While
ceb64 is theoretically slower, in practice here we're essentially
dealing with trees that almost always contain a single key and many
duplicates. In this case, ceb64 insert and lookup functions become
faster than eb64 ones because all duplicates are a list accessed in
O(1) while it's a subtree for eb64. In tests it is impossible to tell
the difference between the two, so it's worth reducing the memory
usage.

This commit brings the following memory savings to conn_hash_node
(one per backend connection), and to srv_per_thread (one per thread
and per server):

     struct       before  after  delta
  conn_hash_nodea   56     32     -24
  srv_per_thread    96     72     -24

The delicate part is conn_delete_from_tree(), because we need to
know the tree root the connection is attached to. But thanks to
recent cleanups, it's now clear enough (i.e. idle/safe/avail vs
session are easy to distinguish).
2025-09-16 09:23:46 +02:00
Willy Tarreau
95b8adff67 MINOR: connection: pass the thread number to conn_delete_from_tree()
We'll soon need to choose the server's root based on the connection's
flags, and for this we'll need the thread it's attached to, which is
not always the current one. This patch simply passes the thread number
from all callers. They know it because they just set the idle_conns
lock on it prior to calling the function.
2025-09-16 09:23:46 +02:00
Willy Tarreau
7773d87ea6 CLEANUP: proxy: slightly reorganize fields to plug some holes
The proxy struct has several small holes that deserved being plugged by
moving a few fields around. Now we're down to 3056 from 3072 previously,
and the remaining holes are small.

At the moment, compared to before this series, we're seeing these
sizes:

    type\size   7d554ca62   current  delta
    listener       752        704     -48  (-6.4%)
    server        4032       3840    -192  (-4.8%)
    proxy         3184       3056    -128  (-4%)
    stktable      3392       3328     -64  (-1.9%)

Configs with many servers have shrunk by about 4% in RAM and configs
with many proxies by about 3%.
2025-09-16 09:23:46 +02:00
Willy Tarreau
8df81b6fcc CLEANUP: server: slightly reorder fields in the struct to plug holes
The struct server still has a lot of holes and padding that make it
quite big. By moving a few fields aronud between areas which do not
interact (e.g. boot vs aligned areas), it's quite easy to plug some
of them and/or to arrange larger ones which could be reused later with
a bit more effort. Here we've reduced holes by 40 bytes, allowing the
struct to shrink by one more cache line (64 bytes). The new size is
3840 bytes.
2025-09-16 09:23:46 +02:00
Willy Tarreau
d18d972b1f MEDIUM: server: index server ID using compact trees
The server ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct server for this node, plus 8 bytes in
struct proxy. The server struct is still 3904 bytes large (due to
alignment) and the proxy struct is 3072.
2025-09-16 09:23:46 +02:00
Willy Tarreau
66191584d1 MEDIUM: listener: index listener ID using compact trees
The listener ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct listener for this node, plus 8 bytes
in struct proxy. The struct listener is now 704 bytes large, and the
struct proxy 3080.
2025-09-16 09:23:46 +02:00
Willy Tarreau
1a95bc42c7 MEDIUM: proxy: index proxy ID using compact trees
The proxy ID is currently stored as a 32-bit int using an eb32 tree.
It's used essentially to find holes in order to automatically assign IDs,
and to detect duplicates. Let's change this to use compact trees instead
in order to save 24 bytes in struct proxy for this node, plus 8 bytes in
the root (which is static so not much relevant here). Now the proxy is
3088 bytes large.
2025-09-16 09:23:46 +02:00
Willy Tarreau
eab5b89dce MINOR: proxy: add proxy_index_id() to index a proxy by its ID
This avoids needlessly exposing the tree's root and the mechanics outside
of the low-level code.
2025-09-16 09:23:46 +02:00
Willy Tarreau
5e4b6714e1 MINOR: listener: add listener_index_id() to index a listener by its ID
This avoids needlessly exposing the tree's root and the mechanics outside
of the low-level code.
2025-09-16 09:23:46 +02:00
Willy Tarreau
5a5cec4d7a MINOR: server: add server_index_id() to index a server by its ID
This avoids needlessly exposing the tree's root and the mechanics outside
of the low-level code.
2025-09-16 09:23:46 +02:00
Willy Tarreau
0b0aefe19b MINOR: server: add server_get_next_id() to find next free server ID
This was previously achieved via the generic get_next_id() but we'll soon
get rid of generic ID trees so let's have a dedicated server_get_next_id().
As a bonus it reduces the exposure of the tree's root outside of the functions.
2025-09-16 09:23:46 +02:00
Willy Tarreau
23605eddb1 MINOR: listener: add listener_get_next_id() to find next free listener ID
This was previously achieved via the generic get_next_id() but we'll soon
get rid of generic ID trees so let's have a dedicated listener_get_next_id().
As a bonus it reduces the exposure of the tree's root outside of the functions.
2025-09-16 09:23:46 +02:00
Willy Tarreau
b2402d67b7 MINOR: proxy: add proxy_get_next_id() to find next free proxy ID
This was previously achieved via the generic get_next_id() but we'll soon
get rid of generic ID trees so let's have a dedicated proxy_get_next_id().
2025-09-16 09:23:46 +02:00
Willy Tarreau
f4059ea42f MEDIUM: stktable: index table names using compact trees
Here we're saving 64 bytes per stick-table, from 3392 to 3328, and the
change was really straightforward so there's no reason not to do it.
2025-09-16 09:23:46 +02:00
Willy Tarreau
d0d60a007d MEDIUM: proxy: switch conf.name to cebis_tree
This is used to index the proxy's name and it contains a copy of the
pointer to the proxy's name in <id>. Changing that for a ceb_node placed
just before <id> saves 32 bytes to the struct proxy, which is now 3112
bytes large.

Here we need to continue to support duplicates since they're still
allowed between type-incompatible proxies.

Interestingly, the use of cebis_next_dup() instead of cebis_next() in
proxy_find_by_name() allows us to get rid of an strcmp() that was
performed for each use_backend rule. A test with a large config
(100k backends) shows that we can get 3% extra performance on a
config involving a static use_backend rule (3.09M to 3.18M rps),
and even 4.5% on a dynamic rule selecting a random backend (2.47M
to 2.59M).
2025-09-16 09:23:46 +02:00
Willy Tarreau
fdf6fd5b45 MEDIUM: server: switch the host_dn member to cebis_tree
This member is used to index the hostname_dn contents for DNS resolution.
Let's replace it with a cebis_tree to save another 32 bytes (24 for the
node + 8 by avoiding the duplication of the pointer). The struct server is
now at 3904 bytes.
2025-09-16 09:23:46 +02:00
Willy Tarreau
413e903a22 MEDIUM: server: switch conf.name to cebis_tree
This is used to index the server name and it contains a copy of the
pointer to the server's name in <id>. Changing that for a ceb_node placed
just before <id> saves 32 bytes to the struct server, which remains 3968
bytes large due to alignment. The proxy struct shrinks by 8 bytes to 3144.

It's worth noting that the current way duplicate names are handled remains
based on the previous mechanism where dups were permitted. Ideally we
should now reject them during insertion and use unique key trees instead.
2025-09-16 09:23:46 +02:00
Willy Tarreau
0e99f64fc6 MEDIUM: server: switch addr_node to cebis_tree
This contains the text representation of the server's address, for use
with stick-tables with "srvkey addr". Switching them to a compact node
saves 24 more bytes from this structure. The key was moved to an external
pointer "addr_key" right after the node.

The server struct is now 3968 bytes (down from 4032) due to alignment, and
the proxy struct shrinks by 8 bytes to 3152.
2025-09-16 09:23:46 +02:00
Willy Tarreau
91258fb9d8 MEDIUM: guid: switch guid to more compact cebuis_tree
The current guid struct size is 56 bytes. Once reduced using compact
trees, it goes down to 32 (almost half). We're not on a critical path
and size matters here, so better switch to this.

It's worth noting that the name part could also be stored in the
guid_node at the end to save 8 extra byte (no pointer needed anymore),
however the purpose of this struct is to be embedded into other ones,
which is not compatible with having a dynamic size.

Affected struct sizes in bytes:

           Before     After   Diff
  server    4032       4032     0*
  proxy     3184       3160    -24
  listener   752        728    -24

*: struct server is full of holes and padding (176 bytes) and is
64-byte aligned. Moving the guid_node elsewhere such as after sess_conn
reduces it to 3968, or one less cache line. There's no point in moving
anything now because forthcoming patches will arrange other parts.
2025-09-16 09:23:46 +02:00
Willy Tarreau
e36b3b60b3 MEDIUM: migrate the patterns reference to cebs_tree
cebs_tree are 24 bytes smaller than ebst_tree (16B vs 40B), and pattern
references are only used during map/acl updates, so their storage is
pure loss between updates (which most of the time never happen). By
switching their indexing to compact trees, we can save 16 to 24 bytes
per entry depending on alightment (here it's 24 per struct but 16
practical as malloc's alignment keeps 8 unused).

Tested on core i7-8650U running at 3.0 GHz, with a file containing
17.7M IP addresses (16.7M different):

   $ time  ./haproxy -c -f acl-ip.cfg

Save 280 MB RAM for 17.7M IP addresses, and slightly speeds up the
startup (5.8%, from 19.2s to 18.2s), a part of which possible being
attributed to having to write less memory. Note that this is on small
strings. On larger ones such as user-agents, ebtree doesn't reread
the whole key and might be more efficient.

Before:
  RAM (VSZ/RSS): 4443912 3912444

  real    0m19.211s
  user    0m18.138s
  sys     0m1.068s

  Overhead  Command         Shared Object      Symbol
    44.79%  haproxy  haproxy            [.] ebst_insert
    25.07%  haproxy  haproxy            [.] ebmb_insert_prefix
     3.44%  haproxy  libc-2.33.so       [.] __libc_calloc
     2.71%  haproxy  libc-2.33.so       [.] _int_malloc
     2.33%  haproxy  haproxy            [.] free_pattern_tree
     1.78%  haproxy  libc-2.33.so       [.] inet_pton4
     1.62%  haproxy  libc-2.33.so       [.] _IO_fgets
     1.58%  haproxy  libc-2.33.so       [.] _int_free
     1.56%  haproxy  haproxy            [.] pat_ref_push
     1.35%  haproxy  libc-2.33.so       [.] malloc_consolidate
     1.16%  haproxy  libc-2.33.so       [.] __strlen_avx2
     0.79%  haproxy  haproxy            [.] pat_idx_tree_ip
     0.76%  haproxy  haproxy            [.] pat_ref_read_from_file
     0.60%  haproxy  libc-2.33.so       [.] __strrchr_avx2
     0.55%  haproxy  libc-2.33.so       [.] unlink_chunk.constprop.0
     0.54%  haproxy  libc-2.33.so       [.] __memchr_avx2
     0.46%  haproxy  haproxy            [.] pat_ref_append

After:
  RAM (VSZ/RSS): 4166108 3634768

  real    0m18.114s
  user    0m17.113s
  sys     0m0.996s

  Overhead  Command  Shared Object       Symbol
    38.99%  haproxy  haproxy             [.] cebs_insert
    27.09%  haproxy  haproxy             [.] ebmb_insert_prefix
     3.63%  haproxy  libc-2.33.so        [.] __libc_calloc
     3.18%  haproxy  libc-2.33.so        [.] _int_malloc
     2.69%  haproxy  haproxy             [.] free_pattern_tree
     1.99%  haproxy  libc-2.33.so        [.] inet_pton4
     1.74%  haproxy  libc-2.33.so        [.] _IO_fgets
     1.73%  haproxy  libc-2.33.so        [.] _int_free
     1.57%  haproxy  haproxy             [.] pat_ref_push
     1.48%  haproxy  libc-2.33.so        [.] malloc_consolidate
     1.22%  haproxy  libc-2.33.so        [.] __strlen_avx2
     1.05%  haproxy  libc-2.33.so        [.] __strcmp_avx2
     0.80%  haproxy  haproxy             [.] pat_idx_tree_ip
     0.74%  haproxy  libc-2.33.so        [.] __memchr_avx2
     0.69%  haproxy  libc-2.33.so        [.] __strrchr_avx2
     0.69%  haproxy  libc-2.33.so        [.] _IO_getline_info
     0.62%  haproxy  haproxy             [.] pat_ref_read_from_file
     0.56%  haproxy  libc-2.33.so        [.] unlink_chunk.constprop.0
     0.56%  haproxy  libc-2.33.so        [.] cfree@GLIBC_2.2.5
     0.46%  haproxy  haproxy             [.] pat_ref_append

If the addresses are totally disordered (via "shuf" on the input file),
we see both implementations reach exactly 68.0s (slower due to much
higher cache miss ratio).

On large strings such as user agents (1 million here), it's now slightly
slower (+9%):

Before:
  real    0m2.475s
  user    0m2.316s
  sys     0m0.155s

After:
  real    0m2.696s
  user    0m2.544s
  sys     0m0.147s

But such patterns are much less common than short ones, and the memory
savings do still count.

Note that while it could be tempting to get rid of the list that chains
all these pat_ref_elt together and only enumerate them by walking along
the tree to save 16 extra bytes per entry, that's not possible due to
the problem that insertion ordering is critical (think overlapping regex
such as /index.* and /index.html). Currently it's not possible to proceed
differently because patterns are first pre-loaded into the pat_ref via
pat_ref_read_from_file_smp() and later indexed by pattern_read_from_file(),
which has to only redo the second part anyway for maps/acls declared
multiple times.
2025-09-16 09:23:46 +02:00
Willy Tarreau
ddf900a0ce IMPORT: cebtree: import version 0.5.0 to support duplicates
The support for duplicates is necessary for various use cases related
to config names, so let's upgrade to the latest version which brings
this support. This updates the cebtree code to commit 808ed67 (tag
0.5.0). A few tiny adaptations were needed:
  - replace a few ceb_node** with ceb_root** since pointers are now
    tagged ;
  - replace cebu*.h with ceb*.h since both are now merged in the same
    include file. This way we can drop the unused cebu*.h files from
    cebtree that are provided only for compatibility.
  - rename immediate storage functions to cebXX_imm_XXX() as per the API
    change in 0.5 that makes immediate explicit rather than implicit.
    This only affects vars and tools.c:copy_file_name().

The tests continue to work.
2025-09-16 09:23:46 +02:00
Remi Tricot-Le Breton
257df69fbd BUG/MINOR: ocsp: Crash when updating CA during ocsp updates
If an ocsp response is set to be updated automatically and some
certificate or CA updates are performed on the CLI, if the CLI update
happens while the OCSP response is being updated and is then detached
from the udapte tree, it might be wrongly inserted into the update tree
in 'ssl_sock_load_ocsp', and then reinserted when the update finishes.

The update tree then gets corrupted and we could end up crashing when
accessing other nodes in the ocsp response update tree.

This patch must be backported up to 2.8.
This patch fixes GitHub #3100.
2025-09-15 15:34:36 +02:00
Aurelien DARRAGON
6a92b14cc1 MEDIUM: log/proxy: store log-steps selection using a bitmask, not an eb tree
An eb tree was used to anticipate for infinite amount of custom log steps
configured at a proxy level. In turns out this makes no sense to configure
that much logging steps for a proxy, and the cost of the eb tree is non
negligible in terms of memory footprint, especially when used in a default
section.

Instead, let's use a simple bitmask, which allows up to 64 logging steps
configured at proxy level. If we lack space some day (and need more than
64 logging steps to be configured), we could simply modify
"struct log_steps" to spread the bitmask over multiple 64bits integers,
minor some adjustments where the mask is set and checked.
2025-09-15 10:29:02 +02:00
Christopher Faulet
b582fd41c2 Revert "BUG/MINOR: ocsp: Crash when updating CA during ocsp updates"
This reverts commit 167ea8fc7b.

The patch was backported by mistake.
2025-09-15 10:16:20 +02:00
Remi Tricot-Le Breton
167ea8fc7b BUG/MINOR: ocsp: Crash when updating CA during ocsp updates
If an ocsp response is set to be updated automatically and some
certificate or CA updates are performed on the CLI, if the CLI update
happens while the OCSP response is being updated and is then detached
from the udapte tree, it might be wrongly inserted into the update tree
in 'ssl_sock_load_ocsp', and then reinserted when the update finishes.

The update tree then gets corrupted and we could end up crashing when
accessing other nodes in the ocsp response update tree.

This patch must be backported up to 2.8.
This patch fixes GitHub #3100.
2025-09-15 08:20:16 +02:00
Willy Tarreau
8fb5ae5cc6 MINOR: activity/memory: count allocations performed under a lock
By checking the current thread's locking status, it becomes possible
to know during a memory allocation whether it's performed under a lock
or not. Both pools and memprofile functions were instrumented to check
for this and to increment the memprofile bin's locked_calls counter.

This one, when not zero, is reported on "show profiling memory" with a
percentage of all allocations that such locked allocations represent.
This way it becomes possible to try to target certain code paths that
are particularly expensive. Example:

  $ socat - /tmp/sock1 <<< "show profiling memory"|grep lock
     20297301           0     2598054528              0|   0x62a820fa3991 sockaddr_alloc+0x61/0xa3 p_alloc(128) [pool=sockaddr] [locked=54962 (0.2 %)]
            0    20297301              0     2598054528|   0x62a820fa3a24 sockaddr_free+0x44/0x59 p_free(-128) [pool=sockaddr] [locked=34300 (0.1 %)]
      9908432           0     1268279296              0|   0x62a820eb8524 main+0x81974 p_alloc(128) [pool=task] [locked=9908432 (100.0 %)]
      9908432           0      554872192              0|   0x62a820eb85a6 main+0x819f6 p_alloc(56) [pool=tasklet] [locked=9908432 (100.0 %)]
       263001           0       63120240              0|   0x62a820fa3c97 conn_new+0x37/0x1b2 p_alloc(240) [pool=connection] [locked=20662 (7.8 %)]
        71643           0       47307584              0|   0x62a82105204d pool_get_from_os_noinc+0x12d/0x161 posix_memalign(660) [locked=5393 (7.5 %)]
2025-09-11 16:32:34 +02:00
Willy Tarreau
9d8c2a888b MINOR: activity: collect CPU time spent on memory allocations for each task
When task profiling is enabled, the pool alloc/free code will measure the
time it takes to perform memory allocation after a cache miss or memory
freeing to the shared cache or OS. The time taken with the thread-local
cache is never measured as measuring that time is very expensive compared
to the pool access time. Here doing so costs around 2% performance at 2M
req/s, only when task profiling is enabled, so this remains reasonable.
The scheduler takes care of collecting that time and updating the
sched_activity entry corresponding to the current task when task profiling
is enabled.

The goal clearly is to track places that are wasting CPU time allocating
and releasing too often, or causing large evictions. This appears like
this in "show profiling tasks aggr":

  Tasks activity over 11.428 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   mem_avg   lat_avg
    process_stream             44183891   16.47m    22.36us   491.0ns   1.154us   1.000ns   101.1us
    h1_io_cb                   57386064   4.011m    4.193us   20.00ns   16.00ns      -      29.47us
    sc_conn_io_cb              42088024   49.04s    1.165us      -         -         -      54.67us
    h1_timeout_task              438171   196.5ms   448.0ns      -         -         -      100.1us
    srv_cleanup_toremove_conns       65   1.468ms   22.58us   184.0ns   87.00ns      -      101.3us
    task_process_applet               3   508.0us   169.3us      -      107.0us   1.847us   29.67us
    srv_cleanup_idle_conns            6   225.3us   37.55us   15.74us   36.84us      -      49.47us
    accept_queue_process              2   45.62us   22.81us      -         -      4.949us   54.33us
2025-09-11 16:32:34 +02:00
Willy Tarreau
195794eb59 MINOR: activity: add a new mem_avg column to show profiling stats
This new column will be used for reporting the average time spent
allocating or freeing memory in a task when task profiling is enabled.
For now it is not updated.
2025-09-11 16:32:34 +02:00
Willy Tarreau
98cc815e3e MINOR: activity: collect time spent with a lock held for each task
When DEBUG_THREAD > 0 and task profiling enabled, we'll now measure the
time spent with at least one lock held for each task. The time is
collected by locking operations when locks are taken raising the level
to one, or released resetting the level. An accumulator is updated in
the thread_ctx struct that is collected by the scheduler when the task
returns, and updated in the sched_activity entry of the related task.

This allows to observe figures like this one:

  Tasks activity over 259.516 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   lat_avg
    h1_io_cb                   15466589   2.574m    9.984us      -         -      33.45us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup
    sc_conn_io_cb               8047994   8.325s    1.034us      -         -      870.1us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup
    process_stream              7734689   4.356m    33.79us   1.990us   1.641us   1.554ms <- sc_notify@src/stconn.c:1206 task_wakeup
    process_stream              7734292   46.74m    362.6us   278.3us   132.2us   972.0us <- stream_new@src/stream.c:585 task_wakeup
    sc_conn_io_cb               7733158   46.88s    6.061us      -         -      68.78us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup
    task_process_applet         6603593   4.484m    40.74us   16.69us   34.00us   96.47us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup
    task_process_applet         4761796   3.420m    43.09us   18.79us   39.28us   138.2us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup
    process_table_expire        4710662   4.880m    62.16us   9.648us   53.95us   158.6us <- run_tasks_from_lists@src/task.c:671 task_queue
    stktable_add_pend_updates   4171868   6.786s    1.626us      -      1.487us   47.94us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup
    h1_io_cb                    2871683   1.198s    417.0ns   70.00ns   69.00ns   1.005ms <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup
    process_peer_sync           2304957   5.368s    2.328us      -      1.156us   68.54us <- stktable_add_pend_updates@src/stick_table.c:873 task_wakeup
    process_peer_sync           1388141   3.174s    2.286us      -      1.130us   52.31us <- run_tasks_from_lists@src/task.c:671 task_queue
    stktable_add_pend_updates    463488   3.530s    7.615us   2.000ns   7.134us   771.2us <- stktable_touch_with_exp@src/stick_table.c:654 tasklet_wakeup

Here we see that almost the entirety of stktable_add_pend_updates() is
spent under a lock, that 1/3 of the execution time of process_stream()
was performed under a lock and that 2/3 of it was spent waiting for a
lock (this is related to the 10 track-sc present in this config), and
that the locking time in process_peer_sync() has now significantly
reduced. This is more visible with "show profiling tasks aggr":

  Tasks activity over 475.354 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lkd_avg   lat_avg
    h1_io_cb                   25742539   3.699m    8.622us   11.00ns   10.00ns   188.0us
    sc_conn_io_cb              22565666   1.475m    3.920us      -         -      473.9us
    process_stream             21665212   1.195h    198.6us   140.6us   67.08us   1.266ms
    task_process_applet        16352495   11.31m    41.51us   17.98us   36.55us   112.3us
    process_peer_sync           7831923   17.15s    2.189us      -      1.107us   41.27us
    process_table_expire        6878569   6.866m    59.89us   9.359us   51.91us   151.8us
    stktable_add_pend_updates   6602502   14.77s    2.236us      -      2.060us   119.8us
    h1_timeout_task                 801   703.4us   878.0ns      -         -      185.7us
    srv_cleanup_toremove_conns      347   12.43ms   35.82us   240.0ns   70.00ns   1.924ms
    accept_queue_process            142   1.384ms   9.743us      -         -      340.6us
    srv_cleanup_idle_conns           74   475.0us   6.418us   896.0ns   5.667us   114.6us
2025-09-11 16:32:34 +02:00
Willy Tarreau
95433f224e MINOR: activity: add a new lkd_avg column to show profiling stats
This new column will be used for reporting the average time spent
in a task with at least one lock held. It will only have a non-zero
value when DEBUG_THREAD > 0. For now it is not updated.
2025-09-11 16:32:34 +02:00
Willy Tarreau
4b23b2ed32 MINOR: thread: add a lock level information in the thread_ctx
The new lock_level field indicates the number of cumulated locks that
are held by the current thread. It's fed as soon as DEBUG_THREAD is at
least 1. In addition, thread_isolate() adds 128, so that it's even
possible to check for combinations of both. The value is also reported
in thread dumps (warnings and panics).
2025-09-11 16:32:34 +02:00
Willy Tarreau
503084643f MINOR: activity: collect time spent waiting on a lock for each task
When DEBUG_THREAD > 0, and if task profiling is enabled, then each
locking attempt will measure the time it takes to obtain the lock, then
add that time to a thread_ctx accumulator that the scheduler will then
retrieve to update the current task's sched_activity entry. The value
will then appear avearaged over the number of calls in the lkw_avg column
of "show profiling tasks", such as below:

  Tasks activity over 48.298 sec till 0.000 sec ago:
    function                      calls   cpu_tot   cpu_avg   lkw_avg   lat_avg
    h1_io_cb                    3200170   26.81s    8.377us      -      32.73us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup
    sc_conn_io_cb               1657841   1.645s    992.0ns      -      853.0us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup
    process_stream              1600450   49.16s    30.71us   1.936us   1.392ms <- sc_notify@src/stconn.c:1206 task_wakeup
    process_stream              1600321   7.770m    291.3us   209.1us   901.6us <- stream_new@src/stream.c:585 task_wakeup
    sc_conn_io_cb               1599928   7.975s    4.984us      -      65.77us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup
    task_process_applet          997609   46.37s    46.48us   16.80us   113.0us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup
    process_table_expire         922074   48.79s    52.92us   7.275us   181.1us <- run_tasks_from_lists@src/task.c:670 task_queue
    stktable_add_pend_updates    705423   1.511s    2.142us      -      56.81us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup
    task_process_applet          683511   34.75s    50.84us   18.37us   153.3us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup
    h1_io_cb                     535395   198.1ms   370.0ns   72.00ns   930.4us <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup

It now makes it pretty obvious which tasks (hence call chains) spend their
time waiting on a lock and for what share of their execution time.
2025-09-11 16:32:34 +02:00
Willy Tarreau
1956c544b5 MINOR: activity: add a new lkw_avg column to show profiling stats
This new column will be used for reporting the average time spent waiting
for a lock. It will only have a non-zero value when DEBUG_THREAD > 0. For
now it is not updated.
2025-09-11 16:32:34 +02:00
Christopher Faulet
3023e98199 BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers
Since the commit dcb696cd3 ("MEDIUM: resolvers: hash the records before
inserting them into the tree"), When several records are found in a DNS
answer, the round robin selection over these records is no longer performed.

Indeed, before a list of records was used. To ensure each records was
selected one after the other, at each selection, the first record of the
list was moved at the end. When this list was replaced bu a tree, the same
mechanism was preserved. However, the record is indexed using its key, a
hash of the record. So its position never changes. When it is removed and
reinserted in the tree, its position remains the same. When we walk though
the tree, starting from the root, the records are always evaluated in the
same order. So, even if there are several records in a DNS answer, the same
IP address is always selected.

It is quite easy to trigger the issue with a do-resolv action.

To fix the issue, the node to perform the next selection is now saved. So
instead of restarting from the root each time, we can restart from the next
node of the previous call.

Thanks to Damien Claisse for the issue analysis and for the reproducer.

This patch should fix the issue #3116. It must be backported as far as 2.6.
2025-09-11 15:46:45 +02:00
William Lallemand
e52e6f66ac BUG/MEDIUM: jws: return size_t in JWS functions
JWS functions are supposed to return 0 upon error or when nothing was
produced. This was done in order to put easily the return value in
trash->data without having to check the return value.

However functions like a2base64url() or snprintf() could return a
negative value, which would be casted in a unsigned int if this happen.

This patch add checks on the JWS functions to ensure that no negative
value can be returned, and change the prototype from int to size_t.

This is also related to issue #3114.

Must be backported to 3.2.
2025-09-11 14:31:32 +02:00
Amaury Denoyelle
d293cc62dc MINOR: quic: display build warning for compat layer on recent OpenSSL
Build option USE_QUIC_OPENSSL_COMPAT=1 must be set to activate QUIC
support for OpenSSL prior to version 3.5.2. This compiles an internal
compatibility layer, which must be then activated at runtime with global
option limited-quic.

Starting from OpenSSL version 3.5.2, a proper QUIC TLS API is now
exposed. Thus, the compatibility layer is unneeded. However it can still
be compiled against newer OpenSSL releases and activated at runtime,
mostly for test purpose.

As this compatibility layer has some limitations, (no support for QUIC
0-RTT), it's important that users notice this situation and disable it
if possible. Thus, this patch adds a notice warning when
USE_QUIC_OPENSSL_COMPAT=1 is set when building against OpenSSL 3.5.2 and
above. This should be sufficient for users and packagers to understand
that this option is not necessary anymore.

Note that USE_QUIC_OPENSSL_COMPAT=1 is incompatible with others TLS
library which exposed a QUIC API based on original BoringSSL patches
set. A build error will prevent the compatibility layer to be built.
limited-quic option is thus silently ignored.
2025-09-11 10:11:12 +02:00
Frederic Lecaille
5027ba36a9 MINOR: quic-be: make SSL/QUIC objects use their own indexes (ssl_qc_app_data_index)
This index is used to retrieve the quic_conn object from its SSL object, the same
way the connection is retrieved from its SSL object for SSL/TCP connections.

This patch implements two helper functions to avoid the ugly code with such blocks:

   #ifdef USE_QUIC
   else if (qc) { .. }
   #endif

Implement ssl_sock_get_listener() to return the listener from an SSL object.
Implement ssl_sock_get_conn() to return the connection from an SSL object
and optionally a pointer to the ssl_sock_ctx struct attached to the connections
or the quic_conns.

Use this functions where applicable:
   - ssl_tlsext_ticket_key_cb() calls ssl_sock_get_listener()
   - ssl_sock_infocbk() calls ssl_sock_get_conn()
   - ssl_sock_msgcbk() calls ssl_sock_get_ssl_conn()
   - ssl_sess_new_srv_cb() calls ssl_sock_get_conn()
   - ssl_sock_srv_verifycbk() calls ssl_sock_get_conn()

Also modify qc_ssl_sess_init() to initialize the ssl_qc_app_data_index index for
the QUIC backends.
2025-09-11 09:51:28 +02:00
Frederic Lecaille
47bb15ca84 MINOR: quic: get rid of ->target quic_conn struct member
The ->li (struct listener *) member of quic_conn struct was replaced by a
->target (struct obj_type *) member by this commit:

    MINOR: quic-be: get rid of ->li quic_conn member

to abstract the connection type (front or back) when implementing QUIC for the
backends. In these cases, ->target was a pointer to the ojb_type of a server
struct. This could not work with the dynamic servers contrary to the listeners
which are not dynamic.

This patch almost reverts the one mentioned above. ->target pointer to obj_type member
is replaced by ->li pointer to listener struct member. As the listener are not
dynamic, this is easy to do this. All one has to do is to replace the
objt_listener(qc->target) statement by qc->li where applicable.

For the backend connection, when needed, this is always qc->conn->target which is
used only when qc->conn is initialized. The only "problematic" case is for
quic_dgram_parse() which takes a pointer to an obj_type as third argument.
But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function
it is used to access the proxy counters of the connection thanks to qc_counters().
So, this obj_type argument may be null for now on with this patch. This is the
reason why qc_counters() is modified to take this into consideration.
2025-09-11 09:51:28 +02:00
Olivier Houchard
ff47ae60f3 MEDIUM: server: Introduce the concept of path parameters
Add a new field in struct server, path parameters. It will contain
connection informations for the server that are not expected to change.
For now, just store the ALPN negociated with the server. Each time an
handhskae is done, we'll update it, even though it is not supposed to
change. This will be useful when trying to send early data, that way
we'll know which mux to use.
Each time the server goes down or is disabled, those informations are
erased, as we can't be sure those parameters will be the same once the
server will be back up.
2025-09-09 19:01:24 +02:00
Olivier Houchard
5ab9954faa MINOR: ssl: Add a flag to let it known we have an ALPN negociated
Add a new flag to the ssl_sock_ctx, to be set as soon as the ALPN has
been negociated.
This happens before the handshake has been completed, and that
information will let us know that, when we receive early data, if the
ALPN has been negociated, then we can immediately create a mux, as the
ALPN will tell us which mux to use.
2025-09-09 19:01:24 +02:00
Willy Tarreau
f87cf8b76e MEDIUM: stick-tables: relax stktable_trash_oldest() to only purge what is needed
stktable_trash_oldest() does insist a lot on purging what was requested,
only limited by STKTABLE_MAX_UPDATES_AT_ONCE. This is called in two
conditions, one to allocate a new stksess, and the other one to purge
entries of a stopping process. The cost of iterating over all shards
is huge, and a shard lock is taken each time before looking up entries.

Moreover, multiple threads can end up doing the same and looking hard for
many entries to purge when only one is needed. Furthermore, all threads
start from the same shard, hence synchronize their locks. All of this
costs a lot to other operations such as access from peers.

This commit simplifies the approach by ignoring the budget, starting
from a random shard number, and using a trylock so as to be able to
give up early in case of contention. The approach chosen here consists
in trying hard to flush at least one entry, but once at least one is
evicted or at least one trylock failed, then a failure on the trylock
will result in finishing.

The function now returns a success as long as one entry was freed.

With this, tests no longer show watchdog warnings during tests, though
a few still remain when stopping the tests (which are not related to
this function but to the contention from process_table_expire()).

With this change, under high contention some entries' purge might be
postponed and the table may occasionally contain slightly more entries
than their size (though this already happens since stksess_new() first
increments ->current before decrementing it).

Measures were made on a 64-core system with 8 peers
of 16 threads each, at CPU saturation (350k req/s each doing 10
track-sc) for 10M req, with 3 different approaches:

  - this one resulted in 1500 failures to find an entry (0.015%
    size overhead), with the lowest contention and the fairest
    peers distibution.

  - leaving only after a success resulted in 229 failures (0.0029%
    size overhead) but doubled the time spent in the function (on
    the write lock precisely).

  - leaving only when both a success and a failed lock were met
    resulted in 31 failures (0.00031% overhead) but the contention
    was high enough again so that peers were not all up to date.

Considering that a saturated machine might exceed its entries by
0.015% is pretty minimal, the mechanism is kept.

This should be backported to 3.2 after a bit more testing as it
resolves some watchdog warnings and panics. It requires precedent
commit "MINOR: stick-table: permit stksess_new() to temporarily
allocate more entries" to over-allocate instead of failing in case
of contention.
2025-09-09 17:56:37 +02:00
Willy Tarreau
c3f94fbd9b DEBUG: stream: count the number of passes in the connect loop
Normally the connect loop cannot loop, but some recent traces can easily
convince one of the opposite. Let's add a counter, including in panic
dumps, in order to avoid the repeated long head scratching sessions
starting with "and what if...". In addition, if it's found to loop, this
time it will be certain and will indicate what to zoom in. This should
be backported to 3.2.
2025-09-09 17:56:14 +02:00
Amaury Denoyelle
0b6908385e BUG/MINOR: quic: properly support GSO on backend side
Previously, GSO emission was explicitely disabled on backend side. This
is not true since the following patch, thus GSO can be used, for example
when transfering large POST requests to a HTTP/3 backend.

  commit e064e5d461
  MINOR: quic: duplicate GSO unsupp status from listener to conn

However, GSO on the backend side may cause crash when handling EIO. In
this case, GSO must be completely disabled. Previously, this was
performed by flagging listener instance. In backend side, this would
cause a crash as listener is NULL.

This patch fixes it by supporting GSO disable flag for servers. Thus, in
qc_send_ppkts(), EIO can be converted either to a listener or server
flag depending on the quic_conn proxy side. On backend side, server
instance is retrieved via <qc.conn.target>. This is enough to guarantee
that server is not deleted.

This does not need to be backported.
2025-09-08 16:18:05 +02:00
Christopher Faulet
e653dc304e MINOR: pools: Don't dump anymore info about pools when purge is forced
Historically, when the purge of pools was forced by sending a SIGQUIT to
haproxy, information about the pools were first dumped. It is now totally
pointless because these info can be retrieved via the CLI. It is even less
relevant now because the purge is forced typically when there are memroy
issues and to dump pools information, data must be allocated.

dump_pools_info() function was simplified because it is now called only from
an applet. No reason to still try to dump info on stderr.
2025-09-08 16:04:40 +02:00
Amaury Denoyelle
f645cd3c74 MINOR: quic: restore QUIC_HP_SAMPLE_LEN constant
The below patch fixes padding emission for small packets, which is
required to ensure that header protection removal can be performed by
the recipient.

  commit d7dea408c6
  BUG/MINOR: quic: too short PADDING frame for too short packets

In addition to the proper fix, constant QUIC_HP_SAMPLE_LEN was removed
and replaced by QUIC_TLS_TAG_LEN. However, it still makes sense to have
a dedicated constant which represent the size of the sample used for
header protection. Thus, this patch restores it.

Special instructions for backport : above patch mentions that no
backport is needed. However, this is incorrect, as bug is introduced by
another patch scheduled for backport up to 2.6. Thus, it is first
mandatory to schedule d7dea408c6 after it.
Then, this patch can also be used for the sake of code clarity.
2025-09-08 14:49:03 +02:00
Frederic Lecaille
6f9fccec1f MINOR: quic: SSL session reuse for QUIC
Mimic the same behavior as the one for SSL/TCP connetion to implement the
SSL session reuse.

Extract the code which try to reuse the SSL session for SSL/TCP connections
to implement ssl_sock_srv_try_reuse_sess().
Call this function from QUIC ->init() xprt callback (qc_conn_init()) as this
done for SSL/TCP connections.
2025-09-08 11:46:26 +02:00
Frederic Lecaille
d7dea408c6 BUG/MINOR: quic: too short PADDING frame for too short packets
This bug arrvived with this commit:

    MINOR: quic: centralize padding for HP sampling on packet building

What was missed is the fact that at the centralization point for the
PADDING frame to add for too short packet, <len> payload length  already includes
<*pn_len> the packet number field length value.

So when computing the length of the PADDING frame, the packet field length must
not be considered and added to the payload length (<len>).

This bug leaded too short PADDING frame to too short packets. This was the case,
most of times with Application level packets with a 1-byte packet number field
followed by a 1-byte PING frame. A 1-byte PADDING frame was added in this case
in place of a correct 2-bytes PADDINF frame. The header packet protection of
such packet could not be removed by the clients as for instance for ngtcp2 with
such traces:

    I00001828 0x5a135c81e803f092c74bac64a85513b657 pkt could not decrypt packet number

As the header protection could no be removed, the header keyupdate bit could also
not be read by packet analyzers such as pyshark used during the keyupdate tests.

No need to backport.
2025-09-05 16:17:11 +02:00
Christopher Faulet
f9a6ae727c OPTIM: tcpcheck: Reorder tcpchek_connect structure fields to fill holes
Thanks to this patch, two 4-bytes holes are now filled in the
tcpchek_connect structure.
2025-09-05 15:56:42 +02:00
Christopher Faulet
ffc1f096e0 MEDIUM: httpcheck/ssl: Base the SNI value on the HTTP host header by default
Similarly to the automic SNI selection for regulat SSL traffic, the SNI of
health-checks HTTPS connection is now automatically set by default by using
the host header value. "check-sni-auto" and "no-check-sni-auto" server
settings were added to change this behavior.

Only implicit HTTPS health-checks can take advantage of this feature. In
this case, the host header value from the "option httpchk" directive is used
to extract the SNI. It is disabled if http-check rules are used. So, the SNI
must still be explicitly specified via a "http-check connect" rule.

This patch with should paritally fix the issue #3081.
2025-09-05 15:56:42 +02:00
Christopher Faulet
668916c1a2 MEDIUM: server/ssl: Base the SNI value to the HTTP host header by default
For HTTPS outgoing connections, the SNI is now automatically set using the
Host header value if no other value is already set (via the "sni" server
keyword). It is now the default behavior. It could be disabled with the
"no-sni-auto" server keyword. And eventually "sni-auto" server keyword may
be used to reset any previous "no-sni-auto" setting. This option can be
inherited from "default-server" settings. Finally, if no connection name is
set via "pool-conn-name" setting, the selected value is used.

The automatic selection of the SNI is enabled by default for all outgoing
connections. But it is concretely used for HTTPS connections only. The
expression used is "req.hdr(host),host_only".

This patch should paritally fix the issue #3081. It only covers the server
part. Another patch will add the feature for HTTP health-checks.
2025-09-05 15:56:42 +02:00
Christopher Faulet
f8f94ffc9c BUG/MEDIUM: server: Use sni as pool connection name for SSL server only
By default, for a given server, when no pool-conn-name is specified, the
configured sni is used. However, this must only be done when SSL is in-use
for the server. Of course, it is uncommon to have a sni expression for
now-ssl server. But this may happen.

In addition, the SSL may be disabled via the CLI. In that case, the
pool-conn-name must be discarded if it was copied from the sni. And, we must
of course take care to set it if the ssl is enabled.

Finally, when the attac-srv action is checked, we now checked the
pool-conn-name expression.

This patch should be backported as far as 3.0. It relies on "MINOR: server:
Parse sni and pool-conn-name expressions in a dedicated function" which
should be backported too.
2025-09-05 15:56:08 +02:00
Aurelien DARRAGON
1a1362ea0b MINOR: stats-file: reserve some bytes in exported structs
We may need additional struct members in shm_stats_file_object and
shm_stats_file_hdr, yet since these structs are exported they should
not change in size nor ordering else it would require a version change
to break compability on purpose since mapping would differ.

Here we reserve 64 additional bytes in shm_stats_file_object, and
128 bytes in shm_stats_file_hdr for future usage.
2025-09-03 16:29:48 +02:00
Aurelien DARRAGON
21d97ccfae BUILD: stats-file: fix aligment issues
Document some byte holes and fix some potential aligment issues
between 32 and 64 bits architectures to ensure the shm_stats_file memory
mapping is consistent between operating systems.
2025-09-03 16:28:46 +02:00
Aurelien DARRAGON
46a5948ed2 MINOR: compiler: add ALWAYS_PAD() macro
same as THREAD_PAD() but doesn't depend on haproxy being compiled with
thread support. It may be useful for memory (or files) that may be
shared between multiple processed.
2025-09-03 16:28:46 +02:00
Aurelien DARRAGON
585ece4c92 MEDIUM: stats-file/counters: store and preload stats counters as shm file objects
This is the last patch of the shm stats file series, in this patch we
implement the logic to store and fetch shm stats objects and associate
them to existing shared counters on the current process.

Shm objects are stored in the same memory location as the shm stats file
header. In fact they are stored right after it. All objects (struct
shm_stats_file_object) have the same size (no matter their type), which
allows for easy object traversal without having to check the object's
type, and could permit the use of external tools to scan the SHM in the
future. Each object stores a guid (of GUID_MAX_LEN+1 size) and tgid
which allows to match corresponding shared counters indexes. Also,
as stated before, each object stores the list of users making use of
it. Objects are never released (the map can only grow), but unused
objects (when no more users or active users are found in objects->users),
the object is automatically recycled. Also, each object stores its
type which defines how the object generic data member should be handled.

Upon startup (or reload), haproxy first tries to scan existing shm to
find objects that could be associated to frontends, backends, listeners
or servers in the current config based on GUID. For associations that
couldn't be made, haproxy will automatically create missing objects in
the SHM during late startup. When haproxy matches with an existing object,
it means the counter from an older process is preserved in the new
process, so multiple processes temporarily share the same counter for as
long as required for older processes to eventually exit.
2025-09-03 15:59:37 +02:00
Aurelien DARRAGON
ee17d20245 MINOR: stats-file: add process slot management for shm stats file
Now that all processes tied to the same shm stats file now share a
common clock source, we introduce the process slot notion in this
patch.

Each living process registers itself in a map at a free index: each slot
stores information about the process' PID and heartbeat. Each process is
responsible for updating its heartbeat, a slot is considered as "free" if
the heartbeat was never set or if the heartbeat is expired (60 seconds of
inactivity). The total number of slots is set to 64, this is on purpose
because it allows to easily store the "users" of a given shm object using
a 64 bits bitmask. Given that when haproxy is reloaded olders processes
are supposed to die eventually, it should be large enough (64 simultaneous
processes) to be safe. If we manage to reach this limit someday, more
slots could be added by splitting "users" bitmask on multiple 64bits
variable.
2025-09-03 15:59:33 +02:00
Aurelien DARRAGON
443e657fd6 MEDIUM: stats-file: processes share the same clock source from shm-stats-file
The use of the "shm-stats-file" directive now implies that all processes
using the same file now share a common clock source, this is required
for consistency regarding time-related operations.

The clock source is stored in the shm stats file header.
When the directive is set, all processes share the same clock
(global_now_ms and global_now_ns both point to variables in the map),
this is required for time-based counters such as freq counters to work
consistently. Since all processes manipulate global clock with atomic
operations exclusively during runtime, and don't systematically relies
on it (thanks to local now_ms and now_ns), it is pretty much transparent.
2025-09-03 15:59:27 +02:00
Aurelien DARRAGON
c91d93ed1c MINOR: stats-file: introduce shm-stats-file directive
add initial support for the "shm-stats-file" directive and
associated "shm-stats-file-max-objects" directive. For now they are
flagged as experimental directives.

The shared memory file is automatically created by the first process.
The file is created using open() so it is up to the user to provide
relevant path (either on regular filesystem or ramfs for performance
reasons). The directive takes only one argument which is path of the
shared memory file. It is passed as-is to open().

The maximum number of objects per thread-group (hard limit) that can be
stored in the shm is defined by "shm-stats-file-max-objects" directive,

Upon initial creation, the main shm stats file header is provisioned with
the version which must remains the same to be compatible between processes
and defaults to 2k. which means approximately 1mb max per thread group
and should cover most setups. When the limit is reached (during startup)
an error is reported by haproxy which invites the user to increase the
"shm-stats-file-max-objects" if desired, but this means more memory will
be allocated. Actual memory usage is low at start, because only the mmap
(mapping) is provisionned with the maximum number of objects to avoid
relocating the memory area during runtime, but the actual shared memory
file is dynamically resized when objects are added (resized by following
half power of 2 curve when new objects are added, see upcoming commits)

For now only the file is created, further logic will be implemented in
upcoming commits.
2025-09-03 15:59:22 +02:00
Aurelien DARRAGON
cb08bcb9d6 MINOR: counters: retrieve detailed errmsg upon failure with counters_{fe,be}_shared_prepare()
counters_{fe,be}_shared_prepare now take an extra <errmsg> parameter
that contains additional hints about the error in case of failure.

It must be freed accordingly since it is allocated using memprintf
2025-09-03 15:59:17 +02:00
Amaury Denoyelle
a84b404b34 MINOR: quic/flags: complete missing flags
Add missing quic_conn flags definition for dev utility.
2025-09-02 09:37:43 +02:00
Amaury Denoyelle
1517869145 BUG/BUILD: stats: fix build due to missing stat enum definition
Recently, new server counter for private idle connections have been
added to statistics output. However, the patch was missing
ST_I_PX_PRIV_IDLE_CUR enum definition.

No need to backport.
2025-08-29 09:32:10 +02:00
Amaury Denoyelle
dbe31e3f65 MEDIUM: session: account on server idle conns attached to session
This patch adds a new member <curr_sess_idle_conns> on the server. It
serves as a counter of idle connections attached on a session instead of
regular idle/safe trees. This is used only for private connections.

The objective is to provide a method to detect if there is idle
connections still referencing a server.

This will be particularly useful to ensure that a server is removable.
Currently, this is not yet necessary as idle connections are directly
freed via "del server" handler under thread isolation. However, this
procedure will be replaced by an asynchronous mechanism outside of
thread isolation.

Careful: connections attached to a session but not idle will not be
accounted by this counter. These connections can still be detected via
srv_has_streams() so "del server" will be safe.

This counter is maintain during the whole lifetime of a private
connection. This is mandatory to guarantee "del server" safety and is
conform with other idle server counters. What this means it that
decrement is performed only when the connection transitions from idle to
in use, or just prior to its deletion. For the first case, this is
covered by session_get_conn(). The second case is trickier. It cannot be
done via session_unown_conn() as a private connection may still live a
little longer after its removal from session, most notably when
scheduled for idle purging.

Thus, conn_free() has been adjusted to handle the final decrement. Now,
conn_backend_deinit() is also called for private connections if
CO_FL_SESS_IDLE flag is present. This results in a call to
srv_release_conn() which is responsible to decrement server idle
counters.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
7a6e3c1a73 MAJOR: server: implement purging of private idle connections
When a server goes into maintenance, or if its IP address is changed,
idle connections attached to it are scheduled for deletion via the purge
mechanism. Connections are moved from server idle/safe list to the purge
list relative to their thread. Connections are freed on their owned
thread by the scheduled purge task.

This patch extends this procedure to also handle private idle
connections stored in sessions instead of servers. This is possible
thanks via <sess_conns> list server member. A call to the newly
defined-function session_purge_conns() is performed on each list
element. This moves private connections from their session to the purge
list alongside other server idle connections.

This change relies on the serie of previous commits which ensure that
access to private idle connections is now thread-safe, with idle_conns
lock usage and careful manipulation of private idle conns in
input/output handlers.

The main benefit of this patch is that now all idle connections
targetting a server set in maintenance are removed. Previously, private
connections would remain until their attach sessions were closed.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
73fd12e928 MEDIUM: conn/muxes/ssl: remove BE priv idle conn from sess on IO
This is a direct follow-up of previous patch which adjust idle private
connections access via input/output handlers.

This patch implement the handlers prologue part. Now, private idle
connections require a similar treatment with non-private idle
connections. Thus, private conns are removed temporarily from its
session under protection of idle_conns lock.

As locking usage is already performed in input/output handler,
session_unown_conn() cannot be called. Thus, a new function
session_detach_idle_conn() is implemented in session module, which
performs basically the same operation but relies on external locking.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
8de0807b74 MEDIUM: conn/muxes/ssl: reinsert BE priv conn into sess on IO completion
When dealing with input/output on a connection related handler, special
care must be taken prior to access the connection if it is considered as
idle, as it could be manipulated by another thread. Thus, connection is
first removed from its idle tree before processing. The connection is
reinserted on processing completion unless it has been freed during it.

Idle private connections are not concerned by this, because takeover is
not applied on them. However, a future patch will implement purging of
these connections along with regular idle ones. As such, it is necessary
to also protect private connections usage now. This is the subject of
this patch and the next one.

With this patch, input/output handlers epilogue of
muxes/SSL/conn_notify_mux() are adjusted. A new code path is able to
deal with a connection attached to a session instead of a server. In
this case, session_reinsert_idle_conn() is used. Contrary to
session_add_conn(), this new function is reserved for idle connections
usage after a temporary removal.

Contrary to _srv_add_idle() used by regular idle connections,
session_reinsert_idle_conn() may fail as an allocation can be required.
If this happens, the connection is immediately destroyed.

This patch has no effect for now. It must be coupled with the next one
which will temporarily remove private idle connections on input/output
handler prologue.
2025-08-28 15:08:35 +02:00
Amaury Denoyelle
f234b40cde MINOR: server: shard by thread sess_conns member
Server member <sess_conns> is a mt_list which contains every backend
connections attached to a session which targets this server. These
connecions are not present in idle server trees.

The main utility of this list is to be able to cleanup these connections
prior to removing a server via "del server" CLI. However, this procedure
will be adjusted by a future patch. As such, <sess_conns> member must be
moved into srv_per_thread struct. Effectively, this duplicates a list
for every threads.

This commit does not introduce functional change. Its goal is to ensure
that these connections are now ordered by their owning thread, which
will allow to implement a purge, similarly to idle connections attached
to servers.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
d4f7a2dbcc MINOR: session: uninline functions related to BE conns management
Move from header to source file functions related to session management
of backend connections. These functions are big enough to remove inline
attribute.
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
d0df41fd22 MINOR: session: document explicitely that session_add_conn() is safe
A set of recent patches have simplified management of backend connection
attached to sessions. The API is now stricter to prevent any misuse.

One of this change is the addition of a BUG_ON() in session_add_conn(),
which ensures that a connection is not attached to a session if its
<owner> field points to another entry.

On older haproxy releases, this assertion could not be enforced due to
NTLM as a connection is turned as private during its transfer. When
using a true multiplexed protocol on the backend side, the connection
could be assigned in turn to several sessions. However, NTLM is now only
applied for HTTP/1.1 as it does not make sense if the connection is
already shared.

To better clarify this situation, extend the comment on BUG_ON() inside
session_add_conn().
2025-08-28 14:52:29 +02:00
Amaury Denoyelle
a96f1286a7 BUG/MINOR: connection: rearrange union list members
A connection can be stored in several lists, thus there is several
attach points in struct connection. Depending on its proxy side, either
frontend or backend, a single connection will only access some of them
during its lifetime.

As an optimization, these attach points are organized in a union.
However, this repartition was not correctly achieved along
frontend/backend side delimitation.

Furthermore, reverse HTTP has recently been introduced. With this
feature, a connection can migrate from frontend to backend side or vice
versa. As such, it becomes even more tedious to ensure that these
members are always accessed in a safe way.

This commit rearrange these fields. First, union is now clearly splitted
between frontend and backend only elements. Next, backend elements are
initialized with conn_backend_init(), which is already used during
connection reversal on an edge endpoint. A new function
conn_frontend_init() serves to initialize the other members, called both
on connection first instantiation and on reversal on a dialer endpoint.

This model is much cleaner and should prevent any access to fields from
the wrong side.

Currently, there is no known case of wrong access in the existing code
base. However, this cleanup is considered an improvement which must be
backported up to 3.0 to remove any possible undefined behavior.
2025-08-28 14:52:29 +02:00
Frederic Lecaille
31c17ad837 MINOR: quic: remove ->offset qf_crypto struct field
This patch follows this previous bug fix:

    BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets

where a ebtree node has been added to qf_crypto struct. It has the same
meaning and type as ->offset_node.key field with ->offset_node an eb64tree node.
This patch simply removes ->offset which is no more useful.

This patch should be easily backported as far as 2.6 as the one mentioned above
to ease any further backport to come.
2025-08-28 08:19:34 +02:00
William Lallemand
18ebd81962 MINOR: ssl: diagnostic warning when both 'default-crt' and 'strict-sni' are used
It possible to use both 'strict-sni' and 'default-crt' on the same bind
line, which does not make much sense.

This patch implements a check which will look for default certificates
in the sni_w tree when strict-sni is used. (Referenced by their empty
sni ""). default-crt sets the CKCH_INST_EXPL_DEFAULT flag in
ckch_inst->is_default, so its possible to differenciate explicits
default from implicit default.

Could be backported as far as 3.0.

This was discussed in ticket #3082.
2025-08-27 16:22:12 +02:00
Frederic Lecaille
d753f24096 BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets
This issue impacts the QUIC listeners. It is the same as the one fixed by this
commit:

	BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO

As chrome, ngtcp2 client decided to fragment its CRYPTO frames but in a much
more agressive way. This could be fixed with a list local to qc_parse_pkt_frms()
to please chrome thanks to the commit above. But this is not sufficient for
ngtcp2 which often splits its ClientHello message into more than 10 fragments
with very small ones. This leads the packet parser to interrupt the CRYPTO frames
parsing due to the ncbuf gap size limit.

To fix this, this patch approximatively proceeds the same way but with an
ebtree to reorder the CRYPTO by their offsets. These frames are directly
inserted into a local ebtree. Then this ebtree is reused to provide the
reordered CRYPTO data to the underlying ncbuf (non contiguous buffer). This way
there are very few less chances for the ncbufs used to store CRYPTO data
to reach a too much fragmented state.

Must be backported as far as 2.6.
2025-08-27 16:14:19 +02:00
Aurelien DARRAGON
cdb97cb73e MEDIUM: server: split srv_init() in srv_preinit() + srv_postinit()
We actually need more granularity to split srv postparsing init tasks:
Some of them are required to be run BEFORE the config is checked, and
some of them AFTER the config is checked.

Thus we push the logic from 368d0136 ("MEDIUM: server: add and use
srv_init() function") a little bit further and split the function
in two distinct ones, one of them executed under check_config_validity()
and the other one using REGISTER_POST_SERVER_CHECK() hook.

SRV_F_CHECKED flag was removed because it is no longer needed,
srv_preinit() is only called once, and so is srv_postinit().
2025-08-27 12:54:19 +02:00
Christopher Faulet
71c01c1010 MINOR: applet: Make some applet functions HTX aware
applet_output_room() and applet_input_data() are now HTX aware. These
functions automatically rely on htx versions if APPLET_FL_HTX flag is set
for the applet.
2025-08-25 11:11:05 +02:00
Christopher Faulet
927884a3eb MINOR: applet: Add a flag to know an applet is using HTX buffers
Multiplexers already explicitly announce their HTX support. Now it is
possible to set flags on applet, it could be handy to do the same. So, now,
HTX aware applets must set the APPLET_FL_HTX flag.
2025-08-25 11:11:05 +02:00
Christopher Faulet
1c76e4b2e4 MINOR: applet: Add function to test applet flags from the appctx
appctx_app_test() function can now be used to test the applet flags using an
appctx. This simplify a bit tests on applet flags. For now, this function is
used to test APPLET_FL_NEW_API flag.
2025-08-25 11:11:05 +02:00
Christopher Faulet
3de6c375aa MINOR: applet: Rely on applet flag to detect the new api
Instead of setting a flag on the applet context by checking the defined
callback functions of the applet to know if an applet is using the new API
or not, we can now rely on the applet flags itself. By checking
APPLET_FL_NEW_API flag, it does the job. APPCTX_FL_INOUT_BUFS flag is thus
removed.
2025-08-25 11:11:05 +02:00
Amaury Denoyelle
1529ec1a25 MINOR: quic: centralize padding for HP sampling on packet building
The below patch has simplified INITIAL padding on emission. Now,
qc_prep_pkts() is responsible to activate padding for this case, and
there is no more special case in qc_do_build_pkt() needed.

  commit 8bc339a6ad
  BUG/MAJOR: quic: fix INITIAL padding with probing packet only

However, qc_do_build_pkt() may still activate padding on its own, to
ensure that a packet is big enough so that header protection decryption
can be performed by the peer. HP decryption is performed by extracting a
sample from the ciphered packet, starting 4 bytes after PN offset.
Sample length is 16 bytes as defined by TLS algos used by QUIC. Thus, a
QUIC sender must ensures that length of packet number plus payload
fields to be at least 4 bytes long. This is enough given that each
packet is completed by a 16 bytes AEAD tag which can be part of the HP
sample.

This patch simplifies qc_do_build_pkt() by centralizing padding for this
case in a single location. This is performed at the end of the function
after payload is completed. The code is thus simpler.

This is not a bug. However, it may be interesting to backport this patch
up to 2.6, as qc_do_build_pkt() is a tedious function, in particular
when dealing with padding generation, thus it may benefit greatly from
simplification.
2025-08-25 08:48:24 +02:00
Olivier Houchard
6f21c5631a MINOR: ssl: Add a way to globally disable ktls.
Add a new global option, "noktls", as well as a command line option,
"-dT", to totally disable ktls usage, even if it is activated on servers
or binds in the configuration.
That makes it easier to quickly figure out if a problem is related to
ktls or not.
2025-08-20 18:33:11 +02:00
Olivier Houchard
5c8fa50966 MEDIUM: ssl: Add ktls support for AWS-LC.
Add ktls support for AWS-LC. As it does not know anything
about ktls, it means extracting keys from the ssl lib, and provide them
to the kernel. At which point we can use regular recvmsg()/sendmsg()
calls.
This patch only provides support for TLS 1.2, AWS-LC provides a
different way to extract keys for TLS 1.3.
Note that this may work with BoringSSL too, but it has not been tested.
2025-08-20 18:33:11 +02:00
Olivier Houchard
ed7d20afc8 MEDIUM: ssl: Add kTLS support for OpenSSL.
Modify the SSL code to enable kTLS with OpenSSL.
It mostly requires our internal BIO to be able to handle the various
kTLS-specific controls in ha_ssl_ctrl(), as well as being able to use
recvmsg() and sendmsg() from ha_ssl_read() and ha_ssl_write().
2025-08-20 18:33:11 +02:00
Olivier Houchard
7836fe8fe3 MINOR: ssl: Define HAVE_VANILLA_OPENSSL if openssl is used.
If we're using OpenSSL as our crypto library, so add a define,
HAVE_VANILLA_OPENSSL, to make it easier to differentiate between the
various crypto libs.
2025-08-20 18:33:10 +02:00
Olivier Houchard
e8674658ae MINOR: cfgparse: Add a new "ktls" option to bind and server.
Add a new "ktls" option to bind and server. Valid values are "on" and
"off".
It currently does nothing, but when kTLS will be implemented, it will
enable or disable kTLS for the corresponding sockets.
It is marked as experimental for now.
2025-08-20 18:33:10 +02:00
Olivier Houchard
075e753802 MEDIUM: mux_h1/mux_pt: Use XPRT_CAN_SPLICE to decide if we should splice
In both mux_h1 and mux_pt, use the new XPRT_CAN_SPLICE capability to
decide if we should attempt to use splicing or not.
If we receive XPRT_CONN_CAN_MAYBE_SPLICE, add a new flag on the
connection, CO_FL_WANT_SPLICING, to let the xprt know that we'd love to
be able to do splicing, so that it may get ready for that.
This should have no effect right now, and is required work for adding
kTLS support.
2025-08-20 18:33:10 +02:00
Olivier Houchard
5731b8a19c MEDIUM: xprt: Add a "get_capability" method.
Add a new method to xprts, get_capability, that can be used to query if
an xprt supports something or not.
The first capability implemented is XPRT_CAN_SPLICE, to know if the xprt
will be able to use splicing for the provided connection.
The possible answers are XPRT_CONN_CAN_NOT_SPLICE, which indicates
splicing will never be possible for that connection,
XPRT_CONN_COULD_SPLICE, which indicates that splicing is not usable
right now, but may be in the future, and XPRT_CONN_CAN_SPLICE, that
means we can splice right away.
2025-08-20 18:33:10 +02:00
Olivier Houchard
2623b7822e MINOR: ssl: Add a "flags" field to ssl_sock_ctx.
Instead of adding more separate fields in ssl_sock_ctx, add a "flags"
one.
Convert the "can_send_early_data" to the flag SSL_SOCK_F_EARLY_ENABLED.
More flags will be added for kTLS support.
2025-08-20 17:28:03 +02:00
Olivier Houchard
3d685fcb7d MINOR: xprt: Add recvmsg() and sendmsg() parameters to rcv_buf() and snd_buf().
In rcv_buf() and snd_buf(), use sendmsg/recvmsg instead of send and
recv, and add two new optional parameters to provide msg_control and
msg_controllen.
Those are unused for now, but will be used later for kTLS.
2025-08-20 17:28:03 +02:00
Frederic Lecaille
878a72d001 BUG/MEDIUM: quic: listener connection stuck during handshakes (OpenSSL 3.5)
This issue was reported in GH #3071 by @famfo where a wireshark capture
reveals that some handshake could not complete after having received
two Initial packets. This could happen when the packets were parsed
in two times, calling qc_ssl_provide_all_quic_data() two times.

This is due to crypto data stream counter which was incremented two times
from qc_ssl_provide_all_quic_data() (see cstream->rx.offset += data
statement around line 1223 in quic_ssl.c). One time by the callback
which "receives" the crypto data, and on time by qc_ssl_provide_all_quic_data().

Then when parsing the second crypto data frame, the parser detected
that the crypto were already provided.

To fix this, one could comment the code which increment the crypto data
stream counter by <data>. That said, when using the OpenSSL 3.5 QUIC API
one should not modified the crypto data stream outside of the OpenSSL 3.5
QUIC API.

So, this patch stop calling qc_ssl_provide_all_quic_data() and
qc_ssl_provide_quic_data() and only calls qc_ssl_do_hanshake() after
having received some crypto data. In addition to this, as these functions
are no more called when building haproxy against OpenSSL 3.5, this patch
disable their compilations (with #ifndef HAVE_OPENSSL_QUIC).

This patch depends on this previous one:

     MINOR: quic: implement qc_ssl_do_hanshake()

Thank you to @famto for this report.

Must be backported to 3.2.
2025-08-14 14:54:47 +02:00
Willy Tarreau
a7f8693fa2 MEDIUM: ring: always allocate properly aligned ring structures
The rings were manually padded to place the various areas that compose
them into different cache lines, provided that the allocator returned
a cache-aligned address, which until now was not granted. By now
switching to the aligned API we can finally have this guarantee and
hope for more consistent ring performance between tests. Like previously
the few carefully crafted THREAD_PAD() could simply be replaced by
generic THREAD_ALIGN() that dictate the type's alignment.

This was the last user of THREAD_PAD() by the way.
2025-08-13 17:47:39 +02:00
Willy Tarreau
cfdab917fe MINOR: server: align server struct to 64 bytes
Several times recently, it was noticed that some benchmarks would
highly vary depending on the position of certain fields in the server
struct, and this could even vary between runs.

The server struct does have separate areas depending on the user cases
and hot/cold aspect of the members stored there, but the areas are
artificially kept apart using fixed padding instead of real alignment,
which has the first sad effect of artificially inflating the struct,
and the second one of misaligning it.

Now that we have all the necessary tools to keep them aligned, let's
just do it. The struct has shrunk from 4160 to 4032 bytes on 64-bit
systems, 152 of which are still holes or padding.
2025-08-13 17:37:11 +02:00
Willy Tarreau
a469356268 MEDIUM: server: introduce srv_alloc()/srv_free() to alloc/free a server
It happens that we free servers at various places in the code, both
on error paths and at runtime thanks to the "server delete" feature. In
order to switch to an aligned struct, we'll need to change the calloc()
and free() calls. Let's first spot them and switch them to srv_alloc()
and srv_free() instead of using calloc() and either free() or ha_free().
An easy trap to fall into is that some of them are default-server
entries. The new srv_free() function also resets the pointer like
ha_free() does.

This was done by running the following coccinelle script all over the
code:

  @@
  struct server *srv;
  @@
  (
  - free(srv)
  + srv_free(&srv)
  |
  - ha_free(&srv)
  + srv_free(&srv)
  )
  @@
  struct server *srv;
  expression e1;
  expression e2;
  @@
  (
  - srv = malloc(e1)
  + srv = srv_alloc()
  |
  - srv = calloc(e1, e2)
  + srv = srv_alloc()
  )

This is marked medium because despite spotting all call places, we can
never rule out the possibility that some out-of-tree patches would
allocate their own servers and continue to use the old API... at their
own risk.
2025-08-13 17:37:11 +02:00
Willy Tarreau
33d72568dd MINOR: tools: also implement ha_aligned_alloc_typed()
This one is a macro and will allocate a properly aligned and sized
object. This will help make sure that the alignment promised to the
compiler is respected.

When memstats is used, the type name is passed as a string into the
.extra field so that it can be displayed in "debug dev memstats". Two
tiny mistakes related to memstats macros were also fixed (calloc
instead of malloc for zalloc), and the doc was also added to document
how to use these calls.
2025-08-13 17:37:08 +02:00
Willy Tarreau
e21bb531ca MINOR: pools: permit to optionally specify extra size and alignment
The common macros REGISTER_TYPED_POOL(), DECLARE_TYPED_POOL() and
DECLARE_STATIC_TYPED_POOL() will now take two optional arguments,
one being the extra size to be added to the structure, and a second
one being the desired alignment to enforce. This will permit to
specify alignments larger than the default ones promised to the
compiler.
2025-08-11 19:55:30 +02:00
Willy Tarreau
d240f387ca MINOR: pools: distinguish the requested alignment from the type-specific one
We're letting users request an alignment but that can violate one imposed
by a type, especially if we start seeing REGISTER_TYPED_POOL() grow in
adoption, encouraging users to specify alignment on their types. On the
other hand, if we ask the user to always specify the alignment, no control
is possible and the error is easy. Let's have a second field in the pool
registration, for the type-specific one. We'll set it to zero when unknown,
and to the types's alignment when known. This way it will become possible
to compare them at startup time to detect conflicts. For now no macro
permits to set both separately so this is not visible.
2025-08-11 19:55:30 +02:00
Willy Tarreau
746e77d000 MINOR: tools: implement ha_aligned_zalloc()
This one is exactly ha_aligned_alloc() followed by a memset(0), as
it will be convenient for a number of call places as a replacement
for calloc().

Note that ideally we should also have a calloc version that performs
basic multiply overflow checks, but these are essentially used with
numbers of threads times small structs so that's fine, and we already
do the same everywhere in malloc() calls.
2025-08-11 19:55:30 +02:00
Olivier Houchard
b6702d5342 BUG/MEDIUM: ssl: fix build with AWS-LC
AWS-LC doesn't provide SSL_in_before(), and doesn't provide an easy way
to know if we already started the handshake or not. So instead, just add
a new field in ssl_sock_ctx, "can_write_early_data", that will be
initialized to 1, and will be set to 0 as soon as we start the
handshake.

This should be backported up to 2.8 with
13aa5616c9.
2025-08-08 20:21:14 +02:00
Aurelien DARRAGON
bcb124f92a MINOR: init: add REGISTER_POST_DEINIT_MASTER() hook
Similar to REGISTER_POST_DEINIT() hook (which is invoked during deinit)
but for master process only, when haproxy was started in master-worker
mode. The goal is to be able to register cleanup functions that will
only run for the master process right before exiting.
2025-08-07 22:27:14 +02:00
Aurelien DARRAGON
c8282f6138 MINOR: clock: add clock_get_now_offset() helper
Same as clock_set_now_offset() but to retrieve the offset from external
location.
2025-08-07 22:27:09 +02:00
Aurelien DARRAGON
20f9d8fa4e MINOR: clock: add clock_set_now_offset() helper
Since now_offset is a static variable and is not exposed outside from
clock.c, let's add an helper so that it becomes possible to set its
value from another source file.
2025-08-07 22:27:05 +02:00
Aurelien DARRAGON
4c3a36c609 MINOR: guid: add guid_count() function
returns the total amount of registered GUIDs in the guid_tree
2025-08-07 22:26:58 +02:00
Aurelien DARRAGON
7c52964591 MINOR: guid: add guid_get() helper
guid_get() is a convenient function to get the actual key string
associated to a given guid_node struct
2025-08-07 22:26:52 +02:00
Amaury Denoyelle
cae828cbf5 MINOR: quic: define QUIC_FL_CONN_IS_BACK flag
Define a new quic_conn flag assign if the connection is used on the
backend side. This is similar to other haproxy components such as struct
connection and muxes element.

This flag is positionned via qc_new_conn(). Also update quic traces to
mark proxy side as 'F' or 'B' suffix.
2025-08-07 16:59:59 +02:00
Amaury Denoyelle
e064e5d461 MINOR: quic: duplicate GSO unsupp status from listener to conn
QUIC emission can use GSO to emit multiple datagrams with a single
syscall invokation. However, this feature relies on several kernel
parameters which are checked on haproxy process startup.

Even if these checks report no issue, GSO may still be unable due to the
underlying network adapter underneath. Thus, if a EIO occured on
sendmsg() with GSO, listener is flagged to mark GSO as unsupported. This
allows every other QUIC connections to share the status and avoid using
GSO when using this listener.

Previously, listener flag was checked for every QUIC emission. This was
done using an atomic operation to prevent races. Improve this by
duplicating GSO unsupported status as the connection level. This is done
on qc_new_conn() and also on thread rebinding if a new listener instance
is used.

The main benefit from this patch is to reduce the dependency between
quic_conn and listener instances.
2025-08-07 16:36:26 +02:00
Willy Tarreau
ef915e672a MEDIUM: pools: respect pool alignment in allocations
Now pool_alloc_area() takes the alignment in argument and makes use
of ha_aligned_malloc() instead of malloc(). pool_alloc_area_uaf()
simply applies the alignment before returning the mapped area. The
pool_free() functionn calls ha_aligned_free() so as to permit to use
a specific API for aligned alloc/free like mingw requires.

Note that it's possible to see warnings about mismatching sized
during pool_free() since we know both the pool and the type. In
pool_free, adding just this is sufficient to detect potential
offenders:

	WARN_ON(__alignof__(*__ptr) > pool->align);
2025-08-06 19:20:36 +02:00
Willy Tarreau
f0d0922aa1 MINOR: pools: add macros to declare pools based on a struct type
DECLARE_TYPED_POOL() and friends take a name, a type and an extra
size (to be added to the size of the element), and will use this
to create the pool. This has the benefit of letting the compiler
automatically adapt sizeof() and alignof() based on the type
declaration.
2025-08-06 19:20:36 +02:00
Willy Tarreau
6ea0e3e2f8 MINOR: pools: add macros to register aligned pools
This adds an alignment argument to create_pool_from_loc() and
completes the existing low-level macros with new ones that expose
the alignment and the new macros permit to specify it. For now
they're not used.
2025-08-06 19:20:36 +02:00
Willy Tarreau
eb075d15f6 MEDIUM: pools: add an alignment property
This will be used to declare aligned pools. For now it's not used,
but it's properly set from the various registrations that compose
a pool, and rounded up to the next power of 2, with a minimum of
sizeof(void*).

The alignment is returned in the "show pools" part that indicates
the entry size. E.g. "(56 bytes/8)" means 56 bytes, aligned by 8.
2025-08-06 19:20:36 +02:00
Willy Tarreau
ac23b873f5 DEBUG: pools: also retrieve file and line for direct callers of create_pool()
Just like previous patch, we want to retrieve the location of the caller.
For this we turn create_pool() into a macro that collects __FILE__ and
__LINE__ and passes them to the now renamed function create_pool_with_loc().

Now the remaining ~30 pools also have their location stored.
2025-08-06 19:20:34 +02:00
Willy Tarreau
efa856a8b0 DEBUG: pools: store the pool registration file name and line number
When pools are declared using DECLARE_POOL(), REGISTER_POOL etc, we
know where they are and it's trivial to retrieve the file name and line
number, so let's store them in the pool_registration, and display them
when known in "show pools detailed".
2025-08-06 19:20:32 +02:00
Willy Tarreau
ff62aacb20 MEDIUM: pools: change the static pool creation to pass a registration
Now we're creating statically allocated registrations instead of
passing all the parameters and allocating them on the fly. Not only
this is simpler to extend (we're limited in number of INITCALL args),
but it also leaves all of these in the data segment where they are
easier to find when debugging.
2025-08-06 19:20:30 +02:00
Willy Tarreau
f51d58bd2e MINOR: pools: force the name at creation time to be a const.
This is already the case as all names are constant so that's fine. If
it would ever change, it's not very hard to just replace it in-situ
via an strdup() and set a flag to mention that it's dynamically
allocated. We just don't need this right now.

One immediately visible effect is in "show pools detailed" where the
names are no longer truncated.
2025-08-06 19:20:28 +02:00
Willy Tarreau
ee5bc28865 MINOR: pools: add a new flag to declare static registrations
We must not free these ones when destroying a pool, so let's dedicate
them a flag to mention that they are static. For now we don't have any
such.
2025-08-06 19:20:26 +02:00
Willy Tarreau
18505f9718 MINOR: pools: support creating a pool from a pool registration
We've recently introduced pool registrations to be able to enumerate
all pool creation requests with their respective parameters, but till
now they were only used for debugging ("show pools detailed"). Let's
go a step further and split create_pool() in two:
  - the first half only allocates and sets the pool registration
  - the second half creates the pool from the registration

This is what this patch does. This now opens the ability to pre-create
registrations and create pools directly from there.
2025-08-06 19:20:22 +02:00
Willy Tarreau
325d1bdcca MINOR: implement ha_aligned_alloc() to return aligned memory areas
We have two versions, _safe() which verifies and adjusts alignment,
and the regular one which trusts the caller. There's also a dedicated
ha_aligned_free() due to mingw.

The currently detected OSes are mingw, unixes older than POSIX 200112
which require memalign(), and those post 200112 which will use
posix_memalign(). Solaris 10 reports 200112 (probably through
_GNU_SOURCE since it does not do it by default), and Solaris 11 still
supports memalign() so for all Solaris we use memalign(). The memstats
wrappers are also implemented, and have the exported names. This was
the opportunity for providing a separate free call that lets the caller
specify the size (e.g. for use with pools).

For now this code is not used.
2025-08-06 19:19:27 +02:00
Willy Tarreau
e921fe894f BUILD: compat: always set _POSIX_VERSION to ease comparisons
Sometimes we need to compare it to known versions, let's make sure it's
always defined. We set it to zero if undefined so that it cannot match
any comparison.
2025-08-06 19:19:27 +02:00
Willy Tarreau
2ce0c63206 BUILD: quic: use _MAX() to avoid build issues in pools declarations
With the upcoming pool declaration, we're filling a struct's fields,
while older versions were relying on initcalls which could be turned
to function declarations. Thus the compound expressions that were
usable there are not necessarily anymore, as witnessed here with
gcc-5.5 on solaris 10:

      In file included from include/haproxy/quic_tx.h:26:0,
                       from src/quic_tx.c:15:
      include/haproxy/compat.h:106:19: error: braced-group within expression allowed only inside a function
       #define MAX(a, b) ({    \
                         ^
      include/haproxy/pool.h:41:11: note: in definition of macro '__REGISTER_POOL'
         .size = _size,           \
                 ^
      ...
      include/haproxy/quic_tx-t.h:6:29: note: in expansion of macro 'MAX'
       #define QUIC_MAX_CC_BUFSIZE MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU)

Let's make the macro use _MAX() instead of MAX() since it relies on pure
constants.
2025-08-06 19:19:11 +02:00
Willy Tarreau
cf8871ae40 BUILD: compat: provide relaxed versions of the MIN/MAX macros
In 3.0 the MIN/MAX macros were converted to compound expressions with
commit 0999e3d959 ("CLEANUP: compat: make the MIN/MAX macros more
reliable"). However with older compilers these are not supported out
of code blocks (e.g. to initialize variables or struct members). This
is the case on Solaris 10 with gcc-5.5 when QUIC doesn't compile
anymore with the future pool registration:

  In file included from include/haproxy/quic_tx.h:26:0,
                   from src/quic_tx.c:15:
  include/haproxy/compat.h:106:19: error: braced-group within expression allowed only inside a function
   #define MAX(a, b) ({    \
                     ^
  include/haproxy/pool.h:41:11: note: in definition of macro '__REGISTER_POOL'
     .size = _size,           \
             ^
  ...
  include/haproxy/quic_tx-t.h:6:29: note: in expansion of macro 'MAX'
   #define QUIC_MAX_CC_BUFSIZE MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU)

Let's provide the old relaxed versions as _MIN/_MAX for use with constants
like such cases where it's certain that there is no risk. A previous attempt
using __builtin_constant_p() to switch between the variants did not work,
and it's really not worth the hassle of going this far.
2025-08-06 19:18:42 +02:00
Aurelien DARRAGON
aeff2a3b2a BUG/MEDIUM: hlua_fcn: ensure systematic watcher cleanup for server list iterator
In 358166a ("BUG/MINOR: hlua_fcn: restore server pairs iterator pointer
consistency"), I wrongly assumed that because the iterator was a temporary
object, no specific cleanup was needed for the watcher.

In fact watcher_detach() is not only relevant for the watcher itself, but
especially for its parent list to remove the current watcher from it.

As iterators are temporary objects, failing to remove their watchers from
the server watcher list causes the server watcher list to be corrupted.

On a normal iteration sequence, the last watcher_next() receives NULL
as target so it successfully detaches the last watcher from the list.
However the corner case here is with interrupted iterators: users are
free to break away from the iteration loop when a specific condition is
met for instance from the lua script, when this happens
hlua_listable_servers_pairs_iterator() doesn't get a chance to detach the
last iterator.

Also, Lua doesn't tell us that the loop was interrupted,
so to fix the issue we rely on the garbage collector to force a last
detach right before the object is freed. To achieve that, watcher_detach()
was slightly modified so that it becomes possible to call it without
knowing if the watcher is already detached or not, if watcher_detach() is
called on a detached watcher, the function does nothing. This way it saves
the caller from having to track the watcher state and makes the API a
little more convenient to use. This way we now systematically call
watcher_detach() for server iterators right before they are garbage
collected.

This was first reported in GH #3055. It can be observed when the server
list is browsed one than more time when it was already browsed from Lua
for a given proxy and the iteration was interrupted before the end. As the
watcher list is corrupted, the common symptom is watcher_attach() or
watcher_next() not ending due to the internal mt_list call looping
forever.

Thanks to GH users @sabretus and @sabretus for their precious help.

It should be backported everywhere 358166a was.
2025-08-05 13:06:46 +02:00
William Lallemand
9ee14ed2d9 MEDIUM: acme: allow to wait and restart the task for DNS-01
DNS-01 needs a external process which would register a TXT record on a
DNS provider, using a REST API or something else.

To achieve this, the process should read the dpapi sink and wait for
events. With the DNS-01 challenge, HAProxy will put the task to sleep
before asking the ACME server to achieve the challenge. The task then
need to be woke up, using the command implemented by this patch.

This patch implements the "acme challenge_ready" command which should be
used by the agent once the challenge was configured in order to wake the
task up.

Example:
    echo "@1 acme challenge_ready foobar.pem.rsa domain kikyo" | socat /tmp/master.sock -
2025-08-01 18:07:12 +02:00
William Lallemand
365a69648c MINOR: acme: emit a log for DNS-01 challenge response
This commit emits a log which output the TXT entry to create in case of
DNS-01. This is useful in cases you want to update your TXT entry
manually.

Example:

    acme: foobar.pem.rsa: DNS-01 requires to set the "acme-challenge.example.com" TXT record to "7L050ytWm6ityJqolX-PzBPR0LndHV8bkZx3Zsb-FMg"
2025-08-01 16:12:27 +02:00
William Lallemand
09275fd549 BUILD: acme: avoid declaring TRACE_SOURCE in acme-t.h
Files ending with '-t.h' are supposed to be used for structure
definitions and could be included in the same file to check API
definitions.

This patch removes TRACE_SOURCE from acme-t.h to avoid conflicts with
other TRACE_SOURCE definitions.
2025-07-31 16:03:28 +02:00
Amaury Denoyelle
2ecc5290f2 MINOR: session: streamline session_check_idle_conn() usage
session_check_idle_conn() is called by muxes when a connection becomes
idle. It ensures that the session idle limit is not yet reached. Else,
the connection is removed from the session and it can be freed.

Prior to this patch, session_check_idle_conn() was compatible with a
NULL session argument. In this case, it would return true, considering
that no limit was reached and connection not removed.

However, this renders the function error-prone and subject to future
bugs. This patch streamlines it by ensuring it is never called with a
NULL argument. Thus it can now only returns true if connection is kept
in the session or false if it was removed, as first intended.
2025-07-30 16:13:30 +02:00
Amaury Denoyelle
dd9645d6b9 MINOR: session: do not release conn in session_check_idle_conn()
session_check_idle_conn() is called to flag a connection already
inserted in a session list as idle. If the session limit on the number
of idle connections (max-session-srv-conns) is exceeded, the connection
is removed from the session list.

In addition to the connection removal, session_check_idle_conn()
directly calls MUX destroy callback on the connection. This means the
connection is freed by the function itself and should not be used by the
caller anymore.

This is not practical when an alternative connection closure method
should be used, such as a graceful shutdown with QUIC. As such, remove
MUX destroy invokation : this is now the responsability of the caller to
either close or release immediately the connection.
2025-07-30 11:43:41 +02:00
Amaury Denoyelle
57e9425dbc MINOR: session: strengthen idle conn limit check
Add a BUG_ON() on session_check_idle_conn() to ensure the connection is
not already flagged as CO_FL_SESS_IDLE.

This checks that this function is only called one time per connection
transition from active to idle. This is necessary to ensure that session
idle counter is only incremented one time per connection.
2025-07-30 11:40:16 +02:00
Amaury Denoyelle
ec1ab8d171 MINOR: session: remove redundant target argument from session_add_conn()
session_add_conn() uses three argument : connection and session
instances, plus a void pointer labelled as target. Typically, it
represents the server, but can also be a backend instance (for example
on dispatch).

In fact, this argument is redundant as <target> is already a member of
the connection. This commit simplifies session_add_conn() by removing
it. A BUG_ON() on target is extended to ensure it is never NULL.
2025-07-30 11:39:57 +02:00
Amaury Denoyelle
668c2cfb09 MINOR: session: strengthen connection attach to session
This commit is the first one of a serie to refactor insertion of backend
private connection into the session list.

session_add_conn() is used to attach a connection into a session list.
Previously, this function would report an error if the connection
specified was already attached to another session. However, this case
currently never happens and thus can be considered as buggy.

Remove this check and replace it with a BUG_ON(). This allows to ensure
that session insertion remains consistent. The same check is also
transformed in session_check_idle_conn().
2025-07-30 11:39:26 +02:00
Aurelien DARRAGON
14966c856b MINOR: clock: make global_now_ns a pointer as well
Similar to previous commit but for global_now_ns
2025-07-29 18:04:15 +02:00
Aurelien DARRAGON
4a20b3835a MINOR: clock: make global_now_ms a pointer
This is preparation work for shared counters between co-processes. As
co-processes will need to share a common date. global_now_ms will be used
for that as it will point to the shm when sharing is enabled.

Thus in this patch we turn global_now_ms into a pointer (and adjust the
places where it is written to and read from, hopefully atomic operations
through pointer are already used so the change is trivial)

For now global_now_ms points to process-local _global_now_ms which is a
fallback for when sharing through the shm is not enabled.
2025-07-29 18:04:14 +02:00
Aurelien DARRAGON
713ebd2750 CLEANUP: counters: rename counters_be_shared_init to counters_be_shared_prepare
75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing the shared
stats directly in counters struct") took care of renaming
counters_fe_shared_init() but we forgot counters_be_shared_init().

Let's fix that for consistency
2025-07-29 18:00:13 +02:00
William Lallemand
83a335f925 MINOR: acme: implement traces
Implement traces for the ACME protocol.

 -dt acme:data:complete will dump every input and output buffers,
 including decoded buffers before being converted to JWS.
 It will also dump certificates in the traces.

 -dt acme:user:complete will only dump the state of the task handler.
2025-07-29 17:25:10 +02:00
Aurelien DARRAGON
c24de077bd OPTIM: stats: store fast sharded counters pointers at session and stream level
Following commit 75e480d10 ("MEDIUM: stats: avoid 1 indirection by storing
the shared stats directly in counters struct"), in order to minimize the
impact of the recent sharded counters work, we try to push things a bit
further in this patch by storing and using "fast" pointers at the session
and stream levels when available to avoid costly indirections and
systematic "tgid" resolution (which can not be cached by the CPU due to
its THREAD-local nature).

Indeed, we know that a session/stream is tied to a given CPU, thanks to
this we know that the tgid for a given session/stream will never change.

Given that, we are able to store sharded frontend and listener counters
pointer at the session level (namely sess->fe_tgcounters and
sess->li_tgcounters), and once the backend and the server are selected,
we are also able to store backend and server sharded counters
pointer at the stream level (namely s->be_tgcounters and s->sv_tgcounters)

Everywhere we rely on these counters and the stream or session context is
available, we use the fast pointers it instead of the indirect pointers
path to make the pointer resolution a bit faster.

This optimization proved to bring a few percents back, and together with
the previous 75e480d10 commit we now fixed the performance regression (we
are back to back with 3.2 stats performance)
2025-07-25 18:24:23 +02:00
Aurelien DARRAGON
cf8ba60c88 CLEANUP: peers: remove unused peer_session_target()
Since commit 7293eb68 ("MEDIUM: peers: use server as stream target") peer
session target always point to server in order to benefit from existing
server transport options.

Thanks to that, it is no longer necessary to have peer_session_target()
helper function, because all it does is return the pointer to the
server object. Let's get rid of that
2025-07-25 18:24:17 +02:00
Ben Kallus
1e48ec7f6c CLEANUP: include: replace hand-rolled offsetof to avoid UB
The C standard specifies that it's undefined behavior to dereference
NULL (even if you use & right after). The hand-rolled offsetof idiom
&(((s*)NULL)->f) is thus technically undefined. This clutters the
output of UBSan and is simple to fix: just use the real offsetof when
it's available.

Note that there's no clear statement about this point in the spec,
only several points which together converge to this:

- From N3220, 6.5.3.4:
  A postfix expression followed by the -> operator and an identifier
  designates a member of a structure or union object. The value is
  that of the named member of the object to which the first expression
  points, and is an lvalue.

- From N3220, 6.3.2.1:
  An lvalue is an expression (with an object type other than void) that
  potentially designates an object; if an lvalue does not designate an
  object when it is evaluated, the behavior is undefined.

- From N3220, 6.5.4.4 p3:
  The unary & operator yields the address of its operand. If the
  operand has type "type", the result has type "pointer to type". If
  the operand is the result of a unary * operator, neither that operator
  nor the & operator is evaluated and the result is as if both were
  omitted, except that the constraints on the operators still apply and
  the result is not an lvalue. Similarly, if the operand is the result
  of a [] operator, neither the & operator nor the unary * that is
  implied by the [] is evaluated and the result is as if the & operator
  were removed and the [] operator were changed to a + operator.

=> In short, this is saying that C guarantees these identities:
    1. &(*p) is equivalent to p
    2. &(p[n]) is equivalent to p + n

As a consequence, &(*p) doesn't result in the evaluation of *p, only
the evaluation of p (and similar for []). There is no corresponding
special carve-out for ->.

See also: https://pvs-studio.com/en/blog/posts/cpp/0306/

After this patch, HAProxy can run without crashing after building w/
clang-19 -fsanitize=undefined -fno-sanitize=function,alignment
2025-07-25 17:54:32 +02:00
Ben Kallus
d3b46cca7b CLEANUP: compiler: prefer char * over void * for pointer arithmetic
This patch changes two instances of pointer arithmetic on void *
to use char * instead, to avoid UB. This is essentially to please
UB analyzers, though.
2025-07-25 17:54:32 +02:00
Aurelien DARRAGON
75e480d107 MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct
Between 3.2 and 3.3-dev we noticed a noticeable performance regression
due to stats handling. After bisecting, Willy found out that recent
work to split stats computing accross multiple thread groups (stats
sharding) was responsible for that performance regression. We're looking
at roughly 20% performance loss.

More precisely, it is the added indirections, multiplied by the number
of statistics that are updated for each request, which in the end causes
a significant amount of time being spent resolving pointers.

We noticed that the fe_counters_shared and be_counters_shared structures
which are currently allocated in dedicated memory since a0dcab5c
("MAJOR: counters: add shared counters base infrastructure")
are no longer huge since 16eb0fab31 ("MAJOR: counters: dispatch counters
over thread groups") because they now essentially hold flags plus the
per-thread group id pointer mapping, not the counters themselves.

As such we decided to try merging fe_counters_shared and
be_counters_shared in their parent structures. The cost is slight memory
overhead for the parent structure, but it allows to get rid of one
pointer indirection. This patch alone yields visible performance gains
and almost restores 3.2 stats performance.

counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and
now returns either failure or success instead of a pointer because we
don't need to retrieve a shared pointer anymore, the function takes care
of initializing existing pointer.
2025-07-25 16:46:10 +02:00
Christopher Faulet
b8d5307bd9 MEDIUM: applet: Emit a warning when a legacy applet is spawned
To motivate developers to support the new applets API, a warning is now
emitted when a legacy applet is spawned. To not flood users, this warning is
only emitted once per legacy applet. To do so, the applet flag
APPLET_FL_WARNED was added. It is set when the warning is emitted.

Note that test and set on this flag are not performed via atomic operations.
So it is possible to have more than one warning for a given applet if it is
spawned in same time on several threads. At worrst, there is one warning per
thread.
2025-07-25 15:53:33 +02:00
Christopher Faulet
337768656b MINOR: applet: Add support for flags on applets with a flag about the new API
A new field was added in the applet structure to be able to set flags on the
applets The first one is related to the new API. APPLET_FL_NEW_API is set
for applets based on the new API. It was set on all HAProxy's applets.
2025-07-25 15:44:02 +02:00
Christopher Faulet
1f9a1cbefc MINOR: applet: Improve applet API to take care of inbuf/outbuf alloc failures
applet_get_inbuf() and applet_get_outbuf() functions were not testing if the
buffers were available. So, the caller had to check them before calling one
of these functions. It is not really handy. So now, these functions take
care to have a fully usable buffer before returning. Otherwise NULL is
returned.
2025-07-24 12:13:41 +02:00
Christopher Faulet
44aae94ab9 MINOR: applet: Add HTX versions for applet_input_data() and applet_output_room()
It will be useful for HTX applets because availale data in the input buffer and
available space in the output buffer are computed from the HTX message and not
the buffer itself. So now, applet_htx_input_data() and applet_htx_output_room()
functions can be used.
2025-07-24 12:13:41 +02:00
Christopher Faulet
d9855102cf BUG/MEDIUM: Remove sync sends from streams to applets
When the applet API was reviewed to use dedicated buffers, the support for
sends from the streams to applets was added. Unfortunately, it was not a
good idea because this way it is possible to deliver data to an applet and
release it just after, truncated data. Indeed, the release stage for applets
is related to the stream release itself. However, unlike the multiplexers,
the applets cannot survive to a stream for now.

So, for now, the sync sends from the streams is removed for applets, waiting
for a better way to handle the applets release stage.

Note that this only concerns applets using their own buffers. And of now,
the bug is harmless because all refactored applets are on server side and
consume data first. But this will be an issue with the HTTP client.

This patch should be backported as far as 3.0 after a period of observation.
2025-07-24 12:13:41 +02:00
Christopher Faulet
574d0d8211 BUG/MINOR: applet: Fix applet_getword() to not return one extra byte
applet_getword() function is returning one extra byte when a string is
returned because the "ret" variable is not reset before the loop on the
data. The patch also fixes applet_getline().

It is a 3.3-specific issue. No need to backport.
2025-07-24 12:13:41 +02:00
Christopher Faulet
41a40680ce BUG/MEDIUM: stconn: Fix conditions to know an applet can get data from stream
sc_is_send_allowed() function is used to know if an applet is able to
receive data from the stream. But this function was designed for applets
using the channels buffer. It is not adapted to applets using their own
buffers.

when the SE_FL_WAIT_DATA flag is set, it means the applet is waiting for
more data and should not be woken up without new data. For applets using
channels buffer, just testing the flag is enough because process_stream()
will remove if when more data will be available. For applets using their own
buffers, it is more complicated. Some data may be blocked in the output
channel buffer. In that case, and when the applet input buffer can receive
daa, the applet can be woken up.

This patch must be backported as far as 3.0 after a period of observation.
2025-07-24 12:13:41 +02:00
Christopher Faulet
0d371d2729 BUG/MEDIUM: applet: State inbuf is no longer full if input data are skipped
When data are skipped from the input buffer of an applet, we must take care
to notify the input buffer is no longer full. Otherwise, this could prevent
the stream to push data to the applet.

It is 3.3-specific. No backport needed.
2025-07-24 12:13:41 +02:00
Ilia Shipitsin
a2267fafcf CLEANUP: acme: fix wrong spelling of "resources"
"ressources" was used as a variable name, let's use English variant
to make spell check happier
2025-07-24 08:11:42 +02:00
Amaury Denoyelle
3bf37596ba MINOR: mux-quic: store session in QCS instance
Add a new <sess> member into QCS structure. It is used to store the
parent session of the stream on attach operation. This is only done for
backend side.

This new member will become necessary when connection reuse will be
implemented. <owner> member of connection is not suitable as it could be
set to NULL, notably after a session_add_conn() failure.

Also, a single BE conn can be shared along different session instance,
in particular when using aggressive/always reuse mode. Thus it is
necessary to linked each QCS instance with its session.
2025-07-23 15:42:37 +02:00
Remi Tricot-Le Breton
8f2b787241 MINOR: ssl: Add curves in ssl traces
Dump the ClientHello curves in the SSL traces.
2025-07-21 16:44:50 +02:00
Remi Tricot-Le Breton
d799a1b3b2 MINOR: ssl: Add curve id to curve name table and mapping functions
The SSL libraries like OpenSSL for instance do not seem to actually
provide a public mapping between IANA defined curve IDs and curve names,
or even a mapping between curve IDs and internal NIDs.
This new table regroups all those information in a single table so that
we can convert curve names (be it SECG or NIST format) to curve IDs or
NIDs.
The previously existing 'curves2nid' function now uses the new table,
and a new 'curveid2str' one is added.
2025-07-21 16:44:50 +02:00
Remi Tricot-Le Breton
f00d9bf12d MINOR: ssl: Add ciphers in ssl traces
Decode the contents of the ClientHello ciphers extension and dump a
human readable list in the ssl traces.
2025-07-21 16:44:50 +02:00
Frederic Lecaille
14d0f74052 MINOR: quic: Remove pool_head_quic_be_cc_buf pool
This patch impacts the QUIC frontends. It reverts this patch

    MINOR: quic-be: add a "CC connection" backend TX buffer pool

which adds <pool_head_quic_be_cc_buf> new pool to allocate CC (connection closed state)
TX buffers with bigger object size than the one for <pool_head_quic_cc_buf>.
Indeed the QUIC backends must be able to send at least 1200 bytes Initial packets.

For now on, both the QUIC frontends and backend use the same pool with
MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU)(1252 bytes) as object size.
2025-07-17 19:33:21 +02:00
Valentine Krasnobaeva
9e11c852fe MINOR: cpu-topo: write thread-cpu bindings into trash buffer
Write thread-cpu bindings and cluster summary into provided trash buffer.
Like this we can call this function in any place, when this info is needed.
2025-07-17 19:07:58 +02:00
Valentine Krasnobaeva
2405283230 MINOR: cpu-topo: split cpu_dump_topology() to show its summary in show dev
cpu_dump_topology() prints details about each enabled CPU and a summary with
clusters info and thread-cpu bindings. The latter is often usefull for
debugging and we want to add it in the 'show dev' output.

So, let's split cpu_dump_topology() in two parts: cpu_topo_debug() to print the
details about each enabled CPU; and cpu_topo_dump_summary() to print only the
summary.

In the next commit we will modify cpu_topo_dump_summary() to write into local
trash buffer and it could be easily called from debug_parse_cli_show_dev().
2025-07-17 19:07:46 +02:00
Willy Tarreau
b6d0ecd258 DOC: connection: explain the rules for idle/safe/avail connections
It's super difficult to find the rules that operate idle conns depending
on their idle/safe/avail/private status. Some are in lists, others not.
Some are in trees, others not. Some have a flag set, others not. This
documents the rules before the definitions in connection-t.h. It could
even be backported to help during backport sessions.
2025-07-16 18:53:57 +02:00
Frederic Lecaille
838024e07e MINOR: quic: Get rid of qc_is_listener()
Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to
objt_listener() (resp. objt_server()).
Remove qc_is_listener() implement and QUIC_FL_CONN_LISTENER the flag it
relied on.
2025-07-16 16:42:21 +02:00
Christopher Faulet
4f7c26cbb3 BUG/MINOR: applet: Don't trigger BUG_ON if the tid is not on appctx init
When an appctx is initialized, there is a BUG_ON() to be sure the appctx is
really initialized on the right thread to avoid bugs on the thread
affinity. However, it is possible to not choose the thread when the appctx
is created and let it starts on any thread. In that case, the thread
affinity is set when the appctx is initialized. So, we must take cate to not
trigger the BUG_ON() in that case.

For now, we never hit the bug because the thread affinity is always set
during the appctx creation.

This patch must be backport as far as 2.8.
2025-07-16 13:47:33 +02:00
Amaury Denoyelle
63586a8ab4 BUG/MINOR: h3: properly handle interim response on BE side
On backend side, H3 layer is responsible to decode a HTTP/3 response
into an HTX message. Multiple responses may be received on a single
stream with interim status codes prior to the final one.

h3_resp_headers_to_htx() is the function used solely on backend side
responsible for H3 response to HTX transcoding. This patch extends it to
be able to properly support interim responses. When such a response is
received, the new flag H3_SF_RECV_INTERIM is set. This is converted to
QMUX qcs flag QC_SF_EOI_SUSPENDED.

The objective of this latter flag is to prevent stream EOI to be
reported during stream rcv_buf callback, even if HTX message contains
EOM and is empty. QC_SF_EOI_SUSPENDED will be cleared when the final
response is finally converted, which unblock stream EOI notification for
next rcv_buf invocations. Note however that HTX EOM is untouched : it is
always set for both interim and final response reception.

As a minor adjustment, HTX_SL_F_BODYLESS is always set for interim
responses.

Contrary to frontend interim response handling, a flag is necessary on
QMUX layer. This is because H3 to HTX transcoding and rcv_buf callback
are two distinct operations, called under different context (MUX vs
stream tasklet).

Also note that H3 layer has two distinct flags for interim response
handling, one only used as a server (FE side) and the other as a client
(BE side). It was preferred to used two distinct flags which is
considered less error-prone, contrary to a single unified flag which
would require to always set the proxy side to ensure it is relevant or
not.

No need to backport.
2025-07-15 18:39:23 +02:00
Amaury Denoyelle
f349df44b4 MINOR: qmux: change API for snd_buf FIN transmission
Previous patches have fixes interim response encoding via
h3_resp_headers_send(). However, it is still necessary to adjust h3
layer state-machine so that several successive HTTP responses are
accepted for a single stream.

Prior to this, QMUX was responsible to decree that the final HTX message
was encoded so that FIN stream can be emitted. However, with interim
response, MUX is in fact unable to properly determine this. As such,
this is the responsibility of the application protocol layer. To reflect
this, app_ops snd_buf callback is modified so that a new output argument
<fin> is added to it.

Note that for now this commit does not bring any functional change.
However, it will be necessary for the following patch. As such, it
should be backported prior to it to every versions as necessary.
2025-07-15 18:39:23 +02:00
Willy Tarreau
4ac28f07d0 MEDIUM: proxy: take the defsrv out of the struct proxy
The server struct has gone huge over time (~3.8kB), and having a copy
of it in the defsrv section of the struct proxy costs a lot of RAM,
that is not needed anymore at run time.

This patch replaces this struct with a dynamically allocated one. The
field is allocated and initialized during alloc_new_proxy() and is
freed when the proxy is destroyed for now. But the goal will be to
support freeing it after parsing the section.
2025-07-15 10:34:18 +02:00
Willy Tarreau
616c10f608 CLEANUP: server: add server_find_by_addr()
Server lookup by address requires locking and manipulation of the tree
from user code. Let's provide server_find_by_addr() which does that for
us.
2025-07-15 10:30:28 +02:00
Willy Tarreau
fda04994d9 CLEANUP: server: simplify server_find_by_id()
At a few places we're seeing some open-coding of the same function, likely
because it looks overkill for what it's supposed to do, due to extraneous
tests that are not needed (e.g. check of the backend's PR_CAP_BE etc).
Let's just remove all these superfluous tests and inline it so that it
feels more suitable for use everywhere it's needed.
2025-07-15 10:30:28 +02:00
Willy Tarreau
61acd15ea8 CLEANUP: server: rename findserver() to server_find_by_name()
Now it's more logical and matches what is done in the rest of these
functions. server_find() now relies on it.
2025-07-15 10:30:28 +02:00
Willy Tarreau
6ad9285796 CLEANUP: server: rename server_find_by_name() to server_find()
This function doesn't just look at the name but also the ID when the
argument starts with a '#'. So the name is not correct and explains
why this function is not always used when the name only is needed,
and why the list-based findserver() is used instead. So let's just
call the function "server_find()", and rename its generation-id based
cousin "server_find_unique()".
2025-07-15 10:30:28 +02:00
Willy Tarreau
5e78ab33cd MINOR: server: use the tree to look up the server name in findserver()
Let's just use the tree-based lookup instead of walking through the list.
This function is used to find duplicates in "track" statements and a few
such places, so it's important not to waste too much time on large setups.
2025-07-15 10:30:27 +02:00
Willy Tarreau
12a6a3bb3f REORG: server: move findserver() from proxy.c to server.c
The reason this function was overlooked is that it had mostly equivalent
ones in server.c, let's move them together.
2025-07-15 10:30:27 +02:00
Valentine Krasnobaeva
0c63883be1 MINOR: debug: add distro name and version in postmortem
Since 2012, systemd compliant distributions contain
/etc/os-release file. This file has some standardized format, see details at
https://www.freedesktop.org/software/systemd/man/latest/os-release.html.

Let's read it in feed_post_mortem_linux() to gather more info about the
distribution.

(cherry picked from commit f1594c41368baf8f60737b229e4359fa7e1289a9)
Signed-off-by: Willy Tarreau <w@1wt.eu>
2025-07-11 11:48:19 +02:00
Ilia Shipitsin
0ee3d739b8 CLEANUP: assorted typo fixes in the code, commits and doc
Corrected various spelling and phrasing errors to improve clarity and consistency.
2025-07-10 19:49:48 +02:00
Christopher Faulet
187ae28cf4 MINOR: h1-htx: Add function to format an HTX message in its H1 representation
The function h1_format_htx_msg() can now be used to convert a valid HTX
message in its H1 representation. No validity test is performed, the HTX
message must be valid. Only trailers are silently ignored if the message is
not chunked. In addition, the destination buffer must be empty. 1XX interim
responses should be supported. But again, there is no validity tests.
2025-07-10 10:29:49 +02:00
Christopher Faulet
25b0625d5c BUG/MEDIUM: http-client: Drain the request if an early response is received
When a large request is sent, it is possible to have a response before the
end of the request. It is valid from HTTP perspective but it is an issue
with the current design of the http-client. Indded, the request and the
response are handled sequentially. So the response will be blocked, waiting
for the end of the request. Most of time, it is not an issue, except when
the request transfer is blocked. In that case, the applet is blocked.

With the current API, it is not possible to handle early response and
continue the request transfer. So, this case cannot be handle. In that case,
it seems reasonnable to drain the request if a response is received. This
way, the request transfer, from the caller point of view, is never blocked
and the response can be properly processed.

To do so, the action flag HTTPCLIENT_FA_DRAIN_REQ is added to the
http-client. When it is set, the request payload is just dropped. In that
case, we take care to not report the end of input to properly report the
request was truncated, especially in logs.

It is only an issue with large POSTs, when the payload is streamed.

This patch must be backported as far as 2.6.
2025-07-09 16:27:24 +02:00
Frederic Lecaille
45ac235baa BUG/MEDIUM: quic: Crash after QUIC server callbacks restoration (OpenSSL 3.5)
Revert this patch which is no more useful since OpenSSL 3.5.1 to remove the
QUIC server callback restoration after SSL context switch:

    MINOR: quic: OpenSSL 3.5 internal QUIC custom extension for transport parameters reset

It was required for 3.5.0. That said, there was no CI for OpenSSL 3.5 at the date
of this commit. The CI recently revealed that the QUIC server side could crash
during QUIC reg tests just after having restored the callbacks as implemented by
the commit above.

Also revert this commit which is no more useful because it arrived with the commit
above:

	BUG/MEDIUM: quic: SSL/TCP handshake failures with OpenSSL 3.

Must be backported to 3.2.
2025-07-09 16:01:02 +02:00
Frederic Lecaille
c01eb1040e MINOR: quic: Prevent QUIC build with OpenSSL 3.5 new QUIC API version < 3.5.1
The QUIC listener part was impacted by the 3.5.0 OpenSSL new QUIC API with several
issues which have been fixed by 3.5.1.

Add a #error to prevent such OpenSSL 3.5 new QUIC API use with version below 3.5.1.

Must be backported to 3.2.
2025-07-09 16:01:02 +02:00
Willy Tarreau
95cf518bfa BUG/MINOR: resolvers: don't lower the case of binary DNS format
The server's "hostname_dn" is in Domain Name format, not a pure string, as
converted by resolv_str_to_dn_label(). It is made of lower-case string
components delimited by binary lengths, e.g. <0x03>www<0x07>haproxy<0x03)org.
As such it must not be lowercased again in srv_state_srv_update(), because
1) it's useless on the name components since already done, and 2) because
it would replace component lengths 97 and above by 32-char shorter ones.
Granted, not many domain names have that large components so the risk is
very low but the operation is always wrong anyway. This was brought in
2.5 by commit 3406766d57 ("MEDIUM: resolvers: add a ref between servers
and srv request or used SRV record").

In the same vein, let's fix the confusing strcasecmp() that are applied
to this binary format, and use memcmp() instead. Here there's basically
no risk to incorrectly match the wrong record, but that test alone is
confusing enough to provoke the existence of the bug above.

Finally let's update the component for that field to mention that it's
in this format and already lower cased.

Better not backport this, the risk of facing this bug is almost zero, and
every time we touch such files something breaks for bad reasons.
2025-07-08 07:54:45 +02:00
Frederic Lecaille
5a87f4673a MINOR: quic: Prevent QUIC backend use with the OpenSSL QUIC compatibility module (USE_OPENSS_COMPAT)
Make the server line parsing fail when a QUIC backend is configured  if haproxy
is built to use the OpenSSL stack compatibility module. This latter does not
support the QUIC client part.
2025-07-07 14:13:02 +02:00
Frederic Lecaille
6aebca7f2c BUG/MINOR: quic: Missing TLS 1.3 QUIC cipher suites and groups inits (OpenSSL 3.5 QUIC API)
This bug impacts both QUIC backends and frontends with OpenSSL 3.5 as QUIC API.

The connections to a haproxy QUIC listener from a haproxy QUIC backend could not
work at all without HelloRetryRequest TLS messages emitted by the backend
asking the QUIC client to restart the handshake followed by TLS alerts:

    conn. @(nil) OpenSSL error[0xa000098] read_state_machine: excessive message size

Furthermore, the Initial CRYPTO data sent by the client were big (about two 1252 bytes
packets) (ClientHello TLS message). After analyzing the packets a key_share extension
with <unknown> as value was long (more that 1Ko). This extension is in relation with
the groups but does not belong to the groups supported by QUIC.

That said such connections could work with ngtcp2 as backend built against the same
OSSL TLS stack API but with a HelloRetryRequest.

ngtcp2 always set the QUIC default cipher suites and group, for all the stacks it
supports as implemented by this patch.

So this patch configures both QUIC backend and frontend cipher suites and groups
calling SSL_CTX_set_ciphersuites() and SSL_CTX_set1_groups_list() with the correct
argument, except for SSL_CTX_set1_groups_list() which fails with QUIC TLS for
a unknown reason at this time.

The call to SSL_CTX_set_options() is useless from ssl_quic_initial_ctx() for the QUIC
clients. One relies on ssl_sock_prepare_srv_ssl_ctx() to set them for now on.

This patch is effective for all the supported stacks without impact for AWS-LC,
and QUIC TLS and fixes the connections for haproxy QUIC frontend and backends
when builts against OpenSSL 3.5 QUIC API).

A new define HAVE_OPENSSL_QUICTLS has been added to openssl-compat.h to distinguish
the QUIC TLS stack.

Must be backported to 3.2.
2025-07-07 14:13:02 +02:00
Willy Tarreau
573143e0c8 MINOR: pattern: add a counter of added/freed patterns
Patterns are allocated when loading maps/acls from a file or dynamically
via the CLI, and are released only from the CLI (e.g. "clear map xxx").
These ones do not use pools and are much harder to monitor, e.g. in case
a script adds many and forgets to clear them, etc.

Let's add a new pair of metrics "PatternsAdded" and "PatternsFreed" that
will report the number of added and freed patterns respectively. This
can allow to simply graph both. The difference between the two normally
represents the number of allocated patterns. If Added grows without
Freed following, it can indicate a faulty script that doesn't perform
the needed cleanup. The metrics are also made available to Prometheus
as patterns_added_total and patterns_freed_total respectively.
2025-07-05 00:12:45 +02:00
Remi Tricot-Le Breton
a075d6928a CLEANUP: ssl: Rename ssl_trace-t.h to ssl_trace.h
This header does not actually contain any structures so it's best to
remove the '-t' from the name for better consistency.
2025-07-04 15:21:50 +02:00
Christopher Faulet
5232df57ab MINOR: proto-tcp: Add support for TCP MD5 signature for listeners and servers
This patch adds the support for the RFC2385 (Protection of BGP Sessions via
the + TCP MD5 Signature Option) for the listeners and the servers. The
feature is only available on Linux. Keywords are not exposed otherwise.

By setting "tcp-md5sig <password>" option on a bind line, TCP segments of
all connections instantiated from the listening socket will be signed with a
16-byte MD5 digest. The same option can be set on a server line to protect
outgoing connections to the corresponding server.

The primary use case for this option is to allow BGP to protect itself
against the introduction of spoofed TCP segments into the connection
stream. But it can be useful for any very long-lived TCP connections.

A reg-test was added and it will be executed only on linux. All other
targets are excluded.
2025-07-03 15:25:40 +02:00
William Lallemand
3e05e20029 MEDIUM: httpclient: implement a way to use directly htx data
Add a HTTPCLIENT_O_RES_HTX flag which allow to store directly the HTX
data in the response buffer instead of extracting the data in raw
format.

This is useful when the data need to be reused in another request.
2025-07-01 16:31:47 +02:00
William Lallemand
2f4219ed68 MEDIUM: httpclient: split the CLI from the actual httpclient API
This patch split the httpclient code to prevent confusion between the
httpclient CLI command and the actual httpclient API.

Indeed there was a confusion between the flag used internally by the
CLI command, and the actual httpclient API.

hc_cli_* functions as well as HC_C_F_* defines were moved to
httpclient_cli.c.
2025-07-01 15:46:04 +02:00
William Lallemand
519abefb57 BUG/MINOR: httpclient: wrongly named httpproxy flag
The HC_F_HTTPPROXY flag was wrongly named and does not use the correct
value, indeed this flag was meant to be used for the httpclient API, not
the httpclient CLI.

This patch fixes the problem by introducing HTTPCLIENT_FO_HTTPPROXY
which has must be set in hc->flags.

Also add a member 'options' in the httpclient structure, because the
member flags is reinitialized when starting.

Must be backported as far as 3.0.
2025-07-01 14:47:52 +02:00
Aurelien DARRAGON
747a812066 MEDIUM: stats: add persistent state to typed output format
Add a fourth character to the second column of the "typed output format"
to indicate whether the value results from a volatile or persistent metric
('V' or 'P' characters respectively). A persistent metric means the value
could possibily be preserved across reloads by leveraging a shared memory
between multiple co-processes. Such metrics are identified as "shared" in
the code (since they are possibly shared between multiple co-processes)

Some reg-tests were updated to take that change into account, also, some
outputs in the configuration manual were updated to reflect current
behavior.
2025-07-01 14:15:03 +02:00
Remi Tricot-Le Breton
522bca98e1 MAJOR: jwt: Allow certificate instead of public key in jwt_verify converter
The 'jwt_verify' converter could only be passed public keys as second
parameter instead of full-on public certificates. This patch allows
proper certificates to be used.
Those certificates can be loaded in ckch_stores like any other
certificate which means that all the certificate-related operations that
can be made via the CLI can now benefit JWT validation as well.

We now have two ways JWT validation can work, the legacy one which only
relies on public keys which could not be stored in ckch_stores without
some in depth changes in the way the ckch_stores are built. In this
legacy way, the public keys are fully stored in a cache dedicated to JWT
only which does not have any CLI commands and any way to update them
during runtime. It also requires that all the public keys used are
passed at least once explicitely to the 'jwt_verify' converter so that
they can be loaded during init.
The new way uses actual certificates, either already stored in the
ckch_store tree (if predefined in a crt-store or already used previously
in the configuration) or loaded in the ckch_store tree during init if
they are explicitely used in the configuration like so:
    var(txn.bearer),jwt_verify(txn.jwt_alg,"cert.pem")

When using a variable (or any other way that can only be resolved during
runtime) in place of the converter's <key> parameter, the first time we
encounter a new value (for which we don't have any entry in the jwt
tree) we will lock the ckch_store tree and try to perform a lookup in
it. If the lookup fails, an entry will still be inserted into the jwt
tree so that any following call with this value avoids performing the
ckch_store tree lookup.
2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton
cd89ce1766 MINOR: jwt: Rename pkey to pubkey in jwt_cert_tree_entry struct
Rename the jwt_cert_tree_entry member pkey to pubkey to avoid any
confusion between private and public key.
2025-06-30 17:59:55 +02:00
Christopher Faulet
a2a142bf40 BUG/MEDIUM: hlua: Forbid any L6/L7 sample fetche functions from lua services
It was already forbidden to use HTTP sample fetch functions from lua
services. An error is triggered if it happens. However, the error must be
extended to any L6/L7 sample fetch functions.

Indeed, a lua service is an applet. It totally unexepected for an applet to
access to input data in a channel's buffer. These data have not been
analyzed yet and are still subject to any change. An applet, lua or not,
must never access to "not forwarded" data. Only output data are
available. For now, if a lua applet relies on any L6/L7 sampel fetch
functions, the behavior is undefined and not consistent.

So to fix the issue, hlua flag HLUA_F_MAY_USE_HTTP is renamed to
HLUA_F_MAY_USE_CHANNELS_DATA. This flag is used to prevent any lua applet to
use L6/L7 sample fetch functions.

This patch could be backported to all stable versions.
2025-06-30 16:47:59 +02:00
Aurelien DARRAGON
4fcc9b5572 MINOR: counters: rename last_change counter to last_state_change
Since proxy and server struct already have an internal last_change
variable and we cannot merge it with the shared counter one, let's
rename the last_change counter to be more specific and prevent the
mixup between the two.

last_change counter is renamed to last_state_change, and unlike the
internal last_change, this one is a shared counter so it is expected
to be updated by other processes in our back.

However, when updating last_state_change counter, we use the value
of the server/proxy last_change as reference value.
2025-06-30 16:26:38 +02:00
Aurelien DARRAGON
5b1480c9d4 MEDIUM: proxy: add and use a separate last_change variable for internal use
Same motivation as previous commit, proxy last_change is "abused" because
it is used for 2 different purposes, one for stats, and the other one
for process-local internal use.

Let's add a separate proxy-only last_change variable for internal use,
and leave the last_change shared (and thread-grouped) counter for
statistics.
2025-06-30 16:26:31 +02:00
Aurelien DARRAGON
01dfe17acf MEDIUM: server: add and use a separate last_change variable for internal use
last_change server metric is used for 2 separate purposes. First it is
used to report last server state change date for stats and other related
metrics. But it is also used internally, including in sensitive paths,
such as lb related stuff to take decision or perform computations
(ie: in srv_dynamic_maxconn()).

Due to last_change counter now being split over thread groups since 16eb0fa
("MAJOR: counters: dispatch counters over thread groups"), reading the
aggregated value has a cost, and we cannot afford to consult last_change
value from srv_dynamic_maxconn() anymore. Moreover, since the value is
used to take decision for the current process we don't wan't the variable
to be updated by another process in our back.

To prevent performance regression and sharing issues, let's instead add a
separate srv->last_change value, which is not updated atomically (given how
rare the  updates are), and only serves for places where the use of the
aggregated last_change counter/stats (split over thread groups) is too
costly.
2025-06-30 16:26:25 +02:00
Aurelien DARRAGON
837762e2ee MINOR: mailers: warn if mailers are configured but not actually used
Now that native mailers configuration is only usable with Lua mailers,
Willy noticed that we lack a way to warn the user if mailers were
previously configured on an older version but Lua mailers were not loaded,
which could trick the user into thinking mailers keep working when
transitionning to 3.2 while it is not.

In this patch we add the 'core.use_native_mailers_config()' Lua function
which should be called in Lua script body before making use of
'Proxy:get_mailers()' function to retrieve legacy mailers configuration
from haproxy main config. This way haproxy effectively knows that the
native mailers config is actually being used from Lua (which indicates
user correctly migrated from native mailers to Lua mailers), else if
mailers are configured but not used from Lua then haproxy warns the user
about the fact that they will be ignored unless they are used from Lua.
(e.g.: using the provided 'examples/lua/mailers.lua' to ease transition)
2025-06-27 16:41:18 +02:00
Frederic Lecaille
194e3bc2d5 MINOR: quic-be: address validation support implementation (RETRY)
- Add ->retry_token and ->retry_token_len new quic_conn struct members to store
  the retry tokens. These objects are allocated by quic_rx_packet_parse() and
  released by quic_conn_release().
- Add <pool_head_quic_retry_token> new pool for these tokens.
- Implement quic_retry_packet_check() to check the integrity tag of these tokens
  upon RETRY packets receipt. quic_tls_generate_retry_integrity_tag() is called
  by this new function. It has been modified to pass the address where the tag
  must be generated
- Add <resend> new parameter to quic_pktns_discard(). This function is called
  to discard the packet number spaces where the already TX packets and frames are
  attached to. <resend> allows the caller to prevent this function to release
  the in flight TX packets/frames. The frames are requeued to be resent.
- Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon
  such packets receipt is:
  - store the retry token,
  - store the new peer SCID as the DCID of the connection. Note that the peer will
    modify again its SCID. This is why this SCID is also stored as the ODCID
    which must be matched with the peer retry_source_connection_id transport parameter,
  - discard the Initial packet number space without flagging it as discarded and
    prevent retransmissions calling qc_set_timer(),
  - modify the TLS cryptographic cipher contexts (RX/TX),
  - wakeup the I/O handler to send new Initial packets asap.
- Modify quic_transport_param_decode() to handle the retry_source_connection_id
  transport parameter as a QUIC client. Then its caller is modified to
  check this transport parameter matches with the SCID sent by the peer with
  the RETRY packet.
2025-06-26 09:48:00 +02:00
Frederic Lecaille
9cb2acd2f2 MINOR: quic-be: add a "CC connection" backend TX buffer pool
A QUIC client must be able to close a connection sending Initial packets. But
QUIC client Initial packets must always be at least 1200 bytes long. To reduce
the memory use of TX buffers of a connection when in "closing" state, a pool
was dedicated for this purpose but with a too much reduced TX buffer size
(QUIC_MAX_CC_BUFSIZE).

This patch adds a "closing state connection" TX buffer pool with the same role
for QUIC backends.
2025-06-26 09:48:00 +02:00
William Lallemand
7cb6167d04 MAJOR: mworker: remove program section support
This patch removes completely the support for the program section, the
parsing of the section as well as the internals in the mworker does not
support it anymore.

The program section was considered dysfonctional and not fully
compatible with the "mworker V3" model. Users that want to run an
external program must use their init system.

The documentation is cleaned up in another patch.
2025-06-25 16:11:34 +02:00
Remi Tricot-Le Breton
34fc73ba81 MINOR: ssl: Add "renegotiate" server option
This "renegotiate" option can be set on SSL backends to allow secure
renegotiation. It is mostly useful with SSL libraries that disable
secure regotiation by default (such as AWS-LC).
The "no-renegotiate" one can be used the other way around, to disable
secure renegotation that could be allowed by default.
Those two options can be set via "ssl-default-server-options" as well.
2025-06-25 15:23:48 +02:00
Aurelien DARRAGON
5694a98744 MAJOR: mailers: remove native mailers support
As mentioned in 2.8 announce on the mailing list [1] and on the wiki [2]
native mailers were deprecated and planned for removal in 3.3. Now is
the time to drop the legacy code for native mailers which is based on a
tcpcheck "hack" and cannot be maintained. Lua mailers should be used as
a drop in replacement. Indeed, "mailers" and associated config directives
are preserved because mailers config is exposed to Lua, which helps smoothing
the transition from native mailers to Lua based ones.

As a reminder, to keep mailers configuration working as before without
making changes to the config file, simply add the line below to the global
section:

       lua-load examples/lua/mailers.lua

mailers.lua script (provided in the git repository, adjust path as needed)
may be customized by users familiar with Lua, by default it emulates the
behavior of the native (now removed) mailers.

[1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html
[2]: https://github.com/haproxy/wiki/wiki/Breaking-changes
2025-06-24 10:55:58 +02:00
Aurelien DARRAGON
c0f6024854 MINOR: hlua: emit a log instead of an alert for aborted actions due to unavailable yield
As reported by Chris Staite in GH #3002, trying to yield from a Lua
action during a client disconnect causes the script to be interrupted
(which is expected) and an alert to be emitted with the error:
"Lua function '%s': yield not allowed".

While this error is well suited for cases where the yield is not expected
at all (ie: when context doesn't allow it) and results from a yield misuse
in the Lua script, it isn't the case when the yield is exceptionnally not
available due to an abort or error in the request/response processing.
Because of that we raise an alert but the user cannot do anything about it
(the script is correct), so it is confusing and polluting the logs.

In this patch we introduce the ACT_OPT_FINAL_EARLY flag which is a
complementary flag to ACT_OPT_FIRST. This flag is set when the
ACT_OPT_FIRST is set earlier than normal (due to error/abort).
hlua_action() then checks for this flag to decide whether an error (alert)
or a simple log message should be emitted when the yield is not available.

It should solve GH #3002. Thanks to Chris Staite (@chrisstaite-menlo) for
having reported the issue and suggested a solution.
2025-06-24 10:55:55 +02:00
Amaury Denoyelle
74b95922ef BUG/MEDIUM: quic: do not release BE quic-conn prior to upper conn
For frontend side, quic_conn is only released if MUX wasn't allocated,
either due to handshake abort, in which case upper layer is never
allocated, or after transfer completion when full conn + MUX layers are
already released.

On the backend side, initialization is not performed in the same order.
Indeed, in this case, connection is first instantiated, the nthe
quic_conn is created to execute the handshake, while MUX is still only
allocated on handshake completion. As such, it is not possible anymore
to free immediately quic_conn on handshake failure. Else, this can cause
crash if the connection try to reaccess to its transport layer after
quic_conn release.

Such crash can easily be reproduced in case of connection error to the
QUIC server. Here is an example of an experienced backtrace.

Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault.
  0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28
  28              qc->conn = NULL;
  [ ## gdb ## ] bt
  #0  0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28
  #1  0x00005555559c9708 in conn_xprt_close (conn=0x55555734c0d0) at include/haproxy/connection.h:162
  #2  0x00005555559c97d2 in conn_full_close (conn=0x55555734c0d0) at include/haproxy/connection.h:206
  #3  0x00005555559d01a9 in sc_detach_endp (scp=0x7fffffffd648) at src/stconn.c:451
  #4  0x00005555559d05b9 in sc_reset_endp (sc=0x55555734bf00) at src/stconn.c:533
  #5  0x000055555598281d in back_handle_st_cer (s=0x55555734adb0) at src/backend.c:2754
  #6  0x000055555588158a in process_stream (t=0x55555734be10, context=0x55555734adb0, state=516) at src/stream.c:1907
  #7  0x0000555555dc31d9 in run_tasks_from_lists (budgets=0x7fffffffdb30) at src/task.c:655
  #8  0x0000555555dc3dd3 in process_runnable_tasks () at src/task.c:889
  #9  0x0000555555a1daae in run_poll_loop () at src/haproxy.c:2865
  #10 0x0000555555a1e20c in run_thread_poll_loop (data=0x5555569d1c00 <ha_thread_info>) at src/haproxy.c:3081
  #11 0x0000555555a1f66b in main (argc=5, argv=0x7fffffffde18) at src/haproxy.c:3671

To fix this, change the condition prior to calling quic_conn release. If
<conn> member is not NULL, delay the release, similarly to the case when
MUX is allocated. This allows connection to be freed first, and detach
from quic_conn layer through close xprt operation.

No need to backport.
2025-06-20 17:46:10 +02:00
Amaury Denoyelle
06cab99a0e MINOR: mux-quic: support max bidi streams value set by the peer
Implement support for MAX_STREAMS frame. On frontend, this was mostly
useless as haproxy would never initiate new bidirectional streams.
However, this becomes necessary to control stream flow-control when
using QUIC as a client on the backend side.

Parsing of MAX_STREAMS is implemented via new qcc_recv_max_streams().
This allows to update <ms_uni>/<ms_bidi> QCC fields.

This patch is necessary to achieve QUIC backend connection reuse.
2025-06-18 17:25:27 +02:00
Amaury Denoyelle
805a070ab9 BUG/MINOR: mux-quic/h3: properly handle too low peer fctl initial stream
Previously, no check on peer flow-control was implemented prior to open
a local QUIC stream. This was a small problem for frontend
implementation, as in this case haproxy as a server never opens
bidirectional streams.

On frontend, the only stream opened by haproxy in this case is for
HTTP/3 control unidirectional data. If the peer uses an initial value
for max uni streams set to 0, it would violate its flow control, and the
peer will probably close the connection. Note however that RFC 9114
mandates that each peer defines minimal initial value so that at least
the control stream can be created.

This commit improves the situation of too low initial max uni streams
value. Now, on HTTP/3 layer initialization, haproxy preemptively checks
flow control limit on streams via a new function
qcc_fctl_avail_streams(). If credit is already expired due to a too
small initial value, haproxy preemptively closes the connection using
H3_ERR_GENERAL_PROTOCOL_ERROR. This behavior is better as haproxy is now
the initiator of the connection closure.

This should be backported up to 2.8.
2025-06-18 17:18:55 +02:00
Amaury Denoyelle
c807182ec9 CLEANUP: connection: remove unused mux-ops dedicated to QUIC
Remove avail_streams_bidi/avail_streams_uni mux_ops. These callbacks
were designed to be specific to QUIC. However, they won't be necessary,
as stream layer only cares about bidirectional streams.
2025-06-18 17:02:50 +02:00
Amaury Denoyelle
555ec99d43 MINOR: h3: adjust auth request encoding or fallback to host
Implement proper encoding of HTTP/3 authority pseudo-header during
request transcoding on the backend side. A pseudo-header :authority is
encoded if a value can be extracted from HTX start-line. A special check
is also implemented to ensure that a host header is not encoded if
:authority already is.

A new function qpack_encode_auth() is defined to implement QPACK
encoding of :authority header using literal field line with name ref.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
235e818fa1 MINOR: h3: complete HTTP/3 request scheme encoding
Previously, scheme was always set to https when transcoding an HTX
start-line into a HTTP/3 request. Change this so this conversion is now
fully compliant.

If no scheme is specified by the client, which is what happens most of
the time with HTTP/1, https is set for the HTTP/3 request. Else, reuse
the scheme requested by the client.

If either https or http is set, qpack_encode_scheme will encode it using
entry from QPACK static table. Else, a full literal field line with name
ref is used instead as the scheme value is specified as-is.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
a0912cf914 MINOR: h3: complete HTTP/3 request method encoding
On the backend side, HTX start-line is converted into a HTTP/3 request
message. Previously, GET method was hardcoded. Implement proper method
conversion, by extracting it from the HTX start-line.

qpack_encode_method() has also been extended, so that it is able to
encode any method, either using a static table entry, or with a literal
field line with name ref representation.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
7157adb154 MINOR: h3: support basic HTX start-line conversion into HTTP/3 request
This commit is the first one of a serie which aim is to implement
transcoding of a HTX request into HTTP/3, which is necessary for QUIC
backend support.

Transcoding is implementing via a new function h3_req_headers_send()
when a HTX start-line is parsed. For now, most of the request fields are
hardcoded, using a GET method. This will be adjusted in the next
following patches.
2025-06-16 18:11:09 +02:00
Amaury Denoyelle
e8775d51df MINOR: mux-quic: define flag for backend side
Mux connection is flagged with new QC_CF_IS_BACK if used on the backend
side. For now the only change is during traces, to be able to
differentiate frontend and backend usage.
2025-06-12 11:28:54 +02:00
Amaury Denoyelle
93b904702f MINOR: mux-quic: improve documentation for snd/rcv app-ops
Complete document for rcv_buf/snd_buf operations. In particular, return
value is now explicitely defined. For H3 layer, associated functions
documentation is also extended.
2025-06-12 11:28:54 +02:00
Frederic Lecaille
b9703cf711 MINOR: quic-be: get rid of ->li quic_conn member
Replace ->li quic_conn pointer to struct listener member by  ->target which is
an object type enum and adapt the code.
Use __objt_(listener|server)() where the object type is known. Typically
this is were the code which is specific to one connection type (frontend/backend).
Remove <server> parameter passed to qc_new_conn(). It is redundant with the
<target> parameter.
GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified
to prevent it from building more than an MTU. This has as consequence to prevent
qc_send_ppkts() to use GSO.
ssl_clienthello.c code is run only by listeners. This is why __objt_listener()
is used in place of ->li.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
2d076178c6 MINOR: quic-be: Store asap the DCID
Store the peer connection ID (SCID) as the connection DCID as soon as an Initial
packet is received.
Stop comparing the packet to QUIC_PACKET_TYPE_0RTT is already match as
QUIC_PACKET_TYPE_INITIAL.
A QUIC server must not send too short datagram with ack-eliciting packets inside.
This cannot be done from quic_rx_pkt_parse() because one does not know if
there is ack-eliciting frame into the Initial packets. If the packet must be
dropped, this is after having parsed it!
2025-06-11 18:37:34 +02:00
Frederic Lecaille
43d88a44f1 MINOR: quic-be: Datagrams and packet parsing support
Modify quic_dgram_parse() to stop passing it a listener as third parameter.
In place the object type address of the connection socket owner is passed
to support the haproxy servers with QUIC as transport protocol.
qc_owner_obj_type() is implemented to return this address.
qc_counters() is also implemented to return the QUIC specific counters of
the proxy of owner of the connection.
quic_rx_pkt_parse() called by quic_dgram_parse() is also modify to use
the object type address used by this latter as last parameter. It is
also modified to send Retry packet only from listeners. A QUIC client
(connection to haproxy QUIC servers) must drop the Initial packets with
non null token length. It is also not supposed to receive O-RTT packets
which are dropped.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
89d5a59933 MINOR: quic-be: add field for max_udp_payload_size into quic_conn
Add ->max_udp_payload_size new member to quic_conn struct.
Initialize it from qc_new_conn().
Adapt qc_snd_buf() to use it.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
52ec3430f2 MINOR: sock: Add protocol and socket types parameters to sock_create_server_socket()
This patch only adds <proto_type> new proto_type enum parameter and <sock_type>
socket type parameter to sock_create_server_socket() and adapts its callers.
This is to prepare the use of this function by QUIC servers/backends.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
9c84f64652 MINOR: quic-be: Add a function to initialize the QUIC client transport parameters
Implement qc_srv_params_init() to initialize the QUIC client transport parameters
in relation with connections to haproxy servers/backends.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
f49bbd36b9 MINOR: quic-be: SSL sessions initializations
Modify qc_alloc_ssl_sock_ctx() to pass the connection object as parameter. It is
NULL for a QUIC listener, not NULL for a QUIC server. This connection object is
set as value for ->conn quic_conn struct member. Initialise the SSL session object from
this function for QUIC servers.
qc_ssl_set_quic_transport_params() is also modified to pass the SSL object as parameter.
This is the unique parameter this function needs. <qc> parameter is used only for
the trace.
SSL_do_handshake() must be calle as soon as the SSL object is initialized for
the QUIC backend connection. This triggers the TLS CRYPTO data delivery.
tasklet_wakeup() is also called to send asap these CRYPTO data.
Modify the QUIC_EV_CONN_NEW event trace to dump the potential errors returned by
SSL_do_handshake().
2025-06-11 18:37:34 +02:00
Frederic Lecaille
1408d94bc4 MINOR: quic-be: ssl_sock contexts allocation and misc adaptations
Implement ssl_sock_new_ssl_ctx() to allocate a SSL server context as this is currently
done for TCP servers and also for QUIC servers depending on the <is_quic> boolean value
passed as new parameter. For QUIC servers, this function calls ssl_quic_srv_new_ssl_ctx()
which is specific to QUIC.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
1e45690656 MINOR: quic-be: Add a function for the TLS context allocations
Implement ssl_quic_srv_new_ssl_ctx() whose aim is to allocate a TLS context
for QUIC servers.
2025-06-11 18:37:34 +02:00
Frederic Lecaille
24fc44c44d MINOR: quic-be: QUIC backend XPRT and transport parameters init during parsing
Add ->quic_params new member to server struct.
Also set the ->xprt member of the server being initialized and initialize asap its
transport parameters from _srv_parse_init().
2025-06-11 18:37:34 +02:00
Frederic Lecaille
990c9f95f7 MINOR: quic-be: Correct Version Information transp. param encoding
According to the RFC, a QUIC client must encode the QUIC version it supports
into the "Available Versions" of "Version Information" transport parameter
order by descending preference.

This is done defining <quic_version_2> and <quic_version_draft_29> new variables
pointers to the corresponding version of <quic_versions> array elements.
A client announces its available versions as follows: v1, v2, draft29.
2025-06-11 18:37:34 +02:00
Amaury Denoyelle
bdd5e58179 MINOR: server: implement helper to identify QUIC servers
Define srv_is_quic() which can be used to quickly identified if a server
uses QUIC protocol.
2025-06-11 18:37:19 +02:00
Olivier Houchard
6993981cd6 BUG/MEDIUM: fd: Use the provided tgid in fd_insert() to get tgroup_info
In fd_insert(), use the provided tgid to ghet the thread group info,
instead of using the one of the current thread, as we may call
fd_insert() from a thread of another thread group, that will happen at
least when binding the listeners. Otherwise we'd end up accessing the
thread mask containing enabled thread of the wrong thread group, which
can lead to crashes if we're binding on threads not present in the
thread group.
This should fix Github issue #2991.

This should be backported up to 2.8.
2025-06-10 15:10:56 +02:00
Christopher Faulet
18f9c71041 CLEANUP: applet: Simplify a bit comments for applet_put* functions
Instead of repeating which buffer is used depending on the API used by the
applet, a reference to applet_get_outbuf() was added.
2025-06-10 08:16:10 +02:00
Christopher Faulet
79445766a3 MINOR: applet: Add API functions to get data from the input buffer
There was already functions to pushed data from the applet to the stream by
inserting them in the right buffer, depending the applet was using or not
the legacy API. Here, functions to retreive data pushed to the applet by the
stream were added:

  * applet_getchar   : Gets one character

  * applet_getblk    : Copies a full block of data

  * applet_getword   : Copies one text block representing a word using a
                       custom separator as delimiter

  * applet_getline   : Copies one text line

  * applet_getblk_nc : Get one or two blocks of data

  * applet_getword_nc: Gets one or two blocks of text representing a word
                       using a custom separator as delimiter

  * applet_getline_nc: Gets one or two blocks of text representing a line
2025-06-10 08:16:10 +02:00
Christopher Faulet
0d8ecb1edc MINOR: applet: Add API functions to manipulate input and output buffers
In this patch, some functions were added to ease input and output buffers
manipulation, regardless the corresponding applet is using its own buffers
or it is relying on channels buffers. Following functions were added:

  * applet_get_inbuf  : Get the buffer containing data pushed to the applet
                        by the stream

  * applet_get_outbuf : Get the buffer containing data pushed by the applet
                        to the stream

  * applet_input_data : Return the amount of data in the input buffer

  * applet_skip_input : Skips <len> bytes from the input buffer

  * applet_reset_input: Skips all bytes from the input buffer

  * applet_output_room: Returns the amout of space available at the output
                        buffer

  * applet_need_room  : Indicates that the applet have more data to deliver
                        and it needs more room in the output buffer to do
			so
2025-06-10 08:16:10 +02:00
Aurelien DARRAGON
16eb0fab31 MAJOR: counters: dispatch counters over thread groups
Most fe and be counters are good candidates for being shared between
processes. They are now grouped inside "shared" struct sub member under
be_counters and fe_counters.

Now they are properly identified, they would greatly benefit from being
shared over thread groups to reduce the cost of atomic operations when
updating them. For this, we take the current tgid into account so each
thread group only updates its own counters. For this to work, it is
mandatory that the "shared" member from {fe,be}_counters is initialized
AFTER global.nbtgroups is known, because each shared counter causes the stat
to be allocated lobal.nbtgroups times. When updating a counter without
concurrency, the first counter from the array may be updated.

To consult the shared counters (which requires aggregation of per-tgid
individual counters), some helper functions were added to counter.h to
ease code maintenance and avoid computing errors.
2025-06-05 09:59:38 +02:00
Aurelien DARRAGON
12c3ffbb48 MINOR: counters: add local-only internal rates to compute some maxes
cps_max (max new connections received per second), sps_max (max new
sessions per second) and http.rps_max (maximum new http requests per
second) all rely on shared counters (namely conn_per_sec, sess_per_sec and
http.req_per_sec). The problem is that shared counters are about to be
distributed over thread groups, and we cannot afford to compute the
total (for all thread groups) each time we update the max counters.

Instead, since such max counters (relying on shared counters) are a very
few exceptions, let's add internal (sess,conn,req) per sec freq counters
that are dedicated to cps_max, sps_max and http.rps_max computing.

Thanks to that, related *_max counters shouldn't be negatively impacted
by the thread-group distribution, yet they will not benefit from it
either. Related internal freq counters are prefixed with "_" to emphasize
the fact that they should not be used for other purpose (the shared ones,
which are about to be distributed over thread groups in upcoming commits
are still available and must be used instead). The internal ones could
eventually be removed at any time if we find another way to compute the
{cps,sps,http.rps)_max counters.
2025-06-05 09:59:31 +02:00
Aurelien DARRAGON
b72a8bb138 CLEANUP: counters: merge some common counters between {fe,be}_counters_shared
Now that we have a common struct between fe and be shared counters struct
let's perform some cleanup to merge duplicate members into the common
struct part. This will ease code maintenance.
2025-06-05 09:59:24 +02:00
Aurelien DARRAGON
b599138842 MEDIUM: counters: manage shared counters using dedicated helpers
proxies, listeners and server shared counters are now managed via helpers
added in one of the previous commits.

When guid is not set (ie: when not yet assigned), shared counters pointer
is allocated using calloc() (local memory) and a flag is set on the shared
counters struct to know how to manipulate (and free it). Else if guid is
set, then it means that the counters may be shared so while for now we
don't actually use a shared memory location the API is ready for that.

The way it works, for proxies and servers (for which guid is not known
during creation), we first call counters_{fe,be}_shared_get with guid not
set, which results in local pointer being retrieved (as if we just
manually called calloc() to retrieve a pointer). Later (during postparsing)
if guid is set we try to upgrade the pointer from local to shared.

Lastly, since the memory location for some objects (proxies and servers
counters) may change from creation to postparsing, let's update
counters->last_change member directly under counters_{fe,be}_shared_get()
so we don't miss it.

No change of behavior is expected, this is only preparation work.
2025-06-05 09:59:17 +02:00
Aurelien DARRAGON
c10ce1c85b MINOR: counters: add common struct and flags to {fe,be}_counters_shared
fe_counters_shared and be_counters_shared may share some common members
since they are quite similar, so we add a common struct part shared
between the two. struct counters_shared is added for convenience as
a generic pointer to manipulate common members from fe or be shared
counters pointer.

Also, the first common member is added: shared fe and be counters now
have a flags member.
2025-06-05 09:59:10 +02:00
Aurelien DARRAGON
aa53887398 MINOR: counters: add shared counters helpers to get and drop shared pointers
create include/haproxy/counters.h and src/counters.c files to anticipate
for further helpers as some counters specific tasks needs to be carried
out and since counters are shared between multiple object types (ie:
listener, proxy, server..) we need generic helpers.

Add some shared counters helper which are not yet used but will be updated
in upcoming commits.
2025-06-05 09:59:04 +02:00
Aurelien DARRAGON
a0dcab5c45 MAJOR: counters: add shared counters base infrastructure
Shareable counters are not tagged as shared counters and are dynamically
allocated in separate memory area as a prerequisite for being stored
in shared memory area. For now, GUID and threads groups are not taken into
account, this is only a first step.

also we ensure all counters are now manipulated using atomic operations,
namely, "last_change" counter is now read from and written to using atomic
ops.

Despite the numerous changes caused by the counters being moved away from
counters struct, no change of behavior should be expected.
2025-06-05 09:58:58 +02:00
Christopher Faulet
8ee650a88b CLEANUP: applet: Update comment for applet_put* functions
These functions were copied from the channel API and modified to work with
applets using the new API or the legacy one. However, the comments were
updated accordingly. It is the purpose of this patch.
2025-06-03 15:03:30 +02:00
Aurelien DARRAGON
368d01361a MEDIUM: server: add and use srv_init() function
rename _srv_postparse() internal function to srv_init() function and group
srv_init_per_thr() plus idle conns list init inside it. This way we can
perform some simplifications as srv_init() performs multiple server
init steps after parsing.

SRV_F_CHECKED flag was added, it is automatically set when srv_init()
runs successfully. If the flag is already set and srv_init() is called
again, nothing is done. This permis to manually call srv_init() earlier
than the default POST_CHECK hook when needed without risking to do things
twice.
2025-06-02 17:51:33 +02:00
Aurelien DARRAGON
889ef6f67b MEDIUM: server: automatically add server to proxy list in new_server()
while new_server() takes the parent proxy as argument and even assigns
srv->proxy to the parent proxy, it didn't actually inserted the server
to the parent proxy server list on success.

The result is that sometimes we add the server to the list after
new_server() is called, and sometimes we don't.

This is really error-prone and because of that hooks such as
REGISTER_POST_SERVER_CHECK() which as run for all servers listed in
all proxies may not be relied upon for servers which are not actually
inserted in their parent proxy server list. Plus it feels very strange
to have a server that points to a proxy, but then the proxy doesn't know
about it because it cannot find it in its server list.

To prevent errors and make proxy->srv list reliable, we move the insertion
logic directly under new_server(). This requires to know if we are called
during parsing or during runtime to either insert or append the server to
the parent proxy list. For that we use PR_FL_CHECKED flag from the parent
proxy (if the flag is set, then the proxy was checked so we are past the
init phase, thus we assume we are called during runtime)

This implies that during startup if new_server() has to be cancelled on
error paths we need to call srv_detach() (which is now exposed in server.h)
before srv_drop().

The consequence of this commit is that REGISTER_POST_SERVER_CHECK() should
not run reliably on all servers created using new_server() (without having
to manually loop on global servers_list)
2025-06-02 17:51:30 +02:00
Aurelien DARRAGON
943958c3ff MINOR: proxy: add a true list containing all proxies
We have global proxies_list pointer which is announced as the list of
"all existing proxies", but in fact it only represents regular proxies
declared on the config file through "listen, frontend or backend" keywords

It is ambiguous, and we currently don't have a straightforwrd method to
iterate over all proxies (either public or internal ones) within haproxy

Instead we still have to manually iterate over multiple lists (main
proxies, log-forward proxies, peer proxies..) which is error-prone.

In this patch we add a struct list member (8 bytes) inside struct proxy
in order to store every proxy (except default ones) within a global
"proxies" list which is actually representative for all proxies existing
under haproxy process, like we already have for servers.
2025-06-02 17:51:21 +02:00
Aurelien DARRAGON
d04843167c MINOR: stats: add stat_col flags
Add stat_col flags member to store .generic bit and prepare for upcoming
flags. No functional change expected.
2025-06-02 17:51:08 +02:00
Willy Tarreau
9f4cd435d3 [RELEASE] Released version 3.3-dev0
Released version 3.3-dev0 with the following main changes :
    - MINOR: version: mention that it's development again
2025-05-28 16:46:34 +02:00
Willy Tarreau
8809251ee0 MINOR: version: mention that it's development again
This essentially reverts a6458fd426.
2025-05-28 16:46:15 +02:00
Willy Tarreau
a6458fd426 MINOR: version: mention that it's 3.2 LTS now.
The version will be maintained up to around Q2 2030. Let's
also update the INSTALL file to mention this.
2025-05-28 16:31:27 +02:00
Christopher Faulet
99e755d673 MINOR: listeners: Add support for a label on bind line
It is now possile to set a label on a bind line. All sockets attached to
this bind line inherits from this label. The idea is to be able to groud of
sockets. For now, there is no mechanism to create these groups, this must be
done by hand.
2025-05-26 19:00:00 +02:00
Willy Tarreau
3494775a1f MINOR: ssl: support strict-sni in ssl-default-bind-options
Several users already reported that it would be nice to support
strict-sni in ssl-default-bind-options. However, in order to support
it, we also need an option to disable it.

This patch moves the setting of the option from the strict_sni field
to a flag in the ssl_options field so that it can be inherited from
the default bind options, and adds a new "no-strict-sni" directive to
allow to disable it on a specific "bind" line.

The test file "del_ssl_crt-list.vtc" which already tests both options
was updated to make use of the default option and the no- variant to
confirm everything continues to work.
2025-05-22 15:31:54 +02:00
Willy Tarreau
a1577a89a0 MINOR: glitches: add global setting "tune.glitches.kill.cpu-usage"
It was mentioned during the development of glitches that it would be
nice to support not killing misbehaving connections below a certain
CPU usage so that poor implementations that routinely misbehave without
impact are not killed. This is now possible by setting a CPU usage
threshold under which we don't kill them via this parameter. It defaults
to zero so that we continue to kill them by default.
2025-05-21 15:47:42 +02:00
Amaury Denoyelle
00d90e8839 MINOR: quic: adjust quic_conn-t.h include list
Adjust include list in quic_conn-t.h. This file is included in many QUIC
source, so it is useful to keep as lightweight as possible. Note that
connection/QUIC MUX are transformed into forward declaration for better
layer separation.
2025-05-21 14:44:27 +02:00
Amaury Denoyelle
01e3b2119a MINOR: quic: add some missing includes
Insert some missing includes statement in QUIC source files. This was
detected after the next commit which adjust the include list used in
quic_conn-t.h file.
2025-05-21 14:44:27 +02:00
Amaury Denoyelle
f286288471 MINOR: quic: refactor handling of streams after MUX release
quic-conn layer has to handle itself STREAM frames after MUX release. If
the stream was already seen, it is probably only a retransmitted frame
which can be safely ignored. For other streams, an active closure may be
needed.

Thus it's necessary that quic-conn layer knows the highest stream ID
already handled by the MUX after its release. Previously, this was done
via <nb_streams> member array in quic-conn structure.

Refactor this by replacing <nb_streams> by two members called
<stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for
quic-conn layer to monitor locally opened uni streams, as the peer
cannot by definition emit a STREAM frame on it. Also, bidirectional
streams are always opened by the remote side.

Previously, <nb_streams> were set by quic-stream layer. Now,
<stream_max_uni>/<stream_max_bidi> members are only set one time, just
prior to QUIC MUX release. This is sufficient as quic-conn do not use
them if the MUX is available.

Note that previously, IDs were used relatively to their type, thus
incremented by 1, after shifting the original value. For simplification,
use the plain stream ID, which is incremented by 4.
2025-05-21 14:26:45 +02:00
Amaury Denoyelle
07d41a043c MINOR: quic: move function to check stream type in utils
Move general function to check if a stream is uni or bidirectional from
QUIC MUX to quic_utils module. This should prevent unnecessary include
of QUIC MUX header file in other sources.
2025-05-21 14:17:41 +02:00
Amaury Denoyelle
cf45bf1ad8 CLEANUP: quic: remove unused cbuf module
Cbuf are not used anymore. Remove the related source and header files,
as well as include statements in the rest of QUIC source files.
2025-05-21 14:16:37 +02:00
Frederic Lecaille
b3ac1a636c MINOR: quic: implement all remaining callbacks for OpenSSL 3.5 QUIC API
The quic_conn struct is modified for two reasons. The first one is to store
the encoded version of the local tranport parameter as this is done for
USE_QUIC_OPENSSL_COMPAT. Indeed, the local transport parameter "should remain
valid until after the parameters have been sent" as mentionned by
SSL_set_quic_tls_cbs(3) manual. In our case, the buffer is a static buffer
attached to the quic_conn object. qc_ssl_set_quic_transport_params() function
whose role is to call SSL_set_tls_quic_transport_params() (aliased by
SSL_set_quic_transport_params() to set these local tranport parameter into
the TLS stack from the buffer attached to the quic_conn struct.

The second quic_conn struct modification is the addition of the  new ->prot_level
(SSL protection level) member added to the quic_conn struct to store "the most
recent write encryption level set via the OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn
callback (if it has been called)" as mentionned by SSL_set_quic_tls_cbs(3) manual.

This patches finally implements the five remaining callacks to make the haproxy
QUIC implementation work.

OSSL_FUNC_SSL_QUIC_TLS_crypto_send_fn() (ha_quic_ossl_crypto_send) is easy to
implement. It calls ha_quic_add_handshake_data() after having converted
qc->prot_level TLS protection level value to the correct ssl_encryption_level_t
(boringSSL API/quictls) value.

OSSL_FUNC_SSL_QUIC_TLS_crypto_recv_rcd_fn() (ha_quic_ossl_crypto_recv_rcd())
provide the non-contiguous addresses to the TLS stack, without releasing
them.

OSSL_FUNC_SSL_QUIC_TLS_crypto_release_rcd_fn() (ha_quic_ossl_crypto_release_rcd())
release these non-contiguous buffer relying on the fact that the list of
encryption level (qc->qel_list) is correctly ordered by SSL protection level
secret establishements order (by the TLS stack).

OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn() (ha_quic_ossl_got_transport_params())
is a simple wrapping function over ha_quic_set_encryption_secrets() which is used
by boringSSL/quictls API.

OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() (ha_quic_ossl_got_transport_params())
role is to store the peer received transport parameters. It simply calls
quic_transport_params_store() and set them into the TLS stack calling
qc_ssl_set_quic_transport_params().

Also add some comments for all the OpenSSL 3.5 QUIC API callbacks.

This patch have no impact on the other use of QUIC API provided by the others TLS
stacks.
2025-05-20 15:00:06 +02:00
Frederic Lecaille
dc6a3c329a MINOR: quic: Allow the use of the new OpenSSL 3.5.0 QUIC TLS API (to be completed)
This patch allows the use of the new OpenSSL 3.5.0 QUIC TLS API when it is
available and detected at compilation time. The detection relies on the presence of the
OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND macro from openssl-compat.h. Indeed this
macro is defined by OpenSSL since 3.5.0 version. It is not defined by quictls.
This helps in distinguishing these two TLS stacks. When the detection succeeds,
HAVE_OPENSSL_QUIC is also defined by openssl-compat.h. Then, this is this new macro
which is used to detect the availability of the new OpenSSL 3.5.0 QUIC TLS API.

Note that this detection is done only if USE_QUIC_OPENSSL_COMPAT is not asked.
So, USE_QUIC_OPENSSL_COMPAT and HAVE_OPENSSL_QUIC are exclusive.

At the same location, from openssl-compat.h, ssl_encryption_level_t enum is
defined. This enum was defined by quictls and expansively used by the haproxy
QUIC implementation. SSL_set_quic_transport_params() is replaced by
SSL_set_quic_tls_transport_params. SSL_set_quic_early_data_enabled() (quictls) is also replaced
by SSL_set_quic_tls_early_data_enabled() (OpenSSL). SSL_quic_read_level() (quictls)
is not defined by OpenSSL. It is only used by the traces to log the current
TLS stack decryption level (read). A macro makes it return -1 which is an
usused values.

The most of the differences between quictls and OpenSSL QUI APIs are in quic_ssl.c
where some callbacks must be defined for these two APIs. This is why this
patch modifies quic_ssl.c to define an array of OSSL_DISPATCH structs: <ha_quic_dispatch>.
Each element of this arry defines a callback. So, this patch implements these
six callabcks:

  - ha_quic_ossl_crypto_send()
  - ha_quic_ossl_crypto_recv_rcd()
  - ha_quic_ossl_crypto_release_rcd()
  - ha_quic_ossl_yield_secret()
  - ha_quic_ossl_got_transport_params() and
  - ha_quic_ossl_alert().

But at this time, these implementations which must return an int return 0 interpreted
as a failure by the OpenSSL QUIC API, except for ha_quic_ossl_alert() which
is implemented the same was as for quictls. The five remaining functions above
will be implemented by the next patches to come.

ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() have been moved
to be defined for both quictls and OpenSSL QUIC API.

These callbacks are attached to the SSL objects (sessions) calling qc_ssl_set_cbs()
new function. This latter callback the correct function to attached the correct
callbacks to the SSL objects (defined by <ha_quic_method> for quictls, and
<ha_quic_dispatch> for OpenSSL).

The calls to SSL_provide_quic_data() and SSL_process_quic_post_handshake()
have been also disabled. These functions are not defined by OpenSSL QUIC API.
At this time, the functions which call them are still defined when HAVE_OPENSSL_QUIC
is defined.
2025-05-20 15:00:06 +02:00
Willy Tarreau
411b04c7d3 IMPORT: slz: use a better hash for machines with a fast multiply
The current hash involves 3 simple shifts and additions so that it can
be mapped to a multiply on architecures having a fast multiply. This is
indeed what the compiler does on x86_64. A large range of values was
scanned to try to find more optimal factors on machines supporting such
a fast multiply, and it turned out that new factor 0x1af42f resulted in
smoother hashes that provided on average 0.4% better compression on both
the Silesia corpus and an mbox file composed of very compressible emails
and uncompressible attachments. It's even slightly better than CRC32C
while being faster on Skylake. This patch enables this factor on archs
with a fast multiply.

This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.
2025-05-16 16:43:53 +02:00
Willy Tarreau
0a91c6dcae BUILD: debug: mark ha_crash_now() as attribute(noreturn)
Building on MIPS64 with clang16 incorrectly reports some uninitialized
value warnings in stats-proxy.c due to some calls to ABORT_NOW() where
the compiler didn't know the code wouldn't return. Let's properly mark
the function as noreturn, and take this opportunity for also marking it
unused to avoid possible warnings depending on the build options (if
ABORT_NOW is not used). No backport needed though it will not harm.
2025-05-16 16:43:53 +02:00
Christopher Faulet
f45a632bad BUG/MEDIUM: stconn: Disable 0-copy forwarding for filters altering the payload
It is especially a problem with Lua filters, but it is important to disable
the 0-copy forwarding if a filter alters the payload, or at least to be able
to disable it. While the filter is registered on the data filtering, it is
not an issue (and it is the common case) because, there is now way to
fast-forward data at all. But it may be an issue if a filter decides to
alter the payload and to unregister from data filtering. In that case, the
0-copy forwarding can be re-enabled in a hardly precdictable state.

To fix the issue, a SC flags was added to do so. The HTTP compression filter
set it and lua filters too if the body length is changed (via
HTTPMessage.set_body_len()).

Note that it is an issue because of a bad design about the HTX. Many info
about the message are stored in the HTX structure itself. It must be
refactored to move several info to the stream-endpoint descriptor. This
should ease modifications at the stream level, from filter or a TCP/HTTP
rules.

This should be backported as far as 3.0. If necessary, it may be backported
on lower versions, as far as 2.6. In that case, it must be reviewed and
adapted.
2025-05-16 15:11:37 +02:00
Christopher Faulet
a3940614c2 BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state
SPOP_CS_FRAME_H and SPOP_CS_FRAME_P states, that were used to handle frame
parsing, were removed. The demux process now relies on the demux stream ID
to know if it is waiting for the frame header or the frame
payload. Concretly, when the demux stream ID is not set (dsi == -1), the
demuxer is waiting for the next frame header. Otherwise (dsi >= 0), it is
waiting for the frame payload. It is especially important to be able to
properly handle DISCONNECT frames sent by the agents.

SPOP_CS_RUNNING state is introduced to know the hello handshake was finished
and the SPOP connection is able to open SPOP streams and exchange NOTIFY/ACK
frames with the agents.

It depends on the following fixes:

  * MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing
  * BUG/MINOR: mux-spop: Make the demux stream ID a signed integer

This change will be mandatory for the next fix. It must be backported to 3.1
with the commits above.
2025-05-13 19:51:40 +02:00
Willy Tarreau
e049bd00ab MEDIUM: config: change default limits to 1024 threads and 32 groups
A test run on a dual-socket EPYC 9845 (2x160 cores) showed that we'll
be facing new limits during the lifetime of 3.2 with our current 16
groups and 256 threads max:

  $ cat test.cfg
  global
      cpu-policy perforamnce

  $ ./haproxy -dc -c -f test.cfg
  ...
  Thread CPU Bindings:
    Tgrp/Thr  Tid        CPU set
    1/1-32    1-32       32: 0-15,320-335
    2/1-32    33-64      32: 16-31,336-351
    3/1-32    65-96      32: 32-47,352-367
    4/1-32    97-128     32: 48-63,368-383
    5/1-32    129-160    32: 64-79,384-399
    6/1-32    161-192    32: 80-95,400-415
    7/1-32    193-224    32: 96-111,416-431
    8/1-32    225-256    32: 112-127,432-447

Raising the default limit to 1024 threads and 32 groups is sufficient
to buy us enough margin for a long time (hopefully, please don't laugh,
you, reader from the future):

  $ ./haproxy -dc -c -f test.cfg
  ...
  Thread CPU Bindings:
    Tgrp/Thr  Tid        CPU set
    1/1-32    1-32       32: 0-15,320-335
    2/1-32    33-64      32: 16-31,336-351
    3/1-32    65-96      32: 32-47,352-367
    4/1-32    97-128     32: 48-63,368-383
    5/1-32    129-160    32: 64-79,384-399
    6/1-32    161-192    32: 80-95,400-415
    7/1-32    193-224    32: 96-111,416-431
    8/1-32    225-256    32: 112-127,432-447
    9/1-32    257-288    32: 128-143,448-463
    10/1-32   289-320    32: 144-159,464-479
    11/1-32   321-352    32: 160-175,480-495
    12/1-32   353-384    32: 176-191,496-511
    13/1-32   385-416    32: 192-207,512-527
    14/1-32   417-448    32: 208-223,528-543
    15/1-32   449-480    32: 224-239,544-559
    16/1-32   481-512    32: 240-255,560-575
    17/1-32   513-544    32: 256-271,576-591
    18/1-32   545-576    32: 272-287,592-607
    19/1-32   577-608    32: 288-303,608-623
    20/1-32   609-640    32: 304-319,624-639

We can change this default now because it has no functional effect
without any configured cpu-policy, so this will only be an opt-in
and it's better to do it now than to have an effect during the
maintenance phase. A tiny effect is a doubling of the number of
pool buckets and stick-table shards internally, which means that
aside slightly reducing contention in these areas, a dump of tables
can enumerate keys in a different order (hence the adjustment in the
vtc).

The only really visible effect is a slightly higher static memory
consumption (29->35 MB on a small config), but that difference
remains even with 50k servers so that's pretty much acceptable.

Thanks to Erwan Velu for the quick tests and the insights!
2025-05-13 18:15:33 +02:00
Amaury Denoyelle
f3b9676416 MINOR: quic: display stream age
Add a field to save the creation date of qc_stream_desc instance. This
is useful to display QUIC stream age in "show quic stream" output.
2025-05-13 15:44:22 +02:00
Amaury Denoyelle
1ccede211c MINOR: mux-quic: account Rx data per stream
Add counters to measure Rx buffers usage per QCS. This reused the newly
defined bdata_ctr type already used for Tx accounting.

Note that for now, <tot> value of bdata_ctr is not used. This is because
it is not easy to account for data accross contiguous buffers.

These values are displayed both on log/traces and "show quic" output.
2025-05-13 15:41:51 +02:00
Amaury Denoyelle
a1dc9070e7 MINOR: quic: account Tx data per stream
Add accounting at qc_stream_desc level to be able to report the number
of allocated Tx buffers and the sum of their data. This represents data
ready for emission or already emitted and waiting on ACK.

To simplify this accounting, a new counter type bdata_ctr is defined in
quic_utils.h. This regroups both buffers and data counter, plus a
maximum on the buffer value.

These values are now displayed on QCS info used both on logline and
traces, and also on "show quic" output.
2025-05-13 15:41:41 +02:00
Willy Tarreau
ebab479cdf MINOR: http: add a function to validate characters of :authority
As discussed here:
  https://github.com/httpwg/http2-spec/pull/936
  https://github.com/haproxy/haproxy/issues/2941

It's important to take care of some special characters in the :authority
pseudo header before reassembling a complete URI, because after assembly
it's too late (e.g. the '/').

This patch adds a specific function which was checks all such characters
and their ranges on an ist, and benefits from modern compilers
optimizations that arrange the comparisons into an evaluation tree for
faster match. That's the version that gave the most consistent performance
across various compilers, though some hand-crafted versions using bitmaps
stored in register could be slightly faster but super sensitive to code
ordering, suggesting that the results might vary with future compilers.
This one takes on average 1.2ns per character at 3 GHz (3.6 cycles per
char on avg). The resulting impact on H2 request processing time (small
requests) was measured around 0.3%, from 6.60 to 6.618us per request,
which is a bit high but remains acceptable given that the test only
focused on req rate.

The code was made usable both for H2 and H3.
2025-05-12 18:02:47 +02:00
William Lallemand
96b1f1fd26 MINOR: tools: ha_freearray() frees an array of string
ha_freearray() is a new function which free() an array of strings
terminated by a NULL entry.

The pointer to the array will be free and set to NULL.
2025-05-09 19:12:05 +02:00
Willy Tarreau
8a96216847 MEDIUM: sock-inet: re-check IPv6 connectivity every 30s
IPv6 connectivity might start off (e.g. network not fully up when
haproxy starts), so for features like resolvers, it would be nice to
periodically recheck.

With this change, instead of having the resolvers code rely on a variable
indicating connectivity, it will now call a function that will check for
how long a connectivity check hasn't been run, and will perform a new one
if needed. The age was set to 30s which seems reasonable considering that
the DNS will cache results anyway. There's no saving in spacing it more
since the syscall is very check (just a connect() without any packet being
emitted).

The variables remain exported so that we could present them in show info
or anywhere else.

This way, "dns-accept-family auto" will now stay up to date. Warning
though, it does perform some caching so even with a refreshed IPv6
connectivity, an older record may be returned anyway.
2025-05-09 15:45:44 +02:00
Willy Tarreau
1404f6fb7b DEBUG: pools: add a new integrity mode "backup" to copy the released area
This way we can preserve the entire contents of the released area for
later inspection. This automatically enables comparison at reallocation
time as well (like "integrity" does). If used in combination with
integrity, the comparison is disabled but the check of non-corruption
of the area mangled by integrity is still operated.
2025-05-09 14:57:00 +02:00
William Lallemand
e7574cd5f0 MINOR: acme: add the global option 'acme.scheduler'
The automatic scheduler is useful but sometimes you don't want to use,
or schedule manually.

This patch adds an 'acme.scheduler' option in the global section, which
can be set to either 'auto' or 'off'. (auto is the default value)

This also change the ouput of the 'acme status' command so it does not
shows scheduled values. The state will be 'Stopped' instead of
'Scheduled'.
2025-05-09 14:00:39 +02:00
Willy Tarreau
0ae14beb2a DEBUG: pool: permit per-pool UAF configuration
The new MEM_F_UAF flag can be set just after a pool's creation to make
this pool UAF for debugging purposes. This allows to maintain a better
overall performance required to reproduce issues while still having a
chance to catch UAF. It will only be used by developers who will manually
add it to areas worth being inspected, though.
2025-05-09 13:59:02 +02:00
Amaury Denoyelle
294bf26c06 MINOR: quic: extend return value during TP parsing
Extend API used for QUIC transport parameter decoding. This is done via
the introduction of a dedicated enum to report the various error
condition detected. No functional change should occur with this patch,
as the only returned code is QUIC_TP_DEC_ERR_TRUNC, which results in the
connection closure via a TLS alert.

This patch will be necessary to properly reject transport parameters
with the proper CONNECTION_CLOSE error code. As such, it should be
backported up to 2.6 with the following series.
2025-05-07 15:19:52 +02:00
Willy Tarreau
feaac66b5e DEBUG: threads: merge successive idempotent lock operations in history
In order to make the lock history a bit more useful, let's try to merge
adjacent lock/unlock sequences that don't change anything for other
threads. For this we can replace the last unlock with the new operation
on the same label, and even just not store it if it was the same as the
one before the unlock, since in the end it's the same as if the unlock
had not been done.

Now loops that used to be filled with "R:LISTENER U:LISTENER" show more
useful info such as:

  S:IDLE_CONNS U:IDLE_CONNS S:PEER U:PEER S:IDLE_CONNS U:IDLE_CONNS R:LISTENER U:LISTENER
  U:STK_TABLE W:STK_SESS U:STK_SESS R:STK_TABLE U:STK_TABLE W:STK_SESS U:STK_SESS R:STK_TABLE
  R:STK_TABLE U:STK_TABLE W:STK_SESS U:STK_SESS W:STK_TABLE_UPDT U:STK_TABLE_UPDT S:PEER

It's worth noting that it can sometimes induce confusion when recursive
locks of the same label are used (a few exist on peers or stick-tables),
as in such a case the two operations would be needed. However these ones
are already undebuggable, so instead they will just have to be renamed
to make sure they use a distinct label.
2025-05-05 18:36:12 +02:00
Willy Tarreau
743dce95d2 DEBUG: threads: don't keep lock label "OTHER" in the per-thread history
Most threads are filled with "R:OTHER U:OTHER" in their history. Since
anything non-important can use other it's not observable but it pollutes
the history. Let's just drop OTHER entirely during the recording.
2025-05-05 18:10:57 +02:00
William Lallemand
878a3507df BUILD: acme: need HAVE_ASN1_TIME_TO_TM
Restrict the build of the ACME feature to libraries which provide
ASN1_TIME_to_tm() function.
2025-05-02 16:01:32 +02:00
William Lallemand
626de9538e MINOR: ssl: add function to extract X509 notBefore date in time_t
Add x509_get_notbefore_time_t() which returns the notBefore date in
time_t format.
2025-05-02 16:01:32 +02:00
Olivier Houchard
388539faa3 MEDIUM: stick-tables: defer adding updates to a tasklet
There is a lot of contention trying to add updates to the tree. So
instead of trying to add the updates to the tree right away, just add
them to a mt-list (with one mt-list per thread group, so that the
mt-list does not become the new point of contention that much), and
create a tasklet dedicated to adding updates to the tree, in batchs, to
avoid keeping the update lock for too long.
This helps getting stick tables perform better under heavy load.
2025-05-02 15:27:55 +02:00
Olivier Houchard
faa18c1ad8 BUG/MEDIUM: quic: Let it be known if the tasklet has been released.
quic_conn_release() may, or may not, free the tasklet associated with
the connection. So make it return 1 if it was, and 0 otherwise, so that
if it was called from the tasklet handler itself, the said handler can
act accordingly and return NULL if the tasklet was destroyed.
This should be backported if 9240cd4a27
is backported.
2025-05-02 11:09:28 +02:00
William Lallemand
18d2371e0d MINOR: acme: change the default max retries to 5
Change the default max retries constant to 5 instead of 3.
Some servers can be be a bit long to execute the challenge.
2025-05-02 09:40:12 +02:00
Olivier Houchard
b138eab302 BUG/MEDIUM: connections: Report connection closing in conn_create_mux()
Add an extra parametre to conn_create_mux(), "closed_connection".
If a pointer is provided, then let it know if the connection was closed.
Callers have no way to determine that otherwise, and we need to know
that, at least in ssl_sock_io_cb(), as if the connection was closed we
need to return NULL, as the tasklet was free'd, otherwise that can lead
to memory corruption and crashes.
This should be backported if 9240cd4a27
is backported too.
2025-04-30 17:17:36 +02:00
Olivier Houchard
4abfade371 MINOR: tasks: Remove unused tasklet_remove_from_tasklet_list
Remove tasklet_remove_from_tasklet_list, as the function hasn't been
used for a long time, and there is little reason to keep it.
2025-04-30 17:09:06 +02:00
Olivier Houchard
2bab043c8c MEDIUM: tasks: Remove TASK_IN_LIST and use TASK_QUEUED instead.
TASK_QUEUED was used to mean "the task has been scheduled to run",
TASK_IN_LIST was used to mean "the tasklet has been scheduled to run",
remove TASK_IN_LIST and just use TASK_QUEUED for tasklets instead.

This commit is just cosmetic, and should not have any impact.
2025-04-30 17:08:57 +02:00
William Lallemand
563ca94ab8 MINOR: ssl/cli: "acme ps" shows the acme tasks
Implement a way to display the running acme tasks over the CLI.

It currently only displays a "Running" status with the certificate name
and the acme section from the configuration.

The displayed running tasks are limited to the size of a buffer for now,
it will require a backref list later to be called multiple times to
resume the list.
2025-04-30 17:12:50 +02:00
Aurelien DARRAGON
97363015a5 MINOR: add hlua_yield_asap() helper
When called, this function will try to enforce a yield (if available) as
soon as possible. Indeed, automatic yield is already enforced every X
Lua instructions. However, there may be some cases where we know after
running heavy operation that we should yield already to avoid taking too
much CPU at once.

This is what this function offers, instead of asking the user to manually
yield using "core.yield()" from Lua itself after using an expensive
Lua method offered by haproxy, we can directly enforce the yield without
the need to do it in the Lua script.
2025-04-30 17:00:27 +02:00
Amaury Denoyelle
df50d3e39f MINOR: mux-quic: limit emitted MSD frames count per qcs
The previous commit has implemented a new calcul method for
MAX_STREAM_DATA frame emission. Now, a frame may be emitted as soon as a
buffer was consumed by a QCS instance.

This will probably increase the number of MAX_STREAM_DATA frame
emission. It may even cause a series of frame emitted for the same
stream with increasing values under high load, which is completely
unnecessary.

To improve this, limit the number of MAX_STREAM_DATA frames built to one
per QCS instance. This is implemented by storing a reference to this
frame in QCS structure via a new member <tx.msd_frm>.

Note that to properly reset QCS msd_frm member, emission of flow-control
frames have been changed. Now, each frame is emitted individually. On
one side, it is better as it prevent to emit frames related to different
streams in a single datagram, which is not desirable in case of packet
loss. However, this can also increase sendto() syscall invocation.
2025-04-30 16:08:47 +02:00
Amaury Denoyelle
14a3fb679f MEDIUM: mux-quic: increase flow-control on each bufsize
Recently, QCS Rx allocation buffer method has been improved. It is now
possible to allocate multiple buffers per QCS instances, which was
necessary to improve HTTP/3 POST throughput.

However, a limitation remained related to the emission of
MAX_STREAM_DATA. These frames are only emitted once at least half of the
receive capacity has been consumed by its QCS instance. This may be too
restrictive when a client need to upload a large payload.

Improve this by adjusting MAX_STREAM_DATA allocation. If QCS capacity is
still limited to 1 or 2 buffers max, the old calcul is still used. This
is necessary when user has limited upload throughput via their
configuration. If QCS capacity is more than 2 buffers, a new frame is
emitted if at least a buffer was consumed.

This patch has reduced number of STREAM_DATA_BLOCKED frames received in
POST tests with some specific clients.
2025-04-30 16:08:47 +02:00
Remi Tricot-Le Breton
047fb37b19 MINOR: Add 'conn' param to ssl_sock_chose_sni_ctx
This is only useful in the traces, the conn parameter won't be used
otherwise.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
6519cec2ed MINOR: ssl: Add traces about sigalg extension parsing in clientHello callback
We had to parse the sigAlg extension by hand in order to properly select
the certificate used by the SSL frontends. These traces allow to dump
the allowed sigAlg list sent by the client in its clientHello.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
105c1ca139 MINOR: ssl: Add traces to the switchctx callback
This callback allows to pick the used certificate on an SSL frontend.
The certificate selection is made according to the information sent by
the client in the clientHello. The traces that were added will allow to
better understand what certificate was chosen and why. It will also warn
us if the chosen certificate was the default one.
The actual certificate parsing happens in ssl_sock_chose_sni_ctx. It's
in this function that we actually get the filename of the certificate
used.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
dbdd0630e1 MINOR: ssl: Add ocsp stapling callback traces
If OCSP stapling fails because of a missing or invalid OCSP response we
used to silently disable stapling for the given session. We can now know
a bit more what happened regarding OCSP stapling.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
0fb05540b2 MINOR: ssl: Add traces to verify callback
Those traces allow to know which errors were met during certificate
chain validation as well as which ones were ignored.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
4a8fa28e36 MINOR: ssl: Add traces around SSL_do_handshake call
Those traces dump information about the multiple SSL_do_handshake calls
(renegotiation and regular call). Some errors coud also be dumped in
case of rejected early data.
Depending on the chosen verbosity, some information about the current
handshake can be dumped as well (servername, tls version, chosen cipher
for instance).
In case of failed handshake, the error codes and messages will also be
dumped in the log to ease debugging.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
9f146bdab3 MINOR: ssl: Add traces to ssl_sock_io_cb function
Add new SSL traces.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
475bb8d843 MINOR: ssl: Add traces to recv/send functions
Those traces will allow to identify sessions on which early data is used
as well as some forcefully closed connections.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
9bb8d6dcd1 MINOR: ssl: Add traces to ssl init/close functions
Add a dedicated trace for some unlikely allocation failures and async
errors. Those traces will ostly be used to identify the start and end of
a given SSL connection.
2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton
08e40f4589 MINOR: Add "sigalg" to "sigalg name" helper function
This function can be used to convert a TLSv1.3 sigAlg entry (2bytes)
from the signature_agorithms client hello extension into a string.

In order to ease debugging, some TLSv1.2 combinations can also be
dumped. In TLSv1.2 those signature algorithms pairs were built out of a
one byte signature identifier combined to a one byte hash identifier.
In TLSv1.3 those identifiers are two bytes blocs that must be treated as
such.
2025-04-30 11:11:26 +02:00
Willy Tarreau
566b384e4e MINOR: tools: make my_strndup() take a size_t len instead of and int
In relation to issue #2954, it appears that turning some size_t length
calculations to the int that uses my_strndup() upsets coverity a bit.
Instead of dealing with such warnings each time, better address it at
the root. An inspection of all call places show that the size passed
there is always positive so we can safely use an unsigned type, and
size_t will always suit it like for strndup() where it's available.
2025-04-30 05:17:43 +02:00
Aurelien DARRAGON
5288b39011 BUG/MINOR: dns: prevent ds accumulation within dss
when dns session callback (dns_session_release()) is called upon error
(ie: when some pending queries were not sent), we try our best to
re-create the applet in order to preserve the pending queries and give
them a chance to be retried. This is done at the end of
dns_session_release().

However, doing so exposes to an issue: if the error preventing queries
from being sent is still encountered over and over the dns session could
stay there indefinitely. Meanwhile, other dns sessions may be created on
the same dns_stream_server periodically. If previous failing dns sessions
don't terminate but we also keep creating new ones, we end up accumulating
failing sessions on a given dns_stream_server, which can eventually cause
ressource shortage.

This issue was found when trying to address ("BUG/MINOR: dns: add tempo
between 2 connection attempts for dns servers")

To fix it, we track the number of failed consecutive sessions for a given
dns server. When we reach the threshold (set to 100), we consider that the
link to the dns server is broken (at least temporarily) and we force
dns_session_new() to fail, so that we stop creating new sessions until one
of the existing one eventually succeeds.

A workaround for this fix consists in setting the "maxconn" parameter on
nameserver directive (under resolvers section) to a reasonnable value so
that no more than "maxconn" sessions may co-exist on the same server at
a given time.

This may be backported to all stable versions.
("CLEANUP: dns: remove unused dns_stream_server struct member") may be
backported to ease the backport.
2025-04-29 21:20:54 +02:00
Aurelien DARRAGON
14ebe95a10 CLEANUP: dns: remove unused dns_stream_server struct member
dns_stream_server "max_slots" is unused, let's get rid of it
2025-04-29 21:20:44 +02:00
Aurelien DARRAGON
1ced5ef2fd MINOR: applet: add appctx_schedule() macro
Just like task_schedule() but for applets to wakeup an applet at a
specific time, leverages _task_schedule() internally
2025-04-29 21:19:37 +02:00
William Lallemand
5555926fdd MEDIUM: acme: use a map to store tokens and thumbprints
The stateless mode which was documented previously in the ACME example
is not convenient for all use cases.

First, when HAProxy generates the account key itself, you wouldn't be
able to put the thumbprint in the configuration, so you will have to get
the thumbprint and then reload.
Second, in the case you are using multiple account key, there are
multiple thumbprint, and it's not easy to know which one you want to use
when responding to the challenger.

This patch allows to configure a map in the acme section, which will be
filled by the acme task with the token corresponding to the challenge,
as the key, and the thumbprint as the value. This way it's easy to reply
the right thumbprint.

Example:
    http-request return status 200 content-type text/plain lf-string "%[path,field(-1,/)].%[path,field(-1,/),map(virt@acme)]\n" if { path_beg '/.well-known/acme-challenge/' }
2025-04-29 16:15:55 +02:00
Amaury Denoyelle
0f9b3daf98 MEDIUM: quic: limit global Tx memory
Define a new settings tune.quic.frontend.max-tot-window. It contains a
size argument which can be used to set a limit on the sum of all QUIC
connections congestion window. This is applied both on
quic_cc_path_set() and quic_cc_path_inc().

Note that this limitation cannot reduce a congestion window more than
the minimal limit which is set to 2 datagrams.
2025-04-29 15:19:32 +02:00
Amaury Denoyelle
e841164a44 MINOR: quic: account for global congestion window
Use the newly defined cshared type to account for the sum of congestion
window of every QUIC connection. This value is stored in global counter
quic_mem_global defined in proto_quic module.
2025-04-29 15:19:32 +02:00
Amaury Denoyelle
3891456d20 MINOR: thread: define cshared type
Define a new type "struct cshared". This can be used as a tool to
manipulate a global counter with thread-safety ensured. Each thread
would declare its thread-local cshared type, which would point to a
global counter.

Each thread can then add/substract value to their owned thread-local
cshared instance via cshared_add(). If the difference exceed a
configured limit, either positively or negatively, the global counter is
updated and thread-local instance is reset to 0. Each thread can safely
read the global counter value using cshared_read().
2025-04-29 15:10:06 +02:00
Amaury Denoyelle
7bad88c35c BUG/MINOR: quic: ensure cwnd limits are always enforced
Congestion window is limit by a minimal and maximum values which can
never be exceeded. Min value is hardcoded to 2 datagrams as recommended
by the specification. Max value is specified via haproxy configuration.

These values must be respected each time the congestion window size is
adjusted. However, in some rare occasions, limit were not always
enforced. Fix this by implementing wrappers to set or increment the
congestion window. These functions ensure limits are always applied
after the operation.

Additionnally, wrappers also ensure that if window reached a new maximum
value, it is saved in <cwnd_last_max> field.

This should be backported up to 2.6, after a brief period of
observation.
2025-04-29 15:10:06 +02:00
Amaury Denoyelle
2eb1b0cd96 MINOR: quic: rename min/max fields for congestion window algo
There was some possible confusion between fields related to congestion
window size min and max limit which cannot be exceeded, and the maximum
value previously reached by the window.

Fix this by adopting a new naming scheme. Enforced limit are now renamed
<limit_max>/<limit_min>, while the previously reached max value is
renamed <cwnd_last_max>.

This should be backported up to 3.1.
2025-04-29 15:10:06 +02:00
Willy Tarreau
2cdb3cb91e MINOR: tcp: add support for setting TCP_NOTSENT_LOWAT on both sides
TCP_NOTSENT_LOWAT is very convenient as it indicates when to report
EAGAIN on the sending side. It takes a margin on top of the estimated
window, meaning that it's no longer needed to store too many data in
socket buffers. Instead there's just enough to fill the send window
and a little bit of margin to cover the scheduling time to restart
sending. Experiments on a 100ms network have shown a 10-fold reduction
in the memory used by socket buffers by just setting this value to
tune.bufsize, without noticing any performance degradation. Theoretically
the responsiveness on multiplexed protocols such as H2 should also be
improved.
2025-04-29 12:13:42 +02:00
Willy Tarreau
f25b4abc9b MINOR: cli: split APPCTX_CLI_ST1_PROMPT into two distinct flags
The CLI's "prompt" command toggles two distinct things:
  - displaying or hiding the prompt at the beginning of the line
  - single-command vs interactive mode

These are two independent concepts and the prompt mode doesn't
always cope well with tools that would like to upload data without
having to read the prompt on return. Also, the master command line
works in interactive mode by default with no prompt, which is not
consistent (and not convenient for tools). So let's start by splitting
the bit in two, and have a new APPCTX_CLI_ST1_INTER flag dedicated
to the interactive mode. For now the "prompt" command alone continues
to toggle the two at once.
2025-04-28 20:21:06 +02:00
Willy Tarreau
5ac280f2a7 MINOR: compiler: add more macros to detect macro definitions
We add __equals_0(NAME) which is only true if NAME is defined as zero,
and __def_as_empty(NAME) which is only true if NAME is defined as an
empty string.
2025-04-28 20:21:06 +02:00
Willy Tarreau
12c7189bc8 MEDIUM: thread: set DEBUG_THREAD to 1 by default
Setting DEBUG_THREAD to 1 allows recording the lock history for each
thread. Tests have shown that (as predicted) the cost of updating a
single thread-local variable is not perceptible in the noise, especially
when compared to the cost of obtaining a lock. Since this can provide
useful value when debugging deadlocks, let's enable it by default when
threads are enabled.
2025-04-28 16:50:34 +02:00
Willy Tarreau
d9a659ed96 MINOR: threads/cli: display the lock history on "show threads"
This will display the lock labels and modes for each non-empty step
at the end of "show threads" when these are defined. This allows to
emit up to the last 8 locking operation for each thread on 64 bit
machines.
2025-04-28 16:50:34 +02:00
Willy Tarreau
b8a1c2380b MEDIUM: threads: keep history of taken locks with DEBUG_THREAD > 0
by only storing a word in each thread context, we can keep the history
of all taken/dropped locks by label. This is expected to be very cheap
and to permit to store up to 8 consecutive lock operations in 64 bits.
That should significantly help detect recursive locks as well as figure
what thread was likely to hinder another one waiting for a lock.

For now we only store the final state of the lock, we don't store the
attempt to get it. It's just a matter of space since we already need
4 ops (rd,sk,wr,un) which take 2 bits, leaving max 64 labels. We're
already around 45. We could also multiply by 5 and still keep 8 bits
total per lock, that would limit us to 51 locks max. It seems that
most of the time if we get a watchdog panic, anyway the victim thread
will be perfectly located so that we don't need a specific value for
this. Another benefit is that we perform a single memory write per
lock.
2025-04-28 16:50:34 +02:00
Willy Tarreau
23371b3e7c MINOR: threads: turn the full lock debugging to DEBUG_THREAD=2
At level 1 it now does nothing. This is reserved for some subsequent
patches which will implement lighter debugging.
2025-04-28 16:50:34 +02:00
Willy Tarreau
903a6b14ef MINOR: threads: prepare DEBUG_THREAD to receive more values
We now default the value to zero and make sure all tests properly take
care of values above zero. This is in preparation for supporting several
degrees of debugging.
2025-04-28 16:50:34 +02:00
William Lallemand
bb768b3e26 MEDIUM: acme: use Retry-After value for retries
Parse the Retry-After header in response and store it in order to use
the value as the next delay for the next retry, fallback to 3s if the
value couldn't be parse or does not exist.
2025-04-24 20:14:47 +02:00
Willy Tarreau
69b051d1dc MINOR: resolvers: add "dns-accept-family auto" to rely on detected IPv6
Instead of always having to force IPv4 or IPv6, let's now also offer
"auto" which will only enable IPv6 if the system has a default gateway
for it. This means that properly configured dual-stack systems will
default to "ipv4,ipv6" while those lacking a gateway will only use
"ipv4". Note that no real connectivity test is performed, so firewalled
systems may still get it wrong and might prefer to rely on a manual
"ipv4" assignment.
2025-04-24 17:52:28 +02:00
Willy Tarreau
5d41d476f3 MINOR: sock-inet: detect apparent IPv6 connectivity
In order to ease dual-stack deployments, we could at least try to
check if ipv6 seems to be reachable. For this we're adding a test
based on a UDP connect (no traffic) on port 53 to the base of
public addresses (2001::) and see if the connect() is permitted,
indicating that the routing table knows how to reach it, or fails.
Based on this result we're setting a global variable that other
subsystems might use to preset their defaults.
2025-04-24 17:52:28 +02:00
Willy Tarreau
2c46c2c042 MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS
In order to ease troubleshooting and testing, the new "-4" command line
argument enforces queries and processing of "A" DNS records only, i.e.
those representing IPv4 addresses. This can be useful when a host lack
end-to-end dual-stack connectivity. This overrides the global
"dns-accept-family" directive and is equivalent to value "ipv4".
2025-04-24 17:52:28 +02:00
Willy Tarreau
940fa19ad8 MEDIUM: resolvers: add global "dns-accept-family" directive
By default, DNS resolvers accept both IPv4 and IPv6 addresses. This can be
influenced by the "resolve-prefer" keywords on server lines as well as the
family argument to the "do-resolve" action, but that is only a preference,
which does not block the other family from being used when it's alone. In
some environments where dual-stack is not usable, stumbling on an unreachable
IPv6-only DNS record can cause significant trouble as it will replace a
previous IPv4 one which would possibly have continued to work till next
request. The "dns-accept-family" global option permits to enforce usage of
only one (or both) address families. The argument is a comma-delimited list
of the following words:
  - "ipv4": query and accept IPv4 addresses ("A" records)
  - "ipv6": query and accept IPv6 addresses ("AAAA" records)

When a single family is used, no request will be sent to resolvers for the
other family, and any response for the othe family will be ignored. The
default value is "ipv4,ipv6", which effectively enables both families.
2025-04-24 17:52:28 +02:00
Christopher Faulet
29632bcabf CLEANUP: applet: Remove unsued rule pointer in appctx structure
Thanks to previous commits, the "rule" field in the appctx structure is no
longer used. So we can safely remove it.
2025-04-24 16:22:31 +02:00
Christopher Faulet
b734d7c156 MINOR: cli/applet: Move appctx fields only used by the CLI in a private context
There are several fields in the appctx structure only used by the CLI. To
make things cleaner, all these fields are now placed in a dedicated context
inside the appctx structure. The final goal is to move it in the service
context and add an API for cli commands to get a command coontext inside the
cli context.
2025-04-24 15:09:37 +02:00
Christopher Faulet
742dc01537 CLEANUP: applet: Update st0/st1 comment in appctx structure
Today, these states are used by almost all applets. So update the comments
of these fields.
2025-04-24 15:09:37 +02:00
Christopher Faulet
44ace9a1b7 MINOR: cli: Rename some CLI applet states to reflect recent refactoring
CLI_ST_GETREQ state was renamed into CLI_ST_PARSE_CMDLINE and CLI_ST_PARSEREQ
into CLI_ST_PROCESS_CMDLINE to reflect the real action performed in these
states.
2025-04-24 15:09:37 +02:00
Christopher Faulet
20ec1de214 MAJOR: cli: Refacor parsing and execution of pipelined commands
Before this patch, when pipelined commands were received, each command was
parsed and then excuted before moving to the next command. Pending commands
were not copied in the input buffer of the applet. The major issue with this
way to handle commands is the impossibility to consume inputs from commands
with an I/O handler, like "show events" for instance. It was working thanks
to a "bug" if such commands were the last one on the command line. But it
was impossible to use them followed by another command. And this prevents us
to implement any streaming support for CLI commands.

So we decided to refactor the command line parsing to have something similar
to a basic shell. Now an entire line is parsed, including the payload,
before starting commands execution. The command line is copied in a
dedicated buffer. "appctx->chunk" buffer is used for this purpose. It was an
unsed field, so it is safe to use it here. Once the command line copied, the
commands found on this line are executed. Because the applet input buffer
was flushed, any input can be safely consumed by the CLI applet and is
available for the command I/O handler. Thanks to this change, "show event
-w" command can be followed by a command. And in theory, it should be
possible to implement commands supporting input data streaming. For
instance, the Tetris like lua applet can be used on the CLI now.

Note that the payload, if any, is part of the command line and must be fully
received before starting the commands processing. It means there is still
the limitation to a buffer, but not only for the payload but for the whole
command line. The payload is still necessarily at the end of the command
line and is passed as argument to the last command. Internally, the
"appctx->cli_payload" field was introduced to point on the payload in the
command line buffer.

This patch is quite huge but it cannot easily be splitted. It should not
introduced significant changes.
2025-04-24 15:09:37 +02:00
Willy Tarreau
1af592c511 MINOR: stick-table: use a separate lock label for updates
Too many locks were sharing STK_TABLE_LOCK making it hard to analyze.
Let's split the already heavily used update lock.
2025-04-24 14:02:22 +02:00
William Lallemand
af73f98a3e MEDIUM: acme: rename "uri" into "directory"
Rename the "uri" option of the acme section into "directory".
2025-04-24 10:52:46 +02:00
William Lallemand
d700a242b4 MINOR: httpclient: add an "https" log-format
Add an experimental "https" log-format for the httpclient, it is not
used by the httpclient by default, but could be define in a customized
proxy.

The string is basically a httpslog, with some of the fields replaced by
their backend equivalent or - when not available:

"%ci:%cp [%tr] %ft -/- %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %[bc_err]/%[ssl_bc_err,hex]/-/-/%[ssl_bc_is_resumed] -/-/-"
2025-04-23 15:32:46 +02:00
Christopher Faulet
a56feffc6f CLEANUP: h1: Remove now useless h1_parse_cont_len_header() function
Since the commit "MINOR: hlua/h1: Use http_parse_cont_len_header() to parse
content-length value", this function is no longer used. So it can be safely
removed.
2025-04-22 16:14:47 +02:00
Christopher Faulet
5200203677 MINOR: proxy: Add options to drop HTTP trailers during message forwarding
In RFC9110, it is stated that trailers could be merged with the
headers. While it should be performed with a speicial care, it may be a
problem for some applications. To avoid any trouble with such applications,
two new options were added to drop trailers during the message forwarding.

On the backend, "http-drop-request-trailers" option can be enabled to drop
trailers from the requests before sending them to the server. And on the
frontend, "http-drop-response-trailers" option can be enabled to drop
trailers from the responses before sending them to the client. The options
can be defined in defaults sections and disabled with "no" keyword.

This patch should fix the issue #2930.
2025-04-22 16:14:46 +02:00
Christopher Faulet
044ef9b3d6 CLEANUP: Slightly reorder some proxy option flags to free slots
PR_O_TCPCHK_SSL and PR_O_CONTSTATS was shifted to free a slot. The idea is
to have 2 contiguous slots to be able to insert two new options.
2025-04-22 16:14:46 +02:00
Amaury Denoyelle
4309a6fbf8 BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure
To handle out-of-order received CRYPTO frames, a ncbuf instance is
allocated. This is done via the helper quic_get_ncbuf().

Buffer allocation was improperly checked. In case b_alloc() fails, it
crashes due to a BUG_ON(). Fix this by removing it. The function now
returns NULL on allocation failure, which is already properly handled in
its caller qc_handle_crypto_frm().

This should fix the last reported crash from github issue #2935.

This must be backported up to 2.6.
2025-04-18 18:11:17 +02:00
Olivier Houchard
3758eab71c MEDIUM: lb_fwrr: Use one ebtree per thread group.
When using the round-robin load balancer, the major source of contention
is the lbprm lock, that has to be held every time we pick a server.
To mitigate that, make it so there are one tree per thread-group, and
one lock per thread-group. That means we now have a lb_fwrr_per_tgrp
structure that will contain the two lb_fwrr_groups (active and backup) as well
as the lock to protect them in the per-thread lbprm struct, and all
fields in the struct server are now moved to the per-thread structure
too.
Those changes are mostly mechanical, and brings good performances
improvment, on a 64-cores AMD CPU, with 64 servers configured, we could
process about 620000 requests par second, and we now can process around
1400000 requests per second.
2025-04-17 17:38:23 +02:00
Olivier Houchard
f36f6cfd26 MINOR: proxies: Add a per-thread group lbprm struct.
Add a new structure in the per-thread groups proxy structure, that will
contain whatever is per-thread group in lbprm.
It will be accessed as p->per_tgrp[tgid].lbprm.
2025-04-17 17:38:23 +02:00
Olivier Houchard
7ca1c94ff0 MINOR: lb_fwrr: Move the next weight out of fwrr_group.
Move the "next_weight" outside of fwrr_group, and inside struct lb_fwrr
directly, one for the active servers, one for the backup servers.
We will soon have one fwrr_group per thread group, but next_weight will
be global to all of them.
2025-04-17 17:38:23 +02:00
Olivier Houchard
444125a764 MINOR: servers: Provide a pointer to the server in srv_per_tgroup.
Add a pointer to the server into the struct srv_per_tgroup, so that if
we only have access to that srv_per_tgroup, we can come back to the
corresponding server.
2025-04-17 17:38:23 +02:00
Willy Tarreau
36ec70c526 MINOR: sched: add a new function is_sched_alive() to report scheduler's health
This verifies that the scheduler is still ticking without having to
access the activity[] array nor keeping local copies of the ctxsw
counter. It just tests and sets a flag that is reset after each
return from a ->process() function.
2025-04-17 16:25:47 +02:00
Willy Tarreau
874ba2afed CLEANUP: debug: no longer set nor use TH_FL_DUMPING_OTHERS
TH_FL_DUMPING_OTHERS was being used to try to perform exclusion between
threads running "show threads" and those producing warnings. Now that it
is much more cleanly handled, we don't need that type of protection
anymore, which was adding to the complexity of the solution. Let's just
get rid of it.
2025-04-17 16:25:47 +02:00
Willy Tarreau
c16d5415a8 MINOR: debug: make ha_stuck_warning() only work for the current thread
Since we no longer call it with a foreign thread, let's simplify its code
and get rid of the special cases that were relying on ha_thread_dump_fill()
and synchronization with a remote thread. We're not only dumping the
current thread so ha_thread_dump_one() is sufficient.
2025-04-17 16:25:47 +02:00
Willy Tarreau
b24d7f248e MINOR: pass a valid buffer pointer to ha_thread_dump_one()
The goal is to let the caller deal with the pointer so that the function
only has to fill that buffer without worrying about locking. This way,
synchronous dumps from "show threads" are produced and emitted directly
without causing undesired locking of the buffer nor risking causing
confusion about thread_dump_buffer containing bits from an interrupted
dump in progress.

It's only the caller that's responsible for notifying the requester of
the end of the dump by setting bit 0 of the pointer if needed (i.e. it's
only done in the debug handler).
2025-04-17 16:25:47 +02:00
Willy Tarreau
5ac739cd0c MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one()
This function was initially designed to dump any threadd into the presented
buffer, but the way it currently works is that it's always called for the
current thread, and uses the distinction between coming from a sighandler
or being called directly to detect which thread is the caller.

Let's simplify all this by replacing thr with tid everywhere, and using
the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc).
The confusing "from_signal" argument is now replaced with "is_caller"
which clearly states whether or not the caller declares being the one
asking for the dump (the logic is inverted, but there are only two call
places with a constant).
2025-04-17 16:25:47 +02:00
Willy Tarreau
6d8a523d14 MINOR: tinfo: keep a copy of the pointer to the thread dump buffer
Instead of using the thread dump buffer for post-mortem analysis, we'll
keep a copy of the assigned pointer whenever it's used, even for warnings
or "show threads". This will offer more opportunities to figure from a
core what happened, and will give us more freedom regarding the value of
the thread_dump_buffer itself. For example, even at the end of the dump
when the pointer is reset, the last used buffer is now preserved.
2025-04-17 16:25:47 +02:00
Willy Tarreau
337017e2f9 BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads
Some signal handlers rely on these to decide about the level of detail to
provide in dumps, so let's properly fill the info about entering/leaving
idle. Note that for consistency with other tests we're using bitops with
t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes
the code more readable and the whole difference is only 88 bytes on a 3MB
executable.

This bug is not important, and while older versions are likely affected
as well, it's not worth taking the risk to backport this in case it would
wake up an obscure bug.
2025-04-17 16:25:47 +02:00
Amaury Denoyelle
52246249ab MEDIUM: listener/mux-h2: implement idle-ping on frontend side
This commit is the counterpart of the previous one, adapted on the
frontend side. "idle-ping" is added as keyword to bind lines, to be able
to refresh client timeout of idle frontend connections.

H2 MUX behavior remains similar as the previous patch. The only
significant change is in h2c_update_timeout(), as idle-ping is now taken
into account also for frontend connection. The calculated value is
compared with http-request/http-keep-alive timeout value. The shorter
delay is then used as expired date. As hr/ka timeout are based on
idle_start, this allows to run them in parallel with an idle-ping timer.
2025-04-17 14:49:36 +02:00
Amaury Denoyelle
a78a04cfae MEDIUM: server/mux-h2: implement idle-ping on backend side
This commit implements support for idle-ping on the backend side. First,
a new server keyword "idle-ping" is defined in configuration parsing. It
is used to set the corresponding new server member.

The second part of this commit implements idle-ping support on H2 MUX. A
new inlined function conn_idle_ping() is defined to access connection
idle-ping value. Two new connection flags are defined H2_CF_IDL_PING and
H2_CF_IDL_PING_SENT. The first one is set for idle connections via
h2c_update_timeout().

On h2_timeout_task() handler, if first flag is set, instead of releasing
the connection as before, the second flag is set and tasklet is
scheduled. As both flags are now set, h2_process_mux() will proceed to
PING emission. The timer has also been rearmed to the idle-ping value.
If a PING ACK is received before next timeout, connection timer is
refreshed. Else, the connection is released, as with timer expiration.

Also of importance, special care is needed when a backend connection is
going to idle. In this case, idle-ping timer must be rearmed. Thus a new
invokation of h2c_update_timeout() is performed on h2_detach().
2025-04-17 14:49:36 +02:00
William Lallemand
e778049ffc MINOR: acme: register the task in the ckch_store
This patch registers the task in the ckch_store so we don't run 2 tasks
at the same time for a given certificate.

Move the task creation under the lock and check if there was already a
task under the lock.
2025-04-16 17:12:43 +02:00
William Lallemand
c291a5c73c BUILD: incompatible pointer type suspected with -DDEBUG_UNIT
src/jws.c: In function '__jws_init':
src/jws.c:594:38: error: passing argument 2 of 'hap_register_unittest' from incompatible pointer type [-Wincompatible-pointer-types]
  594 |         hap_register_unittest("jwk", jwk_debug);
      |                                      ^~~~~~~~~
      |                                      |
      |                                      int (*)(int,  char **)
In file included from include/haproxy/api.h:36,
                 from include/import/ebtree.h:251,
                 from include/import/ebmbtree.h:25,
                 from include/haproxy/jwt-t.h:25,
                 from src/jws.c:5:
include/haproxy/init.h:37:52: note: expected 'int (*)(void)' but argument is of type 'int (*)(int,  char **)'
   37 | void hap_register_unittest(const char *name, int (*fct)());
      |                                              ~~~~~~^~~~~~

GCC 15 is warning because the function pointer does have its
arguments in the register function.

Should fix issue #2929.
2025-04-15 15:49:44 +02:00
Willy Tarreau
b708345c17 DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters
These counters can have a noticeable cost on large machines, though not
dramatic. There's no single good choice to keep them enabled or disabled.
This commit adds multiple choices:
  - DEBUG_COUNTERS set to 2 will automatically enable them by default, while
    1 will disable them by default
  - the global "debug.counters on/off" will allow to change the setting at
    boot, regardless of DEBUG_COUNTERS as long as it was at least 1.
  - the CLI "debug counters on/off" will also allow to change the value at
    run time, allowing to observe a phenomenon while it's happening, or to
    disable counters if it's suspected that their cost is too high

Finally, the "debug counters" command will append "(stopped)" at the end
of the CNT lines when these counters are stopped.

Not that the whole mechanism would easily support being extended to all
counter types by specifying the types to apply to, but it doesn't seem
useful at all and would require the user to also type "cnt" on debug
lines. This may easily be changed in the future if it's found relevant.
2025-04-14 19:02:13 +02:00
Willy Tarreau
a142adaba0 DEBUG: counters: make COUNT_IF() only appear at DEBUG_COUNTERS>=1
COUNT_IF() is convenient but can be heavy since some of them were found
to trigger often (roughly 1 counter per request on avg). This might even
have an impact on large setups due to the cost of a shared cache line
bouncing between multiple cores. For now there's no way to disable it,
so let's only enable it when DEBUG_COUNTERS is 1 or above. A future
change will make it configurable.
2025-04-14 19:02:13 +02:00
Willy Tarreau
61d633a3ac DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default
Till now the per-line glitches counters were only enabled with the
confusingly named DEBUG_GLITCHES (which would not turn glitches off
when disabled). Let's instead change it to DEBUG_COUNTERS and make sure
it's enabled by default (though it can still be disabled with
-DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be
expanded to cover more counters.
2025-04-14 19:02:13 +02:00
William Lallemand
39c05cedff BUILD: acme: enable the ACME feature when JWS is present
The ACME feature depends on the JWS, which currently does not work with
every SSL libraries. This patch only enables ACME when JWS is enabled.
2025-04-12 01:39:03 +02:00
William Lallemand
5500bda9eb MINOR: acme: implement retrieval of the certificate
Once the Order status is "valid", the certificate URL is accessible,
this patch implements the retrieval of the certificate which is stocked
in ctx->store.
2025-04-12 01:39:03 +02:00
William Lallemand
27fff179fe MINOR: acme: verify the order status once finalized
This implements a call to the order status to check if the certificate
is ready.
2025-04-12 01:39:03 +02:00
William Lallemand
680222b382 MINOR: acme: finalize by sending the CSR
This patch does the finalize step of the ACME task.
This encodes the CSR into base64 format and send it to the finalize URL.

https://www.rfc-editor.org/rfc/rfc8555#section-7.4
2025-04-12 01:29:27 +02:00
William Lallemand
de5dc31a0d MINOR: acme: generate the CSR in a X509_REQ
Generate the X509_REQ using the generated private key and the SAN from
the configuration. This is only done once before the task is started.

It could probably be done at the beginning of the task with the private
key generation once we have a scheduler instead of a CLI command.
2025-04-12 01:29:27 +02:00
William Lallemand
00ba62df15 MINOR: acme: implement a check on the challenge status
This patch implements a check on the challenge URL, once haproxy asked
for the challenge to be verified, it must verify the status of the
challenge resolution and if there weren't any error.
2025-04-12 01:29:27 +02:00
William Lallemand
711a13a4b4 MINOR: acme: send the request for challenge ready
This patch sends the "{}" message to specify that a challenge is ready.
It iterates on every challenge URL in the authorization list from the
acme_ctx.

This allows the ACME server to procede to the challenge validation.
https://www.rfc-editor.org/rfc/rfc8555#section-7.5.1
2025-04-12 01:29:27 +02:00
William Lallemand
ae0bc88f91 MINOR: acme: get the challenges object from the Auth URL
This patch implements the retrieval of the challenges objects on the
authorizations URLs. The challenges object contains a token and a
challenge url that need to be called once the challenge is setup.

Each authorization URLs contain multiple challenge objects, usually one
per challenge type (HTTP-01, DNS-01, ALPN-01... We only need to keep the
one that is relevent to our configuration.
2025-04-12 01:29:27 +02:00
William Lallemand
4842c5ea8c MINOR: acme: newOrder request retrieve authorizations URLs
This patch implements the newOrder action in the ACME task, in order to
ask for a new certificate, a list of SAN is sent as a JWS payload.
the ACME server replies a list of Authorization URLs. One Authorization
is created per SAN on a Order.

The authorization URLs are stored in a linked list of 'struct acme_auth'
in acme_ctx, so we can get the challenge URLs from them later.

The location header is also store as it is the URL of the order object.

https://datatracker.ietf.org/doc/html/rfc8555#section-7.4
2025-04-12 01:29:27 +02:00
William Lallemand
04d393f661 MINOR: acme: generate new account
The new account action in the ACME task use the same function as the
chkaccount, but onlyReturnExisting is not sent in this case!
2025-04-12 01:29:27 +02:00
William Lallemand
7f9bf4d5f7 MINOR: acme: check if the account exist
This patch implements the retrival of the KID (account identifier) using
the pkey.

A request is sent to the newAccount URL using the onlyReturnExisting
option, which allow to get the kid of an existing account.

acme_jws_payload() implement a way to generate a JWS payload using the
nonce, pkey and provided URI.
2025-04-12 01:29:27 +02:00
William Lallemand
0aa6dedf72 MINOR: acme: handle the nonce
ACME requests are supposed to be sent with a Nonce, the first Nonce
should be retrieved using the newNonce URI provided by the directory.

This nonce is stored and must be replaced by the new one received in the
each response.
2025-04-12 01:29:27 +02:00
William Lallemand
471290458e MINOR: acme: get the ACME directory
The first request of the ACME protocol is getting the list of URLs for
the next steps.

This patch implements the first request and the parsing of the response.

The response is a JSON object so mjson is used to parse it.
2025-04-12 01:29:27 +02:00
William Lallemand
b8209cf697 MINOR: acme/cli: add the 'acme renew' command
The "acme renew" command launch the ACME task for a given certificate.

The CLI parser generates a new private key using the parameters from the
acme section..
2025-04-12 01:29:27 +02:00
William Lallemand
bf6a39c4d1 MINOR: acme: add private key configuration
This commit allows to configure the generated private keys, you can
configure the keytype (RSA/ECDSA), the number of bits or the curves.

Example:

    acme LE
        uri https://acme-staging-v02.api.letsencrypt.org/directory
        account account.key
        contact foobar@example.com
        challenge HTTP-01
        keytype ECDSA
        curves P-384
2025-04-12 01:29:27 +02:00
William Lallemand
2e8c350b95 MINOR: acme: add configuration for the crt-store
Add new acme keywords for the ckch_conf parsing, which will be used on a
crt-store, a crt line in a frontend, or even a crt-list.

The cfg_postparser_acme() is called in order to check if a section referenced
elsewhere really exists in the config file.
2025-04-12 01:29:27 +02:00
William Lallemand
077e2ce84c MINOR: acme: add the acme section in the configuration parser
Add a configuration parser for the new acme section, the section is
configured this way:

    acme letsencrypt
        uri https://acme-staging-v02.api.letsencrypt.org/directory
        account account.key
        contact foobar@example.com
        challenge HTTP-01

When unspecified, the challenge defaults to HTTP-01, and the account key
to "<section_name>.account.key".

Section are stored in a linked list containing acme_cfg structures, the
configuration parsing is mostly resolved in the postsection parser
cfg_postsection_acme() which is called after the parsing of an acme section.
2025-04-12 01:29:27 +02:00
William Lallemand
20718f40b6 MEDIUM: ssl/ckch: add filename and linenum argument to crt-store parsing
Add filename and linenum arguments to the crt-store / ckch_conf parsing.

It allows to use them in the parsing function so we could emits error.
2025-04-12 01:29:27 +02:00
Willy Tarreau
00c967fac4 MINOR: master/cli: support bidirectional communications with workers
Some rare commands in the worker require to keep their input open and
terminate when it's closed ("show events -w", "wait"). Others maintain
a per-session context ("set anon on"). But in its default operation
mode, the master CLI passes commands one at a time to the worker, and
closes the CLI's input channel so that the command can immediately
close upon response. This effectively prevents these two specific cases
from being used.

Here the approach that we take is to introduce a bidirectional mode to
connect to the worker, where everything sent to the master is immediately
forwarded to the worker (including the raw command), allowing to queue
multiple commands at once in the same session, and to continue to watch
the input to detect when the client closes. It must be a client's choice
however, since doing so means that the client cannot batch many commands
at once to the master process, but must wait for these commands to complete
before sending new ones. For this reason we use the prefix "@@<pid>" for
this. It works exactly like "@" except that it maintains the channel
open during the whole execution. Similarly to "@<pid>" with no command,
"@@<pid>" will simply open an interactive CLI session to the worker, that
will be ended by "quit" or by closing the connection. This can be convenient
for the user, and possibly for clients willing to dedicate a connection to
the worker.
2025-04-11 16:09:17 +02:00
Aurelien DARRAGON
fbfeb591f7 MINOR: proxy: add deinit_proxy() helper func
Same as free_proxy(), but does not free the base proxy pointer (ie: the
proxy itself may not be allocated)

Goal is to be able to cleanup statically allocated dummy proxies.
2025-04-10 22:10:31 +02:00
Aurelien DARRAGON
e1cec655ee MINOR: proxy: add setup_new_proxy() function
Split alloc_new_proxy() in two functions: the preparing part is now
handled by setup_new_proxy() which can be called individually, while
alloc_new_proxy() takes care of allocating a new proxy struct and then
calling setup_new_proxy() with the freshly allocated proxy.
2025-04-10 22:10:31 +02:00
Willy Tarreau
f4634e5a38 MINOR: ring/cli: support delimiting events with a trailing \0 on "show events"
At the moment it is not supported to produce multi-line events on the
"show events" output, simply because the LF character is used as the
default end-of-event mark. However it could be convenient to produce
well-formatted multi-line events, e.g. in JSON or other formats. UNIX
utilities have already faced similar needs in the past and added
"-print0" to "find" and "-0" to "xargs" to mention that the delimiter
is the NUL character. This makes perfect sense since it's never present
in contents, so let's do exactly the same here.

Thus from now on, "show events <ring> -0" will delimit messages using
a \0 instead of a \n, permitting a better and safer encapsulation.
2025-04-08 14:36:35 +02:00
Willy Tarreau
0be6d73e88 MINOR: ring: support arbitrary delimiters through ring_dispatch_messages()
In order to support delimiting output events with other characters than
just the LF, let's pass the delimiter through the API. The default remains
the LF, used by applet_append_line(), and ignored by the log forwarder.
2025-04-08 14:36:35 +02:00
Willy Tarreau
f01ff2478f BUILD: atomics: fix build issue on non-x86/non-arm systems
Commit f435a2e518 ("CLEANUP: atomics: also replace __sync_synchronize()
with __atomic_thread_fence()") replaced the builtins used for barriers,
but the different API required an argument while the macros didn't specify
any, resulting in double parenthesis that were causing obscure build errors
such as "called object type 'void' is not a function or function pointer".
Let's just specify the args for the macro. No backport is needed.
2025-04-07 09:38:22 +02:00
Aurelien DARRAGON
11d4d0957e MEDIUM: task: make notification_* API thread safe by default
Some notification_* functions were not thread safe by default as they
assumed only one producer would emit events for registered tasks.

While this suited well with the Lua sockets use-case, this proved to
be a limitation with some other event sources (ie: lua Queue class)

instead of having to deal with both the non thread safe and thread
safe variants (_mt suffix), which is error prone, let's make the
entire API thread safe regarding the event list.

Pruning functions still require that only one thread executes them,
with Lua this is always the case because there is one cleanup list
per context.
2025-04-03 17:52:50 +02:00
Aurelien DARRAGON
748dba4859 MINOR: hlua_fcn: register queue class using hlua_register_metatable()
Most lua classes are registered by leveraging the
hlua_register_metatable() helper. Let's use that for the Queue class as
well for consitency.
2025-04-03 17:52:17 +02:00
Aurelien DARRAGON
b77b1a2c3a MINOR: task: add thread safe notification_new and notification_wake variants
notification_new and notification_wake were historically meant to be
called by a single thread doing both the init and the wakeup for other
tasks waiting on the signals.

In this patch, we extend the API so that notification_new and
notification_wake have thread-safe variants that can safely be used with
multiple threads registering on the same list of events and multiple
threads pushing updates on the list.
2025-04-03 17:52:03 +02:00
Amaury Denoyelle
f0f1816f1a MINOR: check: implement check-pool-conn-name srv keyword
This commit is a direct follow-up of the previous one. It defines a new
server keyword check-pool-conn-name. It is used as the default value for
the name parameter of idle connection hash generation.

Its behavior is similar to server keyword pool-conn-name, but reserved
for checks reuse. If check-pool-conn-name is set, it is used in priority
to match a connection for reuse. If unset, a fallback is performed on
check-sni.
2025-04-03 17:19:07 +02:00
Amaury Denoyelle
43367f94f1 MINOR: check/backend: support conn reuse with SNI
Support for connection reuse during server checks was implemented
recently. This is activated with the server keyword check-reuse-pool.

Similarly to stream processing via connect_backend(), a connection hash
is calculated when trying to perform reuse for checks. This is necessary
to retrieve for a connection which shares the check connect parameters.
However, idle connections can additionnally be tagged using a
pool-conn-name or SNI under connect_backend(). Check reuse does not test
these values, which prevent to retrieve a matching connection.

Improve this by using "check-sni" value as idle connection hash input
for check reuse. be_calculate_conn_hash() API has been adjusted so that
name value can be passed as input, both when using streams or checks.

Even with the current patch, there is still some scenarii which could
not be covered for checks connection reuse. most notably, when using
dynamic pool-conn-name/SNI value. It is however at least sufficient to
cover simpler cases.
2025-04-03 17:19:07 +02:00
Willy Tarreau
f435a2e518 CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence()
The drop of older compilers also allows us to focus on clearer
barriers, so let's use them.
2025-04-03 11:59:31 +02:00
Willy Tarreau
34e3b83f9c CLEANUP: atomics: remove support for gcc < 4.7
The old __sync_* API is no longer necessary since we do not support
gcc before 4.7 anymore. Let's just get rid of this code, the file is
still ugly enough without it.
2025-04-03 11:55:35 +02:00
Ilia Shipitsin
27a6353ceb CLEANUP: assorted typo fixes in the code, commits and doc 2025-04-03 11:37:25 +02:00
William Lallemand
b351f06ff1 REORG: ssl: move curves2nid and nid2nist to ssl_utils
curves2nid and nid2nist are generic functions that could be used outside
the JWS scope, this patch put them at the right place so they can be
reused.
2025-04-02 19:34:09 +02:00
Amaury Denoyelle
f1fb396d71 MEDIUM: check: implement check-reuse-pool
Implement the possibility to reuse idle connections when performing
server checks. This is done thanks to the recently introduced functions
be_calculate_conn_hash() and be_reuse_connection().

One side effect of this change is that be_calculate_conn_hash() can now
be called with a NULL stream instance. As such, part of the functions
are adjusted accordingly.

Note that to simplify configuration, connection reuse is not performed
if any specific check connection parameters are defined on the server
line or via the tcp-check connect rule. This is performed via newly
defined tcpcheck_use_nondefault_connect().
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
e34f748e3a MINOR: check define check-reuse-pool server keyword
Define a new server keyword check-reuse-pool, and its counterpart with a
"no" prefix. For the moment, only parsing is implemented. The real
behavior adjustment will be implemented in the next patch.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
20eb57b486 MINOR: backend: remove stream usage on connection reuse
Adjust newly defined be_reuse_connection() API. The stream argument is
removed. This will allows checks to be able to invoke it without relying
on a stream instance.
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
ee94a6cfc1 MINOR: backend: extract conn reuse from connect_server()
Following the previous patch, the part directly related to connection
reuse is extracted from connect_server(). It is now define in a new
function be_reuse_connection().
2025-04-02 14:57:40 +02:00
Amaury Denoyelle
c7cc6b6401 MINOR: backend: extract conn hash calculation from connect_server()
On connection reuse, a hash is first calculated. It is generated from
various connection parameters, to retrieve a matching connection.

Extract hash calculation from connect_server() into a new dedicated
function be_calculate_conn_hash(). The objective is to be able to
perform connection reuse for checks, without connect_server() invokation
which relies on a stream instance.
2025-04-02 14:57:40 +02:00
Willy Tarreau
4ec5509541 BUILD: compiler: undefine the CONCAT() macro if already defined
As Ilya reported in issue #2911, the CONCAT() macro breaks on NetBSD
which defines its own as __CONCAT() (which is exactly the same). Let's
just undefine it before ours to fix the issue instead of renaming, but
keep ours so that we don't have doubts about what we're running with.

Note that the patch introducing this breaking change was backported
to 3.0.
2025-04-02 11:36:43 +02:00
Ilia Shipitsin
78b849b839 CLEANUP: assorted typo fixes in the code and comments
code, comments and doc actually.
2025-04-02 11:12:20 +02:00
Olivier Houchard
9fe72bba3c MAJOR: leastconn; Revamp the way servers are ordered.
For leastconn, servers used to just be stored in an ebtree.
Each server would be one node.
Change that so that nodes contain multiple mt_lists. Each list
will contain servers that share the same key (typically meaning
they have the same number of connections). Using mt_lists means
that as long as tree elements already exist, moving a server from
one tree element to another does no longer require the lbprm write
lock.
We use multiple mt_lists to reduce the contention when moving
a server from one tree element to another. A list in the new
element will be chosen randomly.
We no longer remove a tree element as soon as they no longer
contain any server. Instead, we keep a list of all elements,
and when we need a new element, we look at that list only if it
contains a number of elements already, otherwise we'll allocate
a new one. Keeping nodes in the tree ensures that we very
rarely have to take the lbrpm write lock (as it only happens
when we're moving the server to a position for which no
element is currently in the tree).

The number of mt_lists used is defined as FWLC_NB_LISTS.
The number of tree elements we want to keep is defined as
FWLC_MIN_FREE_ENTRIES, both in defaults.h.
The value used were picked afrer experimentation, and
seems to be the best choice of performances vs memory
usage.

Doing that gives a good boost in performances when a lot of
servers are used.
With a configuration using 500 servers, before that patch,
about 830000 requests per second could be processed, with
that patch, about 1550000 requests per second are
processed, on an 64-cores AMD, using 1200 concurrent connections.
2025-04-01 18:05:30 +02:00
Olivier Houchard
ba521a1d88 MINOR: threads: Add HA_RWLOCK_TRYRDTOWR()
Add HA_RWLOCK_TRYRDTOWR(), that tries to upgrade a lock
from reader to writer, and fails if any seeker or writer already
holds it.
2025-04-01 18:05:30 +02:00
Olivier Houchard
2a9436f96b MINOR: lbprm: Add method to deinit server and proxy
Add two new methods to lbprm, server_deinit() and proxy_deinit(),
in case something should be done at the lbprm level when
removing servers and proxies.
2025-04-01 18:05:30 +02:00
Olivier Houchard
17059098e7 MINOR: mt_list: Implement mt_list_try_lock_prev().
Implement mt_list_try_lock_prev(), that does the same thing
as mt_list_lock_prev(), exceot if the list is locked, it
returns { NULL, NULL } instaed of waiting.
2025-04-01 18:05:30 +02:00
William Lallemand
fdcb97614c MINOR: ssl/ckch: add substring parser for ckch_conf
Add a substring parser for the ckch_conf keyword parser, this will split
a string into multiple substring, and strdup them in a array.
2025-04-01 15:38:32 +02:00
William Lallemand
f8fe84caca MINOR: jws: emit the JWK thumbprint
jwk_thumbprint() is a function which is a function which implements
RFC7368 and emits a JWK thumbprint using a EVP_PKEY.

EVP_PKEY_EC_to_pub_jwk() and EVP_PKEY_RSA_to_pub_jwk() were changed in
order to match what is required to emit a thumbprint (ie, no spaces or
lines and the lexicographic order of the fields)
2025-04-01 11:57:55 +02:00
Willy Tarreau
1e9a2529aa MINOR: cpu-topo: pass an extra argument to ha_cpu_policy
This extra argument will allow common functions to distinguish between
multiple policies. For now it's not used.
2025-03-31 16:21:37 +02:00
Willy Tarreau
571573874a MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode
The new function "print_cpu_set()" will print cpu sets in a human-friendly
way, with commas and dashes for intervals. The goal is to keep them compact
enough.
2025-03-31 16:21:37 +02:00
Willy Tarreau
3955f151b1 MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal()
This function returns true if two CPU sets are equal.
2025-03-31 16:21:37 +02:00
Valentine Krasnobaeva
b303861469 MINOR: compiler: add __nonstring macro
GCC 15 throws the following warning on fixed-size char arrays if they do not
contain terminated NUL:

src/tools.c:2041:25: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization]
 2041 | const char hextab[16] = "0123456789ABCDEF";

We are using a couple of such definitions for some constants. Converting them
to flexible arrays, like: hextab[] = "0123456789ABCDEF" may have consequences,
as enlarged arrays won't fit anymore where they were possibly located due to
the memory alignement constraints.

GCC adds 'nonstring' variable attribute for such char arrays, but clang and
other compilers don't have it. Let's wrap 'nonstring' with our
__nonstring macro, which will test if the compiler supports this attribute.

This fixes the issue #2910.
2025-03-31 13:50:28 +02:00
Willy Tarreau
6b17310757 MEDIUM: pools: be a bit smarter when merging comparable size pools
By default, pools of comparable sizes are merged together. However, the
current algorithm is dumb: it rounds the requested size to the next
multiple of 16 and compares the sizes like this. This results in many
entries which are already multiples of 16 not being merged, for example
1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are
separate (though 56 merges with 64).

This commit changes this to consider not just the entry size but also the
average entry size, that is, it compares the average size of all objects
sharing the pool with the size of the object looking for a pool. If the
object is not more than 1% bigger nor smaller than the current average
size or if it neither 16 bytes smaller nor larger, then it can be merged.
Also, it always respects exact matches in order to avoid merging objects
into larger pools or worse, extending existing ones for no reason, and
when there's a tie, it always avoids extending an existing pool.

Also, we now visit all existing pools in order to spot the best one, we
do not stop anymore at the smallest one large enough. Theoretically this
could cost a bit of CPU but in practice it's O(N^2) with N quite small
(typically in the order of 100) and the cost at each step is very low
(compare a few integer values). But as a side effect, pools are no
longer sorted by size, "show pools bysize" is needed for this.

This causes the objects to be much better grouped together, accepting to
use a little bit more sometimes to avoid fragmentation, without causing
everyone to be merged into the same pool. Thanks to this we're now
seeing 36 pools instead of 48 by default, with some very nice examples
of compact grouping:

  - Pool qc_stream_r (80 bytes) : 13 users
      >  qc_stream_r : size=72 flags=0x1 align=0
      >  quic_cstrea : size=80 flags=0x1 align=0
      >  qc_stream_a : size=64 flags=0x1 align=0
      >  hlua_esub   : size=64 flags=0x1 align=0
      >  stconn      : size=80 flags=0x1 align=0
      >  dns_query   : size=64 flags=0x1 align=0
      >  vars        : size=80 flags=0x1 align=0
      >  filter      : size=64 flags=0x1 align=0
      >  session pri : size=64 flags=0x1 align=0
      >  fcgi_hdr_ru : size=72 flags=0x1 align=0
      >  fcgi_param_ : size=72 flags=0x1 align=0
      >  pendconn    : size=80 flags=0x1 align=0
      >  capture     : size=64 flags=0x1 align=0

  - Pool h3s (56 bytes) : 17 users
      >  h3s         : size=56 flags=0x1 align=0
      >  qf_crypto   : size=48 flags=0x1 align=0
      >  quic_tls_se : size=48 flags=0x1 align=0
      >  quic_arng   : size=56 flags=0x1 align=0
      >  hlua_flt_ct : size=56 flags=0x1 align=0
      >  promex_metr : size=48 flags=0x1 align=0
      >  conn_hash_n : size=56 flags=0x1 align=0
      >  resolv_requ : size=48 flags=0x1 align=0
      >  mux_pt      : size=40 flags=0x1 align=0
      >  comp_state  : size=40 flags=0x1 align=0
      >  notificatio : size=48 flags=0x1 align=0
      >  tasklet     : size=56 flags=0x1 align=0
      >  bwlim_state : size=48 flags=0x1 align=0
      >  xprt_handsh : size=48 flags=0x1 align=0
      >  email_alert : size=56 flags=0x1 align=0
      >  caphdr      : size=41 flags=0x1 align=0
      >  caphdr      : size=41 flags=0x1 align=0

  - Pool quic_cids (32 bytes) : 13 users
      >  quic_cids   : size=16 flags=0x1 align=0
      >  quic_tls_ke : size=32 flags=0x1 align=0
      >  quic_tls_iv : size=12 flags=0x1 align=0
      >  cbuf        : size=32 flags=0x1 align=0
      >  hlua_queuew : size=24 flags=0x1 align=0
      >  hlua_queue  : size=24 flags=0x1 align=0
      >  promex_modu : size=24 flags=0x1 align=0
      >  cache_st    : size=24 flags=0x1 align=0
      >  spoe_appctx : size=32 flags=0x1 align=0
      >  ehdl_sub_tc : size=32 flags=0x1 align=0
      >  fcgi_flt_ct : size=16 flags=0x1 align=0
      >  sig_handler : size=32 flags=0x1 align=0
      >  pipe        : size=24 flags=0x1 align=0

  - Pool quic_crypto (1032 bytes) : 2 users
      >  quic_crypto : size=1032 flags=0x1 align=0
      >  requri      : size=1024 flags=0x1 align=0

  - Pool quic_conn_r (65544 bytes) : 2 users
      >  quic_conn_r : size=65536 flags=0x1 align=0
      >  dns_msg_buf : size=65540 flags=0x1 align=0

On a very unscientific test consisting in sending 1 million H1 requests
and 1 million H2 requests to the stats page, we're seeing an ~6% lower
memory usage with the patch:

  before the patch:
    Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches).

  after the patch:
    Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches).

This should be taken with care however since pools allocate and release
in batches.
2025-03-25 18:01:01 +01:00
Pierre-Andre Savalle
8ed1e91efd MEDIUM: lb-chash: add directive hash-preserve-affinity
When using hash-based load balancing, requests are always assigned to
the server corresponding to the hash bucket for the balancing key,
without taking maxconn or maxqueue into account, unlike in other load
balancing methods like 'first'. This adds a new backend directive that
can be used to take maxconn and possibly maxqueue in that context. This
can be used when hashing is desired to achieve cache locality, but
sending requests to a different server is preferable to queuing for a
long time or failing requests when the initial server is saturated.

By default, affinity is preserved as was the case previously. When
'hash-preserve-affinity' is set to 'maxqueue', servers are considered
successively in the order of the hash ring until a server that does not
have a full queue is found.

When 'maxconn' is set on a server, queueing cannot be disabled, as
'maxqueue=0' means unlimited.  To support picking a different server
when a server is at 'maxconn' irrespective of the queue,
'hash-preserve-affinity' can be set to 'maxconn'.
2025-03-25 18:01:01 +01:00
Amaury Denoyelle
cf9e40bd8a MINOR: quic: define max-stream-data configuration as a ratio 2025-03-25 16:30:35 +01:00
Amaury Denoyelle
68c10d444d MINOR: mux-quic: define config for max-data
Define a new global configuration tune.quic.frontend.max-data. This
allows users to explicitely set the value for the corresponding QUIC TP
initial-max-data, with direct impact on haproxy memory consumption.
2025-03-25 16:30:09 +01:00
Amaury Denoyelle
a71007c088 MINOR: quic: move global tune options into quic_tune
A new structure quic_tune has recently been defined. Its purpose is to
store global options related to QUIC. Previously, only the tunable to
toggle pacing was stored in it.

This commit moves several QUIC related tunable from global to quic_tune
structure. This better centralizes QUIC configuration option and gives
room for future generic options.
2025-03-24 10:01:46 +01:00
Willy Tarreau
9091c5317f MINOR: cli/pools: record the list of pool registrations even when merging them
By default, create_pool() tries to merge similar pools into one. But when
dealing with certain bugs, it's hard to say which ones were merged together.
We do have the information at registration time, so let's just create a
list of registrations ("pool_registration") attached to each pool, that
will store that information. It can then be consulted on the CLI using
"show pools detailed", where the names, sizes, alignment and flags are
reported.
2025-03-21 17:09:30 +01:00
Aurelien DARRAGON
7ec6f4412c MINOR: stats: add alt_name field to stat_col struct
alt_name will be used by metric exporters to know how the metric should be
presented to the user. If the alt_name is NULL, the metric should be
ignored. For now only promex exporter will make use of this.
2025-03-21 17:04:54 +01:00
Olivier Houchard
98967aa09f MEDIUM: mt_list: Reduce the max number of loops with exponential backoff
Reduce the max number of loops in the mt_list code while waiting for
a lock to be available with exponential backoff. It's been observed that
the current value led to severe performances degradation at least on
some hardware, hopefully this value will be acceptable everywhere.
2025-03-21 11:30:59 +01:00
Aurelien DARRAGON
af68343a56 MINOR: stats: use stat_col storage stat_cols_info
Use stat_col storage for stat_cols_info[] array instead of name_desc.

As documented in 65624876f ("MINOR: stats: introduce a more expressive
stat definition method"), stat_col supersedes name_desc storage but
it remains backward compatible. Here we migrate to the new API to be
able to further extend stat_cols_info[] in following patches.
2025-03-20 11:38:32 +01:00
Aurelien DARRAGON
9c60fc9fe1 MINOR: stats: STATS_PX_CAP___B_ macro
STATS_PX_CAP___B_ points to STATS_PX_CAP_BE, it is just an alias
for consistency, like STATS_PX_CAP____S which points to
STATS_PX_CAP_SRV.
2025-03-20 11:37:47 +01:00
Aurelien DARRAGON
3c1b00b127 MINOR: stats: add .generic explicit field in stat_col struct
Further extend logic implemented in 65624876 ("MINOR: stats: introduce a
more expressive stat definition method") and 4e9e8418 ("MINOR: stats:
prepare stats-file support for values other than FN_COUNTER"): we don't
rely anymore on the presence of the capability to know if the metric is
generic or not. This is because it prevents us from setting a capability
on static statistics. Yet it could be useful to set the capability even
on static metrics, thus we add a dedicated .generic bit to tell haproxy
that the metric is generic and can be handled automatically by the API.

Also, ME_NEW_* helpers are not explicitly associated to generic metric
definition (as it was already the case before) to avoid ambiguities.
It may change in the future as we may need to use the new definition
method to define static metrics (without the generic bit set). But for
now it isn't the case as this need definition was implemented for generic
metrics support in the first place. If we want to define static metrics
using the API, we could add a new set of helpers for instance.
2025-03-20 11:37:21 +01:00
William Lallemand
2fb6270910 MEDIUM: ssl/ckch: make the ckch_conf more generic
The ckch_store_load_files() function makes specific processing for
PARSE_TYPE_STR as if it was a type only used for paths.

This patch changes a little bit the way it's done,
PARSE_TYPE_STR is only meant to strdup() a string and stores the
resulting pointer in the ckch_conf structure.

Any processing regarding the path is now done in the callback.

Since the callbacks were basically doing the same thing, they were
transformed into the DECLARE_CKCH_CONF_LOAD() macros which allows to
do some templating of these functions.

The resulting ckch_conf_load_* functions will do the same as before,
except they will also do the path processing instead of letting
ckch_store_load_files() do it, which means we don't need the "base"
member anymore in the struct ckch_conf_kws.
2025-03-19 18:08:40 +01:00
William Lallemand
b0ad777902 MINOR: tools: path_base() concatenates a path with a base path
With the SSL configuration, crt-base, key-base are often used, these
keywords concatenates the base path with the path when the path does not
start by  '/'.

This is done at several places in the code, so a function to do this
would be better to standardize the code.
2025-03-19 17:59:31 +01:00
William Lallemand
29b4b985c3 MINOR: jws: use jwt_alg type instead of a char
This patch implements the function EVP_PKEY_to_jws_algo() which returns
a jwt_alg compatible with the private key.

This value can then be passed to jws_b64_protected() and
jws_b64_signature() which modified to take an jwt_alg instead of a char.
2025-03-17 18:06:34 +01:00
William Lallemand
de67f25a7e MINOR: jws: add new functions in jws.h
Add signatures of jws_b64_payload(), jws_b64_protected(),
jws_b64_signature(), jws_flattened() which allows to create a complete
JWS flattened object.
2025-03-17 11:51:52 +01:00
Willy Tarreau
156430ceb6 MINOR: cpu-topo: add a CPU policy setting to the global section
We'll need to let the user decide what's best for their workload, and in
order to do this we'll have to provide tunable options. For that, we're
introducing struct ha_cpu_policy which contains a name, a description
and a function pointer. The purpose will be to use that function pointer
to choose the best CPUs to use and now to set the number of threads and
thread-groups, that will be called during the thread setup phase. The
only supported policy for now is "none" which doesn't set/touch anything
(i.e. all available CPUs are used).
2025-03-14 18:33:16 +01:00
Willy Tarreau
c93ee25054 MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set
These are processed after the topology is detected, and they allow to
restrict binding to or evict CPUs matching the indicated node(s).
2025-03-14 18:33:16 +01:00
Willy Tarreau
aa4776210b MINOR: cpu-topo: create an array of the clusters
The goal here is to keep an array of the known CPU clusters, because
we'll use that often to decide of the performance of a cluster and
its relevance compared to other ones. We'll store the number of CPUs
in it, the total capacity etc. For the capacity, we count one unit
per core, and 1/3 of it per extra SMT thread, since this is roughly
what has been measured on modern CPUs.

In order to ease debugging, they're also dumped with -dc.
2025-03-14 18:30:31 +01:00
Willy Tarreau
4a6eaf6c5e MINOR: cpu-topo: add a function to sort by cluster+capacity
The purpose here is to detect heterogenous clusters which are not
properly reported, based on the exposed information about the cores
capacity. The algorithm here consists in sorting CPUs by capacity
within a cluster, and considering as equal all those which have 5%
or less difference in capacity with the previous one. This allows
large clusters of more than 5% total between extremities, while
keeping apart those where the limit is more pronounced. This is
quite common in embedded environments with big.little systems, as
well as on some laptops.
2025-03-14 18:30:31 +01:00
Willy Tarreau
d169758fa9 MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo
It's important that we don't leave unassigned IDs in the topology,
because the selection mechanism is based on index-based masks, so an
unassigned ID will never be kept. This is particularly visible on
systems where we cannot access the CPU topology, the package id, node id
and even thread id are set to -1, and all CPUs are evicted due to -1 not
being set in the "only-cpu" sets.

Here in new function "cpu_fixup_topology()", we assign them with the
smallest unassigned value. This function will be used to assign IDs
where missing in general.
2025-03-14 18:30:31 +01:00
Willy Tarreau
af648c7b58 MINOR: cpu-topo: assign clusters to cores without and renumber them
Due to the previous commit we can end up with cores not assigned
any cluster ID. For this, at the end we sort the CPUs by topology
and assign cluster IDs to remaining CPUs based on pkg/node/llc.
For example an 14900 now shows 5 clusters, one for the 8 p-cores,
and 4 of 4 e-cores each.

The local cluster numbers are per (node,pkg) ID so that any rule could
easily be applied on them, but we also keep the global numbers that
will help with thread group assignment.

We still need to force to assign distinct cluster IDs to cores
running on a different L3. For example the EPYC 74F3 is reported
as having 8 different L3s (which is true) and only one cluster.

Here we introduce a new function "cpu_compose_clusters()" that is called
from the main init code just after cpu_detect_topology() so that it's
not OS-dependent. It deals with this renumbering of all clusters in
topology order, taking care of considering any distinct LLC as being
on a distinct cluster.
2025-03-14 18:30:31 +01:00
Willy Tarreau
a4471ea56d MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID
This will be used to detect and fix incorrect setups which report
the same cluster ID for multiple L3 instances.

The arrangement of functions in this file is becoming a real problem.
Maybe we should move all this to cpu_topo for example, and better
distinguish OS-specific and generic code.
2025-03-14 18:30:31 +01:00
Willy Tarreau
a8acdbd9fd MINOR: cpu-topo: implement a sorting mechanism by CPU locality
Once we've kept only the CPUs we want, the next step will be to form
groups and these ones are based on locality. Thus we'll have to sort by
locality. For now the locality is only inferred by the index. No grouping
is made at this point. For this we add the "cpu_reorder_by_locality"
function with a locality-based comparison function.
2025-03-14 18:30:31 +01:00
Willy Tarreau
18133a054d MINOR: cpu-topo: implement a sorting mechanism for CPU index
CPU selection will be performed by sorting CPUs according to
various criteria. For dumps however, that's really not convenient
and we'll need to reorder the CPUs according to their index only.
This is what the new function cpu_reorder_by_index() does. It's
called  in thread_detect_count() before dumping the CPU topology.
2025-03-14 18:30:31 +01:00
Willy Tarreau
1af4942c95 MEDIUM: thread: start to detect thread groups and threads min/max
By mutually refining the thread count and group count, we can try
to detect the most suitable setup for the current machine. Taskset
is implicitly handled correctly. tgroups automatically adapt to the
configured number of threads. cpu-map manages to limit tgroups to
the smallest supported value.

The thread-limit is enforced. Just like in cfgparse, if the thread
count was forced to a higher value, it's reduced and a warning is
emitted. But if it was not set, the thr_max value is bound to this
limit so that further calculations respect it.

We continue to default to the max number of available threads and 1
tgroup by default, with the limit. This normally allows to get rid
of that test in check_config_validity().
2025-03-14 18:30:30 +01:00
Willy Tarreau
f0661e79fe MINOR: global: add a command-line option to enable CPU binding debugging
During development, everything related to CPU binding and the CPU topology
is debugged using state dumps at various places, but it does make sense to
have a real command line option so that this remains usable in production
to help users figure why some CPUs are not used by default. Let's add
"-dc" for this. Since the list of global.tune.options values is almost
full and does not 100% match this option, let's add a new "tune.debug"
field for this.
2025-03-14 18:30:30 +01:00
Willy Tarreau
ac1db9db7d MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable
The function is not convenient because it doesn't allow us to undo the
startup changes, and depending on where it's being used, we don't know
whether the values read have already been altered (this is not the case
right now but it's going to evolve).

Let's just compute the status during cpu_detect_usable() and set a
variable accordingly. This way we'll always read the init value, and
if needed we can even afford to reset it. Also, placing it in cpu_topo.c
limits cross-file dependencies (e.g. threads without affinity etc).
2025-03-14 18:30:30 +01:00
Willy Tarreau
7cb274439b MINOR: cpu-topo: add CPU topology detection for linux
This uses the publicly available information from /sys to figure the cache
and package arrangements between logical CPUs and fill ha_cpu_topo[], as
well as their SMT capabilities and relative capacity for those which expose
this. The functions clearly have to be OS-specific.
2025-03-14 18:30:30 +01:00
Willy Tarreau
8f72ce335a MINOR: cpu-topo: add detection of online CPUs on Linux
This adds a generic function ha_cpuset_detect_online() which for now
only supports linux via /sys. It fills a cpuset with the list of online
CPUs that were detected (or returns a failure).
2025-03-14 18:30:30 +01:00
Willy Tarreau
8c524c7c9d REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo
The cpuset files are normally used only for cpu manipulations. It happens
that the initial CPU binding detection was initially placed there since
there was no better place, but in practice, being OS-specific, it should
really be in cpu-topo. This simplifies cpuset which doesn't need to know
about the OS anymore.
2025-03-14 18:30:30 +01:00
Willy Tarreau
a6fdc3eaf0 MINOR: cpu-topo: update CPU topology from excluded CPUs at boot
Now before trying to resolve the thread assignment to groups, we detect
which CPUs are not bound at boot so that we can mark them with
HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we
can count later. Note that we purposely ignore cpu-map here as we
don't know how threads and groups will map to cpu-map entries, hence
which CPUs will really be used.

It's important to proceed this way so that when we have no info we
assume they're all available.
2025-03-14 18:30:30 +01:00
Willy Tarreau
bdb731172c MINOR: cpu-topo: add a function to dump CPU topology
The new function cpu_dump_topology() will centralize most debugging
calls, and it can make efforts of not dumping some possibly irrelevant
fields (e.g. non-existing cache levels).
2025-03-14 18:30:30 +01:00
Willy Tarreau
041462c4af MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus
We don't want to constantly deal with as many CPUs as a cpuset can hold,
so let's first try to trim the value to what the system claims to support
via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of
the cpuset size though. The value is stored globally so that we can
reuse it elsewhere after initialization.
2025-03-14 18:30:30 +01:00
Willy Tarreau
656cedad42 MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array.
This does the bare minimum to allocate and initialize a global
ha_cpu_topo array for the number of supported CPUs and release
it at deinit time.
2025-03-14 18:30:30 +01:00
Willy Tarreau
d165f5d3ab MINOR: cpu-topo: add ha_cpu_topo definition
This structure will be used to store information about each CPU's
topology (package ID, L3 cache ID, NUMA node ID etc). This will be used
in conjunction with CPU affinity setting to try to perform a mostly
optimal binding between threads and CPU numbers by default. Since it
was noticed during tests that absolutely none of the many machines
tested reports different die numbers, the die_id is not stored.
Also, it was found along experiments that the cluster ID will be used
a lot, half of the time as a node-local identifier, and half of the
time as a global identifier. So let's store the two versions at once
(cl_gid, cl_lid).

Some flags are added to indicate causes of exclusion (offline, excluded
at boot, excluded by rules, ignored by policy).
2025-03-14 18:30:30 +01:00
Willy Tarreau
69ac4cd315 MINOR: compiler: add a new __decl_thread_var() macro to declare local variables
__decl_thread() already exists but is more suited for struct members.
When using it in a variables block, it appends the final trailing
semi-colon which is a statement that ends the variable block. Better
clean this up and have one precisely for variable blocks. In this
case we can simply define an unused enum value that will consume the
semi-colon. That's what the new macro __decl_thread_var() does.
2025-03-12 18:08:12 +01:00
Willy Tarreau
bb4addabb7 MINOR: compiler: add a simple macro to concatenate resolved strings
It's often useful to be able to concatenate strings after resolving
them (e.g. __FILE__, __LINE__ etc). Let's just have a CONCAT() macro
to do that, which calls _CONCAT() with the same arguments to make
sure the contents are resolved before being concatenated.
2025-03-12 18:06:55 +01:00
Aurelien DARRAGON
003fe530ae MINOR: log: add "option host" log-forward option
add only the parsing part, options are currently unused
2025-03-12 10:51:35 +01:00
Aurelien DARRAGON
47f14be9f3 MINOR: tools: only print address in sa2str() when port == -1
Support special value for port in sa2str: if port is equal to -1, only
print the address without the port, also ignoring <map_ports> value.
2025-03-12 10:51:20 +01:00
Aurelien DARRAGON
bc76f6dde9 MINOR: log: migrate log-forward options from proxy->options2 to options3
Migrate recently added log-forward section options, currently stored under
proxy->options2 to proxy->options3 since proxy->options2 is running out of
space and we plan on adding more log-forward options.
2025-03-12 10:50:03 +01:00
Aurelien DARRAGON
cc5a66212d MINOR: proxy: add proxy->options3
proxy->options2 is almost full, yet we will add new log-forward options
in upcoming patches so we anticipate that by adding a new {no_}options3
and cfg_opts3[] to further extend proxy options
2025-03-12 10:49:36 +01:00
Amaury Denoyelle
dc7913d814 MAJOR: mux-quic: increase stream flow-control for multi-buffer alloc
Support for multiple Rx buffers per QCS instance has been introduced by
previous patches. However, due to flow-control initial values, client
were still unable to fully used this to increase their upload
throughput.

This patch increases max-stream-data-bidi-remote flow-control initial
values. A new define QMUX_STREAM_RX_BUF_FACTOR will fix the number of
concurrent buffers allocable per QCS. It is set to 90.

Note that connection flow-control initial value did not changed. It is
still configured to be equivalent to bufsize multiplied by the maximum
concurrent streams. This ensures that Rx buffers allocation is still
constrained per connection, so that it won't be possible to have all
active QCS instances using in parallel their maximum Rx buffers count.
2025-03-07 12:06:27 +01:00
Amaury Denoyelle
a4f31ffeeb MINOR: mux-quic: store QCS Rx buf in a single-entry tree
Convert QCS rx buffer pointer to a tree container. Additionnaly, offset
field of qc_stream_rxbuf is thus transformed into a node tree.

For now, only a single Rx buffer is stored at most in QCS tree. Multiple
Rx buffers will be implemented in a future patch to improve QUIC clients
upload throughput.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
cc3c2d1f12 MINOR: mux-quic: define rxbuf wrapper
Define a new type qc_stream_rxbuf. This is used as a wrapper around QCS
Rx buffer with encapsulation of the ncbuf storage. It is allocated via a
new pool. Several functions are adapted to be able to deal with
qc_stream_rxbuf as a wrapper instead of the previous plain ncbuf
instance.

No functional change should happen with this patch. For now, only a
single qc_stream_rxbuf can be instantiated per QCS. However, this new
type will be useful to implement multiple Rx buffer storage in a future
commit.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
4b1e63d191 MINOR: mux-quic: define globally stream rxbuf size
QCS uses ncbuf for STREAM data storage. This serves as a limit for
maximum STREAM buffering capacity, advertised via QUIC transport
parameters for initial flow-control values.

Define a new function qmux_stream_rx_bufsz() which can be used to
retrieve this Rx buffer size. This can be used both in MUX/H3 layers and
in QUIC transport parameters.
2025-03-07 12:06:26 +01:00
Amaury Denoyelle
861b11334c MINOR: h3/hq-interop: restore function for standalone FIN receive
Previously, a function qcs_http_handle_standalone_fin() was implemented
to handle a received standalone FIN, bypassing app_ops layer decoding.
However, this was removed as app_ops layer interaction is necessary. For
example, HTTP/3 checks that FIN is never sent on the control uni stream.

This patch reintroduces qcs_http_handle_standalone_fin(), albeit in a
slightly diminished version. Most importantly, it is now the
responsibility of the app_ops layer itself to use it, to avoid the
shortcoming described above.

The main objective of this patch is to be able to support standalone FIN
in HTTP/0.9 layer. This is easily done via the reintroduction of
qcs_http_handle_standalone_fin() usage. This will be useful to perform
testing, as standalone FIN is a corner case which can easily be broken.
2025-03-07 12:06:26 +01:00
Valentine Krasnobaeva
e900ef987e BUG/MEIDUM: startup: return to initial cwd only after check_config_validity()
In check_config_validity() we evaluate some sample fetch expressions
(log-format, server rules, etc). These expressions may use external files like
maps.

If some particular 'default-path' was set in the global section before, it's no
longer applied to resolve file pathes in check_config_validity(). parse_cfg()
at the end of config parsing switches back to the initial cwd.

This fixes the issue #2886.

This patch should be backported in all stable versions since 2.4.0, including
2.4.0.
2025-03-06 10:49:48 +01:00
Roberto Moreda
f98b5c4f59 MINOR: log: add dont-parse-log and assume-rfc6587-ntf options
This commit introduces the dont-parse-log option to disable log message
parsing, allowing raw log data to be forwarded without modification.

Also, it adds the assume-rfc6587-ntf option to frame log messages
using only non-transparent framing as per RFC 6587. This avoids
missparsing in certain cases (mainly with non RFC compliant messages).

The documentation is updated to include details on the new options and
their intended use cases.

This feature was discussed in GH #2856
2025-03-06 09:30:39 +01:00
Aurelien DARRAGON
0746f6bde0 MINOR: cfgparse-listen: add and use cfg_parse_listen_match_option() helper
cfg_parse_listen_match_option() takes cfg_opt array as parameter, as well
current args, expected mode and cap bitfields.

It is expected to be used under cfg_parse_listen() function or similar.
Its goal is to remove code duplication around proxy->options and
proxy->options2 handling, since the same checks are performed for the
two. Also, this function could help to evaluate proxy options for
mode-specific proxies such as log-forward section for instance:
by giving the expected mode and capatiblity as input, the function
would only match compatible options.
2025-03-06 09:30:18 +01:00
Aurelien DARRAGON
d9aa199100 MINOR: proxy: make pr_mode enum bitfield compatible
Current pr_mode enum is a regular enum because a proxy only supports one
mode at a time. However it can be handy for a function to be given a
list of compatible modes for a proxy, and we can't do that using a
bitfield because pr_mode is not bitfield compatible (values share
the same bits).

In this patch we manually define pr_mode values so that they are all
using separate bits and allows a function to take a bitfield of
compatible modes as parameter.
2025-03-06 09:30:11 +01:00
Olivier Houchard
335ef3264b DEBUG: init: Add a macro to register unit tests
Add a new macro, REGISTER_UNITTEST(), that will automatically make sure
we call hap_register_unittest(), instead of having to create a function
that will do so.
2025-03-04 18:18:10 +01:00
William Lallemand
a647839954 DEBUG: init: add a way to register functions for unit tests
Doing unit tests with haproxy was always a bit difficult, some of the
function you want to test would depend on the buffer or trash buffer
initialisation of HAProxy, so building a separate main() for them is
quite hard.

This patch adds a way to register a function that can be called with the
"-U" parameter on the command line, will be executed just after
step_init_1() and will exit the process with its return value as an exit
code.

When using the -U option, every keywords after this option is passed to
the callback and could be used as a parameter, letting the capability to
handle complex arguments if required by the test.

HAProxy need to be built with DEBUG_UNIT to activate this feature.
2025-03-03 12:43:32 +01:00
William Lallemand
4dc0ba233e MINOR: jws: implement a JWK public key converter
Implement a converter which takes an EVP_PKEY and converts it to a
public JWK key. This is the first step of the JWS implementation.

It supports both EC and RSA keys.

Know to work with:

- LibreSSL
- AWS-LC
- OpenSSL > 1.1.1
2025-03-03 12:43:32 +01:00
Willy Tarreau
730641f7ca BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send hedaer
As reported in issue #2882, using "no-send-proxy-v2" on a server line does
not properly disable the use of proxy-protocol if it was enabled in a
default-server directive in combination with other PP options. The reason
for this is that the sending of a proxy header is determined by a test on
srv->pp_opts without any distinction, so disabling PPv2 while leaving other
options results in a PPv1 header to be sent.

Let's fix this by explicitly testing for the presence of either send-proxy
or send-proxy-v2 when deciding to send a proxy header.

This can be backported to all versions. Thanks to Andre Sencioles (@asenci)
for reporting the issue and testing the fix.
2025-03-03 04:05:47 +01:00
Olivier Houchard
706b008429 MEDIUM: servers: Add strict-maxconn.
Maxconn is a bit of a misnomer when it comes to servers, as it doesn't
control the maximum number of connections we establish to a server, but
the maximum number of simultaneous requests. So add "strict-maxconn",
that will make it so we will never establish more connections than
maxconn.
It extends the meaning of the "restricted" setting of
tune.takeover-other-tg-connections, as it will also attempt to get idle
connections from other thread groups if strict-maxconn is set.
2025-02-26 13:00:18 +01:00
Olivier Houchard
8de8ed4f48 MEDIUM: connections: Allow taking over connections from other tgroups.
Allow haproxy to take over idle connections from other thread groups
than our own. To control that, add a new tunable,
tune.takeover-other-tg-connections. It can have 3 values, "none", where
we won't attempt to get connections from the other thread group (the
default), "restricted", where we only will try to get idle connections
from other thread groups when we're using reverse HTTP, and "full",
where we always try to get connections from other thread groups.
Unless there is a special need, it is advised to use "none" (or
restricted if we're using reverse HTTP) as using connections from other
thread groups may have a performance impact.
2025-02-26 13:00:18 +01:00
Olivier Houchard
c36aae2af1 MINOR: pollers: Add a fixup_tgid_takeover() method.
Add a fixup_tgid_takeover() method to pollers for which it makes sense
(epoll, kqueue and evport). That method can be called after a takeover
of a fd from a different thread group, to make sure the poller's
internal structure reflects the new state.
2025-02-26 13:00:18 +01:00
Olivier Houchard
c5cc09c00d MINOR: fd: Add fd_lock_tgid_cur().
Add fd_lock_tgid_cur(), a function that will lock the tgid, without
modifying its value.
2025-02-26 13:00:18 +01:00
Olivier Houchard
52b97ff8dd MEDIUM: fd: Wait if locked in fd_grab_tgid() and fd_take_tgid().
Wait while the tgid is locked in fd_grab_tgid() and fd_take_tgid().
As that lock is barely used, it should have no impact.
2025-02-26 13:00:18 +01:00
Willy Tarreau
fb7874c286 MINOR: tinfo: split the signal handler report flags into 3
While signals are not recursive, one signal (e.g. wdt) may interrupt
another one (e.g. debug). The problem this causes is that when leaving
the inner handler, it removes the outer's flag, hence the protection
that comes with it. Let's just have 3 distinct flags for regular signals,
debug signal and watchdog signal. We add a 4th definition which is an
aggregate of the 3 to ease testing.
2025-02-24 13:37:52 +01:00
Vincent Dechenaux
9011b3621b MINOR: compression: Introduce minimum size
This is the introduction of "minsize-req" and "minsize-res".
These two options allow you to set the minimum payload size required for
compression to be applied.
This helps save CPU on both server and client sides when the payload does
not need to be compressed.
2025-02-22 11:32:40 +01:00
Willy Tarreau
29e246a84c MINOR: freq_ctr: provide non-blocking read functions
Some code called by the debug handlers in the context of a signal handler
accesses to some freq_ctr and occasionally ends up on a locked one from
the same thread that is dumping it. Let's introduce a non-blocking version
that at least allows to return even if the value is in the process of being
updated, it's less problematic than hanging.
2025-02-21 18:26:29 +01:00