haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-05-28 04:12:17 -04:00

Author	SHA1	Message	Date
Amaury Denoyelle	57e9425dbc	MINOR: session: strengthen idle conn limit check Add a BUG_ON() on session_check_idle_conn() to ensure the connection is not already flagged as CO_FL_SESS_IDLE. This checks that this function is only called one time per connection transition from active to idle. This is necessary to ensure that session idle counter is only incremented one time per connection.	2025-07-30 11:40:16 +02:00
Amaury Denoyelle	ec1ab8d171	MINOR: session: remove redundant target argument from session_add_conn() session_add_conn() takes three arguments: the connection and session instances, plus a void pointer labelled as target. Typically, it represents the server, but can also be a backend instance (for example on dispatch). In fact, this argument is redundant as <target> is already a member of the connection. This commit simplifies session_add_conn() by removing it. A BUG_ON() on target is extended to ensure it is never NULL.	2025-07-30 11:39:57 +02:00
Amaury Denoyelle	668c2cfb09	MINOR: session: strengthen connection attach to session This commit is the first one of a series to refactor insertion of backend private connection into the session list. session_add_conn() is used to attach a connection into a session list. Previously, this function would report an error if the connection specified was already attached to another session. However, this case currently never happens and thus can be considered as buggy. Remove this check and replace it with a BUG_ON(). This ensures that session insertion remains consistent. The same check is also transformed in session_check_idle_conn().	2025-07-30 11:39:26 +02:00
Aurelien DARRAGON	14966c856b	MINOR: clock: make global_now_ns a pointer as well Similar to previous commit but for global_now_ns	2025-07-29 18:04:15 +02:00
Aurelien DARRAGON	4a20b3835a	MINOR: clock: make global_now_ms a pointer This is preparation work for shared counters between co-processes. As co-processes will need to share a common date. global_now_ms will be used for that as it will point to the shm when sharing is enabled. Thus in this patch we turn global_now_ms into a pointer (and adjust the places where it is written to and read from, hopefully atomic operations through pointer are already used so the change is trivial) For now global_now_ms points to process-local _global_now_ms which is a fallback for when sharing through the shm is not enabled.	2025-07-29 18:04:14 +02:00
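The pointer indirection described in the commit above can be illustrated with a minimal sketch. It is not HAProxy's code: the names `local_now_ms`, `now_ms_ptr`, `clock_use_shared()`, `clock_set_ms()` and `clock_get_ms()` are hypothetical, and C11 atomics stand in for HAProxy's own atomic macros. Readers and writers only go through the pointer, which defaults to a process-local fallback and could later be redirected to a shared area.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Process-local fallback, used while no shared area is configured. */
static _Atomic uint32_t local_now_ms;

/* All accesses go through this pointer (hypothetical name). */
static _Atomic uint32_t *now_ms_ptr = &local_now_ms;

/* Hypothetical hook: make the clock point to a shared location. */
static void clock_use_shared(_Atomic uint32_t *shared)
{
    now_ms_ptr = shared;
}

/* Atomic store through the pointer, wherever it points. */
static void clock_set_ms(uint32_t ms)
{
    atomic_store(now_ms_ptr, ms);
}

/* Atomic load through the pointer, wherever it points. */
static uint32_t clock_get_ms(void)
{
    return atomic_load(now_ms_ptr);
}
```

Because callers never name the storage directly, switching from the local fallback to a shared location would only need the pointer to change.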
Aurelien DARRAGON	713ebd2750	CLEANUP: counters: rename counters_be_shared_init to counters_be_shared_prepare `75e480d10` ("MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct") took care of renaming counters_fe_shared_init() but we forgot counters_be_shared_init(). Let's fix that for consistency	2025-07-29 18:00:13 +02:00
William Lallemand	83a335f925	MINOR: acme: implement traces Implement traces for the ACME protocol. -dt acme:data:complete will dump every input and output buffers, including decoded buffers before being converted to JWS. It will also dump certificates in the traces. -dt acme:user:complete will only dump the state of the task handler.	2025-07-29 17:25:10 +02:00
Aurelien DARRAGON	c24de077bd	OPTIM: stats: store fast sharded counters pointers at session and stream level Following commit `75e480d10` ("MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct"), in order to minimize the impact of the recent sharded counters work, we try to push things a bit further in this patch by storing and using "fast" pointers at the session and stream levels when available to avoid costly indirections and systematic "tgid" resolution (which can not be cached by the CPU due to its THREAD-local nature). Indeed, we know that a session/stream is tied to a given CPU, thanks to this we know that the tgid for a given session/stream will never change. Given that, we are able to store sharded frontend and listener counters pointer at the session level (namely sess->fe_tgcounters and sess->li_tgcounters), and once the backend and the server are selected, we are also able to store backend and server sharded counters pointer at the stream level (namely s->be_tgcounters and s->sv_tgcounters) Everywhere we rely on these counters and the stream or session context is available, we use the fast pointers instead of the indirect pointers path to make the pointer resolution a bit faster. This optimization proved to bring a few percent back, and together with the previous `75e480d10` commit we have now fixed the performance regression (we are back on par with 3.2 stats performance)	2025-07-25 18:24:23 +02:00
Aurelien DARRAGON	cf8ba60c88	CLEANUP: peers: remove unused peer_session_target() Since commit `7293eb68` ("MEDIUM: peers: use server as stream target") peer session target always point to server in order to benefit from existing server transport options. Thanks to that, it is no longer necessary to have peer_session_target() helper function, because all it does is return the pointer to the server object. Let's get rid of that	2025-07-25 18:24:17 +02:00
Ben Kallus	1e48ec7f6c	CLEANUP: include: replace hand-rolled offsetof to avoid UB The C standard specifies that it's undefined behavior to dereference NULL (even if you use & right after). The hand-rolled offsetof idiom &(((s)NULL)->f) is thus technically undefined. This clutters the output of UBSan and is simple to fix: just use the real offsetof when it's available. Note that there's no clear statement about this point in the spec, only several points which together converge to this: - From N3220, 6.5.3.4: A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue. - From N3220, 6.3.2.1: An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined. - From N3220, 6.5.4.4 p3: The unary & operator yields the address of its operand. If the operand has type "type", the result has type "pointer to type". If the operand is the result of a unary operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. => In short, this is saying that C guarantees these identities: 1. &(p) is equivalent to p 2. &(p[n]) is equivalent to p + n As a consequence, &(p) doesn't result in the evaluation of *p, only the evaluation of p (and similar for []). There is no corresponding special carve-out for ->. 
See also: https://pvs-studio.com/en/blog/posts/cpp/0306/ After this patch, HAProxy can run without crashing after building w/ clang-19 -fsanitize=undefined -fno-sanitize=function,alignment	2025-07-25 17:54:32 +02:00
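As an illustration of the fix described in the commit above, the following sketch uses the standard `offsetof` from `<stddef.h>` instead of the hand-rolled null-pointer idiom. The `struct sample`, the `MEMBER_OFFSET` and `CONTAINER_OF` macros and the `sample_from_value()` helper are hypothetical names chosen for this example, not HAProxy's own definitions. The byte arithmetic goes through `char *` rather than `void *`.

```c
#include <stddef.h>

struct sample {
    char tag;
    int  value;
    char name[8];
};

/* Standard, well-defined way to get a member offset. The hand-rolled
 * form ((size_t)&(((type *)NULL)->field)) is what UBSan complains about.
 */
#define MEMBER_OFFSET(type, field) offsetof(type, field)

/* Recover the enclosing structure from a pointer to one of its members.
 * The subtraction is done on a char pointer, not a void pointer.
 */
#define CONTAINER_OF(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Hypothetical helper: map a pointer to <value> back to its struct sample. */
static struct sample *sample_from_value(int *v)
{
    return CONTAINER_OF(v, struct sample, value);
}
```

Given `struct sample s;`, calling `sample_from_value(&s.value)` returns `&s` without ever forming an lvalue through a null pointer.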
Ben Kallus	d3b46cca7b	CLEANUP: compiler: prefer char * over void * for pointer arithmetic This patch changes two instances of pointer arithmetic on void * to use char * instead, to avoid UB. This is essentially to please UB analyzers, though.	2025-07-25 17:54:32 +02:00
Aurelien DARRAGON	75e480d107	MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct Between 3.2 and 3.3-dev we noticed a significant performance regression due to stats handling. After bisecting, Willy found out that recent work to split stats computing across multiple thread groups (stats sharding) was responsible for that performance regression. We're looking at roughly 20% performance loss. More precisely, it is the added indirections, multiplied by the number of statistics that are updated for each request, which in the end causes a significant amount of time being spent resolving pointers. We noticed that the fe_counters_shared and be_counters_shared structures which are currently allocated in dedicated memory since `a0dcab5c` ("MAJOR: counters: add shared counters base infrastructure") are no longer huge since `16eb0fab31` ("MAJOR: counters: dispatch counters over thread groups") because they now essentially hold flags plus the per-thread group id pointer mapping, not the counters themselves. As such we decided to try merging fe_counters_shared and be_counters_shared in their parent structures. The cost is slight memory overhead for the parent structure, but it gets rid of one pointer indirection. This patch alone yields visible performance gains and almost restores 3.2 stats performance. counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and now returns either failure or success instead of a pointer because we don't need to retrieve a shared pointer anymore, the function takes care of initializing existing pointer.	2025-07-25 16:46:10 +02:00
Christopher Faulet	b8d5307bd9	MEDIUM: applet: Emit a warning when a legacy applet is spawned To motivate developers to support the new applets API, a warning is now emitted when a legacy applet is spawned. To not flood users, this warning is only emitted once per legacy applet. To do so, the applet flag APPLET_FL_WARNED was added. It is set when the warning is emitted. Note that test and set on this flag are not performed via atomic operations. So it is possible to have more than one warning for a given applet if it is spawned at the same time on several threads. At worst, there is one warning per thread.	2025-07-25 15:53:33 +02:00
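The warn-once behaviour described in the commit above can be sketched as follows. This is an illustration only: `struct demo_applet`, `DEMO_FL_WARNED` and `demo_warn_legacy_once()` are hypothetical stand-ins, not the real applet structure or APPLET_FL_WARNED handling. The test and the set are deliberately plain (non-atomic) operations, which is why several threads racing on the same applet could each emit the warning once.

```c
#include <stdio.h>

#define DEMO_FL_WARNED 0x1u   /* hypothetical "warning already emitted" flag */

struct demo_applet {
    const char *name;
    unsigned int flags;
};

/* Emit the legacy-API warning at most once per applet (per thread in
 * the worst case, since the flag is not updated atomically).
 * Returns 1 if a warning was emitted, 0 if it was already done.
 */
static int demo_warn_legacy_once(struct demo_applet *a)
{
    if (a->flags & DEMO_FL_WARNED)
        return 0;
    a->flags |= DEMO_FL_WARNED;   /* plain test-and-set, no atomics */
    fprintf(stderr, "warning: applet '%s' uses the legacy API\n", a->name);
    return 1;
}
```

The trade-off is a possible duplicate message under contention in exchange for no atomic operation on the spawn path.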
Christopher Faulet	337768656b	MINOR: applet: Add support for flags on applets with a flag about the new API A new field was added in the applet structure to be able to set flags on the applets The first one is related to the new API. APPLET_FL_NEW_API is set for applets based on the new API. It was set on all HAProxy's applets.	2025-07-25 15:44:02 +02:00
Christopher Faulet	1f9a1cbefc	MINOR: applet: Improve applet API to take care of inbuf/outbuf alloc failures applet_get_inbuf() and applet_get_outbuf() functions were not testing if the buffers were available. So, the caller had to check them before calling one of these functions. It is not really handy. So now, these functions take care to have a fully usable buffer before returning. Otherwise NULL is returned.	2025-07-24 12:13:41 +02:00
Christopher Faulet	44aae94ab9	MINOR: applet: Add HTX versions for applet_input_data() and applet_output_room() It will be useful for HTX applets because available data in the input buffer and available space in the output buffer are computed from the HTX message and not the buffer itself. So now, applet_htx_input_data() and applet_htx_output_room() functions can be used.	2025-07-24 12:13:41 +02:00
Christopher Faulet	d9855102cf	BUG/MEDIUM: Remove sync sends from streams to applets When the applet API was reviewed to use dedicated buffers, the support for sends from the streams to applets was added. Unfortunately, it was not a good idea because this way it is possible to deliver data to an applet and release it just after, truncating data. Indeed, the release stage for applets is related to the stream release itself. However, unlike the multiplexers, the applets cannot outlive a stream for now. So, for now, the sync sends from the streams are removed for applets, waiting for a better way to handle the applets release stage. Note that this only concerns applets using their own buffers. As of now, the bug is harmless because all refactored applets are on server side and consume data first. But this will be an issue with the HTTP client. This patch should be backported as far as 3.0 after a period of observation.	2025-07-24 12:13:41 +02:00
Christopher Faulet	574d0d8211	BUG/MINOR: applet: Fix applet_getword() to not return one extra byte applet_getword() function is returning one extra byte when a string is returned because the "ret" variable is not reset before the loop on the data. The patch also fixes applet_getline(). It is a 3.3-specific issue. No need to backport.	2025-07-24 12:13:41 +02:00
Christopher Faulet	41a40680ce	BUG/MEDIUM: stconn: Fix conditions to know an applet can get data from stream sc_is_send_allowed() function is used to know if an applet is able to receive data from the stream. But this function was designed for applets using the channels buffer. It is not adapted to applets using their own buffers. When the SE_FL_WAIT_DATA flag is set, it means the applet is waiting for more data and should not be woken up without new data. For applets using channels buffer, just testing the flag is enough because process_stream() will remove it when more data is available. For applets using their own buffers, it is more complicated. Some data may be blocked in the output channel buffer. In that case, and when the applet input buffer can receive data, the applet can be woken up. This patch must be backported as far as 3.0 after a period of observation.	2025-07-24 12:13:41 +02:00
Christopher Faulet	0d371d2729	BUG/MEDIUM: applet: State inbuf is no longer full if input data are skipped When data are skipped from the input buffer of an applet, we must take care to notify the input buffer is no longer full. Otherwise, this could prevent the stream to push data to the applet. It is 3.3-specific. No backport needed.	2025-07-24 12:13:41 +02:00
Ilia Shipitsin	a2267fafcf	CLEANUP: acme: fix wrong spelling of "resources" "ressources" was used as a variable name, let's use English variant to make spell check happier	2025-07-24 08:11:42 +02:00
Amaury Denoyelle	3bf37596ba	MINOR: mux-quic: store session in QCS instance Add a new <sess> member into QCS structure. It is used to store the parent session of the stream on attach operation. This is only done for backend side. This new member will become necessary when connection reuse will be implemented. <owner> member of connection is not suitable as it could be set to NULL, notably after a session_add_conn() failure. Also, a single BE conn can be shared across different session instances, in particular when using aggressive/always reuse mode. Thus it is necessary to link each QCS instance with its session.	2025-07-23 15:42:37 +02:00
Remi Tricot-Le Breton	8f2b787241	MINOR: ssl: Add curves in ssl traces Dump the ClientHello curves in the SSL traces.	2025-07-21 16:44:50 +02:00
Remi Tricot-Le Breton	d799a1b3b2	MINOR: ssl: Add curve id to curve name table and mapping functions The SSL libraries like OpenSSL for instance do not seem to actually provide a public mapping between IANA defined curve IDs and curve names, or even a mapping between curve IDs and internal NIDs. This new table regroups all those information in a single table so that we can convert curve names (be it SECG or NIST format) to curve IDs or NIDs. The previously existing 'curves2nid' function now uses the new table, and a new 'curveid2str' one is added.	2025-07-21 16:44:50 +02:00
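A table of the kind described in the commit above might look like the sketch below. It is illustrative only: the structure, the `demo_curve_id2str()` and `demo_curve_name2id()` helpers and the table contents are assumptions, not the table added by the commit, and the NID column is left out to keep the example free of any OpenSSL dependency. The IDs shown are the IANA "TLS Supported Groups" values for these curves.

```c
#include <stddef.h>
#include <string.h>

struct demo_curve {
    unsigned short id;   /* IANA supported group ID */
    const char *secg;    /* SECG-style name */
    const char *nist;    /* NIST-style name, NULL if none */
};

static const struct demo_curve demo_curves[] = {
    { 23, "secp256r1", "P-256" },
    { 24, "secp384r1", "P-384" },
    { 25, "secp521r1", "P-521" },
    { 29, "x25519",    NULL    },
    { 30, "x448",      NULL    },
};

#define DEMO_NB_CURVES (sizeof(demo_curves) / sizeof(demo_curves[0]))

/* Return the SECG name for a curve ID, or NULL if unknown. */
static const char *demo_curve_id2str(unsigned short id)
{
    size_t i;

    for (i = 0; i < DEMO_NB_CURVES; i++)
        if (demo_curves[i].id == id)
            return demo_curves[i].secg;
    return NULL;
}

/* Return the curve ID for a SECG or NIST name, or 0 if unknown. */
static unsigned short demo_curve_name2id(const char *name)
{
    size_t i;

    for (i = 0; i < DEMO_NB_CURVES; i++) {
        if (strcmp(demo_curves[i].secg, name) == 0)
            return demo_curves[i].id;
        if (demo_curves[i].nist && strcmp(demo_curves[i].nist, name) == 0)
            return demo_curves[i].id;
    }
    return 0;
}
```

Keeping both naming schemes in one row is what lets a single lookup accept either "secp256r1" or "P-256".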
Remi Tricot-Le Breton	f00d9bf12d	MINOR: ssl: Add ciphers in ssl traces Decode the contents of the ClientHello ciphers extension and dump a human readable list in the ssl traces.	2025-07-21 16:44:50 +02:00
Frederic Lecaille	14d0f74052	MINOR: quic: Remove pool_head_quic_be_cc_buf pool This patch impacts the QUIC frontends. It reverts this patch MINOR: quic-be: add a "CC connection" backend TX buffer pool which adds <pool_head_quic_be_cc_buf> new pool to allocate CC (connection closed state) TX buffers with bigger object size than the one for <pool_head_quic_cc_buf>. Indeed the QUIC backends must be able to send at least 1200 bytes Initial packets. From now on, both the QUIC frontends and backends use the same pool with MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU) (1252 bytes) as object size.	2025-07-17 19:33:21 +02:00
Valentine Krasnobaeva	9e11c852fe	MINOR: cpu-topo: write thread-cpu bindings into trash buffer Write thread-cpu bindings and cluster summary into provided trash buffer. Like this we can call this function in any place, when this info is needed.	2025-07-17 19:07:58 +02:00
Valentine Krasnobaeva	2405283230	MINOR: cpu-topo: split cpu_dump_topology() to show its summary in show dev cpu_dump_topology() prints details about each enabled CPU and a summary with clusters info and thread-cpu bindings. The latter is often useful for debugging and we want to add it in the 'show dev' output. So, let's split cpu_dump_topology() in two parts: cpu_topo_debug() to print the details about each enabled CPU; and cpu_topo_dump_summary() to print only the summary. In the next commit we will modify cpu_topo_dump_summary() to write into local trash buffer and it could be easily called from debug_parse_cli_show_dev().	2025-07-17 19:07:46 +02:00
Willy Tarreau	b6d0ecd258	DOC: connection: explain the rules for idle/safe/avail connections It's super difficult to find the rules that operate idle conns depending on their idle/safe/avail/private status. Some are in lists, others not. Some are in trees, others not. Some have a flag set, others not. This documents the rules before the definitions in connection-t.h. It could even be backported to help during backport sessions.	2025-07-16 18:53:57 +02:00
Frederic Lecaille	838024e07e	MINOR: quic: Get rid of qc_is_listener() Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to objt_listener() (resp. objt_server()). Remove the qc_is_listener() implementation and QUIC_FL_CONN_LISTENER, the flag it relied on.	2025-07-16 16:42:21 +02:00
Christopher Faulet	4f7c26cbb3	BUG/MINOR: applet: Don't trigger BUG_ON if the tid is not on appctx init When an appctx is initialized, there is a BUG_ON() to be sure the appctx is really initialized on the right thread to avoid bugs on the thread affinity. However, it is possible to not choose the thread when the appctx is created and let it start on any thread. In that case, the thread affinity is set when the appctx is initialized. So, we must take care to not trigger the BUG_ON() in that case. For now, we never hit the bug because the thread affinity is always set during the appctx creation. This patch must be backported as far as 2.8.	2025-07-16 13:47:33 +02:00
Amaury Denoyelle	63586a8ab4	BUG/MINOR: h3: properly handle interim response on BE side On backend side, H3 layer is responsible to decode a HTTP/3 response into an HTX message. Multiple responses may be received on a single stream with interim status codes prior to the final one. h3_resp_headers_to_htx() is the function used solely on backend side responsible for H3 response to HTX transcoding. This patch extends it to be able to properly support interim responses. When such a response is received, the new flag H3_SF_RECV_INTERIM is set. This is converted to QMUX qcs flag QC_SF_EOI_SUSPENDED. The objective of this latter flag is to prevent stream EOI to be reported during stream rcv_buf callback, even if HTX message contains EOM and is empty. QC_SF_EOI_SUSPENDED will be cleared when the final response is finally converted, which unblock stream EOI notification for next rcv_buf invocations. Note however that HTX EOM is untouched : it is always set for both interim and final response reception. As a minor adjustment, HTX_SL_F_BODYLESS is always set for interim responses. Contrary to frontend interim response handling, a flag is necessary on QMUX layer. This is because H3 to HTX transcoding and rcv_buf callback are two distinct operations, called under different context (MUX vs stream tasklet). Also note that H3 layer has two distinct flags for interim response handling, one only used as a server (FE side) and the other as a client (BE side). It was preferred to used two distinct flags which is considered less error-prone, contrary to a single unified flag which would require to always set the proxy side to ensure it is relevant or not. No need to backport.	2025-07-15 18:39:23 +02:00
Amaury Denoyelle	f349df44b4	MINOR: qmux: change API for snd_buf FIN transmission Previous patches have fixed interim response encoding via h3_resp_headers_send(). However, it is still necessary to adjust h3 layer state-machine so that several successive HTTP responses are accepted for a single stream. Prior to this, QMUX was responsible to decree that the final HTX message was encoded so that FIN stream can be emitted. However, with interim response, MUX is in fact unable to properly determine this. As such, this is the responsibility of the application protocol layer. To reflect this, app_ops snd_buf callback is modified so that a new output argument <fin> is added to it. Note that for now this commit does not bring any functional change. However, it will be necessary for the following patch. As such, it should be backported prior to it to every version as necessary.	2025-07-15 18:39:23 +02:00
Willy Tarreau	4ac28f07d0	MEDIUM: proxy: take the defsrv out of the struct proxy The server struct has gone huge over time (~3.8kB), and having a copy of it in the defsrv section of the struct proxy costs a lot of RAM, that is not needed anymore at run time. This patch replaces this struct with a dynamically allocated one. The field is allocated and initialized during alloc_new_proxy() and is freed when the proxy is destroyed for now. But the goal will be to support freeing it after parsing the section.	2025-07-15 10:34:18 +02:00
Willy Tarreau	616c10f608	CLEANUP: server: add server_find_by_addr() Server lookup by address requires locking and manipulation of the tree from user code. Let's provide server_find_by_addr() which does that for us.	2025-07-15 10:30:28 +02:00
Willy Tarreau	fda04994d9	CLEANUP: server: simplify server_find_by_id() At a few places we're seeing some open-coding of the same function, likely because it looks overkill for what it's supposed to do, due to extraneous tests that are not needed (e.g. check of the backend's PR_CAP_BE etc). Let's just remove all these superfluous tests and inline it so that it feels more suitable for use everywhere it's needed.	2025-07-15 10:30:28 +02:00
Willy Tarreau	61acd15ea8	CLEANUP: server: rename findserver() to server_find_by_name() Now it's more logical and matches what is done in the rest of these functions. server_find() now relies on it.	2025-07-15 10:30:28 +02:00
Willy Tarreau	6ad9285796	CLEANUP: server: rename server_find_by_name() to server_find() This function doesn't just look at the name but also the ID when the argument starts with a '#'. So the name is not correct and explains why this function is not always used when the name only is needed, and why the list-based findserver() is used instead. So let's just call the function "server_find()", and rename its generation-id based cousin "server_find_unique()".	2025-07-15 10:30:28 +02:00
Willy Tarreau	5e78ab33cd	MINOR: server: use the tree to look up the server name in findserver() Let's just use the tree-based lookup instead of walking through the list. This function is used to find duplicates in "track" statements and a few such places, so it's important not to waste too much time on large setups.	2025-07-15 10:30:27 +02:00
Willy Tarreau	12a6a3bb3f	REORG: server: move findserver() from proxy.c to server.c The reason this function was overlooked is that it had mostly equivalent ones in server.c, let's move them together.	2025-07-15 10:30:27 +02:00
Valentine Krasnobaeva	0c63883be1	MINOR: debug: add distro name and version in postmortem Since 2012, systemd compliant distributions contain /etc/os-release file. This file has some standardized format, see details at https://www.freedesktop.org/software/systemd/man/latest/os-release.html. Let's read it in feed_post_mortem_linux() to gather more info about the distribution. (cherry picked from commit f1594c41368baf8f60737b229e4359fa7e1289a9) Signed-off-by: Willy Tarreau <w@1wt.eu>	2025-07-11 11:48:19 +02:00
Ilia Shipitsin	0ee3d739b8	CLEANUP: assorted typo fixes in the code, commits and doc Corrected various spelling and phrasing errors to improve clarity and consistency.	2025-07-10 19:49:48 +02:00
Christopher Faulet	187ae28cf4	MINOR: h1-htx: Add function to format an HTX message in its H1 representation The function h1_format_htx_msg() can now be used to convert a valid HTX message in its H1 representation. No validity test is performed, the HTX message must be valid. Only trailers are silently ignored if the message is not chunked. In addition, the destination buffer must be empty. 1XX interim responses should be supported. But again, there is no validity tests.	2025-07-10 10:29:49 +02:00
Christopher Faulet	25b0625d5c	BUG/MEDIUM: http-client: Drain the request if an early response is received When a large request is sent, it is possible to have a response before the end of the request. It is valid from HTTP perspective but it is an issue with the current design of the http-client. Indeed, the request and the response are handled sequentially. So the response will be blocked, waiting for the end of the request. Most of the time, it is not an issue, except when the request transfer is blocked. In that case, the applet is blocked. With the current API, it is not possible to handle early response and continue the request transfer. So, this case cannot be handled. In that case, it seems reasonable to drain the request if a response is received. This way, the request transfer, from the caller point of view, is never blocked and the response can be properly processed. To do so, the action flag HTTPCLIENT_FA_DRAIN_REQ is added to the http-client. When it is set, the request payload is just dropped. In that case, we take care to not report the end of input to properly report the request was truncated, especially in logs. It is only an issue with large POSTs, when the payload is streamed. This patch must be backported as far as 2.6.	2025-07-09 16:27:24 +02:00
Frederic Lecaille	45ac235baa	BUG/MEDIUM: quic: Crash after QUIC server callbacks restoration (OpenSSL 3.5) Revert this patch which is no more useful since OpenSSL 3.5.1 to remove the QUIC server callback restoration after SSL context switch: MINOR: quic: OpenSSL 3.5 internal QUIC custom extension for transport parameters reset It was required for 3.5.0. That said, there was no CI for OpenSSL 3.5 at the date of this commit. The CI recently revealed that the QUIC server side could crash during QUIC reg tests just after having restored the callbacks as implemented by the commit above. Also revert this commit which is no more useful because it arrived with the commit above: BUG/MEDIUM: quic: SSL/TCP handshake failures with OpenSSL 3. Must be backported to 3.2.	2025-07-09 16:01:02 +02:00
Frederic Lecaille	c01eb1040e	MINOR: quic: Prevent QUIC build with OpenSSL 3.5 new QUIC API version < 3.5.1 The QUIC listener part was impacted by the 3.5.0 OpenSSL new QUIC API with several issues which have been fixed by 3.5.1. Add a #error to prevent such OpenSSL 3.5 new QUIC API use with version below 3.5.1. Must be backported to 3.2.	2025-07-09 16:01:02 +02:00
Willy Tarreau	95cf518bfa	BUG/MINOR: resolvers: don't lower the case of binary DNS format The server's "hostname_dn" is in Domain Name format, not a pure string, as converted by resolv_str_to_dn_label(). It is made of lower-case string components delimited by binary lengths, e.g. <0x03>www<0x07>haproxy<0x03>org. As such it must not be lowercased again in srv_state_srv_update(), because 1) it's useless on the name components since already done, and 2) because it would replace component lengths 97 and above by 32-char shorter ones. Granted, not many domain names have that large components so the risk is very low but the operation is always wrong anyway. This was brought in 2.5 by commit `3406766d57` ("MEDIUM: resolvers: add a ref between servers and srv request or used SRV record"). In the same vein, let's fix the confusing strcasecmp() that are applied to this binary format, and use memcmp() instead. Here there's basically no risk to incorrectly match the wrong record, but that test alone is confusing enough to provoke the existence of the bug above. Finally let's update the component for that field to mention that it's in this format and already lower cased. Better not backport this, the risk of facing this bug is almost zero, and every time we touch such files something breaks for bad reasons.	2025-07-08 07:54:45 +02:00
Frederic Lecaille	5a87f4673a	MINOR: quic: Prevent QUIC backend use with the OpenSSL QUIC compatibility module (USE_OPENSSL_COMPAT) Make the server line parsing fail when a QUIC backend is configured if haproxy is built to use the OpenSSL stack compatibility module. This latter does not support the QUIC client part.	2025-07-07 14:13:02 +02:00
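The length-prefixed format mentioned in the commit above (e.g. <0x03>www<0x07>haproxy<0x03>org) can be illustrated with a small sketch. `demo_str_to_dn_label()` and `demo_dn_equal()` are hypothetical helpers written for this example and are not resolv_str_to_dn_label() itself. The encoder lowercases only the name characters while writing them, and the comparison then treats the result as binary data with memcmp(), since the length bytes are not text.

```c
#include <stddef.h>
#include <string.h>

/* Encode a dotted name into length-prefixed labels, lowercasing the
 * label characters on the fly. No terminating root label is written.
 * Returns the number of bytes written, or -1 on error (empty label,
 * label longer than 63 bytes, or output buffer too small).
 */
static int demo_str_to_dn_label(const char *str, unsigned char *out, size_t outlen)
{
    size_t pos = 0;

    while (*str) {
        size_t len = strcspn(str, ".");
        size_t i;

        if (len == 0 || len > 63 || pos + 1 + len > outlen)
            return -1;
        out[pos++] = (unsigned char)len;   /* binary length, not a character */
        for (i = 0; i < len; i++) {
            unsigned char c = (unsigned char)str[i];

            if (c >= 'A' && c <= 'Z')
                c = (unsigned char)(c + ('a' - 'A'));
            out[pos++] = c;
        }
        str += len;
        if (*str == '.')
            str++;
    }
    return (int)pos;
}

/* Compare two encoded names as binary data. */
static int demo_dn_equal(const unsigned char *a, size_t alen,
                         const unsigned char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Since case folding happens once at encoding time, any later case-insensitive string operation on the encoded form is both unnecessary and applied to bytes that are not characters.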
Frederic Lecaille	6aebca7f2c	BUG/MINOR: quic: Missing TLS 1.3 QUIC cipher suites and groups inits (OpenSSL 3.5 QUIC API) This bug impacts both QUIC backends and frontends with OpenSSL 3.5 as QUIC API. The connections to a haproxy QUIC listener from a haproxy QUIC backend could not work at all without HelloRetryRequest TLS messages emitted by the backend asking the QUIC client to restart the handshake followed by TLS alerts: conn. @(nil) OpenSSL error[0xa000098] read_state_machine: excessive message size Furthermore, the Initial CRYPTO data sent by the client were big (about two 1252 bytes packets) (ClientHello TLS message). After analyzing the packets a key_share extension with <unknown> as value was long (more than 1 kB). This extension is in relation with the groups but does not belong to the groups supported by QUIC. That said such connections could work with ngtcp2 as backend built against the same OSSL TLS stack API but with a HelloRetryRequest. ngtcp2 always set the QUIC default cipher suites and group, for all the stacks it supports as implemented by this patch. So this patch configures both QUIC backend and frontend cipher suites and groups calling SSL_CTX_set_ciphersuites() and SSL_CTX_set1_groups_list() with the correct argument, except for SSL_CTX_set1_groups_list() which fails with QUIC TLS for an unknown reason at this time. The call to SSL_CTX_set_options() is useless from ssl_quic_initial_ctx() for the QUIC clients. One relies on ssl_sock_prepare_srv_ssl_ctx() to set them from now on. This patch is effective for all the supported stacks without impact for AWS-LC and QUIC TLS, and fixes the connections for haproxy QUIC frontends and backends when built against the OpenSSL 3.5 QUIC API. A new define HAVE_OPENSSL_QUICTLS has been added to openssl-compat.h to distinguish the QUIC TLS stack. Must be backported to 3.2.	2025-07-07 14:13:02 +02:00
Willy Tarreau	573143e0c8	MINOR: pattern: add a counter of added/freed patterns Patterns are allocated when loading maps/acls from a file or dynamically via the CLI, and are released only from the CLI (e.g. "clear map xxx"). These ones do not use pools and are much harder to monitor, e.g. in case a script adds many and forgets to clear them, etc. Let's add a new pair of metrics "PatternsAdded" and "PatternsFreed" that will report the number of added and freed patterns respectively. This can allow to simply graph both. The difference between the two normally represents the number of allocated patterns. If Added grows without Freed following, it can indicate a faulty script that doesn't perform the needed cleanup. The metrics are also made available to Prometheus as patterns_added_total and patterns_freed_total respectively.	2025-07-05 00:12:45 +02:00
Remi Tricot-Le Breton	a075d6928a	CLEANUP: ssl: Rename ssl_trace-t.h to ssl_trace.h This header does not actually contain any structures so it's best to remove the '-t' from the name for better consistency.	2025-07-04 15:21:50 +02:00
Christopher Faulet	5232df57ab	MINOR: proto-tcp: Add support for TCP MD5 signature for listeners and servers This patch adds the support for the RFC2385 (Protection of BGP Sessions via the + TCP MD5 Signature Option) for the listeners and the servers. The feature is only available on Linux. Keywords are not exposed otherwise. By setting "tcp-md5sig <password>" option on a bind line, TCP segments of all connections instantiated from the listening socket will be signed with a 16-byte MD5 digest. The same option can be set on a server line to protect outgoing connections to the corresponding server. The primary use case for this option is to allow BGP to protect itself against the introduction of spoofed TCP segments into the connection stream. But it can be useful for any very long-lived TCP connections. A reg-test was added and it will be executed only on linux. All other targets are excluded.	2025-07-03 15:25:40 +02:00
William Lallemand	3e05e20029	MEDIUM: httpclient: implement a way to use directly htx data Add a HTTPCLIENT_O_RES_HTX flag which allow to store directly the HTX data in the response buffer instead of extracting the data in raw format. This is useful when the data need to be reused in another request.	2025-07-01 16:31:47 +02:00
William Lallemand	2f4219ed68	MEDIUM: httpclient: split the CLI from the actual httpclient API This patch split the httpclient code to prevent confusion between the httpclient CLI command and the actual httpclient API. Indeed there was a confusion between the flag used internally by the CLI command, and the actual httpclient API. hc_cli_* functions as well as HC_C_F_* defines were moved to httpclient_cli.c.	2025-07-01 15:46:04 +02:00
William Lallemand	519abefb57	BUG/MINOR: httpclient: wrongly named httpproxy flag The HC_F_HTTPPROXY flag was wrongly named and does not use the correct value, indeed this flag was meant to be used for the httpclient API, not the httpclient CLI. This patch fixes the problem by introducing HTTPCLIENT_FO_HTTPPROXY which must be set in hc->flags. Also add a member 'options' in the httpclient structure, because the member flags is reinitialized when starting. Must be backported as far as 3.0.	2025-07-01 14:47:52 +02:00
Aurelien DARRAGON	747a812066	MEDIUM: stats: add persistent state to typed output format Add a fourth character to the second column of the "typed output format" to indicate whether the value results from a volatile or persistent metric ('V' or 'P' characters respectively). A persistent metric means the value could possibly be preserved across reloads by leveraging a shared memory between multiple co-processes. Such metrics are identified as "shared" in the code (since they are possibly shared between multiple co-processes). Some reg-tests were updated to take that change into account, also, some outputs in the configuration manual were updated to reflect current behavior.	2025-07-01 14:15:03 +02:00
Remi Tricot-Le Breton	522bca98e1	MAJOR: jwt: Allow certificate instead of public key in jwt_verify converter The 'jwt_verify' converter could only be passed public keys as second parameter instead of full-on public certificates. This patch allows proper certificates to be used. Those certificates can be loaded in ckch_stores like any other certificate which means that all the certificate-related operations that can be made via the CLI can now benefit JWT validation as well. We now have two ways JWT validation can work, the legacy one which only relies on public keys which could not be stored in ckch_stores without some in depth changes in the way the ckch_stores are built. In this legacy way, the public keys are fully stored in a cache dedicated to JWT only which does not have any CLI commands and any way to update them during runtime. It also requires that all the public keys used are passed at least once explicitly to the 'jwt_verify' converter so that they can be loaded during init. The new way uses actual certificates, either already stored in the ckch_store tree (if predefined in a crt-store or already used previously in the configuration) or loaded in the ckch_store tree during init if they are explicitly used in the configuration like so: var(txn.bearer),jwt_verify(txn.jwt_alg,"cert.pem") When using a variable (or any other way that can only be resolved during runtime) in place of the converter's <key> parameter, the first time we encounter a new value (for which we don't have any entry in the jwt tree) we will lock the ckch_store tree and try to perform a lookup in it. If the lookup fails, an entry will still be inserted into the jwt tree so that any following call with this value avoids performing the ckch_store tree lookup.	2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton	cd89ce1766	MINOR: jwt: Rename pkey to pubkey in jwt_cert_tree_entry struct Rename the jwt_cert_tree_entry member pkey to pubkey to avoid any confusion between private and public key.	2025-06-30 17:59:55 +02:00
Christopher Faulet	a2a142bf40	BUG/MEDIUM: hlua: Forbid any L6/L7 sample fetch functions from lua services It was already forbidden to use HTTP sample fetch functions from lua services. An error is triggered if it happens. However, the error must be extended to any L6/L7 sample fetch functions. Indeed, a lua service is an applet. It is totally unexpected for an applet to access input data in a channel's buffer. These data have not been analyzed yet and are still subject to any change. An applet, lua or not, must never access "not forwarded" data. Only output data are available. For now, if a lua applet relies on any L6/L7 sample fetch functions, the behavior is undefined and not consistent. So to fix the issue, hlua flag HLUA_F_MAY_USE_HTTP is renamed to HLUA_F_MAY_USE_CHANNELS_DATA. This flag is used to prevent any lua applet from using L6/L7 sample fetch functions. This patch could be backported to all stable versions.	2025-06-30 16:47:59 +02:00
Aurelien DARRAGON	4fcc9b5572	MINOR: counters: rename last_change counter to last_state_change Since proxy and server struct already have an internal last_change variable and we cannot merge it with the shared counter one, let's rename the last_change counter to be more specific and prevent the mixup between the two. last_change counter is renamed to last_state_change, and unlike the internal last_change, this one is a shared counter so it is expected to be updated by other processes in our back. However, when updating last_state_change counter, we use the value of the server/proxy last_change as reference value.	2025-06-30 16:26:38 +02:00
Aurelien DARRAGON	5b1480c9d4	MEDIUM: proxy: add and use a separate last_change variable for internal use Same motivation as previous commit, proxy last_change is "abused" because it is used for 2 different purposes, one for stats, and the other one for process-local internal use. Let's add a separate proxy-only last_change variable for internal use, and leave the last_change shared (and thread-grouped) counter for statistics.	2025-06-30 16:26:31 +02:00
Aurelien DARRAGON	01dfe17acf	MEDIUM: server: add and use a separate last_change variable for internal use last_change server metric is used for 2 separate purposes. First it is used to report last server state change date for stats and other related metrics. But it is also used internally, including in sensitive paths, such as lb related stuff to take decision or perform computations (ie: in srv_dynamic_maxconn()). Due to last_change counter now being split over thread groups since `16eb0fa` ("MAJOR: counters: dispatch counters over thread groups"), reading the aggregated value has a cost, and we cannot afford to consult last_change value from srv_dynamic_maxconn() anymore. Moreover, since the value is used to take decision for the current process we don't want the variable to be updated by another process in our back. To prevent performance regression and sharing issues, let's instead add a separate srv->last_change value, which is not updated atomically (given how rare the updates are), and only serves for places where the use of the aggregated last_change counter/stats (split over thread groups) is too costly.	2025-06-30 16:26:25 +02:00
Aurelien DARRAGON	837762e2ee	MINOR: mailers: warn if mailers are configured but not actually used Now that native mailers configuration is only usable with Lua mailers, Willy noticed that we lack a way to warn the user if mailers were previously configured on an older version but Lua mailers were not loaded, which could trick the user into thinking mailers keep working when transitioning to 3.2 while it is not. In this patch we add the 'core.use_native_mailers_config()' Lua function which should be called in Lua script body before making use of 'Proxy:get_mailers()' function to retrieve legacy mailers configuration from haproxy main config. This way haproxy effectively knows that the native mailers config is actually being used from Lua (which indicates user correctly migrated from native mailers to Lua mailers), else if mailers are configured but not used from Lua then haproxy warns the user about the fact that they will be ignored unless they are used from Lua. (e.g.: using the provided 'examples/lua/mailers.lua' to ease transition)	2025-06-27 16:41:18 +02:00
Frederic Lecaille	194e3bc2d5	MINOR: quic-be: address validation support implementation (RETRY) - Add ->retry_token and ->retry_token_len new quic_conn struct members to store the retry tokens. These objects are allocated by quic_rx_packet_parse() and released by quic_conn_release(). - Add <pool_head_quic_retry_token> new pool for these tokens. - Implement quic_retry_packet_check() to check the integrity tag of these tokens upon RETRY packets receipt. quic_tls_generate_retry_integrity_tag() is called by this new function. It has been modified to pass the address where the tag must be generated - Add <resend> new parameter to quic_pktns_discard(). This function is called to discard the packet number spaces where the already TX packets and frames are attached to. <resend> allows the caller to prevent this function to release the in flight TX packets/frames. The frames are requeued to be resent. - Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon such packets receipt is: - store the retry token, - store the new peer SCID as the DCID of the connection. Note that the peer will modify again its SCID. This is why this SCID is also stored as the ODCID which must be matched with the peer retry_source_connection_id transport parameter, - discard the Initial packet number space without flagging it as discarded and prevent retransmissions calling qc_set_timer(), - modify the TLS cryptographic cipher contexts (RX/TX), - wakeup the I/O handler to send new Initial packets asap. - Modify quic_transport_param_decode() to handle the retry_source_connection_id transport parameter as a QUIC client. Then its caller is modified to check this transport parameter matches with the SCID sent by the peer with the RETRY packet.	2025-06-26 09:48:00 +02:00
Frederic Lecaille	9cb2acd2f2	MINOR: quic-be: add a "CC connection" backend TX buffer pool A QUIC client must be able to close a connection sending Initial packets. But QUIC client Initial packets must always be at least 1200 bytes long. To reduce the memory use of TX buffers of a connection when in "closing" state, a pool was dedicated for this purpose but with a too much reduced TX buffer size (QUIC_MAX_CC_BUFSIZE). This patch adds a "closing state connection" TX buffer pool with the same role for QUIC backends.	2025-06-26 09:48:00 +02:00
William Lallemand	7cb6167d04	MAJOR: mworker: remove program section support This patch removes completely the support for the program section, the parsing of the section as well as the internals in the mworker does not support it anymore. The program section was considered dysfunctional and not fully compatible with the "mworker V3" model. Users that want to run an external program must use their init system. The documentation is cleaned up in another patch.	2025-06-25 16:11:34 +02:00
Remi Tricot-Le Breton	34fc73ba81	MINOR: ssl: Add "renegotiate" server option This "renegotiate" option can be set on SSL backends to allow secure renegotiation. It is mostly useful with SSL libraries that disable secure renegotiation by default (such as AWS-LC). The "no-renegotiate" one can be used the other way around, to disable secure renegotiation that could be allowed by default. Those two options can be set via "ssl-default-server-options" as well.	2025-06-25 15:23:48 +02:00
Aurelien DARRAGON	5694a98744	MAJOR: mailers: remove native mailers support As mentioned in 2.8 announce on the mailing list [1] and on the wiki [2] native mailers were deprecated and planned for removal in 3.3. Now is the time to drop the legacy code for native mailers which is based on a tcpcheck "hack" and cannot be maintained. Lua mailers should be used as a drop in replacement. Indeed, "mailers" and associated config directives are preserved because mailers config is exposed to Lua, which helps smoothing the transition from native mailers to Lua based ones. As a reminder, to keep mailers configuration working as before without making changes to the config file, simply add the line below to the global section: lua-load examples/lua/mailers.lua mailers.lua script (provided in the git repository, adjust path as needed) may be customized by users familiar with Lua, by default it emulates the behavior of the native (now removed) mailers. [1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html [2]: https://github.com/haproxy/wiki/wiki/Breaking-changes	2025-06-24 10:55:58 +02:00
Aurelien DARRAGON	c0f6024854	MINOR: hlua: emit a log instead of an alert for aborted actions due to unavailable yield As reported by Chris Staite in GH #3002, trying to yield from a Lua action during a client disconnect causes the script to be interrupted (which is expected) and an alert to be emitted with the error: "Lua function '%s': yield not allowed". While this error is well suited for cases where the yield is not expected at all (ie: when context doesn't allow it) and results from a yield misuse in the Lua script, it isn't the case when the yield is exceptionally not available due to an abort or error in the request/response processing. Because of that we raise an alert but the user cannot do anything about it (the script is correct), so it is confusing and polluting the logs. In this patch we introduce the ACT_OPT_FINAL_EARLY flag which is a complementary flag to ACT_OPT_FIRST. This flag is set when the ACT_OPT_FIRST is set earlier than normal (due to error/abort). hlua_action() then checks for this flag to decide whether an error (alert) or a simple log message should be emitted when the yield is not available. It should solve GH #3002. Thanks to Chris Staite (@chrisstaite-menlo) for having reported the issue and suggested a solution.	2025-06-24 10:55:55 +02:00
Amaury Denoyelle	74b95922ef	BUG/MEDIUM: quic: do not release BE quic-conn prior to upper conn For frontend side, quic_conn is only released if MUX wasn't allocated, either due to handshake abort, in which case upper layer is never allocated, or after transfer completion when full conn + MUX layers are already released. On the backend side, initialization is not performed in the same order. Indeed, in this case, connection is first instantiated, then the quic_conn is created to execute the handshake, while MUX is still only allocated on handshake completion. As such, it is not possible anymore to free immediately quic_conn on handshake failure. Else, this can cause a crash if the connection tries to access its transport layer again after quic_conn release. Such crash can easily be reproduced in case of connection error to the QUIC server. Here is an example of an experienced backtrace. Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault. 0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28 28 qc->conn = NULL; [ ## gdb ## ] bt #0 0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28 #1 0x00005555559c9708 in conn_xprt_close (conn=0x55555734c0d0) at include/haproxy/connection.h:162 #2 0x00005555559c97d2 in conn_full_close (conn=0x55555734c0d0) at include/haproxy/connection.h:206 #3 0x00005555559d01a9 in sc_detach_endp (scp=0x7fffffffd648) at src/stconn.c:451 #4 0x00005555559d05b9 in sc_reset_endp (sc=0x55555734bf00) at src/stconn.c:533 #5 0x000055555598281d in back_handle_st_cer (s=0x55555734adb0) at src/backend.c:2754 #6 0x000055555588158a in process_stream (t=0x55555734be10, context=0x55555734adb0, state=516) at src/stream.c:1907 #7 0x0000555555dc31d9 in run_tasks_from_lists (budgets=0x7fffffffdb30) at src/task.c:655 #8 0x0000555555dc3dd3 in process_runnable_tasks () at src/task.c:889 #9 0x0000555555a1daae in run_poll_loop () at src/haproxy.c:2865 #10 0x0000555555a1e20c in run_thread_poll_loop (data=0x5555569d1c00 <ha_thread_info>) at src/haproxy.c:3081 #11 0x0000555555a1f66b in main (argc=5, argv=0x7fffffffde18) at src/haproxy.c:3671 To fix this, change the condition prior to calling quic_conn release. If <conn> member is not NULL, delay the release, similarly to the case when MUX is allocated. This allows connection to be freed first, and detach from quic_conn layer through close xprt operation. No need to backport.	2025-06-20 17:46:10 +02:00
Amaury Denoyelle	06cab99a0e	MINOR: mux-quic: support max bidi streams value set by the peer Implement support for MAX_STREAMS frame. On frontend, this was mostly useless as haproxy would never initiate new bidirectional streams. However, this becomes necessary to control stream flow-control when using QUIC as a client on the backend side. Parsing of MAX_STREAMS is implemented via new qcc_recv_max_streams(). This allows to update <ms_uni>/<ms_bidi> QCC fields. This patch is necessary to achieve QUIC backend connection reuse.	2025-06-18 17:25:27 +02:00
Amaury Denoyelle	805a070ab9	BUG/MINOR: mux-quic/h3: properly handle too low peer fctl initial stream Previously, no check on peer flow-control was implemented prior to open a local QUIC stream. This was a small problem for frontend implementation, as in this case haproxy as a server never opens bidirectional streams. On frontend, the only stream opened by haproxy in this case is for HTTP/3 control unidirectional data. If the peer uses an initial value for max uni streams set to 0, it would violate its flow control, and the peer will probably close the connection. Note however that RFC 9114 mandates that each peer defines minimal initial value so that at least the control stream can be created. This commit improves the situation of too low initial max uni streams value. Now, on HTTP/3 layer initialization, haproxy preemptively checks flow control limit on streams via a new function qcc_fctl_avail_streams(). If credit is already expired due to a too small initial value, haproxy preemptively closes the connection using H3_ERR_GENERAL_PROTOCOL_ERROR. This behavior is better as haproxy is now the initiator of the connection closure. This should be backported up to 2.8.	2025-06-18 17:18:55 +02:00
Amaury Denoyelle	c807182ec9	CLEANUP: connection: remove unused mux-ops dedicated to QUIC Remove avail_streams_bidi/avail_streams_uni mux_ops. These callbacks were designed to be specific to QUIC. However, they won't be necessary, as stream layer only cares about bidirectional streams.	2025-06-18 17:02:50 +02:00
Amaury Denoyelle	555ec99d43	MINOR: h3: adjust auth request encoding or fallback to host Implement proper encoding of HTTP/3 authority pseudo-header during request transcoding on the backend side. A pseudo-header :authority is encoded if a value can be extracted from HTX start-line. A special check is also implemented to ensure that a host header is not encoded if :authority already is. A new function qpack_encode_auth() is defined to implement QPACK encoding of :authority header using literal field line with name ref.	2025-06-16 18:11:09 +02:00
Amaury Denoyelle	235e818fa1	MINOR: h3: complete HTTP/3 request scheme encoding Previously, scheme was always set to https when transcoding an HTX start-line into a HTTP/3 request. Change this so this conversion is now fully compliant. If no scheme is specified by the client, which is what happens most of the time with HTTP/1, https is set for the HTTP/3 request. Else, reuse the scheme requested by the client. If either https or http is set, qpack_encode_scheme will encode it using entry from QPACK static table. Else, a full literal field line with name ref is used instead as the scheme value is specified as-is.	2025-06-16 18:11:09 +02:00
Amaury Denoyelle	a0912cf914	MINOR: h3: complete HTTP/3 request method encoding On the backend side, HTX start-line is converted into a HTTP/3 request message. Previously, GET method was hardcoded. Implement proper method conversion, by extracting it from the HTX start-line. qpack_encode_method() has also been extended, so that it is able to encode any method, either using a static table entry, or with a literal field line with name ref representation.	2025-06-16 18:11:09 +02:00
Amaury Denoyelle	7157adb154	MINOR: h3: support basic HTX start-line conversion into HTTP/3 request This commit is the first one of a series whose aim is to implement transcoding of a HTX request into HTTP/3, which is necessary for QUIC backend support. Transcoding is implemented via a new function h3_req_headers_send() when a HTX start-line is parsed. For now, most of the request fields are hardcoded, using a GET method. This will be adjusted in the following patches.	2025-06-16 18:11:09 +02:00
Amaury Denoyelle	e8775d51df	MINOR: mux-quic: define flag for backend side Mux connection is flagged with new QC_CF_IS_BACK if used on the backend side. For now the only change is during traces, to be able to differentiate frontend and backend usage.	2025-06-12 11:28:54 +02:00
Amaury Denoyelle	93b904702f	MINOR: mux-quic: improve documentation for snd/rcv app-ops Complete documentation for rcv_buf/snd_buf operations. In particular, return value is now explicitly defined. For H3 layer, associated functions documentation is also extended.	2025-06-12 11:28:54 +02:00
Frederic Lecaille	b9703cf711	MINOR: quic-be: get rid of ->li quic_conn member Replace ->li quic_conn pointer to struct listener member by ->target which is an object type enum and adapt the code. Use __objt_(listener\|server)() where the object type is known. Typically this is where the code is specific to one connection type (frontend/backend). Remove <server> parameter passed to qc_new_conn(). It is redundant with the <target> parameter. GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified to prevent it from building more than an MTU. This has as consequence to prevent qc_send_ppkts() from using GSO. ssl_clienthello.c code is run only by listeners. This is why __objt_listener() is used in place of ->li.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	2d076178c6	MINOR: quic-be: Store asap the DCID Store the peer connection ID (SCID) as the connection DCID as soon as an Initial packet is received. Stop comparing the packet type to QUIC_PACKET_TYPE_0RTT when it has already matched QUIC_PACKET_TYPE_INITIAL. A QUIC server must not send too short datagrams with ack-eliciting packets inside. This cannot be checked from quic_rx_pkt_parse() because one does not know if there is an ack-eliciting frame in the Initial packets. If the packet must be dropped, this is after having parsed it!	2025-06-11 18:37:34 +02:00
Frederic Lecaille	43d88a44f1	MINOR: quic-be: Datagrams and packet parsing support Modify quic_dgram_parse() to stop passing it a listener as third parameter. In place the object type address of the connection socket owner is passed to support the haproxy servers with QUIC as transport protocol. qc_owner_obj_type() is implemented to return this address. qc_counters() is also implemented to return the QUIC specific counters of the proxy of owner of the connection. quic_rx_pkt_parse() called by quic_dgram_parse() is also modified to use the object type address used by this latter as last parameter. It is also modified to send Retry packet only from listeners. A QUIC client (connection to haproxy QUIC servers) must drop the Initial packets with non null token length. It is also not supposed to receive 0-RTT packets which are dropped.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	89d5a59933	MINOR: quic-be: add field for max_udp_payload_size into quic_conn Add ->max_udp_payload_size new member to quic_conn struct. Initialize it from qc_new_conn(). Adapt qc_snd_buf() to use it.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	52ec3430f2	MINOR: sock: Add protocol and socket types parameters to sock_create_server_socket() This patch only adds <proto_type> new proto_type enum parameter and <sock_type> socket type parameter to sock_create_server_socket() and adapts its callers. This is to prepare the use of this function by QUIC servers/backends.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	9c84f64652	MINOR: quic-be: Add a function to initialize the QUIC client transport parameters Implement qc_srv_params_init() to initialize the QUIC client transport parameters in relation with connections to haproxy servers/backends.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	f49bbd36b9	MINOR: quic-be: SSL sessions initializations Modify qc_alloc_ssl_sock_ctx() to pass the connection object as parameter. It is NULL for a QUIC listener, not NULL for a QUIC server. This connection object is set as value for ->conn quic_conn struct member. Initialise the SSL session object from this function for QUIC servers. qc_ssl_set_quic_transport_params() is also modified to pass the SSL object as parameter. This is the unique parameter this function needs. <qc> parameter is used only for the trace. SSL_do_handshake() must be called as soon as the SSL object is initialized for the QUIC backend connection. This triggers the TLS CRYPTO data delivery. tasklet_wakeup() is also called to send asap these CRYPTO data. Modify the QUIC_EV_CONN_NEW event trace to dump the potential errors returned by SSL_do_handshake().	2025-06-11 18:37:34 +02:00
Frederic Lecaille	1408d94bc4	MINOR: quic-be: ssl_sock contexts allocation and misc adaptations Implement ssl_sock_new_ssl_ctx() to allocate a SSL server context as this is currently done for TCP servers and also for QUIC servers depending on the <is_quic> boolean value passed as new parameter. For QUIC servers, this function calls ssl_quic_srv_new_ssl_ctx() which is specific to QUIC.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	1e45690656	MINOR: quic-be: Add a function for the TLS context allocations Implement ssl_quic_srv_new_ssl_ctx() whose aim is to allocate a TLS context for QUIC servers.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	24fc44c44d	MINOR: quic-be: QUIC backend XPRT and transport parameters init during parsing Add ->quic_params new member to server struct. Also set the ->xprt member of the server being initialized and initialize asap its transport parameters from _srv_parse_init().	2025-06-11 18:37:34 +02:00
Frederic Lecaille	990c9f95f7	MINOR: quic-be: Correct Version Information transp. param encoding According to the RFC, a QUIC client must encode the QUIC version it supports into the "Available Versions" of "Version Information" transport parameter order by descending preference. This is done defining <quic_version_2> and <quic_version_draft_29> new variables pointers to the corresponding version of <quic_versions> array elements. A client announces its available versions as follows: v1, v2, draft29.	2025-06-11 18:37:34 +02:00
Amaury Denoyelle	bdd5e58179	MINOR: server: implement helper to identify QUIC servers Define srv_is_quic() which can be used to quickly identify whether a server uses QUIC protocol.	2025-06-11 18:37:19 +02:00
Olivier Houchard	6993981cd6	BUG/MEDIUM: fd: Use the provided tgid in fd_insert() to get tgroup_info In fd_insert(), use the provided tgid to get the thread group info, instead of using the one of the current thread, as we may call fd_insert() from a thread of another thread group, that will happen at least when binding the listeners. Otherwise we'd end up accessing the thread mask containing enabled thread of the wrong thread group, which can lead to crashes if we're binding on threads not present in the thread group. This should fix Github issue #2991. This should be backported up to 2.8.	2025-06-10 15:10:56 +02:00
Christopher Faulet	18f9c71041	CLEANUP: applet: Simplify a bit comments for applet_put* functions Instead of repeating which buffer is used depending on the API used by the applet, a reference to applet_get_outbuf() was added.	2025-06-10 08:16:10 +02:00
Christopher Faulet	79445766a3	MINOR: applet: Add API functions to get data from the input buffer There were already functions to push data from the applet to the stream by inserting them in the right buffer, depending on whether the applet was using the legacy API or not. Here, functions to retrieve data pushed to the applet by the stream were added: * applet_getchar : Gets one character * applet_getblk : Copies a full block of data * applet_getword : Copies one text block representing a word using a custom separator as delimiter * applet_getline : Copies one text line * applet_getblk_nc : Get one or two blocks of data * applet_getword_nc: Gets one or two blocks of text representing a word using a custom separator as delimiter * applet_getline_nc: Gets one or two blocks of text representing a line	2025-06-10 08:16:10 +02:00
Christopher Faulet	0d8ecb1edc	MINOR: applet: Add API functions to manipulate input and output buffers In this patch, some functions were added to ease input and output buffers manipulation, regardless of whether the corresponding applet is using its own buffers or relying on channels buffers. Following functions were added: * applet_get_inbuf : Get the buffer containing data pushed to the applet by the stream * applet_get_outbuf : Get the buffer containing data pushed by the applet to the stream * applet_input_data : Return the amount of data in the input buffer * applet_skip_input : Skips <len> bytes from the input buffer * applet_reset_input: Skips all bytes from the input buffer * applet_output_room: Returns the amount of space available at the output buffer * applet_need_room : Indicates that the applet has more data to deliver and it needs more room in the output buffer to do so	2025-06-10 08:16:10 +02:00
Aurelien DARRAGON	16eb0fab31	MAJOR: counters: dispatch counters over thread groups Most fe and be counters are good candidates for being shared between processes. They are now grouped inside "shared" struct sub member under be_counters and fe_counters. Now they are properly identified, they would greatly benefit from being shared over thread groups to reduce the cost of atomic operations when updating them. For this, we take the current tgid into account so each thread group only updates its own counters. For this to work, it is mandatory that the "shared" member from {fe,be}_counters is initialized AFTER global.nbtgroups is known, because each shared counter causes the stat to be allocated global.nbtgroups times. When updating a counter without concurrency, the first counter from the array may be updated. To consult the shared counters (which requires aggregation of per-tgid individual counters), some helper functions were added to counter.h to ease code maintenance and avoid computing errors.	2025-06-05 09:59:38 +02:00
Aurelien DARRAGON	12c3ffbb48	MINOR: counters: add local-only internal rates to compute some maxes cps_max (max new connections received per second), sps_max (max new sessions per second) and http.rps_max (maximum new http requests per second) all rely on shared counters (namely conn_per_sec, sess_per_sec and http.req_per_sec). The problem is that shared counters are about to be distributed over thread groups, and we cannot afford to compute the total (for all thread groups) each time we update the max counters. Instead, since such max counters (relying on shared counters) are a very few exceptions, let's add internal (sess,conn,req) per sec freq counters that are dedicated to cps_max, sps_max and http.rps_max computing. Thanks to that, related *_max counters shouldn't be negatively impacted by the thread-group distribution, yet they will not benefit from it either. Related internal freq counters are prefixed with "_" to emphasize the fact that they should not be used for other purpose (the shared ones, which are about to be distributed over thread groups in upcoming commits are still available and must be used instead). The internal ones could eventually be removed at any time if we find another way to compute the {cps,sps,http.rps}_max counters.	2025-06-05 09:59:31 +02:00
Aurelien DARRAGON	b72a8bb138	CLEANUP: counters: merge some common counters between {fe,be}_counters_shared Now that we have a common struct between fe and be shared counters struct let's perform some cleanup to merge duplicate members into the common struct part. This will ease code maintenance.	2025-06-05 09:59:24 +02:00
Aurelien DARRAGON	b599138842	MEDIUM: counters: manage shared counters using dedicated helpers proxies, listeners and server shared counters are now managed via helpers added in one of the previous commits. When guid is not set (ie: when not yet assigned), shared counters pointer is allocated using calloc() (local memory) and a flag is set on the shared counters struct to know how to manipulate (and free it). Else if guid is set, then it means that the counters may be shared so while for now we don't actually use a shared memory location the API is ready for that. The way it works, for proxies and servers (for which guid is not known during creation), we first call counters_{fe,be}_shared_get with guid not set, which results in local pointer being retrieved (as if we just manually called calloc() to retrieve a pointer). Later (during postparsing) if guid is set we try to upgrade the pointer from local to shared. Lastly, since the memory location for some objects (proxies and servers counters) may change from creation to postparsing, let's update counters->last_change member directly under counters_{fe,be}_shared_get() so we don't miss it. No change of behavior is expected, this is only preparation work.	2025-06-05 09:59:17 +02:00
Aurelien DARRAGON	c10ce1c85b	MINOR: counters: add common struct and flags to {fe,be}_counters_shared fe_counters_shared and be_counters_shared may share some common members since they are quite similar, so we add a common struct part shared between the two. struct counters_shared is added for convenience as a generic pointer to manipulate common members from fe or be shared counters pointer. Also, the first common member is added: shared fe and be counters now have a flags member.	2025-06-05 09:59:10 +02:00
Aurelien DARRAGON	aa53887398	MINOR: counters: add shared counters helpers to get and drop shared pointers Create include/haproxy/counters.h and src/counters.c files to anticipate further helpers, as some counters-specific tasks need to be carried out and since counters are shared between multiple object types (ie: listener, proxy, server..) we need generic helpers. Add some shared counters helpers which are not yet used but will be updated in upcoming commits.	2025-06-05 09:59:04 +02:00
Aurelien DARRAGON	a0dcab5c45	MAJOR: counters: add shared counters base infrastructure Shareable counters are now tagged as shared counters and are dynamically allocated in a separate memory area as a prerequisite for being stored in a shared memory area. For now, GUID and thread groups are not taken into account, this is only a first step. Also, we ensure all counters are now manipulated using atomic operations, namely, the "last_change" counter is now read from and written to using atomic ops. Despite the numerous changes caused by the counters being moved away from the counters struct, no change of behavior should be expected.	2025-06-05 09:58:58 +02:00
Christopher Faulet	8ee650a88b	CLEANUP: applet: Update comment for applet_put* functions These functions were copied from the channel API and modified to work with applets using the new API or the legacy one. However, the comments were not updated accordingly. It is the purpose of this patch.	2025-06-03 15:03:30 +02:00
Aurelien DARRAGON	368d01361a	MEDIUM: server: add and use srv_init() function Rename the _srv_postparse() internal function to srv_init() and group srv_init_per_thr() plus idle conns list init inside it. This way we can perform some simplifications as srv_init() performs multiple server init steps after parsing. SRV_F_CHECKED flag was added, it is automatically set when srv_init() runs successfully. If the flag is already set and srv_init() is called again, nothing is done. This permits calling srv_init() manually earlier than the default POST_CHECK hook when needed without risking doing things twice.	2025-06-02 17:51:33 +02:00
Aurelien DARRAGON	889ef6f67b	MEDIUM: server: automatically add server to proxy list in new_server() While new_server() takes the parent proxy as argument and even assigns srv->proxy to the parent proxy, it didn't actually insert the server into the parent proxy server list on success. The result is that sometimes we add the server to the list after new_server() is called, and sometimes we don't. This is really error-prone and because of that hooks such as REGISTER_POST_SERVER_CHECK() which are run for all servers listed in all proxies may not be relied upon for servers which are not actually inserted in their parent proxy server list. Plus it feels very strange to have a server that points to a proxy, but then the proxy doesn't know about it because it cannot find it in its server list. To prevent errors and make proxy->srv list reliable, we move the insertion logic directly under new_server(). This requires knowing whether we are called during parsing or during runtime to either insert or append the server to the parent proxy list. For that we use PR_FL_CHECKED flag from the parent proxy (if the flag is set, then the proxy was checked so we are past the init phase, thus we assume we are called during runtime). This implies that during startup if new_server() has to be cancelled on error paths we need to call srv_detach() (which is now exposed in server.h) before srv_drop(). The consequence of this commit is that REGISTER_POST_SERVER_CHECK() should now run reliably on all servers created using new_server() (without having to manually loop on global servers_list)	2025-06-02 17:51:30 +02:00
Aurelien DARRAGON	943958c3ff	MINOR: proxy: add a true list containing all proxies We have a global proxies_list pointer which is announced as the list of "all existing proxies", but in fact it only represents regular proxies declared in the config file through "listen, frontend or backend" keywords. It is ambiguous, and we currently don't have a straightforward method to iterate over all proxies (either public or internal ones) within haproxy. Instead we still have to manually iterate over multiple lists (main proxies, log-forward proxies, peer proxies..) which is error-prone. In this patch we add a struct list member (8 bytes) inside struct proxy in order to store every proxy (except default ones) within a global "proxies" list which is actually representative of all proxies existing under the haproxy process, like we already have for servers.	2025-06-02 17:51:21 +02:00
Aurelien DARRAGON	d04843167c	MINOR: stats: add stat_col flags Add stat_col flags member to store .generic bit and prepare for upcoming flags. No functional change expected.	2025-06-02 17:51:08 +02:00
Willy Tarreau	9f4cd435d3	[RELEASE] Released version 3.3-dev0 Released version 3.3-dev0 with the following main changes : - MINOR: version: mention that it's development again	2025-05-28 16:46:34 +02:00
Willy Tarreau	8809251ee0	MINOR: version: mention that it's development again This essentially reverts `a6458fd426`.	2025-05-28 16:46:15 +02:00
Willy Tarreau	a6458fd426	MINOR: version: mention that it's 3.2 LTS now. The version will be maintained up to around Q2 2030. Let's also update the INSTALL file to mention this.	2025-05-28 16:31:27 +02:00
Christopher Faulet	99e755d673	MINOR: listeners: Add support for a label on bind line It is now possible to set a label on a bind line. All sockets attached to this bind line inherit this label. The idea is to be able to group sockets. For now, there is no mechanism to create these groups, this must be done by hand.	2025-05-26 19:00:00 +02:00
Willy Tarreau	3494775a1f	MINOR: ssl: support strict-sni in ssl-default-bind-options Several users already reported that it would be nice to support strict-sni in ssl-default-bind-options. However, in order to support it, we also need an option to disable it. This patch moves the setting of the option from the strict_sni field to a flag in the ssl_options field so that it can be inherited from the default bind options, and adds a new "no-strict-sni" directive to allow to disable it on a specific "bind" line. The test file "del_ssl_crt-list.vtc" which already tests both options was updated to make use of the default option and the no- variant to confirm everything continues to work.	2025-05-22 15:31:54 +02:00
Willy Tarreau	a1577a89a0	MINOR: glitches: add global setting "tune.glitches.kill.cpu-usage" It was mentioned during the development of glitches that it would be nice to support not killing misbehaving connections below a certain CPU usage so that poor implementations that routinely misbehave without impact are not killed. This is now possible by setting a CPU usage threshold under which we don't kill them via this parameter. It defaults to zero so that we continue to kill them by default.	2025-05-21 15:47:42 +02:00
Amaury Denoyelle	00d90e8839	MINOR: quic: adjust quic_conn-t.h include list Adjust include list in quic_conn-t.h. This file is included in many QUIC sources, so it is useful to keep it as lightweight as possible. Note that connection/QUIC MUX are transformed into forward declarations for better layer separation.	2025-05-21 14:44:27 +02:00
Amaury Denoyelle	01e3b2119a	MINOR: quic: add some missing includes Insert some missing include statements in QUIC source files. This was detected after the next commit which adjusts the include list used in quic_conn-t.h file.	2025-05-21 14:44:27 +02:00
Amaury Denoyelle	f286288471	MINOR: quic: refactor handling of streams after MUX release quic-conn layer has to handle itself STREAM frames after MUX release. If the stream was already seen, it is probably only a retransmitted frame which can be safely ignored. For other streams, an active closure may be needed. Thus it's necessary that quic-conn layer knows the highest stream ID already handled by the MUX after its release. Previously, this was done via <nb_streams> member array in quic-conn structure. Refactor this by replacing <nb_streams> by two members called <stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for quic-conn layer to monitor locally opened uni streams, as the peer cannot by definition emit a STREAM frame on it. Also, bidirectional streams are always opened by the remote side. Previously, <nb_streams> were set by quic-stream layer. Now, <stream_max_uni>/<stream_max_bidi> members are only set one time, just prior to QUIC MUX release. This is sufficient as quic-conn do not use them if the MUX is available. Note that previously, IDs were used relatively to their type, thus incremented by 1, after shifting the original value. For simplification, use the plain stream ID, which is incremented by 4.	2025-05-21 14:26:45 +02:00
Amaury Denoyelle	07d41a043c	MINOR: quic: move function to check stream type in utils Move general function to check if a stream is uni or bidirectional from QUIC MUX to quic_utils module. This should prevent unnecessary include of QUIC MUX header file in other sources.	2025-05-21 14:17:41 +02:00
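The stream type check and the "incremented by 4" remark in the two QUIC commits above both follow from the stream ID layout of RFC 9000 (section 2.1): bit 0 identifies the initiator and bit 1 the direction, so consecutive streams of the same type are 4 apart. The sketch below only illustrates that layout; the function names are hypothetical and are not the helpers moved into quic_utils.

```c
#include <stdint.h>

/* RFC 9000, section 2.1:
 *   bit 0: initiator  (0 = client, 1 = server)
 *   bit 1: direction  (0 = bidirectional, 1 = unidirectional)
 * All names below are illustrative, not HAProxy's.
 */
static inline int stream_is_uni(uint64_t id)
{
	return (id & 0x2) != 0;
}

static inline int stream_is_bidi(uint64_t id)
{
	return !stream_is_uni(id);
}

static inline int stream_is_server_initiated(uint64_t id)
{
	return (id & 0x1) != 0;
}

/* The two low bits encode the type, hence the next stream of the
 * same type is the plain ID plus 4.
 */
static inline uint64_t stream_next_same_type(uint64_t id)
{
	return id + 4;
}
```

Working on the plain ID avoids the shift-and-increment-by-1 bookkeeping that the earlier per-type numbering needed.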
Amaury Denoyelle	cf45bf1ad8	CLEANUP: quic: remove unused cbuf module Cbuf are not used anymore. Remove the related source and header files, as well as include statements in the rest of QUIC source files.	2025-05-21 14:16:37 +02:00
Frederic Lecaille	b3ac1a636c	MINOR: quic: implement all remaining callbacks for OpenSSL 3.5 QUIC API The quic_conn struct is modified for two reasons. The first one is to store the encoded version of the local transport parameters as this is done for USE_QUIC_OPENSSL_COMPAT. Indeed, the local transport parameters "should remain valid until after the parameters have been sent" as mentioned by the SSL_set_quic_tls_cbs(3) manual. In our case, the buffer is a static buffer attached to the quic_conn object. qc_ssl_set_quic_transport_params() is the function whose role is to call SSL_set_quic_tls_transport_params() (aliased by SSL_set_quic_transport_params()) to set these local transport parameters into the TLS stack from the buffer attached to the quic_conn struct. The second quic_conn struct modification is the addition of the new ->prot_level (SSL protection level) member to store "the most recent write encryption level set via the OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn callback (if it has been called)" as mentioned by the SSL_set_quic_tls_cbs(3) manual. This patch finally implements the five remaining callbacks to make the haproxy QUIC implementation work. OSSL_FUNC_SSL_QUIC_TLS_crypto_send_fn() (ha_quic_ossl_crypto_send()) is easy to implement. It calls ha_quic_add_handshake_data() after having converted the qc->prot_level TLS protection level value to the correct ssl_encryption_level_t (boringSSL API/quictls) value. OSSL_FUNC_SSL_QUIC_TLS_crypto_recv_rcd_fn() (ha_quic_ossl_crypto_recv_rcd()) provides the non-contiguous addresses to the TLS stack, without releasing them. OSSL_FUNC_SSL_QUIC_TLS_crypto_release_rcd_fn() (ha_quic_ossl_crypto_release_rcd()) releases these non-contiguous buffers relying on the fact that the list of encryption levels (qc->qel_list) is correctly ordered by SSL protection level secret establishment order (by the TLS stack). 
OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn() (ha_quic_ossl_yield_secret()) is a simple wrapping function over ha_quic_set_encryption_secrets() which is used by boringSSL/quictls API. The role of OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() (ha_quic_ossl_got_transport_params()) is to store the transport parameters received from the peer. It simply calls quic_transport_params_store() and sets them into the TLS stack by calling qc_ssl_set_quic_transport_params(). Also add some comments for all the OpenSSL 3.5 QUIC API callbacks. This patch has no impact on the other uses of the QUIC API provided by the other TLS stacks.	2025-05-20 15:00:06 +02:00
Frederic Lecaille	dc6a3c329a	MINOR: quic: Allow the use of the new OpenSSL 3.5.0 QUIC TLS API (to be completed) This patch allows the use of the new OpenSSL 3.5.0 QUIC TLS API when it is available and detected at compilation time. The detection relies on the presence of the OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND macro from openssl-compat.h. Indeed this macro is defined by OpenSSL since version 3.5.0. It is not defined by quictls. This helps in distinguishing these two TLS stacks. When the detection succeeds, HAVE_OPENSSL_QUIC is also defined by openssl-compat.h. This new macro is then used to detect the availability of the new OpenSSL 3.5.0 QUIC TLS API. Note that this detection is done only if USE_QUIC_OPENSSL_COMPAT is not asked. So, USE_QUIC_OPENSSL_COMPAT and HAVE_OPENSSL_QUIC are exclusive. At the same location, from openssl-compat.h, the ssl_encryption_level_t enum is defined. This enum was defined by quictls and extensively used by the haproxy QUIC implementation. SSL_set_quic_transport_params() is replaced by SSL_set_quic_tls_transport_params(). SSL_set_quic_early_data_enabled() (quictls) is also replaced by SSL_set_quic_tls_early_data_enabled() (OpenSSL). SSL_quic_read_level() (quictls) is not defined by OpenSSL. It is only used by the traces to log the current TLS stack decryption level (read). A macro makes it return -1 which is an unused value. Most of the differences between the quictls and OpenSSL QUIC APIs are in quic_ssl.c where some callbacks must be defined for these two APIs. This is why this patch modifies quic_ssl.c to define an array of OSSL_DISPATCH structs: <ha_quic_dispatch>. Each element of this array defines a callback. So, this patch implements these six callbacks: - ha_quic_ossl_crypto_send() - ha_quic_ossl_crypto_recv_rcd() - ha_quic_ossl_crypto_release_rcd() - ha_quic_ossl_yield_secret() - ha_quic_ossl_got_transport_params() and - ha_quic_ossl_alert(). 
But at this time, these implementations which must return an int return 0, interpreted as a failure by the OpenSSL QUIC API, except for ha_quic_ossl_alert() which is implemented the same way as for quictls. The five remaining functions above will be implemented by the next patches to come. ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() have been moved to be defined for both quictls and OpenSSL QUIC API. These callbacks are attached to the SSL objects (sessions) by calling the new qc_ssl_set_cbs() function. The latter calls the correct function to attach the correct callbacks to the SSL objects (defined by <ha_quic_method> for quictls, and <ha_quic_dispatch> for OpenSSL). The calls to SSL_provide_quic_data() and SSL_process_quic_post_handshake() have also been disabled. These functions are not defined by the OpenSSL QUIC API. At this time, the functions which call them are still defined when HAVE_OPENSSL_QUIC is defined.	2025-05-20 15:00:06 +02:00
Willy Tarreau	411b04c7d3	IMPORT: slz: use a better hash for machines with a fast multiply The current hash involves 3 simple shifts and additions so that it can be mapped to a multiply on architectures having a fast multiply. This is indeed what the compiler does on x86_64. A large range of values was scanned to try to find more optimal factors on machines supporting such a fast multiply, and it turned out that new factor 0x1af42f resulted in smoother hashes that provided on average 0.4% better compression on both the Silesia corpus and an mbox file composed of very compressible emails and uncompressible attachments. It's even slightly better than CRC32C while being faster on Skylake. This patch enables this factor on archs with a fast multiply. This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.	2025-05-16 16:43:53 +02:00
Willy Tarreau	0a91c6dcae	BUILD: debug: mark ha_crash_now() as attribute(noreturn) Building on MIPS64 with clang16 incorrectly reports some uninitialized value warnings in stats-proxy.c due to some calls to ABORT_NOW() where the compiler didn't know the code wouldn't return. Let's properly mark the function as noreturn, and take this opportunity for also marking it unused to avoid possible warnings depending on the build options (if ABORT_NOW is not used). No backport needed though it will not harm.	2025-05-16 16:43:53 +02:00
Christopher Faulet	f45a632bad	BUG/MEDIUM: stconn: Disable 0-copy forwarding for filters altering the payload It is especially a problem with Lua filters, but it is important to disable the 0-copy forwarding if a filter alters the payload, or at least to be able to disable it. While the filter is registered on the data filtering, it is not an issue (and it is the common case) because there is no way to fast-forward data at all. But it may be an issue if a filter decides to alter the payload and to unregister from data filtering. In that case, the 0-copy forwarding can be re-enabled in a hardly predictable state. To fix the issue, an SC flag was added to do so. The HTTP compression filter sets it, and Lua filters too if the body length is changed (via HTTPMessage.set_body_len()). Note that it is an issue because of a bad design about the HTX. Many info about the message are stored in the HTX structure itself. It must be refactored to move several info to the stream-endpoint descriptor. This should ease modifications at the stream level, from filters or TCP/HTTP rules. This should be backported as far as 3.0. If necessary, it may be backported on lower versions, as far as 2.6. In that case, it must be reviewed and adapted.	2025-05-16 15:11:37 +02:00
Christopher Faulet	a3940614c2	BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state SPOP_CS_FRAME_H and SPOP_CS_FRAME_P states, that were used to handle frame parsing, were removed. The demux process now relies on the demux stream ID to know if it is waiting for the frame header or the frame payload. Concretely, when the demux stream ID is not set (dsi == -1), the demuxer is waiting for the next frame header. Otherwise (dsi >= 0), it is waiting for the frame payload. It is especially important to be able to properly handle DISCONNECT frames sent by the agents. SPOP_CS_RUNNING state is introduced to know the hello handshake was finished and the SPOP connection is able to open SPOP streams and exchange NOTIFY/ACK frames with the agents. It depends on the following fixes: * MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing * BUG/MINOR: mux-spop: Make the demux stream ID a signed integer This change will be mandatory for the next fix. It must be backported to 3.1 with the commits above.	2025-05-13 19:51:40 +02:00
Willy Tarreau	e049bd00ab	MEDIUM: config: change default limits to 1024 threads and 32 groups A test run on a dual-socket EPYC 9845 (2x160 cores) showed that we'll be facing new limits during the lifetime of 3.2 with our current 16 groups and 256 threads max: $ cat test.cfg global cpu-policy performance $ ./haproxy -dc -c -f test.cfg ... Thread CPU Bindings: Tgrp/Thr Tid CPU set 1/1-32 1-32 32: 0-15,320-335 2/1-32 33-64 32: 16-31,336-351 3/1-32 65-96 32: 32-47,352-367 4/1-32 97-128 32: 48-63,368-383 5/1-32 129-160 32: 64-79,384-399 6/1-32 161-192 32: 80-95,400-415 7/1-32 193-224 32: 96-111,416-431 8/1-32 225-256 32: 112-127,432-447 Raising the default limit to 1024 threads and 32 groups is sufficient to buy us enough margin for a long time (hopefully, please don't laugh, you, reader from the future): $ ./haproxy -dc -c -f test.cfg ... Thread CPU Bindings: Tgrp/Thr Tid CPU set 1/1-32 1-32 32: 0-15,320-335 2/1-32 33-64 32: 16-31,336-351 3/1-32 65-96 32: 32-47,352-367 4/1-32 97-128 32: 48-63,368-383 5/1-32 129-160 32: 64-79,384-399 6/1-32 161-192 32: 80-95,400-415 7/1-32 193-224 32: 96-111,416-431 8/1-32 225-256 32: 112-127,432-447 9/1-32 257-288 32: 128-143,448-463 10/1-32 289-320 32: 144-159,464-479 11/1-32 321-352 32: 160-175,480-495 12/1-32 353-384 32: 176-191,496-511 13/1-32 385-416 32: 192-207,512-527 14/1-32 417-448 32: 208-223,528-543 15/1-32 449-480 32: 224-239,544-559 16/1-32 481-512 32: 240-255,560-575 17/1-32 513-544 32: 256-271,576-591 18/1-32 545-576 32: 272-287,592-607 19/1-32 577-608 32: 288-303,608-623 20/1-32 609-640 32: 304-319,624-639 We can change this default now because it has no functional effect without any configured cpu-policy, so this will only be an opt-in and it's better to do it now than to have an effect during the maintenance phase. 
A tiny effect is a doubling of the number of pool buckets and stick-table shards internally, which means that aside from slightly reducing contention in these areas, a dump of tables can enumerate keys in a different order (hence the adjustment in the vtc). The only really visible effect is a slightly higher static memory consumption (29->35 MB on a small config), but that difference remains even with 50k servers so that's pretty much acceptable. Thanks to Erwan Velu for the quick tests and the insights!	2025-05-13 18:15:33 +02:00
Amaury Denoyelle	f3b9676416	MINOR: quic: display stream age Add a field to save the creation date of qc_stream_desc instance. This is useful to display QUIC stream age in "show quic stream" output.	2025-05-13 15:44:22 +02:00
Amaury Denoyelle	1ccede211c	MINOR: mux-quic: account Rx data per stream Add counters to measure Rx buffers usage per QCS. This reuses the newly defined bdata_ctr type already used for Tx accounting. Note that for now, <tot> value of bdata_ctr is not used. This is because it is not easy to account for data across contiguous buffers. These values are displayed both on log/traces and "show quic" output.	2025-05-13 15:41:51 +02:00
Amaury Denoyelle	a1dc9070e7	MINOR: quic: account Tx data per stream Add accounting at qc_stream_desc level to be able to report the number of allocated Tx buffers and the sum of their data. This represents data ready for emission or already emitted and waiting on ACK. To simplify this accounting, a new counter type bdata_ctr is defined in quic_utils.h. This regroups both buffers and data counter, plus a maximum on the buffer value. These values are now displayed on QCS info used both on logline and traces, and also on "show quic" output.	2025-05-13 15:41:41 +02:00
Willy Tarreau	ebab479cdf	MINOR: http: add a function to validate characters of :authority As discussed here: https://github.com/httpwg/http2-spec/pull/936 https://github.com/haproxy/haproxy/issues/2941 It's important to take care of some special characters in the :authority pseudo header before reassembling a complete URI, because after assembly it's too late (e.g. the '/'). This patch adds a specific function which checks all such characters and their ranges on an ist, and benefits from modern compilers' optimizations that arrange the comparisons into an evaluation tree for faster match. That's the version that gave the most consistent performance across various compilers, though some hand-crafted versions using bitmaps stored in register could be slightly faster but super sensitive to code ordering, suggesting that the results might vary with future compilers. This one takes on average 1.2ns per character at 3 GHz (3.6 cycles per char on avg). The resulting impact on H2 request processing time (small requests) was measured around 0.3%, from 6.60 to 6.618us per request, which is a bit high but remains acceptable given that the test only focused on req rate. The code was made usable both for H2 and H3.	2025-05-12 18:02:47 +02:00
William Lallemand	96b1f1fd26	MINOR: tools: ha_freearray() frees an array of string ha_freearray() is a new function which frees an array of strings terminated by a NULL entry. The pointer to the array will be freed and set to NULL.	2025-05-09 19:12:05 +02:00
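The behaviour described for ha_freearray() above (free every string of a NULL-terminated array, free the array, then reset the caller's pointer) can be sketched as follows. This is an illustration under assumptions, not HAProxy's implementation: the name `free_str_array()` and its `char ***` signature are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: frees each string of a NULL-terminated array,
 * then the array itself, and resets the caller's pointer to NULL so
 * that a second call is harmless.
 */
static void free_str_array(char ***array)
{
	char **p;

	if (!array || !*array)
		return;

	for (p = *array; *p; p++)
		free(*p);

	free(*array);
	*array = NULL;
}
```

Taking the address of the array pointer is what allows the helper to set it to NULL, which guards against accidental reuse or double free by the caller.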
Willy Tarreau	8a96216847	MEDIUM: sock-inet: re-check IPv6 connectivity every 30s IPv6 connectivity might start off (e.g. network not fully up when haproxy starts), so for features like resolvers, it would be nice to periodically recheck. With this change, instead of having the resolvers code rely on a variable indicating connectivity, it will now call a function that will check for how long a connectivity check hasn't been run, and will perform a new one if needed. The age was set to 30s which seems reasonable considering that the DNS will cache results anyway. There's no saving in spacing it more since the syscall is very cheap (just a connect() without any packet being emitted). The variables remain exported so that we could present them in show info or anywhere else. This way, "dns-accept-family auto" will now stay up to date. Warning though, it does perform some caching so even with a refreshed IPv6 connectivity, an older record may be returned anyway.	2025-05-09 15:45:44 +02:00
Willy Tarreau	1404f6fb7b	DEBUG: pools: add a new integrity mode "backup" to copy the released area This way we can preserve the entire contents of the released area for later inspection. This automatically enables comparison at reallocation time as well (like "integrity" does). If used in combination with integrity, the comparison is disabled but the check of non-corruption of the area mangled by integrity is still operated.	2025-05-09 14:57:00 +02:00
William Lallemand	e7574cd5f0	MINOR: acme: add the global option 'acme.scheduler' The automatic scheduler is useful but sometimes you don't want to use it, or you want to schedule manually. This patch adds an 'acme.scheduler' option in the global section, which can be set to either 'auto' or 'off'. (auto is the default value) This also changes the output of the 'acme status' command so it does not show scheduled values. The state will be 'Stopped' instead of 'Scheduled'.	2025-05-09 14:00:39 +02:00
Willy Tarreau	0ae14beb2a	DEBUG: pool: permit per-pool UAF configuration The new MEM_F_UAF flag can be set just after a pool's creation to make this pool UAF for debugging purposes. This allows to maintain a better overall performance required to reproduce issues while still having a chance to catch UAF. It will only be used by developers who will manually add it to areas worth being inspected, though.	2025-05-09 13:59:02 +02:00
Amaury Denoyelle	294bf26c06	MINOR: quic: extend return value during TP parsing Extend API used for QUIC transport parameter decoding. This is done via the introduction of a dedicated enum to report the various error condition detected. No functional change should occur with this patch, as the only returned code is QUIC_TP_DEC_ERR_TRUNC, which results in the connection closure via a TLS alert. This patch will be necessary to properly reject transport parameters with the proper CONNECTION_CLOSE error code. As such, it should be backported up to 2.6 with the following series.	2025-05-07 15:19:52 +02:00
Willy Tarreau	feaac66b5e	DEBUG: threads: merge successive idempotent lock operations in history In order to make the lock history a bit more useful, let's try to merge adjacent lock/unlock sequences that don't change anything for other threads. For this we can replace the last unlock with the new operation on the same label, and even just not store it if it was the same as the one before the unlock, since in the end it's the same as if the unlock had not been done. Now loops that used to be filled with "R:LISTENER U:LISTENER" show more useful info such as: S:IDLE_CONNS U:IDLE_CONNS S:PEER U:PEER S:IDLE_CONNS U:IDLE_CONNS R:LISTENER U:LISTENER U:STK_TABLE W:STK_SESS U:STK_SESS R:STK_TABLE U:STK_TABLE W:STK_SESS U:STK_SESS R:STK_TABLE R:STK_TABLE U:STK_TABLE W:STK_SESS U:STK_SESS W:STK_TABLE_UPDT U:STK_TABLE_UPDT S:PEER It's worth noting that it can sometimes induce confusion when recursive locks of the same label are used (a few exist on peers or stick-tables), as in such a case the two operations would be needed. However these ones are already undebuggable, so instead they will just have to be renamed to make sure they use a distinct label.	2025-05-05 18:36:12 +02:00
Willy Tarreau	743dce95d2	DEBUG: threads: don't keep lock label "OTHER" in the per-thread history Most threads are filled with "R:OTHER U:OTHER" in their history. Since anything non-important can use other it's not observable but it pollutes the history. Let's just drop OTHER entirely during the recording.	2025-05-05 18:10:57 +02:00
William Lallemand	878a3507df	BUILD: acme: need HAVE_ASN1_TIME_TO_TM Restrict the build of the ACME feature to libraries which provide ASN1_TIME_to_tm() function.	2025-05-02 16:01:32 +02:00
William Lallemand	626de9538e	MINOR: ssl: add function to extract X509 notBefore date in time_t Add x509_get_notbefore_time_t() which returns the notBefore date in time_t format.	2025-05-02 16:01:32 +02:00
Olivier Houchard	388539faa3	MEDIUM: stick-tables: defer adding updates to a tasklet There is a lot of contention trying to add updates to the tree. So instead of trying to add the updates to the tree right away, just add them to a mt-list (with one mt-list per thread group, so that the mt-list does not become the new point of contention that much), and create a tasklet dedicated to adding updates to the tree, in batches, to avoid keeping the update lock for too long. This helps stick tables perform better under heavy load.	2025-05-02 15:27:55 +02:00
Olivier Houchard	faa18c1ad8	BUG/MEDIUM: quic: Let it be known if the tasklet has been released. quic_conn_release() may, or may not, free the tasklet associated with the connection. So make it return 1 if it was, and 0 otherwise, so that if it was called from the tasklet handler itself, the said handler can act accordingly and return NULL if the tasklet was destroyed. This should be backported if `9240cd4a27` is backported.	2025-05-02 11:09:28 +02:00
William Lallemand	18d2371e0d	MINOR: acme: change the default max retries to 5 Change the default max retries constant to 5 instead of 3. Some servers can take a while to execute the challenge.	2025-05-02 09:40:12 +02:00
Olivier Houchard	b138eab302	BUG/MEDIUM: connections: Report connection closing in conn_create_mux() Add an extra parameter to conn_create_mux(), "closed_connection". If a pointer is provided, then let it know if the connection was closed. Callers have no way to determine that otherwise, and we need to know that, at least in ssl_sock_io_cb(), as if the connection was closed we need to return NULL, as the tasklet was free'd, otherwise that can lead to memory corruption and crashes. This should be backported if `9240cd4a27` is backported too.	2025-04-30 17:17:36 +02:00
Olivier Houchard	4abfade371	MINOR: tasks: Remove unused tasklet_remove_from_tasklet_list Remove tasklet_remove_from_tasklet_list, as the function hasn't been used for a long time, and there is little reason to keep it.	2025-04-30 17:09:06 +02:00
Olivier Houchard	2bab043c8c	MEDIUM: tasks: Remove TASK_IN_LIST and use TASK_QUEUED instead. TASK_QUEUED was used to mean "the task has been scheduled to run", TASK_IN_LIST was used to mean "the tasklet has been scheduled to run", remove TASK_IN_LIST and just use TASK_QUEUED for tasklets instead. This commit is just cosmetic, and should not have any impact.	2025-04-30 17:08:57 +02:00
William Lallemand	563ca94ab8	MINOR: ssl/cli: "acme ps" shows the acme tasks Implement a way to display the running acme tasks over the CLI. It currently only displays a "Running" status with the certificate name and the acme section from the configuration. The displayed running tasks are limited to the size of a buffer for now, it will require a backref list later to be called multiple times to resume the list.	2025-04-30 17:12:50 +02:00
Aurelien DARRAGON	97363015a5	MINOR: add hlua_yield_asap() helper When called, this function will try to enforce a yield (if available) as soon as possible. Indeed, automatic yield is already enforced every X Lua instructions. However, there may be some cases where we know after running heavy operation that we should yield already to avoid taking too much CPU at once. This is what this function offers, instead of asking the user to manually yield using "core.yield()" from Lua itself after using an expensive Lua method offered by haproxy, we can directly enforce the yield without the need to do it in the Lua script.	2025-04-30 17:00:27 +02:00
Amaury Denoyelle	df50d3e39f	MINOR: mux-quic: limit emitted MSD frames count per qcs The previous commit implemented a new calculation method for MAX_STREAM_DATA frame emission. Now, a frame may be emitted as soon as a buffer was consumed by a QCS instance. This will probably increase the number of MAX_STREAM_DATA frame emissions. It may even cause a series of frames to be emitted for the same stream with increasing values under high load, which is completely unnecessary. To improve this, limit the number of MAX_STREAM_DATA frames built to one per QCS instance. This is implemented by storing a reference to this frame in QCS structure via a new member <tx.msd_frm>. Note that to properly reset QCS msd_frm member, emission of flow-control frames has been changed. Now, each frame is emitted individually. On one side, it is better as it prevents emitting frames related to different streams in a single datagram, which is not desirable in case of packet loss. However, this can also increase sendto() syscall invocation.	2025-04-30 16:08:47 +02:00
Amaury Denoyelle	14a3fb679f	MEDIUM: mux-quic: increase flow-control on each bufsize Recently, QCS Rx allocation buffer method has been improved. It is now possible to allocate multiple buffers per QCS instance, which was necessary to improve HTTP/3 POST throughput. However, a limitation remained related to the emission of MAX_STREAM_DATA. These frames are only emitted once at least half of the receive capacity has been consumed by its QCS instance. This may be too restrictive when a client needs to upload a large payload. Improve this by adjusting MAX_STREAM_DATA allocation. If QCS capacity is still limited to 1 or 2 buffers max, the old calculation is still used. This is necessary when user has limited upload throughput via their configuration. If QCS capacity is more than 2 buffers, a new frame is emitted if at least a buffer was consumed. This patch has reduced number of STREAM_DATA_BLOCKED frames received in POST tests with some specific clients.	2025-04-30 16:08:47 +02:00
Remi Tricot-Le Breton	047fb37b19	MINOR: Add 'conn' param to ssl_sock_chose_sni_ctx This is only useful in the traces, the conn parameter won't be used otherwise.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	6519cec2ed	MINOR: ssl: Add traces about sigalg extension parsing in clientHello callback We had to parse the sigAlg extension by hand in order to properly select the certificate used by the SSL frontends. These traces allow to dump the allowed sigAlg list sent by the client in its clientHello.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	105c1ca139	MINOR: ssl: Add traces to the switchctx callback This callback allows to pick the used certificate on an SSL frontend. The certificate selection is made according to the information sent by the client in the clientHello. The traces that were added will allow to better understand what certificate was chosen and why. It will also warn us if the chosen certificate was the default one. The actual certificate parsing happens in ssl_sock_chose_sni_ctx. It's in this function that we actually get the filename of the certificate used.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	dbdd0630e1	MINOR: ssl: Add ocsp stapling callback traces If OCSP stapling fails because of a missing or invalid OCSP response we used to silently disable stapling for the given session. We can now know a bit more what happened regarding OCSP stapling.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	0fb05540b2	MINOR: ssl: Add traces to verify callback Those traces allow to know which errors were met during certificate chain validation as well as which ones were ignored.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	4a8fa28e36	MINOR: ssl: Add traces around SSL_do_handshake call Those traces dump information about the multiple SSL_do_handshake calls (renegotiation and regular call). Some errors could also be dumped in case of rejected early data. Depending on the chosen verbosity, some information about the current handshake can be dumped as well (servername, tls version, chosen cipher for instance). In case of failed handshake, the error codes and messages will also be dumped in the log to ease debugging.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	9f146bdab3	MINOR: ssl: Add traces to ssl_sock_io_cb function Add new SSL traces.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	475bb8d843	MINOR: ssl: Add traces to recv/send functions Those traces will allow to identify sessions on which early data is used as well as some forcefully closed connections.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	9bb8d6dcd1	MINOR: ssl: Add traces to ssl init/close functions Add a dedicated trace for some unlikely allocation failures and async errors. Those traces will mostly be used to identify the start and end of a given SSL connection.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	08e40f4589	MINOR: Add "sigalg" to "sigalg name" helper function This function can be used to convert a TLSv1.3 sigAlg entry (2 bytes) from the signature_algorithms client hello extension into a string. In order to ease debugging, some TLSv1.2 combinations can also be dumped. In TLSv1.2 those signature algorithm pairs were built out of a one-byte signature identifier combined with a one-byte hash identifier. In TLSv1.3 those identifiers are two-byte blocks that must be treated as such.	2025-04-30 11:11:26 +02:00
Willy Tarreau	566b384e4e	MINOR: tools: make my_strndup() take a size_t len instead of an int In relation to issue #2954, it appears that turning some size_t length calculations to the int that my_strndup() uses upsets coverity a bit. Instead of dealing with such warnings each time, better address it at the root. An inspection of all call places shows that the size passed there is always positive so we can safely use an unsigned type, and size_t will always suit it like for strndup() where it's available.	2025-04-30 05:17:43 +02:00
Aurelien DARRAGON	5288b39011	BUG/MINOR: dns: prevent ds accumulation within dss when dns session callback (dns_session_release()) is called upon error (ie: when some pending queries were not sent), we try our best to re-create the applet in order to preserve the pending queries and give them a chance to be retried. This is done at the end of dns_session_release(). However, doing so exposes to an issue: if the error preventing queries from being sent is still encountered over and over the dns session could stay there indefinitely. Meanwhile, other dns sessions may be created on the same dns_stream_server periodically. If previous failing dns sessions don't terminate but we also keep creating new ones, we end up accumulating failing sessions on a given dns_stream_server, which can eventually cause resource shortage. This issue was found when trying to address ("BUG/MINOR: dns: add tempo between 2 connection attempts for dns servers") To fix it, we track the number of failed consecutive sessions for a given dns server. When we reach the threshold (set to 100), we consider that the link to the dns server is broken (at least temporarily) and we force dns_session_new() to fail, so that we stop creating new sessions until one of the existing ones eventually succeeds. A workaround for this fix consists in setting the "maxconn" parameter on nameserver directive (under resolvers section) to a reasonable value so that no more than "maxconn" sessions may co-exist on the same server at a given time. This may be backported to all stable versions. ("CLEANUP: dns: remove unused dns_stream_server struct member") may be backported to ease the backport.	2025-04-29 21:20:54 +02:00
Aurelien DARRAGON	14ebe95a10	CLEANUP: dns: remove unused dns_stream_server struct member dns_stream_server "max_slots" is unused, let's get rid of it	2025-04-29 21:20:44 +02:00
Aurelien DARRAGON	1ced5ef2fd	MINOR: applet: add appctx_schedule() macro Just like task_schedule() but for applets to wakeup an applet at a specific time, leverages _task_schedule() internally	2025-04-29 21:19:37 +02:00
William Lallemand	5555926fdd	MEDIUM: acme: use a map to store tokens and thumbprints The stateless mode which was documented previously in the ACME example is not convenient for all use cases. First, when HAProxy generates the account key itself, you wouldn't be able to put the thumbprint in the configuration, so you will have to get the thumbprint and then reload. Second, in the case you are using multiple account keys, there are multiple thumbprints, and it's not easy to know which one you want to use when responding to the challenger. This patch allows to configure a map in the acme section, which will be filled by the acme task with the token corresponding to the challenge, as the key, and the thumbprint as the value. This way it's easy to reply with the right thumbprint. Example: http-request return status 200 content-type text/plain lf-string "%[path,field(-1,/)].%[path,field(-1,/),map(virt@acme)]\n" if { path_beg '/.well-known/acme-challenge/' }	2025-04-29 16:15:55 +02:00
Amaury Denoyelle	0f9b3daf98	MEDIUM: quic: limit global Tx memory Define a new setting tune.quic.frontend.max-tot-window. It contains a size argument which can be used to set a limit on the sum of all QUIC connections congestion window. This is applied both on quic_cc_path_set() and quic_cc_path_inc(). Note that this limitation cannot reduce a congestion window more than the minimal limit which is set to 2 datagrams.	2025-04-29 15:19:32 +02:00
Amaury Denoyelle	e841164a44	MINOR: quic: account for global congestion window Use the newly defined cshared type to account for the sum of congestion window of every QUIC connection. This value is stored in global counter quic_mem_global defined in proto_quic module.	2025-04-29 15:19:32 +02:00
Amaury Denoyelle	3891456d20	MINOR: thread: define cshared type Define a new type "struct cshared". This can be used as a tool to manipulate a global counter with thread-safety ensured. Each thread would declare its thread-local cshared type, which would point to a global counter. Each thread can then add/subtract value to their owned thread-local cshared instance via cshared_add(). If the difference exceeds a configured limit, either positively or negatively, the global counter is updated and thread-local instance is reset to 0. Each thread can safely read the global counter value using cshared_read().	2025-04-29 15:10:06 +02:00
Amaury Denoyelle	7bad88c35c	BUG/MINOR: quic: ensure cwnd limits are always enforced Congestion window is limited by minimal and maximum values which can never be exceeded. Min value is hardcoded to 2 datagrams as recommended by the specification. Max value is specified via haproxy configuration. These values must be respected each time the congestion window size is adjusted. However, in some rare occasions, limits were not always enforced. Fix this by implementing wrappers to set or increment the congestion window. These functions ensure limits are always applied after the operation. Additionally, wrappers also ensure that if window reached a new maximum value, it is saved in <cwnd_last_max> field. This should be backported up to 2.6, after a brief period of observation.	2025-04-29 15:10:06 +02:00
Amaury Denoyelle	2eb1b0cd96	MINOR: quic: rename min/max fields for congestion window algo There was some possible confusion between fields related to congestion window size min and max limit which cannot be exceeded, and the maximum value previously reached by the window. Fix this by adopting a new naming scheme. Enforced limit are now renamed <limit_max>/<limit_min>, while the previously reached max value is renamed <cwnd_last_max>. This should be backported up to 3.1.	2025-04-29 15:10:06 +02:00
Willy Tarreau	2cdb3cb91e	MINOR: tcp: add support for setting TCP_NOTSENT_LOWAT on both sides TCP_NOTSENT_LOWAT is very convenient as it indicates when to report EAGAIN on the sending side. It takes a margin on top of the estimated window, meaning that it's no longer needed to store too many data in socket buffers. Instead there's just enough to fill the send window and a little bit of margin to cover the scheduling time to restart sending. Experiments on a 100ms network have shown a 10-fold reduction in the memory used by socket buffers by just setting this value to tune.bufsize, without noticing any performance degradation. Theoretically the responsiveness on multiplexed protocols such as H2 should also be improved.	2025-04-29 12:13:42 +02:00
Willy Tarreau	f25b4abc9b	MINOR: cli: split APPCTX_CLI_ST1_PROMPT into two distinct flags The CLI's "prompt" command toggles two distinct things: - displaying or hiding the prompt at the beginning of the line - single-command vs interactive mode These are two independent concepts and the prompt mode doesn't always cope well with tools that would like to upload data without having to read the prompt on return. Also, the master command line works in interactive mode by default with no prompt, which is not consistent (and not convenient for tools). So let's start by splitting the bit in two, and have a new APPCTX_CLI_ST1_INTER flag dedicated to the interactive mode. For now the "prompt" command alone continues to toggle the two at once.	2025-04-28 20:21:06 +02:00
Willy Tarreau	5ac280f2a7	MINOR: compiler: add more macros to detect macro definitions We add __equals_0(NAME) which is only true if NAME is defined as zero, and __def_as_empty(NAME) which is only true if NAME is defined as an empty string.	2025-04-28 20:21:06 +02:00
Willy Tarreau	12c7189bc8	MEDIUM: thread: set DEBUG_THREAD to 1 by default Setting DEBUG_THREAD to 1 allows recording the lock history for each thread. Tests have shown that (as predicted) the cost of updating a single thread-local variable is not perceptible in the noise, especially when compared to the cost of obtaining a lock. Since this can provide useful value when debugging deadlocks, let's enable it by default when threads are enabled.	2025-04-28 16:50:34 +02:00
Willy Tarreau	d9a659ed96	MINOR: threads/cli: display the lock history on "show threads" This will display the lock labels and modes for each non-empty step at the end of "show threads" when these are defined. This allows to emit up to the last 8 locking operation for each thread on 64 bit machines.	2025-04-28 16:50:34 +02:00
Willy Tarreau	b8a1c2380b	MEDIUM: threads: keep history of taken locks with DEBUG_THREAD > 0 by only storing a word in each thread context, we can keep the history of all taken/dropped locks by label. This is expected to be very cheap and to permit to store up to 8 consecutive lock operations in 64 bits. That should significantly help detect recursive locks as well as figure what thread was likely to hinder another one waiting for a lock. For now we only store the final state of the lock, we don't store the attempt to get it. It's just a matter of space since we already need 4 ops (rd,sk,wr,un) which take 2 bits, leaving max 64 labels. We're already around 45. We could also multiply by 5 and still keep 8 bits total per lock, that would limit us to 51 locks max. It seems that most of the time if we get a watchdog panic, anyway the victim thread will be perfectly located so that we don't need a specific value for this. Another benefit is that we perform a single memory write per lock.	2025-04-28 16:50:34 +02:00
Willy Tarreau	23371b3e7c	MINOR: threads: turn the full lock debugging to DEBUG_THREAD=2 At level 1 it now does nothing. This is reserved for some subsequent patches which will implement lighter debugging.	2025-04-28 16:50:34 +02:00
Willy Tarreau	903a6b14ef	MINOR: threads: prepare DEBUG_THREAD to receive more values We now default the value to zero and make sure all tests properly take care of values above zero. This is in preparation for supporting several degrees of debugging.	2025-04-28 16:50:34 +02:00
William Lallemand	bb768b3e26	MEDIUM: acme: use Retry-After value for retries Parse the Retry-After header in response and store it in order to use the value as the delay for the next retry, fallback to 3s if the value couldn't be parsed or does not exist.	2025-04-24 20:14:47 +02:00
Willy Tarreau	69b051d1dc	MINOR: resolvers: add "dns-accept-family auto" to rely on detected IPv6 Instead of always having to force IPv4 or IPv6, let's now also offer "auto" which will only enable IPv6 if the system has a default gateway for it. This means that properly configured dual-stack systems will default to "ipv4,ipv6" while those lacking a gateway will only use "ipv4". Note that no real connectivity test is performed, so firewalled systems may still get it wrong and might prefer to rely on a manual "ipv4" assignment.	2025-04-24 17:52:28 +02:00
Willy Tarreau	5d41d476f3	MINOR: sock-inet: detect apparent IPv6 connectivity In order to ease dual-stack deployments, we could at least try to check if ipv6 seems to be reachable. For this we're adding a test based on a UDP connect (no traffic) on port 53 to the base of public addresses (2001::) and see if the connect() is permitted, indicating that the routing table knows how to reach it, or fails. Based on this result we're setting a global variable that other subsystems might use to preset their defaults.	2025-04-24 17:52:28 +02:00
Willy Tarreau	2c46c2c042	MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS In order to ease troubleshooting and testing, the new "-4" command line argument enforces queries and processing of "A" DNS records only, i.e. those representing IPv4 addresses. This can be useful when a host lacks end-to-end dual-stack connectivity. This overrides the global "dns-accept-family" directive and is equivalent to value "ipv4".	2025-04-24 17:52:28 +02:00
Willy Tarreau	940fa19ad8	MEDIUM: resolvers: add global "dns-accept-family" directive By default, DNS resolvers accept both IPv4 and IPv6 addresses. This can be influenced by the "resolve-prefer" keywords on server lines as well as the family argument to the "do-resolve" action, but that is only a preference, which does not block the other family from being used when it's alone. In some environments where dual-stack is not usable, stumbling on an unreachable IPv6-only DNS record can cause significant trouble as it will replace a previous IPv4 one which would possibly have continued to work till next request. The "dns-accept-family" global option permits to enforce usage of only one (or both) address families. The argument is a comma-delimited list of the following words: - "ipv4": query and accept IPv4 addresses ("A" records) - "ipv6": query and accept IPv6 addresses ("AAAA" records) When a single family is used, no request will be sent to resolvers for the other family, and any response for the other family will be ignored. The default value is "ipv4,ipv6", which effectively enables both families.	2025-04-24 17:52:28 +02:00
Christopher Faulet	29632bcabf	CLEANUP: applet: Remove unused rule pointer in appctx structure Thanks to previous commits, the "rule" field in the appctx structure is no longer used. So we can safely remove it.	2025-04-24 16:22:31 +02:00
Christopher Faulet	b734d7c156	MINOR: cli/applet: Move appctx fields only used by the CLI in a private context There are several fields in the appctx structure only used by the CLI. To make things cleaner, all these fields are now placed in a dedicated context inside the appctx structure. The final goal is to move it in the service context and add an API for cli commands to get a command context inside the cli context.	2025-04-24 15:09:37 +02:00
Christopher Faulet	742dc01537	CLEANUP: applet: Update st0/st1 comment in appctx structure Today, these states are used by almost all applets. So update the comments of these fields.	2025-04-24 15:09:37 +02:00
Christopher Faulet	44ace9a1b7	MINOR: cli: Rename some CLI applet states to reflect recent refactoring CLI_ST_GETREQ state was renamed into CLI_ST_PARSE_CMDLINE and CLI_ST_PARSEREQ into CLI_ST_PROCESS_CMDLINE to reflect the real action performed in these states.	2025-04-24 15:09:37 +02:00
Christopher Faulet	20ec1de214	MAJOR: cli: Refactor parsing and execution of pipelined commands Before this patch, when pipelined commands were received, each command was parsed and then executed before moving to the next command. Pending commands were not copied in the input buffer of the applet. The major issue with this way to handle commands is the impossibility to consume inputs from commands with an I/O handler, like "show events" for instance. It was working thanks to a "bug" if such commands were the last one on the command line. But it was impossible to use them followed by another command. And this prevents us from implementing any streaming support for CLI commands. So we decided to refactor the command line parsing to have something similar to a basic shell. Now an entire line is parsed, including the payload, before starting commands execution. The command line is copied in a dedicated buffer. "appctx->chunk" buffer is used for this purpose. It was an unused field, so it is safe to use it here. Once the command line is copied, the commands found on this line are executed. Because the applet input buffer was flushed, any input can be safely consumed by the CLI applet and is available for the command I/O handler. Thanks to this change, "show event -w" command can be followed by a command. And in theory, it should be possible to implement commands supporting input data streaming. For instance, the Tetris like lua applet can be used on the CLI now. Note that the payload, if any, is part of the command line and must be fully received before starting the commands processing. It means there is still the limitation to a buffer, but not only for the payload but for the whole command line. The payload is still necessarily at the end of the command line and is passed as argument to the last command. Internally, the "appctx->cli_payload" field was introduced to point on the payload in the command line buffer. This patch is quite huge but it cannot easily be split. 
It should not introduce significant changes.	2025-04-24 15:09:37 +02:00
Willy Tarreau	1af592c511	MINOR: stick-table: use a separate lock label for updates Too many locks were sharing STK_TABLE_LOCK making it hard to analyze. Let's split the already heavily used update lock.	2025-04-24 14:02:22 +02:00
William Lallemand	af73f98a3e	MEDIUM: acme: rename "uri" into "directory" Rename the "uri" option of the acme section into "directory".	2025-04-24 10:52:46 +02:00
William Lallemand	d700a242b4	MINOR: httpclient: add an "https" log-format Add an experimental "https" log-format for the httpclient, it is not used by the httpclient by default, but could be defined in a customized proxy. The string is basically a httpslog, with some of the fields replaced by their backend equivalent or - when not available: "%ci:%cp [%tr] %ft -/- %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %[bc_err]/%[ssl_bc_err,hex]/-/-/%[ssl_bc_is_resumed] -/-/-"	2025-04-23 15:32:46 +02:00
Christopher Faulet	a56feffc6f	CLEANUP: h1: Remove now useless h1_parse_cont_len_header() function Since the commit "MINOR: hlua/h1: Use http_parse_cont_len_header() to parse content-length value", this function is no longer used. So it can be safely removed.	2025-04-22 16:14:47 +02:00
Christopher Faulet	5200203677	MINOR: proxy: Add options to drop HTTP trailers during message forwarding In RFC9110, it is stated that trailers could be merged with the headers. While it should be performed with special care, it may be a problem for some applications. To avoid any trouble with such applications, two new options were added to drop trailers during the message forwarding. On the backend, "http-drop-request-trailers" option can be enabled to drop trailers from the requests before sending them to the server. And on the frontend, "http-drop-response-trailers" option can be enabled to drop trailers from the responses before sending them to the client. The options can be defined in defaults sections and disabled with "no" keyword. This patch should fix the issue #2930.	2025-04-22 16:14:46 +02:00
Christopher Faulet	044ef9b3d6	CLEANUP: Slightly reorder some proxy option flags to free slots PR_O_TCPCHK_SSL and PR_O_CONTSTATS were shifted to free a slot. The idea is to have 2 contiguous slots to be able to insert two new options.	2025-04-22 16:14:46 +02:00
Amaury Denoyelle	4309a6fbf8	BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure To handle out-of-order received CRYPTO frames, a ncbuf instance is allocated. This is done via the helper quic_get_ncbuf(). Buffer allocation was improperly checked. In case b_alloc() fails, it crashes due to a BUG_ON(). Fix this by removing it. The function now returns NULL on allocation failure, which is already properly handled in its caller qc_handle_crypto_frm(). This should fix the last reported crash from github issue #2935. This must be backported up to 2.6.	2025-04-18 18:11:17 +02:00
Olivier Houchard	3758eab71c	MEDIUM: lb_fwrr: Use one ebtree per thread group. When using the round-robin load balancer, the major source of contention is the lbprm lock, that has to be held every time we pick a server. To mitigate that, make it so there is one tree per thread-group, and one lock per thread-group. That means we now have a lb_fwrr_per_tgrp structure that will contain the two lb_fwrr_groups (active and backup) as well as the lock to protect them in the per-thread lbprm struct, and all fields in the struct server are now moved to the per-thread structure too. Those changes are mostly mechanical, and bring a good performance improvement: on a 64-core AMD CPU, with 64 servers configured, we could process about 620000 requests per second, and we now can process around 1400000 requests per second.	2025-04-17 17:38:23 +02:00
Olivier Houchard	f36f6cfd26	MINOR: proxies: Add a per-thread group lbprm struct. Add a new structure in the per-thread groups proxy structure, that will contain whatever is per-thread group in lbprm. It will be accessed as p->per_tgrp[tgid].lbprm.	2025-04-17 17:38:23 +02:00
Olivier Houchard	7ca1c94ff0	MINOR: lb_fwrr: Move the next weight out of fwrr_group. Move the "next_weight" outside of fwrr_group, and inside struct lb_fwrr directly, one for the active servers, one for the backup servers. We will soon have one fwrr_group per thread group, but next_weight will be global to all of them.	2025-04-17 17:38:23 +02:00
Olivier Houchard	444125a764	MINOR: servers: Provide a pointer to the server in srv_per_tgroup. Add a pointer to the server into the struct srv_per_tgroup, so that if we only have access to that srv_per_tgroup, we can come back to the corresponding server.	2025-04-17 17:38:23 +02:00
Willy Tarreau	36ec70c526	MINOR: sched: add a new function is_sched_alive() to report scheduler's health This verifies that the scheduler is still ticking without having to access the activity[] array nor keeping local copies of the ctxsw counter. It just tests and sets a flag that is reset after each return from a ->process() function.	2025-04-17 16:25:47 +02:00
Willy Tarreau	874ba2afed	CLEANUP: debug: no longer set nor use TH_FL_DUMPING_OTHERS TH_FL_DUMPING_OTHERS was being used to try to perform exclusion between threads running "show threads" and those producing warnings. Now that it is much more cleanly handled, we don't need that type of protection anymore, which was adding to the complexity of the solution. Let's just get rid of it.	2025-04-17 16:25:47 +02:00
Willy Tarreau	c16d5415a8	MINOR: debug: make ha_stuck_warning() only work for the current thread Since we no longer call it with a foreign thread, let's simplify its code and get rid of the special cases that were relying on ha_thread_dump_fill() and synchronization with a remote thread. We're now only dumping the current thread so ha_thread_dump_one() is sufficient.	2025-04-17 16:25:47 +02:00
Willy Tarreau	b24d7f248e	MINOR: pass a valid buffer pointer to ha_thread_dump_one() The goal is to let the caller deal with the pointer so that the function only has to fill that buffer without worrying about locking. This way, synchronous dumps from "show threads" are produced and emitted directly without causing undesired locking of the buffer nor risking causing confusion about thread_dump_buffer containing bits from an interrupted dump in progress. It's only the caller that's responsible for notifying the requester of the end of the dump by setting bit 0 of the pointer if needed (i.e. it's only done in the debug handler).	2025-04-17 16:25:47 +02:00
Willy Tarreau	5ac739cd0c	MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one() This function was initially designed to dump any thread into the presented buffer, but the way it currently works is that it's always called for the current thread, and uses the distinction between coming from a sighandler or being called directly to detect which thread is the caller. Let's simplify all this by replacing thr with tid everywhere, and using the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc). The confusing "from_signal" argument is now replaced with "is_caller" which clearly states whether or not the caller declares being the one asking for the dump (the logic is inverted, but there are only two call places with a constant).	2025-04-17 16:25:47 +02:00
Willy Tarreau	6d8a523d14	MINOR: tinfo: keep a copy of the pointer to the thread dump buffer Instead of using the thread dump buffer for post-mortem analysis, we'll keep a copy of the assigned pointer whenever it's used, even for warnings or "show threads". This will offer more opportunities to figure from a core what happened, and will give us more freedom regarding the value of the thread_dump_buffer itself. For example, even at the end of the dump when the pointer is reset, the last used buffer is now preserved.	2025-04-17 16:25:47 +02:00
Willy Tarreau	337017e2f9	BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads Some signal handlers rely on these to decide about the level of detail to provide in dumps, so let's properly fill the info about entering/leaving idle. Note that for consistency with other tests we're using bitops with t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes the code more readable and the whole difference is only 88 bytes on a 3MB executable. This bug is not important, and while older versions are likely affected as well, it's not worth taking the risk to backport this in case it would wake up an obscure bug.	2025-04-17 16:25:47 +02:00
Amaury Denoyelle	52246249ab	MEDIUM: listener/mux-h2: implement idle-ping on frontend side This commit is the counterpart of the previous one, adapted on the frontend side. "idle-ping" is added as keyword to bind lines, to be able to refresh client timeout of idle frontend connections. H2 MUX behavior remains similar as the previous patch. The only significant change is in h2c_update_timeout(), as idle-ping is now taken into account also for frontend connection. The calculated value is compared with http-request/http-keep-alive timeout value. The shorter delay is then used as expired date. As hr/ka timeout are based on idle_start, this allows to run them in parallel with an idle-ping timer.	2025-04-17 14:49:36 +02:00
Amaury Denoyelle	a78a04cfae	MEDIUM: server/mux-h2: implement idle-ping on backend side This commit implements support for idle-ping on the backend side. First, a new server keyword "idle-ping" is defined in configuration parsing. It is used to set the corresponding new server member. The second part of this commit implements idle-ping support on H2 MUX. A new inlined function conn_idle_ping() is defined to access connection idle-ping value. Two new connection flags are defined H2_CF_IDL_PING and H2_CF_IDL_PING_SENT. The first one is set for idle connections via h2c_update_timeout(). On h2_timeout_task() handler, if first flag is set, instead of releasing the connection as before, the second flag is set and tasklet is scheduled. As both flags are now set, h2_process_mux() will proceed to PING emission. The timer has also been rearmed to the idle-ping value. If a PING ACK is received before next timeout, connection timer is refreshed. Else, the connection is released, as with timer expiration. Also of importance, special care is needed when a backend connection is going to idle. In this case, idle-ping timer must be rearmed. Thus a new invocation of h2c_update_timeout() is performed on h2_detach().	2025-04-17 14:49:36 +02:00
William Lallemand	e778049ffc	MINOR: acme: register the task in the ckch_store This patch registers the task in the ckch_store so we don't run 2 tasks at the same time for a given certificate. Move the task creation under the lock and check if there was already a task under the lock.	2025-04-16 17:12:43 +02:00
William Lallemand	c291a5c73c	BUILD: incompatible pointer type suspected with -DDEBUG_UNIT src/jws.c: In function '__jws_init': src/jws.c:594:38: error: passing argument 2 of 'hap_register_unittest' from incompatible pointer type [-Wincompatible-pointer-types] 594 \| hap_register_unittest("jwk", jwk_debug); \| ^~~~~~~~~ \| \| \| int ()(int, char ) In file included from include/haproxy/api.h:36, from include/import/ebtree.h:251, from include/import/ebmbtree.h:25, from include/haproxy/jwt-t.h:25, from src/jws.c:5: include/haproxy/init.h:37:52: note: expected 'int ()(void)' but argument is of type 'int ()(int, char )' 37 \| void hap_register_unittest(const char name, int (*fct)()); \| ~~~~~~^~~~~~ GCC 15 is warning because the function pointer does have its arguments in the register function. Should fix issue #2929.	2025-04-15 15:49:44 +02:00
Willy Tarreau	b708345c17	DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters These counters can have a noticeable cost on large machines, though not dramatic. There's no single good choice to keep them enabled or disabled. This commit adds multiple choices: - DEBUG_COUNTERS set to 2 will automatically enable them by default, while 1 will disable them by default - the global "debug.counters on/off" will allow to change the setting at boot, regardless of DEBUG_COUNTERS as long as it was at least 1. - the CLI "debug counters on/off" will also allow to change the value at run time, allowing to observe a phenomenon while it's happening, or to disable counters if it's suspected that their cost is too high Finally, the "debug counters" command will append "(stopped)" at the end of the CNT lines when these counters are stopped. Note that the whole mechanism would easily support being extended to all counter types by specifying the types to apply to, but it doesn't seem useful at all and would require the user to also type "cnt" on debug lines. This may easily be changed in the future if it's found relevant.	2025-04-14 19:02:13 +02:00
Willy Tarreau	a142adaba0	DEBUG: counters: make COUNT_IF() only appear at DEBUG_COUNTERS>=1 COUNT_IF() is convenient but can be heavy since some of them were found to trigger often (roughly 1 counter per request on avg). This might even have an impact on large setups due to the cost of a shared cache line bouncing between multiple cores. For now there's no way to disable it, so let's only enable it when DEBUG_COUNTERS is 1 or above. A future change will make it configurable.	2025-04-14 19:02:13 +02:00
Willy Tarreau	61d633a3ac	DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default Till now the per-line glitches counters were only enabled with the confusingly named DEBUG_GLITCHES (which would not turn glitches off when disabled). Let's instead change it to DEBUG_COUNTERS and make sure it's enabled by default (though it can still be disabled with -DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be expanded to cover more counters.	2025-04-14 19:02:13 +02:00
William Lallemand	39c05cedff	BUILD: acme: enable the ACME feature when JWS is present The ACME feature depends on the JWS, which currently does not work with every SSL libraries. This patch only enables ACME when JWS is enabled.	2025-04-12 01:39:03 +02:00
William Lallemand	5500bda9eb	MINOR: acme: implement retrieval of the certificate Once the Order status is "valid", the certificate URL is accessible. This patch implements the retrieval of the certificate, which is stored in ctx->store.	2025-04-12 01:39:03 +02:00
William Lallemand	27fff179fe	MINOR: acme: verify the order status once finalized This implements a call to the order status to check if the certificate is ready.	2025-04-12 01:39:03 +02:00
William Lallemand	680222b382	MINOR: acme: finalize by sending the CSR This patch does the finalize step of the ACME task. This encodes the CSR into base64 format and send it to the finalize URL. https://www.rfc-editor.org/rfc/rfc8555#section-7.4	2025-04-12 01:29:27 +02:00
William Lallemand	de5dc31a0d	MINOR: acme: generate the CSR in a X509_REQ Generate the X509_REQ using the generated private key and the SAN from the configuration. This is only done once before the task is started. It could probably be done at the beginning of the task with the private key generation once we have a scheduler instead of a CLI command.	2025-04-12 01:29:27 +02:00
William Lallemand	00ba62df15	MINOR: acme: implement a check on the challenge status This patch implements a check on the challenge URL: once haproxy has asked for the challenge to be verified, it must check the status of the challenge resolution and whether any error occurred.	2025-04-12 01:29:27 +02:00
William Lallemand	711a13a4b4	MINOR: acme: send the request for challenge ready This patch sends the "{}" message to specify that a challenge is ready. It iterates on every challenge URL in the authorization list from the acme_ctx. This allows the ACME server to proceed to the challenge validation. https://www.rfc-editor.org/rfc/rfc8555#section-7.5.1	2025-04-12 01:29:27 +02:00
William Lallemand	ae0bc88f91	MINOR: acme: get the challenges object from the Auth URL This patch implements the retrieval of the challenge objects from the authorization URLs. A challenge object contains a token and a challenge URL that needs to be called once the challenge is set up. Each authorization URL contains multiple challenge objects, usually one per challenge type (HTTP-01, DNS-01, ALPN-01...). We only need to keep the one that is relevant to our configuration.	2025-04-12 01:29:27 +02:00
William Lallemand	4842c5ea8c	MINOR: acme: newOrder request retrieve authorizations URLs This patch implements the newOrder action in the ACME task. In order to ask for a new certificate, a list of SANs is sent as a JWS payload. The ACME server replies with a list of Authorization URLs. One Authorization is created per SAN on an Order. The authorization URLs are stored in a linked list of 'struct acme_auth' in acme_ctx, so we can get the challenge URLs from them later. The Location header is also stored as it is the URL of the order object. https://datatracker.ietf.org/doc/html/rfc8555#section-7.4	2025-04-12 01:29:27 +02:00
William Lallemand	04d393f661	MINOR: acme: generate new account The new account action in the ACME task uses the same function as the chkaccount, but onlyReturnExisting is not sent in this case!	2025-04-12 01:29:27 +02:00
William Lallemand	7f9bf4d5f7	MINOR: acme: check if the account exists This patch implements the retrieval of the KID (account identifier) using the pkey. A request is sent to the newAccount URL using the onlyReturnExisting option, which allows to get the kid of an existing account. acme_jws_payload() implements a way to generate a JWS payload using the nonce, pkey and provided URI.	2025-04-12 01:29:27 +02:00
William Lallemand	0aa6dedf72	MINOR: acme: handle the nonce ACME requests are supposed to be sent with a Nonce. The first Nonce should be retrieved using the newNonce URI provided by the directory. This nonce is stored and must be replaced by the new one received in each response.	2025-04-12 01:29:27 +02:00
William Lallemand	471290458e	MINOR: acme: get the ACME directory The first request of the ACME protocol is getting the list of URLs for the next steps. This patch implements the first request and the parsing of the response. The response is a JSON object so mjson is used to parse it.	2025-04-12 01:29:27 +02:00
William Lallemand	b8209cf697	MINOR: acme/cli: add the 'acme renew' command The "acme renew" command launches the ACME task for a given certificate. The CLI parser generates a new private key using the parameters from the acme section.	2025-04-12 01:29:27 +02:00
William Lallemand	bf6a39c4d1	MINOR: acme: add private key configuration This commit allows to configure the generated private keys, you can configure the keytype (RSA/ECDSA), the number of bits or the curves. Example: acme LE uri https://acme-staging-v02.api.letsencrypt.org/directory account account.key contact foobar@example.com challenge HTTP-01 keytype ECDSA curves P-384	2025-04-12 01:29:27 +02:00
William Lallemand	2e8c350b95	MINOR: acme: add configuration for the crt-store Add new acme keywords for the ckch_conf parsing, which will be used on a crt-store, a crt line in a frontend, or even a crt-list. The cfg_postparser_acme() is called in order to check if a section referenced elsewhere really exists in the config file.	2025-04-12 01:29:27 +02:00
William Lallemand	077e2ce84c	MINOR: acme: add the acme section in the configuration parser Add a configuration parser for the new acme section, the section is configured this way: acme letsencrypt uri https://acme-staging-v02.api.letsencrypt.org/directory account account.key contact foobar@example.com challenge HTTP-01 When unspecified, the challenge defaults to HTTP-01, and the account key to "<section_name>.account.key". Sections are stored in a linked list containing acme_cfg structures, the configuration parsing is mostly resolved in the postsection parser cfg_postsection_acme() which is called after the parsing of an acme section.	2025-04-12 01:29:27 +02:00
William Lallemand	20718f40b6	MEDIUM: ssl/ckch: add filename and linenum argument to crt-store parsing Add filename and linenum arguments to the crt-store / ckch_conf parsing. This allows the parsing functions to use them when emitting errors.	2025-04-12 01:29:27 +02:00
Willy Tarreau	00c967fac4	MINOR: master/cli: support bidirectional communications with workers Some rare commands in the worker require to keep their input open and terminate when it's closed ("show events -w", "wait"). Others maintain a per-session context ("set anon on"). But in its default operation mode, the master CLI passes commands one at a time to the worker, and closes the CLI's input channel so that the command can immediately close upon response. This effectively prevents these two specific cases from being used. Here the approach that we take is to introduce a bidirectional mode to connect to the worker, where everything sent to the master is immediately forwarded to the worker (including the raw command), allowing to queue multiple commands at once in the same session, and to continue to watch the input to detect when the client closes. It must be a client's choice however, since doing so means that the client cannot batch many commands at once to the master process, but must wait for these commands to complete before sending new ones. For this reason we use the prefix "@@<pid>" for this. It works exactly like "@" except that it maintains the channel open during the whole execution. Similarly to "@<pid>" with no command, "@@<pid>" will simply open an interactive CLI session to the worker, that will be ended by "quit" or by closing the connection. This can be convenient for the user, and possibly for clients willing to dedicate a connection to the worker.	2025-04-11 16:09:17 +02:00
Aurelien DARRAGON	fbfeb591f7	MINOR: proxy: add deinit_proxy() helper func Same as free_proxy(), but does not free the base proxy pointer (ie: the proxy itself may not be allocated) Goal is to be able to cleanup statically allocated dummy proxies.	2025-04-10 22:10:31 +02:00
Aurelien DARRAGON	e1cec655ee	MINOR: proxy: add setup_new_proxy() function Split alloc_new_proxy() in two functions: the preparing part is now handled by setup_new_proxy() which can be called individually, while alloc_new_proxy() takes care of allocating a new proxy struct and then calling setup_new_proxy() with the freshly allocated proxy.	2025-04-10 22:10:31 +02:00
Willy Tarreau	f4634e5a38	MINOR: ring/cli: support delimiting events with a trailing \0 on "show events" At the moment it is not supported to produce multi-line events on the "show events" output, simply because the LF character is used as the default end-of-event mark. However it could be convenient to produce well-formatted multi-line events, e.g. in JSON or other formats. UNIX utilities have already faced similar needs in the past and added "-print0" to "find" and "-0" to "xargs" to mention that the delimiter is the NUL character. This makes perfect sense since it's never present in contents, so let's do exactly the same here. Thus from now on, "show events <ring> -0" will delimit messages using a \0 instead of a \n, permitting a better and safer encapsulation.	2025-04-08 14:36:35 +02:00
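To illustrate why a NUL delimiter makes multi-line events easy to consume, here is a minimal client-side sketch. It is not taken from haproxy: the function name split_nul_events() and the sample buffer are hypothetical, and the code only assumes that each complete event ends with a \0 byte, as described in the commit above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of haproxy): walks a buffer of events
 * delimited by '\0' and stores a pointer to the start of each complete
 * event. Returns the number of events found, at most <max>.
 */
size_t split_nul_events(const char *buf, size_t len,
                        const char **events, size_t max)
{
	size_t count = 0;
	size_t pos = 0;

	assert(buf && events);

	while (pos < len && count < max) {
		const char *end = memchr(buf + pos, '\0', len - pos);

		if (!end)
			break; /* trailing event is incomplete, wait for more data */
		events[count++] = buf + pos;
		pos = (size_t)(end - buf) + 1;
	}
	return count;
}
```

Since each stored pointer refers to a NUL-terminated region of the input, an event containing line feeds (for example pretty-printed JSON) can be handled as a regular C string without any unescaping.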
Willy Tarreau	0be6d73e88	MINOR: ring: support arbitrary delimiters through ring_dispatch_messages() In order to support delimiting output events with other characters than just the LF, let's pass the delimiter through the API. The default remains the LF, used by applet_append_line(), and ignored by the log forwarder.	2025-04-08 14:36:35 +02:00
Willy Tarreau	f01ff2478f	BUILD: atomics: fix build issue on non-x86/non-arm systems Commit `f435a2e518` ("CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence()") replaced the builtins used for barriers, but the different API required an argument while the macros didn't specify any, resulting in double parenthesis that were causing obscure build errors such as "called object type 'void' is not a function or function pointer". Let's just specify the args for the macro. No backport is needed.	2025-04-07 09:38:22 +02:00
Aurelien DARRAGON	11d4d0957e	MEDIUM: task: make notification_* API thread safe by default Some notification_* functions were not thread safe by default as they assumed only one producer would emit events for registered tasks. While this suited well with the Lua sockets use-case, this proved to be a limitation with some other event sources (ie: lua Queue class). Instead of having to deal with both the non thread safe and thread safe variants (_mt suffix), which is error prone, let's make the entire API thread safe regarding the event list. Pruning functions still require that only one thread executes them, with Lua this is always the case because there is one cleanup list per context.	2025-04-03 17:52:50 +02:00
Aurelien DARRAGON	748dba4859	MINOR: hlua_fcn: register queue class using hlua_register_metatable() Most lua classes are registered by leveraging the hlua_register_metatable() helper. Let's use that for the Queue class as well for consistency.	2025-04-03 17:52:17 +02:00
Aurelien DARRAGON	b77b1a2c3a	MINOR: task: add thread safe notification_new and notification_wake variants notification_new and notification_wake were historically meant to be called by a single thread doing both the init and the wakeup for other tasks waiting on the signals. In this patch, we extend the API so that notification_new and notification_wake have thread-safe variants that can safely be used with multiple threads registering on the same list of events and multiple threads pushing updates on the list.	2025-04-03 17:52:03 +02:00
Amaury Denoyelle	f0f1816f1a	MINOR: check: implement check-pool-conn-name srv keyword This commit is a direct follow-up of the previous one. It defines a new server keyword check-pool-conn-name. It is used as the default value for the name parameter of idle connection hash generation. Its behavior is similar to server keyword pool-conn-name, but reserved for checks reuse. If check-pool-conn-name is set, it is used in priority to match a connection for reuse. If unset, a fallback is performed on check-sni.	2025-04-03 17:19:07 +02:00
Amaury Denoyelle	43367f94f1	MINOR: check/backend: support conn reuse with SNI Support for connection reuse during server checks was implemented recently. This is activated with the server keyword check-reuse-pool. Similarly to stream processing via connect_backend(), a connection hash is calculated when trying to perform reuse for checks. This is necessary to retrieve a connection which shares the check connect parameters. However, idle connections can additionally be tagged using a pool-conn-name or SNI under connect_backend(). Check reuse does not test these values, which prevents retrieving a matching connection. Improve this by using "check-sni" value as idle connection hash input for check reuse. be_calculate_conn_hash() API has been adjusted so that name value can be passed as input, both when using streams or checks. Even with the current patch, there are still some scenarios which cannot be covered for check connection reuse, most notably when using a dynamic pool-conn-name/SNI value. It is however at least sufficient to cover simpler cases.	2025-04-03 17:19:07 +02:00
Willy Tarreau	f435a2e518	CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence() The drop of older compilers also allows us to focus on clearer barriers, so let's use them.	2025-04-03 11:59:31 +02:00
Willy Tarreau	34e3b83f9c	CLEANUP: atomics: remove support for gcc < 4.7 The old __sync_* API is no longer necessary since we do not support gcc before 4.7 anymore. Let's just get rid of this code, the file is still ugly enough without it.	2025-04-03 11:55:35 +02:00
Ilia Shipitsin	27a6353ceb	CLEANUP: assorted typo fixes in the code, commits and doc	2025-04-03 11:37:25 +02:00
William Lallemand	b351f06ff1	REORG: ssl: move curves2nid and nid2nist to ssl_utils curves2nid and nid2nist are generic functions that could be used outside the JWS scope, this patch put them at the right place so they can be reused.	2025-04-02 19:34:09 +02:00
Amaury Denoyelle	f1fb396d71	MEDIUM: check: implement check-reuse-pool Implement the possibility to reuse idle connections when performing server checks. This is done thanks to the recently introduced functions be_calculate_conn_hash() and be_reuse_connection(). One side effect of this change is that be_calculate_conn_hash() can now be called with a NULL stream instance. As such, part of the functions are adjusted accordingly. Note that to simplify configuration, connection reuse is not performed if any specific check connection parameters are defined on the server line or via the tcp-check connect rule. This is performed via newly defined tcpcheck_use_nondefault_connect().	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	e34f748e3a	MINOR: check: define check-reuse-pool server keyword Define a new server keyword check-reuse-pool, and its counterpart with a "no" prefix. For the moment, only parsing is implemented. The real behavior adjustment will be implemented in the next patch.	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	20eb57b486	MINOR: backend: remove stream usage on connection reuse Adjust newly defined be_reuse_connection() API. The stream argument is removed. This will allow checks to invoke it without relying on a stream instance.	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	ee94a6cfc1	MINOR: backend: extract conn reuse from connect_server() Following the previous patch, the part directly related to connection reuse is extracted from connect_server(). It is now defined in a new function be_reuse_connection().	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	c7cc6b6401	MINOR: backend: extract conn hash calculation from connect_server() On connection reuse, a hash is first calculated. It is generated from various connection parameters, to retrieve a matching connection. Extract hash calculation from connect_server() into a new dedicated function be_calculate_conn_hash(). The objective is to be able to perform connection reuse for checks, without connect_server() invocation which relies on a stream instance.	2025-04-02 14:57:40 +02:00
Willy Tarreau	4ec5509541	BUILD: compiler: undefine the CONCAT() macro if already defined As Ilya reported in issue #2911, the CONCAT() macro breaks on NetBSD which defines its own as __CONCAT() (which is exactly the same). Let's just undefine it before ours to fix the issue instead of renaming, but keep ours so that we don't have doubts about what we're running with. Note that the patch introducing this breaking change was backported to 3.0.	2025-04-02 11:36:43 +02:00
Ilia Shipitsin	78b849b839	CLEANUP: assorted typo fixes in the code and comments code, comments and doc actually.	2025-04-02 11:12:20 +02:00
Olivier Houchard	9fe72bba3c	MAJOR: leastconn; Revamp the way servers are ordered. For leastconn, servers used to just be stored in an ebtree. Each server would be one node. Change that so that nodes contain multiple mt_lists. Each list will contain servers that share the same key (typically meaning they have the same number of connections). Using mt_lists means that as long as tree elements already exist, moving a server from one tree element to another no longer requires the lbprm write lock. We use multiple mt_lists to reduce the contention when moving a server from one tree element to another. A list in the new element will be chosen randomly. We no longer remove a tree element as soon as it no longer contains any server. Instead, we keep a list of all elements, and when we need a new element, we look at that list only if it contains a number of elements already, otherwise we'll allocate a new one. Keeping nodes in the tree ensures that we very rarely have to take the lbprm write lock (as it only happens when we're moving the server to a position for which no element is currently in the tree). The number of mt_lists used is defined as FWLC_NB_LISTS. The number of tree elements we want to keep is defined as FWLC_MIN_FREE_ENTRIES, both in defaults.h. The values used were picked after experimentation, and seem to be the best trade-off between performance and memory usage. Doing that gives a good boost in performance when a lot of servers are used. With a configuration using 500 servers, before that patch, about 830000 requests per second could be processed, with that patch, about 1550000 requests per second are processed, on a 64-core AMD, using 1200 concurrent connections.	2025-04-01 18:05:30 +02:00
Olivier Houchard	ba521a1d88	MINOR: threads: Add HA_RWLOCK_TRYRDTOWR() Add HA_RWLOCK_TRYRDTOWR(), that tries to upgrade a lock from reader to writer, and fails if any seeker or writer already holds it.	2025-04-01 18:05:30 +02:00
Olivier Houchard	2a9436f96b	MINOR: lbprm: Add method to deinit server and proxy Add two new methods to lbprm, server_deinit() and proxy_deinit(), in case something should be done at the lbprm level when removing servers and proxies.	2025-04-01 18:05:30 +02:00
Olivier Houchard	17059098e7	MINOR: mt_list: Implement mt_list_try_lock_prev(). Implement mt_list_try_lock_prev(), that does the same thing as mt_list_lock_prev(), except if the list is locked, it returns { NULL, NULL } instead of waiting.	2025-04-01 18:05:30 +02:00
William Lallemand	fdcb97614c	MINOR: ssl/ckch: add substring parser for ckch_conf Add a substring parser for the ckch_conf keyword parser, this will split a string into multiple substring, and strdup them in a array.	2025-04-01 15:38:32 +02:00
William Lallemand	f8fe84caca	MINOR: jws: emit the JWK thumbprint jwk_thumbprint() is a function which implements RFC7638 and emits a JWK thumbprint using a EVP_PKEY. EVP_PKEY_EC_to_pub_jwk() and EVP_PKEY_RSA_to_pub_jwk() were changed in order to match what is required to emit a thumbprint (ie, no spaces or lines and the lexicographic order of the fields)	2025-04-01 11:57:55 +02:00
Willy Tarreau	1e9a2529aa	MINOR: cpu-topo: pass an extra argument to ha_cpu_policy This extra argument will allow common functions to distinguish between multiple policies. For now it's not used.	2025-03-31 16:21:37 +02:00
Willy Tarreau	571573874a	MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode The new function "print_cpu_set()" will print cpu sets in a human-friendly way, with commas and dashes for intervals. The goal is to keep them compact enough.	2025-03-31 16:21:37 +02:00
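The compact notation described above (commas between entries, dashes for intervals) can be sketched as follows. This is not haproxy's print_cpu_set(): the function name format_cpu_ranges() and the use of a plain 64-bit mask instead of haproxy's cpu-set type are assumptions made only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch (not haproxy's print_cpu_set()): formats the CPUs
 * set in <mask> as comma-separated values, using dashes for runs of
 * consecutive CPUs. Returns the length written, or -1 if <out> is too
 * small.
 */
int format_cpu_ranges(unsigned long long mask, char *out, size_t size)
{
	const int nbits = (int)(sizeof(mask) * 8);
	size_t len = 0;
	int cpu = 0;

	assert(out);
	if (!size)
		return -1;
	out[0] = '\0';

	while (cpu < nbits) {
		int first, last, ret;

		if (!((mask >> cpu) & 1ULL)) {
			cpu++;
			continue;
		}

		/* extend the interval while the next CPU is also set */
		first = last = cpu;
		while (last + 1 < nbits && ((mask >> (last + 1)) & 1ULL))
			last++;

		if (first == last)
			ret = snprintf(out + len, size - len, "%s%d",
			               len ? "," : "", first);
		else
			ret = snprintf(out + len, size - len, "%s%d-%d",
			               len ? "," : "", first, last);

		if (ret < 0 || (size_t)ret >= size - len)
			return -1; /* output truncated */
		len += (size_t)ret;
		cpu = last + 1;
	}
	return (int)len;
}
```

With this sketch, a mask with CPUs 0 to 3, 8, 10 and 11 set is rendered as "0-3,8,10-11".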
Willy Tarreau	3955f151b1	MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal() This function returns true if two CPU sets are equal.	2025-03-31 16:21:37 +02:00
Valentine Krasnobaeva	b303861469	MINOR: compiler: add __nonstring macro GCC 15 throws the following warning on fixed-size char arrays if they do not contain a terminating NUL: src/tools.c:2041:25: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization] 2041 \| const char hextab[16] = "0123456789ABCDEF"; We are using a couple of such definitions for some constants. Converting them to flexible arrays, like: hextab[] = "0123456789ABCDEF" may have consequences, as enlarged arrays won't fit anymore where they were possibly located due to the memory alignment constraints. GCC adds 'nonstring' variable attribute for such char arrays, but clang and other compilers don't have it. Let's wrap 'nonstring' with our __nonstring macro, which will test if the compiler supports this attribute. This fixes the issue #2910.	2025-03-31 13:50:28 +02:00
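The wrapping approach described above can be sketched as follows. This is an illustration only, not the macro from haproxy's compiler header: the name HYP_NONSTRING and the helper hex_digit() are hypothetical, and the feature test relies on __has_attribute, which is assumed to be available on the compilers that matter here.

```c
#include <assert.h>

/* HYP_NONSTRING is a hypothetical name used for this sketch only. It
 * expands to the 'nonstring' attribute when the compiler reports support
 * for it, and to nothing otherwise.
 */
#if defined(__has_attribute)
#  if __has_attribute(nonstring)
#    define HYP_NONSTRING __attribute__((nonstring))
#  endif
#endif
#ifndef HYP_NONSTRING
#  define HYP_NONSTRING
#endif

/* Exactly 16 bytes: the initializer's trailing NUL is intentionally
 * dropped, which is what the attribute documents to the compiler.
 */
static const char hextab[16] HYP_NONSTRING = "0123456789ABCDEF";

static_assert(sizeof(hextab) == 16, "hextab must stay 16 bytes");

/* Returns the uppercase hex digit for the low 4 bits of <v>. */
char hex_digit(unsigned int v)
{
	return hextab[v & 0xF];
}
```

Keeping the array at its fixed size preserves its layout, which is the concern the commit raises about switching to hextab[].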
Willy Tarreau	6b17310757	MEDIUM: pools: be a bit smarter when merging comparable size pools By default, pools of comparable sizes are merged together. However, the current algorithm is dumb: it rounds the requested size to the next multiple of 16 and compares the sizes like this. This results in many entries which are already multiples of 16 not being merged, for example 1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are separate (though 56 merges with 64). This commit changes this to consider not just the entry size but also the average entry size, that is, it compares the average size of all objects sharing the pool with the size of the object looking for a pool. If the object is not more than 1% bigger nor smaller than the current average size or if it is neither 16 bytes smaller nor larger, then it can be merged. Also, it always respects exact matches in order to avoid merging objects into larger pools or worse, extending existing ones for no reason, and when there's a tie, it always avoids extending an existing pool. Also, we now visit all existing pools in order to spot the best one, we do not stop anymore at the smallest one large enough. Theoretically this could cost a bit of CPU but in practice it's O(N^2) with N quite small (typically in the order of 100) and the cost at each step is very low (compare a few integer values). But as a side effect, pools are no longer sorted by size, "show pools bysize" is needed for this. This causes the objects to be much better grouped together, accepting to use a little bit more sometimes to avoid fragmentation, without causing everyone to be merged into the same pool. 
Thanks to this we're now seeing 36 pools instead of 48 by default, with some very nice examples of compact grouping: - Pool qc_stream_r (80 bytes) : 13 users > qc_stream_r : size=72 flags=0x1 align=0 > quic_cstrea : size=80 flags=0x1 align=0 > qc_stream_a : size=64 flags=0x1 align=0 > hlua_esub : size=64 flags=0x1 align=0 > stconn : size=80 flags=0x1 align=0 > dns_query : size=64 flags=0x1 align=0 > vars : size=80 flags=0x1 align=0 > filter : size=64 flags=0x1 align=0 > session pri : size=64 flags=0x1 align=0 > fcgi_hdr_ru : size=72 flags=0x1 align=0 > fcgi_param_ : size=72 flags=0x1 align=0 > pendconn : size=80 flags=0x1 align=0 > capture : size=64 flags=0x1 align=0 - Pool h3s (56 bytes) : 17 users > h3s : size=56 flags=0x1 align=0 > qf_crypto : size=48 flags=0x1 align=0 > quic_tls_se : size=48 flags=0x1 align=0 > quic_arng : size=56 flags=0x1 align=0 > hlua_flt_ct : size=56 flags=0x1 align=0 > promex_metr : size=48 flags=0x1 align=0 > conn_hash_n : size=56 flags=0x1 align=0 > resolv_requ : size=48 flags=0x1 align=0 > mux_pt : size=40 flags=0x1 align=0 > comp_state : size=40 flags=0x1 align=0 > notificatio : size=48 flags=0x1 align=0 > tasklet : size=56 flags=0x1 align=0 > bwlim_state : size=48 flags=0x1 align=0 > xprt_handsh : size=48 flags=0x1 align=0 > email_alert : size=56 flags=0x1 align=0 > caphdr : size=41 flags=0x1 align=0 > caphdr : size=41 flags=0x1 align=0 - Pool quic_cids (32 bytes) : 13 users > quic_cids : size=16 flags=0x1 align=0 > quic_tls_ke : size=32 flags=0x1 align=0 > quic_tls_iv : size=12 flags=0x1 align=0 > cbuf : size=32 flags=0x1 align=0 > hlua_queuew : size=24 flags=0x1 align=0 > hlua_queue : size=24 flags=0x1 align=0 > promex_modu : size=24 flags=0x1 align=0 > cache_st : size=24 flags=0x1 align=0 > spoe_appctx : size=32 flags=0x1 align=0 > ehdl_sub_tc : size=32 flags=0x1 align=0 > fcgi_flt_ct : size=16 flags=0x1 align=0 > sig_handler : size=32 flags=0x1 align=0 > pipe : size=24 flags=0x1 align=0 - Pool quic_crypto (1032 bytes) : 2 users > 
quic_crypto : size=1032 flags=0x1 align=0 > requri : size=1024 flags=0x1 align=0 - Pool quic_conn_r (65544 bytes) : 2 users > quic_conn_r : size=65536 flags=0x1 align=0 > dns_msg_buf : size=65540 flags=0x1 align=0 On a very unscientific test consisting in sending 1 million H1 requests and 1 million H2 requests to the stats page, we're seeing an ~6% lower memory usage with the patch: before the patch: Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches). after the patch: Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches). This should be taken with care however since pools allocate and release in batches.	2025-03-25 18:01:01 +01:00
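The size comparison at the heart of the commit above can be sketched as below. This is not haproxy's code: the function name size_is_comparable() is hypothetical, the 16-byte bound is assumed to be strict, and the sketch covers only the "1% or 16 bytes" test, not the exact-match preference, tie-breaking or best-pool search that the commit also describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the merge test, not haproxy's implementation.
 * <avg> is the average size of the objects already sharing a pool and
 * <size> is the size of the object looking for a pool. Returns 1 if the
 * two sizes are close enough to be merged, otherwise 0.
 */
int size_is_comparable(size_t avg, size_t size)
{
	size_t diff = (size > avg) ? size - avg : avg - size;

	assert(avg > 0);

	if (diff * 100 <= avg) /* within 1% of the average size */
		return 1;
	if (diff < 16)         /* assumption: strictly less than 16 bytes apart */
		return 1;
	return 0;
}
```

Under these assumptions, 1032 is comparable with an average of 1024 and 65540 with 65536, which matches the two merged pools shown at the end of the commit message.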
Pierre-Andre Savalle	8ed1e91efd	MEDIUM: lb-chash: add directive hash-preserve-affinity When using hash-based load balancing, requests are always assigned to the server corresponding to the hash bucket for the balancing key, without taking maxconn or maxqueue into account, unlike in other load balancing methods like 'first'. This adds a new backend directive that can be used to take maxconn and possibly maxqueue in that context. This can be used when hashing is desired to achieve cache locality, but sending requests to a different server is preferable to queuing for a long time or failing requests when the initial server is saturated. By default, affinity is preserved as was the case previously. When 'hash-preserve-affinity' is set to 'maxqueue', servers are considered successively in the order of the hash ring until a server that does not have a full queue is found. When 'maxconn' is set on a server, queueing cannot be disabled, as 'maxqueue=0' means unlimited. To support picking a different server when a server is at 'maxconn' irrespective of the queue, 'hash-preserve-affinity' can be set to 'maxconn'.	2025-03-25 18:01:01 +01:00
Amaury Denoyelle	cf9e40bd8a	MINOR: quic: define max-stream-data configuration as a ratio	2025-03-25 16:30:35 +01:00
Amaury Denoyelle	68c10d444d	MINOR: mux-quic: define config for max-data Define a new global configuration tune.quic.frontend.max-data. This allows users to explicitly set the value for the corresponding QUIC TP initial-max-data, with direct impact on haproxy memory consumption.	2025-03-25 16:30:09 +01:00
Amaury Denoyelle	a71007c088	MINOR: quic: move global tune options into quic_tune A new structure quic_tune has recently been defined. Its purpose is to store global options related to QUIC. Previously, only the tunable to toggle pacing was stored in it. This commit moves several QUIC related tunable from global to quic_tune structure. This better centralizes QUIC configuration option and gives room for future generic options.	2025-03-24 10:01:46 +01:00
Willy Tarreau	9091c5317f	MINOR: cli/pools: record the list of pool registrations even when merging them By default, create_pool() tries to merge similar pools into one. But when dealing with certain bugs, it's hard to say which ones were merged together. We do have the information at registration time, so let's just create a list of registrations ("pool_registration") attached to each pool, that will store that information. It can then be consulted on the CLI using "show pools detailed", where the names, sizes, alignment and flags are reported.	2025-03-21 17:09:30 +01:00
Aurelien DARRAGON	7ec6f4412c	MINOR: stats: add alt_name field to stat_col struct alt_name will be used by metric exporters to know how the metric should be presented to the user. If the alt_name is NULL, the metric should be ignored. For now only promex exporter will make use of this.	2025-03-21 17:04:54 +01:00
Olivier Houchard	98967aa09f	MEDIUM: mt_list: Reduce the max number of loops with exponential backoff Reduce the max number of loops in the mt_list code while waiting for a lock to be available with exponential backoff. It's been observed that the current value led to severe performance degradation at least on some hardware; hopefully this value will be acceptable everywhere.	2025-03-21 11:30:59 +01:00
Aurelien DARRAGON	af68343a56	MINOR: stats: use stat_col storage stat_cols_info Use stat_col storage for stat_cols_info[] array instead of name_desc. As documented in `65624876f` ("MINOR: stats: introduce a more expressive stat definition method"), stat_col supersedes name_desc storage but it remains backward compatible. Here we migrate to the new API to be able to further extend stat_cols_info[] in following patches.	2025-03-20 11:38:32 +01:00
Aurelien DARRAGON	9c60fc9fe1	MINOR: stats: STATS_PX_CAP___B_ macro STATS_PX_CAP___B_ points to STATS_PX_CAP_BE, it is just an alias for consistency, like STATS_PX_CAP____S which points to STATS_PX_CAP_SRV.	2025-03-20 11:37:47 +01:00
Aurelien DARRAGON	3c1b00b127	MINOR: stats: add .generic explicit field in stat_col struct Further extend logic implemented in `65624876` ("MINOR: stats: introduce a more expressive stat definition method") and `4e9e8418` ("MINOR: stats: prepare stats-file support for values other than FN_COUNTER"): we don't rely anymore on the presence of the capability to know if the metric is generic or not. This is because it prevents us from setting a capability on static statistics. Yet it could be useful to set the capability even on static metrics, thus we add a dedicated .generic bit to tell haproxy that the metric is generic and can be handled automatically by the API. Also, ME_NEW_* helpers are not explicitly associated to generic metric definition (as it was already the case before) to avoid ambiguities. It may change in the future as we may need to use the new definition method to define static metrics (without the generic bit set). But for now it isn't the case as this need definition was implemented for generic metrics support in the first place. If we want to define static metrics using the API, we could add a new set of helpers for instance.	2025-03-20 11:37:21 +01:00
William Lallemand	2fb6270910	MEDIUM: ssl/ckch: make the ckch_conf more generic The ckch_store_load_files() function makes specific processing for PARSE_TYPE_STR as if it was a type only used for paths. This patch changes a little bit the way it's done, PARSE_TYPE_STR is only meant to strdup() a string and stores the resulting pointer in the ckch_conf structure. Any processing regarding the path is now done in the callback. Since the callbacks were basically doing the same thing, they were transformed into the DECLARE_CKCH_CONF_LOAD() macros which allows to do some templating of these functions. The resulting ckch_conf_load_* functions will do the same as before, except they will also do the path processing instead of letting ckch_store_load_files() do it, which means we don't need the "base" member anymore in the struct ckch_conf_kws.	2025-03-19 18:08:40 +01:00
William Lallemand	b0ad777902	MINOR: tools: path_base() concatenates a path with a base path With the SSL configuration, crt-base and key-base are often used; these keywords concatenate the base path with the path when the path does not start with '/'. This is done at several places in the code, so a function to do this would be better to standardize the code.	2025-03-19 17:59:31 +01:00
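The concatenation rule described for path_base() can be sketched as below. This is only an illustration of the rule stated in the commit message: the name `ex_path_base()`, its signature, and its return convention are hypothetical and do not reflect the actual HAProxy function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not HAProxy's path_base() signature): writes into
 * <out> either <path> alone when it is absolute or when no base is set,
 * or "<base>/<path>" otherwise. Returns 0 on success, -1 if the result
 * does not fit in <outlen> bytes.
 */
static int ex_path_base(const char *path, const char *base,
                        char *out, size_t outlen)
{
	int ret;

	if (!base || !*base || path[0] == '/')
		ret = snprintf(out, outlen, "%s", path);
	else
		ret = snprintf(out, outlen, "%s/%s", base, path);

	return (ret < 0 || (size_t)ret >= outlen) ? -1 : 0;
}
```

With a base of `/etc/ssl`, a relative `site.pem` would resolve to `/etc/ssl/site.pem`, while an absolute path is left untouched.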
William Lallemand	29b4b985c3	MINOR: jws: use jwt_alg type instead of a char This patch implements the function EVP_PKEY_to_jws_algo() which returns a jwt_alg compatible with the private key. This value can then be passed to jws_b64_protected() and jws_b64_signature(), which are modified to take a jwt_alg instead of a char.	2025-03-17 18:06:34 +01:00
William Lallemand	de67f25a7e	MINOR: jws: add new functions in jws.h Add signatures of jws_b64_payload(), jws_b64_protected(), jws_b64_signature() and jws_flattened(), which allow creating a complete JWS flattened object.	2025-03-17 11:51:52 +01:00
Willy Tarreau	156430ceb6	MINOR: cpu-topo: add a CPU policy setting to the global section We'll need to let the user decide what's best for their workload, and in order to do this we'll have to provide tunable options. For that, we're introducing struct ha_cpu_policy which contains a name, a description and a function pointer. The purpose will be to use that function pointer to choose the best CPUs to use and now to set the number of threads and thread-groups, that will be called during the thread setup phase. The only supported policy for now is "none" which doesn't set/touch anything (i.e. all available CPUs are used).	2025-03-14 18:33:16 +01:00
Willy Tarreau	c93ee25054	MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated node(s).	2025-03-14 18:33:16 +01:00
Willy Tarreau	aa4776210b	MINOR: cpu-topo: create an array of the clusters The goal here is to keep an array of the known CPU clusters, because we'll use that often to decide of the performance of a cluster and its relevance compared to other ones. We'll store the number of CPUs in it, the total capacity etc. For the capacity, we count one unit per core, and 1/3 of it per extra SMT thread, since this is roughly what has been measured on modern CPUs. In order to ease debugging, they're also dumped with -dc.	2025-03-14 18:30:31 +01:00
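The capacity rule above (one unit per core, one third of a unit per extra SMT thread) can be illustrated with a small sketch. The function name and the choice to count in thirds of a core, so that the arithmetic stays in integers, are assumptions made for this example and are not taken from the HAProxy sources.

```c
/* Hypothetical helper: returns a cluster's capacity expressed in thirds
 * of a core. Each core counts for 3 thirds (one full unit) and each
 * extra SMT thread on a core counts for 1 third.
 */
static unsigned int ex_cluster_capacity(unsigned int cores,
                                        unsigned int threads_per_core)
{
	if (!cores || !threads_per_core)
		return 0;
	return cores * 3 + cores * (threads_per_core - 1);
}
```

Under these assumptions, 8 cores with 2 threads each would weigh 32 thirds (about 10.7 units), against 24 thirds for the same cores without SMT.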
Willy Tarreau	4a6eaf6c5e	MINOR: cpu-topo: add a function to sort by cluster+capacity The purpose here is to detect heterogeneous clusters which are not properly reported, based on the exposed information about the cores capacity. The algorithm here consists in sorting CPUs by capacity within a cluster, and considering as equal all those which have 5% or less difference in capacity with the previous one. This allows large clusters of more than 5% total between extremities, while keeping apart those where the limit is more pronounced. This is quite common in embedded environments with big.little systems, as well as on some laptops.	2025-03-14 18:30:31 +01:00
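A minimal sketch of the grouping idea described above: sort capacities, then start a new group only when a value drops by more than 5% relative to the previous one. The function names, the descending sort order, and measuring the 5% against the previous value are assumptions for illustration; the real code operates on the CPU topology array rather than on plain integers.

```c
#include <stdlib.h>

/* Orders capacities from highest to lowest. */
static int ex_cmp_capa_desc(const void *a, const void *b)
{
	int ca = *(const int *)a;
	int cb = *(const int *)b;

	return (cb > ca) - (cb < ca);
}

/* Hypothetical helper: sorts <capa> in place (descending), stores each
 * entry's group index in <group>, and returns the number of groups.
 * A new group starts when the drop from the previous entry exceeds 5%
 * of that previous entry.
 */
static int ex_group_by_capacity(int *capa, int *group, int n)
{
	int i, groups = 0;

	qsort(capa, n, sizeof(*capa), ex_cmp_capa_desc);
	for (i = 0; i < n; i++) {
		if (i == 0 || (capa[i - 1] - capa[i]) * 100 > capa[i - 1] * 5)
			groups++;
		group[i] = groups - 1;
	}
	return groups;
}
```

Because each entry is only compared with its predecessor, a chain of small steps stays in one group even when the extremities differ by more than 5%, which matches the behaviour the commit message describes.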
Willy Tarreau	d169758fa9	MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo It's important that we don't leave unassigned IDs in the topology, because the selection mechanism is based on index-based masks, so an unassigned ID will never be kept. This is particularly visible on systems where we cannot access the CPU topology, the package id, node id and even thread id are set to -1, and all CPUs are evicted due to -1 not being set in the "only-cpu" sets. Here in new function "cpu_fixup_topology()", we assign them with the smallest unassigned value. This function will be used to assign IDs where missing in general.	2025-03-14 18:30:31 +01:00
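The fix-up step can be sketched as follows. This illustration assumes that all missing IDs (stored as -1) of a given kind share the single smallest value not already in use; the commit message does not spell out whether they share one value or each receive a distinct one. The function name and the plain integer array are hypothetical.

```c
/* Hypothetical helper: finds the smallest non-negative value not present
 * in <ids>, assigns it to every negative (unassigned) entry, and returns
 * that value.
 */
static int ex_fixup_ids(int *ids, int n)
{
	int i, candidate = 0;

	/* look for the smallest unused value; restart the scan each time
	 * the current candidate is found in the array.
	 */
	for (i = 0; i < n; i++) {
		if (ids[i] == candidate) {
			candidate++;
			i = -1;
		}
	}

	for (i = 0; i < n; i++) {
		if (ids[i] < 0)
			ids[i] = candidate;
	}
	return candidate;
}
```

On a system where no topology is readable, every ID starts at -1 and would end up as 0 under this assumption, so index-based masks would no longer evict all CPUs.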
Willy Tarreau	af648c7b58	MINOR: cpu-topo: assign clusters to cores without and renumber them Due to the previous commit we can end up with cores not assigned any cluster ID. For this, at the end we sort the CPUs by topology and assign cluster IDs to remaining CPUs based on pkg/node/llc. For example an 14900 now shows 5 clusters, one for the 8 p-cores, and 4 of 4 e-cores each. The local cluster numbers are per (node,pkg) ID so that any rule could easily be applied on them, but we also keep the global numbers that will help with thread group assignment. We still need to force to assign distinct cluster IDs to cores running on a different L3. For example the EPYC 74F3 is reported as having 8 different L3s (which is true) and only one cluster. Here we introduce a new function "cpu_compose_clusters()" that is called from the main init code just after cpu_detect_topology() so that it's not OS-dependent. It deals with this renumbering of all clusters in topology order, taking care of considering any distinct LLC as being on a distinct cluster.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a4471ea56d	MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID This will be used to detect and fix incorrect setups which report the same cluster ID for multiple L3 instances. The arrangement of functions in this file is becoming a real problem. Maybe we should move all this to cpu_topo for example, and better distinguish OS-specific and generic code.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a8acdbd9fd	MINOR: cpu-topo: implement a sorting mechanism by CPU locality Once we've kept only the CPUs we want, the next step will be to form groups and these ones are based on locality. Thus we'll have to sort by locality. For now the locality is only inferred by the index. No grouping is made at this point. For this we add the "cpu_reorder_by_locality" function with a locality-based comparison function.	2025-03-14 18:30:31 +01:00
Willy Tarreau	18133a054d	MINOR: cpu-topo: implement a sorting mechanism for CPU index CPU selection will be performed by sorting CPUs according to various criteria. For dumps however, that's really not convenient and we'll need to reorder the CPUs according to their index only. This is what the new function cpu_reorder_by_index() does. It's called in thread_detect_count() before dumping the CPU topology.	2025-03-14 18:30:31 +01:00
Willy Tarreau	1af4942c95	MEDIUM: thread: start to detect thread groups and threads min/max By mutually refining the thread count and group count, we can try to detect the most suitable setup for the current machine. Taskset is implicitly handled correctly. tgroups automatically adapt to the configured number of threads. cpu-map manages to limit tgroups to the smallest supported value. The thread-limit is enforced. Just like in cfgparse, if the thread count was forced to a higher value, it's reduced and a warning is emitted. But if it was not set, the thr_max value is bound to this limit so that further calculations respect it. We continue to default to the max number of available threads and 1 tgroup by default, with the limit. This normally allows to get rid of that test in check_config_validity().	2025-03-14 18:30:30 +01:00
Willy Tarreau	f0661e79fe	MINOR: global: add a command-line option to enable CPU binding debugging During development, everything related to CPU binding and the CPU topology is debugged using state dumps at various places, but it does make sense to have a real command line option so that this remains usable in production to help users figure why some CPUs are not used by default. Let's add "-dc" for this. Since the list of global.tune.options values is almost full and does not 100% match this option, let's add a new "tune.debug" field for this.	2025-03-14 18:30:30 +01:00
Willy Tarreau	ac1db9db7d	MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable The function is not convenient because it doesn't allow us to undo the startup changes, and depending on where it's being used, we don't know whether the values read have already been altered (this is not the case right now but it's going to evolve). Let's just compute the status during cpu_detect_usable() and set a variable accordingly. This way we'll always read the init value, and if needed we can even afford to reset it. Also, placing it in cpu_topo.c limits cross-file dependencies (e.g. threads without affinity etc).	2025-03-14 18:30:30 +01:00
Willy Tarreau	7cb274439b	MINOR: cpu-topo: add CPU topology detection for linux This uses the publicly available information from /sys to figure the cache and package arrangements between logical CPUs and fill ha_cpu_topo[], as well as their SMT capabilities and relative capacity for those which expose this. The functions clearly have to be OS-specific.	2025-03-14 18:30:30 +01:00
Willy Tarreau	8f72ce335a	MINOR: cpu-topo: add detection of online CPUs on Linux This adds a generic function ha_cpuset_detect_online() which for now only supports linux via /sys. It fills a cpuset with the list of online CPUs that were detected (or returns a failure).	2025-03-14 18:30:30 +01:00
Willy Tarreau	8c524c7c9d	REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo The cpuset files are normally used only for cpu manipulations. It happens that the initial CPU binding detection was initially placed there since there was no better place, but in practice, being OS-specific, it should really be in cpu-topo. This simplifies cpuset which doesn't need to know about the OS anymore.	2025-03-14 18:30:30 +01:00
Willy Tarreau	a6fdc3eaf0	MINOR: cpu-topo: update CPU topology from excluded CPUs at boot Now before trying to resolve the thread assignment to groups, we detect which CPUs are not bound at boot so that we can mark them with HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we can count later. Note that we purposely ignore cpu-map here as we don't know how threads and groups will map to cpu-map entries, hence which CPUs will really be used. It's important to proceed this way so that when we have no info we assume they're all available.	2025-03-14 18:30:30 +01:00
Willy Tarreau	bdb731172c	MINOR: cpu-topo: add a function to dump CPU topology The new function cpu_dump_topology() will centralize most debugging calls, and it can make efforts of not dumping some possibly irrelevant fields (e.g. non-existing cache levels).	2025-03-14 18:30:30 +01:00
Willy Tarreau	041462c4af	MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus We don't want to constantly deal with as many CPUs as a cpuset can hold, so let's first try to trim the value to what the system claims to support via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of the cpuset size though. The value is stored globally so that we can reuse it elsewhere after initialization.	2025-03-14 18:30:30 +01:00
Willy Tarreau	656cedad42	MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array. This does the bare minimum to allocate and initialize a global ha_cpu_topo array for the number of supported CPUs and release it at deinit time.	2025-03-14 18:30:30 +01:00
Willy Tarreau	d165f5d3ab	MINOR: cpu-topo: add ha_cpu_topo definition This structure will be used to store information about each CPU's topology (package ID, L3 cache ID, NUMA node ID etc). This will be used in conjunction with CPU affinity setting to try to perform a mostly optimal binding between threads and CPU numbers by default. Since it was noticed during tests that absolutely none of the many machines tested reports different die numbers, the die_id is not stored. Also, it was found along experiments that the cluster ID will be used a lot, half of the time as a node-local identifier, and half of the time as a global identifier. So let's store the two versions at once (cl_gid, cl_lid). Some flags are added to indicate causes of exclusion (offline, excluded at boot, excluded by rules, ignored by policy).	2025-03-14 18:30:30 +01:00
Willy Tarreau	69ac4cd315	MINOR: compiler: add a new __decl_thread_var() macro to declare local variables __decl_thread() already exists but is more suited for struct members. When using it in a variables block, it appends the final trailing semi-colon which is a statement that ends the variable block. Better clean this up and have one precisely for variable blocks. In this case we can simply define an unused enum value that will consume the semi-colon. That's what the new macro __decl_thread_var() does.	2025-03-12 18:08:12 +01:00
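The enum trick mentioned above can be sketched as follows. The macro and function names are hypothetical stand-ins (the real macro is __decl_thread_var() and its definition may differ); the point is only to show how an unused enum declaration can absorb the trailing semicolon so that the variables block is not interrupted by an empty statement.

```c
/* Two-level paste so that __LINE__ is expanded before concatenation. */
#define EX_PASTE_RAW(a, b) a ## b
#define EX_PASTE(a, b)     EX_PASTE_RAW(a, b)

#ifdef EX_USE_THREAD
/* Threads enabled: emit the declaration as-is. */
# define EX_DECL_THREAD_VAR(decl) decl
#else
/* Threads disabled: emit an unused enum declaration instead, so the ';'
 * written by the caller ends a declaration, not an empty statement.
 */
# define EX_DECL_THREAD_VAR(decl) enum { EX_PASTE(ex_unused_, __LINE__) }
#endif

static int ex_count_items(const int *items, int n)
{
	int total = 0;
	EX_DECL_THREAD_VAR(int lock_state); /* vanishes without threads */
	int i; /* still inside the declarations block */

	for (i = 0; i < n; i++)
		total += items[i];
	return total;
}
```

Compilers that reject declarations after statements would otherwise complain about `int i;` following a bare `;`.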
Willy Tarreau	bb4addabb7	MINOR: compiler: add a simple macro to concatenate resolved strings It's often useful to be able to concatenate strings after resolving them (e.g. __FILE__, __LINE__ etc). Let's just have a CONCAT() macro to do that, which calls _CONCAT() with the same arguments to make sure the contents are resolved before being concatenated.	2025-03-12 18:06:55 +01:00
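The two-level expansion described above is a common preprocessor idiom, sketched below with hypothetical names (the real macros are CONCAT() and _CONCAT(), and their exact bodies are not shown in this log). The sketch assumes the concatenation is done with the `##` operator.

```c
/* Inner macro: pastes its arguments exactly as written. */
#define EX_CONCAT_RAW(a, b) a ## b
/* Outer macro: its arguments are macro-expanded before being handed to
 * the inner one, so EX_ID below becomes 42 before pasting.
 */
#define EX_CONCAT(a, b)     EX_CONCAT_RAW(a, b)

#define EX_ID 42

/* Declares value_42; calling EX_CONCAT_RAW() directly would declare
 * value_EX_ID instead.
 */
static int EX_CONCAT(value_, EX_ID) = 7;
```

The same indirection is what makes it possible to build identifiers from `__LINE__`, since `##` applied directly would paste the literal token `__LINE__`.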
Aurelien DARRAGON	003fe530ae	MINOR: log: add "option host" log-forward option add only the parsing part, options are currently unused	2025-03-12 10:51:35 +01:00

8742 commits