haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-05-26 03:03:51 -04:00

Author	SHA1	Message	Date
Olivier Houchard	dcce936912	MINOR: connections: Add a new CO_FL_SSL_NO_CACHED_INFO flag Add a new flag to connections, CO_FL_SSL_NO_CACHED_INFO, and set it for checks. It lets the ssl layer know that it should not use cached information, such as the ALPN as stored in the server, or cached sessions. This will be used for checks, as checks may target different servers, or use a different SSL configuration, so we can't assume the stored information is correct. This should be backported to 3.3, and may be backported up to 2.8 if the attempts to do session resume by checks are proven to be a problem.	2025-12-09 16:43:31 +01:00
Christopher Faulet	be998b590e	MEDIUM: ssl/server: No longer store the SNI of cached TLS sessions Thanks to the previous patch, "BUG/MEDIUM: ssl: Don't reuse TLS session if the connection's SNI differs", it is now useless to store the SNI of cached TLS sessions. This SNI is no longer tested and new connections reusing a session must have the same SNI. The main change here is for the ssl_sock_set_servername() function. It is no longer possible to compare the SNI of the reused session with the one of the new connection. So, the SNI is always set, with no other processing. Mainly, the session is not destroyed when SNIs don't match. It means the commit `119a4084bf` ("BUG/MEDIUM: ssl: for a handshake when server-side SNI changes") is implicitly reverted. It is good to note that it is unclear to me when and why the reused session should be destroyed, because I'm unable to reproduce any issue fixed by the commit above. This patch could be backported as far as 3.0 with the commit above.	2025-12-08 15:22:01 +01:00
Christopher Faulet	28654f3c9b	MINOR: connection/ssl: Store the SNI hash value in the connection itself When an SNI is set on a new connection, its hash is now saved in the connection itself. To do so, a dedicated field was added into the connection structure, called sni_hash. For now, this value is only used when the TLS session is cached.	2025-12-08 15:22:01 +01:00
Christopher Faulet	9794585204	MINOR: ssl: Store hash of the SNI for cached TLS sessions For cached TLS sessions, in addition to the SNI itself, its hash is now also saved. No changes are expected here because this hash is not used for now. This commit relies on: * MINOR: ssl: Add a function to hash SNIs	2025-12-08 15:22:00 +01:00
Christopher Faulet	d993e1eeae	MINOR: ssl: Add a function to hash SNIs This patch only adds the function ssl_sock_sni_hash() that can be used to get the hash value corresponding to an SNI. A global seed, sni_hash_seed, is used.	2025-12-08 15:22:00 +01:00
Christopher Faulet	a83ed86b78	MEDIUM: quic: Add connection as argument when qc_new_conn() is called This patch reverts the commit `efe60745b` ("MINOR: quic: remove connection arg from qc_new_conn()"). The connection will be mandatory when the QUIC connection is created on backend side to fix an issue when we try to reuse a TLS session. So, the connection is again an argument of qc_new_conn(), the 4th argument. It is NULL for frontend QUIC connections but there is no special check on it.	2025-12-08 15:22:00 +01:00
Frederic Lecaille	90064ac88b	BUG/MINOR: quic: do not set first the default QUIC curves This patch impacts both the QUIC frontends and listeners. Note that "ssl-default-bind-ciphersuites" and "ssl-default-bind-curves" are not ignored by QUIC on the frontend side. This is also the case for the backends with "ssl-default-server-ciphersuites" and "ssl-default-server-curves". These settings are set by ssl_sock_prepare_ctx() for the frontends and by ssl_sock_prepare_srv_ssl_ctx() for the backends. But ssl_quic_initial_ctx() first sets the QUIC defaults (see <quic_ciphers> and <quic_groups>) before these ssl_sock.c functions are called, leading some TLS stacks to refuse them if they do not support them. This is the case for some OpenSSL 3.5 stacks with FIPS support, which do not support X25519. To fix this, set the default QUIC ciphersuites and curves only if not already set by the settings mentioned above. Rename <quic_ciphers> global variable to <default_quic_ciphersuites> and <quic_groups> to <default_quic_curves> to reflect the OpenSSL API naming. These options are taken into account by ssl_quic_initial_ctx() which inspects these four variables before calling SSL_CTX_set_ciphersuites() with <default_quic_ciphersuites> as parameter and SSL_CTX_set_curves() with <default_quic_curves> as parameter if needed, that is to say, if no ciphersuites and curves were set by "ssl-default-bind-ciphersuites", "ssl-default-bind-curves" as global options or "ciphersuites", "curves" as "bind" line options. Note that the bind_conf struct is not modified when no "ciphersuites" or "curves" options are used on "bind" lines. On backend side, rely on ssl_sock_init_srv() to set the server ciphersuites and curves. This function is modified to use respectively <default_quic_ciphersuites> and <default_quic_curves> if no ciphersuites and curves were set by "ssl-default-server-ciphersuites", "ssl-default-server-curves" as global options or "ciphersuites", "curves" as "server" line options. 
Thanks to @rwagoner for having reported this issue in GH #3194 when using an OpenSSL 3.5.4 stack with FIPS support. Must be backported as far as 2.6	2025-12-08 10:40:59 +01:00
Frederic Lecaille	c36e27d10e	BUG/MINOR: quic-be: handshake errors without connection stream closure This bug was revealed on backend side by reg-tests/ssl/del_ssl_crt-list.vtc when run with QUIC connections. As expected by the test, a TLS alert is generated on server side. The latter sends a CONNECTION_CLOSE frame with a CRYPTO error (>= 0x100). In this case the client closes its QUIC connection. But the stream connection was not informed. This leads the connection to be closed after the server timeout expiration. It should be closed asap. This is the reason why reg-tests/ssl/del_ssl_crt-list.vtc could succeed or fail, but only after a 5 seconds delay. To fix this, mimic the ssl_sock_io_cb() for TCP/SSL connections. Call the same code this patch implements with ssl_sock_handle_hs_error() to correctly handle the handshake errors. Note that some SSL counters were not incremented for both the backends and frontends. After such errors, ssl_sock_io_cb() starts the mux after the connection has been flagged in error. This has as side effect to close the stream in conn_create_mux(). Must be backported to 3.3 only for backends. It is not certain at this time whether this bug may impact the frontends.	2025-12-08 10:40:59 +01:00
Amaury Denoyelle	47dff5be52	MINOR: quic: implement cc-algo server keyword Extend QUIC server configuration so that congestion algorithm and maximum window size can be set on the server line. This can be achieved using quic-cc-algo keyword with a syntax similar to a bind line. This should be backported up to 3.3 as this feature is considered as necessary for full QUIC backend support. Note that this relies on the series of previous commits which should be picked first.	2025-12-01 15:53:58 +01:00
Amaury Denoyelle	979588227f	MINOR: quic: define quic_cc_algo as const Each QUIC congestion algorithm is defined as a structure with callbacks in it. Every quic_conn has a member pointing to the configured algorithm, inherited from the bind-conf keyword or to the default CUBIC value. Convert all these definitions to const. This ensures that there never will be an accidental modification of a globally shared structure. This also requires to mark quic_cc_algo field in bind_conf and quic_cc as const.	2025-12-01 15:05:41 +01:00
Willy Tarreau	36133759d3	[RELEASE] Released version 3.4-dev0 Released version 3.4-dev0 with the following main changes : - MINOR: version: mention that it's development again	2025-11-26 16:12:45 +01:00
Willy Tarreau	e8d6ffb692	MINOR: version: mention that it's development again This essentially reverts `d8ba9a2a92`.	2025-11-26 16:11:47 +01:00
Willy Tarreau	d8ba9a2a92	MINOR: version: mention that 3.3 is stable now This version will be maintained up to around Q1 2027. The INSTALL file also mentions it.	2025-11-26 15:54:30 +01:00
Amaury Denoyelle	49e6fca51b	MINOR: quic: use separate global quic_conns FE/BE lists Each quic_conn instance is stored in a global list. Its purpose is to be able to loop over all known connections during "show quic". Split this into two separate lists for frontend and backend usage. Another change is that closing backend connections do not move into quic_conns_clo list. They remain instead in their original list. The objective of this patch is to reduce the contention between the two sides. Note that this prevents backend connections from being listed in "show quic" now. This will be adjusted in a future patch.	2025-11-25 14:30:18 +01:00
Amaury Denoyelle	a5801e542d	MINOR: quic: split global CID tree between FE and BE sides QUIC CIDs are stored in a global tree. Prior to this patch, CIDs used on both frontend and backend sides were mixed together. This patch implements CID storage separation between FE and BE sides. The original tree quic_cid_trees is split as quic_fe_cid_trees/quic_be_cid_trees. This patch should reduce contention between frontend and backend usages. Also, it should reduce the risk of random CID collision.	2025-11-25 14:30:18 +01:00
Jacques Heunis	91eb9b082b	BUG/MINOR: freq_ctr: Prevent possible signed overflow in freq_ctr_overshoot_period All of the other bandwidth-limiting code stores limits and intermediate (byte) counters as unsigned integers. The exception here is freq_ctr_overshoot_period which takes in unsigned values but returns a signed value. While this has the benefit of letting the caller know how far away from overshooting they are, this is not currently leveraged anywhere in the codebase, and it has the downside of halving the positive range of the result. More concretely though, returning a signed integer when all intermediate values are unsigned (and boundaries are not checked) could result in an overflow, producing values that are at best unexpected. In the case of flt_bwlim (the only usage of freq_ctr_overshoot_period in the codebase at the time of writing), an overflow could cause the filter to wait for a large number of milliseconds when in fact it shouldn't wait at all. This is a niche possibility, because it requires that a bandwidth limit is defined in the range [2^31, 2^32). In this case, the raw limit value would not fit into a signed integer, and close to the end of the period, the `(elapsed * freq)/period` calculation could produce a value which also doesn't fit into a signed integer. If at the same time `curr` (the number of events counted so far in the current period) is small, then we could get a very large negative value which overflows. This is undefined behaviour and could produce surprising results. 
The most obvious outcome is flt_bwlim sometimes waiting for a large amount of time in a case where it shouldn't wait at all, thereby incorrectly slowing down the flow of data. Converting just the return type from signed to unsigned (and checking for the overflow) prevents this undefined behaviour. It also makes the range of valid values consistent between the input and output of freq_ctr_overshoot_period and with the input and output of other freq_ctr functions, thereby reducing the potential for surprise in intermediate calculations: now everything supports the full 0 - 2^32 range.	2025-11-24 14:10:13 +01:00
Christopher Faulet	8e08a635eb	MINOR: muxes: Support an optional ALPN string when defining mux protocols When a multiplexer protocol is defined, it is now possible to specify the ALPN it supports, in binary format. This info is optional. For now only the h2 and the h1 multiplexers define an ALPN because this will be mandatory for a fix. But this could be used in the future for different purposes. This patch will be mandatory for the next fix.	2025-11-20 16:14:52 +01:00
Willy Tarreau	91d4f4f618	MINOR: limits: keep a copy of the rough estimate of needed FDs in global struct It's always a pain to guess the number of FDs that can be needed by listeners, checks, threads, pollers etc. We have this estimate in global.maxsock before calling set_global_maxconn(), but we lose it the line after. Let's copy it into global.est_fd_usage and keep it. This will be helpful to try to provide more accurate suggestions for maxconn.	2025-11-20 08:44:52 +01:00
Frederic Lecaille	a88fdf8669	MINOR: quic/flags: add missing QUIC flags for flags dev tool. Add missing QUIC_FL_CONN_XPRT_CLOSED quic_conn flags definition.	2025-11-20 08:10:58 +01:00
Amaury Denoyelle	d54d78fe9a	BUG/MINOR: quic: fix FD usage for quic_conn_closed on backend side On the frontend side, QUIC transfer can be performed either via a connection owned FD or multiplexed on the listener one. When a quic_conn is freed and converted to quic_conn_closed instance, its FD if open is closed and all exchanges are now multiplexed via the listener FD. This is different for the backend as connections can only use their owned FD. Thus, special care must be taken when freeing a connection and converting it to a quic_conn_closed instance. In this case, qc_release_fd() is delayed to the quic_conn_closed release. Furthermore, when the FD is transferred, its iocb and owner fields are updated to the new quic_conn_closed instance. Without it, a crash will occur when accessing the freed quic_conn tasklet. A newly dedicated handler quic_conn_closed_sock_fd_iocb is used to ensure access to quic_conn_closed members only.	2025-11-19 16:02:22 +01:00
Amaury Denoyelle	e55bcf5746	BUG/MINOR: mux-quic: implement max-reuse server parameter Properly implement support for max-reuse server keyword. This is done by adding a total count of streams seen for the whole connection. This value is used in avail_streams callback.	2025-11-19 16:02:22 +01:00
Amaury Denoyelle	c67a614e45	MINOR: quic: remove <ipv4> arg from qc_new_conn() Remove <ipv4> argument from qc_new_conn(). This parameter is unnecessary as it can be derived from the family type of the addresses also passed as argument.	2025-11-17 10:20:54 +01:00
Amaury Denoyelle	133f100467	MINOR: quic: refactor qc_new_conn() prototype The objective of this patch is to streamline qc_new_conn() usage so that it is similar for frontend and backend sides. Previously, several parameters were set only for frontend connections. These arguments are replaced by a single quic_rx_packet argument, which represents the INITIAL packet triggering the connection allocation on the server side. For a QUIC client endpoint, it remains NULL. This usage is considered more explicit. As a minor change, <target> is moved as the first argument of the function. This is considered useful as this argument determines whether the connection is a frontend or backend entry. Along with these changes, qc_new_conn() documentation has been reworded so that it is now up-to-date with the newest usage.	2025-11-17 10:13:40 +01:00
Amaury Denoyelle	49edaca513	MINOR: quic: try to clarify quic_conn CIDs fields direction quic_conn has two fields named <dcid> and <scid>. It may cause confusion as it is not obvious how these fields are related to the connection direction. Try to improve this by extending the documentation of these two fields.	2025-11-17 10:11:04 +01:00
Amaury Denoyelle	8720130cc7	MINOR: quic: do not use quic_newcid_from_hash64 on BE side quic_newcid_from_hash64 is an external callback. If defined, it serves as a CID generation method, as an alternative to the default random implementation. This mechanism was not correctly implemented on the backend side. Indeed, <hash64> quic_conn member is only set for frontend connections. The simplest solution would be to properly define it also for backend ones. However, quic_newcid_from_hash64 derivation is really only useful for the frontend side for now. Thus, this patch disables using it on the backend side in favor of the default random generator. To implement this, quic_cid_generate() is split in two functions, for both methods of CIDs generation. It is the responsibility of the caller to select the proper method. On backend side, only random implementation is now used.	2025-11-17 10:11:04 +01:00
Christopher Faulet	fc6e3e9081	MINOR: stick-tables: Rename stksess shards to use buckets The shard keyword is already used by the peers and on the server lines. And it is unrelated with the session keys distribution. So instead of talking about shard for the session key hashing, we now use the term "bucket".	2025-11-17 07:42:51 +01:00
Willy Tarreau	675c86c4aa	DEBUG: add BUG_ON_STRESS(): a BUG_ON() implemented only when DEBUG_STRESS > 0 The purpose of this new BUG_ON is beyond BUG_ON_HOT(). While BUG_ON_HOT() is meant to be light but placed on very hot code paths, BUG_ON_STRESS() might be heavy and only used under stress-testing, to try to detect early that something bad is starting to happen. This one is not even type-checked when not defined because we don't want to risk the compiler emitting the slightest piece of code there in production mode, so as to give enough freedom to the developers.	2025-11-14 16:42:53 +01:00
Willy Tarreau	3d441e78e5	DEBUG: extend DEBUG_STRESS to ease testing and turn on extra checks DEBUG_STRESS is currently used only to expose "stress-level". With this patch, we go a bit further, by automatically forcing DEBUG_STRICT and DEBUG_STRICT_ACTION to their highest values in order to enable all BUG_ON levels, and make all of them result in a crash. In addition, care is taken to always only have 0 or 1 in the macro, so that it can be tested using "#if DEBUG_STRESS > 0" as well as "if (DEBUG_STRESS) { }" everywhere. The goal will be to ease insertion of extra tests for builds dedicated to stress-testing that enable possibly expensive extra checks on certain code paths that cannot reasonably be compiled in for production code right now.	2025-11-14 16:38:04 +01:00
Amaury Denoyelle	d79295d89b	Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn" The target patch fixes a rare race condition which happens when a MUX IO handler is working on a connection already moved into the purge list. In this case, the handler will incorrectly move the connection back into the idle list. To fix this, conn_delete_from_tree() was extended to remove flags along with the connection from the idle list. This was performed when the connection is moved into the purge list. However, it introduces another issue related to the idle server connection accounting. Thus it is necessary to revert it prior to the incoming newer fix. This patch must be backported to every version where the original commit is.	2025-11-14 16:06:34 +01:00
William Lallemand	3d15c07ed0	MINOR: cfgcond: add "awslc_api_atleast" and "awslc_api_before" AWS-LC features are not easily tested with just the openssl version constant. AWS-LC uses its own API versioning stored in the AWSLC_API_VERSION constant. This patch adds the two awslc_api_atleast and awslc_api_before predicates that help to check the AWS-LC API.	2025-11-14 11:01:45 +01:00
Amaury Denoyelle	8415254cea	MINOR: check: clarify check-reuse-pool interaction with reuse policy check-reuse-pool can only perform as expected if reuse policy on the backend is set to aggressive or higher. Update the documentation to reflect this and implement a server diag warning.	2025-11-14 10:44:05 +01:00
William Lallemand	2bdf5a7937	BUG/MEDIUM: acme: move from mt_list to a rwlock + ebmbtree The current ACME scheduler suffers from problems due to the way the tasks are stored: - MT_LISTs are not scalable when having a lot of ACME tasks and having to look for a specific one. - the acme_task pointer was stored in the ckch_store in order to avoid walking the whole list. But a ckch_store can be updated and the pointer lost in the previous one. - when a task fails, the ptr in the ckch_store was not removed because we only work with a copy of the original ckch_store, it would need to lock the ckchs_tree and remove this pointer. This patch fixes the issues by removing the MT_LIST-based architecture, and replacing it by a simple ebmbtree + rwlock design. The pointer to the task is not stored anymore in the ckch_store, but instead it is stored in the acme_tasks tree. Finding a task is done by doing a lookup on this tree with a RDLOCK. Instead of checking if store->acme_task is not NULL, a lookup is also done. This allows removing the stuck "acme_task" pointer in the store, which was preventing an acme task from being restarted when the previous one failed for this specific certificate. Must be backported in 3.2.	2025-11-13 15:18:12 +01:00
Frederic Lecaille	d84463f9f6	MINOR: quic-be: validate the 0-RTT transport parameters During 0-RTT sessions, some server transport parameters are reused after having been saved from previous sessions. The server must not reduce these parameters when it resends them. The client must check this is the case when some early data are accepted by the server. This is what is implemented by this patch. Implement qc_early_tranport_params_validate() which checks the new server parameters are not reduced. Also implement qc_ssl_eary_data_accepted() which was not implemented for TLS stack without 0-RTT support (for instance wolfssl). That said this function was no longer used. This is why the compilation against wolfssl could not fail.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	6419b9f204	MEDIUM: quic-be: enable the use of 0-RTT This patch allows the use of 0-RTT feature on QUIC server lines with "allow-0rtt" option. In fact 0-RTT is really enabled only if ssl_sock_srv_try_reuse_sess() successfully manages to reuse the SSL session and the chosen application protocol from previous connections. Note that, at this time, 0-RTT works only with quictls and aws-lc as TLS stack. (0-RTT does not work at all (even for QUIC frontends) with libressl).	2025-11-13 14:04:31 +01:00
Frederic Lecaille	a4bbbc75db	MINOR: quic-be: Send post handshake frames from list of frames (0-RTT) This patch is required to make 0-RTT work. It modifies the prototype of quic_build_post_handshake_frames() to send post handshake frames from a list of frames in place of the application encryption level (used as <qc->ael> local variable). This patch does not modify at all the current QUIC stack behavior (even for QUIC frontends). It must be considered as a preparation for the code to come about 0-RTT support for QUIC backends.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	6e14365a5b	MEDIUM: quic-be: modify ssl_sock_srv_try_reuse_sess() to reuse backend sessions (0-RTT) This function is called for both TCP and QUIC connections to reuse SSL sessions saved by ssl_sess_new_srv_cb() callback called upon new SSL session creation. In addition to this, a QUIC SSL session must reuse the ALPN and some specific QUIC transport parameters. This is what is added by this patch for QUIC 0-RTT sessions. Note that from now on, ssl_sock_srv_try_reuse_sess() may fail for QUIC connections if it did not manage to reuse the ALPN. The caller must be informed of such an issue. It must not enable 0-RTT for the current session in this case. This is impossible without ALPN which is required to start a mux. ssl_sock_srv_try_reuse_sess() is modified to always succeed for TCP connections.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	5309dfb56b	MINOR: quic-be: Save the backend 0-RTT parameters For both TCP and QUIC connections, this is ssl_sess_new_srv_cb() callback which is called when a new SSL session is created. Its role is to save the session to be reused for the next sessions. This patch modifies this callback to save the QUIC parameters to be reused for the next 0-RTT sessions (or during SSL session resumption). The already existing path_params->nego_alpn member is used to store the ALPN as this is done for TCP alongside path_params->tps new quic_early_transport_params struct used to save the QUIC transport parameters to be reused for 0-RTT sessions.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	41e40eb431	MINOR: quic-be: helper quic_reuse_srv_params() function to reuse server params (0-RTT) Implement quic_reuse_srv_params() whose role is to reuse the ALPN negotiated during a first connection to a QUIC backend alongside its transport parameters.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	33564ca54c	MINOR: quic-be: helper functions to save/restore transport params (0-RTT) Define quic_early_transport_params new struct for QUIC transport parameters in relation with 0-RTT. These parameters must be saved during a first session to be reused for 0-RTT next sessions. qc_early_transport_params_cpy() copies the 0-RTT transport parameters to be saved during a first connection to a backend. The copy is made from a quic_transport_params struct to a quic_early_transport_params struct. Conversely, qc_early_transport_params_reuse() copies the transport parameters to be reused for a 0-RTT session from a previous one. The copy is made from a quic_early_transport_params struct to a quic_transport_params struct. Also add QUIC_EV_EARLY_TRANSP_PARAMS trace event to dump such 0-RTT transport parameters from traces.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	80070fe51c	MEDIUM: quic-be: Parse, store and reuse tokens provided by NEW_TOKEN Add a per thread ist struct to srv_per_thread struct to store the QUIC token to be reused for subsequent sessions. Parse at packet level (from qc_parse_ptk_frms()) these tokens and store them calling qc_try_store_new_token() newly implemented function. This is this new function which does its best (may fail) to update the tokens. Modify qc_do_build_pkt() to resend these tokens calling quic_enc_token() implemented by this patch.	2025-11-13 14:04:31 +01:00
Frederic Lecaille	8f23d4d287	MINOR: quic-be: Parse the NEW_TOKEN frame Rename ->data qf_new_token struct field to ->w_data to distinguish it from ->r_data new field used to parse the NEW_TOKEN frame. Indeed to build the NEW_TOKEN we need to write it to a static buffer into the frame struct. To parse it we only need to store the address of the token field into the RX buffer.	2025-11-13 14:04:31 +01:00
Amaury Denoyelle	5a8728d03a	MEDIUM/OPTIM: quic: alloc quic_conn after CID collision check On Initial packet parsing, a new quic_conn instance is allocated via qc_new_conn(). Then a CID is allocated with its value derived from the client ODCID. On CID tree insert, a collision can occur if another thread was already parsing an Initial packet from the same client. In this case, the connection is released and the packet will be requeued to the other thread. Originally, CID collision check was performed prior to quic_conn allocation. This was changed by the commit below, as this could cause issue on quic_conn alloc failure. commit `4ae29be18c` BUG/MINOR: quic: Possible endless loop in quic_lstnr_dghdlr() However, this procedure is less optimal. Indeed, qc_new_conn() performs many steps, thus it could be better to skip it on Initial CID collision, which can happen frequently. This patch restores the older order of operations, with CID collision check prior to quic_conn allocation. To ensure this does not cause the same bug again, the CID is removed in case of quic_conn alloc failure. This should prevent any loop as it ensures that a CID found in the global tree does not point to a NULL quic_conn, unless the CID is attached to a foreign thread. When this thread parses a re-enqueued packet, either the quic_conn is already allocated or the CID has been removed, triggering a fresh CID and quic_conn allocation procedure.	2025-11-10 12:10:14 +01:00
Amaury Denoyelle	2623e0a0b7	BUG/MEDIUM: quic: handle collision on CID generation CIDs are provided by haproxy so that the peer can use them as DCID of its packets. Their value is set via a random generator. It happens on several occasions during connection lifetime: * via ODCID derivation if haproxy is the server * on quic_conn init if haproxy is the client * during post-handshake if haproxy is the server * on RETIRE_CONNECTION_ID frame parsing CIDs are stored in a global tree. On ODCID derivation, a check is performed to ensure the CID is not a duplicate value. This is mandatory to properly handle multiple INITIAL packets from the same client on different threads. However, for the other cases, no check is performed for CID collision. As _quic_cid_insert() is silent, the issue is not detected at all. This results in a CID advertised to the peer but not stored in the global tree. In the end, this may cause two issues. The first one is that packets from the client which use the new CID will be rejected by haproxy, most probably with a STATELESS_RESET. The second issue is that it can cause a crash during quic_conn release. Indeed, the CID is stored in the quic_conn local tree and thus eb_delete() for the global tree will be performed. As <leaf_p> member is uninit, this results in a segfault. Note that this issue is pretty rare. It can only be observed if running with a high number of concurrent connections in parallel, so that the random generator will provide duplicate values. Patch is still labelled as MEDIUM as this modifies code paths used frequently. To fix this, _quic_cid_insert() unsafe function is completely removed. Instead, quic_cid_insert() can be used, which reports an error code if a collision happens. CIDs are then stored in the quic_conn tree only after global tree insert success. Here is the solution for each step if a collision occurs : * on init as client: the connection is completely released * post-handshake: the CID is immediately released. 
The connection is kept, but it will miss an extra CID. * on RETIRE_CONNECTION_ID parsing: a loop is implemented to retry random generation. If it fails several times, the connection is closed in error. A small convenience change is made to quic_cid_insert(). Output parameter <new_tid> can now be NULL, which is useful as most of the time callers do not care about it. This must be backported up to 2.6.	2025-11-10 12:10:14 +01:00
Amaury Denoyelle	419e5509d8	MINOR: quic: split CID alloc/generation function Split new_quic_cid() function into multiple ones. This patch should not introduce any visible change. The objective is to render CID allocation and generation more modular. The first advantage of this patch is to bring code simplification. In particular, conn CID sequence number increment and insertion into connection tree is simpler than before. Another improvement is that errors can now be handled more easily at each step of the CID init. This patch is a prerequisite for the fix on CID collision, thus it must be backported prior to it to every affected version.	2025-11-10 12:10:14 +01:00
Christopher Faulet	ecc2c3a35d	MEDIUM: peers: Remove commitupdate field on stick-tables This stick-table field was atomically updated with the last update id pushed and dumped on the CLI but never used otherwise. And all peer sessions share the same id because it is a stick-table info. So the info in peers dump is pretty limited. So, let's remove it.	2025-11-07 12:17:53 +01:00
Ben Kallus	d5ca3bb3b4	IMPORT: cebtree: Replace offset calculation with offsetof to avoid UB This is the same as the equivalent fix in ebtree: The C standard specifies that it's undefined behavior to dereference NULL (even if you use & right after). The hand-rolled offsetof idiom &(((s*)NULL)->f) is thus technically undefined. This clutters the output of UBSan and is simple to fix: just use the real offsetof when it's available. This is cebtree commit 2d08958858c2b8a1da880061aed941324e20e748.	2025-11-07 07:32:58 +01:00
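As a minimal illustration of the point made in the commit above, the standard `offsetof` macro from `<stddef.h>` can be used to get from an embedded node back to its container without forming an expression on a NULL pointer. The `struct node`, `struct item` and `item_of()` names are hypothetical and are not taken from cebtree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node and container types, for illustration only. */
struct node { struct node *left, *right; };
struct item { int key; struct node n; };

/* Recover the enclosing struct item from a pointer to its embedded node.
 * offsetof() yields the member offset as a constant, so no NULL pointer
 * is ever dereferenced, unlike the &(((s*)NULL)->f) idiom. */
static struct item *item_of(struct node *n)
{
    assert(n != NULL);
    return (struct item *)((char *)n - offsetof(struct item, n));
}
```

Given `struct item it`, calling `item_of(&it.n)` returns `&it`.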
Willy Tarreau	14087e48b9	MINOR: tools: add env_suggest() to suggest alternate variable names The purpose here is to look in the environment for a variable whose name looks like the provided one. This will be used to try to auto-correct misspelled environment variables that would silently be turned to an empty string.	2025-11-06 19:57:44 +01:00
Willy Tarreau	a4d78dd4f5	MINOR: tools: add support for ist to the word fingerprinting functions The word fingerprinting functions are used to compare similar words to suggest a correctly spelled one that looks like what the user proposed. Currently the functions only support const char*, but there's no reason for this, and it would be convenient to support substrings extracted from random pieces of configurations. Here we're adding new variants "_with_len" that take these ISTs and which are in fact a slight change of the original ones that the old ones now rely on.	2025-11-06 19:57:44 +01:00
Willy Tarreau	0144426dfb	BUG/MEDIUM: server: close a race around ready_srv when deleting a server When a server is being disabled or deleted, in case it matches the backend's ready_srv, this one is reset. However it's currently done in a non-atomic way when the server goes down, and that could occasionally reset the entry matching another server, but more importantly if in parallel some requests are dequeued for that server, it may re-appear there after having been removed, leading to a possible crash once it is fully removed, as shown in issue #3177. Let's make sure we reset the pointer when detaching the server from the proxy, and use a CAS in both cases to only reset this server. This fix needs to be backported to 3.2. There, srv_detach() is in server.c instead of server.h. Thanks to Basha Mougamadou for the detailed report and the useful backtraces.	2025-11-06 19:57:44 +01:00
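The "use a CAS to only reset this server" idea from the commit above can be sketched with C11 atomics. This is a simplified model, not haproxy's code: the two structures are reduced to the fields needed here and `backend_forget_server()` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Reduced, hypothetical versions of the structures involved. */
struct server  { int id; };
struct backend { _Atomic(struct server *) ready_srv; };

/* Clear be->ready_srv only if it still points to <srv>. A plain store
 * could wipe an entry that another thread has just set to a different
 * server; the compare-and-swap leaves such an entry untouched.
 * Returns 1 if the pointer was reset, 0 otherwise. */
static int backend_forget_server(struct backend *be, struct server *srv)
{
    struct server *expected = srv;

    assert(be != NULL);
    return atomic_compare_exchange_strong(&be->ready_srv, &expected,
                                          (struct server *)NULL);
}
```

Because the same helper can be called both when the server goes down and when it is detached, whichever call comes second simply finds nothing left to reset.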
Christopher Faulet	a1b5325a7a	MINOR: channel: Remove total field from channels The <total> field in the channel structure is now useless, so it can be removed. The <bytes_in> field from the SC is used instead. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	1effe0fc0a	MINOR: applet: Add function to get amount of data in the output buffer The helper function applet_output_data() returns the amount of data in the output buffer of an applet. For applets using the new API, it is based on data present in the outbuf buffer. For legacy applets, it is based on input data present in the input channel's buffer. The HTX version, applet_htx_output_data(), is also available. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	4991a51208	MINOR: stats: Add stats about request and response bytes received and sent In previous patches, these counters were added per frontend, backend, server and listener. With this patch, these counters are reported on stats, including promex. Note that the stats file minor version was incremented by one because the shm_stats_file_object struct size has changed. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	0084baa6ba	MINOR: counters: Remove bytes_in and bytes_out counter from fe/be/srv/li bytes_in and bytes_out counters per frontend, backend, listener and server were removed and we now rely on the req_in and res_in counters, respectively. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	567df50d91	MINOR: stream: Remove bytes_in and bytes_out counters from stream per-stream bytes_in and bytes_out counters were removed and replaced by req.in and res.in. The corresponding samples still exist but rely on the new counters. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	1c62a6f501	MINOR: counters: Add req_in/req_out/res_in/res_out counters for fe/be/srv/li Thanks to the previous patch, and based on info available on the stream, it is now possible to have counters for frontends, backends, servers and listeners to report number of bytes received and sent on both sides. This patch is related to issue #1617.	2025-11-06 15:01:29 +01:00
Christopher Faulet	ac9201f929	MINOR: stream: Add samples to get number of bytes received or sent on each side req.in and req.out samples can now be used to get the number of bytes received from a client and sent to the server. And res.in and res.out samples can be used to get the number of bytes received from a server and sent to the client. This information is stored in the logs structure inside a stream. This patch is related to issue #1617.	2025-11-06 15:01:28 +01:00
Christopher Faulet	629fbbce19	MINOR: stconn: Add counters to SC to know number of bytes received and sent <bytes_in> and <bytes_out> counters were added to SC to count, respectively, the number of bytes received from an endpoint or sent to an endpoint. These counters are updated for connections and applets. This patch is related to issue #1617.	2025-11-06 15:01:28 +01:00
Willy Tarreau	5fe4677231	MINOR: server: move the lock inside srv_add_idle() Almost all callers of _srv_add_idle() lock the list then call the function. It's not the most efficient and it requires some care from the caller to take care of that lock. Let's change this a little bit by having srv_add_idle() that takes the lock and calls _srv_add_idle() that is now inlined. This way callers don't have to handle the lock themselves anymore, and the lock is only taken around the sensitive parts, not the function call+return. Interestingly, perf tests show a small perf increase from 2.28-2.32M RPS to 2.32-2.37M RPS on a 128-thread system.	2025-11-06 13:16:24 +01:00
William Lallemand	546c67d137	MINOR: acme: generate a temporary key pair This patch provides two functions acme_gen_tmp_pkey() and acme_gen_tmp_x509(). These functions generate a unique keypair and X509 certificate that will be stored in tmp_x509 and tmp_pkey. If the key pair or certificate was already generated they will return the existing one. The key is an RSA2048 and the X509 is generated with an expiration in the past. The CN is "expired". These are just placeholders to be used if we don't have files.	2025-11-06 11:56:27 +01:00
William Lallemand	1df55b441b	MEDIUM: ssl/ckch: use ckch_store instead of ckch_data for ckch_conf_kws This is an API change, instead of passing a ckch_data alone, the ckch_conf_kws.func() is called with a ckch_store. This allows the callback to access the whole ckch_store, with the ckch_conf and the ckch_data. But it requires the ckch_conf to be actually put in the ckch_store before.	2025-11-06 11:56:27 +01:00
Amaury Denoyelle	b9809fe0d0	MINOR: quic: remove <mux_state> field This patch removes <mux_state> field from quic_conn structure. The purpose of this field was to indicate if MUX layer above quic_conn is not yet initialized, active, or already released. It became tedious to properly set it as initialization order of the various quic_conn/conn/MUX layers now differs between the frontend and backend sides, and also depending if 0-RTT is used or not. Recently, a new change introduced in connect_server() will allow to initialize QUIC MUX earlier if ALPN is cached on the server structure. This added another level of complexity. Thus, this patch removes <mux_state> field completely. Instead, a new flag QUIC_FL_CONN_XPRT_CLOSED is defined. It is set at a single place only on close XPRT callback invocation. It can be mixed with the new utility functions qc_wait_for_conn()/qc_is_conn_ready() to determine the status of conn/MUX layers now without an extra quic_conn field.	2025-11-05 14:03:34 +01:00
Willy Tarreau	096999ee20	BUG/MEDIUM: connections: permit to permanently remove an idle conn There's currently a function conn_delete_from_tree() which is used to detach an idle connection from the tree it's currently attached to so that it is no longer found. This function is used in three circumstances: - when picking a new connection that no longer has any avail stream - when temporarily working on the connection from an I/O handler, in which case it's re-added at the end - when killing a connection The 2nd case above is quite specific, as it requires to preserve the CO_FL_LIST_MASK flags so that the connection can be re-inserted into the proper tree when leaving the handler. However, there's a catch. When killing a connection, we want to be certain it will not be reinserted into the tree. The flags preservation is causing a tiny race if an I/O happens while the connection is in the kill list, because in this case the I/O handler will note the connection flags, do its work, then reinsert the connection where it believed it was, then the connection gets purged, and another user can find it in the tree. The issue is very difficult to reproduce. On a 128-thread machine it happens in H2 around 500k req/s after around 50M requests. In H1 it happens after around 1 billion requests. The fix here consists in passing an extra argument to the function to indicate if the removal is permanent or not. When it's permanent, the function will clear the associated flags. The callers were adjusted so that all those dequeuing a connection in order to kill it do it permanently and all other ones do it only temporarily. A slightly different approach could have worked: the function could always remove all flags, and the callers would need to restore them. But this would require trickier modifications of the various call places, compared to only passing 0/1 to indicate the permanent status. This will need to be backported to all stable versions. 
The issue was at least reproduced since 3.1 (not tested before). The patch will need to be adjusted for 3.2 and older, because a 2nd argument "thr" was added in 3.3, so the patch will not apply to older versions as-is.	2025-11-05 11:08:25 +01:00
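The temporary-versus-permanent removal distinction described in the commit above can be modelled in a deliberately simplified, single-threaded form. All names here (`struct conn`, the `CONN_F_*` flags, `conn_detach()`, `conn_reattach()`) are hypothetical, and the real function operates on trees under locks; the sketch only shows why clearing the list flags on a permanent removal prevents a later re-insertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags recording which idle list a connection belongs to. */
#define CONN_F_IDLE_LIST 0x1u
#define CONN_F_SAFE_LIST 0x2u
#define CONN_F_LIST_MASK (CONN_F_IDLE_LIST | CONN_F_SAFE_LIST)

struct conn {
    unsigned int flags;
    bool in_tree;   /* stands in for real tree membership */
};

/* Detach the connection. A temporary removal keeps the list flags so the
 * connection can be put back where it was; a permanent one clears them. */
static void conn_detach(struct conn *c, bool permanent)
{
    assert(c != NULL);
    c->in_tree = false;
    if (permanent)
        c->flags &= ~CONN_F_LIST_MASK;
}

/* Re-insert the connection only if its flags still designate a list.
 * Returns true if it was re-inserted. */
static bool conn_reattach(struct conn *c)
{
    assert(c != NULL);
    if (!(c->flags & CONN_F_LIST_MASK))
        return false;
    c->in_tree = true;
    return true;
}
```

In this model, a handler that detaches temporarily can always reattach, whereas a connection detached permanently can no longer be found again through `conn_reattach()`.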
Olivier Houchard	7d4aa7b22b	BUG/MEDIUM: server: Add a rwlock to path parameter Add a rwlock to control the server's path_parameter, to make sure multiple threads don't set it at the same time, and it can't be seen in an inconsistent state. Also don't set the parameter every time, only set them if they have changed, to prevent needless writes. This does not need to be backported.	2025-11-04 18:47:34 +01:00
Amaury Denoyelle	efe60745b3	MINOR: quic: remove connection arg from qc_new_conn() This patch is similar to the previous one, this time dealing with qc_new_conn(). This function was asymmetric on frontend and backend side, as connection argument was set only in the latter case. This was previously required due to qc_alloc_ssl_sock_ctx() signature. This has changed with the previous patch, thus qc_new_conn() can also be realigned on both FE and BE sides. <conn> member of quic_conn instance is always set outside it, in qc_xprt_start() on the backend case.	2025-11-04 17:47:42 +01:00
Amaury Denoyelle	5a17cade4f	MINOR: quic: do not set conn member if ssl_sock_ctx ssl_sock_ctx is a generic object used both on TCP/SSL and QUIC stacks. Most notably it contains a <conn> member which is a pointer to struct connection. On QUIC frontend side, this member is always set to NULL. Indeed, connection is only created after handshake completion. However, this has changed for backend side, where the connection is instantiated prior to its quic_conn counterpart. Thus, ssl_sock_ctx member would be set in this case as a convenience for use later in qc_ssl_do_hanshake(). However, this method was unsafe as the connection can be released, without resetting ssl_sock_ctx member. Thus, the previous patch fixes this by using on <conn> member through the quic_conn instance which is the proper way. Thus, this patch resets ssl_sock_ctx <conn> member to NULL. This is deemed the cleanest method as it ensures that both frontend and backend sides must not use it anymore.	2025-11-04 17:38:09 +01:00
Willy Tarreau	fd012b6c59	OPTIM: proxy: move atomically access fields out of the read-only ones Perf top showed that h1_snd_buf() was having great difficulties accessing the proxy's server_id_hdr_name field in the middle of the headers loop. Moving the assignment out of the loop to a local variable moved the problem there as well: \| if (!(h1m->flags & H1_MF_RESP) && isttest(h1c->px->server_id_hdr_n 0.10 \|20b0: mov -0x120(%rbp),%rdi 1.33 \| mov 0x60(%rdi),%r10 0.01 \| test %eax,%eax 0.18 \| jne 2118 12.87 \| mov 0x350(%r10),%rdi 0.01 \| test %rdi,%rdi 0.05 \| je 2118 \| mov 0x358(%r10),%r11 It turns out that there are several atomically accessed fields in its vicinity, causing the cache line to bounce all the time. Let's collect the few frequently changed fields and place them together at the end of the structure, and plug the 32-bit hole with another isolated field. Doing so also reduced a little bit the cost of decrementing be->be_conn in process_stream(), and overall the HTTP/1 performance increased by about 1% both on ARM and x86_64.	2025-11-03 13:54:49 +01:00
Amaury Denoyelle	6bfabfdc77	OPTIM: backend: skip conn reuse for incompatible proxies When trying to reuse a backend connection, a connection hash is calculated to match an entry with similar parameters. Previously, this operation was skipped if the stream content wasn't based on HTTP, as it would have been incompatible with http-reuse. With the introduction of SPOP backends, this condition was removed, so that it can also benefit from connection reuse. However, this means that the hash calculation is now always performed when connecting to a server, even for TCP or log backends. This is unnecessary as these proxies cannot perform connection reuse. Note also that reuse mode is reset on postparsing for incompatible backends. This at least guarantees that no tree lookup will be performed via be_reuse_connection(). However, connection lookup is still performed in the session via session_get_conn() which is another unnecessary operation. Thus, this patch restores the condition so that reuse operations are now entirely skipped if a backend mode is incompatible. This is implemented via a new utility function named be_supports_conn_reuse(). This could be backported up to 3.1, as this commit could be considered as a performance regression for tcp/log backend modes.	2025-11-03 10:43:50 +01:00
Amaury Denoyelle	14a6468df5	MINOR: quic: reject conf with QUIC servers if not compiled Ensure that QUIC support is compiled into haproxy when a QUIC server is configured. This check is performed during _srv_parse_finalize() so that it is detected both on configuration parsing and when adding a dynamic server via the CLI. Note that this changes the behavior of srv_is_quic() utility function. Previously, it always returned false when QUIC support wasn't compiled. With this new check introduced, it is now guaranteed that a QUIC server won't exist if compilation support is not active. Hence srv_is_quic() does not rely anymore on USE_QUIC define.	2025-10-31 11:32:20 +01:00
Willy Tarreau	b0e8edaef2	MEDIUM: mux-h2: do not needlessly refrain from sending data early The mux currently refrains from sending data before H2_CS_FRAME_H, i.e. before the peer's SETTINGS frame was received. While it makes sense on the frontend, it's causing harm on the backend because it forces the first request to be sent in two halves over an extra RTT: first the preface and settings, second the request once the settings are received. This is totally contrary to the philosophy of the H2 protocol, consisting in permitting the client to send as soon as possible. Actually what happens is the following: - process_stream() calls connect_server() - connect_server() creates a connection, and if the proto/alpn is guessed or known, the mux is instantiated for the current request. - the H2 init code wakes the h2 tasklet up and returns - process_stream() tries to send the request using h2_snd_buf(), but that one sees that we're before H2_CS_FRAME_H, refrains from doing so and returns. - process_stream() subscribes and quits - the h2 tasklet can now execute to send the preface and settings, which leave as a first TCP segment. The connection is ready. - the iocb is woken again once the server's SETTINGS frame is received, turning the connection to the H2_CS_FRAME_H state, and the iocb wake up process_stream(). - process_stream() executes again and can try to send again. - h2_snd_buf() is called and finally sends the request as a second TCP segment. Not only this is inefficient, but it also renders 0-RTT and TFO impossible on H2 connections. When 0-RTT is used, only the preface and settings leave as early data (the very first data of that connection), which is totally pointless. In order to fix this, we have to go through a few steps: - first we need to let data be sent to a server immediately after the SETTINGS frame was sent (i.e. in H2_CS_SETTINGS1 state instead of H2_CS_FRAME_H). However, some protocol extensions are advertised by the server using SETTINGS (e.g. 
RFC8441) and some requests might need to know the existence of such extensions. For this reason we're adding a new h2c flag, H2_CF_SETTINGS_NEEDED, which indicates that some operations were not done because a server's SETTINGS frame is needed. This is set when trying to send a protocol upgrade or extended CONNECT during H2_CS_SETTINGS1, indicating that it's needed to wait for H2_CS_FRAME_H in this case. The flag is always set on frontend connections. This is what is being done in this patch. - second, we need to be able to push the preface opportunistically with the first h2_snd_buf() so that it's not needed to wake the tasklet up just to send that and wake process_stream() again. This will be in a separate patch. By doing the first step, we're at least saving one needless tasklet wakeup per connection (~9%), which results in ~5% backend connection rate increase.	2025-10-30 18:16:54 +01:00
William Lallemand	1e2f920be6	MINOR: listener: implement bind_conf_find_by_name() Returns a pointer to the first bind_conf matching <name> in a frontend <front>. When name is prefixed by a @ (@<filename>:<linenum>), it tries to look for the corresponding filename and line of the configuration file. NULL is returned if no match is found.	2025-10-30 10:37:42 +01:00
sftcd	23f5cbb411	MINOR: ssl/ech: add logging and sample fetches for ECH status and outer SNI This patch adds functions to expose Encrypted Client Hello (ECH) status and outer SNI information for logging and sample fetching. Two new helper functions are introduced in ech.c: - conn_get_ech_status() places the ECH processing status string into a buffer. - conn_get_ech_outer_sni() retrieves the outer SNI value if ECH succeeded. Two new sample fetch keywords are added: - "ssl_fc_ech_status" returns the ECH status string. - "ssl_fc_ech_outer_sni" returns the outer SNI value seen during ECH. These allow ECH information to be used in HAProxy logs, ACLs, and captures.	2025-10-30 10:37:30 +01:00
sftcd	dba4fd248a	MEDIUM: ssl/ech: config and load keys This patch introduces the USE_ECH option in the Makefile to enable support for Encrypted Client Hello (ECH) with OpenSSL. A new function, load_echkeys, is added to load ECH keys from a specified directory. The SSL context initialization process in ssl_sock.c is updated to load these keys if configured. A new configuration directive, `ech`, is introduced to allow users to specify the ECH key directory in the listener configuration.	2025-10-30 10:37:12 +01:00
Remi Tricot-Le Breton	dc35a3487b	MINOR: ssl: Do not dump decrypted privkeys in 'dump ssl cert' A private key that is password protected and was decoded during init using the password obtained via 'ssl-passphrase-cmd' should not be dumped via the 'dump ssl cert' CLI command.	2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton	478dd7bad0	MEDIUM: ssl: Add certificate password callback that calls external command When a certificate is protected by a password, we can provide the password via the dedicated pem_password_cb param provided to PEM_read_bio_PrivateKey. HAProxy will fetch the password automatically during init by calling a user-defined external command that should dump the right password on its standard output (see new 'ssl-passphrase-cmd' global option).	2025-10-29 10:54:17 +01:00
Remi Tricot-Le Breton	1ec59d3426	MINOR: init: Make devnullfd global and create it earlier in init The devnull fd might be needed during configuration parsing, if some options require to fork/exec for instance. So we now create it much earlier in the init process and without depending on the '-q' or '-d' parameters.	2025-10-29 10:54:17 +01:00
Willy Tarreau	2d7e3ddd4a	BUG/MEDIUM: cli: do not return ACKs one char at a time Since 3.0 where the CLI started to use rcv_buf, it appears that some external tools sending chained commands are randomly experiencing failures. Each time this happens when the whole command is sent as a single packet, immediately followed by a close. This is not a correct way to use the CLI but this has been working for ages for simple netcat-based scripts, so we should at least try to preserve this. The cause of the failure is that the first LF that acks a command is immediately sent back to the client and rejected due to the closed connection. This in turn forwards the error back to the applet which aborts its processing. Before 3.0 the responses would be queued into the buffer, then sent back to the channel, and would all fail at once. This changed when snd_buf/rcv_buf were implemented because the applets are much more responsive and since they yield between each command, they can deliver one ACK at a time that is immediately forwarded down the chain. An easy way to observe the problem is to send 5 map updates, a shutdown, and immediately close via tcploop, and in parallel run a periodic "show map" to count the number of elements: $ tcploop -U /tmp/sock1 C S:"add map #0 1 1; add map #0 2 2; add map #0 3 3; add map #0 4 4; add map #0 5 5\n" F K Before 3.0, there would always be 5 elements. Since 3.0 and before `20ec1de214` ("MAJOR: cli: Refacor parsing and execution of pipelined commands"), almost always 2. And since that commit above in 3.2, almost always one. Doing the same using socat or netcat shows almost always 5... 
It's entirely timing-dependent, and might even vary based on the RTT between the client and haproxy! The approach taken here consists in doing the same principle as MSG_MORE or Nagle but on the response buffer: the applet doesn't need to send a single ACK for each command when it has already been woken up and is scheduled to come back to work. It's fine (and even desirable) that ACKs are grouped in a single packet as much as possible. For this reason, this patch implements APPCTX_CLI_ST1_YIELD, a new CLI flag which indicates that the applet left in yielding condition, i.e. it has not finished its work. This flag is used by .rcv_buf to hold pending data. This way we won't return partial responses for no reason, and we can continue to emulate the previous behavior. One very nice benefit to this is that it saves huge amounts of CPU on the client. In the test below that tries to update 1M map entries, the CPU used by socat went from 100% to 0% and the total transfer time dropped by 28%: before: $ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \ printf "add map #0 %d %d\n",i,i,i }}' \| socat /tmp/sock1 - >/dev/null real 0m2.407s user 0m1.485s sys 0m1.682s after: $ time awk 'BEGIN{ printf "prompt i\n"; for (i=0;i<1000000;i++) { \ printf "add map #0 %d %d\n",i,i,i }}' \| socat /tmp/sock1 - >/dev/null real 0m1.721s user 0m0.952s sys 0m0.057s The difference is also quite visible on the number of syscalls during the test (for 1k updates): before: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.071691 0 100001 sendmsg after: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000011 1 9 sendmsg This patch will need to be backported to 3.0, and depends on these two patches to be backported as well: MINOR: applet: do not put SE_FL_WANT_ROOM on rcv_buf() if the channel is empty MINOR: cli: create cli_raw_rcv_buf() 
from the generic applet_raw_rcv_buf()	2025-10-27 16:57:07 +01:00
Olivier Houchard	837351245a	BUG/MEDIUM: mt_list: Use atomic operations to prevent compiler optims As a follow-up to `f40f5401b9`, explicitly use atomic operations to set the prev and next fields, to make sure the compiler can't assume anything about it, and just does it. This should be backported after `f40f5401b9` up to 2.8.	2025-10-24 13:34:41 +02:00
Willy Tarreau	2ec6df59bf	BUILD: openssl-compat: fix build failure with OPENSSL=0 and KTLS=1 The USE_KTLS test is currently being done outside of the USE_OPENSSL guard so disabling USE_OPENSSL still results in build failures on libcs built with support for kernels before 4.17, because we enable KTLS by default on linux. Let's move the KTLS block inside the USE_OPENSSL guard instead. No backport is needed since KTLS is only in 3.3.	2025-10-24 10:45:02 +02:00
Aurelien DARRAGON	d655ed5f14	BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency (2nd attempt) This is a second attempt at fixing issues on 32bits systems which would trigger the following BUG_ON() statement: FATAL: bug condition "sizeof(struct shm_stats_file_object) != 544" matched at src/stats-file.c:825 shm_stats_file_object struct size changed, is is part of the exported API: ensure all precautions were taken (ie: shm_stats_file version change) before adjusting this This is a drop-in replacement for `d30b88a6c` + `4693ee0ff`, as suggested by Willy. Indeed, on supported platforms unsigned int can be assumed to be 4 bytes long, and long can be assumed to be 8 bytes long. As such, the previous attempt was overkill and added unnecessary maintenance complexity which could result in bugs if not used properly. Moreover, it would only partially solve the issue, since on little endian vs big endian architectures, the provisioned memory areas (originating from the same shm stats file) could be read differently by the host. Instead we fix the alignment issues, and this alone helps to ensure struct memory consistency on 64 vs 32bits platforms. It was tested on both i386 and i586. last_change and last_sess counters are now stored as unsigned int, as it helped to fix the alignment issues and they were found to be used as 32bits integers anyway. Thanks to Willy for problem analysis and the patch proposal. No backport needed.	2025-10-24 09:35:38 +02:00
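The kind of layout guarantee discussed in the commit above can be expressed with compile-time checks. The following is a hedged sketch, not the actual shm_stats_file_object definition: `struct shared_obj` is hypothetical, only the `last_change` and `last_sess` names come from the commit message, and the two 64-bit counters are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object shared through a memory-mapped file. The two
 * 32-bit fields are placed side by side so that the 64-bit fields that
 * follow start at an 8-byte offset with no padding hole, whatever the
 * alignment the ABI requires for 64-bit integers. */
struct shared_obj {
    uint32_t last_change;  /* offset 0 */
    uint32_t last_sess;    /* offset 4 */
    uint64_t bytes_in;     /* offset 8  (placeholder field) */
    uint64_t bytes_out;    /* offset 16 (placeholder field) */
};

/* Fail the build if the layout ever changes. */
static_assert(sizeof(struct shared_obj) == 24,
              "shared_obj size is part of the exported layout");
static_assert(offsetof(struct shared_obj, bytes_in) == 8,
              "bytes_in must start at offset 8");
```

Such checks only cover size and offsets; they say nothing about byte order, which the commit message mentions as a separate concern.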
Aurelien DARRAGON	a931779dde	Revert "MINOR: compiler: add FIXED_SIZE(size, type, name) macro" This reverts commit `466a603b59`. Due to the last 2 commits, this macro is now unused, and will probably never be used, so let's get rid of that for now.	2025-10-24 09:35:34 +02:00
Aurelien DARRAGON	8277f891d2	Revert "MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct" This reverts commit `4693ee0ff7`. As discussed in GH #3168, this works but it is not the proper way to fix the issue. See following commits.	2025-10-24 09:35:29 +02:00
Aurelien DARRAGON	c0d952ccc1	Revert "BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency" This reverts commit `d30b88a6cc`. As discussed in GH #3168, this works but it is not the proper way to fix the issue. See following commits.	2025-10-24 09:35:25 +02:00
Amaury Denoyelle	7ba4b0ad5f	BUG/MINOR: quic: rename and duplicate stream settings Several settings can be set to control stream multiplexing and associated receive window. Previously, all of these settings were configured using prefix "tune.quic.frontend.", despite being applied blindly on both sides. Fix this by duplicating these settings specific to frontend and backend side. Options are also renamed to use the standardized prefix "tune.quic.[be\|fe].stream." notation. Also, each option is individually renamed to better reflect its purpose and hide technical details relative to QUIC transport parameter naming: * max-data-size -> stream.rxbuf * max-streams-bidi -> stream.max-concurrent * stream-data-ratio -> stream.data-ratio No need to backport.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	d5142706f8	BUG/MINOR: quic: split option for congestion max window size	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	33afba0dda	BUG/MINOR: quic: split max-idle-timeout option for FE/BE usage Streamline max-idle-timeout option. Rename it to use the newer cohesive naming scheme 'tune.quic.fe\|be.'. Two different fields were already defined in global struct. These fields are moved into quic_tune along with other QUIC settings. However, no parser was defined for backend option, this commit fixes this. No need to backport this.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	5bc659a4a2	MINOR: quic: rename frontend sock-per-conn setting On frontend side, a quic_conn can have a dedicated FD or use the listener one. These different modes can be activated via a global QUIC tune setting. This patch adjusts the option. First, it is renamed to the more meaningful name 'tune.quic.fe.sock-per-conn'. Also, arguments are now either 'default-on' or 'force-off'. The objective is to better highlight the relationship with the 'quic-socket' bind option. The older option is deprecated and will be removed in 3.5.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	a14c6cee17	MINOR: quic: rename retry-threshold setting A QUIC global tune setting is defined to be able to force Retry emission prior to handshake. By definition, this ability is only supported by QUIC servers, hence it is a frontend option only. Rename the option to use "fe" prefix. The old option name is deprecated and will be removed in 3.5	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	d248c5bd21	MINOR: quic: rename max Tx mem setting QUIC global memory can be limited across the entire process via a global tune setting. Previously, this setting used the misleading "frontend" prefix. As this is applied as a sum between all QUIC connections, both from frontend and backend sides, remove the prefix. The new option name is "tune.quic.mem.tx-max". The older option name is deprecated and will be removed in 3.5.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	9bfe9b9e21	MINOR: quic: split Tx options for FE/BE usage This patch is similar to the previous one, except that it is focused on Tx QUIC settings. It is now possible to toggle GSO and pacing on frontend and backend sides independently. As with the previous patch, options are renamed to use "fe/be" unified prefixes. This is part of the current series of commits which unify QUIC settings. Older options are deprecated and will be removed on 3.5 release.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	33a8cb87a9	MINOR: quic: split congestion controller options for FE/BE usage Various settings can be configured related to the QUIC congestion controller. This patch duplicates them to be able to set independent values on frontend and backend sides. As with the previous patch, options are renamed to use "fe/be" unified prefixes. This is part of the current series of commits which unify QUIC settings. Older options are deprecated and will be removed on 3.5 release.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	7640e9a9ee	MINOR: quic: duplicate glitches FE option on BE side Previously, QUIC glitches support was only implemented for frontend side. Extend this so that the option can be specified separately both on frontend and backend sides. Function _qcc_report_glitch() now retrieves the relevant max value based on connection side. In addition to this, the option has been renamed to use "fe/be" prefixes. This is part of the current series of commits which unify QUIC settings. Older options are deprecated and will be removed on 3.5 release.	2025-10-23 16:49:20 +02:00
Amaury Denoyelle	b34cd0b506	MINOR: quic: rename "no-quic" to "tune.quic.listen" Rename the option used to quickly enable/disable all QUIC listeners. It now takes an argument on/off. The documentation is extended to reflect the fact that QUIC backends are not impacted by this option. The older keyword is simply removed. Deprecation is considered unnecessary as this setting is only useful during debugging.	2025-10-23 16:47:58 +02:00
Amaury Denoyelle	42e5ec6519	MINOR: quic: prepare support for options on FE/BE side A major reorganization of QUIC settings is going to be performed. One of its objectives is to clearly define options which can be separately configured on frontend and backend proxy sides. To implement this, quic_tune structure is extended to support fe and be options. A set of macros/functions is also defined : it allows to retrieve an option defined on both sides with unified code, based on proxy side of a quic_conn/connection instance.	2025-10-23 15:06:01 +02:00
Olivier Houchard	f40f5401b9	BUG/MEDIUM: mt_lists: Avoid el->prev = el->next = el Avoid setting both el->prev and el->next on the same line. The goal is to set both el->prev and el->next to el, but a naive compiler, such as when we're using -O0, will set el->next first, then will set el->prev to the value of el->next, but if we're unlucky, el->next will have been set to something else by another thread. So explicitly set both to what we want. This should be backported up to 2.8.	2025-10-23 14:43:51 +02:00
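The hazard described in this commit can be shown in isolation. The sketch below uses a hypothetical `struct sketch_node` and helper `sketch_self_link()` (neither is HAProxy's actual mt_list code) and only shows the shape of the fix: each field is assigned from `el` directly, so neither store depends on re-reading a field that another thread may have changed in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element, standing in for an mt_list element. */
struct sketch_node {
    struct sketch_node *next;
    struct sketch_node *prev;
};

/*
 * Make the element point to itself on both sides.
 *
 * The chained form "el->prev = el->next = el" may be compiled as
 * "store el->next, then load el->next and store it into el->prev".
 * Writing two independent assignments avoids that intermediate load.
 */
static void sketch_self_link(struct sketch_node *el)
{
    assert(el != NULL);
    el->next = el;
    el->prev = el;
}
```

This only illustrates the source-level pattern; the real code also has to respect the locking protocol of mt_list, which is not modelled here.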
Aurelien DARRAGON	d30b88a6cc	BUG/MAJOR: stats-file: ensure shm_stats_file_object struct mapping consistency As reported by @tianon on GH #3168, running haproxy on 32bits i386 platform would trigger the following BUG_ON() statement: FATAL: bug condition "sizeof(struct shm_stats_file_object) != 544" matched at src/stats-file.c:825 shm_stats_file_object struct size changed, is is part of the exported API: ensure all precautions were taken (ie: shm_stats_file version change) before adjusting this In fact, some efforts were already taken to ensure shm_stats_file_object struct size remains consistent on 64 vs 32 bits platforms, since shm_stats_file_object is part of the public API and directly exposed in the stats file. However, some parts were overlooked: some structs that are embedded in shm_stats_file_object struct itself weren't using fixed-width integers, and would sometimes be unaligned. The result of this is that it was up to the compiler (platform-dependent) to choose how to deal with such ambiguities, which could cause the struct mapping/size to be inconsistent from one platform to another. Fortunately this was caught by the BUG_ON() statement and with the precious help of @tianon. To fix this, we now use fixed-width integers everywhere for members (and submembers) of shm_stats_file_object struct, and we use explicit padding where missing to avoid automatic padding when we don't expect one. As for the previous commit, we leverage FIXED_SIZE() and FIXED_SIZE_ARRAY() macros to set the expected width for each integer without causing build issues on platforms that don't support larger integers. No backport needed, this feature was introduced during 3.3-dev.	2025-10-22 20:52:22 +02:00
Aurelien DARRAGON	4693ee0ff7	MEDIUM: freq-ctr: use explicit-size types for freq-ctr struct freq-ctr struct is used by the shm_stats_file API, and more precisely, it is used in the shm_stats_file_object struct for counters. shm_stats_file_object struct is required to be platform-independent, thus we switch to using explicit size types (AKA fixed width integer types) for freq-ctr, in an attempt to make freq-ctr size and memory mapping consistent from one platform to another. We cannot simply use fixed-width integers because some of them are involved in atomic operations, and forcing a given width could cause build issues on some platforms where atomic ops are not implemented for large integers. Instead we leverage the FIXED_SIZE macro to keep handling the integers as before, but forcing them to be stored using the expected number of bytes (unused bytes will simply be ignored). No change of behavior should be expected.	2025-10-22 20:52:18 +02:00
Aurelien DARRAGON	466a603b59	MINOR: compiler: add FIXED_SIZE(size, type, name) macro FIXED_SIZE() macro can be used to instruct the compiler that the struct member named <name>, handled as <type>, must be stored using <size> bytes, even if the type used is actually smaller than the expected size. FIXED_SIZE_ARRAY() is similar to FIXED_SIZE() but for arrays: it takes an extra argument which is the number of members. They may be used for portability concerns to ensure a structure mapping remains consistent between platforms.	2025-10-22 20:52:12 +02:00
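One way such a macro could work is sketched below. This is an assumption for illustration, not HAProxy's definition: the names `SKETCH_FIXED_SIZE` and `struct sketch_counter` are hypothetical, and the approach shown (an anonymous union pairing the member with a byte array of the wanted size, which requires C11) is just one plausible technique.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a FIXED_SIZE(size, type, name) style macro.
 * The member keeps its natural type for reads, writes and atomics, while
 * the byte array forces the union to occupy at least <size> bytes.
 */
#define SKETCH_FIXED_SIZE(size, type, name)       \
    union {                                       \
        type name;                                \
        unsigned char name##_storage[size];       \
    }

/* Two counters handled as unsigned int but each stored on 8 bytes. */
struct sketch_counter {
    SKETCH_FIXED_SIZE(8, unsigned int, curr_ctr);
    SKETCH_FIXED_SIZE(8, unsigned int, prev_ctr);
};

/* Fails the build if the layout is not the expected 16 bytes. */
_Static_assert(sizeof(struct sketch_counter) == 16,
               "sketch_counter must be exactly 16 bytes");
```

Members are still accessed by name (`c.curr_ctr`), so calling code does not change. The sketch only pins the size; a layout shared between platforms would also need alignment and byte order to match, which is not addressed here.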
Amaury Denoyelle	f50425c021	MINOR: quic: remove received CRYPTO temporary tree storage The previous commit switched from ncbuf to ncbmbuf as storage for received CRYPTO frames. The latter ensures that buffering of such frames cannot fail anymore due to gaps size. Previously, extra mechanisms were implemented in the QUIC frames parsing function to overcome the limitation of ncbuf on gaps size. Before insertion, CRYPTO frames were stored in a temporary tree to order their insertion. As this is not necessary anymore, this commit removes the temporary tree insertion. This commit is closely associated with the previous bug fix. As it provides a neat optimization and code simplification, it can be backported with it, but not in the next immediate release, to spot potential regressions.	2025-10-22 15:24:02 +02:00
Amaury Denoyelle	4c11206395	BUG/MAJOR: quic: use ncbmbuf for CRYPTO handling In QUIC, TLS handshake messages such as ClientHello are encapsulated in CRYPTO frames. Each QUIC implementation can split the content in several frames of random sizes. In fact, this feature is now used by several clients, based on chrome so-called "Chaos protection" mechanism : https://quiche.googlesource.com/quiche/+/cb6b51054274cb2c939264faf34a1776e0a5bab7 To support this, haproxy uses a ncbuf storage to store received CRYPTO frames before passing it to the SSL library. However, this storage suffers from a limitation as gaps between two filled blocks cannot be smaller than 8 bytes. Thus, depending on the size of received CRYPTO frames and their order, ncbuf may not be sufficient. Over time, several mechanisms were implemented in haproxy QUIC frames parsing to overcome the ncbuf limitation. However, recent reports highlight that with some clients haproxy is not able to deal with CRYPTO frames reception. In particular, this is the case with the latest ngtcp2 release, which implements a similar chaos protection mechanism via the following patch. It also seems that this impacts haproxy interaction with firefox. commit 89c29fd8611d5e6d2f6b1f475c5e3494c376028c Author: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> Date: Mon Aug 4 22:48:06 2025 +0900 Crumble Client Initial CRYPTO (aka chaos protection) To fix haproxy CRYPTO frames buffering once and for all, an alternative non-contiguous buffer named ncbmbuf has been recently implemented. This type does not suffer from gaps size limitation, albeit at the cost of a small reduction in the size available for data storage. Thus, the purpose of this current patch is to replace ncbuf with the newer ncbmbuf for QUIC CRYPTO frames parsing. Now, ncbmb_add() is used to buffer received frames, which is guaranteed to succeed. The only remaining case of error is if a received frame offset and length exceed the ncbmbuf data storage, which would result in a CRYPTO_BUFFER_EXCEEDED error code. A notable behavior change when switching to ncbmbuf implementation is that NCB_ADD_COMPARE mode cannot be used anymore during add. Instead, crypto frame content received at a similar offset will be overwritten. A final note regarding STREAM frames parsing. For now, it is considered unnecessary to switch from ncbuf in this case. Indeed, QUIC clients do not perform aggressive fragmentation for them. Keeping ncbuf ensures that the data storage size is bigger than the equivalent ncbmbuf area. This should fix github issue #3141. This patch must be backported up to 2.6. It is first necessary to pick the relevant commits for ncbmbuf implementation prior to it.	2025-10-22 15:04:41 +02:00
Amaury Denoyelle	8b8ab2824e	MINOR: ncbmbuf: implement advance operation Implement ncbmb_advance() function for the ncbmbuf type. This allows to remove bytes in front of the buffer, regardless of the existing gaps. This is implemented by resetting the corresponding bits of the bitmap. As the previous patch, this commit must be backported prior to the fix to come on QUIC CRYPTO frames parsing.	2025-10-22 15:04:06 +02:00
Amaury Denoyelle	42c495f3d7	MINOR: ncbmbuf: implement ncbmb_data() Implement ncbmb_data() function for the ncbmbuf type. Its purpose is similar to its ncbuf counterpart : it returns the size in bytes of data starting at a specific offset until the next gap. As the previous patch, this commit must be backported prior to the fix to come on QUIC CRYPTO frames parsing.	2025-10-22 15:04:06 +02:00
Amaury Denoyelle	1e1a3aa6aa	MINOR: ncbmbuf: implement add This patch implements add operation for ncbmbuf type. This function is simpler than its ncbuf counterpart. Indeed, for now only NCB_ADD_OVERWRT mode is supported. This compromise has been chosen as ncbmbuf will be first used for QUIC CRYPTO frames handling, which does not mandate to compare existing filled blocks during insertion. As the previous patch, this commit must be backported prior to the fix to come on QUIC CRYPTO frames parsing.	2025-10-22 15:04:06 +02:00
Amaury Denoyelle	b9f91ad3ff	MINOR: ncbmbuf: define new ncbmbuf type Define ncbmbuf which is an alternative non-contiguous buffer implementation. "bm" abbreviation stands for bitmap, which reflects how gaps and filled blocks are encoded. The main purpose of this implementation is to get rid of the ncbuf limitation regarding the minimal size for gaps between two blocks of data. This commit adds the new module ncbmbuf. Along with it, some utility functions such as ncbmb_make(), ncbmb_init() and ncbmb_is_empty() are defined. Public API of ncbmbuf will be extended in the following patches. This patch is not considered a bug fix. However, it will be required to fix issue encountered on QUIC CRYPTO frames parsing. Thus, it will be necessary to backport the current patch prior to the fix to come.	2025-10-22 15:04:06 +02:00
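The bitmap idea behind ncbmbuf can be sketched as follows. This is a simplified illustration under stated assumptions, not the real ncbmbuf code: the type `struct sketch_bmbuf`, the fixed 64-byte capacity and the functions `sketch_init()`, `sketch_add()` and `sketch_data()` are all hypothetical. One bit per data byte records whether that byte is filled, so a gap can be as small as a single byte.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CAP 64 /* hypothetical data capacity, in bytes */

/* Data area plus one bitmap bit per data byte (set = filled). */
struct sketch_bmbuf {
    unsigned char data[SKETCH_CAP];
    unsigned char bitmap[SKETCH_CAP / CHAR_BIT];
};

static void sketch_init(struct sketch_bmbuf *b)
{
    memset(b, 0, sizeof(*b));
}

static int sketch_is_set(const struct sketch_bmbuf *b, size_t off)
{
    return (b->bitmap[off / CHAR_BIT] >> (off % CHAR_BIT)) & 1;
}

/*
 * Copy <len> bytes at offset <off>, overwriting anything already there
 * (overwrite-only, like the single mode the commit log describes).
 * Returns 0 on success, -1 if the block does not fit in the storage.
 */
static int sketch_add(struct sketch_bmbuf *b, size_t off,
                      const void *src, size_t len)
{
    size_t i;

    if (off > SKETCH_CAP || len > SKETCH_CAP - off)
        return -1;
    memcpy(b->data + off, src, len);
    for (i = off; i < off + len; i++)
        b->bitmap[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
    return 0;
}

/* Number of contiguous filled bytes from <off> up to the next gap. */
static size_t sketch_data(const struct sketch_bmbuf *b, size_t off)
{
    size_t n = 0;

    while (off + n < SKETCH_CAP && sketch_is_set(b, off + n))
        n++;
    return n;
}
```

The trade-off mentioned in the log is visible here: the bitmap takes space that could otherwise hold data (8 of 72 bytes in this sketch), in exchange for removing any minimum gap size.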
Amaury Denoyelle	59f0bafef2	MINOR: ncbuf: extract common types ncbuf is a module which provides a non-contiguous buffer type implementation. This patch extracts some basic types related to it into a new file ncbuf_common.h. This patch will be useful to provide a new non-contiguous buffer alternative implementation based on a bitmap. This patch is not a bug fix. However, it is necessary for ncbmbuf implementation which will be required to fix a QUIC issue on CRYPTO frames parsing. Thus, it will be necessary to backport the current patch prior to the fix to come.	2025-10-22 11:11:20 +02:00
Olivier Houchard	d5562e31bd	MEDIUM: stick-tables: Remove the table lock Remove the table lock, it was only protecting the per-table expiration date, and that task is gone.	2025-10-20 15:04:47 +02:00
Olivier Houchard	8bc8a21b25	MEDIUM: stick-tables: Use a per-shard expiration task Instead of having per-table expiration tasks, just use one per shard. The task will now go through all the tables to expire entries. When a table gets an expiration earlier than the one previously known, it will be put in a mt-list, and the task will be responsible to put it into an eb32, ordered based on the next expiration. Each per-shard task will run on a different thread, so it should lead to a better load distribution than the per-table tasks.	2025-10-20 15:04:47 +02:00
Olivier Houchard	945aa0ea82	MINOR: initcalls: Add a new initcall stage, STG_INIT_2 Add a new initcall stage, STG_INIT_2, for stuff to be called after step_init_2() is called, so after we know for sure that global.nbthread will be set. Modify stick-tables stkt_late_init() to run at STG_INIT_2 instead of STG_INIT, in anticipation for it to be enhanced and have a need for global.nbthread.	2025-10-20 15:04:41 +02:00
Olivier Houchard	7a33b90b3c	BUG/MEDIUM: mt_list: Make sure not to unlock the element twice In mt_list_delete(), if the element was not in a list, then n and p will point to it, and so setting n->prev and n->next will be enough to unlock it. Don't do it twice, as once it's been done the first time, another thread may be working with it, and may have added it to a list already, and doing it a second time can lead to list inconsistencies. This should be backported up to 2.8.	2025-10-19 23:21:42 +02:00
Frederic Lecaille	51eca5cbce	BUG/MINOR: quic: SSL counters not handled The SSL counters were not handled at all for QUIC connections. This patch implements ssl_sock_update_counters() by extracting the code from ssl_sock.c, and calls this function where applicable both in TLS/TCP and QUIC parts. Must be backported as far as 2.8.	2025-10-17 12:13:43 +02:00
Willy Tarreau	17930edecc	MEDIUM: pools: detect() when munmap() fails in UAF mode Better check that munmap() always works, otherwise it means we might have miscalculated an address, and if it fails silently, it will eat all the memory extremely quickly. Let's add a BUG_ON() on munmap's return.	2025-10-13 19:22:31 +02:00
Willy Tarreau	0e6a233217	BUG/MEDIUM: pools: fix bad freeing of aligned pools in UAF mode As reported by Christopher, in UAF mode memory release of aligned objects as introduced in commit `ef915e672a` ("MEDIUM: pools: respect pool alignment in allocations") does not work. The padding calculation in the freeing code is no longer correct since it now depends on the alignment, so munmap() fails on EINVAL. Fortunately we don't care much about it since we know it's the low bits of the passed address, which is much simpler to compute, since all mmaps are page-aligned. There's no need to backport this, as this was introduced in 3.3.	2025-10-13 19:19:39 +02:00
Willy Tarreau	fda6dc9597	MINOR: regex: use a thread-local match pointer for pcre2 The pcre2 matching requires an array of matches for grouping, that is allocated when executing the rule by pre-processing it, and that is immediately freed after use. This is quite inefficient and results in annoying patterns in "show profiling" that attribute the allocations to libpcre2 and the releases to haproxy. A good suggestion from Dragan is to pre-allocate these per thread, since the entry is not specific to a regex. In addition we're already limited to MAX_MATCH matches so we don't even have the problem of having to grow it while parsing nor processing. The current patch adds a per-thread pair of init/deinit functions to allocate a thread-local entry for that, and gets rid of the dynamic allocations. It will result in cleaner memory management patterns and slightly higher performance (+2.5%) when using pcre2.	2025-10-13 16:56:43 +02:00
Remi Tricot-Le Breton	bf5b912a62	MINOR: jwt: Add specific error code for known but unavailable certificate A certificate that does not have the 'jwt' flag enabled cannot be used for JWT validation. We now raise a specific return value so that such a case can be identified.	2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton	18ff130e9d	MINOR: jwt: Add new "jwt" certificate option This option can be used to enable the use of a given certificate for JWT verification. It defaults to 'off' so certificates that are declared in a crt-store and will be used for JWT verification must have a "jwt on" option in the configuration.	2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton	53957c50c3	MINOR: jwt: Do not look into ckch_store for jwt_verify converter We must not try to load full-on certificates for 'jwt_verify' converter anymore. 'jwt_verify_cert' is the only one that accepts a certificate.	2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton	f5632fd481	MINOR: jwt: Add new jwt_verify_cert converter This converter will be in charge of performing the same operation as the 'jwt_verify' one except that it takes a full-on pem certificate path instead of a public key path as parameter. The certificate path can be either provided directly as a string or via a variable. This allows to use certificates that are not known during init to perform token validation.	2025-10-13 10:38:52 +02:00
Remi Tricot-Le Breton	c3c0597a34	MEDIUM: jwt: Remove certificate support in jwt_verify converter The jwt_verify converter will not take full-on certificates anymore in favor of a new soon to come jwt_verify_cert. We might end up with a new jwt_verify_hmac in the future as well which would allow to deprecate the jwt_verify converter and remove the need for a specific internal tree for public keys. The logic to always look into the internal jwt tree by default and resolve to locking the ckch tree as little as possible will also be removed. This allows to get rid of the duplicated reference to EVP_PKEYs, the one in the jwt tree entry and the one in the ckch_store.	2025-10-13 10:38:52 +02:00
Christopher Faulet	4145a61101	BUG/MEDIUM: stconn: Properly forward kip to the opposite SE descriptor By refactoring the HTX to remove the extra field, a bug was introduced in the stream-connector part. The <kip> (known input payload) value of a sedesc was moved to <kop> (known output payload) using the same sedesc. Of course, this is totally wrong. <kip> value of a sedesc must be forwarded to the opposite side. In addition, the operation is performed in sc_conn_send(). In this function, we manipulate the stream-connectors. So se_fwd_kip() function was changed to use the stream-connectors directly. The function sc_ep_fwd_kip() is now called with both stream-connectors to properly forward <kip> from one side to the opposite side. The bug is 3.3-specific. No backport needed.	2025-10-10 11:01:21 +02:00
Christopher Faulet	914538cd39	MEDIUM: htx: Remove the HTX extra field Thanks to previous changes, it is now possible to remove the <extra> field from the HTX structure. HTX_FL_ALTERED_PAYLOAD flag is also removed because it is now unused.	2025-10-08 11:10:42 +02:00
Christopher Faulet	c0b6db2830	MINOR: stconn: Add two fields in sedesc to replace the HTX extra value For now, the HTX extra value is used to specify the known part, in bytes, of the HTTP payload we will receive. It may concern the full payload if a content-length is specified or the current chunk for a chunk-encoded message. The main purpose of this value is to be used on the opposite side to be able to announce chunks bigger than a buffer. It can also be used to check the validity of the payload on the sending path, to properly detect too big or too short payload. However, setting this information in the HTX message itself is not really appropriate because the information is lost when the HTX message is consumed and the underlying buffer released. So the producer must take care to always add it in all HTX messages. It is especially an issue when the payload is altered by a filter. So to fix this design issue, the information will be moved in the sedesc. It is a persistent area to save the information. In addition, to avoid the ambiguity between what the producer says and what the consumer sees, the information will be split in two fields. In this patch, the fields are added: * kip : The known input payload length * kop : The known output payload length The producer will be responsible to set <kip> value. The stream will be responsible to decrement <kip> and increment <kop> accordingly. And the consumer will be responsible to remove consumed bytes from <kop>.	2025-10-08 11:01:36 +02:00
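The division of responsibilities between producer, stream and consumer can be sketched with a small model. Everything here is a hypothetical simplification: `struct sketch_sedesc`, `sketch_fwd_kip()` and `sketch_consume()` are illustrative names, not HAProxy's API. The sketch assumes, as the later fix "Properly forward kip to the opposite SE descriptor" states, that the input side's `kip` is moved to the `kop` of the opposite descriptor.

```c
#include <assert.h>

/* Hypothetical, reduced view of a stream endpoint descriptor. */
struct sketch_sedesc {
    unsigned long long kip; /* known input payload length, set by the producer */
    unsigned long long kop; /* known output payload length, read by the consumer */
};

/* Stream step: move what is known on the input side to the opposite side. */
static void sketch_fwd_kip(struct sketch_sedesc *in, struct sketch_sedesc *out)
{
    out->kop += in->kip;
    in->kip = 0;
}

/* Consumer step: remove bytes that were actually sent from <kop>. */
static void sketch_consume(struct sketch_sedesc *out, unsigned long long bytes)
{
    out->kop -= (bytes < out->kop) ? bytes : out->kop;
}
```

Because the counters live in the descriptors rather than in the HTX message, they survive when a message is consumed and its buffer released, which is the design issue the commit sets out to fix.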
Willy Tarreau	75103e7701	MINOR: proxy: introduce proxy_abrt_close_def() to pass the desired default With this function we can now pass the desired default value for the abortonclose option when neither the option nor its opposite were set. Let's also take this opportunity for using it directly from the HTTP analyser since there's no point in re-checking the proxy's mode there.	2025-10-08 10:29:41 +02:00
Willy Tarreau	644b3dc7d8	MAJOR: proxy: enable abortonclose by default on HTTP proxies As discussed on https://github.com/orgs/haproxy/discussions/3146 and on the mailing list, there's a marked preference for having abortonclose enabled by default when relevant. The point being that with today's internet, the large majority of requests sent with a closed input channel are aborted requests, and that it's pointless to waste resources processing them. This patch now considers both "option abortonclose" and its opposite "no option abortonclose" to figure whether abortonclose is enabled or disabled in a backend. When neither is set (thus not even inherited from a defaults section), then it considers the proxy's mode, and HTTP mode implies abortonclose by default. This may make some legacy services fail starting with 3.3. In this case it will be sufficient to add "no option abortonclose" in either the affected backend or the defaults section it derives from. But for internet-facing proxies it's better to stay with the option enabled.	2025-10-08 10:29:41 +02:00
Willy Tarreau	fe47e8dfc5	MINOR: proxy: only check abortonclose through a dedicated function In order to prepare for changing the way abortonclose works, let's replace the direct flag check with a similarly named function (proxy_abrt_close) which returns the on/off status of the directive for the proxy. For now it simply reflects the flag's state.	2025-10-08 10:29:41 +02:00
William Lallemand	69bd253b23	CLEANUP: mjson: remove unused defines from mjson.h This patch removes unused defines from mjson.h. It also removes unused c++ declarations and includes. string.h is moved to mjson.c	2025-10-06 09:30:07 +02:00
William Lallemand	8ea8aaace2	CLEANUP: mjson: remove MJSON_ENABLE_BASE64 code Remove the code used under #if MJSON_ENABLE_BASE64, which is not used within haproxy, to ease the maintenance of mjson.	2025-10-03 16:09:13 +02:00
William Lallemand	4edb05eb12	CLEANUP: mjson: remove MJSON_ENABLE_NEXT code Remove the code used under #if MJSON_ENABLE_NEXT, which is not used within haproxy, to ease the maintenance of mjson.	2025-10-03 16:08:17 +02:00
William Lallemand	a4eeeeeb07	CLEANUP: mjson: remove MJSON_ENABLE_PRINT code Remove the code used under #if MJSON_ENABLE_PRINT, which is not used within haproxy, to ease the maintenance of mjson.	2025-10-03 16:07:59 +02:00
William Lallemand	d63dfa34a2	CLEANUP: mjson: remove MJSON_ENABLE_RPC code Remove the code used under #if MJSON_ENABLE_RPC, which is not used within haproxy, to ease the maintenance of mjson.	2025-10-03 16:06:33 +02:00
Willy Tarreau	1afaa7b59d	MINOR: rawsock: introduce CO_RFL_TRY_HARDER to detect closures on complete reads Normally, when reading a full buffer, or exactly the requested size, it is not really possible to know if the peer had closed immediately after, and usually we don't care. There's a problematic case, though, which is with SSL: the SSL layer reads in small chunks of a few bytes, and can consume a client_hello this way, then start computation without knowing yet that the client has aborted. In order to permit knowing more, we now introduce a new read flag, CO_RFL_TRY_HARDER, which says that if we've read up to the permitted limit and the flag is set, then we attempt one extra byte using MSG_PEEK to detect whether the connection was closed immediately after that content or not. The first use case will obviously be related to SSL and client_hello, but it might possibly also make sense on HTTP responses to detect a pending FIN at the end of a response (e.g. if a close was already advertised).	2025-10-01 10:23:01 +02:00
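The "one extra byte with MSG_PEEK" technique can be sketched with plain sockets. The function `sketch_recv_try_harder()` and its `closed` output parameter are hypothetical and do not reflect HAProxy's rawsock code; `MSG_DONTWAIT` is assumed to be available (it is on Linux and the BSDs). When the read returns exactly the requested size, a one-byte peek is attempted: a return of 0 means the peer has already closed, while pending data or `EAGAIN` means it has not, or that it is not known yet.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Read up to <count> bytes without blocking. If the read filled the whole
 * request, peek one more byte to detect a closure that immediately follows
 * the data. <*closed> is set to 1 when end-of-stream is observed.
 */
static ssize_t sketch_recv_try_harder(int fd, void *buf, size_t count,
                                      int *closed)
{
    char probe;
    ssize_t ret = recv(fd, buf, count, MSG_DONTWAIT);

    *closed = 0;
    if (ret == 0 && count > 0) {
        *closed = 1; /* plain end-of-stream */
    } else if (ret > 0 && (size_t)ret == count) {
        /* MSG_PEEK leaves any pending byte in the socket buffer */
        if (recv(fd, &probe, 1, MSG_PEEK | MSG_DONTWAIT) == 0)
            *closed = 1;
    }
    return ret;
}
```

The peek costs one extra system call per complete read, which is presumably why the log describes it as an opt-in flag rather than the default behavior.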
Willy Tarreau	25f5f357cc	MINOR: sched: pass the thread number to is_sched_alive() Now it will be possible to query any thread's scheduler state, not only the current one. This aims at simplifying the watchdog checks for reported threads. The operation is now a simple atomic xchg.	2025-10-01 10:18:53 +02:00
Olivier Houchard	cf26745857	MINOR: mt_list: Implement MT_LIST_POP_LOCKED() Implement MT_LIST_POP_LOCKED(), that behaves as MT_LIST_POP() and removes the first element from the list, if any, but keeps it locked. This should be backported to 3.2, as it will be used in a bug fix in the stick tables that affects 3.2 too.	2025-09-30 16:25:07 +02:00
William Lallemand	b70c7f48fa	MINOR: acme: implement "reuse-key" option The new "reuse-key" option in the "acme" section, allows to keep the private key instead of generating a new one at each renewal.	2025-09-27 21:41:39 +02:00
William Lallemand	3e72a9f618	MINOR: acme: provider-name for dpapi sink Like "acme-vars", the "provider-name" in the acme section is used in case of DNS-01 challenge and is sent to the dpapi sink. This is used to pass the name of a DNS provider in order to choose the DNS API to use. This patch implements the cfg_parse_acme_vars_provider() which parses either acme-vars or provider-name options and escapes their strings. Example: $ ( echo "@@1 show events dpapi -w -0"; cat - ) \| socat /tmp/master.sock - \| cat -e <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$ acme-vars "var1=foobar\"toto\",var2=var2"$ provider-name "godaddy"$ {$ "identifier": {$ "type": "dns",$ "value": "example.com"$ },$ "status": "pending",$ "expires": "2025-09-25T14:41:57Z",$ [...]	2025-09-26 10:23:35 +02:00
William Lallemand	92c31a6fb7	MINOR: acme: acme-vars allow to pass data to the dpapi sink In the case of the dns-01 challenge, the agent that handles the challenge might need some extra information which depends on the DNS provider. This patch introduces the "acme-vars" option in the acme section, which allows to pass these data to the dpapi sink. The double quotes will be escaped when printed in the sink. Example: global setenv VAR1 'foobar"toto"' acme LE directory https://acme-staging-v02.api.letsencrypt.org/directory challenge DNS-01 acme-vars "var1=${VAR1},var2=var2" Would output: $ ( echo "@@1 show events dpapi -w -0"; cat - ) \| socat /tmp/master.sock - \| cat -e <0>2025-09-18T17:53:58.831140+02:00 acme deploy foobpar.pem thumbprint gDvbPL3w4J4rxb8gj20mGEgtuicpvltnTl6j1kSZ3vQ$ acme-vars "var1=foobar\"toto\",var2=var2"$ {$ "identifier": {$ "type": "dns",$ "value": "example.com"$ },$ "status": "pending",$ "expires": "2025-09-25T14:41:57Z",$ [...]	2025-09-19 16:40:53 +02:00
Aurelien DARRAGON	5c299dee5a	MEDIUM: stats: consider that shared stats pointers may be NULL This patch looks huge, but it has a very simple goal: protect all accesses to shared stats pointers (either reads or writes), because we now consider that these pointers may be NULL. The reason behind this is that despite all precautions taken to ensure the pointers shouldn't be NULL when not expected, there are still corner cases (ie: frontend stats used on a backend with no FE cap and vice versa) where we could try to access a memory area which is not allocated. Willy stumbled on such cases while playing with the rings servers upon connection error, which eventually led to process crashes (since 3.3 when shared stats were implemented). Also, we may decide later that shared stats are optional and should be disabled on the proxy to save memory and CPU, and this patch is a step further towards that goal. So in essence, this patch ensures shared stats pointers are always initialized (including NULL), and adds necessary guards before shared stats pointers are de-referenced. Since we already had some checks for backends and listeners stats, and the pointer address retrieval should stay in cpu cache, let's hope that this patch doesn't impact stats performance much.	2025-09-18 16:49:51 +02:00
Willy Tarreau	08c6bbb542	OPTIM: sink: don't waste time calling sink_announce_dropped() if busy If we see that another thread is already busy trying to announce the dropped counter, there's no point going there, so let's just skip all that operation from sink_write() and avoid disturbing the other thread. This results in a boost from 244 to 262k req/s.	2025-09-18 09:07:35 +02:00
Willy Tarreau	361c227465	MINOR: trace: don't call strlen() on the function's name Currently there's a small mistake in the way the trace function and macros work. The calling function name is known as a constant until the macro and passed as-is to the __trace() function. That one needs to know its length and will call ist() on it, resulting in a real call to strlen() while that length was known before the call. Let's use an ist instead of a const char* for __trace() and __trace_enabled() so that we can now completely avoid calling strlen() during this operation. This has significantly reduced the importance of __trace_enabled() in perf top.	2025-09-18 08:31:57 +02:00
Willy Tarreau	8c077c17eb	MINOR: server: add the "cc" keyword to set the TCP congestion controller It is possible on at least Linux and FreeBSD to set the congestion control algorithm to be used with outgoing connections, among the list of supported and permitted ones. Let's expose this setting with "cc". Unknown or forbidden algorithms will be ignored and the default one will continue to be used.	2025-09-17 17:19:33 +02:00
Willy Tarreau	4ed3cf295d	MINOR: listener: add the "cc" bind keyword to set the TCP congestion controller It is possible on at least Linux and FreeBSD to set the congestion control algorithm to be used with incoming connections, among the list of supported and permitted ones. Let's expose this setting with "cc". Permission issues might be reported (as warnings).	2025-09-17 17:03:42 +02:00
Ben Kallus	31d0695a6a	IMPORT: ebtree: replace hand-rolled offsetof to avoid UB The C standard specifies that it's undefined behavior to dereference NULL (even if you use & right after). The hand-rolled offsetof idiom &(((s)NULL)->f) is thus technically undefined. This clutters the output of UBSan and is simple to fix: just use the real offsetof when it's available. Note that there's no clear statement about this point in the spec, only several points which together converge to this: - From N3220, 6.5.3.4: A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue. - From N3220, 6.3.2.1: An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined. - From N3220, 6.5.4.4 p3: The unary & operator yields the address of its operand. If the operand has type "type", the result has type "pointer to type". If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. => In short, this is saying that C guarantees these identities: 1. &(*p) is equivalent to p 2. &(p[n]) is equivalent to p + n As a consequence, &(*p) doesn't result in the evaluation of *p, only the evaluation of p (and similar for []). There is no corresponding special carve-out for ->. 
See also: https://pvs-studio.com/en/blog/posts/cpp/0306/ After this patch, HAProxy can run without crashing after building w/ clang-19 -fsanitize=undefined -fno-sanitize=function,alignment This is ebtree commit bd499015d908596f70277ddacef8e6fa998c01d5. Signed-off-by: Willy Tarreau <w@1wt.eu> This is ebtree commit 5211c2f71d78bf546f5d01c8d3c1484e868fac13.	2025-09-17 14:30:32 +02:00
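For reference, a container_of-style macro built on the standard offsetof() could look like the sketch below. The macro name `container_of_sketch` and both structs are made up for the example and are not the ebtree definitions.

```c
#include <stddef.h>

struct node { int a; int b; };
struct item { long key; struct node n; };

/* The hand-rolled form discussed above is shown only for comparison and is
 * never evaluated here:
 *   ((size_t)&(((type *)NULL)->field))
 * The standard offsetof() yields the same value without the questionable
 * member access through a null pointer.
 */
#define container_of_sketch(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Recovers the enclosing struct item from a pointer to its embedded node. */
static struct item *item_of(struct node *n)
{
    return container_of_sketch(n, struct item, n);
}
```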
Willy Tarreau	a31da78685	IMPORT: ebtree: add a definition of offsetof() We'll use this to improve the definition of container_of(). Let's define it if it does not exist. We can rely on __builtin_offsetof() on recent enough compilers. This is ebtree commit 1ea273e60832b98f552b9dbd013e6c2b32113aa5. Signed-off-by: Willy Tarreau <w@1wt.eu> This is ebtree commit 69b2ef57a8ce321e8de84486182012c954380401.	2025-09-17 14:30:32 +02:00
Ben Kallus	ddbff4e235	IMPORT: ebtree: Fix UB from clz(0) From 'man gcc': passing 0 as the argument to "__builtin_ctz" or "__builtin_clz" invokes undefined behavior. This triggers UBsan in HAProxy. [wt: tested in treebench and verified not to cause any performance regression with opstime-u32 nor stress-u32] Signed-off-by: Willy Tarreau <w@1wt.eu> This is ebtree commit 8c29daf9fa6e34de8c7684bb7713e93dcfe09029. Signed-off-by: Willy Tarreau <w@1wt.eu> This is ebtree commit cf3b93736cb550038325e1d99861358d65f70e9a.	2025-09-17 14:30:32 +02:00
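A minimal sketch of guarding the builtin against a zero argument follows. It assumes GCC or clang and a 32-bit `unsigned int`; `highest_bit32()` is a hypothetical helper, not the ebtree function touched by this commit.

```c
#include <stdint.h>

/* Position of the highest set bit (0..31), or -1 when x is zero.
 * The explicit zero test keeps __builtin_clz() away from the one input
 * for which its result is undefined.
 */
static int highest_bit32(uint32_t x)
{
    if (x == 0)
        return -1;
    return 31 - __builtin_clz(x);
}
```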
Willy Tarreau	52c6dd773d	IMPORT: ebst: use prefetching in lookup() and insert() While the previous optimizations couldn't be preserved due to the possibility of out-of-bounds accesses, at least the prefetch is useful. A test on treebench shows that for 64k short strings, the lookup time falls from 276 to 199ns per lookup (28% savings), and the insert falls from 311 to 296ns (4.9% savings), which are pretty respectable, so let's do this. This is ebtree commit b44ea5d07dc1594d62c3a902783ed1fb133f568d.	2025-09-17 14:30:32 +02:00
Willy Tarreau	fef4cfbd21	IMPORT: ebtree: only use __builtin_prefetch() when supported It looks like __builtin_prefetch() appeared in gcc-3.1 as there's no mention of it in 3.0's doc. Let's replace it with eb_prefetch() which maps to __builtin_prefetch() on supported compilers and falls back to the usual do{}while(0) on other ones. It was tested to properly build with tcc as well as gcc-2.95. This is ebtree commit 7ee6ede56a57a046cb552ed31302b93ff1a21b1a.	2025-09-17 14:30:32 +02:00
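The wrapper described above can be approximated as follows. The macro name `my_prefetch` and the version test are assumptions made for this sketch; the real eb_prefetch() may use a different detection condition.

```c
#include <stddef.h>

/* Map to the builtin on compilers that claim gcc >= 3.1 compatibility,
 * otherwise expand to an empty statement.
 */
#if defined(__GNUC__) && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))
#define my_prefetch(addr, rw) __builtin_prefetch((addr), (rw))
#else
#define my_prefetch(addr, rw) do { } while (0)
#endif

struct lnode { struct lnode *next; int value; };

/* Walks a list, hinting the next element before using the current one. */
static long sum_list(const struct lnode *n)
{
    long total = 0;

    while (n) {
        my_prefetch(n->next, 0); /* read hint; harmless when next is NULL */
        total += n->value;
        n = n->next;
    }
    return total;
}
```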
Willy Tarreau	3dda813d54	IMPORT: eb32/64: optimize insert for modern CPUs Similar to previous patches, let's improve the insert() descent loop to avoid discovering mandatory data too late. The change here is even simpler than previous ones, a prefetch was installed and troot is calculated before last instruction in a speculative way. This was enough to gain +50% insertion rate on random data. This is ebtree commit e893f8cc4d44b10f406b9d1d78bd4a9bd9183ccf.	2025-09-17 14:30:32 +02:00
Willy Tarreau	61654c07bd	IMPORT: ebmb: optimize the lookup for modern CPUs These are the same principles as for the latest improvements made on integer trees. Applying the same recipes made the ebmb_lookup() function jump from 10.07 to 12.25 million lookups per second on a 10k random values tree (+21.6%). It's likely that the ebmb_lookup_longest() code could also benefit from this, though this was neither explored nor tested. This is ebtree commit a159731fd6b91648a2fef3b953feeb830438c924.	2025-09-17 14:30:32 +02:00
Willy Tarreau	6c54bf7295	IMPORT: eb32/eb64: place an unlikely() on the leaf test In the loop we can help the compiler build slightly more efficient code by placing an unlikely() around the leaf test. This shows a consistent 0.5% performance gain both on eb32 and eb64. This is ebtree commit 6c9cdbda496837bac1e0738c14e42faa0d1b92c4.	2025-09-17 14:30:32 +02:00
Willy Tarreau	384907f4e7	IMPORT: eb32: drop the now useless node_bit variable This one was previously used to preload from the node and keep a copy in a register on i386 machines with few registers. With the new more optimal code it's totally useless, so let's get rid of it. By the way the 64 bit code didn't use that at all already. This is ebtree commit 1e219a74cfa09e785baf3637b6d55993d88b47ef.	2025-09-17 14:30:31 +02:00
Willy Tarreau	c9e4adf608	IMPORT: eb32/eb64: use a more parallelizable check for lack of common bits Instead of shifting the XOR value right and comparing it to 1, which roughly requires 2 sequential instructions, better test if the XOR has any bit above the current bit, which means any bit set among those strictly higher, or in other words that XOR & (-bit << 1) is non-zero. This is one less instruction in the fast path and gives another nice performance gain on random keys (in million lookups/s): eb32 1k: 33.17 -> 37.30 +12.5% 10k: 15.74 -> 17.08 +8.51% 100k: 8.00 -> 9.00 +12.5% eb64 1k: 34.40 -> 38.10 +10.8% 10k: 16.17 -> 17.10 +5.75% 100k: 8.38 -> 8.87 +5.85% This is ebtree commit c942a2771758eed4f4584fe23cf2914573817a6b.	2025-09-17 14:30:31 +02:00
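The two forms of the test can be compared with the sketch below, which reads "bit" in the expression above as the pre-computed mask 1 << bit. That reading, and both function names, are assumptions of this example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Older form: shift first, then compare (two dependent steps). */
static bool differs_above_shift(uint32_t x, unsigned int bit)
{
    return (x >> bit) > 1;
}

/* Newer form: with mask = 1 << bit already available, (-mask << 1) has
 * every bit strictly above <bit> set, so one AND answers the question.
 */
static bool differs_above_mask(uint32_t x, uint32_t mask)
{
    return (x & ((0u - mask) << 1)) != 0;
}
```

Both functions agree for every bit position from 0 to 31, including bit 31 where the shifted mask becomes zero and neither test can match.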
Willy Tarreau	6af17d491f	IMPORT: eb32/eb64: reorder the lookup loop for modern CPUs The current code calculates the next troot based on a calculation. This was efficient when the algorithm was developed many years ago on K6 and K7 CPUs running at low frequencies with few registers and limited branch prediction units but nowadays with ultra-deep pipelines and high latency memory that's no longer efficient, because the CPU needs to have completed multiple operations before knowing which address to start fetching from. It's sad because we only have two branches each time but the CPU cannot know it. In addition, the calculation is performed late in the loop, which does not help the address generation unit to start prefetching next data. Instead we should help the CPU by preloading data early from the node and calculating troot as soon as possible. The CPU will be able to postpone that processing until the dependencies are available and it really needs to dereference it. In addition we must absolutely avoid serializing instructions such as "(a >> b) & 1" because there's no way for the compiler to parallelize that code nor for the CPU to pre-process some early data. What this patch does is relatively simple: - we try to prefetch the next two branches as soon as the node is known, which will help dereference the selected node in the next iteration; it was shown that it only works with the next changes though, otherwise it can reduce the performance instead. In practice the prefetching will start a bit later once the node is really in the cache, but since there's no dependency between these instructions and any other one, we let the CPU optimize as it wants. - we preload all important data from the node (next two branches, key and node.bit) very early even if not immediately needed. This is cheap, it doesn't cause any pipeline stall and speeds up later operations. 
- we pre-calculate 1<<bit that we assign into a register, so as to avoid serializing instructions when deciding which branch to take. - we assign the troot based on a ternary operation (or if/else) so that the CPU knows upfront the two possible next addresses without waiting for the end of a calculation and can prefetch their contents every time the branch prediction unit guesses right. Just doing this provides significant gains at various tree sizes on random keys (in million lookups per second): eb32 1k: 29.07 -> 33.17 +14.1% 10k: 14.27 -> 15.74 +10.3% 100k: 6.64 -> 8.00 +20.5% eb64 1k: 27.51 -> 34.40 +25.0% 10k: 13.54 -> 16.17 +19.4% 100k: 7.53 -> 8.38 +11.3% The performance is now much closer to the sequential keys. This was done for all variants ({32,64}{,i,le,ge}). Another point, the equality test in the loop improves the performance when looking up random keys (since we don't need to reach the leaf), but is counter-productive for sequential keys, which can gain ~17% without that test. However sequential keys are normally not used with exact lookups, but rather with lookup_ge() that spans a time frame, and which does not have that test for this precise reason, so in the end both use cases are served optimally. It's interesting to note that everything here is solely based on data dependencies, and that trying to perform less operations upfront always ends up with lower performance (typically the original one). This is ebtree commit 05a0613e97f51b6665ad5ae2801199ad55991534.	2025-09-17 14:30:31 +02:00
Aurelien DARRAGON	644b6b9925	MINOR: counters: document that tg shared counters are tied to shm-stats-file mapping Let's explicitly mention that fe_counters_shared_tg and be_counters_shared_tg structs are embedded in shm_stats_file_object struct so any change in those structs will result in shm stats file incompatibility between processes, thus extra precaution must be taken when making changes to them. Note that the provisioning made in shm_stats_file_object struct could be used to add members to {fe,be}_counters_shared_tg without changing shm_stats_file_object struct size if needed in order to preserve shm stats file version.	2025-09-17 11:31:29 +02:00
Willy Tarreau	4edff4a2cc	CLEANUP: vars: use the item API for the variables trees The variables trees use the immediate cebtree API, better use the item one which is more expressive and safer. The "node" field was renamed to "name_node" to avoid any ambiguity.	2025-09-16 10:51:23 +02:00
Willy Tarreau	2d6b5c7a60	MEDIUM: connection: reintegrate conn_hash_node into connection Previously the conn_hash_node was placed outside the connection due to the big size of the eb64_node that could have negatively impacted frontend connections. But having it outside also means that one extra allocation is needed for each backend connection, and that one memory indirection is needed for each lookup. With the compact trees, the tree node is smaller (16 bytes vs 40) so the overhead is much lower. By integrating it into the connection, we're also eliminating one pointer from the connection to the hash node and one pointer from the hash node to the connection (in addition to the extra object bookkeeping). This results in saving at least 24 bytes per total backend connection, and only inflates connections by 16 bytes (from 240 to 256), which is a reasonable compromise. Tests on a 64-core EPYC show a 2.4% increase in the request rate (from 2.08 to 2.13 Mrps).	2025-09-16 09:23:46 +02:00
Willy Tarreau	ceaf8c1220	MEDIUM: connection: move idle connection trees to ceb64 Idle connection trees currently require a 56-byte conn_hash_node per connection, which can be reduced to 32 bytes by moving to ceb64. While ceb64 is theoretically slower, in practice here we're essentially dealing with trees that almost always contain a single key and many duplicates. In this case, ceb64 insert and lookup functions become faster than eb64 ones because all duplicates are a list accessed in O(1) while it's a subtree for eb64. In tests it is impossible to tell the difference between the two, so it's worth reducing the memory usage. This commit brings the following memory savings to conn_hash_node (one per backend connection), and to srv_per_thread (one per thread and per server), in bytes (before -> after): conn_hash_node 56 -> 32 (-24); srv_per_thread 96 -> 72 (-24). The delicate part is conn_delete_from_tree(), because we need to know the tree root the connection is attached to. But thanks to recent cleanups, it's now clear enough (i.e. idle/safe/avail vs session are easy to distinguish).	2025-09-16 09:23:46 +02:00
Willy Tarreau	95b8adff67	MINOR: connection: pass the thread number to conn_delete_from_tree() We'll soon need to choose the server's root based on the connection's flags, and for this we'll need the thread it's attached to, which is not always the current one. This patch simply passes the thread number from all callers. They know it because they just set the idle_conns lock on it prior to calling the function.	2025-09-16 09:23:46 +02:00
Willy Tarreau	7773d87ea6	CLEANUP: proxy: slightly reorganize fields to plug some holes The proxy struct has several small holes that deserved being plugged by moving a few fields around. Now we're down to 3056 from 3072 previously, and the remaining holes are small. At the moment, compared to before this series, we're seeing these sizes: type\size `7d554ca62` current delta listener 752 704 -48 (-6.4%) server 4032 3840 -192 (-4.8%) proxy 3184 3056 -128 (-4%) stktable 3392 3328 -64 (-1.9%) Configs with many servers have shrunk by about 4% in RAM and configs with many proxies by about 3%.	2025-09-16 09:23:46 +02:00
Willy Tarreau	8df81b6fcc	CLEANUP: server: slightly reorder fields in the struct to plug holes The struct server still has a lot of holes and padding that make it quite big. By moving a few fields around between areas which do not interact (e.g. boot vs aligned areas), it's quite easy to plug some of them and/or to arrange larger ones which could be reused later with a bit more effort. Here we've reduced holes by 40 bytes, allowing the struct to shrink by one more cache line (64 bytes). The new size is 3840 bytes.	2025-09-16 09:23:46 +02:00
Willy Tarreau	d18d972b1f	MEDIUM: server: index server ID using compact trees The server ID is currently stored as a 32-bit int using an eb32 tree. It's used essentially to find holes in order to automatically assign IDs, and to detect duplicates. Let's change this to use compact trees instead in order to save 24 bytes in struct server for this node, plus 8 bytes in struct proxy. The server struct is still 3904 bytes large (due to alignment) and the proxy struct is 3072.	2025-09-16 09:23:46 +02:00
Willy Tarreau	66191584d1	MEDIUM: listener: index listener ID using compact trees The listener ID is currently stored as a 32-bit int using an eb32 tree. It's used essentially to find holes in order to automatically assign IDs, and to detect duplicates. Let's change this to use compact trees instead in order to save 24 bytes in struct listener for this node, plus 8 bytes in struct proxy. The struct listener is now 704 bytes large, and the struct proxy 3080.	2025-09-16 09:23:46 +02:00
Willy Tarreau	1a95bc42c7	MEDIUM: proxy: index proxy ID using compact trees The proxy ID is currently stored as a 32-bit int using an eb32 tree. It's used essentially to find holes in order to automatically assign IDs, and to detect duplicates. Let's change this to use compact trees instead in order to save 24 bytes in struct proxy for this node, plus 8 bytes in the root (which is static so not much relevant here). Now the proxy is 3088 bytes large.	2025-09-16 09:23:46 +02:00
Willy Tarreau	eab5b89dce	MINOR: proxy: add proxy_index_id() to index a proxy by its ID This avoids needlessly exposing the tree's root and the mechanics outside of the low-level code.	2025-09-16 09:23:46 +02:00
Willy Tarreau	5e4b6714e1	MINOR: listener: add listener_index_id() to index a listener by its ID This avoids needlessly exposing the tree's root and the mechanics outside of the low-level code.	2025-09-16 09:23:46 +02:00
Willy Tarreau	5a5cec4d7a	MINOR: server: add server_index_id() to index a server by its ID This avoids needlessly exposing the tree's root and the mechanics outside of the low-level code.	2025-09-16 09:23:46 +02:00
Willy Tarreau	0b0aefe19b	MINOR: server: add server_get_next_id() to find next free server ID This was previously achieved via the generic get_next_id() but we'll soon get rid of generic ID trees so let's have a dedicated server_get_next_id(). As a bonus it reduces the exposure of the tree's root outside of the functions.	2025-09-16 09:23:46 +02:00
Willy Tarreau	23605eddb1	MINOR: listener: add listener_get_next_id() to find next free listener ID This was previously achieved via the generic get_next_id() but we'll soon get rid of generic ID trees so let's have a dedicated listener_get_next_id(). As a bonus it reduces the exposure of the tree's root outside of the functions.	2025-09-16 09:23:46 +02:00
Willy Tarreau	b2402d67b7	MINOR: proxy: add proxy_get_next_id() to find next free proxy ID This was previously achieved via the generic get_next_id() but we'll soon get rid of generic ID trees so let's have a dedicated proxy_get_next_id().	2025-09-16 09:23:46 +02:00
Willy Tarreau	f4059ea42f	MEDIUM: stktable: index table names using compact trees Here we're saving 64 bytes per stick-table, from 3392 to 3328, and the change was really straightforward so there's no reason not to do it.	2025-09-16 09:23:46 +02:00
Willy Tarreau	d0d60a007d	MEDIUM: proxy: switch conf.name to cebis_tree This is used to index the proxy's name and it contains a copy of the pointer to the proxy's name in <id>. Changing that for a ceb_node placed just before <id> saves 32 bytes to the struct proxy, which is now 3112 bytes large. Here we need to continue to support duplicates since they're still allowed between type-incompatible proxies. Interestingly, the use of cebis_next_dup() instead of cebis_next() in proxy_find_by_name() allows us to get rid of an strcmp() that was performed for each use_backend rule. A test with a large config (100k backends) shows that we can get 3% extra performance on a config involving a static use_backend rule (3.09M to 3.18M rps), and even 4.5% on a dynamic rule selecting a random backend (2.47M to 2.59M).	2025-09-16 09:23:46 +02:00
Willy Tarreau	fdf6fd5b45	MEDIUM: server: switch the host_dn member to cebis_tree This member is used to index the hostname_dn contents for DNS resolution. Let's replace it with a cebis_tree to save another 32 bytes (24 for the node + 8 by avoiding the duplication of the pointer). The struct server is now at 3904 bytes.	2025-09-16 09:23:46 +02:00
Willy Tarreau	413e903a22	MEDIUM: server: switch conf.name to cebis_tree This is used to index the server name and it contains a copy of the pointer to the server's name in <id>. Changing that for a ceb_node placed just before <id> saves 32 bytes to the struct server, which remains 3968 bytes large due to alignment. The proxy struct shrinks by 8 bytes to 3144. It's worth noting that the current way duplicate names are handled remains based on the previous mechanism where dups were permitted. Ideally we should now reject them during insertion and use unique key trees instead.	2025-09-16 09:23:46 +02:00
Willy Tarreau	0e99f64fc6	MEDIUM: server: switch addr_node to cebis_tree This contains the text representation of the server's address, for use with stick-tables with "srvkey addr". Switching them to a compact node saves 24 more bytes from this structure. The key was moved to an external pointer "addr_key" right after the node. The server struct is now 3968 bytes (down from 4032) due to alignment, and the proxy struct shrinks by 8 bytes to 3152.	2025-09-16 09:23:46 +02:00
Willy Tarreau	91258fb9d8	MEDIUM: guid: switch guid to more compact cebuis_tree The current guid struct size is 56 bytes. Once reduced using compact trees, it goes down to 32 (almost half). We're not on a critical path and size matters here, so better switch to this. It's worth noting that the name part could also be stored in the guid_node at the end to save 8 extra bytes (no pointer needed anymore), however the purpose of this struct is to be embedded into other ones, which is not compatible with having a dynamic size. Affected struct sizes in bytes (before -> after): server 4032 -> 4032 (0*); proxy 3184 -> 3160 (-24); listener 752 -> 728 (-24). *: struct server is full of holes and padding (176 bytes) and is 64-byte aligned. Moving the guid_node elsewhere such as after sess_conn reduces it to 3968, or one less cache line. There's no point in moving anything now because forthcoming patches will arrange other parts.	2025-09-16 09:23:46 +02:00
Willy Tarreau	e36b3b60b3	MEDIUM: migrate the patterns reference to cebs_tree cebs_tree are 24 bytes smaller than ebst_tree (16B vs 40B), and pattern references are only used during map/acl updates, so their storage is pure loss between updates (which most of the time never happen). By switching their indexing to compact trees, we can save 16 to 24 bytes per entry depending on alignment (here it's 24 per struct but 16 practical as malloc's alignment keeps 8 unused). Tested on core i7-8650U running at 3.0 GHz, with a file containing 17.7M IP addresses (16.7M different): $ time ./haproxy -c -f acl-ip.cfg This saves 280 MB of RAM for 17.7M IP addresses, and slightly speeds up the startup (5.8%, from 19.2s to 18.2s), a part of which possibly being attributed to having to write less memory. Note that this is on small strings. On larger ones such as user-agents, ebtree doesn't reread the whole key and might be more efficient. Before: RAM (VSZ/RSS): 4443912 3912444 real 0m19.211s user 0m18.138s sys 0m1.068s Overhead Command Shared Object Symbol 44.79% haproxy haproxy [.] ebst_insert 25.07% haproxy haproxy [.] ebmb_insert_prefix 3.44% haproxy libc-2.33.so [.] __libc_calloc 2.71% haproxy libc-2.33.so [.] _int_malloc 2.33% haproxy haproxy [.] free_pattern_tree 1.78% haproxy libc-2.33.so [.] inet_pton4 1.62% haproxy libc-2.33.so [.] _IO_fgets 1.58% haproxy libc-2.33.so [.] _int_free 1.56% haproxy haproxy [.] pat_ref_push 1.35% haproxy libc-2.33.so [.] malloc_consolidate 1.16% haproxy libc-2.33.so [.] __strlen_avx2 0.79% haproxy haproxy [.] pat_idx_tree_ip 0.76% haproxy haproxy [.] pat_ref_read_from_file 0.60% haproxy libc-2.33.so [.] __strrchr_avx2 0.55% haproxy libc-2.33.so [.] unlink_chunk.constprop.0 0.54% haproxy libc-2.33.so [.] __memchr_avx2 0.46% haproxy haproxy [.] pat_ref_append After: RAM (VSZ/RSS): 4166108 3634768 real 0m18.114s user 0m17.113s sys 0m0.996s Overhead Command Shared Object Symbol 38.99% haproxy haproxy [.] cebs_insert 27.09% haproxy haproxy [.] 
ebmb_insert_prefix 3.63% haproxy libc-2.33.so [.] __libc_calloc 3.18% haproxy libc-2.33.so [.] _int_malloc 2.69% haproxy haproxy [.] free_pattern_tree 1.99% haproxy libc-2.33.so [.] inet_pton4 1.74% haproxy libc-2.33.so [.] _IO_fgets 1.73% haproxy libc-2.33.so [.] _int_free 1.57% haproxy haproxy [.] pat_ref_push 1.48% haproxy libc-2.33.so [.] malloc_consolidate 1.22% haproxy libc-2.33.so [.] __strlen_avx2 1.05% haproxy libc-2.33.so [.] __strcmp_avx2 0.80% haproxy haproxy [.] pat_idx_tree_ip 0.74% haproxy libc-2.33.so [.] __memchr_avx2 0.69% haproxy libc-2.33.so [.] __strrchr_avx2 0.69% haproxy libc-2.33.so [.] _IO_getline_info 0.62% haproxy haproxy [.] pat_ref_read_from_file 0.56% haproxy libc-2.33.so [.] unlink_chunk.constprop.0 0.56% haproxy libc-2.33.so [.] cfree@GLIBC_2.2.5 0.46% haproxy haproxy [.] pat_ref_append If the addresses are totally disordered (via "shuf" on the input file), we see both implementations reach exactly 68.0s (slower due to much higher cache miss ratio). On large strings such as user agents (1 million here), it's now slightly slower (+9%): Before: real 0m2.475s user 0m2.316s sys 0m0.155s After: real 0m2.696s user 0m2.544s sys 0m0.147s But such patterns are much less common than short ones, and the memory savings do still count. Note that while it could be tempting to get rid of the list that chains all these pat_ref_elt together and only enumerate them by walking along the tree to save 16 extra bytes per entry, that's not possible due to the problem that insertion ordering is critical (think overlapping regex such as /index.* and /index.html). Currently it's not possible to proceed differently because patterns are first pre-loaded into the pat_ref via pat_ref_read_from_file_smp() and later indexed by pattern_read_from_file(), which has to only redo the second part anyway for maps/acls declared multiple times.	2025-09-16 09:23:46 +02:00
Willy Tarreau	ddf900a0ce	IMPORT: cebtree: import version 0.5.0 to support duplicates The support for duplicates is necessary for various use cases related to config names, so let's upgrade to the latest version which brings this support. This updates the cebtree code to commit 808ed67 (tag 0.5.0). A few tiny adaptations were needed: - replace a few ceb_node with ceb_root since pointers are now tagged ; - replace cebu.h with ceb.h since both are now merged in the same include file. This way we can drop the unused cebu*.h files from cebtree that are provided only for compatibility. - rename immediate storage functions to cebXX_imm_XXX() as per the API change in 0.5 that makes immediate explicit rather than implicit. This only affects vars and tools.c:copy_file_name(). The tests continue to work.	2025-09-16 09:23:46 +02:00
Remi Tricot-Le Breton	257df69fbd	BUG/MINOR: ocsp: Crash when updating CA during ocsp updates If an ocsp response is set to be updated automatically and some certificate or CA updates are performed on the CLI, if the CLI update happens while the OCSP response is being updated and is then detached from the update tree, it might be wrongly inserted into the update tree in 'ssl_sock_load_ocsp', and then reinserted when the update finishes. The update tree then gets corrupted and we could end up crashing when accessing other nodes in the ocsp response update tree. This patch must be backported up to 2.8. This patch fixes GitHub #3100.	2025-09-15 15:34:36 +02:00
Aurelien DARRAGON	6a92b14cc1	MEDIUM: log/proxy: store log-steps selection using a bitmask, not an eb tree An eb tree was used to anticipate an unbounded number of custom log steps configured at a proxy level. It turns out it makes no sense to configure that many logging steps for a proxy, and the cost of the eb tree is non-negligible in terms of memory footprint, especially when used in a default section. Instead, let's use a simple bitmask, which allows up to 64 logging steps configured at proxy level. If we lack space some day (and need more than 64 logging steps to be configured), we could simply modify "struct log_steps" to spread the bitmask over multiple 64-bit integers, with some minor adjustments where the mask is set and checked.	2025-09-15 10:29:02 +02:00
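A bitmask limited to 64 entries can be sketched as follows. `struct step_mask`, `step_set()` and `step_isset()` are hypothetical names for this illustration and do not reflect the actual layout of "struct log_steps".

```c
#include <stdint.h>
#include <stdbool.h>

/* One bit per logging step, so at most 64 steps fit. */
struct step_mask {
    uint64_t bits;
};

/* Enables a step; returns false when the index does not fit in the mask. */
static bool step_set(struct step_mask *m, unsigned int step)
{
    if (step >= 64)
        return false;
    m->bits |= UINT64_C(1) << step;
    return true;
}

/* Tells whether a step was enabled. */
static bool step_isset(const struct step_mask *m, unsigned int step)
{
    return step < 64 && ((m->bits >> step) & 1);
}
```

Compared with a tree, the whole selection fits in 8 bytes and both operations are a shift and a mask.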
Christopher Faulet	b582fd41c2	Revert "BUG/MINOR: ocsp: Crash when updating CA during ocsp updates" This reverts commit `167ea8fc7b`. The patch was backported by mistake.	2025-09-15 10:16:20 +02:00
Remi Tricot-Le Breton	167ea8fc7b	BUG/MINOR: ocsp: Crash when updating CA during ocsp updates If an ocsp response is set to be updated automatically and some certificate or CA updates are performed on the CLI, if the CLI update happens while the OCSP response is being updated and is then detached from the update tree, it might be wrongly inserted into the update tree in 'ssl_sock_load_ocsp', and then reinserted when the update finishes. The update tree then gets corrupted and we could end up crashing when accessing other nodes in the ocsp response update tree. This patch must be backported up to 2.8. This patch fixes GitHub #3100.	2025-09-15 08:20:16 +02:00
Willy Tarreau	8fb5ae5cc6	MINOR: activity/memory: count allocations performed under a lock By checking the current thread's locking status, it becomes possible to know during a memory allocation whether it's performed under a lock or not. Both pools and memprofile functions were instrumented to check for this and to increment the memprofile bin's locked_calls counter. This one, when not zero, is reported on "show profiling memory" with a percentage of all allocations that such locked allocations represent. This way it becomes possible to try to target certain code paths that are particularly expensive. Example: $ socat - /tmp/sock1 <<< "show profiling memory"|grep lock 20297301 0 2598054528 0| 0x62a820fa3991 sockaddr_alloc+0x61/0xa3 p_alloc(128) [pool=sockaddr] [locked=54962 (0.2 %)] 0 20297301 0 2598054528| 0x62a820fa3a24 sockaddr_free+0x44/0x59 p_free(-128) [pool=sockaddr] [locked=34300 (0.1 %)] 9908432 0 1268279296 0| 0x62a820eb8524 main+0x81974 p_alloc(128) [pool=task] [locked=9908432 (100.0 %)] 9908432 0 554872192 0| 0x62a820eb85a6 main+0x819f6 p_alloc(56) [pool=tasklet] [locked=9908432 (100.0 %)] 263001 0 63120240 0| 0x62a820fa3c97 conn_new+0x37/0x1b2 p_alloc(240) [pool=connection] [locked=20662 (7.8 %)] 71643 0 47307584 0| 0x62a82105204d pool_get_from_os_noinc+0x12d/0x161 posix_memalign(660) [locked=5393 (7.5 %)]	2025-09-11 16:32:34 +02:00
Willy Tarreau	9d8c2a888b	MINOR: activity: collect CPU time spent on memory allocations for each task When task profiling is enabled, the pool alloc/free code will measure the time it takes to perform memory allocation after a cache miss or memory freeing to the shared cache or OS. The time taken with the thread-local cache is never measured as measuring that time is very expensive compared to the pool access time. Here doing so costs around 2% performance at 2M req/s, only when task profiling is enabled, so this remains reasonable. The scheduler takes care of collecting that time and updating the sched_activity entry corresponding to the current task when task profiling is enabled. The goal clearly is to track places that are wasting CPU time allocating and releasing too often, or causing large evictions. This appears like this in "show profiling tasks aggr": Tasks activity over 11.428 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg mem_avg lat_avg process_stream 44183891 16.47m 22.36us 491.0ns 1.154us 1.000ns 101.1us h1_io_cb 57386064 4.011m 4.193us 20.00ns 16.00ns - 29.47us sc_conn_io_cb 42088024 49.04s 1.165us - - - 54.67us h1_timeout_task 438171 196.5ms 448.0ns - - - 100.1us srv_cleanup_toremove_conns 65 1.468ms 22.58us 184.0ns 87.00ns - 101.3us task_process_applet 3 508.0us 169.3us - 107.0us 1.847us 29.67us srv_cleanup_idle_conns 6 225.3us 37.55us 15.74us 36.84us - 49.47us accept_queue_process 2 45.62us 22.81us - - 4.949us 54.33us	2025-09-11 16:32:34 +02:00
Willy Tarreau	195794eb59	MINOR: activity: add a new mem_avg column to show profiling stats This new column will be used for reporting the average time spent allocating or freeing memory in a task when task profiling is enabled. For now it is not updated.	2025-09-11 16:32:34 +02:00
Willy Tarreau	98cc815e3e	MINOR: activity: collect time spent with a lock held for each task When DEBUG_THREAD > 0 and task profiling enabled, we'll now measure the time spent with at least one lock held for each task. The time is collected by locking operations when locks are taken raising the level to one, or released resetting the level. An accumulator is updated in the thread_ctx struct that is collected by the scheduler when the task returns, and updated in the sched_activity entry of the related task. This allows to observe figures like this one: Tasks activity over 259.516 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg lat_avg h1_io_cb 15466589 2.574m 9.984us - - 33.45us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup sc_conn_io_cb 8047994 8.325s 1.034us - - 870.1us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup process_stream 7734689 4.356m 33.79us 1.990us 1.641us 1.554ms <- sc_notify@src/stconn.c:1206 task_wakeup process_stream 7734292 46.74m 362.6us 278.3us 132.2us 972.0us <- stream_new@src/stream.c:585 task_wakeup sc_conn_io_cb 7733158 46.88s 6.061us - - 68.78us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup task_process_applet 6603593 4.484m 40.74us 16.69us 34.00us 96.47us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup task_process_applet 4761796 3.420m 43.09us 18.79us 39.28us 138.2us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup process_table_expire 4710662 4.880m 62.16us 9.648us 53.95us 158.6us <- run_tasks_from_lists@src/task.c:671 task_queue stktable_add_pend_updates 4171868 6.786s 1.626us - 1.487us 47.94us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup h1_io_cb 2871683 1.198s 417.0ns 70.00ns 69.00ns 1.005ms <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup process_peer_sync 2304957 5.368s 2.328us - 1.156us 68.54us <- stktable_add_pend_updates@src/stick_table.c:873 task_wakeup process_peer_sync 1388141 3.174s 2.286us - 1.130us 52.31us <- 
run_tasks_from_lists@src/task.c:671 task_queue stktable_add_pend_updates 463488 3.530s 7.615us 2.000ns 7.134us 771.2us <- stktable_touch_with_exp@src/stick_table.c:654 tasklet_wakeup Here we see that almost the entirety of stktable_add_pend_updates() is spent under a lock, that 1/3 of the execution time of process_stream() was performed under a lock and that 2/3 of it was spent waiting for a lock (this is related to the 10 track-sc present in this config), and that the locking time in process_peer_sync() has now significantly reduced. This is more visible with "show profiling tasks aggr": Tasks activity over 475.354 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lkd_avg lat_avg h1_io_cb 25742539 3.699m 8.622us 11.00ns 10.00ns 188.0us sc_conn_io_cb 22565666 1.475m 3.920us - - 473.9us process_stream 21665212 1.195h 198.6us 140.6us 67.08us 1.266ms task_process_applet 16352495 11.31m 41.51us 17.98us 36.55us 112.3us process_peer_sync 7831923 17.15s 2.189us - 1.107us 41.27us process_table_expire 6878569 6.866m 59.89us 9.359us 51.91us 151.8us stktable_add_pend_updates 6602502 14.77s 2.236us - 2.060us 119.8us h1_timeout_task 801 703.4us 878.0ns - - 185.7us srv_cleanup_toremove_conns 347 12.43ms 35.82us 240.0ns 70.00ns 1.924ms accept_queue_process 142 1.384ms 9.743us - - 340.6us srv_cleanup_idle_conns 74 475.0us 6.418us 896.0ns 5.667us 114.6us	2025-09-11 16:32:34 +02:00
Willy Tarreau	95433f224e	MINOR: activity: add a new lkd_avg column to show profiling stats This new column will be used for reporting the average time spent in a task with at least one lock held. It will only have a non-zero value when DEBUG_THREAD > 0. For now it is not updated.	2025-09-11 16:32:34 +02:00
Willy Tarreau	4b23b2ed32	MINOR: thread: add a lock level information in the thread_ctx The new lock_level field indicates the number of cumulated locks that are held by the current thread. It's fed as soon as DEBUG_THREAD is at least 1. In addition, thread_isolate() adds 128, so that it's even possible to check for combinations of both. The value is also reported in thread dumps (warnings and panics).	2025-09-11 16:32:34 +02:00
Willy Tarreau	503084643f	MINOR: activity: collect time spent waiting on a lock for each task When DEBUG_THREAD > 0, and if task profiling is enabled, then each locking attempt will measure the time it takes to obtain the lock, then add that time to a thread_ctx accumulator that the scheduler will then retrieve to update the current task's sched_activity entry. The value will then appear averaged over the number of calls in the lkw_avg column of "show profiling tasks", such as below: Tasks activity over 48.298 sec till 0.000 sec ago: function calls cpu_tot cpu_avg lkw_avg lat_avg h1_io_cb 3200170 26.81s 8.377us - 32.73us <- sock_conn_iocb@src/sock.c:1099 tasklet_wakeup sc_conn_io_cb 1657841 1.645s 992.0ns - 853.0us <- sc_app_chk_rcv_conn@src/stconn.c:844 tasklet_wakeup process_stream 1600450 49.16s 30.71us 1.936us 1.392ms <- sc_notify@src/stconn.c:1206 task_wakeup process_stream 1600321 7.770m 291.3us 209.1us 901.6us <- stream_new@src/stream.c:585 task_wakeup sc_conn_io_cb 1599928 7.975s 4.984us - 65.77us <- h1_wake_stream_for_recv@src/mux_h1.c:3633 tasklet_wakeup task_process_applet 997609 46.37s 46.48us 16.80us 113.0us <- sc_app_chk_snd_applet@src/stconn.c:1043 appctx_wakeup process_table_expire 922074 48.79s 52.92us 7.275us 181.1us <- run_tasks_from_lists@src/task.c:670 task_queue stktable_add_pend_updates 705423 1.511s 2.142us - 56.81us <- stktable_add_pend_updates@src/stick_table.c:869 tasklet_wakeup task_process_applet 683511 34.75s 50.84us 18.37us 153.3us <- __process_running_peer_sync@src/peers.c:3579 appctx_wakeup h1_io_cb 535395 198.1ms 370.0ns 72.00ns 930.4us <- h1_takeover@src/mux_h1.c:5659 tasklet_wakeup It now makes it pretty obvious which tasks (hence call chains) spend their time waiting on a lock and for what share of their execution time.	2025-09-11 16:32:34 +02:00
Willy Tarreau	1956c544b5	MINOR: activity: add a new lkw_avg column to show profiling stats This new column will be used for reporting the average time spent waiting for a lock. It will only have a non-zero value when DEBUG_THREAD > 0. For now it is not updated.	2025-09-11 16:32:34 +02:00
Christopher Faulet	3023e98199	BUG/MINOR: resolvers: Restore round-robin selection on records in DNS answers Since the commit `dcb696cd3` ("MEDIUM: resolvers: hash the records before inserting them into the tree"), when several records are found in a DNS answer, the round-robin selection over these records is no longer performed. Indeed, a list of records was used before. To ensure each record was selected one after the other, at each selection, the first record of the list was moved to the end. When this list was replaced by a tree, the same mechanism was preserved. However, the record is indexed using its key, a hash of the record. So its position never changes. When it is removed and reinserted in the tree, its position remains the same. When we walk through the tree, starting from the root, the records are always evaluated in the same order. So, even if there are several records in a DNS answer, the same IP address is always selected. It is quite easy to trigger the issue with a do-resolv action. To fix the issue, the node to perform the next selection is now saved. So instead of restarting from the root each time, we can restart from the next node of the previous call. Thanks to Damien Claisse for the issue analysis and for the reproducer. This patch should fix the issue #3116. It must be backported as far as 2.6.	2025-09-11 15:46:45 +02:00
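The saved-position idea described in the fix above can be illustrated with a minimal sketch. It uses a key-ordered array as a stand-in for the tree of hashed records; `struct rr_set` and `rr_pick()` are hypothetical names for illustration only, not HAProxy code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tree of hashed records: entries stay in
 * key order, so removing and reinserting one never changes its position.
 */
struct rr_set {
	const char **addrs;  /* addresses in (fixed) key order */
	size_t count;        /* number of addresses */
	size_t next;         /* saved position for the next selection */
};

/* Return the next address and remember where to resume, instead of
 * restarting from the first entry on every call.
 */
static const char *rr_pick(struct rr_set *set)
{
	const char *addr;

	if (!set->count)
		return NULL;
	if (set->next >= set->count)
		set->next = 0;
	addr = set->addrs[set->next];
	set->next = (set->next + 1) % set->count;
	return addr;
}
```

Successive calls to `rr_pick()` on a three-entry set walk the entries in order and then wrap around, which is the behavior the commit restores.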
William Lallemand	e52e6f66ac	BUG/MEDIUM: jws: return size_t in JWS functions JWS functions are supposed to return 0 upon error or when nothing was produced. This was done in order to easily put the return value in trash->data without having to check it. However, functions like a2base64url() or snprintf() could return a negative value, which would be cast to an unsigned int if this happens. This patch adds checks on the JWS functions to ensure that no negative value can be returned, and changes the prototype from int to size_t. This is also related to issue #3114. Must be backported to 3.2.	2025-09-11 14:31:32 +02:00
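As an illustration of the pattern described above, here is a minimal sketch of a formatting helper that never lets a negative `int` leak into an unsigned length. `fmt_or_zero()` is a hypothetical name, not one of the JWS functions, and treating truncation as an error is an assumption of this sketch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not an HAProxy function: format into dst and
 * return the number of bytes written, or 0 on any error or truncation.
 */
static size_t fmt_or_zero(char *dst, size_t size, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(dst, size, fmt, ap);
	va_end(ap);

	/* vsnprintf() returns a negative value on error, and a value
	 * greater than or equal to size when the output was truncated.
	 */
	if (ret < 0 || (size_t)ret >= size)
		return 0;
	return (size_t)ret;
}
```

With such a helper, the result can be stored directly in an unsigned length field, since the only failure value is 0.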
Amaury Denoyelle	d293cc62dc	MINOR: quic: display build warning for compat layer on recent OpenSSL Build option USE_QUIC_OPENSSL_COMPAT=1 must be set to activate QUIC support for OpenSSL prior to version 3.5.2. This compiles an internal compatibility layer, which must then be activated at runtime with the global option limited-quic. Starting from OpenSSL version 3.5.2, a proper QUIC TLS API is now exposed. Thus, the compatibility layer is unneeded. However, it can still be compiled against newer OpenSSL releases and activated at runtime, mostly for testing purposes. As this compatibility layer has some limitations (no support for QUIC 0-RTT), it's important that users notice this situation and disable it if possible. Thus, this patch adds a notice warning when USE_QUIC_OPENSSL_COMPAT=1 is set when building against OpenSSL 3.5.2 and above. This should be sufficient for users and packagers to understand that this option is not necessary anymore. Note that USE_QUIC_OPENSSL_COMPAT=1 is incompatible with other TLS libraries which expose a QUIC API based on the original BoringSSL patch set. A build error will prevent the compatibility layer from being built. The limited-quic option is thus silently ignored.	2025-09-11 10:11:12 +02:00
Frederic Lecaille	5027ba36a9	MINOR: quic-be: make SSL/QUIC objects use their own indexes (ssl_qc_app_data_index) This index is used to retrieve the quic_conn object from its SSL object, the same way the connection is retrieved from its SSL object for SSL/TCP connections. This patch implements two helper functions to avoid the ugly code with such blocks: #ifdef USE_QUIC else if (qc) { .. } #endif Implement ssl_sock_get_listener() to return the listener from an SSL object. Implement ssl_sock_get_conn() to return the connection from an SSL object and optionally a pointer to the ssl_sock_ctx struct attached to the connections or the quic_conns. Use these functions where applicable: - ssl_tlsext_ticket_key_cb() calls ssl_sock_get_listener() - ssl_sock_infocbk() calls ssl_sock_get_conn() - ssl_sock_msgcbk() calls ssl_sock_get_ssl_conn() - ssl_sess_new_srv_cb() calls ssl_sock_get_conn() - ssl_sock_srv_verifycbk() calls ssl_sock_get_conn() Also modify qc_ssl_sess_init() to initialize the ssl_qc_app_data_index index for the QUIC backends.	2025-09-11 09:51:28 +02:00
Frederic Lecaille	47bb15ca84	MINOR: quic: get rid of ->target quic_conn struct member The ->li (struct listener *) member of quic_conn struct was replaced by a ->target (struct obj_type *) member by this commit: MINOR: quic-be: get rid of ->li quic_conn member to abstract the connection type (front or back) when implementing QUIC for the backends. In these cases, ->target was a pointer to the obj_type of a server struct. This could not work with dynamic servers, contrary to listeners, which are not dynamic. This patch almost reverts the one mentioned above. The ->target pointer to obj_type member is replaced by the ->li pointer to listener struct member. As listeners are not dynamic, this is easy to do. All one has to do is to replace the objt_listener(qc->target) statement by qc->li where applicable. For the backend connection, when needed, this is always qc->conn->target which is used, only when qc->conn is initialized. The only "problematic" case is for quic_dgram_parse() which takes a pointer to an obj_type as third argument. But this obj_type is only used to call quic_rx_pkt_parse(). Inside this function it is used to access the proxy counters of the connection thanks to qc_counters(). So, this obj_type argument may be null from now on with this patch. This is the reason why qc_counters() is modified to take this into consideration.	2025-09-11 09:51:28 +02:00
Olivier Houchard	ff47ae60f3	MEDIUM: server: Introduce the concept of path parameters Add a new field in struct server, path parameters. It will contain connection information for the server that is not expected to change. For now, just store the ALPN negotiated with the server. Each time a handshake is done, we'll update it, even though it is not supposed to change. This will be useful when trying to send early data, that way we'll know which mux to use. Each time the server goes down or is disabled, that information is erased, as we can't be sure those parameters will be the same once the server is back up.	2025-09-09 19:01:24 +02:00
Olivier Houchard	5ab9954faa	MINOR: ssl: Add a flag to let it known we have an ALPN negociated Add a new flag to the ssl_sock_ctx, to be set as soon as the ALPN has been negotiated. This happens before the handshake has been completed, and that information will let us know, when we receive early data, that if the ALPN has been negotiated, then we can immediately create a mux, as the ALPN will tell us which mux to use.	2025-09-09 19:01:24 +02:00
Willy Tarreau	f87cf8b76e	MEDIUM: stick-tables: relax stktable_trash_oldest() to only purge what is needed stktable_trash_oldest() does insist a lot on purging what was requested, only limited by STKTABLE_MAX_UPDATES_AT_ONCE. This is called in two conditions, one to allocate a new stksess, and the other one to purge entries of a stopping process. The cost of iterating over all shards is huge, and a shard lock is taken each time before looking up entries. Moreover, multiple threads can end up doing the same and looking hard for many entries to purge when only one is needed. Furthermore, all threads start from the same shard, hence synchronize their locks. All of this costs a lot to other operations such as access from peers. This commit simplifies the approach by ignoring the budget, starting from a random shard number, and using a trylock so as to be able to give up early in case of contention. The approach chosen here consists in trying hard to flush at least one entry, but once at least one is evicted or at least one trylock failed, then a failure on the trylock will result in finishing. The function now returns a success as long as one entry was freed. With this, tests no longer show watchdog warnings during tests, though a few still remain when stopping the tests (which are not related to this function but to the contention from process_table_expire()). With this change, under high contention some entries' purge might be postponed and the table may occasionally contain slightly more entries than their size (though this already happens since stksess_new() first increments ->current before decrementing it). Measurements were made on a 64-core system with 8 peers of 16 threads each, at CPU saturation (350k req/s each doing 10 track-sc) for 10M req, with 3 different approaches: - this one resulted in 1500 failures to find an entry (0.015% size overhead), with the lowest contention and the fairest peers distribution. 
- leaving only after a success resulted in 229 failures (0.0029% size overhead) but doubled the time spent in the function (on the write lock precisely). - leaving only when both a success and a failed lock were met resulted in 31 failures (0.00031% overhead) but the contention was high enough again so that peers were not all up to date. Considering that having a saturated machine exceed its entries by 0.015% is pretty minimal, the mechanism is kept. This should be backported to 3.2 after a bit more testing as it resolves some watchdog warnings and panics. It requires the preceding commit "MINOR: stick-table: permit stksess_new() to temporarily allocate more entries" to over-allocate instead of failing in case of contention.	2025-09-09 17:56:37 +02:00
Willy Tarreau	c3f94fbd9b	DEBUG: stream: count the number of passes in the connect loop Normally the connect loop cannot loop, but some recent traces can easily convince one of the opposite. Let's add a counter, including in panic dumps, in order to avoid the repeated long head scratching sessions starting with "and what if...". In addition, if it's found to loop, this time it will be certain and will indicate where to zoom in. This should be backported to 3.2.	2025-09-09 17:56:14 +02:00
Amaury Denoyelle	0b6908385e	BUG/MINOR: quic: properly support GSO on backend side Previously, GSO emission was explicitly disabled on the backend side. This is not true since the following patch, thus GSO can be used, for example when transferring large POST requests to an HTTP/3 backend. commit `e064e5d461` MINOR: quic: duplicate GSO unsupp status from listener to conn However, GSO on the backend side may cause a crash when handling EIO. In this case, GSO must be completely disabled. Previously, this was performed by flagging the listener instance. On the backend side, this would cause a crash as the listener is NULL. This patch fixes it by supporting the GSO disable flag for servers. Thus, in qc_send_ppkts(), EIO can be converted either to a listener or server flag depending on the quic_conn proxy side. On the backend side, the server instance is retrieved via <qc.conn.target>. This is enough to guarantee that the server is not deleted. This does not need to be backported.	2025-09-08 16:18:05 +02:00
Christopher Faulet	e653dc304e	MINOR: pools: Don't dump anymore info about pools when purge is forced Historically, when the purge of pools was forced by sending a SIGQUIT to haproxy, information about the pools was first dumped. It is now totally pointless because this information can be retrieved via the CLI. It is even less relevant now because the purge is typically forced when there are memory issues, and to dump pools information, data must be allocated. The dump_pools_info() function was simplified because it is now called only from an applet. There is no reason to still try to dump info on stderr.	2025-09-08 16:04:40 +02:00
Amaury Denoyelle	f645cd3c74	MINOR: quic: restore QUIC_HP_SAMPLE_LEN constant The below patch fixes padding emission for small packets, which is required to ensure that header protection removal can be performed by the recipient. commit `d7dea408c6` BUG/MINOR: quic: too short PADDING frame for too short packets In addition to the proper fix, constant QUIC_HP_SAMPLE_LEN was removed and replaced by QUIC_TLS_TAG_LEN. However, it still makes sense to have a dedicated constant which represents the size of the sample used for header protection. Thus, this patch restores it. Special instructions for backport: the above patch mentions that no backport is needed. However, this is incorrect, as the bug is introduced by another patch scheduled for backport up to 2.6. Thus, it is first mandatory to schedule `d7dea408c6` after it. Then, this patch can also be used for the sake of code clarity.	2025-09-08 14:49:03 +02:00
Frederic Lecaille	6f9fccec1f	MINOR: quic: SSL session reuse for QUIC Mimic the same behavior as the one for SSL/TCP connections to implement the SSL session reuse. Extract the code which tries to reuse the SSL session for SSL/TCP connections to implement ssl_sock_srv_try_reuse_sess(). Call this function from the QUIC ->init() xprt callback (qc_conn_init()) as is done for SSL/TCP connections.	2025-09-08 11:46:26 +02:00
Frederic Lecaille	d7dea408c6	BUG/MINOR: quic: too short PADDING frame for too short packets This bug arrived with this commit: MINOR: quic: centralize padding for HP sampling on packet building What was missed is the fact that at the centralization point for the PADDING frame to add for too short packets, the <len> payload length already includes <*pn_len>, the packet number field length value. So when computing the length of the PADDING frame, the packet number field length must not be considered and added to the payload length (<len>). This bug led to too short PADDING frames for too short packets. This was the case, most of the time, with Application level packets with a 1-byte packet number field followed by a 1-byte PING frame. A 1-byte PADDING frame was added in this case in place of a correct 2-byte PADDING frame. The header protection of such a packet could not be removed by the clients, as for instance for ngtcp2 with such traces: I00001828 0x5a135c81e803f092c74bac64a85513b657 pkt could not decrypt packet number As the header protection could not be removed, the header keyupdate bit could also not be read by packet analyzers such as pyshark used during the keyupdate tests. No need to backport.	2025-09-05 16:17:11 +02:00
Christopher Faulet	f9a6ae727c	OPTIM: tcpcheck: Reorder tcpchek_connect structure fields to fill holes Thanks to this patch, two 4-bytes holes are now filled in the tcpchek_connect structure.	2025-09-05 15:56:42 +02:00
Christopher Faulet	ffc1f096e0	MEDIUM: httpcheck/ssl: Base the SNI value on the HTTP host header by default Similarly to the automatic SNI selection for regular SSL traffic, the SNI of HTTPS health-check connections is now automatically set by default by using the host header value. "check-sni-auto" and "no-check-sni-auto" server settings were added to change this behavior. Only implicit HTTPS health-checks can take advantage of this feature. In this case, the host header value from the "option httpchk" directive is used to extract the SNI. It is disabled if http-check rules are used. So, the SNI must still be explicitly specified via a "http-check connect" rule. This patch should partially fix the issue #3081.	2025-09-05 15:56:42 +02:00
Christopher Faulet	668916c1a2	MEDIUM: server/ssl: Base the SNI value to the HTTP host header by default For HTTPS outgoing connections, the SNI is now automatically set using the Host header value if no other value is already set (via the "sni" server keyword). It is now the default behavior. It could be disabled with the "no-sni-auto" server keyword. And eventually the "sni-auto" server keyword may be used to reset any previous "no-sni-auto" setting. This option can be inherited from "default-server" settings. Finally, if no connection name is set via the "pool-conn-name" setting, the selected value is used. The automatic selection of the SNI is enabled by default for all outgoing connections. But it is concretely used for HTTPS connections only. The expression used is "req.hdr(host),host_only". This patch should partially fix the issue #3081. It only covers the server part. Another patch will add the feature for HTTP health-checks.	2025-09-05 15:56:42 +02:00
Christopher Faulet	f8f94ffc9c	BUG/MEDIUM: server: Use sni as pool connection name for SSL server only By default, for a given server, when no pool-conn-name is specified, the configured sni is used. However, this must only be done when SSL is in use for the server. Of course, it is uncommon to have a sni expression for a non-SSL server. But this may happen. In addition, SSL may be disabled via the CLI. In that case, the pool-conn-name must be discarded if it was copied from the sni. And, we must of course take care to set it if SSL is enabled. Finally, when the attach-srv action is checked, we now check the pool-conn-name expression. This patch should be backported as far as 3.0. It relies on "MINOR: server: Parse sni and pool-conn-name expressions in a dedicated function" which should be backported too.	2025-09-05 15:56:08 +02:00
Aurelien DARRAGON	1a1362ea0b	MINOR: stats-file: reserve some bytes in exported structs We may need additional struct members in shm_stats_file_object and shm_stats_file_hdr, yet since these structs are exported they should not change in size nor ordering, else it would require a version change to break compatibility on purpose since the mapping would differ. Here we reserve 64 additional bytes in shm_stats_file_object, and 128 bytes in shm_stats_file_hdr for future usage.	2025-09-03 16:29:48 +02:00
Aurelien DARRAGON	21d97ccfae	BUILD: stats-file: fix aligment issues Document some byte holes and fix some potential alignment issues between 32 and 64 bits architectures to ensure the shm_stats_file memory mapping is consistent between operating systems.	2025-09-03 16:28:46 +02:00
Aurelien DARRAGON	46a5948ed2	MINOR: compiler: add ALWAYS_PAD() macro Same as THREAD_PAD() but doesn't depend on haproxy being compiled with thread support. It may be useful for memory (or files) that may be shared between multiple processes.	2025-09-03 16:28:46 +02:00
Aurelien DARRAGON	585ece4c92	MEDIUM: stats-file/counters: store and preload stats counters as shm file objects This is the last patch of the shm stats file series, in this patch we implement the logic to store and fetch shm stats objects and associate them to existing shared counters on the current process. Shm objects are stored in the same memory location as the shm stats file header. In fact they are stored right after it. All objects (struct shm_stats_file_object) have the same size (no matter their type), which allows for easy object traversal without having to check the object's type, and could permit the use of external tools to scan the SHM in the future. Each object stores a guid (of GUID_MAX_LEN+1 size) and tgid which allows to match corresponding shared counters indexes. Also, as stated before, each object stores the list of users making use of it. Objects are never released (the map can only grow), but unused objects (when no more users or active users are found in objects->users) are automatically recycled. Also, each object stores its type which defines how the object generic data member should be handled. Upon startup (or reload), haproxy first tries to scan existing shm to find objects that could be associated to frontends, backends, listeners or servers in the current config based on GUID. For associations that couldn't be made, haproxy will automatically create missing objects in the SHM during late startup. When haproxy matches with an existing object, it means the counter from an older process is preserved in the new process, so multiple processes temporarily share the same counter for as long as required for older processes to eventually exit.	2025-09-03 15:59:37 +02:00
Aurelien DARRAGON	ee17d20245	MINOR: stats-file: add process slot management for shm stats file Now that all processes tied to the same shm stats file share a common clock source, we introduce the process slot notion in this patch. Each living process registers itself in a map at a free index: each slot stores information about the process' PID and heartbeat. Each process is responsible for updating its heartbeat, a slot is considered as "free" if the heartbeat was never set or if the heartbeat is expired (60 seconds of inactivity). The total number of slots is set to 64, this is on purpose because it allows to easily store the "users" of a given shm object using a 64 bits bitmask. Given that when haproxy is reloaded older processes are supposed to die eventually, it should be large enough (64 simultaneous processes) to be safe. If we manage to reach this limit someday, more slots could be added by splitting the "users" bitmask on multiple 64-bit variables.	2025-09-03 15:59:33 +02:00
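The slot and bitmask scheme described above can be sketched as follows. This is only an illustration under stated assumptions: `struct proc_slot`, `slot_claim()`, `users_add()` and `users_del()` are hypothetical names, the layout is invented, and the sketch is single-process; real shared-memory code would need atomic operations.

```c
#include <stdint.h>

#define NB_SLOTS        64      /* slot count stated in the commit message */
#define HB_TIMEOUT_MS   60000   /* 60 seconds of inactivity */

/* Hypothetical slot layout; the real structures may differ. */
struct proc_slot {
	int pid;
	uint64_t heartbeat_ms;  /* 0 means "never set" */
};

/* A slot is free if its heartbeat was never set or has expired. */
static int slot_is_free(const struct proc_slot *s, uint64_t now_ms)
{
	return !s->heartbeat_ms || now_ms - s->heartbeat_ms > HB_TIMEOUT_MS;
}

/* Claim the first free slot; return its index, or -1 if all are busy. */
static int slot_claim(struct proc_slot *slots, int pid, uint64_t now_ms)
{
	for (int i = 0; i < NB_SLOTS; i++) {
		if (slot_is_free(&slots[i], now_ms)) {
			slots[i].pid = pid;
			slots[i].heartbeat_ms = now_ms;
			return i;
		}
	}
	return -1;
}

/* One bit per slot records which processes use a shared object. */
static uint64_t users_add(uint64_t users, int slot)
{
	return users | (UINT64_C(1) << slot);
}

static uint64_t users_del(uint64_t users, int slot)
{
	return users & ~(UINT64_C(1) << slot);
}
```

Because there are exactly 64 slots, the set of users of an object fits in a single `uint64_t`, which is the design choice the commit message highlights.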
Aurelien DARRAGON	443e657fd6	MEDIUM: stats-file: processes share the same clock source from shm-stats-file The use of the "shm-stats-file" directive now implies that all processes using the same file share a common clock source, this is required for consistency regarding time-related operations. The clock source is stored in the shm stats file header. When the directive is set, all processes share the same clock (global_now_ms and global_now_ns both point to variables in the map), this is required for time-based counters such as freq counters to work consistently. Since all processes manipulate the global clock with atomic operations exclusively during runtime, and don't systematically rely on it (thanks to local now_ms and now_ns), it is pretty much transparent.	2025-09-03 15:59:27 +02:00
Aurelien DARRAGON	c91d93ed1c	MINOR: stats-file: introduce shm-stats-file directive Add initial support for the "shm-stats-file" directive and associated "shm-stats-file-max-objects" directive. For now they are flagged as experimental directives. The shared memory file is automatically created by the first process. The file is created using open() so it is up to the user to provide a relevant path (either on a regular filesystem or ramfs for performance reasons). The directive takes only one argument which is the path of the shared memory file. It is passed as-is to open(). The maximum number of objects per thread-group (hard limit) that can be stored in the shm is defined by the "shm-stats-file-max-objects" directive and defaults to 2k, which means approximately 1mb max per thread group and should cover most setups. Upon initial creation, the main shm stats file header is provisioned with the version, which must remain the same to be compatible between processes. When the limit is reached (during startup) an error is reported by haproxy which invites the user to increase the "shm-stats-file-max-objects" if desired, but this means more memory will be allocated. Actual memory usage is low at start, because only the mmap (mapping) is provisioned with the maximum number of objects to avoid relocating the memory area during runtime, but the actual shared memory file is dynamically resized when objects are added (resized by following half power of 2 curve when new objects are added, see upcoming commits). For now only the file is created, further logic will be implemented in upcoming commits.	2025-09-03 15:59:22 +02:00
Aurelien DARRAGON	cb08bcb9d6	MINOR: counters: retrieve detailed errmsg upon failure with counters_{fe,be}_shared_prepare() counters_{fe,be}_shared_prepare now take an extra <errmsg> parameter that contains additional hints about the error in case of failure. It must be freed accordingly since it is allocated using memprintf	2025-09-03 15:59:17 +02:00
Amaury Denoyelle	a84b404b34	MINOR: quic/flags: complete missing flags Add missing quic_conn flags definition for dev utility.	2025-09-02 09:37:43 +02:00
Amaury Denoyelle	1517869145	BUG/BUILD: stats: fix build due to missing stat enum definition Recently, a new server counter for private idle connections has been added to the statistics output. However, the patch was missing the ST_I_PX_PRIV_IDLE_CUR enum definition. No need to backport.	2025-08-29 09:32:10 +02:00
Amaury Denoyelle	dbe31e3f65	MEDIUM: session: account on server idle conns attached to session This patch adds a new member <curr_sess_idle_conns> on the server. It serves as a counter of idle connections attached to a session instead of regular idle/safe trees. This is used only for private connections. The objective is to provide a method to detect if there are idle connections still referencing a server. This will be particularly useful to ensure that a server is removable. Currently, this is not yet necessary as idle connections are directly freed via the "del server" handler under thread isolation. However, this procedure will be replaced by an asynchronous mechanism outside of thread isolation. Careful: connections attached to a session but not idle will not be accounted by this counter. These connections can still be detected via srv_has_streams() so "del server" will be safe. This counter is maintained during the whole lifetime of a private connection. This is mandatory to guarantee "del server" safety and conforms with other idle server counters. What this means is that the decrement is performed only when the connection transitions from idle to in use, or just prior to its deletion. For the first case, this is covered by session_get_conn(). The second case is trickier. It cannot be done via session_unown_conn() as a private connection may still live a little longer after its removal from the session, most notably when scheduled for idle purging. Thus, conn_free() has been adjusted to handle the final decrement. Now, conn_backend_deinit() is also called for private connections if the CO_FL_SESS_IDLE flag is present. This results in a call to srv_release_conn() which is responsible for decrementing server idle counters.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	7a6e3c1a73	MAJOR: server: implement purging of private idle connections When a server goes into maintenance, or if its IP address is changed, idle connections attached to it are scheduled for deletion via the purge mechanism. Connections are moved from server idle/safe list to the purge list relative to their thread. Connections are freed on their owning thread by the scheduled purge task. This patch extends this procedure to also handle private idle connections stored in sessions instead of servers. This is possible thanks to the <sess_conns> list server member. A call to the newly defined function session_purge_conns() is performed on each list element. This moves private connections from their session to the purge list alongside other server idle connections. This change relies on the series of previous commits which ensure that access to private idle connections is now thread-safe, with idle_conns lock usage and careful manipulation of private idle conns in input/output handlers. The main benefit of this patch is that now all idle connections targeting a server set in maintenance are removed. Previously, private connections would remain until their attached sessions were closed.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	73fd12e928	MEDIUM: conn/muxes/ssl: remove BE priv idle conn from sess on IO This is a direct follow-up of the previous patch, which adjusts idle private connection access via input/output handlers. This patch implements the handlers' prologue part. Now, private idle connections require a treatment similar to non-private idle connections. Thus, a private conn is temporarily removed from its session under protection of the idle_conns lock. As locking is already performed in the input/output handler, session_unown_conn() cannot be called. Thus, a new function session_detach_idle_conn() is implemented in the session module, which performs basically the same operation but relies on external locking.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	8de0807b74	MEDIUM: conn/muxes/ssl: reinsert BE priv conn into sess on IO completion When dealing with input/output on a connection related handler, special care must be taken prior to access the connection if it is considered as idle, as it could be manipulated by another thread. Thus, connection is first removed from its idle tree before processing. The connection is reinserted on processing completion unless it has been freed during it. Idle private connections are not concerned by this, because takeover is not applied on them. However, a future patch will implement purging of these connections along with regular idle ones. As such, it is necessary to also protect private connections usage now. This is the subject of this patch and the next one. With this patch, input/output handlers epilogue of muxes/SSL/conn_notify_mux() are adjusted. A new code path is able to deal with a connection attached to a session instead of a server. In this case, session_reinsert_idle_conn() is used. Contrary to session_add_conn(), this new function is reserved for idle connections usage after a temporary removal. Contrary to _srv_add_idle() used by regular idle connections, session_reinsert_idle_conn() may fail as an allocation can be required. If this happens, the connection is immediately destroyed. This patch has no effect for now. It must be coupled with the next one which will temporarily remove private idle connections on input/output handler prologue.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	f234b40cde	MINOR: server: shard by thread sess_conns member Server member <sess_conns> is a mt_list which contains every backend connection attached to a session which targets this server. These connections are not present in idle server trees. The main utility of this list is to be able to clean up these connections prior to removing a server via "del server" CLI. However, this procedure will be adjusted by a future patch. As such, <sess_conns> member must be moved into srv_per_thread struct. Effectively, this duplicates the list for every thread. This commit does not introduce any functional change. Its goal is to ensure that these connections are now ordered by their owning thread, which will allow a purge to be implemented, similarly to idle connections attached to servers.	2025-08-28 14:52:29 +02:00
Amaury Denoyelle	d4f7a2dbcc	MINOR: session: uninline functions related to BE conns management Move from header to source file functions related to session management of backend connections. These functions are big enough to remove inline attribute.	2025-08-28 14:52:29 +02:00
Amaury Denoyelle	d0df41fd22	MINOR: session: document explicitely that session_add_conn() is safe A set of recent patches has simplified management of backend connections attached to sessions. The API is now stricter to prevent any misuse. One of these changes is the addition of a BUG_ON() in session_add_conn(), which ensures that a connection is not attached to a session if its <owner> field points to another entry. On older haproxy releases, this assertion could not be enforced due to NTLM, as a connection is turned private during its transfer. When using a true multiplexed protocol on the backend side, the connection could be assigned in turn to several sessions. However, NTLM is now only applied for HTTP/1.1 as it does not make sense if the connection is already shared. To better clarify this situation, extend the comment on BUG_ON() inside session_add_conn().	2025-08-28 14:52:29 +02:00
Amaury Denoyelle	a96f1286a7	BUG/MINOR: connection: rearrange union list members A connection can be stored in several lists, thus there are several attach points in struct connection. Depending on its proxy side, either frontend or backend, a single connection will only access some of them during its lifetime. As an optimization, these attach points are organized in a union. However, this partitioning was not correctly achieved along the frontend/backend side delimitation. Furthermore, reverse HTTP has recently been introduced. With this feature, a connection can migrate from frontend to backend side or vice versa. As such, it becomes even more tedious to ensure that these members are always accessed in a safe way. This commit rearranges these fields. First, the union is now clearly split between frontend-only and backend-only elements. Next, backend elements are initialized with conn_backend_init(), which is already used during connection reversal on an edge endpoint. A new function conn_frontend_init() serves to initialize the other members, called both on connection first instantiation and on reversal on a dialer endpoint. This model is much cleaner and should prevent any access to fields from the wrong side. Currently, there is no known case of wrong access in the existing code base. However, this cleanup is considered an improvement which must be backported up to 3.0 to remove any possible undefined behavior.	2025-08-28 14:52:29 +02:00
Frederic Lecaille	31c17ad837	MINOR: quic: remove ->offset qf_crypto struct field This patch follows this previous bug fix: BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets where an ebtree node has been added to qf_crypto struct. It has the same meaning and type as ->offset_node.key field with ->offset_node an eb64tree node. This patch simply removes ->offset, which is no longer useful. This patch should be easily backported as far as 2.6, like the one mentioned above, to ease any further backport to come.	2025-08-28 08:19:34 +02:00
William Lallemand	18ebd81962	MINOR: ssl: diagnostic warning when both 'default-crt' and 'strict-sni' are used It is possible to use both 'strict-sni' and 'default-crt' on the same bind line, which does not make much sense. This patch implements a check which will look for default certificates in the sni_w tree when strict-sni is used (referenced by their empty sni ""). default-crt sets the CKCH_INST_EXPL_DEFAULT flag in ckch_inst->is_default, so it's possible to differentiate explicit defaults from implicit defaults. Could be backported as far as 3.0. This was discussed in ticket #3082.	2025-08-27 16:22:12 +02:00
Frederic Lecaille	d753f24096	BUG/MINOR: quic: reorder fragmented RX CRYPTO frames by their offsets This issue impacts the QUIC listeners. It is the same as the one fixed by this commit: BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO Like chrome, the ngtcp2 client decided to fragment its CRYPTO frames, but in a much more aggressive way. This could be fixed with a list local to qc_parse_pkt_frms() to please chrome thanks to the commit above. But this is not sufficient for ngtcp2 which often splits its ClientHello message into more than 10 fragments with very small ones. This leads the packet parser to interrupt the CRYPTO frames parsing due to the ncbuf gap size limit. To fix this, this patch proceeds in approximately the same way but with an ebtree to reorder the CRYPTO frames by their offsets. These frames are directly inserted into a local ebtree. Then this ebtree is reused to provide the reordered CRYPTO data to the underlying ncbuf (non contiguous buffer). This way there is much less chance for the ncbufs used to store CRYPTO data to reach an overly fragmented state. Must be backported as far as 2.6.	2025-08-27 16:14:19 +02:00
Aurelien DARRAGON	cdb97cb73e	MEDIUM: server: split srv_init() in srv_preinit() + srv_postinit() We actually need more granularity to split srv postparsing init tasks: Some of them are required to be run BEFORE the config is checked, and some of them AFTER the config is checked. Thus we push the logic from `368d0136` ("MEDIUM: server: add and use srv_init() function") a little bit further and split the function in two distinct ones, one of them executed under check_config_validity() and the other one using REGISTER_POST_SERVER_CHECK() hook. SRV_F_CHECKED flag was removed because it is no longer needed, srv_preinit() is only called once, and so is srv_postinit().	2025-08-27 12:54:19 +02:00
Christopher Faulet	71c01c1010	MINOR: applet: Make some applet functions HTX aware applet_output_room() and applet_input_data() are now HTX aware. These functions automatically rely on htx versions if APPLET_FL_HTX flag is set for the applet.	2025-08-25 11:11:05 +02:00
Christopher Faulet	927884a3eb	MINOR: applet: Add a flag to know an applet is using HTX buffers Multiplexers already explicitly announce their HTX support. Now that it is possible to set flags on applets, it could be handy to do the same. So, now, HTX aware applets must set the APPLET_FL_HTX flag.	2025-08-25 11:11:05 +02:00
Christopher Faulet	1c76e4b2e4	MINOR: applet: Add function to test applet flags from the appctx appctx_app_test() function can now be used to test the applet flags using an appctx. This simplifies tests on applet flags a bit. For now, this function is used to test APPLET_FL_NEW_API flag.	2025-08-25 11:11:05 +02:00
Christopher Faulet	3de6c375aa	MINOR: applet: Rely on applet flag to detect the new api Instead of setting a flag on the applet context by checking the defined callback functions of the applet to know if an applet is using the new API or not, we can now rely on the applet flags itself. By checking APPLET_FL_NEW_API flag, it does the job. APPCTX_FL_INOUT_BUFS flag is thus removed.	2025-08-25 11:11:05 +02:00
Amaury Denoyelle	1529ec1a25	MINOR: quic: centralize padding for HP sampling on packet building The patch below has simplified INITIAL padding on emission. Now, qc_prep_pkts() is responsible for activating padding for this case, and there is no more special case needed in qc_do_build_pkt(). commit `8bc339a6ad` BUG/MAJOR: quic: fix INITIAL padding with probing packet only However, qc_do_build_pkt() may still activate padding on its own, to ensure that a packet is big enough so that header protection decryption can be performed by the peer. HP decryption is performed by extracting a sample from the ciphered packet, starting 4 bytes after PN offset. Sample length is 16 bytes as defined by TLS algos used by QUIC. Thus, a QUIC sender must ensure that the length of the packet number plus payload fields is at least 4 bytes. This is enough given that each packet is completed by a 16-byte AEAD tag which can be part of the HP sample. This patch simplifies qc_do_build_pkt() by centralizing padding for this case in a single location. This is performed at the end of the function after payload is completed. The code is thus simpler. This is not a bug. However, it may be interesting to backport this patch up to 2.6, as qc_do_build_pkt() is a tedious function, in particular when dealing with padding generation, thus it may benefit greatly from simplification.	2025-08-25 08:48:24 +02:00
Olivier Houchard	6f21c5631a	MINOR: ssl: Add a way to globally disable ktls. Add a new global option, "noktls", as well as a command line option, "-dT", to totally disable ktls usage, even if it is activated on servers or binds in the configuration. That makes it easier to quickly figure out if a problem is related to ktls or not.	2025-08-20 18:33:11 +02:00
Olivier Houchard	5c8fa50966	MEDIUM: ssl: Add ktls support for AWS-LC. Add ktls support for AWS-LC. As it does not know anything about ktls, it means extracting keys from the ssl lib and providing them to the kernel, at which point we can use regular recvmsg()/sendmsg() calls. This patch only provides support for TLS 1.2, as AWS-LC provides a different way to extract keys for TLS 1.3. Note that this may work with BoringSSL too, but it has not been tested.	2025-08-20 18:33:11 +02:00
Olivier Houchard	ed7d20afc8	MEDIUM: ssl: Add kTLS support for OpenSSL. Modify the SSL code to enable kTLS with OpenSSL. It mostly requires our internal BIO to be able to handle the various kTLS-specific controls in ha_ssl_ctrl(), as well as being able to use recvmsg() and sendmsg() from ha_ssl_read() and ha_ssl_write().	2025-08-20 18:33:11 +02:00
Olivier Houchard	7836fe8fe3	MINOR: ssl: Define HAVE_VANILLA_OPENSSL if openssl is used. If we're using OpenSSL as our crypto library, add a define, HAVE_VANILLA_OPENSSL, to make it easier to differentiate between the various crypto libs.	2025-08-20 18:33:10 +02:00
Olivier Houchard	e8674658ae	MINOR: cfgparse: Add a new "ktls" option to bind and server. Add a new "ktls" option to bind and server. Valid values are "on" and "off". It currently does nothing, but once kTLS is implemented, it will enable or disable kTLS for the corresponding sockets. It is marked as experimental for now.	2025-08-20 18:33:10 +02:00
Olivier Houchard	075e753802	MEDIUM: mux_h1/mux_pt: Use XPRT_CAN_SPLICE to decide if we should splice In both mux_h1 and mux_pt, use the new XPRT_CAN_SPLICE capability to decide if we should attempt to use splicing or not. If we receive XPRT_CONN_CAN_MAYBE_SPLICE, add a new flag on the connection, CO_FL_WANT_SPLICING, to let the xprt know that we'd love to be able to do splicing, so that it may get ready for that. This should have no effect right now, and is required work for adding kTLS support.	2025-08-20 18:33:10 +02:00
Olivier Houchard	5731b8a19c	MEDIUM: xprt: Add a "get_capability" method. Add a new method to xprts, get_capability, that can be used to query if an xprt supports something or not. The first capability implemented is XPRT_CAN_SPLICE, to know if the xprt will be able to use splicing for the provided connection. The possible answers are XPRT_CONN_CAN_NOT_SPLICE, which indicates splicing will never be possible for that connection, XPRT_CONN_COULD_SPLICE, which indicates that splicing is not usable right now, but may be in the future, and XPRT_CONN_CAN_SPLICE, that means we can splice right away.	2025-08-20 18:33:10 +02:00
Olivier Houchard	2623b7822e	MINOR: ssl: Add a "flags" field to ssl_sock_ctx. Instead of adding more separate fields in ssl_sock_ctx, add a "flags" one. Convert the "can_send_early_data" to the flag SSL_SOCK_F_EARLY_ENABLED. More flags will be added for kTLS support.	2025-08-20 17:28:03 +02:00
Olivier Houchard	3d685fcb7d	MINOR: xprt: Add recvmsg() and sendmsg() parameters to rcv_buf() and snd_buf(). In rcv_buf() and snd_buf(), use sendmsg/recvmsg instead of send and recv, and add two new optional parameters to provide msg_control and msg_controllen. Those are unused for now, but will be used later for kTLS.	2025-08-20 17:28:03 +02:00
Frederic Lecaille	878a72d001	BUG/MEDIUM: quic: listener connection stuck during handshakes (OpenSSL 3.5) This issue was reported in GH #3071 by @famfo where a wireshark capture reveals that some handshakes could not complete after having received two Initial packets. This could happen when the packets were parsed in two steps, calling qc_ssl_provide_all_quic_data() twice. This is due to the crypto data stream counter, which was incremented twice from qc_ssl_provide_all_quic_data() (see cstream->rx.offset += data statement around line 1223 in quic_ssl.c): once by the callback which "receives" the crypto data, and once by qc_ssl_provide_all_quic_data(). Then when parsing the second crypto data frame, the parser detected that the crypto data had already been provided. To fix this, one could comment out the code which increments the crypto data stream counter by <data>. That said, when using the OpenSSL 3.5 QUIC API one should not modify the crypto data stream outside of the OpenSSL 3.5 QUIC API. So, this patch stops calling qc_ssl_provide_all_quic_data() and qc_ssl_provide_quic_data() and only calls qc_ssl_do_hanshake() after having received some crypto data. In addition to this, as these functions are no longer called when building haproxy against OpenSSL 3.5, this patch disables their compilation (with #ifndef HAVE_OPENSSL_QUIC). This patch depends on this previous one: MINOR: quic: implement qc_ssl_do_hanshake() Thank you to @famfo for this report. Must be backported to 3.2.	2025-08-14 14:54:47 +02:00
Willy Tarreau	a7f8693fa2	MEDIUM: ring: always allocate properly aligned ring structures The rings were manually padded to place the various areas that compose them into different cache lines, provided that the allocator returned a cache-aligned address, which until now was not granted. By now switching to the aligned API we can finally have this guarantee and hope for more consistent ring performance between tests. Like previously the few carefully crafted THREAD_PAD() could simply be replaced by generic THREAD_ALIGN() that dictate the type's alignment. This was the last user of THREAD_PAD() by the way.	2025-08-13 17:47:39 +02:00
Willy Tarreau	cfdab917fe	MINOR: server: align server struct to 64 bytes Several times recently, it was noticed that some benchmarks would highly vary depending on the position of certain fields in the server struct, and this could even vary between runs. The server struct does have separate areas depending on the use cases and hot/cold aspect of the members stored there, but the areas are artificially kept apart using fixed padding instead of real alignment, which has the first sad effect of artificially inflating the struct, and the second one of misaligning it. Now that we have all the necessary tools to keep them aligned, let's just do it. The struct has shrunk from 4160 to 4032 bytes on 64-bit systems, 152 of which are still holes or padding.	2025-08-13 17:37:11 +02:00
Willy Tarreau	a469356268	MEDIUM: server: introduce srv_alloc()/srv_free() to alloc/free a server It happens that we free servers at various places in the code, both on error paths and at runtime thanks to the "server delete" feature. In order to switch to an aligned struct, we'll need to change the calloc() and free() calls. Let's first spot them and switch them to srv_alloc() and srv_free() instead of using calloc() and either free() or ha_free(). An easy trap to fall into is that some of them are default-server entries. The new srv_free() function also resets the pointer like ha_free() does. This was done by running the following coccinelle script all over the code: @@ struct server srv; @@ ( - free(srv) + srv_free(&srv) | - ha_free(&srv) + srv_free(&srv) ) @@ struct server srv; expression e1; expression e2; @@ ( - srv = malloc(e1) + srv = srv_alloc() | - srv = calloc(e1, e2) + srv = srv_alloc() ) This is marked medium because despite spotting all call places, we can never rule out the possibility that some out-of-tree patches would allocate their own servers and continue to use the old API... at their own risk.	2025-08-13 17:37:11 +02:00
Willy Tarreau	33d72568dd	MINOR: tools: also implement ha_aligned_alloc_typed() This one is a macro and will allocate a properly aligned and sized object. This will help make sure that the alignment promised to the compiler is respected. When memstats is used, the type name is passed as a string into the .extra field so that it can be displayed in "debug dev memstats". Two tiny mistakes related to memstats macros were also fixed (calloc instead of malloc for zalloc), and the doc was also added to document how to use these calls.	2025-08-13 17:37:08 +02:00
Willy Tarreau	e21bb531ca	MINOR: pools: permit to optionally specify extra size and alignment The common macros REGISTER_TYPED_POOL(), DECLARE_TYPED_POOL() and DECLARE_STATIC_TYPED_POOL() will now take two optional arguments, one being the extra size to be added to the structure, and a second one being the desired alignment to enforce. This will permit to specify alignments larger than the default ones promised to the compiler.	2025-08-11 19:55:30 +02:00
Willy Tarreau	d240f387ca	MINOR: pools: distinguish the requested alignment from the type-specific one We're letting users request an alignment but that can violate one imposed by a type, especially if we start seeing REGISTER_TYPED_POOL() grow in adoption, encouraging users to specify alignment on their types. On the other hand, if we ask the user to always specify the alignment, no control is possible and the error is easy. Let's have a second field in the pool registration, for the type-specific one. We'll set it to zero when unknown, and to the type's alignment when known. This way it will become possible to compare them at startup time to detect conflicts. For now no macro permits to set both separately so this is not visible.	2025-08-11 19:55:30 +02:00
Willy Tarreau	746e77d000	MINOR: tools: implement ha_aligned_zalloc() This one is exactly ha_aligned_alloc() followed by a memset(0), as it will be convenient for a number of call places as a replacement for calloc(). Note that ideally we should also have a calloc version that performs basic multiply overflow checks, but these are essentially used with numbers of threads times small structs so that's fine, and we already do the same everywhere in malloc() calls.	2025-08-11 19:55:30 +02:00
Olivier Houchard	b6702d5342	BUG/MEDIUM: ssl: fix build with AWS-LC AWS-LC doesn't provide SSL_in_before(), and doesn't provide an easy way to know if we already started the handshake or not. So instead, just add a new field in ssl_sock_ctx, "can_write_early_data", that will be initialized to 1, and will be set to 0 as soon as we start the handshake. This should be backported up to 2.8 with `13aa5616c9`.	2025-08-08 20:21:14 +02:00
Aurelien DARRAGON	bcb124f92a	MINOR: init: add REGISTER_POST_DEINIT_MASTER() hook Similar to REGISTER_POST_DEINIT() hook (which is invoked during deinit) but for master process only, when haproxy was started in master-worker mode. The goal is to be able to register cleanup functions that will only run for the master process right before exiting.	2025-08-07 22:27:14 +02:00
Aurelien DARRAGON	c8282f6138	MINOR: clock: add clock_get_now_offset() helper Same as clock_set_now_offset() but to retrieve the offset from external location.	2025-08-07 22:27:09 +02:00
Aurelien DARRAGON	20f9d8fa4e	MINOR: clock: add clock_set_now_offset() helper Since now_offset is a static variable and is not exposed outside of clock.c, let's add a helper so that it becomes possible to set its value from another source file.	2025-08-07 22:27:05 +02:00
Aurelien DARRAGON	4c3a36c609	MINOR: guid: add guid_count() function Returns the total number of registered GUIDs in the guid_tree.	2025-08-07 22:26:58 +02:00
Aurelien DARRAGON	7c52964591	MINOR: guid: add guid_get() helper guid_get() is a convenient function to get the actual key string associated to a given guid_node struct	2025-08-07 22:26:52 +02:00
Amaury Denoyelle	cae828cbf5	MINOR: quic: define QUIC_FL_CONN_IS_BACK flag Define a new quic_conn flag assigned if the connection is used on the backend side. This is similar to other haproxy components such as struct connection and mux elements. This flag is positioned via qc_new_conn(). Also update quic traces to mark proxy side as 'F' or 'B' suffix.	2025-08-07 16:59:59 +02:00
Amaury Denoyelle	e064e5d461	MINOR: quic: duplicate GSO unsupp status from listener to conn QUIC emission can use GSO to emit multiple datagrams with a single syscall invocation. However, this feature relies on several kernel parameters which are checked on haproxy process startup. Even if these checks report no issue, GSO may still be unavailable due to the underlying network adapter. Thus, if an EIO occurred on sendmsg() with GSO, listener is flagged to mark GSO as unsupported. This allows every other QUIC connection to share the status and avoid using GSO when using this listener. Previously, listener flag was checked for every QUIC emission. This was done using an atomic operation to prevent races. Improve this by duplicating GSO unsupported status at the connection level. This is done on qc_new_conn() and also on thread rebinding if a new listener instance is used. The main benefit from this patch is to reduce the dependency between quic_conn and listener instances.	2025-08-07 16:36:26 +02:00
Willy Tarreau	ef915e672a	MEDIUM: pools: respect pool alignment in allocations Now pool_alloc_area() takes the alignment in argument and makes use of ha_aligned_malloc() instead of malloc(). pool_alloc_area_uaf() simply applies the alignment before returning the mapped area. The pool_free() function calls ha_aligned_free() so as to permit to use a specific API for aligned alloc/free like mingw requires. Note that it's possible to see warnings about mismatching sizes during pool_free() since we know both the pool and the type. In pool_free, adding just this is sufficient to detect potential offenders: WARN_ON(__alignof__(*__ptr) > pool->align);	2025-08-06 19:20:36 +02:00
Willy Tarreau	f0d0922aa1	MINOR: pools: add macros to declare pools based on a struct type DECLARE_TYPED_POOL() and friends take a name, a type and an extra size (to be added to the size of the element), and will use this to create the pool. This has the benefit of letting the compiler automatically adapt sizeof() and alignof() based on the type declaration.	2025-08-06 19:20:36 +02:00
Willy Tarreau	6ea0e3e2f8	MINOR: pools: add macros to register aligned pools This adds an alignment argument to create_pool_from_loc() and completes the existing low-level macros with new ones that expose the alignment and the new macros permit to specify it. For now they're not used.	2025-08-06 19:20:36 +02:00
Willy Tarreau	eb075d15f6	MEDIUM: pools: add an alignment property This will be used to declare aligned pools. For now it's not used, but it's properly set from the various registrations that compose a pool, and rounded up to the next power of 2, with a minimum of sizeof(void*). The alignment is returned in the "show pools" part that indicates the entry size. E.g. "(56 bytes/8)" means 56 bytes, aligned by 8.	2025-08-06 19:20:36 +02:00
Willy Tarreau	ac23b873f5	DEBUG: pools: also retrieve file and line for direct callers of create_pool() Just like previous patch, we want to retrieve the location of the caller. For this we turn create_pool() into a macro that collects __FILE__ and __LINE__ and passes them to the now renamed function create_pool_with_loc(). Now the remaining ~30 pools also have their location stored.	2025-08-06 19:20:34 +02:00
Willy Tarreau	efa856a8b0	DEBUG: pools: store the pool registration file name and line number When pools are declared using DECLARE_POOL(), REGISTER_POOL etc, we know where they are and it's trivial to retrieve the file name and line number, so let's store them in the pool_registration, and display them when known in "show pools detailed".	2025-08-06 19:20:32 +02:00
Willy Tarreau	ff62aacb20	MEDIUM: pools: change the static pool creation to pass a registration Now we're creating statically allocated registrations instead of passing all the parameters and allocating them on the fly. Not only this is simpler to extend (we're limited in number of INITCALL args), but it also leaves all of these in the data segment where they are easier to find when debugging.	2025-08-06 19:20:30 +02:00
Willy Tarreau	f51d58bd2e	MINOR: pools: force the name at creation time to be a const. This is already the case as all names are constant so that's fine. If it would ever change, it's not very hard to just replace it in-situ via an strdup() and set a flag to mention that it's dynamically allocated. We just don't need this right now. One immediately visible effect is in "show pools detailed" where the names are no longer truncated.	2025-08-06 19:20:28 +02:00
Willy Tarreau	ee5bc28865	MINOR: pools: add a new flag to declare static registrations We must not free these ones when destroying a pool, so let's dedicate them a flag to mention that they are static. For now we don't have any such.	2025-08-06 19:20:26 +02:00
Willy Tarreau	18505f9718	MINOR: pools: support creating a pool from a pool registration We've recently introduced pool registrations to be able to enumerate all pool creation requests with their respective parameters, but till now they were only used for debugging ("show pools detailed"). Let's go a step further and split create_pool() in two: - the first half only allocates and sets the pool registration - the second half creates the pool from the registration This is what this patch does. This now opens the ability to pre-create registrations and create pools directly from there.	2025-08-06 19:20:22 +02:00
Willy Tarreau	325d1bdcca	MINOR: implement ha_aligned_alloc() to return aligned memory areas We have two versions, _safe() which verifies and adjusts alignment, and the regular one which trusts the caller. There's also a dedicated ha_aligned_free() due to mingw. The currently detected OSes are mingw, unixes older than POSIX 200112 which require memalign(), and those post 200112 which will use posix_memalign(). Solaris 10 reports 200112 (probably through _GNU_SOURCE since it does not do it by default), and Solaris 11 still supports memalign() so for all Solaris we use memalign(). The memstats wrappers are also implemented, and have the exported names. This was the opportunity for providing a separate free call that lets the caller specify the size (e.g. for use with pools). For now this code is not used.	2025-08-06 19:19:27 +02:00
Willy Tarreau	e921fe894f	BUILD: compat: always set _POSIX_VERSION to ease comparisons Sometimes we need to compare it to known versions, let's make sure it's always defined. We set it to zero if undefined so that it cannot match any comparison.	2025-08-06 19:19:27 +02:00
Willy Tarreau	2ce0c63206	BUILD: quic: use _MAX() to avoid build issues in pools declarations With the upcoming pool declaration, we're filling a struct's fields, while older versions were relying on initcalls which could be turned to function declarations. Thus the compound expressions that were usable there are not necessarily anymore, as witnessed here with gcc-5.5 on solaris 10: In file included from include/haproxy/quic_tx.h:26:0, from src/quic_tx.c:15: include/haproxy/compat.h:106:19: error: braced-group within expression allowed only inside a function #define MAX(a, b) ({ \ ^ include/haproxy/pool.h:41:11: note: in definition of macro '__REGISTER_POOL' .size = _size, \ ^ ... include/haproxy/quic_tx-t.h:6:29: note: in expansion of macro 'MAX' #define QUIC_MAX_CC_BUFSIZE MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU) Let's make the macro use _MAX() instead of MAX() since it relies on pure constants.	2025-08-06 19:19:11 +02:00
Willy Tarreau	cf8871ae40	BUILD: compat: provide relaxed versions of the MIN/MAX macros In 3.0 the MIN/MAX macros were converted to compound expressions with commit `0999e3d959` ("CLEANUP: compat: make the MIN/MAX macros more reliable"). However with older compilers these are not supported out of code blocks (e.g. to initialize variables or struct members). This is the case on Solaris 10 with gcc-5.5 when QUIC doesn't compile anymore with the future pool registration: In file included from include/haproxy/quic_tx.h:26:0, from src/quic_tx.c:15: include/haproxy/compat.h:106:19: error: braced-group within expression allowed only inside a function #define MAX(a, b) ({ \ ^ include/haproxy/pool.h:41:11: note: in definition of macro '__REGISTER_POOL' .size = _size, \ ^ ... include/haproxy/quic_tx-t.h:6:29: note: in expansion of macro 'MAX' #define QUIC_MAX_CC_BUFSIZE MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU) Let's provide the old relaxed versions as _MIN/_MAX for use with constants like such cases where it's certain that there is no risk. A previous attempt using __builtin_constant_p() to switch between the variants did not work, and it's really not worth the hassle of going this far.	2025-08-06 19:18:42 +02:00
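The distinction between the two macro families can be sketched as follows. This is an illustration only: the macro names `EX_SAFE_MAX`/`EX_RELAXED_MAX`, the structure, and the MTU values are placeholders, and the real definitions in include/haproxy/compat.h may differ. The compound form relies on the GNU C statement-expression extension (gcc, clang).

```c
#include <assert.h>
#include <stddef.h>

/* Compound form (GNU statement expression): each argument is evaluated
 * exactly once, but the construct is only accepted inside a function body.
 */
#define EX_SAFE_MAX(a, b) ({          \
	__typeof__(a) _a = (a);       \
	__typeof__(b) _b = (b);       \
	_a > _b ? _a : _b;            \
})

/* Relaxed form: a plain expression, usable in static initializers. One of
 * the arguments is evaluated twice, so it must only receive constants or
 * side-effect-free expressions.
 */
#define EX_RELAXED_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Placeholder sizes for the example. */
#define EX_MTU_A 1252
#define EX_MTU_B 1232

struct ex_pool_desc {
	const char *name;
	size_t size;
};

/* File-scope initializer: needs a constant expression, so only the relaxed
 * macro can be used here.
 */
static const struct ex_pool_desc ex_cc_buf = {
	.name = "ex_cc_buf",
	.size = EX_RELAXED_MAX(EX_MTU_A, EX_MTU_B),
};

/* Inside a function the compound form is fine and protects against double
 * evaluation of the post-increment.
 */
static int ex_max_once(int *counter, int floor)
{
	return EX_SAFE_MAX((*counter)++, floor);
}
```

Replacing `EX_RELAXED_MAX` with `EX_SAFE_MAX` in the `ex_cc_buf` initializer reproduces the "braced-group within expression allowed only inside a function" error quoted in the commit message.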
Aurelien DARRAGON	aeff2a3b2a	BUG/MEDIUM: hlua_fcn: ensure systematic watcher cleanup for server list iterator In `358166a` ("BUG/MINOR: hlua_fcn: restore server pairs iterator pointer consistency"), I wrongly assumed that because the iterator was a temporary object, no specific cleanup was needed for the watcher. In fact watcher_detach() is not only relevant for the watcher itself, but especially for its parent list to remove the current watcher from it. As iterators are temporary objects, failing to remove their watchers from the server watcher list causes the server watcher list to be corrupted. On a normal iteration sequence, the last watcher_next() receives NULL as target so it successfully detaches the last watcher from the list. However the corner case here is with interrupted iterators: users are free to break away from the iteration loop when a specific condition is met, for instance from the Lua script. When this happens, hlua_listable_servers_pairs_iterator() doesn't get a chance to detach the last iterator. Also, Lua doesn't tell us that the loop was interrupted, so to fix the issue we rely on the garbage collector to force a last detach right before the object is freed. To achieve that, watcher_detach() was slightly modified so that it becomes possible to call it without knowing if the watcher is already detached or not: if watcher_detach() is called on a detached watcher, the function does nothing. This saves the caller from having to track the watcher state and makes the API a little more convenient to use. We now systematically call watcher_detach() for server iterators right before they are garbage collected. This was first reported in GH #3055. It can be observed when the server list is browsed more than once, when it was already browsed from Lua for a given proxy and the iteration was interrupted before the end. 
As the watcher list is corrupted, the common symptom is watcher_attach() or watcher_next() not ending due to the internal mt_list call looping forever. Thanks to GH users @sabretus and @sabretus for their precious help. It should be backported everywhere `358166a` was.	2025-08-05 13:06:46 +02:00
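The idempotent detach described in this commit can be sketched with a plain doubly linked list. This is a simplified, single-threaded illustration with hypothetical names (`ex_watcher`, `ex_list`); HAProxy's watchers are built on mt_list and differ in the details.

```c
#include <assert.h>
#include <stddef.h>

struct ex_watcher {
	struct ex_watcher *prev, *next;
	int attached; /* non-zero while linked into a list */
};

struct ex_list {
	struct ex_watcher head; /* sentinel element */
};

static void ex_list_init(struct ex_list *l)
{
	l->head.prev = l->head.next = &l->head;
	l->head.attached = 0;
}

static void ex_watcher_init(struct ex_watcher *w)
{
	w->prev = w->next = w;
	w->attached = 0;
}

static void ex_watcher_attach(struct ex_list *l, struct ex_watcher *w)
{
	w->next = l->head.next;
	w->prev = &l->head;
	l->head.next->prev = w;
	l->head.next = w;
	w->attached = 1;
}

/* Safe to call whether or not <w> is attached: a second call is a no-op, so
 * a garbage-collection hook can call it unconditionally.
 */
static void ex_watcher_detach(struct ex_watcher *w)
{
	if (!w->attached)
		return;
	w->prev->next = w->next;
	w->next->prev = w->prev;
	w->prev = w->next = w;
	w->attached = 0;
}

static int ex_list_is_empty(const struct ex_list *l)
{
	return l->head.next == &l->head;
}
```

With this shape, both the normal end of iteration and a finalizer that runs after an interrupted loop can call the detach function without tracking which one ran first.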
William Lallemand	9ee14ed2d9	MEDIUM: acme: allow to wait and restart the task for DNS-01 DNS-01 needs an external process which would register a TXT record on a DNS provider, using a REST API or something else. To achieve this, the process should read the dpapi sink and wait for events. With the DNS-01 challenge, HAProxy will put the task to sleep before asking the ACME server to achieve the challenge. The task then needs to be woken up, using the command implemented by this patch. This patch implements the "acme challenge_ready" command which should be used by the agent once the challenge was configured in order to wake the task up. Example: echo "@1 acme challenge_ready foobar.pem.rsa domain kikyo" | socat /tmp/master.sock -	2025-08-01 18:07:12 +02:00
William Lallemand	365a69648c	MINOR: acme: emit a log for DNS-01 challenge response This commit emits a log that outputs the TXT entry to create in case of DNS-01. This is useful when you want to update your TXT entry manually. Example: acme: foobar.pem.rsa: DNS-01 requires to set the "acme-challenge.example.com" TXT record to "7L050ytWm6ityJqolX-PzBPR0LndHV8bkZx3Zsb-FMg"	2025-08-01 16:12:27 +02:00
William Lallemand	09275fd549	BUILD: acme: avoid declaring TRACE_SOURCE in acme-t.h Files ending with '-t.h' are supposed to be used for structure definitions and could be included in the same file to check API definitions. This patch removes TRACE_SOURCE from acme-t.h to avoid conflicts with other TRACE_SOURCE definitions.	2025-07-31 16:03:28 +02:00
Amaury Denoyelle	2ecc5290f2	MINOR: session: streamline session_check_idle_conn() usage session_check_idle_conn() is called by muxes when a connection becomes idle. It ensures that the session idle limit is not yet reached. Else, the connection is removed from the session and it can be freed. Prior to this patch, session_check_idle_conn() was compatible with a NULL session argument. In this case, it would return true, considering that no limit was reached and the connection not removed. However, this renders the function error-prone and subject to future bugs. This patch streamlines it by ensuring it is never called with a NULL argument. Thus it can now only return true if the connection is kept in the session or false if it was removed, as first intended.	2025-07-30 16:13:30 +02:00
Amaury Denoyelle	dd9645d6b9	MINOR: session: do not release conn in session_check_idle_conn() session_check_idle_conn() is called to flag a connection already inserted in a session list as idle. If the session limit on the number of idle connections (max-session-srv-conns) is exceeded, the connection is removed from the session list. In addition to the connection removal, session_check_idle_conn() directly calls MUX destroy callback on the connection. This means the connection is freed by the function itself and should not be used by the caller anymore. This is not practical when an alternative connection closure method should be used, such as a graceful shutdown with QUIC. As such, remove the MUX destroy invocation: it is now the responsibility of the caller to either close or immediately release the connection.	2025-07-30 11:43:41 +02:00
Amaury Denoyelle	57e9425dbc	MINOR: session: strengthen idle conn limit check Add a BUG_ON() on session_check_idle_conn() to ensure the connection is not already flagged as CO_FL_SESS_IDLE. This checks that this function is only called one time per connection transition from active to idle. This is necessary to ensure that session idle counter is only incremented one time per connection.	2025-07-30 11:40:16 +02:00
Amaury Denoyelle	ec1ab8d171	MINOR: session: remove redundant target argument from session_add_conn() session_add_conn() uses three arguments: connection and session instances, plus a void pointer labelled as target. Typically, it represents the server, but can also be a backend instance (for example on dispatch). In fact, this argument is redundant as <target> is already a member of the connection. This commit simplifies session_add_conn() by removing it. A BUG_ON() on target is extended to ensure it is never NULL.	2025-07-30 11:39:57 +02:00
Amaury Denoyelle	668c2cfb09	MINOR: session: strengthen connection attach to session This commit is the first one of a series to refactor insertion of backend private connections into the session list. session_add_conn() is used to attach a connection into a session list. Previously, this function would report an error if the connection specified was already attached to another session. However, this case currently never happens and thus can be considered as buggy. Remove this check and replace it with a BUG_ON(). This ensures that session insertion remains consistent. The same check is also transformed in session_check_idle_conn().	2025-07-30 11:39:26 +02:00
Aurelien DARRAGON	14966c856b	MINOR: clock: make global_now_ns a pointer as well Similar to previous commit but for global_now_ns	2025-07-29 18:04:15 +02:00
Aurelien DARRAGON	4a20b3835a	MINOR: clock: make global_now_ms a pointer This is preparation work for shared counters between co-processes. As co-processes will need to share a common date. global_now_ms will be used for that as it will point to the shm when sharing is enabled. Thus in this patch we turn global_now_ms into a pointer (and adjust the places where it is written to and read from, hopefully atomic operations through pointer are already used so the change is trivial) For now global_now_ms points to process-local _global_now_ms which is a fallback for when sharing through the shm is not enabled.	2025-07-29 18:04:14 +02:00
Aurelien DARRAGON	713ebd2750	CLEANUP: counters: rename counters_be_shared_init to counters_be_shared_prepare `75e480d10` ("MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct") took care of renaming counters_fe_shared_init() but we forgot counters_be_shared_init(). Let's fix that for consistency	2025-07-29 18:00:13 +02:00
William Lallemand	83a335f925	MINOR: acme: implement traces Implement traces for the ACME protocol. -dt acme:data:complete will dump every input and output buffers, including decoded buffers before being converted to JWS. It will also dump certificates in the traces. -dt acme:user:complete will only dump the state of the task handler.	2025-07-29 17:25:10 +02:00
Aurelien DARRAGON	c24de077bd	OPTIM: stats: store fast sharded counters pointers at session and stream level Following commit `75e480d10` ("MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct"), in order to minimize the impact of the recent sharded counters work, we try to push things a bit further in this patch by storing and using "fast" pointers at the session and stream levels when available to avoid costly indirections and systematic "tgid" resolution (which can not be cached by the CPU due to its THREAD-local nature). Indeed, we know that a session/stream is tied to a given CPU, thanks to this we know that the tgid for a given session/stream will never change. Given that, we are able to store sharded frontend and listener counters pointers at the session level (namely sess->fe_tgcounters and sess->li_tgcounters), and once the backend and the server are selected, we are also able to store backend and server sharded counters pointers at the stream level (namely s->be_tgcounters and s->sv_tgcounters). Everywhere we rely on these counters and the stream or session context is available, we use the fast pointers instead of the indirect pointers path to make the pointer resolution a bit faster. This optimization proved to bring a few percent back, and together with the previous `75e480d10` commit we have now fixed the performance regression (we are back on par with 3.2 stats performance)	2025-07-25 18:24:23 +02:00
Aurelien DARRAGON	cf8ba60c88	CLEANUP: peers: remove unused peer_session_target() Since commit `7293eb68` ("MEDIUM: peers: use server as stream target") peer session target always point to server in order to benefit from existing server transport options. Thanks to that, it is no longer necessary to have peer_session_target() helper function, because all it does is return the pointer to the server object. Let's get rid of that	2025-07-25 18:24:17 +02:00
Ben Kallus	1e48ec7f6c	CLEANUP: include: replace hand-rolled offsetof to avoid UB The C standard specifies that it's undefined behavior to dereference NULL (even if you use & right after). The hand-rolled offsetof idiom &(((s)NULL)->f) is thus technically undefined. This clutters the output of UBSan and is simple to fix: just use the real offsetof when it's available. Note that there's no clear statement about this point in the spec, only several points which together converge to this: - From N3220, 6.5.3.4: A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue. - From N3220, 6.3.2.1: An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined. - From N3220, 6.5.4.4 p3: The unary & operator yields the address of its operand. If the operand has type "type", the result has type "pointer to type". If the operand is the result of a unary operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. => In short, this is saying that C guarantees these identities: 1. &(p) is equivalent to p 2. &(p[n]) is equivalent to p + n As a consequence, &(p) doesn't result in the evaluation of *p, only the evaluation of p (and similar for []). There is no corresponding special carve-out for ->. 
See also: https://pvs-studio.com/en/blog/posts/cpp/0306/ After this patch, HAProxy can run without crashing after building w/ clang-19 -fsanitize=undefined -fno-sanitize=function,alignment	2025-07-25 17:54:32 +02:00
Ben Kallus	d3b46cca7b	CLEANUP: compiler: prefer char * over void * for pointer arithmetic This patch changes two instances of pointer arithmetic on void * to use char * instead, to avoid UB. This is essentially to please UB analyzers, though.	2025-07-25 17:54:32 +02:00
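The two cleanups above fit in one short sketch: the standard `offsetof()` replaces the `&(((s)NULL)->f)` idiom, and the pointer arithmetic goes through `char *` rather than `void *`. The structure and the `EX_CONTAINER_OF` macro are hypothetical names for this example, not HAProxy's own definitions.

```c
#include <assert.h>
#include <stddef.h>

struct ex_item {
	char tag;
	int value;
	void *next;
};

/* Recover the enclosing structure from a pointer to one of its members.
 * offsetof() is provided by <stddef.h>, so no NULL pointer is dereferenced,
 * and the subtraction is done on a char pointer because arithmetic on
 * void pointers is a compiler extension, not standard C.
 */
#define EX_CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct ex_item *ex_item_from_value(int *v)
{
	return EX_CONTAINER_OF(v, struct ex_item, value);
}
```

Given a `struct ex_item it`, calling `ex_item_from_value(&it.value)` yields `&it` again, and the code stays quiet under `-fsanitize=undefined`.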
Aurelien DARRAGON	75e480d107	MEDIUM: stats: avoid 1 indirection by storing the shared stats directly in counters struct Between 3.2 and 3.3-dev we observed a noticeable performance regression due to stats handling. After bisecting, Willy found out that recent work to split stats computing across multiple thread groups (stats sharding) was responsible for that performance regression. We're looking at roughly 20% performance loss. More precisely, it is the added indirections, multiplied by the number of statistics that are updated for each request, which in the end causes a significant amount of time being spent resolving pointers. We noticed that the fe_counters_shared and be_counters_shared structures which are currently allocated in dedicated memory since `a0dcab5c` ("MAJOR: counters: add shared counters base infrastructure") are no longer huge since `16eb0fab31` ("MAJOR: counters: dispatch counters over thread groups") because they now essentially hold flags plus the per-thread group id pointer mapping, not the counters themselves. As such we decided to try merging fe_counters_shared and be_counters_shared in their parent structures. The cost is a slight memory overhead for the parent structure, but it allows to get rid of one pointer indirection. This patch alone yields visible performance gains and almost restores 3.2 stats performance. counters_fe_shared_get() was renamed to counters_fe_shared_prepare() and now returns either failure or success instead of a pointer because we don't need to retrieve a shared pointer anymore, the function takes care of initializing the existing pointer.	2025-07-25 16:46:10 +02:00
Christopher Faulet	b8d5307bd9	MEDIUM: applet: Emit a warning when a legacy applet is spawned To motivate developers to support the new applets API, a warning is now emitted when a legacy applet is spawned. To not flood users, this warning is only emitted once per legacy applet. To do so, the applet flag APPLET_FL_WARNED was added. It is set when the warning is emitted. Note that test and set on this flag are not performed via atomic operations. So it is possible to have more than one warning for a given applet if it is spawned at the same time on several threads. At worst, there is one warning per thread.	2025-07-25 15:53:33 +02:00
Christopher Faulet	337768656b	MINOR: applet: Add support for flags on applets with a flag about the new API A new field was added in the applet structure to be able to set flags on the applets The first one is related to the new API. APPLET_FL_NEW_API is set for applets based on the new API. It was set on all HAProxy's applets.	2025-07-25 15:44:02 +02:00
Christopher Faulet	1f9a1cbefc	MINOR: applet: Improve applet API to take care of inbuf/outbuf alloc failures applet_get_inbuf() and applet_get_outbuf() functions were not testing if the buffers were available. So, the caller had to check them before calling one of these functions. It is not really handy. So now, these functions take care to have a fully usable buffer before returning. Otherwise NULL is returned.	2025-07-24 12:13:41 +02:00
Christopher Faulet	44aae94ab9	MINOR: applet: Add HTX versions for applet_input_data() and applet_output_room() It will be useful for HTX applets because available data in the input buffer and available space in the output buffer are computed from the HTX message and not the buffer itself. So now, applet_htx_input_data() and applet_htx_output_room() functions can be used.	2025-07-24 12:13:41 +02:00
Christopher Faulet	d9855102cf	BUG/MEDIUM: Remove sync sends from streams to applets When the applet API was reviewed to use dedicated buffers, the support for sends from the streams to applets was added. Unfortunately, it was not a good idea because this way it is possible to deliver data to an applet and release it just after, truncating data. Indeed, the release stage for applets is related to the stream release itself. However, unlike the multiplexers, the applets cannot survive a stream for now. So, for now, the sync sends from the streams are removed for applets, waiting for a better way to handle the applets release stage. Note that this only concerns applets using their own buffers. As of now, the bug is harmless because all refactored applets are on server side and consume data first. But this will be an issue with the HTTP client. This patch should be backported as far as 3.0 after a period of observation.	2025-07-24 12:13:41 +02:00
Christopher Faulet	574d0d8211	BUG/MINOR: applet: Fix applet_getword() to not return one extra byte applet_getword() function is returning one extra byte when a string is returned because the "ret" variable is not reset before the loop on the data. The patch also fixes applet_getline(). It is a 3.3-specific issue. No need to backport.	2025-07-24 12:13:41 +02:00
Christopher Faulet	41a40680ce	BUG/MEDIUM: stconn: Fix conditions to know an applet can get data from stream sc_is_send_allowed() function is used to know if an applet is able to receive data from the stream. But this function was designed for applets using the channels buffer. It is not adapted to applets using their own buffers. When the SE_FL_WAIT_DATA flag is set, it means the applet is waiting for more data and should not be woken up without new data. For applets using channels buffer, just testing the flag is enough because process_stream() will remove it when more data are available. For applets using their own buffers, it is more complicated. Some data may be blocked in the output channel buffer. In that case, and when the applet input buffer can receive data, the applet can be woken up. This patch must be backported as far as 3.0 after a period of observation.	2025-07-24 12:13:41 +02:00
Christopher Faulet	0d371d2729	BUG/MEDIUM: applet: State inbuf is no longer full if input data are skipped When data are skipped from the input buffer of an applet, we must take care to notify the input buffer is no longer full. Otherwise, this could prevent the stream to push data to the applet. It is 3.3-specific. No backport needed.	2025-07-24 12:13:41 +02:00
Ilia Shipitsin	a2267fafcf	CLEANUP: acme: fix wrong spelling of "resources" "ressources" was used as a variable name, let's use English variant to make spell check happier	2025-07-24 08:11:42 +02:00
Amaury Denoyelle	3bf37596ba	MINOR: mux-quic: store session in QCS instance Add a new <sess> member into QCS structure. It is used to store the parent session of the stream on attach operation. This is only done for backend side. This new member will become necessary when connection reuse is implemented. <owner> member of connection is not suitable as it could be set to NULL, notably after a session_add_conn() failure. Also, a single BE conn can be shared among different session instances, in particular when using aggressive/always reuse mode. Thus it is necessary to link each QCS instance with its session.	2025-07-23 15:42:37 +02:00
Remi Tricot-Le Breton	8f2b787241	MINOR: ssl: Add curves in ssl traces Dump the ClientHello curves in the SSL traces.	2025-07-21 16:44:50 +02:00
Remi Tricot-Le Breton	d799a1b3b2	MINOR: ssl: Add curve id to curve name table and mapping functions The SSL libraries like OpenSSL for instance do not seem to actually provide a public mapping between IANA defined curve IDs and curve names, or even a mapping between curve IDs and internal NIDs. This new table regroups all those information in a single table so that we can convert curve names (be it SECG or NIST format) to curve IDs or NIDs. The previously existing 'curves2nid' function now uses the new table, and a new 'curveid2str' one is added.	2025-07-21 16:44:50 +02:00
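A single table serving both lookup directions might look like the sketch below. The structure layout and the function names `ex_curveid2str()` and `ex_curve2id()` are assumptions for illustration (the real table also carries OpenSSL NIDs, omitted here to keep the example free of library dependencies). The numeric IDs are the IANA "TLS Supported Groups" values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

struct ex_curve {
	unsigned short id;  /* IANA TLS Supported Groups value */
	const char *secg;   /* SECG-style name */
	const char *nist;   /* NIST-style name, NULL when there is none */
};

static const struct ex_curve ex_curves[] = {
	{ 23, "secp256r1", "P-256" },
	{ 24, "secp384r1", "P-384" },
	{ 25, "secp521r1", "P-521" },
	{ 29, "x25519",    NULL    },
	{ 30, "x448",      NULL    },
};

#define EX_NB_CURVES (sizeof(ex_curves) / sizeof(ex_curves[0]))

/* Returns the SECG name for a curve ID, or NULL if the ID is unknown. */
static const char *ex_curveid2str(unsigned short id)
{
	size_t i;

	for (i = 0; i < EX_NB_CURVES; i++)
		if (ex_curves[i].id == id)
			return ex_curves[i].secg;
	return NULL;
}

/* Returns the curve ID for a SECG or NIST name (case-insensitive), or -1. */
static int ex_curve2id(const char *name)
{
	size_t i;

	for (i = 0; i < EX_NB_CURVES; i++) {
		if (strcasecmp(name, ex_curves[i].secg) == 0)
			return ex_curves[i].id;
		if (ex_curves[i].nist && strcasecmp(name, ex_curves[i].nist) == 0)
			return ex_curves[i].id;
	}
	return -1;
}
```

A linear scan is enough for a table of this size; traces that decode the ClientHello only need the ID-to-name direction.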
Remi Tricot-Le Breton	f00d9bf12d	MINOR: ssl: Add ciphers in ssl traces Decode the contents of the ClientHello ciphers extension and dump a human readable list in the ssl traces.	2025-07-21 16:44:50 +02:00
Frederic Lecaille	14d0f74052	MINOR: quic: Remove pool_head_quic_be_cc_buf pool This patch impacts the QUIC frontends. It reverts this patch MINOR: quic-be: add a "CC connection" backend TX buffer pool which added the new <pool_head_quic_be_cc_buf> pool to allocate CC (connection closed state) TX buffers with a bigger object size than the one for <pool_head_quic_cc_buf>. Indeed the QUIC backends must be able to send at least 1200 bytes Initial packets. From now on, both the QUIC frontends and backends use the same pool with MAX(QUIC_INITIAL_IPV6_MTU, QUIC_INITIAL_IPV4_MTU) (1252 bytes) as object size.	2025-07-17 19:33:21 +02:00
Valentine Krasnobaeva	9e11c852fe	MINOR: cpu-topo: write thread-cpu bindings into trash buffer Write thread-cpu bindings and cluster summary into provided trash buffer. Like this we can call this function in any place, when this info is needed.	2025-07-17 19:07:58 +02:00
Valentine Krasnobaeva	2405283230	MINOR: cpu-topo: split cpu_dump_topology() to show its summary in show dev cpu_dump_topology() prints details about each enabled CPU and a summary with clusters info and thread-cpu bindings. The latter is often useful for debugging and we want to add it in the 'show dev' output. So, let's split cpu_dump_topology() in two parts: cpu_topo_debug() to print the details about each enabled CPU; and cpu_topo_dump_summary() to print only the summary. In the next commit we will modify cpu_topo_dump_summary() to write into a local trash buffer so that it can easily be called from debug_parse_cli_show_dev().	2025-07-17 19:07:46 +02:00
Willy Tarreau	b6d0ecd258	DOC: connection: explain the rules for idle/safe/avail connections It's super difficult to find the rules that operate idle conns depending on their idle/safe/avail/private status. Some are in lists, others not. Some are in trees, others not. Some have a flag set, others not. This documents the rules before the definitions in connection-t.h. It could even be backported to help during backport sessions.	2025-07-16 18:53:57 +02:00
Frederic Lecaille	838024e07e	MINOR: quic: Get rid of qc_is_listener() Replace all calls to qc_is_listener() (resp. !qc_is_listener()) by calls to objt_listener() (resp. objt_server()). Remove the qc_is_listener() implementation and QUIC_FL_CONN_LISTENER, the flag it relied on.	2025-07-16 16:42:21 +02:00
Christopher Faulet	4f7c26cbb3	BUG/MINOR: applet: Don't trigger BUG_ON if the tid is not on appctx init When an appctx is initialized, there is a BUG_ON() to be sure the appctx is really initialized on the right thread to avoid bugs on the thread affinity. However, it is possible to not choose the thread when the appctx is created and let it start on any thread. In that case, the thread affinity is set when the appctx is initialized. So, we must take care to not trigger the BUG_ON() in that case. For now, we never hit the bug because the thread affinity is always set during the appctx creation. This patch must be backported as far as 2.8.	2025-07-16 13:47:33 +02:00
Amaury Denoyelle	63586a8ab4	BUG/MINOR: h3: properly handle interim response on BE side On backend side, H3 layer is responsible to decode a HTTP/3 response into an HTX message. Multiple responses may be received on a single stream with interim status codes prior to the final one. h3_resp_headers_to_htx() is the function used solely on backend side responsible for H3 response to HTX transcoding. This patch extends it to be able to properly support interim responses. When such a response is received, the new flag H3_SF_RECV_INTERIM is set. This is converted to QMUX qcs flag QC_SF_EOI_SUSPENDED. The objective of this latter flag is to prevent stream EOI from being reported during stream rcv_buf callback, even if HTX message contains EOM and is empty. QC_SF_EOI_SUSPENDED will be cleared when the final response is finally converted, which unblocks stream EOI notification for next rcv_buf invocations. Note however that HTX EOM is untouched: it is always set for both interim and final response reception. As a minor adjustment, HTX_SL_F_BODYLESS is always set for interim responses. Contrary to frontend interim response handling, a flag is necessary on QMUX layer. This is because H3 to HTX transcoding and rcv_buf callback are two distinct operations, called under different contexts (MUX vs stream tasklet). Also note that H3 layer has two distinct flags for interim response handling, one only used as a server (FE side) and the other as a client (BE side). It was preferred to use two distinct flags, which is considered less error-prone, contrary to a single unified flag which would require to always check the proxy side to ensure it is relevant or not. No need to backport.	2025-07-15 18:39:23 +02:00
Amaury Denoyelle	f349df44b4	MINOR: qmux: change API for snd_buf FIN transmission Previous patches have fixed interim response encoding via h3_resp_headers_send(). However, it is still necessary to adjust h3 layer state-machine so that several successive HTTP responses are accepted for a single stream. Prior to this, QMUX was responsible to decree that the final HTX message was encoded so that FIN stream can be emitted. However, with interim response, MUX is in fact unable to properly determine this. As such, this is the responsibility of the application protocol layer. To reflect this, app_ops snd_buf callback is modified so that a new output argument <fin> is added to it. Note that for now this commit does not bring any functional change. However, it will be necessary for the following patch. As such, it should be backported prior to it to every version as necessary.	2025-07-15 18:39:23 +02:00
Willy Tarreau	4ac28f07d0	MEDIUM: proxy: take the defsrv out of the struct proxy The server struct has gone huge over time (~3.8kB), and having a copy of it in the defsrv section of the struct proxy costs a lot of RAM, that is not needed anymore at run time. This patch replaces this struct with a dynamically allocated one. The field is allocated and initialized during alloc_new_proxy() and is freed when the proxy is destroyed for now. But the goal will be to support freeing it after parsing the section.	2025-07-15 10:34:18 +02:00
Willy Tarreau	616c10f608	CLEANUP: server: add server_find_by_addr() Server lookup by address requires locking and manipulation of the tree from user code. Let's provide server_find_by_addr() which does that for us.	2025-07-15 10:30:28 +02:00
Willy Tarreau	fda04994d9	CLEANUP: server: simplify server_find_by_id() At a few places we're seeing some open-coding of the same function, likely because it looks overkill for what it's supposed to do, due to extraneous tests that are not needed (e.g. check of the backend's PR_CAP_BE etc). Let's just remove all these superfluous tests and inline it so that it feels more suitable for use everywhere it's needed.	2025-07-15 10:30:28 +02:00
Willy Tarreau	61acd15ea8	CLEANUP: server: rename findserver() to server_find_by_name() Now it's more logical and matches what is done in the rest of these functions. server_find() now relies on it.	2025-07-15 10:30:28 +02:00
Willy Tarreau	6ad9285796	CLEANUP: server: rename server_find_by_name() to server_find() This function doesn't just look at the name but also the ID when the argument starts with a '#'. So the name is not correct and explains why this function is not always used when the name only is needed, and why the list-based findserver() is used instead. So let's just call the function "server_find()", and rename its generation-id based cousin "server_find_unique()".	2025-07-15 10:30:28 +02:00
Willy Tarreau	5e78ab33cd	MINOR: server: use the tree to look up the server name in findserver() Let's just use the tree-based lookup instead of walking through the list. This function is used to find duplicates in "track" statements and a few such places, so it's important not to waste too much time on large setups.	2025-07-15 10:30:27 +02:00
Willy Tarreau	12a6a3bb3f	REORG: server: move findserver() from proxy.c to server.c The reason this function was overlooked is that it had mostly equivalent ones in server.c, let's move them together.	2025-07-15 10:30:27 +02:00
Valentine Krasnobaeva	0c63883be1	MINOR: debug: add distro name and version in postmortem Since 2012, systemd compliant distributions contain /etc/os-release file. This file has some standardized format, see details at https://www.freedesktop.org/software/systemd/man/latest/os-release.html. Let's read it in feed_post_mortem_linux() to gather more info about the distribution. (cherry picked from commit f1594c41368baf8f60737b229e4359fa7e1289a9) Signed-off-by: Willy Tarreau <w@1wt.eu>	2025-07-11 11:48:19 +02:00
Ilia Shipitsin	0ee3d739b8	CLEANUP: assorted typo fixes in the code, commits and doc Corrected various spelling and phrasing errors to improve clarity and consistency.	2025-07-10 19:49:48 +02:00
Christopher Faulet	187ae28cf4	MINOR: h1-htx: Add function to format an HTX message in its H1 representation The function h1_format_htx_msg() can now be used to convert a valid HTX message in its H1 representation. No validity test is performed, the HTX message must be valid. Only trailers are silently ignored if the message is not chunked. In addition, the destination buffer must be empty. 1XX interim responses should be supported. But again, there is no validity tests.	2025-07-10 10:29:49 +02:00
Christopher Faulet	25b0625d5c	BUG/MEDIUM: http-client: Drain the request if an early response is received When a large request is sent, it is possible to have a response before the end of the request. It is valid from HTTP perspective but it is an issue with the current design of the http-client. Indeed, the request and the response are handled sequentially. So the response will be blocked, waiting for the end of the request. Most of the time, it is not an issue, except when the request transfer is blocked. In that case, the applet is blocked. With the current API, it is not possible to handle an early response and continue the request transfer. So, this case cannot be handled. In that case, it seems reasonable to drain the request if a response is received. This way, the request transfer, from the caller point of view, is never blocked and the response can be properly processed. To do so, the action flag HTTPCLIENT_FA_DRAIN_REQ is added to the http-client. When it is set, the request payload is just dropped. In that case, we take care to not report the end of input to properly report the request was truncated, especially in logs. It is only an issue with large POSTs, when the payload is streamed. This patch must be backported as far as 2.6.	2025-07-09 16:27:24 +02:00
Frederic Lecaille	45ac235baa	BUG/MEDIUM: quic: Crash after QUIC server callbacks restoration (OpenSSL 3.5) Revert this patch which is no more useful since OpenSSL 3.5.1 to remove the QUIC server callback restoration after SSL context switch: MINOR: quic: OpenSSL 3.5 internal QUIC custom extension for transport parameters reset It was required for 3.5.0. That said, there was no CI for OpenSSL 3.5 at the date of this commit. The CI recently revealed that the QUIC server side could crash during QUIC reg tests just after having restored the callbacks as implemented by the commit above. Also revert this commit which is no more useful because it arrived with the commit above: BUG/MEDIUM: quic: SSL/TCP handshake failures with OpenSSL 3. Must be backported to 3.2.	2025-07-09 16:01:02 +02:00
Frederic Lecaille	c01eb1040e	MINOR: quic: Prevent QUIC build with OpenSSL 3.5 new QUIC API version < 3.5.1 The QUIC listener part was impacted by the 3.5.0 OpenSSL new QUIC API with several issues which have been fixed by 3.5.1. Add a #error to prevent such OpenSSL 3.5 new QUIC API use with version below 3.5.1. Must be backported to 3.2.	2025-07-09 16:01:02 +02:00
Willy Tarreau	95cf518bfa	BUG/MINOR: resolvers: don't lower the case of binary DNS format The server's "hostname_dn" is in Domain Name format, not a pure string, as converted by resolv_str_to_dn_label(). It is made of lower-case string components delimited by binary lengths, e.g. <0x03>www<0x07>haproxy<0x03>org. As such it must not be lowercased again in srv_state_srv_update(), because 1) it's useless on the name components since already done, and 2) because it would replace component lengths 97 and above by 32-char shorter ones. Granted, not many domain names have that large components so the risk is very low but the operation is always wrong anyway. This was brought in 2.5 by commit `3406766d57` ("MEDIUM: resolvers: add a ref between servers and srv request or used SRV record"). In the same vein, let's fix the confusing strcasecmp() that are applied to this binary format, and use memcmp() instead. Here there's basically no risk to incorrectly match the wrong record, but that test alone is confusing enough to provoke the existence of the bug above. Finally let's update the component for that field to mention that it's in this format and already lower cased. Better not backport this, the risk of facing this bug is almost zero, and every time we touch such files something breaks for bad reasons.	2025-07-08 07:54:45 +02:00
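The Domain Name format described in the commit above can be illustrated with a minimal C sketch. The helpers `str_to_dn_label()` and `dn_label_equal()` below are hypothetical names written for this illustration, not haproxy's resolv_str_to_dn_label(); they only show why a length-prefixed, already lower-cased buffer is best compared with memcmp() rather than with a string function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts "www.haproxy.org" into the binary form
 * <3>www<7>haproxy<3>org, lower-casing each name component on the way.
 * Returns the number of bytes written to <dn>, or -1 on error. */
static int str_to_dn_label(const char *str, unsigned char *dn, size_t dn_len)
{
    size_t slen = strlen(str);
    size_t i, clen, len_pos = 0, out = 1;

    /* the binary form needs exactly one byte more than the string */
    if (slen == 0 || slen + 1 > dn_len)
        return -1;

    for (i = 0; i < slen; i++) {
        if (str[i] == '.') {
            /* close the current component by writing its length */
            clen = out - len_pos - 1;
            if (clen == 0 || clen > 63)
                return -1;
            dn[len_pos] = (unsigned char)clen;
            len_pos = out++;
        } else {
            dn[out++] = (unsigned char)tolower((unsigned char)str[i]);
        }
    }

    /* close the last component */
    clen = out - len_pos - 1;
    if (clen == 0 || clen > 63)
        return -1;
    dn[len_pos] = (unsigned char)clen;
    return (int)out;
}

/* Binary comparison: length bytes are data, so no case folding is applied. */
static int dn_label_equal(const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Since case folding is done once at conversion time, two names that differ only by case yield identical buffers, and a plain byte comparison is sufficient afterwards.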
Frederic Lecaille	5a87f4673a	MINOR: quic: Prevent QUIC backend use with the OpenSSL QUIC compatibility module (USE_OPENSSL_COMPAT) Make the server line parsing fail when a QUIC backend is configured if haproxy is built to use the OpenSSL stack compatibility module. The latter does not support the QUIC client part.	2025-07-07 14:13:02 +02:00
Frederic Lecaille	6aebca7f2c	BUG/MINOR: quic: Missing TLS 1.3 QUIC cipher suites and groups inits (OpenSSL 3.5 QUIC API) This bug impacts both QUIC backends and frontends with OpenSSL 3.5 as QUIC API. The connections to a haproxy QUIC listener from a haproxy QUIC backend could not work at all without HelloRetryRequest TLS messages emitted by the backend asking the QUIC client to restart the handshake followed by TLS alerts: conn. @(nil) OpenSSL error[0xa000098] read_state_machine: excessive message size Furthermore, the Initial CRYPTO data sent by the client were big (about two 1252 bytes packets) (ClientHello TLS message). After analyzing the packets a key_share extension with <unknown> as value was long (more than 1 KB). This extension is in relation with the groups but does not belong to the groups supported by QUIC. That said such connections could work with ngtcp2 as backend built against the same OSSL TLS stack API but with a HelloRetryRequest. ngtcp2 always set the QUIC default cipher suites and group, for all the stacks it supports as implemented by this patch. So this patch configures both QUIC backend and frontend cipher suites and groups calling SSL_CTX_set_ciphersuites() and SSL_CTX_set1_groups_list() with the correct argument, except for SSL_CTX_set1_groups_list() which fails with QUIC TLS for an unknown reason at this time. The call to SSL_CTX_set_options() is useless from ssl_quic_initial_ctx() for the QUIC clients. One relies on ssl_sock_prepare_srv_ssl_ctx() to set them from now on. This patch is effective for all the supported stacks without impact for AWS-LC, and QUIC TLS and fixes the connections for haproxy QUIC frontend and backends when built against the OpenSSL 3.5 QUIC API. A new define HAVE_OPENSSL_QUICTLS has been added to openssl-compat.h to distinguish the QUIC TLS stack. Must be backported to 3.2.	2025-07-07 14:13:02 +02:00
Willy Tarreau	573143e0c8	MINOR: pattern: add a counter of added/freed patterns Patterns are allocated when loading maps/acls from a file or dynamically via the CLI, and are released only from the CLI (e.g. "clear map xxx"). These ones do not use pools and are much harder to monitor, e.g. in case a script adds many and forgets to clear them, etc. Let's add a new pair of metrics "PatternsAdded" and "PatternsFreed" that will report the number of added and freed patterns respectively. This can allow to simply graph both. The difference between the two normally represents the number of allocated patterns. If Added grows without Freed following, it can indicate a faulty script that doesn't perform the needed cleanup. The metrics are also made available to Prometheus as patterns_added_total and patterns_freed_total respectively.	2025-07-05 00:12:45 +02:00
Remi Tricot-Le Breton	a075d6928a	CLEANUP: ssl: Rename ssl_trace-t.h to ssl_trace.h This header does not actually contain any structures so it's best to remove the '-t' from the name for better consistency.	2025-07-04 15:21:50 +02:00
Christopher Faulet	5232df57ab	MINOR: proto-tcp: Add support for TCP MD5 signature for listeners and servers This patch adds the support for the RFC2385 (Protection of BGP Sessions via the TCP MD5 Signature Option) for the listeners and the servers. The feature is only available on Linux. Keywords are not exposed otherwise. By setting "tcp-md5sig <password>" option on a bind line, TCP segments of all connections instantiated from the listening socket will be signed with a 16-byte MD5 digest. The same option can be set on a server line to protect outgoing connections to the corresponding server. The primary use case for this option is to allow BGP to protect itself against the introduction of spoofed TCP segments into the connection stream. But it can be useful for any very long-lived TCP connections. A reg-test was added and it will be executed only on linux. All other targets are excluded.	2025-07-03 15:25:40 +02:00
William Lallemand	3e05e20029	MEDIUM: httpclient: implement a way to use directly htx data Add a HTTPCLIENT_O_RES_HTX flag which allows to store directly the HTX data in the response buffer instead of extracting the data in raw format. This is useful when the data needs to be reused in another request.	2025-07-01 16:31:47 +02:00
William Lallemand	2f4219ed68	MEDIUM: httpclient: split the CLI from the actual httpclient API This patch splits the httpclient code to prevent confusion between the httpclient CLI command and the actual httpclient API. Indeed there was a confusion between the flag used internally by the CLI command, and the actual httpclient API. hc_cli_* functions as well as HC_C_F_* defines were moved to httpclient_cli.c.	2025-07-01 15:46:04 +02:00
William Lallemand	519abefb57	BUG/MINOR: httpclient: wrongly named httpproxy flag The HC_F_HTTPPROXY flag was wrongly named and does not use the correct value, indeed this flag was meant to be used for the httpclient API, not the httpclient CLI. This patch fixes the problem by introducing HTTPCLIENT_FO_HTTPPROXY which must be set in hc->flags. Also add a member 'options' in the httpclient structure, because the member flags is reinitialized when starting. Must be backported as far as 3.0.	2025-07-01 14:47:52 +02:00
Aurelien DARRAGON	747a812066	MEDIUM: stats: add persistent state to typed output format Add a fourth character to the second column of the "typed output format" to indicate whether the value results from a volatile or persistent metric ('V' or 'P' characters respectively). A persistent metric means the value could possibly be preserved across reloads by leveraging a shared memory between multiple co-processes. Such metrics are identified as "shared" in the code (since they are possibly shared between multiple co-processes). Some reg-tests were updated to take that change into account, also, some outputs in the configuration manual were updated to reflect current behavior.	2025-07-01 14:15:03 +02:00
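As an illustration of the typed output change above, a consumer could read the new fourth character as sketched below. This is a minimal sketch that assumes the columns are colon-separated; the function name `typed_field_persistence()` and the sample lines used with it are hypothetical and not taken from haproxy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 'V' (volatile) or 'P' (persistent) taken from
 * the fourth character of the second colon-separated column of <line>.
 * Returns 0 when the column is missing, shorter than four characters (older
 * output), or carries an unexpected character. */
static char typed_field_persistence(const char *line)
{
    const char *tags = strchr(line, ':');
    const char *end;

    if (!tags)
        return 0;
    tags++;                      /* skip the first column and its ':' */

    end = strchr(tags, ':');
    if (!end || end - tags < 4)  /* no second column or fewer than 4 chars */
        return 0;

    if (tags[3] == 'V' || tags[3] == 'P')
        return tags[3];
    return 0;
}
```

Returning 0 for a three-character column lets the same parser accept output produced before this change.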
Remi Tricot-Le Breton	522bca98e1	MAJOR: jwt: Allow certificate instead of public key in jwt_verify converter The 'jwt_verify' converter could only be passed public keys as second parameter instead of full-on public certificates. This patch allows proper certificates to be used. Those certificates can be loaded in ckch_stores like any other certificate which means that all the certificate-related operations that can be made via the CLI can now benefit JWT validation as well. We now have two ways JWT validation can work, the legacy one which only relies on public keys which could not be stored in ckch_stores without some in depth changes in the way the ckch_stores are built. In this legacy way, the public keys are fully stored in a cache dedicated to JWT only which does not have any CLI commands and any way to update them during runtime. It also requires that all the public keys used are passed at least once explicitly to the 'jwt_verify' converter so that they can be loaded during init. The new way uses actual certificates, either already stored in the ckch_store tree (if predefined in a crt-store or already used previously in the configuration) or loaded in the ckch_store tree during init if they are explicitly used in the configuration like so: var(txn.bearer),jwt_verify(txn.jwt_alg,"cert.pem") When using a variable (or any other way that can only be resolved during runtime) in place of the converter's <key> parameter, the first time we encounter a new value (for which we don't have any entry in the jwt tree) we will lock the ckch_store tree and try to perform a lookup in it. If the lookup fails, an entry will still be inserted into the jwt tree so that any following call with this value avoids performing the ckch_store tree lookup.	2025-06-30 17:59:55 +02:00
Remi Tricot-Le Breton	cd89ce1766	MINOR: jwt: Rename pkey to pubkey in jwt_cert_tree_entry struct Rename the jwt_cert_tree_entry member pkey to pubkey to avoid any confusion between private and public key.	2025-06-30 17:59:55 +02:00
Christopher Faulet	a2a142bf40	BUG/MEDIUM: hlua: Forbid any L6/L7 sample fetch functions from lua services It was already forbidden to use HTTP sample fetch functions from lua services. An error is triggered if it happens. However, the error must be extended to any L6/L7 sample fetch functions. Indeed, a lua service is an applet. It is totally unexpected for an applet to access input data in a channel's buffer. These data have not been analyzed yet and are still subject to any change. An applet, lua or not, must never access "not forwarded" data. Only output data are available. For now, if a lua applet relies on any L6/L7 sample fetch functions, the behavior is undefined and not consistent. So to fix the issue, hlua flag HLUA_F_MAY_USE_HTTP is renamed to HLUA_F_MAY_USE_CHANNELS_DATA. This flag is used to prevent any lua applet to use L6/L7 sample fetch functions. This patch could be backported to all stable versions.	2025-06-30 16:47:59 +02:00
Aurelien DARRAGON	4fcc9b5572	MINOR: counters: rename last_change counter to last_state_change Since proxy and server struct already have an internal last_change variable and we cannot merge it with the shared counter one, let's rename the last_change counter to be more specific and prevent the mixup between the two. last_change counter is renamed to last_state_change, and unlike the internal last_change, this one is a shared counter so it is expected to be updated by other processes in our back. However, when updating last_state_change counter, we use the value of the server/proxy last_change as reference value.	2025-06-30 16:26:38 +02:00
Aurelien DARRAGON	5b1480c9d4	MEDIUM: proxy: add and use a separate last_change variable for internal use Same motivation as previous commit, proxy last_change is "abused" because it is used for 2 different purposes, one for stats, and the other one for process-local internal use. Let's add a separate proxy-only last_change variable for internal use, and leave the last_change shared (and thread-grouped) counter for statistics.	2025-06-30 16:26:31 +02:00
Aurelien DARRAGON	01dfe17acf	MEDIUM: server: add and use a separate last_change variable for internal use last_change server metric is used for 2 separate purposes. First it is used to report last server state change date for stats and other related metrics. But it is also used internally, including in sensitive paths, such as lb related stuff to take decision or perform computations (ie: in srv_dynamic_maxconn()). Due to last_change counter now being split over thread groups since `16eb0fa` ("MAJOR: counters: dispatch counters over thread groups"), reading the aggregated value has a cost, and we cannot afford to consult last_change value from srv_dynamic_maxconn() anymore. Moreover, since the value is used to take decision for the current process we don't want the variable to be updated by another process behind our back. To prevent performance regression and sharing issues, let's instead add a separate srv->last_change value, which is not updated atomically (given how rare the updates are), and only serves for places where the use of the aggregated last_change counter/stats (split over thread groups) is too costly.	2025-06-30 16:26:25 +02:00
Aurelien DARRAGON	837762e2ee	MINOR: mailers: warn if mailers are configured but not actually used Now that native mailers configuration is only usable with Lua mailers, Willy noticed that we lack a way to warn the user if mailers were previously configured on an older version but Lua mailers were not loaded, which could trick the user into thinking mailers keep working when transitioning to 3.2 while it is not. In this patch we add the 'core.use_native_mailers_config()' Lua function which should be called in Lua script body before making use of 'Proxy:get_mailers()' function to retrieve legacy mailers configuration from haproxy main config. This way haproxy effectively knows that the native mailers config is actually being used from Lua (which indicates user correctly migrated from native mailers to Lua mailers), else if mailers are configured but not used from Lua then haproxy warns the user about the fact that they will be ignored unless they are used from Lua. (e.g.: using the provided 'examples/lua/mailers.lua' to ease transition)	2025-06-27 16:41:18 +02:00
Frederic Lecaille	194e3bc2d5	MINOR: quic-be: address validation support implementation (RETRY) - Add ->retry_token and ->retry_token_len new quic_conn struct members to store the retry tokens. These objects are allocated by quic_rx_packet_parse() and released by quic_conn_release(). - Add <pool_head_quic_retry_token> new pool for these tokens. - Implement quic_retry_packet_check() to check the integrity tag of these tokens upon RETRY packets receipt. quic_tls_generate_retry_integrity_tag() is called by this new function. It has been modified to pass the address where the tag must be generated - Add <resend> new parameter to quic_pktns_discard(). This function is called to discard the packet number spaces where the already TX packets and frames are attached to. <resend> allows the caller to prevent this function to release the in flight TX packets/frames. The frames are requeued to be resent. - Modify quic_rx_pkt_parse() to handle the RETRY packets. What must be done upon such packets receipt is: - store the retry token, - store the new peer SCID as the DCID of the connection. Note that the peer will modify again its SCID. This is why this SCID is also stored as the ODCID which must be matched with the peer retry_source_connection_id transport parameter, - discard the Initial packet number space without flagging it as discarded and prevent retransmissions calling qc_set_timer(), - modify the TLS cryptographic cipher contexts (RX/TX), - wakeup the I/O handler to send new Initial packets asap. - Modify quic_transport_param_decode() to handle the retry_source_connection_id transport parameter as a QUIC client. Then its caller is modified to check this transport parameter matches with the SCID sent by the peer with the RETRY packet.	2025-06-26 09:48:00 +02:00
Frederic Lecaille	9cb2acd2f2	MINOR: quic-be: add a "CC connection" backend TX buffer pool A QUIC client must be able to close a connection sending Initial packets. But QUIC client Initial packets must always be at least 1200 bytes long. To reduce the memory use of TX buffers of a connection when in "closing" state, a pool was dedicated for this purpose but with a too much reduced TX buffer size (QUIC_MAX_CC_BUFSIZE). This patch adds a "closing state connection" TX buffer pool with the same role for QUIC backends.	2025-06-26 09:48:00 +02:00
William Lallemand	7cb6167d04	MAJOR: mworker: remove program section support This patch removes completely the support for the program section, the parsing of the section as well as the internals in the mworker does not support it anymore. The program section was considered dysfunctional and not fully compatible with the "mworker V3" model. Users that want to run an external program must use their init system. The documentation is cleaned up in another patch.	2025-06-25 16:11:34 +02:00
Remi Tricot-Le Breton	34fc73ba81	MINOR: ssl: Add "renegotiate" server option This "renegotiate" option can be set on SSL backends to allow secure renegotiation. It is mostly useful with SSL libraries that disable secure renegotiation by default (such as AWS-LC). The "no-renegotiate" one can be used the other way around, to disable secure renegotiation that could be allowed by default. Those two options can be set via "ssl-default-server-options" as well.	2025-06-25 15:23:48 +02:00
Aurelien DARRAGON	5694a98744	MAJOR: mailers: remove native mailers support As mentioned in 2.8 announce on the mailing list [1] and on the wiki [2] native mailers were deprecated and planned for removal in 3.3. Now is the time to drop the legacy code for native mailers which is based on a tcpcheck "hack" and cannot be maintained. Lua mailers should be used as a drop in replacement. Indeed, "mailers" and associated config directives are preserved because mailers config is exposed to Lua, which helps smoothing the transition from native mailers to Lua based ones. As a reminder, to keep mailers configuration working as before without making changes to the config file, simply add the line below to the global section: lua-load examples/lua/mailers.lua mailers.lua script (provided in the git repository, adjust path as needed) may be customized by users familiar with Lua, by default it emulates the behavior of the native (now removed) mailers. [1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html [2]: https://github.com/haproxy/wiki/wiki/Breaking-changes	2025-06-24 10:55:58 +02:00
Aurelien DARRAGON	c0f6024854	MINOR: hlua: emit a log instead of an alert for aborted actions due to unavailable yield As reported by Chris Staite in GH #3002, trying to yield from a Lua action during a client disconnect causes the script to be interrupted (which is expected) and an alert to be emitted with the error: "Lua function '%s': yield not allowed". While this error is well suited for cases where the yield is not expected at all (ie: when context doesn't allow it) and results from a yield misuse in the Lua script, it isn't the case when the yield is exceptionally not available due to an abort or error in the request/response processing. Because of that we raise an alert but the user cannot do anything about it (the script is correct), so it is confusing and polluting the logs. In this patch we introduce the ACT_OPT_FINAL_EARLY flag which is a complementary flag to ACT_OPT_FIRST. This flag is set when the ACT_OPT_FIRST is set earlier than normal (due to error/abort). hlua_action() then checks for this flag to decide whether an error (alert) or a simple log message should be emitted when the yield is not available. It should solve GH #3002. Thanks to Chris Staite (@chrisstaite-menlo) for having reported the issue and suggested a solution.	2025-06-24 10:55:55 +02:00
Amaury Denoyelle	74b95922ef	BUG/MEDIUM: quic: do not release BE quic-conn prior to upper conn For frontend side, quic_conn is only released if MUX wasn't allocated, either due to handshake abort, in which case upper layer is never allocated, or after transfer completion when full conn + MUX layers are already released. On the backend side, initialization is not performed in the same order. Indeed, in this case, connection is first instantiated, then the quic_conn is created to execute the handshake, while MUX is still only allocated on handshake completion. As such, it is not possible anymore to free immediately quic_conn on handshake failure. Else, this can cause crash if the connection tries to reaccess its transport layer after quic_conn release. Such crash can easily be reproduced in case of connection error to the QUIC server. Here is an example of an experienced backtrace. Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault. 0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28 28 qc->conn = NULL; [ ## gdb ## ] bt #0 0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28 #1 0x00005555559c9708 in conn_xprt_close (conn=0x55555734c0d0) at include/haproxy/connection.h:162 #2 0x00005555559c97d2 in conn_full_close (conn=0x55555734c0d0) at include/haproxy/connection.h:206 #3 0x00005555559d01a9 in sc_detach_endp (scp=0x7fffffffd648) at src/stconn.c:451 #4 0x00005555559d05b9 in sc_reset_endp (sc=0x55555734bf00) at src/stconn.c:533 #5 0x000055555598281d in back_handle_st_cer (s=0x55555734adb0) at src/backend.c:2754 #6 0x000055555588158a in process_stream (t=0x55555734be10, context=0x55555734adb0, state=516) at src/stream.c:1907 #7 0x0000555555dc31d9 in run_tasks_from_lists (budgets=0x7fffffffdb30) at src/task.c:655 #8 0x0000555555dc3dd3 in process_runnable_tasks () at src/task.c:889 #9 0x0000555555a1daae in run_poll_loop () at src/haproxy.c:2865 #10 0x0000555555a1e20c in run_thread_poll_loop (data=0x5555569d1c00 <ha_thread_info>) at src/haproxy.c:3081 #11 0x0000555555a1f66b in main (argc=5, argv=0x7fffffffde18) at src/haproxy.c:3671 To fix this, change the condition prior to calling quic_conn release. If <conn> member is not NULL, delay the release, similarly to the case when MUX is allocated. This allows connection to be freed first, and detach from quic_conn layer through close xprt operation. No need to backport.	2025-06-20 17:46:10 +02:00
Amaury Denoyelle	06cab99a0e	MINOR: mux-quic: support max bidi streams value set by the peer Implement support for MAX_STREAMS frame. On frontend, this was mostly useless as haproxy would never initiate new bidirectional streams. However, this becomes necessary to control stream flow-control when using QUIC as a client on the backend side. Parsing of MAX_STREAMS is implemented via new qcc_recv_max_streams(). This allows to update <ms_uni>/<ms_bidi> QCC fields. This patch is necessary to achieve QUIC backend connection reuse.	2025-06-18 17:25:27 +02:00
Amaury Denoyelle	805a070ab9	BUG/MINOR: mux-quic/h3: properly handle too low peer fctl initial stream Previously, no check on peer flow-control was implemented prior to open a local QUIC stream. This was a small problem for frontend implementation, as in this case haproxy as a server never opens bidirectional streams. On frontend, the only stream opened by haproxy in this case is for HTTP/3 control unidirectional data. If the peer uses an initial value for max uni streams set to 0, it would violate its flow control, and the peer will probably close the connection. Note however that RFC 9114 mandates that each peer defines minimal initial value so that at least the control stream can be created. This commit improves the situation of too low initial max uni streams value. Now, on HTTP/3 layer initialization, haproxy preemptively checks flow control limit on streams via a new function qcc_fctl_avail_streams(). If credit is already expired due to a too small initial value, haproxy preemptively closes the connection using H3_ERR_GENERAL_PROTOCOL_ERROR. This behavior is better as haproxy is now the initiator of the connection closure. This should be backported up to 2.8.	2025-06-18 17:18:55 +02:00
Amaury Denoyelle	c807182ec9	CLEANUP: connection: remove unused mux-ops dedicated to QUIC Remove avail_streams_bidi/avail_streams_uni mux_ops. These callbacks were designed to be specific to QUIC. However, they won't be necessary, as stream layer only cares about bidirectional streams.	2025-06-18 17:02:50 +02:00
Amaury Denoyelle	555ec99d43	MINOR: h3: adjust auth request encoding or fallback to host Implement proper encoding of HTTP/3 authority pseudo-header during request transcoding on the backend side. A pseudo-header :authority is encoded if a value can be extracted from HTX start-line. A special check is also implemented to ensure that a host header is not encoded if :authority already is. A new function qpack_encode_auth() is defined to implement QPACK encoding of :authority header using literal field line with name ref.	2025-06-16 18:11:09 +02:00
Amaury Denoyelle	235e818fa1	MINOR: h3: complete HTTP/3 request scheme encoding Previously, scheme was always set to https when transcoding an HTX start-line into a HTTP/3 request. Change this so this conversion is now fully compliant. If no scheme is specified by the client, which is what happens most of the time with HTTP/1, https is set for the HTTP/3 request. Else, reuse the scheme requested by the client. If either https or http is set, qpack_encode_scheme will encode it using entry from QPACK static table. Else, a full literal field line with name ref is used instead as the scheme value is specified as-is.	2025-06-16 18:11:09 +02:00
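The scheme selection described in the commit above can be sketched in C as follows. This is an illustrative sketch only: `encode_scheme()` is a hypothetical function, not haproxy's qpack_encode_scheme(). It assumes the QPACK static table entries from RFC 9204 (index 22 for ":scheme http", 23 for ":scheme https"), no Huffman coding, and scheme values shorter than 127 bytes so that the length fits in a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* QPACK static table indexes (RFC 9204, Appendix A) */
#define QPACK_SST_SCHEME_HTTP  22
#define QPACK_SST_SCHEME_HTTPS 23

/* Hypothetical sketch: encodes the :scheme pseudo-header into <out>.
 * An empty or NULL <scheme> falls back to "https". Returns the number of
 * bytes written, or -1 if <out> is too small or the value is too long. */
static int encode_scheme(const char *scheme, unsigned char *out, size_t size)
{
    size_t len;

    if (!scheme || !*scheme)
        scheme = "https";

    if (strcmp(scheme, "https") == 0 || strcmp(scheme, "http") == 0) {
        /* indexed field line, static table: 1 1 <6-bit index> */
        if (size < 1)
            return -1;
        out[0] = 0xC0 | (strcmp(scheme, "https") == 0 ?
                         QPACK_SST_SCHEME_HTTPS : QPACK_SST_SCHEME_HTTP);
        return 1;
    }

    /* literal field line with name reference: 0 1 N=0 T=1 <4-bit index>.
     * Index 22 does not fit in 4 bits, so the prefix is saturated (15)
     * and the remainder (22 - 15) follows in the next byte. */
    len = strlen(scheme);
    if (len >= 127 || size < len + 3)
        return -1;
    out[0] = 0x5F;
    out[1] = QPACK_SST_SCHEME_HTTP - 15;
    out[2] = (unsigned char)len;   /* H=0: value is not Huffman-encoded */
    memcpy(out + 3, scheme, len);
    return (int)(len + 3);
}
```

The common http and https cases cost a single byte, while any other scheme reuses the ":scheme" name from the static table and only spells out the value.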
Amaury Denoyelle	a0912cf914	MINOR: h3: complete HTTP/3 request method encoding On the backend side, HTX start-line is converted into a HTTP/3 request message. Previously, GET method was hardcoded. Implement proper method conversion, by extracting it from the HTX start-line. qpack_encode_method() has also been extended, so that it is able to encode any method, either using a static table entry, or with a literal field line with name ref representation.	2025-06-16 18:11:09 +02:00
Amaury Denoyelle	7157adb154	MINOR: h3: support basic HTX start-line conversion into HTTP/3 request This commit is the first one of a series whose aim is to implement transcoding of a HTX request into HTTP/3, which is necessary for QUIC backend support. Transcoding is implemented via a new function h3_req_headers_send() when a HTX start-line is parsed. For now, most of the request fields are hardcoded, using a GET method. This will be adjusted in the next following patches.	2025-06-16 18:11:09 +02:00
Amaury Denoyelle	e8775d51df	MINOR: mux-quic: define flag for backend side Mux connection is flagged with new QC_CF_IS_BACK if used on the backend side. For now the only change is during traces, to be able to differentiate frontend and backend usage.	2025-06-12 11:28:54 +02:00
Amaury Denoyelle	93b904702f	MINOR: mux-quic: improve documentation for snd/rcv app-ops Complete document for rcv_buf/snd_buf operations. In particular, return value is now explicitly defined. For H3 layer, associated functions documentation is also extended.	2025-06-12 11:28:54 +02:00
Frederic Lecaille	b9703cf711	MINOR: quic-be: get rid of ->li quic_conn member Replace ->li quic_conn pointer to struct listener member by ->target which is an object type enum and adapt the code. Use __objt_(listener|server)() where the object type is known. Typically this is where the code is specific to one connection type (frontend/backend). Remove <server> parameter passed to qc_new_conn(). It is redundant with the <target> parameter. GSO is not supported at this time for QUIC backend. qc_prep_pkts() is modified to prevent it from building more than an MTU. This has as consequence to prevent qc_send_ppkts() to use GSO. ssl_clienthello.c code is run only by listeners. This is why __objt_listener() is used in place of ->li.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	2d076178c6	MINOR: quic-be: Store asap the DCID Store the peer connection ID (SCID) as the connection DCID as soon as an Initial packet is received. Stop comparing the packet to QUIC_PACKET_TYPE_0RTT is already match as QUIC_PACKET_TYPE_INITIAL. A QUIC server must not send too short datagram with ack-eliciting packets inside. This cannot be done from quic_rx_pkt_parse() because one does not know if there is ack-eliciting frame into the Initial packets. If the packet must be dropped, this is after having parsed it!	2025-06-11 18:37:34 +02:00
Frederic Lecaille	43d88a44f1	MINOR: quic-be: Datagrams and packet parsing support Modify quic_dgram_parse() to stop passing it a listener as third parameter. In place the object type address of the connection socket owner is passed to support the haproxy servers with QUIC as transport protocol. qc_owner_obj_type() is implemented to return this address. qc_counters() is also implemented to return the QUIC specific counters of the proxy of owner of the connection. quic_rx_pkt_parse() called by quic_dgram_parse() is also modified to use the object type address used by this latter as last parameter. It is also modified to send Retry packet only from listeners. A QUIC client (connection to haproxy QUIC servers) must drop the Initial packets with non null token length. It is also not supposed to receive 0-RTT packets which are dropped.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	89d5a59933	MINOR: quic-be: add field for max_udp_payload_size into quic_conn Add ->max_udp_payload_size new member to quic_conn struct. Initialize it from qc_new_conn(). Adapt qc_snd_buf() to use it.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	52ec3430f2	MINOR: sock: Add protocol and socket types parameters to sock_create_server_socket() This patch only adds <proto_type> new proto_type enum parameter and <sock_type> socket type parameter to sock_create_server_socket() and adapts its callers. This is to prepare the use of this function by QUIC servers/backends.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	9c84f64652	MINOR: quic-be: Add a function to initialize the QUIC client transport parameters Implement qc_srv_params_init() to initialize the QUIC client transport parameters in relation with connections to haproxy servers/backends.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	f49bbd36b9	MINOR: quic-be: SSL sessions initializations Modify qc_alloc_ssl_sock_ctx() to pass the connection object as parameter. It is NULL for a QUIC listener, not NULL for a QUIC server. This connection object is set as value for ->conn quic_conn struct member. Initialise the SSL session object from this function for QUIC servers. qc_ssl_set_quic_transport_params() is also modified to pass the SSL object as parameter. This is the unique parameter this function needs. <qc> parameter is used only for the trace. SSL_do_handshake() must be called as soon as the SSL object is initialized for the QUIC backend connection. This triggers the TLS CRYPTO data delivery. tasklet_wakeup() is also called to send asap these CRYPTO data. Modify the QUIC_EV_CONN_NEW event trace to dump the potential errors returned by SSL_do_handshake().	2025-06-11 18:37:34 +02:00
Frederic Lecaille	1408d94bc4	MINOR: quic-be: ssl_sock contexts allocation and misc adaptations Implement ssl_sock_new_ssl_ctx() to allocate a SSL server context as this is currently done for TCP servers and also for QUIC servers depending on the <is_quic> boolean value passed as new parameter. For QUIC servers, this function calls ssl_quic_srv_new_ssl_ctx() which is specific to QUIC.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	1e45690656	MINOR: quic-be: Add a function for the TLS context allocations Implement ssl_quic_srv_new_ssl_ctx() whose aim is to allocate a TLS context for QUIC servers.	2025-06-11 18:37:34 +02:00
Frederic Lecaille	24fc44c44d	MINOR: quic-be: QUIC backend XPRT and transport parameters init during parsing Add ->quic_params new member to server struct. Also set the ->xprt member of the server being initialized and initialize asap its transport parameters from _srv_parse_init().	2025-06-11 18:37:34 +02:00
Frederic Lecaille	990c9f95f7	MINOR: quic-be: Correct Version Information transp. param encoding According to the RFC, a QUIC client must encode the QUIC versions it supports into the "Available Versions" of the "Version Information" transport parameter, ordered by descending preference. This is done by defining <quic_version_2> and <quic_version_draft_29>, new pointer variables to the corresponding elements of the <quic_versions> array. A client announces its available versions as follows: v1, v2, draft29.	2025-06-11 18:37:34 +02:00
Amaury Denoyelle	bdd5e58179	MINOR: server: implement helper to identify QUIC servers Define srv_is_quic() which can be used to quickly identify whether a server uses the QUIC protocol.	2025-06-11 18:37:19 +02:00
Olivier Houchard	6993981cd6	BUG/MEDIUM: fd: Use the provided tgid in fd_insert() to get tgroup_info In fd_insert(), use the provided tgid to get the thread group info, instead of using the one of the current thread, as we may call fd_insert() from a thread of another thread group, which will happen at least when binding the listeners. Otherwise we'd end up accessing the thread mask containing enabled threads of the wrong thread group, which can lead to crashes if we're binding on threads not present in the thread group. This should fix Github issue #2991. This should be backported up to 2.8.	2025-06-10 15:10:56 +02:00
Christopher Faulet	18f9c71041	CLEANUP: applet: Simplify a bit comments for applet_put* functions Instead of repeating which buffer is used depending on the API used by the applet, a reference to applet_get_outbuf() was added.	2025-06-10 08:16:10 +02:00
Christopher Faulet	79445766a3	MINOR: applet: Add API functions to get data from the input buffer There were already functions to push data from the applet to the stream by inserting them in the right buffer, depending on whether or not the applet was using the legacy API. Here, functions to retrieve data pushed to the applet by the stream were added: * applet_getchar : Gets one character * applet_getblk : Copies a full block of data * applet_getword : Copies one text block representing a word using a custom separator as delimiter * applet_getline : Copies one text line * applet_getblk_nc : Gets one or two blocks of data * applet_getword_nc: Gets one or two blocks of text representing a word using a custom separator as delimiter * applet_getline_nc: Gets one or two blocks of text representing a line	2025-06-10 08:16:10 +02:00
Christopher Faulet	0d8ecb1edc	MINOR: applet: Add API functions to manipulate input and output buffers In this patch, some functions were added to ease input and output buffers manipulation, regardless of whether the corresponding applet is using its own buffers or relying on channels buffers. The following functions were added: * applet_get_inbuf : Gets the buffer containing data pushed to the applet by the stream * applet_get_outbuf : Gets the buffer containing data pushed by the applet to the stream * applet_input_data : Returns the amount of data in the input buffer * applet_skip_input : Skips <len> bytes from the input buffer * applet_reset_input: Skips all bytes from the input buffer * applet_output_room: Returns the amount of space available in the output buffer * applet_need_room : Indicates that the applet has more data to deliver and needs more room in the output buffer to do so	2025-06-10 08:16:10 +02:00
Aurelien DARRAGON	16eb0fab31	MAJOR: counters: dispatch counters over thread groups Most fe and be counters are good candidates for being shared between processes. They are now grouped inside "shared" struct sub member under be_counters and fe_counters. Now that they are properly identified, they would greatly benefit from being shared over thread groups to reduce the cost of atomic operations when updating them. For this, we take the current tgid into account so each thread group only updates its own counters. For this to work, it is mandatory that the "shared" member from {fe,be}_counters is initialized AFTER global.nbtgroups is known, because each shared counter causes the stat to be allocated global.nbtgroups times. When updating a counter without concurrency, the first counter from the array may be updated. To consult the shared counters (which requires aggregation of per-tgid individual counters), some helper functions were added to counter.h to ease code maintenance and avoid computing errors.	2025-06-05 09:59:38 +02:00
Aurelien DARRAGON	12c3ffbb48	MINOR: counters: add local-only internal rates to compute some maxes cps_max (max new connections received per second), sps_max (max new sessions per second) and http.rps_max (maximum new http requests per second) all rely on shared counters (namely conn_per_sec, sess_per_sec and http.req_per_sec). The problem is that shared counters are about to be distributed over thread groups, and we cannot afford to compute the total (for all thread groups) each time we update the max counters. Instead, since such max counters (relying on shared counters) are very few exceptions, let's add internal (sess,conn,req) per sec freq counters that are dedicated to cps_max, sps_max and http.rps_max computing. Thanks to that, related *_max counters shouldn't be negatively impacted by the thread-group distribution, yet they will not benefit from it either. Related internal freq counters are prefixed with "_" to emphasize the fact that they should not be used for other purposes (the shared ones, which are about to be distributed over thread groups in upcoming commits, are still available and must be used instead). The internal ones could eventually be removed at any time if we find another way to compute the {cps,sps,http.rps}_max counters.	2025-06-05 09:59:31 +02:00
Aurelien DARRAGON	b72a8bb138	CLEANUP: counters: merge some common counters between {fe,be}_counters_shared Now that we have a common struct between fe and be shared counters struct let's perform some cleanup to merge duplicate members into the common struct part. This will ease code maintenance.	2025-06-05 09:59:24 +02:00
Aurelien DARRAGON	b599138842	MEDIUM: counters: manage shared counters using dedicated helpers proxies, listeners and server shared counters are now managed via helpers added in one of the previous commits. When guid is not set (ie: when not yet assigned), shared counters pointer is allocated using calloc() (local memory) and a flag is set on the shared counters struct to know how to manipulate (and free it). Else if guid is set, then it means that the counters may be shared so while for now we don't actually use a shared memory location the API is ready for that. The way it works, for proxies and servers (for which guid is not known during creation), we first call counters_{fe,be}_shared_get with guid not set, which results in local pointer being retrieved (as if we just manually called calloc() to retrieve a pointer). Later (during postparsing) if guid is set we try to upgrade the pointer from local to shared. Lastly, since the memory location for some objects (proxies and servers counters) may change from creation to postparsing, let's update counters->last_change member directly under counters_{fe,be}_shared_get() so we don't miss it. No change of behavior is expected, this is only preparation work.	2025-06-05 09:59:17 +02:00
Aurelien DARRAGON	c10ce1c85b	MINOR: counters: add common struct and flags to {fe,be}_counters_shared fe_counters_shared and be_counters_shared may share some common members since they are quite similar, so we add a common struct part shared between the two. struct counters_shared is added for convenience as a generic pointer to manipulate common members from fe or be shared counters pointer. Also, the first common member is added: shared fe and be counters now have a flags member.	2025-06-05 09:59:10 +02:00
Aurelien DARRAGON	aa53887398	MINOR: counters: add shared counters helpers to get and drop shared pointers create include/haproxy/counters.h and src/counters.c files to anticipate for further helpers as some counters specific tasks needs to be carried out and since counters are shared between multiple object types (ie: listener, proxy, server..) we need generic helpers. Add some shared counters helper which are not yet used but will be updated in upcoming commits.	2025-06-05 09:59:04 +02:00
Aurelien DARRAGON	a0dcab5c45	MAJOR: counters: add shared counters base infrastructure Shareable counters are now tagged as shared counters and are dynamically allocated in a separate memory area as a prerequisite for being stored in a shared memory area. For now, GUID and thread groups are not taken into account, this is only a first step. We also ensure all counters are now manipulated using atomic operations, namely, the "last_change" counter is now read from and written to using atomic ops. Despite the numerous changes caused by the counters being moved away from counters struct, no change of behavior should be expected.	2025-06-05 09:58:58 +02:00
Christopher Faulet	8ee650a88b	CLEANUP: applet: Update comment for applet_put* functions These functions were copied from the channel API and modified to work with applets using the new API or the legacy one. However, the comments were not updated accordingly. It is the purpose of this patch.	2025-06-03 15:03:30 +02:00
Aurelien DARRAGON	368d01361a	MEDIUM: server: add and use srv_init() function rename _srv_postparse() internal function to srv_init() function and group srv_init_per_thr() plus idle conns list init inside it. This way we can perform some simplifications as srv_init() performs multiple server init steps after parsing. SRV_F_CHECKED flag was added, it is automatically set when srv_init() runs successfully. If the flag is already set and srv_init() is called again, nothing is done. This permits manually calling srv_init() earlier than the default POST_CHECK hook when needed without risking doing things twice.	2025-06-02 17:51:33 +02:00
Aurelien DARRAGON	889ef6f67b	MEDIUM: server: automatically add server to proxy list in new_server() while new_server() takes the parent proxy as argument and even assigns srv->proxy to the parent proxy, it didn't actually insert the server into the parent proxy server list on success. The result is that sometimes we add the server to the list after new_server() is called, and sometimes we don't. This is really error-prone and because of that hooks such as REGISTER_POST_SERVER_CHECK(), which are run for all servers listed in all proxies, may not be relied upon for servers which are not actually inserted in their parent proxy server list. Plus it feels very strange to have a server that points to a proxy, but then the proxy doesn't know about it because it cannot find it in its server list. To prevent errors and make proxy->srv list reliable, we move the insertion logic directly under new_server(). This requires knowing whether we are called during parsing or during runtime to either insert or append the server to the parent proxy list. For that we use PR_FL_CHECKED flag from the parent proxy (if the flag is set, then the proxy was checked so we are past the init phase, thus we assume we are called during runtime). This implies that during startup if new_server() has to be cancelled on error paths we need to call srv_detach() (which is now exposed in server.h) before srv_drop(). The consequence of this commit is that REGISTER_POST_SERVER_CHECK() should now run reliably on all servers created using new_server() (without having to manually loop on global servers_list)	2025-06-02 17:51:30 +02:00
Aurelien DARRAGON	943958c3ff	MINOR: proxy: add a true list containing all proxies We have a global proxies_list pointer which is announced as the list of "all existing proxies", but in fact it only represents regular proxies declared in the config file through "listen, frontend or backend" keywords. It is ambiguous, and we currently don't have a straightforward method to iterate over all proxies (either public or internal ones) within haproxy. Instead we still have to manually iterate over multiple lists (main proxies, log-forward proxies, peer proxies..) which is error-prone. In this patch we add a struct list member (16 bytes) inside struct proxy in order to store every proxy (except default ones) within a global "proxies" list which is actually representative of all proxies existing under the haproxy process, like we already have for servers.	2025-06-02 17:51:21 +02:00
Aurelien DARRAGON	d04843167c	MINOR: stats: add stat_col flags Add stat_col flags member to store .generic bit and prepare for upcoming flags. No functional change expected.	2025-06-02 17:51:08 +02:00
Willy Tarreau	9f4cd435d3	[RELEASE] Released version 3.3-dev0 Released version 3.3-dev0 with the following main changes : - MINOR: version: mention that it's development again	2025-05-28 16:46:34 +02:00
Willy Tarreau	8809251ee0	MINOR: version: mention that it's development again This essentially reverts `a6458fd426`.	2025-05-28 16:46:15 +02:00
Willy Tarreau	a6458fd426	MINOR: version: mention that it's 3.2 LTS now. The version will be maintained up to around Q2 2030. Let's also update the INSTALL file to mention this.	2025-05-28 16:31:27 +02:00
Christopher Faulet	99e755d673	MINOR: listeners: Add support for a label on bind line It is now possible to set a label on a bind line. All sockets attached to this bind line inherit this label. The idea is to be able to group sockets. For now, there is no mechanism to create these groups, this must be done by hand.	2025-05-26 19:00:00 +02:00
Willy Tarreau	3494775a1f	MINOR: ssl: support strict-sni in ssl-default-bind-options Several users already reported that it would be nice to support strict-sni in ssl-default-bind-options. However, in order to support it, we also need an option to disable it. This patch moves the setting of the option from the strict_sni field to a flag in the ssl_options field so that it can be inherited from the default bind options, and adds a new "no-strict-sni" directive to allow to disable it on a specific "bind" line. The test file "del_ssl_crt-list.vtc" which already tests both options was updated to make use of the default option and the no- variant to confirm everything continues to work.	2025-05-22 15:31:54 +02:00
Willy Tarreau	a1577a89a0	MINOR: glitches: add global setting "tune.glitches.kill.cpu-usage" It was mentioned during the development of glitches that it would be nice to support not killing misbehaving connections below a certain CPU usage so that poor implementations that routinely misbehave without impact are not killed. This is now possible by setting a CPU usage threshold under which we don't kill them via this parameter. It defaults to zero so that we continue to kill them by default.	2025-05-21 15:47:42 +02:00
Amaury Denoyelle	00d90e8839	MINOR: quic: adjust quic_conn-t.h include list Adjust include list in quic_conn-t.h. This file is included in many QUIC sources, so it is useful to keep it as lightweight as possible. Note that connection/QUIC MUX are transformed into forward declarations for better layer separation.	2025-05-21 14:44:27 +02:00
Amaury Denoyelle	01e3b2119a	MINOR: quic: add some missing includes Insert some missing include statements in QUIC source files. This was detected after the next commit, which adjusts the include list used in quic_conn-t.h file.	2025-05-21 14:44:27 +02:00
Amaury Denoyelle	f286288471	MINOR: quic: refactor handling of streams after MUX release quic-conn layer has to handle STREAM frames itself after MUX release. If the stream was already seen, it is probably only a retransmitted frame which can be safely ignored. For other streams, an active closure may be needed. Thus it's necessary that quic-conn layer knows the highest stream ID already handled by the MUX after its release. Previously, this was done via <nb_streams> member array in quic-conn structure. Refactor this by replacing <nb_streams> by two members called <stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for quic-conn layer to monitor locally opened uni streams, as the peer cannot by definition emit a STREAM frame on them. Also, bidirectional streams are always opened by the remote side. Previously, <nb_streams> were set by quic-stream layer. Now, <stream_max_uni>/<stream_max_bidi> members are only set one time, just prior to QUIC MUX release. This is sufficient as quic-conn does not use them if the MUX is available. Note that previously, IDs were used relatively to their type, thus incremented by 1, after shifting the original value. For simplification, use the plain stream ID, which is incremented by 4.	2025-05-21 14:26:45 +02:00
Amaury Denoyelle	07d41a043c	MINOR: quic: move function to check stream type in utils Move general function to check if a stream is uni or bidirectional from QUIC MUX to quic_utils module. This should prevent unnecessary include of QUIC MUX header file in other sources.	2025-05-21 14:17:41 +02:00
Amaury Denoyelle	cf45bf1ad8	CLEANUP: quic: remove unused cbuf module Cbuf are not used anymore. Remove the related source and header files, as well as include statements in the rest of QUIC source files.	2025-05-21 14:16:37 +02:00
Frederic Lecaille	b3ac1a636c	MINOR: quic: implement all remaining callbacks for OpenSSL 3.5 QUIC API The quic_conn struct is modified for two reasons. The first one is to store the encoded version of the local transport parameters as this is done for USE_QUIC_OPENSSL_COMPAT. Indeed, the local transport parameters "should remain valid until after the parameters have been sent" as mentioned by SSL_set_quic_tls_cbs(3) manual. In our case, the buffer is a static buffer attached to the quic_conn object. The role of qc_ssl_set_quic_transport_params() is to call SSL_set_quic_tls_transport_params() (aliased by SSL_set_quic_transport_params()) to set these local transport parameters into the TLS stack from the buffer attached to the quic_conn struct. The second quic_conn struct modification is the addition of the new ->prot_level (SSL protection level) member to store "the most recent write encryption level set via the OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn callback (if it has been called)" as mentioned by SSL_set_quic_tls_cbs(3) manual. This patch finally implements the five remaining callbacks to make the haproxy QUIC implementation work. OSSL_FUNC_SSL_QUIC_TLS_crypto_send_fn() (ha_quic_ossl_crypto_send()) is easy to implement. It calls ha_quic_add_handshake_data() after having converted qc->prot_level TLS protection level value to the correct ssl_encryption_level_t (boringSSL API/quictls) value. OSSL_FUNC_SSL_QUIC_TLS_crypto_recv_rcd_fn() (ha_quic_ossl_crypto_recv_rcd()) provides the non-contiguous addresses to the TLS stack, without releasing them. OSSL_FUNC_SSL_QUIC_TLS_crypto_release_rcd_fn() (ha_quic_ossl_crypto_release_rcd()) releases these non-contiguous buffers, relying on the fact that the list of encryption levels (qc->qel_list) is correctly ordered by SSL protection level secret establishment order (by the TLS stack). 
OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn() (ha_quic_ossl_yield_secret()) is a simple wrapping function over ha_quic_set_encryption_secrets() which is used by boringSSL/quictls API. The role of OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() (ha_quic_ossl_got_transport_params()) is to store the transport parameters received from the peer. It simply calls quic_transport_params_store() and sets them into the TLS stack calling qc_ssl_set_quic_transport_params(). Also add some comments for all the OpenSSL 3.5 QUIC API callbacks. This patch has no impact on the other uses of the QUIC API provided by the other TLS stacks.	2025-05-20 15:00:06 +02:00
Frederic Lecaille	dc6a3c329a	MINOR: quic: Allow the use of the new OpenSSL 3.5.0 QUIC TLS API (to be completed) This patch allows the use of the new OpenSSL 3.5.0 QUIC TLS API when it is available and detected at compilation time. The detection relies on the presence of the OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND macro from openssl-compat.h. Indeed this macro is defined by OpenSSL since version 3.5.0. It is not defined by quictls. This helps in distinguishing these two TLS stacks. When the detection succeeds, HAVE_OPENSSL_QUIC is also defined by openssl-compat.h. It is then this new macro which is used to detect the availability of the new OpenSSL 3.5.0 QUIC TLS API. Note that this detection is done only if USE_QUIC_OPENSSL_COMPAT is not asked. So, USE_QUIC_OPENSSL_COMPAT and HAVE_OPENSSL_QUIC are exclusive. At the same location, from openssl-compat.h, ssl_encryption_level_t enum is defined. This enum was defined by quictls and extensively used by the haproxy QUIC implementation. SSL_set_quic_transport_params() is replaced by SSL_set_quic_tls_transport_params(). SSL_set_quic_early_data_enabled() (quictls) is also replaced by SSL_set_quic_tls_early_data_enabled() (OpenSSL). SSL_quic_read_level() (quictls) is not defined by OpenSSL. It is only used by the traces to log the current TLS stack decryption level (read). A macro makes it return -1 which is an unused value. Most of the differences between quictls and OpenSSL QUIC APIs are in quic_ssl.c where some callbacks must be defined for these two APIs. This is why this patch modifies quic_ssl.c to define an array of OSSL_DISPATCH structs: <ha_quic_dispatch>. Each element of this array defines a callback. So, this patch implements these six callbacks: - ha_quic_ossl_crypto_send() - ha_quic_ossl_crypto_recv_rcd() - ha_quic_ossl_crypto_release_rcd() - ha_quic_ossl_yield_secret() - ha_quic_ossl_got_transport_params() and - ha_quic_ossl_alert(). 
But at this time, these implementations which must return an int return 0, interpreted as a failure by the OpenSSL QUIC API, except for ha_quic_ossl_alert() which is implemented the same way as for quictls. The five remaining functions above will be implemented by the next patches to come. ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() have been moved to be defined for both quictls and OpenSSL QUIC API. These callbacks are attached to the SSL objects (sessions) by calling the new qc_ssl_set_cbs() function. The latter calls the correct function to attach the correct callbacks to the SSL objects (defined by <ha_quic_method> for quictls, and <ha_quic_dispatch> for OpenSSL). The calls to SSL_provide_quic_data() and SSL_process_quic_post_handshake() have also been disabled. These functions are not defined by OpenSSL QUIC API. At this time, the functions which call them are still defined when HAVE_OPENSSL_QUIC is defined.	2025-05-20 15:00:06 +02:00
Willy Tarreau	411b04c7d3	IMPORT: slz: use a better hash for machines with a fast multiply The current hash involves 3 simple shifts and additions so that it can be mapped to a multiply on architectures having a fast multiply. This is indeed what the compiler does on x86_64. A large range of values was scanned to try to find more optimal factors on machines supporting such a fast multiply, and it turned out that new factor 0x1af42f resulted in smoother hashes that provided on average 0.4% better compression on both the Silesia corpus and an mbox file composed of very compressible emails and uncompressible attachments. It's even slightly better than CRC32C while being faster on Skylake. This patch enables this factor on archs with a fast multiply. This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.	2025-05-16 16:43:53 +02:00
Willy Tarreau	0a91c6dcae	BUILD: debug: mark ha_crash_now() as attribute(noreturn) Building on MIPS64 with clang16 incorrectly reports some uninitialized value warnings in stats-proxy.c due to some calls to ABORT_NOW() where the compiler didn't know the code wouldn't return. Let's properly mark the function as noreturn, and take this opportunity for also marking it unused to avoid possible warnings depending on the build options (if ABORT_NOW is not used). No backport needed though it will not harm.	2025-05-16 16:43:53 +02:00
Christopher Faulet	f45a632bad	BUG/MEDIUM: stconn: Disable 0-copy forwarding for filters altering the payload It is especially a problem with Lua filters, but it is important to disable the 0-copy forwarding if a filter alters the payload, or at least to be able to disable it. While the filter is registered on the data filtering, it is not an issue (and it is the common case) because there is no way to fast-forward data at all. But it may be an issue if a filter decides to alter the payload and to unregister from data filtering. In that case, the 0-copy forwarding can be re-enabled in a hardly predictable state. To fix the issue, an SC flag was added to do so. The HTTP compression filter sets it, and Lua filters too if the body length is changed (via HTTPMessage.set_body_len()). Note that it is an issue because of a bad design of the HTX. Many info about the message are stored in the HTX structure itself. It must be refactored to move several info to the stream-endpoint descriptor. This should ease modifications at the stream level, from filters or TCP/HTTP rules. This should be backported as far as 3.0. If necessary, it may be backported on lower versions, as far as 2.6. In that case, it must be reviewed and adapted.	2025-05-16 15:11:37 +02:00
Christopher Faulet	a3940614c2	BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state SPOP_CS_FRAME_H and SPOP_CS_FRAME_P states, that were used to handle frame parsing, were removed. The demux process now relies on the demux stream ID to know if it is waiting for the frame header or the frame payload. Concretely, when the demux stream ID is not set (dsi == -1), the demuxer is waiting for the next frame header. Otherwise (dsi >= 0), it is waiting for the frame payload. It is especially important to be able to properly handle DISCONNECT frames sent by the agents. SPOP_CS_RUNNING state is introduced to know the hello handshake was finished and the SPOP connection is able to open SPOP streams and exchange NOTIFY/ACK frames with the agents. It depends on the following fixes: * MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing * BUG/MINOR: mux-spop: Make the demux stream ID a signed integer This change will be mandatory for the next fix. It must be backported to 3.1 with the commits above.	2025-05-13 19:51:40 +02:00

9118 commits