haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-05-28 04:12:17 -04:00

Author	SHA1	Message	Date
Aurelien DARRAGON	aa53887398	MINOR: counters: add shared counters helpers to get and drop shared pointers create include/haproxy/counters.h and src/counters.c files to anticipate for further helpers as some counters specific tasks needs to be carried out and since counters are shared between multiple object types (ie: listener, proxy, server..) we need generic helpers. Add some shared counters helper which are not yet used but will be updated in upcoming commits.	2025-06-05 09:59:04 +02:00
Aurelien DARRAGON	a0dcab5c45	MAJOR: counters: add shared counters base infrastructure Shareable counters are not tagged as shared counters and are dynamically allocated in separate memory area as a prerequisite for being stored in shared memory area. For now, GUID and threads groups are not taken into account, this is only a first step. also we ensure all counters are now manipulated using atomic operations, namely, "last_change" counter is now read from and written to using atomic ops. Despite the numerous changes caused by the counters being moved away from counters struct, no change of behavior should be expected.	2025-06-05 09:58:58 +02:00
Christopher Faulet	8ee650a88b	CLEANUP: applet: Update comment for applet_put* functions These functions were copied from the channel API and modified to work with applets using the new API or the legacy one. However, the comments were updated accordingly. It is the purpose of this patch.	2025-06-03 15:03:30 +02:00
Aurelien DARRAGON	368d01361a	MEDIUM: server: add and use srv_init() function rename _srv_postparse() internal function to srv_init() function and group srv_init_per_thr() plus idle conns list init inside it. This way we can perform some simplifications as srv_init() performs multiple server init steps after parsing. SRV_F_CHECKED flag was added, it is automatically set when srv_init() runs successfully. If the flag is already set and srv_init() is called again, nothing is done. This permis to manually call srv_init() earlier than the default POST_CHECK hook when needed without risking to do things twice.	2025-06-02 17:51:33 +02:00
Aurelien DARRAGON	889ef6f67b	MEDIUM: server: automatically add server to proxy list in new_server() while new_server() takes the parent proxy as argument and even assigns srv->proxy to the parent proxy, it didn't actually inserted the server to the parent proxy server list on success. The result is that sometimes we add the server to the list after new_server() is called, and sometimes we don't. This is really error-prone and because of that hooks such as REGISTER_POST_SERVER_CHECK() which as run for all servers listed in all proxies may not be relied upon for servers which are not actually inserted in their parent proxy server list. Plus it feels very strange to have a server that points to a proxy, but then the proxy doesn't know about it because it cannot find it in its server list. To prevent errors and make proxy->srv list reliable, we move the insertion logic directly under new_server(). This requires to know if we are called during parsing or during runtime to either insert or append the server to the parent proxy list. For that we use PR_FL_CHECKED flag from the parent proxy (if the flag is set, then the proxy was checked so we are past the init phase, thus we assume we are called during runtime) This implies that during startup if new_server() has to be cancelled on error paths we need to call srv_detach() (which is now exposed in server.h) before srv_drop(). The consequence of this commit is that REGISTER_POST_SERVER_CHECK() should not run reliably on all servers created using new_server() (without having to manually loop on global servers_list)	2025-06-02 17:51:30 +02:00
Aurelien DARRAGON	943958c3ff	MINOR: proxy: add a true list containing all proxies We have global proxies_list pointer which is announced as the list of "all existing proxies", but in fact it only represents regular proxies declared on the config file through "listen, frontend or backend" keywords It is ambiguous, and we currently don't have a straightforwrd method to iterate over all proxies (either public or internal ones) within haproxy Instead we still have to manually iterate over multiple lists (main proxies, log-forward proxies, peer proxies..) which is error-prone. In this patch we add a struct list member (8 bytes) inside struct proxy in order to store every proxy (except default ones) within a global "proxies" list which is actually representative for all proxies existing under haproxy process, like we already have for servers.	2025-06-02 17:51:21 +02:00
Aurelien DARRAGON	d04843167c	MINOR: stats: add stat_col flags Add stat_col flags member to store .generic bit and prepare for upcoming flags. No functional change expected.	2025-06-02 17:51:08 +02:00
Willy Tarreau	9f4cd435d3	[RELEASE] Released version 3.3-dev0 Released version 3.3-dev0 with the following main changes : - MINOR: version: mention that it's development again	2025-05-28 16:46:34 +02:00
Willy Tarreau	8809251ee0	MINOR: version: mention that it's development again This essentially reverts `a6458fd426`.	2025-05-28 16:46:15 +02:00
Willy Tarreau	a6458fd426	MINOR: version: mention that it's 3.2 LTS now. The version will be maintained up to around Q2 2030. Let's also update the INSTALL file to mention this.	2025-05-28 16:31:27 +02:00
Christopher Faulet	99e755d673	MINOR: listeners: Add support for a label on bind line It is now possile to set a label on a bind line. All sockets attached to this bind line inherits from this label. The idea is to be able to groud of sockets. For now, there is no mechanism to create these groups, this must be done by hand.	2025-05-26 19:00:00 +02:00
Willy Tarreau	3494775a1f	MINOR: ssl: support strict-sni in ssl-default-bind-options Several users already reported that it would be nice to support strict-sni in ssl-default-bind-options. However, in order to support it, we also need an option to disable it. This patch moves the setting of the option from the strict_sni field to a flag in the ssl_options field so that it can be inherited from the default bind options, and adds a new "no-strict-sni" directive to allow to disable it on a specific "bind" line. The test file "del_ssl_crt-list.vtc" which already tests both options was updated to make use of the default option and the no- variant to confirm everything continues to work.	2025-05-22 15:31:54 +02:00
Willy Tarreau	a1577a89a0	MINOR: glitches: add global setting "tune.glitches.kill.cpu-usage" It was mentioned during the development of glitches that it would be nice to support not killing misbehaving connections below a certain CPU usage so that poor implementations that routinely misbehave without impact are not killed. This is now possible by setting a CPU usage threshold under which we don't kill them via this parameter. It defaults to zero so that we continue to kill them by default.	2025-05-21 15:47:42 +02:00
Amaury Denoyelle	00d90e8839	MINOR: quic: adjust quic_conn-t.h include list Adjust include list in quic_conn-t.h. This file is included in many QUIC source, so it is useful to keep as lightweight as possible. Note that connection/QUIC MUX are transformed into forward declaration for better layer separation.	2025-05-21 14:44:27 +02:00
Amaury Denoyelle	01e3b2119a	MINOR: quic: add some missing includes Insert some missing includes statement in QUIC source files. This was detected after the next commit which adjust the include list used in quic_conn-t.h file.	2025-05-21 14:44:27 +02:00
Amaury Denoyelle	f286288471	MINOR: quic: refactor handling of streams after MUX release quic-conn layer has to handle itself STREAM frames after MUX release. If the stream was already seen, it is probably only a retransmitted frame which can be safely ignored. For other streams, an active closure may be needed. Thus it's necessary that quic-conn layer knows the highest stream ID already handled by the MUX after its release. Previously, this was done via <nb_streams> member array in quic-conn structure. Refactor this by replacing <nb_streams> by two members called <stream_max_uni>/<stream_max_bidi>. Indeed, it is unnecessary for quic-conn layer to monitor locally opened uni streams, as the peer cannot by definition emit a STREAM frame on it. Also, bidirectional streams are always opened by the remote side. Previously, <nb_streams> were set by quic-stream layer. Now, <stream_max_uni>/<stream_max_bidi> members are only set one time, just prior to QUIC MUX release. This is sufficient as quic-conn do not use them if the MUX is available. Note that previously, IDs were used relatively to their type, thus incremented by 1, after shifting the original value. For simplification, use the plain stream ID, which is incremented by 4.	2025-05-21 14:26:45 +02:00
Amaury Denoyelle	07d41a043c	MINOR: quic: move function to check stream type in utils Move general function to check if a stream is uni or bidirectional from QUIC MUX to quic_utils module. This should prevent unnecessary include of QUIC MUX header file in other sources.	2025-05-21 14:17:41 +02:00
Amaury Denoyelle	cf45bf1ad8	CLEANUP: quic: remove unused cbuf module Cbuf are not used anymore. Remove the related source and header files, as well as include statements in the rest of QUIC source files.	2025-05-21 14:16:37 +02:00
Frederic Lecaille	b3ac1a636c	MINOR: quic: implement all remaining callbacks for OpenSSL 3.5 QUIC API The quic_conn struct is modified for two reasons. The first one is to store the encoded version of the local tranport parameter as this is done for USE_QUIC_OPENSSL_COMPAT. Indeed, the local transport parameter "should remain valid until after the parameters have been sent" as mentionned by SSL_set_quic_tls_cbs(3) manual. In our case, the buffer is a static buffer attached to the quic_conn object. qc_ssl_set_quic_transport_params() function whose role is to call SSL_set_tls_quic_transport_params() (aliased by SSL_set_quic_transport_params() to set these local tranport parameter into the TLS stack from the buffer attached to the quic_conn struct. The second quic_conn struct modification is the addition of the new ->prot_level (SSL protection level) member added to the quic_conn struct to store "the most recent write encryption level set via the OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn callback (if it has been called)" as mentionned by SSL_set_quic_tls_cbs(3) manual. This patches finally implements the five remaining callacks to make the haproxy QUIC implementation work. OSSL_FUNC_SSL_QUIC_TLS_crypto_send_fn() (ha_quic_ossl_crypto_send) is easy to implement. It calls ha_quic_add_handshake_data() after having converted qc->prot_level TLS protection level value to the correct ssl_encryption_level_t (boringSSL API/quictls) value. OSSL_FUNC_SSL_QUIC_TLS_crypto_recv_rcd_fn() (ha_quic_ossl_crypto_recv_rcd()) provide the non-contiguous addresses to the TLS stack, without releasing them. OSSL_FUNC_SSL_QUIC_TLS_crypto_release_rcd_fn() (ha_quic_ossl_crypto_release_rcd()) release these non-contiguous buffer relying on the fact that the list of encryption level (qc->qel_list) is correctly ordered by SSL protection level secret establishements order (by the TLS stack). OSSL_FUNC_SSL_QUIC_TLS_yield_secret_fn() (ha_quic_ossl_got_transport_params()) is a simple wrapping function over ha_quic_set_encryption_secrets() which is used by boringSSL/quictls API. OSSL_FUNC_SSL_QUIC_TLS_got_transport_params_fn() (ha_quic_ossl_got_transport_params()) role is to store the peer received transport parameters. It simply calls quic_transport_params_store() and set them into the TLS stack calling qc_ssl_set_quic_transport_params(). Also add some comments for all the OpenSSL 3.5 QUIC API callbacks. This patch have no impact on the other use of QUIC API provided by the others TLS stacks.	2025-05-20 15:00:06 +02:00
Frederic Lecaille	dc6a3c329a	MINOR: quic: Allow the use of the new OpenSSL 3.5.0 QUIC TLS API (to be completed) This patch allows the use of the new OpenSSL 3.5.0 QUIC TLS API when it is available and detected at compilation time. The detection relies on the presence of the OSSL_FUNC_SSL_QUIC_TLS_CRYPTO_SEND macro from openssl-compat.h. Indeed this macro is defined by OpenSSL since 3.5.0 version. It is not defined by quictls. This helps in distinguishing these two TLS stacks. When the detection succeeds, HAVE_OPENSSL_QUIC is also defined by openssl-compat.h. Then, this is this new macro which is used to detect the availability of the new OpenSSL 3.5.0 QUIC TLS API. Note that this detection is done only if USE_QUIC_OPENSSL_COMPAT is not asked. So, USE_QUIC_OPENSSL_COMPAT and HAVE_OPENSSL_QUIC are exclusive. At the same location, from openssl-compat.h, ssl_encryption_level_t enum is defined. This enum was defined by quictls and expansively used by the haproxy QUIC implementation. SSL_set_quic_transport_params() is replaced by SSL_set_quic_tls_transport_params. SSL_set_quic_early_data_enabled() (quictls) is also replaced by SSL_set_quic_tls_early_data_enabled() (OpenSSL). SSL_quic_read_level() (quictls) is not defined by OpenSSL. It is only used by the traces to log the current TLS stack decryption level (read). A macro makes it return -1 which is an usused values. The most of the differences between quictls and OpenSSL QUI APIs are in quic_ssl.c where some callbacks must be defined for these two APIs. This is why this patch modifies quic_ssl.c to define an array of OSSL_DISPATCH structs: <ha_quic_dispatch>. Each element of this arry defines a callback. So, this patch implements these six callabcks: - ha_quic_ossl_crypto_send() - ha_quic_ossl_crypto_recv_rcd() - ha_quic_ossl_crypto_release_rcd() - ha_quic_ossl_yield_secret() - ha_quic_ossl_got_transport_params() and - ha_quic_ossl_alert(). But at this time, these implementations which must return an int return 0 interpreted as a failure by the OpenSSL QUIC API, except for ha_quic_ossl_alert() which is implemented the same was as for quictls. The five remaining functions above will be implemented by the next patches to come. ha_quic_set_encryption_secrets() and ha_quic_add_handshake_data() have been moved to be defined for both quictls and OpenSSL QUIC API. These callbacks are attached to the SSL objects (sessions) calling qc_ssl_set_cbs() new function. This latter callback the correct function to attached the correct callbacks to the SSL objects (defined by <ha_quic_method> for quictls, and <ha_quic_dispatch> for OpenSSL). The calls to SSL_provide_quic_data() and SSL_process_quic_post_handshake() have been also disabled. These functions are not defined by OpenSSL QUIC API. At this time, the functions which call them are still defined when HAVE_OPENSSL_QUIC is defined.	2025-05-20 15:00:06 +02:00
Willy Tarreau	411b04c7d3	IMPORT: slz: use a better hash for machines with a fast multiply The current hash involves 3 simple shifts and additions so that it can be mapped to a multiply on architecures having a fast multiply. This is indeed what the compiler does on x86_64. A large range of values was scanned to try to find more optimal factors on machines supporting such a fast multiply, and it turned out that new factor 0x1af42f resulted in smoother hashes that provided on average 0.4% better compression on both the Silesia corpus and an mbox file composed of very compressible emails and uncompressible attachments. It's even slightly better than CRC32C while being faster on Skylake. This patch enables this factor on archs with a fast multiply. This is slz upstream commit 82ad1e75c13245a835c1c09764c89f2f6e8e2a40.	2025-05-16 16:43:53 +02:00
Willy Tarreau	0a91c6dcae	BUILD: debug: mark ha_crash_now() as attribute(noreturn) Building on MIPS64 with clang16 incorrectly reports some uninitialized value warnings in stats-proxy.c due to some calls to ABORT_NOW() where the compiler didn't know the code wouldn't return. Let's properly mark the function as noreturn, and take this opportunity for also marking it unused to avoid possible warnings depending on the build options (if ABORT_NOW is not used). No backport needed though it will not harm.	2025-05-16 16:43:53 +02:00
Christopher Faulet	f45a632bad	BUG/MEDIUM: stconn: Disable 0-copy forwarding for filters altering the payload It is especially a problem with Lua filters, but it is important to disable the 0-copy forwarding if a filter alters the payload, or at least to be able to disable it. While the filter is registered on the data filtering, it is not an issue (and it is the common case) because, there is now way to fast-forward data at all. But it may be an issue if a filter decides to alter the payload and to unregister from data filtering. In that case, the 0-copy forwarding can be re-enabled in a hardly precdictable state. To fix the issue, a SC flags was added to do so. The HTTP compression filter set it and lua filters too if the body length is changed (via HTTPMessage.set_body_len()). Note that it is an issue because of a bad design about the HTX. Many info about the message are stored in the HTX structure itself. It must be refactored to move several info to the stream-endpoint descriptor. This should ease modifications at the stream level, from filter or a TCP/HTTP rules. This should be backported as far as 3.0. If necessary, it may be backported on lower versions, as far as 2.6. In that case, it must be reviewed and adapted.	2025-05-16 15:11:37 +02:00
Christopher Faulet	a3940614c2	BUG/MEDIUM: mux-spop: Remove frame parsing states from the SPOP connection state SPOP_CS_FRAME_H and SPOP_CS_FRAME_P states, that were used to handle frame parsing, were removed. The demux process now relies on the demux stream ID to know if it is waiting for the frame header or the frame payload. Concretly, when the demux stream ID is not set (dsi == -1), the demuxer is waiting for the next frame header. Otherwise (dsi >= 0), it is waiting for the frame payload. It is especially important to be able to properly handle DISCONNECT frames sent by the agents. SPOP_CS_RUNNING state is introduced to know the hello handshake was finished and the SPOP connection is able to open SPOP streams and exchange NOTIFY/ACK frames with the agents. It depends on the following fixes: * MINOR: mux-spop: Don't set SPOP connection state to FRAME_H after ACK parsing * BUG/MINOR: mux-spop: Make the demux stream ID a signed integer This change will be mandatory for the next fix. It must be backported to 3.1 with the commits above.	2025-05-13 19:51:40 +02:00
Willy Tarreau	e049bd00ab	MEDIUM: config: change default limits to 1024 threads and 32 groups A test run on a dual-socket EPYC 9845 (2x160 cores) showed that we'll be facing new limits during the lifetime of 3.2 with our current 16 groups and 256 threads max: $ cat test.cfg global cpu-policy perforamnce $ ./haproxy -dc -c -f test.cfg ... Thread CPU Bindings: Tgrp/Thr Tid CPU set 1/1-32 1-32 32: 0-15,320-335 2/1-32 33-64 32: 16-31,336-351 3/1-32 65-96 32: 32-47,352-367 4/1-32 97-128 32: 48-63,368-383 5/1-32 129-160 32: 64-79,384-399 6/1-32 161-192 32: 80-95,400-415 7/1-32 193-224 32: 96-111,416-431 8/1-32 225-256 32: 112-127,432-447 Raising the default limit to 1024 threads and 32 groups is sufficient to buy us enough margin for a long time (hopefully, please don't laugh, you, reader from the future): $ ./haproxy -dc -c -f test.cfg ... Thread CPU Bindings: Tgrp/Thr Tid CPU set 1/1-32 1-32 32: 0-15,320-335 2/1-32 33-64 32: 16-31,336-351 3/1-32 65-96 32: 32-47,352-367 4/1-32 97-128 32: 48-63,368-383 5/1-32 129-160 32: 64-79,384-399 6/1-32 161-192 32: 80-95,400-415 7/1-32 193-224 32: 96-111,416-431 8/1-32 225-256 32: 112-127,432-447 9/1-32 257-288 32: 128-143,448-463 10/1-32 289-320 32: 144-159,464-479 11/1-32 321-352 32: 160-175,480-495 12/1-32 353-384 32: 176-191,496-511 13/1-32 385-416 32: 192-207,512-527 14/1-32 417-448 32: 208-223,528-543 15/1-32 449-480 32: 224-239,544-559 16/1-32 481-512 32: 240-255,560-575 17/1-32 513-544 32: 256-271,576-591 18/1-32 545-576 32: 272-287,592-607 19/1-32 577-608 32: 288-303,608-623 20/1-32 609-640 32: 304-319,624-639 We can change this default now because it has no functional effect without any configured cpu-policy, so this will only be an opt-in and it's better to do it now than to have an effect during the maintenance phase. A tiny effect is a doubling of the number of pool buckets and stick-table shards internally, which means that aside slightly reducing contention in these areas, a dump of tables can enumerate keys in a different order (hence the adjustment in the vtc). The only really visible effect is a slightly higher static memory consumption (29->35 MB on a small config), but that difference remains even with 50k servers so that's pretty much acceptable. Thanks to Erwan Velu for the quick tests and the insights!	2025-05-13 18:15:33 +02:00
Amaury Denoyelle	f3b9676416	MINOR: quic: display stream age Add a field to save the creation date of qc_stream_desc instance. This is useful to display QUIC stream age in "show quic stream" output.	2025-05-13 15:44:22 +02:00
Amaury Denoyelle	1ccede211c	MINOR: mux-quic: account Rx data per stream Add counters to measure Rx buffers usage per QCS. This reused the newly defined bdata_ctr type already used for Tx accounting. Note that for now, <tot> value of bdata_ctr is not used. This is because it is not easy to account for data accross contiguous buffers. These values are displayed both on log/traces and "show quic" output.	2025-05-13 15:41:51 +02:00
Amaury Denoyelle	a1dc9070e7	MINOR: quic: account Tx data per stream Add accounting at qc_stream_desc level to be able to report the number of allocated Tx buffers and the sum of their data. This represents data ready for emission or already emitted and waiting on ACK. To simplify this accounting, a new counter type bdata_ctr is defined in quic_utils.h. This regroups both buffers and data counter, plus a maximum on the buffer value. These values are now displayed on QCS info used both on logline and traces, and also on "show quic" output.	2025-05-13 15:41:41 +02:00
Willy Tarreau	ebab479cdf	MINOR: http: add a function to validate characters of :authority As discussed here: https://github.com/httpwg/http2-spec/pull/936 https://github.com/haproxy/haproxy/issues/2941 It's important to take care of some special characters in the :authority pseudo header before reassembling a complete URI, because after assembly it's too late (e.g. the '/'). This patch adds a specific function which was checks all such characters and their ranges on an ist, and benefits from modern compilers optimizations that arrange the comparisons into an evaluation tree for faster match. That's the version that gave the most consistent performance across various compilers, though some hand-crafted versions using bitmaps stored in register could be slightly faster but super sensitive to code ordering, suggesting that the results might vary with future compilers. This one takes on average 1.2ns per character at 3 GHz (3.6 cycles per char on avg). The resulting impact on H2 request processing time (small requests) was measured around 0.3%, from 6.60 to 6.618us per request, which is a bit high but remains acceptable given that the test only focused on req rate. The code was made usable both for H2 and H3.	2025-05-12 18:02:47 +02:00
William Lallemand	96b1f1fd26	MINOR: tools: ha_freearray() frees an array of string ha_freearray() is a new function which free() an array of strings terminated by a NULL entry. The pointer to the array will be free and set to NULL.	2025-05-09 19:12:05 +02:00
Willy Tarreau	8a96216847	MEDIUM: sock-inet: re-check IPv6 connectivity every 30s IPv6 connectivity might start off (e.g. network not fully up when haproxy starts), so for features like resolvers, it would be nice to periodically recheck. With this change, instead of having the resolvers code rely on a variable indicating connectivity, it will now call a function that will check for how long a connectivity check hasn't been run, and will perform a new one if needed. The age was set to 30s which seems reasonable considering that the DNS will cache results anyway. There's no saving in spacing it more since the syscall is very check (just a connect() without any packet being emitted). The variables remain exported so that we could present them in show info or anywhere else. This way, "dns-accept-family auto" will now stay up to date. Warning though, it does perform some caching so even with a refreshed IPv6 connectivity, an older record may be returned anyway.	2025-05-09 15:45:44 +02:00
Willy Tarreau	1404f6fb7b	DEBUG: pools: add a new integrity mode "backup" to copy the released area This way we can preserve the entire contents of the released area for later inspection. This automatically enables comparison at reallocation time as well (like "integrity" does). If used in combination with integrity, the comparison is disabled but the check of non-corruption of the area mangled by integrity is still operated.	2025-05-09 14:57:00 +02:00
William Lallemand	e7574cd5f0	MINOR: acme: add the global option 'acme.scheduler' The automatic scheduler is useful but sometimes you don't want to use, or schedule manually. This patch adds an 'acme.scheduler' option in the global section, which can be set to either 'auto' or 'off'. (auto is the default value) This also change the ouput of the 'acme status' command so it does not shows scheduled values. The state will be 'Stopped' instead of 'Scheduled'.	2025-05-09 14:00:39 +02:00
Willy Tarreau	0ae14beb2a	DEBUG: pool: permit per-pool UAF configuration The new MEM_F_UAF flag can be set just after a pool's creation to make this pool UAF for debugging purposes. This allows to maintain a better overall performance required to reproduce issues while still having a chance to catch UAF. It will only be used by developers who will manually add it to areas worth being inspected, though.	2025-05-09 13:59:02 +02:00
Amaury Denoyelle	294bf26c06	MINOR: quic: extend return value during TP parsing Extend API used for QUIC transport parameter decoding. This is done via the introduction of a dedicated enum to report the various error condition detected. No functional change should occur with this patch, as the only returned code is QUIC_TP_DEC_ERR_TRUNC, which results in the connection closure via a TLS alert. This patch will be necessary to properly reject transport parameters with the proper CONNECTION_CLOSE error code. As such, it should be backported up to 2.6 with the following series.	2025-05-07 15:19:52 +02:00
Willy Tarreau	feaac66b5e	DEBUG: threads: merge successive idempotent lock operations in history In order to make the lock history a bit more useful, let's try to merge adjacent lock/unlock sequences that don't change anything for other threads. For this we can replace the last unlock with the new operation on the same label, and even just not store it if it was the same as the one before the unlock, since in the end it's the same as if the unlock had not been done. Now loops that used to be filled with "R:LISTENER U:LISTENER" show more useful info such as: S:IDLE_CONNS U:IDLE_CONNS S:PEER U:PEER S:IDLE_CONNS U:IDLE_CONNS R:LISTENER U:LISTENER U:STK_TABLE W:STK_SESS U:STK_SESS R:STK_TABLE U:STK_TABLE W:STK_SESS U:STK_SESS R:STK_TABLE R:STK_TABLE U:STK_TABLE W:STK_SESS U:STK_SESS W:STK_TABLE_UPDT U:STK_TABLE_UPDT S:PEER It's worth noting that it can sometimes induce confusion when recursive locks of the same label are used (a few exist on peers or stick-tables), as in such a case the two operations would be needed. However these ones are already undebuggable, so instead they will just have to be renamed to make sure they use a distinct label.	2025-05-05 18:36:12 +02:00
Willy Tarreau	743dce95d2	DEBUG: threads: don't keep lock label "OTHER" in the per-thread history Most threads are filled with "R:OTHER U:OTHER" in their history. Since anything non-important can use other it's not observable but it pollutes the history. Let's just drop OTHER entirely during the recording.	2025-05-05 18:10:57 +02:00
William Lallemand	878a3507df	BUILD: acme: need HAVE_ASN1_TIME_TO_TM Restrict the build of the ACME feature to libraries which provide ASN1_TIME_to_tm() function.	2025-05-02 16:01:32 +02:00
William Lallemand	626de9538e	MINOR: ssl: add function to extract X509 notBefore date in time_t Add x509_get_notbefore_time_t() which returns the notBefore date in time_t format.	2025-05-02 16:01:32 +02:00
Olivier Houchard	388539faa3	MEDIUM: stick-tables: defer adding updates to a tasklet There is a lot of contention trying to add updates to the tree. So instead of trying to add the updates to the tree right away, just add them to a mt-list (with one mt-list per thread group, so that the mt-list does not become the new point of contention that much), and create a tasklet dedicated to adding updates to the tree, in batchs, to avoid keeping the update lock for too long. This helps getting stick tables perform better under heavy load.	2025-05-02 15:27:55 +02:00
Olivier Houchard	faa18c1ad8	BUG/MEDIUM: quic: Let it be known if the tasklet has been released. quic_conn_release() may, or may not, free the tasklet associated with the connection. So make it return 1 if it was, and 0 otherwise, so that if it was called from the tasklet handler itself, the said handler can act accordingly and return NULL if the tasklet was destroyed. This should be backported if `9240cd4a27` is backported.	2025-05-02 11:09:28 +02:00
William Lallemand	18d2371e0d	MINOR: acme: change the default max retries to 5 Change the default max retries constant to 5 instead of 3. Some servers can be be a bit long to execute the challenge.	2025-05-02 09:40:12 +02:00
Olivier Houchard	b138eab302	BUG/MEDIUM: connections: Report connection closing in conn_create_mux() Add an extra parametre to conn_create_mux(), "closed_connection". If a pointer is provided, then let it know if the connection was closed. Callers have no way to determine that otherwise, and we need to know that, at least in ssl_sock_io_cb(), as if the connection was closed we need to return NULL, as the tasklet was free'd, otherwise that can lead to memory corruption and crashes. This should be backported if `9240cd4a27` is backported too.	2025-04-30 17:17:36 +02:00
Olivier Houchard	4abfade371	MINOR: tasks: Remove unused tasklet_remove_from_tasklet_list Remove tasklet_remove_from_tasklet_list, as the function hasn't been used for a long time, and there is little reason to keep it.	2025-04-30 17:09:06 +02:00
Olivier Houchard	2bab043c8c	MEDIUM: tasks: Remove TASK_IN_LIST and use TASK_QUEUED instead. TASK_QUEUED was used to mean "the task has been scheduled to run", TASK_IN_LIST was used to mean "the tasklet has been scheduled to run", remove TASK_IN_LIST and just use TASK_QUEUED for tasklets instead. This commit is just cosmetic, and should not have any impact.	2025-04-30 17:08:57 +02:00
William Lallemand	563ca94ab8	MINOR: ssl/cli: "acme ps" shows the acme tasks Implement a way to display the running acme tasks over the CLI. It currently only displays a "Running" status with the certificate name and the acme section from the configuration. The displayed running tasks are limited to the size of a buffer for now, it will require a backref list later to be called multiple times to resume the list.	2025-04-30 17:12:50 +02:00
Aurelien DARRAGON	97363015a5	MINOR: add hlua_yield_asap() helper When called, this function will try to enforce a yield (if available) as soon as possible. Indeed, automatic yield is already enforced every X Lua instructions. However, there may be some cases where we know after running heavy operation that we should yield already to avoid taking too much CPU at once. This is what this function offers, instead of asking the user to manually yield using "core.yield()" from Lua itself after using an expensive Lua method offered by haproxy, we can directly enforce the yield without the need to do it in the Lua script.	2025-04-30 17:00:27 +02:00
Amaury Denoyelle	df50d3e39f	MINOR: mux-quic: limit emitted MSD frames count per qcs The previous commit has implemented a new calcul method for MAX_STREAM_DATA frame emission. Now, a frame may be emitted as soon as a buffer was consumed by a QCS instance. This will probably increase the number of MAX_STREAM_DATA frame emission. It may even cause a series of frame emitted for the same stream with increasing values under high load, which is completely unnecessary. To improve this, limit the number of MAX_STREAM_DATA frames built to one per QCS instance. This is implemented by storing a reference to this frame in QCS structure via a new member <tx.msd_frm>. Note that to properly reset QCS msd_frm member, emission of flow-control frames have been changed. Now, each frame is emitted individually. On one side, it is better as it prevent to emit frames related to different streams in a single datagram, which is not desirable in case of packet loss. However, this can also increase sendto() syscall invocation.	2025-04-30 16:08:47 +02:00
Amaury Denoyelle	14a3fb679f	MEDIUM: mux-quic: increase flow-control on each bufsize Recently, QCS Rx allocation buffer method has been improved. It is now possible to allocate multiple buffers per QCS instances, which was necessary to improve HTTP/3 POST throughput. However, a limitation remained related to the emission of MAX_STREAM_DATA. These frames are only emitted once at least half of the receive capacity has been consumed by its QCS instance. This may be too restrictive when a client need to upload a large payload. Improve this by adjusting MAX_STREAM_DATA allocation. If QCS capacity is still limited to 1 or 2 buffers max, the old calcul is still used. This is necessary when user has limited upload throughput via their configuration. If QCS capacity is more than 2 buffers, a new frame is emitted if at least a buffer was consumed. This patch has reduced number of STREAM_DATA_BLOCKED frames received in POST tests with some specific clients.	2025-04-30 16:08:47 +02:00
Remi Tricot-Le Breton	047fb37b19	MINOR: Add 'conn' param to ssl_sock_chose_sni_ctx This is only useful in the traces, the conn parameter won't be used otherwise.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	6519cec2ed	MINOR: ssl: Add traces about sigalg extension parsing in clientHello callback We had to parse the sigAlg extension by hand in order to properly select the certificate used by the SSL frontends. These traces allow to dump the allowed sigAlg list sent by the client in its clientHello.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	105c1ca139	MINOR: ssl: Add traces to the switchctx callback This callback allows to pick the used certificate on an SSL frontend. The certificate selection is made according to the information sent by the client in the clientHello. The traces that were added will allow to better understand what certificate was chosen and why. It will also warn us if the chosen certificate was the default one. The actual certificate parsing happens in ssl_sock_chose_sni_ctx. It's in this function that we actually get the filename of the certificate used.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	dbdd0630e1	MINOR: ssl: Add ocsp stapling callback traces If OCSP stapling fails because of a missing or invalid OCSP response we used to silently disable stapling for the given session. We can now know a bit more what happened regarding OCSP stapling.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	0fb05540b2	MINOR: ssl: Add traces to verify callback Those traces allow to know which errors were met during certificate chain validation as well as which ones were ignored.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	4a8fa28e36	MINOR: ssl: Add traces around SSL_do_handshake call Those traces dump information about the multiple SSL_do_handshake calls (renegotiation and regular call). Some errors coud also be dumped in case of rejected early data. Depending on the chosen verbosity, some information about the current handshake can be dumped as well (servername, tls version, chosen cipher for instance). In case of failed handshake, the error codes and messages will also be dumped in the log to ease debugging.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	9f146bdab3	MINOR: ssl: Add traces to ssl_sock_io_cb function Add new SSL traces.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	475bb8d843	MINOR: ssl: Add traces to recv/send functions Those traces will allow to identify sessions on which early data is used as well as some forcefully closed connections.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	9bb8d6dcd1	MINOR: ssl: Add traces to ssl init/close functions Add a dedicated trace for some unlikely allocation failures and async errors. Those traces will ostly be used to identify the start and end of a given SSL connection.	2025-04-30 11:11:26 +02:00
Remi Tricot-Le Breton	08e40f4589	MINOR: Add "sigalg" to "sigalg name" helper function This function can be used to convert a TLSv1.3 sigAlg entry (2bytes) from the signature_agorithms client hello extension into a string. In order to ease debugging, some TLSv1.2 combinations can also be dumped. In TLSv1.2 those signature algorithms pairs were built out of a one byte signature identifier combined to a one byte hash identifier. In TLSv1.3 those identifiers are two bytes blocs that must be treated as such.	2025-04-30 11:11:26 +02:00
Willy Tarreau	566b384e4e	MINOR: tools: make my_strndup() take a size_t len instead of and int In relation to issue #2954, it appears that turning some size_t length calculations to the int that uses my_strndup() upsets coverity a bit. Instead of dealing with such warnings each time, better address it at the root. An inspection of all call places show that the size passed there is always positive so we can safely use an unsigned type, and size_t will always suit it like for strndup() where it's available.	2025-04-30 05:17:43 +02:00
Aurelien DARRAGON	5288b39011	BUG/MINOR: dns: prevent ds accumulation within dss when dns session callback (dns_session_release()) is called upon error (ie: when some pending queries were not sent), we try our best to re-create the applet in order to preserve the pending queries and give them a chance to be retried. This is done at the end of dns_session_release(). However, doing so exposes to an issue: if the error preventing queries from being sent is still encountered over and over the dns session could stay there indefinitely. Meanwhile, other dns sessions may be created on the same dns_stream_server periodically. If previous failing dns sessions don't terminate but we also keep creating new ones, we end up accumulating failing sessions on a given dns_stream_server, which can eventually cause ressource shortage. This issue was found when trying to address ("BUG/MINOR: dns: add tempo between 2 connection attempts for dns servers") To fix it, we track the number of failed consecutive sessions for a given dns server. When we reach the threshold (set to 100), we consider that the link to the dns server is broken (at least temporarily) and we force dns_session_new() to fail, so that we stop creating new sessions until one of the existing one eventually succeeds. A workaround for this fix consists in setting the "maxconn" parameter on nameserver directive (under resolvers section) to a reasonnable value so that no more than "maxconn" sessions may co-exist on the same server at a given time. This may be backported to all stable versions. ("CLEANUP: dns: remove unused dns_stream_server struct member") may be backported to ease the backport.	2025-04-29 21:20:54 +02:00
Aurelien DARRAGON	14ebe95a10	CLEANUP: dns: remove unused dns_stream_server struct member dns_stream_server "max_slots" is unused, let's get rid of it	2025-04-29 21:20:44 +02:00
Aurelien DARRAGON	1ced5ef2fd	MINOR: applet: add appctx_schedule() macro Just like task_schedule() but for applets to wakeup an applet at a specific time, leverages _task_schedule() internally	2025-04-29 21:19:37 +02:00
William Lallemand	5555926fdd	MEDIUM: acme: use a map to store tokens and thumbprints The stateless mode which was documented previously in the ACME example is not convenient for all use cases. First, when HAProxy generates the account key itself, you wouldn't be able to put the thumbprint in the configuration, so you will have to get the thumbprint and then reload. Second, in the case you are using multiple account key, there are multiple thumbprint, and it's not easy to know which one you want to use when responding to the challenger. This patch allows to configure a map in the acme section, which will be filled by the acme task with the token corresponding to the challenge, as the key, and the thumbprint as the value. This way it's easy to reply the right thumbprint. Example: http-request return status 200 content-type text/plain lf-string "%[path,field(-1,/)].%[path,field(-1,/),map(virt@acme)]\n" if { path_beg '/.well-known/acme-challenge/' }	2025-04-29 16:15:55 +02:00
Amaury Denoyelle	0f9b3daf98	MEDIUM: quic: limit global Tx memory Define a new settings tune.quic.frontend.max-tot-window. It contains a size argument which can be used to set a limit on the sum of all QUIC connections congestion window. This is applied both on quic_cc_path_set() and quic_cc_path_inc(). Note that this limitation cannot reduce a congestion window more than the minimal limit which is set to 2 datagrams.	2025-04-29 15:19:32 +02:00
Amaury Denoyelle	e841164a44	MINOR: quic: account for global congestion window Use the newly defined cshared type to account for the sum of congestion window of every QUIC connection. This value is stored in global counter quic_mem_global defined in proto_quic module.	2025-04-29 15:19:32 +02:00
Amaury Denoyelle	3891456d20	MINOR: thread: define cshared type Define a new type "struct cshared". This can be used as a tool to manipulate a global counter with thread-safety ensured. Each thread would declare its thread-local cshared type, which would point to a global counter. Each thread can then add/substract value to their owned thread-local cshared instance via cshared_add(). If the difference exceed a configured limit, either positively or negatively, the global counter is updated and thread-local instance is reset to 0. Each thread can safely read the global counter value using cshared_read().	2025-04-29 15:10:06 +02:00
Amaury Denoyelle	7bad88c35c	BUG/MINOR: quic: ensure cwnd limits are always enforced Congestion window is limit by a minimal and maximum values which can never be exceeded. Min value is hardcoded to 2 datagrams as recommended by the specification. Max value is specified via haproxy configuration. These values must be respected each time the congestion window size is adjusted. However, in some rare occasions, limit were not always enforced. Fix this by implementing wrappers to set or increment the congestion window. These functions ensure limits are always applied after the operation. Additionnally, wrappers also ensure that if window reached a new maximum value, it is saved in <cwnd_last_max> field. This should be backported up to 2.6, after a brief period of observation.	2025-04-29 15:10:06 +02:00
Amaury Denoyelle	2eb1b0cd96	MINOR: quic: rename min/max fields for congestion window algo There was some possible confusion between fields related to congestion window size min and max limit which cannot be exceeded, and the maximum value previously reached by the window. Fix this by adopting a new naming scheme. Enforced limit are now renamed <limit_max>/<limit_min>, while the previously reached max value is renamed <cwnd_last_max>. This should be backported up to 3.1.	2025-04-29 15:10:06 +02:00
Willy Tarreau	2cdb3cb91e	MINOR: tcp: add support for setting TCP_NOTSENT_LOWAT on both sides TCP_NOTSENT_LOWAT is very convenient as it indicates when to report EAGAIN on the sending side. It takes a margin on top of the estimated window, meaning that it's no longer needed to store too many data in socket buffers. Instead there's just enough to fill the send window and a little bit of margin to cover the scheduling time to restart sending. Experiments on a 100ms network have shown a 10-fold reduction in the memory used by socket buffers by just setting this value to tune.bufsize, without noticing any performance degradation. Theoretically the responsiveness on multiplexed protocols such as H2 should also be improved.	2025-04-29 12:13:42 +02:00
Willy Tarreau	f25b4abc9b	MINOR: cli: split APPCTX_CLI_ST1_PROMPT into two distinct flags The CLI's "prompt" command toggles two distinct things: - displaying or hiding the prompt at the beginning of the line - single-command vs interactive mode These are two independent concepts and the prompt mode doesn't always cope well with tools that would like to upload data without having to read the prompt on return. Also, the master command line works in interactive mode by default with no prompt, which is not consistent (and not convenient for tools). So let's start by splitting the bit in two, and have a new APPCTX_CLI_ST1_INTER flag dedicated to the interactive mode. For now the "prompt" command alone continues to toggle the two at once.	2025-04-28 20:21:06 +02:00
Willy Tarreau	5ac280f2a7	MINOR: compiler: add more macros to detect macro definitions We add __equals_0(NAME) which is only true if NAME is defined as zero, and __def_as_empty(NAME) which is only true if NAME is defined as an empty string.	2025-04-28 20:21:06 +02:00
Willy Tarreau	12c7189bc8	MEDIUM: thread: set DEBUG_THREAD to 1 by default Setting DEBUG_THREAD to 1 allows recording the lock history for each thread. Tests have shown that (as predicted) the cost of updating a single thread-local variable is not perceptible in the noise, especially when compared to the cost of obtaining a lock. Since this can provide useful value when debugging deadlocks, let's enable it by default when threads are enabled.	2025-04-28 16:50:34 +02:00
Willy Tarreau	d9a659ed96	MINOR: threads/cli: display the lock history on "show threads" This will display the lock labels and modes for each non-empty step at the end of "show threads" when these are defined. This allows to emit up to the last 8 locking operation for each thread on 64 bit machines.	2025-04-28 16:50:34 +02:00
Willy Tarreau	b8a1c2380b	MEDIUM: threads: keep history of taken locks with DEBUG_THREAD > 0 by only storing a word in each thread context, we can keep the history of all taken/dropped locks by label. This is expected to be very cheap and to permit to store up to 8 consecutive lock operations in 64 bits. That should significantly help detect recursive locks as well as figure what thread was likely to hinder another one waiting for a lock. For now we only store the final state of the lock, we don't store the attempt to get it. It's just a matter of space since we already need 4 ops (rd,sk,wr,un) which take 2 bits, leaving max 64 labels. We're already around 45. We could also multiply by 5 and still keep 8 bits total per lock, that would limit us to 51 locks max. It seems that most of the time if we get a watchdog panic, anyway the victim thread will be perfectly located so that we don't need a specific value for this. Another benefit is that we perform a single memory write per lock.	2025-04-28 16:50:34 +02:00
Willy Tarreau	23371b3e7c	MINOR: threads: turn the full lock debugging to DEBUG_THREAD=2 At level 1 it now does nothing. This is reserved for some subsequent patches which will implement lighter debugging.	2025-04-28 16:50:34 +02:00
Willy Tarreau	903a6b14ef	MINOR: threads: prepare DEBUG_THREAD to receive more values We now default the value to zero and make sure all tests properly take care of values above zero. This is in preparation for supporting several degrees of debugging.	2025-04-28 16:50:34 +02:00
William Lallemand	bb768b3e26	MEDIUM: acme: use Retry-After value for retries Parse the Retry-After header in response and store it in order to use the value as the next delay for the next retry, fallback to 3s if the value couldn't be parse or does not exist.	2025-04-24 20:14:47 +02:00
Willy Tarreau	69b051d1dc	MINOR: resolvers: add "dns-accept-family auto" to rely on detected IPv6 Instead of always having to force IPv4 or IPv6, let's now also offer "auto" which will only enable IPv6 if the system has a default gateway for it. This means that properly configured dual-stack systems will default to "ipv4,ipv6" while those lacking a gateway will only use "ipv4". Note that no real connectivity test is performed, so firewalled systems may still get it wrong and might prefer to rely on a manual "ipv4" assignment.	2025-04-24 17:52:28 +02:00
Willy Tarreau	5d41d476f3	MINOR: sock-inet: detect apparent IPv6 connectivity In order to ease dual-stack deployments, we could at least try to check if ipv6 seems to be reachable. For this we're adding a test based on a UDP connect (no traffic) on port 53 to the base of public addresses (2001::) and see if the connect() is permitted, indicating that the routing table knows how to reach it, or fails. Based on this result we're setting a global variable that other subsystems might use to preset their defaults.	2025-04-24 17:52:28 +02:00
Willy Tarreau	2c46c2c042	MINOR: resolvers: add command-line argument -4 to force IPv4-only DNS In order to ease troubleshooting and testing, the new "-4" command line argument enforces queries and processing of "A" DNS records only, i.e. those representing IPv4 addresses. This can be useful when a host lack end-to-end dual-stack connectivity. This overrides the global "dns-accept-family" directive and is equivalent to value "ipv4".	2025-04-24 17:52:28 +02:00
Willy Tarreau	940fa19ad8	MEDIUM: resolvers: add global "dns-accept-family" directive By default, DNS resolvers accept both IPv4 and IPv6 addresses. This can be influenced by the "resolve-prefer" keywords on server lines as well as the family argument to the "do-resolve" action, but that is only a preference, which does not block the other family from being used when it's alone. In some environments where dual-stack is not usable, stumbling on an unreachable IPv6-only DNS record can cause significant trouble as it will replace a previous IPv4 one which would possibly have continued to work till next request. The "dns-accept-family" global option permits to enforce usage of only one (or both) address families. The argument is a comma-delimited list of the following words: - "ipv4": query and accept IPv4 addresses ("A" records) - "ipv6": query and accept IPv6 addresses ("AAAA" records) When a single family is used, no request will be sent to resolvers for the other family, and any response for the othe family will be ignored. The default value is "ipv4,ipv6", which effectively enables both families.	2025-04-24 17:52:28 +02:00
Christopher Faulet	29632bcabf	CLEANUP: applet: Remove unsued rule pointer in appctx structure Thanks to previous commits, the "rule" field in the appctx structure is no longer used. So we can safely remove it.	2025-04-24 16:22:31 +02:00
Christopher Faulet	b734d7c156	MINOR: cli/applet: Move appctx fields only used by the CLI in a private context There are several fields in the appctx structure only used by the CLI. To make things cleaner, all these fields are now placed in a dedicated context inside the appctx structure. The final goal is to move it in the service context and add an API for cli commands to get a command coontext inside the cli context.	2025-04-24 15:09:37 +02:00
Christopher Faulet	742dc01537	CLEANUP: applet: Update st0/st1 comment in appctx structure Today, these states are used by almost all applets. So update the comments of these fields.	2025-04-24 15:09:37 +02:00
Christopher Faulet	44ace9a1b7	MINOR: cli: Rename some CLI applet states to reflect recent refactoring CLI_ST_GETREQ state was renamed into CLI_ST_PARSE_CMDLINE and CLI_ST_PARSEREQ into CLI_ST_PROCESS_CMDLINE to reflect the real action performed in these states.	2025-04-24 15:09:37 +02:00
Christopher Faulet	20ec1de214	MAJOR: cli: Refacor parsing and execution of pipelined commands Before this patch, when pipelined commands were received, each command was parsed and then excuted before moving to the next command. Pending commands were not copied in the input buffer of the applet. The major issue with this way to handle commands is the impossibility to consume inputs from commands with an I/O handler, like "show events" for instance. It was working thanks to a "bug" if such commands were the last one on the command line. But it was impossible to use them followed by another command. And this prevents us to implement any streaming support for CLI commands. So we decided to refactor the command line parsing to have something similar to a basic shell. Now an entire line is parsed, including the payload, before starting commands execution. The command line is copied in a dedicated buffer. "appctx->chunk" buffer is used for this purpose. It was an unsed field, so it is safe to use it here. Once the command line copied, the commands found on this line are executed. Because the applet input buffer was flushed, any input can be safely consumed by the CLI applet and is available for the command I/O handler. Thanks to this change, "show event -w" command can be followed by a command. And in theory, it should be possible to implement commands supporting input data streaming. For instance, the Tetris like lua applet can be used on the CLI now. Note that the payload, if any, is part of the command line and must be fully received before starting the commands processing. It means there is still the limitation to a buffer, but not only for the payload but for the whole command line. The payload is still necessarily at the end of the command line and is passed as argument to the last command. Internally, the "appctx->cli_payload" field was introduced to point on the payload in the command line buffer. This patch is quite huge but it cannot easily be splitted. It should not introduced significant changes.	2025-04-24 15:09:37 +02:00
Willy Tarreau	1af592c511	MINOR: stick-table: use a separate lock label for updates Too many locks were sharing STK_TABLE_LOCK making it hard to analyze. Let's split the already heavily used update lock.	2025-04-24 14:02:22 +02:00
William Lallemand	af73f98a3e	MEDIUM: acme: rename "uri" into "directory" Rename the "uri" option of the acme section into "directory".	2025-04-24 10:52:46 +02:00
William Lallemand	d700a242b4	MINOR: httpclient: add an "https" log-format Add an experimental "https" log-format for the httpclient, it is not used by the httpclient by default, but could be define in a customized proxy. The string is basically a httpslog, with some of the fields replaced by their backend equivalent or - when not available: "%ci:%cp [%tr] %ft -/- %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %[bc_err]/%[ssl_bc_err,hex]/-/-/%[ssl_bc_is_resumed] -/-/-"	2025-04-23 15:32:46 +02:00
Christopher Faulet	a56feffc6f	CLEANUP: h1: Remove now useless h1_parse_cont_len_header() function Since the commit "MINOR: hlua/h1: Use http_parse_cont_len_header() to parse content-length value", this function is no longer used. So it can be safely removed.	2025-04-22 16:14:47 +02:00
Christopher Faulet	5200203677	MINOR: proxy: Add options to drop HTTP trailers during message forwarding In RFC9110, it is stated that trailers could be merged with the headers. While it should be performed with a speicial care, it may be a problem for some applications. To avoid any trouble with such applications, two new options were added to drop trailers during the message forwarding. On the backend, "http-drop-request-trailers" option can be enabled to drop trailers from the requests before sending them to the server. And on the frontend, "http-drop-response-trailers" option can be enabled to drop trailers from the responses before sending them to the client. The options can be defined in defaults sections and disabled with "no" keyword. This patch should fix the issue #2930.	2025-04-22 16:14:46 +02:00
Christopher Faulet	044ef9b3d6	CLEANUP: Slightly reorder some proxy option flags to free slots PR_O_TCPCHK_SSL and PR_O_CONTSTATS was shifted to free a slot. The idea is to have 2 contiguous slots to be able to insert two new options.	2025-04-22 16:14:46 +02:00
Amaury Denoyelle	4309a6fbf8	BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure To handle out-of-order received CRYPTO frames, a ncbuf instance is allocated. This is done via the helper quic_get_ncbuf(). Buffer allocation was improperly checked. In case b_alloc() fails, it crashes due to a BUG_ON(). Fix this by removing it. The function now returns NULL on allocation failure, which is already properly handled in its caller qc_handle_crypto_frm(). This should fix the last reported crash from github issue #2935. This must be backported up to 2.6.	2025-04-18 18:11:17 +02:00
Olivier Houchard	3758eab71c	MEDIUM: lb_fwrr: Use one ebtree per thread group. When using the round-robin load balancer, the major source of contention is the lbprm lock, that has to be held every time we pick a server. To mitigate that, make it so there are one tree per thread-group, and one lock per thread-group. That means we now have a lb_fwrr_per_tgrp structure that will contain the two lb_fwrr_groups (active and backup) as well as the lock to protect them in the per-thread lbprm struct, and all fields in the struct server are now moved to the per-thread structure too. Those changes are mostly mechanical, and brings good performances improvment, on a 64-cores AMD CPU, with 64 servers configured, we could process about 620000 requests par second, and we now can process around 1400000 requests per second.	2025-04-17 17:38:23 +02:00
Olivier Houchard	f36f6cfd26	MINOR: proxies: Add a per-thread group lbprm struct. Add a new structure in the per-thread groups proxy structure, that will contain whatever is per-thread group in lbprm. It will be accessed as p->per_tgrp[tgid].lbprm.	2025-04-17 17:38:23 +02:00
Olivier Houchard	7ca1c94ff0	MINOR: lb_fwrr: Move the next weight out of fwrr_group. Move the "next_weight" outside of fwrr_group, and inside struct lb_fwrr directly, one for the active servers, one for the backup servers. We will soon have one fwrr_group per thread group, but next_weight will be global to all of them.	2025-04-17 17:38:23 +02:00
Olivier Houchard	444125a764	MINOR: servers: Provide a pointer to the server in srv_per_tgroup. Add a pointer to the server into the struct srv_per_tgroup, so that if we only have access to that srv_per_tgroup, we can come back to the corresponding server.	2025-04-17 17:38:23 +02:00
Willy Tarreau	36ec70c526	MINOR: sched: add a new function is_sched_alive() to report scheduler's health This verifies that the scheduler is still ticking without having to access the activity[] array nor keeping local copies of the ctxsw counter. It just tests and sets a flag that is reset after each return from a ->process() function.	2025-04-17 16:25:47 +02:00
Willy Tarreau	874ba2afed	CLEANUP: debug: no longer set nor use TH_FL_DUMPING_OTHERS TH_FL_DUMPING_OTHERS was being used to try to perform exclusion between threads running "show threads" and those producing warnings. Now that it is much more cleanly handled, we don't need that type of protection anymore, which was adding to the complexity of the solution. Let's just get rid of it.	2025-04-17 16:25:47 +02:00
Willy Tarreau	c16d5415a8	MINOR: debug: make ha_stuck_warning() only work for the current thread Since we no longer call it with a foreign thread, let's simplify its code and get rid of the special cases that were relying on ha_thread_dump_fill() and synchronization with a remote thread. We're not only dumping the current thread so ha_thread_dump_one() is sufficient.	2025-04-17 16:25:47 +02:00
Willy Tarreau	b24d7f248e	MINOR: pass a valid buffer pointer to ha_thread_dump_one() The goal is to let the caller deal with the pointer so that the function only has to fill that buffer without worrying about locking. This way, synchronous dumps from "show threads" are produced and emitted directly without causing undesired locking of the buffer nor risking causing confusion about thread_dump_buffer containing bits from an interrupted dump in progress. It's only the caller that's responsible for notifying the requester of the end of the dump by setting bit 0 of the pointer if needed (i.e. it's only done in the debug handler).	2025-04-17 16:25:47 +02:00
Willy Tarreau	5ac739cd0c	MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one() This function was initially designed to dump any threadd into the presented buffer, but the way it currently works is that it's always called for the current thread, and uses the distinction between coming from a sighandler or being called directly to detect which thread is the caller. Let's simplify all this by replacing thr with tid everywhere, and using the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc). The confusing "from_signal" argument is now replaced with "is_caller" which clearly states whether or not the caller declares being the one asking for the dump (the logic is inverted, but there are only two call places with a constant).	2025-04-17 16:25:47 +02:00
Willy Tarreau	6d8a523d14	MINOR: tinfo: keep a copy of the pointer to the thread dump buffer Instead of using the thread dump buffer for post-mortem analysis, we'll keep a copy of the assigned pointer whenever it's used, even for warnings or "show threads". This will offer more opportunities to figure from a core what happened, and will give us more freedom regarding the value of the thread_dump_buffer itself. For example, even at the end of the dump when the pointer is reset, the last used buffer is now preserved.	2025-04-17 16:25:47 +02:00
Willy Tarreau	337017e2f9	BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads Some signal handlers rely on these to decide about the level of detail to provide in dumps, so let's properly fill the info about entering/leaving idle. Note that for consistency with other tests we're using bitops with t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes the code more readable and the whole difference is only 88 bytes on a 3MB executable. This bug is not important, and while older versions are likely affected as well, it's not worth taking the risk to backport this in case it would wake up an obscure bug.	2025-04-17 16:25:47 +02:00
Amaury Denoyelle	52246249ab	MEDIUM: listener/mux-h2: implement idle-ping on frontend side This commit is the counterpart of the previous one, adapted on the frontend side. "idle-ping" is added as keyword to bind lines, to be able to refresh client timeout of idle frontend connections. H2 MUX behavior remains similar as the previous patch. The only significant change is in h2c_update_timeout(), as idle-ping is now taken into account also for frontend connection. The calculated value is compared with http-request/http-keep-alive timeout value. The shorter delay is then used as expired date. As hr/ka timeout are based on idle_start, this allows to run them in parallel with an idle-ping timer.	2025-04-17 14:49:36 +02:00
Amaury Denoyelle	a78a04cfae	MEDIUM: server/mux-h2: implement idle-ping on backend side This commit implements support for idle-ping on the backend side. First, a new server keyword "idle-ping" is defined in configuration parsing. It is used to set the corresponding new server member. The second part of this commit implements idle-ping support on H2 MUX. A new inlined function conn_idle_ping() is defined to access connection idle-ping value. Two new connection flags are defined H2_CF_IDL_PING and H2_CF_IDL_PING_SENT. The first one is set for idle connections via h2c_update_timeout(). On h2_timeout_task() handler, if first flag is set, instead of releasing the connection as before, the second flag is set and tasklet is scheduled. As both flags are now set, h2_process_mux() will proceed to PING emission. The timer has also been rearmed to the idle-ping value. If a PING ACK is received before next timeout, connection timer is refreshed. Else, the connection is released, as with timer expiration. Also of importance, special care is needed when a backend connection is going to idle. In this case, idle-ping timer must be rearmed. Thus a new invokation of h2c_update_timeout() is performed on h2_detach().	2025-04-17 14:49:36 +02:00
William Lallemand	e778049ffc	MINOR: acme: register the task in the ckch_store This patch registers the task in the ckch_store so we don't run 2 tasks at the same time for a given certificate. Move the task creation under the lock and check if there was already a task under the lock.	2025-04-16 17:12:43 +02:00
William Lallemand	c291a5c73c	BUILD: incompatible pointer type suspected with -DDEBUG_UNIT src/jws.c: In function '__jws_init': src/jws.c:594:38: error: passing argument 2 of 'hap_register_unittest' from incompatible pointer type [-Wincompatible-pointer-types] 594 \| hap_register_unittest("jwk", jwk_debug); \| ^~~~~~~~~ \| \| \| int ()(int, char ) In file included from include/haproxy/api.h:36, from include/import/ebtree.h:251, from include/import/ebmbtree.h:25, from include/haproxy/jwt-t.h:25, from src/jws.c:5: include/haproxy/init.h:37:52: note: expected 'int ()(void)' but argument is of type 'int ()(int, char )' 37 \| void hap_register_unittest(const char name, int (*fct)()); \| ~~~~~~^~~~~~ GCC 15 is warning because the function pointer does have its arguments in the register function. Should fix issue #2929.	2025-04-15 15:49:44 +02:00
Willy Tarreau	b708345c17	DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters These counters can have a noticeable cost on large machines, though not dramatic. There's no single good choice to keep them enabled or disabled. This commit adds multiple choices: - DEBUG_COUNTERS set to 2 will automatically enable them by default, while 1 will disable them by default - the global "debug.counters on/off" will allow to change the setting at boot, regardless of DEBUG_COUNTERS as long as it was at least 1. - the CLI "debug counters on/off" will also allow to change the value at run time, allowing to observe a phenomenon while it's happening, or to disable counters if it's suspected that their cost is too high Finally, the "debug counters" command will append "(stopped)" at the end of the CNT lines when these counters are stopped. Not that the whole mechanism would easily support being extended to all counter types by specifying the types to apply to, but it doesn't seem useful at all and would require the user to also type "cnt" on debug lines. This may easily be changed in the future if it's found relevant.	2025-04-14 19:02:13 +02:00
Willy Tarreau	a142adaba0	DEBUG: counters: make COUNT_IF() only appear at DEBUG_COUNTERS>=1 COUNT_IF() is convenient but can be heavy since some of them were found to trigger often (roughly 1 counter per request on avg). This might even have an impact on large setups due to the cost of a shared cache line bouncing between multiple cores. For now there's no way to disable it, so let's only enable it when DEBUG_COUNTERS is 1 or above. A future change will make it configurable.	2025-04-14 19:02:13 +02:00
Willy Tarreau	61d633a3ac	DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default Till now the per-line glitches counters were only enabled with the confusingly named DEBUG_GLITCHES (which would not turn glitches off when disabled). Let's instead change it to DEBUG_COUNTERS and make sure it's enabled by default (though it can still be disabled with -DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be expanded to cover more counters.	2025-04-14 19:02:13 +02:00
William Lallemand	39c05cedff	BUILD: acme: enable the ACME feature when JWS is present The ACME feature depends on the JWS, which currently does not work with every SSL libraries. This patch only enables ACME when JWS is enabled.	2025-04-12 01:39:03 +02:00
William Lallemand	5500bda9eb	MINOR: acme: implement retrieval of the certificate Once the Order status is "valid", the certificate URL is accessible, this patch implements the retrieval of the certificate which is stocked in ctx->store.	2025-04-12 01:39:03 +02:00
William Lallemand	27fff179fe	MINOR: acme: verify the order status once finalized This implements a call to the order status to check if the certificate is ready.	2025-04-12 01:39:03 +02:00
William Lallemand	680222b382	MINOR: acme: finalize by sending the CSR This patch does the finalize step of the ACME task. This encodes the CSR into base64 format and send it to the finalize URL. https://www.rfc-editor.org/rfc/rfc8555#section-7.4	2025-04-12 01:29:27 +02:00
William Lallemand	de5dc31a0d	MINOR: acme: generate the CSR in a X509_REQ Generate the X509_REQ using the generated private key and the SAN from the configuration. This is only done once before the task is started. It could probably be done at the beginning of the task with the private key generation once we have a scheduler instead of a CLI command.	2025-04-12 01:29:27 +02:00
William Lallemand	00ba62df15	MINOR: acme: implement a check on the challenge status This patch implements a check on the challenge URL, once haproxy asked for the challenge to be verified, it must verify the status of the challenge resolution and if there weren't any error.	2025-04-12 01:29:27 +02:00
William Lallemand	711a13a4b4	MINOR: acme: send the request for challenge ready This patch sends the "{}" message to specify that a challenge is ready. It iterates on every challenge URL in the authorization list from the acme_ctx. This allows the ACME server to procede to the challenge validation. https://www.rfc-editor.org/rfc/rfc8555#section-7.5.1	2025-04-12 01:29:27 +02:00
William Lallemand	ae0bc88f91	MINOR: acme: get the challenges object from the Auth URL This patch implements the retrieval of the challenges objects on the authorizations URLs. The challenges object contains a token and a challenge url that need to be called once the challenge is setup. Each authorization URLs contain multiple challenge objects, usually one per challenge type (HTTP-01, DNS-01, ALPN-01... We only need to keep the one that is relevent to our configuration.	2025-04-12 01:29:27 +02:00
William Lallemand	4842c5ea8c	MINOR: acme: newOrder request retrieve authorizations URLs This patch implements the newOrder action in the ACME task, in order to ask for a new certificate, a list of SAN is sent as a JWS payload. the ACME server replies a list of Authorization URLs. One Authorization is created per SAN on a Order. The authorization URLs are stored in a linked list of 'struct acme_auth' in acme_ctx, so we can get the challenge URLs from them later. The location header is also store as it is the URL of the order object. https://datatracker.ietf.org/doc/html/rfc8555#section-7.4	2025-04-12 01:29:27 +02:00
William Lallemand	04d393f661	MINOR: acme: generate new account The new account action in the ACME task use the same function as the chkaccount, but onlyReturnExisting is not sent in this case!	2025-04-12 01:29:27 +02:00
William Lallemand	7f9bf4d5f7	MINOR: acme: check if the account exist This patch implements the retrival of the KID (account identifier) using the pkey. A request is sent to the newAccount URL using the onlyReturnExisting option, which allow to get the kid of an existing account. acme_jws_payload() implement a way to generate a JWS payload using the nonce, pkey and provided URI.	2025-04-12 01:29:27 +02:00
William Lallemand	0aa6dedf72	MINOR: acme: handle the nonce ACME requests are supposed to be sent with a Nonce, the first Nonce should be retrieved using the newNonce URI provided by the directory. This nonce is stored and must be replaced by the new one received in the each response.	2025-04-12 01:29:27 +02:00
William Lallemand	471290458e	MINOR: acme: get the ACME directory The first request of the ACME protocol is getting the list of URLs for the next steps. This patch implements the first request and the parsing of the response. The response is a JSON object so mjson is used to parse it.	2025-04-12 01:29:27 +02:00
William Lallemand	b8209cf697	MINOR: acme/cli: add the 'acme renew' command The "acme renew" command launch the ACME task for a given certificate. The CLI parser generates a new private key using the parameters from the acme section..	2025-04-12 01:29:27 +02:00
William Lallemand	bf6a39c4d1	MINOR: acme: add private key configuration This commit allows to configure the generated private keys, you can configure the keytype (RSA/ECDSA), the number of bits or the curves. Example: acme LE uri https://acme-staging-v02.api.letsencrypt.org/directory account account.key contact foobar@example.com challenge HTTP-01 keytype ECDSA curves P-384	2025-04-12 01:29:27 +02:00
William Lallemand	2e8c350b95	MINOR: acme: add configuration for the crt-store Add new acme keywords for the ckch_conf parsing, which will be used on a crt-store, a crt line in a frontend, or even a crt-list. The cfg_postparser_acme() is called in order to check if a section referenced elsewhere really exists in the config file.	2025-04-12 01:29:27 +02:00
William Lallemand	077e2ce84c	MINOR: acme: add the acme section in the configuration parser Add a configuration parser for the new acme section, the section is configured this way: acme letsencrypt uri https://acme-staging-v02.api.letsencrypt.org/directory account account.key contact foobar@example.com challenge HTTP-01 When unspecified, the challenge defaults to HTTP-01, and the account key to "<section_name>.account.key". Section are stored in a linked list containing acme_cfg structures, the configuration parsing is mostly resolved in the postsection parser cfg_postsection_acme() which is called after the parsing of an acme section.	2025-04-12 01:29:27 +02:00
William Lallemand	20718f40b6	MEDIUM: ssl/ckch: add filename and linenum argument to crt-store parsing Add filename and linenum arguments to the crt-store / ckch_conf parsing. It allows to use them in the parsing function so we could emits error.	2025-04-12 01:29:27 +02:00
Willy Tarreau	00c967fac4	MINOR: master/cli: support bidirectional communications with workers Some rare commands in the worker require to keep their input open and terminate when it's closed ("show events -w", "wait"). Others maintain a per-session context ("set anon on"). But in its default operation mode, the master CLI passes commands one at a time to the worker, and closes the CLI's input channel so that the command can immediately close upon response. This effectively prevents these two specific cases from being used. Here the approach that we take is to introduce a bidirectional mode to connect to the worker, where everything sent to the master is immediately forwarded to the worker (including the raw command), allowing to queue multiple commands at once in the same session, and to continue to watch the input to detect when the client closes. It must be a client's choice however, since doing so means that the client cannot batch many commands at once to the master process, but must wait for these commands to complete before sending new ones. For this reason we use the prefix "@@<pid>" for this. It works exactly like "@" except that it maintains the channel open during the whole execution. Similarly to "@<pid>" with no command, "@@<pid>" will simply open an interactive CLI session to the worker, that will be ended by "quit" or by closing the connection. This can be convenient for the user, and possibly for clients willing to dedicate a connection to the worker.	2025-04-11 16:09:17 +02:00
Aurelien DARRAGON	fbfeb591f7	MINOR: proxy: add deinit_proxy() helper func Same as free_proxy(), but does not free the base proxy pointer (ie: the proxy itself may not be allocated) Goal is to be able to cleanup statically allocated dummy proxies.	2025-04-10 22:10:31 +02:00
Aurelien DARRAGON	e1cec655ee	MINOR: proxy: add setup_new_proxy() function Split alloc_new_proxy() in two functions: the preparing part is now handled by setup_new_proxy() which can be called individually, while alloc_new_proxy() takes care of allocating a new proxy struct and then calling setup_new_proxy() with the freshly allocated proxy.	2025-04-10 22:10:31 +02:00
Willy Tarreau	f4634e5a38	MINOR: ring/cli: support delimiting events with a trailing \0 on "show events" At the moment it is not supported to produce multi-line events on the "show events" output, simply because the LF character is used as the default end-of-event mark. However it could be convenient to produce well-formatted multi-line events, e.g. in JSON or other formats. UNIX utilities have already faced similar needs in the past and added "-print0" to "find" and "-0" to "xargs" to mention that the delimiter is the NUL character. This makes perfect sense since it's never present in contents, so let's do exactly the same here. Thus from now on, "show events <ring> -0" will delimit messages using a \0 instead of a \n, permitting a better and safer encapsulation.	2025-04-08 14:36:35 +02:00
Willy Tarreau	0be6d73e88	MINOR: ring: support arbitrary delimiters through ring_dispatch_messages() In order to support delimiting output events with other characters than just the LF, let's pass the delimiter through the API. The default remains the LF, used by applet_append_line(), and ignored by the log forwarder.	2025-04-08 14:36:35 +02:00
Willy Tarreau	f01ff2478f	BUILD: atomics: fix build issue on non-x86/non-arm systems Commit `f435a2e518` ("CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence()") replaced the builtins used for barriers, but the different API required an argument while the macros didn't specify any, resulting in double parenthesis that were causing obscure build errors such as "called object type 'void' is not a function or function pointer". Let's just specify the args for the macro. No backport is needed.	2025-04-07 09:38:22 +02:00
Aurelien DARRAGON	11d4d0957e	MEDIUM: task: make notification_* API thread safe by default Some notification_* functions were not thread safe by default as they assumed only one producer would emit events for registered tasks. While this suited well with the Lua sockets use-case, this proved to be a limitation with some other event sources (ie: lua Queue class) instead of having to deal with both the non thread safe and thread safe variants (_mt suffix), which is error prone, let's make the entire API thread safe regarding the event list. Pruning functions still require that only one thread executes them, with Lua this is always the case because there is one cleanup list per context.	2025-04-03 17:52:50 +02:00
Aurelien DARRAGON	748dba4859	MINOR: hlua_fcn: register queue class using hlua_register_metatable() Most lua classes are registered by leveraging the hlua_register_metatable() helper. Let's use that for the Queue class as well for consitency.	2025-04-03 17:52:17 +02:00
Aurelien DARRAGON	b77b1a2c3a	MINOR: task: add thread safe notification_new and notification_wake variants notification_new and notification_wake were historically meant to be called by a single thread doing both the init and the wakeup for other tasks waiting on the signals. In this patch, we extend the API so that notification_new and notification_wake have thread-safe variants that can safely be used with multiple threads registering on the same list of events and multiple threads pushing updates on the list.	2025-04-03 17:52:03 +02:00
Amaury Denoyelle	f0f1816f1a	MINOR: check: implement check-pool-conn-name srv keyword This commit is a direct follow-up of the previous one. It defines a new server keyword check-pool-conn-name. It is used as the default value for the name parameter of idle connection hash generation. Its behavior is similar to server keyword pool-conn-name, but reserved for checks reuse. If check-pool-conn-name is set, it is used in priority to match a connection for reuse. If unset, a fallback is performed on check-sni.	2025-04-03 17:19:07 +02:00
Amaury Denoyelle	43367f94f1	MINOR: check/backend: support conn reuse with SNI Support for connection reuse during server checks was implemented recently. This is activated with the server keyword check-reuse-pool. Similarly to stream processing via connect_backend(), a connection hash is calculated when trying to perform reuse for checks. This is necessary to retrieve for a connection which shares the check connect parameters. However, idle connections can additionnally be tagged using a pool-conn-name or SNI under connect_backend(). Check reuse does not test these values, which prevent to retrieve a matching connection. Improve this by using "check-sni" value as idle connection hash input for check reuse. be_calculate_conn_hash() API has been adjusted so that name value can be passed as input, both when using streams or checks. Even with the current patch, there is still some scenarii which could not be covered for checks connection reuse. most notably, when using dynamic pool-conn-name/SNI value. It is however at least sufficient to cover simpler cases.	2025-04-03 17:19:07 +02:00
Willy Tarreau	f435a2e518	CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence() The drop of older compilers also allows us to focus on clearer barriers, so let's use them.	2025-04-03 11:59:31 +02:00
Willy Tarreau	34e3b83f9c	CLEANUP: atomics: remove support for gcc < 4.7 The old __sync_* API is no longer necessary since we do not support gcc before 4.7 anymore. Let's just get rid of this code, the file is still ugly enough without it.	2025-04-03 11:55:35 +02:00
Ilia Shipitsin	27a6353ceb	CLEANUP: assorted typo fixes in the code, commits and doc	2025-04-03 11:37:25 +02:00
William Lallemand	b351f06ff1	REORG: ssl: move curves2nid and nid2nist to ssl_utils curves2nid and nid2nist are generic functions that could be used outside the JWS scope, this patch put them at the right place so they can be reused.	2025-04-02 19:34:09 +02:00
Amaury Denoyelle	f1fb396d71	MEDIUM: check: implement check-reuse-pool Implement the possibility to reuse idle connections when performing server checks. This is done thanks to the recently introduced functions be_calculate_conn_hash() and be_reuse_connection(). One side effect of this change is that be_calculate_conn_hash() can now be called with a NULL stream instance. As such, part of the functions are adjusted accordingly. Note that to simplify configuration, connection reuse is not performed if any specific check connection parameters are defined on the server line or via the tcp-check connect rule. This is performed via newly defined tcpcheck_use_nondefault_connect().	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	e34f748e3a	MINOR: check define check-reuse-pool server keyword Define a new server keyword check-reuse-pool, and its counterpart with a "no" prefix. For the moment, only parsing is implemented. The real behavior adjustment will be implemented in the next patch.	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	20eb57b486	MINOR: backend: remove stream usage on connection reuse Adjust newly defined be_reuse_connection() API. The stream argument is removed. This will allows checks to be able to invoke it without relying on a stream instance.	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	ee94a6cfc1	MINOR: backend: extract conn reuse from connect_server() Following the previous patch, the part directly related to connection reuse is extracted from connect_server(). It is now define in a new function be_reuse_connection().	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	c7cc6b6401	MINOR: backend: extract conn hash calculation from connect_server() On connection reuse, a hash is first calculated. It is generated from various connection parameters, to retrieve a matching connection. Extract hash calculation from connect_server() into a new dedicated function be_calculate_conn_hash(). The objective is to be able to perform connection reuse for checks, without connect_server() invokation which relies on a stream instance.	2025-04-02 14:57:40 +02:00
Willy Tarreau	4ec5509541	BUILD: compiler: undefine the CONCAT() macro if already defined As Ilya reported in issue #2911, the CONCAT() macro breaks on NetBSD which defines its own as __CONCAT() (which is exactly the same). Let's just undefine it before ours to fix the issue instead of renaming, but keep ours so that we don't have doubts about what we're running with. Note that the patch introducing this breaking change was backported to 3.0.	2025-04-02 11:36:43 +02:00
Ilia Shipitsin	78b849b839	CLEANUP: assorted typo fixes in the code and comments code, comments and doc actually.	2025-04-02 11:12:20 +02:00
Olivier Houchard	9fe72bba3c	MAJOR: leastconn; Revamp the way servers are ordered. For leastconn, servers used to just be stored in an ebtree. Each server would be one node. Change that so that nodes contain multiple mt_lists. Each list will contain servers that share the same key (typically meaning they have the same number of connections). Using mt_lists means that as long as tree elements already exist, moving a server from one tree element to another does no longer require the lbprm write lock. We use multiple mt_lists to reduce the contention when moving a server from one tree element to another. A list in the new element will be chosen randomly. We no longer remove a tree element as soon as they no longer contain any server. Instead, we keep a list of all elements, and when we need a new element, we look at that list only if it contains a number of elements already, otherwise we'll allocate a new one. Keeping nodes in the tree ensures that we very rarely have to take the lbrpm write lock (as it only happens when we're moving the server to a position for which no element is currently in the tree). The number of mt_lists used is defined as FWLC_NB_LISTS. The number of tree elements we want to keep is defined as FWLC_MIN_FREE_ENTRIES, both in defaults.h. The value used were picked afrer experimentation, and seems to be the best choice of performances vs memory usage. Doing that gives a good boost in performances when a lot of servers are used. With a configuration using 500 servers, before that patch, about 830000 requests per second could be processed, with that patch, about 1550000 requests per second are processed, on an 64-cores AMD, using 1200 concurrent connections.	2025-04-01 18:05:30 +02:00
Olivier Houchard	ba521a1d88	MINOR: threads: Add HA_RWLOCK_TRYRDTOWR() Add HA_RWLOCK_TRYRDTOWR(), that tries to upgrade a lock from reader to writer, and fails if any seeker or writer already holds it.	2025-04-01 18:05:30 +02:00
Olivier Houchard	2a9436f96b	MINOR: lbprm: Add method to deinit server and proxy Add two new methods to lbprm, server_deinit() and proxy_deinit(), in case something should be done at the lbprm level when removing servers and proxies.	2025-04-01 18:05:30 +02:00
Olivier Houchard	17059098e7	MINOR: mt_list: Implement mt_list_try_lock_prev(). Implement mt_list_try_lock_prev(), that does the same thing as mt_list_lock_prev(), exceot if the list is locked, it returns { NULL, NULL } instaed of waiting.	2025-04-01 18:05:30 +02:00
William Lallemand	fdcb97614c	MINOR: ssl/ckch: add substring parser for ckch_conf Add a substring parser for the ckch_conf keyword parser, this will split a string into multiple substring, and strdup them in a array.	2025-04-01 15:38:32 +02:00
William Lallemand	f8fe84caca	MINOR: jws: emit the JWK thumbprint jwk_thumbprint() is a function which is a function which implements RFC7368 and emits a JWK thumbprint using a EVP_PKEY. EVP_PKEY_EC_to_pub_jwk() and EVP_PKEY_RSA_to_pub_jwk() were changed in order to match what is required to emit a thumbprint (ie, no spaces or lines and the lexicographic order of the fields)	2025-04-01 11:57:55 +02:00
Willy Tarreau	1e9a2529aa	MINOR: cpu-topo: pass an extra argument to ha_cpu_policy This extra argument will allow common functions to distinguish between multiple policies. For now it's not used.	2025-03-31 16:21:37 +02:00
Willy Tarreau	571573874a	MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode The new function "print_cpu_set()" will print cpu sets in a human-friendly way, with commas and dashes for intervals. The goal is to keep them compact enough.	2025-03-31 16:21:37 +02:00
Willy Tarreau	3955f151b1	MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal() This function returns true if two CPU sets are equal.	2025-03-31 16:21:37 +02:00
Valentine Krasnobaeva	b303861469	MINOR: compiler: add __nonstring macro GCC 15 throws the following warning on fixed-size char arrays if they do not contain terminated NUL: src/tools.c:2041:25: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization] 2041 \| const char hextab[16] = "0123456789ABCDEF"; We are using a couple of such definitions for some constants. Converting them to flexible arrays, like: hextab[] = "0123456789ABCDEF" may have consequences, as enlarged arrays won't fit anymore where they were possibly located due to the memory alignement constraints. GCC adds 'nonstring' variable attribute for such char arrays, but clang and other compilers don't have it. Let's wrap 'nonstring' with our __nonstring macro, which will test if the compiler supports this attribute. This fixes the issue #2910.	2025-03-31 13:50:28 +02:00
Willy Tarreau	6b17310757	MEDIUM: pools: be a bit smarter when merging comparable size pools By default, pools of comparable sizes are merged together. However, the current algorithm is dumb: it rounds the requested size to the next multiple of 16 and compares the sizes like this. This results in many entries which are already multiples of 16 not being merged, for example 1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are separate (though 56 merges with 64). This commit changes this to consider not just the entry size but also the average entry size, that is, it compares the average size of all objects sharing the pool with the size of the object looking for a pool. If the object is not more than 1% bigger nor smaller than the current average size or if it neither 16 bytes smaller nor larger, then it can be merged. Also, it always respects exact matches in order to avoid merging objects into larger pools or worse, extending existing ones for no reason, and when there's a tie, it always avoids extending an existing pool. Also, we now visit all existing pools in order to spot the best one, we do not stop anymore at the smallest one large enough. Theoretically this could cost a bit of CPU but in practice it's O(N^2) with N quite small (typically in the order of 100) and the cost at each step is very low (compare a few integer values). But as a side effect, pools are no longer sorted by size, "show pools bysize" is needed for this. This causes the objects to be much better grouped together, accepting to use a little bit more sometimes to avoid fragmentation, without causing everyone to be merged into the same pool. Thanks to this we're now seeing 36 pools instead of 48 by default, with some very nice examples of compact grouping: - Pool qc_stream_r (80 bytes) : 13 users > qc_stream_r : size=72 flags=0x1 align=0 > quic_cstrea : size=80 flags=0x1 align=0 > qc_stream_a : size=64 flags=0x1 align=0 > hlua_esub : size=64 flags=0x1 align=0 > stconn : size=80 flags=0x1 align=0 > dns_query : size=64 flags=0x1 align=0 > vars : size=80 flags=0x1 align=0 > filter : size=64 flags=0x1 align=0 > session pri : size=64 flags=0x1 align=0 > fcgi_hdr_ru : size=72 flags=0x1 align=0 > fcgi_param_ : size=72 flags=0x1 align=0 > pendconn : size=80 flags=0x1 align=0 > capture : size=64 flags=0x1 align=0 - Pool h3s (56 bytes) : 17 users > h3s : size=56 flags=0x1 align=0 > qf_crypto : size=48 flags=0x1 align=0 > quic_tls_se : size=48 flags=0x1 align=0 > quic_arng : size=56 flags=0x1 align=0 > hlua_flt_ct : size=56 flags=0x1 align=0 > promex_metr : size=48 flags=0x1 align=0 > conn_hash_n : size=56 flags=0x1 align=0 > resolv_requ : size=48 flags=0x1 align=0 > mux_pt : size=40 flags=0x1 align=0 > comp_state : size=40 flags=0x1 align=0 > notificatio : size=48 flags=0x1 align=0 > tasklet : size=56 flags=0x1 align=0 > bwlim_state : size=48 flags=0x1 align=0 > xprt_handsh : size=48 flags=0x1 align=0 > email_alert : size=56 flags=0x1 align=0 > caphdr : size=41 flags=0x1 align=0 > caphdr : size=41 flags=0x1 align=0 - Pool quic_cids (32 bytes) : 13 users > quic_cids : size=16 flags=0x1 align=0 > quic_tls_ke : size=32 flags=0x1 align=0 > quic_tls_iv : size=12 flags=0x1 align=0 > cbuf : size=32 flags=0x1 align=0 > hlua_queuew : size=24 flags=0x1 align=0 > hlua_queue : size=24 flags=0x1 align=0 > promex_modu : size=24 flags=0x1 align=0 > cache_st : size=24 flags=0x1 align=0 > spoe_appctx : size=32 flags=0x1 align=0 > ehdl_sub_tc : size=32 flags=0x1 align=0 > fcgi_flt_ct : size=16 flags=0x1 align=0 > sig_handler : size=32 flags=0x1 align=0 > pipe : size=24 flags=0x1 align=0 - Pool quic_crypto (1032 bytes) : 2 users > quic_crypto : size=1032 flags=0x1 align=0 > requri : size=1024 flags=0x1 align=0 - Pool quic_conn_r (65544 bytes) : 2 users > quic_conn_r : size=65536 flags=0x1 align=0 > dns_msg_buf : size=65540 flags=0x1 align=0 On a very unscientific test consisting in sending 1 million H1 requests and 1 million H2 requests to the stats page, we're seeing an ~6% lower memory usage with the patch: before the patch: Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches). after the patch: Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches). This should be taken with care however since pools allocate and release in batches.	2025-03-25 18:01:01 +01:00
Pierre-Andre Savalle	8ed1e91efd	MEDIUM: lb-chash: add directive hash-preserve-affinity When using hash-based load balancing, requests are always assigned to the server corresponding to the hash bucket for the balancing key, without taking maxconn or maxqueue into account, unlike in other load balancing methods like 'first'. This adds a new backend directive that can be used to take maxconn and possibly maxqueue in that context. This can be used when hashing is desired to achieve cache locality, but sending requests to a different server is preferable to queuing for a long time or failing requests when the initial server is saturated. By default, affinity is preserved as was the case previously. When 'hash-preserve-affinity' is set to 'maxqueue', servers are considered successively in the order of the hash ring until a server that does not have a full queue is found. When 'maxconn' is set on a server, queueing cannot be disabled, as 'maxqueue=0' means unlimited. To support picking a different server when a server is at 'maxconn' irrespective of the queue, 'hash-preserve-affinity' can be set to 'maxconn'.	2025-03-25 18:01:01 +01:00
Amaury Denoyelle	cf9e40bd8a	MINOR: quic: define max-stream-data configuration as a ratio	2025-03-25 16:30:35 +01:00
Amaury Denoyelle	68c10d444d	MINOR: mux-quic: define config for max-data Define a new global configuration tune.quic.frontend.max-data. This allows users to explicitely set the value for the corresponding QUIC TP initial-max-data, with direct impact on haproxy memory consumption.	2025-03-25 16:30:09 +01:00
Amaury Denoyelle	a71007c088	MINOR: quic: move global tune options into quic_tune A new structure quic_tune has recently been defined. Its purpose is to store global options related to QUIC. Previously, only the tunable to toggle pacing was stored in it. This commit moves several QUIC related tunable from global to quic_tune structure. This better centralizes QUIC configuration option and gives room for future generic options.	2025-03-24 10:01:46 +01:00
Willy Tarreau	9091c5317f	MINOR: cli/pools: record the list of pool registrations even when merging them By default, create_pool() tries to merge similar pools into one. But when dealing with certain bugs, it's hard to say which ones were merged together. We do have the information at registration time, so let's just create a list of registrations ("pool_registration") attached to each pool, that will store that information. It can then be consulted on the CLI using "show pools detailed", where the names, sizes, alignment and flags are reported.	2025-03-21 17:09:30 +01:00
Aurelien DARRAGON	7ec6f4412c	MINOR: stats: add alt_name field to stat_col struct alt_name will be used by metric exporters to know how the metric should be presented to the user. If the alt_name is NULL, the metric should be ignored. For now only promex exporter will make use of this.	2025-03-21 17:04:54 +01:00
Olivier Houchard	98967aa09f	MEDIUM: mt_list: Reduce the max number of loops with exponential backoff Reduce the max number of loops in the mt_list code while waiting for a lock to be available with exponential backoff. It's been observed that the current value led to severe performances degradation at least on some hardware, hopefully this value will be acceptable everywhere.	2025-03-21 11:30:59 +01:00
Aurelien DARRAGON	af68343a56	MINOR: stats: use stat_col storage stat_cols_info Use stat_col storage for stat_cols_info[] array instead of name_desc. As documented in `65624876f` ("MINOR: stats: introduce a more expressive stat definition method"), stat_col supersedes name_desc storage but it remains backward compatible. Here we migrate to the new API to be able to further extend stat_cols_info[] in following patches.	2025-03-20 11:38:32 +01:00
Aurelien DARRAGON	9c60fc9fe1	MINOR: stats: STATS_PX_CAP___B_ macro STATS_PX_CAP___B_ points to STATS_PX_CAP_BE, it is just an alias for consistency, like STATS_PX_CAP____S which points to STATS_PX_CAP_SRV.	2025-03-20 11:37:47 +01:00
Aurelien DARRAGON	3c1b00b127	MINOR: stats: add .generic explicit field in stat_col struct Further extend logic implemented in `65624876` ("MINOR: stats: introduce a more expressive stat definition method") and `4e9e8418` ("MINOR: stats: prepare stats-file support for values other than FN_COUNTER"): we don't rely anymore on the presence of the capability to know if the metric is generic or not. This is because it prevents us from setting a capability on static statistics. Yet it could be useful to set the capability even on static metrics, thus we add a dedicated .generic bit to tell haproxy that the metric is generic and can be handled automatically by the API. Also, ME_NEW_* helpers are not explicitly associated to generic metric definition (as it was already the case before) to avoid ambiguities. It may change in the future as we may need to use the new definition method to define static metrics (without the generic bit set). But for now it isn't the case as this need definition was implemented for generic metrics support in the first place. If we want to define static metrics using the API, we could add a new set of helpers for instance.	2025-03-20 11:37:21 +01:00
William Lallemand	2fb6270910	MEDIUM: ssl/ckch: make the ckch_conf more generic The ckch_store_load_files() function makes specific processing for PARSE_TYPE_STR as if it was a type only used for paths. This patch changes a little bit the way it's done, PARSE_TYPE_STR is only meant to strdup() a string and stores the resulting pointer in the ckch_conf structure. Any processing regarding the path is now done in the callback. Since the callbacks were basically doing the same thing, they were transformed into the DECLARE_CKCH_CONF_LOAD() macros which allows to do some templating of these functions. The resulting ckch_conf_load_* functions will do the same as before, except they will also do the path processing instead of letting ckch_store_load_files() do it, which means we don't need the "base" member anymore in the struct ckch_conf_kws.	2025-03-19 18:08:40 +01:00
William Lallemand	b0ad777902	MINOR: tools: path_base() concatenates a path with a base path With the SSL configuration, crt-base, key-base are often used, these keywords concatenates the base path with the path when the path does not start by '/'. This is done at several places in the code, so a function to do this would be better to standardize the code.	2025-03-19 17:59:31 +01:00
William Lallemand	29b4b985c3	MINOR: jws: use jwt_alg type instead of a char This patch implements the function EVP_PKEY_to_jws_algo() which returns a jwt_alg compatible with the private key. This value can then be passed to jws_b64_protected() and jws_b64_signature() which modified to take an jwt_alg instead of a char.	2025-03-17 18:06:34 +01:00
William Lallemand	de67f25a7e	MINOR: jws: add new functions in jws.h Add signatures of jws_b64_payload(), jws_b64_protected(), jws_b64_signature(), jws_flattened() which allows to create a complete JWS flattened object.	2025-03-17 11:51:52 +01:00
Willy Tarreau	156430ceb6	MINOR: cpu-topo: add a CPU policy setting to the global section We'll need to let the user decide what's best for their workload, and in order to do this we'll have to provide tunable options. For that, we're introducing struct ha_cpu_policy which contains a name, a description and a function pointer. The purpose will be to use that function pointer to choose the best CPUs to use and now to set the number of threads and thread-groups, that will be called during the thread setup phase. The only supported policy for now is "none" which doesn't set/touch anything (i.e. all available CPUs are used).	2025-03-14 18:33:16 +01:00
Willy Tarreau	c93ee25054	MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated node(s).	2025-03-14 18:33:16 +01:00
Willy Tarreau	aa4776210b	MINOR: cpu-topo: create an array of the clusters The goal here is to keep an array of the known CPU clusters, because we'll use that often to decide of the performance of a cluster and its relevance compared to other ones. We'll store the number of CPUs in it, the total capacity etc. For the capacity, we count one unit per core, and 1/3 of it per extra SMT thread, since this is roughly what has been measured on modern CPUs. In order to ease debugging, they're also dumped with -dc.	2025-03-14 18:30:31 +01:00
Willy Tarreau	4a6eaf6c5e	MINOR: cpu-topo: add a function to sort by cluster+capacity The purpose here is to detect heterogenous clusters which are not properly reported, based on the exposed information about the cores capacity. The algorithm here consists in sorting CPUs by capacity within a cluster, and considering as equal all those which have 5% or less difference in capacity with the previous one. This allows large clusters of more than 5% total between extremities, while keeping apart those where the limit is more pronounced. This is quite common in embedded environments with big.little systems, as well as on some laptops.	2025-03-14 18:30:31 +01:00
Willy Tarreau	d169758fa9	MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo It's important that we don't leave unassigned IDs in the topology, because the selection mechanism is based on index-based masks, so an unassigned ID will never be kept. This is particularly visible on systems where we cannot access the CPU topology, the package id, node id and even thread id are set to -1, and all CPUs are evicted due to -1 not being set in the "only-cpu" sets. Here in new function "cpu_fixup_topology()", we assign them with the smallest unassigned value. This function will be used to assign IDs where missing in general.	2025-03-14 18:30:31 +01:00
Willy Tarreau	af648c7b58	MINOR: cpu-topo: assign clusters to cores without and renumber them Due to the previous commit we can end up with cores not assigned any cluster ID. For this, at the end we sort the CPUs by topology and assign cluster IDs to remaining CPUs based on pkg/node/llc. For example an 14900 now shows 5 clusters, one for the 8 p-cores, and 4 of 4 e-cores each. The local cluster numbers are per (node,pkg) ID so that any rule could easily be applied on them, but we also keep the global numbers that will help with thread group assignment. We still need to force to assign distinct cluster IDs to cores running on a different L3. For example the EPYC 74F3 is reported as having 8 different L3s (which is true) and only one cluster. Here we introduce a new function "cpu_compose_clusters()" that is called from the main init code just after cpu_detect_topology() so that it's not OS-dependent. It deals with this renumbering of all clusters in topology order, taking care of considering any distinct LLC as being on a distinct cluster.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a4471ea56d	MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID This will be used to detect and fix incorrect setups which report the same cluster ID for multiple L3 instances. The arrangement of functions in this file is becoming a real problem. Maybe we should move all this to cpu_topo for example, and better distinguish OS-specific and generic code.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a8acdbd9fd	MINOR: cpu-topo: implement a sorting mechanism by CPU locality Once we've kept only the CPUs we want, the next step will be to form groups and these ones are based on locality. Thus we'll have to sort by locality. For now the locality is only inferred by the index. No grouping is made at this point. For this we add the "cpu_reorder_by_locality" function with a locality-based comparison function.	2025-03-14 18:30:31 +01:00
Willy Tarreau	18133a054d	MINOR: cpu-topo: implement a sorting mechanism for CPU index CPU selection will be performed by sorting CPUs according to various criteria. For dumps however, that's really not convenient and we'll need to reorder the CPUs according to their index only. This is what the new function cpu_reorder_by_index() does. It's called in thread_detect_count() before dumping the CPU topology.	2025-03-14 18:30:31 +01:00
Willy Tarreau	1af4942c95	MEDIUM: thread: start to detect thread groups and threads min/max By mutually refining the thread count and group count, we can try to detect the most suitable setup for the current machine. Taskset is implicitly handled correctly. tgroups automatically adapt to the configured number of threads. cpu-map manages to limit tgroups to the smallest supported value. The thread-limit is enforced. Just like in cfgparse, if the thread count was forced to a higher value, it's reduced and a warning is emitted. But if it was not set, the thr_max value is bound to this limit so that further calculations respect it. We continue to default to the max number of available threads and 1 tgroup by default, with the limit. This normally allows to get rid of that test in check_config_validity().	2025-03-14 18:30:30 +01:00
Willy Tarreau	f0661e79fe	MINOR: global: add a command-line option to enable CPU binding debugging During development, everything related to CPU binding and the CPU topology is debugged using state dumps at various places, but it does make sense to have a real command line option so that this remains usable in production to help users figure why some CPUs are not used by default. Let's add "-dc" for this. Since the list of global.tune.options values is almost full and does not 100% match this option, let's add a new "tune.debug" field for this.	2025-03-14 18:30:30 +01:00
Willy Tarreau	ac1db9db7d	MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable The function is not convenient because it doesn't allow us to undo the startup changes, and depending on where it's being used, we don't know whether the values read have already been altered (this is not the case right now but it's going to evolve). Let's just compute the status during cpu_detect_usable() and set a variable accordingly. This way we'll always read the init value, and if needed we can even afford to reset it. Also, placing it in cpu_topo.c limits cross-file dependencies (e.g. threads without affinity etc).	2025-03-14 18:30:30 +01:00
Willy Tarreau	7cb274439b	MINOR: cpu-topo: add CPU topology detection for linux This uses the publicly available information from /sys to figure the cache and package arrangements between logical CPUs and fill ha_cpu_topo[], as well as their SMT capabilities and relative capacity for those which expose this. The functions clearly have to be OS-specific.	2025-03-14 18:30:30 +01:00
Willy Tarreau	8f72ce335a	MINOR: cpu-topo: add detection of online CPUs on Linux This adds a generic function ha_cpuset_detect_online() which for now only supports linux via /sys. It fills a cpuset with the list of online CPUs that were detected (or returns a failure).	2025-03-14 18:30:30 +01:00
Willy Tarreau	8c524c7c9d	REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo The cpuset files are normally used only for cpu manipulations. It happens that the initial CPU binding detection was initially placed there since there was no better place, but in practice, being OS-specific, it should really be in cpu-topo. This simplifies cpuset which doesn't need to know about the OS anymore.	2025-03-14 18:30:30 +01:00
Willy Tarreau	a6fdc3eaf0	MINOR: cpu-topo: update CPU topology from excluded CPUs at boot Now before trying to resolve the thread assignment to groups, we detect which CPUs are not bound at boot so that we can mark them with HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we can count later. Note that we purposely ignore cpu-map here as we don't know how threads and groups will map to cpu-map entries, hence which CPUs will really be used. It's important to proceed this way so that when we have no info we assume they're all available.	2025-03-14 18:30:30 +01:00
Willy Tarreau	bdb731172c	MINOR: cpu-topo: add a function to dump CPU topology The new function cpu_dump_topology() will centralize most debugging calls, and it can make efforts of not dumping some possibly irrelevant fields (e.g. non-existing cache levels).	2025-03-14 18:30:30 +01:00
Willy Tarreau	041462c4af	MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus We don't want to constantly deal with as many CPUs as a cpuset can hold, so let's first try to trim the value to what the system claims to support via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of the cpuset size though. The value is stored globally so that we can reuse it elsewhere after initialization.	2025-03-14 18:30:30 +01:00
Willy Tarreau	656cedad42	MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array. This does the bare minimum to allocate and initialize a global ha_cpu_topo array for the number of supported CPUs and release it at deinit time.	2025-03-14 18:30:30 +01:00
Willy Tarreau	d165f5d3ab	MINOR: cpu-topo: add ha_cpu_topo definition This structure will be used to store information about each CPU's topology (package ID, L3 cache ID, NUMA node ID etc). This will be used in conjunction with CPU affinity setting to try to perform a mostly optimal binding between threads and CPU numbers by default. Since it was noticed during tests that absolutely none of the many machines tested reports different die numbers, the die_id is not stored. Also, it was found along experiments that the cluster ID will be used a lot, half of the time as a node-local identifier, and half of the time as a global identifier. So let's store the two versions at once (cl_gid, cl_lid). Some flags are added to indicate causes of exclusion (offline, excluded at boot, excluded by rules, ignored by policy).	2025-03-14 18:30:30 +01:00
Willy Tarreau	69ac4cd315	MINOR: compiler: add a new __decl_thread_var() macro to declare local variables __decl_thread() already exists but is more suited for struct members. When using it in a variables block, it appends the final trailing semi-colon which is a statement that ends the variable block. Better clean this up and have one precisely for variable blocks. In this case we can simply define an unused enum value that will consume the semi-colon. That's what the new macro __decl_thread_var() does.	2025-03-12 18:08:12 +01:00
Willy Tarreau	bb4addabb7	MINOR: compiler: add a simple macro to concatenate resolved strings It's often useful to be able to concatenate strings after resolving them (e.g. __FILE__, __LINE__ etc). Let's just have a CONCAT() macro to do that, which calls _CONCAT() with the same arguments to make sure the contents are resolved before being concatenated.	2025-03-12 18:06:55 +01:00
Aurelien DARRAGON	003fe530ae	MINOR: log: add "option host" log-forward option add only the parsing part, options are currently unused	2025-03-12 10:51:35 +01:00
Aurelien DARRAGON	47f14be9f3	MINOR: tools: only print address in sa2str() when port == -1 Support special value for port in sa2str: if port is equal to -1, only print the address without the port, also ignoring <map_ports> value.	2025-03-12 10:51:20 +01:00
Aurelien DARRAGON	bc76f6dde9	MINOR: log: migrate log-forward options from proxy->options2 to options3 Migrate recently added log-forward section options, currently stored under proxy->options2 to proxy->options3 since proxy->options2 is running out of space and we plan on adding more log-forward options.	2025-03-12 10:50:03 +01:00
Aurelien DARRAGON	cc5a66212d	MINOR: proxy: add proxy->options3 proxy->options2 is almost full, yet we will add new log-forward options in upcoming patches so we anticipate that by adding a new {no_}options3 and cfg_opts3[] to further extend proxy options	2025-03-12 10:49:36 +01:00
Amaury Denoyelle	dc7913d814	MAJOR: mux-quic: increase stream flow-control for multi-buffer alloc Support for multiple Rx buffers per QCS instance has been introduced by previous patches. However, due to flow-control initial values, client were still unable to fully used this to increase their upload throughput. This patch increases max-stream-data-bidi-remote flow-control initial values. A new define QMUX_STREAM_RX_BUF_FACTOR will fix the number of concurrent buffers allocable per QCS. It is set to 90. Note that connection flow-control initial value did not changed. It is still configured to be equivalent to bufsize multiplied by the maximum concurrent streams. This ensures that Rx buffers allocation is still constrained per connection, so that it won't be possible to have all active QCS instances using in parallel their maximum Rx buffers count.	2025-03-07 12:06:27 +01:00
Amaury Denoyelle	a4f31ffeeb	MINOR: mux-quic: store QCS Rx buf in a single-entry tree Convert QCS rx buffer pointer to a tree container. Additionnaly, offset field of qc_stream_rxbuf is thus transformed into a node tree. For now, only a single Rx buffer is stored at most in QCS tree. Multiple Rx buffers will be implemented in a future patch to improve QUIC clients upload throughput.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	cc3c2d1f12	MINOR: mux-quic: define rxbuf wrapper Define a new type qc_stream_rxbuf. This is used as a wrapper around QCS Rx buffer with encapsulation of the ncbuf storage. It is allocated via a new pool. Several functions are adapted to be able to deal with qc_stream_rxbuf as a wrapper instead of the previous plain ncbuf instance. No functional change should happen with this patch. For now, only a single qc_stream_rxbuf can be instantiated per QCS. However, this new type will be useful to implement multiple Rx buffer storage in a future commit.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	4b1e63d191	MINOR: mux-quic: define globally stream rxbuf size QCS uses ncbuf for STREAM data storage. This serves as a limit for maximum STREAM buffering capacity, advertised via QUIC transport parameters for initial flow-control values. Define a new function qmux_stream_rx_bufsz() which can be used to retrieve this Rx buffer size. This can be used both in MUX/H3 layers and in QUIC transport parameters.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	861b11334c	MINOR: h3/hq-interop: restore function for standalone FIN receive Previously, a function qcs_http_handle_standalone_fin() was implemented to handle a received standalone FIN, bypassing app_ops layer decoding. However, this was removed as app_ops layer interaction is necessary. For example, HTTP/3 checks that FIN is never sent on the control uni stream. This patch reintroduces qcs_http_handle_standalone_fin(), albeit in a slightly diminished version. Most importantly, it is now the responsibility of the app_ops layer itself to use it, to avoid the shortcoming described above. The main objective of this patch is to be able to support standalone FIN in HTTP/0.9 layer. This is easily done via the reintroduction of qcs_http_handle_standalone_fin() usage. This will be useful to perform testing, as standalone FIN is a corner case which can easily be broken.	2025-03-07 12:06:26 +01:00
Valentine Krasnobaeva	e900ef987e	BUG/MEIDUM: startup: return to initial cwd only after check_config_validity() In check_config_validity() we evaluate some sample fetch expressions (log-format, server rules, etc). These expressions may use external files like maps. If some particular 'default-path' was set in the global section before, it's no longer applied to resolve file pathes in check_config_validity(). parse_cfg() at the end of config parsing switches back to the initial cwd. This fixes the issue #2886. This patch should be backported in all stable versions since 2.4.0, including 2.4.0.	2025-03-06 10:49:48 +01:00
Roberto Moreda	f98b5c4f59	MINOR: log: add dont-parse-log and assume-rfc6587-ntf options This commit introduces the dont-parse-log option to disable log message parsing, allowing raw log data to be forwarded without modification. Also, it adds the assume-rfc6587-ntf option to frame log messages using only non-transparent framing as per RFC 6587. This avoids missparsing in certain cases (mainly with non RFC compliant messages). The documentation is updated to include details on the new options and their intended use cases. This feature was discussed in GH #2856	2025-03-06 09:30:39 +01:00
Aurelien DARRAGON	0746f6bde0	MINOR: cfgparse-listen: add and use cfg_parse_listen_match_option() helper cfg_parse_listen_match_option() takes cfg_opt array as parameter, as well current args, expected mode and cap bitfields. It is expected to be used under cfg_parse_listen() function or similar. Its goal is to remove code duplication around proxy->options and proxy->options2 handling, since the same checks are performed for the two. Also, this function could help to evaluate proxy options for mode-specific proxies such as log-forward section for instance: by giving the expected mode and capatiblity as input, the function would only match compatible options.	2025-03-06 09:30:18 +01:00
Aurelien DARRAGON	d9aa199100	MINOR: proxy: make pr_mode enum bitfield compatible Current pr_mode enum is a regular enum because a proxy only supports one mode at a time. However it can be handy for a function to be given a list of compatible modes for a proxy, and we can't do that using a bitfield because pr_mode is not bitfield compatible (values share the same bits). In this patch we manually define pr_mode values so that they are all using separate bits and allows a function to take a bitfield of compatible modes as parameter.	2025-03-06 09:30:11 +01:00
Olivier Houchard	335ef3264b	DEBUG: init: Add a macro to register unit tests Add a new macro, REGISTER_UNITTEST(), that will automatically make sure we call hap_register_unittest(), instead of having to create a function that will do so.	2025-03-04 18:18:10 +01:00
William Lallemand	a647839954	DEBUG: init: add a way to register functions for unit tests Doing unit tests with haproxy was always a bit difficult, some of the function you want to test would depend on the buffer or trash buffer initialisation of HAProxy, so building a separate main() for them is quite hard. This patch adds a way to register a function that can be called with the "-U" parameter on the command line, will be executed just after step_init_1() and will exit the process with its return value as an exit code. When using the -U option, every keywords after this option is passed to the callback and could be used as a parameter, letting the capability to handle complex arguments if required by the test. HAProxy need to be built with DEBUG_UNIT to activate this feature.	2025-03-03 12:43:32 +01:00
William Lallemand	4dc0ba233e	MINOR: jws: implement a JWK public key converter Implement a converter which takes an EVP_PKEY and converts it to a public JWK key. This is the first step of the JWS implementation. It supports both EC and RSA keys. Know to work with: - LibreSSL - AWS-LC - OpenSSL > 1.1.1	2025-03-03 12:43:32 +01:00
Willy Tarreau	730641f7ca	BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send hedaer As reported in issue #2882, using "no-send-proxy-v2" on a server line does not properly disable the use of proxy-protocol if it was enabled in a default-server directive in combination with other PP options. The reason for this is that the sending of a proxy header is determined by a test on srv->pp_opts without any distinction, so disabling PPv2 while leaving other options results in a PPv1 header to be sent. Let's fix this by explicitly testing for the presence of either send-proxy or send-proxy-v2 when deciding to send a proxy header. This can be backported to all versions. Thanks to Andre Sencioles (@asenci) for reporting the issue and testing the fix.	2025-03-03 04:05:47 +01:00
Olivier Houchard	706b008429	MEDIUM: servers: Add strict-maxconn. Maxconn is a bit of a misnomer when it comes to servers, as it doesn't control the maximum number of connections we establish to a server, but the maximum number of simultaneous requests. So add "strict-maxconn", that will make it so we will never establish more connections than maxconn. It extends the meaning of the "restricted" setting of tune.takeover-other-tg-connections, as it will also attempt to get idle connections from other thread groups if strict-maxconn is set.	2025-02-26 13:00:18 +01:00
Olivier Houchard	8de8ed4f48	MEDIUM: connections: Allow taking over connections from other tgroups. Allow haproxy to take over idle connections from other thread groups than our own. To control that, add a new tunable, tune.takeover-other-tg-connections. It can have 3 values, "none", where we won't attempt to get connections from the other thread group (the default), "restricted", where we only will try to get idle connections from other thread groups when we're using reverse HTTP, and "full", where we always try to get connections from other thread groups. Unless there is a special need, it is advised to use "none" (or restricted if we're using reverse HTTP) as using connections from other thread groups may have a performance impact.	2025-02-26 13:00:18 +01:00
Olivier Houchard	c36aae2af1	MINOR: pollers: Add a fixup_tgid_takeover() method. Add a fixup_tgid_takeover() method to pollers for which it makes sense (epoll, kqueue and evport). That method can be called after a takeover of a fd from a different thread group, to make sure the poller's internal structure reflects the new state.	2025-02-26 13:00:18 +01:00
Olivier Houchard	c5cc09c00d	MINOR: fd: Add fd_lock_tgid_cur(). Add fd_lock_tgid_cur(), a function that will lock the tgid, without modifying its value.	2025-02-26 13:00:18 +01:00
Olivier Houchard	52b97ff8dd	MEDIUM: fd: Wait if locked in fd_grab_tgid() and fd_take_tgid(). Wait while the tgid is locked in fd_grab_tgid() and fd_take_tgid(). As that lock is barely used, it should have no impact.	2025-02-26 13:00:18 +01:00
Willy Tarreau	fb7874c286	MINOR: tinfo: split the signal handler report flags into 3 While signals are not recursive, one signal (e.g. wdt) may interrupt another one (e.g. debug). The problem this causes is that when leaving the inner handler, it removes the outer's flag, hence the protection that comes with it. Let's just have 3 distinct flags for regular signals, debug signal and watchdog signal. We add a 4th definition which is an aggregate of the 3 to ease testing.	2025-02-24 13:37:52 +01:00
Vincent Dechenaux	9011b3621b	MINOR: compression: Introduce minimum size This is the introduction of "minsize-req" and "minsize-res". These two options allow you to set the minimum payload size required for compression to be applied. This helps save CPU on both server and client sides when the payload does not need to be compressed.	2025-02-22 11:32:40 +01:00
Willy Tarreau	29e246a84c	MINOR: freq_ctr: provide non-blocking read functions Some code called by the debug handlers in the context of a signal handler accesses to some freq_ctr and occasionally ends up on a locked one from the same thread that is dumping it. Let's introduce a non-blocking version that at least allows to return even if the value is in the process of being updated, it's less problematic than hanging.	2025-02-21 18:26:29 +01:00
Willy Tarreau	ddd173355c	MINOR: tinfo: add a new thread flag to indicate a call from a sig handler Signal handlers must absolutely not change anything, but some long and complex call chains may look innocuous at first glance, yet result in some subtle write accesses (e.g. pools) that can conflict with a running thread being interrupted. Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when entering a signal handler and cleared when leaving them. Note, we're speaking about real signal handlers (synchronous ones), not deferred ones. This will allow some sensitive call places to act differently when detecting such a condition, and possibly even to place a few new BUG_ON().	2025-02-21 17:41:38 +01:00
Aurelien DARRAGON	9561b9fb69	BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers When the connection for sink_forward_{oc}_applet fails or a previous one is destroyed, the sft->appctx is instantly released. However process_sink_forward_task(), which may run at any time, iterates over all known sfts and tries to create sessions for orphan ones. It means that instantly after sft->appctx is destroyed, a new one will be created, thus a new connection attempt will be made. It can be an issue with tcp log-servers or sink servers, because if the server is unavailable, process_sink_forward() will keep looping without any temporisation until the applet survives (ie: connection succeeds), which results in unexpected CPU usage on the threads responsible for that task. Instead, we add a tempo logic so that a delay of 1second is applied between two retries. Of course the initial attempt is not delayed. This could be backported to all stable versions.	2025-02-21 11:22:35 +01:00
Christopher Faulet	851e52b551	BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK In the SPOP protocol, ACK frame with empty payload are allowed. However, in that case, because only the payload is transferred, there is no data to return to the SPOE applet. Only the end of input is reported. Thus the applet is never woken up. It means that the SPOE filter will be blocked during the processing timeout and will finally return an error. To workaournd this issue, a NOOP action is introduced with the value 0. It is only an internal action for now. It does not exist in the SPOP protocol. When an ACK frame with an empy payload is received, this noop action is transferred to the SPOE applet, instead of nothing. Thanks to this trick, the applet is properly notified. This works because unknown actions are ignored by the SPOE filter. This patch must be backported to 3.1.	2025-02-20 11:56:27 +01:00
Amaury Denoyelle	06e7674399	MINOR: mux-quic/h3: emit SETTINGS via MUX tasklet handler Previously, QUIC MUX application layer was installed and initialized via MUX init. However, the latter stage involve I/O operations, for example when using HTTP/3 with the emission of a SETTINGS frame. Change this to prevent any I/O operations during MUX init. As such, finalize app_ops callback is now called during the first invokation of qcc_io_send(), in the context of MUX tasklet. To implement this, a new application state value is added, to detect the transition from NULL to INIT stage.	2025-02-19 11:03:40 +01:00
Amaury Denoyelle	188fc45b95	MINOR: mux-quic: define a QCC application state member Introduce a new QCC field to track the current application layer state. For the moment, only INIT and SHUT state are defined. This allows to replace the older flag QC_CF_APP_SHUT. This commit does not bring major changes. It is only necessary to permit future evolutions on QUIC MUX. The only noticeable change is that QMUX traces can now display this new field.	2025-02-19 10:59:53 +01:00
William Lallemand	69163cd63e	MINOR: ssl/crtlist: split the ckch_conf loading from the crtlist line parsing ckch_conf loading is not that simple as it requires to check - if the cert already exists in the ckchs_tree - if the ckch_conf is compatible with an existing cert in ckchs_tree - if the cert is a bundle which need to load multiple ckch_store This logic could be reuse elsewhere, so this commit introduce the new crtlist_load_crt() function which does that.	2025-02-17 18:26:37 +01:00
Amaury Denoyelle	32691e7c25	MINOR: quic: support frame type as a varint QUIC frame type is encoded as a variable-length integer. Thus, 64-bit integer should be used for them. Currently, this was not the case as type was represented as a 1-byte char inside quic_frame structure. This does not cause any issue with QUIC from RFC9000, as all frame types fit in this range. Furthermore, a QUIC implementation is required to use the smallest size varint when encoding a frame type. However, the current code is unable to accept QUIC extension with bigger frame types. This is notably the case for quic-on-streams draft. Thus, this commit readjusts quic_frame architecture to be able to support higher frame type values. First, type field of quic_frame is changed to a 64-bits variable. Both encoding and decoding frame functions uses variable-length integer helpers to manipulate the frame type field. Secondly, the quic_frame builders/parsers infrastructure is still preserved. However, it could be impossible to define new large frame type as an index into quic_frame_builders / quic_frame_parsers arrays. Thus, wrapper functions are now provided to access the builders and parsers. Both qf_builder() and qf_parser() wrappers can then be extended to return custom builder/parser instances for larger frame type. Finally, unknown frame type detection also uses the new wrapper quic_frame_is_known(). As with builders/parsers, for large frame type, this function must be manually completed to support a new type value.	2025-02-14 09:00:05 +01:00
William Lallemand	7034f2ca48	MINOR: ssl: store the filenames resulting from a lookup in ckch_conf With this patch, files resulting from a lookup (.key, .ocsp, *.issuer etc) are now stored in the ckch_conf. It allows to see the original filename from where it was loaded in "show ssl cert <filename>"	2025-02-13 17:44:00 +01:00
Amaury Denoyelle	f96af8e463	MINOR: quic: refactor STREAM encoding and splitting CRYPTO and STREAM frames encoding is similar. If payload is too large, frame will be splitted and only the first payload part will be written in the output QUIC packet. This process is complexified by the presence of a variable-length integer Length field prior to the payload. This commit aims at refactor these operations. Define two functions to simplify the code : * quic_strm_frm_fillbuf() which is used to calculate the optimal frame length of a STREAM/CRYPTO frame with its payload in a buffer * quic_strm_frm_split() which is used to split the frame payload if buffer is too small With this patch, both functions are now implemented for STREAM encoding.	2025-02-12 15:10:03 +01:00
William Lallemand	4de86bbbfc	MEDIUM: initcall: allow to register mutiple post_section_parser per section Before this patch, REGISTER_CONFIG_SECTION() allowed to register one and only one callback (<post>) called after the parsing of a section. It was limitating because you couldn't register a post callback from anywhere else in the code. This patch introduces the new REGISTER_CONFIG_SECTION_POST() macros which allows to register a new post callback for a section keyword from anywhere. This patch introduces the feature by allowing `struct cfg_section` entries that does not have a `section_parser`, and then iterating on all cfg_section with a post_section_parser for a keyword.	2025-02-12 12:52:41 +01:00
Amaury Denoyelle	731340afbd	MINOR: quic: simplify length calculation for STREAM/CRYPTO frames STREAM and CRYPTO frames have a similar encoding format. In particular, both of them have a variable-length integer Length field just before the frame payload. It is complex to determine the optimal Length value before copying the payload data in the remaining buffer space. As such, helper functions were implemented to calculate this. However, CRYPTO and STREAM frames encoding implementation were not completely aligned, which renders the code harder to follow. The purpose of this commit is to simplify CRYPTO and STREAM frames encoding. First, a new helper quic_int_cap_length() is defined which is useful to determine the optimal buffer room available if prefixed by a variable-length integer as Length field. Then, processing of both CRYPTO and STREAM frames is now nearly identical, based on this new helper function. Functions max_available_room() and max_stream_data_size() are now unused and are removed.	2025-02-12 11:51:09 +01:00
Willy Tarreau	b6a8318cc2	MEDIUM: server: allocate a tasklet for asyncronous requeuing This creates a tasklet that only expects to be called when the LB algorithm is under contention when trying to reposition the server in its tree. Indeed, that's one of the operations that usually requires to take a write lock on a highly contended area, often for very little benefits under contention; indeed, under load, if a server keeps its previous position for a few extra microseconds, usually there's no harm. Thus this new tasklet can be woken up by the LB algo to ask the server to later call lbprm.server_requeue(). It does nothing else.	2025-02-11 17:24:09 +01:00
Willy Tarreau	20b8c4ddba	MINOR: lbprm: add a new callback ->server_requeue to the lbprm This callback will be used to reposition a server to its expected position regardless of the fact that it was taken or dropped. It will only be used by supporting LB algos. For now, only fwlc defines it and assigns it to fwlc_srv_reposition(). At the moment it's not used yet.	2025-02-11 17:16:14 +01:00
Willy Tarreau	eced1d6d8a	DEBUG: thread: reduce the struct lock_stat to store only 30 buckets Storing only 30 buckets means we only keep 256 bytes per label. This further simplifies address calculation and reduces the memory used without complicating the locking code. It means we won't measure wait times larger than a second but we're not supposed to face this as it would trigger the watchdog anyway. It may become a little bit just if measuring using rdtsc() instead of now_mono_time() though (typically the limit would be around 350ms for a 3 GHz CPU).	2025-02-10 18:34:43 +01:00
Willy Tarreau	c2f2d6fd3c	DEBUG: thread: make lock_stat per operation instead of for all operations It's more convenient (and more readable) to have the lock stats arranged by operation type (read, seek, write). It will also allow to later simplify the structure format and the bucket address calculation. Now lock_stat[] got split into lock_stats_rd[], lock_stats_sk[], lock_stats_wr[].	2025-02-10 18:34:43 +01:00
Willy Tarreau	4168d1278c	DEBUG: thread: don't keep the redundant _locked counter Now that we have our sums by bucket, the _locked counter is redundant since it's always equal to the sum of all entries. Let's just get rid of it and replace its consumption with a loop over all buckets, this will reduce the overhead of taking each lock at the expense of a tiny extra effort when dumping all locks, which we don't care about.	2025-02-10 18:34:43 +01:00
Willy Tarreau	a22550fbd7	DEBUG: thread: report the wait time buckets for lock classes In addition to the total/average wait time, we now also store the wait time in 2^N buckets. There are 32 buckets for each type (read, seek, write), allowing to store wait times from 1-2ns to 2.1-4.3s, which is quite sufficient, even if we'd want to switch from NS to CPU cycles in the future. The counters are only reported for non- zero buckets so as not to visually pollute the output. This significantly inflates the lock_stat struct, which is now aligned to 256 bytes and rounded up to 1kB. But that's not really a problem, given that there's only one per lock label.	2025-02-10 18:34:43 +01:00
Willy Tarreau	7ddcdff33f	BUG/MEDIUM: debug: close a possible race between thread dump and panic() The rework of the thread dumping mechanism in 2.8 with commit `9a6ecbd590` ("MEDIUM: debug: simplify the thread dump mechanism") opened a small race, which is that a thread in the process of dumping other ones may block the other one from panicing while it's looping at the end of ha_thread_dump_fill(), or any other sequence involving the currently dumped one. This was emphasized in 3.1 with commit `148eb5875f` ("DEBUG: wdt: better detect apparently locked up threads and warn about them") that allowed to emit warnings about long-stuck threads, because in this case, what happens is that sometimes a thread starts to emit a warning (or a set of warnings), and while the warning is being awaited for, a panic finally happens and interrupts either the dumping thread, which never finishes and waits for the target's pointer to become NULL which will never happen since it was supposed to do it itself, or the currently dumped thread which could wait for the dumping thread to become ready while this one has not released the former. In order to address this, first we now make sure never to dump a thread that is already in the process of dumping another one. We're adding a new thread flag to know this situation, that is set in ha_thread_dump_fill() and cleared in ha_thread_dump_done(). And similarly, we don't trigger the watchdog on a thread waiting for another one to finish its dump, as it's likely a case of warning (and maybe even a panic) that makes them wait for each other and we don't want such cases to be reentrant. Finally, we check in the main polling loop that the flag never accidentally leaked (e.g. wrong flag manipulation) as this would be difficult to spot with bad consequences. This should be backported at least to 2.8, and should resolve github issue #2860. Thanks to Chris Staite for the very informative backtrace that exhibited the problem.	2025-02-10 18:34:26 +01:00
Willy Tarreau	ae540e3d9c	Revert "IMPORT: plock: export the uninlined version of the lock wait function" This reverts commit `5496d06b2b`. It breaks the build on Windows which apparently doesn't support the weak attribute well on functions. It's not big deal anyway, playing with build options while debugging still works though it's less easy to use.	2025-02-07 19:51:15 +01:00
Willy Tarreau	b957e2f3ef	IMPORT: plock: use cpu_relax() for a shorter time in EBO Tests have shown that on modern CPUs it's interesting to wait a bit less in cpu_relax(). Till now we were looping down to 60 iterations and then switching to just barriers. Increasing the threshold to 90 iterations left before getting out of the loop improved the average and max time to grab a write lock by a few percent (e.g. 10% at 1us, 20% at 256ns or lower). Higher values tend to progressively lose that gain so let's stick to this one. This was measured on an EPYC 74F3 like previous measurements that initially led to this value, and the value might possibly depend on the mask applied to the loop counter. This is plock commit 74ca0a7307fa6aec3139f27d3b7e534e1bdb748e.	2025-02-07 18:04:29 +01:00
Willy Tarreau	253fba01a7	IMPORT: plock: lower the slope of the exponential back-off Along many tests involving both haproxy's scheduler and forwarded traffic, various exponents and algorithms were attempted for the EBO and their effects were measured. It was found that a growth in 1.25^N limited to 128k cycles consistently gives a better latency than 1.5^N limited to 256k cycles, without degrading general performance. The measures of the time to grab a write lock on a 48-thread EPYC show that the number of occurrences of low times was roughly multiplied by 2-3 while the number of occurrences of times above 64us was reduced by similar factors, to even reach 300 at 64us and limiting the maximum time by a factor of 4. The other variants that were experimented with are: m = ((m + (m >> 1)) + 2) & 0x3ffff; // original m = ((m + (m >> 1) + (m >> 3)) + 2) & 0x3ffff; m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x3ffff; m = ((m + (m >> 1) + (m >> 4)) + 2) & 0x1ffff; m = ((m + (m >> 1) + (m >> 4)) + 1) & 0x1ffff; m = ((m + (m >> 2) + (m >> 4)) + 1) & 0x1ffff; // lowest CPU on pl_wr test + good perf m = ((m + (m >> 2)) + 1) & 0x1ffff; // even lower cpu usage, lowest max m = ((m + (m >> 1) + (m >> 2)) + 1) & 0x1ffff; // correct but slightly higher maxes m = ((m + (m >> 1) + (m >> 3)) + 1) & 0x1ffff; // less good than m+m>>2 m = ((m + (m >> 2) + (m >> 3)) + 1) & 0x1ffff; // better but not as good as m+m>>2 m = ((m + (m >> 3) + (m >> 4)) + 1) & 0x1ffff; // less good, lower rates on small coounts. m = ((m + (m >> 2) + (m >> 3) + (m >> 4)) + 1) & 0x1ffff; // less good as well m = ((m & 0x7fff) + (m >> 1) + (m >> 4)) + 2; m = ((m & 0xffff) + (m >> 1) + (m >> 4)) + 2; This is plock commit dddd9ee01c522da33c353e2e4d4fd743d8336ec3.	2025-02-07 18:04:29 +01:00
Willy Tarreau	9dd56da730	IMPORT: plock: give higher precedence to W than S It was noticed in haproxy that in certain extreme cases, a write lock subject to EBO may fail for a very long time in front of a large set of readers constantly trying to upgrade to the S state. The reason is that among many readers, one will succeed in its upgrade, and this situation can last for a very long time with many readers upgrading in turn, while the writer waits longer and longer before trying again. Here we're taking a reasonable approach which is that the write lock should have a higher precedence in its attempt to grab the lock. What is done is that instead of fully rolling back in case of conflict with a pure S lock, the writer will only release its read part in order to let the S upgrade to W if needed, and finish its operations. This guarantees no other seek/read/write can enter. Once the conflict is resolved, the writer grabs the read part again and waits for readers to be gone (in practice it could even return without waiting since we know that any possible wanderers would leave or even not be there at all, but it avoids a complicated loop code that wouldn't improve the practical situation but inflate the code). Thanks to this change, the maximum write lock latency on a 48 threads AMD with aheavily loaded scheduler went down from 256 to 64 ms, and the number of occurrences of 32ms or more was divided by 300, while all occurrences of 1ms or less were multiplied by up to 3 (3 for the 4-16ns cases). This is plock commit b6a28366d156812f59c91346edc2eab6374a5ebd.	2025-02-07 18:04:29 +01:00
Willy Tarreau	5496d06b2b	IMPORT: plock: export the uninlined version of the lock wait function The inlining of the lock waiting function was made more easily configurable with commit 7505c2e ("plock: always expose the inline version of the lock wait function"). However, the standard one remained static, but in order to resolve the symbols in "perf top", it's much better to export it, so let's move "static" with "inline" and leave it exported when PLOCK_INLINE_EBO is not set. This is plock commit 3bea7812ec705b9339bbb0ed482a2cd8aa6c185c.	2025-02-07 18:04:29 +01:00
Christopher Faulet	eb4e517489	CLEANUP: mux-spop: Remove useless comments Just a small cleanup to remove some comments added during the development of the mux.	2025-02-06 11:19:32 +01:00
Christopher Faulet	d16c534511	MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream The spop stream now reports the end of input when the ACK is transferred to the SPOE applet. To do so, the flag SPOP_SF_ACK_RCVD was added. It is set on the SPOP stream when its ACK is received by the SPOP connection. In addition when SPOP stream flags are propagated to the SE, the error is now reported if end of input was not reached instead of testing the connection error code. It is more accurate. This patch should be backported to 3.1.	2025-02-06 11:19:32 +01:00
Frederic Lecaille	85cb1cc7f4	BUILD: ssl: remove a boringssl definition defined by recent boringssl libs This is the case for AWS-LC which derives from boringssl, where X509_OBJECT_get0_X509_CRL() is already defined. There is definitively no more need to define this function to build haproxy against TLS libs derived from boringssl.	2025-02-06 10:48:25 +01:00
Aurelien DARRAGON	0846638f7f	MEDIUM: stream: interrupt costly rulesets after too many evaluations It is not rare to see configurations with a large number of "tcp-request content" or "http-request" rules for instance. A large number of rules combined with cpu-demanding actions (e.g.: actions that work on content) may create thread contention as all the rules from a given ruleset are evaluated under the same polling loop if the evaluation is not interrupted Thus, in this patch we add extra logic around "tcp-request content", "tcp-response content", "http-request" and "http-response" rulesets, so that when a certain number of rules are evaluated under the single polling loop, we force the evaluating function to yield. As such, the rule which was about to be evaluated is saved, and the function starts evaluating rules from the save pointer when it returns (in the next polling loop). We use task_wakeup(task, TASK_WOKEN_MSG) to explicitly wake the task so that no time is wasted and the processing is resumed ASAP. TASK_WOKEN_MSG is mandatory here because process_stream() expects TASK_WOKEN_MSG for explicit analyzers re-evaluation. rules_bcount stream's attribute was added to count how manu rules were evaluated since last interruption (yield). Also, SF_RULE_FYIELD flag was added to know that the s->current_rule was assigned due to forced yield and not regular yield. By default haproxy will enforce a yield every 50 rules, this behavior can be configured using the "tune.max-rules-at-once" global keyword. There is a limitation though: for now, if the ACT_OPT_FINAL flag is set on act_opts, we consider it is not safe to yield (as it is already the case for automatic yield). In this case instead of yielding an taking the risk of not being called back, we skip the yield and hope it will not create contention. This is something we should ideally try to improve in order to yield in all conditions.	2025-02-03 17:09:48 +01:00
Christopher Faulet	5f927f603a	BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records A Read0 event could be ignored by the FCGI multiplexer if it is blocked on a partial record. Instead of handling the event, it remained blocked, waiting for the end of the record. To fix the issue, the same solution than the H2 multiplexer is used. Two flags are introduced. The first one, FCGI_CF_END_REACHED, is used to acknowledge a read0. This flag is set when a read0 was received AND the FCGI multiplexer must handle it. The second one, FCGI_CF_DEM_SHORT_READ, is set when the demux is interrupted on a partial record. A short read and a read0 lead to set the FCGI_CF_END_REACHED flag. With these changes, the FCGI mux should be able to properly handle read0 on partial records. This patch should be backported to all stable versions after a period of observation.	2025-02-03 07:49:50 +01:00
Christopher Faulet	71320fc9c1	MINOR: tevt/connection: Add support for POLL_HUP/POLL_ERR events Connection errors can be detected via connect/recv/send syscall, but also because it was reported by the poller. So dedicated events, at the FD level, are introduced to make the difference. term_events tool was updated accordingly.	2025-01-31 10:41:50 +01:00
Christopher Faulet	990854ee0d	REORG: tevt/connection: Move enums at the end of the header file Enums used to report events were placed in the connection header for conveniance. But it is not specifically related to connection. So, they are moved at the end of the file to have a better isolation.	2025-01-31 10:41:50 +01:00
Christopher Faulet	487d6b09f1	MINOR: tevt: Improve function to convert a termination events log to string The function is now responsible to handle empty log because no event was reported. In that case, an empty string is returned. It is also responsible to handle case where termination events log is not supported for an given entity (for instance the quic mux for now). In that case, a dash ("-") is returned.	2025-01-31 10:41:50 +01:00
Christopher Faulet	cbd898c42b	MINOR: tevt: Don't duplicate termination event during reporting It is hard to never detect the same event several time without painful tests. In other words, the same termination event can be reported several time and this must be handled. To do so, "tevt_report_event" macro is updated to ignore an event if the last reported one is of the same type, for the same location. Of course, if the same event is reported several times at different moment, it will not be detected.	2025-01-31 10:41:50 +01:00
Christopher Faulet	2dc02f75b1	MEDIUM: tevt/stconn/stream: Add dedicated termination events for stream location If it is the last patch to introduce dedicated termination events for each location. In this one, events for the stream location are introcued. The old enum is also removed because it is now unused. Here, more accurate evets are added. The "intercepted" event was splitted.	2025-01-31 10:41:50 +01:00
Christopher Faulet	a58e650ad1	MEDIUM: tevt/muxes: Add dedicated termination events for muxc/se locations Termination events dedicated to mux connection and stream-endpoint descriptors are added in this patch. Specific events to these locations are thus added. Changes for the H1 and H2 multiplexers are reviewed to be more accurate.	2025-01-31 10:41:50 +01:00
Christopher Faulet	f2778ccc7d	MINOR: tevt/connection: Add dedicated termination events for lower locations To be able to add more accurate termination events for each location, the enum will be splitted by location. Indeed, there are at most 16 possbile events. It will be pretty confusing to use same termination events for the different locations. So the best is to split them. In this patch, the termination events for the fd, hs and xprt locations are introduced. For now some holes are added to keep similar events aligned across enums. But this may change in future.	2025-01-31 10:41:50 +01:00
Christopher Faulet	a4c281a190	MINOR: tevt/muxes: Add CTL and SCTL command to get the termination event logs MUX_CTL_TEVTS command is added to get the termination event logs of a mux connection and MUX_SCTL_TEVTS command to get the termination event logs of a mux stream.	2025-01-31 10:41:50 +01:00
Christopher Faulet	00a07c8b54	MINOR: tevt/stream/stconn: Report termination events for stream and sc In this patch, events for the stream location are reported. These events are first reported on the corresponding stream-connector. So front events on scf and back event on scb. Then all events are both merged in the stream. But only 4 events are saved on the stream. Several internal events are for now grouped with the type "tevt_type_intercepted". More events will be added to have a better resolution. But at least the place to report these events are identified. For now, when a event is reported on a SC, it is also reported on the stream and vice versa.	2025-01-31 10:41:50 +01:00
Christopher Faulet	992b4b9726	MINOR: tevt/stconn: Add a termination events log in the SE descriptor This termination events log will be used to report events from the mux streams. The location will be "tevt_loc_se" and the muxes will be responsible to report the corresponding events.	2025-01-31 10:41:50 +01:00
Christopher Faulet	e944944990	MINOR: tevt: Add the termination events log's fundations Termination events logs will be used to report the events that led to close a connection. Unlike flags, that reflect a state, the idea here is to store a log to preserve the order of the events. Most of time, when debugging an issue, the order of the events is crucial to be able to understand the root cause of the issue. The traces are trully heplful to do so. But it is not always possible to active them because it is pretty verbose. On heavily loaded platforms, it is not acceptable. We hope that the termination events logs will help us in that situations. One termination events log will be be store at each layer (connection, mux connection, mux stream...) as a 32-bits integer. Each event will be store on 8 bits, 4 bits for the location and 4 bits for the type. So the first four events will be stored only for each layer. It should be enough why a connection is closed. In this patch, the enums defining the termination event locations and types are added. The macro to report a new event is also added and a function to convert a termination events log to a string that could be display in log messages for instance.	2025-01-31 10:41:49 +01:00
Christopher Faulet	e56e718c82	MINOR: mux-h1: Add masks to group H1S DEMUX and MUX errors It is just a small patch to clean up mux/demux functions. Instead of listing the H1S errors that must be handled during demux of mux operations, masks of flags are used. It is more readable.	2025-01-31 10:41:49 +01:00
Willy Tarreau	d155924efe	MINOR: fd: add a generation number to file descriptors This patch adds a counter of close() on file descriptors in the fdtab. The goal is to better detect if reported events concern the current or a previous file descriptor. For now the counter is only added, and is showed in "show fd" as "gen". We're reusing unused space at the end of the struct. If it's needed for something more important later, this patch can be reverted.	2025-01-30 19:45:34 +01:00
Willy Tarreau	44ac7a7e73	DEBUG: fd: add a counter of takeovers of an FD since it was last opened That's essentially in order to help with debugging strange cases like the occasional epoll issues/races, by keeping a counter of how many times an FD was taken over since last inserted. The room is available so let's use it. If it's needed later, this patch can easily be reverted. The counter is also reported in "show fd" as "tkov".	2025-01-30 19:45:34 +01:00
Amaury Denoyelle	b849ee5fa3	BUILD: quic: fix overflow in global tune A new global option was recently introduced to disable pacing. However, the value used (1<<31) caused issue with some compiler as options field used for storage is declared as int. Move pacing deactivation flag outside into the newly defined quic_tune to fix this. This should be backported up to 3.1 after a period of observation. Note that it relied on the previous patch which defined new quic_tune type.	2025-01-30 18:12:53 +01:00
Amaury Denoyelle	09e9c7d5b7	MINOR: quic: define quic_tune Define a new structure quic_tune. It will be useful to regroup various configuration settings and tunable related to QUIC, instead of defining them into the global structure.	2025-01-30 18:12:40 +01:00
Amaury Denoyelle	0c8b54b2d1	MINOR: quic: transform pacing settings into a global option Pacing support was previously activated on each bind line individually, via an optional argument of quic-cc-algo keyword. Remove this optional argument and introduce a global setting to enable/disable pacing. Pacing activation is still flagged as experimental. One important change is that previously BBR usage automatically activated pacing support. This is not the case anymore, so users should now always explicitely activate pacing if BBR is selected. A new warning message will be displayed if this is not the case. Another consequence of this change is that now pacing_inter callback is always defined for every quic_cc_algo types. As such, QUIC MUX uses global.tune.options to determine if pacing is required. This should be backported up to 3.1, after a period of observation.	2025-01-30 17:19:38 +01:00
William Lallemand	b43e5d8c16	BUILD: ssl: more cleaner approach to WolfSSL without renegotiation Patch discussed in https://github.com/wolfSSL/wolfssl/issues/6834 When building Wolfssl without renegotiation options, WolfSSL still defines the macros about it, which warns during the build. This patch completes the previous one by undefining the macros so haproxy could build without any warning.	2025-01-28 20:55:20 +01:00
William Lallemand	c6a8279cdf	BUILD: ssl: allow to build without the renegotiation API of WolfSSL In ticket https://github.com/wolfSSL/wolfssl/issues/6834, it was suggested to push --enable-haproxy within --enable-distro. WolfSSL does not want to include the renegotiation support in --enable-distro. To achieve this, let haproxy build without SSL_renegotiate_pending() when wolfssl does not define HAVE_SECURE_RENEGOCIATION or HAVE_SERVER_RENEGOCIATION_INFO.	2025-01-28 18:31:32 +01:00
Willy Tarreau	f17b0a994b	BUILD: tools: fix build on BSD by dropping the ETIME check Commit `44537379fc` ("MINOR: tools: add errname to print errno macro name") brought a facility to report errno using a symbolic string when known instead of showing only the value. However, among the listed options, ETIME is mentioned but is unknown from FreeBSD where it breaks the build. Let's simply drop it, we don't use ETIME anyway and even if it would be reported, the default code path still reports the numeric value so there's no harm. If other ones fail to build in the future, they could be handled the same way.	2025-01-28 15:58:57 +01:00
Christopher Faulet	36d151dc10	MEDIUM: stream: No longer use TASK_F_UEVT* to shut a stream down Thanks to the previous patch, it is now possible to explicitly rely on stream's events to shut it down. The right event is set in stream_shutdown(), before waking up the stream, via an atomic operation. In process_stream(), this event will be handled as expected. Thus, TASK_F_UEVT* are no longer used, but not removed since still usable for other tasks. This patch depends on "MEDIUM: stream: Map task wake up reasons to dedicated stream events".	2025-01-28 14:53:37 +01:00
Christopher Faulet	6048460102	MEDIUM: stream: Map task wake up reasons to dedicated stream events To fix thread-safety issues when a stream must be shut, three new task states were added. These states are generic (UEVT1, UEVT2 and UEVT3), the task callback function is responsible to know what to do with them. However, it is not really scalable. The best is to use an atomic field in the stream structure itself to deal with these dedicated events. There is already the "pending_events" field that save wake up reasons (TASK_WOKEN_) to not loose them if process_stream() is interrupted before it had a chance to handle them. So the idea is to introduce a new field to handle streams dedicated events and merged them with the task's wake up reasons used by the stream. This means a mapping must be performed between some task wake up reasons and streams events. Note that not all task wake up reasons will be mapped. In this patch, the "new_events" field is introduced. It is an atomic bit-field. Streams events (STRM_EVT_) are also introduced to map the task wake up reasons used by process_stream(). Only TASK_WOKEN_TIMER and TASK_WOKEN_MSG are mapped, in addition to TASK_F_UEVT* flags. In process_stream(), "pending_events" field is now filled with new stream events and the mapping of the wake up reasons.	2025-01-28 14:53:37 +01:00
Christopher Faulet	0a52a75ef7	BUG/MINOR: stream: Properly handle "on-marked-up shutdown-backup-sessions" shutdown-backup-sessions action for on-marked-up directive does not work anymore since the stream_shutdown() function was modified to be async-safe. When stream_shutdown() was modified to be async-safe, dedicated task events were added to map the reasons to shut a stream down. SF_ERR_DOWN was mapped to TASK_F_EVT1 and SF_ERR_KILLED was mapped to TASK_F_EVT2. The reverse mapping was performed by process_stream() to shut the stream with the appropriate reason. However, SF_ERR_UP reason, used by shutdown-backup-sessions action to shut a stream down because a preferred server became available, was not mapped in the same way. So since commit `b8e3b0a18d` ("BUG/MEDIUM: stream: make stream_shutdown() async-safe"), this action is ignored and does not work anymore. To fix an issue, and being able to bakcport the fix, a third task event was added. TASK_F_EVT3 is now mapped on SF_ERR_UP. This patch should fix the issue #2848. It must be backported as far as 2.6.	2025-01-28 14:53:37 +01:00
Olivier Houchard	26b3e5236f	MEDIUM: servers/proxies: Switch to using per-tgroup queues. For both servers and proxies, use one connection queue per thread-group, instead of only one. Having only one can lead to severe performance issues on NUMA machines, it is actually trivial to get the watchdog to trigger on an AMD machine, having a server with a maxconn of 96, and an injector that uses 160 concurrent connections. We now have one queue per thread-group, however when dequeueing, we're dequeuing MAX_SELF_USE_QUEUE (currently 9) pendconns from our own queue, before dequeueing one from another thread group, if available, to make sure everybody is still running.	2025-01-28 12:49:41 +01:00
Olivier Houchard	583303c48b	MINOR: proxies/servers: Calculate queueslength and use it. For both proxies and servers, properly calculates queueslength, which is the total number of element in each queues (as they currently are only using one queue, it is equivalent to the number of element of that queue), and use it instead of the queue's length.	2025-01-28 12:49:41 +01:00
Olivier Houchard	59eddabe16	MINOR: Add fields to the per-thread group field in struct server. Add a per-thread group queue and associated fields in per-thread group field in struct server, as well as a new field, queues length. This is currently unused, so should change nothing.	2025-01-28 12:49:41 +01:00
Olivier Houchard	f879b9a18a	MINOR: proxies: Add a per-thread group field to struct proxy. Add a per-thread group field to struct proxy, that will contain a struct queue, as well as a new field, "queueslength". This is currently unused, so should change nothing. Please note that proxy_init_per_thr() must now be called for each proxy once the thread groups number is known.	2025-01-28 12:49:41 +01:00
Aurelien DARRAGON	e768a531b7	CLEANUP: tree-wide: define and use acl_match_cond() helper acl_match_cond() combines acl_exec_cond() + acl_pass() and a check on the condition->pol (to check if the cond is inverted) in order to return either 0 if the cond doesn't match or 1 if it matches (or NULL). Thanks to this we can actually simplify some redundant constructs that iterate over rules and evaluate if the condition matches or not. Conditions for tcp-request inspect-content and tcp-response inspect-content couldn't be simplified because they perform an extra check for missing data, and thus still need to leverage acl_exec_cond() It's best to display the patch using "-w", like "git show xxxx -w", because some blocks had to be re-indented after the cleanup, which makes the patch hard to review by default.	2025-01-27 11:11:43 +01:00
Valentine Krasnobaeva	94d3b7375a	CLEANUP: ssl: move ssl_sock_gencert_load_ca declaration in ssl_gencert.h As ssl_sock_gencert_load_ca and ssl_sock_gencert_free_ca are compiled only if SSL_NO_GENERATE_CERTIFICATES is not defined, let's align it and move these declarations in ssl_gencert.h.	2025-01-24 12:31:07 +01:00
Valentine Krasnobaeva	846819b316	CLEANUP: ssl: rename ssl_sock_load_ca to ssl_sock_gencert_load_ca ssl_sock_load_ca is defined in ssl_gencert.c and compiled only if SSL_NO_GENERATE_CERTIFICATES is not defined. It's name is a bit confusing, as we may think at the first glance, that it's a generic function, which is also used to load CA file, provided via 'ca-file' keyword. ssl_set_verify_locations_file is used in this case. So let's rename ssl_sock_load_ca into ssl_sock_gencert_load_ca. Same is applied to ssl_sock_free_ca.	2025-01-24 12:31:07 +01:00
Valentine Krasnobaeva	44537379fc	MINOR: tools: add errname to print errno macro name Add helper to print the name of errno's corresponding macro, for example "EINVAL" for errno=22. This may be helpful for debugging and for using in some CLI commands output. The switch-case in errname() contains only the errnos currently used in the code. So, it needs to be extended, if one starts to use new syscalls.	2025-01-24 09:54:57 +01:00
Amaury Denoyelle	7896edccdc	MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path Pacing burst size is now dynamic. As such, configuration value has been removed and related fields in bind_conf and quic_cc_path structures can be safely removed. This should be backported up to 3.1.	2025-01-23 17:40:48 +01:00
Amaury Denoyelle	cb91ccd8a8	MEDIUM: quic: use dynamic credit for pacing Major improvements have been introduced in pacing recently. Most notably, QMUX schedules emission on a millisecond resolution, which allow to use passive wait to be much CPU friendly. However, an issue remains with the pacing max credit. Unless BBR is used, it is fixed to the configured value from quic-cc-algo bind statement. This is not practical as if too low, it may drastically reduce performance due to 1ms sleep resolution. If too high, some clients will suffer from too much packet loss. This commit fixes the issue by implementing a dynamic maximum credit value based on the network condition specific to each clients. Calculation is done to fix a maximum value which should allow QMUX current tasklet context to emit enough data to cover the delay with the next tasklet invokation. As such, avg_loop_us is used to detect the process load. If too small, 1.5ms is used as minimal value, to cover the extra delay incurred by the system which will happen for a default 1ms sleep. This should be backported up to 3.1.	2025-01-23 17:40:48 +01:00
Amaury Denoyelle	8098be1fdc	MEDIUM: mux-quic: reduce pacing CPU usage with passive wait Pacing algorithm has been revamped in the previous commit to implement a credit based solution. This is a far more adaptative solution, in particular which allow to catch up in case pause between pacing emission was longer than expected. This allows QMUX to remove the active loop based on tasklet wake-up. Instead, a new task is used when emission should be paced. The main advantage is that CPU usage is drastically reduced. New pacing task timer is reset each time qcc_io_send() is invoked. Timer will be set only if pacing engine reports that emission must be interrupted. In this case timer is set via qcc_wakeup_pacing() to the delay reported by congestion algorithm, or 1ms if delay is too short. At the end of qcc_io_cb(), pacing task is queued if timer has been set. Pacing task execution is simple enough : it immediately wakes up QCC I/O handler. Note that to have decent performance, it requires to have a large enough burst defined in configuration of quic-cc-algo. However, this value is common to every listener clients, which may cause too much loss under network conditions. This will be address in a future patch. This should be backported up to 3.1.	2025-01-23 17:40:22 +01:00
Amaury Denoyelle	4489a61585	MEDIUM: quic: implement credit based pacing Implement a new method for QUIC pacing emission based on credit. This represents the number of packets which can be emitted in a single burst. After emission, decrement from the credit the number of emitted packets. Several emission can be conducted in the same sequence until the credit is completely decremented. When a new emission sequence is initiated (i.e. under a new QMUX tasklet invokation), credit is refilled according to the delay which occured between the last and current emission context. This new mechanism main advantage is that it allows to conduct several emission in the same task context without having to wait between each invokation. Wait is only forced if pacing is expired, which is now equivalent to having a null credit. Furthermore, if delay between two emissions sequence would have been smaller than expected, credit is only partially refilled. This allows to restart emission without having to wait for the whole credit to be available. On the implementation side, a new field <credit> is avaiable in quic_pacer structure. It is automatically decremented on quic_pacing_sent_done() invokation. Also, a new function quic_pacing_reload() must be used by QUIC MUX when a new emission sequence is initiated to refill credit. <next> field from quic_pacer has been removed. For the moment, credit is based on the burst configured via quic-cc-algo keyword, or directly reported by BBR. This should be backported up to 3.1.	2025-01-23 17:40:20 +01:00
Amaury Denoyelle	bbaa7aef7b	BUG/MINOR: quic: do not increase congestion window if app limited Previously, congestion window was increased any time each time a new acknowledge was received. However, it did not take into account the window filling level. In a network condition with negligible loss, this will cause the window to be incremented until the maximum value (by default 480k), even though the application does not have enough data to fill it. In most cases, this issue is not noticeable. However, it may lead to excessive memory consumption when a QUIC connection is suddendly interrupted, as in this case haproxy will fill the window with retransmission. It even has caused OOM crash when thousands of clients were interrupted at once on a local network benchmark. Fix this by first checking window level prior to every incrementation via a new helper function quic_cwnd_may_increase(). It was arbitrarily decided that the window must be at least 50% full when the ACK is handled prior to increment it. This value is a good compromise to keep window in check while still allowing fast increment when needed. Note that this patch only concerns cubic and newreno algorithm. BBR has already its notion of application limited which ensures the window is only incremented when necessary. This should be backported up to 2.6.	2025-01-23 14:49:35 +01:00
Amaury Denoyelle	7c0820892f	MINOR: quic: rename pacing_rate cb to pacing_inter Rename one of the congestion algorithms pacing callback from pacing_rate to pacing_inter. This better reflects that this function returns a delay (in nanoseconds) which should be applied between each packet emission to fill the congestion window with a perfectly smoothed emission. This should be backported up to 3.1.	2025-01-23 14:49:35 +01:00
Amaury Denoyelle	2178bf1192	CLEANUP: quic: remove unused prototype Remove undefined quic_pacing_send() function prototype from quic_pacing module. This should be backported up to 3.1.	2025-01-23 14:49:35 +01:00
Frederic Lecaille	4f38c4bfd8	MINOR: quic: Add a BUG_ON() on quic_tx_packet refcount This is definitively a bug to call quic_tx_packet_refdec() to decrement the reference counter of a TX packet calling quic_tx_packet_refdec(), and possibly to release its memory when it is negative or null. This counter is incremented when a TX frm is attached to it with some allocated memory and when the packet is inserted into a data structure, if needed (list or tree). Should be easily backported as far as 2.6 to ease any further backport around this code part.	2025-01-21 22:01:34 +01:00
Frederic Lecaille	cb729fb64d	BUG/MINOR: quic: ensure a detached coalesced packet can't access its neighbours Reset ->prev and ->next fields of a coalesced TX packet to ensure it cannot access several times its neighbours after it is supposed to be detached from them calling quic_tx_packet_dgram_detach(). There are two cases where a packet can be coalesced to another previous built one: this is when it is built into the same datagrame without GSO (and flagged flag with QUIC_FL_TX_PACKET_COALESCED) or when sent from the same sendto() syscall with GOS (not flagged with QUIC_FL_TX_PACKET_COALESCED). This fix may be in relation with GH #2839. Must be backported as far as 2.6.	2025-01-21 22:01:34 +01:00
Willy Tarreau	b066c0affb	REORG: version: move the remaining BUILD_* stuff from haproxy.c to version.c version.c tries to centralize all variables conveying version information, but there's still an issue with the BUILD_* variables which are only passed to haproxy.o and are only updated when that one is rebuilt. This is not very logical given that we can end up with values there which contradict info from version.c. Better move all of these to version.c which is systematically rebuilt. Most of these variables only end up as string concatenation at the moment. Some of them are even duplicated. In version.c we now have one variable (or constant) for each of them and haproxy.c references them in messages. This is much more logical and easier to maintain in a consistent state. The patch looks a bit large but it really only moves the ifdefed string assignment from one file to another, placing them into variables.	2025-01-20 17:53:55 +01:00
Amaury Denoyelle	a50dd07c16	MINOR: trace: ensure -dt priority over traces config section Traces can be activated on startup either via -dt command line argument or via the traces configuration section. This can caused confusion as it may not be clear as trace source can be completed or overriden by one or the other. Fix the precedence to give the priority to the command line argument. Now, each trace source configured via -dt is first resetted to a default state before applying new settings. Then, it is impossible to change a trace source via the configuration file if it was already targetted via -dt argument.	2025-01-10 14:50:59 +01:00
Willy Tarreau	b25850f25b	MINOR: tools: add a few functions to simply check for a file's existence At many places we'd like to be able to simply construct a path from a format string and check if that path corresponds to an existing file, directory etc. Here we add 3 functions, a generic one to test that a path corresponds to a given file mode (e.g. S_IFDIR, S_IFREG etc), and two other ones specifically checking for a file or a dir for easier use.	2025-01-09 09:18:49 +01:00
Willy Tarreau	bd06502b22	BUILD: makefile: add a qinfo macro to pass info in quiet mode Some commands such as $(cmd_CC) etc already handle the quiet vs verbose mode in the makefile, but sometimes we may want to pass other info. The new "qinfo" macro can be called with a 9-char string argument (spaces included) as a prefix for some commands, to emit that string when in quiet mode. The caller must fill the spaces needed for alignment. E.g: $(call quinfo, CC )$(CC) ...	2025-01-08 11:26:05 +01:00
Amaury Denoyelle	af00be8e0f	MINOR: mux-quic: change return value of qcs_attach_sc() A recent fix was introduced to ensure that a streamdesc instance won't be attached to an already completed QCS which is eligible to purging. This was performed by skipping application protocol decoding if a QCS is in such a state. Here is the patch responsible for this change. `caf60ac696` BUG/MEDIUM: mux-quic: do not attach on already closed stream However, this is too restrictive, in particular for unidirection stream where no streamdesc is never attached. To fix this behavior, first qcs_attach_sc() API has been modified. Instead of returning a streamdesc instance, it returns either 0 on success or a negative error code. There should be no functional changes with this patch. It is only to be able to extend qcs_attach_sc() with the possibility of skipping streamdesc instantiation while still keeping a success return value. This should be backported wherever the above patch has been merged. For the record, it was scheduled for immediate backport on 3.1, plus merging on older releases up to 2.8 after a period of observation.	2025-01-03 17:19:21 +01:00
Willy Tarreau	f486f976c7	BUILD: limits: make normalize_rlim() take an rlim_t to fix build on m68k As can be seen here, the build fails on m68k since commit `665dde648` ("MINOR: debug: use LIM2A to show limits") in 3.1: https://github.com/haproxy/haproxy/actions/runs/12440234399/job/34735360177 The reason is the comparison between a ulong limit and RLIM_INFINITY. Indeed, on m68k, rlim_t is an unsigned long long. Let's just change the function's input type to take an rlim_t instead. This also allows to get rid of the casts in the call place. This can be backported to 3.1 though it's not important given the low prevalence of this platform for such use cases.	2024-12-25 12:33:06 +01:00
Willy Tarreau	f78121dd32	BUILD: compat: add missing fcntl.h before defining F_SETPIPE_SZ n 1.5-dev8, 13 years ago, support for setting pipe size was added by commit `bd9a0a778` ("OPTIM/MINOR: make it possible to change pipe size (tune.pipesize)"). For compatibility purposes, it was defining F_SETPIPE_SZ in compat.h if it was not set. It apparently always had F_SETPIPE_SZ defined before being included. Now in 3.2-dev1, commit `fbc534a6f` ("REORG: startup: move nofile limit checks in limits.c") reordered a few includes and ended up with mworker-prog.c including compat.h before fcntl.h, causing a redefinition error on certain libcs: CC src/mworker-prog.o In file included from /usr/include/bits/fcntl.h:61:0, from /usr/include/fcntl.h:35, from include/haproxy/limits.h:11, from include/haproxy/mworker.h:18, from src/mworker-prog.c:27: /usr/include/bits/fcntl-linux.h:203:0: warning: "F_SETPIPE_SZ" redefined [enabled by default] In file included from include/haproxy/api-t.h:35:0, from include/haproxy/api.h:33, from src/mworker-prog.c:23: include/haproxy/compat.h:161:0: note: this is the location of the previous definition Let's simply include fcntl.h in compat.h before the macro is redefined. There's normally no need to backport this, though it's harmless to do it if needed.	2024-12-25 11:53:11 +01:00
Olivier Houchard	505480eeef	CLEANUP: Remove pendconn_must_try_again(). Remove pendconn_must_try_again(), now that it no longer is used.	2024-12-24 14:10:06 +01:00
Olivier Houchard	cda7275ef5	MEDIUM: queue: Handle the race condition between queue and dequeue differently There is a small race condition, where a server would check if there is something left in the proxy queue, and adding something to the proxy queue. If the server checks just before the stream is added to the queue, and it no longer has any stream to deal with, then nothing will take care of the stream, that may stay in the queue forever. This was worked around with commit `5541d4995d`, by checking for that exact condition after adding the stream to the queue, and trying again to get a server assigned if it is detected. That fix lead to multiple infinite loops, that got fixed, but it is not unlikely that it could happen again. So let's fix the initial problem differently : a single server may mark itself as ready, and it removes itself once used. The principle is that when we discover that the just queued stream is alone with no active request anywhere ot dequeue it, instead of rebalancing it, it will be assigned to that current "ready" server that is available to handle it. The extra cost of the atomic ops is negligible since the situation is super rare.	2024-12-24 14:10:06 +01:00
Olivier Houchard	5b8899b6cc	BUG/MEDIUM: queue: Make process_srv_queue return the number of streams Make process_srv_queue() return the number of streams unqueued, as pendconn_grab_from_px() did, as that number is used by srv_update_status() to generate logs. This should be backported up to 2.6 with `111ea83ed4`	2024-12-23 15:03:40 +01:00
William Lallemand	056ec51c26	MEDIUM: ssl/ocsp: counters for OCSP stapling Add 2 counters in the SSL stats module for OCSP stapling. - ssl_ocsp_staple is the number of OCSP response successfully stapled with the handshake - ssl_failed_ocsp_stapled is the number of OCSP response that we couldn't staple, it could be because of an error or because the response is expired. These counters are incremented in the OCSP stapling callback, so if no OCSP was configured they won't never increase. Also they are only working in frontends. This was discussed in github issue #2822.	2024-12-23 11:23:00 +01:00
William Lallemand	0e6af97233	MINOR: ssl: change visibility of ssl_stats_module In order to add stats from other files, the ssl_stats_module need to be visible from other files. This moves the ssl_counters definition in ssl_sock-t.h and removes the static of ssl_stats_module.	2024-12-23 11:23:00 +01:00
William Lallemand	acb2c9eb8b	MINOR: ssl: improve HAVE_SSL_OCSP ifdef Allow to build correctly without OCSP. It could be disabled easily with OpenSSL build with OPENSSL_NO_OCSP. Or even with DEFINE="-DOPENSSL_NO_OCSP" on haproxy make line.	2024-12-19 10:53:05 +01:00
Remi Tricot-Le Breton	93f2c73423	MINOR: ssl/ocsp: Add extra details in error logs when possible When the ocsp response auto update process fails during insertion or while validating the received ocsp response, we call ssl_sock_update_ocsp_response or ssl_ocsp_check_response respectively and both these functions take an 'err' parameter in which detailed error messages can be written. Until now, those error messages were discarded and the only information given to the user was a generic error (ERR_CHECK or ERR_INSERT) which does not help much. We now keep a pointer to the last error message in the certificate_ocsp structure and dump its content in the update logs as well as in the "show ssl ocsp-updates" cli command. This issue was raised in GitHub #2817.	2024-12-18 10:41:16 +01:00
Amaury Denoyelle	9d155ca706	MINOR: trace: implement tracing disabling API Define a set of functions to temporarily disable/reactivate tracing for the current thread. This could be useful when wanting to quickly remove tracing output for some code parts. The API relies on a disable/resume set of functions, with a thread-local counter. This counter is tested under __trace_enabled(). It is a cumulative value so that the same count of resume must be issued after several disable usage. There is also the possibility to force reset the counter to 0 before restoring the old value. This should be backported up to 3.1.	2024-12-18 09:52:06 +01:00
Amaury Denoyelle	e296585ae9	MEDIUM/OPTIM: mux-quic: implement purg_list This commit is part of the current serie which aims to refactor and improve overall performance of QUIC MUX I/O handler. qcc_io_process() is responsible to perform some internal operations on QUIC MUX after I/O completion. It is notably called on every qcc_io_cb() tasklet handler. The most intensive work on it is the purging of QCS instances after transfer completion. This was implemented by looping on QCC streams tree and inspecting the state of every QCS. The purpose of this commit is to optimize this processing. A new purg_list QCC member is defined. It is responsible to list every QCS instances whose transfer has been completed. It is thus safe to reuse <el_send> QCS list attach point. Stream purging will thus only loop on purg_list instead of every known QCS. This should be backported up to 3.1.	2024-12-18 09:33:52 +01:00
Amaury Denoyelle	4b42dd4ae0	MEDIUM/OPTIM: mux-quic: define a recv_list for demux resumption This commit is part of the current serie which aims to refactor and improve overall performance of QUIC MUX I/O handler. Define a recv_list element into qcc structure. This is used to registered every instance of qcs which are currently blocked on demuxing, which happen on no more space in <rx.appbuf>. The purpose of this patch is to reduce qcc_io_recv() CPU usage. Now, only recv_list iteration is performed, instead of the previous looping over every qcs instances. This is useful as qcc_io_recv() is called each time qcc_io_cb() is scheduled, even if only sending condition was the wakeup origin. A qcs is not inserted into recv_list immediately after blocking on demux full buffer. Instead, this is only done after unblocking via stream rcv_buf callback, which ensure that new buffer space is available. This should be backported up to 3.1.	2024-12-18 09:23:41 +01:00
Amaury Denoyelle	0a53a008d0	MINOR: mux-quic: refactor wait-for-handshake support This commit refactors wait-for-handshake support from QUIC MUX. The flag logic QC_CF_WAIT_HS is inverted : it is now positionned only if MUX is instantiated before handshake completion. When the handshake is completed, the flag is removed. The flag is now set directly on initialization via qmux_init(). Removal via qcc_wait_for_hs() is moved from qcc_io_process() to qcc_io_recv(). This is deemed more logical as QUIC MUX is scheduled on RECV to be notify by the transport layer about handshake termination. Moreover, qcc_wait_for_hs() is now called if recv subscription is still active. This commit is the first of a serie which aims to refactor QUIC MUX I/O handler and improves its overall performance. The ultimate objective is to be able to stream qcc_io_cb() by removing pacing specific code path via qcc_purge_sending(). This should be backported up to 3.1.	2024-12-18 09:23:41 +01:00
Amaury Denoyelle	17bfe93768	CLEANUP: mux-quic: remove unused qcc member send_retry_list Remove unused fields send_retry_list from qcc and its corresponding attach element el from qcs. This should be backported up to 3.1.	2024-12-18 09:20:20 +01:00
Willy Tarreau	7b6acb6a51	MINOR: bug: make BUG_ON() fall back to ASSUME When the strict level is zero and BUG_ON() is not implemented, some possible null-deref warnings are emitted again because some were covering for these cases. Let's make it fall back to ASSUME() so that the compiler continues to know that the tested expression never happens. It also allows to further optimize certain functions by helping the compiler eliminate certain tests for impossible values. However it requires that the expression is really evaluated before passing the result through ASSUME() otherwise it was shown that gcc-11 and above will fail to evaluate its implications and will continue to emit the null-deref warnings in case the expression is non-trivial (e.g. it has multiple terms). We don't do it for BUG_ON_HOT() however because the extra cost of evaluating the condition is generally not welcome in fast paths, particularly when that BUG_ON_HOT() was kept disabled for performance reasons.	2024-12-17 17:39:12 +01:00
Willy Tarreau	63798088b3	MINOR: compiler: add ASSUME_NONNULL() to tell the compiler a pointer is valid At plenty of places we have ALREADY_CHECKED() or DISGUISE() on a pointer just to avoid "possibly null-deref" warnings. These ones have the side effect of weakening optimizations by passing through an assembly step. Using ASSUME_NONNULL() we can avoid that extra step. And when the __builtin_unreachable() builtin is not present, we fall back to the old method using assembly. The macro returns the input value so that it may be used both as a declarative way to claim non-nullity or directly inside an expression like DISGUISE().	2024-12-17 16:46:46 +01:00
Willy Tarreau	2ce63b7b17	MINOR: compiler: also enable __builtin_assume() for ASSUME() Clang apparently has __builtin_assume() which does exactly the same as our macro, since at least v3.8. Let's enable it, in case it may even better detect assumptions vs unreachable code.	2024-12-17 16:46:46 +01:00
Willy Tarreau	efc897484b	MINOR: compiler: add a new "ASSUME" macro to help the compiler This macro takes an expression, tests it and calls an unreachable statement if false. This allows the compiler to know that such a combination does not happen, and totally eliminate tests that would be related to this condition. When the statement is not available in the compiler, we just perform a break from a do {} while loop so that the expression remains evaluated if needed (e.g. function call).	2024-12-17 16:46:46 +01:00
Willy Tarreau	41fc18b1d1	MINOR: compiler: rely on builtin detection for __builtin_unreachable() Due to __builtin_unreachable() only being associated to gcc 4.5 and above, it turns out it was not enabled for clang. It's not used that much but still a little bit, so let's enable it now. This reduces the code size by 0.2% and makes it a bit more efficient.	2024-12-17 16:46:46 +01:00
Willy Tarreau	96cfcb1df3	MINOR: compiler: add a __has_builtin() macro to detect features more easily We already have a __has_attribute() macro to detect when the compiler supports a specific attribute, but we didn't have the equivalent for builtins. clang-3 and gcc-10 have __has_builtin() for this. Let's just bring it using the same mechanism as __has_attribute(), which will allow us to simply define the macro's value for older compilers. It will save us from keeping that many compiler-specific tests that are incomplete (e.g. the __builtin_unreachable() test currently doesn't cover clang).	2024-12-17 16:46:46 +01:00
Olivier Houchard	b3cd5a4b86	CLEANUP: queues: Remove pendconn_grab_from_px(). pendconn_grab_from_px() is now unused, so just remove it.	2024-12-17 16:05:44 +01:00
William Lallemand	bb88f68cf7	MINOR: ssl: add utils functions to extract X509 notAfter date Add ASN1_to_time_t() which converts an ASN1_TIME to a time_t and x509_get_notafter_time_t() which returns the notAfter date in time_t format.	2024-12-16 14:54:53 +01:00
Valentine Krasnobaeva	fbc534a6fa	REORG: startup: move nofile limit checks in limits.c Let's encapsulate the code, which checks the applied nofile limit into a separate helper check_nofile_lim_and_prealloc_fd(). Let's keep in this new function scope the block, which tries to create a copy of FD with the highest number, if prealloc-fd is set in the configuration.	2024-12-16 10:44:01 +01:00
Valentine Krasnobaeva	14f5e00d38	REORG: startup: move code that applies limits to limits.c In step_init_3() we try to apply provided or calculated earlier haproxy maxsock and memmax limits. Let's encapsulate these code blocks in dedicated functions: apply_nofile_limit() and apply_memory_limit() and let's move them into limits.c. Limits.c gathers now all the logic for calculating and setting system limits in dependency of the provided configuration.	2024-12-16 10:44:01 +01:00
Valentine Krasnobaeva	1332e9b58d	REORG: startup: move global.maxconn calculations in limits.c Let's encapsulate the code, which calculates global.maxconn and global.maxsslconn into a dedicated function set_global_maxconn() and let's move this function in limits.c. In limits.c we keep helpers to calculate and check haproxy internal limits, based on the system nofile and memory limits.	2024-12-16 10:44:01 +01:00
Frederic Lecaille	e1d25cdbdd	CLEANUP: quic: remove a wrong comment about ->app_limited (drs) ->app_limited quic_drs struct member is not a boolean. This is the index of the last transmitted packet marked as application-limited, or 0 if the connection is not currently application-limited (see C.app_limited definition in BBR v3 draft).	2024-12-13 14:42:43 +01:00
Frederic Lecaille	eeaeb412dc	MINOR: quic: reduce the private data size of QUIC cc algos After these commits: BUG/MINOR: quic: remove max_bw filter from delivery rate sampling BUG/MINOR: quic: fix BBB max bandwidth oscillation issue where some members were removed from bbr struct, the private data size of QUIC cc algorithms may be reduced from 160 to 144 uint32_t. Should be easily backported to 3.1 alonside the commits mentioned above.	2024-12-13 14:42:43 +01:00
Frederic Lecaille	22ab45a3a8	BUG/MINOR: quic: remove max_bw filter from delivery rate sampling This filter is no more needed after this commit: BUG/MINOR: quic: fix BBB max bandwidth oscillation issue. Indeed, one added this filter at delivery rate sampling level to filter the BBR max bandwidth estimations and was inspired from ngtcp2 code source when trying to fix the oscillation issue. But this BBR max bandwidth oscillation issue was fixed by the aforementioned commit. Furthermore this code tends to always increment the BBR max bandwidth. From my point of view, this is not a good idea at all. Must be backported to 3.1.	2024-12-13 14:42:43 +01:00
Frederic Lecaille	a9a2f98f86	MINOR: window_filter: rely on the time to update the filter samples (QUIC/BBR) The windowed filters are used only the BBR implementation for QUIC to filter the maximum bandwidth samples for its estimation over a virtual time interval tracked by counting the cyclical progression through ProbeBW cycles. ngtcp2 and quiche use such windowed filters in their BBR implementation. But in a slightly different way. When updating the 2nd or 3rd filter samples, this is done based on their values in place of the time they have been sampled. It seems more logical to rely on the sample timestamps even if this has no implication because when a sample is updated using another sample because it has the same value, they have both the same timestamps! This patch modifies two statements which compare two consecutive filter samples based on their values (smp[]->v) by statements which compare them based on the virtual time they have been sampled (smp[]->t). This fully complies which the code used by the Linux kernel in lib/win_minmax.c. Alo take the opportunity of this patch to shorten some statements using <smp> local variable value to update smp[2] sample in place of initializing its two members with the <smp> member values. This patch SHOULD be easily backported to 3.1 where BBR was first implemented.	2024-12-13 14:42:43 +01:00
Amaury Denoyelle	1f458b3ea8	MINOR: applet: define applet_putchk_stress() alternative Previous patch introduced stress mode to be able to easily test alternative code paths. The first point would be to force interruption of stats dump on every line and check reentrant patchs, in particular while adding and removing servers instances. The purpose of this patch is to be able to use applet_putchk_stress() during stats dump while not impacting other applets. To support this, extract applet_putchk() into an internal _applet_putchk() which have a new argument stress. Define two helpers applet_putchk() and applet_putchk_stress(), the latter to set the stress argument to true. For the moment, applet_putchk_stress() is not used. This will be the subject of the next patch.	2024-12-12 11:26:33 +01:00
Amaury Denoyelle	9d19fc4cf7	MINOR: build: define DEBUG_STRESS Define a new build mode DEBUG_STRESS. This will be used to stress some code parts which cannot be reproduce easily with an alternative suboptimal code. First, a global <mode_stress> is set either to 1 or 0 depending on DEBUG_STRESS compilation. A new global keyword "stress-level" is also defined. It allows to specify a level from 0 to 9, to increase the stress incurred on the code. Helper macro STRESS_RUN* are defined for each stress level. This allows to easily specify an instruction in default execution and a stress counterpart if running on the corresponding stress level.	2024-12-12 11:19:10 +01:00
Aurelien DARRAGON	358166ae6a	BUG/MINOR: hlua_fcn: restore server pairs iterator pointer consistency Since `9c91b30` ("MINOR: server: remove prev_deleted server list"), hlua server pair iterator may use and return invalid (stale) server pointer if multiple servers were deleted between two iterations. Indeed, the server refcount mechanism (using srv_take()) is no longer sufficient as the prev_deleted mitigation was removed. To ensure server pointer consistency between two yields, the new watcher mechanism must be used (as it already the case for stats dumping). Thus in this patch we slightly change the server iteration logic: hlua_server_list_iterator_context struct now stores the next valid server pointer, and a watcher is added to ensure this pointer is never stale. Then in hlua_listable_servers_pairs_iterator(), this next pointer is used to create the Lua server object, and the next valid pointer is obtained by leveraging watcher_next(). No backport needed unless `9c91b30` ("MINOR: server: remove prev_deleted server list") is. Please note that dynamic servers were not supported in Lua prior to 2.8, so it doesn't make sense to backport this patch further than 2.8.	2024-12-11 10:52:11 +01:00
Amaury Denoyelle	9c91b30139	MINOR: server: remove prev_deleted server list This patch is a direct follow-up to the previous one. Thanks to watcher type, it is not safe to assume that servers manipulated via stats dump were not targetted by a "delete server" CLI command. As such, prev_deleted list server member is now unneeded. This patch thus removes any reference to it.	2024-12-10 16:19:33 +01:00
Amaury Denoyelle	071ae8ce3d	BUG/MEDIUM: stats/server: use watcher to track server during stats dump If a server A is deleted while a stats dump is currently on it, deletion is delayed thanks to reference counting. Server A is nonetheless removed from the proxy list. However, this list is a single linked list. If the next server B is deleted and freed immediately, server A would still point to it. This problem has been solved by the prev_deleted list in servers. This model seems correct, but it is difficult to ensure completely its validity. In particular, it implies when stats dump is resumed, server A elements will be accessed despite the server being in a half-deleted state. Thus, it has been decided to completely ditch the refcount mechanism for stats dump. Instead, use the watcher element to register every stats dump currently tracking a server instance. Each time a server is deleted on the CLI, each stats dump element which may points to it are updated to access the next server instance, or NULL if this is the last server. This ensures that a server which was deleted via CLI but not completely freed is never accessed on stats dump resumption. Currently, no race condition related to dynamic servers and stats dump is known. However, as described above, the previous model is deemed too fragile, as such this patch is labelled as bug-fix. It should be backported up to 2.6, after a reasonable period of observation. It relies on the following patch : MINOR: list: define a watcher type	2024-12-10 16:19:33 +01:00
Amaury Denoyelle	eafa8a32bb	MINOR: list: define a watcher type Define a new watcher type into list module. This type is similar to bref and can be used to register an element which is currently tracking a dynamic target. Contrary to bref, if the target is freed, every watcher element are updated to point to a next valid entry or NULL. This type will simplify handling of dynamic servers deletion, in particular while stats dump are performed. This patch is not a bug-fix. However, it is mandatory to fix a race condition in dynamic servers. Thus, it should be backported along the next commit up to 2.6.	2024-12-10 16:04:11 +01:00
Valentine Krasnobaeva	1f63a53955	BUG/MINOR: mworker: detach from tty when received READY from worker Some master process' initialization steps are conditioned by receiving the READY message from worker (pidfile creation, forwarding READY message to the launching parent). So, master process can not do these initialization routines before. If the master process fails, while creating pid or forwarding the READY to the parent in daemon mode, he exits with a proper alert message. In daemon mode we no longer see such message, as process is already detached from the tty. To fix this, as these alerts could be very useful, let's detach the master process from the tty after his last initialization steps in _send_status.	2024-12-09 21:32:54 +01:00
Valentine Krasnobaeva	663d75e7a0	BUG/MEDIUM: startup: report status if daemonized process fails Due to master-worker rework, daemonization fork happens now before parsing and applying the configuration. This makes impossible to report correctly all warnings and alerts to shell's stdout. Daemonzied process fails, while being already in background, exit code reported by shell via '$?' equals to 0, as it's the exit code of his parent. To fix this, let's create a pipe between parent and daemonized child. The child will send into this pipe a "READY" message, when it finishes his initialization. The parent will wait on the "read" end of the pipe until receiving something. If read() fails, parent obtains the status of the exited child with waitpid(). So, the parent can correctly report the error to the stdout and he can exit with child's exitcode. This fix should be backported only in 3.1.	2024-12-09 21:32:44 +01:00
William Lallemand	5454824e31	MINOR: ssl: add notBefore and notAfter utility functions Extracting notBefore and notAfter as a string can be bothersome, add 2 utility functions that returns the value in a static buffer.	2024-12-09 18:29:23 +01:00
Willy Tarreau	c3ee4e375b	MINOR: tools: make fddebug() automatically emit the location fddebug() is sometimes quite helpful, but annoying to use when following a call path because it's a pain to always repeat the function name and call place. Let's have it automatically prepend the function name, the file name and the line number, and make its arguments optional, replacing them by a simple LF when all absent. This way, simply placing: fddebug(); is sufficient to emit a location follocing "[%s@%s:%d]\n". This function must not be used in production (and even call places with it shouldn't be committed) and it should only be used by developers, so the simplest the better.	2024-12-09 18:05:09 +01:00
Willy Tarreau	d6dc8120c0	BUILD: debug: fix build issues in COUNT_IF() with -Wunused-value Commit `7f64bb79fd` ("BUG/MINOR: debug: COUNT_IF() should return true/false") allowed the COUNT_IF() macro to return the evaluated value. This is handy to place it in "if ()" conditions and count them at the same time. When glitches are disabled, the condition is just returned as-is, but most call places do not use the result, making some compilers complain. In addition, while reviewing this, it was noticed that when DEBUG_STRICT=0, the macro would still be replaced by a "do { } while (0)" statement, which not only does not evaluate the expression, but also cannot return anything. Ditto for COUNT_IF_HOT(). Let's make sure both are always properly evaluated now.	2024-12-09 18:04:51 +01:00
Willy Tarreau	7f64bb79fd	BUG/MINOR: debug: COUNT_IF() should return true/false The COUNT_IF() macro was initially meant to return true/false to be used in if() conditions but had an extra do { } while(0) that prevents it from doing so. Let's get rid of the do { } while(0) before the code generalizes to too many places. There's no impact on existing code, but may have to be backported if future fixes rely on it.	2024-12-06 18:45:46 +01:00
Valentine Krasnobaeva	cd0b58e23e	BUG/MINOR: startup: fix error path for master, if can't open pidfile If master process can't open a pidfile, there is no sense to send SIGTTIN to oldpids, as it will exit. So, old workers will terminate as well. It's better to send the last alert to the log about unrecoverable error, because master is already in its polling loop. For the standalone mode we should keep the previous logic in this case: send SIGTTIN to old process and unbind listeners for the new one. So, it's better to put this error path in main(), as it's done when other configuration settings can't be applied. This patch should be backported only in 3.1.	2024-12-06 12:00:22 +01:00
Aurelien DARRAGON	ae9d8d40d0	CLEANUP: stktable: add some stktable flags polishing Better late than never, commit `1f73d35` ("MINOR: stktable: implement "recv-only" table option") implemented stktable flags and initial definitions, but it lacks some comments plus the flag is stored as 16bits but the SKT_FL_ definition width allows for only 8bits so it is a bit confusing, let's fix that	2024-12-05 13:14:21 +01:00
Aurelien DARRAGON	9f44c5f9be	CLEANUP: stktable: replace nopurge attribute with flag Thanks to previous commit stktable struct now have a "flags" struct member Let's take this opportunity to remove the isolated "nopurge" attribute in stktable struct and rely on a flag named STK_FL_NOPURGE instead. This helps to better organize stktable struct members.	2024-12-05 12:15:31 +01:00
Aurelien DARRAGON	1f73d3524d	MINOR: stktable: implement "recv-only" table option When "recv-only" keyword is added on a stick table declaration (in peers or proxy section), haproxy considers that the table is only used for data retrieval from a remote location and not used to perform local updates. As such, it enables the retrieval of local-only values such as conn_cur that are ignored by default. This can be useful in some contexts where we want to know about local-values such are conn_cur from a remote peer. To do this, add stktable struct flags which default to NONE and enable the RECV_ONLY flag on the table then "recv-only" keyword is found in the table declaration. Then, when in peer_treat_updatemsg(), when handling table updates, don't ignore data updates for local-only values if the flag is set.	2024-12-05 12:15:24 +01:00
Willy Tarreau	e6f4f15929	MINOR: tasklet: set TASK_WOKEN_OTHER on tasklets by default Now when tasklets are woken up via tasklet_wakeup(), tasklet_wakeup_on() or tasklet_wakeup_after(), either the optional wakeup flags will be used, or TASK_WOKEN_OTHER will be used. This allows tasklet handlers waking up for any given cause to notice whether or not they were also woken for another reason. For example, a mux handler could skip heavy parts when seeing that TASK_WOKEN_OTHER is absent, proving that no standard tasklet_wakeup() was done, for example in response to a subscribe(). The benefit of the TASK_WOKEN_* flags is that they're purged during the wakeup, and that they're easy to check for using TASK_WOKEN_ANY. TASK_F_UEVT1 and TASK_F_UEVT2 are also usable for private use (e.g. wakeup from a stream to a connection inside a mux). Probably that in the future, code dealing with subscribe events should start to place TASK_WOKEN_IO like is done for upper layers.	2024-12-03 19:45:08 +01:00
Willy Tarreau	6322c9fbbf	MINOR: tools: add a new macro DEFVAL() to provide a default argument This is like DEFZERO and DEFNULL, but this one allows to specify the default value to be used as the first argument.	2024-12-03 19:45:08 +01:00
Valentine Krasnobaeva	a33977da48	BUG/MINOR: startup: close pidfd and free global.pidfile in handle_pidfile() After master-worker mode refactoring, global.pidfile is only used in handle_pidfile(), which opens the provided file and writes the PID into it. So, it's more appropriate to perform the close(pidfd) and ha_free(&global.pidfile) also in this function. This commit prepares the fix of the pidfile creation, as it's created now very early, when we are not sure, that process has successfully started. In master-worker mode handle_pidfile() can be called in the master process context. So, let's make it accessible from other compilation units via global.h. This should be backported only in 3.1.	2024-12-02 17:28:04 +01:00
Aurelien DARRAGON	8bce7ff854	MINOR: hlua_fcn: add Patref:commit() method commit() method may be used to commit pending updates on the local patref object: hlua_patref flags were added: HLUA_PATREF_FL_GEN means the patref object has been updated and it is associated to a new revision (curr_gen) in order to prepare and commit the pending updates. upon commit, the pattern API is leveraged with curr_gen as revision to commit new object items. Once commit is performed, previous (pending) revisions that are older than the committed one are cleaned up (similar to what's done with commit on the cli). Also, Patref function APIs now take into account curr_gen to perform lookups.	2024-11-29 07:23:08 +01:00
Aurelien DARRAGON	e769d8f426	MINOR: pattern: add pat_ref_may_commit() helper function pat_ref_may_commit() may be used to know if a given generation ID id still valid, which means it may still be committed at some point. Else it means that another pending generation ID older than the tested one was already committed and thus other generations ID below this one are stale and must be regenerated.	2024-11-29 07:23:01 +01:00
Aurelien DARRAGON	43ab25f007	MINOR: hlua_fcn: wrap pat_ref struct for patref class In order to extend the patref class features, let's wrap the pat_ref struct into hlua_patref struct. This way we may add additional data alongside the pat_ref pointer to store additional context required for pat_ref data manipulation from lua. Since the wrapper (hlua_patref) is an allocated object, we declare the _gc metamethod for patref class in order to properly cleanup resources when they are out of scope.	2024-11-29 07:22:54 +01:00
Aurelien DARRAGON	2021072391	MINOR: hlua_fcn: implement index and pair metamethods for patref class patref object may now leverage index and pair methamethods to list and access patref elements at a specific index (=key) Also, patref:is_map() method may be used to know if the patref stores acl (key only) or map-style (key:value) patterns.	2024-11-29 07:22:46 +01:00
Aurelien DARRAGON	956a25cf60	MINOR: hlua: add patref class Implement patref class to expose pat_ref struct internal pattern struct in lua. This is some prerequisite work needed to be able to manipulate exisiting generic pattern object lists (acl/map) from Lua, because the Map class can only be used to perform matching ops on Map files.	2024-11-29 07:22:32 +01:00
Aurelien DARRAGON	f72a66eef2	MINOR: pattern: publish event_hdl events on pat_ref updates Now that PAT_REF events were defined in previous commit, let's actually publish them from pattern API where relevant. Unlike server events, pattern reference events are only published in the pat_ref subscriber's list on purpose, because in some setups patref updates (updates performed on a map for instance from action or cli) are very frequent, and we don't want to impact pattern API performance just for that. Moreover, as the main use case is to be able to subscribe to maps updates from Lua, allowing a per-pattern reference registration is already enough. No additional data is provided for such events (also for performance reason) Care was taken not to publish events when the update doesn't affect the live subset (the one targeted by curr_gen).	2024-11-29 07:22:25 +01:00
Aurelien DARRAGON	f7267bd315	MINOR: event_hdl: add PAT_REF events This is some prerequisite work for implementing PAT_REF events. In this commit we define the PAT_REF event_hdl family (which gets family slot id #2), with the following supported events: - EVENT_HDL_SUB_PAT_REF_ADD: element was added to the current version of the pattern ref - EVENT_HDL_SUB_PAT_REF_DEL: element was deleted from the current version of the pattern ref - EVENT_HDL_SUB_PAT_REF_SET: element was modified in the current version of the pattern ref - EVENT_HDL_SUB_PAT_REF_COMMIT: pending element(s) was/were commited in the current version of the pattern ref - EVENT_HDL_SUB_PAT_REF_CLEAR: all elements were cleared from the current version of the pattern ref The goal is to be able to track a pat_ref struct in order to be notified when it is updated. For performance reasons, events from this family won't provide any additional info, and will only be published in the pat_ref subscription list. Indeed, pat_ref may be updated at a relatively high frequency (or worse, batch work), so we cannot afford doing expensive treatment for each update.	2024-11-29 07:22:18 +01:00
Frederic Lecaille	f8b697c19b	BUG/MINOR: improve BBR throughput on very fast links This patch fixes the loss of information when computing the delivery rate (quic_cc_drs.c) on links with very low latency due to usage of 32bits variables with the millisecond as precision. Initialize the quic_conn task with TASK_F_WANTS_TIME flag ask it to ask the scheduler to update the call date of this task. This allows this task to get a nanosecond resolution on the call date calling task_mono_time(). This is enabled only for congestion control algorithms with delivery rate estimation support (BBR only at this time). Store the send date with nanosecond precision of each TX packet into ->time_sent_ns new quic_tx_packet struct member to store the date a packet was sent in nanoseconds thanks to task_mono_time(). Make use of this new timestamp by the delivery rate estimation algorithm (quic_cc_drs.c). Rename current ->time_sent member from quic_tx_packet struct to ->time_sent_ms to distinguish the unit used by this variable (millisecond) and update the code which uses this variable. The logic found in quic_loss.c is not modified at all. Must be backported to 3.1.	2024-11-28 21:39:05 +01:00
Christopher Faulet	bc66d31985	MINOR: proxy: Add support of 421-Misdirected-Request in retry-on status The "421" status can now be specified on retry-on directives. PR_RE_* flags were updated to remains sorted. This patch should fix the issue #2794. It is quite simple so it may safely be backported to 3.1 if necessary.	2024-11-28 11:47:40 +01:00
Willy Tarreau	97d33abb23	MINOR: version: this is development again (3.2) This basically reverts commit `b629f366a7` ("MINOR: version: mention that 3.1 is stable now").	2024-11-26 17:21:16 +01:00
Aurelien DARRAGON	4792f27892	MINOR: pattern: add pat_ref_gen_delete() function pat_ref_gen_delete(ref, gen_id, key) tries to delete all samples belonging to <gen_id> and matching <key> under <ref> The goal is to be able to target a single subset from <ref>	2024-11-26 16:12:21 +01:00
Aurelien DARRAGON	a131c542a6	MINOR: pattern: add pat_ref_gen_find_elt() function pat_ref_gen_find_elt(ref, gen_id, key) tries to find <elt> element belonging to <gen_id> and matching <key> in <ref> reference. The goal is to be able to target a single subset from <ref>	2024-11-26 16:12:16 +01:00
Aurelien DARRAGON	c9d6af3c6d	MINOR: pattern: add pat_ref_gen_set() function pat_ref_gen_set(ref, gen_id, value, err) modifies to <value> the sample of all patterns matching <key> and belonging to <gen_id> (generation id) under <ref> The goal is to be able to target a single subset from <ref>	2024-11-26 16:12:11 +01:00
Aurelien DARRAGON	3d250b3be8	MINOR: pattern: split pat_ref_set() split pat_ref_set() function in 2 distinct functions. Indeed, since `0844bed7d3` ("MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)"), pat_ref_set() prototype was updated to include an extra <elt> argument. But the logic behind is not explicit because the function will not only try to set <elt>, but also its duplicate (unlike pat_ref_set_elt() which only tries to update <elt>). Thus, to make it clearer and better distinguish between the key-based lookup version and the elt-based one, restotre pat_ref_set() previous prototype and add a dedicated pat_ref_set_elt_duplicate() that takes <elt> as argument and tries to update <elt> and all duplicates.	2024-11-26 16:12:05 +01:00
Willy Tarreau	4d58f521ee	[RELEASE] Released version 3.2-dev0 Released version 3.2-dev0 with the following main changes : - exact copy of 3.1.0	2024-11-26 15:33:57 +01:00
Christopher Faulet	b629f366a7	MINOR: version: mention that 3.1 is stable now This version will be maintained up to around Q1 2026. The INSTALL file also mentions it.	2024-11-26 15:23:54 +01:00
Amaury Denoyelle	2fffd85b97	BUG/MEDIUM: quic: prevent EMSGSIZE with GSO for larger bufsize A UDP datagram cannot be greater than 65535 bytes, as UDP length header field is encoded on 2 bytes. As such, sendmsg() will reject a bigger input with error EMSGSIZE. By default, this does not cause any issue as QUIC datagrams are limited to 1.252 bytes and sent individually. However, with GSO support, value bigger than 1.252 bytes are specified on sendmsg(). If using a bufsize equal to or greater than 65535, syscall could reject the input buffer with EMSGSIZE. As this value is not expected, the connection is immediately closed by haproxy and the transfer is interrupted. This bug can easily reproduced by requesting a large object on loopback interface and using a bufsize of 65535 bytes. In fact, the limit is slightly less than 65535, as extra room is also needed for IP + UDP headers. Fix this by reducing the count of datagrams encoded in a single GSO invokation via qc_prep_pkts(). Previously, it was set to 64 as specified by man 7 udp. However, with 1252 datagrams, this is still too many. Reduce it to a value of 52. Input to sendmsg will thus be restricted to at most 65.104 bytes if last datagram is full. If there is still data available for encoding in qc_prep_pkts(), they will be written in a separate batch of datagrams. qc_send_ppkts() will then loop over the whole QUIC Tx buffer and call sendmsg() for each series of at most 52 datagrams. This does not need to be backported.	2024-11-26 11:49:30 +01:00
Valentine Krasnobaeva	3500865bc1	REORG: startup: move mworker_apply_master_worker_mode in mworker.c mworker_apply_master_worker_mode() is called only in master-worker mode, so let's move it mworker.c	2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva	dee247c14e	REORG: startup: move mworker_reexec and mworker_reload in mworker.c Let's move mworker_reexec() and mworker_reload() in mworker.c. mworker_reload() is called only within the functions, which are already in mworker.c. So, this reorganization allows to declare mworker_reload() as a static.	2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva	0c7b93eb1d	REORG: startup: move mworker_run_master and mworker_loop in mworker.c mworker_run_master() is called only in master mode. mworker_loop() is static and called only in mworker_run_master(). So let's move these both functions in mworker.c. We also need here to make run_thread_poll_loop() accessible from other units, as it's used in mworker_loop().	2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva	7974089ac6	REORG: startup: move mworker_prepare_master in mworker.c mworker_prepare_master() performs some preparation routines for the new worker process, which will be forked during the startup. It's called only in master-worker mode, so let's move it in mworker.c.	2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva	af642420b4	REORG: startup: move on_new_child_failure in mworker.c mworker_on_new_child_failure() performs some routines for the worker process, if it has failed the reload. As it's called only in mworker_catch_sigchld() from mworker.c, let's move mworker_on_new_child_failure() in mworker.c as well. Like this it could also be declared as a static.	2024-11-25 15:20:24 +01:00
Valentine Krasnobaeva	321c021a83	MINOR: startup: rename on_new_child_failure to mworker_on_new_child_failure This patch prepares the moving of on_new_child_failure definition into mworker.c. So, let's rename it accordingly and let's also update its description.	2024-11-25 15:20:24 +01:00
Amaury Denoyelle	22bd92a87f	MINOR: mux-quic: use sched call time for pacing QUIC pacing was recently implemented to limit burst and improve overall bandwidth. This is used only for MUX STREAM emission. Pacing requires nanosecond resolution. As such, it used now_cpu_time() which relies on clock_gettime() syscall. The usage of clock_gettime() has several drawbacks : * it is a syscall and thus requires a context-switch which may hurt performance * it is not be available on all systems * timestamp is retrieved multiple times during a single task execution, thus yielding different values which may tamper pacing calculation Improve this by using task_mono_time() instead. This returns task call time from the scheduler thread context. It requires the flag TASK_F_WANTS_TIME on QUIC MUX tasklet to force the scheduler to update call time with now_mono_time(). This solves every limitations listed above : * syscall invokation is only performed once before tasklet execution, thus reducing context-switch impact * on non compatible system, a millisecond timer is used as a fallback which should ensure that pacing works decently for them * timer value is now guaranteed to be fixed duing task execution	2024-11-25 11:21:45 +01:00
Willy Tarreau	670507a66e	MINOR: tools: add a new function "resolve_dso_name" to find a symbol's DSO In the memprofile summary per DSO, we currently have to pay a high price by calling dladdr() on each symbol when doing the summary per DSO at the end, while we're not interested in these details, we just want the DSO name which can be made cheaper to obtain, and easier to manipulate. So let's create resolve_dso_name() to only extract minimal information from an address. At the moment it still uses dladdr() though it avoids all the extra expensive work, and will further be able to leverage the same mechanism as "show libs" to instantly spot DSO from address ranges.	2024-11-21 19:58:06 +01:00
Willy Tarreau	5ddc8b3ad4	MINOR: activity/memprofile: monitor non-portable calls as well Some dependencies might very well rely on posix_memalign(), strndup() or other less portable callsn making us miss them when chasing memory leaks, resulting in negative global allocation counters. Let's provide the handlers for the following functions: strndup() // _POSIX_C_SOURCE >= 200809L \|\| glibc >= 2.10 valloc() // _BSD_SOURCE \|\| _XOPEN_SOURCE>=500 \|\| glibc >= 2.12 aligned_alloc() // _ISOC11_SOURCE posix_memalign() // _POSIX_C_SOURCE >= 200112L memalign() // obsolete pvalloc() // obsolete This time we don't fail if they're not found, we just silently forward the calls.	2024-11-21 19:58:06 +01:00
Willy Tarreau	33c0ce299d	MINOR: activity/memprofile: also monitor strdup() activity Some memory profiling outputs have showed negative counters, very likely due to some libs calling strdup(). Let's add it to the list of monitored activities. Actually even haproxy itself uses some. Having "profiling.memory on" in the config reveals 35 call places.	2024-11-21 19:58:06 +01:00
Willy Tarreau	623a2c4e19	CLEANUP: activity: better use a mask to tests freeing methods In "show profiling memory", we need to distinguish methods which really free memory from those which do not so that we don't account for the free value twice. However for now it's done using multiple tests, which are going to complicate the addition of new methods. Let's switch to a bit field defined as a mask in a single place instead, as we don't intend to use more than 32/64 methods!	2024-11-21 19:58:06 +01:00
Willy Tarreau	859341c1ec	MINOR: activity/memprofile: offer a function to unregister stale info There's actually a problem with memprofiles: the pool pointer is stored in ->info but some pools are replaced during startup, such as the trash pool, leaving a dangling pointer there. Let's complete the API with a new function memprof_remove_stale_info() that will remove all stale references to this info pointer. It's also present when USE_MEMORY_PROFILING is not set so as to ease the job on callers.	2024-11-21 19:58:06 +01:00
Valentine Krasnobaeva	bfe0f9d02d	MINOR: startup: use global progname variable Let's store progname in the global variable, as it is handy to use it in different parts of code to format messages sent to stdout. This reduces the number of arguments, which we should pass to some functions.	2024-11-21 19:55:21 +01:00
Valentine Krasnobaeva	ef154a49e1	MINOR: capabilities: rename program_name argument to progname This commit prepares the usage of the global progname variable. prepare_caps_from_permitted_set() use progname value in warning messages. So, let's rename program_name argument to progname.	2024-11-21 19:55:21 +01:00
Frederic Lecaille	01fcbd6c08	BUG/MINOR: quic: Missing application limitations tracking for BBR The ->app_limited member of the delivery rate struct (quic_cc_drs) aim is to store the index of the last transmitted byte marked as application-limited so that to track the application-limited phases. During these phases, BBR must ignore delivery rate samples to properly estimate the delivery rate. Without such a patch, the Startup phase could be exited very quickly with a very low estimated bottleneck bandwidth. This had a very bad impact on little objects with download times smaller than the expected Startup phase duration. For such objects, with enough bandwith, BBR should stay in the Startup state. No need to be backported, as BBR is implemented in the current developement version.	2024-11-21 19:23:53 +01:00
Amaury Denoyelle	95d3edd68f	MINOR: quic: support pacing for newreno and nocc Extend extra pacing support for newreno and nocc congestion algorithms, as with cubic. For better extensibility of cc algo definition, define a new flags field in quic_cc_algo structure. For now, the only value is QUIC_CC_ALGO_FL_OPT_PACING which is set if pacing support can be optionally activated. Both cubic, newreno and nocc now supports this. This new flag is then reused by QUIC config parser. If set, extra quic-cc-algo burst parameter is taken into account. If positive, this will activate pacing support on top of the congestion algorithm. As with cubic previously, pacing is only supported if running under experimental mode. Only BBR is not flagged with this new value as pacing is directly builtin in the algorithm and cannot be turn off. Furthermore, BBR calculates automatically its value for maximum burst. As such, any quic-cc-algo burst argument used with BBR is still ignored with a warning.	2024-11-21 11:33:44 +01:00
Christopher Faulet	e58a30d369	MINOR: cfgparse: Emit a warning for misplaced "tcp-response content" rules When a "tcp-response content" rule is placed after a "http-response" rule, a warning is now emitted, just like for rules applied on the requests.	2024-11-21 09:55:04 +01:00
Christopher Faulet	5dcd3b0d99	CLEANUP: cfgparse: Add direction in functions name that warn on misplaced rules This only concerns functions emitting warnings about misplaced tcp-request rules. The direction is now specified in the functions name. For instance "warnif_misplaced_tcp_conn" is replaced by "warnif_misplaced_tcp_req_conn".	2024-11-21 09:51:37 +01:00
Christopher Faulet	7710580428	MINOR: config: Improve warnings on misplaced rules by adding an optional arg In warnings about misplaced rules, only the first keyword is mentionned. It works well for http-request or quic-initial rules for instance. But it is a bit confusing for tcp-request rules, because the layer is missing (session or content). To make it a bit systematic (and genric), the second argument can now be provided. It can be set to NULL if there is no layer or scope. But otherwise, it may be specified and it will be reported in the warning. So the following snippet: tcp-request content reject if FALSE tcp-request session reject if FALSE tcp-request connection reject if FALSE Will now emit the following warnings: a 'tcp-request session' rule placed after a 'tcp-request content' rule will still be processed before. a 'tcp-request connection' rule placed after a 'tcp-request session' rule will still be processed before. This patch should fix the issue #2596.	2024-11-21 09:28:42 +01:00
Frederic Lecaille	d85eb127e9	MINOR: quic: quic_loss modifications to support BBR qc_packet_loss_lookup() aim is to detect the packet losses. This is this function which must called ->on_pkt_lost() BBR specific callback. It also set <bytes_lost> passed parameter to the total number of bytes detected as lost upon an ACK frame receipt for its caller. Modify qc_release_lost_pkts() to call ->congestion_event() with the send time from the newest packet detected as lost. Modify qc_release_lost_pkts() to call ->slow_start() callback only if define by the congestion control algorithm. This is not the case for BBR.	2024-11-20 17:34:22 +01:00
Frederic Lecaille	af75665cb7	MINOR: quic: quic_cc modifications to support BBR Add several callbacks to quic_cc_algo struct which are only called by BBR. ->get_drs() may be used to retrieve the delivery rate sampling information from an congestion algorithm struct (quic_cc). ->on_transmit() must be called before sending any packet a QUIC sender. ->on_ack_rcvd() must be called after having received an ACK. ->on_pkt_lost() must be called after having detected a packet loss. ->congestion_event() must be called after any congestion event detection Modify quic_cc.c to call ->event only if defined. This is not the case for BBR.	2024-11-20 17:34:22 +01:00
Frederic Lecaille	d04adf44dc	MINOR: quic: implement BBR congestion control algorithm for QUIC Implement the version 3 of BBR for QUIC specified by the IETF in this draft: https://datatracker.ietf.org/doc/draft-ietf-ccwg-bbr/ Here is an extract from the Abstract part to sum up the the capabilities of BBR: BBR ("Bottleneck Bandwidth and Round-trip propagation time") uses recent measurements of a transport connection's delivery rate, round-trip time, and packet loss rate to build an explicit model of the network path. BBR then uses this model to control both how fast it sends data and the maximum volume of data it allows in flight in the network at any time. Relative to loss-based congestion control algorithms such as Reno [RFC5681] or CUBIC [RFC9438], BBR offers substantially higher throughput for bottlenecks with shallow buffers or random losses, and substantially lower queueing delays for bottlenecks with deep buffers (avoiding "bufferbloat"). BBR can be implemented in any transport protocol that supports packet-delivery acknowledgment. Thus far, open source implementations are available for TCP [RFC9293] and QUIC [RFC9000]. In haproxy, this implementation is considered as still experimental. It depends on the newly implemented pacing feature. BBR was asked in GH #2516 by @KazuyaKanemura, @osevan and @kennyZ96.	2024-11-20 17:34:22 +01:00
Frederic Lecaille	472d575950	MINOR: quic: implement delivery rate sampling algorithm This patch implements an algorithm which may be used by congestion algorithms for QUIC to estimate the current delivery rate of a sender. It is at least used by BBR and could be used by others congestion algorithms as cubic. This algorithm was specified by an RFC draft here: https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation before being merged into BBR v3 here: https://datatracker.ietf.org/doc/html/draft-cardwell-ccwg-bbr#section-4.5.2.2	2024-11-20 17:34:22 +01:00
Frederic Lecaille	c08b877657	MINOR: window_filter: Implement windowed filter (only max) Implement the Kathleen Nichols' algorithm used by several congestion control algorithm implementation (TCP/BBR in Linux kernel, QUIC/BBR in quiche) to track the maximum value of a data type during a fixe time interval. In this implementation, counters which are periodically reset are used in place of timestamps. Only the max part has been implemented. (see lib/minmax.c implemenation for Linux kernel).	2024-11-20 17:34:22 +01:00
Frederic Lecaille	7bbe8828ba	MINOR: quic: Add the congestion window initial value to QUIC path Add ->initial_wnd new member to quic_cc_path struct to keep the initial value of the congestion window. This member is initialized as soon as a QUIC connection is allocated. This modification is required for BBR congestion control algorithm.	2024-11-20 17:34:22 +01:00
Willy Tarreau	1171a23aec	BUILD: makefile: make ERR apply to build options as well Once in a while we find some makefiles ignoring some outdated arguments and just emit a warning. What's annoying is that if users (say, distro packagers), have purposely added ERR=1 to their build scripts to make sure to fail on any warning, these ones will be ignored and the build can continue with invalid or missing options. William rightfully suggested that ERR=1 should also catch make's warnings so this patch implements this, by creating a new "complain" variable that points either to "error" or "warning" depending on $(ERR), and that is used to send the messages using $(call $(complain),...). This does the job right at little effort (tested from GNU make 3.82 to 4.3). Note that for this purpose the ERR declaration was upped in the makefile so that it appears before the new errors.mk file is included.	2024-11-20 14:58:35 +01:00
Willy Tarreau	12fcd65468	MINOR: tasklet: support an optional set of wakeup flags to tasklet_wakeup_on() tasklet_wakeup_on() and its derivates (tasklet_wakeup_after() and tasklet_wakeup()) do not support passing a wakeup cause like task_wakeup(). This is essentially due to an API limitation cause by the fact that for a very long time the only reason for waking up was to process pending I/O. But with the growing complexity of mux tasks, it is becoming important to be able to skip certain heavy processing when not strictly needed. One possibility is to permit the caller of tasklet_wakeup() to pass flags like task_wakeup(). Instead of going with a complex naming scheme, let's simply make the flags optional and be zero when not specified. This means that tasklet_wakeup_on() now takes either 2 or 3 args, and that the third one is the optional flags to be passed to the callee. Eligible flags are essentially the non-persistent ones (TASK_F_UEVT* and TASK_WOKEN_*) which are cleared when the tasklet is executed. This way the handler will find them in its <state> argument and will be able to distinguish various causes for the call.	2024-11-19 20:13:41 +01:00
Willy Tarreau	0334cb28a9	MINOR: tasklet: make the low-level tasklet API take a flag Everything in the tasklet layer supports flags, except that they are just not implemented in the wakeup functions, while they are in the task_wakeup functions. Initially it was not considered useful to pass wakeup causes because these were essentially I/O, but with the growing number of I/O handlers having to deal with various types of operations (typically cheap I/O notifications on subscribe vs heavy parsing on application-level wakeups), it would be nice to start to make this distinction possible. This commit extends _tasklet_wakeup_on() and _tasklet_wakeup_after() to pass a set of flags that continues to be set as zero. For now this changes nothing, but new functions will come.	2024-11-19 20:13:41 +01:00
Willy Tarreau	e57581d76d	MINOR: tools: add new macro DEFZERO to provide a default zero argument This is the equivalent of DEFNULL except that it sets a zero value instead of a NULL for a missing argument.	2024-11-19 20:13:41 +01:00
Willy Tarreau	c5052bad8a	MINOR: sched: add TASK_F_WANTS_TIME to make the scheduler update the call date Currently tasks being profiled have th_ctx->sched_call_date set to the current nanosecond in monotonic time. But there's no other way to have this, despite the scheduler being capable of it. Let's just declare a new task flag, TASK_F_WANTS_TIME, that makes the scheduler take the time just before calling the handler. This way, a task that needs nanosecond resolution on the call date will be able to be called with an up-to-date date without having to abuse now_mono_time() if not needed. In addition, if CLOCK_MONOTONIC is not supported (now_mono_time() always returns 0), the date is set to the most recently known now_ns, which is guaranteed to be atomic and is only updated once per poll loop. This date can be more conveniently retrieved using task_mono_time(). This can be useful, e.g. for pacing. The code was slightly adjusted so as to merge the common parts between the profiling case and this one.	2024-11-19 20:13:41 +01:00
Willy Tarreau	12969c1b17	MINOR: tinfo/clock: turn sched_call_date to 64-bits We used to store it in 32-bits since we'd only use it for latency and CPU usage calculation but usages will evolve so let's not truncate the value anymore. Now we store the full 64 bits. Note that this doesn't even increase the storage size due to alignment. The 3 usage places were verified to still be valid (most were already cast to 32 bits anyway).	2024-11-19 20:13:41 +01:00
Willy Tarreau	973c81ceec	CLEANUP: tinfo: move sched__date/_mono_time to the thread-local area These ones are never atomically accessed, they have nothing to do in the atomic ops cache line, let's move them to the thread-local area.	2024-11-19 20:13:41 +01:00
Amaury Denoyelle	5a29fd6c61	MINOR: mux_quic/pacing: display pacing info on show quic To improve debugging, extend "show quic" output to report if pacing is activated on a connection. Two values will be displayed for pacing : * a new counter paced_sent_ctr is defined in QCC structure. It will be incremented each time an emission is interrupted due to pacing. * pacing engine now saves the number of datagrams sent in the last paced emission. This will be helpful to ensure burst parameter is valid.	2024-11-19 16:21:05 +01:00
Amaury Denoyelle	24cea66e07	MEDIUM: quic: define cubic-pacing congestion algorithm Define a new QUIC congestion algorithm token 'cubic-pacing' for quic-cc-algo bind keyword. This is identical to default cubic implementation, except that pacing is used for STREAM frames emission. This algorithm supports an extra argument to specify a burst size. This is stored into a new bind_conf member named quic_pacing_burst which can be reuse to initialize quic path. Pacing support is still considered experimental. As such, 'cubic-pacing' can only be used with expose-experimental-directives set.	2024-11-19 16:20:58 +01:00
Amaury Denoyelle	796446a15e	MAJOR: mux-quic: support pacing emission Support pacing emission for STREAM frames at the QUIC MUX layer. This is implemented by adding a quic_pacer engine into QCC structure. The main changes have been written into qcc_io_send(). It now differentiates cases when some frames have been rejected by transport layer. This can occur as previously due to congestion or FD buffer full, which requires subscribing on transport layer. The new case is when emission has been interrupted due to pacing timing. In this case, QUIC MUX I/O tasklet is rescheduled to run with the flag TASK_F_USR1. On tasklet execution, if TASK_F_USR1 is set, all standard processing for emission and reception is skipped. Instead, a new function qcc_purge_sending() is called. Its purpose is to retry emission with the saved STREAM frames list. Either all remaining frames can now be send, subscribe is done on transport error or tasklet must be rescheduled for pacing purging. In the meantime, if tasklet is rescheduled due to other conditions, TASK_F_USR1 is reset. This will trigger a full regeneration of STREAM frames. In this case, pacing expiration must be check before calling qcc_send_frames() to ensure emission is now allowed.	2024-11-19 16:16:48 +01:00
Amaury Denoyelle	ede4cd4c2e	MINOR: mux-quic: encapsulate QCC tasklet wakeup QUIC MUX will be responsible to drive emission with pacing. This will be implemented via setting TASK_F_USR1 before I/O tasklet wakeup. To prepare this, encapsulate each I/O tasklet wakeup into a new function qcc_wakeup(). This commit is purely refactoring prior to pacing implementation into QUIC MUX.	2024-11-19 16:16:48 +01:00
Amaury Denoyelle	4a94a018f0	MINOR: mux-quic: define a tx STREAM frame list member For STREAM emission, MUX QUIC previously used a local list defined under qcc_io_send(). This was suitable as either all frames were sent, or emission must be interrupted due to transport congestion or fatal error. In the latter case, the list was emptied anyway and a new frame list was built on future qcc_io_send() invokation. For pacing, MUX QUIC may have to save the frame list if pacing should be applied across emission. This is necessary to avoid to unnecessarily rebuilt stream frame list between each paced emission. To support this, STREAM list is now stored as a member of QCC structure. Ensure frame list is always deleted, even on QCC release, using newly defined utility function qcc_tx_frms_free().	2024-11-19 16:16:48 +01:00
Amaury Denoyelle	886a7c475c	MINOR: quic/pacing: add burst support qc_send_mux() has been extended previously to support pacing emission. This will ensure that no more than one datagram will be emitted during each invokation. However, to achieve better performance, it may be necessary to emit a batch of several datagrams one one turn. A so-called burst value can be specified by the user in the configuration. However, some congestion control algos may defined their owned dynamic value. As such, a new CC callback pacing_burst is defined. quic_cc_default_pacing_burst() can be used for algo without pacing interaction, such as cubic. It will returns a static value based on user selected configuration.	2024-11-19 16:16:48 +01:00

... 6 7 8 9 10 ...

8742 commits