haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-05-28 04:12:17 -04:00

Author	SHA1	Message	Date
Christopher Faulet	044ef9b3d6	CLEANUP: Slightly reorder some proxy option flags to free slots PR_O_TCPCHK_SSL and PR_O_CONTSTATS were shifted to free a slot. The idea is to have 2 contiguous slots to be able to insert two new options.	2025-04-22 16:14:46 +02:00
Amaury Denoyelle	4309a6fbf8	BUG/MINOR: quic: do not crash on CRYPTO ncbuf alloc failure To handle out-of-order received CRYPTO frames, a ncbuf instance is allocated. This is done via the helper quic_get_ncbuf(). Buffer allocation was improperly checked. In case b_alloc() fails, it crashes due to a BUG_ON(). Fix this by removing it. The function now returns NULL on allocation failure, which is already properly handled in its caller qc_handle_crypto_frm(). This should fix the last reported crash from github issue #2935. This must be backported up to 2.6.	2025-04-18 18:11:17 +02:00
Olivier Houchard	3758eab71c	MEDIUM: lb_fwrr: Use one ebtree per thread group. When using the round-robin load balancer, the major source of contention is the lbprm lock, that has to be held every time we pick a server. To mitigate that, make it so there is one tree per thread-group, and one lock per thread-group. That means we now have a lb_fwrr_per_tgrp structure that will contain the two lb_fwrr_groups (active and backup) as well as the lock to protect them in the per-thread lbprm struct, and all fields in the struct server are now moved to the per-thread structure too. Those changes are mostly mechanical, and bring a good performance improvement: on a 64-core AMD CPU, with 64 servers configured, we could process about 620000 requests per second, and we now can process around 1400000 requests per second.	2025-04-17 17:38:23 +02:00
Olivier Houchard	f36f6cfd26	MINOR: proxies: Add a per-thread group lbprm struct. Add a new structure in the per-thread groups proxy structure, that will contain whatever is per-thread group in lbprm. It will be accessed as p->per_tgrp[tgid].lbprm.	2025-04-17 17:38:23 +02:00
Olivier Houchard	7ca1c94ff0	MINOR: lb_fwrr: Move the next weight out of fwrr_group. Move the "next_weight" outside of fwrr_group, and inside struct lb_fwrr directly, one for the active servers, one for the backup servers. We will soon have one fwrr_group per thread group, but next_weight will be global to all of them.	2025-04-17 17:38:23 +02:00
Olivier Houchard	444125a764	MINOR: servers: Provide a pointer to the server in srv_per_tgroup. Add a pointer to the server into the struct srv_per_tgroup, so that if we only have access to that srv_per_tgroup, we can come back to the corresponding server.	2025-04-17 17:38:23 +02:00
Willy Tarreau	36ec70c526	MINOR: sched: add a new function is_sched_alive() to report scheduler's health This verifies that the scheduler is still ticking without having to access the activity[] array nor keeping local copies of the ctxsw counter. It just tests and sets a flag that is reset after each return from a ->process() function.	2025-04-17 16:25:47 +02:00
Willy Tarreau	874ba2afed	CLEANUP: debug: no longer set nor use TH_FL_DUMPING_OTHERS TH_FL_DUMPING_OTHERS was being used to try to perform exclusion between threads running "show threads" and those producing warnings. Now that it is much more cleanly handled, we don't need that type of protection anymore, which was adding to the complexity of the solution. Let's just get rid of it.	2025-04-17 16:25:47 +02:00
Willy Tarreau	c16d5415a8	MINOR: debug: make ha_stuck_warning() only work for the current thread Since we no longer call it with a foreign thread, let's simplify its code and get rid of the special cases that were relying on ha_thread_dump_fill() and synchronization with a remote thread. We're now only dumping the current thread so ha_thread_dump_one() is sufficient.	2025-04-17 16:25:47 +02:00
Willy Tarreau	b24d7f248e	MINOR: pass a valid buffer pointer to ha_thread_dump_one() The goal is to let the caller deal with the pointer so that the function only has to fill that buffer without worrying about locking. This way, synchronous dumps from "show threads" are produced and emitted directly without causing undesired locking of the buffer nor risking causing confusion about thread_dump_buffer containing bits from an interrupted dump in progress. It's only the caller that's responsible for notifying the requester of the end of the dump by setting bit 0 of the pointer if needed (i.e. it's only done in the debug handler).	2025-04-17 16:25:47 +02:00
Willy Tarreau	5ac739cd0c	MINOR: debug: remove unused case of thr!=tid in ha_thread_dump_one() This function was initially designed to dump any thread into the presented buffer, but the way it currently works is that it's always called for the current thread, and uses the distinction between coming from a sighandler or being called directly to detect which thread is the caller. Let's simplify all this by replacing thr with tid everywhere, and using the thread-local pointers where it makes sense (e.g. th_ctx, th_ctx etc). The confusing "from_signal" argument is now replaced with "is_caller" which clearly states whether or not the caller declares being the one asking for the dump (the logic is inverted, but there are only two call places with a constant).	2025-04-17 16:25:47 +02:00
Willy Tarreau	6d8a523d14	MINOR: tinfo: keep a copy of the pointer to the thread dump buffer Instead of using the thread dump buffer for post-mortem analysis, we'll keep a copy of the assigned pointer whenever it's used, even for warnings or "show threads". This will offer more opportunities to figure from a core what happened, and will give us more freedom regarding the value of the thread_dump_buffer itself. For example, even at the end of the dump when the pointer is reset, the last used buffer is now preserved.	2025-04-17 16:25:47 +02:00
Willy Tarreau	337017e2f9	BUG/MINOR: threads: set threads_idle and threads_harmless even with no threads Some signal handlers rely on these to decide about the level of detail to provide in dumps, so let's properly fill the info about entering/leaving idle. Note that for consistency with other tests we're using bitops with t->ltid_bit, while we could simply assign 0/1 to the fields. But it makes the code more readable and the whole difference is only 88 bytes on a 3MB executable. This bug is not important, and while older versions are likely affected as well, it's not worth taking the risk to backport this in case it would wake up an obscure bug.	2025-04-17 16:25:47 +02:00
Amaury Denoyelle	52246249ab	MEDIUM: listener/mux-h2: implement idle-ping on frontend side This commit is the counterpart of the previous one, adapted on the frontend side. "idle-ping" is added as keyword to bind lines, to be able to refresh client timeout of idle frontend connections. H2 MUX behavior remains similar as the previous patch. The only significant change is in h2c_update_timeout(), as idle-ping is now taken into account also for frontend connection. The calculated value is compared with http-request/http-keep-alive timeout value. The shorter delay is then used as expired date. As hr/ka timeout are based on idle_start, this allows to run them in parallel with an idle-ping timer.	2025-04-17 14:49:36 +02:00
Amaury Denoyelle	a78a04cfae	MEDIUM: server/mux-h2: implement idle-ping on backend side This commit implements support for idle-ping on the backend side. First, a new server keyword "idle-ping" is defined in configuration parsing. It is used to set the corresponding new server member. The second part of this commit implements idle-ping support on H2 MUX. A new inlined function conn_idle_ping() is defined to access connection idle-ping value. Two new connection flags are defined: H2_CF_IDL_PING and H2_CF_IDL_PING_SENT. The first one is set for idle connections via h2c_update_timeout(). On h2_timeout_task() handler, if first flag is set, instead of releasing the connection as before, the second flag is set and tasklet is scheduled. As both flags are now set, h2_process_mux() will proceed to PING emission. The timer has also been rearmed to the idle-ping value. If a PING ACK is received before next timeout, connection timer is refreshed. Else, the connection is released, as with timer expiration. Also of importance, special care is needed when a backend connection is going to idle. In this case, idle-ping timer must be rearmed. Thus a new invocation of h2c_update_timeout() is performed on h2_detach().	2025-04-17 14:49:36 +02:00
William Lallemand	e778049ffc	MINOR: acme: register the task in the ckch_store This patch registers the task in the ckch_store so we don't run 2 tasks at the same time for a given certificate. Move the task creation under the lock and check if there was already a task under the lock.	2025-04-16 17:12:43 +02:00
William Lallemand	c291a5c73c	BUILD: incompatible pointer type suspected with -DDEBUG_UNIT src/jws.c: In function '__jws_init': src/jws.c:594:38: error: passing argument 2 of 'hap_register_unittest' from incompatible pointer type [-Wincompatible-pointer-types] 594 | hap_register_unittest("jwk", jwk_debug); | ^~~~~~~~~ | | | int (*)(int, char **) In file included from include/haproxy/api.h:36, from include/import/ebtree.h:251, from include/import/ebmbtree.h:25, from include/haproxy/jwt-t.h:25, from src/jws.c:5: include/haproxy/init.h:37:52: note: expected 'int (*)(void)' but argument is of type 'int (*)(int, char **)' 37 | void hap_register_unittest(const char *name, int (*fct)()); | ~~~~~~^~~~~~ GCC 15 is warning because the function pointer does have its arguments in the register function. Should fix issue #2929.	2025-04-15 15:49:44 +02:00
Willy Tarreau	b708345c17	DEBUG: counters: add the ability to enable/disable updating the COUNT_IF counters These counters can have a noticeable cost on large machines, though not dramatic. There's no single good choice to keep them enabled or disabled. This commit adds multiple choices: - DEBUG_COUNTERS set to 2 will automatically enable them by default, while 1 will disable them by default - the global "debug.counters on/off" will allow to change the setting at boot, regardless of DEBUG_COUNTERS as long as it was at least 1. - the CLI "debug counters on/off" will also allow to change the value at run time, allowing to observe a phenomenon while it's happening, or to disable counters if it's suspected that their cost is too high Finally, the "debug counters" command will append "(stopped)" at the end of the CNT lines when these counters are stopped. Note that the whole mechanism would easily support being extended to all counter types by specifying the types to apply to, but it doesn't seem useful at all and would require the user to also type "cnt" on debug lines. This may easily be changed in the future if it's found relevant.	2025-04-14 19:02:13 +02:00
Willy Tarreau	a142adaba0	DEBUG: counters: make COUNT_IF() only appear at DEBUG_COUNTERS>=1 COUNT_IF() is convenient but can be heavy since some of them were found to trigger often (roughly 1 counter per request on avg). This might even have an impact on large setups due to the cost of a shared cache line bouncing between multiple cores. For now there's no way to disable it, so let's only enable it when DEBUG_COUNTERS is 1 or above. A future change will make it configurable.	2025-04-14 19:02:13 +02:00
Willy Tarreau	61d633a3ac	DEBUG: rename DEBUG_GLITCHES to DEBUG_COUNTERS and enable it by default Till now the per-line glitches counters were only enabled with the confusingly named DEBUG_GLITCHES (which would not turn glitches off when disabled). Let's instead change it to DEBUG_COUNTERS and make sure it's enabled by default (though it can still be disabled with -DDEBUG_GLITCHES=0 just like for DEBUG_STRICT). It will later be expanded to cover more counters.	2025-04-14 19:02:13 +02:00
William Lallemand	39c05cedff	BUILD: acme: enable the ACME feature when JWS is present The ACME feature depends on the JWS, which currently does not work with every SSL libraries. This patch only enables ACME when JWS is enabled.	2025-04-12 01:39:03 +02:00
William Lallemand	5500bda9eb	MINOR: acme: implement retrieval of the certificate Once the Order status is "valid", the certificate URL is accessible, this patch implements the retrieval of the certificate which is stored in ctx->store.	2025-04-12 01:39:03 +02:00
William Lallemand	27fff179fe	MINOR: acme: verify the order status once finalized This implements a call to the order status to check if the certificate is ready.	2025-04-12 01:39:03 +02:00
William Lallemand	680222b382	MINOR: acme: finalize by sending the CSR This patch does the finalize step of the ACME task. This encodes the CSR into base64 format and sends it to the finalize URL. https://www.rfc-editor.org/rfc/rfc8555#section-7.4	2025-04-12 01:29:27 +02:00
William Lallemand	de5dc31a0d	MINOR: acme: generate the CSR in a X509_REQ Generate the X509_REQ using the generated private key and the SAN from the configuration. This is only done once before the task is started. It could probably be done at the beginning of the task with the private key generation once we have a scheduler instead of a CLI command.	2025-04-12 01:29:27 +02:00
William Lallemand	00ba62df15	MINOR: acme: implement a check on the challenge status This patch implements a check on the challenge URL, once haproxy asked for the challenge to be verified, it must verify the status of the challenge resolution and if there weren't any error.	2025-04-12 01:29:27 +02:00
William Lallemand	711a13a4b4	MINOR: acme: send the request for challenge ready This patch sends the "{}" message to specify that a challenge is ready. It iterates on every challenge URL in the authorization list from the acme_ctx. This allows the ACME server to proceed to the challenge validation. https://www.rfc-editor.org/rfc/rfc8555#section-7.5.1	2025-04-12 01:29:27 +02:00
William Lallemand	ae0bc88f91	MINOR: acme: get the challenges object from the Auth URL This patch implements the retrieval of the challenges objects on the authorizations URLs. The challenges object contains a token and a challenge url that need to be called once the challenge is set up. Each authorization URL contains multiple challenge objects, usually one per challenge type (HTTP-01, DNS-01, ALPN-01...). We only need to keep the one that is relevant to our configuration.	2025-04-12 01:29:27 +02:00
William Lallemand	4842c5ea8c	MINOR: acme: newOrder request retrieve authorizations URLs This patch implements the newOrder action in the ACME task. In order to ask for a new certificate, a list of SAN is sent as a JWS payload. The ACME server replies with a list of Authorization URLs. One Authorization is created per SAN on an Order. The authorization URLs are stored in a linked list of 'struct acme_auth' in acme_ctx, so we can get the challenge URLs from them later. The location header is also stored as it is the URL of the order object. https://datatracker.ietf.org/doc/html/rfc8555#section-7.4	2025-04-12 01:29:27 +02:00
William Lallemand	04d393f661	MINOR: acme: generate new account The new account action in the ACME task uses the same function as the chkaccount, but onlyReturnExisting is not sent in this case!	2025-04-12 01:29:27 +02:00
William Lallemand	7f9bf4d5f7	MINOR: acme: check if the account exist This patch implements the retrieval of the KID (account identifier) using the pkey. A request is sent to the newAccount URL using the onlyReturnExisting option, which allows to get the kid of an existing account. acme_jws_payload() implements a way to generate a JWS payload using the nonce, pkey and provided URI.	2025-04-12 01:29:27 +02:00
William Lallemand	0aa6dedf72	MINOR: acme: handle the nonce ACME requests are supposed to be sent with a Nonce, the first Nonce should be retrieved using the newNonce URI provided by the directory. This nonce is stored and must be replaced by the new one received in each response.	2025-04-12 01:29:27 +02:00
William Lallemand	471290458e	MINOR: acme: get the ACME directory The first request of the ACME protocol is getting the list of URLs for the next steps. This patch implements the first request and the parsing of the response. The response is a JSON object so mjson is used to parse it.	2025-04-12 01:29:27 +02:00
William Lallemand	b8209cf697	MINOR: acme/cli: add the 'acme renew' command The "acme renew" command launches the ACME task for a given certificate. The CLI parser generates a new private key using the parameters from the acme section.	2025-04-12 01:29:27 +02:00
William Lallemand	bf6a39c4d1	MINOR: acme: add private key configuration This commit allows to configure the generated private keys, you can configure the keytype (RSA/ECDSA), the number of bits or the curves. Example: acme LE uri https://acme-staging-v02.api.letsencrypt.org/directory account account.key contact foobar@example.com challenge HTTP-01 keytype ECDSA curves P-384	2025-04-12 01:29:27 +02:00
William Lallemand	2e8c350b95	MINOR: acme: add configuration for the crt-store Add new acme keywords for the ckch_conf parsing, which will be used on a crt-store, a crt line in a frontend, or even a crt-list. The cfg_postparser_acme() is called in order to check if a section referenced elsewhere really exists in the config file.	2025-04-12 01:29:27 +02:00
William Lallemand	077e2ce84c	MINOR: acme: add the acme section in the configuration parser Add a configuration parser for the new acme section, the section is configured this way: acme letsencrypt uri https://acme-staging-v02.api.letsencrypt.org/directory account account.key contact foobar@example.com challenge HTTP-01 When unspecified, the challenge defaults to HTTP-01, and the account key to "<section_name>.account.key". Sections are stored in a linked list containing acme_cfg structures, the configuration parsing is mostly resolved in the postsection parser cfg_postsection_acme() which is called after the parsing of an acme section.	2025-04-12 01:29:27 +02:00
William Lallemand	20718f40b6	MEDIUM: ssl/ckch: add filename and linenum argument to crt-store parsing Add filename and linenum arguments to the crt-store / ckch_conf parsing. It allows to use them in the parsing function so we can emit errors.	2025-04-12 01:29:27 +02:00
Willy Tarreau	00c967fac4	MINOR: master/cli: support bidirectional communications with workers Some rare commands in the worker require to keep their input open and terminate when it's closed ("show events -w", "wait"). Others maintain a per-session context ("set anon on"). But in its default operation mode, the master CLI passes commands one at a time to the worker, and closes the CLI's input channel so that the command can immediately close upon response. This effectively prevents these two specific cases from being used. Here the approach that we take is to introduce a bidirectional mode to connect to the worker, where everything sent to the master is immediately forwarded to the worker (including the raw command), allowing to queue multiple commands at once in the same session, and to continue to watch the input to detect when the client closes. It must be a client's choice however, since doing so means that the client cannot batch many commands at once to the master process, but must wait for these commands to complete before sending new ones. For this reason we use the prefix "@@<pid>" for this. It works exactly like "@" except that it maintains the channel open during the whole execution. Similarly to "@<pid>" with no command, "@@<pid>" will simply open an interactive CLI session to the worker, that will be ended by "quit" or by closing the connection. This can be convenient for the user, and possibly for clients willing to dedicate a connection to the worker.	2025-04-11 16:09:17 +02:00
Aurelien DARRAGON	fbfeb591f7	MINOR: proxy: add deinit_proxy() helper func Same as free_proxy(), but does not free the base proxy pointer (ie: the proxy itself may not be allocated) Goal is to be able to cleanup statically allocated dummy proxies.	2025-04-10 22:10:31 +02:00
Aurelien DARRAGON	e1cec655ee	MINOR: proxy: add setup_new_proxy() function Split alloc_new_proxy() in two functions: the preparing part is now handled by setup_new_proxy() which can be called individually, while alloc_new_proxy() takes care of allocating a new proxy struct and then calling setup_new_proxy() with the freshly allocated proxy.	2025-04-10 22:10:31 +02:00
Willy Tarreau	f4634e5a38	MINOR: ring/cli: support delimiting events with a trailing \0 on "show events" At the moment it is not supported to produce multi-line events on the "show events" output, simply because the LF character is used as the default end-of-event mark. However it could be convenient to produce well-formatted multi-line events, e.g. in JSON or other formats. UNIX utilities have already faced similar needs in the past and added "-print0" to "find" and "-0" to "xargs" to mention that the delimiter is the NUL character. This makes perfect sense since it's never present in contents, so let's do exactly the same here. Thus from now on, "show events <ring> -0" will delimit messages using a \0 instead of a \n, permitting a better and safer encapsulation.	2025-04-08 14:36:35 +02:00
Willy Tarreau	0be6d73e88	MINOR: ring: support arbitrary delimiters through ring_dispatch_messages() In order to support delimiting output events with other characters than just the LF, let's pass the delimiter through the API. The default remains the LF, used by applet_append_line(), and ignored by the log forwarder.	2025-04-08 14:36:35 +02:00
Willy Tarreau	f01ff2478f	BUILD: atomics: fix build issue on non-x86/non-arm systems Commit `f435a2e518` ("CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence()") replaced the builtins used for barriers, but the different API required an argument while the macros didn't specify any, resulting in double parenthesis that were causing obscure build errors such as "called object type 'void' is not a function or function pointer". Let's just specify the args for the macro. No backport is needed.	2025-04-07 09:38:22 +02:00
Aurelien DARRAGON	11d4d0957e	MEDIUM: task: make notification_* API thread safe by default Some notification_* functions were not thread safe by default as they assumed only one producer would emit events for registered tasks. While this suited well with the Lua sockets use-case, this proved to be a limitation with some other event sources (ie: lua Queue class). Instead of having to deal with both the non thread safe and thread safe variants (_mt suffix), which is error prone, let's make the entire API thread safe regarding the event list. Pruning functions still require that only one thread executes them, with Lua this is always the case because there is one cleanup list per context.	2025-04-03 17:52:50 +02:00
Aurelien DARRAGON	748dba4859	MINOR: hlua_fcn: register queue class using hlua_register_metatable() Most lua classes are registered by leveraging the hlua_register_metatable() helper. Let's use that for the Queue class as well for consistency.	2025-04-03 17:52:17 +02:00
Aurelien DARRAGON	b77b1a2c3a	MINOR: task: add thread safe notification_new and notification_wake variants notification_new and notification_wake were historically meant to be called by a single thread doing both the init and the wakeup for other tasks waiting on the signals. In this patch, we extend the API so that notification_new and notification_wake have thread-safe variants that can safely be used with multiple threads registering on the same list of events and multiple threads pushing updates on the list.	2025-04-03 17:52:03 +02:00
Amaury Denoyelle	f0f1816f1a	MINOR: check: implement check-pool-conn-name srv keyword This commit is a direct follow-up of the previous one. It defines a new server keyword check-pool-conn-name. It is used as the default value for the name parameter of idle connection hash generation. Its behavior is similar to server keyword pool-conn-name, but reserved for checks reuse. If check-pool-conn-name is set, it is used in priority to match a connection for reuse. If unset, a fallback is performed on check-sni.	2025-04-03 17:19:07 +02:00
Amaury Denoyelle	43367f94f1	MINOR: check/backend: support conn reuse with SNI Support for connection reuse during server checks was implemented recently. This is activated with the server keyword check-reuse-pool. Similarly to stream processing via connect_backend(), a connection hash is calculated when trying to perform reuse for checks. This is necessary to retrieve a connection which shares the check connect parameters. However, idle connections can additionally be tagged using a pool-conn-name or SNI under connect_backend(). Check reuse does not test these values, which prevents retrieving a matching connection. Improve this by using "check-sni" value as idle connection hash input for check reuse. be_calculate_conn_hash() API has been adjusted so that name value can be passed as input, both when using streams or checks. Even with the current patch, there are still some scenarios which cannot be covered for checks connection reuse, most notably when using a dynamic pool-conn-name/SNI value. It is however at least sufficient to cover simpler cases.	2025-04-03 17:19:07 +02:00
Willy Tarreau	f435a2e518	CLEANUP: atomics: also replace __sync_synchronize() with __atomic_thread_fence() The drop of older compilers also allows us to focus on clearer barriers, so let's use them.	2025-04-03 11:59:31 +02:00
Willy Tarreau	34e3b83f9c	CLEANUP: atomics: remove support for gcc < 4.7 The old __sync_* API is no longer necessary since we do not support gcc before 4.7 anymore. Let's just get rid of this code, the file is still ugly enough without it.	2025-04-03 11:55:35 +02:00
Ilia Shipitsin	27a6353ceb	CLEANUP: assorted typo fixes in the code, commits and doc	2025-04-03 11:37:25 +02:00
William Lallemand	b351f06ff1	REORG: ssl: move curves2nid and nid2nist to ssl_utils curves2nid and nid2nist are generic functions that could be used outside the JWS scope, this patch put them at the right place so they can be reused.	2025-04-02 19:34:09 +02:00
Amaury Denoyelle	f1fb396d71	MEDIUM: check: implement check-reuse-pool Implement the possibility to reuse idle connections when performing server checks. This is done thanks to the recently introduced functions be_calculate_conn_hash() and be_reuse_connection(). One side effect of this change is that be_calculate_conn_hash() can now be called with a NULL stream instance. As such, part of the functions are adjusted accordingly. Note that to simplify configuration, connection reuse is not performed if any specific check connection parameters are defined on the server line or via the tcp-check connect rule. This is performed via newly defined tcpcheck_use_nondefault_connect().	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	e34f748e3a	MINOR: check define check-reuse-pool server keyword Define a new server keyword check-reuse-pool, and its counterpart with a "no" prefix. For the moment, only parsing is implemented. The real behavior adjustment will be implemented in the next patch.	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	20eb57b486	MINOR: backend: remove stream usage on connection reuse Adjust newly defined be_reuse_connection() API. The stream argument is removed. This will allow checks to invoke it without relying on a stream instance.	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	ee94a6cfc1	MINOR: backend: extract conn reuse from connect_server() Following the previous patch, the part directly related to connection reuse is extracted from connect_server(). It is now defined in a new function be_reuse_connection().	2025-04-02 14:57:40 +02:00
Amaury Denoyelle	c7cc6b6401	MINOR: backend: extract conn hash calculation from connect_server() On connection reuse, a hash is first calculated. It is generated from various connection parameters, to retrieve a matching connection. Extract hash calculation from connect_server() into a new dedicated function be_calculate_conn_hash(). The objective is to be able to perform connection reuse for checks, without connect_server() invocation which relies on a stream instance.	2025-04-02 14:57:40 +02:00
Willy Tarreau	4ec5509541	BUILD: compiler: undefine the CONCAT() macro if already defined As Ilya reported in issue #2911, the CONCAT() macro breaks on NetBSD which defines its own as __CONCAT() (which is exactly the same). Let's just undefine it before ours to fix the issue instead of renaming, but keep ours so that we don't have doubts about what we're running with. Note that the patch introducing this breaking change was backported to 3.0.	2025-04-02 11:36:43 +02:00
Ilia Shipitsin	78b849b839	CLEANUP: assorted typo fixes in the code and comments code, comments and doc actually.	2025-04-02 11:12:20 +02:00
Olivier Houchard	9fe72bba3c	MAJOR: leastconn; Revamp the way servers are ordered. For leastconn, servers used to just be stored in an ebtree. Each server would be one node. Change that so that nodes contain multiple mt_lists. Each list will contain servers that share the same key (typically meaning they have the same number of connections). Using mt_lists means that as long as tree elements already exist, moving a server from one tree element to another no longer requires the lbprm write lock. We use multiple mt_lists to reduce the contention when moving a server from one tree element to another. A list in the new element will be chosen randomly. We no longer remove a tree element as soon as it no longer contains any server. Instead, we keep a list of all elements, and when we need a new element, we look at that list only if it contains a number of elements already, otherwise we'll allocate a new one. Keeping nodes in the tree ensures that we very rarely have to take the lbprm write lock (as it only happens when we're moving the server to a position for which no element is currently in the tree). The number of mt_lists used is defined as FWLC_NB_LISTS. The number of tree elements we want to keep is defined as FWLC_MIN_FREE_ENTRIES, both in defaults.h. The values used were picked after experimentation, and seem to be the best trade-off between performance and memory usage. Doing that gives a good performance boost when a lot of servers are used. With a configuration using 500 servers, before that patch, about 830000 requests per second could be processed, with that patch, about 1550000 requests per second are processed, on a 64-core AMD, using 1200 concurrent connections.	2025-04-01 18:05:30 +02:00
Olivier Houchard	ba521a1d88	MINOR: threads: Add HA_RWLOCK_TRYRDTOWR() Add HA_RWLOCK_TRYRDTOWR(), that tries to upgrade a lock from reader to writer, and fails if any seeker or writer already holds it.	2025-04-01 18:05:30 +02:00
Olivier Houchard	2a9436f96b	MINOR: lbprm: Add method to deinit server and proxy Add two new methods to lbprm, server_deinit() and proxy_deinit(), in case something should be done at the lbprm level when removing servers and proxies.	2025-04-01 18:05:30 +02:00
Olivier Houchard	17059098e7	MINOR: mt_list: Implement mt_list_try_lock_prev(). Implement mt_list_try_lock_prev(), that does the same thing as mt_list_lock_prev(), except if the list is locked, it returns { NULL, NULL } instead of waiting.	2025-04-01 18:05:30 +02:00
William Lallemand	fdcb97614c	MINOR: ssl/ckch: add substring parser for ckch_conf Add a substring parser for the ckch_conf keyword parser, this will split a string into multiple substrings, and strdup them in an array.	2025-04-01 15:38:32 +02:00
William Lallemand	f8fe84caca	MINOR: jws: emit the JWK thumbprint jwk_thumbprint() is a function which implements RFC7638 and emits a JWK thumbprint using a EVP_PKEY. EVP_PKEY_EC_to_pub_jwk() and EVP_PKEY_RSA_to_pub_jwk() were changed in order to match what is required to emit a thumbprint (ie, no spaces or lines and the lexicographic order of the fields)	2025-04-01 11:57:55 +02:00
Willy Tarreau	1e9a2529aa	MINOR: cpu-topo: pass an extra argument to ha_cpu_policy This extra argument will allow common functions to distinguish between multiple policies. For now it's not used.	2025-03-31 16:21:37 +02:00
Willy Tarreau	571573874a	MINOR: cpu-set: add a new function to print cpu-sets in human-friendly mode The new function "print_cpu_set()" will print cpu sets in a human-friendly way, with commas and dashes for intervals. The goal is to keep them compact enough.	2025-03-31 16:21:37 +02:00
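The "commas and dashes for intervals" format described for print_cpu_set() can be illustrated with a minimal sketch. This is not HAProxy's implementation: the helper name format_cpu_mask(), its signature and the 64-bit mask representation are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not HAProxy's print_cpu_set()): writes the CPUs set
 * in <mask> into <out> as comma-separated values, collapsing consecutive
 * CPUs into "first-last". Returns the number of bytes written (excluding
 * the trailing NUL), or -1 if <out> is too small.
 */
static int format_cpu_mask(uint64_t mask, char *out, size_t size)
{
	size_t len = 0;
	int cpu = 0;

	if (size == 0)
		return -1;
	out[0] = '\0';

	while (cpu < 64) {
		int first, last, ret;

		if (!(mask & (1ULL << cpu))) {
			cpu++;
			continue;
		}

		/* extend the run as long as the next CPU is also set */
		first = last = cpu;
		while (last + 1 < 64 && (mask & (1ULL << (last + 1))))
			last++;

		if (first == last)
			ret = snprintf(out + len, size - len, "%s%d",
			               len ? "," : "", first);
		else
			ret = snprintf(out + len, size - len, "%s%d-%d",
			               len ? "," : "", first, last);

		if (ret < 0 || (size_t)ret >= size - len)
			return -1;

		len += (size_t)ret;
		cpu = last + 1;
	}
	return (int)len;
}
```

With CPUs 0 to 3, 8, 10 and 11 set (mask 0xD0F), this sketch produces "0-3,8,10-11".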
Willy Tarreau	3955f151b1	MINOR: cpu-set: compare two cpu sets with ha_cpuset_isequal() This function returns true if two CPU sets are equal.	2025-03-31 16:21:37 +02:00
Valentine Krasnobaeva	b303861469	MINOR: compiler: add __nonstring macro GCC 15 throws the following warning on fixed-size char arrays if they do not contain a terminating NUL: src/tools.c:2041:25: error: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Werror=unterminated-string-initialization] 2041 | const char hextab[16] = "0123456789ABCDEF"; We are using a couple of such definitions for some constants. Converting them to flexible arrays, like: hextab[] = "0123456789ABCDEF" may have consequences, as enlarged arrays won't fit anymore where they were possibly located due to the memory alignment constraints. GCC adds 'nonstring' variable attribute for such char arrays, but clang and other compilers don't have it. Let's wrap 'nonstring' with our __nonstring macro, which will test if the compiler supports this attribute. This fixes the issue #2910.	2025-03-31 13:50:28 +02:00
Willy Tarreau	6b17310757	MEDIUM: pools: be a bit smarter when merging comparable size pools By default, pools of comparable sizes are merged together. However, the current algorithm is dumb: it rounds the requested size to the next multiple of 16 and compares the sizes like this. This results in many entries which are already multiples of 16 not being merged, for example 1024 and 1032 are separate, 65536 and 65540 are separate, 48 and 56 are separate (though 56 merges with 64). This commit changes this to consider not just the entry size but also the average entry size, that is, it compares the average size of all objects sharing the pool with the size of the object looking for a pool. If the object is not more than 1% bigger nor smaller than the current average size or if it is neither 16 bytes smaller nor larger, then it can be merged. Also, it always respects exact matches in order to avoid merging objects into larger pools or worse, extending existing ones for no reason, and when there's a tie, it always avoids extending an existing pool. Also, we now visit all existing pools in order to spot the best one, we do not stop anymore at the smallest one large enough. Theoretically this could cost a bit of CPU but in practice it's O(N^2) with N quite small (typically in the order of 100) and the cost at each step is very low (compare a few integer values). But as a side effect, pools are no longer sorted by size, "show pools bysize" is needed for this. This causes the objects to be much better grouped together, accepting to use a little bit more sometimes to avoid fragmentation, without causing everyone to be merged into the same pool.
Thanks to this we're now seeing 36 pools instead of 48 by default, with some very nice examples of compact grouping:

  - Pool qc_stream_r (80 bytes) : 13 users
      > qc_stream_r : size=72 flags=0x1 align=0
      > quic_cstrea : size=80 flags=0x1 align=0
      > qc_stream_a : size=64 flags=0x1 align=0
      > hlua_esub : size=64 flags=0x1 align=0
      > stconn : size=80 flags=0x1 align=0
      > dns_query : size=64 flags=0x1 align=0
      > vars : size=80 flags=0x1 align=0
      > filter : size=64 flags=0x1 align=0
      > session pri : size=64 flags=0x1 align=0
      > fcgi_hdr_ru : size=72 flags=0x1 align=0
      > fcgi_param_ : size=72 flags=0x1 align=0
      > pendconn : size=80 flags=0x1 align=0
      > capture : size=64 flags=0x1 align=0

  - Pool h3s (56 bytes) : 17 users
      > h3s : size=56 flags=0x1 align=0
      > qf_crypto : size=48 flags=0x1 align=0
      > quic_tls_se : size=48 flags=0x1 align=0
      > quic_arng : size=56 flags=0x1 align=0
      > hlua_flt_ct : size=56 flags=0x1 align=0
      > promex_metr : size=48 flags=0x1 align=0
      > conn_hash_n : size=56 flags=0x1 align=0
      > resolv_requ : size=48 flags=0x1 align=0
      > mux_pt : size=40 flags=0x1 align=0
      > comp_state : size=40 flags=0x1 align=0
      > notificatio : size=48 flags=0x1 align=0
      > tasklet : size=56 flags=0x1 align=0
      > bwlim_state : size=48 flags=0x1 align=0
      > xprt_handsh : size=48 flags=0x1 align=0
      > email_alert : size=56 flags=0x1 align=0
      > caphdr : size=41 flags=0x1 align=0
      > caphdr : size=41 flags=0x1 align=0

  - Pool quic_cids (32 bytes) : 13 users
      > quic_cids : size=16 flags=0x1 align=0
      > quic_tls_ke : size=32 flags=0x1 align=0
      > quic_tls_iv : size=12 flags=0x1 align=0
      > cbuf : size=32 flags=0x1 align=0
      > hlua_queuew : size=24 flags=0x1 align=0
      > hlua_queue : size=24 flags=0x1 align=0
      > promex_modu : size=24 flags=0x1 align=0
      > cache_st : size=24 flags=0x1 align=0
      > spoe_appctx : size=32 flags=0x1 align=0
      > ehdl_sub_tc : size=32 flags=0x1 align=0
      > fcgi_flt_ct : size=16 flags=0x1 align=0
      > sig_handler : size=32 flags=0x1 align=0
      > pipe : size=24 flags=0x1 align=0

  - Pool quic_crypto (1032 bytes) : 2 users
      > quic_crypto : size=1032 flags=0x1 align=0
      > requri : size=1024 flags=0x1 align=0

  - Pool quic_conn_r (65544 bytes) : 2 users
      > quic_conn_r : size=65536 flags=0x1 align=0
      > dns_msg_buf : size=65540 flags=0x1 align=0

On a very unscientific test consisting in sending 1 million H1 requests and 1 million H2 requests to the stats page, we're seeing an ~6% lower memory usage with the patch:

  before the patch: Total: 48 pools, 4120832 bytes allocated, 4120832 used (~3555680 by thread caches).
  after the patch: Total: 36 pools, 3880648 bytes allocated, 3880648 used (~3299064 by thread caches).

This should be taken with care however since pools allocate and release in batches.	2025-03-25 18:01:01 +01:00
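The size test stated in the pools commit message (within 1% of the pool's average entry size, or within 16 bytes of it) can be sketched as below. This only illustrates that single criterion: the function name size_can_merge() is hypothetical, treating both bounds as inclusive is an assumption, and the exact-match and tie-breaking rules described in the message are not modelled.

```c
#include <stddef.h>

/* Hypothetical sketch of the size criterion only, not HAProxy's code.
 * <size> is the size of the object looking for a pool, <avg> the current
 * average entry size of a candidate pool. Both bounds are assumed to be
 * inclusive. Returns non-zero if the object may share the pool.
 */
static int size_can_merge(size_t size, size_t avg)
{
	size_t diff = size > avg ? size - avg : avg - size;

	/* within 1% of the average, or within 16 bytes of it */
	return diff * 100 <= avg || diff <= 16;
}
```

Under these assumptions, the pairs quoted in the message (1024 and 1032, 65536 and 65540, 48 and 56) all pass the test, while 1024 and 2048 do not.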
Pierre-Andre Savalle	8ed1e91efd	MEDIUM: lb-chash: add directive hash-preserve-affinity When using hash-based load balancing, requests are always assigned to the server corresponding to the hash bucket for the balancing key, without taking maxconn or maxqueue into account, unlike in other load balancing methods like 'first'. This adds a new backend directive that can be used to take maxconn and possibly maxqueue in that context. This can be used when hashing is desired to achieve cache locality, but sending requests to a different server is preferable to queuing for a long time or failing requests when the initial server is saturated. By default, affinity is preserved as was the case previously. When 'hash-preserve-affinity' is set to 'maxqueue', servers are considered successively in the order of the hash ring until a server that does not have a full queue is found. When 'maxconn' is set on a server, queueing cannot be disabled, as 'maxqueue=0' means unlimited. To support picking a different server when a server is at 'maxconn' irrespective of the queue, 'hash-preserve-affinity' can be set to 'maxconn'.	2025-03-25 18:01:01 +01:00
Amaury Denoyelle	cf9e40bd8a	MINOR: quic: define max-stream-data configuration as a ratio	2025-03-25 16:30:35 +01:00
Amaury Denoyelle	68c10d444d	MINOR: mux-quic: define config for max-data Define a new global configuration tune.quic.frontend.max-data. This allows users to explicitly set the value for the corresponding QUIC TP initial-max-data, with direct impact on haproxy memory consumption.	2025-03-25 16:30:09 +01:00
Amaury Denoyelle	a71007c088	MINOR: quic: move global tune options into quic_tune A new structure quic_tune has recently been defined. Its purpose is to store global options related to QUIC. Previously, only the tunable to toggle pacing was stored in it. This commit moves several QUIC-related tunables from global to the quic_tune structure. This better centralizes QUIC configuration options and gives room for future generic options.	2025-03-24 10:01:46 +01:00
Willy Tarreau	9091c5317f	MINOR: cli/pools: record the list of pool registrations even when merging them By default, create_pool() tries to merge similar pools into one. But when dealing with certain bugs, it's hard to say which ones were merged together. We do have the information at registration time, so let's just create a list of registrations ("pool_registration") attached to each pool, that will store that information. It can then be consulted on the CLI using "show pools detailed", where the names, sizes, alignment and flags are reported.	2025-03-21 17:09:30 +01:00
Aurelien DARRAGON	7ec6f4412c	MINOR: stats: add alt_name field to stat_col struct alt_name will be used by metric exporters to know how the metric should be presented to the user. If the alt_name is NULL, the metric should be ignored. For now only promex exporter will make use of this.	2025-03-21 17:04:54 +01:00
Olivier Houchard	98967aa09f	MEDIUM: mt_list: Reduce the max number of loops with exponential backoff Reduce the max number of loops in the mt_list code while waiting for a lock to be available with exponential backoff. It's been observed that the current value led to severe performance degradation at least on some hardware, hopefully this value will be acceptable everywhere.	2025-03-21 11:30:59 +01:00
Aurelien DARRAGON	af68343a56	MINOR: stats: use stat_col storage stat_cols_info Use stat_col storage for stat_cols_info[] array instead of name_desc. As documented in `65624876f` ("MINOR: stats: introduce a more expressive stat definition method"), stat_col supersedes name_desc storage but it remains backward compatible. Here we migrate to the new API to be able to further extend stat_cols_info[] in following patches.	2025-03-20 11:38:32 +01:00
Aurelien DARRAGON	9c60fc9fe1	MINOR: stats: STATS_PX_CAP___B_ macro STATS_PX_CAP___B_ points to STATS_PX_CAP_BE, it is just an alias for consistency, like STATS_PX_CAP____S which points to STATS_PX_CAP_SRV.	2025-03-20 11:37:47 +01:00
Aurelien DARRAGON	3c1b00b127	MINOR: stats: add .generic explicit field in stat_col struct Further extend logic implemented in `65624876` ("MINOR: stats: introduce a more expressive stat definition method") and `4e9e8418` ("MINOR: stats: prepare stats-file support for values other than FN_COUNTER"): we don't rely anymore on the presence of the capability to know if the metric is generic or not. This is because it prevents us from setting a capability on static statistics. Yet it could be useful to set the capability even on static metrics, thus we add a dedicated .generic bit to tell haproxy that the metric is generic and can be handled automatically by the API. Also, ME_NEW_* helpers are not explicitly associated to generic metric definition (as it was already the case before) to avoid ambiguities. It may change in the future as we may need to use the new definition method to define static metrics (without the generic bit set). But for now it isn't the case as this new definition was implemented for generic metrics support in the first place. If we want to define static metrics using the API, we could add a new set of helpers for instance.	2025-03-20 11:37:21 +01:00
William Lallemand	2fb6270910	MEDIUM: ssl/ckch: make the ckch_conf more generic The ckch_store_load_files() function makes specific processing for PARSE_TYPE_STR as if it was a type only used for paths. This patch changes a little bit the way it's done, PARSE_TYPE_STR is only meant to strdup() a string and stores the resulting pointer in the ckch_conf structure. Any processing regarding the path is now done in the callback. Since the callbacks were basically doing the same thing, they were transformed into the DECLARE_CKCH_CONF_LOAD() macros which allows to do some templating of these functions. The resulting ckch_conf_load_* functions will do the same as before, except they will also do the path processing instead of letting ckch_store_load_files() do it, which means we don't need the "base" member anymore in the struct ckch_conf_kws.	2025-03-19 18:08:40 +01:00
William Lallemand	b0ad777902	MINOR: tools: path_base() concatenates a path with a base path With the SSL configuration, crt-base, key-base are often used, these keywords concatenate the base path with the path when the path does not start with '/'. This is done at several places in the code, so a function to do this would be better to standardize the code.	2025-03-19 17:59:31 +01:00
William Lallemand	29b4b985c3	MINOR: jws: use jwt_alg type instead of a char This patch implements the function EVP_PKEY_to_jws_algo() which returns a jwt_alg compatible with the private key. This value can then be passed to jws_b64_protected() and jws_b64_signature() which are modified to take a jwt_alg instead of a char.	2025-03-17 18:06:34 +01:00
William Lallemand	de67f25a7e	MINOR: jws: add new functions in jws.h Add signatures of jws_b64_payload(), jws_b64_protected(), jws_b64_signature(), jws_flattened() which allow to create a complete JWS flattened object.	2025-03-17 11:51:52 +01:00
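The rule described for crt-base and key-base (prefix the base unless the path is absolute) can be sketched as follows. The real path_base() signature is not shown in this log, so the helper name join_base(), its argument order and its return convention are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not HAProxy's path_base()): builds into <out> either
 * <path> alone when it is absolute or when no base is set, or "<base>/<path>"
 * otherwise. Returns <out> on success, NULL if the result does not fit.
 */
static const char *join_base(const char *path, const char *base,
                             char *out, size_t size)
{
	int ret;

	if (!base || !*base || path[0] == '/')
		ret = snprintf(out, size, "%s", path);
	else
		ret = snprintf(out, size, "%s/%s", base, path);

	/* snprintf() returns the length it wanted to write */
	if (ret < 0 || (size_t)ret >= size)
		return NULL;
	return out;
}
```

For instance, with a base of "/etc/haproxy/certs", the relative path "site.pem" becomes "/etc/haproxy/certs/site.pem", while "/abs/site.pem" is returned unchanged.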
Willy Tarreau	156430ceb6	MINOR: cpu-topo: add a CPU policy setting to the global section We'll need to let the user decide what's best for their workload, and in order to do this we'll have to provide tunable options. For that, we're introducing struct ha_cpu_policy which contains a name, a description and a function pointer. The purpose will be to use that function pointer to choose the best CPUs to use and now to set the number of threads and thread-groups, that will be called during the thread setup phase. The only supported policy for now is "none" which doesn't set/touch anything (i.e. all available CPUs are used).	2025-03-14 18:33:16 +01:00
Willy Tarreau	c93ee25054	MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated node(s).	2025-03-14 18:33:16 +01:00
Willy Tarreau	aa4776210b	MINOR: cpu-topo: create an array of the clusters The goal here is to keep an array of the known CPU clusters, because we'll use that often to decide of the performance of a cluster and its relevance compared to other ones. We'll store the number of CPUs in it, the total capacity etc. For the capacity, we count one unit per core, and 1/3 of it per extra SMT thread, since this is roughly what has been measured on modern CPUs. In order to ease debugging, they're also dumped with -dc.	2025-03-14 18:30:31 +01:00
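The capacity rule stated above (one unit per core, plus one third of a unit per extra SMT thread) can be written with integer arithmetic by counting in thirds of a core. The function name and the choice of unit are assumptions for this sketch; how HAProxy actually stores the value is not described in this log.

```c
/* Hypothetical sketch: returns a cluster's capacity expressed in thirds of
 * a core, so that a full core counts for 3 and each extra SMT thread on
 * that core counts for 1 (i.e. 1/3 of a core).
 */
static int cluster_capacity_thirds(int cores, int threads_per_core)
{
	int extra_threads = cores * (threads_per_core - 1);

	return cores * 3 + extra_threads;
}
```

Under this convention, 8 cores with 2 threads each yield 32 thirds, that is about 10.67 core units, against 24 thirds (8 units) without SMT.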
Willy Tarreau	4a6eaf6c5e	MINOR: cpu-topo: add a function to sort by cluster+capacity The purpose here is to detect heterogeneous clusters which are not properly reported, based on the exposed information about the cores capacity. The algorithm here consists in sorting CPUs by capacity within a cluster, and considering as equal all those which have 5% or less difference in capacity with the previous one. This allows large clusters of more than 5% total between extremities, while keeping apart those where the limit is more pronounced. This is quite common in embedded environments with big.LITTLE systems, as well as on some laptops.	2025-03-14 18:30:31 +01:00
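The grouping described above (sort by capacity, then treat a CPU as equal to the previous one when they differ by 5% or less) can be sketched as below. This is an illustration and not the code from the commit: the function names are hypothetical, the sort is descending, and measuring the 5% relative to the previous CPU's capacity is an assumption.

```c
#include <stdlib.h>

/* qsort() comparator: highest capacity first */
static int cmp_capacity_desc(const void *a, const void *b)
{
	int ca = *(const int *)a;
	int cb = *(const int *)b;

	return (ca < cb) - (ca > cb);
}

/* Hypothetical sketch: sorts <caps> in place by decreasing capacity and
 * fills <groups> with a group index per sorted entry. A new group starts
 * only when the drop from the previous CPU exceeds 5% of that previous
 * CPU's capacity. Returns the number of groups found.
 */
static int group_by_capacity(int *caps, int *groups, int count)
{
	int i, group = 0;

	if (count <= 0)
		return 0;

	qsort(caps, (size_t)count, sizeof(*caps), cmp_capacity_desc);

	groups[0] = 0;
	for (i = 1; i < count; i++) {
		if ((caps[i - 1] - caps[i]) * 100 > caps[i - 1] * 5)
			group++;
		groups[i] = group;
	}
	return group + 1;
}
```

Because each CPU is compared with its immediate predecessor only, a chain such as 1024, 1000, 980 stays in one group even if its extremities were to differ by more than 5%, which matches the behaviour the commit message describes.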
Willy Tarreau	d169758fa9	MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo It's important that we don't leave unassigned IDs in the topology, because the selection mechanism is based on index-based masks, so an unassigned ID will never be kept. This is particularly visible on systems where we cannot access the CPU topology, the package id, node id and even thread id are set to -1, and all CPUs are evicted due to -1 not being set in the "only-cpu" sets. Here in new function "cpu_fixup_topology()", we assign them with the smallest unassigned value. This function will be used to assign IDs where missing in general.	2025-03-14 18:30:31 +01:00
Willy Tarreau	af648c7b58	MINOR: cpu-topo: assign clusters to cores without and renumber them Due to the previous commit we can end up with cores not assigned any cluster ID. For this, at the end we sort the CPUs by topology and assign cluster IDs to remaining CPUs based on pkg/node/llc. For example an 14900 now shows 5 clusters, one for the 8 p-cores, and 4 of 4 e-cores each. The local cluster numbers are per (node,pkg) ID so that any rule could easily be applied on them, but we also keep the global numbers that will help with thread group assignment. We still need to force to assign distinct cluster IDs to cores running on a different L3. For example the EPYC 74F3 is reported as having 8 different L3s (which is true) and only one cluster. Here we introduce a new function "cpu_compose_clusters()" that is called from the main init code just after cpu_detect_topology() so that it's not OS-dependent. It deals with this renumbering of all clusters in topology order, taking care of considering any distinct LLC as being on a distinct cluster.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a4471ea56d	MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID This will be used to detect and fix incorrect setups which report the same cluster ID for multiple L3 instances. The arrangement of functions in this file is becoming a real problem. Maybe we should move all this to cpu_topo for example, and better distinguish OS-specific and generic code.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a8acdbd9fd	MINOR: cpu-topo: implement a sorting mechanism by CPU locality Once we've kept only the CPUs we want, the next step will be to form groups and these ones are based on locality. Thus we'll have to sort by locality. For now the locality is only inferred by the index. No grouping is made at this point. For this we add the "cpu_reorder_by_locality" function with a locality-based comparison function.	2025-03-14 18:30:31 +01:00
Willy Tarreau	18133a054d	MINOR: cpu-topo: implement a sorting mechanism for CPU index CPU selection will be performed by sorting CPUs according to various criteria. For dumps however, that's really not convenient and we'll need to reorder the CPUs according to their index only. This is what the new function cpu_reorder_by_index() does. It's called in thread_detect_count() before dumping the CPU topology.	2025-03-14 18:30:31 +01:00
Willy Tarreau	1af4942c95	MEDIUM: thread: start to detect thread groups and threads min/max By mutually refining the thread count and group count, we can try to detect the most suitable setup for the current machine. Taskset is implicitly handled correctly. tgroups automatically adapt to the configured number of threads. cpu-map manages to limit tgroups to the smallest supported value. The thread-limit is enforced. Just like in cfgparse, if the thread count was forced to a higher value, it's reduced and a warning is emitted. But if it was not set, the thr_max value is bound to this limit so that further calculations respect it. We continue to default to the max number of available threads and 1 tgroup by default, with the limit. This normally allows to get rid of that test in check_config_validity().	2025-03-14 18:30:30 +01:00
Willy Tarreau	f0661e79fe	MINOR: global: add a command-line option to enable CPU binding debugging During development, everything related to CPU binding and the CPU topology is debugged using state dumps at various places, but it does make sense to have a real command line option so that this remains usable in production to help users figure why some CPUs are not used by default. Let's add "-dc" for this. Since the list of global.tune.options values is almost full and does not 100% match this option, let's add a new "tune.debug" field for this.	2025-03-14 18:30:30 +01:00
Willy Tarreau	ac1db9db7d	MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable The function is not convenient because it doesn't allow us to undo the startup changes, and depending on where it's being used, we don't know whether the values read have already been altered (this is not the case right now but it's going to evolve). Let's just compute the status during cpu_detect_usable() and set a variable accordingly. This way we'll always read the init value, and if needed we can even afford to reset it. Also, placing it in cpu_topo.c limits cross-file dependencies (e.g. threads without affinity etc).	2025-03-14 18:30:30 +01:00
Willy Tarreau	7cb274439b	MINOR: cpu-topo: add CPU topology detection for linux This uses the publicly available information from /sys to figure the cache and package arrangements between logical CPUs and fill ha_cpu_topo[], as well as their SMT capabilities and relative capacity for those which expose this. The functions clearly have to be OS-specific.	2025-03-14 18:30:30 +01:00
Willy Tarreau	8f72ce335a	MINOR: cpu-topo: add detection of online CPUs on Linux This adds a generic function ha_cpuset_detect_online() which for now only supports linux via /sys. It fills a cpuset with the list of online CPUs that were detected (or returns a failure).	2025-03-14 18:30:30 +01:00
Willy Tarreau	8c524c7c9d	REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo The cpuset files are normally used only for cpu manipulations. It happens that the initial CPU binding detection was initially placed there since there was no better place, but in practice, being OS-specific, it should really be in cpu-topo. This simplifies cpuset which doesn't need to know about the OS anymore.	2025-03-14 18:30:30 +01:00

8350 commits