haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-04-15 21:59:41 -04:00

Author	SHA1	Message	Date
Willy Tarreau	ed75148ca0	BUILD: tools: avoid a build warning on gcc-4.8 in resolve_sym_name() A build warning is emitted with gcc-4.8 in tools.c since commit `e920d73f59` ("MINOR: tools: improve symbol resolution without dl_addr") because the compiler doesn't see that <size> is necessarily initialized. Let's just preset it.	2025-03-14 18:30:30 +01:00
Willy Tarreau	4e09789644	MINOR: tools: teach resolve_sym_name() a few more common symbols This adds run_poll_loop, run_tasks_from_lists, process_runnable_tasks, ha_dump_backtrace and cli_io_handler which are fairly common in backtraces. This will result in fewer relative symbols when dladdr is not usable.	2025-03-13 17:31:16 +01:00
Willy Tarreau	a3582a77f7	MINOR: tools: ease the declaration of known symbols in resolve_sym_name() Let's have a macro that declares both the symbol and its name, it will avoid the risk of introducing typos, and encourages adding more when needed. The macro also takes an optional second argument to permit an inline declaration of an extern symbol.	2025-03-13 17:30:48 +01:00
Willy Tarreau	e920d73f59	MINOR: tools: improve symbol resolution without dl_addr When dl_addr is not usable or fails, better fall back to the closest symbol among the known ones instead of providing everything relative to main. Most often, the location of the function will give some hints about what it can be. Thus now we can emit fct+0xXXX in addition to main+0xXXX or main-0xXXX. We keep a margin of +256kB maximum after a function for a match, which is around the maximum size met in an object file, otherwise it becomes pointless again.	2025-03-13 17:30:48 +01:00
Willy Tarreau	1e99efccef	MINOR: cli: export cli_io_handler() to ease symbol resolution It's common to meet this function in backtraces, it's a bit annoying that it's not resolved, so let's export it so that it becomes resolvable.	2025-03-13 17:30:48 +01:00
Aurelien DARRAGON	8311be5ac6	BUG/MINOR: stats: fix capabilities and hide settings for some generic metrics Performing a diff on stats output before vs after commit `66152526` ("MEDIUM: stats: convert counters to new column definition") revealed that some metrics were not properly ported to the new API. Namely, "lbtot", "cli_abrt" and "srv_abrt" are now exposed on frontend and listeners while it was not the case before. Also, "hrsp_other" is exposed even when "mode http" wasn't set on the proxy. In this patch we restore original behavior by fixing the capabilities and hide settings. As this could be considered as a minor regression (looking at the commit message it doesn't seem intended), better tag this as a bug. It should be backported in 3.0 with `66152526`.	2025-03-13 11:49:18 +01:00
Willy Tarreau	78ef52dbd1	BUILD: backend: silence a build warning when threads are disabled Since commit `8de8ed4f48` ("MEDIUM: connections: Allow taking over connections from other tgroups.") we got this partially absurd build warning when disabling threads: src/backend.c: In function 'conn_backend_get': src/backend.c:1371:27: warning: array subscript [0, 0] is outside array bounds of 'struct tgroup_info[1]' [-Warray-bounds] The reason is that gcc sees that curtgid is not equal to tgid which is defined as 1 in this case, thus it figures that tgroup_info[curtgid-1] will be anything but zero and that doesn't fit. It is ridiculous as it is a perfect case of dead code elimination which should not warrant a warning. Nevertheless we know we don't need to do this when threads are disabled and in this case there will not be more than 1 thread group, so we can happily use that preliminary test to help the compiler eliminate the dead condition and avoid spitting this warning. No backport is needed.	2025-03-12 18:16:14 +01:00
Willy Tarreau	b61ed9babe	BUILD: tools: silence a build warning when USE_THREAD=0 The dladdr_lock that was added to avoid re-entering into dladdr is conditioned by threads, but the way it's declared causes a build warning if threads are disabled due to the insertion of a lone semicolon in the variables block. Let's switch to __decl_thread_var() for this. This can be backported wherever commit `eb41d768f9` ("MINOR: tools: use only opportunistic symbols resolution") is backported. It relies on these previous two commits: `bb4addabb7` ("MINOR: compiler: add a simple macro to concatenate resolved strings") `69ac4cd315` ("MINOR: compiler: add a new __decl_thread_var() macro to declare local variables")	2025-03-12 18:11:14 +01:00
Willy Tarreau	12383fd9f5	BUG/MEDIUM: thread: use pthread_self() not ha_pthread[tid] in set_affinity A bug was uncovered by the work on NUMA. It only triggers in the CI with libmusl due to a race condition. What happens is that the call to set_thread_cpu_affinity() is done very early in the polling loop, and that it relies on ha_pthread[tid] instead of pthread_self(). The problem is that ha_pthread[tid] is only set by the return from pthread_create(), which might happen later depending on the number of CPUs available to run the starting thread. Let's just use pthread_self() here. ha_pthread[] is only used to send signals between threads, there's no point in using it here. This can be backported to 2.6.	2025-03-12 15:59:23 +01:00
Aurelien DARRAGON	e942305214	MEDIUM: log: change default "host" strategy for log-forward section Historically, log-forward proxy used to preserve host field from input message as much as possible, and if syslog host wasn't provided (rfc5424 '-' or bad rfc3164 or rfc5424 message) then "localhost" or "-" would be used as host when outputting message using rfc3164 or rfc5424. We change that behavior (which corresponds to "keep" host option), so that log-forward now uses "fill" strategy as default: if the host is provided in input message, it is preserved. However if it is missing and IP address from sender is available, we use it.	2025-03-12 10:55:49 +01:00
Aurelien DARRAGON	ad0133cc50	MINOR: log: handle log-forward "option host" Following previous patch, we now implement the logic for the host option under log-forward section. Possible strategies are: replace: If input message already contains a value for the host field, we replace it by the source IP address from the sender. If input message doesn't contain a value for the host field (ie: '-' as input rfc5424 message or non compliant rfc3164 or rfc5424 message), we use the source IP address from the sender as host field. fill: If input message already contains a value for the host field, we keep it. If input message doesn't contain a value for the host field (ie: '-' as input rfc5424 message or non compliant rfc3164 or rfc5424 message), we use the source IP address from the sender as host field. keep: If input message already contains a value for the host field, we keep it. If input message doesn't contain a value for the host field, we set it to localhost (rfc3164) or '-' (rfc5424). (This is the default) append: If input message already contains a value for the host field, we append a comma followed by the IP address from the sender. If input message doesn't contain a value for the host field, we use the source IP address from the sender. Default value (unchanged) is "keep" strategy. option host is only relevant with rfc3164 or rfc5424 format on log targets. Also, if the source address is not available (ie: UNIX socket), default behavior prevails. Documentation was updated.	2025-03-12 10:52:07 +01:00
Aurelien DARRAGON	003fe530ae	MINOR: log: add "option host" log-forward option add only the parsing part, options are currently unused	2025-03-12 10:51:35 +01:00
Aurelien DARRAGON	47f14be9f3	MINOR: tools: only print address in sa2str() when port == -1 Support special value for port in sa2str: if port is equal to -1, only print the address without the port, also ignoring <map_ports> value.	2025-03-12 10:51:20 +01:00
Aurelien DARRAGON	2de62d0461	MINOR: log: provide source address information in syslog_process_message() provide struct sockaddr_storage pointer from the message sender in syslog_process_message()	2025-03-12 10:50:30 +01:00
Aurelien DARRAGON	bc76f6dde9	MINOR: log: migrate log-forward options from proxy->options2 to options3 Migrate recently added log-forward section options, currently stored under proxy->options2 to proxy->options3 since proxy->options2 is running out of space and we plan on adding more log-forward options.	2025-03-12 10:50:03 +01:00
Aurelien DARRAGON	cc5a66212d	MINOR: proxy: add proxy->options3 proxy->options2 is almost full, yet we will add new log-forward options in upcoming patches so we anticipate that by adding a new {no_}options3 and cfg_opts3[] to further extend proxy options	2025-03-12 10:49:36 +01:00
Aurelien DARRAGON	d47e7103b8	CLEANUP: log: add syslog_process_message() helper Prevent code duplication under syslog_fd_handler() and syslog_io_handler() by merging common code path in a single syslog_process_message() helper that processed a single message stored in <buf> according to <frontend> settings.	2025-03-12 10:49:18 +01:00
Aurelien DARRAGON	8b8520305e	CLEANUP: log-forward: remove useless options2 init It is actually not required to zero out proxy->options2 since proxy is allocated using calloc() which already does it.	2025-03-12 10:49:08 +01:00
William Lallemand	d014d7ee72	TESTS: jws: implement a test for JWS signing This test returns a JWS payload signed with a specified private key in the PEM format, and uses the "jose" command tool to check if the signature is correct against the jwk public key. The test could be improved later by using the code from jwt.c allowing to check a signature.	2025-03-11 22:29:40 +01:00
William Lallemand	3abb428fc8	MINOR: jws: implement JWS signing This commit implements JWS signing, which is divided in 3 parts: - jws_b64_protected() creates a JWS "protected" header, which takes the algorithm, kid or jwk, nonce and url as input, and fills a destination buffer with the base64url version of the header - jws_b64_payload() just encodes a payload in base64url - jws_b64_signature() generates a signature using as input the protected header and the payload, it supports ES256, ES384 and ES512 for ECDSA keys, and RS256 for RSA ones. The RSA signature just uses the EVP_DigestSign() API with its result encoded in base64url. For ECDSA it's a little bit more complicated, and should follow section 3.4 of RFC7518, R and S should be padded to byte size. Then the JWS can be output with jws_flattened() which just formats the 3 base64url outputs in a JSON representation with the 3 fields, protected, payload and signature.	2025-03-11 22:29:40 +01:00
Valentine Krasnobaeva	7d427134fe	MINOR: startup: adjust alert messages, when capabilities are missed CAP_SYS_ADMIN support was added, in order to access sockets in namespaces. So let's adjust the alert at startup, where we check preserved capabilities from global.last_checks. Let's mention here cap_sys_admin as well.	2025-03-07 16:37:16 +01:00
Damien Claisse	f0a07f834c	BUG/MINOR: cfgparse-tcp: relax namespace bind check Commit `5cbb278` introduced cap_sys_admin support, and enforced checks for both binds and servers. However, when binding into a namespace, the bind is done before dropping privileges. Hence, checking that we have cap_sys_admin capability set in this case is not needed (and it would decrease security to add it). For users starting haproxy with other user than root and without cap_sys_admin, bind should have already failed. As a consequence, relax runtime check for binds into a namespace.	2025-03-07 16:23:29 +01:00
Amaury Denoyelle	dc7913d814	MAJOR: mux-quic: increase stream flow-control for multi-buffer alloc Support for multiple Rx buffers per QCS instance has been introduced by previous patches. However, due to flow-control initial values, clients were still unable to fully use this to increase their upload throughput. This patch increases max-stream-data-bidi-remote flow-control initial values. A new define QMUX_STREAM_RX_BUF_FACTOR will fix the number of concurrent buffers allocable per QCS. It is set to 90. Note that connection flow-control initial value did not change. It is still configured to be equivalent to bufsize multiplied by the maximum concurrent streams. This ensures that Rx buffers allocation is still constrained per connection, so that it won't be possible to have all active QCS instances using in parallel their maximum Rx buffers count.	2025-03-07 12:06:27 +01:00
Amaury Denoyelle	75027692a3	MEDIUM: mux-quic: handle too short data split on multiple rxbuf Previous commit introduces support for multiple Rx buffers per QCS instance. Contiguous data may be split across multiple buffers depending on their offset. A particular issue could arise with this new model. Indeed, app_ops rcv_buf callback can still deal with a single buffer at a time. This may cause a deadlock in decoding if app_ops layer cannot proceed due to partial data, but such data are precisely divided on two buffers. This can for example intervene during HTTP/3 frame header parsing. To deal with this, a new function is implemented to force data realign between two contiguous buffers. This is called only when app_ops rcv_buf returned 0 but data is available in the next buffer after the current one. In this case, data are transferred from the next into the current buffer via qcs_transfer_rx_data(). Decoding is then restarted, which should ensure that app_ops layer has enough data to advance. During this operation, special care is taken to remove both qc_stream_rxbuf entries, as their offsets are adjusted. The next buffer is only reinserted if there is remaining data in it, else it can be freed. This case is not easily reproducible as it depends on the HTTP/3 framing used by the client. It seems to be easily reproduced though with quiche. $ quiche-client --http-version HTTP/3 --method POST --body /tmp/100m \ "https://127.0.0.1:20443/post"	2025-03-07 12:06:27 +01:00
Amaury Denoyelle	60f64449fb	MAJOR: mux-quic: support multiple QCS RX buffers Implement support for multiple Rx buffers per QCS instance. This requires several changes mostly in qcc_recv() / qcc_decode_qcs() which deal with STREAM frames reception and decoding. These multiple buffers can be stored in QCS rx.bufs tree which was introduced in an earlier patch. On STREAM frame reception, a buffer is retrieved from QCS bufs tree, or allocated if necessary, based on the data starting offset. Each buffer is aligned on bufsize for convenience. This ensures there is no overlap between two contiguous buffers. Special care is taken when dealing with a STREAM frame which must be split and stored in two contiguous buffers. When decoding input data, qcc_decode_qcs() is still invoked with a single buffer as input. This requires a new while loop to ensure decoding is performed across multiple contiguous buffers until all data are decoded or app stream buffer is full. Also, after qcs_consume() has been performed, the stream Rx channel is immediately closed if FIN was already received and QCS now contains only a single buffer with all remaining data. This is necessary as qcc_recv() is unable to close the Rx channel if FIN is received for a buffer different from the current readable offset. Note that for now stream flow-control value is still too low to fully utilize this new infrastructure and improve clients upload throughput. Indeed, flow-control max-stream-data initial values are set to match bufsize. This ensures that each QCS will use 1 buffer, or at most 2 if data are split. A future patch will increase this value to unblock this limitation.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	7b168e356f	MINOR: mux-quic: adapt return value of qcc_decode_qcs() Change return value of qcc_decode_qcs(). It now directly returns the value from app_ops rcv_buf callback. Function documentation is updated to reflect this. For now, qcc_decode_qcs() return value is ignored by callers, so this patch should not have any functional change. However, it will become necessary when implementing multiple Rx buffers per QCS, as a loop will be implemented to invoke qcc_decode_qcs() on several contiguous buffers. Decoding must be stopped however as soon as an error is returned by rcv_buf callback. This is also the case in case of a null value, which indicates there is not enough data to continue decoding.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	6b5607d66f	MINOR: mux-quic: adjust Rx data consumption API HTTP/3 data are converted into HTX via qcc_decode_qcs() function. On completion, these data are removed from QCS Rx buffer via qcs_consume(). This patch adjusts qcs_consume() API with several changes. Firstly, the Rx buffer instance to operate on must now be specified as a new argument to the function. Secondly, buffer liberation when all data were removed from qcs_consume() is extracted up to qcc_decode_qcs() caller. No functional change with this patch. The objective is to have an API which can be better adapted to multiple Rx buffers per QCS instance.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	a4f31ffeeb	MINOR: mux-quic: store QCS Rx buf in a single-entry tree Convert QCS rx buffer pointer to a tree container. Additionally, offset field of qc_stream_rxbuf is thus transformed into a node tree. For now, only a single Rx buffer is stored at most in QCS tree. Multiple Rx buffers will be implemented in a future patch to improve QUIC clients upload throughput.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	cc3c2d1f12	MINOR: mux-quic: define rxbuf wrapper Define a new type qc_stream_rxbuf. This is used as a wrapper around QCS Rx buffer with encapsulation of the ncbuf storage. It is allocated via a new pool. Several functions are adapted to be able to deal with qc_stream_rxbuf as a wrapper instead of the previous plain ncbuf instance. No functional change should happen with this patch. For now, only a single qc_stream_rxbuf can be instantiated per QCS. However, this new type will be useful to implement multiple Rx buffer storage in a future commit.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	4b1e63d191	MINOR: mux-quic: define globally stream rxbuf size QCS uses ncbuf for STREAM data storage. This serves as a limit for maximum STREAM buffering capacity, advertised via QUIC transport parameters for initial flow-control values. Define a new function qmux_stream_rx_bufsz() which can be used to retrieve this Rx buffer size. This can be used both in MUX/H3 layers and in QUIC transport parameters.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	7dd1eec2b1	MINOR: mux-quic: refine reception of standalone STREAM FIN Reception of standalone STREAM FIN is a corner case, which may be difficult to handle. In particular, care must be taken to ensure app_ops rcv_buf() is always called to be notified about FIN, even if Rx buffer is empty or full demux flag is set. Otherwise, it could prevent closure of QCS Rx channel. To ensure this, rcv_buf() was systematically called if FIN was received, with or without data payload. This could cause unnecessary invocations when FIN is transmitted with data and full demux flag is set, or data are received out-of-order. This patch improves qcc_recv() by explicitly detecting a standalone FIN case. Thus, rcv_buf() is only forcefully called in this case and if all data were already previously received.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	20dc8e4ec2	MINOR/OPTIM: mux-quic: do not allocate rxbuf on standalone FIN STREAM FIN may be received without any payload. However, qcc_recv() always called qcs_get_ncbuf() indiscriminately, which may allocate a QCS Rx buffer. This is unneeded as there is no payload to store. Improve this by skipping qcs_get_ncbuf() invocation when dealing with a standalone FIN signal. This should prevent superfluous buffer allocation.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	861b11334c	MINOR: h3/hq-interop: restore function for standalone FIN receive Previously, a function qcs_http_handle_standalone_fin() was implemented to handle a received standalone FIN, bypassing app_ops layer decoding. However, this was removed as app_ops layer interaction is necessary. For example, HTTP/3 checks that FIN is never sent on the control uni stream. This patch reintroduces qcs_http_handle_standalone_fin(), albeit in a slightly diminished version. Most importantly, it is now the responsibility of the app_ops layer itself to use it, to avoid the shortcoming described above. The main objective of this patch is to be able to support standalone FIN in HTTP/0.9 layer. This is easily done via the reintroduction of qcs_http_handle_standalone_fin() usage. This will be useful to perform testing, as standalone FIN is a corner case which can easily be broken.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	6f95d0dad0	TESTS: quic: create first quic unittest Define a first unit-test dedicated to QUIC. A single test for now ensures that variable length decoding is compliant. This should be extended in the future with new set of tests.	2025-03-07 12:06:26 +01:00
Willy Tarreau	5e558c1727	MINOR: stream/cli: make "show sess" support filtering on front/back/server With "show sess", particularly "show sess all", we're often missing the ability to inspect only streams attached to a frontend, backend or server. Let's just add these filters to the command. Only one at a time may be set. One typical use case could be to dump streams attached to a server after issuing "shutdown sessions server XXX" to figure why any wouldn't stop for example.	2025-03-07 10:38:12 +01:00
Willy Tarreau	2bd7cf53cb	MINOR: stream/cli: rework "show sess" to better consider optional arguments The "show sess" CLI command parser is getting really annoying because several options were added in an exclusive mode as the single possible argument. Recently some cumulable options were added ("show-uri") but the older ones were not yet adapted. Let's just make sure that the various filters such as "older" and "age" now belong to the options and leave only <id>, "all", and "help" for the first ones. The doc was updated and it's now easier to find these options.	2025-03-07 10:36:58 +01:00
Willy Tarreau	1cdf2869f6	BUG/MINOR: stream: fix age calculation in "show sess" output The "show sess" output reports an age that's based on the last byte of the HTTP request instead of the stream creation date, due to a confusion between logs->request_ts and the request_date sample fetch function. Most of the time these are equal except when the request is not yet full for any reason (e.g. wait-body). This explains why a few "show sess" could report a few new streams aged by 99 days for example. Let's perform the correct request timestamp calculation like the sample fetch function does, by adding t_idle and t_handshake to the accept_ts. Now the stream's age is correct and can be correctly used with the "show sess older <age>" variant. This issue was introduced in 2.9 and the fix can be backported to 3.0.	2025-03-07 10:36:58 +01:00
Aurelien DARRAGON	dbb25720dd	MINOR: cfgparse/peers: provide more info when ignoring invalid "peer" or "server" lines Invalid (incomplete) "server" or "peer" lines under peers section are now properly ignored. For completeness, in this patch we add some reports so that the user knows that incomplete lines were ignored. For an incomplete server line, since it is tolerated (see GH #565), we only emit a diag warning. For an incomplete peer line, we report a real warning, as it is not expected to have a peer line without an address:port specified. Also, 'newpeer == curpeers->local' check could be simplified since we already have the 'local_peer' variable which tells us that the parsed line refers to a local peer.	2025-03-07 09:39:51 +01:00
Aurelien DARRAGON	a76b5358f0	BUG/MINOR: server: dont return immediately from parse_server() when skipping checks If parse_server() is called under peers section parser, and the address needs to be parsed but it is missing, we directly return from the function. However, since `0fc136ce5b` ("REORG: server: use parsing ctx for server parsing"), parse_server() uses parsing ctx to emit warning/errors, and the ctx must be reset before returning from the function, yet this early return was overlooked. Because of that, any ha_{warning,alert..} message reported after early return from parse_server() could cause messages to have an extra "parsing [file:line]" info. We fix that by ensuring parse_server() doesn't return without resetting the parsing context. It should be backported up to 2.6	2025-03-07 09:39:46 +01:00
Aurelien DARRAGON	054443dfb9	BUG/MINOR: cfgparse/peers: properly handle ignored local peer case In `8ba10fea6` ("BUG/MINOR: peers: Incomplete peers sections should be validated."), some checks were relaxed in parse_server(), and extra logic was added in the peers section parser in an attempt to properly ignore incomplete "server" or "peer" statement under peers section. This was done in response to GH #565, the main intent was that haproxy should already complain about incomplete peers section (ie: missing localpeer). However, `8ba10fea69` explicitly skipped the peer cleanup upon missing srv association for local peers. This is wrong because later haproxy code always assumes that peer->srv is valid. Indeed, we got reports that the (invalid) config below would cause segmentation fault on all stable versions: global localpeer 01JM0TEPAREK01FQQ439DDZXD8 peers my-table peer 01JM0TEPAREK01FQQ439DDZXD8 listen dummy bind localhost:8080 To fix the issue, instead of by-passing some cleanup for the local peer, handle this case specifically by doing the regular peer cleanup and reset some fields set on the curpeers and curpeers proxy because of the invalid local peer (do as if the peer was not declared). It should still comply with requirements from #565. This patch should be backported to all stable versions.	2025-03-06 22:05:29 +01:00
Aurelien DARRAGON	2560ab892f	BUG/MINOR: cfgparse/peers: fix inconsistent check for missing peer server In the "peers" section parser, right after parse_server() is called, we used to check whether the curpeers->peers_fe->srv pointer was set or not to know if parse_server() successfully added a server to the peers proxy, server that we can then associate to the new peer. However the check is wrong, as curpeers->peers_fe->srv points to the last added server, if a server was successfully added before the failing one, we cannot detect that the last parse_server() didn't add a server. This is known to cause bug with bad "peer"/"server" statements. To fix the issue, we save a pointer on the last known curpeers->peers_fe->srv before parse_server() is called, and we then compare the save with the pointer after parse_server(), if the value didn't change, then parse_server() didn't add a server. This makes the check consistent in all situations. It should be backported to all stable versions.	2025-03-06 22:05:24 +01:00
Valentine Krasnobaeva	e900ef987e	BUG/MEDIUM: startup: return to initial cwd only after check_config_validity() In check_config_validity() we evaluate some sample fetch expressions (log-format, server rules, etc). These expressions may use external files like maps. If some particular 'default-path' was set in the global section before, it's no longer applied to resolve file paths in check_config_validity(). parse_cfg() at the end of config parsing switches back to the initial cwd. This fixes the issue #2886. This patch should be backported in all stable versions since 2.4.0, including 2.4.0.	2025-03-06 10:49:48 +01:00
Roberto Moreda	f98b5c4f59	MINOR: log: add dont-parse-log and assume-rfc6587-ntf options This commit introduces the dont-parse-log option to disable log message parsing, allowing raw log data to be forwarded without modification. Also, it adds the assume-rfc6587-ntf option to frame log messages using only non-transparent framing as per RFC 6587. This avoids misparsing in certain cases (mainly with non RFC compliant messages). The documentation is updated to include details on the new options and their intended use cases. This feature was discussed in GH #2856	2025-03-06 09:30:39 +01:00
Roberto Moreda	c25e6f5efa	MINOR: log: detach prepare from parse message This commit adds a new function `prepare_log_message` to initialize log message buffers and metadata. This function sets default values for log level and facility, ensuring a consistent starting state for log processing. It also prepares the buffer and metadata fields, simplifying subsequent log parsing and construction.	2025-03-06 09:30:31 +01:00
Roberto Moreda	834e9af877	MINOR: log: add options eval for log-forward This commit adds parsing of options in log-forward config sections and prepares the scenario to implement actual changes of behaviour. So far we only take into account proxy->options2, which is the bit container with more available positions.	2025-03-06 09:30:25 +01:00
Aurelien DARRAGON	0746f6bde0	MINOR: cfgparse-listen: add and use cfg_parse_listen_match_option() helper cfg_parse_listen_match_option() takes cfg_opt array as parameter, as well as current args, expected mode and cap bitfields. It is expected to be used under cfg_parse_listen() function or similar. Its goal is to remove code duplication around proxy->options and proxy->options2 handling, since the same checks are performed for the two. Also, this function could help to evaluate proxy options for mode-specific proxies such as log-forward section for instance: by giving the expected mode and capability as input, the function would only match compatible options.	2025-03-06 09:30:18 +01:00
Aurelien DARRAGON	c7abe7778e	MEDIUM: log: postpone the decision to send or not log with empty messages As reported by Nick Ramirez in GH #2891, it is currently not possible to use log-profile without a log-format set on the proxy. This is due to historical reason, because all log sending functions avoid trying to send a log with empty message. But now with log-profile which can override log-format, it is possible that some loggers may actually end up generating a valid log message that should be sent! Yet from the upper logging functions we don't know about that because loggers are evaluated in lower API functions. Thus, to avoid skipping potentially valid messages (thanks to log-profile overrides), in this patch we postpone the decision to send or not empty log messages in lower log API layer, ie: _process_send_log_final(), once the log-profile settings were evaluated for a given logger. A known side-effect of this change is that fe->log_count statistic may be increased even if no log message is sent because the message was empty and even the log-profile didn't help to produce a non empty log message. But since configurations lacking proxy log-format are not supposed to be used without log-profile (+ log steps combination) anyway it shouldn't be an issue.	2025-03-05 15:38:52 +01:00
Aurelien DARRAGON	9e9b110032	MINOR: log: use __send_log() with exact payload length Historically, __send_log() was called with terminating NULL byte after the message payload. But now that __send_log() supports being called without terminating NULL byte (thanks to size hint), and that __send_log() actually strips any \n or NULL byte, we don't need to bother with that anymore. So let's remove extra logic around __send_log() users where we added 1 extra byte for the terminating NULL byte. No change of behavior should be expected.	2025-03-05 15:38:46 +01:00
Aurelien DARRAGON	94a9b0f5de	BUG/MINOR: log: set proper smp size for balance log-hash result.data.u.str.size was set to size+1 to take into account terminating NULL byte as per the comment. But this is wrong because the caller is free to set size to just the right amount of bytes (without terminating NULL byte). In fact all smp API functions will not read past str.data so there is no risk about uninitialized reads, but this leaves an ambiguity for converters that may use all the smp size to perform transformations, and since we don't know about the "message" memory origin, we cannot assume that its size may be greater than size. So we max it out to size just to be safe. This bug was not known to cause any issue, it was spotted during code review. It should be backported in 2.9 with `b30bd7a` ("MEDIUM: log/balance: support for the "hash" lb algorithm")	2025-03-05 15:38:41 +01:00
Aurelien DARRAGON	ddf66132f4	CLEANUP: log: removing "log-balance" references This is a complementary patch to `0e1f389fe9` ("DOC: config: removing "log-balance" references"): we properly removed all log-balance references in the doc but there remained some in the code, let's fix that. It could be backported in 2.9 with `0e1f389fe9`	2025-03-05 15:38:34 +01:00
Valentine Krasnobaeva	b46b81949f	MINOR: sample: allow custom date format in error-log-format Sample fetches %[accept_date] and %[request_date] with converters can be used in error-log-format string. But in most error cases they fetch nothing, as error logs are produced on SSL handshake issues or when invalid PROXY protocol header is used. Stream object is never allocated in such cases and smp_fetch_accept_date() just simply returns 0. There is a need to have a custom date format (ISO8601) also in the error logs, along with normal logs. When sess_build_logline_orig() builds log line it always copies the accept date to strm_logs structure. When stream is absent, accept date is copied from the session object. So, if the stream object wasn't allocated, let's use the session date info in smp_fetch_accept_date(). This allows then, in sample_process(), to apply to the fetched date different converters and formats. This fixes the issue #2884.	2025-03-04 18:57:29 +01:00
William Lallemand	cf71e9f5cf	MINOR: jws: conversion to NIST curves name OpenSSL version greater than 3.0 does not use the same API when manipulating EVP_PKEY structures, the EC_KEY API is deprecated and it's not possible anymore to get an EC_GROUP and simply call EC_GROUP_get_curve_name(). Instead, one must call EVP_PKEY_get_utf8_string_param with the OSSL_PKEY_PARAM_GROUP_NAME parameter, but this would result in a SECG curves name, instead of a NIST curves name in previous version. (ex: secp384r1 vs P-384) This patch adds 2 functions: - the first one looks for a curves name and converts it to an openssl NID. - the second one converts a NID to a NIST curves name The list only contains: P-256, P-384 and P-521 for now, it could be extended in the future with more curves.	2025-03-03 12:43:32 +01:00
William Lallemand	09457111bb	TESTS: jws: register a unittest for jwk Add a way to test the jwk converter in the unit test system $ make TARGET=linux-glibc USE_OPENSSL=1 CFLAGS="-DDEBUG_UNIT=1" $ ./haproxy -U jwk foobar.pem.rsa { "kty": "RSA", "n": "...", "e": "AQAB" } $ ./haproxy -U jwk foobar.pem.ecdsa { "kty": "EC", "crv": "P-384", "x": "...", "y": "..." } This is then tested by a shell script: $ HAPROXY_PROGRAM=${PWD}/haproxy tests/unit/jwk/test.sh + readlink -f tests/unit/jwk/test.sh + BASENAME=/haproxy/tests/unit/jwk/test.sh + dirname /haproxy/tests/unit/jwk/test.sh + TESTDIR=/haproxy/tests/unit/jwk + HAPROXY_PROGRAM=/haproxy/haproxy + mktemp + FILE1=/tmp/tmp.iEICxC5yNK + /haproxy/haproxy -U jwk /haproxy/tests/unit/jwk/ecdsa.key + diff -Naurp /haproxy/tests/unit/jwk/ecdsa.pub.jwk /tmp/tmp.iEICxC5yNK + rm /tmp/tmp.iEICxC5yNK + mktemp + FILE2=/tmp/tmp.EIrGZGaCDi + /haproxy/haproxy -U jwk /haproxy/tests/unit/jwk/rsa.key + diff -Naurp /haproxy/tests/unit/jwk/rsa.pub.jwk /tmp/tmp.EIrGZGaCDi + rm /tmp/tmp.EIrGZGaCDi $ echo $? 0	2025-03-03 12:43:32 +01:00
William Lallemand	a647839954	DEBUG: init: add a way to register functions for unit tests Doing unit tests with haproxy was always a bit difficult, some of the functions you want to test would depend on the buffer or trash buffer initialisation of HAProxy, so building a separate main() for them is quite hard. This patch adds a way to register a function that can be called with the "-U" parameter on the command line, will be executed just after step_init_1() and will exit the process with its return value as an exit code. When using the -U option, every keyword after this option is passed to the callback and could be used as a parameter, letting the capability to handle complex arguments if required by the test. HAProxy needs to be built with DEBUG_UNIT to activate this feature.	2025-03-03 12:43:32 +01:00
William Lallemand	4dc0ba233e	MINOR: jws: implement a JWK public key converter Implement a converter which takes an EVP_PKEY and converts it to a public JWK key. This is the first step of the JWS implementation. It supports both EC and RSA keys. Known to work with: - LibreSSL - AWS-LC - OpenSSL > 1.1.1	2025-03-03 12:43:32 +01:00
Willy Tarreau	730641f7ca	BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send header As reported in issue #2882, using "no-send-proxy-v2" on a server line does not properly disable the use of proxy-protocol if it was enabled in a default-server directive in combination with other PP options. The reason for this is that the sending of a proxy header is determined by a test on srv->pp_opts without any distinction, so disabling PPv2 while leaving other options results in a PPv1 header to be sent. Let's fix this by explicitly testing for the presence of either send-proxy or send-proxy-v2 when deciding to send a proxy header. This can be backported to all versions. Thanks to Andre Sencioles (@asenci) for reporting the issue and testing the fix.	2025-03-03 04:05:47 +01:00
Amaury Denoyelle	d0f97040a3	BUG/MINOR: hq-interop: fix leak in case of rcv_buf early return HTTP/0.9 parser was recently updated to support truncated requests in rcv_buf operation. However, this caused a leak as input buffer is allocated early. In fact, the leak was already present in case of fatal errors. Fix this by first delaying buffer allocation, so that initial checks are performed before. Then, ensure that buffer is released in case of a later error. This is considered as minor, as HTTP/0.9 is reserved for experiment and QUIC interop usages. This should be backported up to 2.6.	2025-02-28 17:37:00 +01:00
Willy Tarreau	fd5d59967a	MINOR: h1: permit to relax the websocket checks for missing mandatory headers At least one user would like to allow a standards-violating client to set up WebSocket connections through haproxy to a standards-violating server that accepts them. While this should of course never be done over the internet, it can make sense in the datacenter between application components which do not need to mask the data, so this typically falls into the situation of what the "accept-unsafe-violations-in-http-request" option and the "accept-unsafe-violations-in-http-response" option are made for. See GH #2876 for more context. This patch relaxes the test on the "Sec-Websocket-Key" header field in the request, and of the "Sec-Websocket-Accept" header in the response when these respective options are set. The doc was updated to reference this addition. This may be backported to 3.1 but preferably not further.	2025-02-28 17:31:20 +01:00
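The check described in commit `730641f7ca` can be illustrated with a minimal sketch. The flag names and values below are hypothetical and are not taken from HAProxy's headers; the sketch only shows why testing the whole option word differs from testing the two "send" bits explicitly.

```c
#include <stdbool.h>

/* Hypothetical option bits, for illustration only. HAProxy's real flag
 * names and values may differ. */
#define EX_PP_V1     0x01u  /* stands for "send-proxy" */
#define EX_PP_V2     0x02u  /* stands for "send-proxy-v2" */
#define EX_PP_V2_SSL 0x04u  /* stands for any additional PP option */

/* Old behaviour as described in the commit message: any remaining
 * option bit causes a proxy header to be sent. */
bool should_send_pp_old(unsigned int pp_opts)
{
    return pp_opts != 0;
}

/* Fixed behaviour: a header is sent only if one of the two "send"
 * bits is still present. */
bool should_send_pp(unsigned int pp_opts)
{
    return (pp_opts & (EX_PP_V1 | EX_PP_V2)) != 0;
}
```

With only `EX_PP_V2_SSL` left after the v2 bit is cleared, the old test still reports that a header must be sent, while the explicit test does not.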
Christopher Faulet	0e08252294	BUG/MEDIUM: mux-fcgi: Try to fully fill demux buffer on receive if not empty Don't reserve space for the HTX overhead on receive if the demux buffer is not empty. Otherwise, the demux buffer may be erroneously reported as full and this may block records processing. Because of this bug, a ping-pong loop till timeout between data reception and demux process can be observed. This bug was introduced by the commit `5f927f603` ("BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records"). To fix the issue, if the demux buffer is not empty when we try to receive more data, all free space in the buffer can now be used. However, if the demux buffer is empty, we still try to keep it aligned with the HTX. This patch must be backported to 3.1.	2025-02-28 16:07:05 +01:00
Amaury Denoyelle	3cc095a011	MINOR: hq-interop: properly handle incomplete request Extends HTTP/0.9 layer to be able to deal with incomplete requests. Instead of an error, 0 is returned. Thus, instead of a stream closure, QUIC-MUX may retry rcv_buf operation later if more data is received, similarly to HTTP/3 layer. Note that HTTP/0.9 is only used for testing and interop purpose. As such, this limitation is not considered as a bug. It is probably not worth to backport it.	2025-02-27 17:34:06 +01:00
Amaury Denoyelle	0aa35289b3	CLEANUP: h3: fix documentation of h3_rcv_buf() Return value of h3_rcv_buf() is incorrectly documented. Indeed, it may return a positive value to indicate that input bytes were converted into HTX. This is especially important, as caller uses this value to consume the reported data amount in QCS Rx buffer. This should be backported up to 2.6. Note that on 2.8, h3_rcv_buf() was named h3_decode_qcs().	2025-02-27 17:31:40 +01:00
Amaury Denoyelle	f6648d478b	BUG/MINOR: h3: do not report transfer as aborted on preemptive response HTTP/3 specification allows a server to emit the entire response even if only a partial request was received. In particular, this happens when request STREAM FIN is delayed and transmitted in an empty payload frame. In this case, qcc_abort_stream_read() was used by HTTP/3 layer to emit a STOP_SENDING. Remaining received data were not transmitted to the stream layer as they were simply discarded. However, this prevents FIN transmission to the stream layer. This causes the transfer to be considered as prematurely closed, resulting in a cL-- log line status. This is misleading to users which could interpret it as if the response was not sent. To fix this, disable STOP_SENDING emission on full preemptive response emission. Rx channel is kept opened until the client closes it with either a FIN or a RESET_STREAM. This ensures that the FIN signal can be relayed to the stream layer, which allows the transfer to be reported as completed. This should be backported up to 2.9.	2025-02-27 17:23:24 +01:00
Dragan Dosen	0ae7a5d672	BUG/MINOR: server: fix the "server-template" prefix memory leak The srv->tmpl_info.prefix was not freed in srv_free_params(). This could be backported to all stable versions.	2025-02-27 04:21:01 +01:00
Dragan Dosen	6838fe43a3	BUG/MEDIUM: server: properly initialize PROXY v2 TLVs The PROXY v2 TLVs were not properly initialized when defined with "set-proxy-v2-tlv-fmt" keyword, which could have caused a crash when validating the configuration or malfunction (e.g. when used in combination with "server-template" and/or "default-server"). The issue was introduced with commit `6f4bfed3a` ("MINOR: server: Add parser support for set-proxy-v2-tlv-fmt"). This should be backported up to 2.9.	2025-02-27 04:20:45 +01:00
Olivier Houchard	706b008429	MEDIUM: servers: Add strict-maxconn. Maxconn is a bit of a misnomer when it comes to servers, as it doesn't control the maximum number of connections we establish to a server, but the maximum number of simultaneous requests. So add "strict-maxconn", that will make it so we will never establish more connections than maxconn. It extends the meaning of the "restricted" setting of tune.takeover-other-tg-connections, as it will also attempt to get idle connections from other thread groups if strict-maxconn is set.	2025-02-26 13:00:18 +01:00
Olivier Houchard	8de8ed4f48	MEDIUM: connections: Allow taking over connections from other tgroups. Allow haproxy to take over idle connections from other thread groups than our own. To control that, add a new tunable, tune.takeover-other-tg-connections. It can have 3 values, "none", where we won't attempt to get connections from the other thread group (the default), "restricted", where we only will try to get idle connections from other thread groups when we're using reverse HTTP, and "full", where we always try to get connections from other thread groups. Unless there is a special need, it is advised to use "none" (or restricted if we're using reverse HTTP) as using connections from other thread groups may have a performance impact.	2025-02-26 13:00:18 +01:00
Olivier Houchard	d31b1650ae	MEDIUM: pollers: Drop fd events after a takeover to another tgid. In pollers that support it, provide the generation number in addition to the fd, and, when an event happened, if the generation number is the same, but the tgid changed, then assume the fd was taken over by a thread from another thread group, and just delete the event from the current thread's poller, as we no longer want to hear about it.	2025-02-26 13:00:18 +01:00
Olivier Houchard	c36aae2af1	MINOR: pollers: Add a fixup_tgid_takeover() method. Add a fixup_tgid_takeover() method to pollers for which it makes sense (epoll, kqueue and evport). That method can be called after a takeover of a fd from a different thread group, to make sure the poller's internal structure reflects the new state.	2025-02-26 13:00:18 +01:00
Olivier Houchard	752c5cba5d	MEDIUM: epoll: Make sure we can add a new event Check that the call to epoll_ctl() succeeds, and if it does not, if we're adding a new event and it fails with EEXIST, then delete and re-add the event. There are a few cases where we may already have events for a fd. If epoll_ctl() fails for any reason, use BUG_ON to make sure we immediately crash, as this should not happen.	2025-02-26 13:00:18 +01:00
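The EEXIST handling described in commit `752c5cba5d` can be sketched as follows. This is a Linux-only illustration under stated assumptions: the helper name `epoll_add_or_replace()` is hypothetical, and it returns -1 on unexpected failures where the commit says HAProxy uses BUG_ON to crash immediately.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical helper, not HAProxy's code: register <fd> in <epfd>, and
 * if the kernel reports that an event already exists for this fd
 * (EEXIST), delete the stale registration and add it again.
 * Returns 0 on success, -1 on any other failure. */
int epoll_add_or_replace(int epfd, int fd, struct epoll_event *ev)
{
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, ev) == 0)
        return 0;

    if (errno != EEXIST)
        return -1;

    /* An event was already registered for this fd: drop it first. */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) != 0)
        return -1;

    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, ev);
}
```

Calling the helper twice on the same fd succeeds both times, whereas a plain second `EPOLL_CTL_ADD` would fail with EEXIST.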
Willy Tarreau	a826250659	OPTIM: connection: don't try to kill other threads' connection when !shared Users may have good reasons for using "tune.idle-pool.shared off", one of them being the cost of moving cache lines between cores, or the kernel- side locking associated with moving FDs. For this reason, when getting close to the file descriptors limits, we must not try to kill adjacent threads' FDs when the sharing of pools is disabled. This is extremely expensive and kills the performance. We must limit ourselves to our local FDs only. In such cases, it's up to the users to configure a large enough maxconn for their usages. Before this patch, perf top reported 9% CPU usage in connect_server() on the trylock used to kill connections when running at 4800 conns for a global maxconn of 6400 on a 128-thread server. Now it doesn't spend its time there anymore, and performance has increased by 12%. Note, it was verified that disabling the locks in such a case has no effect at all, so better keep them and stay safe.	2025-02-25 09:23:46 +01:00
Willy Tarreau	2e0bac90da	BUG/MEDIUM: stream: don't use localtime in dumps from a signal handler In issue #2861, Jarosław Rzeszótko reported another issue with "show threads", this time in relation with the conversion of a stream's accept date to local time. Indeed, if the libc was interrupted in this same function, it could have been interrupted with a lock held, then it's no longer possible to dump the date, and we face a deadlock. This is easy to reproduce with logging enabled. Let's detect we come from a signal handler and do not try to resolve the time to localtime in this case.	2025-02-24 13:40:42 +01:00
Willy Tarreau	fb7874c286	MINOR: tinfo: split the signal handler report flags into 3 While signals are not recursive, one signal (e.g. wdt) may interrupt another one (e.g. debug). The problem this causes is that when leaving the inner handler, it removes the outer's flag, hence the protection that comes with it. Let's just have 3 distinct flags for regular signals, debug signal and watchdog signal. We add a 4th definition which is an aggregate of the 3 to ease testing.	2025-02-24 13:37:52 +01:00
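The reason for splitting the flag, as described in commit `fb7874c286`, can be shown with a small sketch. The names and values are hypothetical, and the sketch ignores the atomic operations a real per-thread flag word would need.

```c
#include <stdint.h>

/* Hypothetical flag names and values, for illustration only. */
#define EX_FL_IN_SIG_HANDLER 0x01u  /* regular signal handler */
#define EX_FL_IN_DBG_HANDLER 0x02u  /* debug signal handler */
#define EX_FL_IN_WDT_HANDLER 0x04u  /* watchdog signal handler */
/* Aggregate of the three, to ease testing. */
#define EX_FL_IN_ANY_HANDLER \
    (EX_FL_IN_SIG_HANDLER | EX_FL_IN_DBG_HANDLER | EX_FL_IN_WDT_HANDLER)

/* Set the flag of the handler being entered. */
uint32_t handler_enter(uint32_t flags, uint32_t which)
{
    return flags | which;
}

/* Clear only the flag of the handler being left, so that an outer
 * handler interrupted by this one keeps its own protection. */
uint32_t handler_leave(uint32_t flags, uint32_t which)
{
    return flags & ~which;
}

/* Non-zero if the thread is inside any of the three handler kinds. */
int in_any_handler(uint32_t flags)
{
    return (flags & EX_FL_IN_ANY_HANDLER) != 0;
}
```

If the watchdog handler interrupts the debug handler and then returns, only the watchdog bit is cleared, so `in_any_handler()` still reports true until the debug handler itself leaves. With a single shared flag, the inner handler's exit would have cleared it for both.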
Willy Tarreau	bbf824933f	BUG/MINOR: h2: always trim leading and trailing LWS in header values Annika Wickert reported some occasional disconnections between haproxy and varnish when communicating over HTTP/2, with varnish complaining about protocol errors while captures looked apparently normal. Nils Goroll managed to reproduce this on varnish by injecting the capture of the outgoing haproxy traffic and noticed that haproxy was forwarding a header value containing a trailing space, which is now explicitly forbidden since RFC9113. It turns out that the only way for such a header to pass through haproxy is to arrive in h2 and not be edited, in which case it will arrive in HTX with its undesired spaces. Since the code dealing with HTX headers always trims spaces around them, these are not observable in dumps, but only when started in debug mode (-d). Conversions to/from h1 also drop the spaces. With this patch we trim LWS both on input and on output. This way we always present clean headers in the whole stack, and even if some are manually crafted by the configuration or Lua, they will be trimmed on the output. This must be backported to all stable versions. Thanks to Annika for the helpful capture and Nils for the help with the analysis on the varnish side!	2025-02-24 09:39:57 +01:00
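The trimming described in commit `bbf824933f` can be sketched as below. The function `trim_lws()` is a hypothetical helper, not HAProxy's implementation, and it treats only space and horizontal tab as linear white space.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: skips leading and trailing
 * spaces and tabs in a header value of <len> bytes. Returns a pointer to
 * the first kept byte and stores the trimmed length in <out_len>. The
 * input is not modified and does not need to be NUL-terminated. */
const char *trim_lws(const char *val, size_t len, size_t *out_len)
{
    /* drop leading LWS */
    while (len > 0 && (val[0] == ' ' || val[0] == '\t')) {
        val++;
        len--;
    }

    /* drop trailing LWS */
    while (len > 0 && (val[len - 1] == ' ' || val[len - 1] == '\t'))
        len--;

    *out_len = len;
    return val;
}
```

Applying the same trimming on both input and output, as the commit describes, means a value with a trailing space never reaches the peer regardless of where it was produced.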
Vincent Dechenaux	9011b3621b	MINOR: compression: Introduce minimum size This is the introduction of "minsize-req" and "minsize-res". These two options allow you to set the minimum payload size required for compression to be applied. This helps save CPU on both server and client sides when the payload does not need to be compressed.	2025-02-22 11:32:40 +01:00
Willy Tarreau	e7510d6230	CLEANUP: task: move the barrier after clearing th_ctx->current There's a barrier after releasing the current task in the scheduler. However it's improperly placed, it's done after pool_free() while in fact it must be done immediately after resetting the current pointer. Indeed, the purpose is to make sure that nobody sees the task as valid when it's in the process of being released. This is something that could theoretically happen if interrupted by a signal in the inlined code of pool_free() if the compiler decided to postpone the write to ->current. In practice since nothing fancy is done in the inlined part of the function, there's currently no risk of reordering. But it could happen if the underlying __pool_free() were to be inlined for example, and in this case we could possibly observe th_ctx->current pointing to something currently being destroyed. With the barrier between the two, there's no risk anymore.	2025-02-21 18:31:46 +01:00
Willy Tarreau	eb41d768f9	MINOR: tools: use only opportunistic symbols resolution As seen in issue #2861, dladdr_and_size() can be quite expensive and will often hold a mutex in the underlying library. It becomes a real problem when issuing lots of "show threads" or wdt warnings in parallel because threads will queue up waiting for each other to finish, adding to their existing latency that possibly caused the warning in the first place. Here we're taking a different approach. If the thread is not isolated and not panicking, it's doing unimportant stuff like showing threads or warnings. In this case we try to grab a lock, and if we fail because another thread is already there, we just pretend we cannot resolve the symbol. This is not critical because then we fall back to the already used case which consists in writing "main+<offset>". In practice this will almost never happen except in bad situations which could have otherwise degenerated.	2025-02-21 18:26:29 +01:00
Willy Tarreau	3c22fa315b	BUG/MEDIUM: stream: use non-blocking freq_ctr calls from the stream dumper The stream dump function is called from signal handlers (warning, show threads, panic). It makes use of read_freq_ctr() which might possibly block if it tries to access a locked freq_ctr in the process of being updated, e.g. by the current thread. Here we're relying on the non-blocking API instead. It may return incorrect values (typically smaller ones after resetting the curr counter) but at least it will not block. This needs to be backported to stable versions along with the previous commit below: MINOR: freq_ctr: provide non-blocking read functions At least 3.1 is concerned as the warnings tend to increase the risk of this situation appearing.	2025-02-21 18:26:29 +01:00
Willy Tarreau	29e246a84c	MINOR: freq_ctr: provide non-blocking read functions Some code called by the debug handlers in the context of a signal handler accesses some freq_ctr and occasionally ends up on a locked one from the same thread that is dumping it. Let's introduce a non-blocking version that at least allows to return even if the value is in the process of being updated, it's less problematic than hanging.	2025-02-21 18:26:29 +01:00
Willy Tarreau	84d4c948fc	BUG/MEDIUM: stream: never allocate connection addresses from signal handler In __strm_dump_to_buffer(), we call conn_get_src()/conn_get_dst() to try to retrieve the connection's IP addresses. But this function may be called from a signal handler to dump a currently running stream, and if the addresses were not allocated yet, a pool_alloc() will be performed while we might possibly already be running pools code, resulting in pool list corruption. Let's just make sure we don't call these sensitive functions there when called from a signal handler. This must be backported at least to 3.1 and ideally all other versions, along with this previous commit: MINOR: tinfo: add a new thread flag to indicate a call from a sig handler	2025-02-21 17:41:38 +01:00
Willy Tarreau	ddd173355c	MINOR: tinfo: add a new thread flag to indicate a call from a sig handler Signal handlers must absolutely not change anything, but some long and complex call chains may look innocuous at first glance, yet result in some subtle write accesses (e.g. pools) that can conflict with a running thread being interrupted. Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when entering a signal handler and cleared when leaving them. Note, we're speaking about real signal handlers (synchronous ones), not deferred ones. This will allow some sensitive call places to act differently when detecting such a condition, and possibly even to place a few new BUG_ON().	2025-02-21 17:41:38 +01:00
Willy Tarreau	a56dfbdcb4	BUG/MINOR: mux-h1: always make sure h1s->sd exists in h1_dump_h1s_info() This function may be called from a signal handler during a warning, a panic or a show thread. We need to be more cautious about what may or may not be dereferenced since an h1s is not necessarily fully initialized. Loops of "show threads" sometimes manage to crash when dereferencing a null h1s->sd, so let's guard it and add a comment reminding about the unusual call place. This can be backported to the relevant versions.	2025-02-21 17:41:38 +01:00
Willy Tarreau	9d5bd47634	BUG/MINOR: stream: do not call co_data() from __strm_dump_to_buffer() co_data() was instrumented to detect cases where c->output > data and emits a warning if that's not correct. The problem is that it happens quite a bit during "show threads" if it interrupts traffic anywhere, and that in some environments building with -DDEBUG_STRICT_ACTION=3, it will kill the process. Let's just open-code the channel functions that make access to co_data(), there are not that many and the operations remain very simple. This can be backported to 3.1. It didn't trigger in earlier versions because they didn't have this CHECK_IF_HOT() test.	2025-02-21 17:18:00 +01:00
Aurelien DARRAGON	97a19517ff	MINOR: clock: always use atomic ops for global_now_ms global_now_ms is shared between threads so we must give a hint to the compiler that read/write operations should be performed atomically. Everywhere global_now_ms was used, atomic ops were used, except in clock_update_global_date() where a read was performed without using atomic op. In practise it is not an issue because on most systems such reads should be atomic already, but to prevent any confusion or potential bug on exotic systems, let's use an explicit _HA_ATOMIC_LOAD there. This may be backported up to 2.8	2025-02-21 11:22:35 +01:00
Aurelien DARRAGON	9561b9fb69	BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers When the connection for sink_forward_{oc}_applet fails or a previous one is destroyed, the sft->appctx is instantly released. However process_sink_forward_task(), which may run at any time, iterates over all known sfts and tries to create sessions for orphan ones. It means that instantly after sft->appctx is destroyed, a new one will be created, thus a new connection attempt will be made. It can be an issue with tcp log-servers or sink servers, because if the server is unavailable, process_sink_forward() will keep looping without any temporisation until the applet survives (ie: connection succeeds), which results in unexpected CPU usage on the threads responsible for that task. Instead, we add a tempo logic so that a delay of 1 second is applied between two retries. Of course the initial attempt is not delayed. This could be backported to all stable versions.	2025-02-21 11:22:35 +01:00
Aurelien DARRAGON	c9d4192726	BUG/MINOR: log: fix outgoing abns address family While reviewing the code in an attempt to fix GH #2875, I stumbled on another case similar to `aac570c` ("BUG/MEDIUM: uxst: fix outgoing abns address family in connect()") that caused abns(z) addresses to fail when used as log targets. The underlying cause is the same as `aac570c`, which is the rework of the unix socket families in order to support custom addresses for different addressing schemes, where a real_family() was overlooked before passing a haproxy-internal address struct to socket-oriented syscall. To fix the issue, we first copy the target's addr, and then leverage real_family() to set the proper low-level address family that is passed to sendmsg() syscall. It should be backported in 3.1	2025-02-21 11:22:28 +01:00
Willy Tarreau	aac570cd03	BUG/MEDIUM: uxst: fix outgoing abns address family in connect() Since we reworked the unix socket families in order to support custom addresses for different addressing schemes, we've been using extra values for the ss_family field in sockaddr_storage. These ones have to be adjusted before calling bind() or connect(). It turns out that after the abns/abnsz updates in 3.1, the connect() code was not adjusted to take care of the change, resulting in AF_CUST_ABNS or AF_CUST_ABNSZ to be placed in the address that was passed to connect(). The right approach is to locally copy the address, get its length, fixup the family and use the fixed value and length for connect(). This must be backported to 3.1. Many thanks to @Mewp for reporting this issue in github issue #2875.	2025-02-21 07:59:08 +01:00
Valentine Krasnobaeva	390df282c1	BUG/MINOR: cfgparse: fix NULL ptr dereference in cfg_parse_peers When "peers" keyword is followed by more than one argument and it's the first "peers" section in the config, cfg_parse_peers() detects it and exits with "ERR_ALERT\|ERR_FATAL" err_code. So, upper layer parser, parse_cfg(), continues and parses the next keyword "peer" and then it tries to check the global cfg_peers, which should contain "my_cluster". The global cfg_peers is still NULL, because after alerting a user in alertif_too_many_args, cfg_parse_peers() exited. peers my_cluster __some_wrong_data__ peer haproxy1 1.1.1.1 1000 In order to fix this, let's add ERR_ABORT, if "peers" keyword is followed by more than one argument. This way parse_cfg() stops immediately and terminates haproxy with "too many args for peers my_cluster..." alert message. It's more reliable than adding checks "if (cfg_peers !=NULL)" in "peer" subparser, as we may have many "peers" sections. peers my_another_cluster peer haproxy1 1.1.1.2 1000 peers my_cluster __some_wrong_data__ peer haproxy1 1.1.1.1 1000 In addition, for the example above, parse_cfg() will parse all configuration until the end and only then terminate haproxy with the alert "too many args...". Peer haproxy1 will be wrongly associated with my_another_cluster. This fixes the issue #2872. This should be backported in all stable versions.	2025-02-20 17:10:26 +01:00
Christopher Faulet	851e52b551	BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK In the SPOP protocol, ACK frames with an empty payload are allowed. However, in that case, because only the payload is transferred, there is no data to return to the SPOE applet. Only the end of input is reported. Thus the applet is never woken up. It means that the SPOE filter will be blocked during the processing timeout and will finally return an error. To work around this issue, a NOOP action is introduced with the value 0. It is only an internal action for now. It does not exist in the SPOP protocol. When an ACK frame with an empty payload is received, this noop action is transferred to the SPOE applet, instead of nothing. Thanks to this trick, the applet is properly notified. This works because unknown actions are ignored by the SPOE filter. This patch must be backported to 3.1.	2025-02-20 11:56:27 +01:00
Christopher Faulet	efc46de294	BUG/MEDIUM: applet: Don't handle EOI/EOS/ERROR if applet is waiting for room The commit `7214dcd52` ("BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR") introduced a regression. Because of this patch, it was possible to handle EOI/EOS/ERROR applet flags too early while the applet was waiting for more room to transfer the last output data. This bug can be encountered with any applet using its own buffers (cache and stats for instance). And depending on the configuration and the timing, the data may be truncated or the stream may be blocked, infinitely or not. Streams blocked infinitely were observed with the cache applet and the HTTP compression enabled. For the record, it is important to detect EOI/EOS/ERROR applet flags to be able to report the corresponding event on the SE and by transitivity on the SC. Most of time, this happens when some data should be transferred to the stream. The .rcv_buf callback function is called and these flags are properly handled. However, some applets may also report them spontaneously, outside of any data transfer. In that case, the .rcv_buf callback is not called. It is the purpose of this patch (and the one above). Being able to detect pending EOI/EOS/ERROR applet flags. However, we must be sure to not handle them too early at this place. When these flags are set, it means no more data will be produced by the applet. So we must only wait to have transferred everything to the stream. And this happens when the applet is no longer waiting for more room. This patch must be backported to 3.1 with the one above.	2025-02-20 10:00:32 +01:00
Amaury Denoyelle	a7645d7cd5	MINOR: mux-quic/h3: support temporary blocking on control stream sending When HTTP/3 layer is initialized via QUIC MUX, it first emits a SETTINGS frame on an unidirectional control stream. However, this could be prevented if client did not provide initial flow control. Previously, QUIC MUX was unable to deal with such situation. Thus, the connection was closed immediately and no transfer could occur. Improve this by extending QUIC MUX application layer API : initialization may now return a transient error. This allows MUX to continue to use the connection normally. Initialization will be retried periodically later until it can succeed. This new API allows to deal with the flow control issue described above. Note that this patch is not considered as a bug fix. Indeed, clients are strongly advised to provide enough flow control for a SETTINGS frame exchange.	2025-02-19 11:08:02 +01:00
Amaury Denoyelle	06e7674399	MINOR: mux-quic/h3: emit SETTINGS via MUX tasklet handler Previously, QUIC MUX application layer was installed and initialized via MUX init. However, the latter stage involves I/O operations, for example when using HTTP/3 with the emission of a SETTINGS frame. Change this to prevent any I/O operations during MUX init. As such, finalize app_ops callback is now called during the first invocation of qcc_io_send(), in the context of MUX tasklet. To implement this, a new application state value is added, to detect the transition from NULL to INIT stage.	2025-02-19 11:03:40 +01:00
Amaury Denoyelle	188fc45b95	MINOR: mux-quic: define a QCC application state member Introduce a new QCC field to track the current application layer state. For the moment, only INIT and SHUT state are defined. This allows to replace the older flag QC_CF_APP_SHUT. This commit does not bring major changes. It is only necessary to permit future evolutions on QUIC MUX. The only noticeable change is that QMUX traces can now display this new field.	2025-02-19 10:59:53 +01:00
Christopher Faulet	b70921f2c1	BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut On shut, truncated HTX messages were not properly handled by the H2 multiplexer. Depending on how data were emitted, a chunked HTX message without the 0-CRLF could be considered as full and an empty data with ES flag set could be emitted instead of a RST_STREAM(CANCEL) frame. In the H2 multiplexer, when a shut is performed, an HTX message is considered as truncated if more HTX data are still expected. It is based on the presence or not of the H2_SF_MORE_HTX_DATA flag on the H2 stream. However, this flag is set or unset depending on the HTX extra field value. This field is used to state how much data must still be transferred, based on the announced data length. For a message with a content-length, this assumption is valid. But for a chunked message, it is not true. Only the length of the current chunk is announced. So we cannot rely on this field in that case to know if a message is full or not. Instead, we must rely on the HTX start-line flags to know if more HTX data are expected or not. If the xfer length is known (the HTX_SL_F_XFER_LEN flag is set on the HTX start-line), it means that more data are always expected, until the end of message is reached (the HTX_FL_EOM flag is set on the HTX message). This is true for bodyless message because the end of message is reported with the end of headers. This is also true for tunneled messages because the end of message is received before switching the H2 stream in tunnel mode. This patch must be backported as far as 2.8.	2025-02-18 17:34:59 +01:00
Amaury Denoyelle	2715dbe9d0	BUG/MINOR: mux-quic: prevent crash after MUX init failure qmux_init() may fail for several reasons. In this case, connection resources are freed and a CONNECTION_CLOSE will be emitted via its quic_conn instance. In case of qmux_init() failure, qcc_release() is used to clean up resources, but QCC <conn> member is first reset to NULL, as connection release must be delayed. Some cleanup operations are thus skipped, one of them being the resetting of <ctx> connection member to NULL. This may cause a crash as <ctx> is a dangling pointer after QCC release. One possible reproducer is to activate QMUX traces, which will cause a segfault on the qmux_init() error leave trace. To fix this, simply reset <ctx> to NULL manually on qmux_init() failure. This must be backported up to 3.0.	2025-02-18 11:02:46 +01:00
Amaury Denoyelle	2cdc4695cb	BUG/MINOR: quic: prevent crash on conn access after MUX init failure Initially, QUIC-MUX was responsible for resetting quic_conn <conn> member to NULL when MUX was released. This was performed via qcc_release(). However, qcc_release() is also used on qmux_init() failure. In this case, connection must be freed via its session, so QCC <conn> member is reset to NULL prior to qcc_release(), which prevents quic_conn <conn> member from also being reset. As the connection is freed soon after, quic_conn <conn> is a dangling pointer, which may cause crashes. This bug should be very rare as first it implies that QUIC-MUX initialization has failed (for example due to a memory alloc error). Also, <conn> member is rarely used by quic_conn instance. In fact, the only reproducible crash was done with QUIC traces activated, as in this case connection is accessed via quic_conn under __trace_enabled() function. To fix this, detach connection from quic_conn via the XPRT layer instead of the MUX. More precisely, this is performed via quic_close(). This should ensure that it will always be conducted, either on normal connection closure, or after special conditions such as MUX init failure. This should be backported up to 2.6.	2025-02-18 10:43:56 +01:00
William Lallemand	cd6a02ace9	MEDIUM: ssl/crtlist: "crt" keyword in frontend This patch implements the "crt" keyword in frontend, declaring an implicit crt-list named after the frontend. The patch is split in two steps: The first step is the crt keyword parser, which parses crt lines and fills a "cfg_crt_node" struct containing a ssl_bind_conf and a ckch_conf which are put in a list to be used later. After parsing the frontend section, as a 2nd step, a post_section_parser is called, it will create a crt-list named after the frontend and will fill it with certificates from the list of cfg_crt_node. Once created this crt-list will be loaded in every "ssl" bind line that didn't declare any crt or crt-list. Example: listen https bind :443 ssl crt foobar.pem crt test1.net.crt key test1.net.key Implements part of #2854	2025-02-17 18:26:37 +01:00
William Lallemand	82f927817e	MINOR: ssl/ckch: return from ckch_conf_clean() when conf is NULL ckch_conf_clean() mustn't be executed when the argument is NULL, this will keep the API more consistent, like any free() function.	2025-02-17 18:26:37 +01:00
William Lallemand	0330011acf	MINOR: ssl/crtlist: handle crt_path == cc->crt in crtlist_load_crt() Handle the case where crt_path == cc->crt, so the pointer doesn't get free'd before getting strdup'ed in crtlist_load_crt().	2025-02-17 18:26:37 +01:00
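The aliasing hazard described in the entry above can be sketched in plain C. This is only an illustration under stated assumptions: `ckch_conf_sketch`, `dup_str()` and `conf_set_crt()` are hypothetical names, not the HAProxy structures or the actual body of crtlist_load_crt(). The point is that the new string is duplicated before the old one is released, and that the case where both pointers are the same is handled explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a configuration holding an owned path. */
struct ckch_conf_sketch {
    char *crt; /* heap-allocated, owned by the structure, may be NULL */
};

/* Portable string duplication (avoids depending on POSIX strdup()). */
static char *dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* Stores a copy of <crt_path> in <cc>. Returns 0 on success, -1 on
 * allocation failure. Safe when <crt_path> is cc->crt itself.
 */
static int conf_set_crt(struct ckch_conf_sketch *cc, const char *crt_path)
{
    char *copy;

    assert(cc != NULL && crt_path != NULL);

    if (crt_path == cc->crt)
        return 0; /* same pointer: nothing to free, nothing to copy */

    copy = dup_str(crt_path); /* duplicate first ... */
    if (!copy)
        return -1;
    free(cc->crt);            /* ... release the old value afterwards */
    cc->crt = copy;
    return 0;
}
```

Duplicating before freeing would already be enough to avoid the use-after-free; the explicit pointer comparison simply saves an allocation in the aliased case.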
William Lallemand	69163cd63e	MINOR: ssl/crtlist: split the ckch_conf loading from the crtlist line parsing ckch_conf loading is not that simple as it requires to check - if the cert already exists in the ckchs_tree - if the ckch_conf is compatible with an existing cert in ckchs_tree - if the cert is a bundle which needs to load multiple ckch_store This logic could be reused elsewhere, so this commit introduces the new crtlist_load_crt() function which does that.	2025-02-17 18:26:37 +01:00
Christopher Faulet	ca79ed5eef	BUG/MINOR: fcgi: Don't set the status to 302 if it is already set When a "Location" header was found in a FCGI response, the status code was forced to 302. But it should only be performed if no status code was set first. So now, we take care to not override an already defined status code when the "Location" header is found. This patch should fix the issue #2865. It must be backported to all stable versions.	2025-02-17 16:37:53 +01:00
Christopher Faulet	34542d5ec2	BUG/MEDIUM: filters: Handle filters registered on data with no payload callback An HTTP filter with no http_payload callback function may be registered on data. In that case, this filter is obviously not called when some data are received but it remains important to update its internal state to be sure to keep it synchronized on the stream, especially its offset value. Otherwise, the wrong calculation on the global offset may be performed in flt_http_end(), leading to an integer overflow when data are moved from input to output. This overflow triggers a BUG_ON() in c_adv(). The same is true for TCP filters with no tcp_payload callback function. This patch must be backported to all stable versions.	2025-02-17 16:16:29 +01:00
Christopher Faulet	49b7bcf583	BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old worker On reload, the new worker requests bound FDs from the old one. The old worker sends them in messages of at most 252 FDs. Each message is acknowledged by the new worker. All messages sent or received by the old worker are handled manually via sendmsg/recv syscalls. So the old worker must be sure to consume all the ACK replies. However, the last one was never consumed. So it was considered as a command by the CLI applet. This issue was hidden until recently. But it was the root cause of the issue #2862. Note this last ACK is also the first one when there are fewer than 252 FDs to transfer. This patch must be backported to all stable versions.	2025-02-17 15:31:07 +01:00
Christopher Faulet	972ce87676	BUG/MEDIUM: cli: Be sure to drop all input data in END state Commit `7214dcd` ("BUG/MEDIUM: applet: Don't pretend to have more data to handle EOI/EOS/ERROR") revealed a bug with the CLI applet. Pending input data when the applet is in CLI_ST_END state were never consumed or dropped, leading to a wakeup loop. The CLI applet implements its own snd_buf callback function. It is important that it consumes all pending input data. Otherwise, the applet is woken up in a loop until it empties the request buffer. Another way to fix the issue would be to report an error. But in that case, it seems reasonable to drop these data. The issue can be observed on reload, in master/worker mode, because of the issue with the last ACK message, which was never consumed by the _getsocks() command. This patch should fix the issue #2862. It must be backported to 3.1 with the commit above.	2025-02-17 15:31:07 +01:00
William Lallemand	ab2fa95bdd	BUG/MINOR: startup: hap_register_feature() fix for partial feature name In patch `2fe4cbd8e` ("MINOR: startup: allow hap_register_feature() to enable a feature in the list"), the ability to overwrite a '-' in the feature list was added. However the code was not tokenizing the string correctly, and a partial feature name found in the name could result in having the same feature name multiple times. This patch rewrites the lookup of the string by tokenizing it correctly.	2025-02-17 14:56:09 +01:00
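The difference between a substring search and a token-based lookup, as described in the entry above, can be illustrated as follows. This is a minimal sketch, assuming a space-separated feature list in which each entry is prefixed by '+' or '-' (as printed by `haproxy -vv`); `find_feature()` and `enable_feature()` are hypothetical helpers and not the code of hap_register_feature().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Looks up <name> in the space-separated <list>, whose entries are assumed
 * to look like "+NAME" or "-NAME". Only a full token matches, so "SSL"
 * does not match "-OPENSSL". Returns a pointer to the sign character of
 * the matching token, or NULL if not found.
 */
static char *find_feature(char *list, const char *name)
{
    size_t nlen;
    char *p = list;

    assert(list != NULL && name != NULL);
    nlen = strlen(name);

    while (*p) {
        size_t tlen;

        p += strspn(p, " ");     /* skip separators */
        tlen = strcspn(p, " ");  /* length of the current token */
        if (tlen == nlen + 1 && (*p == '+' || *p == '-') &&
            memcmp(p + 1, name, nlen) == 0)
            return p;
        p += tlen;
    }
    return NULL;
}

/* Turns "-NAME" into "+NAME" in place. Returns 1 if the feature was found
 * in the list, 0 otherwise.
 */
static int enable_feature(char *list, const char *name)
{
    char *p = find_feature(list, name);

    if (!p)
        return 0;
    *p = '+';
    return 1;
}
```

A plain strstr() on "OPENSSL" would stop at the first token that merely contains the name (for example a longer feature sharing the same prefix), which is the kind of partial match the commit message refers to.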
William Lallemand	7268e9c249	BUG/MINOR: startup: leave at first post_section_parser which fails Since we are now iterating on post_section_parser() for a same keyword, we need to exit at the first ERR_ABORT. The post_section_parser() is called when parsing a new section, but also at the end of the file to be called for the last section. The changes in `4de86bb` ("MEDIUM: initcall: allow to register mutiple post_section_parser per section") should have added tests on the ERR_ABORT value. Also pcs->post_section_parser() must be called instead of cs->post_section_parser() because we could have a NULL ptr. This bug does not affect anything since we don't use REGISTER_CONFIG_POST_SECTION() yet.	2025-02-17 11:21:20 +01:00
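The iteration described in the entry above (several post-section callbacks for one keyword, stopping at the first abort) might look like the following sketch. Everything here is illustrative: `cfg_section_sketch`, `run_post_section_parsers()` and the numeric error values are assumptions made for the example, not the HAProxy definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ERR_NONE  0x00
#define ERR_ABORT 0x04 /* placeholder value chosen for this sketch */

/* Hypothetical registration entry: one section keyword, one optional
 * callback to run once the section has been parsed.
 */
struct cfg_section_sketch {
    const char *name;
    int (*post_section_parser)(void); /* may be NULL */
};

/* Runs every post-section callback registered for <name>, in order, and
 * stops at the first one reporting ERR_ABORT. Entries without a callback
 * are skipped instead of being dereferenced.
 */
static int run_post_section_parsers(const struct cfg_section_sketch *tab,
                                    size_t nb, const char *name)
{
    int err = ERR_NONE;
    size_t i;

    assert(tab != NULL && name != NULL);

    for (i = 0; i < nb; i++) {
        if (strcmp(tab[i].name, name) != 0)
            continue;
        if (!tab[i].post_section_parser)
            continue;
        err |= tab[i].post_section_parser();
        if (err & ERR_ABORT)
            break;
    }
    return err;
}

/* Stub callbacks used to observe how many callbacks actually ran. */
static int post_calls;
static int post_ok(void)    { post_calls++; return ERR_NONE; }
static int post_abort(void) { post_calls++; return ERR_ABORT; }
```

With the table `{frontend, NULL}, {frontend, post_ok}, {frontend, post_abort}, {frontend, post_ok}`, only two callbacks run for "frontend": the third callback is never reached because the second one aborts.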
Amaury Denoyelle	32691e7c25	MINOR: quic: support frame type as a varint QUIC frame type is encoded as a variable-length integer. Thus, 64-bit integer should be used for them. Until now, this was not the case as type was represented as a 1-byte char inside quic_frame structure. This does not cause any issue with QUIC from RFC9000, as all frame types fit in this range. Furthermore, a QUIC implementation is required to use the smallest size varint when encoding a frame type. However, the current code is unable to accept QUIC extensions with bigger frame types. This is notably the case for quic-on-streams draft. Thus, this commit readjusts quic_frame architecture to be able to support higher frame type values. First, type field of quic_frame is changed to a 64-bit variable. Both encoding and decoding frame functions use variable-length integer helpers to manipulate the frame type field. Secondly, the quic_frame builders/parsers infrastructure is still preserved. However, it could be impossible to define new large frame type as an index into quic_frame_builders / quic_frame_parsers arrays. Thus, wrapper functions are now provided to access the builders and parsers. Both qf_builder() and qf_parser() wrappers can then be extended to return custom builder/parser instances for larger frame type. Finally, unknown frame type detection also uses the new wrapper quic_frame_is_known(). As with builders/parsers, for large frame type, this function must be manually completed to support a new type value.	2025-02-14 09:00:05 +01:00
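For readers unfamiliar with the variable-length integer format mentioned in the entry above, here is a small self-contained sketch of the encoding defined in RFC 9000, section 16: the two most significant bits of the first byte give the length (1, 2, 4 or 8 bytes) and the remaining bits carry the value in network byte order. The function names are hypothetical and this is not the HAProxy implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes of the shortest varint encoding of <v>, or 0 if <v>
 * does not fit in 62 bits.
 */
static size_t varint_len(uint64_t v)
{
    if (v < (1ULL << 6))
        return 1;
    if (v < (1ULL << 14))
        return 2;
    if (v < (1ULL << 30))
        return 4;
    if (v < (1ULL << 62))
        return 8;
    return 0;
}

/* Encodes <v> in <buf> using the shortest form. Returns the number of
 * bytes written, or 0 if <v> is too large or <room> is too small.
 */
static size_t varint_encode(uint8_t *buf, size_t room, uint64_t v)
{
    size_t len = varint_len(v);
    size_t i;

    assert(buf != NULL);
    if (!len || len > room)
        return 0;

    /* value in network byte order */
    for (i = 0; i < len; i++)
        buf[len - 1 - i] = (uint8_t)(v >> (8 * i));

    /* the two top bits of the first byte hold log2(len) */
    buf[0] |= (uint8_t)((len == 1 ? 0 : len == 2 ? 1 : len == 4 ? 2 : 3) << 6);
    return len;
}

/* Decodes a varint from <buf>. Returns the number of bytes consumed, or 0
 * if fewer than needed are available.
 */
static size_t varint_decode(const uint8_t *buf, size_t avail, uint64_t *out)
{
    size_t len, i;
    uint64_t v;

    assert(buf != NULL && out != NULL);
    if (!avail)
        return 0;
    len = (size_t)1 << (buf[0] >> 6);
    if (len > avail)
        return 0;

    v = buf[0] & 0x3f;
    for (i = 1; i < len; i++)
        v = (v << 8) | buf[i];
    *out = v;
    return len;
}
```

A frame type that fits in 6 bits is therefore still emitted as a single byte, which is why a 1-byte field was sufficient for the frame types of RFC 9000 while larger extension types need the full 64-bit representation.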
William Lallemand	2fe4cbd8e5	MINOR: startup: allow hap_register_feature() to enable a feature in the list This patch allows hap_register_feature() to enable a feature in the list which was already registered and marked disabled. This way we could enable automatically some features under certain condition without the need of the USE argument with make and correctly report its activation.	2025-02-14 00:09:17 +01:00
William Lallemand	7034f2ca48	MINOR: ssl: store the filenames resulting from a lookup in ckch_conf With this patch, files resulting from a lookup (.key, .ocsp, *.issuer etc) are now stored in the ckch_conf. It allows to see the original filename from where it was loaded in "show ssl cert <filename>"	2025-02-13 17:44:00 +01:00
William Lallemand	0c0b38d64c	MINOR: ssl/cli: display more filenames in 'show ssl cert' "show ssl cert <file>" only displays a unique filename, which is the key used in the ckch_store tree. This patch extends it by displaying all filenames from the ckch_conf that can be configured with the crt-store. In order to be more consistent, some changes are needed in the future: - we need to store the complete path in the ckch_conf (meaning with crt-path or key-path) - we need to fill a ckch_conf in cases the files are autodiscovered	2025-02-13 16:18:06 +01:00
William Lallemand	5a7cbb8d81	BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals `1d3c8223` ("MINOR: ssl: allow to change the server signature algorithm") implemented the sigals keyword in the crt-list but never the dump of the keyword over the CLI. Must be backported as far as 2.8.	2025-02-12 17:16:50 +01:00
William Lallemand	037d2e5498	BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals `b6ae2aafde` ("MINOR: ssl: allow to change the signature algorithm for client authentication") implemented the client-sigals keyword in the crt-list but never the dump of the keyword over the CLI. Must be backported as far as 2.8.	2025-02-12 17:16:50 +01:00
Willy Tarreau	561319bd1c	BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED The crappy epoll API struck again with reloads and transferred FDs. Indeed, when listening sockets are retrieved by a new worker from a previous one, and the old one finally stops listening on them, it closes the FDs. But in this case, since the sockets themselves were not closed, epoll will not unregister them and will continue to report new activity for these in the old process, which can only observe, count an fd_poll_drop event and not unregister them since they're not reachable anymore. The unfortunate effect is that long-lasting old processes are woken up at the same rate as the new process when accepting new connections, and can waste a lot of CPU. Accept rates divided by 8 were observed on a small test involving a slow transfer on 10 connections facing a reload every second so that 10 processes were busy dealing with them while another process was hammering the service with new connections. Fortunately, years ago we implemented a flag FD_CLONED exactly for similar purposes. Let's simply mark transferred FDs with FD_CLONED so that the process knows that these ones require special treatment and have to be manually unregistered before being closed. This does the job fine, now old processes correctly unregister the FD before closing it and no longer receive accept events for the new process. This needs to be backported to all stable versions. It only affects epoll, as usual, and this time in combination with transferred FDs (typically reloads in master-worker mode). Thanks to Damien Claisse for providing all detailed measurements and statistics allowing to understand and reproduce the problem.	2025-02-12 16:35:01 +01:00
Amaury Denoyelle	e2744d23be	MINOR: quic: refactor CRYPTO encoding and splitting This patch is the direct follow-up of the previous one which refactored STREAM frame encoding. Reuse the newly defined quic_strm_frm_fillbuf() and quic_strm_frm_split() functions for CRYPTO frame encoding. The code for CRYPTO and STREAM frames encoding should now be clearer as it is mostly identical.	2025-02-12 15:10:54 +01:00
Amaury Denoyelle	f96af8e463	MINOR: quic: refactor STREAM encoding and splitting CRYPTO and STREAM frames encoding is similar. If payload is too large, frame will be split and only the first payload part will be written in the output QUIC packet. This process is complicated by the presence of a variable-length integer Length field prior to the payload. This commit aims at refactoring these operations. Define two functions to simplify the code : * quic_strm_frm_fillbuf() which is used to calculate the optimal frame length of a STREAM/CRYPTO frame with its payload in a buffer * quic_strm_frm_split() which is used to split the frame payload if buffer is too small With this patch, both functions are now implemented for STREAM encoding.	2025-02-12 15:10:03 +01:00
William Lallemand	4de86bbbfc	MEDIUM: initcall: allow to register mutiple post_section_parser per section Before this patch, REGISTER_CONFIG_SECTION() allowed to register one and only one callback (<post>) called after the parsing of a section. It was limiting because you couldn't register a post callback from anywhere else in the code. This patch introduces the new REGISTER_CONFIG_SECTION_POST() macro which allows to register a new post callback for a section keyword from anywhere. This patch introduces the feature by allowing `struct cfg_section` entries that do not have a `section_parser`, and then iterating on all cfg_section with a post_section_parser for a keyword.	2025-02-12 12:52:41 +01:00
William Lallemand	5c2039b5b8	CLEANUP: mworker: "program" section does not have a post_section_parser anymore The "program" section does not have a post_section_parser anymore so no need to make an exception for it.	2025-02-12 12:37:01 +01:00
William Lallemand	313eeae7db	BUG/MINOR: mworker: post_section_parser for the last section in discovery Previous patch `2c270a05f` ("BUG/MINOR: mworker: section ignored in discovery after a post_section_parser") needs an adjustment for the last section of the file. Indeed the post_section_parser of the last section must not be called in discovery mode. Must be backported in 3.1.	2025-02-12 12:34:57 +01:00
William Lallemand	2c270a05f0	BUG/MINOR: mworker: section ignored in discovery after a post_section_parser When a new section is discovered, the post_section_parser of the previous section is called. However in the new master-worker mode the discovery mode will skip the post_section_parser. But instead of trying to parse the current section keyword after that, it would skip completely the current line. This is a minor bug since there aren't a lot of sections with post_section_parser, and not a lot of sections to parse in discovery mode. But this could be reproduced like this: global expose-deprecated-directives resolvers res parse-resolv-conf program foo command sleep 10 program bar command sleep 10 The 'resolvers' section has a post_section_parser which will be ignored in discovery mode with the consequence of ignoring the first program section. This must be backported in 3.1.	2025-02-12 12:18:17 +01:00
Amaury Denoyelle	731340afbd	MINOR: quic: simplify length calculation for STREAM/CRYPTO frames STREAM and CRYPTO frames have a similar encoding format. In particular, both of them have a variable-length integer Length field just before the frame payload. It is complex to determine the optimal Length value before copying the payload data in the remaining buffer space. As such, helper functions were implemented to calculate this. However, CRYPTO and STREAM frames encoding implementation were not completely aligned, which renders the code harder to follow. The purpose of this commit is to simplify CRYPTO and STREAM frames encoding. First, a new helper quic_int_cap_length() is defined which is useful to determine the optimal buffer room available if prefixed by a variable-length integer as Length field. Then, processing of both CRYPTO and STREAM frames is now nearly identical, based on this new helper function. Functions max_available_room() and max_stream_data_size() are now unused and are removed.	2025-02-12 11:51:09 +01:00
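The difficulty described in the entry above is that the size of the Length field depends on the payload length it announces, which in turn depends on the room left once the Length field is accounted for. A minimal sketch of one way to compute the largest payload that fits is given below. `cap_length_sketch()` is a hypothetical helper written for this illustration; it is not claimed to match the actual quic_int_cap_length() function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the largest payload length L such that a variable-length integer
 * encoding of L followed by L bytes of payload fits in <room> bytes.
 * For each possible Length field size, the payload is capped both by the
 * remaining room and by the largest value that size can encode; the best
 * candidate is kept.
 */
static uint64_t cap_length_sketch(uint64_t room)
{
    static const struct {
        uint64_t size; /* bytes used by the Length field */
        uint64_t max;  /* largest value encodable in <size> bytes */
    } enc[] = {
        { 1, (1ULL << 6) - 1 },
        { 2, (1ULL << 14) - 1 },
        { 4, (1ULL << 30) - 1 },
        { 8, (1ULL << 62) - 1 },
    };
    uint64_t best = 0;
    size_t i;

    for (i = 0; i < sizeof(enc) / sizeof(enc[0]); i++) {
        uint64_t cand;

        if (room < enc[i].size)
            break;
        cand = room - enc[i].size;
        if (cand > enc[i].max)
            cand = enc[i].max;
        if (cand > best)
            best = cand;
    }
    return best;
}
```

The boundaries are where a naive "room minus field size" computation loses bytes: with 65 bytes of room only 63 bytes of payload fit, since announcing 64 bytes would require a 2-byte Length field and 66 bytes in total.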
Amaury Denoyelle	e6a223542a	BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding Function max_stream_data_size() is used to determine the payload length of a CRYPTO frame. It takes into account that the CRYPTO length field is a variable length integer. The implemented calculation was incorrect as it reserved too much space as a frame header. This error is mostly because max_stream_data_size() reuses max_available_room() which also reserves space for a variable length integer. This results in CRYPTO frames 1 to 2 bytes shorter than the maximum achievable value, which produces in the end datagrams shorter than the MTU. Fix max_stream_data_size() implementation. It is now merely a wrapper on max_available_room(). This ensures that CRYPTO frame encoding is now properly optimized to use the MTU available. This should be backported up to 2.6.	2025-02-12 11:51:09 +01:00
Amaury Denoyelle	63747452a3	BUG/MINOR: quic: reserve length field for long header encoding Long header packets have a mandatory Length field, which contains the size of Packet number and payload, encoded as a variable-length integer. Its value can thus only be determined after the payload size is known, which depends on the remaining buffer space after this variable-length field. Packet payloads are encoded in two steps. First, a list of input frames is processed until the packet buffer is full. CRYPTO and STREAM frames payload can be split if needed to fill the buffer. Real encoding is then performed as a second stage operation, first with Length field, then with the selected frames themselves. Before this patch, no space was reserved in the buffer for Length field when attaching the frames to the packet. This could result in an error as the packet payload would be too large for the remaining space. In practice, this issue was rarely encountered, mostly as a side-effect from another issue linked to CRYPTO frame encoding. Indeed, a wrong calculation is performed on CRYPTO splitting, which results in frame payload shorter by a few bytes than expected. This however ensured there would be always enough room for the Length field and payload during encoding. As CRYPTO frames are the only big enough content emitted with a Long header packet, this renders the current issue mostly non reproducible. Fix the original issue by reserving some space for Length field prior to frame payload calculation, using a maximum value based on the remaining room space. Packet length is then reduced if needed when encoding is performed, which ensures there is always enough room for the selected frames. Note that the other issue impacting CRYPTO frame encoding is not yet fixed. This could result in datagrams with Long header packets not completely extended to the full MTU. The issue will be addressed in another patch. This should be backported up to 2.6.	2025-02-12 11:51:09 +01:00
Willy Tarreau	627280e15f	MAJOR: leastconn: postpone the server's repositioning under contention When leastconn is used under many threads, there can be a lot of contention on leastconn, because the same node has to be moved around all the time (when picking it and when releasing it). In GH issue #2861 it was noticed that 46 threads out of 64 were waiting on the same lock in fwlc_srv_reposition(). In such a case, the accuracy of the server's key becomes quite irrelevant because nobody cares if the same server is picked twice in a row and the next one twice again. While other approaches in the past considered using a floating key to avoid moving the server each time (which was not compatible with the round-robin rule for equal keys), here a more drastic solution is needed. What we're doing instead is that we turn this lock into a trylock. If we can grab it, we do the job. If we can't, then we just wake up a server's tasklet dedicated to this. That tasklet will then try again slightly later, knowing that during this short time frame, the server's position in the queue is slightly inaccurate. Note that any thread touching the same server will also reposition it and save that work for next time. Also if multiple threads wake the tasklet up, then that's fine, their calls will be merged and a single lock will be taken in the end. Testing this on a 24-core EPYC 74F3 showed a significant performance boost from 382krps to 610krps. The performance profile reported by perf top dropped from 43% to 2.5%: Before: Overhead Shared Object Symbol 43.46% haproxy-master-inlineebo [.] fwlc_srv_reposition 21.20% haproxy-master-inlineebo [.] fwlc_get_next_server 0.91% haproxy-master-inlineebo [.] process_stream 0.75% [kernel] [k] ice_napi_poll 0.51% [kernel] [k] tcp_recvmsg 0.50% [kernel] [k] ice_start_xmit 0.50% [kernel] [k] tcp_ack After: Overhead Shared Object Symbol 30.37% haproxy [.] fwlc_get_next_server 2.51% haproxy [.] fwlc_srv_reposition 1.91% haproxy [.] 
process_stream 1.46% [kernel] [k] ice_napi_poll 1.36% [kernel] [k] tcp_recvmsg 1.04% [kernel] [k] tcp_ack 1.00% [kernel] [k] skb_release_data 0.96% [kernel] [k] ice_start_xmit 0.91% haproxy [.] conn_backend_get 0.82% haproxy [.] connect_server 0.82% haproxy [.] run_tasks_from_lists Tested on an Ampere Altra with 64 aarch64 cores dedicated to haproxy, the gain is even more visible (3.6x): Before: 311-323k rps, 3.16-3.25ms, 6400% CPU Overhead Shared Object Symbol 55.69% haproxy-master [.] fwlc_srv_reposition 33.30% haproxy-master [.] fwlc_get_next_server 0.89% haproxy-master [.] process_stream 0.45% haproxy-master [.] h1_snd_buf 0.34% haproxy-master [.] run_tasks_from_lists 0.32% haproxy-master [.] connect_server 0.31% haproxy-master [.] conn_backend_get 0.31% haproxy-master [.] h1_headers_to_hdr_list 0.24% haproxy-master [.] srv_add_to_idle_list 0.23% haproxy-master [.] http_request_forward_body 0.22% haproxy-master [.] __pool_alloc 0.21% haproxy-master [.] http_wait_for_response 0.21% haproxy-master [.] h1_send After: 1.21M rps, 0.842ms, 6400% CPU Overhead Shared Object Symbol 17.44% haproxy [.] fwlc_get_next_server 6.33% haproxy [.] process_stream 4.40% haproxy [.] fwlc_srv_reposition 3.64% haproxy [.] conn_backend_get 2.75% haproxy [.] connect_server 2.71% haproxy [.] h1_snd_buf 2.66% haproxy [.] srv_add_to_idle_list 2.33% haproxy [.] run_tasks_from_lists 2.14% haproxy [.] h1_headers_to_hdr_list 1.56% haproxy [.] stream_set_backend 1.37% haproxy [.] http_request_forward_body 1.35% haproxy [.] http_wait_for_response 1.34% haproxy [.] h1_send And at similar loads, the CPU usage considerably drops (3.55x), as well as the response time (10x): After: 320k rps, 0.322ms, 1800% CPU Overhead Shared Object Symbol 7.62% haproxy [.] process_stream 4.64% haproxy [.] h1_headers_to_hdr_list 3.09% haproxy [.] h1_snd_buf 3.08% haproxy [.] h1_process_demux 2.22% haproxy [.] __pool_alloc 2.14% haproxy [.] connect_server 1.87% haproxy [.] h1_send > 1.84% haproxy [.] 
fwlc_srv_reposition 1.84% haproxy [.] run_tasks_from_lists 1.77% haproxy [.] sock_conn_iocb 1.75% haproxy [.] srv_add_to_idle_list 1.66% haproxy [.] http_request_forward_body 1.65% haproxy [.] wake_expired_tasks 1.59% haproxy [.] h1_parse_msg_hdrs 1.51% haproxy [.] http_wait_for_response > 1.50% haproxy [.] fwlc_get_next_server The cost of fwlc_get_next_server() naturally increases as the server count increases, but now has no visible effect on updates. The load distribution remains unchanged compared to the previous approach, the weight still being respected. For further improvements to the fwlc algo, please consult github issue #881 which centralizes everything related to this algorithm.	2025-02-12 11:48:10 +01:00
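The "trylock, otherwise defer" idea described in the leastconn entry above can be sketched with C11 atomics. This is only an illustration under stated assumptions: `srv_sketch` and both functions are hypothetical, an `atomic_flag` stands in for the tree lock, and a pending flag stands in for waking up the server's tasklet. It does not reproduce fwlc_srv_reposition().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-server state for this sketch. */
struct srv_sketch {
    atomic_flag tree_lock;        /* stands in for the LB tree lock */
    atomic_bool requeue_pending;  /* stands in for "tasklet woken up" */
    unsigned int repositions;     /* number of repositionings performed */
};

/* Tries to reposition the server right now. If the lock is contended,
 * the work is not waited for: a deferred requeue is requested instead.
 * Returns true if the server was repositioned immediately.
 */
static bool srv_reposition_sketch(struct srv_sketch *s)
{
    assert(s != NULL);

    if (atomic_flag_test_and_set(&s->tree_lock)) {
        /* somebody else holds the lock: defer; multiple requests
         * collapse into a single pending flag.
         */
        atomic_store(&s->requeue_pending, true);
        return false;
    }
    s->repositions++;                 /* the real tree update would go here */
    atomic_flag_clear(&s->tree_lock);
    return true;
}

/* Deferred handler: performs the pending repositioning, if any. If the
 * lock is still contended, the request is simply armed again.
 */
static void srv_requeue_sketch(struct srv_sketch *s)
{
    assert(s != NULL);

    if (atomic_exchange(&s->requeue_pending, false))
        srv_reposition_sketch(s);
}
```

The trade-off is the one stated in the commit message: during the short window before the deferred handler runs, the server's position is slightly inaccurate, in exchange for threads no longer queuing on the same lock.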
Willy Tarreau	b6a8318cc2	MEDIUM: server: allocate a tasklet for asyncronous requeuing This creates a tasklet that only expects to be called when the LB algorithm is under contention when trying to reposition the server in its tree. Indeed, that's one of the operations that usually requires to take a write lock on a highly contended area, often for very little benefit under contention; indeed, under load, if a server keeps its previous position for a few extra microseconds, usually there's no harm. Thus this new tasklet can be woken up by the LB algo to ask the server to later call lbprm.server_requeue(). It does nothing else.	2025-02-11 17:24:09 +01:00
Willy Tarreau	20b8c4ddba	MINOR: lbprm: add a new callback ->server_requeue to the lbprm This callback will be used to reposition a server to its expected position regardless of the fact that it was taken or dropped. It will only be used by supporting LB algos. For now, only fwlc defines it and assigns it to fwlc_srv_reposition(). At the moment it's not used yet.	2025-02-11 17:16:14 +01:00
Willy Tarreau	eced1d6d8a	DEBUG: thread: reduce the struct lock_stat to store only 30 buckets Storing only 30 buckets means we only keep 256 bytes per label. This further simplifies address calculation and reduces the memory used without complicating the locking code. It means we won't measure wait times larger than a second but we're not supposed to face this as it would trigger the watchdog anyway. It may become a bit tight if measuring using rdtsc() instead of now_mono_time() though (typically the limit would be around 350ms for a 3 GHz CPU).	2025-02-10 18:34:43 +01:00
Willy Tarreau	c2f2d6fd3c	DEBUG: thread: make lock_stat per operation instead of for all operations It's more convenient (and more readable) to have the lock stats arranged by operation type (read, seek, write). It will also allow to later simplify the structure format and the bucket address calculation. Now lock_stat[] got split into lock_stats_rd[], lock_stats_sk[], lock_stats_wr[].	2025-02-10 18:34:43 +01:00
Willy Tarreau	4168d1278c	DEBUG: thread: don't keep the redundant _locked counter Now that we have our sums by bucket, the _locked counter is redundant since it's always equal to the sum of all entries. Let's just get rid of it and replace its consumption with a loop over all buckets, this will reduce the overhead of taking each lock at the expense of a tiny extra effort when dumping all locks, which we don't care about.	2025-02-10 18:34:43 +01:00
Willy Tarreau	a22550fbd7	DEBUG: thread: report the wait time buckets for lock classes In addition to the total/average wait time, we now also store the wait time in 2^N buckets. There are 32 buckets for each type (read, seek, write), allowing to store wait times from 1-2ns to 2.1-4.3s, which is quite sufficient, even if we'd want to switch from NS to CPU cycles in the future. The counters are only reported for non-zero buckets so as not to visually pollute the output. This significantly inflates the lock_stat struct, which is now aligned to 256 bytes and rounded up to 1kB. But that's not really a problem, given that there's only one per lock label.	2025-02-10 18:34:43 +01:00
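The 2^N bucketing described in the entry above can be sketched as follows: bucket N collects wait times in the range [2^N, 2^(N+1)) nanoseconds, so 32 buckets span 1-2ns up to roughly 2.1-4.3s. The names below are hypothetical, and the handling of a zero wait time and of values beyond the last bucket (both clamped) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAIT_BUCKETS 32

/* Hypothetical per-operation statistics: one counter per 2^N bucket. */
struct wait_stats_sketch {
    uint64_t buckets[WAIT_BUCKETS];
};

/* Returns the bucket index for a wait time of <ns> nanoseconds, i.e.
 * floor(log2(ns)), clamped to the last bucket. A zero wait time is
 * accounted in bucket 0.
 */
static unsigned int wait_bucket(uint64_t ns)
{
    unsigned int b = 0;

    while (ns > 1 && b < WAIT_BUCKETS - 1) {
        ns >>= 1;
        b++;
    }
    return b;
}

/* Accounts one lock acquisition that waited <ns> nanoseconds. */
static void wait_stats_record(struct wait_stats_sketch *st, uint64_t ns)
{
    assert(st != NULL);
    st->buckets[wait_bucket(ns)]++;
}

/* Total number of acquisitions, recomputed from the buckets instead of
 * being maintained in a separate counter.
 */
static uint64_t wait_stats_total(const struct wait_stats_sketch *st)
{
    uint64_t sum = 0;
    size_t i;

    assert(st != NULL);
    for (i = 0; i < WAIT_BUCKETS; i++)
        sum += st->buckets[i];
    return sum;
}
```

Summing the buckets on demand, as in `wait_stats_total()`, moves a small cost to the dump path and removes one counter update from every lock operation.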
Willy Tarreau	0b849c59fb	DEBUG: thread: make lock time computation more consistent The lock time computation was a bit inconsistent between functions, particularly those using a try_lock. Some of them would count the lock as taken without counting the time, others would simply not count it. This is essentially due to the way the time is retrieved, as it was done inside the atomic increment. Let's instead always use start_time to carry the elapsed time, by presetting it to the negative time before the event and adding the positive time after, so that it finally contains the duration. Then depending on the try lock's success, we add the result or not. This was generalized to all lock functions for consistency, and because this will be handy for future changes.	2025-02-10 18:34:43 +01:00
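The "negative time before, positive time after" pattern described in the entry above can be sketched as below. All names are hypothetical; the clock and the lock are passed as function pointers only so that the example stays self-contained, and the stubs at the end exist purely to exercise it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statistics for one lock label. */
struct lock_time_sketch {
    uint64_t nsec_wait;  /* cumulated wait time of successful attempts */
    uint64_t num_locked; /* number of successful attempts */
};

/* Tries to take <lock> via <try_lock> (0 means success) and measures the
 * time spent in the attempt. <start_time> first holds the negated start
 * date, then the end date is added so that it contains the duration.
 * Unsigned arithmetic wraps, which makes this well defined. The duration
 * is only accounted when the lock was actually taken.
 */
static int timed_trylock_sketch(struct lock_time_sketch *st,
                                uint64_t (*now_ns)(void),
                                int (*try_lock)(void *), void *lock)
{
    uint64_t start_time;
    int ret;

    assert(st != NULL && now_ns != NULL && try_lock != NULL);

    start_time = -now_ns();   /* negative time before the event */
    ret = try_lock(lock);
    start_time += now_ns();   /* now contains the elapsed time */

    if (ret == 0) {
        st->nsec_wait += start_time;
        st->num_locked++;
    }
    return ret;
}

/* Test doubles: a clock advancing by 5ns per call and two lock stubs. */
static uint64_t fake_clock_ns;
static uint64_t fake_now_ns(void)      { fake_clock_ns += 5; return fake_clock_ns; }
static int lock_free_stub(void *lock)  { (void)lock; return 0; }
static int lock_busy_stub(void *lock)  { (void)lock; return 1; }
```

Because the duration is computed the same way on every path, the only decision left after the attempt is whether to add it to the statistics.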
Willy Tarreau	99a88ee904	DEBUG: thread: report the spin lock counters as seek locks Technically speaking, spin locks use a seek lock, not a write lock, so better count them appropriately for consistency (lock time, or function calls count).	2025-02-10 18:34:43 +01:00
Willy Tarreau	7ddcdff33f	BUG/MEDIUM: debug: close a possible race between thread dump and panic() The rework of the thread dumping mechanism in 2.8 with commit `9a6ecbd590` ("MEDIUM: debug: simplify the thread dump mechanism") opened a small race, which is that a thread in the process of dumping other ones may block the other one from panicking while it's looping at the end of ha_thread_dump_fill(), or any other sequence involving the currently dumped one. This was emphasized in 3.1 with commit `148eb5875f` ("DEBUG: wdt: better detect apparently locked up threads and warn about them") that allowed to emit warnings about long-stuck threads, because in this case, what happens is that sometimes a thread starts to emit a warning (or a set of warnings), and while the warning is being awaited for, a panic finally happens and interrupts either the dumping thread, which never finishes and waits for the target's pointer to become NULL which will never happen since it was supposed to do it itself, or the currently dumped thread which could wait for the dumping thread to become ready while this one has not released the former. In order to address this, first we now make sure never to dump a thread that is already in the process of dumping another one. We're adding a new thread flag to know this situation, that is set in ha_thread_dump_fill() and cleared in ha_thread_dump_done(). And similarly, we don't trigger the watchdog on a thread waiting for another one to finish its dump, as it's likely a case of warning (and maybe even a panic) that makes them wait for each other and we don't want such cases to be reentrant. Finally, we check in the main polling loop that the flag never accidentally leaked (e.g. wrong flag manipulation) as this would be difficult to spot with bad consequences. This should be backported at least to 2.8, and should resolve github issue #2860. Thanks to Chris Staite for the very informative backtrace that exhibited the problem.	2025-02-10 18:34:26 +01:00
William Lallemand	3912780b1e	BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3 The clienthello callback was written when TLSv1.3 was not yet out, and signature algorithms changed since then. With TLSv1.2, the least significant byte was used to determine the SignatureAlgorithm, which could be rsa(1), dsa(2), ecdsa(3). https://datatracker.ietf.org/doc/html/rfc5246#section-7.4.1.4.1 This was used to choose which type of certificate to push to the client. But TLSv1.3 changed that, and introduced new RSA-PSS algorithms that do not have the least significant byte set to 1. https://datatracker.ietf.org/doc/html/rfc8446#section-4.2.3 This would result in choosing the wrong certificate when both an RSA and an ECDSA one are in the configuration for the same SNI or default entry. This patch fixes the issue by parsing both the hash and signature fields to check the RSA-PSS signature scheme. This must fix issue #2852. This must be backported to all stable versions. The code was moved from ssl_sock.c to ssl_clienthello in recent versions.	2025-02-07 20:56:42 +01:00
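The classification issue described in the entry above can be illustrated with the code points from the two RFCs it cites. In TLSv1.2 each signature_algorithms entry is a (hash, signature) byte pair with rsa(1) and ecdsa(3) as signature values; in TLSv1.3 the RSA-PSS schemes are 0x0804 to 0x0806 (rsa_pss_rsae_*) and 0x0809 to 0x080b (rsa_pss_pss_*), so their second byte is never 1. The sketch below is hypothetical and is not the HAProxy clienthello callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sig_kind { SIG_OTHER, SIG_RSA, SIG_ECDSA };

struct sig_caps {
    int has_rsa;
    int has_ecdsa;
};

/* Classifies one signature_algorithms entry from its two bytes. */
static enum sig_kind classify_sigalg(uint8_t hash, uint8_t sign)
{
    if (hash == 0x08) {
        /* TLSv1.3 code points: RSA-PSS is 0x0804..0x0806 and
         * 0x0809..0x080b; 0x0807 and 0x0808 are ed25519 and ed448.
         */
        if ((sign >= 0x04 && sign <= 0x06) || (sign >= 0x09 && sign <= 0x0b))
            return SIG_RSA;
        return SIG_OTHER;
    }

    switch (sign) {
    case 0x01: return SIG_RSA;   /* rsa(1) */
    case 0x03: return SIG_ECDSA; /* ecdsa(3) */
    default:   return SIG_OTHER;
    }
}

/* Scans the list of 2-byte entries in <ext> (<len> bytes) and reports
 * which certificate types the client can verify.
 */
static struct sig_caps scan_sigalgs(const uint8_t *ext, size_t len)
{
    struct sig_caps caps = { 0, 0 };
    size_t i;

    assert(ext != NULL || len == 0);

    for (i = 0; i + 1 < len; i += 2) {
        switch (classify_sigalg(ext[i], ext[i + 1])) {
        case SIG_RSA:   caps.has_rsa = 1;   break;
        case SIG_ECDSA: caps.has_ecdsa = 1; break;
        default:        break;
        }
    }
    return caps;
}
```

A client advertising only rsa_pss_rsae_sha256 (0x0804) would be seen as having no RSA support if only the second byte were compared with 1, which is the failure mode the commit describes.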
Willy Tarreau	8d63dc50ab	BUG/MINOR: debug: make sure the "debug dev sched" tasks don't block stopping When "debug dev sched" is used to pop up background tasks, these tasks are never stopped, so we must be careful to stop them when the stopping flag is set, otherwise they can prevent the process from stopping when sufficiently numerous (tests went as far as 100 million tasks, leading to the run queue never being completely purged in one poll round). No backport is needed since this is only used when debugging and tuning the scheduler.	2025-02-07 18:04:29 +01:00
Willy Tarreau	6765a32eb4	BUG/MINOR: debug: make "debug dev sched" accept a negative TID The TID passed to "debug dev sched" is used to pin the task to a given thread. A negative value normally means the task is unpinned and goes to the shared wait queue and run queue. However due to the type of the variable, negative values were mapped as highly positive values and were set to the current thread. Let's add the proper cast to fix this. No backport is needed since this is only used to experiment with the scheduler and measure its performance.	2025-02-07 18:04:29 +01:00
Christopher Faulet	d48b5add88	BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer A JSON integer is defined in the range [-(2^53)+1, (2^53)-1]. Macros are used to define the minimum and maximum values. The minimum one is defined using the maximum one. So JSON_INT_MAX must be defined as a signed integer value to avoid a wrong cast of JSON_INT_MIN. It was reported by Coverity in #2841: CID 1587769. This patch could be backported to all stable versions.	2025-02-06 17:19:49 +01:00
Christopher Faulet	bc487afc85	MINOR: filters: Improve errors formating during filters parsing The error messages reported by a filter during parsing are displayed between quotes. This is not really user-friendly. So let's remove the quotes here.	2025-02-06 17:03:40 +01:00
Christopher Faulet	b20e2c96cf	BUG/MINOR: flt-trace: Support only one name option When a trace filter is defined, only one 'name' option is expected. But it was not tested. Thus it was possible to set several names leading to a memory leak. It is now tested, and it is not allowed to redefine the trace filter name. It was reported by Coverity in #2841: CID 1587768. This patch could be backported to all stable versions.	2025-02-06 17:01:15 +01:00
Christopher Faulet	a7f513af91	BUG/MINOR: auth: Fix a leak on error path when parsing user's groups In a userlist section, when a user is parsed, if a specified group is not found, an error is reported. In this case we must take care to release the already built groups list. It was reported by Coverity in #2841: CID 1587770. This patch could be backported to all stable versions.	2025-02-06 16:55:37 +01:00
Christopher Faulet	a1e14d2a82	BUG/MINOR: config/userlist: Support one 'users' option for 'group' directive When a group is defined in a userlist section, only one 'users' option is expected. But it was not tested. Thus it was possible to set several options leading to a memory leak. It is now tested, and it is not allowed to redefine the users option. It was reported by Coverity in #2841: CID 1587771. This patch could be backported to all stable versions.	2025-02-06 16:55:29 +01:00
Christopher Faulet	75e8c8ed33	BUG/MINOR: cli: Fix a possible infinite loop in _getsocks() In the _getsocks() function, when we failed to set the unix socket in non-blocking mode, a goto to the "out" label led to an infinite loop. To fix the issue, we must only let the function exit. This patch should be backported to all stable versions.	2025-02-06 15:44:21 +01:00
Christopher Faulet	372cc696d4	BUG/MINOR: cli: Fix memory leak on error for _getsocks command Some errors in the parse function of the _getsocks command were not properly handled and immediately returned, leading to a memory leak on the cmsgbuf and tmpbuf buffers. To fix the issue, instead of immediately returning -1, we jump to the "out" label. Returning 1 instead of -1 in that case is valid. This was reported by Coverity in #2841: CIDs 1587773 and 1587772. This patch should be backported as far as 2.4.	2025-02-06 15:43:04 +01:00
Christopher Faulet	7e927243b9	BUG/MINOR: cli: Don't set SE flags from the cli applet Since the CLI was updated to use the new applet API, it should no longer directly set the SE flags. Instead, the corresponding applet flags must be set, using the applet API (applet_set_*). This is true for the CLI I/O handler but also for the commands' parse function and I/O callback function. This patch should be backported as far as 3.0.	2025-02-06 15:23:20 +01:00
Christopher Faulet	0aa69e7865	MINOR: mux-spop/mux-fcgi: Add support of the debug string for logs Now it is possible to have debug info about FCGI and SPOP multiplexers. To do so, the support for the MUX_SCTL_DBG_STR command was implemented for these muxes. To have this log message, the log-format must be set to: log-format "$HAPROXY_HTTP_LOG_FMT bs=<%[bs.debug_str]>"	2025-02-06 11:19:32 +01:00
Christopher Faulet	456cfa450a	MINOR: mux-fcgi: Dump info about connections and streams in dedicated functions The fcgi_show_fd() function was split to dump the info about the FCGI connections and the FCGI streams in dedicated functions, duplicating this way what is performed in other muxes. In addition, the FCGI multiplexer now implements the .show_sd callback function called by the "show sess" CLI command.	2025-02-06 11:19:32 +01:00
Christopher Faulet	bbc8c98a54	MINOR: tevt/mux-fcgi: Report termination events for the FCGI connect/stream Termination events are now reported for the FCGI connections and the FCGI streams. In addition, all available termination events logs are reported in the "show-fd" callback function. The .ctl and .sctl callback functions were also updated to support, respectively, MUX_CTL_TEVTS and MUX_SCTL_TEVTS commands.	2025-02-06 11:19:32 +01:00
Christopher Faulet	5b1c2277ae	BUG/MEDIUM: mux-fcgi: Propagate flags to SE in fcgi_strm_wake_one_stream The commit is flagged as a bug because the same fix on the H2 multiplexer was reported as a bug. But no issue was reported. When a stream is explicitly woken up by the FCGI connection, if an error condition is detected, the corresponding error flag is set on the SE. So SE_FL_ERROR or SE_FL_ERR_PENDING, depending on whether the end of stream was reported or not. However, there is no attempt to propagate other termination flags. We must be sure to properly set SE_FL_EOI and SE_FL_EOS when appropriate to be able to switch a pending error to a fatal error. Because of this bug, the SE could remain with a pending error and no end of stream, preventing the application stream from truly aborting it. It means that in some abort scenarios, it seems to be possible to block a stream infinitely. This patch depends on: * MEDIUM: mux-fcgi: Add a function to propagate termination flags from fstrm to SE * BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records This patch could be backported at least as far as 2.8 after a period of observation. However no bug was reported, so there is no rush.	2025-02-06 11:19:32 +01:00
Christopher Faulet	ccdca4bb77	MEDIUM: mux-fcgi: Add a function to propagate termination flags from fstrm to SE The function fcgi_strm_propagate_term_flags() was added to check the FSTRM state and evaluate when EOI/EOS/ERR_PENDING/ERROR flags must be set on the SE. It is not the only place where those flags are set. But it centralizes the synchro between the FCGI stream and the SC. For now, this function is only used at the end of fcgi_rcv_buf(). But it will be used to fix a potential bug.	2025-02-06 11:19:32 +01:00
Christopher Faulet	7b638eb1a6	MINOR: mux-spop: Implement .show_sd callback function The SPOP multiplexer now implements the .show_sd callback function called by "show sess" CLI command.	2025-02-06 11:19:32 +01:00
Christopher Faulet	5aeb678762	MINOR: mux-spop: Dump info about connections and streams in dedicated functions The spop_show_fd() function was split to dump the info about the SPOP connections and the SPOP streams in dedicated functions, duplicating this way what is performed in other muxes.	2025-02-06 11:19:32 +01:00
Christopher Faulet	eb4e517489	CLEANUP: mux-spop: Remove useless comments Just a small cleanup to remove some comments added during the development of the mux.	2025-02-06 11:19:32 +01:00
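
As a rough illustration of the RSA-PSS issue described in commit 3912780b1e above, the sketch below classifies one 16-bit entry of the signature_algorithms extension. It is not HAProxy's code: the enum and function names are hypothetical, and only the code point values are taken from RFC 5246 (section 7.4.1.4.1) and RFC 8446 (section 4.2.3). It assumes that only the rsa_pss_rsae_* schemes (0x0804 to 0x0806) should count as "RSA certificate acceptable"; how the other 0x08xx schemes ought to be handled is left open here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification, for illustration only. */
enum sig_kind { SIG_OTHER = 0, SIG_RSA, SIG_DSA, SIG_ECDSA };

/* TLSv1.2-era logic: only the least significant byte is inspected. */
static enum sig_kind classify_low_byte_only(uint16_t scheme)
{
    switch (scheme & 0xff) {
    case 1:  return SIG_RSA;    /* rsa(1)   */
    case 2:  return SIG_DSA;    /* dsa(2)   */
    case 3:  return SIG_ECDSA;  /* ecdsa(3) */
    default: return SIG_OTHER;
    }
}

/* Sketch of a check that looks at both bytes of the entry. */
static enum sig_kind classify_sigalg(uint16_t scheme)
{
    uint8_t hash = scheme >> 8;   /* TLSv1.2 HashAlgorithm byte */
    uint8_t sign = scheme & 0xff; /* TLSv1.2 SignatureAlgorithm byte */

    /* rsa_pss_rsae_sha256/384/512 are 0x0804, 0x0805 and 0x0806 */
    if (hash == 0x08 && sign >= 0x04 && sign <= 0x06)
        return SIG_RSA;

    /* Remaining 0x08xx values (ed25519, ed448, rsa_pss_pss_*) are
     * deliberately not mapped in this sketch.
     */
    if (hash == 0x08)
        return SIG_OTHER;

    return classify_low_byte_only(scheme);
}

/* A few sanity checks on well-known code points. */
static void self_check(void)
{
    assert(classify_low_byte_only(0x0804) == SIG_OTHER); /* RSA-PSS missed */
    assert(classify_sigalg(0x0804) == SIG_RSA);          /* RSA-PSS seen   */
    assert(classify_sigalg(0x0401) == SIG_RSA);          /* rsa_pkcs1_sha256 */
    assert(classify_sigalg(0x0403) == SIG_ECDSA);        /* ecdsa_secp256r1_sha256 */
}
```

With the low-byte-only check, a client offering only RSA-PSS and ECDSA schemes looks as if it did not support RSA at all, which matches the wrong certificate selection the commit message describes.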

19006 commits