haproxy

mirror of https://github.com/haproxy/haproxy.git synced 2026-04-29 10:11:49 -04:00

Author	SHA1	Message	Date
Amaury Denoyelle	35542ce7bf	MINOR: mux-quic: report local error on stream endpoint asap If an error is detected at the MUX layer, all remaining stream endpoints should be closed asap with error set. This is now done by checking for QC_CF_ERRL flag on qc_wake_some_streams() and qc_send_buf(). To complete this, qc_wake_some_streams() is called by qc_process() if needed. This should help to quickly release streams as soon as a new error is detected locally by the MUX or APP layer. This in turn allows the MUX instance itself to be freed. Previously, the error would not have been automatically reported until the transport layer closure occurred on CONNECTION_CLOSE emission. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	51f116d65e	MINOR: mux-quic: adjust local error API When a fatal error is detected by the QUIC MUX or H3 layer, the connection should be closed with a CONNECTION_CLOSE with an error code as the reason. Previously, a direct call was used to the quic_conn layer to try to close the connection. This API was adjusted to be more flexible. Now, when an error is detected, the function qcc_set_error() is called. This sets the flag QC_CF_ERRL with the error code stored by the MUX. The connection will be closed soon so most of the operations are not conducted anymore. Connection is then finally closed during qc_send() via quic_conn layer if QC_CF_ERRL is set. This will set the flag QC_CF_ERRL_DONE which indicates that the MUX instance can be freed. This model is cleaner and brings the following improvements : - interaction with quic_conn layer for closure is centralized on a single function - CO_FL_ERROR is not set anymore. This was incorrect as this should be reserved to errors reported by the transport layer to be similar with other haproxy components. As a consequence, qcc_is_dead() has been adjusted to check for QC_CF_ERRL_DONE to release the MUX instance. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	b8901d2c86	MINOR: mux-quic: wake up after recv only if avail data When HTX content is transferred from qcs instance to upper stream endpoint, a wakeup is conducted for MUX tasklet. However, this is only necessary if demux was interrupted due to a full QCS HTX buffer. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	8d44bfaf0b	MINOR: mux-quic: add trace event for local error Add a dedicated trace event QMUX_EV_QCC_ERR. This is used for locally detected error when a CONNECTION_CLOSE should be emitted. This should be backported up to 2.7.	2023-05-04 16:36:51 +02:00
Amaury Denoyelle	b737f95009	BUG/MINOR: mux-quic: prevent quic_conn error code to be overwritten When MUX performs a graceful shutdown, quic_conn error code is set to a "no error" code which depends on the application layer used. However, this may overwrite a previous error code if quic_conn layer has detected an error on its side. In practice, this behavior has not been seen on production. In fact, it may have undesirable effect only if this error code modification happens between the quic_conn error detection and the emission of the CONNECTION_CLOSE, so it should be pretty rare. However, there is still a tiny possibility it may happen. To prevent this, first check that quic_conn error code is not set before setting it. Ideally, transport layer API should be adjusted to be able to set this without fiddling with the quic_conn directly. This should be backported up to 2.6.	2023-05-04 16:36:51 +02:00
Christopher Faulet	4403cdf653	BUG/MEDIUM: mux-h2: Properly handle end of request to expect data from server The commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from server as long as request is unfinished") introduced a regression in the H2 multiplexer. The end of the request is not systematically handled to state a H2 stream on client side now expects data from the server. Indeed, while the client is uploading its request, the H2 stream warns it does not expect data from the server. This way, no server timeout is applied at this stage. When end of the request is detected, the H2 stream must state it now expects the server response. This enables the server timeout. However, it was only performed at one place while the end of the request can be handled at different places. First, during a zero-copy in h2_rcv_buf(). Then, when the SC is created with the full request. Because of this bug, it is possible to totally disable the server timeout for H2 streams. In h2_rcv_buf(), we now rely on h2s flags to detect the end of the request, but only when the rxbuf was emptied. It is a 2.8-specific bug. No backport needed.	2023-05-04 16:29:27 +02:00
Willy Tarreau	e5e62231d8	MINOR: debug: permit the "debug dev loop" to run under isolation Sometimes it's convenient to test the effect of tasks running under isolation, e.g. to validate the contents of the crash dumps. Let's add an optional "isolated" keyword to "debug dev loop" for this.	2023-05-04 11:50:26 +02:00
Willy Tarreau	b30ced3d88	BUG/MINOR: debug: fix incorrect profiling status reporting in show threads Thread dumps include a field "prof" for each thread that reports whether task profiling is currently active or not. It turns out that in 2.7-dev1, commit `680ed5f28` ("MINOR: task: move profiling bit to per-thread") mistakenly replaced it with a check for the current thread's bit in the thread dumps, which basically is the only place where another thread is being watched. The same mistake was done a few lines later by confusing threads_want_rdv_mask with the profiling mask. This mask disappeared in 2.7-dev2 with commit `598cf3f22` ("MAJOR: threads: change thread_isolate to support inter-group synchronization"), though instead we know the ID of the isolated thread. This commit fixes this and now reports "isolated" instead of "wantrdv". This can be backported to 2.7.	2023-05-04 11:41:33 +02:00
Willy Tarreau	8b3e39e37b	MINOR: activity: allow "show activity" to restart in the middle of a line 16kB buffers are not enough to dump 4096 threads with up to 10 bytes value on each line. By storing the column number in the applet's context, we can now restart from the last attempted column. This requires to dump all values as they are produced, but it doesn't cost that much: a 4096-thread output from a fresh process produces 300kB of output in ~8ms, or ~400us per call (19*16kB), most of which are spent in vfprintf(). Given that we don't print more than needed, it doesn't really change anything. The main caveat is that when interrupted on such large lines, there's a great possibility that the total or average on the first column doesn't match anymore the sum or average of all dumped values. In order to avoid this whenever possible (typically less than ~1500 threads), we first try to dump entire lines and only proceed one column at a time when we have to retry a failed dump. This is already the same for other stats that are dumped in an interruptible way anyway and there's little that can be done about it at this point (and not much immediately perceived benefit in doing this with extreme accuracy for >1500 threads).	2023-05-03 17:26:11 +02:00
Willy Tarreau	6ed0b9885d	MINOR: activity: allow "show activity" to restart dumping on any line When using many threads, it's difficult to see the end of "show activity" due to the numerous columns which fill the buffer. For example a dump of a 256-thread, freshly booted process yields around 15kB. Here by arranging the dump in a loop around a switch/case block where each case checks the code line number against the current dump position, we have a restartable counter for free with a granularity of the line of code, without having to maintain a matching between states and specific lines. It just requires to reset the trash buffer for each line and to try to dump it after each line. Now dumping 256 threads after a few seconds of traffic happily emits 20kB.	2023-05-03 17:24:54 +02:00
Willy Tarreau	8ee0d11cb8	MINOR: activity: iterate over all fields in a main loop for dumping Now each line of "show activity" will iterate over n+2 fields, one for the line header, one for the total, and one per thread. This will soon allow us to save the current state in a restartable way.	2023-05-03 17:24:54 +02:00
Willy Tarreau	a465b21516	MINOR: activity: show the line header inside the SHOW_VAL macro Doing so will allow us to drop the extra chunk_appendf() dedicated to the line header and simplify iteration over restartable columns.	2023-05-03 17:24:54 +02:00
Willy Tarreau	5ddf9bea09	MINOR: activity: use a single macro to iterate over all fields Instead of having SHOW_AVG() and SHOW_TOT(), let's just have SHOW_VAL() which iterates over all values.	2023-05-03 17:24:54 +02:00
Willy Tarreau	ff508f12c6	BUILD: cli: fix build on Windows due to isalnum() implemented as a macro Commit `986798718` ("DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets") broke the build on windows due to this: src/debug.c:940:95: error: array subscript has type char [-Werror=char-subscripts] 940 \| caller && may_access(caller) && may_access(caller->func) && isalnum(*caller->func) ? caller->func : "0", \| ^~~~~~~~~~~~~ It's classical on platforms which implement ctype.h as macros instead of functions, let's cast it as uchar. No backport is needed.	2023-05-03 16:32:50 +02:00
William Lallemand	117c7fde06	BUG/MINOR: ssl/sample: x509_v_err_str converter output when not found The x509_v_err_str converter now outputs the numerical value as a string when the corresponding constant name was not found. Must be backported as far as 2.7.	2023-05-03 15:19:38 +02:00
Willy Tarreau	9867987182	DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets When analyzing certain types of bugs in field, sometimes it would be nice to be able to wake up a task or tasklet to see how events progress (e.g. to detect a missing wakeup condition), or expire or kill such a task. This restricted command shows the current state of a task or tasklet and allows to manipulate it like this. However it must be used with extreme care because while it does verify that the pointers are mapped, it cannot know if they point to a real task, and performing such actions on something not a task will easily lead to a crash. In addition, performing a "kill" on a task has great chances of provoking a deferred crash due to a double free and/or another kill that is not idempotent. Use with extreme care!	2023-05-03 11:47:44 +02:00
Willy Tarreau	dd01448953	MINOR: debug: clarify "debug dev stream" help message The help message was insufficient to figure how to use it and specify the stream pointer and changes to operate.	2023-05-03 11:47:44 +02:00
Willy Tarreau	65efd33c06	BUG/MINOR: stream/cli: fix stream age calculation in "show sess" The "show sess" command displays the stream's age in synthetic form, and also makes it appear in the long version (show sess all). But that last one uses the wrong origin, it uses accept_date.tv_sec instead of accept_ts (formerly known as tv_accept). This was introduced in 1.4.2 with the long format, with commit `66dc20a17` ("[MINOR] stats socket: add show sess <id> to dump details about a session"), while the code that split the two variables was introduced in 1.3.16 with commit `b7f694f20` ("[MEDIUM] implement a monotonic internal clock"). This problem was revealed by recent change `ad5a5f677` ("MEDIUM: tree-wide: replace timeval with nanoseconds in tv_accept and tv_request") that made this value report random garbage, and generally emphasized by the fact that in 2.8 the two clocks have sufficiently large an offset for such mistakes to be noticeable early. Arguably a difference between date and accept_date could also make sense, to indicate if the stream had been there for more than 49 days, but this would introduce instabilities for most sockets (including negative times) for extremely rare cases while the goal is essentially to see how much longer than a configured timeout a stream has been there. And that's what other locations (including the short form) provide. This patch could be backported but most users will never notice. In case of backport, tv_accept.tv_sec should be used instead of accept_date.tv_sec.	2023-05-03 11:47:44 +02:00
William Lallemand	64a77e3ea5	MINOR: ssl: disable CRL checks with WolfSSL when no CRL file WolfSSL is enabling by default the CRL checks even if a CRL file wasn't provided. This patch resets the default X509_STORE flags so this is not checked by default.	2023-05-02 18:30:11 +02:00
Tim Duesterhus	0ababda701	BUG/MINOR: stats: fix typo in `TotalSplicedBytesOut` field name An additional `d` slipped in there. This likely should not be backported, because scripts might rely on the typoed name. Public discussion on this topic here: https://www.mail-archive.com/haproxy@formilux.org/msg43359.html	2023-05-02 11:15:49 +02:00
Amaury Denoyelle	bc0adfa334	MINOR: proxy: factorize send rate measurement Implement a new dedicated function increment_send_rate() which can be called anywhere new bytes must be accounted for global total sent.	2023-04-28 16:53:44 +02:00
Amaury Denoyelle	1bcb695a05	MINOR: quic: use real sending rate measurement Before this patch, global sending rate was measured on the QUIC lower layer just after sendto(). This meant that all QUIC frames were accounted for, including non STREAM frames and also retransmission. To have a better reflection of the application data transferred, move the incrementation into the MUX layer. This allows to account only for STREAM frames payload on their first emission. This should be backported up to 2.6.	2023-04-28 16:52:26 +02:00
Aleksandar Lazic	5529c9985e	MINOR: sample: Add bc_rtt and bc_rttvar This Patch adds fetch samples for backends round trip time.	2023-04-28 16:31:08 +02:00
Willy Tarreau	c05d30e9d8	MINOR: clock: replace the timeval start_time with start_time_ns Now that "now" is no more a timeval, there's no point keeping a copy of it as a timeval, let's also switch start_time to nanoseconds, it simplifies operations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	69530f59ae	MEDIUM: clock: replace timeval "now" with integer "now_ns" This puts an end to the occasional confusion between the "now" date that is internal, monotonic and not synchronized with the system's date, and "date" which is the system's date and not necessarily monotonic. Variable "now" was removed and replaced with a 64-bit integer "now_ns" which is a counter of nanoseconds. It wraps every 585 years, so if all goes well (i.e. if humanity does not need haproxy anymore in 500 years), it will just never wrap. This implies that now_ns is never null and that the zero value can reliably be used as "not set yet" for a timestamp if needed. This will also simplify date checks where it becomes possible again to do "date1<date2". All occurrences of "tv_to_ns(&now)" were simply replaced by "now_ns". Due to the intricacies between now, global_now and now_offset, all 3 had to be turned to nanoseconds at once. It's not a problem since all of them were solely used in 3 functions in clock.c, but they make the patch look bigger than it really is. The clock_update_local_date() and clock_update_global_date() functions are now much simpler as there's no need anymore to perform conversions nor to round the timeval up or down. The wrapping continues to happen by presetting the internal offset in the short future so that the 32-bit now_ms continues to wrap 20 seconds after boot. The start_time used to calculate uptime can still be turned to nanoseconds now. One interrogation concerns global_now_ms which is used only for the freq counters. It's unclear whether there's more value in using two variables that need to be synchronized sequentially like today or to just use global_now_ns divided by 1 million. Both approaches will work equally well on modern systems, the difference might come from smaller ones. Better not change anything for now.
One benefit of the new approach is that we now have an internal date with a resolution of the nanosecond and the precision of the microsecond, which can be useful to extend some measurements given that timestamps also have this resolution.	2023-04-28 16:08:08 +02:00
Willy Tarreau	eed5da1037	MINOR: clock: do not use now.tv_sec anymore Instead we're using ns_to_sec(tv_to_ns(&now)) which allows the tv_sec part to disappear. At this point, "now" is only used as a timeval in clock.c where it is updated.	2023-04-28 16:08:08 +02:00
Willy Tarreau	e8e4712771	MINOR: checks: use a nanosecond counters instead of timeval for checks->start Now we store the checks start date as a nanosecond timestamp instead of a timeval, this will simplify the operations with "now" in the near future.	2023-04-28 16:08:08 +02:00
Willy Tarreau	b68d308aec	MINOR: activity: use nanoseconds, not timeval to compute uptime Now that we have the required functions, let's get rid of the timeval in intermediary calculations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	563efe62e9	MINOR: stats: use nanoseconds, not timeval to compute uptime Now that we have the required functions, let's get rid of the timeval in intermediary calculations.	2023-04-28 16:08:08 +02:00
Willy Tarreau	ad5a5f6779	MEDIUM: tree-wide: replace timeval with nanoseconds in tv_accept and tv_request Let's get rid of timeval in storage of internal timestamps so that they are no longer mistaken for wall clock time. These were exclusively used subtracted from each other or to/from "now" after being converted to ns, so this patch removes the tv_to_ns() conversion to use them natively. Two occurrences of tv_isge() were turned to a regular wrapping subtract.	2023-04-28 16:08:08 +02:00
Willy Tarreau	aaebcae58b	MINOR: spoe: switch the timeval-based timestamps to nanosecond timestamps Various points were collected during a request/response and were stored using timeval. Let's now switch them to nanosecond based timestamps.	2023-04-28 16:08:08 +02:00
Willy Tarreau	76d343d3d3	MINOR: time: replace calls to tv_ms_elapsed() with a linear subtract Instead of operating on {sec, usec} now we convert both operands to ns then subtract them and convert to ms. This is a first step towards dropping timeval from these timestamps. Interestingly, tv_ms_elapsed() and tv_ms_remain() are no longer used at all and could be removed.	2023-04-28 16:08:08 +02:00
Willy Tarreau	7222db7b84	BUG/MINOR: stats: report the correct start date in "show info" The "show info" help for "Start_time_sec" says "Start time in seconds" so it's definitely the start date in human format, not the internal one that is solely used to compute uptime. Since commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"), both are split apart since the start time takes into account the offset needed to cause the early wraparound, so we must only use start_date here. No backport is needed.	2023-04-28 16:08:08 +02:00
Christopher Faulet	2ebac6a320	BUG/MEDIUM: tcpcheck: Don't eval custom expect rule on an empty buffer The commit `a664aa6a6` ("BUG/MINOR: tcpcheck: Be able to expect an empty response") introduced a regression for expect rules relying on a custom function. Indeed, there is no check on the buffer to be sure it is not empty before calling the custom function. But some of these functions expect to have data and don't perform any test on the buffer emptiness. So instead of fixing all custom functions, we just don't eval them if the buffer is empty. This patch must be backported but only if the commit above was backported first.	2023-04-28 15:01:10 +02:00
Christopher Faulet	89aeabff5b	BUG/MINOR: resolvers: Use sc_need_room() to wait more room when dumping stats It was a cut/paste typo during stream-interface to conn-stream refactoring. sc_have_room() was used instead of sc_need_room(). This patch must be backported as far as 2.6.	2023-04-28 08:51:34 +02:00
Christopher Faulet	e99c43907c	BUG/MEDIUM: spoe: Don't start new applet if there are enough idle ones It is possible to start too many applets on sporadic burst of events after an inactivity period. It is due to the way we estimate if a new applet must be created or not. It is based on a frequency counter. We compare the events processing rate against the number of events currently processed (in progress or waiting to be processed). But we should also take care of the number of idle applets. We already track the number of idle applets, but it is global and not per-thread. Thus we now also track the number of idle applets per-thread. It is not a big deal because this fills a hole in the spoe_agent structure. Thanks to this counter, we can refrain from creating applets if there are enough idle applets to handle currently processed events. This patch should be backported to all stable versions.	2023-04-28 08:51:34 +02:00
Willy Tarreau	d2f61de8c2	BUG/MINOR: hlua: return wall-clock date, not internal date in core.now() That's hopefully the last one affected by this. It was a bit trickier because there's the promise in the doc that the date is monotonous, so we continue to use now-start_time as the uptime value and add it to start_date to get the current date. It was also emphasized by commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"), causing core.now() to return a date of Mar 20 on Apr 27. No backport is needed.	2023-04-27 18:44:14 +02:00
Willy Tarreau	bc3c4e85f0	BUG/MINOR: trace: show wall-clock date, not internal date in show activity Yet another case where "now" was used instead of "date" for a publicly visible date that was already incorrect and became worse after commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"). No backport is needed.	2023-04-27 18:22:34 +02:00
Willy Tarreau	22b6d26c57	BUG/MINOR: calltrace: fix 'now' being used in place of 'date' Since commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot") we have a much clearer distinction between 'now' (the internal, drifting clock) and 'date' (the wall clock time). The calltrace code was using "now" instead of "date" since the value is displayed to humans. No backport is needed.	2023-04-27 18:14:57 +02:00
Willy Tarreau	fe1b3b8777	Revert "BUG/MINOR: clock: fix a few occurrences of 'now' being used in place of 'date'" This reverts commit `aadcfc9ea6`. The parts affecting the DeviceAtlas addon were wrong actually, the "now" variable was a local time_t in a file that's not compiled with the haproxy binary (dadwsch). Only the fix to the calltrace is correct, so better revert and fix the only one in a separate commit. No backport is needed.	2023-04-27 18:14:57 +02:00
Willy Tarreau	82bde18aa4	BUG/MINOR: activity: show wall-clock date, not internal date in show activity Another case where "now" was used instead of "date" for a publicly visible date that was already incorrect and became worse after commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"). No backport is needed.	2023-04-27 14:47:50 +02:00
Willy Tarreau	a5f0e6cfc0	BUG/MINOR: spoe: use "date" not "now" in debug messages The debug messages were still emitted with a date taken from "now" instead of "date", which was not correct a long time ago but which became worse in 2.8 since commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot"). Let's fix it. No backport is needed.	2023-04-27 11:57:53 +02:00
Willy Tarreau	aadcfc9ea6	BUG/MINOR: clock: fix a few occurrences of 'now' being used in place of 'date' Since commit `28360dc` ("MEDIUM: clock: force internal time to wrap early after boot") we have a much clearer distinction between 'now' (the internal, drifting clock) and 'date' (the wall clock time). There were still a few places where 'now' was being used for human consumption. No backport is needed.	2023-04-26 19:21:25 +02:00
Amaury Denoyelle	7b516d3732	BUG/MINOR: quic: fix race on quic_conns list during affinity rebind Each quic_conn is attached to a global thread-local quic_conns list used for "show quic" command. During thread rebinding, a connection is detached from its local list instance and moved to its new thread list. However this operation is not thread-safe and may cause a race condition. To fix this, only remove the connection from its list inside qc_set_tid_affinity(). The connection is inserted only afterwards in qc_finalize_affinity_rebind() on the new thread instance, thus preventing a race condition. One impact of this is that a connection will be invisible during rebinding for "show quic". A connection must not transition to closing state in between these two steps or else cleanup via quic_handle_stopping() may miss it. To ensure this, this patch relies on the previous commit : commit `d6646dddcc` MINOR: quic: finalize affinity change as soon as possible This should be backported up to 2.7.	2023-04-26 17:50:22 +02:00
Amaury Denoyelle	d6646dddcc	MINOR: quic: finalize affinity change as soon as possible During accept, a quic-conn is rebound to a new thread. This process is done in two steps : * first on the original thread via qc_set_tid_affinity() * then on the newly assigned thread via qc_finalize_affinity_rebind() Most quic_conn operations (I/O tasklet, task and quic_conn FD socket read) are reactivated only after the second step. However, there is a possibility that datagrams are handled before it via quic_dgram_parse() when using listener sockets. This does not seem to cause any issue but this may cause unexpected behavior in the future. To simplify this, qc_finalize_affinity_rebind() will be called both by qc_xprt_start() and quic_dgram_parse(). Only one invocation will be performed thanks to the new flag QUIC_FL_CONN_AFFINITY_CHANGED. This should be backported up to 2.7.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	a57ab0fabe	MINOR: mux-quic: do not allocate Tx buf for empty STREAM frame Sometimes it may be necessary to send an empty STREAM frame to signal clean stream closure with FIN bit set. Prior to this change, a Tx buffer was allocated unconditionally even if no data is transferred. Most of the time, allocation was not performed due to an older buffer being reused. But if data were already acknowledged, a new buffer is allocated. No memory leak occurs as the buffer is properly released when the acknowledgment of the empty frame is received. But this allocation is unnecessary and it consumes a connection Tx buffer for nothing. Improve this by skipping buffer allocation if no data to transfer. qcs_build_stream_frm() is now able to deal with a NULL out argument. This should be backported up to 2.6.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	42c5b75cac	MINOR: mux-quic: do not set buffer for empty STREAM frame Previous patch fixes an issue occurring with empty STREAM frames without payload. The crash was hidden in part because buf/data fields of qf_stream were set even if no payload is referenced. This was not the true cause of the crash but to ease future debugging, a STREAM frame built with no payload now has its buf and data fields set to NULL. This should be backported up to 2.6.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	19eaf88fda	BUG/MINOR: quic: prevent buggy memcpy for empty STREAM Sometimes it may be necessary to send empty STREAM frames with only the FIN bit set. For these frames, memcpy is thus unnecessary as their payload is empty. However, we did not prevent its invocation inside quic_build_stream_frame(). Normally, memcpy invocation with length==0 is safe. However, there is an extra condition in our function to handle data wrapping. For an empty STREAM frame in the context of MUX emission, this is safe as the frame points to a valid buffer which causes the wrapping condition to be false and resulting in a memcpy with 0 length. However, in the context of retransmission, this may lead to a crash. Consider the following scenario : two STREAM frames A and B are produced, one with payload and one empty with FIN set, pointing to the same stream_desc buffer. If A is acknowledged by the peer, its buffer is released as no more data is left in it. If B needs to be resent, the wrapping condition will be messed up due to the reuse of a freed buffer. Most of the time, <wrap> will be a negative number, which results in a memcpy invocation causing a buffer overflow. To fix this, simply add an extra condition to skip memcpy and wrapping check if STREAM frame length is null inside quic_build_stream_frame(). This crash is pretty rare as it relies on a lot of conditions difficult to reproduce. It seems to be the cause for the latest crashes reported under github issue #2120. In all the inspected dumps, the segfault occurred during retransmission with an empty STREAM frame being used as input. Thanks again to Tristan from Mangadex for his help and investigation on it. This should be backported up to 2.6.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	7c5591facb	BUG/MEDIUM: mux-quic: improve streams fairness to prevent early timeout Since the following mentioned patch, a send-list mechanism was implemented to improve streams prioritization on sending. commit `20f2a425ff` MAJOR: mux-quic: rework stream sending priorization This is done to prevent the same streams to always be used as first ones on emission. However there is still a flaw on the algorithm. Once put in the send-list, a stream is not removed until it has sent all of its content. When a stream transfers a large object, it will remain in the send-list for the whole transfer and will soon monopolize the first place. Other streams behind won't have the opportunity to advance on their own transfers due to a Tx buffer exhaustion. This situation is especially problematic if a small timeout client is used. As some streams won't advance on their transfer for a long period of time, they will be aborted due to a stream layer timeout client causing a RESET_STREAM emission. To fix this, during sending each stream with at least some bytes transferred from its tx.buf to qc_stream_desc out buffer is put at the end of the send-list. This ensures that on the next iteration streams that cannot transfer anything will be used in priority. This patch improves significantly h2load benchmarks for large objects with several streams opened in parallel on a single connection. Without it, errors may be reported by h2load for aborted streams. For example, this improved the following scenario on a 10mbit/s link with a 10s timeout client : $ ./build/bin/h2load --npn-list h3 -t 1 -c 1 -m 30 -n 30 https://198.18.10.11:20443/?s=500k This fix may help with the github issue #2004 where the Chrome browser stops using QUIC after receiving RESET_STREAM frames. This should be backported up to 2.7.	2023-04-26 17:50:16 +02:00
Amaury Denoyelle	24962dd178	BUG/MEDIUM: mux-quic: do not emit RESET_STREAM for unknown length Some HTX responses may not always contain a EOM block. For example this is the case if content-length header is missing from the HTTP server response. Stream termination is thus signaled to QUIC mux via shutw callback. However, this is interpreted unconditionally as an early close by the mux with a RESET_STREAM emission. Most of the time, QUIC clients report this as an error. To fix this, check if htx.extra is set to HTX_UNKOWN_PAYLOAD_LENGTH for a qcs instance. If true, shutw will never be used to emit a RESET_STREAM. Instead, the stream will be closed properly with a FIN STREAM frame. If all data were already transferred, an empty STREAM frame is sent. This fix may help with the github issue #2004 where the Chrome browser stops using QUIC after receiving RESET_STREAM frames. This issue was reported by Vladimir Zakharychev. Thanks to him for his help and testing. It was also reproduced locally using httpterm with the query string "/?s=1k&b=0&C=1". This should be backported up to 2.7.	2023-04-26 17:50:09 +02:00
Frédéric Lécaille	7d23e8d1a6	CLEANUP: quic: Rename several <buf> variables into quic_sock.c Rename some variables which are not struct buffer variables. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Frédéric Lécaille	bb426aa5f1	CLEANUP: quic: Rename <buf> variable into qc_parse_hd_form() There is no struct buffer variable manipulated by this function. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Frédéric Lécaille	6ff52f9ce5	CLEANUP: quic: Rename <buf> variable into quic_packet_read_long_header() Make this function be more readable: there is no struct buffer variable passed as parameter to this function. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Frédéric Lécaille	81a02b59f5	CLEANUP: quic: Rename several <buf> variables at low level Make quic_stateless_reset_token_cpy(), quic_derive_cid() and quic_get_cid_tid() be more readable: there is no struct buffer variable manipulated by these functions. Should be backported to 2.7.	2023-04-24 15:53:27 +02:00
Frédéric Lécaille	182934d80b	CLEANUP: quic: Rename quic_get_dgram_dcid() <buf> variable quic_get_dgram_dcid() does not manipulate any struct buffer variable. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	1e0f8255a1	CLEANUP: quic: Make qc_build_pkt() be more readable There is no <buf> variable passed to this function. Also rename <buf_end> to <end> to mimic other functions. Rename <beg> to <first_byte> and <end> to <last_byte>. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	3adb9e85a1	CLEANUP: quic: Rename <buf> variable for several low level functions Make quic_build_packet_long_header(), quic_build_packet_short_header() and quic_apply_header_protection() be more readable: there is no struct buffer variables used by these functions. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	bef3098d33	CLEANUP: quic: Rename <buf> variable into quic_rx_pkt_parse() Make this function be more readable: there is no struct buffer variable used by this function. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	7f0b1c7016	CLEANUP: quic: Rename <buf> variable into quic_padding_check() Make quic_padding_check() be more readable: there is no struct buffer variable used by this function. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	dad0ede28a	CLEANUP: quic: Rename <buf> variable to <token> in quic_generate_retry_token() Make quic_generate_retry_token() be more readable: there is no struct buffer variable used in this function. Should be backported to 2.7.	2023-04-24 15:53:26 +02:00
Frédéric Lécaille	e66d67a1ae	CLEANUP: quic: Remove useless parameters passes to qc_purge_tx_buf() Remove the pointer to the connection passed as a parameter to qc_purge_tx_buf() and to the other similar functions which came with the qc_purge_tx_buf() implementation. They were there to track the connection during tests. Must be backported to 2.7.	2023-04-24 15:53:26 +02:00
Amaury Denoyelle	d5f03cd576	CLEANUP: quic: rename frame variables Rename all frame variables with the suffix _frm. This helps to differentiate frame instances from other internal objects. This should be backported up to 2.7.	2023-04-24 15:35:22 +02:00
Amaury Denoyelle	888c5f283a	CLEANUP: quic: rename frame types with an explicit prefix Each frame type used in quic_frame union has been renamed with the following prefix "qf_". This helps to differentiate frame instances from other internal objects. This should be backported up to 2.7.	2023-04-24 15:35:03 +02:00
Frédéric Lécaille	b73762ad78	BUG/MINOR: quic: Useless I/O handler task wakeups (draining, killing state) From the idle_timer_task(), the I/O handler must be woken up to send ack. But there is no reason to do that in draining state or killing state. In draining state this is even forbidden. Must be backported to 2.7.	2023-04-24 11:47:11 +02:00
Frédéric Lécaille	d21c628ffd	BUG/MINOR: quic: Useless probing retransmission in draining or killing state The timer task responsible for triggering probing retransmissions did not inspect the state of the connection before doing its job. But there is no need to probe the peer when the connection is in draining or killing state. In the draining state, this is even forbidden. Must be backported to 2.7 and 2.6.	2023-04-24 11:46:33 +02:00
Frédéric Lécaille	c6bec2a3af	BUG/MINOR: quic: Possible leak during probing retransmissions qc_dgrams_retransmit() prepares two lists of frames to be retransmitted in two datagrams. If the first datagram could not be sent, the TX buffer was purged along with the prepared packet and its frames, but the second list of frames was not released. Must be backported to 2.7.	2023-04-24 11:38:28 +02:00
Frédéric Lécaille	ce0bb338c6	BUG/MINOR: quic: Possible memory leak from TX packets This bug arrived with this commit which was not sufficient: BUG/MEDIUM: quic: Missing TX buffer draining from qc_send_ppkts() Indeed, there were also remaining allocated TX packets to be released and their TX frames. Implement qc_purge_tx_buf() to do so which depends on qc_free_tx_coalesced_pkts() and qc_free_frm_list(). Must be backported to 2.7.	2023-04-24 11:38:28 +02:00
Frédéric Lécaille	e95e00e305	MINOR: quic: Move traces at proto level These traces have already been useful to debug issues. Must be backported to 2.7 and 2.6.	2023-04-24 11:38:16 +02:00
Willy Tarreau	0e875cf291	MEDIUM: listener: switch the default sharding to by-group Sharding by-group is exactly identical to by-process for a single group, and will use the same number of file descriptors for more than one group, while significantly lowering the kernel's locking overhead. Now that all special listeners (cli, peers) are properly handled, and that support for SO_REUSEPORT is detected at runtime per protocol, there should be no more reason for not switching to by-group by default. That's what this patch does. It does only this and nothing else so that it's easy to revert, should any issue be raised. Testing on an AMD EPYC 74F3 featuring 24 cores and 48 threads distributed into 8 core complexes of 3 cores each, shows that configuring 8 groups (one per CCX) is sufficient to simply double the forwarded connection rate from 112k to 214k/s, reducing kernel locking from 71 to 55%.	2023-04-23 10:18:16 +02:00
Willy Tarreau	7310164b2c	MINOR: listener: add a new global tune.listener.default-shards setting This new setting accepts "by-process", "by-group" and "by-thread" and will dictate how listeners will be sharded by default when nothing is specified. While the default remains "by-process", "by-group" should be much more efficient with many threads, while not changing anything for single-group setups.	2023-04-23 09:46:15 +02:00
Willy Tarreau	c38499ceae	MINOR: listener: do not restrict CLI to first group anymore Now that we're able to run listeners on any set of groups, we don't need to maintain a special case about the stats socket anymore. It used to be forced to group 1 only so as to avoid startup failures in case several groups were configured, but if it's done now, it will automatically bind the needed FDs to have one per group so this is no more an issue.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f1003ea7fa	MINOR: protocol: perform a live check for SO_REUSEPORT support When testing if a protocol supports SO_REUSEPORT, we're now able to verify if the OS does really support it. While it may be supported at build time, it may possibly have been blocked in a container for example so we'd rather know what it's like.	2023-04-23 09:46:15 +02:00
Willy Tarreau	b073573c10	MINOR: sock: add a function to check for SO_REUSEPORT support at runtime The new function _sock_supports_reuseport() will be used to check if a protocol type supports SO_REUSEPORT or not. This will be useful to verify that shards can really work.	2023-04-23 09:46:15 +02:00
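A runtime probe in the spirit of commit b073573c10 could look like the sketch below. It is an assumption-laden illustration, not the body of `_sock_supports_reuseport()`: the name `probe_reuseport()` is hypothetical, only an IPv4 TCP socket is tried, and the probe merely checks that the kernel accepts the option on a fresh socket, which is weaker than verifying that two sockets can really share a port.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if setting SO_REUSEPORT on a fresh IPv4 TCP socket works,
 * otherwise 0 (option unknown at build time, socket creation refused,
 * or option rejected at run time, e.g. inside a restricted container).
 */
int probe_reuseport(void)
{
#ifdef SO_REUSEPORT
    int one = 1;
    int ok;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return 0;

    ok = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == 0;
    close(fd);
    return ok;
#else
    return 0;
#endif
}
```

The build-time `#ifdef` and the run-time `setsockopt()` result are deliberately kept separate, since the commit's point is that the former does not guarantee the latter.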
Willy Tarreau	8a5e6f4cca	MINOR: protocol: add a function to check if some features are supported The new function protocol_supports_flag() checks the protocol flags to verify if some features are supported, but will support being extended to refine the tests. Let's use it to check for REUSEPORT.	2023-04-23 09:46:15 +02:00
Willy Tarreau	c1fbdd6397	MINOR: listener: automatically adjust shards based on support for SO_REUSEPORT Now if multiple shards are explicitly requested, and the listener's protocol doesn't support SO_REUSEPORT, sharding is disabled, which will result in the socket being automatically duped if needed. A warning is emitted when this happens. If "shards by-group" or "shards by-thread" are used, these will automatically be turned down to 1 since we want this to be possible easily using -dR on the command line without having to adjust the config. For "by-thread", a diag warning will be emitted to help troubleshoot possible performance issues.	2023-04-23 09:46:15 +02:00
Willy Tarreau	785b89f551	MINOR: protocol: move the global reuseport flag to the protocols Some protocol support SO_REUSEPORT and others not. Some have such a limitation in the kernel, and others in haproxy itself (e.g. sock_unix cannot support multiple bindings since each one will unbind the previous one). Also it's really protocol-dependent and not just family-dependent because on Linux for some time it was supported for TCP and not UDP. Let's move the definition to the protocols instead. Now it's preset in tcp/udp/quic when SO_REUSEPORT is defined, and is otherwise left unset. The enabled() config condition test validates IPv4 (generally sufficient), and -dR / noreuseport all protocols at once.	2023-04-23 09:46:15 +02:00
Willy Tarreau	65df7e028d	MINOR: protocol: add a flags field to store info about protocols We'll use these flags to know if some protocols are supported, and if so, with what options/extensions. Reuseport will move there for example. Two functions were added to globally set/clear a flag.	2023-04-23 09:46:15 +02:00
Willy Tarreau	a22db6567f	MEDIUM: peers: call bind_complete_thread_setup() to finish the config The listeners in peers sections were still not handling thread groups correctly. Shards were silently ignored and if a listener was bound to more than one group, it would simply fail. Now we can call the dedicated function to resolve all this and possibly create the missing extra listeners. bind_complete_thread_setup() was adjusted to use proxy_type_str() instead of writing "proxy" at the only place where this word was still hard-coded so that we continue to speak about peers sections when relevant.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f6a8444f55	REORG: listener: move the bind_conf's thread setup code to listener.c What used to be only two lines to apply a mask in a loop in check_config_validity() grew into a 130-line block that performs deeply listener-specific operations that do not have their place there anymore. In addition it's worth noting that the peers code still doesn't support shards nor being bound to more than one group, which is a second reason for moving that code to its own function. Nothing was changed except recreating the missing variables from the bind_conf itself (the fe only).	2023-04-23 09:46:15 +02:00
Willy Tarreau	e1a0107f9c	BUG/MINOR: config: fix NUMA topology detection on FreeBSD In 2.6-dev1, NUMA topology detection was enabled on FreeBSD with commit `f5d48f8b3` ("MEDIUM: cfgparse: numa detect topology on FreeBSD."). But it suffers from a minor bug which is that it forgets to check for the number of domains and always emits a confusing warning indicating that multiple sockets were found while it's not the case. This can be backported to 2.6.	2023-04-23 09:46:15 +02:00
Willy Tarreau	997ad155fe	BUG/MINOR: tools: check libssl and libcrypto separately The lib compatibility checks introduced in 2.8-dev6 with commit `c3b297d5a` ("MEDIUM: tools: further relax dlopen() checks too consider grouped symbols") were partially incorrect in that they check at the same time libcrypto and libssl. But if loading a library that only depends on libcrypto, the ssl-only symbols will be missing and this might present an inconsistency. This is what is observed on FreeBSD 13.1 when libcrypto is being loaded, where it sees two symbols having disappeared. The fix consists in splitting the checks for libcrypto and libssl. No backport is needed, unless the patch above finally gets backported.	2023-04-23 09:46:15 +02:00
Willy Tarreau	9f53b7b41a	BUG/MINOR: sock_inet: use SO_REUSEPORT_LB where available On FreeBSD 13.1 I noticed that thread balancing using shards was not always working. Sometimes several threads would work, but most of the time a single one was taking all the traffic. This is related to how SO_REUSEPORT works on FreeBSD since version 12, as it seems there is no guarantee that multiple sockets will receive the traffic. However there is SO_REUSEPORT_LB that is designed exactly for this, so we'd rather use it when available. This patch may possibly be backported, but nobody complained and it's not sure that many users rely on shards. So better wait for some feedback before backporting this.	2023-04-23 09:46:15 +02:00
Ilya Shipitsin	ccf8012f28	CLEANUP: assorted typo fixes in the code and comments This is 36th iteration of typo fixes	2023-04-23 09:44:53 +02:00
Willy Tarreau	023c311d70	BUG/MINOR: cli: clarify error message about stats bind-process In 2.7-dev2, "stats bind-process" was removed by commit `94f763b5e` ("MEDIUM: config: remove deprecated "bind-process" directives from frontends") and an error message indicates that it's no more supported. However it says "stats" is not supported instead of "stats bind-process", making it a bit confusing. This should be backported to 2.7.	2023-04-23 09:40:56 +02:00
Tim Duesterhus	1307cd42d2	CLEANUP: Stop checking the pointer before calling `ring_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { ring_free(e); - } @@ expression e; @@ - if (e) { ring_free(e); - } @@ expression e; @@ - if (e) ring_free(e); @@ expression e; @@ - if (e != NULL) ring_free(e);	2023-04-23 00:28:25 +02:00
Tim Duesterhus	fe83f58906	CLEANUP: Stop checking the pointer before calling `task_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { task_destroy(e); - } @@ expression e; @@ - if (e) { task_destroy(e); - } @@ expression e; @@ - if (e) task_destroy(e); @@ expression e; @@ - if (e != NULL) task_destroy(e);	2023-04-23 00:28:25 +02:00
Tim Duesterhus	c18e244515	CLEANUP: Stop checking the pointer before calling `pool_free()` Changes performed with this Coccinelle patch: @@ expression e; expression p; @@ - if (e != NULL) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) { pool_free(p, e); - } @@ expression e; expression p; @@ - if (e) pool_free(p, e); @@ expression e; expression p; @@ - if (e != NULL) pool_free(p, e);	2023-04-23 00:28:25 +02:00
Tim Duesterhus	b1ec21d259	CLEANUP: Stop checking the pointer before calling `tasklet_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { tasklet_free(e); - } @@ expression e; @@ - if (e) { tasklet_free(e); - } @@ expression e; @@ - if (e) tasklet_free(e); @@ expression e; @@ - if (e != NULL) tasklet_free(e); See GitHub Issue #2126	2023-04-23 00:28:25 +02:00
Willy Tarreau	8adffaa899	MINOR: listener: always compare the local thread as well By comparing the local thread's load with the least loaded thread's load, we can further improve the fairness and at the same time also improve locality since it allows a small ratio of connections not to be migrated. This is visible on CPU usage with long connections on very large thread counts (224) and high bandwidth (200G). The cost of checking the local thread's load remains fairly low so there's no reason not to do this. We continue to update the index if we select the local thread, because it means that the two other threads were both more loaded so we'd rather find better ones.	2023-04-21 17:41:26 +02:00
Willy Tarreau	ff18504d73	MINOR: listener: make sure to avoid ABA updates in per-thread index One limitation of the current thread index mechanism is that if the values are assigned multiple times to the same thread and the index loops, it can match again the old value, which will not prevent a competing thread from finishing its CAS and assigning traffic to a thread that's not the optimal one. The probability is low but the solution is simple enough and consists in implementing an update counter in the high bits of the index to force a mismatch in this case (assuming we don't try to cover for extremely unlikely cases where the update counter loops while the index remains equal). So let's do that. In order to improve the situation a little bit, we now set the index to a ulong so that in 32 bits we have 8 bits of counter and in 64 bits we have 40 bits.	2023-04-21 17:41:26 +02:00
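The update-counter idea from commit ff18504d73 can be sketched with C11 atomics. The names `IDX_BITS`, `idx_pos()` and `idx_update()` are hypothetical, and the 24-bit position width is only inferred from the counter sizes quoted in the message (8 bits on 32-bit, 40 bits on 64-bit); the real layout of the listener's index may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed split: low 24 bits hold the position, the rest is a counter. */
#define IDX_BITS 24
#define IDX_MASK ((1UL << IDX_BITS) - 1)

/* Extracts the position part of a packed index. */
static inline unsigned long idx_pos(unsigned long idx)
{
    return idx & IDX_MASK;
}

/* Tries to move the index from the previously observed value <seen> to
 * position <new_pos>. The counter in the high bits is bumped on every
 * successful update, so a competitor still holding an old value fails
 * its CAS even if the position has looped back to what it saw.
 */
bool idx_update(_Atomic unsigned long *idx, unsigned long seen,
                unsigned long new_pos)
{
    unsigned long next = ((seen & ~IDX_MASK) + (IDX_MASK + 1)) |
                         (new_pos & IDX_MASK);

    return atomic_compare_exchange_strong(idx, &seen, next);
}
```

As the commit message notes, this does not cover the extremely unlikely case where the counter itself wraps while the position stays equal.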
Willy Tarreau	77e33509c8	MINOR: listener: resync with the thread index before heavy calculations During heavy accept competition, the CAS will occasionally fail and we'll have to go through all the calculations again. While the first two loops look heavy, they're almost never taken so they're quite cheap. However the rest of the operation is heavy because we have to consult connection counts and queue indexes for other threads, so better double-check if the index is still valid before continuing. Tests show that it's more efficient to retry half-way like this.	2023-04-21 17:41:26 +02:00
Willy Tarreau	b657492680	MINOR: listener: use a common thr_idx from the reference listener Instead of seeing each listener use its own thr_idx, let's use the same for all those from a shard. It should provide more accurate and smoother thread allocation.	2023-04-21 17:41:26 +02:00
Willy Tarreau	9d360604bd	MEDIUM: listener: rework thread assignment to consider all groups Till now threads were assigned in listener_accept() to other threads of the same group only, using a single group mask. Now that we have all the relevant info (array of listeners of the same shard), we can spread the thr_idx to cover all assigned groups. The thread indexes now contain the group number in their upper bits, and the indexes run over the whole list of threads, all groups included. One particular subtlety here is that switching to a thread from another group also means switching the group, hence the listener. As such, when changing the group we need to update the connection's owner to point to the listener of the same shard that is bound to the target group.	2023-04-21 17:41:26 +02:00
Willy Tarreau	e6f5ab5afa	MINOR: listener: make accept_queue index atomic There has always been a race when checking the length of an accept queue to determine which one is more loaded than another, because the head and tail are read at two different moments. This is not required, we can merge them as two 16 bit numbers inside a single 32-bit index that is always accessed atomically. This way we read both values at once and always have a consistent measurement.	2023-04-21 17:41:26 +02:00
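The merged index of commit e6f5ab5afa can be illustrated as follows. This is a sketch under assumptions: `struct accept_ring`, `ring_pack()`, `ring_len()` and `QUEUE_SIZE` are hypothetical names, and placing the tail in the upper 16 bits is an arbitrary choice that the commit message does not specify.

```c
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SIZE 1024u   /* hypothetical ring size, power of two */

/* Hypothetical ring header: tail in bits 31..16, head in bits 15..0. */
struct accept_ring {
    _Atomic uint32_t idx;
};

/* Packs both positions into the single 32-bit word. */
static inline uint32_t ring_pack(uint16_t head, uint16_t tail)
{
    return ((uint32_t)tail << 16) | head;
}

/* Returns the number of queued entries. Head and tail come from one
 * atomic load, so they always describe the same instant.
 */
uint32_t ring_len(struct accept_ring *r)
{
    uint32_t idx  = atomic_load(&r->idx);
    uint32_t head = idx & 0xffff;
    uint32_t tail = idx >> 16;

    return (tail - head) & (QUEUE_SIZE - 1);
}
```

Reading the two fields with separate loads would allow a producer or consumer to move one of them in between, which is the race the commit removes.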
Willy Tarreau	09b52d1c3d	MEDIUM: config: permit to start a bind on multiple groups at once Now it's possible for a bind line to span multiple thread groups. When this happens, the first one will become the reference and will be entirely set up, and the subsequent ones will be duplicated from this reference, so that they can be registered in distinct groups. The reference is always setup and started first so it is always available when the other ones are started. The doc was updated to reflect this new possibility with its limitations and impacts, and the differences with the "shards" option.	2023-04-21 17:41:26 +02:00
Willy Tarreau	09e266e6f5	MINOR: proto: skip socket setup for duped FDs It's not strictly necessary, but it's still better to avoid setting up the same socket multiple times when it's being duplicated to a few FDs. We don't change that for inherited ones however since they may really need to be set up, so we only skip duplicated ones.	2023-04-21 17:41:26 +02:00
Willy Tarreau	0e1aaf4e78	MEDIUM: proto: duplicate receivers marked RX_F_MUST_DUP The different protocol's ->bind() function will now check the receiver's RX_F_MUST_DUP flag to decide whether to bind a fresh new listener from scratch or reuse an existing one and just duplicate it. It turns out that the existing code already supports reusing FDs since that was done as part of the FD passing and inheriting mechanism. Here it's not much different, we pass the FD of the reference receiver, it gets duplicated and becomes the new receiver's FD. These FDs are also marked RX_F_INHERITED so that they are not exported and avoid being touched directly (only the reference should be touched).	2023-04-21 17:41:26 +02:00
Willy Tarreau	aae1810b4d	MINOR: receiver: add a struct shard_info to store info about each shard In order to create multiple receivers for one multi-group shard, we'll need some more info about the shard. Here we store: - the number of groups (= number of receivers) - the number of threads (will be used for accept LB) - pointer to the reference rx (to get the FD and to find all threads) - pointers to the other members (to iterate over all threads) For now since there's only one group per shard it remains simple. The listener deletion code already takes care of removing the current member from its shards list and moving others' reference to the last one if it was their reference (so as to avoid o(n^2) updates during ordered deletes). Since the vast majority of setups will not use multi-group shards, we try to save memory usage by only allocating the shard_info when it is needed, so the principle here is that a receiver shard_info==NULL is alone and doesn't share its socket with another group. Various approaches were considered and tests show that the management of the listeners during boot makes it easier to just attach to or detach from a shard_info and automatically allocate it if it does not exist, which is what is being done here. For now the attach code is not called, but detach is already called on delete.	2023-04-21 17:41:26 +02:00
Willy Tarreau	84fe1f479b	MINOR: listener: support another thread dispatch mode: "fair" This new algorithm for rebalancing incoming connections to multiple threads is simpler and instead of considering the threads load, it will only cycle through all of them, offering a fair share of the traffic to each thread. It may be well suited for short-lived connections but is also convenient for very large thread counts where it's not always certain that the least loaded thread will always be found.	2023-04-21 17:41:26 +02:00
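The "fair" dispatch mode of commit 84fe1f479b cycles through threads without looking at their load. A minimal sketch of that idea, assuming a shared atomic cursor (the name `next_thread_fair()` and the cursor itself are hypothetical and not taken from listener.c):

```c
#include <stdatomic.h>

/* Returns the thread number that should receive the next connection,
 * cycling over <nbthreads> threads regardless of their current load.
 * <nbthreads> must be non-zero.
 */
unsigned int next_thread_fair(_Atomic unsigned int *cursor,
                              unsigned int nbthreads)
{
    return atomic_fetch_add(cursor, 1) % nbthreads;
}
```

When the cursor wraps around, the cycle may skip or repeat a few threads once if the thread count is not a power of two, which is harmless for load spreading.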
Willy Tarreau	6a4d48b736	MINOR: quic_sock: index li->per_thr[] on local thread id, not global one There's a li_per_thread array in each listener for use with QUIC listeners. Since thread groups were introduced, this array can be allocated too large because global.nbthread entries are allocated for each listener, while no more than MIN(nbthread,MAX_THREADS_PER_GROUP) may be used by a single listener. This was because the global thread ID is used as the index instead of the local ID (since a listener may only be used by a single group). Let's just switch to local ID and reduce the allocated size.	2023-04-21 17:41:26 +02:00

15822 commits