Commit graph

16884 commits

Author SHA1 Message Date
Aurelien DARRAGON
f638d4b1bc BUG/MEDIUM: server/event_hdl: memory overrun in _srv_event_hdl_prepare_inetaddr()
As reported in GH #2358, #2359, #2360, #2361 and #2362: ipv6 address
handling may cause memory overrun due to struct in6_addr being handled
as sockaddr_in6 which is larger. Moreover, source variable wasn't properly
read from since the raw value was used as a pointer instead of pointing to
the actual variable's address.

This bug was introduced by 6fde37e046
("MINOR: server/event_hdl: add SERVER_INETADDR event")

Unfortunately for us, gcc didn't catch this and, this actually used to
"work" by accident since in6_addr struct is made of array so not passing
pointer explicitly still resolved to the proper starting address..
Hopefully this was caught by coverity so thanks to Ilya for that.

The fix is simple: we simply copy the whole in6_addr struct by accessing
it using a pointer and using the proper struct size for the copy.
2023-11-29 08:59:27 +01:00
William Lallemand
08f1e2bea2 MINOR: mworker/cli: implements the customized payload pattern for master CLI
Implements the customized payload pattern for the master CLI.

The pattern is stored in the stream in char pcli_payload_pat[8].

The principle is basically the same as the CLI one, it looks for '<<'
then stores what's between '<<' and '\n', and look for it to exit the
payload mode.
2023-11-28 19:13:49 +01:00
William Lallemand
dd38c37777 CLEANUP: mworker/cli: use a label to return errors
Remove the returns in the function to end directly at the end label.
2023-11-28 19:12:32 +01:00
William Lallemand
e3557c7d45 MEDIUM: cli: allow custom pattern for payload
The CLI payload syntax has some limitation, it can't handle payloads
with empty lines, which is a common problem when uploading a PEM file
over the CLI.

This patch implements a way to customize the ending pattern of the CLI,
so we can't look for other things than empty lines.

A char cli_payload_pat[8] is used in the appctx to store the customized
pattern. The pattern can't be more than 7 characters and can still empty
to match an empty line.

The cli_io_handler() identifies the pattern and stores it, and
cli_parse_request() identifies the end of the payload.

If the customized pattern between "<<" and "\n" is more than 7
characters, it is not considered as a pattern.

This patch only implements the parser for the 'stats socket', another
patch is needed for the 'master CLI'.
2023-11-28 19:12:32 +01:00
Remi Tricot-Le Breton
23c810d042 BUG/MINOR: cache: Remove incomplete entries from the cache when stream is closed
When a stream is interrupted by the client before the full answer is
stored in the cache, we end up with an incomplete entry in the cache
that cannot be overwritten until it "naturally" expires. In such a case,
we call the cache filter's cache_store_strm_deinit callback without ever
calling cache_store_http_end which means that the 'complete' flag is
never set on the concerned cache_entry.
This patch adds a check on the 'complete' flag in the strm_deinit
callback and removes the entry from the cache if it is incomplete.

A way to exhibit this bug is to try to get the same "big" response on
multiple clients at the same time thanks to h2load for instance, and to
interrupt the client side before the answer can be fully stored in the
cache.

This patch can be backported up to 2.4 but it will need some rework
starting with branch 2.8 because of the latest cache changes.
2023-11-28 17:18:48 +01:00
Frédéric Lécaille
ad61a5dde3 REORG: quic: Move quic_increment_curr_handshake() to quic_sock
Move quic_increment_curr_handshake() from quic_conn.c to quic_sock.h to be inlined.
Also move all the inlined functions at the end of this header.
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
3e16784dfc REORG: quic: Remove qc_pkt_insert() implementation
As this function does only a few things with a not very well chosen name,
remove it and replace it by the its statements at the unique location it
is called.
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
95e9033fd2 REORG: quic: Add a new module for retransmissions
Move several functions in relation with the retransmissions from TX part
(quic_tx.c) to quic_retransmit.c new C file.
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
714d1096bc REORG: quic: Move qc_notify_send() to quic_conn
Move qc_notify_send() from quic_tx.c to quic_conn.c. Note that it was already
exported from both quic_conn.h and quic_tx.h. Modify this latter header
to fix the duplication.
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
b5970967ca REORG: quic: Add a new module for QUIC retry
Add quic_retry.c new C file for the QUIC retry feature:
   quic_saddr_cpy() moved from quic_tx.c,
   quic_generate_retry_token_aad() moved from
   quic_generate_retry_token() moved from
   parse_retry_token() moved from
   quic_retry_token_check() moved from
   quic_retry_token_check() moved from
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
43fbea0f38 REORG: quic: Move ncbuf related function from quic_rx to quic_conn
Move quic_get_ncbuf() and quic_free_ncbuf() from quic_rx.c to quic_conn.h
as static inlined functions.
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
e0d3eb496b REORG: quic: Move NEW_CONNECTION_ID frame builder to quic_cid
Move qc_build_new_connection_id_frm() from quic_conn.c to quic_cid.c.
Also move quic_connection_id_to_frm_cpy() from quic_conn.h to quic_cid.h.
2023-11-28 15:47:18 +01:00
Frédéric Lécaille
795d1a57bf REORG: quic: Rename some (quic|qc)_conn* objects to quic_conn_closed
These objects could be confused with the ones defined by the congestion control
part (quic_cc.c).
2023-11-28 15:47:16 +01:00
Frédéric Lécaille
0b872e24cd REORG: quic: Move qc_may_probe_ipktns() to quic_tls.h
This function is in relation with the Initial packet number space which is
more linked to the QUIC TLS specifications. Let's move it to quic_tls.h
to be inlined.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
c93ebcc59b REORG: quic: Move quic_build_post_handshake_frames() to quic_conn module
Move quic_build_post_handshake_frames() from quic_rx.c to quic_conn.c. This
is a function which is also called from the TX part (quic_tx.c).
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
3482455ddd REORG: quic: Move qc_handle_conn_migration() to quic_conn.c
This function manipulates only quic_conn objects. Its location is definitively
in quic_conn.c.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
581549851c REORG: quic: Move QUIC path definitions/declarations to quic_cc module
Move quic_path struct from quic_conn-t.h to quic_cc-t.h and rename it to quic_cc_path.
Update the code consequently.
Also some inlined functions in relation with QUIC path to quic_cc.h
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
f32fc26b62 REORG: quic: Rename some functions used upon ACK receipt
Rename some functions to reflect more their jobs.
Move qc_release_lost_pkts() to quic_loss.c
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
f74d882ef0 REORG: quic: Move the QUIC DCID parser to quic_sock.c
Move quic_get_dgram_dcid() from quic_conn.c to quic_sock.c because
only used in this file and define it as static.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
3b91756ebe REORG: quic: Move QUIC SSL BIO method related functions to quic_ssl.c
Move __quic_conn_init() and __quic_conn_deinit() from quic_conn.c to quic_ssl.c.
2023-11-28 15:37:50 +01:00
Frédéric Lécaille
09ab48472c REORG: quic: Move several inlined functions from quic_conn.h
Move quic_pkt_type(), quic_saddr_cpy(), quic_write_uint32(), max_available_room(),
max_stream_data_size(), quic_packet_number_length(), quic_packet_number_encode()
and quic_compute_ack_delay_us()	to quic_tx.c because only used in this file.
Also move quic_ack_delay_ms() and quic_read_uint32() to quic_tx.c because they
are used only in this file.

Move quic_rx_packet_refinc() and quic_rx_packet_refdec() to quic_rx.h header.
Move qc_el_rx_pkts(), qc_el_rx_pkts_del() and qc_list_qel_rx_pkts() to quic_tls.h
header.
2023-11-28 15:37:47 +01:00
Frédéric Lécaille
831764641f REORG: quic: Move QUIC CRYPTO stream definitions/declarations to QUIC TLS
Move quic_cstream struct definition from quic_conn-t.h to quic_tls-t.h.
Its pool is also moved from quic_conn module to quic_tls. Same thing for
quic_cstream_new() and quic_cstream_free().
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
ae885b9b68 REORG: quic: Move CRYPTO data buffer defintions to QUIC TLS module
Move quic_crypto_buf struct definition from quic_conn-t.h to quic_tls-t.h.
Also move its pool definition/declaration to quic_tls-t.h/quic_tls.c.
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
0fc0d45745 REORG: quic: Add a new module to handle QUIC connection IDs
Move quic_cid and quic_connnection_id from quic_conn-t.h to new quic_cid-t.h header.
Move defintions of quic_stateless_reset_token_init(), quic_derive_cid(),
new_quic_cid(), quic_get_cid_tid() and retrieve_qc_conn_from_cid() to quic_cid.c
new C file.
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
1564ec0a93 REORG: quic: Move some QUIC CLI code to its C file
Move init_quic() from quic_conn.c to quic_cli.c and rename it to cli_quic_init().
2023-11-28 15:37:22 +01:00
Frédéric Lécaille
21615d4376 CLEANUP: quic: Remove dead definitions/declarations
Remove useless definitions and declarations.
2023-11-28 15:37:22 +01:00
Christopher Faulet
af733ef6e4 BUG/MEDIUM: mux-h2: Remove H2_SF_NOTIFIED flag for H2S blocked on fast-forward
When a H2 stream is blocked during data fast-forwarding, we must take care
to remove H2_SF_NOTIFIED flag. This was only performed when data
fast-forward was attempted. However, if the H2 stream was blocked for any
reason, this flag was not removed. During our tests, we found it was
possible to infinitely block a connection because one of its streams was in
the send_list with the flag set. In this case, the stream was no longer
woken up to resume the sends, blocking all other streams.

No backport needed.
2023-11-28 14:01:56 +01:00
Amaury Denoyelle
fe3726cb76 BUG/MINOR: quic: fix CONNECTION_CLOSE_APP encoding
CONNECTION_CLOSE_APP encoding is broken, which prevents the sending of
every packet with such a frame. This bug was always present in quic
haproxy. However, it was slightly dissimulated by the previous code
which always initialized all frame members to zero, which was sufficient
to ensure CONNECTION_CLOSE_APP encoding was ok. The below patch changes
this behavior by removing this costly initialization step.

  4cf784f38e
  MINOR: quic: Avoid zeroing frame structures

Now, frames members must always be initialized individually given the
type of frame to used. However, for CONNECTION_CLOSE_APP this was not
done as qc_cc_build_frm() accessed the wrong union member refering to a
CONNECTION_CLOSE instead.

This bug was detected when trying to generate a HTTP/3 error. The
CONNECTION_CLOSE_APP frame encoding failed due to a non-initialized
<reason_phrase_len> which was too big. This was reported by the
following trace :
  "frame building error : qc@0x5555561b86c0 idle_timer_task@0x5555561e5050 flags=0x86038058 CONNECTION_CLOSE_APP"

This must be backported up to 2.6. This is necessary even if above
commit is not as previous code is also buggy, albeit with a different
behavior.
2023-11-28 11:40:01 +01:00
Willy Tarreau
d656ac7e13 OPTIM: mux-h2/zero-copy: don't allocate more buffers per connections than streams
It's the exact same as commit 0a7ab7067 ("OPTIM: mux-h2: don't allocate
more buffers per connections than streams"), but for the zero-copy case
this time. Previously it was only done on the regular snd_buf() path, but
this one is needed as well. A transfer on 16 parallel streams now consumes
half of the memory, and a single stream consumes much less.

An alternate approach would be worth investigating in the future, based
on the same principle as the CF_STREAMER_FAST at the higher level: in
short, by monitoring how many mux buffers we write at once before refilling
them, we would get an idea of how much is worth keeping in buffers max,
given that anything beyond would just waste memory. Some tests show that
a single buffer already seems almost as good, except for single-stream
transfers, which is why it's worth spending more time on this.
2023-11-28 09:15:26 +01:00
Amaury Denoyelle
e97489a526 MINOR: trace: support -dt optional format
Add an optional argument for "-dt". This argument is interpreted as a
list of several trace statement separated by comma. For each statement,
a specific trace name can be specifed, or none to act on all sources.
Using double-colon separator, it is possible to add specifications on
the wanted level and verbosity.
2023-11-27 17:15:14 +01:00
Amaury Denoyelle
670520cff8 MINOR: trace: parse verbosity in a function
This patch is similar to the previous one except that it handles trace
verbosity. Trace source must be specified unless "quiet" is used.
2023-11-27 17:11:14 +01:00
Amaury Denoyelle
ed9fbeed78 MINOR: trace: parse level in a function
Extract conversion of level string argument to integer value in a
dedicated internal function trace_parse_level(). This function is used
to for CLI trace parsing and will also be useful for "-dt" process
argument.
2023-11-27 17:11:14 +01:00
Amaury Denoyelle
cef29d3708 MINOR: trace: define simple -dt argument
Add '-dt' haproxy process argument. This will automatically activate all
trace sources on stderr with the error level. This could be useful to
troubleshoot issues such as protocol violations.
2023-11-27 17:10:18 +01:00
Amaury Denoyelle
eabe477ad2 BUILD: map: fix build warning
<pattern> field pointer of pat_ref_elt structure has been by a
zero-length array. As such, it's now unneeded to check for NULL address
before printing it.

This type conversion was done in the following commit :
  3ac9912837
  OPTIM: pattern: save memory and time using ebst instead of ebis

The current patch is mandatory to fix the following GCC warning :
  CC      src/map.o
  src/map.c: In function ‘cli_io_handler_map_lookup’:
  src/map.c:549:54: error: the comparison will always evaluate as ‘true’ for the address of ‘pattern’ will never be NULL [-Werror=address]
  549 |                                         if (pat->ref && pat->ref->pattern)
      |

No need to backport it unless the above commit is.
2023-11-27 15:03:41 +01:00
Willy Tarreau
3ac9912837 OPTIM: pattern: save memory and time using ebst instead of ebis
In the pat_ref_elt struct, the pattern string is stored outside of the
node element, using a pointer to an strdup(). Not only this needlessly
wastes at least 16-24 bytes per entry (8 for the pointer, 8-16 for the
allocator), it also makes the tree descent less efficient since both
the node and the string have to be visited for each layer (hence at least
two cache lines). Let's use an ebmb storage and place the pattern right
at the end of the pat_ref_elt, making it a variable-sized element instead.

The set-map test below jumps from 173 to 182 kreq/s/core, and the memory
usage drops from 356 MB to 324 MB:

  http-request set-map(/dev/null) %[rand(1000000)] 1

This is even more visible with large maps: after loading 16M IP addresses
into a map, the process uses this amount of memory:

  - 3.15 GB with haproxy-2.8
  - 4.21 GB with haproxy-2.9-dev11
  - 3.68 GB with this patch

So that's a net saving of 32 bytes per entry here, which cuts in half the
extra cost of the tree, and loading a large map takes about 20% less time.
2023-11-27 11:25:07 +01:00
Christopher Faulet
b4eaadae84 BUG/MEDIUM: mux-h1: Properly ignore trailers when a content-length is announced
It is not possible in H1, but in H2 (and probably H3) it is possible to have
trailers at the end of a message while a Content-Length was announced.

However, depending if the trailers are received with the last HTX DATA block
or the zero-copy forwarding is used or not, an processing error may be
triggered, leading to a 500-internal-error.

To fix the issue, when a content-length is announced and all the payload was
processed, we switch the message to H1_MSG_DONE state only if the
end-of-message was also reported (HTX_FL_EOM flag set). Otherwise, it is
switched to H1_MSG_TRAILERS state to be able to properly ignored the
trailers, if so.

The patch must be backported as far as 2.4. Be careful, this part was highly
refactored. The patch will have to be adapted to be backported.
2023-11-27 08:37:48 +01:00
William Lallemand
3dd55fa132 MINOR: mworker/cli: implement hard-reload over the master CLI
The mworker mode never had a proper 'hard-stop' (-st) for the reload,
this is a mode which was commonly used with the daemon mode, but it was
never implemented in mworker mode.

This patch fixes the problem by implementing a "hard-reload" command
over the master CLI. It does the same as the "reload" command, but
instead of waiting for the connections to stop in the previous process,
it immediately quits the previous process after binding.
2023-11-24 21:44:25 +01:00
William Lallemand
77a97536e8 MEDIUM: ssl: use ssl_sock_chose_sni_ctx() in the clienthello callback
This patch removes the code which selects the SSL certificate in the
OpenSSL Client Hello callback, to use the ssl_sock_chose_sni_ctx()
function which does the same.

The bigger part of the function which remains is the extraction of the
servername, ciphers and sigalgs, because it's done manually by parsing
the TLS extensions.

This is not supposed to change anything functionally.
2023-11-24 20:07:27 +01:00
William Lallemand
9f2e07bf7b MINOR: ssl: move certificate selection in a dedicate function
The certificate selection used in the WolfSSL cert_cb and in the OpenSSL
clienthello callback is the same, the function was duplicate to achieve
the same.

This patch move the selection code to a common function called
ssl_sock_chose_sni_ctx().

The servername string is still lowered in the callback, however the
search for the first dot in the string (wildp) is done in
ssl_sock_chose_sni_ctx()

The function uses the same certificate selection algorithm as before, it
needs to know if you need rsa or ecdsa, the bind_conf to achieve the
lookup, and the servername string.

This patch moves the code for WolSSL only.
2023-11-24 20:07:27 +01:00
William Lallemand
b900a3533c MINOR: ssl: replace 'trash.area' by 'servername' in ssl_sock_switchctx_cbk()
Replace 'trash.area' by 'servername' for more readibility.
2023-11-24 20:07:27 +01:00
William Lallemand
3750442bc4 MEDIUM: ssl: implement rsa/ecdsa selection with WolfSSL
PR https://github.com/wolfSSL/wolfssl/pull/6963 implements primitives to
extract ciphers and algorithm signatures.

It allows to chose a certificate depending on the sigals and
ciphers presented by the client (RSA or ECDSA).

Since WolfSSL does not implement the clienthello callback, the patch
uses the certificate callback (SSL_CTX_set_cert_cb())

The callback is inspired by our clienthello callback, however the
extraction of client ciphers and sigalgs is simpler,
wolfSSL_get_sigalg_info() and wolfSSL_get_ciphersuite_info() are used.

This is not enabled by default yet as the PR was not merged.
2023-11-24 20:07:27 +01:00
Aurelien DARRAGON
20437b3e32 MINOR: log/balance: set lbprm tot_weight on server on queue/dequeue
Maintain proper px->lbprm.tot_weight for log backends. server's weight is
considered as 1 as long as the server is usable.

This will allow the stats page to correctly display the proxy status since
the check currently relies on proxy's lbprm.tot_weight variable.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
661c079bc5 MINOR: log/backend: prevent "use-server" rules use with LOG mode
server_rules declared using "use-server" keyword within a proxy are not
supported inside a log backend (with "mode log" set), so we report a
warning to the user and reset the setting.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
f2629ebd4e MINOR: proxy: add free_server_rules() helper function
Take the px->server_rules freeing part out of free_proxy() and make it
a dedicated helper function so that it becomes possible to use it from
anywhere.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
481e9317e3 MINOR: proxy: add free_logformat_list() helper function
There are multiple places inside free_proxy() where we need to perform
the exact same operation: freeing a logformat list which includes freeing
every member.

To prevent code duplication, we add the free_logformat_list() function
that takes such list as parameter and does all the freeing job on its
own.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
8f878d5969 Revert "MINOR: cfgparse-listen: warn when use-server rules is used in wrong mode"
This reverts commit 5884e46ec8c8231e73c68e1bdd345c75c9af97a0 since we
cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
ffae3ca34b MINOR: backend: remove invalid mode test for "hash-balance-factor"
This is a leftover from 1e0093a317 ("MINOR: backend/balance: "balance"
requires TCP or HTTP mode").

Indeed, we cannot perform the test during parsing as the effective proxy
type is not yet known. Moreover, thanks to b61147fd ("MEDIUM: log/balance:
merge tcp/http algo with log ones") we could potentially benefit from
this setting even in log mode, but for now it is ignored by all log
compatible load-balancing algorithms.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
c886fb58eb MINOR: server/ip: centralize server ip updates
Add a new helper function named _srv_update_inetaddr() to centralize ip
addr and port updates during runtime.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
24da4d3ee7 MINOR: tools: use const for read only pointers in ip{cmp,cpy}
In this patch we fix the prototype for ipcmp() and ipcpy() functions so
that input pointers that are used exclusively for reads are used as const
pointers. This way, the compiler can safely assume that those variables
won't be altered by the function.
2023-11-24 16:27:55 +01:00
Aurelien DARRAGON
683b2ae013 MINOR: server/event_hdl: add SERVER_INETADDR event
In this patch we add the support for a new SERVER event in the
event_hdl API.

SERVER_INETADDR is implemented as an advanced server event.
It is published each time the server's ip address or port is
about to change. (ie: from the cli, dns, lua...)

SERVER_INETADDR data is an event_hdl_cb_data_server_inetaddr struct
that provides additional info related to the server inet addr change,
but can be casted as a regular event_hdl_cb_data_server struct if
additional info is not needed.
2023-11-24 16:27:55 +01:00
Christopher Faulet
671e07617c BUG/MINOR: global: Fix tune.disable-(fast-forward/zero-copy-forwarding) options
These options were not properly handled during configration parsing. A wrong
bitwise operation was used.

No backport needed.
2023-11-24 09:33:56 +01:00
Christopher Faulet
8d46a2c973 MAJOR: h3: Implement zero-copy support to send DATA frame
When possible, we try send DATA frame without copying data. To do so, we
swap the input buffer with QCS tx buffer. It is only possible iff:

 * There is only one HTX block of data at the beginning of the message
 * Amount of data to send is equal to the size of the HTX data block
 * The QCS tx buffer is empty

In this case, both buffers are swapped. The frame metadata are written at
the begining of the buffer, before data and where the HTX structure is
stored.
2023-11-24 07:42:43 +01:00
Christopher Faulet
1bcc0f8892 MEDIUM: mux-quic: Add consumer-side fast-forwarding support
The QUIC multiplexer now implements callbacks to consume fast-forwarded
data. It relies on the H3 stack to acquire the buffer and format the frame.
2023-11-24 07:42:43 +01:00
Willy Tarreau
cd352c0dbe MINOR: log/balance: rename "log-sticky" to "sticky"
After giving it some thought, it could pretty well happen that other
protocols benefit from the sticky algorithm that some used to emulate
using a "stick-on int(0)" or things like this previously. So better
rename it to "sticky" right now instead of having to keep that "log-"
prefix forever. It's still limited to logs, of course, only the algo
is renamed in the config.
2023-11-23 18:21:31 +01:00
Amaury Denoyelle
71ed381249 MINOR: listener: allow thread kw for rhttp bind
Thanks to previous commit, a reverse HTTP listener is able to distribute
actively opened connections accross its threads. To be able to exploit
this, allow "thread" keyword for such a listener.

An extra check is added to explicitely forbids a reverse bind to span
multiple thread groups. Without this, multiple listeners instances will
be created, each with its owned "nbconn" value. This may surprise users
so for now, better to deactivate this possibility.
2023-11-23 17:46:00 +01:00
Amaury Denoyelle
3d0c7f2e2a MEDIUM: rhttp: support multi-thread active connect
Implement support for active HTTP reverse task migration on listener
threads. This operation is done each time a new reversable connection
will be instantiated. Instead of directly allocate the connection, a
lookup is done among all the listener threads.

A comparison is done to select the thread with the smallest number of
current reverse connection. If the thread found is different from the
current one, the connection allocation is delayed and the task
rescheduled on the chosen thread. The connection will then be created
and pinned on the new thread. This mechanisms allows to balance reverse
HTTP connections accross different threads.

Note that rhttp_set_affinity is still defined to disable thread
migration on accept. This is necessary as it's unsafe to move an
existing connection to another thread. However, active reverse task
migration should be sufficient to distribute connections accross several
threads. Better than that, this design allows to differentiate standard
frontend and reversable connections. The latest are designed to be
long-lived so it's useful to have their repartition solely based on
others reversed connections.
2023-11-23 17:45:56 +01:00
Amaury Denoyelle
a3187fe06c MINOR: rhttp: add count of active conns per thread
Add a new member <nb_rhttp_conns> in thread_ctx structure. Its purpose
is to count the current number of opened reverse HTTP connections
regarding from their listeners membership.

This patch will be useful to support multi-thread for active reverse
HTTP, in order to select the less loaded thread.

Note that despite access to <nb_rhttp_conns> are only done by the
current thread, atomic operations are used. This is because once
multi-thread support will be added, external threads will also retrieve
values from others.
2023-11-23 17:43:01 +01:00
Amaury Denoyelle
55e78ff7e1 MINOR: rhttp: large renaming to use rhttp prefix
Previous commit renames 'proto_reverse_connect' module to 'proto_rhttp'.
This commits follows this by replacing various custom prefix by 'rhttp_'
to make the code uniform.

Note that 'reverse_' prefix was kept in connection module. This is
because if a new reversable protocol not based on HTTP is implemented,
it may be necessary to reused the same connection function which are
protocol agnostic.
2023-11-23 17:40:01 +01:00
Amaury Denoyelle
e09af499b4 MINOR: rhttp: rename proto_reverse_connect
This commit is renaming of module proto_reverse_connect to proto_rhttp.
This name is selected as it is shorter and more precise.
2023-11-23 17:38:58 +01:00
Christopher Faulet
85da7116a9 BUG/MEDIUM: mux-h1: Don't set CO_SFL_MSG_MORE flag on last fast-forward send
In the mux-to-mux fast-forwarding, when end-of-input is reached on the producer
side, the consumer side must not set the CO_SFL_MSG_MORE flag on send. It means
the H1C_F_CO_MSG_MORE flag must be removed from the H1 connection.

No backport needed.
2023-11-23 17:30:18 +01:00
Willy Tarreau
1de44daf7d MINOR: ext-check: add an option to preserve environment variables
In Github issue #2128, @jvincze84 explained the complexity of using
external checks in some advanced setups due to the systematic purge of
environment variables, and expressed the desire to preserve the
existing environment. During the discussion an agreement was found
around having an option to "external-check" to do that and that
solution was tested and confirmed to work by user @nyxi.

This patch just cleans this up, implements the option as
"preserve-env" and documents it. The default behavior does not change,
the environment is still purged, unless "preserve-env" is passed. The
choice of not using "import-env" instead was made so that we could
later use it to name specific variables that have to be imported
instead of keeping the whole environment.

The patch is simple enough that it could be backported if needed (and
was in fact tested on 2.6 first).
2023-11-23 16:53:57 +01:00
Ilya Shipitsin
80813cdd2a CLEANUP: assorted typo fixes in the code and comments
This is 37th iteration of typo fixes
2023-11-23 16:23:14 +01:00
Willy Tarreau
45a9e4e24b MINOR: init: add info about the main program to the post_mortem struct
This way we'll still have haproxy's version, build options etc in core
dumps and centralized all at once.
2023-11-23 15:39:21 +01:00
Willy Tarreau
6455fd5024 MINOR: debug: add the ability to enter components in the post_mortem struct
Here the idea is to collect components' versions and build options. The
main component is haproxy, but the API is made so that any sub-system
can easily add a component there (for example the detailed version of a
device detection lib, or some info about a lib loaded from Lua).

The elements are stored as a pointer to an array of structs and its count
so that it's sufficient to issue this in gdb to list them all at once:

  print *post_mortem.components@post_mortem.nb_components

For now we collect name, version, toolchain, toolchain options, build
options and path. Maybe more could be useful in the future.
2023-11-23 15:39:21 +01:00
Willy Tarreau
a88a3482b5 MINOR: debug: dump the mapping of the libs into post_mortem
Having the libs and their addresses listed in the post_mortem struct
is also helpful. Sometimes it helps notice that one version is not the
expected one, e.g. due to some LD_LIBRARY_PATH. We don't emit it on
"show dev" however since that's already available via "show libs".
2023-11-23 15:39:21 +01:00
Willy Tarreau
37e3dd718c MINOR: debug: copy the thread info into the post_mortem struct
The last starting thread now copies the pthread ID and stack top of
each thread into post_mortem. That way it's as easy as issuing
"p post_mortem" in gdb to see all thread IDs and stack frames and more
easily map them to the threads met in a core.
2023-11-23 15:39:21 +01:00
Willy Tarreau
c0eec3a4aa MINOR: debug: collect some boot-time info related to the process
Here we collect the original uid/gid/rlimits for FD and RAM since these
ones do affect behavior and are sometimes different from expected in
containers or when starting as a service.
2023-11-23 15:39:21 +01:00
Willy Tarreau
ff9e06cd53 MINOR: debug: report any detected hypervisor in post_mortem
When the x86 CPU flags show the "hypervisor" flag, we know we're running
inside QEMU, VMware or possibly other flavors of hypervisors. In this
case we'll report either "qemu", "vmware" or "yes" for other ones in
the "virt_techno" field, based on the DMI hardware vendor name,
otherwise "no" when the flag is not found.
2023-11-23 15:39:21 +01:00
Willy Tarreau
0cc799bdd1 MINOR: debug: detect CPU model and store it in post_mortem
The CPU model and type has significant impact on certain bugs, such
as contention issues caused by CPUs having split L3 caches, or stricter
memory models that exhibit some barrier issues. It's complicated though
because the info about the model depends on the arch. For example, x86
reports an SKU name while ARM rather reports the CPU core types, families
and versions for each CPU core. There, the SoC will sometimes be reported
in the device tree or DMI info instead. But we don't really care, it's
essentially useful to know if the code is running on an armv8.0 such as
A53, a 8.2 such as A55/A76/Neoverse etc. For MIPS the model appears to
generally be there, and in addition the SoC is often present in the
"system type" field before the first CPU, and the type of machine in the
"machine" field, to replace the missing DMI and DT, so they are also
collected. Note that only the first CPU is checked and reported, that's
expected to be vastly sufficient, since we're just trying to spot known
incompatibilities or issues.
2023-11-23 15:39:21 +01:00
Willy Tarreau
2974f3e71b MINOR: debug: report in post_mortem if the container techno used is docker
If we detect we're running inside a container on Linux, let's check if
it seems to be docker. Docker usually creates a /.dockerenv file, which
is easy to check. It's uncertain whether it's always the case, but on the
few tested instances that was true, and we don't really care, what matters
is to place helpful debugging info for developers. When this file is
detected, we report "docker" instead of "yes" in the container techno.
2023-11-23 15:39:21 +01:00
Willy Tarreau
cf8be50a3d MINOR: debug: report in port_mortem whether a container was detected
Containers often cause significant trouble depending on how they're
set up, and they're not always trivial for their users to extract info
from. Here we're trying to detect if we're running inside a container
on Linux. There are plenty of approaches and none is perfectly clean
nor reliable, which makes sense since the goal is to remain transparent
enough.

One interesting approach is to rely on the observation that containers
generally do not expose most kernel threads, and that the very firsts
of them are extremely stable across all kernel versions: pid 2 was
called "keventd" in kernel 2.4, became "kthreadd" in kernel 2.6, and
has since not changed. This is true on all architectures tested, even
with highly stripped down kernels such as those found on 15 year-old
OpenWRT images. And this one doesn't appear inside containers. Thus
here we check if we find such a thread via /proc and whether it's
called keventd or kthreadd, to detect a container, and we set the
"cont_techno" variable to "yes" or "no" depending on what is found.
2023-11-23 15:39:21 +01:00
Willy Tarreau
4e3f9921de MINOR: debug: add OS/hardware info to the post_mortem struct
Let's extract some info about the system (board model, vendor etc),
this will indicate some hypervisors, some cloud instances or some
uncommon embedded boards etc. Typically, vmware, qemu and raspberry-pi
are visible here and can help during the troubleshooting session.
2023-11-23 15:39:21 +01:00
Willy Tarreau
0184597522 MINOR: debug: start to create a new struct post_mortem
The goal here is to accumulate precious debugging information in a
struct that is easy to find in memory. It's aligned to 256-byte as
it also helps. We'll progressively add a lot of info about the
startup conditions, the operating system, the hardware and hypervisor
so as to limit the number of round trips between developers and users
during debugging sessions. Also, opening a core file with an hex editor
should often be sufficient to extract most of the info.

In addition, a new "show dev" command will show these information so
that they can be checked at runtime without having to wait for a crash
(e.g. if a limit is bad in a container, better know it early).

For now the struct only contains utsname that's fed at boot time.
2023-11-23 15:39:21 +01:00
Willy Tarreau
2268f10dd6 DEBUG: tinfo: store the pthread ID and the stack pointer in tinfo
When debugging a core, it's difficult to match a given gdb thread number
against an internal thread. Let's just store the pthread ID and the stack
pointer in each tinfo. This could help in the future by allowing to just
glance over them and pick the right one depending what info is found
first.
2023-11-23 14:32:55 +01:00
Willy Tarreau
53da8bfcb6 BUG/MINOR: server: do not leak default-server in defaults sections
When a default-server directive is used in a defaults section, it's never
freed and the "defaults" proxy gets reset without freeing the fields from
that default-server. Normally there are no allocation there, except for
the config file location stored in srv->conf.file form an strdup() since
commit 9394a9444 ("REORG: server: move alert traces in parse_server")
that appeared in 2.4. In addition, if a "default-server" directive
appears multiple times in a defaults section, one more entry will be
leaked per call.

This commit addresses this by checking that we don't overwrite the file
upon multiple calls, and by clearing it when resetting the default proxy.
This should be backported to 2.4.
2023-11-23 14:32:55 +01:00
Frédéric Lécaille
7fc52357cb BUG/MINOR: quic: Possible RX packet memory leak under heavy load
This bug could be reproduced with -dMfail and h2load generating plenty of connections.
A "show pools" CLI command showed that some memory in relation with RX packet pool
was never release. Furthermore, adding a RX packet counter to each connection
and a BUG_ON() in quic_conn_release() has proved that this unreleased memory
was in relation with RX packet which were not linked to a connection.

The responsible is quic_dgram_parse() which does not release some RX packet
memory before exiting after the connection thread affinity has changed.

Must be backported as far as 2.7.
2023-11-22 18:03:26 +01:00
Frédéric Lécaille
cd225da46c BUG/MINOR: quic: Possible leak of TX packets under heavy load
This bug could be reproduced with -dMfail and detected added a counter of TX packet
to the QUIC connection. When released calling quic_conn_release() the connection
should have a null counter of TX packets. This was not always the case.
This could occur during the handshake step: a first packet was built, then another
one should have followed in the same datagram, but fail due to a memory allocation
issue. As the datagram length and first TX packet were not written in the TX
buffer, this latter could not really be purged by qc_purge_tx_buf() even if
called. This bug occured only when building coalesced packets in the same datagram.

To fix this, write the packet information (datagram length and first packet
address) in the TX buffer before purging it.

Must be backported as far as 2.6.
2023-11-22 18:03:26 +01:00
Frédéric Lécaille
dc8a20b317 BUG/MEDIUM: quic: Possible crash during retransmissions and heavy load
This bug could be reproduced with -dMfail and dectected by libasan as follows:

$ ASAN_OPTIONS=disable_coredump=0:unmap_shadow_on_exit=1:abort_on_error=f quic-freeze.cfg -dMfail -dMno-cache -dM0x55
=================================================================
==82989==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffc 0x560790cc4749 bp 0x7fff8e0e8e30 sp 0x7fff8e0e8e28
WRITE of size 8 at 0x7fff8e0ea338 thread T0
    #0 0x560790cc4748 in qc_frm_free src/quic_frame.c:1222
    #1 0x560790cc5260 in qc_release_frm src/quic_frame.c:1261
    #2 0x560790d1de99 in qc_treat_acked_tx_frm src/quic_rx.c:312
    #3 0x560790d1e708 in qc_ackrng_pkts src/quic_rx.c:370
    #4 0x560790d22a1d in qc_parse_ack_frm src/quic_rx.c:694
    #5 0x560790d25daa in qc_parse_pkt_frms src/quic_rx.c:988
    #6 0x560790d2a509 in qc_treat_rx_pkts src/quic_rx.c:1373
    #7 0x560790c72d45 in quic_conn_io_cb src/quic_conn.c:906
    #8 0x560791207847 in run_tasks_from_lists src/task.c:596
    #9 0x5607912095f0 in process_runnable_tasks src/task.c:876
    #10 0x560791135564 in run_poll_loop src/haproxy.c:2966
    #11 0x5607911363af in run_thread_poll_loop src/haproxy.c:3165
    #12 0x56079113938c in main src/haproxy.c:3862
    #13 0x7f92606edd09 in __libc_start_main ../csu/libc-start.c:308
    #14 0x560790bcd529 in _start (/home/flecaille/src/haproxy/haproxy+0x

Address 0x7fff8e0ea338 is located in stack of thread T0 at offset 1032 i
    #0 0x560790d29b52 in qc_treat_rx_pkts src/quic_rx.c:1341

  This frame has 2 object(s):
    [32, 48) 'ar' (line 1380)
    [64, 1088) '_msg' (line 1368) <== Memory access at offset 1032 is inable
HINT: this may be a false positive if your program uses some custom stacnism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope src/quic_frame.c:1222 i
Shadow bytes around the buggy address:
  0x100071c15410: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15420: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15430: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15440: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15450: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
=>0x100071c15460: f8 f8 f8 f8 f8 f8 f8[f8]f8 f8 f8 f8 f8 f8 f3 f3
  0x100071c15470: f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 00 00
  0x100071c15480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c15490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c154a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c154b0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f3 f3 f3
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==82989==ABORTING
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
Aborted (core dumped)

Note that a coredump could not always be produced with all compilers. This was
always the case with clang 11.

When allocating frames to be retransmitted from qc_dgrams_retransmit(), if they
could not be sent for any reason, they could remain attached to a local list to
qc_dgrams_retransmit() and trigger a crash with libasan when releasing the
original frames they were duplicated from.

To fix this, always release the frames which could not be sent during
retransmissions calling qc_free_frm_list() where needed.

Must be backported as far as 2.6.
2023-11-22 18:03:26 +01:00
Frédéric Lécaille
34bc100b8f MINOR: quic: Add traces to debug frames handling during retransmissions
This is really boring to not know why some retransmissions could not be done
from qc_prep_hpkts() which allocates frames, prepare packets and send them.
Especially to not know about if frames are not remaining allocated and
attached to list on the stack. This patch already helped in diagnosing
such an issue during "-dMfail" tests.
2023-11-22 18:03:26 +01:00
Willy Tarreau
8f9e94ecff BUILD: log: silence a build warning when threads are disabled
Building without threads emits two warnings because the proxy pointer
is no longer used (only serves for the lock) since 2.9 commit 9a74a6cb1
("MAJOR: log: introduce log backends"). No backport is needed.
2023-11-22 11:21:07 +01:00
Amaury Denoyelle
89da4e9e5d MINOR: acl: define explicit HTTP_3.0
Some ACL shortcuts are defined to match HTTP requests by their version.
This exists for HTTP_1.0 to HTTP_2.0. This patch adds HTTP_3.0
definition.
2023-11-20 18:01:07 +01:00
Amaury Denoyelle
decf29d06d MINOR: quic: remove unneeded QUIC specific stopping function
On CONNECTION_CLOSE reception/emission, QUIC connections enter CLOSING
state. At this stage, only CONNECTION_CLOSE can be reemitted and all
other exchanges are stopped.

Previously, on haproxy process stopping, if all QUIC connections were in
CLOSING state, they were released before their closing timer expiration
to not block the process shutdown. However, since a recent commit, the
closing timer has been shorten to a more reasonable delay. It is now
consider viable to respect connections closing state even on process
shutdown. As such, stopping specific code in QUIC connections idle timer
task was removed.

A specific function quic_handle_stopping() was implemented to notify
QUIC connections on shutdown from main() function. It should have been
deleted along the removal in QUIC idle timer task. This patch just does
this.
2023-11-20 17:59:52 +01:00
Frédéric Lécaille
756b3c5f7b BUG/MEDIUM: quic: Possible crash for connections to be killed
The connections are flagged as "to be killed" asap when the peer has left
(detected by sendto() "Connection refused" errno) by qc_kill_conn(). This
function has to wakeup the idle timer task to release the connection (and the idle
timer  and the idle timer task itself). Then if in the meantime the connection
was flagged as having to process some retransmissions, some packet could lead
to sendto() errors again with a call to qc_kill_conn(), this time with a released
idle timer task.

This bug could be detected by libasan as follows:

.AddressSanitizer:DEADLYSIGNAL
=================================================================
==21018==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x    560b5d898717 bp 0x7f9aaac30000 sp 0x7f9aaac2ff80 T3)
==21018==The signal is caused by a READ memory access.
==21018==Hint: address points to the zero page.
.    #0 0x560b5d898717 in _task_wakeup include/haproxy/task.h:209
    #1 0x560b5d8a563c in qc_kill_conn src/quic_conn.c:171
    #2 0x560b5d97f832 in qc_send_ppkts src/quic_tx.c:636
    #3 0x560b5d981b53 in qc_send_app_pkts src/quic_tx.c:876
    #4 0x560b5d987122 in qc_send_app_probing src/quic_tx.c:910
    #5 0x560b5d987122 in qc_dgrams_retransmit src/quic_tx.c:1397
    #6 0x560b5d8ab250 in quic_conn_app_io_cb src/quic_conn.c:712
    #7 0x560b5de41593 in run_tasks_from_lists src/task.c:596
    #8 0x560b5de4333c in process_runnable_tasks src/task.c:876
    #9 0x560b5dd6f2b0 in run_poll_loop src/haproxy.c:2966
    #10 0x560b5dd700fb in run_thread_poll_loop src/haproxy.c:3165
    #11 0x7f9ab9188ea6 in start_thread nptl/pthread_create.c:477
    #12 0x7f9ab90a8a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV include/haproxy/task.h:209 in _task_wakeup
Thread T3 created by T0 here:
    #0 0x7f9ab97ac2a2 in __interceptor_pthread_create ../../../../src/libsaniti    zer/asan/asan_interceptors.cpp:214
    #1 0x560b5df4f3ef in setup_extra_threads src/thread.c:252                  o
    #2 0x560b5dd730c7 in main src/haproxy.c:3856
    #3 0x7f9ab8fd0d09 in __libc_start_main ../csu/libc-start.c:308             i

==21018==ABORTING
AddressSanitizer:DEADLYSIGNAL
Aborted (core dumped)

To fix, simply reset the connection flag QUIC_FL_CONN_RETRANS_NEEDED to cancel
the retransmission when qc_kill_conn is called.

Note that this new bug arrived with this fix which is correct and flagged as to be
backported as far as 2.6.
      BUG/MINOR: quic: idle timer task requeued in the past

Must be backported as far as 2.6.
2023-11-20 17:17:16 +01:00
Amaury Denoyelle
a8968701c0 BUG/MAJOR: quic: complete thread migration before tcp-rules
A quic_conn is instantiated and tied on the first thread which has
received the first INITIAL packet. After handshake completion,
listener_accept() is called. For each quic_conn, a new thread is
selected among the least loaded ones Note that this occurs earlier if
handling 0-RTT data.

This thread connection migration is done in two steps :
* inside listener_accept(), on the origin thread, quic_conn
  tasks/tasklet are killed. After this, no quic_conn related processing
  will occur on this thread. The connection is flagged with
  QUIC_FL_CONN_AFFINITY_CHANGED.
* as soon as the first quic_conn related processing occurs on the new
  thread, the migration is finalized. This allows to allocate the new
  tasks/tasklet directly on the destination thread.

This last step on the new thread must be done prior to other quic_conn
access. There is two events which may trigger it :
* a packet is received on the new thread. In this case,
  qc_finalize_affinity_rebind() is called from quic_dgram_parse().
* the recently accepted connection is popped from accept_queue_ring via
  accept_queue_process(). This will called session_accept_fd() as
  listener.bind_conf.accept callback. This instantiates a new session
  and start connection stack via conn_xprt_start(), which itself calls
  qc_xprt_start() where qc_finalize_affinity_rebind() is used.

A condition was recently found which could cause a closing to be used
with qc_finalize_affinity_rebind() which is forbidden with a BUG_ON().

This lat step was not compatible with layer 4 rule such as "tcp-request
connection reject" which closes the connection early. In this case, most
of the body of session_accept_fd() is skipped, including
qc_xprt_start(), so thread migration is not finalized. At the end of the
function, conn_xprt_close() is then called which flags the connection as
CLOSING.

If a datagram is received for this connection before it is released,
this will call qc_finalize_affinity_rebind() which triggers its BUG_ON()
to prevent thread migration for CLOSING quic_conn.

FATAL: bug condition "qc->flags & ((1U << 29)|(1U << 30))" matched at src/quic_conn.c:2036
Thread 3 "haproxy" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7ffff794f700 (LWP 2973030)]
0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036
2036            BUG_ON(qc->flags & (QUIC_FL_CONN_CLOSING|QUIC_FL_CONN_DRAINING));
(gdb) bt
 #0  0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036
 #1  0x0000555555682463 in quic_dgram_parse (dgram=0x7fff5003ef10, from_qc=0x0, li=0x555555f38670) at src/quic_rx.c:2602
 #2  0x0000555555651aae in quic_lstnr_dghdlr (t=0x555555fc4440, ctx=0x555555fc3f78, state=32832) at src/quic_sock.c:189
 #3  0x00005555558c9393 in run_tasks_from_lists (budgets=0x7ffff7944c90) at src/task.c:596
 #4  0x00005555558c9e8e in process_runnable_tasks () at src/task.c:876
 #5  0x000055555586b7b2 in run_poll_loop () at src/haproxy.c:2966
 #6  0x000055555586be87 in run_thread_poll_loop (data=0x555555d3d340 <ha_thread_info+64>) at src/haproxy.c:3165
 #7  0x00007ffff7b59609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
 #8  0x00007ffff7a7e133 in clone () from /lib/x86_64-linux-gnu/libc.so.6

To fix this issue, ensure quic_conn migration is completed earlier
inside session_accept_fd(), before any tcp rules processing. This is
done by moving qc_finalize_affinity_rebind() invocation from
qc_xprt_start() to qc_conn_init().

This must be backported up to 2.7.
2023-11-20 16:11:26 +01:00
Willy Tarreau
3e913909e7 BUILD: cache: fix build error on older compilers
pre-c99 compilers will fail to build the cache since commit 48f81ec09
("MAJOR: cache: Delay cache entry delete in reserve_hot function") due
to an int declaration in the for loop. No backport is needed.
2023-11-20 11:43:52 +01:00
Willy Tarreau
445fc1fe3a BUG/MINOR: sock: mark abns sockets as non-suspendable and always unbind them
In 2.3, we started to get a cleaner socket unbinding mechanism with
commit f58b8db47 ("MEDIUM: receivers: add an rx_unbind() method in
the protocols"). This mechanism rightfully refrains from unbinding
when sockets are expected to be transferrable to another worker via
"expose-fd listeners", but this is not compatible with ABNS sockets,
which do not support reuseport, unbinding nor being renamed: in short
they will always prevent a new process from binding.

It turns out that this is not much visible because by pure accident,
GTUNE_SOCKET_TRANSFER is only set in the code dealing with master mode
and deamons, so it's never set in foreground mode nor in tests even if
present on the stats socket. However with master mode, it is now always
set even when not present on the stats socket, and will always conflict.

The only reasonable approach seems to consist in marking these abns
sockets as non-suspendable so that the generic sock_unbind() code can
decide to just unbind them regardless of GTUNE_SOCKET_TRANSFER.

This should carefully be backported as far as 2.4.
2023-11-20 11:38:26 +01:00
William Lallemand
ef9a195742 BUG/MINOR: startup: set GTUNE_SOCKET_TRANSFER correctly
This bug was forbidding the GTUNE_SOCKET_TRANSFER option to be set
when haproxy is neither in daemon mode nor in mworker mode. So it
basically only impacts the foreground mode.

The fix moves the code outside the 'if (global.mode & (MODE_DAEMON |
MODE_MWORKER | MODE_MWORKER_WAIT))' condition.

Bug was introduced with 7f80eb23 ("MEDIUM: proxy: zombify proxies only
when the expose-fd socket is bound").

Must be backported in every stable version.
2023-11-20 10:49:05 +01:00
Aurelien DARRAGON
82f4bcafae MINOR: log/backend: prevent "dynamic-cookie-key" use with LOG mode
It doesn't make sense to set "dynamic-cookie-key" inside a log backend,
thus we report a warning to the user and reset the setting.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
c7783fb32b MINOR: log/backend: prevent "http-send-name-header" use with LOG mode
It doesn't make sense to use the "http-send-name-header" directive inside
a log backend so we report a warning in with case and reset the setting.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
4b2616f784 MINOR: log/backend: prevent stick table and stick rules with LOG mode
Report a warning and prevent errors if user tries to declare a stick table
or use stick rules within a log backend.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
5335618967 MINOR: log/backend: prevent tcp-{request,response} use with LOG mode
We start implementing some postparsing compatibility checks for log
backends.

Here we report a warning if user tries to use tcp-{request,response} rules
with log backend, and we properly ignore such rules when inherited from
defaults section.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
6a29888f60 MINOR: log/backend: ensure log exclusive params are not used in other modes
add proxy_cfg_ensure_no_log() function (similar to
proxy_cfg_ensure_no_http()) to ensure at the end of proxy parsing that
no log exclusive options are found if the proxy is not in log mode.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
42d7d1bd47 Revert "MINOR: filter: "filter" requires TCP or HTTP mode"
This reverts commit f9422551cd since we
cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
c8948fb7ac Revert "MINOR: flt_http_comp: "compression" requires TCP or HTTP mode"
This reverts commit 225526dc16 since we
cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
33e5c4055f Revert "MINOR: http_htx/errors: prevent the use of some keywords when not in tcp/http mode"
This reverts commit b41b77b4cc since we
cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
0f9b475333 Revert "MINOR: fcgi-app: "use-fcgi-app" requires TCP or HTTP mode"
This reverts commit 0ba731f50b since we
cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
7d59730100 Revert "MINOR: cfgparse-listen: "http-reuse" requires TCP or HTTP mode"
This reverts commit 65f1124b5d since we
cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
f1a072d077 Revert "MINOR: cfgparse-listen: "dynamic-cookie-key" requires TCP or HTTP mode"
This reverts commit 0b09727a22 since we
cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
a0a7dd1ee7 Revert "MINOR: cfgparse-listen: "http-send-name-header" requires TCP or HTTP mode"
This reverts commit d354947365 since we
cannot perform the test during parsing as the proxy mode is not yet
known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
c90d7dc46b Revert "MINOR: stktable: "stick" requires TCP or HTTP mode"
This reverts commit 098ae743fd since we
cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
8e20fdbb1c Revert "MINOR: tcp_rules: tcp-{request,response} requires TCP or HTTP mode"
This reverts commit 09b15e4163 since
we cannot perform the test during parsing as the effective proxy mode is
not yet known.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
b6e1e9ec8b Revert "MINOR: proxy: report a warning for max_ka_queue in proxy_cfg_ensure_no_http()"
This reverts commit 3934901 since it makes no sense to report a warning
in this case given that max-keepalive-queue will also work with TCP
backends.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
b61147fd2a MEDIUM: log/balance: merge tcp/http algo with log ones
"log-balance" directive was recently introduced to configure the
balancing algorithm to use when in a log backend. However, it is
confusing and it causes issues when used in default section.

In this patch, we take another approach: first we remove the
"log-balance" directive, and instead we rely on existing "balance"
directive to configure log load balancing in log backend.

Some algorithms such as roundrobin can be used as-is in a log backend,
and for log-only algorithms, they are implemented as "log-$name" inside
the "backend" directive.

The documentation was updated accordingly.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
2c4943c18b BUG/MINOR: proxy/stktable: missing frees on proxy cleanup
In 1b8e68e ("MEDIUM: stick-table: Stop handling stick-tables as proxies.")
we forgot to free the table pointer which is now dynamically allocated.

Let's take this opportunity to also fix a missing free in the table itself
(the table expire task wasn't properly destroyed)

This patch depends on:
 - "MINOR: stktable: add sktable_deinit function"

It should be backported in every stable versions.
2023-11-18 11:16:21 +01:00
Aurelien DARRAGON
e10cf61099 MINOR: stktable: add stktable_deinit function
Adding sktable_deinit() helper function to properly cleanup a sticktable
that was initialized using stktable_init().
2023-11-18 11:16:21 +01:00
Willy Tarreau
6c7771f1b4 MINOR: stream/cli: add another filter "susp" to "show sess"
This one reports streams considered as "suspicious", i.e. those with
no expiration dates or dates in the past, or those without a front
endpoint. More criteria could be added in the future.
2023-11-17 19:30:07 +01:00
Willy Tarreau
3ffcf7beb1 MINOR: stream/cli: add an optional "older" filter for "show sess"
It's often needed to be able to refine "show sess" when debugging, and
very often a first glance at old streams is performed, but that's a
difficult task in large dumps, and it takes lots of resources to dump
everything.

This commit adds "older <age>" to "show sess" in order to specify the
minimum age of streams that will be dumped. This should simplify the
identification of blocked ones.
2023-11-17 19:30:04 +01:00
Willy Tarreau
ec76e0138b BUG/MINOR: stream/cli: report correct stream age in "show sess"
Since 2.4-dev2 with commit 15e525f49 ("MINOR: stream: Don't retrieve
anymore timing info from the mux csinfo"), we don't replace the
tv_accept (now accept_ts) anymore with the current request's, so that
it properly reflects the session's accept date and not the request's
date. However, since then we failed to update "show sess" to make use
of the request's timestamp instead of the session's timestamp, resulting
in fantasist values in the "age" field of "show sess" for the task.

Indeed, the session's age is displayed instead of the stream's, which
leads to great confusion when debugging, particularly when it comes to
multiplexed inter-proxy connections which are kept up forever.

Let's fix this now. This must be backported as far as 2.4. However,
for 2.7 and older, the field was named tv_request and was a timeval.
2023-11-17 18:59:12 +01:00
Willy Tarreau
662565ddb4 MINOR: backend: without ->connect(), allow to pick another thread's connection
If less connections than threads are established on a reverse-http gateway
and these servers have a non-nul pool-min-conn, then conn_backend_get()
will refrain from picking available connections from other threads. But
this makes no sense for protocols for which there is no ->connect(),
since there's no way the current thread will manage to establish its own
connection. For such situations we should always accept to use another
thread's connection. That's precisely what this patch does.
2023-11-17 18:13:04 +01:00
Willy Tarreau
f592a0d5dd MINOR: rhttp: remove the unused outgoing connect() function
A dummy connect() function previously had to be installed for the log
server so that a reverse-http address could be referenced on a "server"
line, but after the recent rework of the server line parsing, this is
no longer needed, and this is actually annoying as it makes one believe
there is a way to connect outside, which is not true. Let's now get rid
of this function.
2023-11-17 18:10:16 +01:00
Willy Tarreau
d069825c5f BUG/MEDIUM: mux-fcgi: fail earlier on malloc in takeover()
This is the equivalent of the previous "BUG/MEDIUM: mux-h1: fail earlier
on malloc in takeover()".

Connection takeover was implemented for fcgi in 2.2 by commit a41bb0b6c
("MEDIUM: mux_fcgi: Implement the takeover() method."). It does have one
corner case related to memory allocation failure: in case the task or
tasklet allocation fails, the connection gets released synchronously.

Unfortunately the situation is bad there, because the lower layers are
already switched to the new thread while the tasklet is either NULL or
still the old one, and calling fcgi_release() will also result in
touching the thread-local list of buffer waiters, calling unsubscribe(),
There are even code paths where the thread will try to grab the lock of
its own idle conns list, believing the connection is there while it has
no useful effect. However, if the owner thread was doing the same at the
same moment, and ended up trying to pick from the current thread (which
could happen if picking a connection for a different name), the two
could even deadlock.

No tests were made to try to reproduce the problem, but the description
above is sufficient to see that nothing can guarantee against it.

This patch takes a simple but radically different approach. Instead of
starting to migrate the connection before risking to face allocation
failures, it first pre-allocates a new task and tasklet, then assigns
them to the connection if the migration succeeds, otherwise it just
frees them. This way it's no longer needed to manipulate the connection
until it's fully migrated, and as a bonus this means the connection will
continue to exist and the use-after-free condition is solved at the same
time.

This should be backported to 2.2. Thanks to Fred for the initial analysis
of the problem!
2023-11-17 18:10:16 +01:00
Willy Tarreau
95fd2d6801 BUG/MEDIUM: mux-h1: fail earlier on malloc in takeover()
This is the h1 equivalent of previous "BUG/MEDIUM: mux-h2: fail earlier
on malloc in takeover()".

Connection takeover was implemented for H1 in 2.2 by commit f12ca9f8f1
("MEDIUM: mux_h1: Implement the takeover() method."). It does have one
corner case related to memory allocation failure: in case the task or
tasklet allocation fails, the connection gets released synchronously.

Unfortunately the situation is bad there, because the lower layers are
already switched to the new thread while the tasklet is either NULL or
still the old one, and calling h1_release() will call some unsubscribe
and and possibly other things whose safety is not guaranteed (and the
ambiguity here alone is sufficient to be careful). There are even code
paths where the thread will try to grab the lock of its own idle conns
list, believing the connection is there while it has no useful effect.
However, if the owner thread was doing the same at the same moment, and
ended up trying to pick from the current thread (which could happen if
picking a connection for a different name), the two could even deadlock.

Contrary to mux-h2, a few tests were not sufficient to try to crash the
process, but there's nothing that indicates it couldn't happen based on
the description above.

This patch takes a simple but radically different approach. Instead of
starting to migrate the connection before risking to face allocation
failures, it first pre-allocates a new task and tasklet, then assigns
them to the connection if the migration succeeds, otherwise it just
frees them. This way it's no longer needed to manipulate the connection
until it's fully migrated, and as a bonus this means the connection will
continue to exist and the use-after-free condition is solved at the same
time.

This should be backported to 2.2. Thanks to Fred for the initial analysis
of the problem!
2023-11-17 18:10:16 +01:00
Willy Tarreau
4f02e3da67 BUG/MEDIUM: mux-h2: fail earlier on malloc in takeover()
Connection takeover was implemented for H2 in 2.2 by commit cd4159f03
("MEDIUM: mux_h2: Implement the takeover() method."). It does have one
corner case related to memory allocation failure: in case the task or
tasklet allocation fails, the connection gets released synchronously.
Unfortunately the situation is bad there, because the lower layers are
already switched to the new thread while the tasklet is either NULL or
still the old one, and calling h2_release() will also result in
h2_process() and h2_process_demux() that may process any possibly
pending frames. Even the session remains the old one on the old thread,
so that some sess_log() that are called when facing certain demux errors
will be associated with the previous thread, possibly accessing a number
of elements belonging to another thread. There are even code paths where
the thread will try to grab the lock of its own idle conns list, believing
the connection is there while it has no useful effect. However, if the
owner thread was doing the same at the same moment, and ended up trying
to pick from the current thread (which could happen if picking a connection
for a different name), the two could even deadlock.

The risk is extremely low, but Fred managed to reproduce use-after-free
errors in conn_backend_get() after a takeover() failed by playing with
-dMfail, indicating that h2_release() had been successfully called. In
practise it's sufficient to have h2 on the server side with reuse-always
and to inject lots of request on it with -dMfail.

This patch takes a simple but radically different approach. Instead of
starting to migrate the connection before risking to face allocation
failures, it first pre-allocates a new task and tasklet, then assigns
them to the connection if the migration succeeds, otherwise it just
frees them. This way it's no longer needed to manipulate the connection
until it's fully migrated, and as a bonus this means the connection will
continue to exist and the use-after-free condition is solved at the same
time.

This should be backported to 2.2. Thanks to Fred for the initial analysis
of the problem!
2023-11-17 18:10:16 +01:00
Willy Tarreau
c7a90cc181 CLEANUP: haproxy: remove old comment from 1.1 from the file header
There was still a totally outdated comment speaking about issues
affecting solaris on 1.1.8pre4 (April 2002, 21 year-old)! This
proves that comments in headers are never read, so let's take this
opportunity for also removing the outdated one recommending to read
the "updated" RFC7230.
2023-11-17 18:10:16 +01:00
Frdric Lcaille
888d1dc3dc MINOR: quic: Rename "handshake" timeout to "client-hs"
Use a more specific name for this timeout to distinguish it from a possible
future one on the server side.
Also update the documentation.
2023-11-17 18:09:41 +01:00
Frédéric Lécaille
373e40f0c1 MEDIUM: session: handshake timeout (TCP)
Adapt session_accept_fd() called on accept() to set the handshake timeout from
"hanshake-timeout" setting if set by configuration. If not set, continue to use
the "client" timeout setting.
2023-11-17 17:31:42 +01:00
Frédéric Lécaille
392640a61b BUG/MINOR: quic: Malformed CONNECTION_CLOSE frame
This bug arrived with this commit:
      MINOR: quic: Avoid zeroing frame structures
Before this latter, the CONNECTION_CLOSE was zeroed, especially the "reason phrase
length".

Restablish this behavior.

No need to backport.
2023-11-17 17:31:42 +01:00
Frédéric Lécaille
953c7dc2b9 MINOR: quic: Dump the expiration date of the idle timer task
This date is shared between the idle timer and hanshake timeout. So, it should be
useful to dump the expiration date of the idle timer task itself, in place of the
idle timer expiration date. This way, the handshake timeout value will be visible
during the handshake from CLI "show quic full" command.
2023-11-17 17:31:42 +01:00
Frédéric Lécaille
e3e0bb90ce MEDIUM: quic: Add support for "handshake" timeout setting.
The idle timer task may be used to trigger the client handshake timeout.
The hanshake timeout expiration date (qc->hs_expire) is initialized when the
connection is allocated. Obviously, this timeout is taken into an account only
during the handshake by qc_idle_timer_do_rearm() whose job is to rearm the idle timer.

The idle timer expiration date could be initialized only one time, then
never updated until the hanshake completes. But this only works if the
handshake timeout is smaller than the idle timer task timeout. If the handshake
timeout is set greater than the idle timeout, this latter may expire before the
handshake timeout.

This patch may have an impact on the L1/C1 interop tests (with heavy packet loss
or corruption). This is why I guess some implementations with a hanshake timeout
support set a big timeout during this test. This is at least the case for ngtcp2
which sets a 180s hanshake timeout! haproxy will certainly have to proceed the
same way if it wants to have a chance to pass this test as before this handshake
timeout.
2023-11-17 17:31:42 +01:00
Frédéric Lécaille
b33eacc523 MINOR: proxy: Add "handshake" new timeout (frontend side)
Add a new timeout for the handshake, on the frontend side only. Such a hanshake
will be typically used for TLS hanshakes during client connections to TLS/TCP or
QUIC frontends.
2023-11-17 17:31:42 +01:00
Remi Tricot-Le Breton
d5cce92a46 BUG/MINOR: shctx: Remove old HA_SPIN_INIT
The shctx lock was changed from a SPINLOCK to a RWLOCK in commit ed35b94
"MEDIUM: cache: Switch shctx spinlock to rwlock and restrict its scope"
but a SPIN_INIT was left behind.

This patch does not need to be backported.
2023-11-17 16:56:18 +01:00
Christopher Faulet
7676a2cdf6 BUG/MINOR: stconn/applet: Report send activity only if there was output data
For applets and connection, when a send attempt is performed, we must be
sure to not report a send activity if there was no output data at all before
the attempt.

It is not important for the <fsb> date itself but for the <lra> date for
non-independent stream.

This patch must be backported to 2.8.
2023-11-17 15:36:43 +01:00
Christopher Faulet
ab5ecaa2ea BUG/MINOR: stconn: Use HTX-aware channel's functions to get info on buffer
Some channel function are used to check if the channel's buffer is full, not
empty or if there are input data. However, functions used are not
HTX-aware. So it is not accurate and may prevent some actions to be
performed (However, not sure there are really issues). Because HTX-aware
versions now exist, use them instead.

This patch may be backported as far as 2.2. It relies on

    * "MINOR: channel: Add functions to get info on buffers and deal with HTX streams"
    * "MINOR: htx: Use a macro for overhead induced by HTX"
2023-11-17 15:09:33 +01:00
Christopher Faulet
24409a5caa BUG/MINOR: stconn: Fix streamer detection for HTX streams
Since the HTX was introduced, the streamer detection is broken for HTX
streams because the HTX overhead was not counted in the test to set
CF_STREAMER and CF_STREAMER_FAST flags.

The consequence was that the consumer side was no longer able to send more
than tune.ssl.maxrecord at a time in SSL.

To fix the issue, we now count the HTX overhead of HTX streams to be able to
set CF_STREAMER/CF_STREAMER_FAST flags on a channel.

This patch relies on folloing commits:

  * "MINOR: channel: Add functions to get info on buffers and deal with HTX streams"
  * "MINOR: htx: Use a macro for overhead induced by HTX"

The series must be backported as far as 2.2.
2023-11-17 15:09:17 +01:00
Christopher Faulet
b68c579eda BUG/MEDIUM: stconn: Update fsb date on partial sends
The first-send-blocked date was originally designed to save the date of the
first send of a series where some data remain blocked. It was relaxed
recently (3083fd90e "BUG/MEDIUM: stconn: Report a send activity everytime
data were sent") to save the date of the first full blocked send. However,
it is not accurrate.

When all data are sent, the fsb value must be reset to TICK_ETERNITY. When
nothing is sent and if it is not already set, it must be set. But when data
are partially sent, the value must be updated and not reset. Otherwise the
write timeout may be ignored because fsb date is never set.

So, changes brought by the patch above are reverted and
sc_ep_report_blocked_send() was changed to know if some data were sent or
not. This way we are able to update fsb value.
l
This patch must be backported to 2.8.
2023-11-17 12:13:00 +01:00
Remi Tricot-Le Breton
f1f8e2b3df DOC: cache: Specify when function expects a cache lock
Some functions are built on the fact that the cache lock must be already
taken by the caller. This patch adds this information in the functions'
descriptions.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
45a2ff0f4a MINOR: shctx: Remove 'use_shared_mem' variable
This global variable was used to avoid using locks on shared_contexts in
the unlikely case of nbthread==1. Since the locks do not do anything
when USE_THREAD is not defined, it will be more beneficial to simply
remove this variable and the systematic test on its value in the shared
context locking functions.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
4fe6c1365d MINOR: shctx: Remove redundant arg from free_block callback
The free_block callback does not get called on blocks that are not row
heads anymore so we don't need too shared_block parameters.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
48f81ec09d MAJOR: cache: Delay cache entry delete in reserve_hot function
A reference counter on the cache_entry was added in a previous commit.
Its value is atomically increased and decreased via the retain_entry and
release_entry functions.

This is needed because of the latest cache and shared_context
modifications that introduced two separate locks instead of the
preexisting single shctx_lock one.
With the new logic, we have two main blocks competing for the two locks:
- the one in the http_action_req_cache_use that performs a lookup in the
  cache tree (locked by the cache lock) and then tries to remove the
  corresponding blocks from the shared_context's 'avail' list until the
  response is sent to the client by the cache applet,
- the shctx_row_reserve_hot that traverses the 'avail' list and gives
  them back to the caller, while removing previous row heads from the
  cache tree
Those two blocks require the two locks but one of them would take the
cache lock first, and the other one the shctx_lock first, which would
end in a deadlock without the current patch.

The way this conflict is resolved in this patch is by ensuring that at
least one of those uses works without taking the two locks at the same
time.
The solution found was to keep taking the two locks in the cache_use
case. We first lock the cache to lookup for an entry and we then take
the shctx lock as well to detach the corresponding blocks from the
'avail' list. The subtlety is that between the cache lookup and the
actual locking of the shctx, another thread might have called the
reserve_hot function in which we only take the shctx lock.
In this function we traverse the 'avail' list to remove blocks that are
then given to the caller. If one of those blocks corresponds to a
previous row head, we call the 'free_blocks' callback that used to
delete the cache entry from the tree.
We now avoid deleting directly the cache entries in reserve_hot and we
rather set the cache entries 'complete' param to 0 so that no other
thread tries to work with this entry. This way, when we release the
shctx lock in reserve_hot, the first thread that had performed the cache
lookup and had found an entry that we just gave to another thread will
see that the 'complete' field is 0 and it won't try to work with this
response.

The actual removal of entries from the cache tree will now be performed
in the new 'reserve_finish' callback called at the end of the
shctx_row_reserve_hot function. It will iterate on all the row head that
were inserted in a dedicated list in the 'free_block' callback and
perform the actual delete.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
1cd91b4f2a MINOR: shctx: Add new reserve_finish callback call to shctx_row_reserve_hot
This patch adds a reserve_finish callback that can be defined by the
subsystems that require a shared_context. It is called at the end of
shctx_row_reserve_hot after the shared_context lock is released.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
11df806c88 MEDIUM: shctx: Descend shctx_lock calls into the shctx_row_reserve_hot
Descend the shctx_lock calls into the shctx_row_reserve_hot so that the
cases when we don't need to lock anything (enough space in the current
row or not enough space in the 'avail' list) do not take the lock at
all.
In sh_ssl_sess_new_cb the lock had to be descended into
sh_ssl_sess_store in order not to cover the shctx_row_reserve_hot call
anymore.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
a29b073f26 MEDIUM: cache: Add refcount on cache_entry
Add a reference counter on the cache_entry. Its value will be atomically
increased and decreased via the retain_entry and release_entry
functions.
The release_entry function has two distinct versions,
release_entry_locked and release_entry_unlocked that should be called
when the cache lock is already taken in write mode or not
(respectively). In the unlocked case the cache lock will only be taken
in write mode on the last reference of the entry (before calling
delete_entry). This allows to limit the amount of times when we need to
take the cache lock during a release operation.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
ed35b9411a MEDIUM: cache: Switch shctx spinlock to rwlock and restrict its scope
Since a lock on the cache tree was added in the latest cache changes, we
do not need to use the shared_context's lock to lock more than pure
shared_context related data anymore. This already existing lock will now
only cover the 'avail' list from the shared_context. It can then be
changed to a rwlock instead of a spinlock because we might want to only
run through the avail list sometimes.

Apart form changing the type of the shctx lock, the main modification
introduced by this patch is to limit the amount of code covered by the
shctx lock. This lock does not need to cover any code strictly related
to the cache tree anymore.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
a0d7c290ec MINOR: cache: Use dedicated trash for "show cache" cli command
After the latest changes in the cache/shared_context mechanism, the
cache and shared_context logic were decorrelated and in some unlikely
cases we might end up using the "show cache" command while some regular
cache processing is occurring (a response being stored in the cache for
instance). In such a case, because we used the same 'trash' buffer in
those two contexts, we could end up with the contents of a response in
the ouput of the "show cache" command.
This patch fixes this problem by allocating a dedicated trash for the
CLI command.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
3831d8454f MEDIUM: shctx: Remove 'hot' list from shared_context
The "hot" list stored in a shared_context was used to keep a reference
to shared blocks that were currently being used and were thus removed
from the available list (so that they don't get reused for another cache
response). This 'hot' list does not ever need to be shared across
threads since every one of them only works on their current row.

The main need behind this 'hot' list was to detach the corresponding
blocks from the 'avail' list and to have a known list root when calling
list_for_each_entry_from in shctx_row_data_append (for instance).

Since we actually never need to iterate over all members of the 'hot'
list, we can remove it and replace the inc_hot/dec_hot logic by a
detach/reattach one.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
bd24118212 MEDIUM: cache: Use rdlock on cache in cache_use
When looking for a valid entry in the cache tree in
http_action_req_cache_use, we do not need to delete an expired entry at
once because even if an expired entry exists, since the request will be
forwarded to the server, then the expired entry will be overwritten when
the updated response is seen. We can then use a simpler rdlock during
cache_use operation.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
0dfb57bbf9 MINOR: cache: Add option to avoid removing expired entries in lookup function
Any lookup in the cache tree done through entry_exist or
secondary_entry_exist functions could end up deleting the corresponding
entry if it is expired which prevents from using a rdlock on code paths
that would just perform a lookup on the tree (in
http_action_req_cache_use for instance).
Adding a 'delete_expired' boolean as a parameter allows for "pure"
lookups and thus it will allow to perform operations on the tree that
simply require a rdlock instead of a "heavier" wrlock.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
ff3cb6dad4 MINOR: cache: Remove expired entry delete in "show cache" command
The "show cache" CLI command iterates over all the entries of the cache
tree and it used this opportunity to remove expired entries from the
cache. This behavior was completely undocumented and does not seem that
necessary. By removing it we can take the cache lock in read mode only
which limits the impact on the other threads.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
ac9c49b40d MEDIUM: cache: Use dedicated cache tree lock alongside shctx lock
Every use of the cache tree was covered by the shctx lock even when no
operations were performed on the shared_context lists (avail and hot).
This patch adds a dedicated RW lock for the cache so that blocks of code
that work on the cache tree only can use this lock instead of the
superseding shctx one. This is useful for operations during which the
concerned blocks are already in the hot list.
When the two locks need to be taken at the same time, in
http_action_req_cache_use and in shctx_row_reserve_hot, the shctx one
must be taken first.
A new parameter needed to be added to the shared_context's free_block
callback prototype so that cache_free_block can take the cache lock and
release it afterwards.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
81d8014af8 MINOR: shctx: Remove explicit 'from' param from shctx_row_data_append
This parameter is not necessary since the first element of a row always
has a pointer to the row's tail.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
610b67fd8b MEDIUM: shctx: Simplify shctx_row_reserve_hot loop
The shctx_row_reserve_hot relied on two loop levels in order to first
look for the first block of a preused row and then iterate on all the
blocks of this row to reserve them for the new row. This was not the
simplest nor the easiest to read way so this logic could be replaced by
a single iteration on the avail list members.
The two use cases of calling this function with or without a preexisting
"first" member were a bit cumbersome as well and were replaced by a more
straightforward approach.
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
eccb97f60e MEDIUM: shctx: Move list between hot and avail list in O(1)
Instead of iterating over all the elements of a given row when moving it
between the hot and available lists, we can make use of the last_reserved
pointer that already points to the last block of the list to perform the
move in O(1).
2023-11-16 19:35:10 +01:00
Remi Tricot-Le Breton
55fbf82080 MINOR: shctx: Set last_append to NULL when reserving block in hot list
Ensure that the last_append pointer is always set to NULL on first block
of rows reserved by the subsystems using the shctx (cache for instance).
This pointer will be used directly in shctx_row_data_append instead of
the 'from' param which will simplify its uses.
2023-11-16 19:35:10 +01:00
Amaury Denoyelle
560cb1332a MINOR: server: force add to idle on reverse
A backend connection is inserted in server idle list via
srv_add_to_idle_list(). This function has several conditions which may
cause the connection to be rejected instead.

One of this condition is based on the current estimate count of needed
connections for the server. If the count of idle connections stored has
already reached this estimation, the new connection is rejected. This is
in opposition with the purpose of reverse HTTP. On active reverse,
haproxy can instantiate several connections to properly serve the future
traffic. However, the opposite passive haproxy will have only a low
estimate of needed connection and will reject most of them.

To fix this, simply check CO_FL_REVERSED connection flag on
srv_add_to_idle_list(). If set, the connection is inserted without
checking for estimate count. Note that all other conditions are not
impacted, so it's still possible to reject a connection, for example if
process FD limit is reached.

This commit relies on recent patch which change CO_FL_REVERSED flag for
connection after passive reverse.
2023-11-16 18:43:41 +01:00
Amaury Denoyelle
a1457296d5 BUG/MINOR: mux_h2: reject passive reverse conn if error on add to idle
On passive reverse, H2 mux is responsible to insert the connection in
the server idle list. This is done via srv_add_to_idle_list(). However,
this function may fail for various reason, such as FD usage limit
reached.

Handle properly this error case. H2 mux flags the connection on error
which will cause its release. Prior to this patch, the connection was
only released on server timeout.

This bug was found inspecting server curr_used_conns counter. Indeed, on
connection reverse, this counter is first incremented. It is decremented
just after on srv_add_to_idle_list() if insertion is validated. However,
if insertion is rejected, the connection was not released which cause
curr_used_conns to remains positive. This has the major downside to
break the reusing of idle connection on rhttp causing spurrious 503
errors.

No need to backport.
2023-11-16 18:43:32 +01:00
Amaury Denoyelle
8cc3fc73f1 MINOR: connection: update rhttp flags usage
Change the flags used for reversed connection :
* CO_FL_REVERSED is now put after reversal for passive connect. For
  active connect, it is delayed when accept is completed after reversal.
* CO_FL_ACT_REVERSING replace the old CO_FL_REVERSED. It is put only for
  active connect on reversal and removes once accept is done.

This allows to identify a connection as reversed during its whole
lifetime. This should be useful to extend reverse connect.
2023-11-16 17:53:31 +01:00
Christopher Faulet
691f4cf449 BUG/MEDIUM: stream: Don't call mux .ctl() callback if not implemented
The commit 5ff7d2276 ("BUG/MEDIUM: stream: Properly handle abortonclose when set
on backend only") introduced a regression. Not all multiplexer implement the
.ctl() callback function. Thus we must be sure this callback function is defined
first to call it.

This patch should fix a crash reported by Tristan in the issue #2095. It must be
backported as far as 2.2, with the commit above.
2023-11-14 19:21:52 +01:00
William Lallemand
d76fa37534 BUG/MEDIUM: mworker: set the master variable earlier
Since 2.7 and the mcli_reload_bind_conf (56f73b21a5), upon a reload
failure because of a bind error, the mcli_reload_bind_conf go through a
sock_unbind((). This is not supposed to do anything when a listener is
RX_F_INHERITED in the master, but unfortunately this is done too early
and provokes an exit of the master.

We already suspected in the past that setting the 'master' variable this
late could have negative impact.

The fix sets the master variable earlier before the bind.

This must be backported at least to 2.7. This could be backported
earlier but better wait any feedbacks on the fix.
2023-11-14 14:32:39 +01:00
Willy Tarreau
a63e016d27 MINOR: activity: report profiling duration and age in "show profiling"
Seeing counters in "show profiling" is not always very helpful without
an indication of how long the analysis lasted nor if it's still active
or not. Let's add a pair of start/stop timers for tasks and memory so
that we can now indicate how long the measurements lasted and when they
ended (or 0 if still running).

Note that for tasks profiling set to "auto", the measurement is considered
enabled since it can automatically switch on and off on a per-thread
basis.
2023-11-14 11:46:37 +01:00
Christopher Faulet
ec3ea6f698 MINOR: stconn: Use SC to detect frontend connections in sc_conn_recv()
In sc_conn_recv(), instead of using the connection to know we are on the
frontend side, we now use the SC flags. It changes nothing but it is
cleaner.
2023-11-14 11:01:51 +01:00
Christopher Faulet
5ff7d22767 BUG/MEDIUM: stream: Properly handle abortonclose when set on backend only
Since the 2.2 and the commit dedd30610 ("MEDIUM: h1: Don't wake the H1 tasklet
if we got the whole request."), we avoid to subscribe for reads if the H1
message is fully received. However, this broke the abortonclose option. To fix
the issue, a CO_RFL flag was added to instruct the mux it should still wait for
read events to properly handle read0. Only the H1 mux was concerned.

But since then, most of time, the option is only handled if it is set on the
frontend proxy because the request is fully received before selecting the
backend. If the backend is selected before the end of the request there is no
issue. But otherwise, because the backend is not known yet, we are unable to
properly handle the option and we miss to subscribe for reads.

Of course the option cannot be set on a frontend proxy. So concretly it means
the option is properly handled if it is enabled in the defaults section (if
common to frontend and backend) or a listen proxy, but it is ignored if it is
set on backend only.

Thanks to previous patches, we can now instruct the mux it should subscribe for
reads if not already done. We use this mechanism in process_stream() when the
connection is set up, ie when backend SC is set to SC_ST_REQ state.

This patch relies on following patches:
  * MINOR: connection: Add a CTL flag to notify mux it should wait for reads again
  * MEDIUM: mux-h1: Handle MUX_SUBS_RECV flag in h1_ctl() and susbscribe for reads

This patch should be the issue #2344. All the series must be backported as far
as 2.2.
2023-11-14 11:01:51 +01:00
Christopher Faulet
450ff71c95 MEDIUM: mux-h1: Handle MUX_SUBS_RECV flag in h1_ctl() and susbscribe for reads
The H1 mux now handle MUX_SUBS_RECV flag in h1_ctl(). If it is not already
subscribed for reads, it does so. This patch will be mandatory to properly
handle abortonclose option.
2023-11-14 11:01:51 +01:00
Christopher Faulet
9327e7efa7 BUG/MINOR: stconn: Handle abortonclose if backend connection was already set up
abortonclose option is a backend option, it should not be handle on frontend
side. Of course a frontend can also be a backend but the option should not
be handled too early because it is not necessarily the selected backend
(think about a listen proxy routing requests to another backend).

It is especially an issue when the abortonclose option is enabled in the
defaults section and disabled by the selected backend. Because in this case,
the option may still be enabled while it should not.

Thus, now we wait the backend connection was set up to handle the option. To
do so, we check the backend SC state. The option is ignored if it is in
ST_CS_INI state. For all other states, it means the backend was already
selected.

This patch could be backported as far as 2.2.
2023-11-14 11:01:51 +01:00
Willy Tarreau
6a4591c3d0 BUG/MEDIUM: connection: report connection errors even when no mux is installed
An annoying issue was met when testing the reverse-http mechanism, by
which failed connection attempts would apparently not be attempted again
when there was no connect timeout. It turned out to be more generalized
than the rhttp system, and actually affects all outgoing connections
relying on NPN or ALPN to choose the mux, on which no mux is installed
and for which the subscriber (ssl_sock) must be notified instead.

The problem appeared during 2.2-dev1 development. First, commit
062df2c23 ("MEDIUM: backend: move the connection finalization step to
back_handle_st_con()") broke the error reporting by testing CO_FL_ERROR
only under CO_FL_CONNECTED. While it still worked OK for cases where a
mux was present, it did not for this specific situation because no
single error path would be considered when no mux was present. Changing
the CO_FL_CONNECTED test to also include CO_FL_ERROR did work, until
a few commits later with 477902bd2 ("MEDIUM: connections: Get ride of
the xprt_done callback.") which removed the xprt_done callback that was
used to indicate success or failure of the transport layer setup, since,
as the commit explains, we can report this via the mux. What this last
commit says is true, except when there is no mux.

For this, however, the sock_conn_iocb() function (formerly conn_fd_handler)
is called for such errors, evaluates a number of conditions, none of which
is matched in this error condition case, since sock_conn_check() instantly
reports an error causing a jump to the leave label. There, the mux is
notified if installed, and the function returns. In other error condition
cases, readiness and activity are checked for both sides, the tasklets
woken up and the corresponding subscriber flags removed. This means that
a sane (and safe) approach would consist in just notifying the subscriber
in case of error, if such a subscriber still exists: if still there, it
means the event hasn't been caught earlier, then it's the right moment
to report it. And since this is done after conn_notify_mux(), it still
leaves all control to the mux once it's installed.

This commit should be progressively backported as far as 2.2 since it's
where the problem was introduced. It's important to clearly check the
error path in each function to make sure the fix still does what it's
supposed to.
2023-11-14 08:49:23 +01:00
Frédéric Lécaille
3741e4bf90 BUG/MINOR: quic: maximum window limits do not match the doc
This bug arrived with this commit:
     MINOR: quic: Add a max window parameter to congestion control algorithms

The documentation was been modified with missing/wrong modifications in the code part.
The 'g' suffix must be accepted to parse value in gigabytes. And exctly 4g is
also accepted.

No need to backport.
2023-11-13 19:56:28 +01:00
Frédéric Lécaille
9021e8935e MINOR: quic: Maximum congestion control window for each algo
Make all the congestion support the maximum congestion control window
set by configuration. There is nothing special to explain. For each
each algo, each time the window is incremented it is also bounded.
2023-11-13 17:53:18 +01:00
Frédéric Lécaille
028a55a1d0 MINOR: quic: Add a max window parameter to congestion control algorithms
Add a new ->max_cwnd member to bind_conf struct to store the maximum
congestion control window value for each QUIC binding.
Modify the "quic-cc-algo" keyword parsing to add an optional parameter
to its value: the maximum congestion window value between parentheses
as follows:

      ex: quic-cc-algo cubic(10m)

This value must be bounded, greater than 10k and smaller than 1g.
2023-11-13 17:53:18 +01:00
Frédéric Lécaille
840af0928b BUG/MEDIUM: quic: Non initialized CRYPTO data stream deferencing
This bug arrived with this commit:
   BUG/MINOR: quic: Useless use of non-contiguous buffer for in order CRYPTO data

Before this commit qc->cstream was tested before entering qc_treat_rx_crypto_frms().
This patch restablishes this behavior. Furthermore, it simplyfies
qc_ssl_provide_all_quic_data() which is a little bit ugly: the CRYPTO data
frame may be freed asap in the list_for_each_entry_safe() block after
having store its data pointer and length in local variables.
Also interrupt the CRYPTO data process as soon as qc_ssl_provide_quic_data()
or qc_treat_rx_crypto_frms() fail.

No need to be backported.
2023-11-13 16:00:25 +01:00
Amaury Denoyelle
954b5b756a BUG/MEDIUM: quic: fix FD for quic_cc_conn
Since following commit, quic_conn closes its owned socket before
transition to quic_cc_conn for closing state. This allows to save FDs as
quic_cc_conn could use the listener socket for their I/O.

  commit 150c0da889
  MEDIUM: quic: release conn socket before using quic_cc_conn

This patch is incomplete as it removes initialization of <fd> member for
quic_cc_conn. Thus, if sending is done on closing state, <fd> value is
undefined which in most cases will result in a crash. Fix this by simply
initializing <fd> member with qc_init_fd() in qc_new_cc_conn().

This bug should fix recent issue from #2095. Thanks to Tristan for its
reporting and then testing of this patch.

No need to backport.
2023-11-13 11:55:07 +01:00
Amaury Denoyelle
78d244e9f7 BUG/MINOR: quic: fix decrement of half_open counter on qc alloc failure
Half open counter is used to comptabilize QUIC connections waiting for
address validation. It was recently reworked to adjust its scope. With
each decrement operation, a BUG_ON() was added to ensure the counter
never wraps.

This BUG_ON() could be triggered if an allocation fails for one of
quic_conn members in qc_new_conn(). This is because half open counter is
incremented at the end of qc_new_conn(). However, in case of alloc
failure, quic_conn_release() is called immediately to ensure the counter
is decremented if a connection is freed before peer address has been
validated.

To fix this, increment half open counter early in qc_new_conn() prior to
every quic_conn members allocations.

This issue was reproduced using -dMfail argument.

This issue has been introduced by
  commit 278808915b
  MINOR: quic: reduce half open counters scope

No need to backport.
2023-11-13 11:16:41 +01:00
Amaury Denoyelle
92da3accfd BUG/MINOR: quic: fix crash on qc_new_conn alloc failure
A new counter was recently introduced to comptabilize the current number
of active QUIC handshakes. This counter is stored on the listener
instance.

This counter is incremented at the beginning of qc_new_conn() to check
if limit is not reached prior to quic_conn allocation. If quic_conn or
one of its inner member allocation fails, special care is taken to
decrement the counter as the connection instance is released. However,
it relies on <l> variable which is initialized too late to cover
pool_head_quic_conn allocation failure.

To fix this, simply initialize <l> at the beginning of qc_new_conn().

This issue was reproduced using -dMfail argument.

This issue was introduced by the following commit
  commit 3df6a60113
  MEDIUM: quic: limit handshake per listener

No need to backport.
2023-11-13 11:16:41 +01:00
Aurelien DARRAGON
76acde9107 BUG/MINOR: log: keep the ref in dup_logger()
This bug was introduced with 969e212 ("MINOR: log: add dup_logsrv() helper
function")

When duplicating an existing log entry, we must take care to inherit from
its original ->ref if it is set, because not doing so would make 28ac0999
("MINOR: log: Keep the ref when a log server is copied to avoid duplicate entries")
ineffective given that global log directives will lose their original
reference when duplicated resursively (at least twice), which is what
happens when global log directives are first inherited to defaults which
are then inherited to a regular proxy at the end of the chain.

This can be easily reproduced using the following configuration:

   |global
   |  log stdout format raw local0
   |
   |defaults
   |  log global
   |
   |frontend test
   |  log global
   |  ...

Logs from "test" proxy will be duplicated because test incorrectly
inherited from global "log" directives twice, which 28ac0999 would
normally detect and prevent.

No backport needed unless 969e212 gets backported.
2023-11-13 11:06:05 +01:00
Christopher Faulet
33a1fc883a BUG/MINOR: sample: Fix bytes converter if offset is bigger than sample length
When the bytes converter was improved to be able to use variables (915e48675
["MEDIUM: sample: Enhances converter "bytes" to take variable names as
arguments"]), the behavior of the sample slightly change. A failure is
reported if the given offset is bigger than the sample length. Before, a
empty binary sample was returned.

This patch fixes the converter to restore the original behavior. The
function was also refactored to properly handle failures by removing
SMP_F_MAY_CHANGE flag. Because the converter now handles variables, the
conversion to an integer may fail. In this case SMP_F_MAY_CHANGE flag must
be removed to be sure the caller will not retry.

This patch should fix the issue #2335. No backport needed except if commit
above is backported.
2023-11-13 11:06:05 +01:00
William Lallemand
a06f6212c9 MEDIUM: startup: 'haproxy -c' is quiet when valid
MODE_CHECK does not output "Configuration file is valid" by default
anymore. To display this message the -V option must be used with -c.

However the warning and errors are still output by default if they
exist.

This allows to clean the output of the systemd unit file with is doing a
-c.
2023-11-13 09:59:34 +01:00
Willy Tarreau
cf07cb96be BUG/MEDIUM: proxy: always initialize the default settings after init
The proxy's initialization is rather odd. First, init_new_proxy() is
called to zero all the lists and certain values, except those that can
come from defaults, which are initialized by proxy_preset_defaults().
The default server settings are also only set there.

This results in these settings not to be set for a number of internal
proxies that do not explicitly call proxy_preset_defaults() after
allocation, such as sink and log forwarders.

This was revealed by last commit 79aa63823 ("MINOR: server: always
initialize pp_tlvs for default servers") which crashes in log parsers
when applied to certain proxies which did not initialize their default
servers.

In theory this should be backported, however it would be desirable to
wait a bit before backporting it, in case certain parts would rely on
these elements not being initialized.
2023-11-13 09:17:05 +01:00
Willy Tarreau
79aa638238 MINOR: server: always initialize pp_tlvs for default servers
In commit 6f4bfed3a ("MINOR: server: Add parser support for
set-proxy-v2-tlv-fmt") a suspicious check for a NULL srv_tlv was placed
in the list_for_each_entry(), that should not be needed. In practice,
it's caused by the list head not being initialized, hence the first
element is NULL, as shown by Alexander's reproducer below which crashes
if the test in the loop is removed:

  backend dummy
    default-server send-proxy-v2 set-proxy-v2-tlv-fmt(0xE1) %[fc_pp_tlv(0xE1)]
    server dummy_server 127.0.0.1:2319

The right place to initialize this field is proxy_preset_defaults().
We'd really need a function to initialize a server :-/

The check in the loop was removed. No backport is needed.
2023-11-13 08:53:28 +01:00
Frdric Lcaille
dfda884633 BUG/MINOR: quic: Useless use of non-contiguous buffer for in order CRYPTO data
This issue could be reproduced with a TLS client certificate verificatio to
generate enough CRYPTO data between the client and haproxy and with dev/udp/udp-perturb
as network perturbator. Haproxy could crash thanks to a BUG_ON() call as soon as
in disorder data were bufferized into a non-contiguous buffer.

There is no need to pass a non NULL non-contiguous to qc_ssl_provide_quic_data()
from qc_ssl_provide_all_quic_data() which handles in order CRYPTO data which
have not been bufferized. If not, the first call to qc_ssl_provide_quic_data()
to process the first block of in order data leads the non-contiguous buffer
head to be advanced to a wrong offset, by <len> bytes which is the length of the
in order CRYPTO frame. This is detected by a BUG_ON() as follows:

FATAL: bug condition "ncb_ret != NCB_RET_OK" matched at src/quic_ssl.c:620
  call trace(11):
  | 0x5631cc41f3cc [0f 0b 8b 05 d4 df 48 00]: qc_ssl_provide_quic_data+0xca7/0xd78
  | 0x5631cc41f6b2 [89 45 bc 48 8b 45 b0 48]: qc_ssl_provide_all_quic_data+0x215/0x576
  | 0x5631cc3ce862 [48 8b 45 b0 8b 40 04 25]: quic_conn_io_cb+0x19a/0x8c2
  | 0x5631cc67f092 [e9 1b 02 00 00 83 45 e4]: run_tasks_from_lists+0x498/0x741
  | 0x5631cc67fb51 [89 c2 8b 45 e0 29 d0 89]: process_runnable_tasks+0x816/0x879
  | 0x5631cc625305 [8b 05 bd 0c 2d 00 83 f8]: run_poll_loop+0x8b/0x4bc
  | 0x5631cc6259c0 [48 8b 05 b9 ac 29 00 48]: main-0x2c6
  | 0x7fa6c34a2ea7 [64 48 89 04 25 30 06 00]: libpthread:+0x7ea7
  | 0x7fa6c33c2a2f [48 89 c7 b8 3c 00 00 00]: libc:clone+0x3f/0x5a

Thank you to @Tristan971 for having reported this issue in GH #2095.

No need to backport.
2023-11-10 18:16:14 +01:00
Aurelien DARRAGON
078ebde870 CLEANUP: sink: useless leftover in sink_add_srv()
Removing a useless leftover which has been introduced with 31e8a003a5
("MINOR: sink: function to add new sink servers")
2023-11-10 17:49:57 +01:00
Aurelien DARRAGON
2694621151 CLEANUP: sink: bad indent in sink_new_from_logger()
Fixing bad indent in sink_new_from_logger() which was recently introduced
2023-11-10 17:49:57 +01:00
Aurelien DARRAGON
d710dfbacc BUG/MINOR: sink: don't learn srv port from srv addr
Since 04276f3d ("MEDIUM: server: split the address and the port into two
different fields") we should not use srv->addr to store server's port
and rely on srv->svc_port instead.

For sink servers, we correctly set >svc_port upon server creation but
we didn't use it when initializing address for the connection.

As a result, FQDN resolution will not work properly with sink servers.
Hopefully, this used to work by accident because sink servers were
resolved using the PA_O_RESOLVE flag in str2sa_range(), which made the
srv->addr contain the port in addition to the address.

But this will fail to work when FQDN resolution is postponed because only
->svc_port will contain the proper server port upon resolution.

For instance, FQDN resolution with servers from log backends (which are
resolved as regular servers, that is, without the PA_O_RESOLVE) will fail
to work because of this.

This may be backported as far as 2.2 even though the bug didn't have
noticeable effects for versions below 2.9

[In 2.2, sink_forward_session_init() didn't exist it should be applied in
 sink_forward_session_create()]
2023-11-10 17:49:57 +01:00
Aurelien DARRAGON
64e0b63442 BUG/MEDIUM: server: invalid address (post)parsing checks
This bug was introduced with 29b76ca ("BUG/MEDIUM: server/log: "mode log"
after server keyword causes crash ")

Indeed, we cannot safely rely on addr_proto being set when str2sa_range()
returns in parse_server() (even if SRV_PARSE_PARSE_ADDR is set), because
proto lookup might be bypassed when FQDN addresses are involved.

Unfortunately, the above patch wrongly assumed that proto would always
be set when SRV_PARSE_PARSE_ADDR was passed to parse_server() (so when
str2sa_range() was called), resulting in invalid postparsing checks being
performed, which could as well lead to crashes with log backends
("mode log" set) because some postparsing init was skipped as a result of
proto not being set and this wasn't expected later in the init code.

To fix this, we now make use of the previous patch to perform server's
address compatibility checks on hints that are always set when
str2sa_range() succesfully returns.

For log backend, we're also adding a complementary test to check if the
address family is of expected type, else we report an error, plus we're
moving the postinit logic in log api since _srv_check_proxy_mode() is
only meant to check proxy mode compatibility and we were abusing it.

This patch depends on:
 - "MINOR: tools: make str2sa_range() directly return type hints"

No backport required unless 29b76ca gets backported.
2023-11-10 17:49:57 +01:00
Aurelien DARRAGON
12582eb8e5 MINOR: tools: make str2sa_range() directly return type hints
str2sa_range() already allows the caller to provide <proto> in order to
get a pointer on the protocol matching with the string input thanks to
5fc9328a ("MINOR: tools: make str2sa_range() directly return the protocol")

However, as stated into the commit message, there is a trick:
   "we can fail to return a protocol in case the caller
    accepts an fqdn for use later. This is what servers do and in this
    case it is valid to return no protocol"

In this case, we're unable to return protocol because the protocol lookup
depends on both the [proto type + xprt type] and the [family type] to be
known.

While family type might not be directly resolved when fqdn is involved
(because family type might be discovered using DNS queries), proto type
and xprt type are already known. As such, the caller might be interested
in knowing those address related hints even if the address family type is
not yet resolved and thus the matching protocol cannot be looked up.

Thus in this patch we add the optional net_addr_type (custom type)
argument to str2sa_range to enable the caller to check the protocol type
and transport type when the function succeeds.
2023-11-10 17:49:57 +01:00
Christopher Faulet
ebf90ca550 BUG/MEDIUM: applet: Remove appctx from buffer wait list on release
For now, the appctx is removed from the buffer wait list when it is
freed. However, when it is released, it is not necessarily freed
immediately. But it is detached from the SC. If it is still registered in
the buffer wait list, it could then be woken up to get a buffer. At this
stage it is totally unexpected, especially because we must access the SC.

The fix is obvious, the appctx must be removed from the buffer wait list on
release.

Note this bug exists because the appctx was moved at the mux level.

This patch must be backported as far as 2.6.
2023-11-10 17:49:57 +01:00
Amaury Denoyelle
150c0da889 MEDIUM: quic: release conn socket before using quic_cc_conn
After emission/reception of a CONNECTION_CLOSE, a connection enters the
CLOSING state. In this state, only minimal exchanges occurs as only the
packets which containted the CONNECTION_CLOSE frame can be reemitted. In
conformance with the RFC, most resources are released and quic_conn
instance is converted to the lighter quic_cc_conn.

Push further this optimization by closing quic_conn socket FD before
switching to a quic_cc_conn. This means that quic_cc_conn will rely on
listener socket for its send/recv operation. This should not impact
performance as as stated input/output are minimal on this state.

This patch should improve FD consumption as prior to this a socket FD
was kept during the closing delay which could cause maxsock to be
reached for other connections.

Note that fd member is kept in QUIC_CONN_COMMON and not removed from
quic_cc_conn. This is because quic_cc_conn relies on qc_snd_buf() which
access this field.

As a side-effect to this change, jobs accounting for quic_conn is also
updated. quic_cc_conn instances are now not counted as jobs. Indeed, the
main objective of jobs is to prevent haproxy process to be stopped with
data truncation. However, this relies on the connection to uses its
owned socket as the listener socket is shut down inconditionaly on
shutdown.

A consequence of the jobs handling change is that haproxy process will
be closed if only quic_cc_conn instances are present, thus preventing to
respect the closing state. In case of a reload, if a client missed a
CONNECTION_CLOSE frame just before process shutdown, it will probably
received a Stateless Reset on sending retry.

This change is considered safe as, for now, haproxy only emits
CONNECTION_CLOSE on error conditions (such as protocol violation or
timeout). It is considered as expected to suffer from data truncation
from this. However, if connection closing is reused by haproxy to
implement clean shutdown, it should be necessary to delay
CONNECTION_CLOSE frame emission to ensure no data truncation happens
here.
2023-11-10 15:27:45 +01:00
Amaury Denoyelle
f549eb2b34 MEDIUM: quic: respect closing state even on soft-stop
Prior to this patch, a special condition was set when idle timer was
rearmed for closing connections during haproxy process stopping. In this
case, the timeout was ditched and the idle task woken up immediatly.

The objective was to release quickly closing connections to not prevent
the process stopping to be too long. However, it is not conform with RFC
9000 recommandations and may cause some clients to miss a
CONNECTION_CLOSE in case of a packet loss.

A recent fix was set to use a shorter timeout for closing state. Now a
connection should only be left in this state for one second or less.
This reduces greatly the importance of stopping special condition. Thus,
this patch removes it completely.
2023-11-10 15:26:03 +01:00
Amaury Denoyelle
75e36c57f0 BUG/MINOR: quic: remove dead code in error path
In quic_rx_pkt_retrieve_conn(), err label is now only used if qc is
NULL. Thus, condition on qc can be removed.

No need to backport.

This issue was reported by coverity on github.
This should fix issue #2338.
2023-11-10 15:26:03 +01:00
Willy Tarreau
0a7ab7067f OPTIM: mux-h2: don't allocate more buffers per connections than streams
When an H2 mux works with a slow downstream connection and without the
mux-mux mode, it is possible that a single stream will allocate all 32
buffers in the connection. This is not desirable at all because 1) it
brings no value, and 2) it allocates a lot of memory per connection,
which, in addition to using a lot of memory, tends to degrade performance
due to cache thrashing.

This patch improves the situation by refraining from sending data frames
over a connection when more mbufs than streams are allocated. On a test
featuring 10k connections each with a single stream reading from the
cache, this patch reduces the RAM usage from ~180k buffers to ~20k bufs,
and improves the bandwidth. This may even be backported later to recent
versions to improve memory usage. Note however that it is efficient only
when combined with e16762f8a ("OPTIM: mux-h2: call h2_send() directly
from h2_snd_buf()"), and tends to slightly reduce the single-stream
performance without it, so in case of a backport, the two need to be
considered together.
2023-11-09 17:24:00 +01:00
Willy Tarreau
a13f8425f0 MINOR: task/debug: make task_queue() and task_schedule() possible callers
It's common to see process_stream() being woken up by wake_expired_tasks
in the profiling output, without knowing which timeout was set to cause
this. By making it possible to record the call places of task_queue()
and task_schedule(), and by making wake_expired_tasks() explicitly not
replace it, we'll be able to know which task_queue() or task_schedule()
was triggered for a given wakeup.

For example below:
  process_stream                51200   311.4ms   6.081us   34.59s    675.6us <- run_tasks_from_lists@src/task.c:659 task_queue
  process_stream                19227   70.00ms   3.640us   9.813m    30.62ms <- sc_notify@src/stconn.c:1136 task_wakeup
  process_stream                 6414   102.3ms   15.95us   8.093m    75.70ms <- stream_new@src/stream.c:578 task_wakeup

It's visible that it's the run_tasks_from_lists() which in fact applies
on the task->expire returned by the ->process() function itself.
2023-11-09 17:24:00 +01:00
Amaury Denoyelle
4dee110f56 BUG/MINOR: quic: fix retry token check inconsistency
A client may send multiple INITIAL packets if ClientHello is too big for
only one. In case a Retry token is used, the client must reuse it for
every INITIAL packets.

On the haproxy server side, there was an inconsistency to handle these
packets depending on the socket mode :
* when using listener socket, token is always revalidated.
* when using connection socket, token check is bypassed. This is because
  quic_conn instance is known through its socket and thus
  quic_rx_pkt_retrieve_conn() is not necessary.

RFC 9000 does not seems to mandate retry token validation after the
first INITIAL packet per connection. Thus, this patch chooses to bypass
the check every time the connection instance is known, as this indicates
that a previous token was already validated.

This should be backported up to 2.7.
2023-11-09 16:57:37 +01:00
Amaury Denoyelle
bb28215d9b MEDIUM: quic: define an accept queue limit
QUIC connections are pushed manually into a dedicated listener queue
when they are ready to be accepted. This happens after handshake
finalization or on 0-RTT packet reception. Listener is then woken up to
dequeue them with listener_accept().

This patch comptabilizes the number of connections currently stored in
the accept queue. If reaching a certain limit, INITIAL packets are
dropped on reception to prevent further QUIC connections allocation.
This should help to preserve system resources.

This limit is automatically derived from the listener backlog. Half of
its value is reserved for handshakes and the other half for accept
queues. By default, backlog is equal to maxconn which guarantee that
there can't be no more than maxconn connections in handshake or waiting
to be accepted.
2023-11-09 16:24:00 +01:00
Amaury Denoyelle
3df6a60113 MEDIUM: quic: limit handshake per listener
Implement a limit per listener for concurrent number of QUIC
connections. When reached, INITIAL packets for new connections are
automatically dropped until the number of handshakes is reduced.

The limit value is automatically based on listener backlog, which itself
defaults to maxconn.

This feature is important to ensure CPU and memory resources are not
consume if too many handshakes attempt are started in parallel.

Special care is taken if a connection is released before handshake
completion. In this case, counter must be decremented. This forces to
ensure that member <qc.state> is set early in qc_new_conn() before any
quic_conn_release() invocation.
2023-11-09 16:23:52 +01:00
Amaury Denoyelle
278808915b MINOR: quic: reduce half open counters scope
Accounting is implemented for half open connections which represent QUIC
connections waiting for handshake completion. When reaching a certain
limit, Retry mechanism is automatically activated prior to instantiate
new connections.

The issue with this behavior is that two notions are mixed : QUIC
connection handshake phase and Retry which is mechanism against
amplification attacks. As such, only peer address validation should be
taken into account to activate Retry protection.

This patch chooses to reduce the scope of half_open_conn. Now only
connection waiting to validate the peer address are now accounted for.
Most notably, connections instantiated with a validated Retry token
check are not accounted.

One impact of this patch is that it should prevent to activate Retry
mechanism too early, in particular in case if multiple handshakes are
too slow. Another limitation should be implemented to protect against
this scenario.
2023-11-09 16:23:52 +01:00
Amaury Denoyelle
d38bb7f8a7 MEDIUM: quic: adjust address validation
When a new QUIC connection is created, server considers peer address as
not yet validated. The server must limit its sending up to 3 times the
content already received. This is a defensive measure to avoid flooding
a remote host victim of address spoofing.

This patch adjust the condition to consider the peer address as
validated. Two conditions are now considered :
* successful handling of a received HANDSHAKE packet. This was already
  done before although implemented in a different way.
* validation of a Retry token. This was not considered prior this patch
  despite RFC recommandation.

This patch also adjusts how a connection is internally labelled as using
a validated peer address. Before, above conditions were checked via
quic_peer_validated_addr(). Now, a flag QUIC_FL_CONN_PEER_VALIDATED_ADDR
is set to labelled this. It already existed prior this patch but was
only used for quic_cc_conn. This should now be more explicit.
2023-11-09 16:23:52 +01:00
Christopher Faulet
3a051ca0c8 BUG/MEDIUM: mux-h1: Exit early if fast-forward is not supported by opposite SC
The commit 4be0c7c65 ("MEDIUM: stconn/muxes: Loop on data fast-forwarding to
forward at least a buffer") introduced a regression. In h1_fastfwd(), if
data fast-forwarding is not supported by the opposite SC, we must exit
without calling se_donn_ff(). Otherwise a BUG_ON() will be triggered because
the opposite mux has no .done_fastfwd() callback function.

No backport needed.
2023-11-09 15:18:43 +01:00
William Lallemand
3ac3a06963 MEDIUM: mworker: -W is mandatory when using -S
Defining a master CLI without the master-worker mode emits a warning
since version 1.8. This patch enforce the behavior by forbiding the
usage of the -S option without the master-worker mode.
2023-11-09 15:07:15 +01:00
William Lallemand
da24b462c3 MEDIUM: errors: move the MODE_QUIET test in print_message()
Move the MODE_QUIET and MODE_VERBOSE test in print_message() so we
always output in the startup-logs even with MODE_QUIET.

ha_warning(), ha_alert() and ha_notice() does not check the MODE_QUIET
and MODE_VERBOSE anymore, it is done before doing the fprintf() in
print_message().
2023-11-09 14:39:11 +01:00
William Lallemand
59d699c0c4 MINOR: errors: does not check MODE_STARTING for log emission
ha_alert(), ha_warning() and ha_notice() shouldn't check MODE_STARTING
for log emission. Let's remove the check.

This shouldn't do much since the stdio_quiet() function mute the output
in main().
2023-11-09 14:39:11 +01:00
William Lallemand
b959b752f9 MINOR: errors: ha_alert() and ha_warning() uses warn_exec_path()
Move the code to display the haproxy version and path during starting
mode, which is called by the first ha_alert() or ha_warning().
2023-11-09 14:39:11 +01:00
Christopher Faulet
78021ee9ef BUG/MEDIUM: stconn: Don't update stream expiration date if already expired
The commit 08d7169f4 ("MINOR: stconn: Don't queue stream task in past in
sc_notify()") tried to fix issues with epiration date set in past for the
stream in sc_notify(). However it remains some cases where the stream
expiration date may already be expired before recomputing it. This happens
when an event is reported by the mux exactly when a timeout is triggered. In
this case, depending on the scheduling, the SC may be woken up before the
stream. For these cases, we fall into the BUG_ON() preventing to queue in
the past.

So, it remains unexpected to queue a task in the past. The BUG_ON() is
correct at this place. We must just avoid to recompute the stream expiration
date if it is already expired. At worst, the stream will be woken up for
nothing. But it is not really a big deal because it will only happen on
timeouts from time to time. It is so sporadic that we can ignore it from a
performance point of view.

This patch must be backpoted to 2.8. Be careful to remove the BUG_ON() on
the 2.8.
2023-11-09 12:08:59 +01:00
Frédéric Lécaille
819690303d BUG/MEDIUM: quic: Avoid some crashes upon TX packet allocation failures
If a TX packet cannot be allocated (by qc_build_pkt()), as it can be coalesced
to another one, this leads the TX buffer to have remaining not sent prepared data.
Then haproxy crashes upon a BUG_ON() triggered by the next call to qc_txb_release().
This may happen only during handshakes.

To fix this, qc_build_pkt() returns a new -3 error to dected such allocation
failures followed which is for now on followed by a call to qc_purge_txbuf() to
send the TX prepared data and purge the TX buffer.

Must be backported as far as 2.6.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
b21e08cbd2 BUG/MEDIUM: quic: Possible crashes when sending too short Initial packets
This may happen during handshakes when Handshake packets cannot be coalesced
to a first Initial packet because of TX frame allocation failures (from
qc_build_frms()). This leads too short (not padded) Initial packets to be sent.
This is detected by a BUG_ON() in qc_send_ppkts().

To avoid this an Handshake packet without ack-eliciting frames which should have
been built by qc_build_frms() is built.

Must be backported as far as 2.6.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
c78cb49a3b BUG/MEDIUM: quic: Avoid trying to send ACK frames from an empty ack ranges tree
This may happen upon ack ranges allocation failures (from quic_update_ack_ranges_list().
This can lead to empty trees of ack ranges to be used to build ACK frames which
is not good at all. Furthermore this is detected by a BUG_ON() (in qc_do_build_pkt()).

To avoid this, simply update the acknowledgemen state of the connection only if
quic_update_ack_ranges_list() succeeds, as it fails only in case of memory
allocation failures.

Must be backported as far as 2.6.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
4e3b28e8b6 BUG/MEDIUM: quic: Too short Initial packet sent (enc. level allocation failed)
If the Handshake encryption level could not be allocated, this could lead
to Initial packets to be sent because no Handshake CRYPTO frames were generated.

Furthermore in such an allocation failure case, the connection should be closed
as soon as possible. This is done making ha_quic_set_encryption_secrets() return
0 upon an encryption level allocation failure.

Also fix a typo in the trace in relation to this allocation failure.

No need to be backported.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
4cf784f38e MINOR: quic: Avoid zeroing frame structures
Do not initialize anymore ->type of quic_frame structures which leads
to the others to be zeroed.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
f1be725474 CLEANUP: quic: Indentation fix in qc_do_build_pkt()
Modification without any functional impact.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
7ecf4b34b9 BUG/MINOR: quic: idle timer task requeued in the past
When the idle timer expired with a still present mux, this task was not freed
and even requeued with a timer in the past.

Fix this issue calling task_destroy() in this case. As the task is freed,
its handler must return NULL setting local <t> variable to NULL in every cases.

Also ensure that this timer task is not armed again after having been released
with a <return> statement when this is the case from qc_idle_timer_do_rearm().

Must be backported as far as 2.6.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
b48abf0beb MINOR: quic: Add idle timer task pointer to traces
Helpful to detect if this timer was freed or not.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
4cfae3ac01 MINOR: quic: release the TLS context asap from quic_conn_release()
This was no reason not to release as soon as possible the TLS/SSL QUIC connection
context from quic_conn_release() before allocating a "closing connection" connection
(quic_cc_conn struct).
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
3a8dd48e30 MEDIUM: quic: Heavy task mode with non contiguously bufferized CRYPTO data
This patch sets the handshake task in heavy task mode when receiving in disorder
CRYPTO data which results in in order bufferized CRYPTO data. This is done
thanks to a non-contiguous buffer and from qc_handle_crypto_frm() after having
potentially bufferized CRYPTO data in this buffer.
qc_treat_rx_crypto_frms() is no more called from qc_treat_rx_pkts() but instead
this is where the task is set in heavy task mode. Consequently,
this is the job of qc_ssl_provide_all_quic_data() to call directly
qc_treat_rx_crypto_frms() to provide the in order bufferized CRYPTO data to the
TLS stack. As this function releases the non-contiguous buffer for the CRYPTO
data, if possible, there is no need to do that from qc_treat_rx_crypto_frms()
anymore.
2023-11-09 10:32:31 +01:00
Frédéric Lécaille
94d20be138 MEDIUM: quic: Heavy task mode during handshake
Add a new pool for the CRYPTO data frames received in order.
Add ->rx.crypto_frms list to each encryption level to store such frames
when they are received in order from qc_handle_crypto_frm().
Also set the handshake task (qc_conn_io_cb()) in heavy task mode from
this function after having received such frames. When this task
detects that it is set in heavy mode, it calls qc_ssl_provide_all_quic_data()
newly implemented function to provide the CRYPTO data to the TLS task.
Modify quic_conn_enc_level_uninit() to release these CRYPTO frames
when releasing the encryption level they are in relation with.
2023-11-09 10:32:31 +01:00