Commit graph

5387 commits

Author SHA1 Message Date
Frédéric Lécaille
48f8e1925b MINOR: proto_quic: Allocate TX ring buffers for listeners
We allocate an array of QUIC ring buffer, one by thread, and arranges them in a
MT_LIST. Everything is allocated or nothing: we do not want to usse an incomplete
array of ring buffers to ensure that each thread may safely acquire one of these
buffers.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
22cfdf8d0e MINOR: quic: Add ring buffer definition (struct qring) for QUIC
A ring buffer is made of a circular buffer (->cbuf) and must be arrange
in a MT_LIST (->mt_list).
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
9621565b74 MINOR: net_helper: add functions for pointers
Add two functions to read/write pointer values to/from vectors.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
c6bc185c18 MINOR: quic: Add a ring buffer implementation for QUIC
This implementation is inspired from Linux kernel circular buffer implementation
(see include/linux/circ-buf.h). Such buffers may be used at the same time both
by writer and reader (lock-free).
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
f3d078d22e MINOR: quic: Make qc_lstnr_pkt_rcv() be thread safe.
Modify the I/O dgram handler principal function used to parse QUIC packets
be thread safe. Its role is at least to create new incoming connections
add to two trees protected by the same RW lock. The packets are for now on
fully parsed before possibly creating new connections.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
a11d0e26d4 MINOR: quic: Replace the RX unprotected packet list by a thread safety one.
This list is shared between the I/O dgram handler and the task responsible
for processing the QUIC packets inside.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
c28aba2a8d MINOR: quic: Replace the RX list of packet by a thread safety one.
This list is shared between the I/O dgram handler and the task responsible
for processing the QUIC packets.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
a5fe49f44a MINOR: quic: Move the connection state
Move the connection state from quic_conn_ctx struct to quic_conn struct which
is the structure which is used to store the QUIC connection part information.
This structure is initialized by the I/O dgram handler for each new connection
to QUIC listeners. This is needed for the multithread support so that to not
to have to depend on the connection context potentially initialized by another
thread.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
9fccace8b0 MINOR: quic: Add a lock for RX packets
We must protect from concurrent the tree which stores the QUIC packets received
by the dgram I/O handler, these packets being also parsed by the xprt task.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
497fa78ad8 MINOR: quic: Derive the initial secrets asap
Make depends qc_new_isecs() only on quic_conn struct initialization only (no more
dependency on connection struct initialization) to be able to run it as soon as
the quic_conn struct is initialized (from the I/O handler) before running ->accept()
quic proto callback.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
3d77fa754d MINOR: quic: QUIC conn initialization from I/O handler
Move the QUIC conn (struct quic_conn) initialization from quic_sock_accept_conn()
to qc_lstnr_pkt_rcv() as this is done for the server part.
Move the timer initialization to ->start xprt callback to ensure the connection
context is done : it is initialized by the ->accept callback which may be run
by another thread than the one for the I/O handler which also run ->start.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
785c9c998a MINOR: quic: Replace max_packet_size by max_udp_payload size.
The name the maximum packet size transport parameter was ambiguous and replaced
by maximum UDP payload size. Our code would be also ambiguous if it does not
reflect this change.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
8648c7c995 MINOR: quic: Avoid header collisions
Extract the QUIC varints encoding functions from xprt_quic.h to avoid
header collisions.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
738397065c MINOR: quic: Add a wrapper function to update transport parameters.
This function calls quic_mux_transport_params_update() to update the related
streams transport parameter of the mux. It is there only so that not to have
to include mux_quic.h to update these parameters.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
7a668f5acb MINOR: quic: Variable-length integer encoding/decoding into/from buffer struct.
Add a function to encode a QUIC varint into a buffer struct. Samething for the
deconding part.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
b4672fb6f0 MINOR: qpack: Add QPACK compression.
Implement QPACK used for HTTP header compression by h3.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
ccac11f35a MINOR: h3: Add HTTP/3 definitions.
Add all the definitions for HTTP/3 implementation.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
b8f60b3c66 MINOR: quic: Add a new definition to store STREAM frames.
Add a new structure to store enough information about STREAM frames which
must be stored before being delivered to the application layer, for any
reason.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
65bc43434a MINOR: quic: Attach QUIC mux connection objet to QUIC connection.
This add a qcc struct for QUIC mux/demux connection layer to quic_conn struct
at low level connection layer.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
2abe74f39c MINOR: connection: Add callbacks definitions for QUIC.
The flow control at stream level is organized by types (client bidi, server bidi,
client uni, server uni). Adds at least callback to retrieve the number
of available streams by direction.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
dfbae766b2 MINOR: mux_quic: Add QUIC mux layer.
This file has been derived from mux_h2.c removing all h2 parts. At
QUIC mux layer, there must not be any reference to http. This will be the
responsability of the application layer (h3) to open streams handled by the mux.
2021-09-23 15:27:25 +02:00
Frédéric Lécaille
5aa4143d6c MINOR: quic: Move transport parmaters to anynomous struct.
We move ->params transport parameters to ->rx.params. They are the
transport parameters which will be sent to the peer, and used for
the endpoint flow control. So, they will be used to received packets
from the peer (RX part).
Also move ->rx_tps transport parameters to ->tx.params. They are the
transport parameter which are sent by the peer, and used to respect
its flow control limits. So, they will be used when sending packets
to the peer (TX part).
2021-09-23 15:27:25 +02:00
Tim Duesterhus
ec4a8754da CLEANUP: Apply xalloc_size.cocci
This fixes a few locations with a hardcoded type within `sizeof()`.
2021-09-17 17:22:05 +02:00
Tim Duesterhus
b113b5ca24 CLEANUP: Apply ist.cocci
This cleans up ist handling.
2021-09-17 17:22:05 +02:00
Willy Tarreau
81a76f4827 REORG: threads: move ha_get_pthread_id() to tinfo.h
This solely manipulates the thread_info struct, it ought to be in
tinfo.h, not in thread.h.
2021-09-17 16:08:34 +02:00
Willy Tarreau
e61244631a MINOR: applet: remove the thread mask from appctx_new()
appctx_new() is exclusively called with tid_bit and it only uses the
mask to pass it to the accompanying task. There is no point requiring
the caller to know about a mask there, nor is there any point in
creating an applet outside of the context of its own thread anyway.
Let's drop this and pass tid_bit to task_new() directly.
2021-09-17 16:08:34 +02:00
Amaury Denoyelle
7a8aff2688 BUILD: ist: prevent gcc11 maybe-uninitialized warning on istalloc
A new warning is reported by gcc11 when using a pointer to uninitialized
memory block for a function with a const pointer argument. The warning
is triggered for istalloc, used by http_client.c / proxy.c / tcpcheck.c.

This warning is reported because the uninitialized memory block
allocated by malloc should not be passed to a const argument as in ist2.
See https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Warning-Options.html#index-Wmaybe-uninitialized

This should be backported up to 2.2.
2021-09-17 09:57:27 +02:00
Willy Tarreau
c2afb860f2 MINOR: pools: use mallinfo2() when available instead of mallinfo()
Ilya reported in issue #1391 a build warning on Fedora about mallinfo()
being deprecated in favor of mallinfo2() since glibc-2.33. Let's add
support for it. This should be backported where the following commit is
also backported: 157e39303 ("MINOR: pools: automatically disable
malloc_trim() with external allocators").
2021-09-16 09:20:16 +02:00
Tim Duesterhus
8f1669b10f CLEANUP: Remove prototype for non-existent thread_get_default_count()
This is the only location of `thread_get_default_count` within the codebase.
2021-09-15 11:07:18 +02:00
Tim Duesterhus
992007ec78 CLEANUP: tree-wide: fix prototypes for functions taking no arguments.
"f(void)" is the correct and preferred form for a function taking no
argument, while some places use the older "f()". These were reported
by clang's -Wmissing-prototypes, for example:

  src/cpuset.c:111:5: warning: no previous prototype for function 'ha_cpuset_size' [-Wmissing-prototypes]
  int ha_cpuset_size()
  include/haproxy/cpuset.h:42:5: note: this declaration is not a prototype; add 'void' to make it a prototype for a zero-parameter function
  int ha_cpuset_size();
      ^
                     void

This aggregate patch fixes this for the following functions:

   ha_backtrace_to_stderr(), ha_cpuset_size(), ha_panic(), ha_random64(),
   ha_thread_dump_all_to_trash(), get_exec_path(), check_config_validity(),
   mworker_child_nb(), mworker_cli_proxy_(create|stop)(),
   mworker_cleantasks(), mworker_cleanlisteners(), mworker_ext_launch_all(),
   mworker_reload(), mworker_(env|proc_list)_to_(proc_list|env)(),
   mworker_(un|)block_signals(), proxy_adjust_all_maxconn(),
   proxy_destroy_all_defaults(), get_tainted(),
   pool_total_(allocated|used)(), thread_isolate(_full|)(),
   thread(_sync|)_release(), thread_harmless_till_end(),
   thread_cpu_mask_forced(), dequeue_all_listeners(), next_timer_expiry(),
   wake_expired_tasks(), process_runnable_tasks(), init_acl(),
   init_buffer(), (de|)init_log_buffers(), (de|)init_pollers(),
   fork_poller(), pool_destroy_all(), pool_evict_from_local_caches(),
   pool_total_failures(), dump_pools_to_trash(), cfg_run_diagnostics(),
   tv_init_(process|thread)_date(), __signal_process_queue(),
   deinit_signals(), haproxy_unblock_signals()
2021-09-15 11:07:18 +02:00
Willy Tarreau
4f5485bfad BUG/MINOR: compat: make sure __WORDSIZE is always defined
-Wundef triggered on a MIPS-based musl build on __WORDSIZE that's used
in ultoa_o() and some Lua initialization. The former will fail to convert
integers larger to 1 billion to proper string in this case. Let's make
sure this macro is defined and fall back to values determined from
__SIZEOF_LONG__ otherwise. A cleaner long-term approach would consist
in removing all remaining occurrences of this macro.

This can be backported to all versions.
2021-09-15 10:32:12 +02:00
Willy Tarreau
8ab9419394 BUILD: threads: fix -Wundef for _POSIX_PRIORITY_SCHEDULING on libmusl
Building with an old musl-based toolchain reported this warning:

  include/haproxy/thread.h: In function 'ha_thread_relax':
  include/haproxy/thread.h:256:5: warning: "_POSIX_PRIORITY_SCHEDULING" is not defined [-Wundef]
   #if _POSIX_PRIORITY_SCHEDULING
       ^

There were indeed two "#if" insteadd of #ifdef" for this macro, let's
fix them.
2021-09-15 10:32:12 +02:00
Willy Tarreau
8ac6597cbe BUILD: compiler: fixed a missing test on defined(__GNUC__)
This one could theoretically trigger -Wundef on non-gcc compatible
compilers if DEBUG_USE_ABORT is not set.
2021-09-13 09:30:47 +02:00
Tim Duesterhus
cf6f574872 CLEANUP: Move XXH3 macro from haproxy/compat.h to haproxy/xxhash.h
This moves all the xxhash functionality into a single location.

see d5fc8fcb86
2021-09-11 20:37:50 +02:00
Tim Dsterhus
a8bfb4d135 CLEANUP: ebmbtree: Replace always-taken elseif by else
`diff` is guaranteed to be less than 0, because the `if` handles the `>= 0`
case.

Found using GitHub's CodeQL scan in HAProxy's codebase.
2021-09-11 20:15:28 +02:00
Tim Duesterhus
d5fc8fcb86 CLEANUP: Add haproxy/xxhash.h to avoid modifying import/xxhash.h
This solves setting XXH_INLINE_ALL in a cleaner way, because the imported
header is not modified, easing future updates.

see 6f7cc11e6d
2021-09-11 19:58:45 +02:00
Christopher Faulet
f079f44096 MINOR: htx: Skip headers with no value when adding a header list to a message
When the header list is added, after the message parsing, headers with no
value are now ignored. It is not the same than headers with empty value
fields. Only headers with a NULL pointer as value are skipped. This only
happens if the header value is removed during the message
parsing. Concretly, such headers are now ignored when htx_add_all_headers()
is called. However, htx_add_header() is not affected by this change.

Symetrically, the same is true for trailers. It may be backported to 2.4
because of the previous fix ("BUG/MEDIUM: mux-h1: Remove "Upgrade:" header
for requests with payload").
2021-09-10 10:35:53 +02:00
devnexen@gmail.com
ac5f634cb1 BUILD: fix dragonfly build again on __read_mostly
It looks like some versions define it and others not. Better rely on
the macro itself rather than checking for a particular OS.
2021-09-08 19:46:29 +02:00
Willy Tarreau
61ecf28389 OPTIM: vars: only takes the variables lock on shared entries
There's no point taking the variables locks for sess/txn/req/res
contexts since these ones always run inside the same thread anyway.
This patch conditions the lock on the variable's scope to avoid
flushing cache lines when not needed.

This showed an improvement of ~5% on a 16-thread machine with 12
variables.
2021-09-08 15:44:45 +02:00
Willy Tarreau
dc72fbb8e8 MINOR: vars: centralize the lock/unlock into static inlines
The goal it to simplify the variables locking in order to later
simplify it.
2021-09-08 15:19:57 +02:00
Willy Tarreau
3a4bedccc6 MEDIUM: vars: replace the global name index with a hash
The global table of known variables names can only grow and was designed
for static names that are registered at boot. Nowadays it's possible to
set dynamic variable names from Lua or from the CLI, which causes a real
problem that was partially addressed in 2.2 with commit 4e172c93f
("MEDIUM: lua: Add `ifexist` parameter to `set_var`"). Please see github
issue #624 for more context.

This patch simplifies all this by removing the need for a central
registry of known names, and storing 64-bit hashes instead. This is
highly sufficient given the low number of variables in each context.
The hash is calculated using XXH64() which is bijective over the 64-bit
space thus is guaranteed collision-free for 1..8 chars. Above that the
risk remains around 1/2^64 per extra 8 chars so in practice this is
highly sufficient for our usage. A random seed is used at boot to seed
the hash so that it's not attackable from Lua for example.

There's one particular nit though. The "ifexist" hack mentioned above
is now limited to variables of scope "proc" only, and will only match
variables that were already created or declared, but will now verify
the scope as well. This may affect some bogus Lua scripts and SPOE
agents which used to accidentally work because a similarly named
variable used to exist in a different scope. These ones may need to be
fixed to comply with the doc.

Now we can sum up the situation as this one:
  - ephemeral variables (scopes sess, txn, req, res) will always be
    usable, regardless of any prior declaration. This effectively
    addresses the most problematic change from the commit above that
    in order to work well could have required some script auditing ;

  - process-wide variables (scope proc) that are mentioned in the
    configuration, referenced in a "register-var-names" SPOE directive,
    or created via "set-var" in the global section or the CLI, are
    permanent and will always accept to be set, with or without the
    "ifexist" restriction (SPOE uses this internally as well).

  - process-wide variables (scope proc) that are only created via a
    set-var() tcp/http action, via Lua's set_var() calls, or via an
    SPOE with the "force-set-var" directive), will not be permanent
    but will always accept to be replaced once they are created, even
    if "ifexist" is present

  - process-wide variables (scope proc) that do not exist will only
    support being created via the set-var() tcp/http action, Lua's
    set_var() calls without "ifexist", or an SPOE declared with
    "force-set-var".

This means that non-proc variables do not care about "ifexist" nor
prior declaration, and that using "ifexist" should most often be
reliable in Lua and that SPOE should most often work without any
prior declaration. It may be doable to turn "ifexist" to 1 by default
in Lua to further ease the transition. Note: regtests were adjusted.

Cc: Tim Dsterhus <tim@bastelstu.be>
2021-09-08 15:06:11 +02:00
Willy Tarreau
c1c88f4809 MEDIUM: vars: make var_clear() only reset VF_PERMANENT variables
We certainly do not want that a permanent variable (one that is listed
in the configuration) be erased by accident by an "unset-var" action.
Let's make sure these ones are only reset to an empty sample, like at
the moment of their initial registration. One trick is that the same
function is used to purge the memory at the end and to delete, so we
need to add an extra "force" argument to make the choice.
2021-09-08 15:06:11 +02:00
Willy Tarreau
3dc6dc3178 MINOR: vars: store flags into variables and add VF_PERMANENT
In order to continue to honor the ifexist Lua option and prevent rogue
SPOA agents from creating too many variables, we'll need to keep the
ability to mark certain proc.* variables as permanent when they're
known from the config file.

Let's add a flag there for this. It's added to the variable when the
variable is created with this flag set by the caller.

Another approach could have been to use a distinct list or distinct
scope but that sounds complicated and bug-prone.
2021-09-08 14:06:34 +02:00
Willy Tarreau
4994b57728 MINOR: vars: add a VF_CREATEONLY flag for creation
Passing this flag to var_set() will result in the variable to only be
created if it did not exist, otherwise nothing is done (it's not even
updated). This will be used for pre-registering names.
2021-09-08 11:47:30 +02:00
Willy Tarreau
7978c5c422 MEDIUM: vars: make the ifexist variant of set-var only apply to the proc scope
When setting variables, there are currently two variants, one which will
always create the variable, and another one, "ifexist", which will only
create or update a variable if a similarly named variable in any scope
already existed before.

The goal was to limit the risk of injecting random names in the proc
scope, but it was achieved by making use of the somewhat limited name
indexing model, which explains the scope-agnostic restriction.

With this change, we're moving the check downwards in the chain, at the
variable level, and only variables under the scope "proc" will be subject
to the restriction. A new set of VF_* flags was added to adjust how
variables are set, and VF_UPDATEONLY is used to mention this restriction.

In this exact state of affairs, this is not completely exact, as if a
similar name was not known in any scope, the variable will continue to
be rejected like before, but this will change soon.
2021-09-08 11:47:06 +02:00
Willy Tarreau
b7bfcb3ff3 MINOR: vars: rename vars_init() to vars_init_head()
The vars_init() name is particularly confusing as it does not initialize
the variables code but the head of a list of variables passed in
arguments. And we'll soon need to have proper initialization code, so
let's rename it now.
2021-09-08 11:10:16 +02:00
Willy Tarreau
10080716bf MINOR: proxy: add a global "grace" directive to postpone soft-stop
In ticket #1348 some users expressed some concerns regarding the removal
of the "grace" directive from the proxies. Their use case very closely
mimmicks the original intent of the grace keyword, which is, let haproxy
accept traffic for some time when stopping, while indicating an external
LB that it's stopping.

This is implemented here by starting a task whose expiration triggers
the soft-stop for real. The global "stopping" variable is immediately
set however. For example, this below will be sufficient to instantly
notify an external check on port 9999 that the service is going down,
while other services remain active for 10s:

    global
      grace 10s

    frontend ext-check
      bind :9999
      monitor-uri /ext-check
      monitor fail if { stopping }
2021-09-07 17:34:29 +02:00
Willy Tarreau
3b69886f7d BUG/MAJOR: htx: fix missing header name length check in htx_add_header/trailer
Ori Hollander of JFrog Security reported that htx_add_header() and
htx_add_trailer() were missing a length check on the header name. While
this does not allow to overwrite any memory area, it results in bits of
the header name length to slip into the header value length and may
result in forging certain header names on the input. The sad thing here
is that a FIXME comment was present suggesting to add the required length
checks :-(

The injected headers are visible to the HTTP internals and to the config
rules, so haproxy will generally stay synchronized with the server. But
there is one exception which is the content-length header field, because
it is already deduplicated on the input, but before being indexed. As
such, injecting a content-length header after the deduplication stage
may be abused to present a different, shorter one on the other side and
help build a request smuggling attack, or even maybe a response splitting
attack. CVE-2021-40346 was assigned to this problem.

As a mitigation measure, it is sufficient to verify that no more than
one such header is present in any message, which is normally the case
thanks to the duplicate checks:

   http-request  deny if { req.hdr_cnt(content-length) gt 1 }
   http-response deny if { res.hdr_cnt(content-length) gt 1 }

This must be backported to all HTX-enabled versions, hence as far as 2.0.
In 2.3 and earlier, the functions are in src/htx.c instead.

Many thanks to Ori for his work and his responsible report!
2021-09-03 16:15:29 +02:00
Willy Tarreau
3d5f19e04d CLEANUP: htx: remove comments about "must be < 256 MB"
Since commit "BUG/MINOR: config: reject configs using HTTP with bufsize
>= 256 MB" we are now sure that it's not possible anymore to have an HTX
block of a size 256 MB or more, even after concatenation thanks to the
tests for len >= htx_free_data_space(). Let's remove these now obsolete
comments.

A BUG_ON() was added in htx_add_blk() to track any such exception if
the conditions would change later, to complete the one that is performed
on the start address that must remain within the buffer.
2021-09-03 16:15:29 +02:00
Willy Tarreau
e352b9dac7 MINOR: vars: make vars_get_by_* support an optional default value
In preparation for support default values when fetching variables, we
need to update the internal API to pass an extra argument to functions
vars_get_by_{name,desc} to provide an optional default value. This
patch does this and always passes NULL in this argument. var_to_smp()
was extended to fall back to this value when available.
2021-09-03 12:08:54 +02:00
Willy Tarreau
9a621ae76d MEDIUM: vars: add a new "set-var-fmt" action
The set-var() action is convenient because it preserves the input type
but it's a pain to deal with when trying to concatenate values. The
most recurring example is when it's needed to build a variable composed
of the source address and the source port. Usually it ends up like this:

    tcp-request session set-var(sess.port) src_port
    tcp-request session set-var(sess.addr) src,concat(":",sess.port)

This is even worse when trying to aggregate multiple fields from stick-table
data for example. Due to this a lot of users instead abuse headers from HTTP
rules:

    http-request set-header(x-addr) %[src]:%[src_port]

But this requires some careful cleanups to make sure they won't leak, and
it's significantly more expensive to deal with. And generally speaking it's
not clean. Plus it must be performed for each and every request, which is
expensive for this common case of ip+port that doesn't change for the whole
session.

This patch addresses this limitation by implementing a new "set-var-fmt"
action which performs the same work as "set-var" but takes a format string
in argument instead of an expression. This way it becomes pretty simple to
just write:

    tcp-request session set-var-fmt(sess.addr) %[src]:%[src_port]

It is usable in all rulesets that already support the "set-var" action.
It is not yet implemented for the global "set-var" directive (which already
takes a string) and the CLI's "set var" command, which would definitely
benefit from it but currently uses its own parser and engine, thus it
must be reworked.

The doc and regtests were updated.
2021-09-02 21:22:22 +02:00
Willy Tarreau
57467b8356 MINOR: sample: add missing ARGC_ entries
For a long time we couldn't have arguments in expressions used in
tcp-request, tcp-response etc rules. But now due to the variables
it's possible, and their context in case of failure to resolve an
argument (e.g. backend name not found) is not properly reported
because there is no arg context values in ARGC_* to report them.

Let's add a number of missing ones for tcp-request {connection,
session,content}, tcp-response content, tcp-check, the config
parser (for "set-var" in the global section) and the CLI parser
(for "set-var" on the CLI).
2021-09-02 19:43:20 +02:00
Willy Tarreau
bc1223be79 MINOR: http-rules: add a new "ignore-empty" option to redirects.
Sometimes it is convenient to remap large sets of URIs to new ones (e.g.
after a site migration for example). This can be achieved using
"http-request redirect" combined with maps, but one difficulty there is
that non-matching entries will return an empty response. In order to
avoid this, duplicating the operation as an ACL condition ending in
"-m found" is possible but it becomes complex and error-prone while it's
known that an empty URL is not valid in a location header.

This patch addresses this by improving the redirect rules to be able to
simply ignore the rule and skip to the next one if the result of the
evaluation of the "location" expression is empty. However in order not
to break existing setups, it requires a new "ignore-empty" keyword.

There used to be an ACT_FLAG_FINAL on redirect rules that's used during
the parsing to emit a warning if followed by another rule, so here we
only set it if the option is not there. The http_apply_redirect_rule()
function now returns a 3rd value to mention that it did nothing and
that this was not an error, so that callers can just ignore the rule.
The regular "redirect" rules were not modified however since this does
not apply there.

The map_redirect VTC was completed with such a test and updated to 2.5
and an example was added into the documentation.
2021-09-02 17:06:18 +02:00
Tim Duesterhus
abc6b31ab8 CLEANUP: Add missing include guard to signal.h
Found using GitHub's CodeQL scan.
2021-09-01 21:39:19 +02:00
Willy Tarreau
87154e3010 BUG/MAJOR: queue: better protect a pendconn being picked from the proxy
The locking in the dequeuing process was significantly improved by commit
49667c14b ("MEDIUM: queue: take the proxy lock only during the px queue
accesses") in that it tries hard to limit the time during which the
proxy's queue lock is held to the strict minimum. Unfortunately it's not
enough anymore, because we take up the task and manipulate a few pendconn
elements after releasing the proxy's lock (while we're under the server's
lock) but the task will not necessarily hold the server lock since it may
not have successfully found one (e.g. timeout in the backend queue). As
such, stream_free() calling pendconn_free() may release the pendconn
immediately after the proxy's lock is released while the other thread
currently proceeding with the dequeuing tries to wake up the owner's
task and dies in task_wakeup().

One solution consists in releasing le proxy's lock later. But tests have
shown that we'd have to sacrifice a significant share of the performance
gained with the patch above (roughly a 20% loss).

This patch takes another approach. It adds a "del_lock" to each pendconn
struct, that allows to keep it referenced while the proxy's lock is being
released. It's mostly a serialization lock like a refcount, just to maintain
the pendconn alive till the task_wakeup() call is complete. This way we can
continue to release the proxy's lock early while keeping this one. It had
to be added to the few points where we're about to free a pendconn, namely
in pendconn_dequeue() and pendconn_unlink(). This way we continue to
release the proxy's lock very early and there is no performance degradation.

This lock may only be held under the queue's lock to prevent lock
inversion.

No backport is needed since the patch above was merged in 2.5-dev only.
2021-08-31 18:37:13 +02:00
Remi Tricot-Le Breton
fe21fe76bd MINOR: log: Add new "error-log-format" option
This option can be used to define a specific log format that will be
used in case of error, timeout, connection failure on a frontend... It
will be used for any log line concerned by the log-separate-errors
option. It will also replace the format of specific error messages
decribed in section 8.2.6.
If no "error-log-format" is defined, the legacy error messages are still
emitted and the other error logs keep using the regular log-format.
2021-08-31 12:13:08 +02:00
Willy Tarreau
ea57a9b103 BUILD: ssl: next round of build warnings on LIBRESSL_VERSION_NUMBER
Other build warnings were emitted on LIBRESSL_VERSION_NUMBER with -Wundef
under openssl < 1.1. Related to GH issue #1369. Seems like some of them
could be simplified a little bit.
2021-08-30 06:20:46 +02:00
Willy Tarreau
a01f8ce2d4 BUILD/MINOR: regex: avoid a build warning on USE_PCRE2 with -Wundef
regex-t emits a warning on #elif USE_PCRE2 when built with -Wundef,
let's just fix it. This was reported in GH issue #1369.
2021-08-28 12:49:58 +02:00
Willy Tarreau
6e5542e9f4 BUILD/MINOR: ssl: avoid a build warning on LIBRESSL_VERSION with -Wundef
Openssl-compat emits a warning for the test on LIBRESSL_VERSION that might
be underfined, if built with -Wundef. The fix is easy, let's do it. Related
to GH issue #1369.
2021-08-28 12:06:51 +02:00
Willy Tarreau
33056436c7 BUILD/MINOR: defaults: eliminate warning on MAXHOSTNAMELEN with -Wundef
As reported in GH issue #1369, there is a single case of #if with a
possibly undefined value in defaults.h which is on MAXHOSTNAMELEN. Let's
turn it to a #ifdef.
2021-08-28 12:05:32 +02:00
Willy Tarreau
cbdc74b4b3 BUG/MINOR: ebtree: remove dependency on incorrect macro for bits per long
The code used to rely on BITS_PER_LONG to decide on the most efficient
way to perform a 64-bit shift, but this macro is not defined (at best
it's __BITS_PER_LONG) and it's likely that it's been like this since
the early implementation of ebtrees designed on i386. Let's remove the
test on this macro and rely on sizeof(long) instead, it also has the
benefit of letting the compiler validate the two branches.

This can be backported to all versions. Thanks to Ezequiel Garcia for
reporting this one in issue #1369.
2021-08-28 11:55:53 +02:00
Willy Tarreau
fe456c581f MINOR: time: add report_idle() to report process-wide idle time
Before threads were introduced in 1.8, idle_pct used to be a global
variable indicating the overall process idle time. Threads made it
thread-local, meaning that its reporting in the stats made little
sense, though this was not easy to spot. In 2.0, the idle_pct variable
moved to the struct thread_info via commit 81036f273 ("MINOR: time:
move the cpu, mono, and idle time to thread_info"). It made it more
obvious that the idle_pct was per thread, and also allowed to more
accurately measure it. But no more effort was made in that direction.

This patch introduces a new report_idle() function that accurately
averages the per-thread idle time over all running threads (i.e. it
should remain valid even if some threads are paused or stopped), and
makes use of it in the stats / "show info" reports.

Sending traffic over only two connections of an 8-thread process
would previously show this erratic CPU usage pattern:

  $ while :; do socat /tmp/sock1 - <<< "show info"|grep ^Idle;sleep 0.1;done
  Idle_pct: 30
  Idle_pct: 35
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 35
  Idle_pct: 33
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 100
  Idle_pct: 100

Now it shows this more accurate measurement:

  $ while :; do socat /tmp/sock1 - <<< "show info"|grep ^Idle;sleep 0.1;done
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83
  Idle_pct: 83

This is not technically a bug but this lack of precision definitely affects
some users who rely on the idle_pct measurement. This should at least be
backported to 2.4, and might be to some older releases depending on users
demand.
2021-08-28 11:18:10 +02:00
Willy Tarreau
e365aa28d4 BUG/MINOR: time: fix idle time computation for long sleeps
In 2.4 we extended the max poll time from 1s to 60s with commit
4f59d3861 ("MINOR: time: increase the minimum wakeup interval to 60s").

This had the consequence that the calculation of the idle time percentage
may overflow during the multiply by 100 if the thread had slept 43s or
more. Let's change this to a 64 bit computation. This will have no
performance impact since this is done at most twice per second.

This should fix github issue #1366.

This must be backported to 2.4.
2021-08-27 23:36:20 +02:00
Marcin Deranek
310a260e4a MEDIUM: config: Deprecate tune.ssl.capture-cipherlist-size
Deprecate tune.ssl.capture-cipherlist-size in favor of
tune.ssl.capture-buffer-size which better describes the purpose of the
setting.
2021-08-26 19:52:04 +02:00
Marcin Deranek
959a48c116 MINOR: sample: Expose SSL captures using new fetchers
To be able to provide JA3 compatible TLS Fingerprints we need to expose
all Client Hello captured data using fetchers. Patch provides new
and modifies existing fetchers to add ability to filter out GREASE values:
- ssl_fc_cipherlist_*
- ssl_fc_ecformats_bin
- ssl_fc_eclist_bin
- ssl_fc_extlist_bin
- ssl_fc_protocol_hello_id
2021-08-26 19:48:34 +02:00
Marcin Deranek
769fd2e447 MEDIUM: ssl: Capture more info from Client Hello
When we set tune.ssl.capture-cipherlist-size to a non-zero value
we are able to capture cipherlist supported by the client. To be able to
provide JA3 compatible TLS fingerprinting we need to capture more
information from Client Hello message:
- SSL Version
- SSL Extensions
- Elliptic Curves
- Elliptic Curve Point Formats
This patch allows HAProxy to capture such information and store it for
later use.
2021-08-26 19:48:33 +02:00
Willy Tarreau
906f7daed1 MINOR: compiler: implement an ONLY_ONCE() macro
There are regularly places, especially in config analysis, where we
need to report certain things (warnings or errors) only once, but
where implementing a counter is sufficiently deterrent so that it's
not done.

Let's add a simple ONLY_ONCE() macro that implements a static variable
(char) which is atomically turned on, and returns true if it's set for
the first time. This uses fairly compact code, a single byte of BSS
and is thread-safe. There are probably a number of places in the config
parser where this could be used. It may also be used to implement a
WARN_ON() similar to BUG_ON() but which would only warn once.
2021-08-26 16:35:00 +02:00
Amaury Denoyelle
5cca48cba2 MINOR: server: define non purgeable server flag
Define a flag to mark a server as non purgeable. This flag will be used
for "delete server" CLI handler. All servers without this flag will be
eligible to runtime suppression.
2021-08-25 15:53:54 +02:00
Amaury Denoyelle
bc2ebfa5a4 MEDIUM: server: extend refcount for all servers
In a future patch, it will be possible to remove at runtime every
servers, both static and dynamic. This requires to extend the server
refcount for all instances.

First, refcount manipulation functions have been renamed to better
express the API usage.

* srv_refcount_use -> srv_take
The refcount is always initialize to 1 on the server creation in
new_server. It's also incremented for each check/agent configured on a
server instance.

* free_server -> srv_drop
This decrements the refcount and if null, the server is freed, so code
calling it must not use the server reference after it. As a bonus, this
function now returns the next server instance. This is useful when
calling on the server loop without having to save the next pointer
before each invocation.

In these functions, remove the checks that prevent refcount on
non-dynamic servers. Each reference to "dynamic" in variable/function
naming have been eliminated as well.
2021-08-25 15:53:54 +02:00
Amaury Denoyelle
0a8d05d31c BUG/MINOR: stats: use refcount to protect dynamic server on dump
A dynamic server may be deleted at runtime at the same moment when the
stats applet is pointing to it. Use the server refcount to prevent
deletion in this case.

This should be backported up to 2.4, with an observability period of 2
weeks. Note that it requires the dynamic server refcounting feature
which has been implemented on 2.5; the following commits are required :

- MINOR: server: implement a refcount for dynamic servers
- BUG/MINOR: server: do not use refcount in free_server in stopping mode
- MINOR: server: return the next srv instance on free_server
2021-08-25 15:53:43 +02:00
Amaury Denoyelle
f5c1e12e44 MINOR: server: return the next srv instance on free_server
As a convenience, return the next server instance from servers list on
free_server.

This is particularily useful when using this function on the servers
list without having to save of the next pointer before calling it.
2021-08-25 15:29:19 +02:00
Ilya Shipitsin
ff0f278860 CLEANUP: assorted typo fixes in the code and comments
This is 26th iteration of typo fixes
2021-08-25 05:13:31 +02:00
William Lallemand
3aeb3f9347 MINOR: cfgcond: implements openssl_version_atleast and openssl_version_before
Implements a way of checking the running openssl version:

If the OpenSSL support was not compiled within HAProxy it will returns a
error, so it's recommanded to do a SSL feature check before:

	$ ./haproxy -cc 'feature(OPENSSL) && openssl_version_atleast(0.9.8zh) && openssl_version_before(3.0.0)'

This will allow to select the SSL reg-tests more carefully.
2021-08-22 00:30:24 +02:00
William Lallemand
44d862d8d4 MINOR: ssl: add an openssl version string parser
openssl_version_parser() parse a string in the OpenSSL version format
which is documented here:

https://www.openssl.org/docs/man1.1.1/man3/OPENSSL_VERSION_NUMBER.html

The function returns an unsigned int that could be used for comparing
openssl versions.
2021-08-21 23:44:02 +02:00
William Lallemand
2a8fe8bb48 MINOR: httpclient: cleanup the include files
Include the correct .h files in http_client.c and http_client.h.

The api.h is needed in http_client.c and http_client-t.h is now include
directly from http_client.h
2021-08-20 14:25:15 +02:00
Remi Tricot-Le Breton
f95c29546c BUILD/MINOR: ssl: Fix compilation with OpenSSL 1.0.2
The X509_STORE_CTX_get0_cert did not exist yet on OpenSSL 1.0.2 and
neither did X509_STORE_CTX_get0_chain, which was not actually needed
since its get1 equivalent already existed.
2021-08-20 10:05:58 +02:00
Remi Tricot-Le Breton
74f6ab6e87 MEDIUM: ssl: Keep a reference to the client's certificate for use in logs
Most of the SSL sample fetches related to the client certificate were
based on the SSL_get_peer_certificate function which returns NULL when
the verification process failed. This made it impossible to use those
fetches in a log format since they would always be empty.

The patch adds a reference to the X509 object representing the client
certificate in the SSL structure and makes use of this reference in the
fetches.

The reference can only be obtained in ssl_sock_bind_verifycbk which
means that in case of an SSL error occurring before the verification
process ("no shared cipher" for instance, which happens while processing
the Client Hello), we won't ever start the verification process and it
will be impossible to get information about the client certificate.

This patch also allows most of the ssl_c_XXX fetches to return a usable
value in case of connection failure (because of a verification error for
instance) by making the "conn->flags & CO_FL_WAIT_XPRT" test (which
requires a connection to be established) less strict.

Thanks to this patch, a log-format such as the following should return
usable information in case of an error occurring during the verification
process :
    log-format "DN=%{+Q}[ssl_c_s_dn] serial=%[ssl_c_serial,hex] \
                hash=%[ssl_c_sha1,hex]"

It should answer to GitHub issue #693.
2021-08-19 23:26:05 +02:00
William Lallemand
33b0d095cc MINOR: httpclient: implement a simple HTTP Client API
This commit implements a very simple HTTP Client API.

A client can be operated by several functions:

    - httpclient_new(), httpclient_destroy(): create
      and destroy the struct httpclient instance.

    - httpclient_req_gen(): generate a complete HTX request using the
      the absolute URL, the method and a list of headers. This request
      is complete and sets the HTX End of Message flag. This is limited
      to small request we don't need a body.

    - httpclient_start() fill a sockaddr storage with a IP extracted
      from the URL (it cannot resolve an fqdm for now), start the
      applet. It also stores the ptr of the caller which could be an
      appctx or something else.

   - hc->ops contains a list of callbacks used by the
     HTTPClient, they should be filled manually after an
     httpclient_new():

        * res_stline(): the client received a start line, its content
          will be stored in hc->res.vsn, hc->res.status, hc->res.reason

        * res_headers(): the client received headers, they are stored in
          hc->res.hdrs.

        * res_payload(): the client received some payload data, they are
        stored in the hc->res.buf buffer and could be extracted with the
        httpclient_res_xfer() function, which takes a destination buffer
        as a parameter

        * res_end(): this callback is called once we finished to receive
        the response.
2021-08-18 17:36:32 +02:00
Willy Tarreau
d3d8d03d98 MINOR: http: add a new function http_validate_scheme() to validate a scheme
While http_parse_scheme() extracts a scheme from a URI by extracting
exactly the valid characters and stopping on delimiters, this new
function performs the same on a fixed-size string.
2021-08-17 10:16:22 +02:00
Ilya Shipitsin
01881087fc CLEANUP: assorted typo fixes in the code and comments
This is 25th iteration of typo fixes
2021-08-16 12:37:59 +02:00
Christopher Faulet
df97ac4584 MEDIUM: filters/lua: Add HTTPMessage class to help HTTP filtering
This new class exposes methods to manipulate HTTP messages from a filter
written in lua. Like for the HTTP class, there is a bunch of methods to
manipulate the message headers. But there are also methods to manipulate the
message payload. This part is similar to what is available in the Channel
class. Thus the payload can be duplicated, erased, modified or
forwarded. For now, only DATA blocks can be retrieved and modified because
the current API is limited. No HTTPMessage method is able to yield. Those
manipulating the headers are always called on messages containing all the
headers, so there is no reason to yield. Those manipulating the payload are
called from the http_payload filters callback function where yielding is
forbidden.

When an HTTPMessage object is instantiated, the underlying Channel object
can be retrieved via the ".channel" field.

For now this class is not used because the HTTP filtering is not supported
yet. It will be the purpose of another commit.

There is no documentation for now.
2021-08-12 08:57:07 +02:00
Christopher Faulet
8c9e6bba0f MINOR: lua: Add flags on the lua TXN to know the execution context
A lua TXN can be created when a sample fetch, an action or a filter callback
function is executed. A flag is now used to track the execute context.
Respectively, HLUA_TXN_SMP_CTX, HLUA_TXN_ACT_CTX and HLUA_TXN_FLT_CTX. The
filter flag is not used for now.
2021-08-12 08:57:07 +02:00
Christopher Faulet
1f43a3430e MINOR: lua: Add a flag on lua context to know the yield capability at run time
When a script is executed, a flag is used to allow it to yield. An error is
returned if a lua function yield, explicitly or not. But there is no way to
get this capability in C functions. So there is no way to choose to yield or
not depending on this capability.

To fill this gap, the flag HLUA_NOYIELD is introduced and added on the lua
context if the current script execution is not authorized to yield. Macros
to set, clear and test this flags are also added.

This feature will be usefull to fix some bugs in lua actions execution.
2021-08-12 08:57:07 +02:00
William Lallemand
8c29fa7454 MINOR: channel: remove an htx block from a channel
co_htx_remove_blk() implements a way to remove an htx block from a
channel buffer and update the channel output.
2021-08-12 00:51:59 +02:00
Amaury Denoyelle
7afa5c1843 MINOR: global: define MODE_STOPPING
Define a new mode MODE_STOPPING. It is used to indicate that the process
is in the stopping stage and no event loop runs anymore.
2021-08-09 17:51:55 +02:00
Amaury Denoyelle
b33a0abc0b MEDIUM: check: implement check deletion for dynamic servers
Implement a mechanism to free a started check on runtime for dynamic
servers. A new function check_purge is created for this. The check task
will be marked for deletion and scheduled to properly close connection
elements and free the task/tasklet/buf_wait elements.

This function will be useful to delete a dynamic server wich checks.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
d6b7080cec MINOR: server: implement a refcount for dynamic servers
It is necessary to have a refcount mechanism on dynamic servers to be
able to enable check support. Indeed, when deleting a dynamic server
with check activated, the check will be asynchronously removed. This is
mandatory to properly free the check resources in a thread-safe manner.
The server instance must be kept alive for this.
2021-08-06 11:09:48 +02:00
Amaury Denoyelle
3c2ab1a0d4 MINOR: check: export check init functions
Remove static qualifier on init_srv_check, init_srv_agent_check and
start_check_task. These functions will be called in server.c for dynamic
servers with checks.
2021-08-06 11:08:04 +02:00
Amaury Denoyelle
7b368339af MEDIUM: task: implement tasklet kill
Implement an equivalent of task_kill for tasklets. This function can be
used to request a tasklet deletion in a thread-safe way.

Currently this function is unused.
2021-08-06 11:07:48 +02:00
Christopher Faulet
434b8525ee MINOR: spoe: Add a pointer on the filter config in the spoe_agent structure
There was no way to access the SPOE filter configuration from the agent
object. However it could be handy to have it. And in fact, this will be
required to fix a bug.
2021-08-05 10:07:43 +02:00
Willy Tarreau
7b2ac29a92 CLEANUP: fd: remove the now unneeded fd_mig_lock
This is not needed anymore since we don't use it when setting the running
mask anymore.
2021-08-04 16:03:36 +02:00
Willy Tarreau
b201b1dab1 CLEANUP: fd: remove the now unused fd_set_running()
It was inlined inside fd_update_events() since it relies on a loop that
may return immediate failure codes.
2021-08-04 16:03:36 +02:00
Willy Tarreau
f69fea64e0 MAJOR: fd: get rid of the DWCAS when setting the running_mask
Right now we're using a DWCAS to atomically set the running_mask while
being constrained by the thread_mask. This DWCAS is annoying because we
may seriously need it later when adding support for thread groups, for
checking that the running_mask applies to the correct group.

It turns out that the DWCAS is not strictly necessary because we never
need it to set the thread_mask based on the running_mask, only the other
way around. And in fact, the running_mask is always cleared alone, and
the thread_mask is changed alone as well. The running_mask is only
relevant to indicate a takeover when the thread_mask matches it. Any
bit set in running and not present in thread_mask indicates a transition
in progress.

As such, it is possible to re-arrange this by using a regular CAS around a
consistency check between running_mask and thread_mask in fd_update_events
and by making a CAS on running_mask then an atomic store on the thread_mask
in fd_takeover(). The only other case is fd_delete() but that one already
sets the running_mask before clearing the thread_mask, which is compatible
with the consistency check above.

This change has happily survived 10 billion takeovers on a 16-thread
machine at 800k requests/s.

The fd-migration doc was updated to reflect this change.
2021-08-04 16:03:36 +02:00
Willy Tarreau
b1f29bc625 MINOR: activity/fd: remove the dead_fd counter
This one is set whenever an FD is reported by a poller with a null owner,
regardless of the thread_mask. It has become totally meaningless because
it only indicates a migrated FD that was not yet reassigned to a thread,
but as soon as a thread uses it, the status will change to skip_fd. Thus
there is no reason to distinguish between the two, it adds more confusion
than it helps. Let's simply drop it.
2021-08-04 16:03:36 +02:00
Willy Tarreau
88d1c5d3fb MEDIUM: threads: add a stronger thread_isolate_full() call
The current principle of running under isolation was made to access
sensitive data while being certain that no other thread was using them
in parallel, without necessarily having to place locks everywhere. The
main use case are "show sess" and "show fd" which run over long chains
of pointers.

The thread_isolate() call relies on the "harmless" bit that indicates
for a given thread that it's not currently doing such sensitive things,
which is advertised using thread_harmless_now() and which ends usings
thread_harmless_end(), which also waits for possibly concurrent threads
to complete their work if they took this opportunity for starting
something tricky.

As some system calls were notoriously slow (e.g. mmap()), a bunch of
thread_harmless_now() / thread_harmless_end() were placed around them
to let waiting threads do their work while such other threads were not
able to modify memory contents.

But this is not sufficient for performing memory modifications. One such
example is the server deletion code. By modifying memory, it not only
requires that other threads are not playing with it, but are not either
in the process of touching it. The fact that a pool_alloc() or pool_free()
on some structure may call thread_harmless_now() and let another thread
start to release the same object's memory is not acceptable.

This patch introduces the concept of "idle threads". Threads entering
the polling loop are idle, as well as those that are waiting for all
others to become idle via the new function thread_isolate_full(). Once
thread_isolate_full() is granted, the thread is not idle anymore, and
it is released using thread_release() just like regular isolation. Its
users have to keep in mind that across this call nothing is granted as
another thread might have performed shared memory modifications. But
such users are extremely rare and are actually expecting this from their
peers as well.

Note that that in case of backport, this patch depends on previous patch:
  MINOR: threads: make thread_release() not wait for other ones to complete
2021-08-04 14:49:36 +02:00
William Lallemand
8e765b86fd MINOR: proxy: disabled takes a stopping and a disabled state
This patch splits the disabled state of a proxy into a PR_DISABLED and a
PR_STOPPED state.

The first one is set when the proxy is disabled in the configuration
file, and the second one is set upon a stop_proxy().
2021-08-03 14:17:45 +02:00
William Lallemand
56f1f75715 MINOR: log: rename 'dontloglegacyconnerr' to 'log-error-via-logformat'
Rename the 'dontloglegacyconnerr' option to 'log-error-via-logformat'
which is much more self-explanatory and readable.

Note: only legacy keywords don't use hyphens, it is recommended to
separate words with them in new keywords.
2021-08-02 10:42:42 +02:00
Willy Tarreau
99198546f6 MEDIUM: atomic: relax the load/store barriers on x86_64
The x86-tso model makes the load and store barriers unneeded for our
usage as long as they perform at least a compiler barrier: the CPU
will respect store ordering and store vs load ordering. It's thus
safe to remove the lfence and sfence which are normally needed only
to communicate with external devices. Let's keep the mfence though,
to make sure that reads of same memory location after writes report
the value from memory and not the one snooped from the write buffer
for too long.

An in-depth review of all use cases tends to indicate that this is
okay in the rest of the code. Some parts could be cleaned up to
use atomic stores and atomic loads instead of explicit barriers
though.

Doing this reliably increases the overall performance by about 2-2.5%
on a 8c-16t Xeon thanks to less frequent flushes (it's likely that the
biggest gain is in the MT lists which use them a lot, and that this
results in less cache line flushes).
2021-08-01 17:34:06 +02:00
Willy Tarreau
cb0451146f MEDIUM: atomic: simplify the atomic load/store/exchange operations
The atomic_load/atomic_store/atomic_xchg operations were all forced to
__ATOMIC_SEQ_CST, which results in explicit store or even full barriers
even on x86-tso while we do not need them: we're not communicating with
external devices for example and are only interested in respecting the
proper ordering of loads and stores between each other.

These ones being rarely used, the emitted code on x86 remains almost the
same (barring a handful of locations). However they will allow to place
correct barriers at other places where atomics are accessed a bit lightly.

The patch is marked medium because we can never rule out the risk of some
bugs on more relaxed platforms due to the rest of the code.
2021-08-01 17:34:06 +02:00
Willy Tarreau
55a0975b1e BUG/MINOR: freq_ctr: use stricter barriers between updates and readings
update_freq_ctr_period() was using relaxed atomics without using barriers,
which usually works fine on x86 but not everywhere else. In addition, some
values were read without being enclosed by barriers, allowing the compiler
to possibly prefetch them a bit earlier. Finally, freq_ctr_total() was also
reading these without enough barriers. Let's make explicit use of atomic
loads and atomic stores to get rid of this situation. This required to
slightly rearrange the freq_ctr_total() loop, which could possibly slightly
improve performance under extreme contention by avoiding to reread all
fields.

A backport may be done to 2.4 if a problem is encountered, but last tests
on arm64 with LSE didn't show any issue so this can possibly stay as-is.
2021-08-01 17:34:06 +02:00