We're using variable-to-sample conversion at least 4 times in the code,
two of which are bogus. Let's introduce a generic conversion function
that performs the required checks.
This one only handles integers, contrary to its sibling with the suffix
_str that only handles strings. Let's rename it and uninline it since it
may well be used from outside.
The SSL stuff in struct server takes less than 3% of it and requires
lots of annoying ifdefs in the code just to take care of the cases
where the field is absent. Let's get rid of this and stop including
openssl-compat from server.c to detect NPN and ALPN capabilities.
This reduces the total LoC by another 0.4%.
During httpclient_destroy, add a condition in the BUG_ON which checks
that the client was started before it has ended. A httpclient structure
could have been created without being started.
httpclient_stop_and_destroy() tries to destroy the httpclient structure
if the client was stopped.
In the case the client wasn't stopped, it asks the client to stop itself
and to destroy the httpclient structure itself during the release of the
applet.
That's where that code initially was but it had been moved to
activity_count_runtime() purely because of dependency loops. These
no longer exist, so we can move that code back to the scheduler
and keep it where the information is updated and checked.
time.h is a horrible place to put activity calculation, it's a
historical mistake because the functions were there. We already have
most of the parts in sched.{c,h} and these ones make an exception in
the middle, forcing time.h to include some thread stuff and to access
the before/after_poll and idle_pct values.
Let's move these 3 functions to task.h with the other ones. They were
prefixed with "sched_" instead of the historical "tv_" which already
made no sense anymore.
I don't know why I inlined this one, this makes no sense given that it's
only used for stats, and it starts a circular dependency on tinfo.h which
can be problematic in the future. In addition, all the stuff related to
idle time calculation should be with the rest of the scheduler, which
currently is in task.{c,h}, so let's move it there.
We'll need to improve the API to pass other arguments in the future, so
let's start to adapt better to the current use cases. task_new() is used:
- 18 times as task_new(tid_bit)
- 18 times as task_new(MAX_THREADS_MASK)
- 2 times with a single bit (in a loop)
- 1 in the debug code that uses a mask
This patch provides 3 new functions to achieve this:
- task_new_here() to create a task on the calling thread
- task_new_anywhere() to create a task to be run anywhere
- task_new_on() to create a task to run on a specific thread
The change is trivial and will allow us to later concentrate the
required adaptations to these 3 functions only. It's still possible
to call task_new() if needed but a comment was added to encourage the
use of the new ones instead. The debug code was not changed and still
uses it.
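As a rough illustration of the three helpers listed above, here is a minimal sketch built on a mask-based constructor. The struct layout, the 64-bit mask type, ALL_THREADS_MASK and the tid variable are simplified stand-ins assumed for the example, not the real scheduler definitions.

```c
#include <stdlib.h>

/* Simplified stand-ins for the real scheduler types (assumptions). */
struct task {
	unsigned long long thread_mask; /* threads allowed to run this task */
};

#define ALL_THREADS_MASK (~0ULL)    /* plays the role of MAX_THREADS_MASK */

static unsigned int tid;            /* calling thread's ID; 0 in this sketch */

/* Generic constructor taking a raw thread mask, like task_new(). */
static struct task *task_new(unsigned long long thread_mask)
{
	struct task *t = calloc(1, sizeof(*t));

	if (t)
		t->thread_mask = thread_mask;
	return t;
}

/* Create a task bound to the calling thread. */
static struct task *task_new_here(void)
{
	return task_new(1ULL << tid);
}

/* Create a task that may run on any thread. */
static struct task *task_new_anywhere(void)
{
	return task_new(ALL_THREADS_MASK);
}

/* Create a task bound to thread <thr>. */
static struct task *task_new_on(unsigned int thr)
{
	return task_new(1ULL << thr);
}
```

With such wrappers, callers no longer need to know about masks at all, so a later change of the task_new() arguments only has to touch these three functions.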
Work lists were a mechanism introduced in 1.8 to asynchronously delegate
some work to be performed on another thread via a dedicated task.
The only user was the listeners, to deal with the queue. Nowadays
the tasklets have made this much more convenient, and have replaced
work_lists in the listeners. It seems there will be no valid use case
of work lists anymore, so better get rid of them entirely and keep the
scheduler code cleaner.
__task_queue() must absolutely not be called with TICK_ETERNITY or it
will place a never-expiring node upfront in the timers queue, preventing
any timer from expiring until the process is restarted. Code was found
to cause this using "task_schedule(task, now_ms)" which does this one
millisecond every 49.7 days, so let's add a condition against this. It
must never trigger since any process susceptible to trigger it would
already accumulate tasks until it dies.
An extra test was added in wake_expired_tasks() to detect tasks whose
timeout would have been changed after being queued.
An improvement over this could be in the future to use a non-scalar
type (union/struct) for expiration dates so as to avoid the risk of
using them directly like this. But now_ms is already such a valid
time and this specific construct would still not be caught.
This could even be backported to stable versions to help detect other
occurrences if any.
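To make the problem more concrete, here is a minimal sketch of the tick convention, assuming a 32-bit millisecond counter where the value 0 is reserved for TICK_ETERNITY (2^32 ms is about 49.7 days, hence the period mentioned above). queue_if_valid() is a hypothetical name used only to illustrate the kind of guard that was added.

```c
#include <stdint.h>

#define TICK_ETERNITY 0U   /* assumed convention: 0 means "never expires" */

/* Return non-zero if <tick> is a real expiration date. */
static int tick_isset(uint32_t tick)
{
	return tick != TICK_ETERNITY;
}

/* Add <timeout> ms to <now>, skipping the reserved value so that the
 * result is always a valid date, at the cost of 1 ms of inaccuracy.
 */
static uint32_t tick_add(uint32_t now, uint32_t timeout)
{
	now += timeout;
	if (now == TICK_ETERNITY)
		now++;
	return now;
}

/* Hypothetical guard: refuse to queue a task whose expiration date is
 * not set. Returns 1 when the task would be queued, 0 otherwise. The
 * real code would rather treat the rejected case as a bug.
 */
static int queue_if_valid(uint32_t expire)
{
	if (!tick_isset(expire))
		return 0;
	return 1;
}
```

Passing the raw now_ms as a date is unsafe precisely during the millisecond where it equals 0, while going through an addition helper like tick_add() never produces the reserved value.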
The ssl_bc_hsk_err sample fetch will need to raise more errors than only
handshake related ones hence its renaming to a more generic ssl_bc_err.
This patch is required because some handshake failures that should have
been caught by this fetch (verify error on the server side for instance)
were missed. This is caused by a change in TLS1.3 in which the
'Finished' state on the client is reached before its certificate is sent
(and verified) on the server side (see the "Protocol Overview" part of
RFC 8446).
This means that the SSL_do_handshake call is finished long before the
server can verify and potentially reject the client certificate.
The ssl_bc_hsk_err will then need to be expanded to catch other types of
errors.
This change is also applied to the frontend fetches (ssl_fc_hsk_err
becomes ssl_fc_err) and to their string counterparts.
In case of a connection error happening after the SSL handshake is
completed, the error code stored in the connection structure would not
always be set, hence having some connection failures being described as
successful in the fc_conn_err or bc_conn_err sample fetches.
The most common case in which it could happen is when the SSL server
rejects the client's certificate. The SSL_do_handshake call on the
client side would be successful because the client effectively sent its
client hello and certificate information to the server, but the next
call to SSL_read on the client side would raise an SSL_ERROR_SSL code
(through the SSL_get_error function) which is described in OpenSSL
documentation as a non-recoverable and fatal SSL error.
This patch ensures that in such a case, the connection's error code is
set to a special CO_ERR_SSL_FATAL value.
There's no reason for CONFIG_HAP_POOLS and its opposite to be located in
pools-t.h, as it forces those that depend on them to include the file.
Other similar options are normally dealt with in defaults.h, which is
part of the default API, so let's do that.
According to the RFC7230, "chunked" encoding must not be applied more than
once to a message body. To handle this case, h1_parse_xfer_enc_header() is
now responsible for failing when a parsing error is found. It also fails if the
"chunked" encoding is not the last one for a request.
To help the parsing, two H1 parser flags have been added: H1_MF_TE_CHUNKED
and H1_MF_TE_OTHER. These flags are set, respectively, when "chunked"
encoding and any other encoding are found. H1_MF_CHNK flag is used when
"chunked" encoding is the last one.
This commit provides an hlua_httpclient object which is a bridge between
the httpclient and the lua API.
The HTTPClient is callable in lua this way:
local httpclient = core.httpclient()
local res = httpclient:get("http://127.0.0.1:9000/?s=9999")
core.Debug("Status: ".. res.status .. ", Reason : " .. res.reason .. ", Len:" .. string.len(res.body) .. "\n")
The resulting response object will provide a "status" field which
contains the status code, a "reason" string which contains the reason
string, and a "body" field which contains the response body.
The implementation uses the httpclient callback to wake up the lua task
which yields each time it pushes some data. The httpclient works in the
same thread as the lua task.
The transient flag CO_RFL_BUF_NOT_STUCK should now be set when the mux's
rcv_buf() function is called, in si_cs_recv(), to be sure the mux is able to
perform some optimisation during data copy. This flag is set when we are
sure the channel buffer is not stuck. Concretely, it happens when there are
data scheduled to be sent.
It is not a fix and this flag is not used for now. But it makes sense to have
this info to be sure to be able to do some optimisations if necessary.
This patch is related to the issue #1362. It may be backported to 2.4 to
ease future backports.
HTX_FL_FRAGMENTED flag is now set on an HTX message when it is
fragmented. It happens when an HTX block is removed in the middle of the
message and flagged as unused. HTX_FL_FRAGMENTED flag is removed when all
data are removed from the message or when the message is defragmented.
Note that some optimisations are still possible because the flag can be
avoided in other situations. For instance when the last header of a bodyless
message is removed.
If the stream-interface is waiting for more buffer room to store incoming
data, it is important at the stream level to stop waiting for more data
before continuing. Thanks to the previous patch ("BUG/MEDIUM: stream-int: Notify
stream that the mux wants more room to xfer data"), the stream is woken up
when this happens. In this patch, we take care to interrupt the
corresponding tcp-content ruleset or to stop waiting for the HTTP message
payload.
To ease detection of the state, si_rx_blocked_room() helper function has
been added. It returns non-zero if the stream interface's Rx path is blocked
because of lack of room in the input buffer.
This patch is part of a series related to the issue #1362. It should be
backported as far as 2.0, probably with some adaptations. So be careful
during backports.
When the mux failed to transfer data to the upper layer because of a lack of
room, it is important to wake the stream up to let it handle this
event. Otherwise, if the stream is waiting for more data, both the stream
and the mux remain blocked waiting for each other.
When this happens, the mux sets the CS_FL_WANT_ROOM flag on the
conn-stream. Thus, in si_cs_recv() we are able to detect this event. Today,
the stream-interface is blocked. But, it is not enough to wake the stream
up. To fix the bug, CF_READ_PARTIAL flag is extended to also handle cases
where a read exception occurred. This flag should ideally be renamed. But for
now, it is good enough. By setting this flag, we are sure the stream will be
woken up.
This patch is part of a series related to the issue #1362. It should be
backported as far as 2.0, probably with some adaptations. So be careful
during backports.
When a message is parsed and copied into the channel buffer, in
h1_process_demux(), more space is requested if some pending data remain
after the parsing while the channel buffer is not empty. To do so,
CS_FL_WANT_ROOM flag is set. It means the H1 parser needs more space in the
channel buffer to continue. In the stream-interface, when this flag is set,
the SI is considered as blocked on the RX path. It is only unblocked when
some data are sent.
However, it is not accurate because the parsing may be stopped because
there is not enough data to continue. For instance in the middle of a chunk
size. In this case, some data may have been already copied but the parser is
blocked because it must receive more data to continue. If the calling SI is
blocked on RX at this stage when the stream is waiting for the payload
(because http-buffer-request is set for instance), the stream remains stuck
infinitely.
To fix the bug, we must request more space from the app layer only when it is
not possible to copy more data. Actually, this happens when data remain in
the input buffer while the H1 parser is in states MSG_DATA or MSG_TUNNEL, or
when we are unable to copy headers or trailers into a non-empty buffer.
The first condition is quite easy to handle. The second one requires an API
refactoring. h1_parse_msg_hdrs() and h1_parse_msg_tlrs() functions have been
updated. Now it is possible to know when we need more space in the buffer to
copy headers or trailers (-2 is returned). In the H1 mux, a new H1S flag
(H1S_F_RX_CONGESTED) is used to track this state inside h1_process_demux().
This patch is part of a series related to the issue #1362. It should be
backported as far as 2.0, probably with some adaptations. So be careful
during backports.
During the packet loss detection we must treat the packet number spaces
in this order: Initial -> Handshake -> 01RTT. This was not the case
due to the order chosen to implement the array of packet number spaces,
which was there before the packet loss detection implementation.
The STREAM data to send coming from the upper layer must be stored until
they have been acked by the peer. To do so, we store them in buffer structs,
one per stream (see qcs.tx.buf). Each time a STREAM frame is built by quic_push_frame(),
its offset must match the offset of the first byte added to the buffer (modulo
the size of the buffer) by the frame. As the frames are not always acknowledged in
order, they may be stored in eb_trees ordered by their offset to be sure
to sequentially delete the STREAM data from their buffer, in the order they
have been added to it.
This function does exactly the same thing as b_xfer() which transfers
data from a struct buffer to another one but without zero copy when
the destination buffer is empty. This is at least useful to transfer
h3 data to the QUIC mux from a buffer with garbage metadata which has
been used to build h3 frames without too many memcpy()/memmove() calls.
The peer transport parameter values were not initialized with
the default ones (when absent), especially the
"active_connection_id_limit" parameter with 2 as default value
when absent from received remote transport parameters. This
had the side effect of sending too many NEW_CONNECTION_ID frames.
This was the case for curl which does not announce any
"active_connection_id_limit" parameter.
Also rename ->idle_timeout to ->max_idle_timeout to reflect the RFC9000.
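A minimal sketch of the initialization described above, assuming a reduced parameter structure. The struct and function names are hypothetical. The default values are those RFC 9000 specifies for parameters the peer does not announce.

```c
#include <stdint.h>
#include <string.h>

/* Reduced, hypothetical view of the transport parameters. */
struct quic_transport_params {
	uint64_t max_idle_timeout;           /* ms, 0 means disabled */
	uint64_t max_udp_payload_size;       /* bytes */
	uint64_t ack_delay_exponent;
	uint64_t max_ack_delay;              /* ms */
	uint64_t active_connection_id_limit;
};

/* Set every parameter to the value RFC 9000 assumes when the peer does
 * not announce it. This must be called before decoding the peer's
 * parameters so that the absent ones keep their defaults.
 */
static void quic_tp_init_defaults(struct quic_transport_params *p)
{
	memset(p, 0, sizeof(*p));
	p->max_udp_payload_size       = 65527;
	p->ack_delay_exponent         = 3;
	p->max_ack_delay              = 25;
	p->active_connection_id_limit = 2;
}
```

With a zeroed structure instead, a peer like curl that stays silent about "active_connection_id_limit" would be seen with a meaningless limit rather than the default of 2.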
These salts are used to derive initial secrets to decrypt the first Initial packet.
We support draft-29 and v1 QUIC version initial salts.
Add parameters to our QUIC-TLS API functions used to derive these secrets from
these salts.
Make our xprt_quic use the correct initial salt depending on the QUIC version field
found in the first packet. Useful to support connections with curl which uses the
draft-29 QUIC version.
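The version-based selection can be sketched as follows. The macro, array and function names are hypothetical. The salt bytes are the ones published in RFC 9001 (section 5.2) for v1 and in draft-ietf-quic-tls-29 for draft-29.

```c
#include <stddef.h>
#include <stdint.h>

#define QUIC_VERSION_1        0x00000001U
#define QUIC_VERSION_DRAFT_29 0xff00001dU
#define QUIC_SALT_LEN         20

/* Initial salt for QUIC v1 (RFC 9001). */
static const unsigned char salt_v1[QUIC_SALT_LEN] = {
	0x38, 0x76, 0x2c, 0xf7, 0xf5, 0x59, 0x34, 0xb3, 0x4d, 0x17,
	0x9a, 0xe6, 0xa4, 0xc8, 0x0c, 0xad, 0xcc, 0xbb, 0x7f, 0x0a,
};

/* Initial salt for draft-29. */
static const unsigned char salt_draft_29[QUIC_SALT_LEN] = {
	0xaf, 0xbf, 0xec, 0x28, 0x99, 0x93, 0xd2, 0x4c, 0x9e, 0x97,
	0x86, 0xf1, 0x9c, 0x61, 0x11, 0xe0, 0x43, 0x90, 0xa8, 0x99,
};

/* Return the initial salt matching the version field of the first
 * packet, or NULL if the version is not supported.
 */
static const unsigned char *quic_initial_salt(uint32_t version)
{
	switch (version) {
	case QUIC_VERSION_1:        return salt_v1;
	case QUIC_VERSION_DRAFT_29: return salt_draft_29;
	default:                    return NULL;
	}
}
```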
Move the "ACK required" bit from the packet number space to the connection level.
Force the "ACK required" option when acknowledging Handshake or Initial packets.
A client may send three packets with a different encryption level for each. So,
this patch modifies qc_treat_rx_pkts() to consider two encryption levels passed
as parameters, instead of only one.
Make qc_conn_io_cb() restart its process after the handshake has succeeded
so as to process any Application level packets which have already been received
in the same datagram as the last CRYPTO frames in Handshake packets.
Make qc_prep_hdshk_pkts() and quic_conn_io_cb() handle the case
where we enter them with QUIC_HS_ST_COMPLETE or QUIC_HS_ST_CONFIRMED
as connection state, with QUIC_TLS_ENC_LEVEL_APP and QUIC_TLS_ENC_LEVEL_NONE
as the encryption levels to consider when preparing packets.
quic_get_tls_enc_levels() is modified to return QUIC_TLS_ENC_LEVEL_APP
and QUIC_TLS_ENC_LEVEL_NONE as levels to consider when coalescing
packets in the same datagram.
qc_build_pkt() has recently been modified to support any type of
supported frame at any encryption level (bearing in mind that a given encryption
level does not support every type of frame) but quic_tls_level_pkt_type()
prevented it from building application level packets because it was written
only for the handshake.
This patch simply adds the remaining encryption level QUIC_TLS_ENC_LEVEL_APP
which must be supported by quic_tls_level_pkt_type().
This should be used by the functions which build packets to prevent
them from failing. This is important when the packet numbers are consumed
by several threads. The packet number is used to build and encrypt packets
and must be incremented if and only if the packet it refers to has been
successfully built.
These structures are similar. quic_tx_frm was there to try to reduce the
size of such objects which embed a union for all the QUIC frames.
Furthermore this patch fixes the issue where quic_tx_frm objects were freed
from the pool for quic_frame.
Make quic_rx_packet_ref(inc|dec)() functions be thread safe.
Make use of ->rx.crypto.frms_rwlock RW lock when manipulating RX frames
from qc_treat_rx_crypto_frms().
Modify atomically several variables attached to RX part of quic_enc_level struct.
->rx.crypto member of quic_enc_level struct was not initialized as
this was done for all other members of this structure. This patch
fixes this.
Also adds a RW lock for the frame of this member.
Add two functions to increment or decrement a reference counter
attached to the TX packet structure (struct quic_tx_packet). The packets are freed
when their counters reach zero.
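The counting scheme can be sketched like this, assuming a reduced packet structure. The names are hypothetical, and the use of C11 atomics is an assumption of the sketch, not something stated above. The freed_packets counter only exists to make the release observable.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, reduced TX packet: only the reference counter. */
struct tx_packet {
	atomic_int refcnt;
};

static int freed_packets; /* counts releases, for demonstration only */

/* Allocate a packet with no reference held yet. */
static struct tx_packet *tx_packet_new(void)
{
	struct tx_packet *pkt = malloc(sizeof(*pkt));

	if (pkt)
		atomic_init(&pkt->refcnt, 0);
	return pkt;
}

/* Take one reference on <pkt>. */
static void tx_packet_refinc(struct tx_packet *pkt)
{
	atomic_fetch_add(&pkt->refcnt, 1);
}

/* Drop one reference on <pkt> and free it when none is left. */
static void tx_packet_refdec(struct tx_packet *pkt)
{
	/* atomic_fetch_sub() returns the value held before the decrement */
	if (atomic_fetch_sub(&pkt->refcnt, 1) == 1) {
		free(pkt);
		freed_packets++;
	}
}
```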
We use only ring buffers (struct qring) to prepare and send QUIC datagrams.
We can safely remove the old buffering implementation which was not thread safe.
We modify the functions responsible for building packets to put the latter
in ring buffers (qc_build_hdshk_pkt() during the handshake step, and
qc_build_phdshk_apkt() during the post-handshake step). These functions
remove a ring buffer from its list to build as many datagrams as possible.
Each datagram is prefixed with two fields: the datagram length and the
first packet in the datagram. We chain the packets belonging to the same datagram
in a singly linked list to reach them from the first one: indeed we must
modify some members of each packet when we really send them from send_ppkts().
This function is also modified to retrieve the datagrams from ring buffers.
We initialize the pointer to the listener TX ring buffer list.
Note that this is not done for QUIC clients as we do not fully support them:
we only have to allocate the list and attach it to server struct I guess.
We allocate an array of QUIC ring buffers, one per thread, and arrange them in an
MT_LIST. Everything is allocated or nothing: we do not want to use an incomplete
array of ring buffers to ensure that each thread may safely acquire one of these
buffers.
This implementation is inspired by the Linux kernel circular buffer implementation
(see include/linux/circ_buf.h). Such buffers may be used at the same time both
by a writer and a reader (lock-free).
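For reference, the index arithmetic of such a buffer can be sketched as below. The RING_CNT()/RING_SPACE() macros mirror the kernel's CIRC_CNT()/CIRC_SPACE() for a fixed power-of-two size. The struct and function names are hypothetical, and the memory barriers a real lock-free producer/consumer pair needs are deliberately left out of this single-threaded sketch.

```c
#include <stddef.h>

#define RING_SIZE 8  /* must be a power of two */

/* Number of used bytes, and number of bytes still writable. One slot
 * always stays empty so that "full" and "empty" can be told apart.
 */
#define RING_CNT(head, tail)   (((head) - (tail)) & (RING_SIZE - 1))
#define RING_SPACE(head, tail) RING_CNT((tail), (head) + 1)

struct ring {
	unsigned char buf[RING_SIZE];
	size_t head;  /* next write position, only moved by the writer */
	size_t tail;  /* next read position, only moved by the reader */
};

/* Append one byte. Returns 1 on success, 0 if the ring is full. */
static int ring_put(struct ring *r, unsigned char c)
{
	if (RING_SPACE(r->head, r->tail) == 0)
		return 0;
	r->buf[r->head] = c;
	r->head = (r->head + 1) & (RING_SIZE - 1);
	return 1;
}

/* Extract one byte into <c>. Returns 1 on success, 0 if empty. */
static int ring_get(struct ring *r, unsigned char *c)
{
	if (RING_CNT(r->head, r->tail) == 0)
		return 0;
	*c = r->buf[r->tail];
	r->tail = (r->tail + 1) & (RING_SIZE - 1);
	return 1;
}
```

Since each index is only written by one side, a single writer and a single reader never update the same variable, which is what makes the lock-free usage possible once proper barriers are added.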
Make the main function of the I/O dgram handler, used to parse QUIC packets,
thread safe. Its role is at least to create new incoming connections and
add them to two trees protected by the same RW lock. The packets are from now on
fully parsed before possibly creating new connections.
Move the connection state from quic_conn_ctx struct to quic_conn struct which
is the structure which is used to store the QUIC connection part information.
This structure is initialized by the I/O dgram handler for each new connection
to QUIC listeners. This is needed for the multithread support so as not
to depend on the connection context potentially initialized by another
thread.
We must protect from concurrent accesses the tree which stores the QUIC packets received
by the dgram I/O handler, these packets being also parsed by the xprt task.
Make qc_new_isecs() depend only on the quic_conn struct initialization (no more
dependency on connection struct initialization) to be able to run it as soon as
the quic_conn struct is initialized (from the I/O handler) before running ->accept()
quic proto callback.
Move the QUIC conn (struct quic_conn) initialization from quic_sock_accept_conn()
to qc_lstnr_pkt_rcv() as this is done for the server part.
Move the timer initialization to the ->start xprt callback to ensure the connection
context is initialized: this is done by the ->accept callback which may be run
by another thread than the one for the I/O handler which also runs ->start.
The name of the maximum packet size transport parameter was ambiguous and was replaced
by maximum UDP payload size. Our code would also be ambiguous if it did not
reflect this change.
This function calls quic_mux_transport_params_update() to update the related
streams transport parameter of the mux. It is there only so as not to have
to include mux_quic.h to update these parameters.
Add a new structure to store enough information about STREAM frames which
must be stored before being delivered to the application layer, for any
reason.
The flow control at stream level is organized by types (client bidi, server bidi,
client uni, server uni). Add at least a callback to retrieve the number
of available streams by direction.
This file has been derived from mux_h2.c removing all h2 parts. At
QUIC mux layer, there must not be any reference to http. This will be the
responsibility of the application layer (h3) to open streams handled by the mux.
We move ->params transport parameters to ->rx.params. They are the
transport parameters which will be sent to the peer, and used for
the endpoint flow control. So, they will be used to receive packets
from the peer (RX part).
Also move ->rx_tps transport parameters to ->tx.params. They are the
transport parameters which are sent by the peer, and used to respect
its flow control limits. So, they will be used when sending packets
to the peer (TX part).
appctx_new() is exclusively called with tid_bit and it only uses the
mask to pass it to the accompanying task. There is no point requiring
the caller to know about a mask there, nor is there any point in
creating an applet outside of the context of its own thread anyway.
Let's drop this and pass tid_bit to task_new() directly.
A new warning is reported by gcc11 when using a pointer to uninitialized
memory block for a function with a const pointer argument. The warning
is triggered for istalloc, used by http_client.c / proxy.c / tcpcheck.c.
This warning is reported because the uninitialized memory block
allocated by malloc should not be passed to a const argument as in ist2.
See https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Warning-Options.html#index-Wmaybe-uninitialized
This should be backported up to 2.2.
Ilya reported in issue #1391 a build warning on Fedora about mallinfo()
being deprecated in favor of mallinfo2() since glibc-2.33. Let's add
support for it. This should be backported where the following commit is
also backported: 157e39303 ("MINOR: pools: automatically disable
malloc_trim() with external allocators").
"f(void)" is the correct and preferred form for a function taking no
argument, while some places use the older "f()". These were reported
by clang's -Wmissing-prototypes, for example:
src/cpuset.c:111:5: warning: no previous prototype for function 'ha_cpuset_size' [-Wmissing-prototypes]
int ha_cpuset_size()
include/haproxy/cpuset.h:42:5: note: this declaration is not a prototype; add 'void' to make it a prototype for a zero-parameter function
int ha_cpuset_size();
^
void
This aggregate patch fixes this for the following functions:
ha_backtrace_to_stderr(), ha_cpuset_size(), ha_panic(), ha_random64(),
ha_thread_dump_all_to_trash(), get_exec_path(), check_config_validity(),
mworker_child_nb(), mworker_cli_proxy_(create|stop)(),
mworker_cleantasks(), mworker_cleanlisteners(), mworker_ext_launch_all(),
mworker_reload(), mworker_(env|proc_list)_to_(proc_list|env)(),
mworker_(un|)block_signals(), proxy_adjust_all_maxconn(),
proxy_destroy_all_defaults(), get_tainted(),
pool_total_(allocated|used)(), thread_isolate(_full|)(),
thread(_sync|)_release(), thread_harmless_till_end(),
thread_cpu_mask_forced(), dequeue_all_listeners(), next_timer_expiry(),
wake_expired_tasks(), process_runnable_tasks(), init_acl(),
init_buffer(), (de|)init_log_buffers(), (de|)init_pollers(),
fork_poller(), pool_destroy_all(), pool_evict_from_local_caches(),
pool_total_failures(), dump_pools_to_trash(), cfg_run_diagnostics(),
tv_init_(process|thread)_date(), __signal_process_queue(),
deinit_signals(), haproxy_unblock_signals()
-Wundef triggered on a MIPS-based musl build on __WORDSIZE that's used
in ultoa_o() and some Lua initialization. The former will fail to convert
integers larger than 1 billion to a proper string in this case. Let's make
sure this macro is defined and fall back to values determined from
__SIZEOF_LONG__ otherwise. A cleaner long-term approach would consist
in removing all remaining occurrences of this macro.
This can be backported to all versions.
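The fallback can be sketched as follows. __SIZEOF_LONG__ is predefined by gcc and clang. ulong_max_digits() is a hypothetical consumer added only to show a build-time decision depending on the macro.

```c
#include <limits.h>

/* Some libcs do not define __WORDSIZE: derive it from the compiler's
 * predefined __SIZEOF_LONG__ in this case.
 */
#ifndef __WORDSIZE
# if defined(__SIZEOF_LONG__) && __SIZEOF_LONG__ == 4
#  define __WORDSIZE 32
# elif defined(__SIZEOF_LONG__) && __SIZEOF_LONG__ == 8
#  define __WORDSIZE 64
# else
#  error "Unknown machine word size (__WORDSIZE, __SIZEOF_LONG__)"
# endif
#endif

/* Example consumer: decimal digits needed to print any unsigned long. */
static int ulong_max_digits(void)
{
#if __WORDSIZE == 64
	return 20;  /* 18446744073709551615 */
#else
	return 10;  /* 4294967295 */
#endif
}
```

Without the fallback, an undefined __WORDSIZE silently evaluates to 0 in "#if" expressions, so the 32-bit branch may be picked on a 64-bit machine.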
Building with an old musl-based toolchain reported this warning:
include/haproxy/thread.h: In function 'ha_thread_relax':
include/haproxy/thread.h:256:5: warning: "_POSIX_PRIORITY_SCHEDULING" is not defined [-Wundef]
#if _POSIX_PRIORITY_SCHEDULING
^
There were indeed two "#if" instead of "#ifdef" for this macro, let's
fix them.
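A minimal sketch of the corrected test, assuming a POSIX system. thread_relax() is a hypothetical name and its return value only exists to show which branch was compiled.

```c
#include <sched.h>
#include <unistd.h>

/* Yield the CPU when the platform advertises scheduling support.
 * "#ifdef" is safe when the macro is absent, whereas "#if MACRO" on an
 * undefined macro evaluates it as 0 and triggers -Wundef.
 * Returns 1 if the thread yielded, 0 if nothing was done.
 */
static int thread_relax(void)
{
#ifdef _POSIX_PRIORITY_SCHEDULING
	sched_yield();
	return 1;
#else
	return 0;
#endif
}
```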
When the header list is added, after the message parsing, headers with no
value are now ignored. It is not the same as headers with empty value
fields. Only headers with a NULL pointer as value are skipped. This only
happens if the header value is removed during the message
parsing. Concretely, such headers are now ignored when htx_add_all_headers()
is called. However, htx_add_header() is not affected by this change.
Symmetrically, the same is true for trailers. It may be backported to 2.4
because of the previous fix ("BUG/MEDIUM: mux-h1: Remove "Upgrade:" header
for requests with payload").
There's no point taking the variables locks for sess/txn/req/res
contexts since these ones always run inside the same thread anyway.
This patch conditions the lock on the variable's scope to avoid
flushing cache lines when not needed.
This showed an improvement of ~5% on a 16-thread machine with 12
variables.
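The idea can be sketched as below. The enum, struct and function names are hypothetical, and a trivial C11 spinlock stands in for the real lock type, which is an assumption of this sketch.

```c
#include <stdatomic.h>

enum vars_scope { SCOPE_SESS, SCOPE_TXN, SCOPE_REQ, SCOPE_RES, SCOPE_PROC };

/* Hypothetical, reduced variables list head. */
struct vars {
	enum vars_scope scope;
	atomic_flag lock;   /* stand-in for the real lock */
};

/* Lock <v> only if it may be shared between threads, which is only the
 * case for the process-wide scope. Returns 1 if the lock was taken.
 */
static int vars_lock(struct vars *v)
{
	if (v->scope != SCOPE_PROC)
		return 0;   /* single-thread context: no shared cache line touched */
	while (atomic_flag_test_and_set(&v->lock))
		;           /* spin until the lock is released */
	return 1;
}

/* Release the lock taken by vars_lock(), if any. */
static void vars_unlock(struct vars *v)
{
	if (v->scope == SCOPE_PROC)
		atomic_flag_clear(&v->lock);
}
```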
The global table of known variables names can only grow and was designed
for static names that are registered at boot. Nowadays it's possible to
set dynamic variable names from Lua or from the CLI, which causes a real
problem that was partially addressed in 2.2 with commit 4e172c93f
("MEDIUM: lua: Add `ifexist` parameter to `set_var`"). Please see github
issue #624 for more context.
This patch simplifies all this by removing the need for a central
registry of known names, and storing 64-bit hashes instead. This is
highly sufficient given the low number of variables in each context.
The hash is calculated using XXH64() which is bijective over the 64-bit
space thus is guaranteed collision-free for 1..8 chars. Above that the
risk remains around 1/2^64 per extra 8 chars so in practice this is
highly sufficient for our usage. A random seed is used at boot to seed
the hash so that it's not attackable from Lua for example.
There's one particular nit though. The "ifexist" hack mentioned above
is now limited to variables of scope "proc" only, and will only match
variables that were already created or declared, but will now verify
the scope as well. This may affect some bogus Lua scripts and SPOE
agents which used to accidentally work because a similarly named
variable used to exist in a different scope. These ones may need to be
fixed to comply with the doc.
Now we can sum up the situation as this one:
- ephemeral variables (scopes sess, txn, req, res) will always be
usable, regardless of any prior declaration. This effectively
addresses the most problematic change from the commit above that
in order to work well could have required some script auditing ;
- process-wide variables (scope proc) that are mentioned in the
configuration, referenced in a "register-var-names" SPOE directive,
or created via "set-var" in the global section or the CLI, are
permanent and will always accept to be set, with or without the
"ifexist" restriction (SPOE uses this internally as well).
- process-wide variables (scope proc) that are only created via a
set-var() tcp/http action, via Lua's set_var() calls, or via an
SPOE with the "force-set-var" directive, will not be permanent
but will always accept to be replaced once they are created, even
if "ifexist" is present ;
- process-wide variables (scope proc) that do not exist will only
support being created via the set-var() tcp/http action, Lua's
set_var() calls without "ifexist", or an SPOE declared with
"force-set-var".
This means that non-proc variables do not care about "ifexist" nor
prior declaration, and that using "ifexist" should most often be
reliable in Lua and that SPOE should most often work without any
prior declaration. It may be doable to turn "ifexist" to 1 by default
in Lua to further ease the transition. Note: regtests were adjusted.
Cc: Tim Düsterhus <tim@bastelstu.be>
We certainly do not want that a permanent variable (one that is listed
in the configuration) be erased by accident by an "unset-var" action.
Let's make sure these ones are only reset to an empty sample, like at
the moment of their initial registration. One trick is that the same
function is used to purge the memory at the end and to delete, so we
need to add an extra "force" argument to make the choice.
In order to continue to honor the ifexist Lua option and prevent rogue
SPOA agents from creating too many variables, we'll need to keep the
ability to mark certain proc.* variables as permanent when they're
known from the config file.
Let's add a flag there for this. It's added to the variable when the
variable is created with this flag set by the caller.
Another approach could have been to use a distinct list or distinct
scope but that sounds complicated and bug-prone.
Passing this flag to var_set() will result in the variable to only be
created if it did not exist, otherwise nothing is done (it's not even
updated). This will be used for pre-registering names.
When setting variables, there are currently two variants, one which will
always create the variable, and another one, "ifexist", which will only
create or update a variable if a similarly named variable in any scope
already existed before.
The goal was to limit the risk of injecting random names in the proc
scope, but it was achieved by making use of the somewhat limited name
indexing model, which explains the scope-agnostic restriction.
With this change, we're moving the check downwards in the chain, at the
variable level, and only variables under the scope "proc" will be subject
to the restriction. A new set of VF_* flags was added to adjust how
variables are set, and VF_UPDATEONLY is used to mention this restriction.
In this exact state of affairs, this is not completely exact, as if a
similar name was not known in any scope, the variable will continue to
be rejected like before, but this will change soon.
The vars_init() name is particularly confusing as it does not initialize
the variables code but the head of a list of variables passed in
arguments. And we'll soon need to have proper initialization code, so
let's rename it now.
In ticket #1348 some users expressed some concerns regarding the removal
of the "grace" directive from the proxies. Their use case very closely
mimics the original intent of the grace keyword, which is, let haproxy
accept traffic for some time when stopping, while indicating an external
LB that it's stopping.
This is implemented here by starting a task whose expiration triggers
the soft-stop for real. The global "stopping" variable is immediately
set however. For example, this below will be sufficient to instantly
notify an external check on port 9999 that the service is going down,
while other services remain active for 10s:
    global
      grace 10s
    frontend ext-check
      bind :9999
      monitor-uri /ext-check
      monitor fail if { stopping }
Ori Hollander of JFrog Security reported that htx_add_header() and
htx_add_trailer() were missing a length check on the header name. While
this does not allow to overwrite any memory area, it results in bits of
the header name length to slip into the header value length and may
result in forging certain header names on the input. The sad thing here
is that a FIXME comment was present suggesting to add the required length
checks :-(
The injected headers are visible to the HTTP internals and to the config
rules, so haproxy will generally stay synchronized with the server. But
there is one exception which is the content-length header field, because
it is already deduplicated on the input, but before being indexed. As
such, injecting a content-length header after the deduplication stage
may be abused to present a different, shorter one on the other side and
help build a request smuggling attack, or even maybe a response splitting
attack. CVE-2021-40346 was assigned to this problem.
As a mitigation measure, it is sufficient to verify that no more than
one such header is present in any message, which is normally the case
thanks to the duplicate checks:
http-request deny if { req.hdr_cnt(content-length) gt 1 }
http-response deny if { res.hdr_cnt(content-length) gt 1 }
This must be backported to all HTX-enabled versions, hence as far as 2.0.
In 2.3 and earlier, the functions are in src/htx.c instead.
Many thanks to Ori for his work and his responsible report!
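The way a missing length check lets name-length bits leak into the value
length can be illustrated with a minimal sketch. It assumes a simplified
32-bit info word made of 4 bits of type, 20 bits of value length and 8 bits
of name length; the function names are hypothetical and this is not the
actual htx code:

```c
#include <stdint.h>

/* Assumed layout: [type:4][value length:20][name length:8]. */
static uint32_t pack_info_unchecked(uint32_t type, uint32_t name_len,
                                    uint32_t value_len)
{
	/* No bounds check: a name_len above 255 carries into value_len. */
	return (type << 28) + (value_len << 8) + name_len;
}

static uint32_t info_name_len(uint32_t info)
{
	return info & 0xff;
}

static uint32_t info_value_len(uint32_t info)
{
	return (info >> 8) & 0xfffff;
}

/* Returns 0 and fills <info>, or -1 if a length does not fit its field. */
static int pack_info_checked(uint32_t type, uint32_t name_len,
                             uint32_t value_len, uint32_t *info)
{
	if (name_len > 255 || value_len > 0xfffff)
		return -1;
	*info = pack_info_unchecked(type, name_len, value_len);
	return 0;
}
```

With a 270-byte name and a 10-byte value, the unchecked variant decodes as a
14-byte name followed by an 11-byte value, which is how part of the name can
end up being read as something else.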
Since commit "BUG/MINOR: config: reject configs using HTTP with bufsize
>= 256 MB" we are now sure that it's not possible anymore to have an HTX
block of a size 256 MB or more, even after concatenation thanks to the
tests for len >= htx_free_data_space(). Let's remove these now obsolete
comments.
A BUG_ON() was added in htx_add_blk() to track any such exception if
the conditions would change later, to complete the one that is performed
on the start address that must remain within the buffer.
In preparation for supporting default values when fetching variables, we
need to update the internal API to pass an extra argument to functions
vars_get_by_{name,desc} to provide an optional default value. This
patch does this and always passes NULL in this argument. var_to_smp()
was extended to fall back to this value when available.
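The fallback behaviour can be pictured with the following minimal sketch.
All types and names here are hypothetical stand-ins (the real functions
operate on haproxy's sample and variable structures); it only illustrates
the "use the default when the variable is not found" logic:

```c
#include <stddef.h>
#include <string.h>

struct smp_sketch {
	long value;
};

struct var_sketch {
	const char *name;
	struct smp_sketch data;
};

/* Looks up <name> in <vars>. Fills <smp> from the variable when found,
 * otherwise from <def> when it is not NULL. Returns 1 if <smp> was
 * filled, 0 if the variable is unknown and no default was given.
 */
static int var_get_sketch(const struct var_sketch *vars, size_t nb,
                          const char *name, struct smp_sketch *smp,
                          const struct smp_sketch *def)
{
	size_t i;

	for (i = 0; i < nb; i++) {
		if (strcmp(vars[i].name, name) == 0) {
			*smp = vars[i].data;
			return 1;
		}
	}
	if (def) {
		*smp = *def;
		return 1;
	}
	return 0;
}
```

Passing NULL as the default keeps the previous behaviour, which matches the
fact that all existing callers pass NULL for now.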
The set-var() action is convenient because it preserves the input type
but it's a pain to deal with when trying to concatenate values. The
most recurring example is when it's needed to build a variable composed
of the source address and the source port. Usually it ends up like this:
tcp-request session set-var(sess.port) src_port
tcp-request session set-var(sess.addr) src,concat(":",sess.port)
This is even worse when trying to aggregate multiple fields from stick-table
data for example. Due to this a lot of users instead abuse headers from HTTP
rules:
http-request set-header(x-addr) %[src]:%[src_port]
But this requires some careful cleanups to make sure they won't leak, and
it's significantly more expensive to deal with. And generally speaking it's
not clean. Plus it must be performed for each and every request, which is
expensive for this common case of ip+port that doesn't change for the whole
session.
This patch addresses this limitation by implementing a new "set-var-fmt"
action which performs the same work as "set-var" but takes a format string
in argument instead of an expression. This way it becomes pretty simple to
just write:
tcp-request session set-var-fmt(sess.addr) %[src]:%[src_port]
It is usable in all rulesets that already support the "set-var" action.
It is not yet implemented for the global "set-var" directive (which already
takes a string) and the CLI's "set var" command, which would definitely
benefit from it but currently uses its own parser and engine, thus it
must be reworked.
The doc and regtests were updated.
For a long time we couldn't have arguments in expressions used in
tcp-request, tcp-response etc. rules. But now thanks to variables
it's possible, and their context in case of failure to resolve an
argument (e.g. backend name not found) is not properly reported
because there are no arg context values in ARGC_* to report them.
Let's add a number of missing ones for tcp-request {connection,
session,content}, tcp-response content, tcp-check, the config
parser (for "set-var" in the global section) and the CLI parser
(for "set-var" on the CLI).
Sometimes it is convenient to remap large sets of URIs to new ones (e.g.
after a site migration). This can be achieved using
"http-request redirect" combined with maps, but one difficulty there is
that non-matching entries will return an empty response. In order to
avoid this, duplicating the operation as an ACL condition ending in
"-m found" is possible but it becomes complex and error-prone while it's
known that an empty URL is not valid in a location header.
This patch addresses this by improving the redirect rules to be able to
simply ignore the rule and skip to the next one if the result of the
evaluation of the "location" expression is empty. However in order not
to break existing setups, it requires a new "ignore-empty" keyword.
There used to be an ACT_FLAG_FINAL on redirect rules that's used during
the parsing to emit a warning if followed by another rule, so here we
only set it if the option is not there. The http_apply_redirect_rule()
function now returns a 3rd value to mention that it did nothing and
that this was not an error, so that callers can just ignore the rule.
The regular "redirect" rules were not modified however since this does
not apply there.
The map_redirect VTC was completed with such a test and updated to 2.5
and an example was added into the documentation.
The locking in the dequeuing process was significantly improved by commit
49667c14b ("MEDIUM: queue: take the proxy lock only during the px queue
accesses") in that it tries hard to limit the time during which the
proxy's queue lock is held to the strict minimum. Unfortunately it's not
enough anymore, because we take up the task and manipulate a few pendconn
elements after releasing the proxy's lock (while we're under the server's
lock) but the task will not necessarily hold the server lock since it may
not have successfully found one (e.g. timeout in the backend queue). As
such, stream_free() calling pendconn_free() may release the pendconn
immediately after the proxy's lock is released while the other thread
currently proceeding with the dequeuing tries to wake up the owner's
task and dies in task_wakeup().
One solution consists in releasing the proxy's lock later. But tests have
shown that we'd have to sacrifice a significant share of the performance
gained with the patch above (roughly a 20% loss).
This patch takes another approach. It adds a "del_lock" to each pendconn
struct, which keeps it referenced while the proxy's lock is being
released. It's mostly a serialization lock like a refcount, just to keep
the pendconn alive until the task_wakeup() call is complete. This way we can
continue to release the proxy's lock early while keeping this one. It had
to be added to the few points where we're about to free a pendconn, namely
in pendconn_dequeue() and pendconn_unlink(). This way we continue to
release the proxy's lock very early and there is no performance degradation.
This lock may only be held under the queue's lock to prevent lock
inversion.
No backport is needed since the patch above was merged in 2.5-dev only.
This option can be used to define a specific log format that will be
used in case of error, timeout, connection failure on a frontend... It
will be used for any log line concerned by the log-separate-errors
option. It will also replace the format of specific error messages
described in section 8.2.6.
If no "error-log-format" is defined, the legacy error messages are still
emitted and the other error logs keep using the regular log-format.
Other build warnings were emitted on LIBRESSL_VERSION_NUMBER with -Wundef
under openssl < 1.1. Related to GH issue #1369. Seems like some of them
could be simplified a little bit.
Openssl-compat emits a warning for the test on LIBRESSL_VERSION that might
be undefined, if built with -Wundef. The fix is easy, let's do it. Related
to GH issue #1369.
As reported in GH issue #1369, there is a single case of #if with a
possibly undefined value in defaults.h which is on MAXHOSTNAMELEN. Let's
turn it to a #ifdef.
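As a reminder of why this matters, an undefined identifier silently
evaluates to 0 inside "#if", and -Wundef reports it. A minimal sketch of
the "#ifdef" form follows; the HOSTNAME_BUF_LEN_SKETCH name and the
fallback value of 64 are assumptions made for the illustration only:

```c
/* "#if MAXHOSTNAMELEN" would trigger -Wundef when the macro is missing,
 * whereas "#ifdef" only tests whether it exists.
 */
#ifdef MAXHOSTNAMELEN
#define HOSTNAME_BUF_LEN_SKETCH MAXHOSTNAMELEN
#else
#define HOSTNAME_BUF_LEN_SKETCH 64 /* assumed fallback for this sketch */
#endif

/* Hypothetical buffer sized from the macro above. */
static char hostname_buf_sketch[HOSTNAME_BUF_LEN_SKETCH];
```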
The code used to rely on BITS_PER_LONG to decide on the most efficient
way to perform a 64-bit shift, but this macro is not defined (at best
it's __BITS_PER_LONG) and it's likely that it's been like this since
the early implementation of ebtrees designed on i386. Let's remove the
test on this macro and rely on sizeof(long) instead, it also has the
benefit of letting the compiler validate the two branches.
This can be backported to all versions. Thanks to Ezequiel Garcia for
reporting this one in issue #1369.
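The idea can be sketched as below. This is not the ebtree code, only a
minimal illustration with hypothetical function names, assuming the goal
is to extract one bit of a 64-bit key. Since sizeof(long) is a compile-time
constant, the compiler still parses and type-checks both branches, then
drops the unused one:

```c
#include <stdint.h>

/* Native 64-bit shift. */
static unsigned int key_bit_wide(uint64_t key, unsigned int bit)
{
	return (unsigned int)((key >> bit) & 1);
}

/* Picks the relevant 32-bit half first, then shifts within it. */
static unsigned int key_bit_split(uint64_t key, unsigned int bit)
{
	uint32_t half = (bit >= 32) ? (uint32_t)(key >> 32) : (uint32_t)key;

	return (half >> (bit & 31)) & 1;
}

/* Returns bit <bit> (0..63) of <key> using the cheaper variant. */
static unsigned int key_bit(uint64_t key, unsigned int bit)
{
	if (sizeof(long) >= 8)
		return key_bit_wide(key, bit);
	else
		return key_bit_split(key, bit);
}
```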
Before threads were introduced in 1.8, idle_pct used to be a global
variable indicating the overall process idle time. Threads made it
thread-local, meaning that its reporting in the stats made little
sense, though this was not easy to spot. In 2.0, the idle_pct variable
moved to the struct thread_info via commit 81036f273 ("MINOR: time:
move the cpu, mono, and idle time to thread_info"). It made it more
obvious that the idle_pct was per thread, and also allowed to more
accurately measure it. But no more effort was made in that direction.
This patch introduces a new report_idle() function that accurately
averages the per-thread idle time over all running threads (i.e. it
should remain valid even if some threads are paused or stopped), and
makes use of it in the stats / "show info" reports.
Sending traffic over only two connections of an 8-thread process
would previously show this erratic CPU usage pattern:
$ while :; do socat /tmp/sock1 - <<< "show info"|grep ^Idle;sleep 0.1;done
Idle_pct: 30
Idle_pct: 35
Idle_pct: 100
Idle_pct: 100
Idle_pct: 100
Idle_pct: 100
Idle_pct: 100
Idle_pct: 100
Idle_pct: 35
Idle_pct: 33
Idle_pct: 100
Idle_pct: 100
Idle_pct: 100
Idle_pct: 100
Idle_pct: 100
Idle_pct: 100
Now it shows this more accurate measurement:
$ while :; do socat /tmp/sock1 - <<< "show info"|grep ^Idle;sleep 0.1;done
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
Idle_pct: 83
This is not technically a bug but this lack of precision definitely affects
some users who rely on the idle_pct measurement. This should at least be
backported to 2.4, and possibly to some older releases depending on user
demand.
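The averaging can be sketched as follows. The structure and function names
are hypothetical (the real values live in struct thread_info), and the
handling of "no running thread" is an arbitrary choice made for this
sketch:

```c
#include <stddef.h>

/* Hypothetical per-thread record. */
struct thr_idle_sketch {
	unsigned int idle_pct; /* 0..100 */
	int running;           /* non-zero if the thread must be counted */
};

/* Averages idle_pct over running threads, rounded to nearest.
 * Returns 100 when no thread is running (assumption of this sketch).
 */
static unsigned int report_idle_sketch(const struct thr_idle_sketch *thr,
                                       size_t nb)
{
	unsigned int total = 0, count = 0;
	size_t i;

	for (i = 0; i < nb; i++) {
		if (!thr[i].running)
			continue;
		total += thr[i].idle_pct;
		count++;
	}
	return count ? (total + count / 2) / count : 100;
}
```

Assuming two busy threads at about 33% idle and six fully idle ones, this
gives (2*33 + 6*100) / 8, i.e. roughly 83, which is consistent with the
stable value shown above.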
In 2.4 we extended the max poll time from 1s to 60s with commit
4f59d3861 ("MINOR: time: increase the minimum wakeup interval to 60s").
This had the consequence that the calculation of the idle time percentage
may overflow during the multiply by 100 if the thread had slept 43s or
more. Let's change this to a 64 bit computation. This will have no
performance impact since this is done at most twice per second.
This should fix github issue #1366.
This must be backported to 2.4.
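The overflow is easy to reproduce in isolation. The sketch below is not the
haproxy code; it assumes times expressed in microseconds held in 32-bit
integers and uses hypothetical function names. With 32 bits, idle_us * 100
wraps once idle_us exceeds about 42.9 million, i.e. about 42.9 seconds:

```c
#include <stdint.h>

/* 32-bit multiply: wraps for idle_us > UINT32_MAX / 100. */
static uint32_t idle_pct_32(uint32_t idle_us, uint32_t total_us)
{
	return idle_us * 100U / total_us;
}

/* Same ratio with the multiply widened to 64 bits. */
static uint32_t idle_pct_64(uint32_t idle_us, uint32_t total_us)
{
	return (uint32_t)((uint64_t)idle_us * 100U / total_us);
}
```

For a thread that slept 43 seconds out of 43, the first variant reports 0
while the second one reports 100.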
To be able to provide JA3 compatible TLS fingerprints we need to expose
all Client Hello captured data using fetchers. This patch provides new
fetchers and modifies existing ones to add the ability to filter out
GREASE values:
- ssl_fc_cipherlist_*
- ssl_fc_ecformats_bin
- ssl_fc_eclist_bin
- ssl_fc_extlist_bin
- ssl_fc_protocol_hello_id
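For reference, GREASE values as reserved by RFC 8701 are the 16-bit values
0x0A0A, 0x1A1A, ... 0xFAFA, i.e. both bytes are equal and their low nibble
is 0xA. A minimal filtering sketch follows; the function names are
hypothetical and this is not the code of the fetchers themselves:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns non-zero if <v> is one of the 16 GREASE values. */
static int is_grease_sketch(uint16_t v)
{
	return (v & 0x0f0f) == 0x0a0a && (v >> 8) == (v & 0xff);
}

/* Copies the non-GREASE entries of <in> to <out> (which must hold at
 * least <n> entries) and returns how many were kept.
 */
static size_t filter_grease_sketch(const uint16_t *in, size_t n,
                                   uint16_t *out)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		if (!is_grease_sketch(in[i]))
			out[kept++] = in[i];
	}
	return kept;
}
```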
When we set tune.ssl.capture-cipherlist-size to a non-zero value
we are able to capture the cipherlist supported by the client. To be able
to provide JA3 compatible TLS fingerprinting we need to capture more
information from the Client Hello message:
- SSL Version
- SSL Extensions
- Elliptic Curves
- Elliptic Curve Point Formats
This patch allows HAProxy to capture such information and store it for
later use.
There are regularly places, especially in config analysis, where we
need to report certain things (warnings or errors) only once, but
where implementing a counter is sufficiently deterrent so that it's
not done.
Let's add a simple ONLY_ONCE() macro that implements a static variable
(char) which is atomically turned on, and returns true if it's set for
the first time. This uses fairly compact code, a single byte of BSS
and is thread-safe. There are probably a number of places in the config
parser where this could be used. It may also be used to implement a
WARN_ON() similar to BUG_ON() but which would only warn once.
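A minimal sketch of such a macro is shown below. It relies on GCC/Clang
statement expressions and __atomic builtins, and the names are hypothetical;
the macro actually merged may be written differently:

```c
#include <stdio.h>

/* Evaluates to non-zero the first time this exact call site is reached,
 * and to zero afterwards. One static byte is used per call site.
 */
#define ONLY_ONCE_SKETCH() \
	({ \
		static char once_done_; \
		!__atomic_exchange_n(&once_done_, 1, __ATOMIC_RELAXED); \
	})

/* Hypothetical caller: warns once, returns 1 when the warning was emitted. */
static int warn_deprecated_sketch(const char *kw)
{
	if (ONLY_ONCE_SKETCH()) {
		fprintf(stderr, "warning: '%s' is deprecated\n", kw);
		return 1;
	}
	return 0;
}
```

Because the static variable belongs to the call site, every later call going
through the same line is silenced regardless of its arguments.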
Define a flag to mark a server as non purgeable. This flag will be used
by the "delete server" CLI handler. All servers without this flag will be
eligible for runtime deletion.
In a future patch, it will be possible to remove any server at runtime,
both static and dynamic. This requires extending the server
refcount to all instances.
First, refcount manipulation functions have been renamed to better
express the API usage.
* srv_refcount_use -> srv_take
The refcount is always initialized to 1 on server creation in
new_server. It's also incremented for each check/agent configured on a
server instance.
* free_server -> srv_drop
This decrements the refcount and if it reaches zero, the server is freed,
so code calling it must not use the server reference after it. As a bonus,
this function now returns the next server instance. This is useful when
iterating over the server list without having to save the next pointer
before each invocation.
In these functions, remove the checks that prevent refcount on
non-dynamic servers. Each reference to "dynamic" in variable/function
naming has been eliminated as well.
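The take/drop pattern described above can be sketched as follows, using a
heavily reduced and hypothetical server structure (the "_sketch" names are
not the real functions) and GCC/Clang __atomic builtins:

```c
#include <stdlib.h>

/* Hypothetical reduced server, not the real struct server. */
struct srv_sketch {
	unsigned int refcount;
	struct srv_sketch *next;
};

static struct srv_sketch *srv_new_sketch(struct srv_sketch *next)
{
	struct srv_sketch *srv = calloc(1, sizeof(*srv));

	if (srv) {
		srv->refcount = 1; /* the creator holds the first reference */
		srv->next = next;
	}
	return srv;
}

static void srv_take_sketch(struct srv_sketch *srv)
{
	__atomic_add_fetch(&srv->refcount, 1, __ATOMIC_SEQ_CST);
}

/* Drops one reference, frees the server when the count reaches zero,
 * and always returns the next server in the list.
 */
static struct srv_sketch *srv_drop_sketch(struct srv_sketch *srv)
{
	struct srv_sketch *next;

	if (!srv)
		return NULL;
	next = srv->next; /* must be read before a possible free() */
	if (__atomic_sub_fetch(&srv->refcount, 1, __ATOMIC_SEQ_CST) == 0)
		free(srv);
	return next;
}
```

Returning the next pointer lets a cleanup loop be written as
"while (srv) srv = srv_drop_sketch(srv);" without a temporary variable.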
A dynamic server may be deleted at runtime at the same moment when the
stats applet is pointing to it. Use the server refcount to prevent
deletion in this case.
This should be backported up to 2.4, with an observability period of 2
weeks. Note that it requires the dynamic server refcounting feature
which has been implemented on 2.5; the following commits are required :
- MINOR: server: implement a refcount for dynamic servers
- BUG/MINOR: server: do not use refcount in free_server in stopping mode
- MINOR: server: return the next srv instance on free_server
As a convenience, return the next server instance from the servers list on
free_server.
This is particularly useful when using this function on the servers
list without having to save the next pointer before calling it.
Implements a way of checking the running openssl version:
If OpenSSL support was not compiled into HAProxy, it will return an
error, so it's recommended to do an SSL feature check first:
$ ./haproxy -cc 'feature(OPENSSL) && openssl_version_atleast(0.9.8zh) && openssl_version_before(3.0.0)'
This will allow to select the SSL reg-tests more carefully.
Include the correct .h files in http_client.c and http_client.h.
The api.h is needed in http_client.c and http_client-t.h is now included
directly from http_client.h
The X509_STORE_CTX_get0_cert did not exist yet on OpenSSL 1.0.2 and
neither did X509_STORE_CTX_get0_chain, which was not actually needed
since its get1 equivalent already existed.
Most of the SSL sample fetches related to the client certificate were
based on the SSL_get_peer_certificate function which returns NULL when
the verification process failed. This made it impossible to use those
fetches in a log format since they would always be empty.
The patch adds a reference to the X509 object representing the client
certificate in the SSL structure and makes use of this reference in the
fetches.
The reference can only be obtained in ssl_sock_bind_verifycbk which
means that in case of an SSL error occurring before the verification
process ("no shared cipher" for instance, which happens while processing
the Client Hello), we won't ever start the verification process and it
will be impossible to get information about the client certificate.
This patch also allows most of the ssl_c_XXX fetches to return a usable
value in case of connection failure (because of a verification error for
instance) by making the "conn->flags & CO_FL_WAIT_XPRT" test (which
requires a connection to be established) less strict.
Thanks to this patch, a log-format such as the following should return
usable information in case of an error occurring during the verification
process :
log-format "DN=%{+Q}[ssl_c_s_dn] serial=%[ssl_c_serial,hex] \
hash=%[ssl_c_sha1,hex]"
It should answer to GitHub issue #693.
This commit implements a very simple HTTP Client API.
A client can be operated by several functions:
- httpclient_new(), httpclient_destroy(): create
and destroy the struct httpclient instance.
- httpclient_req_gen(): generate a complete HTX request using the
absolute URL, the method and a list of headers. This request
is complete and sets the HTX End of Message flag. This is limited
to small requests that don't need a body.
- httpclient_start(): fill a sockaddr storage with an IP extracted
from the URL (it cannot resolve an FQDN for now), then start the
applet. It also stores the ptr of the caller which could be an
appctx or something else.
- hc->ops contains a list of callbacks used by the
HTTPClient, they should be filled manually after an
httpclient_new():
* res_stline(): the client received a start line, its content
will be stored in hc->res.vsn, hc->res.status, hc->res.reason
* res_headers(): the client received headers, they are stored in
hc->res.hdrs.
* res_payload(): the client received some payload data, which is
stored in the hc->res.buf buffer and can be extracted with the
httpclient_res_xfer() function, which takes a destination buffer
as a parameter
* res_end(): this callback is called once we have finished receiving
the response.
While http_parse_scheme() extracts a scheme from a URI by extracting
exactly the valid characters and stopping on delimiters, this new
function performs the same on a fixed-size string.
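A length-delimited check of this kind could look like the sketch below. It
assumes the scheme grammar from RFC 3986, i.e. ALPHA followed by any number
of ALPHA / DIGIT / "+" / "-" / ".", and uses a hypothetical function name;
the function added by this commit may differ in name and return
conventions:

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the <len> bytes at <s> form a valid RFC 3986 scheme,
 * otherwise 0. <s> does not need to be NUL-terminated.
 */
static int scheme_is_valid_sketch(const char *s, size_t len)
{
	size_t i;

	if (!len || !isalpha((unsigned char)s[0]))
		return 0;

	for (i = 1; i < len; i++) {
		unsigned char c = (unsigned char)s[i];

		if (!isalnum(c) && c != '+' && c != '-' && c != '.')
			return 0;
	}
	return 1;
}
```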
This new class exposes methods to manipulate HTTP messages from a filter
written in lua. Like for the HTTP class, there is a bunch of methods to
manipulate the message headers. But there are also methods to manipulate the
message payload. This part is similar to what is available in the Channel
class. Thus the payload can be duplicated, erased, modified or
forwarded. For now, only DATA blocks can be retrieved and modified because
the current API is limited. No HTTPMessage method is able to yield. Those
manipulating the headers are always called on messages containing all the
headers, so there is no reason to yield. Those manipulating the payload are
called from the http_payload filters callback function where yielding is
forbidden.
When an HTTPMessage object is instantiated, the underlying Channel object
can be retrieved via the ".channel" field.
For now this class is not used because the HTTP filtering is not supported
yet. It will be the purpose of another commit.
There is no documentation for now.
A Lua TXN can be created when a sample fetch, an action or a filter callback
function is executed. A flag is now used to track the execution context:
HLUA_TXN_SMP_CTX, HLUA_TXN_ACT_CTX and HLUA_TXN_FLT_CTX respectively. The
filter flag is not used for now.