When traces are disabled, we used to make TRACE() and other macros just
emit a "do { } while (0)" statement, which has the unfortunate side effect
of leaving their arguments unused. As such, all variables that are
initialized in functions for the sole purpose of being passed to the
trace calls end up emitting "foo defined but not used" warnings. It is
difficult to keep these in a clean state all the time and to always
think about adding __maybe_unused after each declaration, and the traces
try hard to be developer-friendly in order to gain adoption.
Let's just remap all macros to __eat_all_args(), which will mark all
arguments as used. No code is emitted, the output binary is the same
as with the while(0) stuff, but syntactically speaking the argument is
used and the compiler is happy.
It may be useful to backport this to 3.4, as some future fixes are
already expected to trigger build warnings there otherwise. This
commit requires these two commits:
CLEANUP: traces: get rid of a few rare empty args in TRACE calls
MINOR: compiler: add a macro to ignore all arguments
When disabling features (e.g. traces), some macros that would normally
use some of their arguments regularly end up not consuming them at all,
making the compiler complain that "variable foo defined but not used".
An elegant way to generically mark arguments as used is to pass them to
a variadic function. However, a variadic function needs at least one
named argument. So we create a macro that passes (0, __VA_ARGS__) to an
inline function that does nothing with its arguments, and that's done.
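The idea can be sketched in a few lines of C. Only the __eat_all_args name comes from the commit message; the helper eat_all_args_fn(), the disabled TRACE() definition and count_items() are hypothetical illustrations, and HAProxy's actual definitions may differ. As written, the macro needs at least one argument, since an empty __VA_ARGS__ would leave a trailing comma. This is consistent with the cleanup of empty TRACE() arguments listed above.

```c
#include <assert.h>

/* Hypothetical helper: a variadic function needs one named parameter,
 * which is why the macro below always passes a leading 0.
 */
static inline void eat_all_args_fn(int dummy, ...)
{
	(void)dummy; /* nothing is done with any of the arguments */
}

/* Marks all arguments as "used" without doing anything with them. */
#define __eat_all_args(...) eat_all_args_fn(0, __VA_ARGS__)

/* Hypothetical disabled-trace definition: arguments still count as used. */
#define TRACE(...) __eat_all_args(__VA_ARGS__)

static int count_items(const int *items, int nb)
{
	int total = 0;
	const char *where = "count_items"; /* only ever passed to TRACE() */

	for (int i = 0; i < nb; i++)
		total += items[i];
	TRACE(where, total); /* no "defined but not used" warning for <where> */
	return total;
}
```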
A recent fix to some HTX muxes, to drain remaining data when the stream
is in closed state, revealed a bug, mainly due to a corner case of the
HTX API. It is possible to have an empty HTX message with a
parsing/internal error. In that case, the underlying buffer remains
full. It is mandatory to prevent any buffer release and to make sure the
error will be handled.
On the other hand, at several places, when data must be transferred
from one HTX message to another, we try to swap the underlying buffers
instead of performing a block-by-block copy. To do so, we rely on the
b_xfer() function. One condition is that the destination message must be
empty. And here is the issue: the HTX message can be empty while the
buffer is full, because an HTX error was triggered earlier and not
handled yet. In that case, calling b_xfer() leads to a crash because the
destination buffer is full. b_xfer() is not expected to be called when
there is not enough space in the destination buffer.
So it appears the HTX API should be improved/fixed, but first of all,
the bug must be fixed, especially because stable versions are also
affected. The htx_is_empty_noerr() function was added to know if an HTX
message is empty and no error was reported on it. This function is now
used, instead of htx_is_empty(), to know whether or not we can safely
swap the underlying buffers. The FCGI, H2 and QUIC multiplexers are
concerned. The HTTP client and the applet API were also fixed, although
the bug seems harder to trigger at these places.
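The distinction between htx_is_empty() and htx_is_empty_noerr() can be modeled roughly as follows. struct msg_model and its fields are a simplified stand-in invented for this sketch, not HAProxy's struct htx. The point is only that "no blocks" is not enough to conclude that the buffer can be swapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an HTX message. */
struct msg_model {
	size_t nb_blocks;   /* number of blocks in the message */
	bool   parse_error; /* a parsing/internal error is still pending */
	size_t buf_used;    /* bytes occupied in the underlying buffer */
	size_t buf_size;    /* capacity of the underlying buffer */
};

/* Like htx_is_empty(): only looks at the blocks. */
static bool msg_is_empty(const struct msg_model *m)
{
	return m->nb_blocks == 0;
}

/* Like htx_is_empty_noerr(): empty AND no pending error. */
static bool msg_is_empty_noerr(const struct msg_model *m)
{
	return msg_is_empty(m) && !m->parse_error;
}

/* A b_xfer()-style buffer swap is only safe when the destination really
 * holds nothing: an errored "empty" message may still sit on a full
 * buffer that must be neither released nor overwritten.
 */
static bool can_swap_buffers(const struct msg_model *dst)
{
	return msg_is_empty_noerr(dst);
}
```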
The fix must be backported to all supported versions.
app_log() and send_log() build the message with vsnprintf(), which stops
at the first NUL byte and therefore cannot emit an arbitrary binary
payload.
Add two variants that pass a pre-built <msg> of <len> bytes straight to
__send_log() without formatting it, so embedded NUL bytes are preserved:
* app_log_raw() : takes an explicit list of loggers and a tag
* send_log_raw() : derives both from a proxy
The send path still strips trailing LF / NUL bytes (kept for the legacy
text logs), so the message must be self-terminating by its own encoding
and must not rely on a meaningful trailing '\n' or NUL.
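The two constraints above can be illustrated with a small model. strip_trailing_lf_nul() is a hypothetical re-statement of the described trailing-byte stripping, not the actual __send_log() code, and formatted_len() shows what a "%s"-based formatting path would keep from the same bytes. Embedded NULs survive the raw path, but a payload whose final byte legitimately is '\n' or NUL would lose it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the send path: drop trailing '\n' and '\0' bytes
 * from a length-delimited message, leaving embedded NULs untouched.
 * Returns the resulting length.
 */
static size_t strip_trailing_lf_nul(const char *msg, size_t len)
{
	while (len > 0 && (msg[len - 1] == '\n' || msg[len - 1] == '\0'))
		len--;
	return len;
}

/* What a vsnprintf()-style "%s" formatting would keep: up to the first NUL. */
static int formatted_len(const char *msg)
{
	char out[64];

	return snprintf(out, sizeof(out), "%s", msg);
}
```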
This patch imports the implementation of haload, a lightweight,
multi-threaded traffic generator designed to benchmark HTTP
infrastructures under heavy loads. Built on HAProxy's highly scalable
architecture, it natively supports HTTP/1, HTTP/2, and HTTP/3 (QUIC).
It uses the previously exposed initialization functions, the
no-listener mode, the lightweight hbuf API, and the specialized
hldstream object types to dynamically derive and generate its
configuration in memory from basic command-line inputs. By leveraging
HAProxy's internal HTX (Internal HTTP Native Representation) format,
haload manipulates HTTP elements independently of the wire protocol.
This abstraction allows it to generate unified requests and process
responses across HTTP/1.1, HTTP/2, or HTTP/3 without duplicating the
payload handling logic for each version.
- Makefile:
  Introduce the 'haload' compilation target and define HALOAD_OBJS.
- src/haload.c, include/haproxy/haload.h:
  Add user and stream task scheduling handlers, HTX-driven traffic
  orchestration mechanisms, and rendering of the benchmark's statistical
  summary in the terminal.
- src/haload_init.c:
  Implement program argument parsing, generation of the in-memory
  (fileless) HAProxy configuration, and allocation of target URLs.
- src/stconn.c:
  Wire up sc_attach_mux() to properly allocate the specific tasklet
  context when dealing with a haload stream.
- doc/haload.txt:
  Add detailed documentation covering compilation, flags, and usage
  examples.
Export _srv_parse_kw() and srv_postinit() so they can be called from
haload (to come), which needs to configure servers using HAProxy's configuration
parser keywords.
This patch exports sc_new() by removing its static storage class and
adding its prototype to include/haproxy/stconn.h.
This is required to allow external modules, such as the upcoming haload
benchmarking tool, to allocate and initialize new stream connectors
from a stream endpoint descriptor (sedesc).
This patch introduces the sc_hastream() and __sc_hastream() inline
helpers to retrieve a haload stream context (struct hastream) from
a stream connector.
These functions allow the stconn layer to safely access haload-specific
stream data when the application type is OBJ_TYPE_HXLOAD.
This patch introduces the OBJ_TYPE_HXLOAD object type to distinguish
the haload stream objects (struct hastream).
It also adds the associated inline helper functions objt_hastream()
and __objt_hastream() to allow safe casting and retrieval of
hastream contexts from a generic object pointer, following the
standard container_of pattern.
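The container_of-based casting pattern can be sketched as follows. The names OBJ_TYPE_HXLOAD, objt_hastream() and __objt_hastream() come from the commit message, but the enum values, the struct layout and the req_cnt field are hypothetical simplifications. The obj_type member is deliberately not placed first, so that the offset computation actually matters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of object types. */
enum obj_type {
	OBJ_TYPE_NONE = 0,
	OBJ_TYPE_STREAM,
	OBJ_TYPE_HXLOAD,
};

/* Hypothetical, reduced haload stream. */
struct hastream {
	unsigned int req_cnt;   /* illustrative payload field */
	enum obj_type obj_type; /* OBJ_TYPE_HXLOAD for a haload stream */
};

#define container_of(ptr, type, name) \
	((type *)((char *)(ptr) - offsetof(type, name)))

/* Unchecked cast: the caller guarantees that *t == OBJ_TYPE_HXLOAD. */
static inline struct hastream *__objt_hastream(enum obj_type *t)
{
	return container_of(t, struct hastream, obj_type);
}

/* Checked cast: returns NULL unless the object really is a hastream. */
static inline struct hastream *objt_hastream(enum obj_type *t)
{
	if (!t || *t != OBJ_TYPE_HXLOAD)
		return NULL;
	return __objt_hastream(t);
}
```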
haload is a client-side HTTP benchmarking tool designed to manage
concurrent HTTP streams.
This patch defines the hldstream C structure, which serves as the
core object to represent a haload HTTP stream for all HTTP protocol
versions.
It will be used by the upcoming haload module to handle specialized
stream contexts.
haload is the successor to the h1load HTTP benchmarking tool.
This patch adds haload stream definitions as arguments for the TRACE API.
These will be used by the upcoming haload module, which will handle
hldstream struct objects instead of regular stream structs.
Introduce the new <no_listener_mode> global variable to define a new
operating mode for haproxy. This variable can be set to 1 to allow
haproxy to start without any listener. Without this setting, haproxy
refuses to start without a listener.
During the initialization cycle, setting this variable to 1 ensures that the
lack of configured listeners is no longer treated as a fatal error. This allows
programs based on haproxy source code to initialize the stack and use its
features even without a frontend. This will be the case for haload.
Add a new lightweight hbuf API to buffer formatted strings, similar to the
existing buffer API (struct buffer), extracting the code which already does this
in haterm_init.c. This is used by haterm to build its configuration in memory
(fileless mode). And this will be used by haload to do the same thing.
Update haterm to use this new API.
Note: hstream_str_buf_append() has been renamed to hbuf_str_append().
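The kind of "formatted string buffer" described above, used to build a configuration in memory, can be sketched as below. struct strbuf and strbuf_appendf() are invented for this illustration and are not the real hbuf API, whose structure and signatures may differ. The sketch keeps room for a trailing NUL and leaves the buffer unchanged when an append does not fit.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer with caller-provided storage. Invariant: data < size. */
struct strbuf {
	char   *area; /* storage, always NUL-terminated at area[data] */
	size_t  size; /* capacity, including room for the trailing NUL */
	size_t  data; /* bytes used, excluding the trailing NUL */
};

/* Appends a formatted string. Returns 0 on success, or -1 if the result
 * does not fit, in which case the previous contents are preserved.
 */
static int strbuf_appendf(struct strbuf *b, const char *fmt, ...)
{
	size_t room = b->size - b->data;
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(b->area + b->data, room, fmt, ap);
	va_end(ap);

	if (ret < 0 || (size_t)ret >= room) {
		b->area[b->data] = '\0'; /* roll back any truncated output */
		return -1;
	}
	b->data += (size_t)ret;
	return 0;
}
```

Usage would look like appending "frontend %s\n" and then "  bind :%d\n" to obtain a small configuration text entirely in memory.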
port_range was never freed. That used not to be a problem, but now that
we can dynamically add and remove servers, it becomes one, as that leads
to a memory leak each time a server with a "source" directive is destroyed.
However, just adding a free() is not enough. We have to add a refcount,
because the server is not the only one with a reference to it. We may
also have one in fdinfo, so that we know which port to release when we
finally close the fd.
So add a refcount, and make sure to call port_range_release() when a
server is destroyed.
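The refcounting scheme can be modeled as below. struct port_range_model and the prm_* functions are hypothetical stand-ins for the real struct port_range and port_range_release(). The server holds one reference, and each fd that took a port from the range holds another, so the range is only freed once both the server and the last such fd have let go of it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified refcounted port range. */
struct port_range_model {
	atomic_uint refcount;
	int first, last;
};

static struct port_range_model *prm_alloc(int first, int last)
{
	struct port_range_model *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	atomic_init(&r->refcount, 1); /* reference owned by the server */
	r->first = first;
	r->last = last;
	return r;
}

/* Takes an extra reference, e.g. when an fd gets a port from the range. */
static void prm_take(struct port_range_model *r)
{
	atomic_fetch_add(&r->refcount, 1);
}

/* Drops one reference and frees on the last one. Returns 1 if freed. */
static int prm_release(struct port_range_model *r)
{
	if (!r)
		return 0;
	if (atomic_fetch_sub(&r->refcount, 1) == 1) {
		free(r);
		return 1;
	}
	return 0;
}
```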
This should be backported up to 3.0.
Add the ability to rename a HAProxy server at runtime via the CLI:
set server <backend>/<server> name <newname>
This is useful in slot-based dynamic scaling setups where servers are
pre-allocated with generic names (e.g. srv001, srv002) but the operator
wants the names to reflect the current workload (e.g. pod name or
IP:port) for observability and server-state-file consistency.
The implementation:
- validates the new name: non-empty, passes the invalid_char() check
  (allows [A-Za-z0-9_:.-]), and fits in the event data name field
- requires the server to be administratively in maintenance mode
  (same precondition as 'del server')
- rejects the rename if the server has SRV_F_NAME_REFD set (use-server
  target, track target, sample-fetch ARGT_SRV referent), which keeps
  the running state consistent with the configuration text
- re-indexes the server in the name tree under thread_isolate(),
  mirroring the locking pattern used by 'add server' / 'del server'
- publishes a new EVENT_HDL_SUB_SERVER_NAME event with the old and
  new names so downstream consumers (logs, observability backends)
  can track the rename
- frees the old name immediately under thread isolation: srv_name
  sample consumers (ACLs, log formats, ...) act on the fetched pointer
  within the current task and do not retain it across wake-ups, so
  no extra deferred-free machinery is needed
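The name validation rule from the first item can be sketched as follows. NAME_FIELD_SIZE is a hypothetical placeholder for the size of the event data name field, and new_name_is_valid() only re-states the allowed character set quoted above. It is not HAProxy's invalid_char() and it does not cover the maintenance-mode or SRV_F_NAME_REFD checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for "fits in the event data name field". */
#define NAME_FIELD_SIZE 64

/* Characters allowed in a server name: [A-Za-z0-9_:.-]. */
static bool name_char_ok(char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') || c == '_' || c == ':' ||
	       c == '.' || c == '-';
}

/* Returns true if <name> would pass the validation described above. */
static bool new_name_is_valid(const char *name)
{
	size_t len = strlen(name);

	if (len == 0 || len >= NAME_FIELD_SIZE) /* keep room for the NUL */
		return false;
	for (size_t i = 0; i < len; i++)
		if (!name_char_ok(name[i]))
			return false;
	return true;
}
```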
There is no opt-in directive: like 'add server' and 'del server', the
operation is gated by the server's properties rather than by a
per-backend toggle. This avoids the runtime-surprise failure mode
where an operator discovers at the CLI that renaming is forbidden by
a missing 'option server-rename' rather than by an actual structural
reference.
This feature was discussed in:
https://github.com/haproxy/haproxy/issues/952
Until now, every form of "this server is referenced by something in the
running config" was collapsed onto a single flag, SRV_F_NON_PURGEABLE,
which prevents the server from being removed via 'del server'. This
catches everything but conflates two distinct properties:
- the server object itself is pinned by another runtime structure
(e.g. DNS resolution attached to it), versus
- the server's *name* is referenced statically (use-server rules,
track chains, sample-fetch arguments of type ARGT_SRV)
These differ for any operation that touches the name but not the
object identity, e.g. the runtime rename feature added next. Removing
a name-referenced server is still forbidden (the rule text would
dangle), but renaming such a server should also be forbidden for the
same reason - while renaming a resolver-pinned server is fine, since
the resolver holds the object pointer and doesn't care about the name.
Introduce SRV_F_NAME_REFD for the name-reference case and move the
three name-based setters (sample.c ARGT_SRV resolution, proxy.c
use-server resolution, server.c track chain setup) from
SRV_F_NON_PURGEABLE to SRV_F_NAME_REFD. The resolvers.c call site
keeps SRV_F_NON_PURGEABLE since it is the object-pinned case.
Adjust 'del server' to check both flags so the set of servers it
refuses to remove is unchanged: same observable behavior, just a
richer internal taxonomy.
A subsequent patch introducing 'set server name' will gate on
SRV_F_NAME_REFD only.
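The resulting gating logic reduces to two bit tests. The flag names come from the commit message, but the bit values below are placeholders for this sketch, and the real 'set server name' path also requires the server to be in maintenance mode, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, not HAProxy's. */
#define SRV_F_NON_PURGEABLE 0x0001 /* object pinned, e.g. by a resolver */
#define SRV_F_NAME_REFD     0x0002 /* name referenced by the config text */

/* 'del server': refuse if either kind of reference exists, so the set of
 * refused servers is the same as before the split.
 */
static bool can_delete(unsigned int flags)
{
	return !(flags & (SRV_F_NON_PURGEABLE | SRV_F_NAME_REFD));
}

/* 'set server name': only a reference to the name blocks a rename. */
static bool can_rename(unsigned int flags)
{
	return !(flags & SRV_F_NAME_REFD);
}
```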
OpenTracing support has long been best-effort and was deprecated in 3.3
with removal planned in 3.5. Let's clean it up now.
This commit removes addons/ot, the build script, ARGC_OT, USE_OT and
OT_* variables in the Makefile, and replaces the config section with a
mention of the OpenTelemetry filter instead.
For more info, see GH issues #1640 and #2782, as well as the wiki's
"breaking changes" page.
Since a connection's target may no longer be a proxy and is necessarily
a server, let's simplify such checks. This is essentially in mux install
code and in the debugging code.
These ones were deprecated in 3.3-dev2 with commits 5c15ba5eff ("MEDIUM:
proxy: mark the "dispatch" directive as deprecated") and e93f3ea3f8
("MEDIUM: proxy: deprecate the "transparent" and "option transparent"
directives"), and were planned for removal in 3.5. See also:
https://github.com/orgs/haproxy/discussions/2921
as well as the wiki page about breaking changes.
They've lived their lives and always cause internal limitations
(special cases between connecting to a server or to a proxy), and are
even confusing to some extent (especially "transparent", which users
often get wrong).
This commit removes the ability to configure them, tests based on them
and all the doc related to them. The keywords remain detected by the
parser and indicate how to proceed instead.
It's likely that other deeper parts will be changed as well (e.g.
conn->target will no longer be of OBJ_TYPE_PROXY). This will be done
over the long term.
This adds a new class, TL_RT, which is processed before other queues for
one (and only one) tasklet featuring the TASK_RT flag. This is meant to
process real time wakeups under load with even less latency. We only
process one entry to make sure it will not be abused for unimportant
stuff, and if tune.sched.low-latency is set, we also avoid picking more
tasks from the current run queues and looping after the first call to
run_tasks_from_list().
Measurements under an injected load of 10k concurrent connections at
10 Gbps (~58k 20kB objects/s) on 4 threads with task profiling enabled
show that the average wakeup latency for wakeups every 10ms dropped from
220 microseconds to 1.8 microseconds, and even to ~550 nanoseconds when
tune.sched.low-latency is set, i.e. 400 times less.
The doc was updated, including the schematics.
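The "one real-time entry first" ordering can be illustrated with a toy model. Only the TL_RT name comes from the commit message. The TOY_* classes, the counters standing in for real tasklet lists, and run_one_pass() are hypothetical, and the tune.sched.low-latency behavior is not modeled at all.

```c
#include <assert.h>

/* Hypothetical, reduced set of classes: TL_RT first, then regular ones. */
enum { TL_RT = 0, TOY_URGENT, TOY_NORMAL, TOY_CLASSES };

struct toy_queues {
	int pending[TOY_CLASSES]; /* number of queued entries per class */
};

/* Processes one pass of at most <budget> entries, recording the class of
 * each processed entry into <order>. Returns the number processed.
 */
static int run_one_pass(struct toy_queues *q, int budget, int *order)
{
	int done = 0;

	/* at most ONE real-time entry, always before any other class */
	if (q->pending[TL_RT] > 0 && done < budget) {
		q->pending[TL_RT]--;
		order[done++] = TL_RT;
	}

	/* then the regular classes, in priority order */
	for (int c = TOY_URGENT; c < TOY_CLASSES; c++) {
		while (q->pending[c] > 0 && done < budget) {
			q->pending[c]--;
			order[done++] = c;
		}
	}
	return done;
}
```

Limiting TL_RT to a single entry per pass is what keeps the class from being abused: even if many tasklets carry TASK_RT, the other classes still make progress on every pass.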
For some very rare tasks that need to be woken up at an exact date (right
now the only known use case is haload's periodic stats collection), it's
currently difficult to guarantee the wake-up date on a heavily loaded
run queue.
This patch introduces TASK_RT for real-time tasks. Right now, all it does
is modify __task_wakeup() to immediately switch to __tasklet_wakeup_*()
and effectively bypass the priority-based run queue. Doing it here has
the benefit of making sure that it automatically applies to tasks found
in the wait queue, and that it will also work for _task_drop_running().
For now nothing uses it. The doc was updated.
The ambiguity in usage for __tasklet_wakeup_on() is now gone. All known
callers that used to be able to pass a negative value now call
__tasklet_wakeup_here(), and remaining ones always pass an explicit
thread number. This means that we can remove the "if (thr<0)" branch,
but still leave a BUG_ON_HOT() to catch any possibly missed case. The
comment around tasklet_wakeup_on() not supporting remotely waking a
tasklet whose tid<0 was also removed since it was addressed long ago.
This patch moves the tid check higher up in the call chain, into task_instant_wakeup()
so as to branch to _tasklet_wakeup_here() for run-anywhere tasks, or
_tasklet_wakeup_on() for designated threads.
At this point there is no longer any direct caller of __tasklet_wakeup_on()
passing a negative thread value.
This patch moves the tid check higher up in the call chain, into tasklet_wakeup()
so as to branch to _tasklet_wakeup_here() for run-anywhere tasklets, or
_tasklet_wakeup_on() for designated threads. The tid is retrieved via
__task_get_current_owner() so that the call remains compatible with
tasklets that would have a super-negative tid due to being tasks used
as tasklets.
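The dispatch described above can be modeled as follows. The two wakeup stubs only record which path was taken and are hypothetical stand-ins for _tasklet_wakeup_here() and _tasklet_wakeup_on(). current_owner() re-states the documented owner decoding on a bare tid value rather than on a task.

```c
#include <assert.h>

enum wake_path { WAKE_NONE, WAKE_HERE, WAKE_ON };

static enum wake_path last_path = WAKE_NONE;
static int last_thr = -1;

/* Hypothetical stubs recording which wakeup path would be used. */
static void wakeup_here(void)  { last_path = WAKE_HERE; last_thr = -1; }
static void wakeup_on(int thr) { last_path = WAKE_ON;   last_thr = thr; }

/* -1: no owner; >= 0: designated thread; < -1: owned by thread (-2 - tid). */
static int current_owner(int tid)
{
	if (tid >= -1)
		return tid;
	return -2 - tid;
}

static void toy_tasklet_wakeup(int tid)
{
	int thr = current_owner(tid);

	if (thr < 0)
		wakeup_here();  /* run-anywhere: stay on the calling thread */
	else
		wakeup_on(thr); /* designated (or currently owning) thread */
}
```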
The current tasklet_wakeup() call relies on tasklet_wakeup_on(tl->tid),
which was already quite ambiguous till now since it relied solely on the
tid being negative or not to decide whether to run locally, but it no
longer works correctly when used to wake tasks up, due to the new set of
possible negative values for ->tid (particularly if some code calls
__tasklet_wakeup_on() on a task, as is done in task_instant_wakeup()).
The problem is that it is not possible in the current API to explicitly
say that we want a task/tasklet to run locally or remotely without having
to play games with a thread number. The chosen approach to address this
is to change tasklet_wakeup_on() to always be remote and have
tasklet_wakeup_here() which will always be local, with tasklet_wakeup()
choosing one or the other depending on the tid, for backwards compat
only.
This patch implements tasklet_wakeup_here() on top of
__tasklet_wakeup_here(), which reimplements the part of
__tasklet_wakeup_on() that used to deal with the local thread only
(negative tid). No other change was made.
For now it remains unused.
The doc was updated.
The checks on TH_FL_TASK_PROFILING that are used to decide whether or not
to set t->wake_date from now_mono_time() used to be made in callers of
__tasklet_wakeup_on() and __tasklet_wakeup_after(). Not only does this
needlessly inflate the code by placing the check in every caller (~4kB),
it also renders the design fragile, since each caller needs to blindly
copy-paste that statement.
Let's move the operation into the callees instead. As a bonus, this
allows checking the flag on the target thread rather than on the calling
thread (the latter was arguably a bug, though without a noticeable effect
since for now profiling is enabled for all threads or none).
Refactor the Lua HTTP client to defer initialization. core.httpclient()
no longer initializes the internal HTTP client immediately. Instead,
initialization now occurs within hlua_httpclient_send() when a request
method (e.g., get, put, head) is invoked.
The HTTPClient class now serves as a factory for accessing methods, while
a new class, HTTPClientRequest, has been introduced to represent individual
requests and manage the HTTP client lifecycle.
This change allows multiple requests to be executed using a single
HTTP client instance:
local hc = core.httpclient()
local res1 = hc:get({url = "...", headers = ...})
local res2 = hc:post({url = "...", headers = ...})
local res3 = hc:put({url = "...", headers = ...})
This refactor maintains backward compatibility, as existing scripts that
instantiate a new core.httpclient() for every request will continue to
work as expected.
Move the lua httpclient code from hlua.c to http_client.c.
The code is almost the same, except for the registration of the class,
which is now done in hlua_http_client_init_state(), registered with
REGISTER_HLUA_STATE_INIT().
check_args() calls have been replaced by hlua_check_args().
hlua_httpclient_destroy_all() is exported so it can be called in hlua.c.
hlua_httpclient_table_to_hdrs() is made static.
hlua_pusherror() and check_args() are being exported.
check_args() is now a macro to hlua_check_args() so it's not confusing
when called outside hlua.c.
Now that there is no longer a shared wake queue, chances are that a
scheduled shared task will always end up on the same thread. In
wake_expired_tasks(), when a task has to be woken up, look at three
other randomly chosen threads, and if the run queue of the current
thread is at least twice as large as the run queue of one of the other
threads, give the task to that thread, so that our load gets reduced.
If we're giving the task to another thread, then we have to set the
TASK_RUNNING flag until we have woken it up. Otherwise the other thread
could just run it, if it gets woken up from another path, and free it
while we're still not done with it.
The factor of 2 has been chosen somewhat arbitrarily, and may be tweaked
at a later date if deemed not optimal.
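The hand-off heuristic can be sketched as below. Run-queue lengths are passed in as an array, and the "random" candidates are supplied by the caller so that the result is deterministic. pick_target_thread() is a hypothetical name, and the TASK_RUNNING protection is not modeled. The sketch also adds its own guard so that an idle thread never hands a task off, since 0 >= 2 * 0 would otherwise succeed.

```c
#include <assert.h>

#define NB_CANDIDATES 3 /* "three other threads" */
#define HANDOFF_RATIO 2 /* "at least two times bigger" */

/* Returns the thread that should receive the task: <cur> itself, or the
 * first candidate whose run queue is at most 1/HANDOFF_RATIO of ours.
 */
static int pick_target_thread(const unsigned int *rq_len, int cur,
                              const int cand[NB_CANDIDATES])
{
	for (int i = 0; i < NB_CANDIDATES; i++) {
		int t = cand[i];

		if (t == cur)
			continue;
		/* sketch-specific guard: never hand off from an empty queue */
		if (rq_len[cur] > 0 && rq_len[cur] >= HANDOFF_RATIO * rq_len[t])
			return t;
	}
	return cur;
}
```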
Modify task_instant_wakeup() to use __task_set_state_and_tid().
It uses the new ownership behavior, but that's okay because
task_instant_wakeup() was not used anywhere.
Totally remove the per-thread group wait queue. This was potentially a
source of contention, because there was only a single global lock for
all those wait queues.
Instead, for shared tasks, there is now the concept of ownership for the
task. When a task is in the wait queue or run queue of a particular
thread, or is running on it, the task's tid is set to -2 - thread_tid,
and only that thread will be responsible for it until it is no longer
running and is in none of its queues.
When a shared task is scheduled to be run at a later time, if its
current tid is -1, then the current thread will take ownership and put
it in its own wait queue. If it is already owned, then TASK_WOKEN_WQ is
added to the task's state, and a task_wakeup() is done, so that the
owner thread will add it to its wait queue.
If there is an owner, then a task_wakeup() will just add the task to
the owner's run queue, otherwise the current thread will become the
owner.
Introduce a new function, __task_get_current_owner, that returns the
owner of a task based on its current tid.
-1 means there is no current owner, otherwise either the tid is >= 0, in
which case it will just return it, or it's < -1, in which case it will
return -2 - tid, the tid of the thread with the current ownership.
Introduce __task_get_new_tid_field(), that provides the tid to be used
for a task.
For shared tasks, to mark temporary ownership of a task, instead of -1,
the tid will be set to -2 - tid, tid being the tid of the current thread.
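The tid encoding used for ownership can be checked with two small helpers. The function names are hypothetical and operate on a bare tid value rather than on a task. The mapping t -> -2 - t is its own inverse, which is why the same expression serves both to encode and to decode the owner.

```c
#include <assert.h>

/* Encoding described above:
 *   tid >= 0  : task bound to that thread
 *   tid == -1 : shared task, currently unowned
 *   tid <= -2 : shared task temporarily owned by thread (-2 - tid)
 */
static int encode_owned_tid(int thr)
{
	assert(thr >= 0);
	return -2 - thr;
}

static int get_current_owner(int tid)
{
	if (tid >= -1)
		return tid;  /* bound thread, or -1 for "no owner" */
	return -2 - tid;     /* temporary owner of a shared task */
}
```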
Introduce a new function, __task_set_state_and_tid, that can atomically
set a task's state and its tid. This will be used later, as the tid will
be used to indicate task ownership even for shared tasks.
Add EVENT_HDL_SUB_ACME_DEPLOY to the ACME family. It is published in
the dns-01 challenge path after the TXT record information has been
prepared, carrying the certificate store name, domain, account
thumbprint, dns_record value, and optionally the provider and vars
strings.
Lua subscribers using core.event_sub() receive the event data as an
AcmeEvent object, which is the same class used for ACME_NEWCERT and
carries the fields relevant to the event type.
Add a new EVENT_HDL_SUB_ACME_NEWCERT event type in the ACME family.
It is published after a new certificate has been successfully fetched
and installed. The event carries the certificate store name, allowing
subscribers to act on newly available certificates.
Lua subscribers using core.event_sub() receive the event data as an
AcmeEvent object with a crtname field containing the certificate store
name.
Right now the only way to report info that is only displayed in diag
mode with -dD is to use ha_diag_warning(). The problem is that this is
then counted as a warning and may result in errors when combined with
-dW, as happens for the CPU topology info:
$ printf "global\nstats socket /tmp/sock1\n" | ./haproxy -dD -dW -c -f /dev/stdin; echo $?
[NOTICE] (10406) : haproxy version is 3.5-dev0-5091ac-35
[NOTICE] (10406) : path to executable is ./haproxy
[DIAG] (10406) : Created 20 threads split into 2 groups
[ALERT] (10406) : Some warnings were found and 'zero-warning' is set. Aborting.
1
We need another level. This commit introduces ha_diag_notice() which only
emits a notification that doesn't count as a warning. Note that we could
even introduce an info level and revisit various messages so that notice
only reports certain events while info is for anything (like versions
above). That could be a future improvement.
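The difference between the two diag levels can be modeled as below. All names, the globals and the "[NOTICE]" prefix are hypothetical stand-ins for ha_diag_warning(), ha_diag_notice() and the zero-warning check. The only property illustrated is that notices are printed in diag mode without being counted as warnings.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int diag_mode = 1;  /* as if -dD was passed */
static int warn_count = 0; /* what a zero-warning (-dW) check looks at */

static void emit(const char *level, const char *fmt, va_list ap)
{
	fprintf(stderr, "[%s] : ", level); /* prefix is illustrative */
	vfprintf(stderr, fmt, ap);
}

static void toy_diag_warning(const char *fmt, ...)
{
	va_list ap;

	if (!diag_mode)
		return;
	warn_count++; /* counts towards zero-warning */
	va_start(ap, fmt);
	emit("DIAG", fmt, ap);
	va_end(ap);
}

static void toy_diag_notice(const char *fmt, ...)
{
	va_list ap;

	if (!diag_mode)
		return;
	va_start(ap, fmt); /* informative only: not counted */
	emit("NOTICE", fmt, ap);
	va_end(ap);
}

static int zero_warning_fails(void)
{
	return warn_count > 0;
}
```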
The Linux tls module requires a socket to be in TCP_ESTABLISHED state
before we can enable the TLS ULP on it. If the socket is in any other
state, the setsockopt() call will fail, and we won't use kTLS on that
socket.
To make sure we're not doing it too early, defer it until the TLS
handshake is done, which means the TCP connection is established.
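The constraint can be shown with the real Linux interfaces involved: setsockopt() with TCP_ULP and the name "tls", and TCP_INFO's tcpi_state. However, tcp_is_established() and ktls_try_enable() are hypothetical helpers, and the fix described here simply defers the call until the handshake completes rather than querying the state. Enabling the ULP is only the first step, since crypto parameters must afterwards be installed through SOL_TLS (TLS_TX/TLS_RX), which is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_ULP
#define TCP_ULP 31 /* Linux value, for older headers */
#endif

/* Returns 1 if <fd> is a TCP socket in ESTABLISHED state, 0 otherwise. */
static int tcp_is_established(int fd)
{
	struct tcp_info ti;
	socklen_t len = sizeof(ti);

	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) != 0)
		return 0;
	return ti.tcpi_state == TCP_ESTABLISHED;
}

/* Hypothetical hook meant to run once the TLS handshake is done. Returns
 * the setsockopt() result, or -1 without trying if not yet established.
 */
static int ktls_try_enable(int fd)
{
	if (!tcp_is_established(fd))
		return -1;
	return setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
}
```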
This should be backported up to 3.3.
Signed-off-by: Karol Kucharski <kkucharski@fastlogic.pl>
__LJMP, WILL_LJMP() and MAY_LJMP() were defined locally in hlua.c,
making them unavailable to other modules that implement Lua bindings.
Move them to include/haproxy/hlua.h so they can be used outside of
hlua.c.
Add a registration mechanism so that modules outside of hlua.c can hook
into each lua_State creation. Modules call hap_register_hlua_state_init()
(or the REGISTER_HLUA_STATE_INIT() macro) with a callback of the form:
int my_init(lua_State *L, char **errmsg);
The callback returns an ERR_* code. ERR_ALERT and ERR_WARN trigger
ha_alert()/ha_warning() respectively; any other non-zero errmsg is
emitted via ha_notice(). ERR_FATAL or ERR_ABORT cause exit(1).
Registered entries are freed in hlua_deinit().
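The registration mechanism can be sketched with a plain linked list. lua_State is replaced by an opaque pointer to keep the sketch self-contained, the ERR_* values are placeholders that are not meant to match HAProxy's, and register_state_init(), run_state_inits() and the two example callbacks are hypothetical. Where the described code calls exit(1), this sketch returns -1 so that it stays testable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder error bits for this sketch only. */
#define ERR_NONE  0x00
#define ERR_FATAL 0x02
#define ERR_ABORT 0x04
#define ERR_WARN  0x08
#define ERR_ALERT 0x10

typedef int (*state_init_fct)(void *L, char **errmsg);

struct state_init_entry {
	state_init_fct fct;
	struct state_init_entry *next;
};

static struct state_init_entry *state_inits = NULL;
static struct state_init_entry **state_inits_tail = &state_inits;

/* Appends a callback, keeping registration order. Returns 1 on success. */
static int register_state_init(state_init_fct fct)
{
	struct state_init_entry *e = malloc(sizeof(*e));

	if (!e)
		return 0;
	e->fct = fct;
	e->next = NULL;
	*state_inits_tail = e;
	state_inits_tail = &e->next;
	return 1;
}

/* Runs all callbacks on a new state. Returns -1 on ERR_FATAL/ERR_ABORT. */
static int run_state_inits(void *L)
{
	for (struct state_init_entry *e = state_inits; e; e = e->next) {
		char *errmsg = NULL;
		int ret = e->fct(L, &errmsg);

		if (errmsg) {
			const char *lvl = (ret & ERR_ALERT) ? "ALERT" :
			                  (ret & ERR_WARN)  ? "WARNING" : "NOTICE";
			fprintf(stderr, "[%s] %s\n", lvl, errmsg);
			free(errmsg);
		}
		if (ret & (ERR_FATAL | ERR_ABORT))
			return -1;
	}
	return 0;
}

/* Frees the registry, as the deinit step is described as doing. */
static void free_state_inits(void)
{
	while (state_inits) {
		struct state_init_entry *e = state_inits;

		state_inits = e->next;
		free(e);
	}
	state_inits_tail = &state_inits;
}

static char *dup_msg(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

/* Example callbacks. */
static int init_ok(void *L, char **errmsg)
{
	(void)L; (void)errmsg;
	return ERR_NONE;
}

static int init_fatal(void *L, char **errmsg)
{
	(void)L;
	*errmsg = dup_msg("cannot register class");
	return ERR_ALERT | ERR_FATAL;
}
```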
Extract the challenge-readiness logic from cli_acme_chall_ready_parse()
into a new acme_challenge_ready(crt, dns) function so it can be called
from other contexts such as Lua event handlers.
It slightly changes the messages on the CLI.
Having a single task to take care of idle connection cleanup across all
servers leads to high contention. It uses a lock to maintain its tree of
servers to track, and then can acquire the idle_conns lock for each thread.
Instead, have one task per thread. Each thread will maintain its own
tree, so there will be no need for any lock, and it will just acquire
its own idle_conns lock, so it will lead to less contention.
This is a performance improvement, so backporting is optional, but may be
considered if it is worth it. That would require backporting commit
6f8dab2583 too.