Having a single task to take care of idle connection cleanup across all
servers leads to high contention. It uses a lock to maintain its tree of
servers to track, and then can acquire the idle_conns lock for each thread.
Instead, have one task per thread. Each thread will maintain its own
tree, so there will be no need for any lock, and it will just acquire
its own idle_conns lock, so it will lead to less contention.
This is a performance improvement, so backporting is optional, but may be
considered if it is worth it. That would require backporting commit
6f8dab2583 too.
When tune.max-checks-per-thread is used, checks that should run are
queued, to avoid having too many checks running at the same time.
But if the check is about to be purged, because the server is being
deleted, we have to explicitly remove it from the queue as that memory is
about to be freed, otherwise it will cause a use-after-free.
Also, queued checks have not yet incremented th_ctx->running_checks, so
don't decrement it if we're queued.
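The purge rule above can be sketched as follows. This is a simplified model, not the actual code: the structures, fields and function names are all hypothetical, and only the two rules from the text are shown (unlink a queued check before its memory goes away, and give a slot back only if one was taken).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model: none of these names are HAProxy's. */
struct check {
	struct check *next;   /* link in the per-thread pending queue */
	int queued;           /* 1 while waiting in the queue */
	int running;          /* 1 once counted in running_checks */
};

struct thread_ctx {
	struct check *queue;  /* checks waiting for a free slot */
	int running_checks;   /* checks currently running */
};

/* Unlink <chk> from the pending queue if it is there. */
static void queue_remove(struct thread_ctx *th, struct check *chk)
{
	struct check **p;

	for (p = &th->queue; *p; p = &(*p)->next) {
		if (*p == chk) {
			*p = chk->next;
			chk->next = NULL;
			chk->queued = 0;
			return;
		}
	}
}

/* Prepare <chk> for release: drop it from the queue or give its slot back. */
static void check_purge(struct thread_ctx *th, struct check *chk)
{
	if (chk->queued) {
		/* never counted as running: only unlink it */
		queue_remove(th, chk);
	} else if (chk->running) {
		assert(th->running_checks > 0);
		th->running_checks--;
		chk->running = 0;
	}
}
```

Calling check_purge() on a queued check leaves running_checks untouched, while calling it on a running one decrements it exactly once.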
This should be backported up to 3.0.
For certain calls like strdup(), some libcs call the malloc() symbol
themselves, resulting in both strdup() and malloc() accounting for the
allocation while a single free() call is accounted for. Usually it's
not very hard to spot as these allocations are done inside libc, but
yet they complicate the tracing of allocations.
Let's note when we enter a handler and refrain from doing the accounting
again in this case. This way, the strdup() call place will be accountable
for the allocation and the libc's internal malloc() will not be seen.
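A minimal sketch of such a guard, assuming a thread-local flag. The names traced_malloc(), traced_strdup(), in_handler and accounted are hypothetical and only stand in for the intercepted symbols; the counter is a plain variable for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; a sketch of the idea, not the memprof code. */
static _Thread_local int in_handler; /* non-zero while inside a wrapper */
static unsigned long accounted;      /* allocations attributed to a caller */

/* Stands in for an intercepted malloc(). */
static void *traced_malloc(size_t size)
{
	int nested = in_handler;
	void *ptr;

	in_handler = 1;
	ptr = malloc(size);
	in_handler = nested;

	if (ptr && !nested)
		accounted++;         /* only the outermost handler accounts */
	return ptr;
}

/* Stands in for an intercepted strdup() whose libc calls malloc(). */
static char *traced_strdup(const char *src)
{
	int nested = in_handler;
	size_t len;
	char *dst;

	assert(src != NULL);
	len = strlen(src) + 1;

	in_handler = 1;
	dst = traced_malloc(len);    /* nested: not accounted a second time */
	in_handler = nested;

	if (!dst)
		return NULL;
	memcpy(dst, src, len);
	if (!nested)
		accounted++;         /* accounted once, at the strdup() level */
	return dst;
}
```

With this guard, one traced_strdup() call is counted once even though it goes through traced_malloc(), which matches the single free() seen later.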
It's not convenient to use it as it is now because it may only be
used to count passes via the memprof init code. Let's turn it to
a bitfield instead so that we can also check what we're doing there.
This is safe because all callers of memprof_init() check for the
bit being zero first so it's not reentrant.
As reported by @broxio in issue #3411, when trying to delete an ACL by
its name, in case of error the message says "unknown map identifier".
We need to check the type to decide between map and ACL as in other
messages.
This can be backported to all stable branches. Thanks to @broxio for
reporting the issue with a reproducer and providing this tested fix.
In order to preventively avoid issues that complicate debugging, let's
report to developers early if a pool name is not acceptable. This patch
does it in create_pool_from_reg() which catches both direct and declared
registrations. Aside from the previous case, this didn't catch any other
occurrence.
Using "show pools detailed" on the CLI breaks the column alignment on
"sess_priv_conns" because the pool name contains spaces: "session priv
conns list", which is not welcome as pool names are truncated after the
12th char anyway. Let's shorten it to the pool's name as done for many
other ones: sess_priv_conns.
This can be backported as far as 3.0 where this name was introduced,
because it helps when trying to sum or graph certain metrics during
debugging.
conn_get_ssl_sock_ctx() retrieves the ssl_sock_ctx of a connection by
calling conn->xprt->get_ssl_sock_ctx(). Only ssl_sock implements this
method, and it returns conn->xprt_ctx. This works because for every
existing XPRT combination the SSL layer is the topmost one: even
xprt_handshake (SOCKS4, PROXY, NetScaler CIP) is installed *below*
ssl_sock, so conn->xprt keeps pointing to ssl_sock.
Qmux changes this assumption: xprt_qmux is stacked *on top of* ssl_sock
and keeps the SSL layer as its lower layer to exchange the QUIC transport
parameters over the established TLS stream. During the qmux handshake,
conn->xprt therefore points to xprt_qmux, which does not implement
get_ssl_sock_ctx(), making conn_get_ssl_sock_ctx() return NULL for the
whole connection, affecting every caller that inspects the SSL layer
(sample fetches, logging, ssl_sock_infocbk(), ...).
The visible consequence was a crash: when the peer sends a TLS alert
during the qmux handshake, the SSL library calls ssl_sock_infocbk(),
which recovers a valid connection but a NULL ctx, rightfully triggering
the "BUG_ON(!ctx)" early in the function.
This patch implements xprt_qmux_get_ssl_sock_ctx() so that it returns
the ssl_sock_ctx of the lower layer when it is the SSL layer, just like
ssl_sock_get_ctx() does. conn_get_ssl_sock_ctx() then works again for
all callers while the qmux handshake is in progress. After the handshake,
conn->xprt is restored to the SSL layer so nothing else changes.
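The delegation can be pictured with the reduced model below. All types and names here are hypothetical stand-ins (xprt_ops, upper_ctx, upper_get_ctx() and so on); the sketch only shows an upper layer answering the request from its lower layer when that layer is the SSL one.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical types: only the shape of the delegation matters. */
struct ssl_sock_ctx { int dummy; };

struct xprt_ops {
	/* returns the SSL context reachable from <xprt_ctx>, or NULL */
	struct ssl_sock_ctx *(*get_ssl_sock_ctx)(void *xprt_ctx);
};

struct connection {
	const struct xprt_ops *xprt; /* topmost transport layer */
	void *xprt_ctx;              /* its private context */
};

/* Context of the upper layer: remembers the layer it is stacked on. */
struct upper_ctx {
	const struct xprt_ops *lower_ops;
	void *lower_ctx;
};

/* SSL layer: its context is the ssl_sock_ctx itself. */
static struct ssl_sock_ctx *ssl_get_ctx(void *xprt_ctx)
{
	return xprt_ctx;
}

static const struct xprt_ops ssl_ops = { ssl_get_ctx };

/* Upper layer: answer from the lower layer when it is the SSL layer. */
static struct ssl_sock_ctx *upper_get_ctx(void *xprt_ctx)
{
	struct upper_ctx *ctx = xprt_ctx;

	if (!ctx || ctx->lower_ops != &ssl_ops)
		return NULL;
	return ctx->lower_ctx;
}

static const struct xprt_ops upper_ops = { upper_get_ctx };

/* Generic accessor: asks whichever layer is currently on top. */
static struct ssl_sock_ctx *conn_get_ctx(const struct connection *conn)
{
	assert(conn != NULL);
	if (!conn->xprt || !conn->xprt->get_ssl_sock_ctx)
		return NULL;
	return conn->xprt->get_ssl_sock_ctx(conn->xprt_ctx);
}
```

In this model the accessor returns the same context whether the SSL layer or the upper layer is on top, which is the property callers such as an info callback rely on.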
This should be backported to 3.4.
In thread_detect_count(), avoid any usage of thread_cpu_enable_at_boot
if we're building without thread support. That variable is only defined
when building with threads, and those tests make little sense when
building with no thread, anyway.
This was submitted by: ririnto <ririnto@kakao.com>
This should fix github issue #3408.
This should be backported to 3.4.
As reported by @zhanhb in github issue #3410, since 3.3 with commit
fda6dc959 ("MINOR: regex: use a thread-local match pointer for pcre2"),
the local_pcre2_match array is initialized too late for use by Lua. If
a lua-load makes use of regex, it may segfault (actually using PCRE2
is fine but PCRE2_JIT will crash):
Let's change the init sequence so that the first thread's context is
initialized early at boot and other threads are initialized when they
are created. For lua-load-per-thread, all extra threads will run on
the first thread's temporary storage during init but that's not a
problem since the sole purpose is to avoid concurrent accesses.
Thanks to @zhanhb for the detailed report and quick tests. This needs
to be backported to 3.3.
When an external check was configured at the proxy level, the healthcheck
section set on a server was not considered. The main reason was that the
check type of the server was always inherited from the proxy one.
To fix the issue, when a healthcheck section is set on a server line, the
check type for the server is forced to TCPCHK.
This patch must be backported to 3.4.
Prior to this patch, qcc_io_recv() stream decoding loop was interrupted
on the first decoding error or if incomplete data could not be parsed.
This patch adjusts this part so that loop is stopped only on a
connection level error. In case of a stream level error or on incomplete
data, decoding continues on the next QCS entry.
Without this patch, there is a risk that a QCS decode is not performed
as expected, with a possible client timeout firing. This is pretty
unlikely though. However this patch is still necessary to completely
remove this possibility.
This should be backported up to 3.2.
When a RESET_STREAM is received, QCS Rx channel is closed and pending Rx
data and buf are cleared without being transmitted to upper stream
layer.
This patch complements this by removing the QCS from recv_list if
present in it. This is a small optimization, as nothing would be performed
for such a QCS in qcc_io_recv().
When a RESET_STREAM is received, QCS Rx channel is closed and pending Rx
data and buf are cleared without being transmitted to upper stream
layer.
This can cause an issue if this QCS instance is present in the QCC
recv_list. When qcc_io_recv() is executed after reset handling, an
infinite loop is triggered for the QCS instance as qcs_rx_avail_data()
always returns 0.
This issue happened due to the poor writing of the while loop in
qcc_io_recv() which is not correctly protected against infinite
execution.
To prevent this issue, this patch rewrites the loop. Crucially,
LIST_DEL_INIT() is now performed unconditionally outside of the inner
loop. This guarantees that even if the inner loop is not executed, the
stream will be removed from QCC recv_list and iteration will progress.
This is functionally correct as a QCS should not be present in recv_list
if there is no avail data or demux is currently blocked. For the first
condition, qcc_decode_qcs() will be called again when new data is read
unless demux is blocked. In this case, QCS will be reinserted in the
list on unblocking, with a rescheduling to invoke qcc_decode_qcs().
In the context of the currently found reproducer linked to stream reset,
the QCS instance can be safely removed from the recv_list without
implication.
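The shape of the rewritten loop can be sketched like this, on a hypothetical singly linked list rather than the real QCC recv_list. The point is only that the removal sits outside the inner loop, so an entry with nothing to decode cannot stall the iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the stream and its receive list. */
struct stream {
	struct stream *next; /* link in the receive list */
	int in_list;         /* 1 while linked in the list */
	int avail;           /* units of data ready to decode */
	int decoded;         /* units decoded so far */
};

/* Drain the receive list; returns the number of streams visited. */
static int recv_all(struct stream **list)
{
	int visited = 0;

	assert(list != NULL);

	while (*list) {
		struct stream *s = *list;

		/* inner loop: may run zero times, e.g. after a reset */
		while (s->avail > 0) {
			s->avail--;
			s->decoded++;
		}

		/* unconditional removal: the outer loop always progresses */
		*list = s->next;
		s->next = NULL;
		s->in_list = 0;
		visited++;
	}
	return visited;
}
```

If the removal were placed inside the inner loop instead, a stream with avail == 0 would stay at the head of the list forever.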
This must be backported up to 3.2.
The healthcheck keyword could be parsed on default-server lines but not
copied during server initialization, making it ineffective. But there is
also a true issue by setting it on a default-server. The pseudo server used
to parse the default-server line is not initialized via the new_server()
function, as regular servers. So there is no tcpcheck information inherited
from the proxy. We must take care of that when the "healthcheck" keyword is
parsed to avoid crashes.
This patch must be backported to 3.4.
When an external check is started for a server, there is no tcpcheck
ruleset. The pointer is NULL. It was an issue leading to a crash if the
small-buffer option was enabled on the healthchecks. However, it is
irrelevant for external checks because it is only useful to tcp checks.
So, the option must be ignored if there is no tcpcheck ruleset.
This patch must be backported to 3.4.
When an external check was configured on a backend, the tcpcheck post config
for backend's servers was still performed instead of being skipped. This led to
a NULL-deref on the tcpcheck ruleset pointer and so to a segfault.
It seems to be only an issue for the 3.4 and higher. However, for older
versions, the tcpcheck post-config is still performed for external checks
and it is not really clean. This can hide some bugs.
For the 3.4, a workaround consists in configuring the backend to use a
tcp-check before configuring the external check:
  backend be
    option tcp-check
    option external-check
    ...
This patch should fix the issue #3407. It could be good to backport it to
all supported versions.
Released version 3.4.0 with the following main changes :
- BUG/MINOR: tcpcheck: Check LDAP response to not read more data than available
- BUG/MINOR: ssl-gencert: validate SNI characters to prevent SAN certificate injection
- BUG/MINOR: mux-h1: H2 preface rejection doesn't update stick-table glitches
- BUG/MEDIUM: cpu-topo: Enforce thread-hard-limit on policy
- BUG/MEDIUM: qmux: do not crash on too large record
- BUG/MEDIUM: qmux: do not crash on receiving an invalid first frame
- BUG/MINOR: qmux: reject too large initial record
- Revert "BUG/MEDIUM: dns: fix long loops in additional records parse on name failure"
- BUG/MINOR: qpack: Fix index calculation in debug functions
- BUG/MINOR: qpack: fix potential null-pointer dereference in qpack_dht_insert()
- CLEANUP: qpack: fix copy-paste typo in value Huffman debug string
- BUG/MINOR: qpack: fix sign bit mask in qpack_decode_fs_pfx()
- CLEANUP: qpack: fix copy-paste typo in value Huffman debug string for WLN
- BUG/MINOR: qpack: fix huff_dec() error handling in qpack_decode_fs()
- CLEANUP: qpack: move encoded macros to qpack-t.h to avoid duplication
- BUG/MEDIUM: quic: handle ECONNREFUSED on RX side
- BUG/MINOR: quic: Fix memory leak in quic_deallocate_dghdlrs()
- BUG/MEDIUM: lua: defer Lua VM initialisation to the first Lua config keyword
- REGTESTS: lua: fix tune.lua.openlibs in Lua reg-tests
- BUG/MINOR: mux-h2: Count padding for connection flow control on error path
- BUILD: addons: convert 51d addon to EXTRA_MAKE
- BUILD: addons: convert deviceatlas addon to EXTRA_MAKE
- BUILD: addons: convert WURFL addon to EXTRA_MAKE
- MINOR: mux_quic/flags: add missing flags
- BUG/MINOR: mux_quic: open an idle QCS on reset on BE side
- BUG/MINOR: mux_quic: fix BE conn removal on app shutdown
- BUG/MINOR: mux_quic: prevent BE reuse with an errored conn
- BUG/MINOR: quic: fix ack range node pool_free call passing wrong pointer type
- MEDIUM: quic: optimize HKDF operations by reusing per-thread contexts
- BUG/MEDIUM: quic: reset cwnd in slow_start on persistent congestion (cubic)
- BUG/MEDIUM: quic: reset consecutive_losses on exit from recovery period (cubic)
- BUG/MINOR: quic: update drs->lost before calling on_ack_recv
- Revert "MEDIUM: quic: optimize HKDF operations by reusing per-thread contexts"
- BUG/MEDIUM: lua: register hlua_init() as a pre-check to fix crash without Lua config
- REGTESTS: quic: disable quic/ocsp_auto_update for now
- BUG/MINOR: threads: set at least grp_max when mtpg is too small
- BUG/MEDIUM: threads: ignore max-threads-per-group when thread-groups is set
- CLEANUP: thread: indicate when max-threads-per-group is ignored
- MINOR: cpu-topo: notify when cpu-policy is ignored due to other settings
- MINOR: thread: report when thread-groups or nbthread results in less threads
- BUILD: makefile: include EXTRA_MAKE in the .build_opts construction
- BUG/MINOR: quic: Fix another buffer overflow with sockaddr_in46
- MINOR: quic: Copy sin6_flowinfo and sin6_scope_id too
- BUILD: Makefile: put EXTRA_MAKE help at the right place
- BUG/MINOR: cache: fix cache tree iteration
- BUG/MEDIUM: resolvers: Wait a bit before calling the xprt prepare_srv
- CLEANUP: addons/51degrees: initialize variables
- MINOR: addons/51degrees: handle memory allocation failures
- CLEANUP: ncbmbuf: improve handling of memory allocation errors in unit tests
- CLEANUP: admin/halog: improve handling of memory allocation errors
- DOC: internals: clarify ambiguous wording in core-principles
- DOC: internals: add a threat model definition
- DOC: add security.txt describing how to report security issues
- DOC: security: also add a note to exclude dev/ and admin/
- BUG/MEDIUM: qmux: Close connection on invalid frame
- CLEANUP: fix comment typo
- BUG/MEDIUM: h3: fix MAX_PUSH_ID handling
- BUG/MINOR: cache: Fix copy of value when parsing maxage
- BUG/MEDIUM: mux-h1: Dup connection/upgrade value to parse it when making headers
- BUG/MEDIUM: htx: Fix headers rollback on partial copy in htx_xfer()
- MINOR: deinit: release the in-memory copy of shared libs
- MINOR: debug: add -dA to dump an archive of all dependencies
- BUG/MEDIUM: ssl: Make sure the alpn length is small enough
- BUG/MINOR: applet: Commit changes into input buffer after sending HTX data
- BUG/MINOR: mux-spop: Fix possible off-by-one OOB read in spop_get_varint()
- BUG/MEDIUM: leastconn: Unlock the write lock on allocation failure
- BUG/MINOR: tasks: Increase the right niced_task counter
- BUILD: makefile: search for Lua 5.5 as well
- DEV: dev/gdb: improve ebtree pointer handling
- DEV: dev/gdb: add simple task dump
- DEV: dev/gdb: add simple thread dump
- DEV: dev/gdb: add fdtab dump
- DOC: config: add a few more explanation in http-reusee regarding sni-auto
- REGTESTS: add basic QMux tests
- BUG/MINOR: http-act: Properly handle final evaluation in pause action
- BUILD: makefile/lua: use the system's default library before all other variants
- BUG/MINOR: startup: unbreak chroot with CAP_SYS_CHROOT
- BUG/MINOR: haterm: do not try to bind QUIC when not supported
- BUG/MINOR: haterm: also apply the tcp-bind-opts to clear TCP "bind" lines
- CLEANUP: haterm: do not try to bind to SSL when not built in
- MINOR: haterm: enable ktls on the SSL bind line when supported
- CI: github: replace cirrus by a vmactions/freebsd-vm job
- BUILD: makefile: fix build error with GNU make 4.2.1 and /bin/dash
- BUG/MEDIUM: channel: Fix condition to know if a channel may send
- BUG/MEDIUM: vars: Properly eval set-var-fmt action for emtpy log-format string
- CI: github: run illumos job weekly on Mondays at 03:00 instead of monthly
- BUG/MEDIUM: stream: Don't use small buffer on queuing with a request data filter
- BUG/MINOR: jwe: don't write randoms past MAX_DECRYPTED_CEK_LEN in RSA_PKCS1_PADDING
- BUG/MEDIUM: chunk: do not rely on small trash by default for expressions
- CLEANUP: map: always test pat->ref in sample_conv_map_key()
- DEV: patchbot: prepare for new version 3.5-dev
- MINOR: version: mention that it's 3.4 LTS now.
sample_conv_map_key() calls pattern_exec_match() which may return a
static pattern with ref=NULL when passed with fill=1 (which is the
case) and pat->match == NULL (which doesn't seem to be the case). It
doesn't seem it could happen with standard maps (as only "-m found"
has a NULL ->match function and there's no keyword associated
with it) but maybe this could happen with maps implemented in Lua,
though this remains unlikely.
Anyway better clarify the situation by always checking that the ref
is non-null before dereferencing it, it will at least avoid warnings
from code coverage tools.
There's a corner case with get_trash_chunk_sz() combined with the use
of small bufs: if some incoming data is going to be inflated by a
converter in a non-predictable way (say url_enc etc) then there are
two possibilities:
- either we try to allocate a size that corresponds to the data, but
we risk to allocate a small buf to convert a 900B chunk, that will
now fail if it contains too many non-printable chars;
- or we try to allocate 3x the size to be conservative, but without
large bufs we'd fail to transcode any chunk larger than 5.3kB, even
if it contains only printable chars.
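The arithmetic behind these two cases can be checked with a small sketch. The 3x factor comes from the text; the buffer sizes used in the usage note (16384 bytes for a regular buffer, 1024 for a small one) are assumptions made for illustration, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Worst case taken from the text: each input byte may become 3 bytes. */
#define INFLATE_FACTOR 3

/* Conservative output size to request for <in_len> bytes of input. */
static size_t worst_case_out(size_t in_len)
{
	return in_len * INFLATE_FACTOR;
}

/* Largest input whose worst-case output still fits in <buf_size>. */
static size_t max_safe_in(size_t buf_size)
{
	assert(buf_size > 0);
	return buf_size / INFLATE_FACTOR;
}
```

Under these assumptions, max_safe_in(16384) gives 5461 bytes, i.e. about 5.3kB, and worst_case_out(900) gives 2700 bytes, which no longer fits in a 1024-byte small buffer.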
The approach should definitely be refined and it is not 100% reliable
for now. Better temporarily ignore the small buffers for these particular
cases where the savings are not relevant, and see how to pass the knowledge
of the expected size ranges deeper down the API in 3.5. We may possibly rely
on the current trash size (instead of contents) or other mechanisms that
are yet to be specified. alloc_small_trash_chunk() gets the same change
BTW for the same reasons.
The comment for get_trash_chunk_sz() was updated to restate the importance
of being conservative when requesting a size.
No backport is needed.
The recent fix in commit 1a5a33396d ("BUG/MEDIUM: jwe: substitute random
CEK on RSA1_5 decryption failure per RFC 7516 #11.5") writes 8 bytes at
once but stops at the last one, so it can overflow the sample by 7 bytes.
This is totally harmless since the max size is 64 bytes, but better stop
at the boundary. A final loop completes one byte at a time by construction
so that we can adapt to any value of MAX_DECRYPTED_CEK_LEN, but the compiler
will not emit it since we stop at 64.
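A sketch of a fill loop that respects the boundary, under stated assumptions: next_random() is a hypothetical, non-cryptographic placeholder for the real random source, and fill_random() is not the function from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholder generator, NOT suitable for cryptography. */
static uint64_t next_random(void)
{
	static uint64_t state = 0x9E3779B97F4A7C15ULL;

	state ^= state << 13;
	state ^= state >> 7;
	state ^= state << 17;
	return state;
}

/* Fill <len> bytes of <buf> without ever writing past buf + len. */
static void fill_random(unsigned char *buf, size_t len)
{
	size_t pos = 0;
	uint64_t rnd;

	assert(buf != NULL);

	/* bulk part: 8 bytes at once while a full word still fits */
	while (len - pos >= sizeof(rnd)) {
		rnd = next_random();
		memcpy(buf + pos, &rnd, sizeof(rnd));
		pos += sizeof(rnd);
	}

	/* tail: one byte at a time; skipped when len is a multiple of 8 */
	if (pos < len) {
		rnd = next_random();
		while (pos < len) {
			buf[pos++] = (unsigned char)rnd;
			rnd >>= 8;
		}
	}
}
```

When the length is a compile-time constant that is a multiple of 8, such as 64, the tail loop is dead code that a compiler can drop.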
No backport is needed, it's only for 3.4.
When there is a filter registered on the request data forwarding, we must
disable usage of the small buffers. For now it is safer to do so because we
don't know if the filter will properly handle the small buffers. In
addition, there is a true issue because it is possible to never re-arm the
receives in that case because the buffer reserve must be respected. This
leads to considering a small buffer as always full, even an empty one.
No backport needed.
The previous schedule (25th of each month) provided too little coverage
frequency. Switch to a weekly run every Monday at 03:00 UTC to catch
regressions sooner.
When the log-format string was empty, in action_store() function, a fallback was
performed on the expression evaluation, thinking a set-var() was performed.
However, it is possible to have an empty log-format string. At least, on 3.2 and
3.0, it is allowed to parse an empty log-format string: quoted empty strings
are not rejected.
So, on 3.2 and 3.0, it was possible to have a "set-var-fmt" action in the config
leading to parse an empty log-format string. Doing so, a crash could be
experienced when the action was executed because the fallback on the expression
evaluation led to dereference a NULL pointer.
To fix the issue, during parsing the action type is now set to a different value
for a "set-var" or a "set-var-fmt" action. And this action type is tested during
execution to perform the right action.
This patch should fix issue #3406. It must be backported as far as 3.0. Only 3.2
and 3.0 are affected by the issue.
Historically, we considered a channel cannot send before the connection was
established. This was useful to know if the reserve should still be
respected for the receives. This was because it was possible to rewrite the
request on connection retry (because of http-send-name-header option).
However nowadays, it is a useless limitation. Once data forwarding is
started, there are no longer rewrites on the request at the stream layer
(http-send-name-header option is handled by the muxes). And, since it is
possible to use small buffers to queue requests, it could be an issue,
because the reserve and the small buffer size are the same by default. Once
a small request was finally dequeued, the receives on client side were not
re-armed because we should still respect the reserve on receives
(channel_recv_limit() was returning 0 in that case).
To fix the issue, we must consider a channel may send since the underlying
stconn has reached the SC_ST_REQ state, instead of SC_ST_EST. Doing so, we
are able to ignore the reserve earlier and the receives can be re-armed even
with small buffers.
There is no reason to backport this patch, except if an issue is reported,
because only the 3.4 is concerned. But it could theoretically be backported to
all stable versions.
The latest fix in the Makefile in commit 9993688954 ("BUILD: makefile/lua:
use the system's default library before all other variants") broke the
build on a machine with GNU make 4.2.1 and /bin/dash:
Makefile:690: *** unterminated call to function 'shell': missing ')'. Stop.
It's caused by the '#' in '#include'. Protecting it with a backslash
fixes the make issue but moves it to the shell where it's echoed in the
output. Using printf '\043' works but it's not certain it works everywhere. At this
point better just revert that tiny part which was made to refine the
presence check for lua.h by checking that it contains valid C code. If
the commit above is backported, this one will have to be as well.
When both USE_LINUX_SPLICE and USE_KTLS are enabled, it's worth
enabling kTLS on the bind line as it significantly increases the
local bit rate as well as through TLS accelerators (up to x2/x3).
The -dT option remains available to disable it. It was verified to
gracefully downgrade when not supported (e.g. OpenSSL 3.0.1 does
this).
When built without USE_OPENSSL, the binding errors are dirty, speaking
about crt-store and stuff like this. Better just indicate that SSL
support was not built in and explain how to enable it.
Commit 92581043fb ("MINOR: haterm: add long options for QUIC and TCP
"bind" settings") added --tcp-bind-opts. The doc (and commit) says that
it applies to TCP bind lines but it only applied to the TCP/SSL ones,
not the clear ones. Let's fix it. No backport needed, this is only 3.4.
When building without QUIC support (e.g. an SSL library not supporting
it), we'll get errors when trying to bind to the SSL port that QUIC is
not supported because the quic binding was unconditional. Let's only
place it when QUIC is supported. No backport needed, this is only 3.4.
The use of the unshare() mechanism to get the ability to chroot as an
unprivileged user produced a warning on some configurations where the
haproxy process has the CAP_SYS_CHROOT capability. We now only attempt
to use it when a previous chroot() call failed because of insufficient
privileges.
This should fix GitHub issue #3395. No backport needed.
The recent update to the makefile in commit bfbca23dc2 ("BUILD: makefile:
search for Lua 5.5 as well") to enable searching for Lua 5.5 revealed a
problem by which we were using the fallback versions before the main one
(e.g. /usr/include/lua-5.4/lua.h before /usr/include/lua.h). However, the
libs often contain the version in their name so that we can end up linking
with 5.5 while 5.4 was used in the include.
This was detected only when enabling lua 5.5 because in Lua 5.4
"luaL_openlibs()" was a symbol and became an inline in 5.5, preventing
from using a mix of the two versions.
The current change is minimal in that it skips all fallbacks when lua.h
is present in /usr/include, and includes it in the test to make sure that
the directory found contains valid C. LUA_LIB checks for lua before the
variants so as to remain consistent with the system provided version.
Thanks to @gene-git for reporting this problem in GH issue #3404.
This may have to be backported after a period of observation if users
face build issues for older releases on newer distros. In this case,
backporting 1c0f781994 ("MINOR: hlua: Add support for lua 5.5") would
equally be needed. However this will result in the system's version
being used first, which may or may not be desired.
The ACT_OPT_FINAL flag was not properly handled in the pause action. When
this flag is set, because of an abort or an unexpected error, an action must
no longer yield. However, in the pause action, this flag was never tested.
In case of client abort for instance, this could trigger an internal error
instead of a client error.
This patch should fix the issue #3403. It must be backported as far as 3.2.
The default sni-auto that aims at not upsetting certain servers doing
excessive checks of SNI vs host has some drawbacks (lower reuse ratio)
that are particularly hard to diagnose, so let's explain how connections
are reused/purged when dealing with many hosts, and how to cheat as well.
Let's also mention the expression used by "sni-auto" since it was only
mentioned in the code.
Three functions are provided here:
fd_dump: lists all FDs
fd_dump_conn: lists all FDs holding a connection
fd_dump_listener: lists all FDs holding a listener
They take no argument, and dump some of the known info. E.g. for
a connection, ctrl, xprt, flags, mux, sessions, frontend's name
and session's age are reported. Example:
(gdb) fd_dump_conn
fd 31: rm=0 tm=0x2 um=0 st=0x21 refc=0x1 tkov=0 gen=0 conn=0x7fffe803b600: flg=0x300 err=0 ctrl=0xdf51c0 xprt=0xdf5c80 mux=0xbaeee0 sess=0x7ffff003b570: fe=0x1e45b00 id=foo age=0ms
They are particularly slow because they iterate over all possible FDs,
so better limit them to the desired types.
The thread_dump function dumps the list of known threads and a few info
on them (pointer, current run queue, flags etc). This should help more
easily spot a particular one and find stuck ones.
E.g:
(gdb) thread_dump
Tid 0: pth=0x7ffff7e797c0 mono=2222322327950732 now_ms=4294947291 fl=0x38 rq=-1 cq=0 current=(nil)
Tid 1: pth=0x7ffff78d8640 mono=2222322327928085 now_ms=4294947291 fl=0x38 rq=-1 cq=0 current=(nil)
Tid 2: pth=0x7ffff6b7e640 mono=2222322327927150 now_ms=4294947291 fl=0x38 rq=-1 cq=0 current=(nil)
Tid 3: pth=0x7ffff637d640 mono=2222322327924878 now_ms=4294947291 fl=0x38 rq=-1 cq=0 current=(nil)
Tid 4: pth=0x7ffff5b7c640 mono=2222322327925676 now_ms=4294947291 fl=0x38 rq=-1 cq=0 current=(nil)
Tid 5: pth=0x7ffff537b640 mono=2222322327929524 now_ms=4294947291 fl=0x38 rq=-1 cq=0 current=(nil)
Tid 6: pth=0x7ffff4b7a640 mono=2222322327926817 now_ms=4294947291 fl=0x38 rq=-1 cq=0 current=(nil)
Tid 7: pth=0x7fffdffff640 mono=2222322327947960 now_ms=4294947291 fl=0x38 rq=-1 cq=0 current=(nil)
New functions task_dump_wq and task_dump_rq can be used to dump tasks
in a wait queue or in a run queue respectively. For the wait queue (the
most common usage), one needs to pass either the thread-local's timers,
or the thread group ones for shared tasks:
task_dump_wq &ha_tgroup_ctx[0].timers
task_dump_wq &ha_thread_ctx[0].timers
For the run queue, task_dump_rq will take the thread's rqueue:
task_dump_rq &ha_thread_ctx[0].rqueue
The output is the task pointer and a dump of the task* struct per line,
then a total count at the end.
The ebtree descent functions currently use $arg0 as is and it's up to
the user to manually type the required casts that are never obvious
(particularly when coming from a pointer). Let's put the eb_root* cast
in the function to be more user-friendly.
Support for Lua 5.5 was brought in 3.4-dev2 with commit 1c0f781994
("MINOR: hlua: Add support for lua 5.5") but the Makefile doesn't look
for it, which can be quite confusing on recent distros which start to
ship with it. Let's add it to the looked up names.
In __task_wakeup(), for a niced task, we don't always want to increase
the niced_task counter of the running thread's thread group, if we are
waking up the task of another thread, which belongs to a different thread
group, then we want to increment that thread group's counter instead, as
that's the one that will get decremented later.
So just increase the counter for the target thread's thread group,
instead of using tg_ctx.
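The accounting rule can be illustrated with the toy model below. The thread-to-group mapping, the counter array and both functions are hypothetical; the sketch only shows that the increment and the later decrement must hit the same group's counter.

```c
#include <assert.h>

#define NB_THREADS 4
#define NB_GROUPS  2

/* Hypothetical layout: threads 0-1 in group 0, threads 2-3 in group 1. */
static const int thread_group[NB_THREADS] = { 0, 0, 1, 1 };
static int niced_tasks[NB_GROUPS];

/* Wake up a niced task bound to <target_thr>, from <caller_thr>. */
static void wakeup_niced(int caller_thr, int target_thr)
{
	assert(caller_thr >= 0 && caller_thr < NB_THREADS);
	assert(target_thr >= 0 && target_thr < NB_THREADS);

	/* count in the target's group, not in the caller's */
	niced_tasks[thread_group[target_thr]]++;
}

/* Run the task on its own thread: the same group's counter goes down. */
static void run_niced(int target_thr)
{
	assert(target_thr >= 0 && target_thr < NB_THREADS);
	niced_tasks[thread_group[target_thr]]--;
}
```

Had the increment used the caller's group, a wakeup from thread 0 to thread 3 would leave group 0 at +1 and group 1 at -1 after the task ran.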
The impact is probably pretty minor, niced tasks shared amongst threads
are not very common, and the impact would mostly mean we'd run more/less
tasks in one run of process_runnable_tasks() than expected.
This should be backported as far as 2.8.
When we fail to allocate a new tree element, we're still holding the
write lock, so we should do a write unlock, not a read unlock, or the
lock will get corrupted and most likely this will end in a deadlock.
This should be backported up to 3.2.
In spop_get_varint(), -1 is returned if there is not enough data in the
buffer to decode the variable integer. However a strict comparison against
b_data() was performed, which is wrong. A failure must be reported if the
index is greater or equal to b_data().
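The difference between the two comparisons shows up in a small decoder sketch. The function below is hypothetical and works on a flat array instead of a buffer; the encoding it uses (a first byte below 240 is the value, otherwise 7-bit groups follow while the byte read is 128 or more) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a variable-length integer from <buf>, which holds <len> bytes.
 * Returns the number of bytes consumed, or -1 if data is missing or the
 * encoding is too long. The encoding is an assumption of this sketch.
 */
static int get_varint(const unsigned char *buf, size_t len, uint64_t *val)
{
	size_t idx = 0;
	unsigned int shift = 4;
	unsigned char byte;

	assert(buf != NULL && val != NULL);

	if (idx >= len)          /* with ">" this would read buf[len] */
		return -1;

	*val = buf[idx++];
	if (*val < 240)
		return (int)idx;

	do {
		if (idx >= len)  /* same test before every extra byte */
			return -1;
		if (shift >= 64) /* malformed: too many continuation bytes */
			return -1;
		byte = buf[idx++];
		*val += (uint64_t)byte << shift;
		shift += 7;
	} while (byte >= 128);

	return (int)idx;
}
```

With a strict "idx > len" test, a one-byte buffer holding 240 would make the decoder read one byte past the available data instead of reporting that more input is needed.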
This patch must be backported as far as 3.2.
After sending HTX data to an applet, htx_to_buf() must be called on the
applet buffer to commit changes (and possibly to reset the buffer if it is
empty). This was performed on the output buffer while it should in fact be
performed on the input buffer. So let's fix it.
This patch must be backported as far as 3.0.
When the check for server hash was introduced to make sure we're using
the right alpn, the logic to store the new alpn was flawed. We should
always check that the new alpn length is small enough to fit in the
buffer, whether or not the server hash is the same. So always
check the length first, and only check if the alpn or the server changed
after.
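The ordering of the two tests can be sketched as follows. The structure, its capacity and the function name are hypothetical; what matters is that the length test comes first and does not depend on the hash comparison.

```c
#include <assert.h>
#include <string.h>

#define ALPN_MAX 16 /* hypothetical capacity of the storage */

struct alpn_cache {
	char alpn[ALPN_MAX];
	size_t len;
	unsigned long srv_hash;
};

/* Store <alpn>; returns 1 if stored or unchanged, 0 if it does not fit. */
static int alpn_store(struct alpn_cache *c, const char *alpn, size_t len,
                      unsigned long hash)
{
	assert(c != NULL && alpn != NULL);

	/* length check first, whatever the hash says */
	if (len > sizeof(c->alpn))
		return 0;

	/* only then look for a change of server or of alpn */
	if (hash != c->srv_hash || len != c->len ||
	    memcmp(c->alpn, alpn, len) != 0) {
		memcpy(c->alpn, alpn, len);
		c->len = len;
		c->srv_hash = hash;
	}
	return 1;
}
```

An oversized value is rejected both when the hash matches and when it differs, so the memcpy() can never exceed the storage.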
This should be backported whenever commit
de3f245df0 has been backported.
This adds "-dA[file]" on the command line, which dumps an archive of all
dependencies detected at runtime into the designated file in tar format.
This is equivalent to "set-dumpable libs", but instead of keeping the libs
in memory, it dumps them into a file. This may be used after a core dump,
in order to provide all necessary libraries to developers to permit them
to exploit the core. This may not be available on all operating systems.
When shared libs were loaded via "set-dumpable libs", better release
them upon deinit, it will make valgrind happier. For this we now have
a new function free_collected_libs() in tools.c and call it in deinit().
In htx_xfer() function, when headers are partially copied, depending on the
flags, a rollback may be performed to remove all copied headers from the
destination message. However, there was an issue in the loop performing the
rollback. Instead of decrementing the returned value using the size of the
HTX block from the destination message, the one from the source message was
used. So the wrong value was returned and, in the worst case, it could
overflow.
In addition, the BUG_ON() in the loop was removed because its test condition was
wrong.
It is a 3.4-specific issue. No backport needed.
When message headers are formatted, the connection and upgrade header values
are parsed to be sanitized and to fill H1M flags. The values are modified in
place without changing the HTX message information accordingly (the block
info and the HTX info). It could be an issue if the output buffer is full
and the header cannot be formatted, because the formatting can then be stopped
with an HTX message in a hazardous state.
It should be quite difficult to trigger this issue. But now, a copy of the
value is performed before parsing it. So only the copy will be altered,
leaving the HTX message in a safe state.
This patch must be backported to all stable versions.
During maxage parsing, the size of the value was not properly computed when
it was copied into the trash chunk. The name (max-age or s-maxage) must be
skipped with the '=' character. But instead of doing a subtraction, an
addition was performed, adding 2 extra bytes to the value used for the
conversion to integer.
In addition, the "chunk_memcat(chk, "", 1)" operation to add a trailing
NULL-byte was replaced by "*(b_tail(chk)) = '\0'". It a bit easier to
understand.
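As an illustration only, the length computation can be sketched as below.
parse_maxage_value() and its arguments are hypothetical names, and a local
buffer stands in for the trash chunk; this is not the actual cache code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: <dir> points to a directive such as "max-age=3600"
 * of <dir_len> bytes, and <name_len> is the length of the directive name
 * ("max-age" is 7, "s-maxage" is 8). Returns the parsed value, or -1 if
 * the directive is malformed or the value does not fit in the buffer.
 */
long parse_maxage_value(const char *dir, size_t dir_len, size_t name_len)
{
	char buf[32];
	size_t skip = name_len + 1;   /* the name plus the '=' character */
	size_t val_len;
	char *end;
	long val;

	if (dir_len <= skip || dir[name_len] != '=')
		return -1;

	val_len = dir_len - skip;     /* subtraction, not addition */
	if (val_len >= sizeof(buf))
		return -1;

	memcpy(buf, dir + skip, val_len);
	buf[val_len] = '\0';          /* explicit trailing NUL byte */

	val = strtol(buf, &end, 10);
	if (end == buf || *end != '\0' || val < 0)
		return -1;
	return val;
}
```

Only the <dir_len> first bytes are considered, so the directive does not
need to be NUL-terminated in the source buffer.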
This patch should be backported to all stable versions.
MAX_PUSH_ID frames are emitted by the client only on the control stream.
These conditions are checked via h3_check_frame_valid() since the
following patch.
e4a5a64198
BUG/MINOR: h3: reject server MAX_PUSH_ID frame
However control stream test was inverted by mistake. This patch fixes
it.
Due to this bug, H3 connections were improperly closed on error by
haproxy for clients which send MAX_PUSH_ID frames. This has been
detected on the QUIC interop with aioquic and neqo clients.
This must be backported up to 3.3.
In qcc_qmux_recv(), when calling qmux_parse_frm(), also treat negative
values as an error, and close the connection. qmux_parse_frm() will
return -1 if the frame is of an invalid type, and we don't want to
process any further, or we will crash.
Move the security contact out of intro.txt into a dedicated, easily
searchable doc/security.txt that points reporters at the threat model
first, and reference it from intro.txt's contacts section and the
documentation index.
Add doc/internals/threat-model.txt describing what does and does not
qualify as a security vulnerability in HAProxy so that reporters and
developers have a common understanding of the threat model, and make it
clear that anything non-critical should be handled in the open and
not hidden behind embargoes.
The document lists assets to protect, what constitutes an attack, what
are the mitigations in place, and the severity ordering of various
risks. This may in the long term also help developers make better
choices of default settings and option names, and may also justify
changing default settings over time when modern operating systems
bring new possibilities.
A section also lists some invariants and defaults in an attempt to
limit the risk of reporting theoretical issues that are technically
impossible to happen in the field.
This is an initial version meant to be refined as cases arise. It
was incrementally designed and cross-checked with the help of three
independent LLMs (Qwen, Gemini and Claude) until each correctly
classified a set of sample reports against it. In the current state
they do not raise any residual ambiguities anymore.
After testing against a few LLMs, it appeared that several entries in
the core principles document were ambiguous or imprecise and could be
misread (size_t, pools, trash, dwcas, comparison, ncbuf). No more
complaint after this rewording so this will be sufficient for now.
Found via cppcheck --force --enable=all --output-file=haproxy.log :
admin/halog/halog.c:1805:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
admin/halog/halog.c:1806:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
admin/halog/halog.c:1809:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
admin/halog/halog.c:1810:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
admin/halog/halog.c:1814:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
Found via cppcheck --force --enable=all --output-file=haproxy.log :
src/ncbmbuf.c:192:9: warning: If memory allocation fails, then there is a possible null pointer dereference: area [nullPointerOutOfMemory]
src/ncbmbuf.c:373:9: warning: If memory allocation fails, then there is a possible null pointer dereference: data [nullPointerOutOfMemory]
src/ncbmbuf.c:546:9: warning: If memory allocation fails, then there is a possible null pointer dereference: data [nullPointerOutOfMemory]
Found via cppcheck --force --enable=all --output-file=haproxy.log :
addons/51degrees/51d.c:130:3: warning: If memory allocation fails, then
there is a possible null pointer dereference: name [nullPointerOutOfMemory]
addons/51degrees/51d.c:922:4: warning: If memory allocation fails, then
there is a possible null pointer dereference: _51d_property_list [nullPointerOutOfMemory]
We can't call the prepare_srv() method too early, because it needs
global.nbthreads to be properly set, which won't be true at post_parse
time. So instead, make it so that code runs later, as a post_check
function, when it will be safe to do so.
This should be backported up to 2.8.
This should fix github issue #3402
Ever since the introduction of multiple cache trees, the "show cache"
CLI command was not properly showing the contents of each tree, but was
only showing the first one.
Fix that by properly resetting next_key when we switch to the next tree.
Should be backported up to 3.0.
In in46un_to_addr(), when copying a struct sockaddr_in6, copy the
sin6_flowinfo and sin6_scope_id, as they are part of the structure too.
They are unlikely to be of any use for us, but this is more correct
anyway.
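A minimal sketch of such a complete copy, using only the standard socket
structures. in6_to_storage() is a hypothetical name and not the actual
in46un_to_addr() implementation.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper, not HAProxy's in46un_to_addr(): builds a full
 * sockaddr_storage from an IPv6 socket address, copying every field of
 * struct sockaddr_in6 rather than only the family, port and address.
 */
void in6_to_storage(const struct sockaddr_in6 *src, struct sockaddr_storage *dst)
{
	struct sockaddr_in6 *out = (struct sockaddr_in6 *)dst;

	memset(dst, 0, sizeof(*dst));
	out->sin6_family   = AF_INET6;
	out->sin6_port     = src->sin6_port;
	out->sin6_addr     = src->sin6_addr;
	out->sin6_flowinfo = src->sin6_flowinfo;
	out->sin6_scope_id = src->sin6_scope_id;
}
```

Zeroing the destination first means that a later copy of the whole
sockaddr_storage only reads initialized memory.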
Very similarly to what was fixed with commit
63f853957a, we cast a sockaddr_in46 in
quic_dgram_parse() to sockaddr_storage while providing source and
destination addresses to qc_handle_conn_migration(), which will then
copy the whole sockaddr_storage, thus reading memory past what was
provided.
While this most likely won't have any impact, let's do the right thing,
and use in46un_to_addr() to generate a real sockaddr_storage.
This does not need to be backported.
EXTRA_MAKE allows sourcing an external makefile to bring new options
that will result in including add-ons etc. It must be part of the
construction of .build_opts that decides whether or not existing .o
are reusable or need to be rebuilt, otherwise we can end up with a mix
of .o built with some options and others with different options.
No backport is needed, as this appeared in 3.4.
Some setups where the number of threads is forced without any binding
(no cpu-map), are quite suspicious if they result in fewer threads than
available CPUs, and not even predictably bound, so we want to notify
the user that this might be an oversight.
Similarly, when thread-groups is forced and not nbthread (and no cpu-map),
and the final number of threads is lower than the hard-limit or the number
of CPUs we also indicate the impact and how to remedy it. This can happen
for example when starting on a machine with more than 64 CPUs and
thread-groups forced to 1, or on more than 128 CPUs and thread-groups
forced to 2 (e.g. when moving an older config to a new platform).
It is possible that some of these conditions might need to be readjusted
in the future to catch other traps or to relax certain commonly used,
valid cases, so for now it is preferable not to backport this patch.
The cpu-policy directive is ignored when nbthreads, thread-groups, or
cpu-map are set. In addition, first-usable-node is ignored when the
process was externally restricted (e.g. taskset). This is difficult to
debug when it happens because multiple parameters come into the mix and
it's easy to forget to unset one. Let's emit a notice when this happens
and the policy was forced. This way, it remains silent with the default
policy, but if it was forced, the incompatibility is reported.
It's worth noting that all the cpu-policy functions take a char **err
but none uses it. It could have been useful here instead of calling
ha_notice() all along, but one needs to determine who the consumers
are and who will be responsible for freeing the message, so let's go
with ha_notice() given that there were already some diag_warnings in
these functions.
It could be helpful to backport this to 3.2.
Since it's easy to get caught by some parameters being ignored, let's
detect when mtpg was explicitly set and report a notice if it is ignored
due to thread-groups being set. For this we need to avoid presetting
the value in the global section and only set it when entering function
thread_detect_count(), which is OK since the value cannot be used before.
As documented, max-threads-per-group is the default number of threads
to arrange in a group before creating another group, and is only meant
to be used when thread-groups is not set.
However it was always enforced, so configs like:
    global
        thread-groups 2
which were sufficient in 3.2 and above to start with 64-128 threads
are now suddenly limited to 32 threads! Let's relax the limit when
thread-groups is set!
No backport is needed since this is only 3.4.
When starting, say, 128 threads with max-threads-per-group set to 2
and MAX_TGROUPS set to the default 32, instead of setting the resulting
number of groups to 32 and threads to 64, they're set to 1 and 32
respectively because the condition to raise grp_min is not satisfied.
Let's cut the condition in two parts to also permit to raise it at
least to grp_max.
This should be backported to 3.2.
It was made from the split of the original one into the SSL and the QUIC
variant. However there's a catch: both use the same certificates which
includes the OCSP URL 127.0.0.1:12345, and both need to start a server
on that port. Depending on the number of parallel processes and their
speed, they might very well work, or totally fail due to a binding
conflict and the fact that the test runs for a few seconds.
Let's disable the QUIC variant for now, since the whole point of the
test is to verify all the sequencing, the SSL one is greatly sufficient.
Maybe a better approach can be found later.
Commit 1c59c39171 deferred hlua_init() to be called lazily from the
config keyword handlers (lua-load, lua-load-per-thread,
lua-prepend-path, tune.lua.openlibs), with a call inside
hlua_post_init() as a safety net for the case where no Lua directive
appears in the configuration at all.
The problem is that hlua_init() is a function that allocates internal
servers (socket_proxy, socket_tcp, socket_ssl) that must exist before
haproxy initializes the configuration. But hlua_post_init() is done too
far after this initialization, so the safety net does not work
correctly.
This would result in a crash in deinit() if no Lua configuration was
loaded in haproxy.
Core was generated by `./haproxy -W -f /dev/null'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005671c72b1047 in _ceb_first (root=0x30, kofs=16, key_type=CEB_KT_U64, key_len=0,
is_dup_ptr=0x7ffc13197a14) at include/import/cebtree-prv.h:1160
1160 if (!*root)
(gdb) bt
#0 0x00005671c72b1047 in _ceb_first (root=0x30, kofs=16, key_type=CEB_KT_U64, key_len=0,
is_dup_ptr=0x7ffc13197a14) at include/import/cebtree-prv.h:1160
#1 _ceb64_first (root=0x30, kofs=16) at src/_ceb_int.c:73
#2 ceb64_ofs_first (root=0x30, kofs=16) at src/_ceb_int.c:66
#3 0x00005671c6be5e6e in srv_close_idle_conns (srv=0x5671fd592a80) at src/server.c:7676
#4 0x00005671c6d3be17 in deinit_proxy (p=0x5671fd5d7780) at src/proxy.c:393
#5 0x00005671c6d3c536 in proxy_drop (p=0x5671fd5d7780) at src/proxy.c:479
#6 0x00005671c6aed998 in hlua_deinit () at src/hlua.c:14934
#7 0x00005671c6db2e41 in deinit () at src/haproxy.c:2846
#8 0x00005671c6db3d98 in deinit_and_exit (status=0) at src/haproxy.c:2966
#9 0x00005671c6db6111 in main (argc=4, argv=0x7ffc131983c8) at src/haproxy.c:3997
The fix is to do the initialization earlier, in a pre-check callback.
Thanks to Amaury for reporting this issue.
No backport needed.
The QUIC congestion control algorithm impacted by this bug is BBR.
In qc_notify_cc_of_newly_acked_pkts(), drs->lost was updated after
quic_cc_drs_on_ack_recv(), causing the current sample's lost count to
miss the bytes_lost from the current loss detection round. This meant
that rs->lost = drs->lost - rs->prior_lost would always be 0 for the
current losses, since both drs->lost and rs->prior_lost (captured at
packet send time) excluded the current bytes_lost.
Moving drs->lost += bytes_lost before on_ack_recv ensures that the
rate sample correctly includes the newly detected lost bytes, matching
the BBR algorithm's intent where C.delta_lost = C.lost - C.prior_lost
should reflect all losses since the last sample.
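The effect of the ordering can be sketched with simplified stand-ins. The
structures and function names below are hypothetical and only keep the
fields named above.

```c
#include <stdint.h>

/* Simplified stand-ins for the delivery rate sampling state; the field
 * names follow the text above but the structures are hypothetical.
 */
struct drs { uint64_t lost; };          /* cumulative bytes declared lost */
struct rate_sample { uint64_t prior_lost; uint64_t lost; };

/* Stand-in for the on-ACK sampling step: losses since the packet was sent. */
static void sample_on_ack(const struct drs *drs, struct rate_sample *rs)
{
	rs->lost = drs->lost - rs->prior_lost;
}

/* Old ordering: sample first, account for the new losses afterwards. */
uint64_t sample_then_update(struct drs *drs, struct rate_sample *rs, uint64_t bytes_lost)
{
	sample_on_ack(drs, rs);
	drs->lost += bytes_lost;
	return rs->lost;
}

/* Fixed ordering: account for the new losses, then sample. */
uint64_t update_then_sample(struct drs *drs, struct rate_sample *rs, uint64_t bytes_lost)
{
	drs->lost += bytes_lost;
	sample_on_ack(drs, rs);
	return rs->lost;
}
```

With the old ordering the sample misses the bytes_lost of the current
round, while the cumulative counter ends up identical in both cases.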
Must be backported as far as 3.1 where delivery rate sampling was
implemented.
When exiting the recovery period and re-entering congestion avoidance,
the consecutive_losses counter was not reset. This meant that if a loss
event arrived immediately after the ACK that ended recovery, the counter
would still hold the value that triggered recovery, causing an immediate
re-entry into recovery (recovery -> CA -> recovery loop).
Resetting consecutive_losses to 0 on recovery exit matches the behavior
of resetting it on ACK in CA, ensuring a clean slate for the new
congestion avoidance period.
Must be backported to all versions.
The cubic slow_start callback was only resetting the internal cubic state
without reducing the congestion window, unlike newreno which calls
quic_cc_path_reset(). Per RFC 9002, persistent congestion should trigger
both entry into slow start and a reduction of the congestion window.
Must be backported to all versions.
Allocating and freeing an OpenSSL EVP_PKEY_CTX context via
EVP_PKEY_CTX_new_id() and EVP_PKEY_CTX_free() on every HKDF cryptographic
operation (such as during stateless reset token generation) induces
unnecessary memory allocation overhead.
Optimize this by introducing a global per-thread context array
'quic_tls_hkdf_ctxs'. These contexts are allocated and initialized once
at startup via a POST_CHECK hook (quic_tls_alloc_hkdf_ctxs) and are
properly freed at exit via a POST_DEINIT hook (quic_tls_dealloc_hkdf_ctxs).
The functions quic_hkdf_extract(), quic_hkdf_expand(), and
quic_hkdf_extract_and_expand() now reuse the pre-allocated context
corresponding to the current thread ID ('tid'), removing dynamic
allocations from these frequent execution paths.
As a cleanup, quic_hkdf_expand() is now static and unexported from the
header file.
Should be easily backported to all versions for optimization purposes.
In quic_insert_new_range(), the variable 'first' is a struct eb64_node*,
but pool_free expects a struct quic_arng_node*. While the addresses are identical
(since 'first' is the first member of quic_arng_node), this is technically
incorrect and should use eb64_entry() for proper type safety.
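For illustration, the general technique looks like this. The structures and
the NODE_ENTRY() macro are hypothetical stand-ins for struct eb64_node,
struct quic_arng_node and eb64_entry().

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct eb64_node and struct quic_arng_node. */
struct tree_node { unsigned long long key; };
struct range_node {
	struct tree_node first;   /* embedded node, first member */
	unsigned long long last;
};

/* Generic container_of-style macro: recover the enclosing structure
 * from a pointer to one of its members.
 */
#define NODE_ENTRY(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct range_node *range_from_node(struct tree_node *n)
{
	return NODE_ENTRY(n, struct range_node, first);
}
```

The result has the type the allocator expects, and it stays correct even if
the embedded node is later moved away from the first position.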
Must be backported to all versions.
When a backend connection is reused, qcm_strm_attach() callback is used.
A BUG_ON() is present to ensure that the connection is not already on
error. This should be guaranteed by the fact that idle insertion is
skipped for such connections.
However, when a connection is flagged on error, it is not immediately
removed from its idle/avail pool. Thus, there is a risk that it is
reused, triggering the aforementioned BUG_ON() statement.
This issue should be avoided via avail_streams callback which should
return 0, forcing the caller to cancel the connection reuse. In QUIC,
this callback implementation relies on internal qcc_be_is_reusable().
However, it lacked checks for error status.
To fix this, extend qcc_be_is_reusable() to properly check connection
errors or an expired timeout.
Previously, these parameters were already checked by qcc_is_dead(). As
it also relies on qcc_be_is_reusable(), this patch also rearranges it to
avoid duplicate checks for backend connections.
This should be backported up to 3.3.
When QUIC application layer is shut for a backend connection, the
connection is immediately removed from its idle pool. This is a nice
optimization as this prevents future streams from trying to reuse an
unusable connection. This has been implemented since the following commit.
00d668549e
MINOR: mux-quic: do not reuse connection if app already shut
However, this removal is not correctly performed as it uses
conn_delete_from_tree(). For private connections, this can cause crashes
as they are stored in the session instead. Thus, the connection status is
now properly checked, and session_unown_conn() is used instead if the
connection is stored in the session.
This must be backported up to 3.3.
On the backend side, a QCS may be opened but reset immediately. No
STREAM frame will be emitted prior to the RESET_STREAM. When the latter
is sent, qcs_close_local() will mark the QCS Tx channel as closed.
In this case, a BUG_ON() would be triggered as the QCS Tx channel is
not yet marked as opened. To prevent this, add a qcs_idle_open() call
when the stream is reset, but only for the backend side.
This should be backported up to 3.3.
Move the WURFL Makefile part to addons/wurfl/Makefile.mk so it can be
used with EXTRA_MAKE and allow cleaning up the main Makefile.
This shouldn't have any impact on the build system, as all build
variables previously used remain the same.
Move the deviceatlas Makefile.inc to Makefile.mk so it can be used with
EXTRA_MAKE and allow cleaning up the main Makefile.
EXTRA_MAKE paths are appended with /Makefile.mk via addsuffix, so the
path must not have a trailing slash.
This shouldn't have any impact on the build system, as all build
variables previously used remain the same.
Move the 51degrees Makefile part to addons/51degrees/Makefile.mk so it
can be used with EXTRA_MAKE and allow cleaning up the main Makefile.
EXTRA_MAKE paths are appended with /Makefile.mk via addsuffix, so the
path must not have a trailing slash.
This shouldn't have any impact on the build system, as all build
variables previously used remain the same.
When DATA frames are received, we take care to update the counter used to
send WINDOW_UPDATE for the connection. It is also performed on the error
path when DATA frames are processed. However, when this happened, only the
frame length was accounted while the padding must also be considered.
To fix the issue, the full frame length (h2c->dfl), which includes the
padding length, must be added to the amount of newly received data
(h2c->rcvd_c).
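A simplified sketch of the accounting follows. The structure and helpers are
hypothetical and only model the two counters named above, assuming <dfl>
holds the whole frame payload length including the Pad Length byte and the
padding.

```c
#include <stdint.h>

/* Hypothetical, simplified connection state; only the two counters
 * named in the text are modelled.
 */
struct h2_conn_sketch {
	uint32_t dfl;      /* full length of the current frame payload */
	uint32_t rcvd_c;   /* bytes to acknowledge via WINDOW_UPDATE */
};

/* Returns the number of data bytes carried by a padded DATA frame whose
 * payload is <dfl> bytes long (Pad Length byte + data + padding), or -1
 * if the padding length is not smaller than the payload.
 */
long h2_padded_data_len(uint32_t dfl, uint8_t padlen)
{
	if (padlen >= dfl)
		return -1;
	return (long)dfl - 1 - padlen;
}

/* Accounts for a rejected DATA frame: the whole payload counts against
 * the connection window, padding included.
 */
void h2_account_rejected_data(struct h2_conn_sketch *c)
{
	c->rcvd_c += c->dfl;
}
```

For a 100-byte payload with 10 bytes of padding, only 89 bytes are data but
all 100 bytes must be returned to the peer's connection window.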
The issue was introduced with commit eeacca75d ("BUG/MINOR: mux-h2: count
rejected DATA frames against the connection's flow control") and backported
to 2.8.
So this patch must be backported as far as 2.8.
These tests were using "tune.lua.openlibs none" with lua-load, which
was a no-op in the old code since Lua states 0 and 1 were always
initialised before config parsing with all standard libraries.
Now that the Lua VM is initialised lazily, the restriction correctly
applies to state 0 as well. Replace "none" with the minimal set of
libraries actually required by each test's Lua code:
- lua_socket.vtc, h_txn_get_priv.vtc, lua_httpclient.vtc: string
- txn_get_priv.vtc: string,table
HAProxy used to call hlua_init() unconditionally from step_init_1(),
before any configuration file was parsed. As a consequence, Lua states
0 and 1 were always created with hlua_openlibs_flags set to its default
value (HLUA_OPENLIBS_ALL), regardless of any tune.lua.openlibs directive
that appeared later in the global section. With multiple threads, states
2..N were created correctly in hlua_post_init() after the config had been
parsed, while states 0 and 1 retained the full standard-library set.
This produced the observable bug reported in GitHub issue #3396: a script
loaded with lua-load-per-thread could see require() as a function on
thread 1 but nil on thread 2 when tune.lua.openlibs was used to restrict
the available libraries.
The initialisation is now lazy. hlua_init() is idempotent: it returns
immediately if the states already exist (hlua_states[0] != NULL). It is
called explicitly from the three config keyword handlers that need the
Lua states to be live before they can do their work (lua-load,
lua-load-per-thread, lua-prepend-path) and from tune.lua.openlibs, after
the hlua_openlibs_flags variable has been updated, so that the states are
always created with the correct library set.
hlua_post_init() calls hlua_init() unconditionally as a safety net,
covering the case where no Lua directive appeared in the configuration at
all (no global section, or only pure-tuning directives such as timeouts
and memory limits), and ensuring correct behaviour with multiple
consecutive global sections.
As a result of this change, tune.lua.openlibs must now appear before
lua-load, lua-load-per-thread, and lua-prepend-path in the configuration;
if any of those keywords is encountered first, the Lua states will already
be initialised and tune.lua.openlibs with a non-default value will return
a parse error.
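The ordering constraint can be sketched with a small model. All names below
are hypothetical, and the "state" is reduced to the flags it was created
with; this is not the hlua code.

```c
#define OPENLIBS_ALL 0xffu          /* hypothetical default flag set */

static unsigned int openlibs_flags = OPENLIBS_ALL;
static int state_ready;             /* non-zero once the state exists */
static unsigned int state_flags;    /* flags the state was created with */

/* Idempotent: creates the state on first call, does nothing afterwards. */
void lazy_init(void)
{
	if (state_ready)
		return;
	state_flags = openlibs_flags;
	state_ready = 1;
}

/* Handler for the tuning directive. It updates the flags first, then
 * creates the state. Returns 0 on success, -1 for a parse error when
 * the state already exists and a non-default value is requested.
 */
int set_openlibs(unsigned int flags)
{
	if (state_ready)
		return (flags == OPENLIBS_ALL) ? 0 : -1;
	openlibs_flags = flags;
	lazy_init();
	return 0;
}

/* Handler for a directive that needs a live state, such as loading a
 * script. Returns the flags the state was created with.
 */
unsigned int load_script(void)
{
	lazy_init();
	return state_flags;
}
```

Calling set_openlibs() before load_script() yields a state created with the
restricted set, while the reverse order makes set_openlibs() fail.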
No backport needed.
When deallocating the QUIC datagram handlers, the per-thread buffer
allocated inside quic_dghdlrs[i].buf.buffer was missing a free().
This led to a memory leak on exit or reload.
Fix this by freeing each thread buffer before releasing the main
quic_dghdlrs array.
Unlike the detection performed during sendto() for an unreachable peer,
ECONNREFUSED was not handled when received via recvmsg() as an ICMP
"host unreachable" message.
This patch tracks ECONNREFUSED errors on the receive path.
Note that this detection is entirely dependent on the remote host effectively
sending an ICMP "host unreachable" message and on the absence of any network
filtering (e.g., firewalls) that would drop such ICMP packets. Without
receiving this ICMP signal, the connection state cannot be updated through
this mechanism.
At a higher level, similar to how this error is handled on sendto(),
the connection is now terminated as soon as possible by calling
qc_kill_conn(). This triggers a call to qc_notify_err(). When the mux
does not exist, it attempts to create one via conn_create_mux(). While
the latter systematically fails if the connection is flagged with
CO_FL_ERROR, it has the useful side effect of waking the stconn stream
attached to the connection during a session opening without a mux
(e.g., for H3).
This issue was caught by haload (upcoming tool).
Must be backported as far as 2.6 because it impacts both the QUIC
frontends and backends.
QPACK_LFL_WLN_BIT and related encoded field line bitmasks were defined
in both qpack-enc.c and qpack-dec.c. Moved them to qpack-t.h where
they are shared between encoder and decoder, eliminating the duplicate
definitions.
Should be backported to ease any further commit to come.
The <nlen> variable is a signed integer, but the check for a Huffman
decoding error was written as 'nlen == (uint32_t)-1'.
With standard compiler type promotion rules, this comparison happens to
work as intended when huff_dec() returns -1. However, relying on implicit
unsigned promotions for signed error checking is fragile. If a compiler
applies different promotion semantics, or if huff_dec() returns any other
negative error code, the failure would go undetected, leading to buffer
corruption or a crash via b_add() and ist2().
Fix this by using 'nlen < 0', removing any ambiguity regardless of the
compiler used.
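The difference between the two checks can be shown in isolation. The
function names are hypothetical, and the explicit cast reproduces what the
implicit conversion does when int is 32 bits wide.

```c
#include <stdint.h>

/* Old-style check: only catches the exact value -1, through the
 * conversion of <nlen> to an unsigned type.
 */
int is_error_old(int nlen)
{
	return (uint32_t)nlen == (uint32_t)-1;
}

/* New-style check: catches any negative return value. */
int is_error_new(int nlen)
{
	return nlen < 0;
}
```

Any negative value other than -1 slips through the old check but is caught
by the new one.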
Must be backported to all versions.
In qpack_decode_fs(), inside the QPACK_LFL_WLN_BIT branch (Literal field
line with literal name), the debug message printed "[name huff ...]" instead
of "[value huff ...]" after decoding the value string.
This is a harmless copy-paste typo from the preceding name decoding block.
Even if this is a cleanup, should be easily backported to ease any further
backport.
The sign bit of the Delta Base integer encoding was extracted using
mask 0x8 (bit 3) instead of 0x80 (bit 7). This was likely a copy-paste
error from other QPACK instructions using 3-bit varints.
According to RFC 9204 Section 5.2.1, for prefix instructions, the sign
bit 'S' is the most significant bit (bit 7) of the first byte, followed
by a 7-bit varint.
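A minimal sketch of this decoding, assuming the buffer starts at the byte
carrying the 'S' bit and using the prefixed integer encoding of RFC 7541
section 5.1. decode_delta_base() is a hypothetical name, not the HAProxy
function.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes the sign bit and the Delta Base from <buf>, assuming it starts
 * at the byte carrying the 'S' bit. The integer uses a 7-bit prefix.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or the value would not fit in 64 bits.
 */
size_t decode_delta_base(const uint8_t *buf, size_t len, int *sign, uint64_t *value)
{
	size_t pos;
	unsigned int shift = 0;
	uint64_t v;

	if (len == 0)
		return 0;

	*sign = (buf[0] & 0x80) >> 7;   /* bit 7, not bit 3 */
	v = buf[0] & 0x7f;              /* 7-bit prefix */
	pos = 1;

	if (v == 0x7f) {
		/* prefix saturated: continuation bytes, 7 bits each */
		for (;;) {
			uint8_t b;

			if (pos >= len || shift > 56)
				return 0;
			b = buf[pos++];
			v += (uint64_t)(b & 0x7f) << shift;
			shift += 7;
			if (!(b & 0x80))
				break;
		}
	}
	*value = v;
	return pos;
}
```

With the 0x8 mask, a first byte of 0x08 (positive, Delta Base 8) would have
been reported as negative, and 0x85 (negative, Delta Base 5) as positive.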
This fix is harmless for current HTTP/3 traffic: per RFC 9204, the Delta
Base calculation is strictly used for dynamic table entry references.
Since HAProxy's QPACK dynamic table is currently disabled and the extracted
sign bit is not yet used in the decoding logic (only in debug prints),
this code path has no impact on production for now.
Must be backported to all versions.
In qpack_decode_fs(), when decoding a literal field line with a literal
value, the debug message mistakenly printed "[name huff ...]" instead of
"[value huff ...]" after a successful Huffman decoding of the value string.
This is a harmless copy-paste typo from the field name decoding block
just above, fix it to prevent confusion when debugging QPACK streams.
Should be easily backported to all versions to ease further modifications
into the QPACK code.
When defragmenting the QPACK dynamic header table upfront during an
insertion, qpack_dht_defrag() can fail and return NULL if memory
allocation or re-allocation fails.
However, qpack_dht_insert() was blindly using the returned pointer
without validation, immediately leading to a null-pointer dereference
on 'dht->wrap'.
Fix this by checking if 'dht' is NULL after the defrag call and return
an error (-1).
Note that this has no impact on production yet because the QPACK dynamic
table is currently not enabled/used, so qpack_dht_insert() is never called.
Should be easily backported to all versions.
Although qpack_idx_to_name and qpack_idx_to_value are currently only
called within uncompiled debug code, they contained an index bug. They
passed absolute indexes directly to qpack_get_dte instead of relative
dynamic table indexes.
This patch fixes the logic by subtracting QPACK_SHT_SIZE and guarding
against static table index lookups.
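The index conversion can be sketched as follows, assuming absolute indexes
list the static entries first. SHT_SIZE and abs_to_dyn_index() are
illustrative names; 99 is the number of static table entries listed in
RFC 9204 Appendix A.

```c
/* Assumed size of the QPACK static table: RFC 9204 Appendix A defines
 * entries 0 to 98. The macro name is illustrative.
 */
#define SHT_SIZE 99

/* Converts an absolute index (static entries first, dynamic entries
 * after) into a dynamic table index. Returns -1 for static entries,
 * which must not be looked up in the dynamic table.
 */
int abs_to_dyn_index(unsigned int idx)
{
	if (idx < SHT_SIZE)
		return -1;
	return (int)(idx - SHT_SIZE);
}
```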
Should be easily backported to all versions.
This reverts commit fefce297ab.
The commit broke the resolvers. All responses are marked as invalid. The
resolv_read_name() function can return 0 on error, but it seems also
possible to return 0 when no label name was found. And depending on the
caller, it can be an error... or not.
So, let's revert it. This might trigger a watchdog but doesn't seem to and
once fixed it makes things worse.
Must be backported as far as 2.4.
Initial max_record_size is set to 16382. If the first received record
size is larger, abort xprt_qmux layer immediately without having to wait
for the timeout.
No need to backport.
With QMux, each peer has to first emit a transport parameters frame. If
the received frame is different, xprt_qmux handshake cannot proceed.
This patch removes the BUG_ON() in this case, replacing it with a safer
connection closure.
In the future, a graceful close with CONNECTION_CLOSE frame should be
implemented.
No need to backport.
Remove BUG_ON() when reading a QMux record larger than the buffer. It is
now replaced by a safer error handling. In the future, a proper
CONNECTION_CLOSE emission should be implemented for this case.
No need to backport.
When a policy is set, and the number of threads is calculated
dynamically, make sure we enforce thread-hard-limit, and do not create
thread groups based on how many threads we would have created without
the limit.
This should be backported to 3.3 and 3.2. The patch won't apply cleanly
there, because the code has changed since then, but it should be very
similar, only we'll have to check "cpu_count" there, where in 3.4 we
check "thr_count".
commit 72fd357814 ("MEDIUM: mux-h1: Return an error on h2 upgrade
attempts if not allowed") added an h1_report_glitch() call on the new
405 path but exits via "goto no_parsing", which skips the
session_add_glitch_ctr() call at the end of the parse block. As a
result fc_glitches increments correctly but the per-session stick
counters never see it, breaking sc_glitch_cnt-based rate limiting of
the H2-preface-over-H1 abuse pattern.
No backport needed beyond the branches that took 72fd357814.
[cf: Patch was edited to move the goto label instead of duplicating
the call to session_add_glitch_ctr]
ssl_sock_add_san_ext() builds the Subject Alternative Name extension by
concatenating "DNS:" + servername and passing the result to
X509V3_EXT_nconf_nid(). OpenSSL's nconf parser splits the value string on
commas into multiple type:value SAN entries. The SNI comes from unauthenticated
TLS ClientHello data -- an attacker can embed commas and colons (e.g.,
"host,dns:internal.corp,ip:10.0.0.1") to inject arbitrary GENERAL_NAME entries
into certificates signed by HAProxy's configured CA.
This is a CA issuance-policy violation: the operator expects one certificate
per SNI hostname, but an attacker can obtain certificates containing additional
hostnames/IPs/emails without access to the CA private key.
Fix by adding ssl_sock_sni_is_valid() that validates the SNI contains only
DNS-label-legal characters (alphanumeric, hyphens, dots). The check is
performed at the start of ssl_sock_do_create_cert() before any allocation.
Commas, colons, spaces, and other special characters cause certificate
generation to fail, preventing SAN injection while allowing all valid
hostname values.
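A character filter of this kind could look like the sketch below.
sni_is_valid() is a hypothetical stand-in, not the actual
ssl_sock_sni_is_valid() code, and it does not enforce label or total length
limits.

```c
/* Hypothetical validator in the spirit of the check described above:
 * accepts only ASCII letters, digits, hyphens and dots, and rejects
 * NULL or empty names.
 */
int sni_is_valid(const char *sni)
{
	const char *p;

	if (!sni || !*sni)
		return 0;

	for (p = sni; *p; p++) {
		char c = *p;

		if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
		    (c >= '0' && c <= '9') || c == '-' || c == '.')
			continue;
		return 0;
	}
	return 1;
}
```

The example value from the text, "host,dns:internal.corp,ip:10.0.0.1", is
rejected on its first comma, before anything reaches the SAN builder.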
Must be backported to every maintained branch.
tcpcheck_ldap_expect_bindrsp() parses ASN.1 BER-encoded LDAP responses from
the health check target. After reading the outer message size and validating
protocol fields, it encounters a long-form BER length for the bindResponse
value (high bit set in the length byte). The code reads nbytes = (*ptr &
0x7f) then advances ptr by 1 + nbytes without checking that enough bytes
remain in the receive buffer. So, it is possible to read more data than
available.
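A bounded BER length reader can be sketched as follows. ber_read_length() is
a hypothetical helper that decodes the length instead of merely skipping it,
so it differs from the tcpcheck code; the point is the check on the
remaining bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a BER length at <*pos> in <buf> of <len> bytes. Handles the short
 * form (high bit clear) and the long form (high bit set, low 7 bits give
 * the number of length bytes). Returns 0 and advances <*pos> on success,
 * -1 if the buffer is too short or the length does not fit in 32 bits.
 */
int ber_read_length(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out)
{
	size_t p = *pos;
	size_t nbytes;
	uint32_t val = 0;

	if (p >= len)
		return -1;

	if (!(buf[p] & 0x80)) {
		*out = buf[p];
		*pos = p + 1;
		return 0;
	}

	nbytes = buf[p] & 0x7f;
	p++;

	/* the check that was missing: are <nbytes> bytes really available? */
	if (nbytes == 0 || nbytes > sizeof(val) || nbytes > len - p)
		return -1;

	while (nbytes--)
		val = (val << 8) | buf[p++];

	*out = val;
	*pos = p;
	return 0;
}
```

On failure <*pos> is left untouched, so the caller can report the error
without having moved past the end of the received data.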
Note that this is only possible if the LDAP response was forged, because
the message length was already checked. LDAP responses remain quite short
and it is not possible to read outside the buffer area. So at worst, garbage
is parsed and a wrong result is reported by the LDAP health-check. Most
probably an error will be reported.
This patch could be backported to all stable versions.
Released version 3.4-dev14 with the following main changes :
- MINOR: config: shm-stats-file is no longer experimental
- BUILD: proxy: unstatify the proxies_del_lock to avoid a warning without threads
- BUG/MEDIUM: net_helper: fix a remaining possibly infinite loop in converters
- MINOR: ssl_sock: remove unneeded check on QMux flags
- MINOR: connection: define xprt_add_l6hs()
- MINOR: xprt_qmux: define default value for get_alpn
- MINOR: connection: define mask CO_FL_WAIT_XPRT_L6
- MINOR: session: support QMux in clear on FE side
- MINOR: backend: support QMux in clear for BE side
- BUG/MINOR: ocsp: Manage date too far away in the future
- MINOR: mux_quic: handle STOP_SENDING in QMux
- MINOR: mux_quic: handle MAX_STREAMS for uni stream in QMux
- MINOR: mux_quic: do not crash on unhandled QMux frame reception
- BUG/MEDIUM: applet: Properly handle receives of size 0
- BUG/MEDIUM: resolvers: Fix test on dn label size in resolv_dn_label_to_str()
- BUG/MEDIUM: ssl-gencert: Unlock LRU cache if failing to generate certificate
- BUG/MINOR: quic: fix ODCID lookup from derived value
- BUG/MEDIUM: dict: hold lock while decrementing refcount in dict_entry_unref
- BUG/MINOR: tcpchecks: Limit parsing of agent-check reply to the buffer
- BUG/MEDIUM: hlua: Fix integer underflow when receiving line from lua cosocket
- BUG/MEDIUM: cli: Fix parsing of pattern finishing a command payload
- BUG/MEDIUM: acme: NUL terminate response buffer before PEM parsing
- BUILD: intops: mask the fail value in array_size_or_fail()
- BUG/MEDIUM: log-forward: make sure the month is unsigned
- BUG/MEDIUM: regex: allocate a large enough pcre2 match for all matches
- BUG/MEDIUM: tcpcheck/spoe: bound the SPOP error code to valid values
- BUG/MEDIUM: cache: fix a refcount leak for missed secondary entries
- BUG/MINOR: log: free logformat expr on compile failure in cfg_parse_log_profile
- BUG/MINOR: resolvers: fix room for trailing zero in resolv_dn_label_to_str()
- BUG/MINOR: resolvers: fix risk of appending garbage past the domain name
- BUG/MINOR: mux-h2: validate HEADERS frame length before reading stream dep
- BUG/MINOR: log: look for the end of priority before the end of the buffer
- BUG/MINOR: dict: fix refcount race on insert collision
- BUG/MINOR: init: use more than ha_random64() for the cluster secret
- BUG/MINOR: sample: limit the be2hex converter's chunk size
- CLEANUP: resolvers: use read_n32() instead of open-coded big-endian read
- CLEANUP: resolvers: remove pool_free(NULL) in SRV additional record matching
- CLEANUP: resolvers: fix comment typos and wrong filenames in file headers
- BUG/MINOR: haterm: fix the random suffix multiplication
- MINOR: haterm: enable h3 for TCP bindings
- MINOR: haterm: do not emit a warning when not using SSL
- BUG/MEDIUM: h1: drop headers whose names contain invalid chars
- BUG/MEDIUM: h1: limit status codes to 3 digits by default
- BUG/MEDIUM: cache: always verify the primary hash in get_secondary_entry()
- BUG/MINOR: cache: also recognize directives in the form "token="
- BUG/MINOR: resolvers: relax size checks in authority record parsing
- BUG/MINOR: sample: request an extra output byte for the url_dec converter
- BUG/MINOR: http-fetch: check against the whole token in get_http_auth()
- BUG/MEDIUM: acme: protect against risk of null-deref on connection failure
- BUG/MINOR: http-ext: always check remaining data when reading rfc7239 nodeport
- BUG/MINOR: base64: return empty string for empty input in base64dec()
- BUG/MINOR: payload: fix the handshake length bounds check smp_client_hello_parse()
- BUG/MINOR: ssl-hello: make use of the null-terminated servername
- BUG/MINOR: resolvers: switch to a better PRNG for query IDs
- BUG/MINOR: addons/51d: NUL-terminate headers before passing them to Trie API
- BUG/MEDIUM: tools: insert an XXH64 layer on the PRNG output
- MINOR: tools: provide a function to generate a hashed random pair
- MEDIUM: init: fall back to ha_random64_pair_hashed() for the cluster secret
- MEDIUM: tools: use the hashed random pair for UUID generation
- MEDIUM: h1: use ha_random64_pair_hashed() for the WebSocket key
- MEDIUM: quic: use ha_random64_pair_hashed() to generate the QUIC retry tokens
- MEDIUM: tools: switch the main PRNG to a thread-local xoshiro256**
- BUG/MEDIUM: h3: reject client push stream
- BUG/MINOR: h3: reject server push stream
- BUG/MINOR: h3: reject client CANCEL_PUSH frame
- BUG/MINOR: h3: adjust error on PUSH_PROMISE frame reception
- BUG/MINOR: h3: reject server MAX_PUSH_ID frame
- BUG/MEDIUM: auth: fix unconfigured password NULL deref
- BUG/MINOR: h3: add missing break on rcv_buf()
- BUG/MINOR: hlua: prevent Lua from passing CR/LF/NUL in HTTP headers
- BUG/MINOR: qmux: do not crash on frame parsing issue
- BUG/MINOR: quic: reject packet too short for HP decryption
- BUG/MINOR: jwe: enforce GCM tag length to 128 bits
- BUG/MEDIUM: jwe: substitute random CEK on RSA1_5 decryption failure per RFC 7516 #11.5
- BUG/MEDIUM: mux-fcgi: reject stream ID 0 for application records
- MINOR: http: Add function to remove all occurrences of a value in a header
- MINOR: h1: Add a H1M flag to specify a non-empty 'Upgrade:' header was parsed
- BUG/MEDIUM: h1-htx: Sanitize parsing to properly handle upgrade requests
- BUG/MINOR: mux-fcgi: Use relative offset to compute contig data in demux buf
- BUG/MINOR: mux-spop: Use relative offset to compute contig data in demux buf
- CLEANUP: mux-fcgi/mux-spop: Remove copy/pasted comment about slow realign
A comment about the condition to perform a slow realign of the demux buffer
was abusively copy/pasted from the FCGI multiplexer at different places in
the FCGI and SPOP multiplexers. Let's remove these comments.
b_contig_data() should be called with a head-relative offset (0 for the
beginning of readable data). However, in the SPOP multiplexer, to get
contiguous data available in the demux buffer, it is called with
b_head_ofs(dbuf) which returns an absolute buffer position (b->head). So
b->head is counted twice. Because of this bug, the demux buffer could be
realigned while it should not and conversely.
Instead, the offset 0 must be used. So let's fix it.
This patch must be backported as far as 3.2.
b_contig_data() should be called with a head-relative offset (0 for the
beginning of readable data). However, in the FCGI multiplexer, to get
contiguous data available in the demux buffer, it is called with
b_head_ofs(dbuf) which returns an absolute buffer position (b->head). So
b->head is counted twice. Because of this bug, the demux buffer could be
realigned while it should not and conversely.
Instead, the offset 0 must be used. So let's fix it.
This patch must be backported as far as 2.4.
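The effect of passing an absolute position where a head-relative offset is
expected can be reproduced with a simplified ring buffer. This is only a
sketch: struct ring and ring_contig_data() are hypothetical stand-ins, not
the real buffer API, and they merely mimic the head-relative convention
described above.

```c
#include <stddef.h>

/* Simplified stand-in for a ring buffer; names are illustrative only. */
struct ring {
    size_t size; /* total storage size */
    size_t head; /* absolute position of the first readable byte */
    size_t data; /* number of readable bytes */
};

/* Returns the number of bytes readable without wrapping, starting <ofs>
 * bytes after the head. <ofs> is head-relative: 0 designates the first
 * readable byte.
 */
size_t ring_contig_data(const struct ring *b, size_t ofs)
{
    size_t pos, to_wrap, left;

    if (ofs >= b->data)
        return 0;
    pos = (b->head + ofs) % b->size; /* the head is added here, once */
    to_wrap = b->size - pos;
    left = b->data - ofs;
    return to_wrap < left ? to_wrap : left;
}
```

With size=16, head=4 and data=10, offset 0 reports 10 contiguous bytes (no
wrapping), while passing the head itself as the offset reports only 6, which
would wrongly suggest that the data wrap.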
Thanks to previous patches, the request messages are now sanitized to
properly handle Upgrade requests. Now, if a 'connection: upgrade' header
value was found with no 'Upgrade' header, the 'upgrade' value is removed
from the 'connection' header. The opposite is also performed: if an
'Upgrade' header was found but no "connection: upgrade" header value, all
occurrences of the 'Upgrade' header are refused.
This patch depends on following ones:
* MINOR: h1: Add a H1M flag to specify a non-empty 'Upgrade:' header was parsed
* MINOR: http: Add function to remove all occurrences of a value in a header
It should fix the issue 3397. But the H2 part should be reviewed too, and
probably the H1 response parsing, to be consistent with this change.
The series should be backported as far as 2.4.
http_remove_header_value() function was added to parse a header value and
remove all occurrences of a specific value.
This patch is mandatory to fix a bug.
Records with a stream ID set to 0 are reserved to management records.
However there was no check to trigger an error if an application record is
received with a stream ID set to 0. This could lead to a crash because
management streams (which are static and immutable) can be modified while
processing application records (STDOUT/STDERR/END_REQUEST).
To fix the issue, an error is returned if the stream ID 0 is set on
GET_VALUES_RESULT or UNKNOWN_TYPE records.
This patch must be backported to all stable versions.
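The general rule can be sketched as follows. This is an illustration based
on the FastCGI specification (management records use request ID 0,
application records use a non-zero one), not the actual mux-fcgi code. The
record type values come from the specification, and
fcgi_record_id_is_valid() is a hypothetical helper.

```c
#include <stdint.h>

/* Record types and values as defined by the FastCGI specification. */
enum fcgi_rec_type {
    FCGI_BEGIN_REQUEST     = 1,
    FCGI_ABORT_REQUEST     = 2,
    FCGI_END_REQUEST       = 3,
    FCGI_PARAMS            = 4,
    FCGI_STDIN             = 5,
    FCGI_STDOUT            = 6,
    FCGI_STDERR            = 7,
    FCGI_DATA              = 8,
    FCGI_GET_VALUES        = 9,
    FCGI_GET_VALUES_RESULT = 10,
    FCGI_UNKNOWN_TYPE      = 11,
};

/* Hypothetical helper: returns 1 if request ID <id> is acceptable for a
 * record of type <type>, 0 if the record must be treated as an error.
 */
int fcgi_record_id_is_valid(uint8_t type, uint16_t id)
{
    switch (type) {
    case FCGI_GET_VALUES:
    case FCGI_GET_VALUES_RESULT:
    case FCGI_UNKNOWN_TYPE:
        return id == 0; /* management records */
    default:
        return id != 0; /* application records */
    }
}
```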
do_decrypt_cek_rsa() calls EVP_PKEY_decrypt with RSA_PKCS1_PADDING for
RSA1_5 and returns failure (goto end) on decrypt error. This creates a
measurable timing difference between "padding invalid" (fast exit before
content decryption) and "padding valid + AEAD tag fail" (full AES-GCM/CBC
decryption path), exposing the RSA private key to a Bleichenbacher-style
adaptive attack requiring ~10^4-10^6 queries.
Fix: On RSA_PKCS1_PADDING failure, fill decrypted_cek with random bytes
of the buffer size and return success (retval=0). This forces execution
into decrypt_ciphertext() regardless of padding validity, so the attacker
cannot distinguish valid from invalid padding via timing. The AEAD tag
check in decrypt_ciphertext() will still reject the wrong CEK, but the
timing profile is identical for both branches.
RSA-OAEP variants are not affected (mathematically infeasible to craft
valid ciphertext without the private key).
Introduced by RSA1_5 path lacking constant-time fallback.
Two fixes addressing cryptographic and parsing correctness issues:
1. Enforce 16-byte GCM authentication tag in decrypt_ciphertext()
The base64url-decoded 5th JWE component (authentication tag) was passed
directly to EVP_CTRL_AEAD_SET_TAG with its attacker-controlled length.
OpenSSL accepts 1-16 byte GCM tags and only verifies that many bytes, so
a 1-byte tag reduces forgery work factor to ~256. RFC 7518 mandates 128-bit
(16 byte) tags for A*GCM. The CBC-HMAC path already enforced correct length,
confirming this was an oversight.
Fix: Add (*aead_tag)->data != 16 check before the GCM branch in
decrypt_ciphertext(), rejecting any non-16-byte tag.
Introduced by 416b87d5db (JWE A*GCM support).
2. Enforce 16-byte GCMKW tag in parse_jose() decode_jose_field()
The $.tag field from the attacker-supplied protected header in A*GCMKW
key-wrap was similarly decoded without length enforcement. Fix: Add a
size != 16 check for fields named ".tag" in decode_jose_field() when
called from the GCMKW path.
Introduced by 026652a7eb (GCMKW tag field parsing).
Header protection can only be performed on a packet of a minimal size.
There was already a check for this in qc_do_rm_hp() but it did not use
the correct value.
Fix this by using the correct minimal size which is 20 bytes starting
from the packet number offset. This is enough to decrypt 4 bytes (PN max
size) and 16 bytes of IV. If the packet is not big enough, it is
still silently discarded.
This must be backported up to 2.6.
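The size condition described above can be sketched like this. The helper
name and the two constants are hypothetical and only restate the 4 + 16
bytes mentioned in the message. This is not the code of qc_do_rm_hp().

```c
#include <stddef.h>

#define HP_PN_MAXLEN   4 /* largest encoded packet number */
#define HP_SAMPLE_LEN 16 /* bytes needed after it (the "IV" above) */

/* Hypothetical helper: returns 1 if at least 20 bytes are available
 * starting at the packet number offset, 0 if the packet is too short and
 * must be discarded.
 */
int hp_can_remove(size_t pkt_len, size_t pn_offset)
{
    return pn_offset <= pkt_len &&
           pkt_len - pn_offset >= HP_PN_MAXLEN + HP_SAMPLE_LEN;
}
```

Comparing the remaining length instead of computing pn_offset + 20 avoids
any risk of wrapping when the offset itself is bogus.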
Ensure frame parsing error does not cause a crash by removing the
associated BUG_ON()/ABORT_NOW().
For now, connection is flagged on error, which ensures that any
send/receive future operations are prevented and connection is closed
asap. In the future, a proper CONNECTION_CLOSE will be required as
defined by QMux protocol.
No need to backport.
hlua_http_add_hdr() passes Lua string values directly to htx_add_header()
without validation. This can be an issue for user-controlled data, but as
well when relying on poorly written scripts. This patch makes sure that
neither the name nor the value may contain any of these forbidden chars.
This should be backported to all versions since the issue has been there
since at least 2.4.
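A minimal sketch of such a validation, assuming the forbidden characters are
exactly the CR, LF and NUL named in the subject line. The helper name
hdr_has_forbidden_char() is hypothetical. The length is passed explicitly
since Lua strings may contain embedded NUL bytes.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the <len> bytes at <s> contain a CR,
 * an LF or a NUL, 0 otherwise.
 */
int hdr_has_forbidden_char(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (s[i] == '\r' || s[i] == '\n' || s[i] == '\0')
            return 1;
    }
    return 0;
}
```

A caller would run it on both the name and the value and refuse to add the
header if either call returns 1.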
The following patch ensures server MAX_PUSH_ID are rejected as a client.
This has been implemented by extending h3_rcv_buf().
e4a5a64198
BUG/MINOR: h3: reject server MAX_PUSH_ID frame
Case label for MAX_PUSH_ID has been moved in the function, however the
break instruction was removed by error. Fix this by adding the missing
break statement.
This must be backported to every version the above fix is. Currently, it
is scheduled to 3.3.
Fix a NULL pointer dereference when trying to use a user from a userlist
which does not have a password configured.
The check_user() function tries to do an strcmp of the password, however
u->pass is NULL and the strcmp would crash.
Must be backported to every stable branch.
Previously, MAX_PUSH_ID frames were silently ignored both on client and
server sides. However, such frame cannot be emitted by the server.
This patch fixes this by properly issuing connection error
FRAME_UNEXPECTED when receiving a MAX_PUSH_ID frame as a client. This is
implemented by extending h3_check_frame_valid().
This must be backported up to 3.3.
HTTP/3 PUSH_PROMISE frames are systematically rejected with H3 error
FRAME_UNEXPECTED. This is adapted on the server side as a client can
never emit them.
This patch adapts error reporting when haproxy runs as a client. In this
case, server is still forbidden to emit any PUSH_PROMISE as MAX_PUSH_ID
frames are never emitted. In this case, ID_ERROR must be used as an
error code.
This must be backported up to 3.3.
CANCEL_PUSH frames are silently ignored on both client and server sides.
However, as push support is not implemented by haproxy, clients are thus
forbidden to emit any of those frames.
Fix this by closing the connection with ID_ERROR when receiving a client
CANCEL_PUSH as a server. On client side, the frame is still silently
discarded.
This must be backported up to 2.6.
Push streams are not supported by haproxy as a client. Thus, it never
emits any MAX_PUSH_ID frame. In this case, the server is not allowed to
initiate any push stream.
This patch ensures that such stream is closed with error H3_ID_ERROR, as
specified by HTTP/3 RFC.
This must be backported up to 3.3.
HTTP/3 push streams can only be opened by a server instance. The
specification mandates that the connection must be closed if a server
receives a client-initiated push stream.
This patch should ensure that it is not possible to exploit
unidirectional streams for an unexpected usage.
This must be backported up to 2.6.
The current PRNG is xoroshiro128**, it was introduced in 2.2 with
commit 52bf83939 ("BUG/MEDIUM: random: implement a thread-safe and
process-safe PRNG"). It features a 2^128 sequence and can perform
2^64 or 2^96 jumps, though only the 2^96 jump is implemented. It
was initially designed to support both processes and threads, and
implements a shared state between threads instead of allocating
distinct sequences based on PID and thread numbers.
Since then, the PRNG's usage grew and processes have disappeared,
but the lock or the DWCAS are still there due to its shared nature,
and it's possible to trigger watchdog warnings by issuing 100 UUIDs
in a single log-format string.
Also, UUID and QUIC retry tokens now consume 128 bits from the PRNG
in two 64-bit calls, and used to weaken the PRNG by rapidly disclosing
its internal state on reasonably idle systems. This indicates that
most of the time we now need 128 bits.
This patch modernizes the internal generator by switching to xoshiro256**,
which has comparable properties (it's even faster), and features even
longer 2^256 periods, still returning 64 bits per call. It can be
initialized with 2^128 and 2^192 jumps. More details here:
  https://prng.di.unimi.it/
  https://prng.di.unimi.it/xoshiro256starstar.c
Here we implement a thread-local state instead of the old shared one,
so there is no more need for synchronization. The state is seeded at
boot, and each thread performs as many 2^192 jumps as their TID is
large. The master process performs a 2^128 jump where it used to
perform a 2^96 jump so that it doesn't overlap with any worker thread.
However a cleaner approach could be to perform a 2^128 jump for each
fork() (here the worker) and 2^192 for each thread. This might be for
a future improvement.
ha_random64_internal() is now the new PRNG, so that everything else
remains totally transparent. _ha_random64_pair_hashed() continues to
hash the first 128 bits of the state.
A simple config generating 100 UUIDs on 20 threads jumps from 135k to
1.25M req/s, which translates to a bump from 13.5M to 125M UUID/s,
or about 9 times faster. And the DWCAS can no longer be seen in
perf top:
Before: 13.5M/s
Overhead Shared Object Symbol
99.04% haproxy [.] ha_random64_internal
0.66% haproxy [.] _ha_random64_pair_hashed
0.03% libc-2.42.so [.] __printf_buffer
0.02% [kernel] [k] _raw_spin_lock
0.01% libc-2.42.so [.] __strchrnul_avx2
0.01% [kernel] [k] ktime_get
0.01% [kernel] [k] lapic_next_deadline
0.01% haproxy [.] sample_process
0.01% haproxy [.] chunk_printf
0.01% libc-2.42.so [.] __printf_buffer_write
0.01% [kernel] [k] hrtimer_active
0.01% libc-2.42.so [.] __memmove_avx_unaligned_erms
0.01% libc-2.42.so [.] _itoa_word
After: 125M/s
18.84% libc-2.42.so [.] __printf_buffer
9.84% haproxy [.] sample_process
8.33% libc-2.42.so [.] __strchrnul_avx2
6.61% libc-2.42.so [.] __memmove_avx_unaligned_erms
6.06% libc-2.42.so [.] __printf_buffer_write
4.43% haproxy [.] strlcpy2
4.09% libc-2.42.so [.] _itoa_word
2.62% haproxy [.] sess_build_logline_orig
2.12% haproxy [.] _ha_random64_pair_hashed
1.28% haproxy [.] pool_put_to_cache
1.06% haproxy [.] __pool_alloc
1.00% haproxy [.] smp_fetch_uuid
0.93% haproxy [.] lf_text_len
0.82% haproxy [.] ha_generate_uuid_v4
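For reference, the core of a thread-local xoshiro256** generator can be
sketched as below. The step function follows the public reference
implementation linked above. The names prng_state, prng_seed() and
prng_next() are hypothetical, and the 2^128/2^192 jumps performed at boot
are deliberately left out of this sketch.

```c
#include <stdint.h>

/* One state per thread: no lock and no DWCAS are needed to advance it. */
static _Thread_local uint64_t prng_state[4];

static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* Hypothetical seeding helper. A real state must never be all zeroes. */
void prng_seed(const uint64_t seed[4])
{
    int i;

    for (i = 0; i < 4; i++)
        prng_state[i] = seed[i];
}

/* xoshiro256** step: returns 64 bits and advances the 256-bit state. */
uint64_t prng_next(void)
{
    uint64_t *s = prng_state;
    const uint64_t result = rotl64(s[1] * 5, 7) * 9;
    const uint64_t t = s[1] << 17;

    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);
    return result;
}
```

Since each thread only touches its own state, the cost of a call no longer
depends on how many other threads draw random numbers at the same time.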
The QUIC retry tokens used to directly return ha_random64(), making the
next tokens easily predictable on low-load systems before the XXH64 call.
Let's now switch to the faster and safer ha_random64_pair_hashed() instead.
Instead of using two consecutive calls to ha_random64(), let's use the
cleaner and safer ha_random64_pair_hashed(). This way the internal
PRNG state will not leak into the emitted headers.
The UUID generation used to emit the internal PRNG state, which allows
to predict previous and next ones, or disclose the internal PRNG state.
While not critical, it may eventually become an issue.
This patch uses the new ha_random64_pair_hashed() function that returns
a pair of u64 that are hashed from the internal PRNG state. It's almost
twice as fast on 20 threads (14.1M UUID/s vs 7.8M/s).
The cluster secret, when SSL is not working, used to involve a mix of
calls to ha_random64() and random() to mask the bits that we didn't want
to see leaked. Let's now simply fall back to ha_random64_pair_hashed()
that does a much better job.
A lot of places call two ha_random64() in a row to generate a 128-bit
random. While it's now safe against linear analysis thanks to the XXH64
call, it's still particularly expensive due to the lock.
Here we introduce a new function ha_random64_pair_hashed(), that feeds
two uint64_t with a hash of the PRNG's internal state, and make it
advance. This will cut in half the number of calls to ha_random64()
and should recover a part of the performance lost in the lock. For
now it's not used.
Consuming randoms in pairs directly exposes the internal PRNG's state
on moderately idle systems. It can allow predicting next (or previous)
UUIDs, QUIC retry tokens, and WS keys for example. Let's insert an XXH64
call on the ha_random64() output to avoid this. We expand the boot seed
as the secret at boot, and use now_ns as the seed for each call. The
original ha_random64() function was renamed to ha_random64_internal()
for use cases where it's not a problem to directly use the internal
state.
The performance loss is only measurable when single-threaded. It drops
from 7.32M UUID per second to 7.16M. Above that there is no longer any
difference due to the DWCAS loop which reaches up to 98.5% CPU at 20
threads.
This will need to be backported to stable releases after a period of
observation.
_51d_set_device_offsets() passes ctx.value.ptr directly to
fiftyoneDegreesGetDeviceOffset() which expects a null-terminated string.
Let's copy it through the trash first, to avoid passing possibly
surrounding garbage.
This can be backported to all versions.
The PRNG used by the DNS currently is easily predictable once an
observer can collect a few consecutive IDs from the same thread, since
it's a 32-bit xorshift reduced to 16 bits output. Let's switch it to
ha_random32() instead.
This should be backported, however on older releases the ha_random32()
cost is higher due to the lock involved.
In ssl_sock_switchctx_cbk(), the servername is copied into the trash
and null-terminated, but later in the call to strncpy() it's still used
as-is, so anything that follows it will be copied as well, which is not
really expected. Let's make the servername point to the trash after
sanitizing it, like ssl_sock_switchcbk_wolfSSL_cbk() does.
This can be backported to 2.6 since it was introduced with commit
a996763619 ("BUG/MINOR: ssl: Store client SNI in SSL context in case
of ClientHello error").
After reading the handshake length, which is covered by the previous
4 bytes check, the size was not subtracted before being compared to the
retrieved handshake length, making it possible to accept a handshake
that claims to be 4 bytes larger than it really is. Similarly, a few
lines later, data[34] is accessed without checking that it is present,
because the test is made on the second hs_len, which doesn't guarantee
that the data are there.
This fix adds both tests. It can be backported to all stable versions
as it was introduced in 1.6 with commit bb2acf589f ("MINOR: payload:
add support for tls session ticket ext").
Right now no special case is made of size zero and the parser assumes
that it can read the last two chars, which do not exist in this case.
Let's check for this empty string situation and return zero (empty) as
well.
This should be backported to all versions.
http_7239_extract_nodeport() reads the first byte of the passed string
but the caller doesn't check that it's not empty, which can happen if
passed as 'host="127.0.0.1:"'. In that case the function would read and
return garbage that is present in the buffer after the colon. Let's just
check the remaining length before reading.
This can be backported to 2.8 as it was introduced with commit b2bb9257d2
("MINOR: proxy/http_ext: introduce proxy forwarded option").
7 ACME state handlers iterate over hc->res.hdrs, but they can be called
after an error was detected, and the HTTP client will leave res.hdrs NULL
on connection errors before headers are received. Let's check this inside
the loop, like the chkorder handler already does.
Most of them, if not all, need to be backported to 3.2.
In 1.4, Basic authentication support was added by commit f9423ae43a
("[MINOR] acl: add http_auth and http_auth_group"). Interestingly,
a mistake there consisted in taking the length of the comparison from
the input token, so "b" matches "Basic". It was later propagated to
Bearer in 2.5 with commit f5dd337b12 ("MINOR: http:
Add http_auth_bearer sample fetch"). Let's just compare the entire
tokens.
This may be backported though it is very minor.
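The difference between the two comparisons can be sketched as follows. Both
helpers are hypothetical, and the case-insensitive comparison via the POSIX
strncasecmp() is an assumption made for the illustration. Only the length
handling matters here.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Old logic: the comparison length comes from the input token, so any
 * prefix of the scheme name matches.
 */
int scheme_matches_prefix(const char *tok, size_t len, const char *name)
{
    return strncasecmp(tok, name, len) == 0;
}

/* Fixed logic: the whole token must match the whole scheme name. */
int scheme_matches(const char *tok, size_t len, const char *name)
{
    return len == strlen(name) && strncasecmp(tok, name, len) == 0;
}
```

With the first helper, the one-byte token "b" is accepted as "Basic". With
the second one it is rejected while "basic" is still accepted.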
A dynamic chunk size is now being allocated for output since commit
dfc4085413 ("MEDIUM: sample: Get chunks with a size dependent on input
data when necessary"). However this one missed the need for the trailing
zero when specifying the size, let's add it.
No backport is needed, this is only in 3.4.
Both boundary checks in the authority record parsing loop of
resolv_validate_dns_response() use >= bufend where they should use
> bufend, causing valid DNS responses with exactly enough bytes to be
rejected as invalid.
The first one, "reader + offset + 10 >= bufend" is too strict since it
prevents 10-byte responses from being accepted as valid while they
are. The second one, "reader + len >= bufend" has the same issue, when
exactly len bytes remain, the check rejects it even though dns_max_name()
already validated it. It may be backported though it is unlikely to ever
be noticed.
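The off-by-one can be illustrated with two hypothetical helpers, where
<bufend> points one byte past the last valid byte as in the checks quoted
above. Rejecting on "reader + need >= bufend" is equivalent to requiring
strictly more than <need> bytes, which is what the strict variant does.

```c
#include <stddef.h>

/* Relaxed check: accepts when exactly <need> bytes remain. */
int has_room(const unsigned char *reader, size_t need,
             const unsigned char *bufend)
{
    return (size_t)(bufend - reader) >= need;
}

/* Former behaviour: one spare byte was required on top of <need>. */
int has_room_strict(const unsigned char *reader, size_t need,
                    const unsigned char *bufend)
{
    return (size_t)(bufend - reader) > need;
}
```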
The caching RFC (9111, but this was already present in 2616) indicates that
cache-control supports both the "token" and "token=..." forms and that
consumers are supposed to recognize both. In addition, "private=..." is
explicitly mentioned, so servers could very well emit it. However,
haproxy only recognizes the short form without argument, except for
"no-cache" where it also supports it followed by the beginning of a
set-cookie argument. Thus it could miss "private=" or "no-store=".
Let's refine the checks. Now we explicitly recognize the form
no-cache="set-cookie", and all variants of "token" or "token=" as
identical to disable caching. It will more reliably catch such edge
cases and make sure we never cache a response marked like this.
This should be backported, at least to the latest LTS (3.2), maybe
further after some observation.
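Recognizing both forms of a directive can be sketched as follows.
cc_directive_is() is a hypothetical helper that works on a single,
already-delimited directive. It is not the parser used by the cache, and
the case-insensitive match relies on the POSIX strncasecmp().

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: returns 1 if the <len>-byte directive at <d> is
 * <name>, either alone ("token") or followed by an argument ("token=...").
 */
int cc_directive_is(const char *d, size_t len, const char *name)
{
    size_t nlen = strlen(name);

    if (len < nlen || strncasecmp(d, name, nlen) != 0)
        return 0;
    return len == nlen || d[nlen] == '=';
}
```

Checking the byte that follows the name prevents a longer token that merely
starts with "private" from being mistaken for the directive.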
When checking for secondary entries, the tree is walked within duplicates
of the primary key, only indexed on the first 32 bits, which means that
in case of hash collision, we could start looking for an object and
switch to another one while visiting secondaries. In order to avoid this
we simply need to always check the full primary hash of the entry that
was found.
This should be backported to all stable versions.
By default, HTTP/1 status codes are not limited in the parser. However,
the value is stored in a 16-bit field, meaning that it may be truncated
if too large. Let's just restrict to 3-digits by default, and permit to
relax the check when accept-unsafe-violations is set, provided that the
value still fits in 16 bits.
This could be backported to latest LTS release.
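The intended behaviour can be sketched as below. parse_status() and its
<relaxed> argument are hypothetical: <relaxed> stands for the case where
accept-unsafe-violations is set, and treating "3 digits" as "at most 3
digits" is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: parses the <len> digits at <s>. Returns the status
 * code, or -1 if it is not made of digits, has more than 3 digits while
 * <relaxed> is 0, or does not fit in 16 bits.
 */
int parse_status(const char *s, size_t len, int relaxed)
{
    unsigned int code = 0;
    size_t i;

    if (len == 0 || (!relaxed && len > 3))
        return -1;
    for (i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        code = code * 10 + (unsigned int)(s[i] - '0');
        if (code > 65535) /* would be truncated in a 16-bit field */
            return -1;
    }
    return (int)code;
}
```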
Originally with "option accept-invalid-http-request", we couldn't really
edit the request on the fly to remove offending headers. But since we
have HTX and the headers are indexed one at a time, it has become
trivial. A non-negligible number of violations are conditioned by the
now renamed "option accept-unsafe-violations-in-http-request", and a
controversial one could definitely be reporting and passing invalid
header names containing control chars or spaces. The option was placed
so as not to block requests/responses containing them, but there's no
point in passing them to the other side. Most of the time it will be
totally harmless since the other side will reject them. But in case
haproxy is placed in front of a non-compliant server, it would fail
to protect it.
This patch implements a name check for all headers when a parsing
error was detected. It's cheap enough (especially since only done
after an error), and will skip the header if its name is invalid.
This may also remove some possibilities of confusion in logs, or
when encoding headers names for example.
This should be backported at least till the latest LTS.
Latest commit 04811943b5 ("MINOR: haterm: enable h3 for TCP bindings")
produces a warning when SSL is not enabled due to the addition of
expose-experimental-directives. Let's condition it to the use of SSL.
Passing a size or anything with suffix "r" is supposed to apply a
random factor from 0 to 1. However due to the replacement of random()
with ha_random64(), all 64 bits are random before the divide, so the
end result is a random 32-bit value. In addition, ha_random64() is
slow since shared between threads.
Let's use statistical_prng() which is designed for this purpose and
is much cheaper. No backport is needed, this is only in 3.4.
In resolv_validate_dns_response(), when matching an additional A/AAAA
record to an SRV record, the code checked tmp_record->ar_item == NULL
then called pool_free(resolv_answer_item_pool, tmp_record->ar_item).
This is a copy-paste mistake from similar patterns elsewhere since
the pointer is confirmed to be NULL a few lines above, so let's just
drop the confusing pool_free.
In resolv_validate_dns_response(), the second DNS record parsing path
manually constructs a 32-bit big-endian TTL value from four individual
bytes using the expression:
reader[0] * 16777216 + reader[1] * 65536 + reader[2] * 256 + reader[3]
We have read_n32() to do this, and it's more robust against unexpected
signedness surprises (which should not happen right here since reader is
unsigned char and we use -fwrapv so the result is defined). Also, let's
make the ttl an uint instead of an int. The TTL is only retrieved and not
used for now, so better clean it now.
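What such a helper does can be sketched as follows. be32_read() is a
hypothetical local equivalent written for illustration and is not the code
of read_n32(). Each byte is widened to an unsigned 32-bit value before
being shifted, so no intermediate result is ever a signed int.

```c
#include <stdint.h>

/* Hypothetical equivalent of a big-endian 32-bit read from 4 bytes. */
uint32_t be32_read(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

For the bytes 00 00 0e 10 it returns 3600, a one-hour TTL.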
In 2.5, commit da0264a96 ("MINOR: sample: Add be2hex converter")
introduced the be2hex() converter, which reads input data of a given
chunk size, processes it as a big endian block and turns it to hex
output.
There's an issue if the configured chunk_size (2nd argument) is larger
than tune.bufsize/2, because the max_size calculation will underflow,
and the later loop will always match since it compares a size_t to an
int (BTW, compilers love to annoy us with useless warnings but I never
found how to see some for these ones). This can result in overflowing
the output trash if the input sample is at least as large as half a
buffer.
Let's add an explicit check for this, and change the max_size type to
size_t so that the comparison is always right. While we're at it, let's
ask the trash buffer to be twice as large, just like bin2hex() does, as
it may result in offering a larger buffer in 3.4 thanks to the large
buffers support.
Despite the risk, this is marked as minor because a config with that
large an argument in the converter makes absolutely no sense.
This should be backported to 2.6. The *2 for the trash allocation will
conflict and have to be dropped in stable versions, which is safe.
When not set, the cluster secret is randomly generated by two
consecutive calls to ha_random64(). However, the random64 PRNG may be
partially observed on a fully idle machine (QUIC retry tokens, UUID,
WS key), and it could be rolled back to the initial call that produced
the secret. This is purely theoretical as a normally loaded system
wouldn't reveal meaningful sequences, but better address this while
it's still easy.
The first step here consists in isolating the cluster_secret from the PRNG
sequence. When RAND_bytes() is available and works, it's used. Otherwise
ha_random64() is mixed with uncorrelated bits from random().
This could be backported to stable releases.
In dict_insert(), when ebis_insert() returns an existing node n indicating
that another thread inserted the same key concurrently, the code freed its
own newly-allocated entry and returned the winner without bumping its
refcount. Both callers then held a reference with refcount=1 instead of 2,
so when one expires the other becomes a use-after-free or double-free.
The bug likely comes from the fact that new_dict_entry() creates an entry
with a refcount preset to 1 (saves an atomic op) and that because of this
there is no refcount increment upon a successful insertion in the tree,
resulting in requiring different code paths for collision and normal
insertion.
A simple fix consists in bumping the refcount under the lock and unlocking
only at the end, but this would mean performing two free() calls under a
lock, which we always try to avoid. The code was slightly rearranged so
that we can now bump the existing entry's refcount under the lock in case
of duplicate, or unlock immediately in the common case, so that the free()
call is done out of the lock.
The probability of the race is very low (at peers connection setup only),
which is why it's marked low. This should be backported to all versions.
In parse_log_message(), the first loop looks for '>' that finishes the
priority field, and unfortunately it stops once it has checked the first
byte after the end of the buffer. This means that a priority made only
of digits for the whole buffer would read one extra byte. In practice
since pools have a tag at the end this is only detectable when using ASAN,
but this should be fixed nevertheless.
This can be backported to all versions.
It's worth noting that RFC5424 now says that the PRI field is 1..3
digits only, so maybe at some point we could seriously limit the
length as well.
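A bounded scan of the priority field can be sketched as below. parse_pri()
is a hypothetical helper and not the code of parse_log_message(). The limit
to 3 digits follows the RFC5424 remark above and is an assumption of this
sketch, not something the fix itself enforces.

```c
#include <stddef.h>

/* Hypothetical helper: parses "<digits>" at the start of the <len>-byte
 * buffer <buf>. Returns the priority and stores the offset of the first
 * byte after '>' in <*next>, or returns -1 if no valid priority ends
 * within the buffer.
 */
int parse_pri(const char *buf, size_t len, size_t *next)
{
    size_t i;
    int pri = 0;

    if (len < 3 || buf[0] != '<')
        return -1;
    for (i = 1; i < len; i++) { /* the bound is checked before each read */
        if (buf[i] == '>') {
            if (i == 1)
                return -1; /* empty priority */
            *next = i + 1;
            return pri;
        }
        if (buf[i] < '0' || buf[i] > '9' || i > 3)
            return -1;
        pri = pri * 10 + (buf[i] - '0');
    }
    return -1; /* no '>' inside the buffer */
}
```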
When the PRIORITY flag is present on a HEADERS frame, the frame must
contain a stream dependency and a weight, for a total of 5 bytes. The
length is checked after reading the stream dep field so theoretically
such a frame could cause up to 4-byte OOB read at the end of the buffer,
though in practice buffers allocated from pools never end on a page
boundary (one extra word at the end) and the anomaly is still detected
after reading the stream ID and the connection aborted with the glitch
count incremented. Thus while not technically correct, practically
speaking it's harmless.
This should be backported to all stable releases.
The previous fix 75f72c2eb ("BUG/MEDIUM: resolvers: Fix test on dn label
size in resolv_dn_label_to_str()") may still leave garbage from the input
buffer into the response: if a component length is passed as zero, it
should mark the end, but instead a dot will be emitted, and whatever
follows it in the input buffer would continue to be appended as extra
components. While having no direct consequences beyond the domain not
being properly decoded, it could at least complicate troubleshooting.
This should be backported where the fix above is backported.
The previous fix 75f72c2eb ("BUG/MEDIUM: resolvers: Fix test on dn label
size in resolv_dn_label_to_str()") can still be fooled by an input exactly
the size of str_len, in which case the trailing zero appended at the end
was not being accounted for. Let's add 1 to the condition to prepare for
it.
This needs to be backported wherever the fix above is backported.
When lf_expr_compile() fails in cfg_parse_log_profile, the code leaves
without freeing the previously strdup()'d strings in target_lf->str and
target_lf->conf.file. Let's add a call to lf_expr_deinit() there to
release it.
It was harmless anyway since the startup will abort when this happens,
but better clean it because with increasingly dynamic setups, one day
it could become a runtime leak.
No backport is needed.
When a primary cache hit has a Vary secondary_key_signature, the code calls
retain_entry() and shctx_row_detach() before performing the secondary lookup.
If get_secondary_entry() returns NULL (no stored variant matches), res is set
to NULL and the function falls through to return ACT_RET_CONT without calling
release_entry() or shctx_row_reattach(). Each such request leaks one refcount
and pins one shctx row permanently, eventually exhausting the cache if this
happens to all objects. This is visible when requesting a secondary key
covered by vary for an object that is already stored without that key.
"show cache" then shows the object's refcount increasing after each request.
In order to fix this we must do like when no secondary key could be built
and release everything. We only reattach to the row if we previously
detached.
The issue was introduced in 2.4 with commit 1785f3dd9 ("MEDIUM: cache: Add
the Vary header support"). The code changed a bit in 2.9 with commit
48f81ec09 ("MAJOR: cache: Delay cache entry delete in reserve_hot function"),
so in order to backport to 2.8 and older, the patch will have to be manually
applied (no test on detached).
tcpcheck_spop_expect_hello() stores the SPOA agent-supplied status-code
varint directly into check->code (signed short) without range validation.
The code is later used as an index into spop_err_reasons[100]. Let's
just replace invalid status codes with SPOP_ERR_UNKNOWN to avoid any
problem.
The SPOP tcp-check was introduced in 3.1 so this fix must be backported
to 3.2.
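
As an illustration only, the range validation described above can be
sketched as follows. All names and values here (EX_ERR_*, ex_err_reasons,
sanitize_status, status_reason) are hypothetical stand-ins and not the
actual HAProxy definitions; the point is simply that a peer-supplied code
is mapped to the "unknown" slot before it is ever used as a table index.

```c
#include <stddef.h>

/* Hypothetical error codes; the real values live in HAProxy's SPOP code. */
enum {
    EX_ERR_NONE    = 0,
    EX_ERR_IO      = 1,
    EX_ERR_TOUT    = 2,
    EX_ERR_UNKNOWN = 99,
    EX_ERR_ENTRIES = 100
};

/* Sparse table of reasons, indexed by error code. */
static const char *ex_err_reasons[EX_ERR_ENTRIES] = {
    [EX_ERR_NONE]    = "normal",
    [EX_ERR_IO]      = "I/O error",
    [EX_ERR_TOUT]    = "a timeout occurred",
    [EX_ERR_UNKNOWN] = "an unknown error occurred",
};

/* Map an untrusted status code decoded from the wire to a safe index. */
static unsigned int sanitize_status(unsigned long long code)
{
    if (code >= EX_ERR_ENTRIES || ex_err_reasons[code] == NULL)
        return EX_ERR_UNKNOWN;
    return (unsigned int)code;
}

/* Always returns a valid string, whatever the peer sent. */
static const char *status_reason(unsigned long long code)
{
    return ex_err_reasons[sanitize_status(code)];
}
```

Sanitizing once at the point where the value is stored means later users
of the code do not each need their own bounds check.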
In 3.3 with commit fda6dc959 ("MINOR: regex: use a thread-local match
pointer for pcre2") we got a thread-local match that saves us from having
to allocate a match array with each match. However something was clearly
overlooked or misunderstood in the pcre2 API because the local match
array was initialized via pcre2_match_data_create() for MAX_MATCH-1
entries instead of MAX_MATCH, despite the commit message mentioning
MAX_MATCH entries. It was possibly confused with an index. Due to this
there is a risk of crash when matching more than 9 groups in a regex.
This fix must be backported to 3.3.
In 2.3, in preparation for log forwarding, commit 546488559 ("MEDIUM:
log/sink: re-work and merge of build message API.") extended the log
send API to be able to use metadata from an existing header. However
the month number is parsed from the passed meta-data and compared
against 11 but there's no check for negative values which could in
theory cause a negative monthname[] index.
It can be a problem when the date is received as RFC5424 and forced
to RFC3164 because certain characters in the month field could result
in a negative month value. Let's fix it by turning the month to unsigned
to make sure we only accept months 0..11.
This should be backported to all branches.
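
A minimal sketch of the unsigned comparison, assuming the month is decoded
from a two-digit, 1-based field (parse_month() and month_name() are
hypothetical helpers, not the actual log code). Any character below '0'
makes the decoded value negative, and the conversion to unsigned turns it
into a huge value that a single "> 11" test rejects.

```c
#include <stddef.h>

static const char monthname[12][4] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Hypothetical decoder: "01".."12" gives 0..11; garbage may go negative. */
static int parse_month(const char *p)
{
    return (p[0] - '0') * 10 + (p[1] - '0') - 1;
}

/* Returns the month name, or NULL when the field is out of range. */
static const char *month_name(const char *field)
{
    /* unsigned: negative results wrap to large values and fail the test */
    unsigned int month = (unsigned int)parse_month(field);

    if (month > 11)
        return NULL;
    return monthname[month];
}
```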
Cross-compilation on m68k fails in ssl_sock_resize_passphrase_cache()
where the compiler noticed the SIZE_MAX passed to realloc() in the
error path and complained that it's larger than PTRDIFF_MAX. This can
be disabled with -Walloc-size-larger-than=SIZE_MAX but in practice we
can simply hide the value and keep the warning to detect real failures
elsewhere. Let's pass it through DISGUISE() and also take this
opportunity for doing that inside an unlikely() clause since it's never
supposed to happen.
acme_res_certificate() passes the httpclient response buffer to
ssl_sock_load_pem_into_ckch(), which will then call BIO_new_mem_buf(buf, -1).
The "-1" flag will make the OpenSSL PEM parser determine the length by
using strlen(). However, the httpclient populates the response buffer with
__b_putblk() without writing a trailing NUL to it. The byte at area[data]
is whatever data previously resided there in the memory pool.
Thus, a malicious or compromised ACME CA can perform an arbitrary-length
out-of-bounds read until hitting the first NUL byte past the response
body. The OpenSSL PEM loader will try to iterate to load the chain
certificates, thus the PEM-looking garbage found in freed memory chunks
can be erroneously loaded as additional intermediate certificates. The
presence of a single NUL inside the valid response body will result in
silent truncation of the certificate.
Make sure that area[data] contains a terminating NUL before passing
the buffer to the parser. Fail on insufficient room for the NUL terminator.
No backport required: The ACME client has been added in 3.x and this
code path didn't exist in 2.x.
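
The termination step can be sketched like this, using a simplified
hypothetical buffer type (struct simple_buf and buf_terminate() are
illustrative names, not the HAProxy buffer API). The terminator is only
written when one spare byte exists past the data; otherwise the caller
must fail rather than hand an unterminated area to a strlen()-based
parser.

```c
#include <stddef.h>

/* Hypothetical buffer: <data> bytes in use out of <size> bytes in <area>. */
struct simple_buf {
    char  *area;
    size_t data;
    size_t size;
};

/* Writes a NUL right after the data without counting it as data.
 * Returns 0 on success, -1 if there is no room for the terminator.
 */
static int buf_terminate(struct simple_buf *b)
{
    if (b->data >= b->size)
        return -1;
    b->area[b->data] = '\0';
    return 0;
}
```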
When the dedicated buffer to store the command payload was added (c5ae0da62
"MEDIUM: cli: Make a buffer for the command payload"), a bug was
introduced. When the pattern finishing the command payload is found, it is
removed from the buffer. A NUL byte is added before it, skipping the
previous newline character.
It worked well in all cases before the commit above, because the commandline
was already parsed and was placed at the beginning of the cmdline buffer.
So, there is always a line before the payload.
Now, the payload is stored in a dedicated buffer, so there is nothing
preceding it in the buffer. If the payload is empty, we cannot rewind to the
previous line to set the NUL byte. We must handle this case to
avoid integer underflow on the payload buffer length.
It is a 3.4-specific bug. No backport needed.
In hlua_socket_receive_yield(), when we try to get a line, the trailing CRLF is
stripped by decrementing the block length. The '\n' is skipped first, then
possibly a preceding '\r'. But the block length is never checked. If an empty
line is returned, this leads to an integer underflow and most probably to a
crash because this length is used to copy data into a Lua string.
To fix the issue, the block length is now properly tested against 0 before
decrementing it.
This patch must be backported to all stable versions.
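
The guarded decrement can be sketched as below. strip_eol() is a
hypothetical standalone helper, not the Lua socket code itself; it only
shows that each decrement is preceded by a test against zero so that an
empty block can never wrap around.

```c
#include <stddef.h>

/* Returns the length of <line> once a trailing "\n" or "\r\n" is removed. */
static size_t strip_eol(const char *line, size_t len)
{
    /* nothing to strip on an empty block or when no LF terminates it */
    if (len == 0 || line[len - 1] != '\n')
        return len;
    len--;

    /* only look for the CR if at least one byte remains */
    if (len > 0 && line[len - 1] == '\r')
        len--;
    return len;
}
```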
When parsing the agent-check reply, we first loop on the response to find
the newline character, to add a NUL byte at the end of the line. However,
this loop is not bounded to the data available in the buffer. So it is
possible to read bytes outside the buffer and eventually write a NUL byte
outside the buffer.
So let's check for the end of the buffer when looping on the agent-check
reply.
This patch must be backported to all stable versions.
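
A bounded search could look like the following sketch, where
terminate_first_line() is a hypothetical helper and not the check code
itself. memchr() never looks past <len> bytes, so neither the read nor
the write can leave the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Replaces the first '\n' found within <len> bytes of <buf> with a NUL.
 * Returns the line length, or -1 if no newline is present in the data.
 */
static long terminate_first_line(char *buf, size_t len)
{
    char *eol = memchr(buf, '\n', len);

    if (!eol)
        return -1;
    *eol = '\0';
    return (long)(eol - buf);
}
```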
In dict_entry_unref(), the write lock on d->rwlock was only acquired after
decrementing the refcount. However, between the decrement and the lock,
another thread could increment it by calling dict_insert(). That could lead
to a UAF.
To fix the issue, the call to HA_ATOMIC_SUB_FETCH is moved inside the write
lock.
This patch must be backported to all stable versions.
In haproxy, when an Initial packet is received, a new connection may be
created and a DCID must be attributed. This CID is derived from the
original DCID used by the client in its first packet. This is an
optimization to avoid storing two CID values in the CID tree.
On CID lookup, if the DCID used is not found, derivation is performed
again. This should allow the DCID node to be retrieved. However, this
operation is not performed as expected in quic_get_cid_tid(), as the
wrong value is used on the second lookup. Fix this function by using
the derived CID for it. Note that retrieve_qc_conn_from_cid() performs the
same lookup but the bug was not present there.
The impact of this bug is relatively low as most clients send a single
Initial packet. Even in case of multiple packets in a single datagram,
this does not cause any issue as the current thread is assigned as
default.
This should be backported up to 2.8.
In ssl_sock_generate_certificate(), if the LRU cache for generated
certificates is used, the LRU tree is not unlocked on cache miss if the
certificate generation failed. So let's unlock it on the error path.
The bug was introduced by commit fbc98ebcd ("BUG/MEDIUM: ssl: fix error
path on generate-certificates"). This patch must therefore be backported
along with the commit above, i.e. to all stable versions.
In resolv_dn_label_to_str(), the size of a dn label was stored into an
integer from a signed char without a cast to unsigned. So a dn label with a
size of 128 bytes or more became negative, thereby skipping the copy loop
and desynchronizing input vs output.
In addition, the size of the destination string was only checked at the
beginning, against the dn string length. But it must also be checked for
every dn label, to be sure. The dn string can be forged to copy more bytes
than expected.
This patch must be backported to all stable versions.
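
For illustration, a self-contained conversion applying both rules might
look like the sketch below. dn_label_to_str() here is a simplified
hypothetical rewrite, not the resolvers code: the label size is read
through an unsigned char, and the room left in the destination (including
the trailing zero) is rechecked for every label.

```c
#include <stddef.h>
#include <string.h>

/* Converts <dn_len> bytes of length-prefixed labels in <dn> into a dotted,
 * NUL-terminated string in <str> (<str_len> bytes).
 * Returns the string length, or -1 on malformed input or lack of room.
 */
static int dn_label_to_str(const char *dn, size_t dn_len,
                           char *str, size_t str_len)
{
    size_t in = 0, out = 0;

    while (in < dn_len) {
        /* unsigned read: sizes of 128 and above must not turn negative */
        size_t sz = (unsigned char)dn[in++];

        if (sz == 0)
            break;                  /* root label: end of name */
        if (sz > dn_len - in)
            return -1;              /* label overruns the input */
        /* per-label check: optional dot + label + trailing zero */
        if (out + (out ? 1 : 0) + sz + 1 > str_len)
            return -1;
        if (out)
            str[out++] = '.';
        memcpy(str + out, dn + in, sz);
        out += sz;
        in  += sz;
    }
    if (out >= str_len)
        return -1;                  /* no room for the trailing zero */
    str[out] = '\0';
    return (int)out;
}
```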
When the appctx_rcv_buf() function was called to get data from the applet,
but to get zero bytes, nothing was performed and the function returned
early. However, we must at least take care to set SE_FL_WANT_ROOM if
necessary. Otherwise, if data are still blocked in the applet's output
buffer while the EOI/EOS are pending, the information can be reported to the
upper layer and remaining data can be lost.
Indeed, in such case, SE_FL_WANT_ROOM flag is here to specify the applet has
more data to deliver. Thanks to this flag, the stream will wait before
closing. But when appctx_rcv_buf() function is called, this flag is removed by
the stconn. It is the function's responsibility to set it again when necessary.
This patch should fix the second part of issue #3366. It must be backported
to 3.0.
Complete qmux_parse_frm() to ensure every frame allowed by the QMux
protocol is listed. For now, nothing is implemented except a CHECK_IF()
to report such events.
This is necessary to prevent a crash on abort. Frames not supported by
QMux should already have been rejected earlier via qmux_is_frm_valid().
Handle reception of a MAX_STREAMS frame for unidirectional stream usage
when using QMux. This simply consists in using qcc_recv_max_streams() as
with QUIC protocol.
Ensure reception of STOP_SENDING via QMux protocol is properly handled.
This simply consists in using qcc_recv_stop_sending() which will update
the associated QCS if found.
The check on the OCSP response expire time is based on the "Next Update"
field of the response, converted by my_timegm function that returns a
time_t (signed long). It is then stored in the 'expire' field of the
certificate_ocsp structure which is typed as a signed long.
When loading an OCSP response, if the "Next Update" time is too far in
the future and we are running on a 32-bit machine, we might end up with
negative times returned by my_timegm, which makes the comparison with
the current date fail and raises the "OCSP single response: no longer
valid." error message.
This problem typically happens in the ocsp_auto_update.vtc regtest since
the loaded OCSP response has a "Next Update" field in 2050.
This patch simply changes the type of the expire field to an unsigned
long since the 'my_timegm' function does not return '-1' in case of
error, contrary to the standard 'timegm' one.
This patch can be backported to all stable branches.
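
The effect can be reproduced on any machine by emulating a 32-bit long
with fixed-width types. This is only a sketch: expired_signed32() and
expired_unsigned32() are hypothetical helpers, and the out-of-range
conversion to int32_t is implementation-defined in C (it wraps on the
usual compilers, which is what the comment assumes).

```c
#include <stdint.h>

/* 2050-01-01 00:00:00 UTC in seconds since the epoch, above INT32_MAX. */
#define NEXT_UPDATE_2050 2524608000u

/* Emulates storing the date in a 32-bit signed long. */
static int expired_signed32(uint32_t next_update, uint32_t now)
{
    int32_t expire = (int32_t)next_update; /* wraps negative on usual ABIs */

    return expire < (int32_t)now;
}

/* Emulates storing the date in a 32-bit unsigned long. */
static int expired_unsigned32(uint32_t next_update, uint32_t now)
{
    return next_update < now;
}
```

With a current date of 1700000000, the signed variant reports the 2050
response as expired while the unsigned one does not.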
Use xprt_add_l6hs() at the end of connect_server() if the selected MUX layer
relies on a temporary handshake prior to its initialization. This
function is a no-op if the SSL layer is active.
This change is necessary to support clear QMux on the backend side.
Recently defined <init_xprt> from mux_proto_list is used to render the
code as generic as possible.
Activate the xprt_qmux layer if necessary via session_accept_fd(). This is
necessary to be able to support QMux in clear. This operation is a no-op if
SSL is active, as in this case xprt_qmux will be activated after the SSL
handshake completion.
To ensure MUX init is delayed when running with clear QMux, mask
CO_FL_WAIT_XPRT_L6 is added to test if the embryonic task must be
started instead.
Define a new connection flag mask CO_FL_WAIT_XPRT_L6. This will be used
to indicate that an XPRT layer is running on top of layer 6. For now,
only xprt_qmux implements this method of operation.
Extend get_alpn() for the xprt_qmux layer. If the lower layer does not
implement ALPN negotiation, return a static default protocol value.
Currently this is set to "h3".
This change is required to support QMux in clear without SSL. In the
future, it could be useful to configure the default protocol, for
example by extending the syntax for the "proto" keyword.
When QMux protocol is used, xprt_qmux layer is setup after SSL handshake
completion but prior to the MUX initialization. Once transport
parameters exchange is successful, the layer is removed and the MUX is
started.
The layer setup operation was performed directly in ssl_sock_io_cb().
Simplify the code by extracting it into a dedicated function
xprt_add_l6hs(). The function is generic so the requested XPRT layer
must be passed as argument.
The code is mostly identical. One difference is that a check is
performed to ensure no SSL handshake is pending. If this is the case,
the function is a noop. This will become useful to support QMux
transparently both in clear or on top of SSL.
Another minor addition is that the CO_FL_XPRT_READY flag is automatically
reset by xprt_add_l6hs(). This allows the code to use the standard
conn_xprt_start() function after XPRT init.
A recent patch has introduced the <init_xprt> mux_proto_list member. This
allows QMux to be activated on SSL handshake completion without an explicit
"proto qmux" setting.
Thanks to this change, on SSL handshake completion it is no longer
necessary to check for CO_FL_QMUX_* flags.
The various tcp_option_* converters rely on tcp_fullhdr_find_opt() to
find the option. However, the same bug as fixed in commit dbf471f99a
("BUG/MAJOR: net_helper: ip.fp infinite loop on malformed tcp options")
was also present there, by which an option of length 0 could be looped
over indefinitely. In practice this does not happen since such options
are not valid, but if passed encoded in an HTTP header for example, it
could possibly be passed.
While fixing it, let's check for length >1 in all 3 locations instead
of only non-zero, since there's no point processing a malformed option
that wouldn't even be properly skipped.
This fix doesn't need to be backported, unless the ip.fp series is.
Thanks to @Vincent55 for reporting this issue.
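
As a rough sketch of the "length > 1" rule, here is a standalone option
walker. tcp_find_opt() is a hypothetical simplification and not
tcp_fullhdr_find_opt() itself: kinds 0 and 1 are the single-byte End of
Option List and No-Operation options, and every other option carries a
length byte covering the kind and length bytes, so any length below 2
cannot advance correctly and is treated as malformed.

```c
#include <stddef.h>
#include <stdint.h>

/* Looks for option <kind> (expected >= 2) in the <len> bytes at <opts>.
 * Returns the offset of the option, or -1 if absent or malformed.
 */
static int tcp_find_opt(const uint8_t *opts, size_t len, uint8_t kind)
{
    size_t ofs = 0;

    while (ofs < len) {
        uint8_t k = opts[ofs];
        size_t olen;

        if (k == 0)                 /* End of Option List */
            break;
        if (k == 1) {               /* NOP: single byte, no length */
            ofs++;
            continue;
        }
        if (ofs + 1 >= len)         /* truncated: no length byte */
            break;
        olen = opts[ofs + 1];
        if (olen < 2 || olen > len - ofs)
            break;                  /* would not advance, or overruns */
        if (k == kind)
            return (int)ofs;
        ofs += olen;
    }
    return -1;
}
```

Rejecting a length of 1 as well as 0 avoids stepping into the middle of a
malformed option and misreading its payload as the next option header.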
When threads are disabled, "static __decl_spinlock(foo);" ends up as
"static;", causing a build warning. We don't need it to be static so
let's drop "static" here. No backport is needed, this is 3.4-only.