During maxage parsing, the size of the value was not properly computed when
it was copied into the trash chunk. The name (max-age or s-maxage) must be
skipped with the '=' character. But instead of doing a subtraction, and
addition was performed, adding 2 extra bytes to the value used for the
convertion to integer.
In addition, the "chunk_memcat(chk, "", 1)" operation to add a trailing
NULL-byte was replaced by "*(b_tail(chk)) = '\0'". It a bit easier to
understand.
This patch should be backported to all stable versions.
MAX_PUSH_ID frames are emitted by the client only on the control stream.
These conditions are checked via h3_check_frame_valid() since the
following patch.
e4a5a64198
BUG/MINOR: h3: reject server MAX_PUSH_ID frame
However control stream test was inverted by mistake. This patch fixes
it.
Due to this bug, H3 connections were improperly closed on error by
haproxy for clients which send MAX_PUSH_ID frames. This has been
detected on the QUIC interop with aioquic and neqo clients.
This must be backported up to 3.3.
In qcc_qmux_recv(), when calling qmux_parse_frm(), also treat negative
values as an error, and close the connexion. qmux_parse_frm() will
return -1 if the frame is of an invalid type, and we don't want to
process any further, or we will crash.
Move the security contact out of intro.txt into a dedicated, easily
searchable doc/security.txt that points reporters at the threat model
first, and reference it from intro.txt's contacts section and the
documentation index.
Add doc/internals/threat-model.txt describing what does and does not
qualify as a security vulnerability in HAProxy so that reporters and
developers have a common understanding of the threat model, and make it
clear that anything non-critical should be handled in the open and
not hidden behind embargoes.
The document lists assets to protect, what constitutes an attack, what
are the mitigations in place, and the severity ordering of various
risks. This may in the long term also help developers make better
choices of default settings and option names, and may also justify
changing default settings over time when modern operating systems
bring new possibilities.
A section also lists some invariants and defaults in an attempt to
limit the risk of reporting theoretical issues that are technically
impossible to happen in the field.
This is an initial version meant to be refined as cases arise. It
was incrementally designed and cross-checked with the help of three
independent LLMs (Qwen, Gemini and Claude) until each correctly
classified a set of sample reports against it. In the current state
they do not raise any residual ambiguities anymore.
After testing against a few LLMs, it appeared that several entries in
the core principles document were ambiguous or imprecise and could be
misread (size_t, pools, trash, dwcas, comparison, ncbuf). No more
complaint after this rewording so this will be sufficient for now.
Found via cppcheck --force --enable=all --output-file=haproxy.log :
admin/halog/halog.c:1805:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
admin/halog/halog.c:1806:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
admin/halog/halog.c:1809:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
admin/halog/halog.c:1810:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
admin/halog/halog.c:1814:2: warning: If memory allocation fails, then there is a possible null pointer dereference: ustat [nullPointerOutOfMemory]
Found via cppcheck --force --enable=all --output-file=haproxy.log :
src/ncbmbuf.c:192:9: warning: If memory allocation fails, then there is a possible null pointer dereference: area [nullPointerOutOfMemory]
src/ncbmbuf.c:373:9: warning: If memory allocation fails, then there is a possible null pointer dereference: data [nullPointerOutOfMemory]
src/ncbmbuf.c:546:9: warning: If memory allocation fails, then there is a possible null pointer dereference: data [nullPointerOutOfMemory]
Found via cppcheck --force --enable=all --output-file=haproxy.log :
addons/51degrees/51d.c:130:3: warning: If memory allocation fails, then
there is a possible null pointer dereference: name [nullPointerOutOfMemory]
addons/51degrees/51d.c:922:4: warning: If memory allocation fails, then
there is a possible null pointer dereference: _51d_property_list [nullPointerOutOfMemory]
We can't call call the prepare_srv() method too early, because it needs
global.nbthreads to be properly set, which won't be true at post_parse
time. So instead, make it so that code runs later, as a post_check
function, when it will be safe to do so.
This should be backported up to 2.8.
This should fix github issue #3402
Ever since the introduction of multiple cache trees, the "show cache"
CLI command was not properly showing the contents of each tree, but was
only showing the first one.
Fix that by properly resetting next_key when we switch to the next tree.
Should be backported up to 3.0.
In in46un_to_addr(), when copying a struct sockaddr_in6, copy the
sin6_flowinfo and sin6_scope_id, as they are part of the structure too.
They are unlikely to be of any use for us, but this is more correct
anyway.
Very similarly to what was fixed with commit
63f853957a, we cast a sockaddr_in46 in
quic_dgram_parse() to sockaddr_storage while providing source and
destination addresses to qc_handle_conn_migration(), which will then
copy the whole sockaddr_storage, thus reading memory past what was
provided.
While this most likely won't have any impact, let's do the right thing,
and use in46un_to_addr() to generate a real sockaddr_storage.
This does not need to be backported.
EXTRA_MAKE allows to source an external makefile to bring new options
that will result in including add-ons etc. It must be part of the
construction of .build_opts that decides whether or not existing .o
are reusable or need to be rebuilt, otherwise we can end up with a mix
of .o built with some options and others with different options.
No backport is needed, as this appeared in 3.4.
Some setups where the number of threads is forced without any binding
(no cpu-map), are quite suspicious if they result in less threads than
available CPUs, and not even predictably bound, so we want to notify
the user that this might be an oversight.
Similarly, when thread-groups is forced and not nbthread (and no cpu-map),
and the final number of threads is lower than the hard-limit or the number
of CPUs we also indicate the impact and how to remedy it. This can happen
for example when starting on a machine with more than 64 CPUs and
thread-groups forced to 1, or on more than 128 CPUs and thread-groups
forced to 2 (e.g. when moving an older config to a new platform).
It is possible that some of these conditions might need to be readjusted
in the future to catch other traps or to relax certain commonly used,
valid cases, so for now it is preferable not to backport this patch.
The cpu-policy directive is ignored when nbthreads, thread-groups, or
cpu-map are set. In addition, first-usable-node is ignored when the
process was externally restricted (e.g. taskset). This is difficult to
debug when it happens because multiple parameters come into the mix and
it's easy to forget to unset one. Let's emit a notice when this happens
and the policy was forced. This way, it remains silent with the default
policy, but if it was forced, the incompatibility is reported.
It's worth noting that ll the cpu-policy functions take a char **err
but none uses it. It could have been useful here instead of calling
ha_notice() all along, but one needs to determine who the consumers
are and who will be responsible for freeing the message, so let's go
with ha_notice() given that were were already some diag_warnings in
these functions.
It could be helpful to backport this to 3.2.
Since it's easy to get caught by some parameters being ignored, let's
detect when mtpg was explicitly set and report a notice if it is ignored
due to thread-groups being set. For this we need to avoid presetting
the value in the global section and only set it when entering function
thread_detect_count(), which is OK since the value cannot be used before.
As documented, max-threads-per-group is the default number of threads
to arrange in a group before creating another group, and is only meant
to be used when thread-groups is not set.
However it was always enforced, so configs like:
global
thready-groups 2
which were sufficient in 3.2 and above to start with 64-128 threads
are now suddenly limited to 32 threads! Let's relax the limit when
thread-groups is set!
No backport is needed since this is only 3.4.
When starting, say, 128 threads with max-threads-per-group set to 2
and MAX_TGROUPS set to the default 32, instead of setting the resulting
number of groups to 32 and threads to 64, they're set to 1 and 32
respectively because the condition to raise grp_min is not satisfied.
Let's cut the condition in two parts to also permit to raise it at
least to grp_max.
This should be backported to 3.2.
It was made from the split of the original one into the SSL and the QUIC
variant. However there's a catch: both use the same certificates which
includes the OCSP URL 127.0.0.1:12345, and both need to start a server
on that port. Depending on the number of parallel process and their
speed, they might very well work, or totally fail due to a binding
conflict and the fact that the test runs for a few seconds.
Let's disable the QUIC variant for now, since the whole point of the
test is to verify all the sequencing, the SSL one is greatly sufficient.
Maybe a better approach can be found later.
Commit 1c59c39171 deferred hlua_init() to be called lazily from the
config keyword handlers (lua-load, lua-load-per-thread,
lua-prepend-path, tune.lua.openlibs), with a call inside
hlua_post_init() as a safety net for the case where no Lua directive
appears in the configuration at all.
The problem is hlua_init() is a function that allocates internal
servers (socket_proxy, socket_tcp, socket_ssl) that must exist before
haproxy initialize the configuration. But hlua_post_init() is done too
far after this initialization, so the safety net does not work
correctly.
This would results in a crash in the deinit() if no lua
configuration was loaded in haproxy.
Core was generated by `./haproxy -W -f /dev/null'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005671c72b1047 in _ceb_first (root=0x30, kofs=16, key_type=CEB_KT_U64, key_len=0,
is_dup_ptr=0x7ffc13197a14) at include/import/cebtree-prv.h:1160
1160 if (!*root)
(gdb) bt
#0 0x00005671c72b1047 in _ceb_first (root=0x30, kofs=16, key_type=CEB_KT_U64, key_len=0,
is_dup_ptr=0x7ffc13197a14) at include/import/cebtree-prv.h:1160
#1 _ceb64_first (root=0x30, kofs=16) at src/_ceb_int.c:73
#2 ceb64_ofs_first (root=0x30, kofs=16) at src/_ceb_int.c:66
#3 0x00005671c6be5e6e in srv_close_idle_conns (srv=0x5671fd592a80) at src/server.c:7676
#4 0x00005671c6d3be17 in deinit_proxy (p=0x5671fd5d7780) at src/proxy.c:393
#5 0x00005671c6d3c536 in proxy_drop (p=0x5671fd5d7780) at src/proxy.c:479
#6 0x00005671c6aed998 in hlua_deinit () at src/hlua.c:14934
#7 0x00005671c6db2e41 in deinit () at src/haproxy.c:2846
#8 0x00005671c6db3d98 in deinit_and_exit (status=0) at src/haproxy.c:2966
#9 0x00005671c6db6111 in main (argc=4, argv=0x7ffc131983c8) at src/haproxy.c:3997
The fix is to do the initialization earlier, in a pre-check callback.
Thanks to Amaury for reporting this issue.
No backport needed.
The QUIC congestion control algorithm impacted by this bug is BBR.
In qc_notify_cc_of_newly_acked_pkts(), drs->lost was updated after
quic_cc_drs_on_ack_recv(), causing the current sample's lost count to
miss the bytes_lost from the current loss detection round. This meant
that rs->lost = drs->lost - rs->prior_lost would always be 0 for the
current losses, since both drs->lost and rs->prior_lost (captured at
packet send time) excluded the current bytes_lost.
Moving drs->lost += bytes_lost before on_ack_recv ensures that the
rate sample correctly includes the newly detected lost bytes, matching
the BBR algorithm's intent where C.delta_lost = C.lost - C.prior_lost
should reflect all losses since the last sample.
Must be backported as far as 3.1 where delivery rate sampling was
implemented.
When exiting the recovery period and re-entering congestion avoidance,
the consecutive_losses counter was not reset. This meant that if a loss
event arrived immediately after the ACK that ended recovery, the counter
would still hold the value that triggered recovery, causing an immediate
re-entry into recovery (recovery -> CA -> recovery loop).
Resetting consecutive_losses to 0 on recovery exit matches the behavior
of resetting it on ACK in CA, ensuring a clean slate for the new
congestion avoidance period.
Must be backported to all versions.
The cubic slow_start callback was only resetting the internal cubic state
without reducing the congestion window, unlike newreno which calls
quic_cc_path_reset(). Per RFC 9002, persistent congestion should trigger
both entry into slow start and a reduction of the congestion window.
Must be backported to all versions.
Allocating and freeing an OpenSSL EVP_PKEY_CTX context via
EVP_PKEY_CTX_new_id() and EVP_PKEY_CTX_free() on every HKDF cryptographic
operation (such as during stateless reset token generation) induces
unnecessary memory allocation overhead.
Optimize this by introducing a global per-thread context array
'quic_tls_hkdf_ctxs'. These contexts are allocated and initialized once
at startup via a POST_CHECK hook (quic_tls_alloc_hkdf_ctxs) and are
properly freed at exit via a POST_DEINIT hook (quic_tls_dealloc_hkdf_ctxs).
The functions quic_hkdf_extract(), quic_hkdf_expand(), and
quic_hkdf_extract_and_expand() now reuse the pre-allocated context
corresponding to the current thread ID ('tid'), removing dynamic
allocations from these frequent execution paths.
As a cleanup, quic_hkdf_expand() is now static and unexported from the
header file.
Should be easily backported to all versions for optimization purposes.
In quic_insert_new_range(), the variable 'first' is a struct eb64_node*,
but pool_free expects a struct quic_arng_node*. While the addresses are identical
(since 'first' is the first member of quic_arng_node), this is technically
incorrect and should use eb64_entry() for proper type safety.
Must be backported to all versions.
When a backend connection is reused, qcm_strm_attach() callback is used.
A BUG_ON() is present to ensure that the connection is not already on
error. This should be guaranteed by the fact that idle insertion is
skipped for such connections.
However, when a connection is flagged on error, it is not immediately
removed from its idle/avail pool. Thus, there is a risk that it is
reused, triggering the aformentioned BUG_ON() statement.
This issue should be avoided via avail_streams callback which should
return 0, forcing the caller to cancel the connection reuse. In QUIC,
this callback implementation relies on internal qcc_be_is_reusable().
However, it lacked checks for error status.
To fix this, extend qcc_be_is_reusable() to properly check connection
errors or an expired timeout.
Previously, these parameters were already checked by qcc_is_dead(). As
it also relies on qcc_be_is_reusable(), this patch also rearranges it to
avoid duplicate checks for backend connections.
This should be backported up to 3.3.
When QUIC application layer is shut for a backend connection, the
connection is immediately removed from its idle pool. This is a nice
optimization as this prevents a future streams to try to reuse an
unusable connection. This is implemented since the following commit.
00d668549e
MINOR: mux-quic: do not reuse connection if app already shut
However, this removal is not correctly performed as it is used
conn_delete_from_tree(). For private connections, this can cause crashes
as they are stored in the session instead. Thus, connection status is
now properly check, and alternatively session_unown_conn() is used if
stored in the session.
This must be backported up to 3.3.
On the backend side, a QCS may be opened but resetted immediately. No
STREAM frame will be emitted prior to the RESET_STREAM. When the latter
is sent, qcs_close_local() will mark the QCS Tx channel as closed.
In this case, a BUG_ON() would be triggered as there is QCS Tx channel
is not yet marked as opened. To prevent this, add a qcs_idle_open() call
when the stream is resetted, but only for the backend side.
This should be backported up to 3.3.
Move the WURFL Makefile part to addons/wurfl/Makefile.mk so it can be
used with EXTRA_MAKE and allow to cleanup the main Makefile.
Shouldn't have impact on the build system, every build variable
previously used are the same.
Move the deviceatlas Makefile.inc to Makefile.mk so it can be used with
EXTRA_MAKE and allow to cleanup the main Makefile.
EXTRA_MAKE paths are appended with /Makefile.mk via addsuffix, so the
path must not have a trailing slash.
Shouldn't have impact on the build system, every build variable
previously used are the same.
Move the 51degrees Makefile part to addons/51degrees/Makefile.mk so it
can be used with EXTRA_MAKE and allow to cleanup the main Makefile.
EXTRA_MAKE paths are appended with /Makefile.mk via addsuffix, so the
path must not have a trailing slash.
Shouldn't have impact on the build system, every build variable
previously used are the same.
When DATA frame are received, we take care to update the counter used to
send WINDOW_UPDATE for the connection. It is also performed on error path
when DATA frames are processed. However, when this happened, only the frame
length was accounted while the padding must also be considered.
To fix the issue, the full frame length (h2c->dfl), which include the
padding length, must be added to the amount of newly received data
(h2c->rcvd_c).
The issue was introduced with commit eeacca75d ("BUG/MINOR: mux-h2: count
rejected DATA frames against the connection's flow control") and backported
to 2.8.
So this patch must be backported as far as 2.8.
These tests were using "tune.lua.openlibs none" with lua-load, which
was a no-op in the old code since Lua states 0 and 1 were always
initialised before config parsing with all standard libraries.
Now that the Lua VM is initialised lazily, the restriction correctly
applies to state 0 as well. Replace "none" with the minimal set of
libraries actually required by each test's Lua code:
- lua_socket.vtc, h_txn_get_priv.vtc, lua_httpclient.vtc: string
- txn_get_priv.vtc: string,table
HAProxy used to call hlua_init() unconditionally from step_init_1(),
before any configuration file was parsed. As a consequence, Lua states
0 and 1 were always created with hlua_openlibs_flags set to its default
value (HLUA_OPENLIBS_ALL), regardless of any tune.lua.openlibs directive
that appeared later in the global section. With multiple threads, states
2..N were created correctly in hlua_post_init() after the config had been
parsed, while states 0 and 1 retained the full standard-library set.
This produced the observable bug reported in GitHub issue #3396: a script
loaded with lua-load-per-thread could see require() as a function on
thread 1 but nil on thread 2 when tune.lua.openlibs was used to restrict
the available libraries.
The initialisation is now lazy. hlua_init() is idempotent: it returns
immediately if the states already exist (hlua_states[0] != NULL). It is
called explicitly from the three config keyword handlers that need the
Lua states to be live before they can do their work (lua-load,
lua-load-per-thread, lua-prepend-path) and from tune.lua.openlibs, after
the hlua_openlibs_flags variable has been updated, so that the states are
always created with the correct library set.
hlua_post_init() calls hlua_init() unconditionally as a safety net,
covering the case where no Lua directive appeared in the configuration at
all (no global section, or only pure-tuning directives such as timeouts
and memory limits), and ensuring correct behaviour with multiple
consecutive global sections.
As a result of this change, tune.lua.openlibs must now appear before
lua-load, lua-load-per-thread, and lua-prepend-path in the configuration;
if any of those keywords is encountered first, the Lua states will already
be initialised and tune.lua.openlibs with a non-default value will return
a parse error.
No backport needed.
When deallocating the QUIC datagram handlers, the per-thread buffer
allocated inside quic_dghdlrs[i].buf.buffer was missing a free().
This led to a memory leak on exit or reload.
Fix this by freeing each thread buffer before releasing the main
quic_dghdlrs array.
Unlike the detection performed during sendto() for an unreachable peer,
ECONNREFUSED was not handled when received via recvmsg() as an ICMP
"host unreachable" message.
This patch tracks ECONNREFUSED errors on the receive path.
Note that this detection is entirely dependent on the remote host effectively
sending an ICMP "host unreachable" message and on the absence of any network
filtering (e.g., firewalls) that would drop such ICMP packets. Without
receiving this ICMP signal, the connection state cannot be updated through
this mechanism.
At a higher level, similar to how this error is handled on sendto(),
the connection is now terminated as soon as possible by calling
qc_kill_conn(). This triggers a call to qc_notify_err(). When the mux
does not exist, it attempts to create one via conn_create_mux(). While
the latter systematically fails if the connection is flagged with
CO_FL_ERROR, it has the useful side effect of waking the stconn stream
attached to the connection during a session opening without a mux
(e.g., for H3).
This issue was caught by haload (upcoming tool).
Must be backported as far as 2.6 because it impacts both the QUIC
frontends and backends.
QPACK_LFL_WLN_BIT and related encoded field line bitmasks were defined
in both qpack-enc.c and qpack-dec.c. Moved them to qpack-t.h where
they are shared between encoder and decoder, eliminating the duplicate
definitions.
Should be backported to ease any further commit to come.
The <nlen> variable is a signed integer, but the check for a Huffman
decoding error was written as 'nlen == (uint32_t)-1'.
With standard compiler type promotion rules, this comparison happens to
work as intended when huff_dec() returns -1. However, relying on implicit
unsigned promotions for signed error checking is fragile. If a compiler
applies different promotion semantics, or if huff_dec() returns any other
negative error code, the failure would go undetected, leading to buffer
corruption or a crash via b_add() and ist2().
Fix this by using 'nlen < 0', removing any ambiguity regardless of the
compiler used.
Must be backported to all versions.
In qpack_decode_fs(), inside the QPACK_LFL_WLN_BIT branch (Literal field
line with literal name), the debug message printed "[name huff ...]" instead
of "[value huff ...]" after decoding the value string.
This is a harmless copy-paste typo from the preceding name decoding block.
Even if this is a cleanup, should be easily backported to ease any further
backport.
The sign bit of the Delta Base integer encoding was extracted using
mask 0x8 (bit 3) instead of 0x80 (bit 7). This was likely a copy-paste
error from other QPACK instructions using 3-bit varints.
According to RFC 9204 Section 5.2.1, for prefix instructions, the sign
bit 'S' is the most significant bit (bit 7) of the first byte, followed
by a 7-bit varint.
This fix is harmless for current HTTP/3 traffic: per RFC 9204, the Delta
Base calculation is strictly used for dynamic table entry references.
Since HAProxy's QPACK dynamic table is currently disabled and the extracted
sign bit is not yet used in the decoding logic (only in debug prints),
this code path has no impact on production for now.
Must be backported to all versions.
In qpack_decode_fs(), when decoding a literal field line with a literal
value, the debug message mistakenly printed "[name huff ...]" instead of
"[value huff ...]" after a successful Huffman decoding of the value string.
This is a harmless copy-paste typo from the field name decoding block
just above, fix it to prevent confusion when debugging QPACK streams.
Should be easily backported to all versions to ease further modifications
into the QPACK code.
When defragmenting the QPACK dynamic header table upfront during an
insertion, qpack_dht_defrag() can fail and return NULL if memory
allocation or re-allocation fails.
However, qpack_dht_insert() was blindly using the returned pointer
without validation, immediately leading to a null-pointer dereference
on 'dht->wrap'.
Fix this by checking if 'dht' is NULL after the defrag call and return
an error (-1).
Note that this has no impact on production yet because the QPACK dynamic
table is currently not enabled/used, so qpack_dht_insert() is never called.
Should be easily backported to all versions.
Although qpack_idx_to_name and qpack_idx_to_value are currently only
called within uncompiled debug code, they contained an index bug. They
passed absolute indexes directly to qpack_get_dte instead of relative
dynamic table indexes.
This patch fixes the logic by subtracting QPACK_SHT_SIZE and guarding
against static table index lookups.
Should be easily backported to all versions.