Extend "show quic" to be able to dump MUX related information. This is
done via the new function qcc_show_quic(). This replaces the old streams
dumping list which was incomplete.
These info are displayed on full output or by specifying "mux" field.
Add the possibility to customize show quic full output with only a
specific set of printed fields. This is specified as a comma-separated
list. Here are the currently supported values :
* tp: transport parameters
* sock: connection addresses and socket FD
* pktns: packet number space with ack ranges and in flight bytes
* cc: congestion controler and loss information
Note that streams output is not filtered by this mechanism. It's because
it will be replaced soon by an output generated from the MUX which will
use its owned field name.
Add the possibilty to restrict show quic output to only a single
connection. This is done by specifying a quic_conn address pointer.
Default format selection has evolved with it. Indeed, it seems more
fitting to use full format by default when filtering on a connection.
However, it's still possible to revert to the original oneline format
with it by specifying it explicitely.
When a response was returned by HAProxy, a dedicated HTX flag was
set. Thanks to this flag, it was possible to add a "connection: close"
header to the response if the request was not fully received and to close
the connection. In the same way, when a redirect rule was applied,
keep-alive was forcefully disabled for unfinished requests.
All these mechanisms are now useless because the H1 mux is able to drain the
response. So HTX_FL_PROXY_RESP flag is removed and no special processing is
performed on HAProxy response when the request is unfinished.
unlike for H2 and H3, there is no mechanism in H1 to notify the client it
must stop to upload data when a response is replied before the end of the
request without closing the connection. There is no RST_STREAM frame
equivalent.
Thus, there is only two ways to deal with this situation: closing the
connection or draining the request. Until now, HAProxy didn't support
draining H1 messages. Closing the connection in this case has however a
major drawback. It leads to send a TCP reset, dropping this way all in-fly
data. There is no warranty the client has fully received the response.
Draining H1 messages was never implemented because in old versions it was a
bit tricky to implement. However, it is now far simplier to support this
feature because it is possible to have a H1 stream without any applicative
stream. It is the purpose of this patch. Now, when a shutdown is requested
and the stream is detached from the connection, if the request is unfinished
while the response was fully sent, the request in drained.
To do so, in this case the shutdown and the detach are delayed. From the
upper layer point of view, there is no changes. The endpoint is shut down
and detached as usual. But on H1 mux point of view, the H1 stream is still
alive and is being able to drain data. However the stream-endpoint
descriptor is orphan. Once the request is fully received (and drained), the
connection is shut down if it cannot be reused for a new transaction and the
H1 stream is destroyed.
All code from h1_detach() function was moved in a internal function,
h1s_finish_detach(). It will be used to defer the detach and be able to
drain the requests payload.
Checks performed in h1_shutw() to determine if the connection must be
shutdown now or not was move in a dedicated function. This will be used to
be able to drain the requests payload.
During a zero-copy forwarding negociation, if the H1 mux is blocked for any
reason, the IOBUF_FL_FF_BLOCKED flag must be set on its iobuf to notfiy the
producer it must wait. However, there were two places where it was not
performed: when the output buffer allocation failed and when the chunk
formatting failed.
This patch fixes the issue. It must be backported to 2.9.
There is still an issue with zero-copy forwarding of chunks with an unknown
size. It is possible for a producer to fill the sapce reserved for the CRLF
at the end of the chunk. The root cause is that this space is not accounted
in the iobuf offset. So, from the producer point of view, the space may be
used. We can also argue the current design for iobuf is not well suited for
this case. Instead of using a pointer on the consumer's buffer, it could be
easier to use a custom buffer built on top of the consumer one, via a call
to b_make(), with the size, head and data field reflecting the avaialble
space the producer can use.
By the way, because of this bug, it is possible to trigger a BUG_ON() when
we try to write the CRLF at the end of the chunk because the buffer is
full. It is unexpected. Only the stats applet may hit this bug.
To fix the issue, instead of writting this CRLF when the current chunk is
consumed, it is written before consuming the next one. This way, all space
reserved to create the chunk formatting is always placed before forwarding
data.
No backport needed.
This is a followup of the previous commit: GH user @songliumeng initially
reported an issue with the GPL license version for event_hdl source file
which was fixed by the previous commit. It turns out the same mistake was
made in http_ext source file: due to a mixup between LGPL and GPL, GPL
version '2.1' was referenced instead of '2'.
Again, clarify that this is indeed GPL by making use of the banner
provided in doc/gpl.txt
This should be backported in 2.8 with b2bb925 ("MINOR: proxy/http_ext:
introduce proxy forwarded option")
As spotted by user @songliumeng in GH #2463, there was a mixup between
LGPL and GPL in event_hdl source file: GPL version '2.1' was referenced
instead of '2'. Clarify that this is indeed GPL by making use of the
banner provided in doc/gpl.txt.
This should be backported in 2.8 with 68e692d ("MINOR: event_hdl: add
event handler base api")
Since 23cab33 ("BUG/MINOR: ssl: Clear the ckch instance when deleting a
crt-list line"), LIST_DELETE is done twice, one time in
cli_parse_del_crtlist() and another time in ckch_inst_free().
It could trigger a crash with -DDEBUG_LIST.
This isn't a major problem since the ptr is not freed in the meantime so
it will only trigger with the debug.
This patch removes the LIST_DELETE as well as the loop done on link_ref
which is also don in ckch_inst_free()
Could be backported as far as 2.4. 2.4 version does not have a link_ref
loop.
Contrary to static servers, dynamic servers does not initialize their
settings from a default server instance. As such, _srv_parse_init() was
responsible to set a set of minimal values to have a correct behavior.
However, some settings were not properly initialized. This caused
dynamic servers to not behave as static ones without explicit
parameters.
Currently, the main issue detected is connection reuse which was
completely impossible. This is due to incorrect pool_purge_delay and
max_reuse settings incompatible with srv_add_to_idle_list().
To fix the connection reuse, but also more generally to ensure dynamic
servers are aligned with other server instances, define a new function
srv_settings_init(). This is used to set initial values for both default
servers and dynamic servers. For static servers, srv_settings_cpy() is
kept instead, using their default server as reference.
This patch could have unexpected effects on dynamic servers behavior as
it restored proper initial settings. Previously, they were set to 0 via
calloc() invocation from new_server().
This should be backported up to 2.6, after a brief period of
observation.
This patch reverts 2 fixes that were made in an attempt to fix the
ocsp-update feature used with the 'commit ssl cert' command.
The patches crash the worker when doing a soft-stop when the 'set ssl
ocsp-response' command was used, or during runtime if the ocsp-update
was used.
This was reported in issue #2462 and #2442.
The last patch reverted is the associated reg-test.
Revert "BUG/MEDIUM: ssl: Fix crash when calling "update ssl ocsp-response" when an update is ongoing"
This reverts commit 5e66bf26ec.
Revert "BUG/MEDIUM: ocsp: Separate refcount per instance and per store"
This reverts commit 04b77f84d1b52185fc64735d7d81137479d68b00.
Revert "REGTESTS: ssl: Add OCSP related tests"
This reverts commit acd1b85d3442fc58164bd0fb96e72f3d4b501d15.
In appctx_htx_rcv_buf(), HTX blocks found in the appctx output buffer are
copied into the channel buffer. At the end, the state of the underlying
buffer must be updated. If everything was copied, the buffer is reset. This
way, it will be released later, at the end of the applet process function.
However, here there was a typo. We do it on the input buffer instead of the
output buffer. As side effect, an empty HTX message remained stuck in the
appctx outbut buffer, blocking the applet and leading to blocked session
with no expiration date.
No backport needed.
Due to the possibility of calling a control process after adding CRLs, the
ssl_commit_crlfile_cb variable was added. It is actually a pointer to the
callback function, which is called if defined after initial loading of CRL
data from disk and after committing CRL data via CLI command
'commit ssl crl-file ..'.
If the callback function returns an error, then the CLI commit operation
is terminated.
Also, one case was added to the CLI context used by "commit cafile" and
"commit crlfile": CACRL_ST_CRLCB in which the callback function is called.
Signed-off-by: William Lallemand <wlallemand@haproxy.com>
Output of 'show quic' is messed up since the introduction of reordered
packets counter in the following commit. The new counter is mixed up
with the first stream line. This is due to the wrong placement of the
newline delimiter.
167e38e0e0
MINOR: quic: Add a counter for reordered packets
This should be backported up to 2.6.
The issue was decribed in commit "BUG/MEDIUM: cli: Warn if pipelined commands
are delimited by a \n". In non-interactive mode, it was possible to use a
newline character as delimiter for pipelined commands. As a consequence, it was
possible to stop commands processing on the middle.
With the above commit, a warning is emitted to notify users. With this one,
we restore the expected behavior, as documented in the management guide.
Only the first line of commands is parsed. This commit will not be
backported to avoid breaking changes on stable versions.
This commit has of course some visible effects. All script using a newline
character as delimiter to pipeline commands in non-interactive mode will
stop working. Only the first command will be evaluated, all others will be
ignored. Pipelined commands MUST now be separated by a semi-colon.
But there is a more subtle and probably more annoying change. It is no
longer possible to pipeline commands with a payload ! A command with a
payload will always be the last one evaluated because it must be finished by
a newline (eventually preceeded by a custom pattern).
It is really annoying to introduce such breaking change. But, on the long
term, it is mandatory. The 2.8 will be the last LST version supporting the
old behavior (with some warning however). This will let 4 years to users to
adapt their scripts.
No backport needed.
This was broken since commit 0011c25144 ("BUG/MINOR: cli: avoid O(bufsize)
parsing cost on pipelined commands"). It is not really a bug fix but it is
labelled as is to make it more visible.
Before, a full line was first retrieved from the request buffer before
extracting the first command to eval it. Now, only one command is retrieved.
But we rely on the request buffer state to interrupt processing in
non-interactive mode. After a command processing, if output of the request
buffer is empty, we leave. Before the above commit, this was not a problem.
But since then, it is obviously a bad statement. First because some input
data may still be there. It is not true today, but it might change. Then,
there is no warranty to receive all commands in same time. For small list of
commands, it will be most of time the case, but it is a dangerous
assumption. For long list of commands, it is almost always false.
To be an issue, commands must be chunked exactly between two commands. But
in this case, remaining commands are skipped. A good way to reproduce the
issue is to wait a bit between two commands, for instance:
(printf "show info;"; sleep 2; printf "show stat\n") | socat ...
In fact, to properly fix the issue, we should exit on the first command
finished by a newline. Indeed, as stated in the documentation, in
non-interactive mode, a single line is processed. To pipeline commands,
commands must be separated by a semi-colon. Unfortunately, the above commit
introduced another change. It is possible to pipeline commands delimited by
a newline. It was pushed 2 years ago and backported to all stable versions.
Several scripts may rely on this behavior.
So, on stable version, the bug will not be fixed. However a warning will be
emitted to notify users their scripts don't respect the documentation and
they must adapt it. Mainly because the cli behavior on this point will be
changed in 3.0 to stick to the doc. This warning will only be emitted once
over the whole worker process life. Idea is to not flood the logs with the
same warning for every offending commands.
This commit should probably be backported to all stable versions. But with
some cautions because the CLI was often modified.
This loop was added to detect pipelined commands when only co_getline() was
used to get commands. Now, co_getdelim() is used and the semi-colon is also
considered as a command delimiter.
As side effet, the last semi-colon, if any, is no longer replaced by a
newline. Thus, we must take care to adapt the test to detect partial
commands.
On qcs_destroy(), a BUG_ON() statement check that QCS does not have
anymore prepared data. This is to ensure connection flow control is
always coherent and prevent transfer freeze.
However, this BUG_ON() may cause a spurrious crash in case QCC is
considered on error. Indeed, in this case, all transfers are interrupted
and qmux_strm_detach() will proceed to immediate QCS free before
releasing the connection. In this situation, connection flow control is
irrelevant so the BUG_ON() should be ignored.
This crash occurs since the MUX refactoring via the following patch.
Previously, a similar BUG_ON() was used but it was incorrectly
implemented rendering it immune even to targetted cause.
3fe3251593
MEDIUM: mux-quic: simplify sending API
This should fix github issue #2456.
This does not need to be backported.
Before a dynamic server can be deleted, a set of preconditions must be
validated to ensure it is not referenced naymore by a stream or a
connection. This is implemented in srv_check_for_deletion().
The various criteria specified were incomplete. This allows a server
instance to be deleted while still be referenced by a stream and a
connection.
This bug was reproduced by using ASAN compilation. A script was used to
add and delete a server every second, while using h2load to generate
traffic with download of 1k objects. Here is the ASAN error.
==140916==ERROR: AddressSanitizer: heap-use-after-free on address 0x520000020080 at pc 0x63cb25679537 bp 0x701529ff5070 sp 0x701529ff5060
READ of size 1 at 0x520000020080 thread T7
#0 0x63cb25679536 in objt_server include/haproxy/obj_type.h:99
#1 0x63cb2568f465 in process_stream src/stream.c:1823
#2 0x63cb25a4a4a2 in run_tasks_from_lists src/task.c:632
#3 0x63cb25a4bf62 in process_runnable_tasks src/task.c:876
#4 0x63cb2596a220 in run_poll_loop src/haproxy.c:3050
#5 0x63cb2596b192 in run_thread_poll_loop src/haproxy.c:3252
#6 0x701539aa9559 (/usr/lib/libc.so.6+0x8b559) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
#7 0x701539b26a3b (/usr/lib/libc.so.6+0x108a3b) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
To fix this, add <curr_used_conns> to the counters checked in
srv_check_for_deletion().
Outside of this bug, one case which remains sensible is for SF_DIRECT
streams which referenced a server instance early in process_stream()
before connect_server(). This occurs with use-server directive,
force-persist rule or cookie persistence. However, after code
reexamination, the code is considered reliable as process_stream() is
not rescheduled before connect_server() invocation. These observations
have been saved in sess_change_server() documentation to ensure it
remains valid in the future.
This must be backported up to 2.6.
Server refcount is used to protect from server deletion while dumping a
server instance, for stats dump on both CLI and HTTP applet. However,
dump can be aborted prematurely before reaching the end. In this case,
server refcount is never decremented.
This bug can cause an inconsistency on servers refcount, preventing them
to be deleted even after "del server" success.
To fix this, implement release handler for both stats CLI and HTTP
applet. Drop server reference if dump was interrupted during servers
loop.
This should be backported up to 2.6.
Recent commit 2ed6068 ("MINOR: log: custom name for logformat node")
introduced a potential memory leak because when custom name is provided,
lf->name value is allocated using strdup(), thus is expected to be freed
alongside the node when the node is released.
However lf->name was only freed in some common places within log.c
cleanups and helpers func, but in reality there are still cases where
lf nodes are manually freed without making use of freeing helpers.
So this is what this patch does, it makes sure all lf freeing places now
leverage the free_logformat_node() helper function that takes care of
freeing all known allocated elements within the node, including custom
name.
This commit depends on:
- "MINOR: log: add free_logformat_node() helper function"
No backport needed unless 2ed6068 gets backported.
This is a follow up for 24a5e42db6 ("CLEANUP: log: deinitialization of
the log buffer in one function") as there was another opportunity to
make use of the new cleanup function.
A chunk message transferred via zero-copy forwarding in H1 may be
corrupted. This only happens when the chunk size is not known during the
nego stage and when there is nothing to forward when h1_donn_ff() is
called. In this case, we always emit a chunk. Because there is nothing to
forward, a 0-CRLF is emitted in the middle of the message.
The issue occurred with the HTTP stats applet only.
A simple fix is to check the size of data in the iobuf before emitting a new
chunk in h1_done_ff(). However, we still try to send outgoing data because
when this happens, it is most of time because the H1 output buffer is almost
full.
This patch should fix the issue #2453. No backport needed.
Previously, msghdr struct used for sendmsg was memset to 0. This was
updated for performance reason with each members individually defined.
This is done by the following commit :
commit 107d6d7546
OPTIM: quic: improve slightly qc_snd_buf() internal
msg_flags is the only member unset, as sendmsg manual page reports that
it is unused. However, this caused a coverity report. In the end, it is
better to explicitely set it to 0 to avoid any future interrogations,
compiler warning or even portability issues.
This should fix coverity report from github issue #2455.
No need to backport unless above patch is.
The to_forward field was added to debugging output of applets with commit
62a81cb6a ("MINOR: applet: Add callback function to deal with zero-copy
forwarding"), though it's a size_t printed as %lu, which causes complaints
on 32-bit archs. Let's just cast as %lu.
No backport is needed.
This patch is the direct followup of the previous one :
MINOR: quic: remove sendto() usage variant
This finalizes qc_snd_buf() simplification by removing send() syscall
usage for quic-conn owned socket. Syscall invocation is merged in a
single code location to the sendmsg() variant.
The only difference for owned socket is that destination address for
sendmsg() is set to NULL. This usage is documented in man 2 sendmsg as
valid for connected sockets. This allows maximum performance by avoiding
unnecessary lookups on kernel socket address tables.
As the previous patch, no functional change should happen here. However,
it will be simpler to extend qc_snd_buf() for GSO usage.
qc_snd_buf() is a wrapper around emission syscalls. Given QUIC
configuration, a different variant is used. When using connection
socket, send() is the only used. For listener sockets, sendmsg() and
sendto() are possible. The first one is used only if local address has
been retrieved prior. This allows to fix it on sending to guarantee the
source address selection. Finally, sendto() is used for systems which do
not support local address retrieval.
All of these variants render the code too complex. As such, this patch
simplifies this by removing sendto() alternative. Now, sendmsg() is
always used for listener sockets. Source address is then specified only
if supported by the system.
This patch should not exhibit functional behavior changes. It will be
useful when implementing GSO as the code is now simpler.
When using listener socket, source address for emission is explicitely
set using ancillary data for sendmsg(). This is useful to guarantee the
correct address is used when binding on a non-explicit address.
This code was implemented directly under qc_snd_buf(). However, it is
quite complex due to portability issue. For IPv4, two parallel
implementations coexist, defined under IP_PKTINFO or IP_RECVDSTADDR. For
IPv6, another option is defined under IPV6_RECVPKTINFO. Each variant
uses its distinct name which increase the code complexity.
Extract ancillary data filling in a dedicated function named
cmsg_set_saddr(). This reduces greatly the body of qc_snd_buf(). Such
functions can be replicated when other ancillary data type will be
implemented. This will notably be useful for GSO implementation.
qc_snd_buf() is a wrapper for sendmsg() syscall (or its derivatives)
used for all QUIC emissions. This patch aims at removing several
non-optimal code sections :
* fd_send_ready() for connected sockets is only checked on the function
preambule instead of inside the emission loop
* zero-ing msghdr structure for unconnected sockets is removed. This is
unnecessary as all fields are properly initialized then.
* extra memcpy/memset invocations when using IP_PKTINFO/IPV6_RECVPKTINFO
are removed by setting directly the address value into cmsg buffer
Binding on multiple addresses for QUIC is safe only if IP_PKTINFO or
equivalent is available. Else, the behavior may be undefined as the
system is responsible to choose the network interface and source address
on response.
This commit adds a warning on boot if no or partial support for
IP_PKTINFO or equivalent is detected and configuration contains UDP
binding on multiple addresses.
This should be backported up to 2.6. Special backport recommdations :
* change ha_warning() to ha_diag_warning() to ensure no spurrious
warnings will be triggered on stable releases
* IP_PKTINFO usage was introduced on 2.7. For 2.6, multiple addresses
QUIC binding is always unreliable. As such, preprocessor condition
must simply be removed so that the warning is always active regarding
of the system. Warning message should also be truncated to suppress
IP_PKTINFO reference.
make it so string array construction is performed by dedicated macro
helpers instead of manual char insertion between string members.
The goal is to easily be able to support multiple forms of array
construction depending on the data encoding format (raw, json..).
Only %hrl and %hsl logformats are concerned.
Some log variables may be prefixed with specific chars that represent
extra informations that are relevant with it but are are not directly
part of the "raw" value.
ie: '+' char is prepended before some values when "option logasap" is
used to indicate that the value has not yet reached its final value.
However, as those "metadata" are printed using the general purpose
LOGCHAR() printing helper, it's not easy to tell if they are part of the
base value or not.
In this patch we add the LOGMETACHAR() helper that is a wrapper for
LOGCHAR(). The goal is to prepare for adding some logic to prevent such
additional infos from being generated when not relevant or needed.
quotes building for some log formats is directly performed under each
switch case statement so it would become painful to add other conditions
to prevent the quotes from being generated when it's not supported by the
the data encoding format for instance (ie: JSON).
Let's centralize and simplify quotes handling by adding LOGQUOTE_START()
and LOGQUOTE_END() helper macros. If a quotation is started and not
explicitly ended, it will be automatically ended at the end of the current
logformat node:
LOGQUOTE_START() sets 'quote' variable to 1, this way LOGQUOTE_END() only
prints the ending quote when needed. LOGQUOTE_END() is systematically
called after each node switch-case (after each value). LOGQUOTE_START()
does nothing if LOG_OPT_QUOTE isn't set, so does LOGQUOTE_END().
Some rare cases such as %hsl (list of captured headers) required special
handling: in this case multiple quoted texts are generated for the same
field value so explicit LOGQUOTE_START() + LOGQUOTE_END() combination was
needed.
last_isspace variable is explicitly set to 0 in all cases except
LOG_FMT_SEPARATOR case. So we can actually simplify the code by setting
last_isspace to 0 by default and skipping the assignment for the
LOG_FMT_SEPARATOR case.
Add the ability to manually specify desired output type after a custom
field name for logformat nodes. Forcing the type can be useful to ensure
value is stored with the proper type representation. (i.e.: forcing
numerical to string to work around the limited resolution of JS number
types)
By default, type is set to SMP_T_SAME, which means the original type will
be preserved.
Currently supported types are: bool, str, sint
type_to_smp(type) does the reverse operation of smp_to_type[smp]: it takes
a type name as input string and tries to return the corresponding SMP_T_*
smp type or SMP_TYPES if not found.
Add the ability to specify custom name (will be used for representation
in verbose output types such as json) to logformat nodes.
For now, a custom name should be composed by characters [a-zA-Z0-9-_]*
Transient send errors is handled differentely if using connection or
listener socket for QUIC transfers. In the first case, proper poller
subscription is used via fd_cant_send()/fd_want_send(). For the listener
socket case, error is ignored by qc_snd_buf() caller and retransmission
mechanism will allow to reemit the data.
For listener socket, transient error code handling is buggy. It blindly
uses fd_cand_send() with <qc.fd> member which is set to -1 for listener
socket usage. This results in an invalid fdtab access, with a possible
crash or a modification of a totally unrelated FD.
This bug is simply fixed by using qc_test_fd() before using
fd_cant_send()/fd_want_send(). This ensures <qc.fd> is used only if
initialized which is only the case when using connection socket.
No crash was reported yet for this bug. However, it is reproducible by
using ASAN compilation and the following strace sendmsg() errno command
injection :
# strace -qq -yy -p $(pgrep haproxy) -f -e trace=%network \
-e inject=sendto,sendmsg:error=EAGAIN:when=20+20
This must be backported up to 2.7.
If some data are received for a lua socket while the lua script responsible
to consume these data is not ready to do so, for instance because it is
sleeping, the applet is woken up in loop because it never states it will not
consume these data yet.
To fix the issue, in the applet I/O handle, when there are outgoing data, we
always pretend the applet will not consume it. It is the responsibility to
the lua script to reactivate receives by calling Socket.receive() function.
This patch must be backported to every stable version. For 2.4 and older,
si_want_get()/si_cant_get() must be used instead of
applet_will_consume()/applet_wont_consume().
It is poosible to create a lua socket without performing any connect. In
this case, the lua socket is released because of the garbage collector.
However, the garbarge collector does not release the applet, it wakes it
up. Since commit 751b59c40b ("BUG/MEDIUM: hlua: Initialize appctx used by a
lua socket on connect only"), the applet initialization is performed on
connect. So, here, it is possible to wake an uninitialized applet. It is an
unexpected case for the applet's I/O handler, leading to a segfault because
some resources are not initialized (the stream's target in this case).
So, now, in the lua socket GC function, we take care to immediately release
uninitialized applets. At worst, the release itself is delayed. But it is
safe because we are sure the applet's I/O handler will never be executed.
In addition, we take case to increment the GC counter when the lua socket is
created. The way, uninitialized lua socket are released more quickly.
This patch should fix the issue #2451. It must be backported as far as 2.6.
When an error is triggered during the applet initialization, a dedicated
function is called to release it. Indeed, in this case, because the applet
was not initialized, the ->release callback must not be called. However,
because the init stage may be delayed to be performed during the first
applet wakeup, we must also take care to not rely on the default
appctx_free() function, to immediately release the applet. Otherwise, if the
error happens in a delayed init stage, the applet is never released.
This patch partially fix the issue #2451. It must be backported as far as
2.6.
Currently haproxy does not implement dynamic table support for QPACK. As
such, dynamic table capacity advertized via H3 SETTINGS is 0. When
receiving a non-null Set Dynamic Table Capacity instruction, close
immediately the connection using QPACK_ENCODER_STREAM_ERROR.
Prior to this patch, such instructions were simply ignored. This is non
conform to QUIC specification.
This should be backported up to 2.6. Note that on 2.6 qcc_set_error()
must be replaced by function qcc_emit_cc_app().
Close the connection using QPACK_DECODER_STREAM_ERROR when receiving an
invalid insert count increment. As haproxy does not use dynamic table,
this instruction must never be emitted by the peer.
Prior to this patch, haproxy silently ignored such instruction which is
not conform to the QUIC specification.
This should be backported up to 2.6. Note that on 2.6 qcc_set_error()
must be replaced by function qcc_emit_cc_app().
As specified in RFC 9000, a client must never emit a HANDSHAKE_DONE
frame. If this happens, the server must close the connection with error
PROTOCOL VIOLATION.
Previously, such a frame was silently discarded on server side. The
connection remained opened which is not conformant to the specification.
This should be backported up to 2.6.
Ensure every frame types are handled in qc_parse_pkt_frms. Add an
ABORT_NOW on the default case. This is safe as an unknown frame must be
rejected prior via qc_parse_frm().
As specified by RFC 9000, connection is closed on error if an unknown
QUIC frame type is received.
Previously, a frame with unknown type was silently discarded. The
connection remained opened which is not conformant to the specification.
This should be backported up to 2.6.
Global options to disable for zero-copy forwarding are now tested outside
callbacks responsible to perform the forwarding itself. It is cleaner this
way because we don't try at all zero-copy forwarding if at least one side
does not support it. It is equivalent to what was performed before, but it
is simplier this way.
There is a nego stage when a producer is ready to forward data to the other
side. At this stage, the zero-copy forwarding may be disabled if the
consumer does not support it. However, there is a flaw with this way to
proceed. If the channel buffer is not empty, we delay the zero-copy
forwarding to flush all data from the channel first. During this delay,
receives on the endpoint (at connection level for muxes), are blocked to be
sure to have the opportunity to switch on zero-copy forwarding. It is a
problem if the consumer cannot flush data from the channel's buffer, waiting
for more data for instance.
It is especially annoying with the CLI applet, because this scenario can
happen if a command is partially received. For instance without the LF at
the end. In this case, the CLI applet is blocked because it waits more
data. The frontend connexion is also blocked because channel's data must be
flushed before trying to receive more data. Worst, this happen at where no
timeout is armed. Thus the session is stuck infinitly, client aborts cannot
be detected because receives are blocked, and the applet cannot abort on its
side because there are pending outgoing data. It is clearly a situation
where it is easy to consume all CLI slots.
To fix the issue, thanks to previous commits, we now check zero-copy
forwarding support on both sides before proceeding.
This patch relies on the following commits:
* MINOR: muxes: Announce support for zero-copy forwarding on consumer side
* MINOR: stconn: Add SE flag to announce zero-copy forwarding on consumer side
* MINOR: stconn: Rename SE_FL_MAY_FASTFWD and reorder bitfield
* CLEANUP: stconn: Move SE flags set by app layer at the end of the bitfield
All the series must be backported to 2.9.
It is unused for now, but the muxes announce their support of the zero-copy
forwarding on consumer side. All muxes, except the fgci one, are supported
it.
To fix a bug, a flag to announce the capabitlity to support the zero-copy
forwarding on the consumer side will be added on the SE descriptor. So the
old flag SE_FL_MAY_FASTFWD is renamed to indicate it concerns the producer
side. It is now SE_FL_MAY_FASTFWD_PROD. And to prepare addition of the new
flag, the bitfield is a bit reordered.
This revert of commit 0b93ff8c87 ("BUG/MEDIUM: stconn: Wake applets on
sending path if there is a pending shutdown") and 9e394d34e0 ("BUG/MINOR:
stconn: Don't report blocked sends during connection establishment") because
it was not the right fixes.
We must not wake an applet up when a shutdown is pending because it means
output some data are still blocked in the channel buffer. The applet does
not necessarily consume these data. In this case, the applet may be woken up
infinitly, except if it explicitly reports it wont consume datay yet.
This patch must be backported as far as 2.8. For older versions, as far as
2.2, it may be backported. If so, a previous fix must be pushed to prevent
an HTTP applet to be stuck. In http_ana.c, in http_end_request() and
http_end_reponse(), the call to channel_htx_truncate() on the request
channel in case of MSG_ERROR must be replace by a call to
channel_htx_erase().
Thanks to the previous patch ("MEDIUM: applet: Add notion of shutdown for
write for applets"), it is no longer necessary to check SC flags to detect
shutdowns to interrupt the wait command. It is possible to remove this ugly
workaround. In addition, we only test the SE for shutdown because end of
stream and error are already checked by the CLI I/O handler. And it is no
longer necessary to remove output data from the channel's buffer because
shutdown are not reported if there are remaining outgoing data.
Of course, if the "wait" command is backported, the commit above and this
one must be backported too.
In fact there is already flags on the SE to state a shutdown for reads or
writes was performed. But for applets, this notion does not exist. Both
flags are set in same time when the applet is released. But at the SC level,
there are functions to perform a shutdown (formely the shutw) and an abort
(formely the shutr). For applets, when a shutdown is performed on the SC, if
the applet is not immediately released, nothing is acknowledge at the SE
level.
With old way to implement applets, this was not an real issue until recently
because applets accessed to the channel/SC flags. It was thus possible to
catch the shutdowns. But the "wait" command on the CLI reveals the
flaw. Indeed, when this command is executed, nothing is read or sent. So, it
is not possible to detect the shutdowns. As a workaround, a dedicated test
on the SC flags was added at the end of the wait command I/O handler. But it
is pretty ugly.
With new way to implement applets, there is no longer access to the channel
or SC. So we must add a way to acknowledge shutdown into the SE.
This patch solves the both sides of the issue. The shutw notion is added for
applets. Its only purpose is to set SE_FL_SHWN flags. This flag is tested by
all applets, so, it solves the issue quite simply.
Note that it is described as a bug fix but there is no real issue, just a
design flaw. However, if the "wait" command is backported, this patch must
be backported too. Unfortinately it will require an adaptation because there
is no appctx flags on older versions.
These both flags are set after releasing the applet, in
appctx_shut(). Concretly, it means the applet is shutdown for reads and
writes. Once set, the applet's I/O handler was no longer called. Tests on
these flags are useless. There is no chance to match them.
This case does not exist yet with the H1 multiplexer, but applets may decide to
not produce data if there is not enough room in the destination buffer (the
applet's outbuf or the opposite SE buffer). It is true for the stats applets for
instance. However this case is not properly handled when the zero-copy
forwarding is in-use.
To fix the issue, the se_done_ff() function was modified to return the number of
bytes really forwarded and to subs for sends if nothing was forwarded while the
zero-copy forwarding was blocked by the producer. On the applet side, we take
care to block the zero-copy forwarding if the applet requests more room. At the
end, zero-copy forwarding is unblocked if something was forwarded.
This way, it is now possible for the stats applet to report a full buffer and
block the zero-copy forwarding, even if the buffer is not really full, by
requesting more room.
No backport needed.
An issue was introduced when zero-copy forwarding was added to the stats and
cache applets. There is no test to be sure the upper layer is ready to use
the zero-copy forwarding. So these applets refuse to deliver the response
into the applet's output buffer if the zero-copy forwarding is supported by
the opposite endpoint. It is especially an issue when a filter, like the
compression, is in-use on the response channel.
Because of this bug, the response is not delivered and the applet is woken
up in loop to produce data.
To fix the issue, an appctx flag was added, APPCTX_FL_FASTFWD, to know when
the zero-copy forwarding is in-use. We rely on this flag to not fill the
outbuf in the applet's I/O handler.
No backport needed.
This simplifies a bit the stats applet. Because the CLI part was not
refactored yet to use the applet's buffers, there are 3 ways to produce
data:
* the HTX message for the HTTP stats when zero-copy forwarding is not
used
* raw data in the opposite endpoint buffer for the HTTP stats when
zero-copy forwarding is used
* the channel buffer when the CLI "show stat" command is evaluated
There is already a dedicated function to take care to copy data at the right
place. There is now also a dedicated function to check us the output buffer
is almost full.
Commit 91b77c1632 ("MEDIUM: mux-h1: Support zero-copy forwarding for chunks with
an unknown size") was recently pushed but it contains 3 bugs. The first one is
during the nego. The extra size reserved for the CRLF at the end of the chunk
must not be added to the offset value. Indeed, the CRLF will be appended after
the data and not prepended to them.
The second one, still during the nego, is an integer overflow when the available
room in the output buffer is computed.
Finally, the last one is when the chunk itself is formatted. This part was
totally buggy if the output buffer was not empty at the beginning.
No backport needed.
A packet is considered as reordered when it is detected as lost because its packet
number is above the largest acknowledeged packet number by at least the
packet reordering threshold value.
Add ->nb_reordered_pkt new quic_loss struct member at the same location that
the number of lost packets to count such packets.
Should be backported to 2.6.
Let's say that the largest packet number acknowledged by the peer is #10, when inspecting
the non already acknowledged packets to detect if they are lost or not, this is the
case a least if the difference between this largest packet number and and their
packet numbers are bigger or equal to the packet reordering threshold as defined
by the RFC 9002. This latter must not be less than QUIC_LOSS_PACKET_THRESHOLD(3).
Which such a value, packets #7 and oldest are detected as lost if non acknowledged,
contrary to packet number #8 or #9.
So, the packet loss detection is very sensitive to such a network characteristic
where non acknowledged packets are distant from each others by their packet number
differences.
Do not use this static value anymore for the packet reordering threshold which is used
as a criteria to detect packet loss. In place, make it depend on the difference
between the number of the last transmitted packet and the number of the oldest
one among the packet which are still in flight before being inspected to be
deemed as lost.
Add new tune.quic.reorder-ratio setting to apply a ratio in percent to this
dynamic packet reorder threshold.
Should be backported to 2.6.
The new formula for K CUBIC which arrives with RFC 9438 is as follows:
K = cubic_root((W_max - cwnd_epoch) / C)
Note that W_max is c->last_w_max, and cwnd_epoch is c->cwnd when entering
quic_cubic_update() just after a congestion event.
Must be backported as far as 2.6.
The formula for K CUBIC calculation is as follows:
K = cubic_root(W_max * (1 - beta_quic) / C).
Note that this does not match the comment. But the aim of this patch is to not
hide a bug inside another patch to update this K CUBIC calculation.
The unit of C is bytes/s^3 (or segments/s^3). And we want to store K as
milliseconds. So, the conversion inside the cubic_root() to convert seconds in
milliseconds is wrong. The unit used here is bytes/(ms/1000)^3 or
bytes*1000^3/ms^3. That said, it is preferable to compute K as seconds, then
convert to milliseconds as done by this patch.
Must be backported as far as 2.6.
The CLI command "update ssl ocsp-response" was forcefully removing an
OCSP response from the update tree regardless of whether it used to be
in it beforehand or not. But since the main OCSP upate task works by
removing the entry being currently updated from the update tree and then
reinserting it when the update process is over, it meant that in the CLI
command code we were modifying a structure that was already being used.
These concurrent accesses were not properly locked on the "regular"
update case because it was assumed that once an entry was removed from
the update tree, the update task was the only one able to work on it.
Rather than locking the whole update process, an "updating" flag was
added to the certificate_ocsp in order to prevent the "update ssl
ocsp-response" command from trying to update a response already being
updated.
An easy way to reproduce this crash was to perform two "simultaneous"
calls to "update ssl ocsp-response" on the same certificate. It would
then crash on an eb64_delete call in the main ocsp update task function.
This patch can be backported up to 2.8.
As reported by github user @JB0925 in issue #2427, there is a possible
crash in pool_flush(). The problem is that if the free_list is not empty
in the first test, and is empty at the moment the xchg() is performed,
for example because another thread called it in parallel, we place a
POOL_BUSY there that is never removed later, causing the next thread to
wait forever.
This was introduced in 2.5 with commit 2a4523f6f ("BUG/MAJOR: pools: fix
possible race with free() in the lockless variant"). It has probably
very rarely been detected, because:
- pool_flush() is only called when stopping is set
- the function does nothing if global pools are disabled, which is
the case on most modern systems with a fast memory allocator.
It's possible to reproduce it by modifying __task_free() to call
pool_flush() on 1% of the calls instead of only when stopping.
The fix is quite simple, it consists in moving the zeroing of the
entry in the break path after verifying that the entry was not already
busy.
This must be backported wherever commit 2a4523f6f is.
In issue #2427 Ilya reports that gcc-14 rightfully complains about
sizeof() being placed in the left term of calloc(). There's no impact
but it's a bad pattern that gets copy-pasted over time. Let's fix the
few remaining occurrences (debug.c, halog, udp-perturb).
This can be backported to all branches, and the irrelevant parts dropped.
The "wait" command now supports a condition, "srv-unused", which waits
for the designated server to become totally unused, indicating that it
is removable. Upon each wakeup it calls srv_check_for_deletion() to
verify if conditions are met, if not if it's recoverable, or if it's
not recoverable, and proceeds according to this, never waiting for a
final decision longer than the configured delay.
The purpose is to make it possible to remove servers from the CLI after
waiting for their sessions to be terminated:
$ socat -t5 /path/to/socket - <<< "
disable server px/srv1
shutdown sessions server px/srv1
wait 2s srv-unused px/srv1
del server px/srv1"
Or even wait for connections to terminate themselves:
$ socat -t70 /path/to/socket - <<< "
disable server px/srv1
wait 1m srv-unused px/srv1
del server px/srv1"
Conditions will need to have context, arguments etc from the command line.
Since these will vary with time (otherwise we wouldn't wait), let's just
pass them as text (possibly pre-processed). We're starting with 4 strings
that are expected to be allocated by strdup() and are always sent to free()
upon release.
Since we'll support waiting for an action to succeed or permanently
fail, we need the ability to return an unrecoverable failure. Let's
add CLI_WAIT_ERR_FAIL for this. A static error message may be placed
into ctx->msg to report to the user why the failure is unrecoverable.
We'll need to be able to verify whether or not a server may be deleted.
For now, both the verification and the action are performed in the same
function, at once under thread isolation. The goal here is to extract
the verification code into a new function that will perform these checks,
return a status between success/recoverable/non-recoverable failure, and
will also return a message for the caller.
When an applet is using its own buffers, it is important to release them, if
empty, after processing to recycle unsued buffers. It is not a leak because
these buffers are necessarily released when the applet is released. But this
leads to an excess of buffer allocations.
No need to backport.
This allows to insert delays between commands, i.e. to collect a same
set of metrics at a fixed interval. E.g:
$ socat -t20 /path/to/socket <<< "show activity; wait 10s; show activity"
The goal will be to extend the feature to optionally support waiting on
certain conditions. For this reason the struct definitions and enums were
placed into cli-t.h.
The CLI applet doesn't make use of its timeout at all, only the stream
does. That's a wonder because it allows any command's I/O handler to
trivially set a wakeup timer by simply touching the task's ->expire
field, and the I/O handler will automatically be woken up again. The
only condition for this is that we properly take care of clearing that
timeout whenever we finish processing a command and switch back to the
PROMPT state. That's what this patch does.
If a release handler produces a final message, it's currently left
pending in the CLI context and needs another I/O event to be dumped
because immediately after calling ->release, we check for states
OUTPUT and above and we wait until more data arrives.
This patch adds continue statement to go back to the loop immediately
after leaving the release handler in order to attempt to emit the
output message.
At this point it's not sure whether any release handlers are producing
messages, so it's probably not needed to backport this.
Some commands are still missing their trailing LF, and very few were even
already spotted in the past emitting more than one. The risk of missing
this LF is particularly high, especially when tests are run in non-
interactive mode where the output looks good at first glance. The problem
is that once run in interactive mode, the missing empty line makes the
command not being complete, and scripts can wait forever.
Let's tackle the problem at its root: messages emitted at the end must
always end with an LF and we know some miss it. Thus, in cli_output_msg()
we now start by removing the trailing LFs from the string, and we always
add exactly one. This way the trailing LF added by correct functions are
silently ignored and all functions are now correct.
This would need to be progressively backported to all supported versions
in order to address them all at once, though the risk of breaking a legacy
script relying on the wrong output is never zero. At first it should at
least go as far as the lastest LTS (2.8), and maybe another one depending
on user demands. Note that it also requires previous patch ("BUG/MINOR:
vars/cli: fix missing LF after "get var" output") because it fixes a test
for a bogus output for "get var" in a VTC.
"get var" on the CLI was also missing an LF, and the vtest as well, so
that fixing only the code breaks the vtest. This must be backported to
2.4 as the issue was brought with commit c35eb38f1d ("MINOR: vars/cli:
add a "get var" CLI command to retrieve global variables").
Some cli_err(), cli_msg() or even ha_error() etc are missing the trailing
LF, which breaks the continuity of the CLI parsing: the extra LF that serves
to mark the end of the command is in fact taken as the missing LF and no
extra one is added.
This patch adds the missing LF on identified messages. It might be worth
trying to proceed in a more generic way with this, given the amount of
code that is possibly at risk.
We now update the session's tracked counters with the observed glitches.
In order to avoid incurring a high cost, e.g. if many small frames contain
issues, we batch the updates around h2_process_demux() by directly passing
the difference. Indeed, for now all functions that increment glitches are
called from h2_process_demux(). If that were to change, we'd just need to
keep the value of the last synced counter in the h2c struct instead of the
stack.
The regtest was updated to verify that the 3rd client that does not cause
issue still sees the counter resulting from client 2's mistakes. The rate
is also verified, considering it shouldn't fail since the period is very
long (1m).
This adds a new pair of stored types in the stick-tables:
- glitch_cnt
- glitch_rate
These keep count of the number of glitches reported on a front connection,
in order to decide how to act with a badly defective client or a potential
attacker. For now nothing updates these counters, but all the infrastructure
needed to configure, update and retrieve them was added, including the doc.
No regtest was added yet since they're not filled yet.
It's quite uncommon for a client to decide to change the connection's
initial window size after the settings exchange phase, unless it tries
to increase it. One of the impacts depending is that it updates all
streams, so it can be expensive, depending on the stacks, and may even
be used to construct an attack. For this reason, we now count a glitch
when this happens.
A test with h2spec shows that it triggers 9 across a full test.
Here we consider that if a HEADERS frame is made of more than 4 fragments
whose average size is lower than 1kB, that's very likely an abuse so we
count a glitch per 16 fragments, which means 1 glitch per 1kB frame in a
16kB buffer. This means that an abuser sending 1600 1-byte frames would
increase the counter by 100, and that sending 100 headers per request in
individual frames each results in a count of ~7 to be added per request.
A test consisting in sending 100M requests made of 101 frames each over
a connection resulted in ~695M glitches to be counted for this connection.
Note that no special care is taken to avoid wrapping since it already takes
a very long time to reach 100M and there's no particular impact of wrapping
here (roughly 1M/s).
RFC9113 clarified a point regarding the payload from DATA frames sent to
closed streams. It must always be counted against the connection's flow
control. In practice it should really have no practical effect, but if
repeated upload attempts are aborted, this might cause the client's
window to progressively shrink since not being ACKed.
It's probably not necessary to backport this, unless another patch
depends on it.
The fetch will return true if the stream was redispatched: this is a
past action, thus we rename the fetch to better reflect its true
meaning and prevent confusions.
Documentation was updated.
While at it, the fetch was moved from internal states section to Layer 4
section, which is where it belongs.
No backport needed unless 92b2edb (" MINOR: stream: add "txn.redispatch"
fetch") gets backported.
If a certificate that has an OCSP uri is unused and gets added to a
crt-list with the ocsp auto update option "on", it would not have been
inserted into the auto update tree because this insertion was only
working on the first call of the ssl_sock_load_ocsp function.
If the configuration used a crt-list like the following:
cert1.pem *
cert2.pem [ocsp-update on] *
Then calling "del ssl crt-list" on the second line and then reverting
the delete by calling "add ssl crt-list" with the same line, then the
cert2.pem would not appear in the ocsp update list (can be checked
thanks to "show ssl ocsp-updates" command).
This patch ensures that in such a case we still perform the insertion in
the update tree.
This patch can be backported up to branch 2.8.
The ckch_store's free'ing function might end up calling
'ssl_sock_free_ocsp' if the corresponding certificate had ocsp data.
This ocsp cleanup function expects for the 'refcount_instance' member of
the certificate_ocsp structure to be 0, meaning that no live
ckch instance kept a reference on this certificate_ocsp structure.
But since in ckch_store_free we were destroying the ckch_data before
destroying the linked instances, the BUG_ON would fail during a standard
deinit. Reversing the cleanup order fixes the problem.
Must be backported to 2.8.
With the current way OCSP responses are stored, a single OCSP response
is stored (in a certificate_ocsp structure) when it is loaded during a
certificate parsing, and each ckch_inst that references it increments
its refcount. The reference to the certificate_ocsp is actually kept in
the SSL_CTX linked to each ckch_inst, in an ex_data entry that gets
freed when he context is freed.
One of the downside of this implementation is that is every ckch_inst
referencing a certificate_ocsp gets detroyed, then the OCSP response is
removed from the system. So if we were to remove all crt-list lines
containing a given certificate (that has an OCSP response), the response
would be destroyed even if the certificate remains in the system (as an
unused certificate). In such a case, we would want the OCSP response not
to be "usable", since it is not used by any ckch_inst, but still remain
in the OCSP response tree so that if the certificate gets reused (via an
"add ssl crt-list" command for instance), its OCSP response is still
known as well. But we would also like such an entry not to be updated
automatically anymore once no instance uses it. An easy way to do it
could have been to keep a reference to the certificate_ocsp structure in
the ckch_store as well, on top of all the ones in the ckch_instances,
and to remove the ocsp response from the update tree once the refcount
falls to 1, but it would not work because of the way the ocsp response
tree keys are calculated. They are decorrelated from the ckch_store and
are the actual OCSP_CERTIDs, which is a combination of the issuer's name
hash and key hash, and the certificate's serial number. So two copies of
the same certificate but with different names would still point to the
same ocsp response tree entry.
The solution that answers to all the needs expressed aboved is actually
to have two reference counters in the certificate_ocsp structure, one
for the actual ckch instances and one for the ckch stores. If the
instance refcount becomes 0 then we remove the entry from the auto
update tree, and if the store reference becomes 0 we can then remove the
OCSP response from the tree. This would allow to chain some "del ssl
crt-list" and "add ssl crt-list" CLI commands without losing any
functionality.
Must be backported to 2.8.
When deleting a crt-list line through a "del ssl crt-list" call on the
CLI, we ended up free'ing the corresponding ckch instances without fully
clearing their contents. It left some dangling references on other
objects because the attache SSL_CTX was not deleted, as well as all the
ex_data referenced by it (OCSP responses for instance).
This patch can be backported up to branch 2.4.
The only useful information taken out of the ckch_store in order to copy
an OCSP certid into a buffer (later used as a key for entries in the
OCSP response tree) is the ocsp_certid field of the ckch_data structure.
We then don't need to pass a pointer to the full ckch_store to
ckch_store_build_certid or even any information related to the store
itself.
The ckch_store_build_certid is then converted into a helper function
that simply takes an OCSP_CERTID and converts it into a char buffer.
When calling ckchs_dup (during a "set ssl cert" CLI command), if the
modified store had OCSP auto update enabled then the new certificate
would not keep the previous update mode and would not appear in the auto
update list.
This patch can be backported to 2.8.
At the beginning of the 3.0-dev cycle, the zero-copy forwarding support was
added only for the cache applet with an option to disable it. This was a
hack, waiting for a better integration with applets. It is now possible to
implement the zero-copy forwarding for any applets. So the specific option
for the cache applet was renamed to be used for all applets. And this option
is now also checked for the stats applet.
Concretely, 'tune.cache.zero-copy-forwarding' was renamed to
'tune.applet.zero-copy-forwarding'.
This field was introduced when the first implementation of the zero-copy
forwarding was added. It is now useless. However, we must still save the
body-size of the object in the cache.
Default .rcv_buf and .snd_buf functions that applets can use are now
specialized to manipulate raw buffers or HTX buffers.
Thus a TCP applet should use appctx_raw_rcv_buf() and appctx_raw_snd_buf()
while HTTP applet should use appctx_htx_rcv_buf() and appctx_htx_snd_buf().
Note that the appctx is now directly passed to these functions instead of
the SC.