Since the previous patch, accounting of HTTP requests in progress on MUX
QUIC as been simplified. Now QCC <nb_hreq> identifies them until the QCS
free.
Thus, MUX_CTL_GET_NBSTRM can be simplified. Instead of relying on
<nb_sc> plus the <opening_list>, simply return <nb_hreq> value which
should be slighly identical.
Muxes are responsible to release connections once they are inactive and
won't be reusable. In QUIC mux, such connections are detected via
qcc_is_dead(). The first precondition is that there is no more upper
streams attached. This was accounted via QCC <nb_sc> counter.
A special characteristic of QCS instances is that they can be in
detached state : upper stream has been removed but there is still data
to emit. Such QCS were not taken into account in qcc_is_dead(), so a
connection could be freed with some remaining data not yet emitted.
It is also not possible for QUIC MUX to simply look at the QCS tree to
determine if the connection is inactive. Indeed, some streams are opened
for protocol internal usage. This is the case for example with HTTP/3
unidirectional control stream or QPACK encoder/decoder streams. These
streams are never closed. In the end, only requests streams should be
taken into account for the connection activity.
This patch improves the situation by reworking <nb_hreq> QCC counter.
Previously, it served for http-request timeout implementation. However,
this timeout only relies on <opening_list> now. Thus, <nb_hreq> scope is
changed : it is now incremented via qcs_wait_http_req(), used by app
protocol layer once a request stream is identified. Decrement is
performed on qcs_free(), so this guarantees that a connection cannot be
freed anymore if request streams still exists, unless if inactivity
timeout fires. As such, <nb_hreq> now supersedes <nb_sc> entirely, so
the qcc_is_dead() can now relies on the former.
Along with this change, qcc_timeout_task() must be updated. Call to
qcc_is_dead() was unnecessary prior to this patch as timeout handling
was only active when no upper streams were attached. When tested, both
<nb_sc> and QCC <task> were already null, so a connection was always
released on timeout, as expected. With qcc_is_dead() now checking
<nb_hreq> instead, this is not always the case anymore. In fact, this
check is unnecessary as inactivity timeout serves precisely to free a
stucked connection with remaining data to emit.
This patch also has some impact on http-keep-alive timeout. Previously,
this timeout could be armed if only detached streams remained. Now, it
is only applicable if all QCS request instances are closed and freed.
Thus, qcc_reset_idle_start() is now closed directly on qcs_free().
Ideally this should be backported up to 2.6, or at least 2.8 as QUIC
experimental status was removed there.
MUX implements a timeout for HTTP keep-alive which monitors the delay
between two HTTP requests. This is only applicable for frontend
connections, as on the backend side idle connections can be kept in the
server pool. In QUIC mux, this timeout relies on QCC <idle_start> which
is refresh when the last request is terminated.
This patch modifies the refresh operation so that it is only performed
for frontend connections. This is not strictly necessary but the timeout
timeout management is now clearer and it eliminates an unnecessary
operation for backend connections.
Similarly, http-request timeout is also only applicable for frontend
connections. This relies on qcs_wait_http_req() function. A request QCS
is inserted in <opening_list> until the headers are received. This is
unnecessary on the backend side so this is excluded as well.
HTTP/3 implements a GOAWAY frame for graceful shutdown. This allows to
reject any new stream opening with a larger ID. This is implemented via
HTTP/3 attach() callback called by qcs_new().
When HTTP/0.9 is used, there is no similar mechanism. This renders some
feature such as server max-reuse difficult to implement. This patch now
provides a method for such protocols with no graceful shutdown support.
Instead of invoking attach() callback, a stream is now immediately
resetted if the application protocol layer is already closed.
This patch does not change the behavior for HTTP/3. Only limited
protocols (currently only HTTP/0.9) without graceful shutdown are
impacted. These protocols are identified as their shutdown() callback is
nul.
This change is only necessary for HTTP/0.9 as there is no equivalent of
HTTP/3 GOAWAY in this case.
req_body field is no longer used, except in trace messages. And in fact, it
is not necessarily true if some data are received with the request headers.
So no reason to still use it.
When an error is reported when reading request data, the hstream now try to
send a 400-bad-request to the client. Before, the connection was just closed
with no error message.
The direct fowarding support was only relying on "hs->to_write" value. But
we must be sure to retry if fast-forward data are still there in the I/O
buffer.
No client timeout was set with haterm. It could be an issue with
unresponsive clients. So the I/O timeout of the SC is initialized to the
frontend client timeout when the hstream is created. Then a read activity is
reported when data are received. This read activity is used to set an
expiration date on the hstream task and test it when the hstream is woken up
with TASK_WOKEN_TIMER reason.
When a client timeout is detected, the hstream try to send a 408 and report
an error.
There were several issues with the handling of end of the request or end of
response. The main problem was about the request draining.
To help to fix these issues, two flags were introduced:
* HS_ST_HTTP_EOM_RCVD: to know the request was fully received
* HS_ST_HTTP_EOM_SENT: to know the response was fully sent
Thanks to these flags some parts were reviewed and simplified.
In the I/O callback function, outside of any error, the hstream task is now
woken when one of the direction is not finished or when there are still some
data in a buffer.
The function hstream_must_drain() was reworked to properly drain request
with no content-length before replying.
The condition to wake hstream up to drain the request after replying was
also reworked, and moved ouside of the else block. Indeed, it must also be
evaluated when the response was fully sent in one call, when request headers
were processed.
Finally, the condition to shut the hstream was slighly adapted to use the
new flags. In addition, we now rely on se_shutdown().
When draining the request, if some data were received, no subscribe for
receives was performed to get the remaining. However, because request data
are just ignored, we must always subscribe until it was fully
drained. Otherwise, haterm will never be woken up to drain more data.
When request data were received, the request body length was decremented
accordingly with no check on it to be sure it was set. However, it remains
equal to 0 for chunked requests or H2/H3 requests with no content-length.
So now, it is only decremented when it is greater than 0.
When draining the request data, if an error was reported while some data
were received, the error was not processed immediately. This part was copied
from tcpchecks where the response should be processed first. For haterm, the
request data are ignored. So no reason to wait to handle the error. It may
be an issue because the response may be sent in the meanwhile.
When haterm was waiting for request headers, there was two test to know if
it had to drain the request data before replying. One of them was useless
and was thus removed.
In h1_parse_full_contig_chunks(), we first try to reserve the bigger HTX
DATA block as possible. It is ajusted at the end of chunks parsing or
removed if no data was copied. However, it should also be removed when a
parsing error is triggered. It could be an issue for http health checks and
haterm to properly handle errors.
This patch should be backported as far as 2.6.
When http_auth_bearer() sample fetch function is called with a custom header
and the header is not found or type didn't match 'Bearer', a mismatch must
be reported instead of an empty string.
This patch should be backported as far as 2.6.
Several return value for chunk_istcat() or chunk_memcat() calls were not
tested. Now, 0 is returned on failure.
Concretly, for now, it is unexpected to trigger error because the result
cannot exceed the buffer size. Data are extracted from an HTX message.
At first glance, no reason to backport it.
According to ACME RFC contact email is optional.
Letsencrypt used it some long time ago, but not today.
Currently HAProxy always sets the value of the contact mail to a string
that is read from the config, but if that string is not specified,
it sets %s in mailto:%s to null, which cases new account request
to fail in pebble.
Also HAProxy currently passes termsOfServiceAgreed bool to requests
that contain onlyReturnExisting, that isn't needed according to the RFC
and other ACME impls.
This patch dynamically builds the account request JSON to address that.
Can be backported to 3.2
As reported by Huangbin Zhan in github issue #3355, we're too lax on
the :protocol pseudo header. It is currently accepted with regular
CONNECT as well as non-CONNECT methods while it only ought to be
accepted with extended CONNECT (i.e. CONNECT after the connection
negotiated the RFC8441 extension). Let's refine the check in H2 by
leveraging the new flag H2_MSGF_EXT_CONN_OK that is passed by the
caller when the connection supports the extension. This is sufficient
to sort the various cases.
The proto upgrade regtest was updated to verify that CONNECT with
:protocol without nego and another method with nego and :protocol
both fail.
Thanks to Huangbin Zhan (@zhanhb) for the report and helpful reproducer.
This needs to be backported to all versions. It relies on these patches
first:
REGTESTS: http-messaging: always send RFC8441 client settings to use ext connect
BUG/MINOR: mux-h2: condition the processing of 8441 extension to global setting
MINOR: mux-h2: add a new message flag to indicate ext connect support
When rfc8441 (extended connect) is disabled via
h2-workaround-bogus-websocket-clients, we properly refrain from
advertising support for extended connect, but we should also ignore
the incoming setting, otherwise it can remain enabled if the client
advertises it.
This should be backported to stable versions.
Function h2_phdr_to_list() was missing the decoding for the :protocol
header and would emit :UNKNOWN in this case. It's only used in traces
so it's not important. The fix can be backported in all versions.
The tests were validating extended connect without sending the setting
in the client settings frame. It currently works due to a bug, so let's
fix the vtc first.
All ci_insert() calls in pcli_parse_request() were ignoring the return
value, silently miscounting the bytes to forward if an insertion failed.
Add a check on each call and return -1 on failure. In practice this has
no impact: the channel receive machinery enforces a maxrewrite reserve,
so there is always sufficient room in the buffer for these small
prefixes by the time pcli_parse_request() is called.
Must be backported in every maintained versions, the list of available
commands might change.
When @@<pid> is matched in pcli_parse_request(), no "operator -" or
"user -" is being sent before the command, like it's done for @<pid>.
It leads to privileges not being respected and commands are sent as
admin.
Fix this by applying the access-level downgrade in the @@<pid> path,
like it's done for @<pid>.
Must be backported to 3.2.
Reported-by: Omkhar Arasaratnam <omkhar@linkedin.com>
This test first performs two successive requests over the same
connection where reuse is expected, then perform two 401 which must
both work, testing both the transition from null->sess, and sess->sess.
This test could be backported to detect changes related to private
sessions.
Thanks to Omkhar Arasaratnam for the test.
During the architectural review leading to commit 90b2154d93 ("MEDIUM:
muxes: always set conn->owner to the session that owns the connection"),
we wondered whether srv_conn->owner could ever be NULL or even invalid,
or if it ought to be changed to sess, and the projections of various use
cases, as well as a number of attempts to fool it led us to conclude
that it was always valid since the connection is private. So it was
considered safer not to start fiddling with the pointer in case it
could still match a previous session after a reuse, which would match
the scenario described in the session_add_conn() comment.
Actually there was exactly one case where a NULL could be met, and that
was covered by the preliminary call to conn_set_owner() that was removed
in that patch, precisely related to the one that the next patch tried to
address: in http-reuse always, after the second request on a connection
releases the connection, the owner can now become NULL, so if an NTLM
header is seen at this point, it will crash.
Interestingly, after the immediately following commit was merged,
d93c53b0df ("MEDIUM: session: always reset the conn->owner on backend
when installing mux"), the problem became immediate as the conn's
owner is now null during operation if the connection is not private, and
now the first response in NTLM is sufficient to crash the process. On
the other hand, thanks to the two patches above, we're now certain never
to meet a different session, which was the sought goal: either the session
is normal and it has no owner, or it's private and it has <sess> as owner.
Also with HTTP/1 (since the code explicitly checks for H1), there may be
a single request at a time on a connection so the owner should either be
the session or NULL.
So this patch finally implements the original plan, to pass <sess> to
session_add_conn(). The call is idempotent if the owner is already set,
but at least the function performs some preliminary sanity checks which
are quite welcome, so better continue to always call it.
Note that this is only for 3.4 or any branch that has exactly the two
patches above. And if the patches above were to ever be backported
(together), this one would be needed as well.
Thanks to Omkhar Arasaratnam for reporting this regression.
AES-CBC uses a 16-byte block size, and PKCS padding always adds at least
one byte (up to a full 16 bytes when input is already block-aligned), so
the encrypted output is always larger than the input. Without checking
that the output buffer can hold the padded result, encryption could
overflow it. Add a pre-encryption guard for block cipher (blksize > 1)
that rejects the operation when the output buffer is too small.
No backport needed.
Reported-by: Omkhar Arasaratnam <omkhar@linkedin.com>
The exact same issue that was fixed for ip.fp with commit dbf471f99a
("BUG/MAJOR: net_helper: ip.fp infinite loop on malformed tcp options")
also exists for tcp_options_list: an option with a zero size will prevent
the offset from making any progress and will loop forever. In this case,
since we produce output, trash->data gets incremented on each loop and
the byte at ofs is used to fill the memory there, quickly overflowing the
area.
The exact same fix as above was applied here again, including the uchar
cast to make sure we don't go back-and-forth between two positions.
Such TCP options are not valid and will not be reported by the TCP stack,
however they can happen in case options are passed in headers by a first
proxy, or are offered in a return directive fed from a query string as a
means of providing a debugging tool for admins.
Thanks to Omkhar Arasaratnam for reporting this issue.
This fix must be backported to 3.2 where the commits above were
backported.
The error handler at do_resolve_parse_error freed varname and resolvers_id
but missed freeing rule->arg.resolv.opts (allocated via calloc). Added
ha_free(&rule->arg.resolv.opts) to the cleanup path.
This patch could be backported to all stable branches.
In resolv_check_response(), the condition 'if (!srvrq->named_servers)' was
inverted - it ran the named server lookup when the tree was empty/NULL
instead of when it was populated. This prevented proper hostname-based
matching of SRV records to servers from the state file.
The issue was introduced when tree of servers was replaced by a cebis tree
(fdf6fd5b45).
This patch must be backported to 3.3.
If an error is triggered while the requester was allocated, it is not
immediately released. It is not really a memory leak because the requester
will be reused laster, on a next attempt, or released on deinit. For a
do-resolv action, the requester is released with the stream. So no leak
here. But it is probably not expected to keep a newly allocated requester in
case of error.
This patch could be backported as far as 2.6.
When an error is reported on an expect rule for tcp and http health-checks,
a dedicated message is reported with details about the wrong match. However,
this was never performed for HTTP health-check because of the wrong test on
the check type.
In addition, when an error was reported on an "expect hdr" rule, a break
statement was missing. So the error message could be truncated (but never
emitted because of the issue above).
This patch should be backported to all stable versions.
When we removed the use of connect() to reach DNS servers in 3.3 with
commit 2c7e05f80e ("MEDIUM: dns: don't call connect to dest socket for
AF_INET*"), we accidentally lost a check on the server's address in
responses, opening the possibility of spoofed DNS responses for someone
who knows both haproxy's IP:port and the transaction ID.
In practice, the impact is very low, because:
- DNS servers IP addresses are almost always known, and often even among
the widely used ones (1.1.1.1, 8.8.8.8 etc), and their port is 53.
- all DNS "security" relies on the ignorance of the transaction ID and
is possible source port, so if either the attacker is on-path and sees
them, or it's off-path and has to guess them, but in any case it's
trivial to spoof the known server in responses, with or without the
check.
Regardless, let's not further weaken the protocol and do the check.
Thanks to Omkhar Arasaratnam for reporting this issue.
An interesting observation while testing this fix was that the code does
support UNIX dgram sockets (via connect()) but that since we don't bind
to a local UNIX socket to send requests, the server's recvfrom() doesn't
get any address and has nowhere to respond to. So in practice while the
code is designed to deal with UNIX sockets, these cannot work by design.
This fix must be backported to 3.2 where the commit above was backported.
Both keywords are used with dns-01 and dns-persist-01 challenges to pass
information to an external DNS provisioning tool (e.g. the dataplaneAPI)
via the "dpapi" sink. provider-name sets the DNS provider identifier and
acme-vars passes arbitrary tool-specific variables.
Thanks to @oliwer for reporting the issue.
Must be backported to 3.2, however previous version don't have
"dns-persist-01".
The docs (readme and configuration.txt) still used to mention that OTEL
was under development. Now that it's released, let's indicate that it's
ready with the download URL.
Commit d12edebe4a ("BUG/MAJOR: mux-h2: detect incomplete transfers on
HEADERS frames as well") tried to enforce strict matching between
advertised content-length and transferred data when dealing with ES on
a headers frame. It purposely arranged the code so that it would cover
both headers and trailers. The problem is, in h2c_dec_hdrs() we preset
message flags (msgf) based on the current state and knowledge related
to the stream being processed, then we pass these flags to the headers
parser and use their final state to perform some extra checks.
MSGF_BODY_CL was set by the parsers themselves when processing a
content-length header, but when parsing a trailers frame, it will not
be set, and due to this the matching between the remaining expected
content-length and the transferred data is not verified, so the fix
above doesn't work for trailers.
This patch sets MSGF_BODY_CL to the same value as H2_SF_DATA_CLEN so
that during headers it remains zero, but it matches what was learned
during headers when processing trailers. It is sufficient to re-enable
the check that was attempted in the commit above.
The impact remains the same as the one indicated in the commit above: in
practice this can be used to force subsequent requests to fail, or when
running with "http-reuse never" or when running with a totally idle server,
to perform a request smuggling by constructing specially crafted request
pairs where the first one is used to trigger an early response and hide
parts of or all headers of the second one, to instead use a second
embedded one that was not subject to analysis. As such, the risk remains
moderate given the low prevalence of "http-reuse never" in production
environments, and of idle servers. Again, a temporary alternative to the
fix is to disable HTTP/2 by specifying "alpn http/1.1" on "bind" lines,
and adding "option disable-h2-upgrade" in HTTP frontends.
Many thanks to Pratham Gupta / alchemy1729 for spotting and analyzing
this problem, and for providing a lightweight reproducer to illustrate
the problem!
This fix must be backported to all versions where the fix above was
backported (i.e. all). Note that it depends on this previous commit
otherwise trailers will always break:
BUG/MEDIUM: mux-h2: fix the body_len to check when parsing request trailers
As a side note, it's worth noting that these temporary message flags have
reached a level of pain and fragility that really warrants a complete
rework. Ideally we should have a pair of such flags in the h2s (one per
direction) and the callers of the parsers should point to them so that
they are always up to date. And by having generic HTTP flags instead of
H2, we could better unify the h1/h2/h3/fcgi processors (and maybe avoid
some HTX conversion). One flag could even indicate that trailers are
being parsed (since they're last) so as to ease this detection down the
chain.
The h2 content-length validation in commit d12edebe4a ("BUG/MAJOR:
mux-h2: detect incomplete transfers on HEADERS frames as well") was
insufficient. The content-length check is still ineffective on request
trailers and it could not work by default due to the fact that the
default body_len is used in h2c_frt_handle_headers() when processing
trailers, instead of passing h2s->body_len, which was necessarily parsed
before reaching trailers. Let's fix this point first, otherwise fixing
the second issue would break trailers.
Many thanks to Pratham Gupta / alchemy1729 for spotting and analyzing
this problem, and for providing a lightweight reproducer to illustrate
the problem!
This fix must be backported to all versions where the fix above was
backported (i.e. all).
The opentelemetry addons now live outside haproxy sources and is
available at https://github.com/haproxytech/haproxy-opentelemetry/
The addon must be built using the EXTRA_MAKE option from HAProxy
Makefile:
$ PKG_CONFIG_PATH=/opt/lib/pkgconfig make -j8 TARGET=linux-glibc
EXTRA_MAKE="/tmp/a/a/haproxy-opentelemetry" OTEL_DEBUG=1 OTEL_USE_VARS=1
The OpenTelemetry filter has been moved out of the haproxy source tree and
now lives and is developed in the haproxy-opentelemetry repository as an
external addon. Removed all hard-coded references to USE_OTEL and to the
addons/otel directory from the main Makefile, as the addon is now plugged
in through the generic EXTRA_MAKE hook added in the previous commit.
Allow to call an external Makefile called Makefile.inc in order to build
complex addons.
make TARGET=linux-glibc ... EXTRA_MAKE="/path/to/addon1" \
EXTRA_MAKE+="/path/to/addon2"
The historical code dealing with timeout was left with a confusing
condition that is always true but always requires some analysis. It
would check if some streams were left pending before deciding to
release the connection. It could indeed be problematic to leave with
no timeout and an active connection!
As Christopher figured, the situation cannot exist because a first
check ensures there's no more h2c via h2c_may_expire(), then a call
to h2_wake_some_streams() will call h2s_wake_one_stream() for each
of the h2s, and all those with no h2c are purged. Thus on return
from h2_wake_some_streams() we're guaranteed to have an empty tree.
Let's just remove the condition and clean up the code.