Support reception via QMux of flow control MAX-STREAMS frame for
bidirectional streams. This is similar to the QUIC with shared
qcc_recv_max_streams() function.
When xprt_qstrm layer is completed, MUX layer is started. Rx buffer from
the XPRT layer is transferred to the MUX so that it can handle any extra
data following the transport parameters first frame.
Since previous commit, QCC Rx buffer is dynamically allocated only when
needed. However, qmux_init() must still allocate it when there is data
to be transferred from the XPRT layer. As a result, code has been over
extended to continue to support this case.
This patch simplifies xprt_qstrm API for the Rx buffer transfer. Buffer
content and remaining record length can now be retrieved via the single
function xprt_qstrm_xfer_rxbuf(). If the buffer is empty, nothing is
performed and XPRT layer will release it. If not empty, MUX will take
ownership of the buffer from the XPRT layer.
Allocate and release as needed the QCC buffers used for QMux protocol.
This should reduce the memory consumption of QMux. This is performed
both for send and receive buffers. Along with this, always free these
buffers in qcc_release() to prevent a memory leak.
Improve QMux memory usage at the QCS level in accordance with the
haproxy model. The tx buffer is now allocated only when used and
released as soon as it is empty.
This change requires to extend qcc_get_stream_txbuf() for QMux. Code
part related to qc_stream_desc is protected via conn_is_quic(). A
dedicated QMux bloc is added. Similarly to QUIC, a small buf can be
allocated first.
This also requires to adapt qcc_realloc_stream_txbuf() in a similar
fashion.
Clean up qcc_qstrm_send_frames(). The main change is that now return
value is clearly specified at the end of the function, depending if
everything was sent or not.
Implement close callback for xprt_qstrm layer. This is called when a
connection is prematurely closed following a connect failure. Its
purpose is to clean up all xprt resources.
A special care is required for the frontend side. Indeed,
xprt_qstrm_io_cb() can call close callback via conn_create_mux() on the
latter failure. The tasklet should then immediately be stopped as the
whole xprt layer has been freed as well.
Function conn_create_mux() has different behavior for frontend and
backend connections. In particular, on FE side, there is a risk that the
connection is freed.
Write a comment to explain these differences clearly.
Recently, an extra check has been added so that a dead connection is
immediately release on at the end of qcc_recv() operation. This is
useful when a GOAWAY frame is received from a server, so that the
backend connection is released if idle.
This step is in fact only necessary for QUIC, as qcc_recv() is called
directly from the lower transport layer. It causes issues with QMux as
in this case qcc_recv() is called via qcc_io_recv(). A crash in this
context will occur as qcc_recv() does not indicate that a release has
been performed.
To fix this, simply disable the extra check at the end of qcc_recv() for
QMux. This is fine as in this case receive operation is always followed
by qcc_io_process() which is able to release the connection in a safe
way.
No need to backport.
When QMux XPRT has successfully been able to process to transport
parameters exchange, the MUX is initialized and immediately woken up to
start transfers. However, if the connection is in an unusable state, the
latter operation will instead release the connection and all of its
network stack.
A crash would occur in case of release when finalizing the XPRT tasklet
completion. To fix this, first free every XPRT resources. MUX wake is
now conducted in a safe way as the last operation before the tasklet is
completely released.
No need to backport.
Complete initialization of xprt_qstrm layer by setting local parameters
to zero. This should prevent to emit random values to the peer.
No backport needed.
qc_frm_free() is a helper used to clean up a QUIC frame object. It is
used by MUX layer both for QUIC and QMux protocols.
This function takes a pointer to the underlying quic_conn, used only for
trace purpose. This patch fixes its usage for QMux to ensure that in
this case a NULL value is used.
No need to backport.
When dealing with EOH block, we must be sure to force the close mode for
message with no payload but annoncing a non-null content-length.
It is mainly an issue on the server side but it could be encountered on
client side too. Without this fix, a request can be switched to the DONE
state while the server is still expecting the payload. In an ideal world,
this case should not happen. But in conjunction with other bugs, it may lead
to a desynchro between haproxy and the server.
Now, when a non-null content-length is announced but we know we reached the
end of the message, we force the close mode. The only exception is for
bodyless responses (204s, 304s and responses to head requests).
Thanks to Martino Spagnuolo (r3verii) for his detailed report on this issue.
This patch must be backported to all stable version.
Checks are already made on H2 to detect inconsistencies between
advertised content-length and transferred data (excess of data or
premature END_STREAM flag on DATA frame). However, as found by
Martino Spagnuolo (r3verii), a subtle case remains: if the END_STREAM
appears on the HEADERS frame (i.e. a regular request for example),
then the check is not made. In this case it is possible to advertise
more contents than will really be transferred. If the other side uses
HTTP/1.1, and the server responds before the end of the transfer,
this means that the number of advertised bytes that will never be
transferred and that the server will drain will be taken from the
next request, effectively hiding a part of the header.
In practice this can be used to force subsequent requests to fail, or
when running with "http-reuse never" or when running with a totally
idle server, to perform a request smuggling by constructing specially
crafted request pairs where the first one is used to trigger an early
response and hide parts of or all headers of the second one, to
instead use a second embedded one that was not subject to analysis.
The risk remains moderate given the low prevalence of "http-reuse never"
in production environments, and of idle servers.
The fix consists in detecting if advertised content-length remains when
processing an END_STREAM flag on a HEADERS frame. It also does it for
trailers, which turn out to be another way to abuse the bug. However it
takes great care not to break bodyless responses (204, 304 and responses
to HEAD requests) that may present a content-length that doesn't reflect
the presence of a body in the response.
A temporary alternative to the fix is to disable HTTP/2 by specifying
"alpn http/1.1" on "bind" lines, and adding "option disable-h2-upgrade"
in HTTP frontends.
This must be backported to all stable versions.
In 3.4-dev6, commit de5fc2f515 ("BUG/MINOR: server: set auto SNI for
dynamic servers") allowed to properly set the SNI, and return an error
message. However the error message is leaked after being printed on the
CLI.
This should be backported to 3.3.
In 3.4-dev7, commit e1738b665d ("MINOR: debug: read all libs in memory
when set-dumpable=libs") reads dependencies into memory to store them as
a tar archive for later debugging. There was an attempt to mark the whole
archive read-only, except that the size passed in argument to mprotect()
is wrong: lib_size is only assigned after the operation and is still zero
at the moment this is done. new_size ought to be used instead.
This needs to be backported wherever the commit above is backported, at
least 3.2.
In 2.8, commit ead43fe4f2 ("MEDIUM: compression: Make it so we can
compress requests as well.") added the ability to independently enable
compression on request and/or response. However there's a bug in the
"compression direction response" case, which preserves only the request
flag and adds the response direction instead of clearing the request
flag, so this directive would clear offload and make it impossible to
disable request if it was already previously enabled.
This can be backported to stable releases as far as 2.8.
When a http request is sent during an http healthcheck, if an error is
triggered while the output buffer is a small buffer, another attempt is made
with a larger one. When this happens, the temporary chunk used to format
headers must be released.
No backport needed.
Ruleset are stored in a global tree, released on deinit staged. All errors
are fatal and abort the configuration parsing. So the current ruleset must
not be released here.
The test to remove trailers from chunked messages was inverted and is thus
ineffective. The flag for the requests was tested on client side and the flag
for the response was tested on server side. It should be the opposite.
This patch must be backported as far as 3.2.
When the EOH block is processed, before sending message headers, there is a
test to know if there is no payload. In case of a chunked message, a
null-chunk is emitted, except for bodyless response. For instance, a
response to a HEAD request has no payload at all and no null-chunk.
However, the test for bodyless responses is not correct. Only
H1S_F_BODYLESS_RESP flag is tested. But this flag can be set on server side
when we are processing the request. To fix the issue, the test was
adapted. The null-chunk is added if a message with no payload is chunked and
it is a request or a non-bodyless responses.
This patch must be backported to all stable version.
A typo in commit e51be30f78 ("BUG/MINOR: log: consider format expression
dependencies to decide when to log") made HRSHP appear twice (persistent
response) while the second one ought to be HRSHV (volatile response, e.g.
header values). This is harmless in practice since logs always wait for
at least headers.
This should be backported wherever the patch above was backported.
In h2_init(), if we have a failure while creating the h2c, and we
allocated shared_tx_bufs, don't forget to free it, otherwise we'll have
a memory leak.
This was introduced in 3.1 by commit a891534bfd ("MINOR: mux-h2: allocate
the array of shared rx bufs in the h2c"), so the fix should be backported
as far as 3.2.
When receiving a PRIORITY frame, when checking if the stream id provided
is ours, ignore bit 31, as it is the exclusive bit, and not part of the
stream id, whoever sends a PRIORITY frame with its own id and the
exclusive bit set will not be considered an error, as it should per the
RFC.
The impact is basically non-existent since we don't use PRIORITY frames,
it's only that we would ignore such an invalid frame instead of breaking
the connection.
The bug was introduced in 1.9 with commit 92153fccd3 ("BUG/MINOR: h2:
properly check PRIORITY frames") so the fix must be backported to all
versions.
Commit e67e36c9eb introduced
tune.h2.log-errors, that would let you pick if you wanted to know about
stream errors, connection errors, or no error.
However, a logic error made it so no error will be picked for any value
except for "none", in which case connection would be picked. Fix that by
just checking the strcmp() return value correctly.
This should be backported wherever e67e36c9eb
has been backported.
A malformed tcp option with an option length set to 0 can cause
an infinite loop on ip.fp converter.
The patch also forces the computation to use an unsigned char to
avoid a shift back during the parsing.
This fix should be backported on all versions including the ip.fp
converter.
The proxy error counter was not updated in h2c_frt_handle_headers() in
case of failure to decode a HEADERS frame. Make sure to keep it updated.
This can be backported to all stable versions.
Commit aab1a60977 ("BUG/MEDIUM: h2/htx: always fail on too large trailers")
explicitly returned an RST_STREAM on failure to decode some trailers, and
used the code H2_ERR_INTERNAL_ERROR. However there are multiple possible
causes for this failure to happen, and it turns out that it's much more
likely to be related to a protocol error than a decompression error. So
let's change this to PROTOCOL_ERROR, and count a protocol error on the
proxy and in the session.
This can be backported to all stable versions (with adjustments related
to these versions, maybe focusing on 3.2 max is reasonable).
A new type of transaction was introduced for master-cli streams. So
SF_TXN_PCLI flag and functions to allocate and destroy PCLI transactions
were added.
In the stream structure, all pcli_* members were moved in the pcli
transaction and the txn union was updated accordingly.
When it was ambiguous, a test on the transaction type was performed. For
instance to destroy the transaciton.
To be able to deal with different types of transaction for a stream, new
stream flags was added to know the transaction type when allocated. For now
only HTTP transactions can be allocated, so only SF_TXN_HTTP was
introduced. The mask SF_TXN_MASK must be used to get the transaction type.
The transaction type is set when it is allocated and removed when it is
destroyed.
The HTTP transaction is moved in an union. For now, it is the only possible
transaction that can be allocated. But that will change. Thanks to this
commit and the next one, it will be possible to deal with different kind of
transactions for a stream.
This patch looks quite huge, but it is more or less a renaming of all
accesses to "txn" field by "txn.http".
The maximum size allowed for the payload pattern was increase up to 64 bytes
(65 bytes because of the trailing \0), to be able to use a sha256 of random
data for instance. It could be useful to prevent any data smuggling on the
payload.
Note that on the CLI, it could be possible to have only the buffer size as a
limit, because the command line is only consumed once all commands are
executed. The payload pattern is only a pointer in the buffer where the
command line was copied. However, for the master CLI, the data are streamed
to the worker, so we must keep a copy of he payload pattern. This is why we
must limit its size.
It is now possible to deal with too big payload to fit in a buffer, without
changing the buffer size. By default, a payload up to 128 KB can be
dynamically allocated. "tune.cli.max-payload-size" global parameter can be
used to change this value, with some caution for huge values.
For CLI command handler functions, there is no change at all. A pointer on
the payload is still passed as parameter. Internally, an area is allocated
for the payload only if it is too big.
The payload pattern used to detect the end of the payload is part from the
allocated area.
The payload is now saved as a buffer in the CLI context instead of a simple
pointer. It is mandatory to be able to reallocate the payload if it is too
big.
Instead of copying the payload pattern in the CLI context, we now only save
a pointer on this pattern. It is possible because the command line is copied
in the CLI context. Arguments are already handled this way when the command
is processed.
There was a test below the "release" label on conn->owner to decide
whether to kill the connection or not. But this test is not needed,
because:
- for frontends, it's always set so the test never matches
- for backends, it was NULL on the second stream once a request
was being reused from an idle pool, so it couldn't be used to
discriminate between connections. In practice, the goal was to
try to detect certain dead connections but all cases leading to
such connections are either already handled in the tests before
(which don't reach this label), or are handled by the other
conditions.
Thus, let's remove this confusing test.
Some places use conn->owner to retrieve the session. It's valid because
each time it is done, it's on the frontend, though it's not always 100%
obvious and sometimes requires deep code analysis. Let's clarify these
points and even rely on an intermediary variable to make it clearer. One
case where the owner couldn't differ from the session without being NULL
was also eliminated.
When installing a mux on the backend, unless we have a good reason for
keeping the session set in conn->owner, we must reset it. Having the
session there just hides potential bugs and prevents certain tests from
being properly done.
Now it is much clearer: conn->owner remains set to the session on
frontend connections, is set to the session when the connection is
private or assimilated private and belongs to the session list, or
is NULL.
When an idle connection is private or considered private, session_add_conn()
is called to add it to the list of connections owned by the session. But
in case of allocation failure, the session is not set, which results in
a long list of possible situations that are all corner cases which are
difficult to test (and debug).
This commit relies on the fact that it is already permitted to have
conn->owner pointing to a session even if the connection couldn't be
added to the session's list, as this was already the case in
conn_backend_get() when dealing with HOL_RISK. Also as seen in commit
3aab17bd56 added in 2.4, it is already possible to have conn->owner
set with the connection not being in a list, and only the list element
is checked for this.
This commit modifies session_add_conn() to always set conn->onwer, even
if the list element couldn't be allocated. This way it's possible to
always refer to conn->owner to find the session owning a private conn
even in case of failure to allocate an entry. This requires to change
the checks on conn->owner to a check of the list element to see if the
connection belongs to a session, the pre-assignment of sess to
conn->owner in conn_backend_get() is no longer needed, same for the
pre-assignment in http_wait_for_response(), and that's all.
The H1 mux remained unchanged because since it cannot multiplex, in
case it fails to allocate a pconn, it instantly kills the connection.
The bytes_in, bytes_out, {req,res}.bytes_{in,out} sample fetch functions
are marked as internal dependencies only. But that's not exact, they are
statistics. Request traffic (bytes_in, req.bytes*) is usable starting
from the request, while response traffic (bytes_out, res.bytes*) is usable
as soon as a response begins to be received, and all are valid till the
end of the transaction.
The impact is that the log-format below:
log-format "req.bytes_in=%[req.bytes_in] req.bytes_out=%[req.bytes_out] res.bytes_in=%[res.bytes_in] res.bytes_out=%[res.bytes_out]"
is emitted too early and only logs zeroes when uploading 1MB and
downloading 1MB:
req.bytes_in=0 req.bytes_out=0 res.bytes_in=15288 res.bytes_out=0
This patch marks the request stats RQFIN and the response stats RSFIN,
so that they're valid at any moment and the logs backend knows it must
wait for the latest moment to emit such a line. With this change, the
line above now correctly produces:
req.bytes_in=1000157 req.bytes_out=1000157 res.bytes_in=1048629 res.bytes_out=1048629
This should be backported as far as the latest LTS probably, along with
these 2 previous patches:
BUG/MINOR: log: consider format expression dependencies to decide when to log
MINOR: sample: make RQ/RS stats available everywhere