Commit graph

79 commits

Author SHA1 Message Date
Willy Tarreau
e158b7efb7 CLEANUP: h1: make use of the multi-byte matching functions
Instead of leaving the hard-coded non-trivial operations in the H1
parsing code, let's just rely on the new intops functions that do the
same and that are less prone to being accidentally touched. It was
verified that the resulting code is exactly the same.
2024-04-24 16:05:38 +02:00
Willy Tarreau
b9bf16b382 BUG/MINOR: h1: fix detection of upper bytes in the URI
In 1.7 with commit 5f10ea30f4 ("OPTIM: http: improve parsing performance
of long URIs") we improved the URI parser's performance on platforms
supporting unaligned accesses by reading 4 chars at a time in a 32-bit
word. However, as reported in GH issue #2545, there's a bug in the way
the top bytes are checked, as the parser will stop when all 4 of them
are above 7e instead of when one of them is, so certain patterns can be
accepted through if the last ones are all valid. The fix requires to
negate the value but on the other hand it allows to parallelize some of
the tests and fuse the masks, which could even end up slightly faster.

This needs to be backported to all stable versions, but be careful, this
code moved a lot over time, from proto_http.c to h1.c, to http_msg.c, to
h1.c again. Better just grep for "24242424" or "21212121" in each version
to find it.

Big kudos to Martijn van Oosterhout (@kleptog) for spotting this problem
while analyzing that piece of code, and reporting it.
2024-04-24 11:50:36 +02:00
Willy Tarreau
fadabc430f CLEANUP: h1: remove unused function h1_measure_trailers()
This one stopped being used in 2.1 when HTX became mandatory,
let's drop it.
2024-01-31 15:22:12 +01:00
Willy Tarreau
0d76a284b6 BUG/MEDIUM: h1: always reject the NUL character in header values
Ben Kallus kindly reported that we still hadn't blocked the NUL
character from header values as clarified in RFC9110 and that, even
though there's no known issure related to this, it may one day be
used to construct an attack involving another component.

Actually, both Christopher and I sincerely believed we had done it
prior to releasing 2.9, shame on us for missing that one and thanks
to Ben for the reminder!

The change was applied, it was confirmed to properly reject this NUL
byte from both header and trailer values, and it's still possible to
force it to continue to be supported using the usual pair of unsafe
"option accept-invalid-http-{request|response}" for those who would
like to keep it for whatever reason that wouldn't make sense.

This was tagged medium so that distros also remember to apply it as
a preventive measure.

It should progressively be backported to all versions down to 2.0.
2024-01-31 15:22:12 +01:00
Christopher Faulet
e7964eac2d BUG/MEDIUM: h1: Ignore C-L value in the H1 parser if T-E is also set
In fact, during the parsing there is already a test to remove the
Content-Length header if a Transfer-Encoding one is found. However, in the
parser, the content-length value was still used to set the body length (the
final one and the remaining one). This value is thus also used to set the
extra field in the HTX message and is then used during the sending stage to
announce the chunk size.

So, Content-Length header value must be ignored by the H1 parser to properly
reformat the message when it is sent.

This patch must be backported as far as 2.6. Lower versions don"t handle
this case.
2023-10-04 15:34:18 +02:00
Willy Tarreau
22731762d9 BUG/MINOR: http: skip leading zeroes in content-length values
Ben Kallus also noticed that we preserve leading zeroes on content-length
values. While this is totally valid, it would be safer to at least trim
them before passing the value, because a bogus server written to parse
using "strtol(value, NULL, 0)" could inadvertently take a leading zero
as a prefix for an octal value. While there is not much that can be done
to protect such servers in general (e.g. lack of check for overflows etc),
at least it's quite cheap to make sure the transmitted value is normalized
and not taken for an octal one.

This is not really a bug, rather a missed opportunity to sanitize the
input, but is marked as a bug so that we don't forget to backport it to
stable branches.

A combined regtest was added to h1or2_to_h1c which already validates
end-to-end syntax consistency on aggregate headers.
2023-08-09 11:28:48 +02:00
Willy Tarreau
6492f1f29d BUG/MAJOR: http: reject any empty content-length header value
The content-length header parser has its dedicated function, in order
to take extreme care about invalid, unparsable, or conflicting values.
But there's a corner case in it, by which it stops comparing values
when reaching the end of the header. This has for a side effect that
an empty value or a value that ends with a comma does not deserve
further analysis, and it acts as if the header was absent.

While this is not necessarily a problem for the value ending with a
comma as it will be cause a header folding and will disappear, it is a
problem for the first isolated empty header because this one will not
be recontructed when next ones are seen, and will be passed as-is to the
backend server. A vulnerable HTTP/1 server hosted behind haproxy that
would just use this first value as "0" and ignore the valid one would
then not be protected by haproxy and could be attacked this way, taking
the payload for an extra request.

In field the risk depends on the server. Most commonly used servers
already have safe content-length parsers, but users relying on haproxy
to protect a known-vulnerable server might be at risk (and the risk of
a bug even in a reputable server should never be dismissed).

A configuration-based work-around consists in adding the following rule
in the frontend, to explicitly reject requests featuring an empty
content-length header that would have not be folded into an existing
one:

    http-request deny if { hdr_len(content-length) 0 }

The real fix consists in adjusting the parser so that it always expects a
value at the beginning of the header or after a comma. It will now reject
requests and responses having empty values anywhere in the C-L header.

This needs to be backported to all supported versions. Note that the
modification was made to functions h1_parse_cont_len_header() and
http_parse_cont_len_header(). Prior to 2.8 the latter was in
h2_parse_cont_len_header(). One day the two should be refused but the
former is also used by Lua.

The HTTP messaging reg-tests were completed to test these cases.

Thanks to Ben Kallus of Dartmouth College and Narf Industries for
reporting this! (this is in GH #2237).
2023-08-09 09:27:38 +02:00
Willy Tarreau
2eab6d3543 BUG/MINOR: h1: do not accept '#' as part of the URI component
Seth Manesse and Paul Plasil reported that the "path" sample fetch
function incorrectly accepts '#' as part of the path component. This
can in some cases lead to misrouted requests for rules that would apply
on the suffix:

    use_backend static if { path_end .png .jpg .gif .css .js }

Note that this behavior can be selectively configured using
"normalize-uri fragment-encode" and "normalize-uri fragment-strip".

The problem is that while the RFC says that this '#' must never be
emitted, as often it doesn't suggest how servers should handle it. A
diminishing number of servers still do accept it and trim it silently,
while others are rejecting it, as indicated in the conversation below
with other implementers:

   https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html

Looking at logs from publicly exposed servers, such requests appear at
a rate of roughly 1 per million and only come from attacks or poorly
written web crawlers incorrectly following links found on various pages.

Thus it looks like the best solution to this problem is to simply reject
such ambiguous requests by default, and include this in the list of
controls that can be disabled using "option accept-invalid-http-request".

We're already rejecting URIs containing any control char anyway, so we
should also reject '#'.

In the H1 parser for the H1_MSG_RQURI state, there is an accelerated
parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it
should not impact perf since 0x21..0x23 are not supposed to appear in
a URI anyway). This way '#' falls through the fine-grained filter and
we can add the special case for it also conditionned by a check on the
proxy's option "accept-invalid-http-request", with no overhead for the
vast majority of valid URIs. Here this information is available through
h1m->err_pos that's set to -2 when the option is here (so we don't need
to change the API to expose the proxy). Example with a trivial GET
through netcat:

  [08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request
    backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:50812
    buffer starts at 0 (including 0 out), 16361 free,
    len 23, wraps at 16336, error at position 7
    H1 connection flags 0x00000000, H1 stream flags 0x00000810
    H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400
    H1 chunk len 0 bytes, H1 body len 0 bytes :

    00000  GET /aa#bb HTTP/1.0\r\n
    00021  \r\n

This should be progressively backported to all stable versions along with
the following patch:

    REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests

Similar fixes for h2 and h3 will come in followup patches.

Thanks to Seth Manesse and Paul Plasil for reporting this problem with
detailed explanations.
2023-08-08 19:56:11 +02:00
Willy Tarreau
a8598a2eb1 BUG/CRITICAL: http: properly reject empty http header field names
The HTTP header parsers surprizingly accepts empty header field names,
and this is a leftover from the original code that was agnostic to this.

When muxes were introduced, for H2 first, the HPACK decompressor needed
to feed headers lists, and since empty header names were strictly
forbidden by the protocol, the lists of headers were purposely designed
to be terminated by an empty header field name (a principle that is
similar to H1's empty line termination). This principle was preserved
and generalized to other protocols migrated to muxes (H1/FCGI/H3 etc)
without anyone ever noticing that the H1 parser was still able to deliver
empty header field names to this list. In addition to this it turns out
that the HPACK decompressor, despite a comment in the code, may
successfully decompress an empty header field name, and this mistake
was propagated to the QPACK decompressor as well.

The impact is that an empty header field name may be used to truncate
the list of headers and thus make some headers disappear. While for
H2/H3 the impact is limited as haproxy sees a request with missing
headers, and headers are not used to delimit messages, in the case of
HTTP/1, the impact is significant because the presence (and sometimes
contents) of certain sensitive headers is detected during the parsing.
Thus, some of these headers may be seen, marked as present, their value
extracted, but never delivered to upper layers and obviously not
forwarded to the other side either. This can have for consequence that
certain important header fields such as Connection, Upgrade, Host,
Content-length, Transfer-Encoding etc are possibly seen as different
between what haproxy uses to parse/forward/route and what is observed
in http-request rules and of course, forwarded. One direct consequence
is that it is possible to exploit this property in HTTP/1 to make
affected versions of haproxy forward more data than is advertised on
the other side, and bypass some access controls or routing rules by
crafting extraneous requests.  Note, however, that responses to such
requests will normally not be passed back to the client, but this can
still cause some harm.

This specific risk can be mostly worked around in configuration using
the following rule that will rely on the bug's impact to precisely
detect the inconsistency between the known body size and the one
expected to be advertised to the server (the rule works from 2.0 to
2.8-dev):

  http-request deny if { fc_http_major 1 } !{ req.body_size 0 } !{ req.hdr(content-length) -m found } !{ req.hdr(transfer-encoding) -m found } !{ method CONNECT }

This will exclusively block such carefully crafted requests delivered
over HTTP/1. HTTP/2 and HTTP/3 do not need content-length, and a body
that arrives without being announced with a content-length will be
forwarded using transfer-encoding, hence will not cause discrepancies.
In HAProxy 2.0 in legacy mode ("no option http-use-htx"), this rule will
simply have no effect but will not cause trouble either.

A clean solution would consist in modifying the loops iterating over
these headers lists to check the header name's pointer instead of its
length (since both are zero at the end of the list), but this requires
to touch tens of places and it's very easy to miss one. Functions such
as htx_add_header(), htx_add_trailer(), htx_add_all_headers() would be
good starting points for such a possible future change.

Instead the current fix focuses on blocking empty headers where they
are first inserted, hence in the H1/HPACK/QPACK decoders. One benefit
of the current solution (for H1) is that it allows "show errors" to
report a precise diagnostic when facing such invalid HTTP/1 requests,
with the exact location of the problem and the originating address:

  $ printf "GET / HTTP/1.1\r\nHost: localhost\r\n:empty header\r\n\r\n" | nc 0 8001
  HTTP/1.1 400 Bad request
  Content-length: 90
  Cache-Control: no-cache
  Connection: close
  Content-Type: text/html

  <html><body><h1>400 Bad request</h1>
  Your browser sent an invalid request.
  </body></html>

  $ socat /var/run/haproxy.stat <<< "show errors"
  Total events captured on [10/Feb/2023:16:29:37.530] : 1

  [10/Feb/2023:16:29:34.155] frontend decrypt (#2): invalid request
    backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:31092
    buffer starts at 0 (including 0 out), 16334 free,
    len 50, wraps at 16336, error at position 33
    H1 connection flags 0x00000000, H1 stream flags 0x00000810
    H1 msg state MSG_HDR_NAME(17), H1 msg flags 0x00001410
    H1 chunk len 0 bytes, H1 body len 0 bytes :

    00000  GET / HTTP/1.1\r\n
    00016  Host: localhost\r\n
    00033  :empty header\r\n
    00048  \r\n

I want to address sincere and warm thanks for their great work to the
team composed of the following security researchers who found the issue
together and reported it: Bahruz Jabiyev, Anthony Gavazzi, and Engin
Kirda from Northeastern University, Kaan Onarlioglu from Akamai
Technologies, Adi Peleg and Harvey Tuch from Google. And kudos to Amaury
Denoyelle from HAProxy Technologies for spotting that the HPACK and
QPACK decoders would let this pass despite the comment explicitly
saying otherwise.

This fix must be backported as far as 2.0. The QPACK changes can be
dropped before 2.6. In 2.0 there is also the equivalent code for legacy
mode, which doesn't suffer from the list truncation, but it would better
be fixed regardless.

CVE-2023-25725 was assigned to this issue.
2023-02-14 08:48:54 +01:00
Christopher Faulet
e16ffb0383 BUG/MINOR: h1: Replace authority validation to conform RFC3986
Except for CONNECT method, where a normalization is performed, we expected
to have an exact match between the authority and the host header value.
However it was too strict. Indeed, default port must be handled and the
matching must respect the RFC3986.

There is already a scheme based normalizeation performed on the URI later,
on the HTX message. And we cannot normalize the URI during H1 parsing to be
able to report proper errors on the original raw buffer. And a systematic
read-only normalization to validate the authority will consume CPU for only
few requests. At the end, we decided to perform extra-checks when the exact
match fails. Now, following authority/host are considered as equivalent:

  http:  domain.com <=> domain.com:80  <=> domain.com:
  https: domain.com <=> domain.com:443 <=> domain.com:

This patch depends on:

  * MINOR: h1: Consider empty port as invalid in authority for CONNECT
  * MINOR: http: Considere empty ports as valid default ports

It is a bug regarding the RFC3986. Technically, I may be backported as far
as 2.4. However, this must be discussed first. If it is backported, the
commits above must be backported too.
2022-11-22 17:49:10 +01:00
Christopher Faulet
75348c2e8b MINOR: h1: Consider empty port as invalid in authority for CONNECT
For now, this change is useless because http_get_host_port() returns
IST_NULL when the port is empty. But this will change. For other methods,
empty ports are valid. But not for CONNECT method. To still return a
400-Bad-Request if a CONNECT is performed with an empty port, istlen() is
used to test the port, instead of isttest().
2022-11-22 16:27:52 +01:00
Willy Tarreau
55d2e8577e BUILD: h1: silence an initiialized warning with gcc-4.7 and -Os
Building h1.c with gcc-4.7 -Os produces the following warning:

  src/h1.c: In function 'h1_headers_to_hdr_list':
  src/h1.c:1101:36: warning: 'ptr' may be used uninitialized in this function [-Wmaybe-uninitialized]

In fact ptr may be taken from sl.rq.u.ptr which is only initialized after
passing through the relevant states, but gcc doesn't know which states
are visited. Adding an ALREADY_CHECKED() statement there is sufficient to
shut it up and doesn't affect the emitted code.

This may be backported to stable versions to make sure that builds on older
distros and systems is clean.
2022-10-04 08:02:03 +02:00
Christopher Faulet
3f5fbe9407 BUG/MEDIUM: h1: Improve authority validation for CONNCET request
From time to time, users complain to get 400-Bad-request responses for
totally valid CONNECT requests. After analysis, it is due to the H1 parser
performs an exact match between the authority and the host header value. For
non-CONNECT requests, it is valid. But for CONNECT requests the authority
must contain a port while it is often omitted from the host header value
(for default ports).

So, to be sure to not reject valid CONNECT requests, a basic authority
validation is now performed during the message parsing. In addition, the
host header value is normalized. It means the default port is removed if
possible.

This patch should solve the issue #1761. It must be backported to 2.6 and
probably as far as 2.4.
2022-07-07 09:35:58 +02:00
Tim Duesterhus
7750850594 CLEANUP: Reapply ist.cocci with --include-headers-for-types --recursive-includes
Previous uses of `ist.cocci` did not add `--include-headers-for-types` and
`--recursive-includes` preventing Coccinelle seeing `struct ist` members of
other structs.

Reapply the patch with proper flags to further clean up the use of the ist API.

The command used was:

    spatch -sp_file dev/coccinelle/ist.cocci -in_place --include-headers --include-headers-for-types --recursive-includes --dir src/
2022-03-21 08:30:47 +01:00
Christopher Faulet
02c893332b BUG/MEDIUM: h1: Properly reset h1m flags when headers parsing is restarted
If H1 headers are not fully received at once, the parsing is restarted a
last time when all headers are finally received. When this happens, the h1m
flags are sanitized to remove all value set during parsing.

But some flags where erroneously preserved. Among others, H1_MF_TE_CHUNKED
flag was not removed, what could lead to parsing error.

To fix the bug and make things easy, a mask has been added with all flags
that must be preserved. It will be more stable. This mask is used to
sanitize h1m flags.

This patch should fix the issue #1469. It must be backported to 2.5.
2021-12-02 09:46:29 +01:00
Tim Duesterhus
4c8f75fc31 CLEANUP: Apply ist.cocci
Make use of the new rules to use `istend()`.
2021-11-08 08:05:39 +01:00
Christopher Faulet
545fbba273 MINOR: h1: Change T-E header parsing to fail if chunked encoding is found twice
According to the RFC7230, "chunked" encoding must not be applied more than
once to a message body. To handle this case, h1_parse_xfer_enc_header() is
now responsible to fail when a parsing error is found. It also fails if the
"chunked" encoding is not the last one for a request.

To help the parsing, two H1 parser flags have been added: H1_MF_TE_CHUNKED
and H1_MF_TE_OTHER. These flags are set, respectively, when "chunked"
encoding and any other encoding are found. H1_MF_CHNK flag is used when
"chunked" encoding is the last one.
2021-09-28 16:21:25 +02:00
Christopher Faulet
631c7e8665 MEDIUM: h1: Force close mode for invalid uses of T-E header
Transfer-Encoding header is not supported in HTTP/1.0. However, softwares
dealing with HTTP/1.0 and HTTP/1.1 messages may accept it and transfer
it. When a Content-Length header is also provided, it must be
ignored. Unfortunately, this may lead to vulnerabilities (request smuggling
or response splitting) if an intermediary is only implementing
HTTP/1.0. Because it may ignore Transfer-Encoding header and only handle
Content-Length one.

To avoid any security issues, when Transfer-Encoding and Content-Length
headers are found in a message, the close mode is forced. The same is
performed for HTTP/1.0 message with a Transfer-Encoding header only. This
change is conform to what it is described in the last HTTP/1.1 draft. See
also httpwg/http-core#879.

Note that Content-Length header is also removed from any incoming messages
if a Transfer-Encoding header is found. However it is not true (not yet) for
responses generated by HAProxy.
2021-09-28 16:21:25 +02:00
Amaury Denoyelle
69294b20ac MINOR: http: use http uri parser for authority
Replace http_get_authority by the http_uri_parser API.

The new function is renamed http_parse_authority. Replace duplicated
scheme parsing code by http_parse_scheme invocation. A new
http_uri_parser state is declared to mark the authority parsing as done.
2021-07-08 17:11:17 +02:00
Amaury Denoyelle
aad333a9fc MEDIUM: h1: add a WebSocket key on handshake if needed
Add the header Sec-Websocket-Key when generating a h1 handshake websocket
without this header. This is the case when doing h2-h1 conversion.

The key is randomly generated and base64 encoded. It is stored on the session
side to be able to verify response key and reject it if not valid.
2021-01-28 16:37:14 +01:00
Amaury Denoyelle
c193823343 MEDIUM: h1: generate WebSocket key on response if needed
Add the Sec-Websocket-Accept header on a websocket handshake response.
This header may be missing if a h2 server is used with a h1 client.

The response key is calculated following the rfc6455. For this, the
handshake request key must be stored in the h1 session, as a new field
name ws_key. Note that this is only done if the message has been
prealably identified as a Websocket handshake request.
2021-01-28 16:37:14 +01:00
Amaury Denoyelle
18ee5c3eb0 MINOR: h1: reject websocket handshake if missing key
If a request is identified as a WebSocket handshake, it must contains a
websocket key header or else it can be reject, following the rfc6455.

A new flag H1_MF_UPG_WEBSOCKET is set on such messages.  For the request
te be identified as a WebSocket handshake, it must contains the headers:
  Connection: upgrade
  Upgrade: websocket

This commit is a compagnon of
"MEDIUM: h1: generate WebSocket key on response if needed" and
"MEDIUM: h1: add a WebSocket key on handshake if needed".

Indeed, it ensures that a WebSocket key is added only from a http/2 side
and not for a http/1 bogus peer.
2021-01-28 16:37:14 +01:00
Willy Tarreau
f278eec37a BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char
NetBSD apparently uses macros for tolower/toupper and complains about
the use of char for array subscripts. Let's properly cast all of them
to unsigned char where they are used.

This is needed to fix issue #729.
2020-07-05 21:50:02 +02:00
Ilya Shipitsin
47d17182f4 CLEANUP: assorted typo fixes in the code and comments
This is 10th iteration of typo fixes
2020-06-26 11:27:28 +02:00
Willy Tarreau
f1d32c475c REORG: include: move channel.h to haproxy/channel{,-t}.h
The files were moved with no change. The callers were cleaned up a bit
and a few of them had channel.h removed since not needed.
2020-06-11 10:18:58 +02:00
Willy Tarreau
5413a87ad3 REORG: include: move common/h1.h to haproxy/h1.h
The file was moved as-is. There was a wrong dependency on dynbuf.h
instead of buf.h which was addressed. There was no benefit to
splitting this between types and functions.
2020-06-11 10:18:57 +02:00
Willy Tarreau
0017be0143 REORG: include: split common/http-hdr.h into haproxy/http-hdr{,-t}.h
There's only one struct and 2 inline functions. It could have been
merged into http.h but that would have added a massive dependency on
the hpack parts for nothing, so better keep it this way since hpack
is already freestanding and portable.
2020-06-11 10:18:57 +02:00
Willy Tarreau
4c7e4b7738 REORG: include: update all files to use haproxy/api.h or api-t.h if needed
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:

  - common/config.h
  - common/compat.h
  - common/compiler.h
  - common/defaults.h
  - common/initcall.h
  - common/tools.h

The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.

In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.

No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
2020-06-11 10:18:42 +02:00
Christopher Faulet
7032a3fd0a BUG/MEDIUM: h1: Don't compare host and authority if only h1 headers are parsed
When only request headers are parsed, the host header should not be compared to
the request authority because no start-line was parsed. Thus there is no
authority.

Till now this bug was hidden because this parsing mode was only used for the
response in the FCGI multiplexer. Since the HTTP checks refactoring, the request
headers may now also be parsed without the start-line.

This patch fixes the issue #610. It must be backported to 2.1.
2020-05-04 09:27:01 +02:00
Willy Tarreau
02ac950a11 CLEANUP: http/h1: rely on HA_UNALIGNED_LE instead of checking for CPU families
Now that we have flags indicating the CPU's capabilities, better use
them instead of missing some updates for new CPU families (ARMv8 was
missing there).
2020-02-21 16:32:57 +01:00
Christopher Faulet
1703478e2d BUG/MINOR: h1: Report the right error position when a header value is invalid
During H1 messages parsing, when the parser has finished to parse a full header
line, some tests are performed on its value, depending on its name, to be sure
it is valid. The content-length is checked and converted in integer and the host
header is also checked. If an error occurred during this step, the error
position must point on the header value. But from the parser point of view, we
are already on the start of the next header. Thus the effective reported
position in the error capture is the beginning of the unparsed header line. It
is a bit confusing when we try to figure out why a message is rejected.

Now, the parser state is updated to point on the invalid value. This way, the
error position really points on the right position.

This patch must be backported as far as 1.9.
2020-01-06 13:58:21 +01:00
Christopher Faulet
bc7c03eba3 BUG/MINOR: h1: Don't test the host header during response parsing
During the H1 message parsing, the host header is tested to be sure it matches
the request's authority, if defined. When there are multiple host headers, we
also take care they are all the same. Of course, these tests must only be
performed on the requests. A host header in a response has no special meaning.

This patch must be backported to 2.1.
2019-11-27 14:01:17 +01:00
Christopher Faulet
531b83e039 MINOR: h1: Reject requests if the authority does not match the header host
As stated in the RCF7230#5.4, a client must send a field-value for the header
host that is identical to the authority if the target URI includes one. So, now,
by default, if the authority, when provided, does not match the value of the
header host, an error is triggered. To mitigate this behavior, it is possible to
set the option "accept-invalid-http-request". In that case, an http error is
captured without interrupting the request parsing.
2019-10-14 22:28:50 +02:00
Christopher Faulet
497ab4f519 MINOR: h1: Reject requests with different occurrences of the header host
There is no reason for a client to send several headers host. It even may be
considered as a bug. However, it is totally invalid to have different values for
those. So now, in such case, an error is triggered during the request
parsing. In addition, when several headers host are found with the same value,
only the first instance is kept and others are skipped.
2019-10-14 22:28:50 +02:00
Christopher Faulet
84f06533e1 BUG/MINOR: h1: Properly reset h1m when parsing is restarted
Otherwise some processing may be performed twice. For instance, if the header
"Content-Length" is parsed on the first pass, when the parsing is restarted, we
skip it because we think another header with the same value was already seen. In
fact, it is currently the only existing bug that can be encountered. But it is
safer to reset all the h1m on restart to avoid any future bugs.

This patch must be backported to 2.0 and 1.9
2019-09-04 10:30:11 +02:00
Christopher Faulet
711ed6ae4a MAJOR: http: Remove the HTTP legacy code
First of all, all legacy HTTP analyzers and all functions exclusively used by
them were removed. So the most of the functions in proto_http.{c,h} were
removed. Only functions to deal with the HTTP transaction have been kept. Then,
http_msg and hdr_idx modules were entirely removed. And finally the structure
http_msg was lightened of all its useless information about the legacy HTTP. The
structure hdr_ctx was also removed because unused now, just like unused states
in the enum h1_state. Note that the memory pool "hdr_idx" was removed and
"http_txn" is now smaller.
2019-07-19 09:24:12 +02:00
Christopher Faulet
a51ebb7f56 MEDIUM: h1: Add an option to sanitize connection headers during parsing
The flag H1_MF_CLEAN_CONN_HDR has been added to let the H1 parser sanitize
connection headers. It means it will remove all "close" and "keep-alive" values
during the parsing. One noticeable effect is that connection headers may be
unfolded. In practice, this is not a problem because it is not frequent to have
multiple values for the connection headers.

If this flag is set, during the parsing The function
h1_parse_next_connection_header() is called in a loop instead of
h1_parse_conection_header().

No need to backport this patch
2019-04-12 22:06:53 +02:00
Christopher Faulet
68b1bbd767 BUG/MEDIUM: h1: Get the h1m state when restarting the headers parsing
Since the commit 0f8fb6b7f ("MINOR: h1: make the H1 headers block parser able to
parse headers only"), when headers are not received in one time, a parsing error
is returned because the local state in the function h1_headers_to_hdr_list() was
not initialized with the previous one (in fact, it was not initialized at all).

So now, we start the parsing of headers with the state H1_MSG_HDR_FIRST when the
flag H1_MF_HDRS_ONLY is set. Otherwise, we always get it from the h1m.

This patch must be backported to 1.9.
2019-01-04 16:23:03 +01:00
Willy Tarreau
0f8fb6b7f9 MINOR: h1: make the H1 headers block parser able to parse headers only
Currently the H1 headers parser works for either a request or a response
because it starts from the start line. It is also able to resume its
processing when it was interrupted, but in this case it doesn't update
the list.

Make it support a new flag, H1_MF_HDRS_ONLY so that the caller can
indicate it's only interested in the headers list and not the start
line. This will be convenient to parse H1 trailers.
2019-01-04 10:48:03 +01:00
Willy Tarreau
afba57ae80 REORG: h1: merge types+proto into common/h1.h
These two files are self-contained and do not depend on other
layers, so let's remerge them together for easier manipulation.
2018-12-11 17:15:13 +01:00
Willy Tarreau
538746ad38 REORG: h1: move legacy http functions to http_msg.c
Now that h1 and legacy HTTP are two distinct things, there's no need
to keep the legacy HTTP parsers in h1.c since they're only used by
the legacy code in proto_http.c, and h1.h doesn't need to include
hdr_idx anymore. This concerns the following functions :

- http_parse_reqline();
- http_parse_stsline();
- http_msg_analyzer();
- http_forward_trailers();

All of these were moved to http_msg.c.
2018-12-11 17:15:13 +01:00
Christopher Faulet
25da9e34f1 MINOR: h1: Add the flag H1_MF_NO_PHDR to not add pseudo-headers during parsing
Some pseudo-headers are added during the headers parsing, mainly for the mux
H2. With this flag, it is possible to not add them. This avoid some boring
filtering in the mux H1.
2018-10-12 16:15:18 +02:00
Christopher Faulet
1dc2b49556 MINOR: h1: Change the union h1_sl to use indirect strings to store infos
Instead of using offsets relating to the parsed buffer to store start line
infos, we now use indirect strings. So now, these infos remain valid only if the
origin buffer remains untouched. But it's not a real problem because this union
is used during the parsing and never stored to a later use.
2018-10-12 16:14:57 +02:00
Christopher Faulet
ff08a92797 MINOR: h1: Add EOH marker during headers parsing
When headers parsing ends, a pseudo header with an empty name and an empty value
is added to the array of parsed headers to mark its end. It is convenient to
loop on this array, but not really useful if we want remove the last header or
add a new one, because we don't really know where is the last CRLF (the empty
line ending the headers block). So now, instead the name of this pseudo header
points on this last CRLF. Its length is still 0 and its value is still empty, so
loops on the array remains unchanged.
2018-10-12 16:08:27 +02:00
Christopher Faulet
2912f87443 BUG/MEDIUM: h1: Really skip all updates when incomplete messages are parsed
In h1_headers_to_hdr_list, when an incomplete message is parsed, all updates
must be skipped until the end of the message is found. Then the parsing is
restarted from the beginning. But not all updates were skipped, leading to
invalid rewritting or segfault.

No backport is needed.
2018-09-19 15:08:05 +02:00
Willy Tarreau
73373ab43a MEDIUM: h1: deduplicate the content-length header
Just like we used to do in proto_http, we now check that each and every
occurrence of the content-length header field and each of its values are
exactly identical, and we normalize the header to return the last value
of the first header with spaces trimmed.
2018-09-14 19:04:28 +02:00
Willy Tarreau
2557f6a3e2 MEDIUM: h1: better handle transfer-encoding vs content-length
The transfer-encoding header processing was a bit lenient in this part
because it was made to read messages already validated by haproxy. We
absolutely need to reinstate the strict processing defined in RFC7230
as is currently being done in proto_http.c. That is, transfer-encoding
presence alone is enough to cancel content-length, and must be
terminated by the "chunked" token, except in the response where we
can fall back to the close mode if it's not last.

For this we now use a specific parsing function which updates the
flags and we introduce a new flag H1_MF_XFER_ENC indicating that the
transfer-encoding header is present.

Last, if such a header is found, we delete all content-length header
fields found in the message.
2018-09-14 17:40:35 +02:00
Willy Tarreau
2ea6bb5c31 MINOR: h1: add headers to the list after controls, not before
This will ease removal/skipping of duplicates such as content-length.
2018-09-14 17:40:35 +02:00
Willy Tarreau
98f5cf7a59 MINOR: h1: parse the Connection header field
The new function h1_parse_connection_header() is called when facing a
connection header in the generic parser, and it will set up to 3 bits
in h1m->flags indicating if at least one "close", "keep-alive" or "upgrade"
tokens was seen.
2018-09-13 14:52:31 +02:00
Willy Tarreau
ba5fbca33f MINOR: h1: report in the h1m struct if the HTTP version is 1.1 or above
This will be needed for the mux to know how to process the Connection
header, and will save it from having to re-parse the request line since
it's captured on the fly.
2018-09-13 14:34:09 +02:00