Modify quic-cc-algo for better extensibility of optional parameter
parsing. This will be useful to support a new parameter for maximum
allowed pacing burst size.
Take this opportunity to refine quic-cc-algo documentation. Optional
parameters are now presented as a list which will soon be extended.
Rename 'tune.quic.frontend.max-window-size' with the prefix 'default-'.
This highlights the fact that it is not a hard limit, as it can be
overridden by specifying an optional window size via quic-cc-algo on a
bind line.
No need to backport as this keyword was added on the current dev
version.
414-Uri-Too-Long and 431-Request-Header-Fields-Too-Large are now part of
the supported status codes that can be defined as error files. The hash table
defined in http_get_status_idx() was updated accordingly.
"show sess" command now supports a list of options that can be set after all
other possible arguments (<id>, all...). For now, "show-uri" is the only
supported option. With this option, the captured URI, if non-null, is added
to the dump of a stream, complete or not. The URI may be anonymized if
necessary.
This patch should fix the issue #663.
Historically, an agent-check program was only able to set a weight
proportional to the server's initial weight. However, it could be handy to
also set an absolute value. It is the purpose of this patch.
Instead of changing the current way to set a server's weight, a new
agent-check command is introduced. The string "weight:", followed by a
positive integer or a positive integer percentage, can now be used. If the
value ends with the '%' sign, then the new weight will be proportional to
the initial weight of the server. Otherwise, the value is considered as an
absolute weight and must be between 0 and 256.
This patch should fix the issue #360.
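The rule described above can be sketched as follows. This is only an illustration in Python, not the actual implementation: the function name apply_agent_weight() is hypothetical, and the integer rounding of the percentage is an assumption that the text does not specify.

```python
def apply_agent_weight(value: str, initial_weight: int) -> int:
    """Return the new weight for the text following "weight:".

    Hypothetical helper: a trailing '%' makes the value proportional to
    the server's initial weight, otherwise it is an absolute weight.
    """
    text = value.strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        if percent < 0:
            raise ValueError("percentage must be positive")
        # Assumption: integer division; the real rounding may differ.
        return initial_weight * percent // 100
    weight = int(text)
    if not 0 <= weight <= 256:
        raise ValueError("absolute weight must be between 0 and 256")
    return weight
```

With an initial weight of 100, "50%" would give 50 while "128" would give 128 regardless of the initial weight.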
It is now possible to use a log-format string to define the "Set-Cookie"
header value of a response generated by a redirect rule. The format of the
result is not checked, and such a check is not possible during the
configuration parsing. It is probably not a big deal because the already
existing "set-cookie" and "clear-cookie" options don't perform any check.
Here is an example:
    http-request redirect location https://someurl.com/ set-cookie haproxy="%[var(txn.var)]"
This patch should fix the issue #1784.
On prefix-based redirect, there is an option to drop the query-string of the
location. Here it is the opposite: an option is added to preserve the
query-string of the original URI for a location-based redirect.
By setting the "keep-query" option, for a location-based redirect only, the
query-string of the original URI is appended to the location. If there is no
query-string, nothing is added (no empty '?'). If there is already a
non-empty query-string on the location, the original one is appended with
a '&' separator.
This patch should fix issue #2728.
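The appending rules above can be sketched as follows. This is a simplified model in Python, not the actual code: build_location() is a hypothetical name, and the handling of a location that ends with an empty '?' is an assumption, since the text only covers the non-empty case.

```python
def build_location(location: str, original_uri: str) -> str:
    """Model of "keep-query" on a location-based redirect (hypothetical)."""
    # Extract the query-string of the original URI, if any.
    _, _, query = original_uri.partition("?")
    if not query:
        # No query-string: nothing is added, not even an empty '?'.
        return location
    if "?" not in location:
        return location + "?" + query
    if location.endswith("?"):
        # Assumption: an empty query-string on the location is simply filled.
        return location + query
    # The location already has a non-empty query-string: use '&'.
    return location + "&" + query
```

For instance, redirecting "/old?a=1" to "https://example.com/new?x=2" would yield "https://example.com/new?x=2&a=1".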
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "4k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
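The difference between the two parsing approaches can be illustrated with the sketch below. It is written in Python for brevity: atol_like() mimics how C's atol() stops at the first non-digit character, and parse_size() is a hypothetical stand-in for a unit-aware parser. It does not reproduce the signature of parse_size_err() nor its exact list of accepted units, which are assumptions here.

```python
import re

def atol_like(text: str) -> int:
    """Mimic C atol(): convert the leading digits, ignore what follows."""
    match = re.match(r"\s*([+-]?[0-9]+)", text)
    return int(match.group(1)) if match else 0

# Assumed multipliers for this sketch only (powers of 1024).
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_size(text: str) -> int:
    """Hypothetical unit-aware parser that rejects unknown suffixes."""
    match = re.fullmatch(r"([0-9]+)([kKmMgG]?)", text.strip())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2).lower()]
```

Here atol_like("4k") silently yields 4, while parse_size("4k") yields 4096 and parse_size("4x") raises an error instead of being accepted.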
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, preventing the process from starting
when set e.g. to "64k". Let's make use of parse_size_err() on it so that units are
supported. This requires to turn it to uint as well, and to explicitly
limit its range to INT_MAX - 2*sizeof(void*), which was previously
partially handled as part of the sign check.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, and
since it's sometimes compared to an int, we limit its range to
0..INT_MAX.
Till now this value was parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on it so that
units are supported. This requires to turn it to uint as well, which
was verified to be OK.
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
Till now these values were parsed as raw integer using atol() and would
silently ignore any trailing suffix, causing unexpected behaviors when
set, e.g. to "512k". Let's make use of parse_size_err() on them so that
units are supported. This requires to turn them to uint as well, which
is OK.
Sometimes the conditions that define an anomaly are not as simple as
just an error or a success. One example use case would be to monitor
the transfer time against a threshold.
An idea suggested by Tristan would be to permit the "when" converter
to refer to a more variable or dynamic condition.
Here we make this possible by making "when" rely on a named ACL. The
ACL then needs to be specified in either the proxy or the defaults
section. Since it is evaluated inline, it may even refer to information
available at the end (at log time) such as the data transfer time. If
the ACL evaluates to true, the converter passes the data.
Example: log "dbg={-}" when fine, or "dbg={... debug info ...}" on slow
transfers:
    acl slow_xfer res.timer.data ge 10000 # more than 10s is slow
    log-format "$HAPROXY_HTTP_LOG_FMT \
                fsdbg={%[fs.debug_str,when(acl,slow_xfer)]} \
                bsdbg={%[bs.debug_str,when(acl,slow_xfer)]}"
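The effect of the example above on the logged fields can be modeled with the sketch below. It is a simplified Python model and not the actual evaluation code: all function names are hypothetical, and the 10000 ms threshold is taken from the ACL in the example.

```python
SLOW_XFER_MS = 10000  # same threshold as the slow_xfer ACL: 10s or more

def when(sample, condition):
    """Pass the input sample only if the condition holds."""
    return sample if condition else None

def log_field(name, sample):
    """Render a field, with "-" standing for a missing sample."""
    value = "-" if sample is None else sample
    return f"{name}={{{value}}}"

def debug_fields(data_time_ms, fs_debug, bs_debug):
    # The ACL is evaluated inline, at log time.
    slow_xfer = data_time_ms >= SLOW_XFER_MS
    return " ".join([
        log_field("fsdbg", when(fs_debug, slow_xfer)),
        log_field("bsdbg", when(bs_debug, slow_xfer)),
    ])
```

A 120 ms transfer would log "fsdbg={-} bsdbg={-}", while a 15000 ms one would log both debug strings.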
Released version 3.1-dev13 with the following main changes :
- MEDIUM: mworker: deprecate the 'program' section
- BUILD: ot: use a cebtree instead of a list for variable names
- MINOR: startup: replace HAPROXY_LOAD_SUCCESS with global load_status
- BUG/MINOR: startup: set HAPROXY_CFGFILES in read_cfg
- BUG/MINOR: cli: don't show sockpairs in HAPROXY_CLI and HAPROXY_MASTER_CLI
- BUG/MEDIUM: stconn: Don't forward shut for SC in connecting state
- BUG/MEDIUM: resolvers: Insert a non-executed resolution in front of the wait list
- MINOR: debug: explicitly permit the counter condition to be empty
- MINOR: debug: add a new counter type for glitches
- MINOR: mux-h2: count glitches when they're reported
- BUG/MINOR: deinit: release uri_auth admin rules
- MINOR: uri_auth: add stats_uri_auth_free helper
- MEDIUM: uri_auth: implement clean uri_auth cleaning
- MINOR: mux-quic/h3: count glitches when they're reported
- BUG/MEDIUM: mux-h2: Don't send RST_STREAM frame for streams with no ID
- BUG/MINOR: Don't report early srv aborts on request forwarding in DONE state
- MINOR: promex: Expose the global node and description in process metrics
- MINOR: promex: Add global and proxies description as labels to all metrics
- OPTIM: pattern: only apply LRU cache for large enough lists
- BUG/MEDIUM: checks: make sure to always apply offsets to now_ms in expiration
- BUG/MINOR: debug: do not set task expiration to TICK_ETERNITY
- BUG/MEDIUM: mailers: make sure to always apply offsets to now_ms in expiration
- BUG/MINOR: mux_quic: make sure to always apply offsets to now_ms in expiration
- BUG/MINOR: peers: make sure to always apply offsets to now_ms in expiration
- BUG/MEDIUM: clock: make sure now_ms cannot be TICK_ETERNITY
- MINOR: debug/cli: replace "debug dev counters" with "debug counters"
- DOC: config: add tune.h2.{be,fe}.rxbuf to the global keywords index
- MINOR: chunk: add a BUG_ON upon the next init_trash_buffer()
"debug dev" commands are not meant to be used by end-users, and are
purposely not documented. Yet due to their usefulness in troubleshooting
sessions, users are increasingly invited by developers to use some of
them.
"debug dev counters" is one of them. Better move it to "debug counters"
and document it so that users can check them even if the output can look
cryptic at times. This, combined with DEBUG_GLITCHES, can be convenient
to observe suspicious activity. The doc however specifies that the format
may change between versions and that new entries/types might appear
within a stable branch.
The program section is unreliable and should not be used; more reliable
alternatives exist outside HAProxy. Let's deprecate the section so that
we can remove it completely in 3.3.
Released version 3.1-dev12 with the following main changes :
- MINOR: startup: tune.renice.{startup,runtime} allow to change priorities
- BUG/MEDIUM: promex: Fix dump of extra counters
- BUILD: import/mt_list: support building with TCC
- BUILD: compiler: define __builtin_prefetch() for tcc
- CLEANUP: quic: Remove the useless directive "tune.quic.backend.max-idle-timeou"
- DOC: config: document connection error 44 (reverse connect failure)
- CLEANUP: connection: properly name the CO_ER_SSL_FATAL enum entry
- DEBUG: cli: support closing "hard" using close() in addition to fd_delete()
- MINOR: connection: add more connection error codes to cover common errno
- MINOR: rawsock: set connection error codes when returning from recv/send/splice
- MINOR: connection: add new sample fetch functions fc_err_name and bc_err_name
- MINOR: quic: Help diagnosing malformed probing packets
- BUG/MINOR: quic: fix malformed probing packet building
- MINOR: listener: Remove useless checks on the receiver protocol existence
- MINOR: http-conv: Remove unreachable goto statement in sample_conv_q_preferred
- MINOR: http: don't %-encode the payload when not relevant
- MINOR: quic: simplify qc_parse_pkt_frms() return path
- MINOR: quic: use dynamically allocated frame on parsing
- MINOR: quic: extend return value of CRYPTO parsing
- BUG/MINOR: quic: repeat packet parsing to deal with fragmented CRYPTO
- BUG/MINOR: mworker: do 'program' postparser checks in read_cfg_in_discovery_mode
- EXAMPLES: add "traces.cfg" with traces examples
- BUG/MEDIUM: quic: do not consider ACK on released stream as error
- CLEANUP: stats: fix misleading comment on top of stat_idx_info
- MINOR: wdt: move the local timers to a struct
- MINOR: debug: add a function to dump a stuck thread
- DEBUG: wdt: better detect apparently locked up threads and warn about them
- DEBUG: cli: make it possible for "debug dev loop" to trigger warnings
- DEBUG: wdt: make the blocked traffic warning delay configurable
- DEBUG: wdt: add a stats counter "BlockedTrafficWarnings" in show info
- DEBUG: wdt: set the default blocked task delay to 100 ms
- MINOR: debug: move the "recover now" warn message after the optional notes
- MINOR: event_hdl: add event_hdl_sub_list_empty() helper func
- MINOR: pattern: add _pat_ref_new() helper func
- OPTIM: pattern: use malloc() to initialize new pat_ref struct
- MINOR: pattern: add pat_ref_free() helper func
- CLEANUP: guid: remove global tree export
- BUG/MINOR: guid/server: ensure thread-safety on GUID insert/delete
- DOC: management: explain the change of behavior of the program section
- BUG/MEDIUM: mux-h2: try to wait for the peer to read the GOAWAY
- BUG/MEDIUM: quic: prevent crash due to CRYPTO parsing error
The warn-blocked-traffic-after delay can be significantly lowered. In any
case, in order to be usable it must be well below the watchdog's limit to
have a chance to emit exploitable traces before the watchdog finally fires.
Even configured at 1ms it looks very difficult to trigger it on a
laptop doing SSL and compression, so applying a 100-fold factor to
cover for large configs and small machines sounds sane for 3.1. In any
case, even at 100ms, the service degradation becomes quite visible.
These functions return a symbolic error code such as ECONNRESET to keep
logs compact while making them human-readable. It's a good alternative
to the numeric code in that it's more expressive, and a good one to the
full message since it's shorter and more precise (some codes even match
errno names).
The doc was updated so that the symbolic names appear in the table. It
could be useful to backport this feature to help with troubleshooting
some issues, though backporting the doc might possibly be more annoying
in case users have local patches already, so maybe the table update does
not need to be backported in this case.
While we get reports of connection setup errors in fc_err/bc_err, we
don't have the equivalent for the recv/send/splice syscalls. Let's
add provisions for new codes that cover the common errno values that
recv/send/splice can return, i.e. ECONNREFUSED, ENOMEM, EBADF, EFAULT,
EINVAL, ENOTCONN, ENOTSOCK, ENOBUFS, EPIPE. We also add a special case
for when the poller reported the error itself. It's worth noting that
EBADF/EFAULT/EINVAL will generally indicate serious bugs in the code
and should not be reported.
The only thing is that it's quite hard to forcefully (and reliably)
trigger these errors in automated tests as the timing is critical.
Using iptables to manually reset established connections in the
middle of large transfers at least permits to see some ECONNRESET
and/or EPIPE, but the other ones are harder to trigger.
This commit introduces the tune.renice.startup and tune.renice.runtime
global keywords that allow changing the priority with setpriority().
tune.renice.startup is parsed and applied in the worker or the standalone
process for configuration parsing. If this keyword is used alone, the
nice value is restored to the previous one after configuration parsing.
tune.renice.runtime is applied after configuration parsing, so in the
worker or a standalone process. Combined with tune.renice.startup it
allows to have a different nice value during configuration parsing and
during runtime.
The feature was discussed in github issue #1919.
Example:
    global
        tune.renice.startup 15
        tune.renice.runtime 0
Released version 3.1-dev11 with the following main changes :
- BUG/MINOR: httpclient: return NULL when no proxy available during httpclient_new()
- BUG/MEDIUM: mworker/httpclient: initialization skipped by accident in mworker mode
- BUG/MINOR: resolvers/mworker: missing default resolvers in mworker mode
- MINOR: mworker/ocsp: skip ocsp-update proxy init in master
- BUG/MEDIUM: stconn: Wait iobuf is empty to shut SE down during a check send
- MINOR: mux-h1: Show the SD iobuf in trace messages on stream send events
- MINOR: mux-h1: Add a trace on shutdown when keep-alive is not possible
- BUG/MINOR: http-ana: Don't report a server abort if response payload is invalid
- BUG/MEDIUM: stconn: Check FF data of SC to perform a shutdown in sc_notify()
- BUG/MAJOR: filters/htx: Add a flag to state the payload is altered by a filter
- REGTESTS: Never reuse server connection in http-messaging/truncated.vtc
- BUG/MINOR: quic: avoid leaking post handshake frames
- MINOR: quic: send new tokens (NEW_TOKEN) even for 1RTT sessions
- BUG/MEDIUM: quic: avoid freezing 0RTT connections
- DOC: config: fix rfc7239 forwarded typo in desc
- MINOR: http_ext: implement rfc7239_{nn,np} converters
- CLEANUP: http_ext: remove useless BUG_ON() in http_handle_xot_header()
- BUG/MINOR: sample: free err2 in smp_resolve_args for type ARGT_REG
- MINOR: arg: add an argument type for identifier
- BUILD: buffers: keep b_getblk_nc() and b_peek_varint() in buf.h
- CLEANUP: buffers: simplify b_get_varint()
- OPTIM: buffers: avoid a useless wrapping check for ofs == 0
- MINOR: debug: make mark_tainted() return the previous value
- MINOR: chunk: drop the global thread_dump_buffer
- MINOR: debug: split ha_thread_dump() in two parts
- MINOR: debug: slightly change the thread_dump_pointer signification
- MINOR: debug: make ha_thread_dump_done() take the pointer to be used
- MINOR: debug: replace ha_thread_dump() with its two components
- MEDIUM: debug: on panic, make the target thread automatically allocate its buf
- BUILD: mux-h2/traces: fix build on 32-bit due to size of the DATA frame
- CI: prepare Coverity build for Ubuntu 24
- CI: bump development builds explicitly to Ubuntu 24.04
- CI: modernize macos builds to macos-15
- BUG/MINOR: mworker: fix mworker-max-reloads parser
- MINOR: mux-quic: simplify sending of empty STREAM FIN
- BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent
- CLEANUP: debug: make the BUG_ON() macros check the condition in the outer one
- MEDIUM: debug: add match counters for BUG_ON/WARN_ON/CHECK_IF
- MINOR: debug: add a new debug macro COUNT_IF()
- MINOR: debug: add "debug dev counters" to list code counters
- BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy FF
- BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy FF
- BUG/MINOR: stconn: Pretend the SE have more data to deliver on abortonclose
- CLEANUP: stream: remove outdated comments
- DEBUG: stream: Add debug counters to track some client/server aborts
- DEBUG: mux-h1: Add debug counters to track some errors
- MINOR: mux-h1: Add support of the debug string for logs
- MINOR: stream: maintain per-stream counters of the number of passes on code
- MINOR: filters: add per-filter call counters
- MINOR: sample: add the "when" converter to condition some expressions
- BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled address families
- BUILD: spoe: fix build warning on older gcc around sub-struct initialization
- Revert "OPTIM: mux-h2: make h2_send() report more accurate wake up conditions"
- DEBUG: mux-h1: Add debug counters to track errors with in/out pending data
- BUG/MINOR: mux-h1: Fix conditions on pipe in some COUNT_IF()
- MINOR: activity/memprofile: show per-DSO stats
- BUG/MINOR: mworker/cli: show master startup logs in recovery mode
- MINOR: mworker: stop MASTER proxy listener on worker mcli sockpair
- MINOR: error: simplify startup_logs_init_shm
- BUG/MINOR: mworker: show worker warnings in startup logs
- CLEANUP: mworker: clean mworker_reexec
- MINOR: mworker/cli: split mworker_cli_proxy_create
- BUG/MINOR: server: fix dynamic server leak with check on failed init
- BUG/MEDIUM: server: fix race on servers_list during server deletion
- BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error
- BUG/MINOR: http-ana: Fix wrong client abort reports during responses forwarding
- BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on consumer side
- MINOR: mworker/cli: add 'debug' to 'show proc'
- MINOR: mworker/cli: remove comment line for program when useless
- MINOR: mworker/cli: 'show proc debug' for old workers
- BUILD: debug: silence a build warning with threads disabled
- CLEANUP: mux-h2: remove the unused "full" variable in h2_frt_transfer_data()
- MINOR: pools: export the pools variable
- MINOR: debug: place a magic pattern at the beginning of post_mortem
- MINOR: debug: place the post_mortem struct in its own section.
- MINOR: debug: store important pointers in post_mortem
- MINOR: debug: do not limit backtraces to stuck threads
- MINOR: cli: remove non-printable characters from 'debug dev fd'
- MINOR: cli: add an 'echo' command
- MINOR: debug: also add a pointer to struct global to post_mortem
- CLEANUP: mworker: make mworker_create_master_cli more readable
- BUG/MEDIUM: mworker: fix fd leak from master to worker
- BUG/MINOR: mworker/cli: fix mworker_cli_global_proxy_new_listener
- MINOR: tools: add strnlen2() helper
- CLEANUP: log: use strnlen2() in _lf_text_len() to compute string length
- DOC: design: add notes about more detailed error reporting for logs
- MINOR: debug: also add fdtab and activity to struct post_mortem
- MINOR: debug: remove the redundant process.thread_info array from post_mortem
- DEV: gdb: add a number of gdb scripts to navigate in core dumps
- BUG/MINOR: trace: stop rewriting argv with -dt
- MEDIUM: protocol: make abns a custom unix socket address family
- MEDIUM: protocol: rely on AF_CUST_ABNS family to recognize ABNS sockets
- CLEANUP: tools: rely on address family to detect ABNS sockets
- MINOR: protocol: create abnsz socket address family
- MINOR: sock: restore effective UNIX family in sock_get_old_sockets()
- MEDIUM: sock: also restore effective unix family in get_{src,dst}()
- MEDIUM: sock_unix: use per-family addrcmp function
- MEDIUM: socket: add zero-terminated ABNS alternative
- BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name correctly
- BUG/MINOR: mworker: mworker_reexec: unset MODE_STARTING before free startup logs ring
- BUG/MINOR: errors: startup_logs_free: set global startup_logs ptr to NULL
- BUG/MINOR: errors: print_message: don't allocate startup logs ring
- BUG/MINOR: startup: don't fork worker if started with -c -W
- BUG/MINOR: startup: dump libs only in worker if started with -W -dL
- BUG/MINOR: startup: dump keywords only in worker if started with -W -dKAll
- BUG/MINOR: startup: don't dump polling info for master in verbose mode
- CI: switch QUIC Interop on AWS-LC to common docker image
- CI: switch QUIC Interop on LibreSSL to common docker image
- CI: enable chacha20 test on LibreSSL QUIC Interop
- DOC: config: add missing glitch_{cnt,rate} data types
- DOC: config: add missing glitch_{cnt,rate} sample definitions
- CI: LibreSSL QUIC Interop: fix docker context
- DEBUG: mux-h1: Add H1C expiration dates in trace messages
- BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections
- BUG/MINOR: http-ana: Report internal error if an action yields on a final eval
- MINOR: stream: Save last evaluated rule on invalid yield
- MINOR: quic: complete trace in qc_may_build_pkt()
- MINOR: quic: move qc_send_mux() prototype into quic_tx.h
- MINOR: stream: Replace last_rule_file/line fields by a more generic field
- MINOR: stream: Save the last filter evaluated interrupting the processing
- MINOR: stream: Save the entity waiting to continue its processing
- MINOR: stream: Use an enum to identify last and waiting entities for streams
- MINOR: stream: Add http-buffer-request option in the waiting entities
- DOC: config: Add documentation about last_entity sample fetch
- DOC: config: Add documentation about waiting_entity sample fetch
Following the previous commit: when the glitch_cnt and glitch_rate data
types were implemented in c9c6b683f ("MEDIUM: stick-tables: add a new
stored type for glitch_cnt and glitch_rate"), newly exposed samples such as
table_glitch_cnt(), table_glitch_rate(), src_glitch_cnt() and
src_glitch_rate() were documented but their definitions were missing from
the supported keywords list.
It should be backported to 3.0 with c9c6b683f.
When glitch_cnt and glitch_rate data types were implemented in
c9c6b683f ("MEDIUM: stick-tables: add a new stored type for glitch_cnt and
glitch_rate"), the data types list for "stick-table" keyword documentation
was overlooked.
This was reported by Nick Ramirez.
It should be backported in 3.0 with c9c6b683f.
When an abstract unix socket is bound by HAProxy (using "abns@" prefix),
NUL bytes are appended at the end of its path until sun_path is filled
(for a total of 108 characters).
Here we add an alternative to pass only the non-NUL length of that path
to connect/bind calls, such that the effective path of the socket's name
is as humanly written. This may be useful to interconnect with existing
software that implements abstract sockets with this logic instead of the
default haproxy one.
This is achieved by implementing the "abnsz" socket prefix (instead of
"abns"), which stands for "zero-terminated ABNS". "abnsz" prefix may be
used anywhere "abns" is. Internally, haproxy uses the custom socket
family (AF_CUST_ABNS vs AF_CUST_ABNSZ) to differentiate default abns
sockets from zero-terminated ones.
Documentation was updated and regtest was added.
Fixes GH issues #977 and #2479
Co-authored-by: Aurelien DARRAGON <adarragon@haproxy.com>
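The difference between the two address forms can be sketched as follows. This Python model only computes the bytes and lengths involved and does not open any socket. The helper names are hypothetical, the 2-byte offset of sun_path reflects the usual Linux layout of struct sockaddr_un, and counting the leading NUL marker in the "abnsz" length is an assumption.

```python
SUN_PATH_LEN = 108   # size of sun_path, as stated above
SUN_PATH_OFFSET = 2  # assumed offset of sun_path in struct sockaddr_un (Linux)

def abns_path(name: str) -> bytes:
    """Default "abns@" form: leading NUL, name, then NUL padding to 108 bytes."""
    raw = b"\0" + name.encode()
    if len(raw) > SUN_PATH_LEN:
        raise ValueError("name too long for sun_path")
    return raw.ljust(SUN_PATH_LEN, b"\0")

def abnsz_path(name: str) -> bytes:
    """"abnsz@" form: leading NUL and name only, without padding."""
    raw = b"\0" + name.encode()
    if len(raw) > SUN_PATH_LEN:
        raise ValueError("name too long for sun_path")
    return raw

def addrlen(sun_path: bytes) -> int:
    """Address length that would be passed to bind()/connect()."""
    return SUN_PATH_OFFSET + len(sun_path)
```

For the name "haproxy", the first form gives an address length of 110 and the second one a length of 10. Since the kernel compares abstract names over their full length, the two forms designate different sockets.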
These are the notes of a day-long code analysis session (CFA+WTA)
aimed at figuring out what's missing during most code troubleshooting
sessions. The goal is to provide good indications about what rules/
filters were still active when the processing ended (timeout, error
etc), what subscribers are still active (indicating waiting for an
event), and what shut/abort events were met at the various levels
of each side's stack, in each direction.
Sometimes it would be desirable to include some debugging output only
under certain conditions, but the end of the transfer is too late to
apply some rules.
Here we take the approach of making a converter ("when") that takes a
condition among an arbitrary list, and decides whether or not to let
the input sample pass through based on the condition. This
allows for example to log debugging information only when an error
was encountered during the processing (sort of an extension of
dontlog-normal). The conditions are quite limited (stopping, error,
normal, toapplet, forwarded, processed) and can be negated. The
converter can also be chained to use more complex conditions.
A suggested example would be:
    # log "dbg={-}" when fine, or "dbg={... debug info ...}" on error:
    log-format "$HAPROXY_HTTP_LOG_FMT dbg={%[bs.debug_str,when(!normal)]}"
"option forwarded" provides a convenient way to automatically insert
rfc7239 forwarded header to requests sent to servers.
On the other hand, manually crafting the header is quite complicated due
to specific formatting rules that must be followed as per rfc7239.
However, sometimes it may be necessary to craft the header manually, for
instance if it has to be conditional or based on parameters that "option
forwarded" doesn't provide. To ease this task, in this patch we implement
rfc7239_nn and rfc7239_np, which are meant to craft nodename and nodeport
values respectively, and are specifically intended to manually build
rfc7239 'for' and 'by' header fields while ensuring rfc7239 compliance.
Example:
    # build RFC-compliant 7239 header:
    http-request set-var-fmt(txn.forwarded) "for=\"%[ipv6(::1),rfc7239_nn]:%[str(8888),rfc7239_np]\";host=\"haproxy.org\";proto=http"
    # check RFC-compliancy:
    http-request set-var(txn.test) "var(txn.forwarded),debug(ok,stderr),rfc7239_is_valid,debug(ok,stderr)"
    # stderr output:
    # [debug] ok: type=str <for="[::1]:_8888";host="haproxy.org";proto=http>
    # [debug] ok: type=bool <1>
See documentation for more info and examples.
Released version 3.1-dev10 with the following main changes :
- BUG/MAJOR: mux-quic: do not crash on empty STREAM frame emission
- BUG/MINOR: stats: Fix the name for the total number of streams created
- MINOR: quic: strengthen qc_release_frm()
- MEDIUM: quic: decount acknowledged data for MUX txbuf window
- MINOR: quic: implement dedicated type for out-of-order stream ACK
- MEDIUM: quic: merge contiguous/overlapping buffered ack stream range
- MEDIUM: quic: decount out-of-order ACK data range for MUX txbuf window
- MINOR: log: add do_log() logging helper
- MINOR: log: add do_log_parse_act() helper func
- MINOR: action: add do-log action
- REGTESTS: add some tests for 'do-log' action
- BUG/MEDIUM: hlua: make hlua_ctx_renew() safe
- BUG/MEDIUM: hlua: properly handle sample func errors in hlua_run_sample_{fetch,conv}()
- BUG/MINOR: quic: fix discarding of already stored out-of-order ACK
- BUG/MEDIUM: quic: properly decount out-of-order ACK on stream release
- MINOR: ssl: disable server side default CRL check with WolfSSL
- MEDIUM: sink: implement sink_find_early()
- MINOR: trace: postresolve sink names
- MINOR: sample: postresolve sink names in debug() converter
- BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests
- MINOR: cfgparse: simulate long configuration parsing with force-cfg-parser-pause
- BUILD: cache: silence an uninitialized warning at -Og with gcc-12.2
- BUG/MINOR: mux-h2/traces: present the correct buffer for trailers errors traces
- MINOR: mux-h2/traces: print the size of the DATA frames
- CLEANUP: muxes: remove useless inclusion of ebmbtree.h
- REORG: buffers: move some of the heavy functions from buf.h to buf.c
- MINOR: buffer: add a buffer list type with functions
- MINOR: mux-h2: split the amount of rx data from the amount to ack
- MINOR: mux-h2: create and initialize an rx offset per stream
- MEDIUM: mux-h2: start to update stream when sending WU
- MEDIUM: mux-h2: start to introduce the window size in the offset calculation
- MINOR: mux-h2: count within a connection, how many streams are receiving data
- MINOR: mux-h2: allocate the array of shared rx bufs in the h2c
- MINOR: mux-h2: add rxbuf head/tail/count management for h2s
- MINOR: mux-h2: move H2_CF_WAIT_IN_LIST flag away from the demux flags
- MINOR: mux-h2: simplify the exit code in h2_rcv_buf()
- MINOR: mux-h2: simplify the wake up code in h2_rcv_buf()
- MINOR: mux-h2: clear up H2_CF_DEM_DFULL and H2_CF_DEM_SHORT_READ ambiguity
- MAJOR: mux-h2: make streams use the connection's buffers
- MAJOR: mux-h2: permit a stream to allocate as many buffers as desired
- MAJOR: mux-h2: make the rxbuf allocation algorithm a bit smarter
- MINOR: mux-h2: add tune.h2.be.rxbuf and tune.h2.fe.rxbuf global settings
- MEDIUM: mux-h2: change the default initial window to 16kB
- DOC: design-thoughts: add diagrams illustrating an rx win growth
- MEDIUM: mux-h2: rework h2_restart_reading() to differentiate recv and demux
- OPTIM: mux-h2: make h2_send() report more accurate wake up conditions
- OPTIM: mux-h2: try to continue reading after demuxing when useful
- OPTIM: mux-h2: use tasklet_wakeup_after() in h2s_notify_recv()
- MINOR: mux-h2/traces: add missing flags and proxy ID in traces
- MINOR: mux-h2/traces: add buffer-related info to h2s and h2c
- CI: cirrus-ci: bump FreeBSD image to 14-1
- REGTESTS: fix a reload race in abns_socket.vtc
- MINOR: activity/memprofile: always return "other" bin on NULL return address
- MINOR: quic: notify connection layer on handshake completion
- BUG/MINOR: stream: unblock stream on wait-for-handshake completion
- BUG/MEDIUM: quic: support wait-for-handshake
- BUG/MEDIUM: server: server stuck in maintenance after FQDN change
- BUG/MEDIUM: queue: make sure never to queue when there's no more served conns
- DEBUG: mux-h2/flags: add H2_CF_DEM_RXBUF & H2_SF_EXPECT_RXDATA for the decoder
- REGTESTS: cli: add delay 0.1 before connect to cli
- MINOR: startup: add O_CLOEXEC flag to open /dev/null
- MEDIUM: startup: move daemonization fork in init
- MINOR: startup: refactor "daemonization" fork
- MEDIUM: startup: move PID handling in init()
- MAJOR: mworker: move master-worker fork in init()
- BUG/MINOR: mworker: fix memory leak due to master-worker fork
- REORG: mworker: set nbthread=1 for master after fork
- MINOR: init: check MODE_MWORKER before creating master CLI
- REORG: mworker: move mworker_create_master_cli in master 'case'
- MEDIUM: startup: call chroot() if needed in one place
- MEDIUM: startup: do set_identity() if needed in one place
- MINOR: startup: only worker gets capabilities from bin
- CLEANUP: haproxy: rm no longer used mworker_reexec_waitmode
- MINOR: startup: rename exit_on_waitmode_failure to exit_on_failure
- MINOR: defaults: update MASTER_MAXCONN description
- MEDIUM: startup: remove MODE_MWORKER_WAIT
- MINOR: global: add MODE_DISCOVERY flag
- MEDIUM: cfgparse: add KWF_DISCOVERY keyword flag
- MEDIUM: cfgparse: call some parsers only in MODE_DISCOVERY
- MEDIUM: cfgparse-global: parse only KWF_DISCOVERY keywords in MODE_DISCOVERY
- MEDIUM: cfgparse: parse only "global" section in MODE_DISCOVERY
- MEDIUM: startup: introduce load_cfg and read_cfg
- MINOR: cfgparse: fix *thread keywords sensitive to global section position
- MINOR: mworker/cli: rename mworker_cli_proxy_new_listener
- MINOR: mworker/cli: rename and clean mworker_cli_sockpair_new
- MINOR: mworker/cli: create master CLI sockpair before fork
- MINOR: mworker/cli: create MASTER proxy before mcli listeners
- MINOR: mworker: add and set state PROC_O_INIT for new worker
- MEDIUM: mworker/cli: close child and parent fds, setup listeners
- MINOR: mworker: mworker_catch_sigchld: use fd_delete instead of close
- MINOR: startup: rename and adapt reexec_on_failure
- MINOR: mworker: add support for case when new worker dies
- MINOR: mworker: simplify the code that sets PROC_O_LEAVING
- MINOR: mworker/cli: add _send_status to support state transition
- MEDIUM: startup: split sending oldpids_sig logic for standalone and mworker modes
- MINOR: startup: split init() into separate initialization routines
- MINOR: startup: split main: add step_init_3
- MINOR: startup: simplify check for calling sock_get_old_sockets
- MINOR: startup: encapsulate sock_get_old_sockets in a function
- MINOR: startup: add bind_listeners
- MINOR: startup: split main: add step_init_4
- MINOR: startup: encapsulate master's code in run_master
- MINOR: startup: add read_cfg_in_discovery_mode
- MINOR: mworker: adapt exit_on_failure for master recovery mode
- MEDIUM: mworker: add support of master recovery mode
- MINOR: startup: add set_verbosity
- MEDIUM: mworker: block reloads
- MINOR: mworker: slow load status delivery if worker is starting
- MINOR: mworker: readapt program support in mworker_catch_sigchld
- MINOR: mworker: deserialize process list before read_cfg_in_discovery_mode
- MINOR: mworker: parse program only in MODE_DISCOVERY
- MINOR: cfgparse: add support for program section
- MINOR: startup: reintroduce program support
- MINOR: mworker-prog: stop old programs in mworker_ext_launch_all
- MINOR: mworker: reintroduce systemd support
- MINOR: mworker: report explicitly when worker exits due to max reloads
- MINOR: cfgparse-global: parse *env keywords in MODE_DISCOVERY
- MINOR: startup: reintroduce *env keywords support
- MINOR: startup: close devnullfd, when daemon mode is applied
Let's just see on a diagram how the receiver can detect that the
window is large enough for the remote sender to fill the link. Here
it seems that a first criterion is that data are accumulating in
the rxbuf, indicating that the next hop doesn't consume them fast
enough. On the diagram it's visible when blue arrows (incoming data)
are more frequent than the magenta ones on average (outgoing data),
which happens when silence moments are less frequent and don't allow
the reader to catch up. It's also visible that there are two phases
alternating in the transfer:
- measure round trip time (i.e. how long it takes to restart
sending after a WU was sent after a long silence)
- measure the lowest rxbuf size during the previous round trip
It's worth noting that a window size change only has *observable* effect
after two RTT: the first RTT is to restart sending (opening or enlarging
the window), the second RTT to measure the lowest rxbuf size over the
period.
By turning the advertised window into an offset and comparing it to
the received quantity, it's possible to measure the RTT of the whole
chain (including the client possibly producing the data). Note that
when multiple streams compete for BW this can become tricky. Limiting
the window to available buffers and counting the number of sending
streams on a connection could work (i.e. split total buffers into
1+#senders, first one being used for tx).
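The offset-based RTT measurement described above could be sketched as follows in C. This is only an illustration under stated assumptions: the structure win_rtt and the functions wr_send_wu() and wr_recv() are hypothetical names that do not come from the patch, timestamps are plain milliseconds, and a single stream is considered (the multi-stream case mentioned above is not handled).

```c
#include <stdint.h>

/* Hypothetical per-stream state; none of these names exist in HAProxy. */
struct win_rtt {
	uint64_t rcvd;       /* total bytes received so far */
	uint64_t max_off;    /* highest absolute offset advertised to the peer */
	uint64_t probe_off;  /* offset limit in force just before the probing WU */
	uint64_t probe_ts;   /* time (ms) at which the probing WU was sent */
	uint64_t rtt_ms;     /* last measured round trip time, 0 if none yet */
	int probing;         /* non-zero while a measurement is in progress */
};

/* To be called when a window update (WU) enlarging the window by <incr>
 * bytes is sent.
 */
static void wr_send_wu(struct win_rtt *w, uint64_t incr, uint64_t now_ms)
{
	/* Only start a measurement when the sender had consumed the whole
	 * window: it was necessarily blocked, so bytes beyond the old limit
	 * cannot arrive before the WU has reached it.
	 */
	if (!w->probing && w->rcvd == w->max_off) {
		w->probe_off = w->max_off;
		w->probe_ts = now_ms;
		w->probing = 1;
	}
	w->max_off += incr;
}

/* To be called when <len> bytes of data are received. */
static void wr_recv(struct win_rtt *w, uint64_t len, uint64_t now_ms)
{
	w->rcvd += len;
	/* First byte past the old limit: one full round trip has elapsed. */
	if (w->probing && w->rcvd > w->probe_off) {
		w->rtt_ms = now_ms - w->probe_ts;
		w->probing = 0;
	}
}
```

The measurement is only armed when the received quantity equals the advertised offset, which corresponds to the "long silence" case above: otherwise data already in flight would be mistaken for a response to the WU.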
Now that we're using all available rx buffers for transfers, there's
no point anymore in advertising more than the minimum value we can
safely buffer. Let's be conservative and only rely on the dynamic
buffers to improve speed beyond the configured value, and make sure
that many streams will no longer cause unfairness.
Interestingly, the total number of wakeups has further shrunk down, but
with a different distribution. From 128k for 1000 1M transfers, it went
down to 119k, with 96k from restart_reading, 10k from done_ff and 2.6k
from snd_buf. done_ff went up by 30% and restart_reading went down by
30%.
These settings make it possible to change the total buffer size allocated to the
backend and frontend respectively. This way it's no longer necessary to
play with tune.bufsize nor increase the number of streams to benefit from
more buffers.
Setting tune.h2.fe.rxbuf to 4m to match a sender's max tcp_wmem resulted
in 257 Mbps for a single stream at 103ms vs 121 Mbps default (or 5.1 Mbps
with a single buffer and 64kB window).
The buffer ring is problematic in multiple aspects, one of which being
that it is only usable by one entity. With multiplexed protocols, we need
to have shared buffers used by many entities (streams and connection),
and the only way to use the buffer ring model in this case is to have
each entity store its own array, and keep a shared counter on allocated
entries. But even with the default 32 buf and 100 streams per HTTP/2
connection, we're speaking about 32*101*32 bytes = 103424 bytes per H2
connection, just to store up to 32 shared buffers, spread randomly in
these tables. Some users might want to achieve much higher than default
rates over high speed links (e.g. 30-50 MB/s at 100ms), which is 3 to 5
MB storage per connection, hence 180 to 300 buffers. There it starts to
cost a lot, up to 1 MB per connection, just to store buffer indexes.
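As a quick check of the figures above, the per-connection cost of the ring model can be recomputed. The sketch below only restates the arithmetic from the text (one array of entries per entity, i.e. the streams plus the connection, and 32 bytes per entry as implied by the 32*101*32 figure); ring_index_cost() is a hypothetical helper written for this illustration.

```c
#include <stddef.h>

/* Bytes needed to store buffer indexes when each of the <nstreams> streams
 * and the connection itself keep their own array of <nbufs> entries.
 */
static size_t ring_index_cost(size_t nbufs, size_t nstreams, size_t entry_size)
{
	return nbufs * (nstreams + 1) * entry_size;
}
```

With 32 buffers and 100 streams this gives 103424 bytes, and with 300 buffers it gives 969600 bytes, which is the "up to 1 MB per connection" mentioned above.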
Instead this patch introduces a variant which we call a buffer list.
That's basically just a free list encoded in an array. Each cell
contains a buffer structure, a next index, and a few flags. The index
could be reduced to 16 bits if needed, in order to make room for a new
struct member. The design permits initializing a whole freelist at once
using memset(0).
The list pointer is stored at a single location (e.g. the connection)
and all users (the streams) will just have indexes referencing their
first and last assigned entries (head and tail). This means that with
a single table we can now have all our buffers shared between multiple
streams, regardless of the number of potential streams which would want
to use them. Now the 180 to 300 entries array only costs 7.2 to 12 kB,
or 80 times less.
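A free list encoded in an array could look like the sketch below. It is a simplified illustration and not the code added to buf.c: the cell layout, the convention that cell 0 is reserved as the list head, the rule that a zero "next" in a never-used cell means "the following cell", and all names (bl_cell, bl_user, blx_init(), blx_alloc(), blx_free(), blx_append(), blx_pop()) are assumptions chosen so that memset(0) yields a valid, entirely free list.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One cell of the list. The inner struct is only a stand-in for a buffer. */
struct bl_cell {
	struct { char *area; size_t size, head, data; } buf;
	uint32_t next;   /* free cell: next free one; used cell: next in owner's chain */
	uint32_t flags;
};

/* Per-stream view: indexes of its first and last cells, 0 meaning "none". */
struct bl_user {
	uint32_t head, tail;
};

/* A zeroed array is a valid list in which cells 1..nb-1 are all free. */
static void blx_init(struct bl_cell *bl, uint32_t nb)
{
	memset(bl, 0, nb * sizeof(*bl));
}

/* Returns the index of a free cell, or 0 when the list is exhausted. */
static uint32_t blx_alloc(struct bl_cell *bl, uint32_t nb)
{
	uint32_t idx = bl[0].next ? bl[0].next : 1;

	if (idx >= nb)
		return 0;
	/* a never-used cell has next == 0, meaning "the cell after me" */
	bl[0].next = bl[idx].next ? bl[idx].next : idx + 1;
	bl[idx].next = 0; /* end of the owner's chain for now */
	return idx;
}

/* Puts cell <idx> back at the front of the free list. */
static void blx_free(struct bl_cell *bl, uint32_t idx)
{
	bl[idx].next = bl[0].next ? bl[0].next : 1;
	bl[0].next = idx;
}

/* Allocates a cell and links it after the user's tail. Returns 0 on failure. */
static uint32_t blx_append(struct bl_cell *bl, uint32_t nb, struct bl_user *u)
{
	uint32_t idx = blx_alloc(bl, nb);

	if (!idx)
		return 0;
	if (u->tail)
		bl[u->tail].next = idx;
	else
		u->head = idx;
	u->tail = idx;
	return idx;
}

/* Releases the user's first cell, if any. */
static void blx_pop(struct bl_cell *bl, struct bl_user *u)
{
	uint32_t idx = u->head;

	if (!idx)
		return;
	u->head = bl[idx].next;
	if (!u->head)
		u->tail = 0;
	blx_free(bl, idx);
}
```

On a typical 64-bit platform such a cell occupies 40 bytes, which is consistent with the 7.2 kB quoted above for 180 entries. Each stream only needs the two indexes of bl_user, whatever the number of streams sharing the array.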
Two large functions (bl_deinit() & bl_get()) were implemented in buf.c.
A basic doc was added to explain how it works.
This command pauses the configuration parser for <timeout>
milliseconds. This is useful for development or for testing timeouts of
init scripts, particularly to simulate a very long reload. It requires
expose-experimental-directives to be set.
A previous known limitation about traces was that parsing was performed on
the fly, meaning that when using "sink" keyword, only sinks that were
either internal or previously defined in the config could be used. Indeed,
it was not possible to use a ring section defined AFTER the traces section
when using the 'sink' keyword from traces.
This limitation was also mentioned in the config file.
Let's get rid of that limitation by implementing proper postparsing for
the sink parameter in traces section. To do this, make use of the new
sink_find_early() helper to start referencing sink by their names even
if they don't exist yet (if they are about to be defined later in the
config).
Traces commands on the cli are not concerned by this change.
Thanks to the two previous commits, we can now expose the do-log action
on all available action contexts, including the new quic-init context.
Each context is responsible for exposing the do-log action by registering
the relevant log steps, saving the identifier, and then storing it in the
rule's context so that do_log_action() automatically uses it to produce
the log during runtime.
To use the feature, simply use "do-log" (without argument) on an action
directive, for example:
tcp-request connection do-log
As mentioned before, each context where the action is exposed has its own
log step identifier. Currently known identifiers are:
quic-initial: quic-init
tcp-request connection: tcp-req-conn
tcp-request session: tcp-req-sess
tcp-request content: tcp-req-cont
tcp-response content: tcp-res-cont
http-request: http-req
http-response: http-res
http-after-response: http-after-res
Thus, these "additional" logging steps can be used as-is under log-profile
section (after "on" keyword). However, although the parser will accept
them, it makes no sense to use them with the "log-steps" proxy keyword,
since the only path for these origins to trigger a log generation is
through the explicit use of "do-log" action.
This need was described in GH #401, it should help to conditionally
trigger logs using ACL at specific key points. It may either be used
alone or combined with "log-steps" to add additional log "trackers" during
transaction handling.
Documentation was updated and some examples were added.
Released version 3.1-dev9 with the following main changes :
- MINOR: tools: add minimal file name management
- CLEANUP: stick-table: make the file location point to a global file name
- MINOR: proxy: use the global file names for conf->file
- CLEANUP: cfgparse: factor proxy vs log-forward collisions
- BUG/MINOR: cfgparse: detect another uncaught case of duplicate defaults
- MINOR: proxy: add a list of orphaned defaults sections
- MEDIUM: cfgparse: drop duplicate named defaults sections after use
- OPTIM: cfgparse: speed up duplicate server detection
- MEDIUM: cfgparse: warn about deprecated use of duplicate server names
- BUG/MINOR: server: shut down streams under thread isolation
- BUG/MINOR: proxy: also make the cli and resolvers use the global name
- REGTESTS: log: fix log-profile.vtc
- MEDIUM: mailers: warn about deprecated legacy mailers
- BUG/MEDIUM: cli: Be sure to catch immediate client abort
- DEV: flags/applet: decode appctx flags
- BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
- MINOR: log: fix indent in strm_log()
- MINOR: log: introduce extra log profile steps
- MINOR: log: handle extra log origins in _process_send_log_override()
- MINOR: log: introduce log_orig flags
- MINOR: log: explicitly handle extra log origins as error when relevant
- MINOR: log: support extra log origins for '%OG' alias
- MINOR: proxy: add log_steps struct member
- MINOR: log: introduce "log-steps" proxy keyword
- MINOR: log: add log_orig_proxy() helper function
- MEDIUM: log: consider log-steps proxy setting for existing log origins
- DOC: config: document proxy "log-steps" keyword
- REGTESTS: add a test for proxy "log-steps"
- Revert "BUG/MINOR: server: shut down streams under thread isolation"
- MINOR: task: define two new one-shot events for use with WOKEN_OTHER or MSG
- BUG/MEDIUM: stream: make stream_shutdown() async-safe
- BUG/MINOR: server: make sure the HMAINT state is part of MAINT
- BUG/MINOR: queue: make sure that maintenance redispatches server queue
- MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
- BUILD: tools: only include execinfo.h for the real backtrace() function
- MINOR: tools: do not attempt to use backtrace() on linux without glibc
- OPTIM: channel: speed up co_getline()'s search of the end of line
- OPTIM: stconn: Don't pretend mux have more data to deliver on EOI/EOS/ERROR
- BUG/MINOR: mcli: Pretend the mux have more data to deliver between two commands
- MINOR: action: Export release_expr_int_action() release function
- MINOR: stream: Rely on a per-stream max connection retries value
- MINOR: stream: Support dynamic changes of the number of connection retries
- MINOR: stream/stats: Expose the current number of streams in stats
- MINOR: stream/stats: Expose the total number of streams ever created in stats
- BUG/MINOR: cfgparse-global: fix allowed args number for setenv
- MINOR: cfgparse-global: add dedicated parser for *env keywords
- MINOR: mux-quic: complete Tx infos for QCS dump
- MINOR: quic: ensure txbuf realloc is only performed on empty buffer
- MINOR: mux-quic: strengthen qcs_send_metadata() usage
- MINOR: quic: remove unneeded notification of txbuf room
- MINOR: quic: refactor MUX send notification
- MEDIUM: quic: strengthen MUX send notification
- MINOR: quic: refactor STREAM room notification
- MINOR: quic: do not remove qc_stream_desc automatically on ACK handling
- MINOR: quic: store streambuf in a streamdesc tree
- MINOR: quic: move buffered ACK to streambuf
- MEDIUM: quic: handle out-of-order ACK at streamdesc layer
- MEDIUM: quic: refactor buffered STREAM ACK consuming
- BUG/MEDIUM: queue: always dequeue the backend when redistributing the last server
- MINOR: config/trace: Add a 'traces' section to declare debug traces
- MINOR: trace: Be able to chain commands for a source in one line
- MINOR: tcpcheck: Add support for an option host header value for httpchk option
- BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy forwarding
- MINOR: mux-h1: Use a dedicated function to conditionnaly set EOI flag on SE
- BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for upgrade
- BUG/MINOR: mux-quic: fix crash on qcc_init() early return
- BUG/MINOR: quic: fix trace on releasing STREAM frame after ack
Support for headers and body hidden in the version for the "option httpchk"
directive was removed. However a Host header is mandatory for HTTP/1.1
requests and some servers may return an error if it is not set. For now, to
add it, an "http-check send" rule must be added. But it is not really handy
to use an extra config line for this purpose.
So now, it is possible to set the host header value, a log-format string, as
an extra argument to the "option httpchk" directive. It must be the fourth
argument:
option httpchk GET / HTTP/1.1 www.srv.com
While this patch is not a bug fix, it is simple enough to be backported if
necessary. On 2.9 and older, lf_init_expr() does not exist and LIST_INIT() must
be used instead.
In the configuration file or on the CLI, configuring traces for a specific
source is a bit painful because this must be done in several lines. Thanks
to this patch, it is now possible to fully configure traces for a source in
one line. For instance, the following on the CLI:
trace h1 sink stderr; trace h1 level developer; trace h1 verbosity complete; trace h1 start now
can now be replaced by:
trace h1 sink stderr level developer verbosity complete start now
The same is true for the 'trace' directives in the configuration file.
Declaring debug traces via the 'trace' directive in a global section is no
longer supported. A 'traces' section must be used instead. The syntax of
the 'trace' directive in these sections remains the same, but it is no
longer experimental.
The main reason for this change is to avoid having a ring section defined
before a global one. Indeed, for now, forward declarations of ring sections
are not supported. So to configure traces, you had to add a ring section
before the global one defining the traces. Most of the time, that meant
having two global sections :
global
[...] # global settings
ring <name>
[...]
global
[...] # trace config
In addition, it will be possible to easily extend the traces section by
adding some new directives.
Thanks to the previous patch, it is now possible to add an action to
dynamically change the maximum number of connection retries for a stream.
"set-retries" action may now be used to do so, from a "tcp-request content"
or a "http-request" rule. This action accepts an expression or an integer
between 0 and 100. The integer value is checked during the configuration
parsing and leads to an error if it is not in the expected range. However,
for the expression, the value is retrieved at runtime, so invalid values are
just ignored.
Too high a value is forbidden to avoid any trouble. 100 retries already seems
to be an amazingly high value. In addition, the option is only available on
backend or listen sections.
Because the max retries is limited to 100 at most, it can be stored as an
unsigned short. This saves some space in the stream structure.
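The difference between the configuration-time check and the runtime behaviour described above can be illustrated with the following C sketch. The function names, the MAX_CONN_RETRIES macro and the strm_retries structure are hypothetical and are not the ones used in the patch; only the 0 to 100 range and the unsigned short storage come from the text.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_CONN_RETRIES 100

/* Hypothetical stand-in for the relevant part of the stream structure. */
struct strm_retries {
	unsigned short max_retries;
};

/* Configuration time: returns 0 and fills <out>, or -1 if <arg> is not an
 * integer between 0 and 100, in which case parsing must fail.
 */
static int parse_retries_arg(const char *arg, unsigned short *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0' || v < 0 || v > MAX_CONN_RETRIES)
		return -1;
	*out = (unsigned short)v;
	return 0;
}

/* Runtime: <v> is the result of an expression. An out-of-range value is
 * silently ignored and the previous setting is kept.
 */
static void apply_retries(struct strm_retries *s, long long v)
{
	if (v < 0 || v > MAX_CONN_RETRIES)
		return;
	s->max_retries = (unsigned short)v;
}
```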
As mentioned in the 2.8 announcement on the mailing list [1] and on the wiki
[2], use of legacy mailers is now deprecated and will not be supported anymore
starting with version 3.3. Use of Lua scripts (AKA Lua mailers) is now
encouraged (and fully supported since 2.8) for this purpose, as it offers
more flexibility (e.g: alerts can be customized) and is more future-proof.
Configurations relying on legacy mailers will now raise a warning.
Users willing to keep their existing mailers config in a working state
should simply add the following line to their global section:
# mailers.lua file as provided in the git repository
# adjust path as needed
lua-load examples/lua/mailers.lua
[1]: https://www.mail-archive.com/haproxy@formilux.org/msg43600.html
[2]: https://github.com/haproxy/wiki/wiki/Breaking-changes
Released version 3.1-dev8 with the following main changes :
- DOC: configuration: place the HAPROXY_HTTP_LOG_FMT example on the correct line
- MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state
- BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only
- REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades
- BUG/MEDIUM: clock: detect and cover jumps during execution
- BUG/MINOR: pattern: prevent const sample from being tampered in pat_match_beg()
- BUG/MEDIUM: pattern: prevent uninitialized reads in pat_match_{str,beg}
- BUG/MEDIUM: pattern: prevent UAF on reused pattern expr
- MEDIUM: ssl/cli: "dump ssl cert" allow to dump a certificate in PEM format
- BUG/MAJOR: mux-h1: Wake SC to perform 0-copy forwarding in CLOSING state
- BUG/MINOR: h1-htx: Don't flag response as bodyless when a tunnel is established
- REGTESTS: fix random failures with wrong_ip_port_logging.vtc under load
- BUG/MINOR: pattern: do not leave a leading comma on "set" error messages
- REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
- MINOR: server: allow init-state for dynamic servers
- DOC: server: document what to check for when adding new server keywords
- MEDIUM: h1: Accept invalid T-E values with accept-invalid-http-response option
- BUG/MINOR: polling: fix time reporting when using busy polling
- BUG/MINOR: clock: make time jump corrections a bit more accurate
- BUG/MINOR: clock: validate that now_offset still applies to the current date
- BUG/MEDIUM: queue: implement a flag to check for the dequeuing
- OPTIM: sample: don't check casts for samples of same type
- OPTIM: vars: remove the unneeded lock in vars_prune_*
- OPTIM: vars: inline vars_prune() to avoid many calls
- MINOR: vars: remove the emptiness tests in callers before pruning
- IMPORT: import cebtree (compact elastic binary trees)
- OPTIM: vars: use a cebtree instead of a list for variable names
- OPTIM: vars: use multiple name heads in the vars struct
- BUG/MINOR: peers: local entries updates may not be advertised after resync
- DOC: config: Explicitly list relaxing rules for accept-invalid-http-* options
- MINOR: proxy: Rename accept-invalid-http-* options
- DOC: configuration: Remove dangerous directives from the proxy matrix
- BUG/MEDIUM: sc_strm/applet: Wake applet after a successfull synchronous send
- BUG/MEDIUM: cache/stats: Wait to have the request before sending the response
- BUG/MEDIUM: promex: Wait to have the request before sending the response
- MINOR: clock: test all clock_gettime() return values
- MEDIUM: clock: collect the monotonic time in clock_local_update_date()
- MEDIUM: clock: opportunistically use CLOCK_MONOTONIC for the internal time
- MEDIUM: clock: use the monotonic clock for idle time calculation
- MEDIUM: clock: don't compute before_poll when using monotonic clock
- BUG/MINOR: fix missing "log-format overrides previous 'option tcplog clf'..." detection
- BUG/MINOR: fix missing "'option httpslog' overrides previous 'option tcplog clf'..." detection
- BUG/MINOR: cfgparse-listen: fix option httpslog override warning message
- BUG/MINOR: cfgparse: detect incorrect overlap of same backend names
- MEDIUM: cfgparse: warn about proxies having the same names
- DOC: management: add init-state to add server keywords
- BUG/MINOR: mux-quic: report glitches to session
- BUILD: cebtree: silence a bogus gcc warning on impossible code paths
- MEDIUM: cfgparse: warn about colliding names between defaults and proxies
- MEDIUM: cfgparse: detect collisions between defaults and log-forward
For now, that only concerns accept-invalid-http-{request/response} and
accept-unsafe-violations-in-http-{request/response}. But the idea is to make
dangerous directives hard to find. It is one more way to discourage anyone
from using them. And, optionally, it is also handy because it keeps the matrix
aligned on 80 columns.
With these options, it is possible to accept some invalid messages that may
be considered unsafe and may result in vulnerabilities. The naming is not
explicit enough on this point. These options must really be considered as
dangerous and only used as a temporary workaround. Unfortunately, when used,
it is probably because there are some legacy and unsupported applications in
place. Never mind. The documentation warns about the use of these
options. Now the name of the options itself is a warning.
So now, "accept-invalid-http-request" and "accept-invalid-http-response"
options are deprecated and replaced by
"accept-unsafe-violations-in-http-request" and
"accept-unsafe-violations-in-http-response" options.
From time to time, new exceptions are added in the HTTP parsing (most of the
time H1) to not reject some invalid messages sent by legacy applications. But
the documentation of accept-invalid-http-request and
accept-invalid-http-response options is not very clear. So, now, there is
an explicit list of relaxing rules for both options.
Commit 50322df introduced the init-state keyword, but it didn't enable
it for dynamic servers. However, this feature is perfectly desirable
for virtual servers too, where someone would like a server enabled
through "set server be1/srv1 state ready" to be put out of maintenance
in down state until the next health check succeeds.
From reading the code, it seems that it's only a matter of allowing this
keyword for dynamic servers, as current code path calls
srv_adm_set_ready() which incidentally triggers a call to
_srv_update_status_adm().
The new "dump ssl cert" CLI command allows dumping a certificate stored
in HAProxy memory. Until now it was only possible to dump the
description of the certificate using "show ssl cert", but with this new
command you can dump the PEM content on the filesystem.
This command is only available on an admin stats socket.
$ echo "@1 dump ssl cert cert.pem" | socat /tmp/master.sock -
-----BEGIN PRIVATE KEY-----
[...]
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
When HAPROXY_HTTP_LOG_FMT was added by commit 537b9e7f36 ("MINOR: config:
add environment variables for default log format"), the example was placed
by accident after the clf log format instead of the HTTP log format,
causing a bit of confusion.
This can be backported to 2.8.
Released version 3.1-dev7 with the following main changes :
- MINOR: config: Created env variables for http and tcp clf formats
- MINOR: mux-quic: add buf_in_flight to QCC debug infos
- MINOR: mux-quic: correct qcc_bufwnd_full() documentation
- MINOR: tools: add helpers to backup/clean/restore env
- MINOR: mworker: restore initial env before wait mode
- BUG/MINOR: haproxy: free init_env in deinit only if allocated
- BUILD: tools: environ is not defined in OS X and BSD
- DEV: coccinelle: add a test to detect unchecked malloc()
- DEV: coccinelle: add a test to detect unchecked calloc()
- CI: QUIC Interop AWS-LC: enable ngtcp2 client
- CI: fix missing comma introduced in 956839c0f6
- CI: QUIC Interop: do not run bandwidth measurement tests
- CI: QUIC Interop: use different artifact names for uploading logs
- BUILD: quic: 32bits build broken by wrong integer conversions for printf()
- CLEANUP: ssl: cleanup the clienthello capture
- MEDIUM: ssl: capture the supported_versions extension from Client Hello
- MEDIUM: ssl/sample: add ssl_fc_supported_versions_bin sample fetch
- MEDIUM: ssl: capture the signature_algorithms extension from Client Hello
- MEDIUM: ssl/sample: add ssl_fc_sigalgs_bin sample fetch
- MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
- BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding
- BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready
- BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry
- CLEANUP: haproxy: fix typos in code comment
- CLEANUP: mqtt: fix typo in MQTT_REMAINING_LENGHT_MAX_SIZE
- MINOR: tools: Implement ipaddrcpy().
- MINOR: quic: Implement quic_tls_derive_token_secret().
- MINOR: quic: Token for future connections implementation.
- BUG/MINOR: quic: Missing incrementation in NEW_TOKEN frame builder
- MINOR: quic: Modify NEW_TOKEN frame structure (qf_new_token struct)
- MINOR: quic: Implement qc_ssl_eary_data_accepted().
- MINOR: quic: Add trace for QUIC_EV_CONN_IO_CB event.
- BUG/MEDIUM: quic: always validate sender address on 0-RTT
- BUILD: quic: fix build errors on FreeBSD since recent GSO changes
- MINOR: tools: extend str2sa_range to add an alt parameter
- MINOR: server: add a alt_proto field for server
- MEDIUM: sock: use protocol when creating socket
- MEDIUM: protocol: add MPTCP per address support
- BUG/MINOR: quic: Crash from trace dumping SSL eary data status (AWS-LC)
- MEDIUM: stick-table: Add support of a factor for IN/OUT bytes rates
- MEDIUM: bwlim: Use a read-lock on the sticky session to apply a shared limit
- BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown
- BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli
- BUG/MINOR: quic: unexploited retransmission cases for Initial pktns.
- BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered
- MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places
- BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
- BUG/MINOR: mux-spop: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf
- BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns
- BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()
- CLEANUP: assorted typo fixes in the code and comments
- DEV: patchbot: count the number of backported/non-backported patches
- DEV: patchbot: add direct links to show only specific categories
- DEV: patchbot: detect commit IDs starting with 7 chars
- BUG/MEDIUM: clock: also update the date offset on time jumps
- MEDIUM: server: add init-state
Allow the user to set the "initial state" of a server.
Context:
Servers are always set in an UP status by default. In
some cases, further checks are required to determine if the server is
ready to receive client traffic.
This introduces the "init-state {fully-up|up|down|fully-down}" configuration
parameter to the server.
- when set to 'fully-up', the server is considered immediately available
and can turn to the DOWN state when ALL health checks fail.
- when set to 'up' (the default), the server is considered immediately
available and will initiate a health check that can turn it to the DOWN
state immediately if it fails.
- when set to 'down', the server initially is considered unavailable and
will initiate a health check that can turn it to the UP state immediately
if it succeeds.
- when set to 'fully-down', the server is initially considered unavailable
and can turn to the UP state when ALL health checks succeed.
The server's init-state is considered when the HAProxy instance
is (re)started, a new server is detected (for example via service
discovery / DNS resolution), a server exits maintenance, etc.
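The four values above can be summarized by the initial state and by the number of consecutive contrary check results needed to leave it. The C sketch below is only an interpretation: it assumes that "ALL health checks" refers to the server's usual "rise" and "fall" counts, which the text does not state explicitly, and the enum and function names are hypothetical.

```c
/* Hypothetical representation of the four "init-state" values. */
enum init_state {
	INIT_FULLY_UP,
	INIT_UP,
	INIT_DOWN,
	INIT_FULLY_DOWN,
};

/* Returns non-zero if the server starts in the UP state. */
static int init_state_is_up(enum init_state st)
{
	return st == INIT_FULLY_UP || st == INIT_UP;
}

/* Number of consecutive contrary health check results required before the
 * server leaves its initial state, assuming <rise> and <fall> are the
 * server's configured counts (an assumption of this sketch).
 */
static int init_state_checks_to_flip(enum init_state st, int rise, int fall)
{
	switch (st) {
	case INIT_FULLY_UP:   return fall; /* all <fall> checks must fail */
	case INIT_UP:         return 1;    /* first failure turns it DOWN */
	case INIT_DOWN:       return 1;    /* first success turns it UP */
	case INIT_FULLY_DOWN: return rise; /* all <rise> checks must succeed */
	}
	return -1;
}
```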
Link: https://github.com/haproxy/haproxy/issues/51
Add a factor parameter to stick-tables, called "brates-factor", that is
applied to in/out bytes rates to work around the 32-bit limit of the
frequency counters. Thanks to this factor, it is possible to have byte
rates beyond 4GB. Instead of counting each byte, we count blocks
of bytes. Among other things, it will be useful for the bwlim filter, to be
able to configure shared limits exceeding 4GB/s.
For now, this parameter must be in the range ]0-1024] (greater than 0, up to
and including 1024).
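The idea of counting blocks instead of bytes can be shown with a minimal C sketch. It is not the stick-table code: the real frequency counters measure rates over a period, whereas this hypothetical blk_counter only accumulates a total, and carrying the remainder between calls is an assumption made for the illustration.

```c
#include <stdint.h>

/* Hypothetical counter storing a byte count in units of <factor> bytes. */
struct blk_counter {
	uint32_t blocks; /* 32-bit counter, in blocks of <factor> bytes */
	uint32_t factor; /* bytes per block, greater than 0 and at most 1024 */
	uint32_t rem;    /* bytes not yet forming a full block */
};

/* Accounts for <bytes> more bytes. */
static void blk_add(struct blk_counter *c, uint64_t bytes)
{
	uint64_t tot = (uint64_t)c->rem + bytes;

	c->blocks += (uint32_t)(tot / c->factor);
	c->rem = (uint32_t)(tot % c->factor);
}

/* Returns the accumulated quantity converted back to bytes. */
static uint64_t blk_bytes(const struct blk_counter *c)
{
	return (uint64_t)c->blocks * c->factor + c->rem;
}
```

With a factor of 1024, 5 GB of traffic is stored as 5242880 blocks, which fits easily in the 32-bit counter while the byte count itself would not.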
Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths.
Multipath TCP has been used for several use cases. On smartphones, MPTCP
enables seamless handovers between cellular and Wi-Fi networks while
preserving established connections. This use-case is what pushed Apple
to use MPTCP since 2013 in multiple applications [2]. On dual-stack
hosts, Multipath TCP enables the TCP connection to automatically use the
best performing path, either IPv4 or IPv6. If one path fails, MPTCP
automatically uses the other path.
To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [3]. To
use it on Linux, an application must explicitly enable it when creating
the socket. No need to change anything else in the application.
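On Linux, enabling MPTCP when creating the socket comes down to passing IPPROTO_MPTCP as the protocol. The C sketch below illustrates this; it is not the code of this patch. The helper name open_stream_socket() is hypothetical, and the fallback to plain TCP when the kernel refuses MPTCP is a choice made for the example only.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* value used by Linux, for older libc headers */
#endif

/* Returns a stream socket for <family> (AF_INET or AF_INET6), or -1 on
 * failure. MPTCP is tried first, then plain TCP if the kernel rejects it.
 * <is_mptcp> is set to 1 if the returned socket is an MPTCP one, else 0.
 */
static int open_stream_socket(int family, int *is_mptcp)
{
	int fd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);

	*is_mptcp = (fd >= 0);
	if (fd < 0)
		fd = socket(family, SOCK_STREAM, IPPROTO_TCP);
	return fd;
}
```

The rest of the application (bind, listen, connect, read, write) is unchanged, which is what makes the extension easy to adopt.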
This patch adds MPTCP per address support, to be used with:
mptcp{,4,6}@<address>[:port1[-port2]]
MPTCP v4 and v6 protocols have been added: they are mainly a copy of the
TCP ones, with small differences: names, proto, and receivers lists.
These protocols are stored in __protocol_by_family, as an alternative to
TCP, similar to what has been done with QUIC. By doing that, the size of
__protocol_by_family has not been increased, and it behaves like TCP.
MPTCP is supported on both the frontend and backend sides.
Also added an example of configuration using mptcp along with a backend
allowing to experiment with it.
Note that this is a re-implementation of Björn's work from 3 years ago
[4], when haproxy's internals were probably less ready to deal with
this, causing his work to be left pending for a while.
Currently, the TCP_MAXSEG socket option doesn't seem to be supported
with MPTCP [5]. This results in a warning when trying to set the MSS of
sockets in proto_tcp:tcp_bind_listener.
This can be resolved by adding two new variables:
sock_inet(6)_mptcp_maxseg_default that will hold the default
value of the TCP_MAXSEG option. Note that for the moment, this
will always be -1 as the option isn't supported. However, in the
future, once support for this option is added, it should
contain the correct value for the MSS, making it possible to correctly
set the TCP_MAXSEG option.
Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2]
Link: https://www.mptcp.dev [3]
Link: https://github.com/haproxy/haproxy/issues/1028 [4]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/515 [5]
Co-authored-by: Dorian Craps <dorian.craps@student.vinci.be>
Co-authored-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
The "429" status can now be specified on retry-on directives. PR_RE_* flags
were updated to remain sorted.
This patch should fix the issue #2687. It is quite simple so it may safely
be backported to 3.0 if necessary.
Released version 3.1-dev6 with the following main changes :
- BUG/MINOR: proto_tcp: delete fd from fdtab if listen() fails
- BUG/MINOR: proto_tcp: keep error msg if listen() fails
- MINOR: proto_tcp: tcp_bind_listener: copy errno in errmsg
- MINOR: channel: implement ci_insert() function
- BUG/MEDIUM: mworker/cli: fix pipelined modes on master CLI
- REGTESTS: mcli: test the pipelined commands on master CLI
- MINOR: cfgparse: load_cfg_in_mem: fix null ptr dereference reported by coverity
- MINOR: startup: fix unused value reported by coverity
- BUG/MINOR: mux-quic: do not send too big MAX_STREAMS ID
- BUG/MINOR: proto_uxst: delete fd from fdtab if listen() fails
- BUG/MINOR: cfgparse: parse_cfg: fix null ptr dereference reported by coverity
- MINOR: proto_uxst: copy errno in errmsg for syscalls
- MINOR: mux-quic: do not trace error in qcc_send_frames() on empty list
- BUG/MINOR: h3: properly reject too long header responses
- CLEANUP: mworker/cli: clean up the mode handling
- BUG/MINOR: tools: make fgets_from_mem() stop at the end of the input
- BUG/MINOR: pattern: pat_ref_set: fix UAF reported by coverity
- BUG/MINOR: pattern: pat_ref_set: return 0 if err was found
- CI: keep logs for failed QIUC Interop jobs
- BUG/MINOR: release-estimator: fix relative scheme in CHANGELOG URL
- MINOR: release-estimator: add requirements.txt
- MINOR: release-estimator: add installation steps in README.md
- MINOR: release-estimator: fix the shebang of the python script
- DOC: config: correct the table for option tcplog
- MEDIUM: log: relax some checks and emit diag warnings instead in lf_expr_postcheck()
- MINOR: log: "drop" support for log-profile steps
- CI: QUIC Interop LibreSSL: document chacha20 test status
- CI: modernize codespell action, switch to node 16
- CI: QUIC Interop AWS-LC: enable chrome client
- DOC: lua: fix incorrect english in lua.txt
- MINOR: Implements new log format of option tcplog clf
- MINOR: cfgparse: limit file size loaded via /dev/stdin
- BUG/MINOR: stats: fix color of input elements in dark mode
- CLEANUP: stats: use modern DOCTYPE tag
- BUG/MINOR: stats: add lang attribute to html tag
- DOC: quic: fix default minimal value for max window size
- DOC: quic: document nocc debug congestion algorithm
- MINOR: quic: extract config window-size parsing
- MINOR: quic: define max-window-size config setting
- MINOR: quic: allocate stream txbuf via qc_stream_desc API
- MINOR: mux-quic: account stream txbuf in QCC
- MEDIUM: mux-quic: implement API to ignore txbuf limit for some streams
- MINOR: h3: mark control stream as metadata
- MINOR: mux-quic: define buf_in_flight
- MAJOR: mux-quic: allocate Tx buffers based on congestion window
- MINOR: quic/config: adapt settings to new conn buffer limit
- MINOR: quic: define sbuf pool
- MINOR: quic: support sbuf allocation in quic_stream
- MEDIUM: h3: allocate small buffers for headers frames
- MINOR: mux-quic: retry after small buf alloc failure
- BUG/MINOR: cfgparse-global: fix err msg in mworker keyword parser
- BUG/MINOR: cfgparse-global: clean common_kw_list
- BUG/MINOR: cfgparse-global: remove redundant goto
- MINOR: cfgparse-global: move 'pidfile' in global keywords list
- MINOR: cfgparse-global: move 'expose-*' in global keywords list
- MINOR: cfgparse-global: move tune options in global keywords list
- MINOR: cfgparse-global: move unsupported keywords in global list
- BUG/MINOR: cfgparse-global: remove tune.fast-forward from common_kw_list
- MINOR: quic: store the lost packets counter in the quic_cc_event element
- MINOR: quic: support a tolerance for spurious losses
- MINOR: protocol: properly assign the sock_domain and sock_family
- MINOR: protocol: add a family lookup
- MEDIUM: socket: always properly use the sock_domain for requested families
- MINOR: protocol: add the real address family to the protocol
- MINOR: socket: don't ban all custom families from reuseport
- MINOR: protocol: always initialize the receivers list on registration
- CLEANUP: protocol: no longer initialize .receivers nor .nb_receivers
Tests performed between a server connected at 1 Gbps and a client at 100 Mbps,
separated by 95 ms, showed that:
- we need 1.1 MB in flight to fill the link
- rare but inevitable losses are sufficient to make cubic's window
collapse quickly and take a long time to recover
- a 100 MB object takes 69s to download
- tolerance for 1 loss between two ACKs suffices to shrink the download
time to 20-22s
- 2 losses go to 17-20s
- 4 losses reach 14-17s
At 100 concurrent connections that fill the server's link:
- 0 loss tolerance shows 2-3% losses
- 1 loss tolerance shows 3-5% losses
- 2 loss tolerance shows 10-13% losses
- 4 loss tolerance shows 23-29% losses
As such while there can be a significant gain sometimes in setting this
tolerance above zero, it can also significantly waste bandwidth by sending
far more than can be received. While it's probably not a solution to real
world problems, it repeatedly proved to be a very effective troubleshooting
tool helping to figure out different root causes of low transfer speeds. In
spirit it is comparable to the no-cc congestion algorithm, i.e. it must
not be used except for experimentation.
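
The 1.1 MB figure quoted above is consistent with the bandwidth-delay product of the slower link. The sketch below redoes that arithmetic, assuming that the 95 ms distance is the round-trip time and that "100 MB" means 100e6 bytes; the helper names are illustrative only.

```python
# Bandwidth-delay product for the test setup described above.
# Assumption: 95 ms is the round-trip time between client and server.

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Return the number of bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_s / 8


def avg_rate_mbps(size_bytes: float, duration_s: float) -> float:
    """Return the average transfer rate in Mbps for a completed download."""
    return size_bytes * 8 / duration_s / 1e6


link_bps = 100e6  # bottleneck: the 100 Mbps client link
rtt = 0.095       # assumed round-trip time, in seconds

print(f"BDP: {bdp_bytes(link_bps, rtt) / 2**20:.2f} MiB")
print(f"69 s download: {avg_rate_mbps(100e6, 69):.1f} Mbps")
print(f"14 s download: {avg_rate_mbps(100e6, 14):.1f} Mbps")
```

Under these assumptions, the 69 s download uses only a small fraction of the 100 Mbps link, which is the symptom the loss tolerance is meant to help investigate.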
Define a new buffer pool reserved to allocate smaller memory areas. For
the moment, its usage will be restricted to QUIC, as such it is declared
in quic_stream module.
Add a new config option "tune.bufsize.small" to specify the size of the
allocated objects. A special check ensures that it is not greater than
the default bufsize to avoid unexpected effects.
QUIC MUX buffer allocation limit is now directly based on the underlying
congestion window size. The previous static limit based on conn-tx-buffers
is now unused. As such, this commit adds a warning to inform users that
it is now obsolete.
Secondly, update the max-window-size setting. It is now the main entrypoint
to limit both the maximum congestion window size and the number of buffers
allocated by the QUIC MUX on emission. Remove its special value '0', which
was used to automatically derive it from the now unused conn-tx-buffers.
Define a new global keyword tune.quic.frontend.max-window-size. This
allows setting globally the maximum congestion window size for each QUIC
frontend connection.
The default value is 0. It is a special value which automatically derives
the size from the configured QUIC connection buffer limit. This is
similar to the previous "quic-cc-algo" behavior, which can be used to
override the maximum window size per bind line.
Document nocc congestion algorithm as an entry of quic-cc-algo.
Highlight the fact that it is reserved for debugging and should not be
used outside of this use case.
It is possible to override the default QUIC congestion algorithm on a
bind line. With the same setting, it is also possible to specify the
maximum congestion window size.
The parser rejects values outside of the range between 10k and 4g. This
contradicts the documentation, which specifies 1k as the lower bound.
Correct this value in the documentation.
This should be backported up to 2.9.
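
As an illustration of the range check described above, here is a minimal sketch of a size parser. It is not the actual HAProxy parser: the function name, the 1024-based suffixes and the inclusive bounds are all assumptions made for the example.

```python
# Hypothetical helper: parse a size such as "10k" or "4g" and enforce the
# documented range. Suffix multipliers and inclusive bounds are assumptions.

SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
MIN_WINDOW = 10 * 1024       # 10k
MAX_WINDOW = 4 * 1024 ** 3   # 4g


def parse_window_size(text: str) -> int:
    """Return the size in bytes, or raise ValueError if invalid or out of range."""
    text = text.strip().lower()
    if not text:
        raise ValueError("empty size")
    if text[-1] in SUFFIXES:
        digits, mult = text[:-1], SUFFIXES[text[-1]]
    else:
        digits, mult = text, 1
    if not digits.isdigit():
        raise ValueError(f"invalid size: {text!r}")
    value = int(digits) * mult
    if not MIN_WINDOW <= value <= MAX_WINDOW:
        raise ValueError(f"size {text!r} is outside of the 10k-4g range")
    return value


print(parse_window_size("10k"))
print(parse_window_size("2m"))
```

With this sketch, a value such as "1k" is rejected, which matches the corrected lower bound.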
Some systems require log formats in the CLF format and that meant that I
could not send my logs for proxies in mode tcp to those servers. This
implements a format that uses log variables that are compatible with TCP
mode frontends and replaces traditional HTTP values in the CLF format
to make them stand out. Instead of logging method and URI like this
"GET /example HTTP/1.1" it will log "TCP " and for a response code I
used "000" so it would be easy to separate from legitimate HTTP
traffic. Now your log servers that require a CLF format can see the
timings for TCP traffic as well as HTTP.
It is now possible to use "drop" keyword for "on" lines under a
log-profile section to specify that no log at all should be emitted for
the specified step (setting an empty format was not sufficient to do so
because only the log payload would be empty, not the log header, thus the
log would still be emitted).
It may be useful to selectively disable logging at specific steps for a
given log target (since the log profile may be set on log directives):
  log-profile myprof
    on request format "blabla" sd "custom sd"
    on response drop
New testcase was added to reg-tests/log/log_profiles.vtc
Released version 3.1-dev5 with the following main changes :
- BUG/MINOR: quic: Lack of precision when computing K (cubic only cc)
- MEDIUM: ssl/quic: implement quic crypto with EVP_AEAD
- MINOR: quic: rename confusing wording aes to hp
- MEDIUM: quic: add key argument to header protection crypto functions
- MEDIUM: quic: implement CHACHA20_POLY1305 for AWS-LC
- MEDIUM: sink: assume sft appctx stickiness
- MINOR: quic: delay Retry emission on quic-force-retry
- MEDIUM: quic: implement quic-initial rules
- MINOR: quic: support ACL for quic-initial rules
- MINOR: quic: pass quic_dgram as obj_type for quic-initial rules
- MINOR: quic: implement reject quic-initial action
- MINOR: quic: implement send-retry quic-initial rules
- BUG/MEDIUM: quic: fix invalid conn reject with CONNECTION_REFUSED
- MEDIUM: h1: allow to preserve keep-alive on T-E + C-L
- MINOR: quic: Add information to "show quic" for CUBIC cc.
- MINOR: quic: Dump TX in flight bytes vs window values ratio.
- BUG/MEDIUM: jwt: Clear SSL error queue on error when checking the signature
- BUILD: cfgparse-quic: fix build error on Solaris due to missing netinet/in.h
- MINOR: queue: add a function to check for TOCTOU after queueing
- BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()
- DOC: config: Add documentation about spop mode for backends
- BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
- BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection error on sending path
- BUILD: mux-pt: Use the right name for the sedesc variable
- BUG/MINOR: stconn: bs.id and fs.id had their dependencies incorrect
- BUG/MEDIUM: ssl: reactivate 0-RTT for AWS-LC
- BUG/MEDIUM: ssl: 0-RTT initialized at the wrong place for AWS-LC
- BUILD: ssl: replace USE_OPENSSL_AWSLC by OPENSSL_IS_AWSLC
- BUG/MEDIUM: quic: prevent conn freeze on 0RTT undeciphered content
- MINOR: tcp_sample: Move TCP low level sample fetch function to control layer
- MINOR: quic: Define ->get_info() control layer callback for QUIC
- MINOR: flags/mux-quic: decode qcc and qcs flags
- BUG/MINOR: quic: fix fc_rtt/srtt values
- BUG/MINOR: quic: fix fc_lost
- BUG/MINOR: h1: do not forward h2c upgrade header token
- BUG/MINOR: h2: reject extended connect for h2c protocol
- BUG/MEDIUM: http-ana: Report error on write error waiting for the response
- BUG/MEDIUM: h2: Only report early HTX EOM for tunneled streams
- BUG/MEDIUM: mux-h2: Propagate term flags to SE on error in h2s_wake_one_stream
- BUG/MEDIUM: peer: Notify the applet won't consume data when it waits for sync
- BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)
- CI: add weekly QUIC Interop regression against AWS-LC
- CI: harden NetBSD builds by ERR=1
- BUG/MINOR: quic: Too short datagram during packet building failures (aws-lc only)
- DEV: coccinelle: add a test to detect unchecked strdup()
- BUG/MINOR: fcgi-app: handle a possible strdup() failure
- BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
- MINOR: quic: convert qc_stream_desc release field to flags
- MINOR: quic: implement function to check if STREAM is fully acked
- BUG/MEDIUM: quic: handle retransmit for standalone FIN STREAM
- MINOR: quic: enforce ACK reception is handled in order
- DOC: configuration: fix alphabetical ordering of {bs,fs}.aborted
- MINOR: stconn: add a new pair of sf functions {bs,fs}.debug_str
- MINOR: mux-h2: implement the debug string for logs
- MINOR: mux-quic: define dump functions for QCC and QCS
- MINOR: mux-quic: implement debug string for logs
- MINOR: quic: dump quic_conn debug string for logs
- MINOR: time: define tot_time structure
- MINOR: mux-quic: measure QCS lifetime and its blocking state
- BUG/MINOR: trace/quic: enable conn/session pointer recovery from quic_conn
- BUG/MINOR: trace/quic: permit to lock on frontend/connect/session etc
- BUG/MEDIUM: trace: fix null deref in lockon mechanism since TRACE_ENABLED()
- BUG/MINOR: trace: automatically start in waiting mode with "start <evt>"
- BUG/MINOR: trace/quic: make "qconn" selectable as a lockon criterion
- BUG/MINOR: quic/trace: make quic_conn_enc_level_init() emit NEW not CLOSE
- MINOR: trace: support setting the sink and level for all sources at once
- MINOR: session/trace: enable very minimal session tracing
- MEDIUM: trace: implement a "follow" mechanism
- MINOR: trace: move the known trace context into a dedicated struct
- MINOR: trace: add a per-source helper to pre-fill the context
- MINOR: mux-h2: add a trace context filling helper
- MINOR: mux-h1: add a trace context filling helper
- MINOR: mux-quic: don't leave dangling pointer after freeing qcs->sd
- MINOR: mux-quic: add a trace context filling helper
- MINOR: mux-h1/trace: add a state trace on stream creation/upgrade
- MINOR: mux-h2/trace: add a state trace on stream creation/destruction
- MINOR: mux-h3/trace: add a state trace on stream creation/destruction
- BUG/MINOR: quic: prevent freeze after early QCS closure
- MINOR: server: ensure max_events_at_once > 0 in server_atomic_sync()
- MINOR: cfgparse: add struct cfgfile to represent config in memory
- REORG: tools: move list_append_word to cfgparse
- MINOR: startup: adapt list_append_word to use cfgfile
- MINOR: cfgparse: add load_cfg_in_mem
- MINOR: cfgparse: load_cfg_in_mem: take in account file size
- MINOR: tools: add fgets_from_mem
- MEDIUM: startup: make read_cfg() return immediately on ENOMEM
- MEDIUM: startup: load and parse configs from memory
- MINOR: startup: rename readcfgfile in parse_cfg
With "follow" from one source to another, it becomes possible for a
source to automatically follow another source's tracked pointer. The
best example is the session:
- the "session" source is enabled and has a "lockon session"
-> its lockon_ptr is equal to the session when valid
- other sources (h1,h2,h3 etc) are configured for "follow session"
and will then automatically check if session's lockon_ptr matches
its own session, in which case tracing will be enabled for that
trace (no state change).
It's not necessary to start/pause/stop traces when using this, only
"follow" followed by a source with lockon enabled is needed. Some
combinations might work better than others. At the moment the session
is almost never known from the backend, but this may improve.
The meta-source "all" is supported for the follower so that all sources
will follow the tracked one.
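
The follow mechanism described above can be modelled with a few lines of code. This is only a conceptual sketch: the class and method names are hypothetical and do not reflect the C structures used by the trace subsystem.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceSource:
    """Simplified model of a trace source (hypothetical, for illustration)."""
    name: str
    lockon_ptr: Optional[object] = None     # pointer tracked via "lockon"
    follow: Optional["TraceSource"] = None  # source whose lockon_ptr is followed

    def should_trace(self, own_session: object) -> bool:
        """Trace only if the followed source currently tracks our own session."""
        if self.follow is None:
            return False
        tracked = self.follow.lockon_ptr
        return tracked is not None and tracked is own_session


session_src = TraceSource("session")
h2_src = TraceSource("h2", follow=session_src)

tracked_session, other_session = object(), object()
session_src.lockon_ptr = tracked_session    # "lockon session" became valid

print(h2_src.should_trace(tracked_session))
print(h2_src.should_trace(other_session))
```

The follower keeps no state of its own here, which mirrors the "no state change" remark above: the decision is recomputed from the followed source's pointer each time.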
It's extremely painful to have to set "trace <src> sink buf1" for all
sources, then to do the same for "level developer" (for example). Let's
have a possibility via a meta-source "all" to apply the change to all
sources at once. This currently supports level and sink, which are not
dependent on the source, this is a good start.
These are passed to the underlying mux to retrieve debug information
at the mux level (stream/connection) as a string that's meant to be
added to logs.
The API is quite complex just because we can't pass any info to the
bottom function. So we construct a union and pass the argument as an
int, and expect the callee to fill that with its buffer in return.
Most likely the mux->ctl and ->sctl API should be reworked before
the release to simplify this.
The functions take an optional argument that is a bit mask of the
layers to dump:
muxs=1
muxc=2
xprt=4
conn=8
sock=16
The default (0) logs everything available.
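
To make the bit mask concrete, the sketch below decodes a mask into the list of layers to dump, using the values listed above. The function name is hypothetical; only the bit values and the meaning of 0 come from the description.

```python
from typing import List

# Bit values as listed above; insertion order is preserved by the dict.
LAYERS = {"muxs": 1, "muxc": 2, "xprt": 4, "conn": 8, "sock": 16}


def layers_to_dump(mask: int = 0) -> List[str]:
    """Return the layer names selected by the mask; 0 selects every layer."""
    if mask == 0:
        return list(LAYERS)
    return [name for name, bit in LAYERS.items() if mask & bit]


print(layers_to_dump(3))   # muxs + muxc
print(layers_to_dump(24))  # conn + sock
print(layers_to_dump())    # default: everything available
```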
These must be before {bs,fs}.id, not after. Should be backported wherever
068ce2d5d2 ("MINOR: stconn: Add samples to retrieve about stream aborts")
is (normally 3.0).
This low level callback may be called by several sample fetches for
frontend connections like "fc_rtt", "fc_rttvar" etc.
Define this callback for QUIC protocol as pointer to quic_get_info().
This latter supports these sample fetches:
"fc_lost", "fc_reordering", "fc_rtt" and "fc_rttvar".
Update the documentation consequently.
The SPOE was refactored. Now backends referenced by a SPOE filter must use
the spop mode to be able to use the spop multiplexer for server connections.
The "spop" mode was added in the list of supported mode for backends.
In 2.5-dev9, commit 631c7e866 ("MEDIUM: h1: Force close mode for invalid
uses of T-E header") enforced a recently arrived new security rule in the
HTTP specification aiming at preventing a class of content-smuggling
attacks involving HTTP/1.0 agents. It consists in handling the very rare
T-E + C-L requests or responses in close mode.
It turns out this does have an impact on a rare few very old clients
(probably running insecure TLS stacks by the way) that continue to send
both with their POST requests. The impact is that for each and every
request they'll have to reconnect, possibly negotiating a full TLS
handshake that becomes harmful to the machine in terms of CPU computation.
This commit adds a new option "h1-do-not-close-on-insecure-transfer-encoding"
that does exactly what it says, it just asks not to close on such messages,
even though the message continues to be sanitized and C-L dropped. It means
that the risk is only between the sender and haproxy, which is limited, and
might be the only acceptable solution for such environments having to deal
with broken implementations.
The cases are so rare that it should not need to be backported, or in the
worst case, to the latest LTS if there is any demand.
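
The behavior described above can be summarized by the following sketch. It is a simplified model, not the H1 multiplexer code: the function and its boolean parameter are hypothetical stand-ins for the new option.

```python
from typing import Dict, Tuple


def sanitize_te_cl(headers: Dict[str, str],
                   do_not_close: bool = False) -> Tuple[Dict[str, str], bool]:
    """Return (sanitized headers, close_connection) for an HTTP/1 message.

    When both Transfer-Encoding and Content-Length are present, Content-Length
    is always dropped. The connection is then closed unless do_not_close
    (modelling the new option) is set.
    """
    names = {name.lower() for name in headers}
    if "transfer-encoding" in names and "content-length" in names:
        cleaned = {n: v for n, v in headers.items()
                   if n.lower() != "content-length"}
        return cleaned, not do_not_close
    return dict(headers), False


msg = {"Host": "example.org", "Transfer-Encoding": "chunked",
       "Content-Length": "42"}
print(sanitize_te_cl(msg))
print(sanitize_te_cl(msg, do_not_close=True))
```

In both calls the Content-Length header is removed; only the decision to close the connection differs.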
Define a new quic-initial "send-retry" rule. This allows to force the
emission of a Retry packet on an initial without token instead of
instantiating a new QUIC connection.
Define a new quic-initial action named "reject". Contrary to dgram-drop,
the client is notified of the rejection by a CONNECTION_CLOSE with
CONNECTION_REFUSED error code.
To be able to emit the necessary CONNECTION_CLOSE frame, quic_conn is
instantiated, contrary to dgram-drop action. quic_set_connection_close()
is called immediately after qc_new_conn() which prevents the handshake
startup.
Add ACL condition support for quic-initial rules. This requires the
extension of quic_parse_quic_initial() to parse an extra if/unless
block.
Only layer4 client samples are allowed to be used with quic-initial
rules. However, due to the early execution of quic-initial rules prior
to any connection instantiation, some samples are not supported.
To be able to use the 4 described samples, a dummy session is
instantiated before quic-initial rules execution. Its src and dst fields
are set from the received datagram values.
Implement a new set of rules labelled as quic-initial.
These rules are specific to QUIC. They are scheduled to be executed early
on Initial packet parsing, prior to a new QUIC connection instantiation.
Contrary to tcp-request connection, this allows to reject traffic
earlier, most notably by avoiding unnecessary QUIC SSL handshake
processing.
A new module quic_rules is created. Its main function
quic_init_exec_rules() is called on Initial packet parsing in function
quic_rx_pkt_retrieve_conn().
For the moment, only "accept" and "dgram-drop" are valid actions. Both
are final. The latter drops silently the Initial packet instead of
allocating a new QUIC connection.
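
Since both actions are final, rule evaluation stops at the first rule that applies. The sketch below models this first-match behavior. It is not the quic_rules module: the rule representation is hypothetical, each rule carries a condition callable for the sake of the example, and falling back to "accept" when nothing matches is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# A rule is an (action, condition) pair; the condition inspects the datagram.
Rule = Tuple[str, Callable[[Dict[str, str]], bool]]


def eval_quic_initial(rules: List[Rule], dgram: Dict[str, str]) -> str:
    """Return the action of the first matching rule (both actions are final)."""
    for action, condition in rules:
        if condition(dgram):
            return action
    return "accept"  # assumption: without a match, processing continues


rules: List[Rule] = [
    ("dgram-drop", lambda d: d["src"].startswith("192.0.2.")),
    ("accept", lambda d: True),
]

print(eval_quic_initial(rules, {"src": "192.0.2.10"}))
print(eval_quic_initial(rules, {"src": "198.51.100.7"}))
```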
Released version 3.1-dev4 with the following main changes :
- MINOR: limits: prepare to keep limits in one place
- REORG: fd: move raise_rlim_nofile to limits
- CLEANUP: fd: rm struct rlimit definition
- REORG: global: move rlim_fd_*_at_boot in limits
- MINOR: haproxy: prepare to move limits-related code
- REORG: haproxy: move limits handlers to limits
- MINOR: limits: add is_any_limit_configured
- CLEANUP: quic: remove obsolete comment on send
- MINOR: quic: extend detection of UDP API OS features
- MINOR: quic: activate UDP GSO for QUIC if supported
- MINOR: quic: define quic_cc_path MTU as constant
- MINOR: quic: add GSO parameter on quic_sock send API
- MAJOR: quic: support GSO when encoding datagrams
- MEDIUM: quic: implement GSO fallback mechanism
- MINOR: quic: add counters of sent bytes with and without GSO
- BUG/MEDIUM: bwlim: Be sure to never set the analyze expiration date in past
- CLEANUP: proto: rename TID affinity callbacks
- CLEANUP: quic: rename TID affinity elements
- BUG/MINOR: limits: fix license type in limits.h
- BUG/MINOR: session: Eval L4/L5 rules defined in the default section
- CLEANUP: stconn: Fix a typo in comments for SE_ABRT_SRC_*
- MEDIUM: spoe: Remove fragmentation support
- MEDIUM: spoe: Remove async mode support
- MINOR: spoe: Use only a global engine-id per agent
- MINOR: spoe: Remove debugging
- MAJOR: spoe: Remove idle applets and pipelining support
- MINOR: spoe: Remove the dedicated SPOE applet task
- MEDIUM: proxy/spoe: Add a SPOP mode
- MEDIUM: applet: Add a .shut callback function for applets
- MINOR: connection: No longer include stconn type header in connection-t.h
- MINOR: stconn: Use a dedicated function to get the opposite sedesc
- MINOR: spoe: Rename some flags and constant to use SPOP prefix
- MINOR: spoe: Dynamically alloc the message list per event of an agent
- MINOR: spoe: Move all stuff regarding the filter/applet in the C file
- MINOR: spoe: Move spoe_str_to_vsn() into the header file
- MEDIUM: mux-spop: Introduce the SPOP multiplexer
- MEDIUM: check/spoe: Use SPOP multiplexer to perform SPOP health-checks
- MAJOR: spoe: Rewrite SPOE applet to use the SPOP mux
- CLEANUP: spoe: Uniformize function definitions
- MINOR: spoe: Add internal sample fetch to retrieve the SPOE engine ID
- MEDIUM: spoe: Set a specific name for the connection pool of SPOP servers
- MINOR: backend: Remove test on HTX streams to reuse idle connections on connect
- MEDIUM: spoe: Force the reuse 'always' mode for SPOP backends
- MINOR: mux-spop: Use a dedicated function to update the SPOP connection timeout
- MAJOR: mux-spop: Make the SPOP connections reusable
- MINOR: stats-html: Display reuse ratio for spop connections
- MEDIUM: spoe: Directly xfer NOTIFY frame when SPOE applet is created
- MEDIUM: spoe: Directly receive ACK frame in the SPOE context buffer
- MEDIUM: mux-spop/spoe: Save negociated max-frame-size value in the mux
- MINOR: spoe: Remove the spop version from the SPOE appctx context
- MEDIUM: mux-spop: Add checks on received frames
- MEDIUM: mux-spop: Announce the pipeling support if possible
- MEDIUM: spoe: Forward SPOE context error to the SPOE applet
- MEDIUM: spoe: Make the SPOE applet use its own buffers
- DOC: spoe: Update SPOE documentation to reflect recent refactoring
- BUILD: mux-spop: fix build failure on gcc 4-10 and clang
- MINOR: fd: don't scan the full fdtab on all threads
- MINOR: server: better mt_list usage for node migration (prev_deleted handling)
- BUG/MINOR: do not close uninit FD in quic_test_socketops()
- BUG/MEDIUM: debug/cli: fix "show threads" crashing with low thread counts
- MINOR: debug: prepare feed_post_mortem_late
- CLEANUP: debug: fix indents in debug_parse_cli_show_dev
- MINOR: debug: store runtime uid/gid in postmortem
- MINOR: debug: keep runtime capabilities in post_mortem
- MINOR: debug: use LIM2A to show limits
- MINOR: debug: prepare to show runtime limits
- MINOR: debug: keep runtime limits in postmortem
- DOC: install: don't reference removed CPU arg
- BUG/MEDIUM: ssl_sock: fix deadlock in ssl_sock_load_ocsp() on error path
- BUG/MAJOR: mux-h2: force a hard error upon short read with pending error
- MEDIUM: sink: start applets asynchronously
- OPTIM: sink: balance applets accross threads
- MEDIUM: ocsp: fix ocsp when the chain is loaded from 'issuers-chain-path'
- MEDIUM: ssl: add extra_chain to ckch_data
- MINOR: ssl: change issuers-chain for show_cert_detail()
- REGTESTS: ssl: test the issuers-chain-path keyword
- DOC: configuration: issuers-chain-path not compatible with OCSP
- DOC: configuration: issuers-chain-path is compatible with OCSP
- BUG/MEDIUM: startup: fix zero-warning mode
- BUILD: tree-wide: cast arguments to tolower/toupper to unsigned char (2)
- MINOR: cfgparse-global: move mode's keywords in cfg_kw_list
- MINOR: cfgparse-global: move no<poller_name> in cfg_kw_list
- DOC: config: improve the http-keep-alive section
- BUG/MINOR: stick-table: fix crash for src_inc_gpc() without stkcounter
- BUG/MINOR: server: Don't warn fallback IP is used during init-addr resolution
- BUG/MINOR: cli: Atomically inc the global request counter between CLI commands
- MINOR: stream: Add a pointer to set the parent stream
- MINOR: vars: Fill a description instead of hash and scope when a name is parsed
- MINOR: vars: Use a description to set/unset a variable instead of its hash and scope
- MEDIUM: vars: Be able to parse parent scopes for variables
- MINOR: vars: Use a variable description to get variables of a specific scope
- MEDIUM: vars: Be able to retrieve variable of the parent stream, if any
- MEDIUM: spoe: Set the parent stream for SPOE streams
- BUG/MINOR: quic: Non optimal first datagram.
- DOC: config: Add a dedicated section about variables
- DOC: config: Add info about variable scopes referencing the parent stream
- DOC: config: Explicitly state the SPOE streams have a usable parent stream
- MINOR: quic: Avoid cc priv buffer overflow.
- MINOR: spoe: Add a function to validate a version is supported
- MINOR: spoe: export the list of SPOP error reasons
- MEDIUM: spoe/tcpcheck: Reintroduce SPOP check as a customized tcp-check
- REGTESTS: check/spoe: Re-enable the script performing SPOP health-checks
- BUG/MEDIUM: sink: properly init applet under sft lock
- MINOR: sink: unify and sink_forward_io_handler() and sink_forward_oc_io_handler()
- MINOR: sink: Remove useless test on SE_FL_SHR/SHW flags
- MINOR: sink: merge sink_forward_io_handler() with sink_forward_oc_io_handler()
- MINOR: sink: add some comments about sft->appctx usage in applet handlers
- MINOR: sink: distinguish between hard and soft close in _sink_forward_io_handler()
- MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface
- MINOR: ring: count processed messages in ring_dispatch_messages()
- MINOR: sink: add processed events counter in sft
- MEDIUM: sink: "max-reuse" support for sink servers
- OPTIM: sink: consider threads' current load when rebalancing applets
Thanks to the previous commit, it is now possible to know how many events
were processed for a given sft/server sink pair. As mentioned in commit
c454296 ("OPTIM: sink: balance applets accross threads"), let's provide
the ability to restart a server connection when a certain amount of events
were processed to help better balance the load over multiple threads.
For this, we make use of the "max-reuse" server keyword which was only
relevant under "http" context so far. Under sink context, "max-reuse"
corresponds to the number of times the tcp connection can be reused
for sending messages, which in fact means that "max-reuse + 1" is the
number of events (ie: messages) that are allowed to be sent using the
same tcp server connection: when this threshold is met, the connection
will be destroyed and a new one will be created on a random thread.
The value is not strict: it is the minimum value above which the
connection may be destroyed since the value is checked after
ring_dispatch_messages() which may process multiple messages at once.
By default, no limit is enforced (the connection will be reused for as
long as it is available).
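
The non-strict nature of the threshold can be illustrated with a small simulation. This is a sketch only: the function name is hypothetical, each batch stands for the messages processed by one ring_dispatch_messages() call, and the exact comparison used is an assumption derived from the "max-reuse + 1" wording above.

```python
from typing import Iterable, List, Optional


def events_per_connection(batches: Iterable[int],
                          max_reuse: Optional[int] = None) -> List[int]:
    """Return how many events each successive connection ends up sending."""
    sent = 0
    per_conn: List[int] = []
    for batch in batches:
        sent += batch  # a whole batch is dispatched before the check
        if max_reuse is not None and sent >= max_reuse + 1:
            per_conn.append(sent)  # connection destroyed, a new one is created
            sent = 0
    if sent:
        per_conn.append(sent)      # events sent on the still-open connection
    return per_conn


print(events_per_connection([3, 3, 3], max_reuse=4))
print(events_per_connection([3, 3, 3]))
```

With max-reuse set to 4, the first connection is only checked after the second batch, so it carries 6 events rather than exactly 5. Without a limit, a single connection carries all 9.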
The documentation was updated accordingly.
It is explicitly mentioned in the configuration manual that the parent of a
SPOE stream is the filtered stream. It means variables of the filtered
stream are usable from the SPOE stream.
It is now possible for a stream to have a parent and it is also possible to
retrieve variables defined in the parent stream context. To do so, some
extra scopes were introduced. The section 2.8. was updated accordingly.
The variables in the HAProxy configuration are now described in a dedicated
section. Instead of repeating the same description everywhere a variable
name can be used, the section 2.8. is now referenced.
Nathan Wehrman suggested this add-on to try to better explain the
interactions between http-keep-alive and other timeouts, and the
impacts on protocols (HTTP/1, HTTP/2 etc).
Let's check the global counter of "ha_warning" messages a second time if
zero-warning is set, and let's do this just before forking. At this point we
are sure that all init operations which could emit "ha_warning" are done,
and the stderr fd is still open.
Even with the second check, we could lose some late and rare warnings
about failing to drop supplementary groups and about re-enabling core dumps.
Notes about this are added into 'zero-warning' keyword description.
Since patch f3dfd95a ("MEDIUM: ocsp: fix ocsp when the chain is loaded
from 'issuers-chain-path'") the OCSP features are compatible with
'issuers-chain-path'.
The SPOE was refactored. Several parameters were deprecated. Fragmentation
and async capabilities support were removed. The default log-format was
updated too.
So, the SPOE documentation was updated accordingly.
The related issue is #2502.
Add a startup test for GSO support in quic_test_socketopts() and
automatically activate it in qc_prep_pkts() when building datagrams as
big as MTU.
Also define a new config option tune.quic.disable-udp-gso. This is
useful to prevent warnings on older platforms or to debug an issue which
may be related to GSO.
Released version 3.1-dev3 with the following main changes :
- BUG/MINOR: quic: Wrong datagram building when probing.
- BUG/MEDIUM: quic: fix possible exit from qc_check_dcid() without unlocking
- BUG/MINOR: promex: Remove Help prefix repeated twice for each metric
- DOC: configuration: add details about crt-store in bind "crt" keyword
- BUG/MEDIUM: hlua/cli: Fix lua CLI commands to work with applet's buffers
- DOC: configuration: more details about the master-worker mode
- BUG/MEDIUM: server: fix race on server_atomic_sync()
- BUG/MINOR: jwt: don't try to load files with HMAC algorithm
- CLEANUP: quic: cleanup prototypes related to CIDs handling
- CLEANUP: quic: remove non-existing quic_cid_tree definition
- MINOR: quic: remove access to CID global tree outside of quic_cid module
- REORG: quic: remove quic_cid_trees reference from proto_quic
- MINOR: quic: add 2 BUG_ON() on datagram dispatch
- MINOR: quic: ensure quic_conn is never removed on thread affinity rebind
- MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD
- DOC: configuration: update maxconn description
- MINOR: proto: extend connection thread rebind API
- BUG/MEDIUM: quic: prevent crash on accept queue full
- BUG/MEDIUM: peers: Fix crash when syncing learn state of a peer without appctx
- CI: add weekly QUIC Interop regression against LibreSSL
- DEV: flags/quic: decode quic_conn flags
- MINOR: quic: rename "ssl error" trace
- BUG/MEDIUM: init: fix fd_hard_limit default in compute_ideal_maxconn
- BUG/MINOR: jwt: fix variable initialisation
- MINOR: ssl/sample: ssl_c_san returns a comma separated list of SAN
- OPTIM: pool: improve needed_avg cache line access pattern
- MAJOR: import: update mt_list to support exponential back-off (try #2)
- CI: weekly QUIC Interop: try to fix private image
- BUG/MINOR: h1: Fail to parse empty transfer coding names
- BUG/MINOR: h1: Reject empty coding name as last transfer-encoding value
- BUG/MEDIUM: h1: Reject empty Transfer-encoding header
- BUG/MEDIUM: spoe: Be sure to create a SPOE applet if none on the current thread
- BUILD: listener: silence a build warning about unused value without threads
- DOC: architecture: remove the totally outdated architecture manual
- SCRIPTS: create-release: no more need to skip architecture.txt
We've discussed removing it many times and I thought it had been
removed long ago, but apparently not, as William showed me. Let's get
rid of it now. It's totally outdated (last updated 18 years ago, when
laptop processors were still 32 bits), mentions keywords and external
products that don't exist anymore. It's not even on docs.haproxy.org.
At some point, old stuff must really die.
This is the second attempt at importing the updated mt_list code (commit
59459ea3). The previous one was attempted with commit c618ed5ff4 ("MAJOR:
import: update mt_list to support exponential back-off") but revealed
problems with QUIC connections and was reverted.
The problem that was faced was that elements deleted inside an iterator
were no longer reset, and that if they were to be recycled in this form,
they could appear as busy to the next user. This was trivially reproduced
with this:
  $ cat quic-repro.cfg
  global
          stats socket /tmp/sock1 level admin
          stats timeout 1h
          limited-quic
  frontend stats
          mode http
          bind quic4@:8443 ssl crt rsa+dh2048.pem alpn h3
          timeout client 5s
          stats uri /
  $ ./haproxy -db -f quic-repro.cfg &
  $ h2load -c 10 -n 100000 --npn h3 https://127.0.0.1:8443/
  => hang
This was purely an API issue caused by the simplified usage of the macros
for the iterator. The original version had two backups (one full element
and one pointer) that the user had to take care of, while the new one only
uses one that is transparent for the user. But during removal, the element
still has to be unlocked if it's going to be reused.
All of this sparked discussions with Fred and Aurélien regarding the still
unclear state of locking. It was found that the lock API does too much at
once and is lacking granularity. The new version offers a much more fine-
grained control allowing to selectively lock/unlock an element, a link,
the rest of the list etc.
It was also found that plenty of places just want to free the current
element, or delete it to do anything with it, hence don't need to reset
its pointers (e.g. event_hdl). Finally it appeared obvious that the
root cause of the problem was the unclear usage of the list iterators
themselves because one does not necessarily expect the element to be
presented locked when not needed, which makes the unlock easy to overlook
during reviews.
The updated version of the list presents explicit lock status in the
macro name (_LOCKED or _UNLOCKED suffixes). When using the _LOCKED
suffix, the caller is expected to unlock the element if it intends to
reuse it. At least the status is advertised. The _UNLOCKED variant,
instead, always unlocks it before starting the loop block. This means
it's not necessary to think about unlocking it, though it's obviously
not usable with everything. A few _UNLOCKED were used at obvious places
(i.e. where the element is deleted and freed without any prior check).
Interestingly, the tests performed last year on QUIC forwarding, that
resulted in limited traffic for the original version and higher bit
rate for the new one couldn't be reproduced because since then the QUIC
stack has gained in efficiency, and the 100 Gbps barrier is now reached
with or without the mt_list update. However the unit tests definitely
show a huge difference, particularly on EPYC platforms where the EBO
provides tremendous CPU savings.
Overall, the following changes are visible from the application code:
- mt_list_for_each_entry_safe() + 1 back elem + 1 back ptr
  => MT_LIST_FOR_EACH_ENTRY_LOCKED() or MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
     + 1 back elem
- MT_LIST_DELETE_SAFE() no longer needed in MT_LIST_FOR_EACH_ENTRY_UNLOCKED()
  => just manually set iterator to NULL however.
  For MT_LIST_FOR_EACH_ENTRY_LOCKED()
  => mt_list_unlock_self() (if element going to be reused) + NULL
- MT_LIST_LOCK_ELT => mt_list_lock_full()
- MT_LIST_UNLOCK_ELT => mt_list_unlock_full()
- l = MT_LIST_APPEND_LOCKED(h, e); MT_LIST_UNLOCK_ELT();
  => l=mt_list_lock_prev(h); mt_list_lock_elem(e); mt_list_unlock_full(e, l)
The ssl_c_san sample fetch returns the list of Subject Alternative Names
presented in the client certificate.
The format is the same as the output of the "openssl x509 -text" command:
a comma-separated list of Description:Value pairs.
The format is directly generated by the GENERAL_NAME_print() openssl
function.
https://github.com/openssl/openssl/blob/openssl-3.0/crypto/x509/v3_san.c#L207
Example:
IP Address:127.0.0.1, IP Address:127.0.0.2, IP Address:127.0.0.3, URI:http://docs.haproxy.org/2.7/, DNS:ca.tests.haproxy.com
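As a rough illustration of how such a value could be consumed, the sketch
below splits it into (description, value) pairs. It is written in Python for
brevity; parse_san_list() is a hypothetical helper that is not part of
haproxy, and it assumes that no value contains the ", " separator.

```python
def parse_san_list(san):
    """Split a 'Description:Value, Description:Value' string into pairs."""
    entries = []
    for item in san.split(", "):
        # Split on the first colon only: values such as URIs contain colons.
        desc, _, value = item.partition(":")
        entries.append((desc, value))
    return entries


example = ("IP Address:127.0.0.1, IP Address:127.0.0.2, IP Address:127.0.0.3, "
           "URI:http://docs.haproxy.org/2.7/, DNS:ca.tests.haproxy.com")

for desc, value in parse_san_list(example):
    print(desc, "=>", value)
```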
Let's update the maxconn keyword description to make it clear which
setting takes precedence over global.maxconn and SYSTEM_MAXCONN, if
set.
Let's provide a default value for fd_hard_limit, if it's not set in the
configuration. With this patch we could set some specific default via
compile-time variable DEFAULT_MAXFD as well. Hopefully this will be helpful for
haproxy package maintainers.
make -j 8 TARGET=linux-glibc DEBUG=-DDEFAULT_MAXFD=50000
If haproxy is compiled without DEFAULT_MAXFD defined, the default will be set
to 1048576.
This is done to avoid the process being killed by its watchdog when it is
started without any limitations in its configuration or on the command line
and the hard RLIMIT_NOFILE is extremely large (~1000000000). In this case we
use compute_ideal_maxconn() to calculate maxconn and maxsock; maxsock defines
the size of the internal fdtab, which becomes very large as well. When
the process then simply loops over this fdtab (O(n)), it takes so much
time that the watchdog does its job.
To avoid this, maxconn now is always reduced to some reasonable value either
by explicit global.fd-hard-limit from configuration, or by its default. The
default may be changed at build time and then overridden by
global.fd-hard-limit at runtime. An explicit global.fd-hard-limit from the
configuration always takes precedence over DEFAULT_MAXFD, if set.
Must be backported in all stable versions until v2.6.0, including v2.6.0.
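The precedence described above can be sketched as follows. This is only an
illustration in Python, not the haproxy code: effective_fd_hard_limit() and
its parameter names are hypothetical, and it assumes that the resulting cap
can only lower the hard RLIMIT_NOFILE, never raise it.

```python
# Fallback used when DEFAULT_MAXFD is not defined at build time (from the text).
BUILTIN_DEFAULT_MAXFD = 1048576


def effective_fd_hard_limit(rlimit_nofile_hard, cfg_fd_hard_limit=None,
                            build_default_maxfd=None):
    """Pick the cap applied to the hard RLIMIT_NOFILE (illustrative only)."""
    if cfg_fd_hard_limit is not None:
        cap = cfg_fd_hard_limit          # explicit global.fd-hard-limit wins
    elif build_default_maxfd is not None:
        cap = build_default_maxfd        # -DDEFAULT_MAXFD=... at build time
    else:
        cap = BUILTIN_DEFAULT_MAXFD      # built-in default
    # Assumption: the cap only ever lowers the system limit.
    return min(rlimit_nofile_hard, cap)


huge = 1000000000
print(effective_fd_hard_limit(huge))
print(effective_fd_hard_limit(huge, build_default_maxfd=50000))
print(effective_fd_hard_limit(huge, cfg_fd_hard_limit=20000,
                              build_default_maxfd=50000))
```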
Released version 3.1-dev2 with the following main changes :
- BUG/MINOR: log: fix broken '+bin' logformat node option
- DEBUG: hlua: distinguish burst timeout errors from exec timeout errors
- REGTESTS: ssl: fix some regtests 'feature cmd' start condition
- BUG/MEDIUM: ssl: AWS-LC + TLSv1.3 won't do ECDSA in RSA+ECDSA configuration
- MINOR: ssl: activate sigalgs feature for AWS-LC
- REGTESTS: ssl: activate new SSL reg-tests with AWS-LC
- BUG/MEDIUM: proxy: fix email-alert invalid free
- REORG: mailers: move free_email_alert() to mailers.c
- BUG/MINOR: proxy: fix email-alert leak on deinit() (2nd try)
- DOC: configuration: fix alphabetical order of bind options
- DOC: management: document ptr lookup for table commands
- BUG/MAJOR: quic: fix padding with short packets
- BUG/MAJOR: quic: do not loop on emission on closing/draining state
- MINOR: sample: date converter takes HTTP date and output an UNIX timestamp
- SCRIPTS: git-show-backports: do not truncate git-show output
- DOC: api/event_hdl: small updates, fix an example and add some precisions
- BUG/MINOR: h3: fix crash on STOP_SENDING receive after GOAWAY emission
- BUG/MINOR: mux-quic: fix crash on qcs SD alloc failure
- BUG/MINOR: h3: fix BUG_ON() crash on control stream alloc failure
- BUG/MINOR: quic: fix BUG_ON() on Tx pkt alloc failure
- DEV: flags/show-fd-to-flags: adapt to recent versions
- MINOR: capabilities: export capget and __user_cap_header_struct
- MINOR: capabilities: prepare support for version 3
- MINOR: capabilities: use _LINUX_CAPABILITY_VERSION_3
- MINOR: cli/debug: show dev: add cmdline and version
- MINOR: cli/debug: show dev: show capabilities
- MINOR: debug: print gdb hints when crashing
- BUILD: debug: also declare strlen() in __ABORT_NOW()
- BUILD: Missing inclusion header for ssize_t type
- BUG/MINOR: hlua: report proper context upon error in hlua_cli_io_handler_fct()
- MINOR: cfgparse/log: remove leftover dead code
- BUG/MEDIUM: stick-table: Decrement the ref count inside lock to kill a session
- MINOR: stick-table: Always decrement ref count before killing a session
- REORG: init: do MODE_CHECK_CONDITION logic first
- REORG: init: encapsulate CHECK_CONDITION logic in a func
- REORG: init: encapsulate 'reload' sockpair and master CLI listeners creation
- REORG: init: encapsulate code that reads cfg files
- BUG/MINOR: server: fix first server template name lookup UAF
- MINOR: activity: make the memory profiling hash size configurable at build time
- BUG/MEDIUM: server/dns: prevent DOWN/UP flap upon resolution timeout or error
- BUG/MEDIUM: h3: ensure the ":method" pseudo header is totally valid
- BUG/MEDIUM: h3: ensure the ":scheme" pseudo header is totally valid
- BUG/MEDIUM: quic: fix race-condition in quic_get_cid_tid()
- BUG/MINOR: quic: fix race condition in qc_check_dcid()
- BUG/MINOR: quic: fix race-condition on trace for CID retrieval
Fix an example suggesting that using EVENT_HDL_SUB_TYPE(x, y) with y being
0 was valid. Then add some notes to explain how to use
EVENT_HDL_SUB_FAMILY() and EVENT_HDL_SUB_TYPE() with valid values.
Also mention that the feature is available starting from 2.8 and not 2.7.
Finally, perform some purely cosmetic updates.
This could be backported in 2.8.
Add missing documentation and examples for the optional ptr lookup method
for table {show,set,clear} commands introduced in commit 9b2717e7 ("MINOR:
stktable: use {show,set,clear} table with ptr"), as initially described in
GH #2118.
It may be backported in 3.0.
Released version 3.1-dev1 with the following main changes :
- REGTESTS: Remove REQUIRE_VERSION=2.1 from all tests
- REGTESTS: Remove REQUIRE_VERSION=2.2 from all tests
- CI: use "--no-install-recommends" for apt-get
- CI: switch to lua 5.4
- CI: use USE_PCRE2 instead of USE_PCRE
- DOC: replace the README by a markdown version
- CI: VTest: accelerate package install a bit
- ADMIN: acme.sh: remove the old acme.sh code
- BUG/MINOR: cfgparse: remove the correct option on httpcheck send-state warning
- BUG/MINOR: tcpcheck: report correct error in tcp-check rule parser
- BUG/MINOR: tools: fix possible null-deref in env_expand() on out-of-memory
- DOC: configuration: add an example for keywords from crt-store
- CI: speedup apt package install
- DOC: add the FreeBSD status badge to README.md
- DOC: change the link to the FreeBSD CI in README.md
- MINOR: stktable: avoid ambiguous stktable_data_ptr() usage in cli_io_handler_table()
- BUG/MINOR: hlua: use CertCache.set() from various hlua contexts
- CLEANUP: hlua: fix CertCache class comment
- CI: FreeBSD: upgrade image, packages
- BUG/MEDIUM: h1-htx: Don't state interim responses are bodyless
- MEDIUM: stconn: Be able to unblock zero-copy data forwarding from done_fastfwd
- BUG/MEDIUM: mux-quic: Unblock zero-copy forwarding if the txbuf can be released
- BUG/MINOR: quic: prevent crash on qc_kill_conn()
- CLEANUP: hlua: use hlua_pusherror() where relevant
- BUG/MINOR: hlua: don't use lua_pushfstring() when we don't expect LJMP
- BUG/MINOR: hlua: fix unsafe hlua_pusherror() usage
- BUG/MINOR: hlua: prevent LJMP in hlua_traceback()
- CLEANUP: hlua: get rid of hlua_traceback() security checks
- BUG/MINOR: hlua: fix leak in hlua_ckch_set() error path
- CLEANUP: hlua: simplify ambiguous lua_insert() usage in hlua_ctx_resume()
- BUG/MEDIUM: mux-quic: Don't unblock zero-copy fwding if blocked during nego
- MINOR: mux-quic: Don't send an emtpy H3 DATA frame during zero-copy forwarding
- BUG/MEDIUM: ssl: wrong priority whem limiting ECDSA ciphers in ECDSA+RSA configuration
- BUG/MEDIUM: ssl: bad auth selection with TLS1.2 and WolfSSL
- BUG/MINOR: quic: fix computed length of emitted STREAM frames
- BUG/MINOR: quic: ensure Tx buf is always purged
- BUG/MEDIUM: stconn/mux-h1: Fix suspect change causing timeouts
- BUG/MAJOR: mux-h1: Properly copy chunked input data during zero-copy nego
- BUG/MINOR: mux-h1: Use the right variable to set NEGO_FF_FL_EXACT_SIZE flag
- DOC: install: remove boringssl from the list of supported libraries
- MINOR: log: fix "http-send-name-header" ignore warning message
- BUG/MINOR: proxy: fix server_id_hdr_name leak on deinit()
- BUG/MINOR: proxy: fix log_tag leak on deinit()
- BUG/MINOR: proxy: fix email-alert leak on deinit()
- BUG/MINOR: proxy: fix check_{command,path} leak on deinit()
- BUG/MINOR: proxy: fix dyncookie_key leak on deinit()
- BUG/MINOR: proxy: fix source interface and usesrc leaks on deinit()
- BUG/MINOR: proxy: fix header_unique_id leak on deinit()
- MINOR: proxy: add proxy_free_common() helper function
- BUG/MEDIUM: proxy: fix UAF with {tcp,http}checks logformat expressions
- MINOR: log: change wording in lf_expr_postcheck() error message
- BUG/MEDIUM: log: fix lf_expr_postcheck() behavior with default section
- CLEANUP: log/proxy: fix comment in proxy_free_common()
- DOC: config: move "hash-key" from proxy to server options
- DOC: config: add missing section hint for "guid" proxy keyword
- DOC: config: add missing context hint for new server and proxy keywords
- BUG/MINOR: promex: Skip resolvers metrics when there is no resolver section
- DOC: internals: add a documentation about the master worker
- BUG/MAJOR: mux-h1: Prevent any UAF on H1 connection after draining a request
- BUG/MINOR: quic: fix padding of INITIAL packets
- OPTIM: quic: fill whole Tx buffer if needed
- MINOR: quic: refactor qc_build_pkt() error handling
- MINOR: quic: use global datagram headlen definition
- MINOR: quic: refactor qc_prep_pkts() loop
- DOC/MINOR: management: add missed -dR and -dv options
- DOC/MINOR: management: add -dZ option
- DOC: management: rename show stats domain cli "dns" to "resolvers"
- REORG: log: reorder send log helpers by dependency order
- MINOR: session: expose session_embryonic_build_legacy_err() function
- MEDIUM: log/session: handle embryonic session log within sess_log()
- MINOR: log: provide sending log context to process_send_log() when available
- MINOR: log: add log_orig_to_str() function
- MINOR: log: provide log origin in logformat expressions using '%OG'
- CLEANUP: log: remove ambiguous legacy comment for resolve_logger()
- MINOR: log/backend: always free parsing hints in resolve_logger()
- MINOR: log: make resolve_logger() static
- MINOR: log: provide proxy context to resolve_logger()
- MINOR: log: add __send_log_set_metadata_sd helper
- MINOR: log: add logger flags
- MINOR: log: add log-profile parsing logic
- MINOR: log: add log profile buildlines
- MEDIUM: log: handle log-profile in process_send_log()
- DOC: config: add documentation for log profiles
- REGTESTS: log: add a test for log-profile
- MINOR: ssl: add ssl_sock_bind_verifycbk() in ssl_sock.h
- REORG: ssl: move the SNI selection code in ssl_clienthello.c
- BUILD: ssl: fix build with wolfSSL
- CI: github: upgrade aws-lc to 1.29.0
- Revert "CI: github: upgrade aws-lc to 1.29.0"
- MEDIUM: ssl: support for ECDA+RSA certificate selection with AWS-LC
- BUILD: ssl: disable deprecated functions for AWS-LC 1.29.0
- MINOR: ssl: relax the 'ssl.default-dh-param' keyword parsing
- CI: github: upgrade aws-lc to 1.29.0
- DOC: INSTALL: minimum AWS-LC version is v1.22.0
- CI: github: do the AWS-LC weekly build with ERR=1
Now that log-profile parsing logic has been implemented in "MINOR: log:
add log-profile parsing logic" and is actually effective since "MEDIUM:
log: handle log-profile in process_send_log()", let's document the feature
and add some examples.
Log-profile section is declared like this:
log-profile myprof
log-tag "custom-tag"
on error format "%ci: error"
on any format "(custom httplog) ${HAPROXY_HTTP_LOG_FMT}" sd "[exampleSDID@1234 step=\"accept\" id=\"%ID\"]"
(check out the documentation for the full list of options, some options
are only relevant under specific contexts)
And used this way (from usual "log" directive lines):
global
log stdout format rfc5424 profile myprof local0
--------------
For now, the use of log-profiles is somewhat limited because we lack
the ability to explicitly trigger the log building process at specific
steps during the stream handling, but it should gain more traction over
time as the feature evolves and new mechanisms allowing the emission
of logs at expected processing steps will be added.
It should partially fix GH #401
'%OG' logformat alias may be used to report the log origin (when/where)
that triggered log generation using sess_build_logline().
Possible values are:
- "sess_error": log was generated during session error handling
- "sess_killed": log was generated during session abortion (killed
embryonic session)
- "txn_accept": log was generated right after frontend conn was accepted
- "txn_request": log was generated after client request was received
- "txn_connect": log was generated after backend connection establishment
- "txn_response": log was generated during server response handling
- "txn_close": log was generated at the final txn step, before closing
- "unspec": unknown or not specified
Documentation was updated.
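The list above amounts to a simple mapping from a log origin to the string
printed by '%OG'. A minimal Python sketch of that mapping follows; the
LogOrig member names and the log_origin_name() helper are hypothetical, only
the string values are taken from the list.

```python
from enum import Enum


class LogOrig(Enum):
    # Member names are illustrative; values are the strings listed above.
    UNSPEC = "unspec"
    SESS_ERROR = "sess_error"
    SESS_KILLED = "sess_killed"
    TXN_ACCEPT = "txn_accept"
    TXN_REQUEST = "txn_request"
    TXN_CONNECT = "txn_connect"
    TXN_RESPONSE = "txn_response"
    TXN_CLOSE = "txn_close"


def log_origin_name(orig):
    """Return the '%OG' string for an origin, falling back to 'unspec'."""
    if isinstance(orig, LogOrig):
        return orig.value
    return LogOrig.UNSPEC.value


print(log_origin_name(LogOrig.TXN_CLOSE))
print(log_origin_name(None))
```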
In commit f8642ee82 ("MEDIUM: resolvers: rename dns extra counters to
resolvers extra counters"), we renamed "dns" counters to "resolvers", but
we forgot to update the documentation accordingly.
This may be backported to all stable versions.
Add documentation about the history of the master-worker, how it
was implemented in its first version, and how it currently works.
This is a global view of the architecture, and not an exhaustive
explanation of all mechanisms.
To stay consistent with the work started in 54627f991 ("DOC: config: add
context hint for proxy keywords") and 3d4e1e682 ("DOC: config: add context
hint for server keywords"), we add missing context hint for "guid" (both
proxy and server) keyword and "hash-key" server keyword that were added
during 3.0 development.
This may be backported in 3.0.
"guid" proxy keyword added in da754b45 ("MINOR: proxy: implement GUID
support") was lacking the section hint in the keyword description, let's
fix that.
It could be backported in 3.0 with da754b45.
As reported by Ashley Morris, "hash-key" keyword which was introduced in
commit faa8c3e0 ("MEDIUM: lb-chash: Deterministic node hashes based on
server address") doesn't belong to proxy keywords and should be found in
5.2 "Server and default-server options" instead.
It should be backported in 3.0 with faa8c3e0
Using CertCache.set() from init context wasn't explicitly supported and
caused the process to crash:
crash.lua:
  core.register_init(function()
    CertCache.set{filename="reg-tests/ssl/set_cafile_client.pem", ocsp=""}
  end)
crash.conf:
  global
    lua-load crash.lua
  listen front
    bind localhost:9090 ssl crt reg-tests/ssl/set_cafile_client.pem ca-file reg-tests/ssl/set_cafile_interCA1.crt verify none
./haproxy -f crash.conf
[NOTICE] (267993) : haproxy version is 3.0-dev2-640ff6-910
[NOTICE] (267993) : path to executable is ./haproxy
[WARNING] (267993) : config : missing timeouts for proxy 'front'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[1] 267993 segmentation fault (core dumped) ./haproxy -f crash.conf
This is because in hlua_ckch_set/hlua_ckch_commit_yield, we always
consider that we're being called from a yield-capable runtime context.
As such, hlua_gethlua() is never checked for NULL and we systematically
try to wake hlua->task and yield every 10 instances.
In fact, if we're called from the body or init context (that is, during
haproxy startup), hlua_gethlua() will return NULL, and in this case we
shouldn't care about yielding because it is ok to commit all instances
at once since haproxy is still starting up.
Also, when calling CertCache.set() from a non-yield capable runtime
context (such as hlua fetch context), we kept doing as if the yield
succeeded, resulting in unexpected function termination (operation
would be aborted and the CertCache lock wouldn't be released). Instead,
now we explicitly state in the doc that CertCache.set() cannot be used
from a non-yield capable runtime context, and we raise a runtime error
if it is used that way.
These bugs were discovered by reading the code when trying to address
Svace report documented by @Bbulatov GH #2586.
It should be backported up to 2.6 with 30fcca18 ("MINOR: ssl/lua:
CertCache.set() allows to update an SSL certificate file")
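The context handling described in this message can be summarized with a
small model. The Python sketch below is not the haproxy implementation:
plan_commit() and the context labels are hypothetical, and only the
"yield every 10 instances" figure comes from the text.

```python
YIELD_EVERY = 10  # from the text: yield every 10 instances


def plan_commit(n_instances, context):
    """Return the sizes of the batches committed between two yields.

    context is one of the illustrative labels:
      "body" / "init"   : haproxy is starting, no Lua context to yield from
      "runtime-yield"   : yield-capable runtime context
      "runtime-noyield" : runtime context that cannot yield (e.g. a fetch)
    """
    if context == "runtime-noyield":
        # Now reported as a runtime error instead of silently aborting.
        raise RuntimeError("CertCache.set() needs a yield-capable context")
    if context in ("body", "init"):
        # Startup: committing everything at once is acceptable.
        return [n_instances] if n_instances else []
    batches = []
    remaining = n_instances
    while remaining > 0:
        batch = min(YIELD_EVERY, remaining)
        batches.append(batch)
        remaining -= batch
    return batches


print(plan_commit(25, "init"))
print(plan_commit(25, "runtime-yield"))
```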
In ticket #785, people are still confused about how to use the crt-store
load parameters in a crt-list.
This patch adds an example.
This must be backported in 3.0
This patch removes the old README file and replaces it with a more
modern markdown version which allows clickable links on the github page.
It also adds some of the Github Actions workflow Status.
This patch includes the HAProxy png in the doc directory.
Released version 3.0.0 with the following main changes :
- MINOR: sample: implement the uptime sample fetch
- CI: scripts: fix build of vtest regarding option -C
- CI: scripts: build vtest using multiple CPUs
- MINOR: log: rename 'log-format tag' to 'log-format alias'
- DOC: config: document logformat item naming and typecasting features
- BUILD: makefile: yearly reordering of objects by build time
- BUILD: fd: errno is also needed without poll()
- DOC: config: fix two typos "RST_STEAM" vs "RST_STREAM"
- DOC: config: refer to the non-deprecated keywords in ocsp-update on/off
- DOC: streamline http-reuse and connection naming definition
- REGTESTS: complete http-reuse test with pool-conn-name
- DOC: config: add %ID logformat alias alternative
- CLEANUP: ssl/ocsp: readable ifdef in ssl_sock_load_ocsp
- BUG/MINOR: ssl/ocsp: init callback func ptr as NULL
- CLEANUP: ssl_sock: move dirty openssl-1.0.2 wrapper to openssl-compat
- BUG/MINOR: activity: fix Delta_calls and Delta_bytes count
- CI: github: upgrade the WolfSSL job to 5.7.0
- DOC: install: update quick build reminders with some missing options
- DOC: install: update the range of tested openssl version to cover 3.3
- DEV: patchbot: prepare for new version 3.1-dev
- MINOR: version: mention that it's 3.0 LTS now.
With the introduction of "pool-conn-name", documentation related to
http-reuse became even more complex than it already was, notably with
multiple cross-references between "pool-conn-name" and "sni" server keywords.
Take this opportunity to improve all http-reuse related documentation.
First, "http-reuse" keyword general purpose has been greatly expanded
and reordered.
Then, "pool-conn-name" and "sni" have been clarified, in particular the
relation between them, with the former being an advanced usage compared to
the default SSL SNI case in the context of http-reuse. Also update
attach-srv rule documentation as its name parameter is directly linked
to both "pool-conn-name" and "sni".
The ability to give a name to a logformat_node (known as logformat item in
the documentation) implemented in 2ed6068f2a ("MINOR: log: custom name for
logformat node") wasn't documented.
The same goes for the ability to force the logformat_node's output type to
a specific type implemented in 1448478d62 ("MINOR: log: explicit
typecasting for logformat nodes")
Let's quickly describe such new usages at the start of the custom log
format section.
In 2.9 we started to introduce an ambiguity in the documentation by
referring to historical log-format variables ('%var') as log-format
tags in 739c4e5b1e ("MINOR: sample: accept_date / request_date return
%Ts / %tr timestamp values") and 454c372b60 ("DOC: configuration: add
sample fetches for timing events").
In fact, we've had this confusion between log-format tag and log-format
var for more than 10 years now, but in 2.9 it was the first time the
confusion was exposed in the documentation.
Indeed, both 'log-format variable' and 'log-format tag' actually refer
to the same feature (that is: '%B' and friends that can be used for
direct access to some log-oriented predefined fetches instead of using
%[expr] with generic sample expressions).
This feature was first implemented in 723b73ad75 ("MINOR: config: Parse
the string of the log-format config keyword") and later documented in
4894040fa ("DOC: log-format documentation"). At that time, it was clear
that we used to name it 'log-format variable'.
But later the same year, 'log-format tag' naming started to appear in
some commit messages (while still referring to the same feature), for
instance with ffc3fcd6d ("MEDIUM: log: report SSL ciphers and version
in logs using logformat %sslc/%sslv").
Unfortunately in 2.9 when we added (and documented) new log-format
variables we officially started drifting to the misleading 'log-format
tag' naming (perhaps because it was the most recent naming found for
this feature in git log history, or because the confusion has always
been there)
Even worse, in 3.0 this confusion led us to rename all 'var' occurrences
to 'tag' in log-format related code to unify the code with the doc.
Fortunately, William quickly noticed that we made a mistake there, but
instead of reverting to historical naming (log-format variable), it was
decided that we must use a different name that is less confusing than
'tags' or 'variables' (tags and variables are keywords that are already
used to designate other features in the code and that are not very
explicit under log-format context today).
Now we refer to '%B' and friends as a logformat alias, which is
essentially a handy way to print some log oriented information in the
log string instead of leveraging '%[expr]' with generic sample expressions
made of fetches and converters. Of course, there are some subtleties, such
as a few log-format aliases that still don't have sample fetch equivalent
for historical reasons, and some aliases that may be a little faster than
their generic sample expression equivalents because most aliases are
pretty much hardcoded in the log building function. But in general
logformat aliases should be simply considered as an alternative to using
expressions (with '%[expr]').
Also, under log-format context, when we want to refer to either an alias
('%alias') or an expression ('%[expr]'), we should use the generic term
'logformat item', which in fact designates a single item within the
logformat string provided by the user. Indeed, a logformat item (whether
it is an alias or an expression) always starts with '%' and may accept
optional flags / arguments.
Both the code and the documentation were updated in that sense; hopefully
this will clarify things and prevent future confusion.
Released version 3.0-dev13 with the following main changes :
- CLEANUP: ssl/cli: remove unused code in dump_crtlist_conf
- MINOR: ssl: check parameter in ckch_conf_cmp()
- BUG/MINOR: ring: free ring's allocated area not ring's usable area when using maps
- DOC: configuration: rework the crt-store load documentation
- DEBUG: tools: add vma_set_name() helper
- DEBUG: shctx: name shared memory using vma_set_name()
- DEBUG: sink: add name hint for memory area used by memory-backed sinks
- DEBUG: pollers: add name hint for large memory areas used by pollers
- DEBUG: errors: add name hint for startup-logs memory area
- DEBUG: fd: add name hint for large memory areas
- MEDIUM: ssl: don't load file by discovering them in crt-store
- DOC: configuration: update the crt-list documentation
- DOC: configuration: add the supported crt-store options in crt-list
- BUG/MEDIUM: proto: fix fd leak in <proto>_connect_server
- MINOR: sock: set conn->err_code in case of EPERM
- BUG/MINOR: http-ana: Don't crush stream termination condition on internal error
- MAJOR: spoe: Let the SPOE back into the game
- BUG/MINOR: connection: parse PROXY TLV for LOCAL mode
- BUG/MINOR: server: free PROXY v2 TLVs on srv drop
- MINOR: rhttp: add log on connection allocation failure
- BUG/MEDIUM: rhttp: fix preconnect on single-thread
- BUG/MINOR: rhttp: prevent listener suspend
- BUG/MINOR: rhttp: fix task_wakeup state
- MINOR: session: define flag to explicitely release listener on free
- MEDIUM: rhttp: create session for active preconnect
- MINOR: rhttp: support PROXY emission on preconnect
- MINOR: connection: support PROXY v2 TLV emission without stream
- MINOR: traces: enumerate the list of levels/verbosities when not found
- BUG/MINOR: sock: fix sock_create_server_socket
- MINOR: proto: fix coding style
- BUG/MAJOR: quic: Crash with TLS_AES_128_CCM_SHA256 (libressl only)
- REGTESTS: scripts: allow to change the vtest timeout
- BUG/MEDIUM: quic_tls: prevent LibreSSL < 4.0 from negotiating CHACHA20_POLY1305
- CI: scripts/build-ssl.sh: loudly fail on unsupported platforms
- BUG/MEDIUM: mux-quic: Create sedesc in same time of the QUIC stream
- MINOR: mux-quic: Set abort info for SC-less QCS on STOP_SENDING frame
- CI: scripts/build-ssl: add a DESTDIR and TMPDIR variable
- CI: scripts/buil-ssl: cleanup the boringssl and quictls build
- MINOR: config: add thread-hard-limit to set an upper bound to nbthread
- BUILD: quic: fix unused variable warning when threads are disabled
- BUG/MEDIUM: stick-tables: Fix race with peers when trashing oldest entries
- BUG/MEDIUM: stick-tables: Fix race with peers when killing a sticky session
- BUG/MEDIUM: stick-tables: make sure never to create two same remote entries
- CLEANUP: stick-tables: remove a few unneeded tests for use_wrlock
- MINOR: stick-tables: remove the uneeded read lock in stksess_free()
- CLEANUP: tools: fix vma_set_name() function comment
- DEBUG: tools: add vma_set_name_id() helper
- DEBUG: pollers/fd: add thread id suffix to per-thread memory areas name hints
- DOC: config: fix aes_gcm_enc() description text
- BUILD: trace: fix warning on null dereference
- MEDIUM: config: prevent communication with privileged ports
- MAJOR: config: prevent QUIC with clients privileged port by default
- BUG/MINOR: quic: adjust restriction for stateless reset emission
- MINOR: quic: clarify doc for quic_recv()
- MINOR: server: generalize sni expr parsing
- MINOR: server: define pool-conn-name keyword
- MEDIUM: connection: use pool-conn-name instead of sni on reuse
- BUG/MINOR: rhttp: initialize session origin after preconnect reversal
- BUG/MEDIUM: server/dns: preserve server's port upon resolution timeout or error
- BUG/MINOR: http-htx: Support default path during scheme based normalization
- BUG/MINOR: server: Don't reset resolver options on a new default-server line
- DOC: quic: specify that connection migration is not supported
- DOC: config: fix incorrect section reference about custom log format
- DOC: config: uniformize the naming and description of custom log format args
- DOC: config: clarify the fact that custom log format is not just for logging
- REGTESTS: acl_cli_spaces: avoid a warning caused by undefined logs
The wording in the Custom log format section was still extremely centered
on logging, but it's about time to mention that these are usable for other
actions as well, otherwise it's very confusing for newcomers who try to
define a variable or header. The updated text also reminds about the risks
of safe encodings that may (rarely) mangle an output string, and encourages
to migrate away from the unquoted definition which is full of backslashes.
It would definitely deserve further improvements and refinements.
A significant number of actions now take arguments that are evaluated as
log-format expressions. Some of them are called "fmt", others "string".
The description of the argument sometimes just says "the log-format
string" or "log format" or "custom log format" etc. Most of them do not
mention the section to visit, and section 8.2 speaking about log-format
is very centric on logs usage (the primary use case), making all of this
very confusing for newcomers.
Since section 8.2.6 is titled "Custom log format" and describes the syntax
to be used with the "log-format" (and other) directives, let's call this
"Custom log format" everywhere and mention section 8.2.6. When the field
was called "string", it was also renamed to "fmt".
It doesn't seem worth backporting this, unless it applies fine.
Since 2.5 with commit 98b930d043 ("MINOR: ssl: Define a default https
log format"), some log-format sections were shifted a bit without having
been renumbered, causing 8.2.4 to be referenced as the custom log
format while it's in fact 8.2.6. This patch fixes the affected
locations.
In addition two places mentioned 8.2.6 instead of 8.2.5 for the error
log format.
This can be backported to 2.6.
Currently haproxy does not support QUIC connection migration. This is
advertized to clients on their connections. Document this in the first
QUIC related paragraph.
This should be backported up to 2.6.
Define a new server keyword pool-conn-name. The purpose of this keyword
will be to identify connections inside the idle connections pool,
replacing SNI in case SSL is not wanted.
This keyword uses a sample expression argument. It thus can reuse
existing function parse_srv_expr() for parsing. In the future, it may be
necessary to define a keyword variant which uses a logformat for
extensibility.
This patch only implements parsing. Argument is stored inside new server
field <pool_conn_name> and expression is generated in
_srv_parse_finalize() into <pool_conn_name_expr>.
If pool-conn-name is not set but SNI is, the latter is reused
automatically as pool-conn-name via _srv_parse_finalize(). This ensures
current reuse behavior remains compatible and idle connection reuse will
not mix connections with different SNIs by mistake.
Main usage will be for rhttp when SSL is not wanted between the two
haproxy instances. Previously, it was possible to use the "sni" keyword even
without SSL on a server line, which had a similar effect. However,
having a dedicated "pool-conn-name" keyword is deemed clearer. Besides,
it would allow for more complex configurations where pool-conn-name and
SNI are used in parallel with different values.
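The fallback rule can be pictured with the short Python sketch below. It is
only a model of the behavior described in this message: the function names
are hypothetical, and plain strings stand in for the sample expressions that
haproxy actually stores.

```python
def finalize_pool_conn_name(pool_conn_name=None, sni=None):
    """Return the expression used to name idle connections for a server."""
    if pool_conn_name is None:
        # Not set explicitly: reuse the SNI expression (may also be None).
        return sni
    return pool_conn_name


def can_reuse(idle_conn_name, wanted_name):
    """An idle connection is only eligible if its name matches."""
    return idle_conn_name == wanted_name


# Only "sni" configured: it is reused as the pool connection name.
print(finalize_pool_conn_name(sni="req.hdr(host)"))
# Both configured with different values: pool-conn-name is kept as-is.
print(finalize_pool_conn_name(pool_conn_name="str(edge1)", sni="req.hdr(host)"))
```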
The previous commit introduced a new protection mechanism to forbid
communications with clients which use a privileged source port. By
default, this mechanism is disabled for every protocol.
This patch changes the default value and activates the protection
mechanism for the QUIC protocol. This is justified as such traffic is a
probable sign of a DNS/NTP amplification attack.
This is labelled as major as it can be a breaking change with some
network environments.
This commit introduces a new global setting named
harden.reject_privileged_ports.{tcp|quic}. When active, communications
with clients which use privileged source ports are forbidden. Such
behavior is considered suspicious as it can be used for spoofing or
DNS/NTP amplification attacks.
The value is configured per transport protocol. For TCP and QUIC,
distinct code locations are impacted by this setting. The first one is
in sock_accept_conn(), which acts as a filter for all TCP-based
communications just after accept() returns a new connection. The second
one is dedicated to QUIC communication in quic_recv(). In both cases,
if a privileged source port is used and the setting is enabled, the
received message is silently dropped.
By default, the protection is disabled for both protocols. This allows
backporting it to stable releases without introducing a breaking change.
This should be backported as it is an interesting security feature yet
relatively simple to implement.
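The filtering decision can be summarized by the following minimal sketch.
It assumes that "privileged" means a source port below 1024, and
should_drop() is a hypothetical helper, not a function from the haproxy
sources.

```python
PRIVILEGED_PORT_MAX = 1023  # assumption: ports 0-1023 are the privileged ones


def should_drop(src_port: int, reject_privileged: bool) -> bool:
    """Return True when traffic from this source port must be rejected.

    reject_privileged mirrors the per-protocol setting (tcp or quic).
    """
    return reject_privileged and src_port <= PRIVILEGED_PORT_MAX


# A datagram coming from port 53 is only rejected when the protection is on.
dns_with_protection = should_drop(53, reject_privileged=True)
dns_without_protection = should_drop(53, reject_privileged=False)
```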
On today's large systems, it's not always desired to run on all threads
for light loads, and usually users enforce nbthread to a lower value
(e.g. 8). The problem is that this is a fixed value, and moving such
configs to smaller machines continues to enforce the value and this
becomes extremely unproductive due to having more threads than CPUs.
This also happens quite a bit in VMs, containers, or cloud instances
of various sizes.
This commit introduces the thread-hard-limit setting that allows to only
set an upper bound to the number of threads without raising a lower value.
This means that using "thread-hard-limit 8" will make sure that no more
than 8 threads will be used when available, but it will remain two when
run on a dual-core machine.
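The intended semantics boil down to a simple upper bound, as in the sketch
below. effective_threads() is a hypothetical helper which ignores every
other setting that influences the thread count (nbthread, CPU affinity,
etc.).

```python
from typing import Optional


def effective_threads(available_cpus: int, hard_limit: Optional[int] = None) -> int:
    """Cap the thread count to hard_limit without ever raising it."""
    if hard_limit is None:
        return available_cpus
    return min(available_cpus, hard_limit)


# "thread-hard-limit 8" on a 64-CPU machine and on a dual-core one.
large_machine = effective_threads(64, hard_limit=8)
small_machine = effective_threads(2, hard_limit=8)
```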
It's quite frustrating, particularly on the command line, not to have
access to the list of available levels and verbosities when one does
not exist for a given source, because there's no easy way to find them
except by starting without and connecting to the CLI. Let's enumerate
the list of supported levels and verbosities when a name does not match.
For example:
$ ./haproxy -db -f quic-repro.cfg -dt h2:help
[NOTICE] (9602) : haproxy version is 3.0-dev12-60496e-27
[NOTICE] (9602) : path to executable is ./haproxy
[ALERT] (9602) : -dt: no such trace level 'help', available levels are 'error', 'user', 'proto', 'state', 'data', and 'developer'.
$ ./haproxy -db -f quic-repro.cfg -dt h2:user:help
[NOTICE] (9604) : haproxy version is 3.0-dev12-60496e-27
[NOTICE] (9604) : path to executable is ./haproxy
[ALERT] (9604) : -dt: no such trace verbosity 'help' for source 'h2', available verbosities for this source are: 'quiet', 'clean', 'minimal', 'simple', 'advanced', and 'complete'.
The same is done for the CLI where the existing help message is always
displayed when entering an invalid verbosity or level.
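The enumeration shown in the alerts above can be reproduced with a sketch
such as the one below. It is only an illustration of the message format:
quote_list() and check_level() are hypothetical helpers, and the level
names are copied from the output above.

```python
LEVELS = ["error", "user", "proto", "state", "data", "developer"]


def quote_list(names):
    """Join names as 'a', 'b', and 'c' like in the alert messages."""
    quoted = ["'%s'" % n for n in names]
    if len(quoted) <= 1:
        return "".join(quoted)
    if len(quoted) == 2:
        return " and ".join(quoted)
    return ", ".join(quoted[:-1]) + ", and " + quoted[-1]


def check_level(name):
    """Return None for a known level, or an error listing the valid ones."""
    if name in LEVELS:
        return None
    return "no such trace level '%s', available levels are %s." % (
        name, quote_list(LEVELS))


message = check_level("help")
```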
This reverts commits 885e40494c and
dff9807188.
We decided to spend some time refactoring and rationalizing the SPOE for
3.1. Thus there is no reason to still consider it as deprecated in
3.0. Compatibility between both versions will be maintained.
See #2502 for more info.
The load keyword now has its own section in the documentation to be more
readable (like the server or bind options sections).
The ocsp-update keyword was moved from the bind section to the crt-list
load one.
Released version 3.0-dev12 with the following main changes :
- CI: drop asan.log umbrella completely
- BUG/MINOR: log: fix leak in add_sample_to_logformat_list() error path
- BUG/MINOR: log: smp_rgs array issues with inherited global log directives
- MINOR: rhttp: Don't require SSL when attach-srv name parsing
- REGTESTS: ssl: be more verbose with ocsp_compat_check.vtc
- DOC: Update UUID references to RFC 9562
- MINOR: hlua: add hlua_nb_instruction getter
- MEDIUM: hlua: take nbthread into account in hlua_get_nb_instruction()
- BUG/MEDIUM: server: clear purgeable conns before server deletion
- BUG/MINOR: mux-quic: fix error code on shutdown for non HTTP/3
- BUG/MINOR: qpack: fix error code reported on QPACK decoding failure
- BUG/MEDIUM: htx: mark htx_sl as packed since it may be realigned
- BUG/MEDIUM: stick-tables: properly mark stktable_data as packed
- SCRIPTS: run-regtests: fix a few occurrences of extended regexes
- BUG/MINOR: ssl_sock: fix xprt_set_used() to properly clear the TASK_F_USR1 bit
- MINOR: dynbuf: provide a b_dequeue() variant for multi-thread
- BUG/MEDIUM: muxes: enforce buf_wait check in takeover()
- BUG/MINOR: h1: Check authority for non-CONNECT methods only if a scheme is found
- BUG/MEDIUM: h1: Reject CONNECT request if the target has a scheme
- BUG/MAJOR: h1: Be stricter on request target validation during message parsing
- MINOR: qpack: prepare error renaming
- MINOR: h3/qpack: adjust naming for errors
- MINOR: h3: adjust error reporting on sending
- MINOR: h3: adjust error reporting on receive
- MINOR: mux-quic: support glitches
- MINOR: h3: report glitch on RFC violation
- BUILD: stick-tables: better mark the stktable_data as 32-bit aligned
- MINOR: ssl: rename tune.ssl.ocsp-update.mode in ocsp-update.mode
- REGTESTS: update the ocsp-update tests
- BUILD: stats: remove non portable getline() usage
- MEDIUM: ssl: add ocsp-update.mindelay and ocsp-update.maxdelay
- BUILD: log: get rid of non-portable strnlen() func
- BUG/MEDIUM: fd: prevent memory waste in fdtab array
- CLEANUP: compat: make the MIN/MAX macros more reliable
- Revert: MEDIUM: evports: permit to report multiple events at once"
- BUG/MINOR: stats: Don't state the 303 redirect response is chunked
- MINOR: mux-h1: Add a flag to ignore the request payload
- REORG: mux-h1: Group H1S_F_BODYLESS_* flags
- CLEANUP: mux-h1: Remove unused H1S_F_ERROR_MASK mask value
- MEDIUM: mux-h1: Support C-L/T-E header suppressions when sending messages
- MINOR: ssl: ckch_store_new_load_files_conf() loads filenames from ckch_conf
- MEDIUM: ssl/crtlist: loading crt-store keywords from a crt-list
- CLEANUP: ssl/ocsp: remove the deprecated parsing code for "ocsp-update"
- MINOR: ssl: pass ckch_store instead of ckch_data to ssl_sock_load_ocsp()
- MEDIUM: ssl: ckch_conf_parse() uses -1/0/1 for off/default/on
- MINOR: ssl: handle PARSE_TYPE_INT and PARSE_TYPE_ONOFF in ckch_store_load_files()
- MINOR: ssl/ocsp: use 'ocsp-update' in crt-store
- MINOR: ssl: ckch_conf_clean() utility function for ckch_conf
- MEDIUM: ssl: add ocsp-update.disable global option
- MEDIUM: ssl/cli: handle crt-store keywords in crt-list over the CLI
- MINOR: ssl: ckch_conf_cmp() compare multiple ckch_conf structures
- MEDIUM: ssl: temporarily load files by detecting their presence in crt-store
- REGTESTS: ocsp-update: change the reg-test to support the new crt-store mode
- DOC: capabilities: fix chapter header rendering
The header of a new management guide chapter, "13.1. Linux capabilities
support", is not rendered properly in HTML format, because of missing
dots at the end of this chapter's number.
This option allows completely disabling ocsp-update.
To achieve this, the ocsp-update.mode global keyword no longer relies
on SSL_SOCK_OCSP_UPDATE_OFF during parsing to call
ssl_create_ocsp_update_task().
Instead, we will inherit the SSL_SOCK_OCSP_UPDATE_* value from
ocsp-update.mode for each certificate which does not specify its own
mode.
To completely disable ocsp-update without editing all crt entries,
ocsp-update.disable is used instead of "ocsp-update.mode", which is now
only used as the default value for crt.
Since the ocsp-update is not strictly a tuning of the SSL stack, but a
feature of its own, let's rename the option.
The option was also missing from the index.
Implement basic support for glitches on QUIC multiplexer. This is mostly
identical to glitches for HTTP/2.
A new configuration option named tune.quic.frontend.glitches-threshold
is defined to limit the number of glitches on a connection before
closing it.
The glitch counter is incremented via qcc_report_glitch(). A new
qcc_app_ops callback <report_susp> is defined. When the threshold is
reached, it allows setting an application error code to close the
connection. For HTTP/3, value H3_EXCESSIVE_LOAD is returned. If the
callback is not defined, default code INTERNAL_ERROR is used.
For the moment, no glitches are reported for QUIC or HTTP/3 usage. This
will be added in future patches as needed.
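The counter/threshold/callback interaction can be sketched as follows.
This is a simplified model and not the mux-quic code: the Conn class is
hypothetical, and both the "0 disables the check" convention and the
">=" comparison are assumptions. The numeric codes are the ones defined
by RFC 9114 (H3_EXCESSIVE_LOAD) and RFC 9000 (INTERNAL_ERROR).

```python
H3_EXCESSIVE_LOAD = 0x107  # HTTP/3 application error code (RFC 9114)
INTERNAL_ERROR = 0x1       # QUIC transport error code (RFC 9000)


class Conn:
    """Hypothetical model of a connection counting its glitches."""

    def __init__(self, threshold, report_susp=None):
        self.threshold = threshold      # assumption: 0 disables the check
        self.report_susp = report_susp  # optional application callback
        self.glitches = 0
        self.close_code = None          # set once the connection must close

    def report_glitch(self, inc=1):
        """Count glitches; return True when the connection must be closed."""
        self.glitches += inc
        if (self.threshold and self.close_code is None
                and self.glitches >= self.threshold):
            if self.report_susp is not None:
                self.close_code = self.report_susp()
            else:
                self.close_code = INTERNAL_ERROR
        return self.close_code is not None


# An HTTP/3 connection provides a callback, another application does not.
h3_conn = Conn(threshold=3, report_susp=lambda: H3_EXCESSIVE_LOAD)
h3_results = [h3_conn.report_glitch() for _ in range(3)]
raw_conn = Conn(threshold=2)
raw_closed = raw_conn.report_glitch(2)
```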
Based on Willy's idea (from 3.0-dev6 announcement message): in this patch
we try to reduce the max latency that can be caused by running lua scripts
with default settings.
Indeed, by default, hlua engine is allowed to process up to 10k
instructions per batch. While this value was found to be the optimal one
for a single thread, it turns out that keeping a thread busy for 10k lua
instructions could increase thread contention. This is especially true
when the script is loaded with 'lua-load', because in that case the
current thread owns the main lua lock and prevents other threads from
making any progress if they're also waiting on the main lock.
Thanks to Thierry Fournier's work, we know that performance-wise we can
reach optimal performance by sticking between 500 and 10k instructions
per batch. Given that, when the script is loaded using 'lua-load', if no
"tune.lua.forced-yield" was set by the user, we automatically divide the
default value (10K) by the number of threads haproxy can use to reduce
thread contention (given that all threads could compete for the main lua
lock), however we make sure not to return a value below 500, because
Thierry's work showed that this would come with a significant performance
loss.
The historical behavior may still be enforced by setting
"tune.lua.forced-yield" to 10000 in the global config section.
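The computation described above can be sketched as follows. This is only
a model of the rule and not the code of hlua_get_nb_instruction(): the
function name, its parameters and the use of an integer division are
assumptions.

```python
DEFAULT_YIELD = 10000  # default number of instructions per batch
MIN_YIELD = 500        # lower bound below which performance degrades


def nb_instructions(nbthread, forced_yield=None, lua_load=True):
    """Return the number of lua instructions allowed per batch.

    forced_yield models an explicit "tune.lua.forced-yield" setting and
    lua_load tells whether the script was loaded with 'lua-load'.
    """
    if forced_yield is not None:
        return forced_yield      # the user's choice is always respected
    if not lua_load:
        return DEFAULT_YIELD     # no shared main lock, keep the default
    return max(MIN_YIELD, DEFAULT_YIELD // nbthread)


four_threads = nb_instructions(4)
many_threads = nb_instructions(64)
historical = nb_instructions(64, forced_yield=10000)
```

With 4 threads each batch is limited to 2500 instructions, while with 64
threads the 500 instructions floor applies.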
When support for UUIDv7 was added in commit aab6477b67, the
specification was still a draft. It has since been published as RFC 9562.
This patch updates all UUID references from the obsoleted RFC 4122 and the
draft for RFC 9562 to the published RFC 9562.
Released version 3.0-dev11 with the following main changes :
- BUILD: clock: improve check for pthread_getcpuclockid()
- CI: add Illumos scheduled workflow
- CI: netbsd: limit scheduled workflow to parent repo only
- OPTIM: log: resolve logformat options during postparsing
- BUG/MINOR: haproxy: only tid 0 must not sleep if got signal
- REGTEST: add tests for acl() sample fetch
- BUG/MINOR: acl: support built-in ACLs with acl() sample
- BUG/MINOR: cfgparse: use curproxy global var from config post validation
- MEDIUM: stconn/muxes: Add an abort reason for SE shutdowns on muxes
- MINOR: mux-h2: Set the SE abort reason when a RST_STREAM frame is received
- MEDIUM: mux-h2: Forward h2 client cancellations to h2 servers
- MINOR: mux-quic: Set tha SE abort reason when a STOP_SENDING frame is received
- MINOR: stconn: Add samples to retrieve about stream aborts
- MINOR: mux-quic: Add .ctl callback function to get info about a mux connection
- MINOR: muxes: Add ctl commands to get info on streams for a connection
- MINOR: connection: Add samples to retrieve info on streams for a connection
- BUG/MEDIUM: log/ring: broken syslog octet counting
- BUG/MEDIUM: mux-quic: fix crash on STOP_SENDING received without SD
- DOC: lua: fix filters.txt file location
- MINOR: dynbuf: pass a criticality argument to b_alloc()
- MINOR: dynbuf: add functions to help queue/requeue buffer_wait fields
- MINOR: dynbuf: use the b_queue()/b_requeue() functions everywhere
- MEDIUM: dynbuf: make the buffer_wq an array of list heads
- CLEANUP: tinfo: better align fields in thread_ctx
- MINOR: dynbuf: provide a b_dequeue() function to detach a bw from the queue
- MEDIUM: dynbuf: generalize the use of b_dequeue() to detach buffer_wait
- MEDIUM: dynbuf/stream: re-enable queueing upon failed buffer allocation
- MEDIUM: dynbuf/stream: do not allocate the buffers in the callback
- MEDIUM: applet: make appctx_buf_available() only wake the applet up, not allocate
- MINOR: applet: set the blocking flag in the buffer allocation function
- MINOR: applet: adjust the allocation criticity based on the requested buffer
- MINOR: dynbuf/mux-h1: use different criticalities for buffer allocations
- MEDIUM: dynbuf/mux-h1: do not allocate the buffers in the callback
- MEDIUM: dynbuf: refrain from offering a buffer if more critical ones are waiting
- MINOR: stconn: report that a buffer allocation succeeded
- MINOR: stream: report that a buffer allocation succeeded
- MINOR: applet: report about buffer allocation success
- MINOR: mux-h1: report that a buffer allocation succeeded
- MEDIUM: stream: allocate without queuing when retrying
- MEDIUM: channel: allocate without queuing when retrying
- MEDIUM: mux-h1: allocate without queuing when retrying
- MEDIUM: dynbuf: implement emergency buffers
- MEDIUM: dynbuf: use emergency buffers upon failed memory allocations
The buffer reserve set by tune.buffers.reserve has long been unused, and
in order to deal gracefully with failed memory allocations we'll need to
resort to a few emergency buffers that are pre-allocated per thread.
These buffers are only for emergency use, so every time their count is
below the configured number a b_free() will refill them. For this reason
their count can remain pretty low. We changed the default number from 2
to 4 per thread, and the minimum value is now zero (e.g. for low-memory
systems). The tune.buffers.limit setting has always been a problem when
trying to deal with the reserve but now we could simplify it by simply
pushing the limit (if set) to match the reserve. That was already done in
the past with a static value, but now with threads it was a bit trickier,
which is why the per-thread allocators increment the limit on the fly
before allocating their own buffers. This also means that the configured
limit is saner and now corresponds to the regular buffers that can be
allocated on top of emergency buffers.
At the moment these emergency buffers are not used upon allocation
failure. The only reason is to ease bisecting later if needed, since
this commit only has to deal with resource management.
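The refill-on-free principle can be illustrated with the sketch below. It
is a simplified model under stated assumptions: ThreadBuffers is a
hypothetical class, buffers are plain Python objects, and the regular
pool is a simple list. As in this commit, the emergency buffers are only
refilled here and never handed out.

```python
class ThreadBuffers:
    """Hypothetical per-thread view of the emergency buffers."""

    def __init__(self, emergency_target=4):
        self.emergency_target = emergency_target  # configured reserve size
        self.emergency = []                       # pre-allocated spare buffers
        self.pool = []                            # stand-in for the regular pool

    def b_free(self, buf):
        """Release a buffer, refilling the emergency reserve first."""
        if len(self.emergency) < self.emergency_target:
            self.emergency.append(buf)
        else:
            self.pool.append(buf)


# Freeing 6 buffers with a reserve of 4 leaves 2 for the regular pool.
thread = ThreadBuffers(emergency_target=4)
for _ in range(6):
    thread.b_free(bytearray(16))
```

Since every release tops the reserve up first, the reserve refills
quickly after being drained, which is why its size can remain low.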
The goal is to indicate how critical the allocation is, between the
least one (growing an existing buffer ring) and the topmost one (boot
time allocation for the life of the process).
The 3 tcp-based muxes (h1, h2, fcgi) use a common allocation function
to try to allocate, otherwise subscribe. There's currently no distinction
of direction nor part that tries to allocate, and this should be revisited
to improve this situation, particularly when we consider that mux-h2 can
reduce its Tx allocations if needed.
For now, 4 main levels are planned, to translate how the data travels
inside haproxy from a producer to a consumer:
- MUX_RX: buffer used to receive data from the OS
- SE_RX: buffer used to place a transformation of the RX data for
  a mux, or to produce a response for an applet
- CHANNEL: the channel buffer for sync recv
- MUX_TX: buffer used to transfer data from the channel to the outside,
  generally a mux but there can be a few specificities (e.g.
  http client's response buffer passed to the application,
  which also gets a transformation of the channel data).
The other levels are a bit different in that they don't strictly need to
allocate for the first two ones, or they're permanent for the last one
(used by compression).
At the beginning of the filter class section, we encourage the user to
check out filters.txt file to get to know how the filters API works
within haproxy.
However the file location is incorrect. The proper directory to look for
the file is: doc/internals/api.
It should be backported up to 2.5.
Thanks to the previous fix, it is now possible to get the number of opened
streams for a connection and the negotiated limit. Here, corresponding
sample fetches are added, in fc_ and bc_ scopes.
On the frontend side, the limit of streams is imposed by HAProxy. But on the
backend side, the limit is defined by the server. This may be useful for
debugging purposes because it may explain slow-downs on some processing.
It is now possible to retrieve some info about the abort received for a
server or a client stream, if any.
* fs.aborted and bs.aborted can be used to know if an abort was received
on frontend or backend side. A boolean is returned.
* fs.rst_code and bs.rst_code return the code of the received RESET_STREAM
frame for a H2 stream or the code of the received STOP_SENDING frame for
a QUIC stream. In both cases, the error code attached to the frame is
returned. The sample fetch fails if no such frame was received or if the
stream is not an H2/QUIC stream.
Released version 3.0-dev10 with the following main changes :
- BUG/MEDIUM: cache: Vary not working properly on anything other than accept-encoding
- REGTESTS: cache: Add test on 'vary' other than accept-encoding
- BUG/MINOR: stats: replace objt_* by __objt_* macros
- CLEANUP: tools/cbor: rename cbor_encode_ctx struct members
- MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx
- BUG/MINOR: log: fix global lf_expr node options behavior
- CLEANUP: log: add a macro to know if a lf_node is configurable
- MINOR: httpclient: allow to use absolute URI with new flag HC_F_HTTPROXY
- MINOR: ssl: introduce ocsp_update.http_proxy for ocsp-update keyword
- BUG/MINOR: log/encode: consider global options for key encoding
- BUG/MINOR: log/encode: fix potential NULL-dereference in LOGCHAR()
- BUG/MINOR: log: fix global lf_expr node options behavior (2nd try)
- MINOR: log/cbor: _lf_cbor_encode_byte() explicitly requires non-NULL ctx (again)
- BUG/MEDIUM: log: don't ignore disabled node's options
- BUG/MINOR: stconn: don't wake up an applet waiting on buffer allocation
- MINOR: sock: rename sock to sock_fd in sock_create_server_socket
- MEDIUM: proto_uxst: take in account server namespace
- MEIDUM: unix sock: use my_socketat to create bind socket
- MINOR: sock_set_mark: take sock family in account
- MEDIUM: proto: make common fd checks in sock_create_server_socket
- MINOR: sock: add EPERM case in sock_handle_system_err
- MINOR: capabilities: add cap_sys_admin support
- CLEANUP: ssl: clean the includes in ssl_ocsp.c
- CLEANUP: ssl: move the global ocsp-update options parsing to ssl_ocsp.c
- MINOR: stats: fix visual alignment for stat_cols_px definition
- MINOR: stats: convert req_tot as generic column
- MINOR: stats: prepare stats-file support for values other than FN_COUNTER
- MINOR: counters: move freq-ctr from proxy/server into counters struct
- MINOR: stats: support rate in stats-file
- MINOR: stats: convert rate as generic column for proxy stats
- MINOR: counters: move last_change into counters struct
- MINOR: stats: support age in stats-file
- MINOR: stats: convert age as generic column for proxy stat
- CLEANUP: ssl: rename new_ckch_store_load_files_path() to ckch_store_new_load_files_path()
- MINOR: ssl: rename ocsp_update.http_proxy into ocsp-update.httpproxy
- REORG: stats: define stats-proxy source module
- MINOR: stats: extract proxy clear-counter in a dedicated function
- REGTESTS: stats: add test stats-file counters preload
- CI: netbsd: adjust packages after NetBSD-10 released
- CLEANUP: assorted typo fixes in the code and comments
- REGTESTS: replace REQUIRE_VERSION by version_atleast
- MEDIUM: log: optimizing tmp->type handling in sess_build_logline()
- BUG/MINOR: log: prevent double spaces emission in sess_build_logline()
- OPTIM: log: declare empty buffer as global variable
- OPTIM: log: use thread local lf_buildctx to stop pushing it on the stack
- OPTIM: log: use lf_buildctx's buffer instead of temporary stack buffers
- OPTIM: log: speedup date printing in sess_build_logline() when no encoding is used
If the 'namespace' keyword is used in the backend server settings and/or in the
bind string, it means that the haproxy process will call setns() to change its
default namespace to the configured one and then, it will create a
socket in this new namespace. The setns() syscall requires the CAP_SYS_ADMIN
capability in the process Effective set (see man 2 setns). Otherwise, the
process must be run as root.
To avoid running haproxy as root, let's add the cap_sys_admin capability in the
same way as we already added the support for some other network capabilities.
As CAP_SYS_ADMIN belongs to CAP_SYS_* capabilities type, let's add a separate
flag LSTCHK_SYSADM for it. This flag is set, if the 'namespace' keyword was
found during configuration parsing. The flag may be unset only in
prepare_caps_for_setuid() or in prepare_caps_from_permitted_set(), which
inspect process EUID/RUID and Effective and Permitted capabilities sets.
If the system doesn't support Linux capabilities or 'cap_sys_admin' was not set
in 'setcap', but the 'namespace' keyword is present in the configuration, we
keep the previous strict behaviour. A process that has changed its uid to a
non-privileged user will terminate with an alert. This alert invites the user
to recheck their configuration.
When haproxy starts and runs under a non-root user and
'cap_sys_admin' is not set, but the 'namespace' keyword is present, this patch
does not change the previous behaviour either. We still let the user try
their configuration, but we warn that unexpected things, like
socket creation errors, may occur.
The ocsp_update.http_proxy global option allows to set an HTTP proxy
address which will be used to send the OCSP update request with an
absolute form URI.
Released version 3.0-dev9 with the following main changes :
- BUILD: ssl: use %zd for sizeof() in ssl_ckch.c
- MINOR: backend: use be_counters for health down accounting
- BUG/MINOR: backend: use cum_sess counters instead of cum_conn
- BUG/MINOR: stats: fix stot metric for listeners
- REGTESTS: use -dI for insecure fork by default in the regtest scripts
- MINOR: stats: rename proxy stats
- MINOR: stats: rename ambiguous stat_l and stat_count
- MINOR: stats: rename info stats
- MINOR: stats: use stricter naming stats/field/line
- MINOR: stats: use STAT_F_* prefix for flags
- BUG/MEDIUM: applet: Let's applets decide if they have more data to deliver
- BUILD: stick-tables: silence build warnings when threads are disabled
- MINOR: tools: Rename `ha_generate_uuid` to `ha_generate_uuid_v4`
- MINOR: Add `ha_generate_uuid_v7`
- MINOR: Add support for UUIDv7 to the `uuid` sample fetch
- MEDIUM: shctx: Naming shared memory context
- BUG/MINOR: h1: fix detection of upper bytes in the URI
- MINOR: intops: add a pair of functions to check multi-byte ranges
- TESTS: add a unit test for the multi-byte range checks
- CLEANUP: h1: make use of the multi-byte matching functions
- REGTESTS: ssl: Remove "sleep" calls from ocsp auto update test
- BUG/MEDIUM: peers: Automatically start to learn on local peer
- BUG/MEDIUM: peers: Reprocess peer state after all session shutdowns
- MINOR: peers: Remove unused PEERS_F_RESYNC_REQUESTED flag
- MINOR: peers: Don't set TEACH flags on a peer from the sync task
- MINOR: peers: Use a peer flag to block the applet waiting ack of the sync task
- BUG/MEDIUM: peers: Wait for sync task ack when a resynchro is finished
- MINOR: peers: Remove unused PEERS_F_RESYNC_PROCESS flag
- MINOR: applet: Add a function to know the side where an applet was created
- MEDIUM: peers: Simplify the peer flags dealing with the connection state
- MEDIUM: peers: Use true states for the peer applets as seen from outside
- MEDIUM: peers: Use true states for the learn state of a peer
- MINOR: peers: Start learning for local peer before receiving messages
- MINOR: peers: Rename PEERS_F_TEACH_COMPLETE to PEERS_F_LOCAL_TEACH_COMPLETE
- MINOR: peers: Reorder and slightly rename PEER flags
- MINOR: peers: Reorder and rename PEERS flags
- REORG: peers: Move peer and peers flags in the corresponding header file
- DEV: flags/peers: Decode PEER and PEERS flags
- MINOR: peers: Add comment on processing functions of the sync task
- MINOR: peers: Use a static variable to wait a resync on reload
- BUG/MEDIUM: peers: Use atomic operations on peers flags when necessary
- REORG: peers: Rename all occurrences to 'ps' variable
- BUG/MINOR: peers: Don't wait for a remote resync if there no remote peer
- MINOR: stats: update ambiguous "metrics" naming to "stat_cols"
- MINOR: stats: introduce a more expressive stat definition method
- MINOR: stats: implement automatic metric generation from stat_col
- MINOR: stats: hide some columns in output
- MEDIUM: stats: convert counters to new column definition
- MINOR: stats: define stats-file output format support
- MEDIUM: stats: implement dump stats-file CLI
- MINOR: ist: define iststrip() new function
- MINOR: guid: define guid_is_valid_fmt()
- MINOR: stats: apply stats-file on process startup
- MINOR: stats: parse header lines from stats-file
- MINOR: stats: parse values from stats-file
- MEDIUM: stats: define stats-file keyword
- BUG/MINOR: mworker: reintroduce way to disable seamless reload with -x /dev/null
- CLEANUP: log: remove unused checks for encode_{chunk,string}
- MINOR: log: store lf_expr nodes inside substruct
- MINOR: log: global lf_expr node options
- CLEANUP: log: simplify complex values usages in sess_build_logline()
- MINOR: log: skip custom logformat_node name if empty
- MINOR: log: add lf_int() wrapper to print integers
- MINOR: log: add lf_rawtext{_len}() functions
- MEDIUM: log: pass date strings to lf_rawtext()
- MEDIUM: log: write raw strings using lf_rawtext()
- MEDIUM: log: use lf_rawtext for lf_ip() and lf_port() hex strings
- MINOR: log: explicitly handle %ts and %tsc as text strings
- MINOR: log: use LOG_VARTEXT_{START,END} to enclose text strings
- MINOR: log: make all lf_* sess build helper static
- MINOR: log: merge lf_encode_string() and lf_encode_chunk() logic
- MEDIUM: log: lf_* build helpers now take a ctx argument
- MINOR: log: expose node typecast in lf_buildctx struct
- MINOR: log: postpone conversion for sample expressions in sess_build_logline()
- MINOR: log: add LOG_OPT_NONE flag
- MINOR: log: add no_escape_map to bypass escape with _lf_encode_bytes()
- MINOR: log: add +bin logformat node option
- MINOR: log: add +json encoding option
- MINOR: tools: add cbor encode helpers
- MINOR: log: add +cbor encoding option
- MINOR: log: support true cbor binary encoding
- CLEANUP: dynbuf: move the reserve and limit parsers to dynbuf.c
- MINOR: list: add a macro to detect that a list contains at most one element
- MINOR: cli/wait: rename the condition "srv-unused" to "srv-removable"
As previously discussed, "srv-unused" is sufficiently ambiguous to cause
some trouble over the long term. Better use "srv-removable" to indicate
that the server is removable, and if the conditions to delete a server
change over time, the wait condition will be adjusted without renaming
it.
CBOR in hex format as implemented in previous commit is convenient because
the produced output is portable and can easily be embedded in regular
syslog payloads.
However, one of the goals of the CBOR implementation is to be able to produce
"Concise Binary" object representation. Here is an excerpt from cbor.io
website:
"Some applications also benefit from CBOR itself being encoded in
binary. This saves bulk and allows faster processing."
Currently we don't offer that with '+cbor', quite the opposite actually
since a text string encoded with '+cbor' option will be larger than a
text string encoded with '+json' or without encoding at all, because for
each CBOR binary byte, 2 characters will be emitted.
Fortunately, the sink/log API allows for binary data to be passed as
parameter, because all relevant functions in the chain don't rely
on the terminating NUL byte and take a string pointer + string length as
parameter. We can actually rely on this property to support the '+bin'
option when combined with '+cbor' to produce RAW binary CBOR output.
Be careful though, as this is only intended for use with set-var-fmt or to
send binary data to capable UDP/ring endpoints.
Example:
log-format "%{+cbor,+bin}o %(test)[bin(00AABB)]"
Will produce:
bf64746573745f4300aabbffff
(output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)
With cbor.me pretty printer, it gives us:
BF # map(*)
64 # text(4)
74657374 # "test"
5F # bytes(*)
43 # bytes(3)
00AABB # "\u0000\xAA\xBB"
FF # primitive(*)
FF # primitive(*)
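The byte sequence above can be rebuilt by hand with the minimal sketch
below, which also shows why the hex form is twice as large as the raw
one. It is not the haproxy encoder: the helper names are hypothetical and
only lengths below 24 are supported, which is enough for this example.

```python
def cbor_head(major, length):
    """Encode a CBOR head for a major type and a length below 24."""
    if not 0 <= length < 24:
        raise ValueError("this sketch only supports lengths below 24")
    return bytes([(major << 5) | length])


def cbor_text(text):
    """Definite-length text string (major type 3)."""
    data = text.encode("utf-8")
    return cbor_head(3, len(data)) + data


def cbor_indefinite_bytes(chunk):
    """Indefinite-length byte string (0x5f) holding one chunk, then break."""
    return b"\x5f" + cbor_head(2, len(chunk)) + chunk + b"\xff"


def cbor_indefinite_map(pairs):
    """Indefinite-length map (0xbf) of text keys to pre-encoded values."""
    out = b"\xbf"
    for key, encoded_value in pairs:
        out += cbor_text(key) + encoded_value
    return out + b"\xff"


payload = cbor_indefinite_map(
    [("test", cbor_indefinite_bytes(bytes.fromhex("00AABB")))])
print(payload.hex())  # hex form, two characters per raw byte
print(len(payload), len(payload.hex()))
```

The raw payload is 13 bytes long while its hex representation needs 26
characters, which is the overhead '+bin' avoids.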
In this patch, we make use of the CBOR (RFC8949) encode helper functions
from the previous commit to implement '+cbor' encoding option for log-
formats. The logic behind it is pretty similar to '+json' encoding option,
except that the produced output is a CBOR payload written in HEX format so
that it remains compatible to use this with regular syslog endpoints.
Example:
log-format "%{+cbor}o %[int(4)] test %(named_field)[str(ok)]"
Will produce:
BF6B6E616D65645F6669656C64626F6BFF
Detailed view (from cbor.me):
BF # map(*)
6B # text(11)
6E616D65645F6669656C64 # "named_field"
62 # text(2)
6F6B # "ok"
FF # primitive(*)
If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to CBOR specification.
Example:
log-format "test cbor bool: %{+cbor}[bool(true)]"
Will produce:
test cbor bool: F5
In this patch, we add the "+json" log format option that can be set
globally or per log format node.
What it does is set the LOG_OPT_ENCODE_JSON flag for the
current context, which is provided to all lf_* log building functions.
This way, all lf_* are now aware of this option and try to comply with
JSON specification when the option is set.
If the option is set globally, then sess_build_logline() will produce a
map-like object with key=val pairs for named logformat nodes.
(logformat nodes that don't have a name are simply ignored).
Example:
log-format "%{+json}o %[int(4)] test %(named_field)[str(ok)]"
Will produce:
{"named_field": "ok"}
If the option isn't set globally, but on a specific node instead, then
only the value will be encoded according to JSON specification.
Example:
log-format "{ \"manual_key\": %(named_field){+json}[bool(true)] }"
Will produce:
{"manual_key": true}
When the option is set, +E option will be ignored, and partial numerical
values (ie: because of logasap) will be encoded as-is.
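The global behavior (only named nodes are kept and emitted as key/value
pairs) can be mimicked with the sketch below. It relies on Python's json
module and not on the lf_* helpers; build_json_log() and the way nodes
are represented are assumptions made for the illustration.

```python
import json


def build_json_log(nodes):
    """Build a JSON object from (name, value) nodes, ignoring unnamed ones."""
    return json.dumps({name: value for name, value in nodes if name})


# Equivalent of: "%{+json}o %[int(4)] test %(named_field)[str(ok)]"
line = build_json_log([(None, 4), (None, "test"), ("named_field", "ok")])
print(line)
```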
Support '+bin' option argument on logformat nodes to try to preserve
binary output type with binary sample expressions.
For this, we rely on the log/sink API which is capable of conveying binary
data since all related functions don't search for a terminating NULL byte
in provided log payload as they take a string pointer and a string length
as argument.
Example:
log-format "%{+bin}o %[bin(00AABB)]"
Will produce:
00aabb
(output was piped to `hexdump -ve '1/1 "%.2x"'` to dump raw bytes as HEX
characters)
This should be used carefully, because many syslog endpoints don't expect
binary data (especially NULL bytes). This is mainly intended for use with
set-var-fmt actions or with ring/udp log endpoints that know how to deal
with such binary payloads.
Also, this option is only supported globally (for use with '%o'), it will
not have any effect when set on an individual node. (it makes no sense to
have binary data in the middle of log payload that was started without
binary data option)
Since the introduction of the automatic seamless reload using the
internal socketpair, there is no way of disabling the seamless reload.
Previously we just needed to remove -x from the startup command line,
and remove any "expose-fd" keyword on stats socket lines.
This was introduced in 2be557f7c ("MEDIUM: mworker: seamless reload use
the internal sockpairs").
The patch copies /dev/null again and passes it to the next exec so we never
try to get sockets from the -x.
Must be backported as far as 2.6.
This commit is the final one to implement preloading of haproxy internal
counters via stats-file parsing.
Define a global keyword "stats-file". It allows to specify the path to
the stats-file which will be parsed on process startup.
Define a new CLI command "dump stats-file" with its handler
cli_parse_dump_stat_file(). It will loop twice on proxies_list to dump
first frontend and then backend side. It reuses the common function
stats_dump_stat_to_buffer(), using STAT_F_BOUND to restrict on the
correct side.
A new module stats-file.c is added to regroup function specifics to
stats-file. It defines two main functions :
* stats_dump_file_header() to generate the list of column list prefixed
by the line context, either "#fe" or "#be"
* stats_dump_fields_file() to generate each stat lines. Object without
GUID are skipped. Each stat entry is separated by a comma.
For the moment, stats-file does not support statistics modules. As such,
stats_dump_*_line() functions are updated to prevent looping over stats
module on stats-file output.
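The line generation described above (skip objects without a GUID, separate entries with commas) can be sketched as follows. This is not the code of stats_dump_fields_file(): dump_stat_line() is a hypothetical helper, and the assumption that the GUID comes first on the line is made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch, not HAProxy's stats_dump_fields_file(): formats one
 * stat line as "<guid>,<v0>,<v1>,..." into <buf>. Returns the number of
 * characters written, 0 if the object has no GUID (the line is skipped),
 * or -1 if <buf> is too small.
 */
static int dump_stat_line(char *buf, size_t size, const char *guid,
                          const unsigned long long *vals, size_t nvals)
{
	size_t len, i;
	int ret;

	assert(buf != NULL && size > 0);
	if (!guid || !*guid)
		return 0; /* objects without a GUID are skipped */

	ret = snprintf(buf, size, "%s", guid);
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	len = (size_t)ret;

	for (i = 0; i < nvals; i++) {
		/* each stat entry is separated from the previous one by a comma */
		ret = snprintf(buf + len, size - len, ",%llu", vals[i]);
		if (ret < 0 || (size_t)ret >= size - len)
			return -1;
		len += (size_t)ret;
	}
	return (int)len;
}
```

Returning 0 for a missing GUID lets the caller tell "nothing to emit" apart from a real error without an extra output parameter.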
Released version 3.0-dev8 with the following main changes :
- BUG/MINOR: cli: Don't warn about a too big command for incomplete commands
- BUG/MINOR: listener: always assign distinct IDs to shards
- BUG/MINOR: log: fix lf_text_len() truncate inconsistency
- BUG/MINOR: tools/log: invalid encode_{chunk,string} usage
- BUG/MINOR: log: invalid snprintf() usage in sess_build_logline()
- CLEANUP: log: lf_text_len() returns a pointer not an integer
- MINOR: quic: simplify qc_send_hdshk_pkts() return
- MINOR: quic: uniformize sending methods for handshake
- MINOR: quic: improve sending API on retransmit
- MINOR: quic: use qc_send_hdshk_pkts() in handshake IO cb
- MEDIUM: quic: remove duplicate hdshk/app send functions
- OPTIM: quic: do not call qc_send() if nothing to emit
- OPTIM: quic: do not call qc_prep_pkts() if everything sent
- BUG/MEDIUM: http-ana: Deliver 502 on keep-alive for fressh server connection
- BUG/MINOR: http-ana: Fix TX_L7_RETRY and TX_D_L7_RETRY values
- BUILD: makefile: warn about unknown USE_* variables
- BUILD: makefile: support USE_xxx=0 as well
- BUG/MINOR: guid: fix crash on invalid guid name
- BUILD: atomic: fix peers build regression on gcc < 4.7 after recent changes
- BUG/MINOR: debug: make sure DEBUG_STRICT=0 does work as documented
- BUILD: cache: fix non-inline vs inline declaration mismatch to silence a warning
- BUILD: debug: make DEBUG_STRICT=1 the default
- BUILD: pools: make DEBUG_MEMORY_POOLS=1 the default option
- CI: update the build options to get rid of unneeded DEBUG options
- BUILD: makefile: get rid of the config CFLAGS variable
- BUILD: makefile: allow to use CFLAGS to append build options
- BUILD: makefile: drop the SMALL_OPTS settings
- BUILD: makefile: move -O2 from CPU_CFLAGS to OPT_CFLAGS
- BUILD: makefile: get rid of the CPU variable
- BUILD: makefile: drop the ARCH variable and better document ARCH_FLAGS
- BUILD: makefile: extract ARCH_FLAGS out of LDFLAGS
- BUILD: makefile: move the fwrapv option to STD_CFLAGS
- BUILD: makefile: make the ERR variable also support 0
- BUILD: makefile: add FAILFAST to select the -Wfatal-errors behavior
- BUILD: makefile: extract -Werror/-Wfatal-errors from automatic CFLAGS
- BUILD: makefile: split WARN_CFLAGS from SPEC_CFLAGS
- BUILD: makefile: rename SPEC_CFLAGS to NOWARN_CFLAGS
- BUILD: makefile: do not pass warnings to VERBOSE_CFLAGS
- BUILD: makefile: also drop DEBUG_CFLAGS
- CLEANUP: makefile: make the output of the "opts" target more readable
- DOC: install: clarify the build process by splitting it into subsections
- BUG/MINOR: server: fix slowstart behavior
- BUG/MEDIUM: cache/stats: Handle inbuf allocation failure in the I/O handler
- MINOR: ssl: add the section parser for 'crt-store'
- DOC: configuration: Add 3.12 Certificate Storage
- REGTESTS: ssl: test simple case of crt-store
- MINOR: ssl: rename ckchs_load_cert_file to new_ckch_store_load_files_path
- MINOR: ssl/crtlist: alloc ssl_conf only when a valid keyword is found
- BUG/MEDIUM: stick-tables: fix the task's next expiration date
- CLEANUP: stick-tables: always respect the to_batch limit when trashing
- BUG/MEDIUM: peers/trace: fix crash when listing event types
- BUG/MAJOR: stick-tables: fix race with peers in entry expiration
- DEBUG: pool: improve decoding of corrupted pools
- REORG: pool: move the area dump with symbol resolution to tools.c
- DEBUG: pools: report the data around the offending area in case of mismatch
- MINOR: listener/protocol: add proto name in alerts
- MINOR: proto_quic: add proto name in alert
- BUG/MINOR: lru: fix the standalone test case for invalid revision
- DOC: management: fix typos
- CI: revert kernel addr randomization introduced in 3a0fc864
- MINOR: ring: clarify the usage of ring_size() and add ring_allocated_size()
- BUG/MAJOR: ring: use the correct size to reallocate startup_logs
- MINOR: ring: always check that the old ring fits in the new one in ring_dup()
- CLEANUP: ssl: remove dead code in cfg_parse_crtstore()
- MINOR: ssl: supports crt-base in crt-store
- MINOR: ssl: 'key-base' allows to load a 'key' from a specific path
- MINOR: net_helper: Add support for floats/doubles.
- BUG/MEDIUM: grpc: Fix several unaligned 32/64 bits accesses
- MINOR: peers: Split resync process function to separate running/stopping states
- MINOR: peers: Add 2 peer flags about the peer learn status
- MINOR: peers: Add flags to report the peer state to the resync task
- MINOR: peers: sligthly adapt part processing the stopping signal
- MINOR: peers: Add functions to commit peer changes from the resync task
- BUG/MINOR: peers: Report a resync was explicitly requested from a thread-safe manner
- BUG/MAJOR: peers: Update peers section state from a thread-safe manner
- MEDIUM: peers: Only lock one peer at a time in the sync process function
- MINOR: peer: Restore previous peer flags value to ease debugging
- BUG/MEDIUM: stconn: Don't forward channel data if input data must be filtered
- BUILD: cache: fix a build warning with gcc < 7
- BUILD: xxhash: silence a build warning on Solaris + gcc-5.5
- CI: reduce ASAN log redirection umbrella size
- CLEANUP: assorted typo fixes in the code and comments
- BUG/MEDIUM: evports: do not clear returned events list on signal
- MEDIUM: evports: permit to report multiple events at once
- MEDIUM: ssl: support aliases in crt-store
- BUG/MINOR: ssl: check on forbidden character on wrong value
- BUG/MINOR: ssl: fix crt-store load parsing
- BUG/MEDIUM: applet: Fix applet API to put input data in a buffer
- BUG/MEDIUM: spoe: Always retry when an applet fails to send a frame
- BUG/MEDIUM: peers: Fix exit condition when max-updates-at-once is reached
- BUILD: linuxcap: Properly declare prepare_caps_from_permitted_set()
- BUG/MEDIUM: peers: fix localpeer regression with 'bind+server' config style
- MINOR: peers: stop relying on srv->addr to find peer port
- MEDIUM: ssl: support a named crt-store section
- MINOR: stats: remove implicit static trash_chunk usage
- REORG: stats: extract HTML related functions
- REORG: stats: extract JSON related functions
- MEDIUM: ssl: crt-base and key-base local keywords for crt-store
- MINOR: stats: Get the right prototype for stats_dump_html_end().
- MAJOR: ssl: use the msg callback mecanism for backend connections
- MINOR: ssl: implement keylog fetches for backend connections
- BUG/MINOR: stconn: Fix sc_mux_strm() return value
- MINOR: mux-pt: Test conn flags instead of sedesc ones to perform a full close
- MINOR: stconn/connection: Move shut modes at the SE descriptor level
- MINOR: stconn: Rewrite shutdown functions to simplify the switch statements
- MEDIUM: stconn: Use only one SC function to shut connection endpoints
- MEDIUM: stconn: Explicitly pass shut modes to shut applet endpoints
- MEDIUM: stconn: Use one function to shut connection and applet endpoints
- MEDIUM: muxes: Use one callback function to shut a mux stream
- BUG/MINOR: sock: handle a weird condition with connect()
- BUG/MINOR: fd: my_closefrom() on Linux could skip contiguous series of sockets
- BUG/MEDIUM: peers: Don't set PEERS_F_RESYNC_PROCESS flag on a peer
- BUG/MEDIUM: peers: Fix state transitions of a peer
- MINOR: init: use RLIMIT_DATA instead of RLIMIT_AS
- CI: modernize macos matrix
Limiting the total allocatable process memory (VSZ) by setting RLIMIT_AS is
no longer an effective way to restrict memory consumption at run time.
We can see from the process memory map below that there are many holes within
the process VA space, which bump its VSZ to 1.5G. These holes exist for
many reasons; the first explanation is the full randomization of the
system VA space, which is now usually enabled by default in Linux kernels.
There are always gaps around the process stack area to trap overflows. Holes
before and after shared libraries can be explained by the fact that on many
architectures libraries have a 'preferred' address to be loaded at; putting
them elsewhere requires relocation work, and probably some unshared pages.
The repeated holes of 65380K most probably correspond to the header that
malloc has to allocate ahead of a requested memory block. This header is
used by malloc to link allocated chunks together and for its internal
bookkeeping.
$ sudo pmap -x -p `pidof haproxy`
127136: ./haproxy -f /home/haproxy/haproxy/haproxy_h2.cfg
Address Kbytes RSS Dirty Mode Mapping
0000555555554000 388 64 0 r---- /home/haproxy/haproxy/haproxy
00005555555b5000 2608 1216 0 r-x-- /home/haproxy/haproxy/haproxy
0000555555841000 916 64 0 r---- /home/haproxy/haproxy/haproxy
0000555555926000 60 60 60 r---- /home/haproxy/haproxy/haproxy
0000555555935000 116 116 116 rw--- /home/haproxy/haproxy/haproxy
0000555555952000 7872 5236 5236 rw--- [ anon ]
00007fff98000000 156 36 36 rw--- [ anon ]
00007fff98027000 65380 0 0 ----- [ anon ]
00007fffa0000000 156 36 36 rw--- [ anon ]
00007fffa0027000 65380 0 0 ----- [ anon ]
00007fffa4000000 156 36 36 rw--- [ anon ]
00007fffa4027000 65380 0 0 ----- [ anon ]
00007fffa8000000 156 36 36 rw--- [ anon ]
00007fffa8027000 65380 0 0 ----- [ anon ]
00007fffac000000 156 36 36 rw--- [ anon ]
00007fffac027000 65380 0 0 ----- [ anon ]
00007fffb0000000 156 36 36 rw--- [ anon ]
00007fffb0027000 65380 0 0 ----- [ anon ]
...
00007ffff7fce000 4 4 0 r-x-- [ anon ]
00007ffff7fcf000 4 4 0 r---- /usr/lib/x86_64-linux-gnu/ld-2.31.so
00007ffff7fd0000 140 140 0 r-x-- /usr/lib/x86_64-linux-gnu/ld-2.31.so
...
00007ffff7ffe000 4 4 4 rw--- [ anon ]
00007ffffffde000 132 20 20 rw--- [ stack ]
ffffffffff600000 4 0 0 --x-- [ anon ]
---------------- ------- ------- -------
total kB 1499288 75504 72760
This inflated VSZ makes it impossible to start an haproxy process with a 200M
memory limit, set at its initialization stage as RLIMIT_AS. In this case we
usually get cryptic output like this on stderr:
$ haproxy -m 200 -f haproxy_quic.cfg
(null)(null)(null)(null)(null)(null)
At the same time the process RSS (the memory actually in use) is only 75.5M.
So to make process memory accounting more realistic, let's base the memory
limit set by the -m option on RSS measurement and use RLIMIT_DATA instead
of RLIMIT_AS.
RLIMIT_AS was used before because earlier versions of haproxy always allocated
memory buffers for new connections, but data were not written there
immediately. So these buffers were not instantly counted in RSS, but were
always counted in VSZ. Now we allocate new buffers only when we are about to
write data into them immediately, so using RLIMIT_DATA becomes more
appropriate.
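As a minimal sketch of the change, assuming the limit is given in megabytes as with -m, the limit can be applied to RLIMIT_DATA as shown here. mb_to_rlimit() and apply_data_limit() are hypothetical names used for illustration and are not the functions from haproxy's init code.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: converts a limit in megabytes (as given to -m) into
 * an rlimit structure with identical soft and hard values.
 */
static struct rlimit mb_to_rlimit(unsigned int mb)
{
	struct rlimit lim;

	assert(mb > 0);
	lim.rlim_cur = lim.rlim_max = (rlim_t)mb * 1024 * 1024;
	return lim;
}

/* Hypothetical helper: applies the limit to RLIMIT_DATA rather than
 * RLIMIT_AS, so that unused holes in the address space are not counted.
 * Returns the setrlimit() result (0 on success, -1 on error). Note that
 * lowering the hard limit cannot be undone by an unprivileged process.
 */
int apply_data_limit(unsigned int mb)
{
	struct rlimit lim = mb_to_rlimit(mb);

	return setrlimit(RLIMIT_DATA, &lim);
}
```

With -m 200 this yields a limit of 209715200 bytes, which the 1.5G VSZ shown above exceeds under RLIMIT_AS while the 75.5M RSS stays well below it.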
This patch implements the backend side of the keylog fetches.
The code was ready but needed the SSL message callbacks.
This could be used like this:
log-format "CLIENT_EARLY_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_client_early_traffic_secret]\n
CLIENT_HANDSHAKE_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_client_handshake_traffic_secret]\n
SERVER_HANDSHAKE_TRAFFIC_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_server_handshake_traffic_secret]\n
CLIENT_TRAFFIC_SECRET_0 %[ssl_bc_client_random,hex] %[ssl_bc_client_traffic_secret_0]\n
SERVER_TRAFFIC_SECRET_0 %[ssl_bc_client_random,hex] %[ssl_bc_server_traffic_secret_0]\n
EXPORTER_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_exporter_secret]\n
EARLY_EXPORTER_SECRET %[ssl_bc_client_random,hex] %[ssl_bc_early_exporter_secret]"
Add support for crt-base and key-base local keywords for the crt-store.
current_crtbase and current_keybase are filled with a copy of the global
keyword argument when a crt-store is declared, and updated with a new
path when the keywords are in the crt-store section.
The ckch_conf_kws[] array was updated with &current_crtbase and
&current_keybase instead of the global_ssl ones so the parser can use
them.
These keywords must be used before any "load" line in a crt-store section.
Example:
    crt-store web
        crt-base /etc/ssl/certs/
        key-base /etc/ssl/private/
        load crt "site3.crt" alias "site3"
        load crt "site4.crt" key "site4.key"

    frontend in2
        bind *:443 ssl crt "@web/site3" crt "@web/site4.crt"
This patch introduces the named crt-store section. A named crt-store adds
a scope to the crt name.
For example, a crt named "foo.crt" in a crt-store named "web" will
result in a certificate called "@web/foo.crt".
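The scoping rule can be sketched as a simple string composition. scoped_crt_name() is a hypothetical helper written for illustration; it is not how the ckch code builds its keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: builds "@<store>/<crt>" into <buf>.
 * Returns the length of the resulting name, or -1 if <buf> is too small.
 */
static int scoped_crt_name(char *buf, size_t size,
                           const char *store, const char *crt)
{
	int len;

	assert(buf != NULL && store != NULL && crt != NULL);
	len = snprintf(buf, size, "@%s/%s", store, crt);
	if (len < 0 || (size_t)len >= size)
		return -1;
	return len;
}
```

Calling it with the store "web" and the crt "foo.crt" produces "@web/foo.crt", the name given in the example above.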
The crt-store load line now accepts an alias. This alias is used
as the key in the ckch_tree instead of the certificate. This way an
alias can be referenced in the configuration with the '@/' prefix.
This can only be defined within a crt-store.
The global 'key-base' keyword allows the 'key' parameter of a
crt-store load line to be read using a path prefix.
This is the equivalent of the 'crt-base' keyword but for 'key'.
It only applies to crt-store.
Released version 3.0-dev7 with the following main changes :
- BUG/MINOR: ssl: Wrong ocsp-update "incompatibility" error message
- BUG/MINOR: ssl: Detect more 'ocsp-update' incompatibilities
- MEDIUM: ssl: Add 'tune.ssl.ocsp-update.mode' global option
- REGTESTS: ssl: Add OCSP update compatibility tests
- REGTESTS: ssl: Add functional test for global ocsp-update option
- BUG/MINOR: server: reject enabled for dynamic server
- BUG/MINOR: server: fix persistence cookie for dynamic servers
- MINOR: server: allow cookie for dynamic servers
- REGTESTS: Fix script about OCSP update compatibility tests
- BUG/MINOR: cli: Report an error to user if command or payload is too big
- MINOR: sc_strm: Add generic version to perform sync receives and sends
- MEDIUM: stream: Use generic version to perform sync receives and sends
- MEDIUM: buf: Add b_getline() and b_getdelim() functions
- MEDIUM: applet: Handle applets with their own buffers in put functions
- MEDIUM: cli/applet: Stop to test opposite SC in I/O handler of CLI commands
- MINOR: applet: Always use applet API to set appctx flags
- BUG/MEDIUM: applet: State appctx have more data if its EOI/EOS/ERROR flag is set
- MAJOR: cli: Update the CLI applet to handle its own buffers
- MINOR: applet: Let's applets .snd_buf function deal with full input buffers
- MINOR: stconn: Add a connection flag to notify sending data are the last ones
- MAJOR: cli: Use a custom .snd_buf function to only copy the current command
- DOC: config: balance 'first' not usable in LOG mode
- BUG/MINOR: log/balance: detect if user tries to use unsupported algo
- MINOR: lbprm: implement true "sticky" balance algo
- MEDIUM: log/balance: leverage lbprm api for log load-balancing
- BUG/BUILD: debug: fix unused variable error
- MEDIUM: lb-chash: Deterministic node hashes based on server address
- BUG/MEDIUM: stick-tables: fix a small remaining race in expiration task
- REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (4)
- REGTESTS: Remove REQUIRE_VERSION=1.9 from all tests (2)
- CLEANUP: Reapply ist.cocci (3)
- CLEANUP: Reapply strcmp.cocci (2)
- CLEANUP: Reapply xalloc_cast.cocci
- CLEANUP: Reapply ha_free.cocci
- CI: vtest: show coredumps if any
- REGTESTS: ssl: disable ssl/ocsp_auto_update.vtc
- BUG/MINOR: backend: properly handle redispatch 0
- MINOR: quic: HyStart++ implementation (RFC 9406)
- BUG/MEDIUM: stconn: Don't forward shutdown to SE if iobuf is not empty
- BUG/MEDIUM: stick-table: use the update lock when reading tables from peers
- BUG/MAJOR: applet: fix a MIN vs MAX usage in appctx_raw_rcv_buf()
- OPTIM: peers: avoid the locking dance around peer_send_teach_process_msgs()
- BUILD: quic: 32 bits compilation issue (QUIC_MIN() usage)
- BUG/MEDIUM: server/lbprm: fix crash in _srv_set_inetaddr_port()
- MEDIUM: mworker: get rid of libsystemd
- BUILD: systemd: fix build error on non-systemd systems with USE_SYSTEMD=1
- BUG/MINOR: bwlim/config: fix missing '\n' after error messages
- MINOR: stick-tables: mark the seen stksess with a flag "seen"
- OPTIM: stick-tables: check the stksess without taking the read lock
- MAJOR: stktable: split the keys across multiple shards to reduce contention
- CI: extend Fedora Rawhide, add m32 mode
- BUG/MINOR: stick-tables: Missing stick-table key nullity check
- BUILD: systemd: enable USE_SYSTEMD by default with TARGET=linux-glibc
- MINOR: systemd: Include MONOTONIC_USEC field in RELOADING=1 message
- BUG/MINOR: proxy: fix logformat expression leak in use_backend rules
- MEDIUM: log: rename logformat var to logformat tag
- MINOR: log: expose logformat_tag struct
- MEDIUM: log: carry tag context in logformat node
- MEDIUM: tree-wide: add logformat expressions wrapper
- MINOR: proxy: add PR_FL_CHECKED flag
- MAJOR: log: implement proper postparsing for logformat expressions
- MEDIUM: log: add compiling logic to logformat expressions
- MEDIUM: proxy/log: leverage lf_expr API for logformat preparsing
- MINOR: guid: introduce global UID module
- MINOR: guid: restrict guid format
- MINOR: proxy: implement GUID support
- MINOR: server: implement GUID support
- MINOR: listener: implement GUID support
- DOC: configuration: grammar fixes for strict-sni
- BUG/MINOR: init: relax LSTCHK_NETADM checks for non root
- MEDIUM: capabilities: check process capabilities sets
- CLEANUP: global: remove LSTCHK_CAP_BIND
- BUG/MEDIUM: quic: don't blindly rely on unaligned accesses
Since the addition of Linux capabilities support (see commit bd84387beb
("MEDIUM: capabilities: enable support for Linux capabilities")), we can also
check the haproxy process effective and permitted capability sets when it
starts and runs as non-root.
This way, if the needed network capabilities are present only in the process
permitted set, we can get this information with capget and put them in the
process effective set via capset. To do this properly, let's introduce
prepare_caps_from_permitted_set().
First, it checks whether the binary effective set has CAP_NET_ADMIN or
CAP_NET_RAW. If there is a match, LSTCHK_NETADM is removed from the
global.last_checks list to avoid a warning, because in the initialization
sequence some last configuration checks are based on the LSTCHK_NETADM flag
and the haproxy process euid may stay unprivileged.
If neither CAP_NET_ADMIN nor CAP_NET_RAW is in the effective set, the
permitted set is checked and only the capabilities given in the 'setcap'
keyword are promoted to the process effective set. LSTCHK_NETADM is also
removed in this case, for the same reason. In order to be transparent, we
promote from the permitted set only the capabilities given by the user in the
'setcap' keyword. So, if caplist doesn't include CAP_NET_ADMIN or CAP_NET_RAW,
LSTCHK_NETADM is not unset and a warning about missing privileges is emitted
at initialization.
It needs to be called before protocol_bind_all() to allow binding to
privileged ports as non-root; 'setcap cap_net_bind_service' must be set in the
global section in this case.