tcpcheck_ruleset struct was extended to host a config part that will be used
for healthcheck sections. This config part is mainly used to store element
for the server's tcpcheck part.
When a healthcheck section is parsed, a ruleset is created with its name
(which must be unique). "*healthcheck-{NAME}" is used for these ruleset. So
it is not possible to mix them with regular rulesets.
For now, in a healthcheck section, the type must be defined, based on the
options name (tcp-check, httpchk, redis-check...). In addition, several
"tcp-check" or "http-check" rules can be specified, depending on the
healthcheck type.
The proxy flag PR_O_TCPCHK_SSL is replaced by a flag on the tcpcheck
itself. When TCPCHK_FL_USE_SSL flag is set, it means the healthcheck will
use an SSL connection and the SSL xprt must be prepared for the server.
disable-on-404 and send-state options, configured on an HTTP healtcheck,
were handled as proxy options. Now, these options are handled in the
tcp-check itself. So the corresponding PR_O and PR_02 flags are removed.
The tcpcheck_rules structure is replaced by the tcpcheck structure. The main
difference is that the ruleset is now referenced in the tcpcheck structure,
instead of the rules list. The flags about the ruleset type are moved into
the ruleset structure and flags to track unused rules remains on the
tcpcheck structure. So it should be easier to track unused rulesets. But it
should be possible to configure a set of tcpcheck rules outside of the proxy
scope.
The main idea of these changes is to prepare the parsing of a new
healthcheck section. So this patch is quite huge, but it is mainly about
renaming some fields.
Add keylog_format_fc and keylog_format_bc global variables containing
the SSLKEYLOGFILE log-format strings for the frontend (client-facing)
and backend (server-facing) TLS connections respectively. These produce
output compatible with the SSLKEYLOGFILE format described at:
https://tlswg.org/sslkeylogfile/draft-ietf-tls-keylogfile.html
Both formats are also exported as environment variables at startup:
HAPROXY_KEYLOG_FC_LOG_FMT
HAPROXY_KEYLOG_BC_LOG_FMT
These variables contains \n so they might not be compatible with syslog
servers, using them with stderr or a sink might be required.
These can be referenced directly in "log-format" directives to produce
SSLKEYLOGFILE-compatible output, usable by network analyzers such as
Wireshark to decrypt captured TLS traffic.
Check that last_sess actually changed before attempting to set it, as it
should only change once every second, that will avoid a lot of atomic
writes on a busy cache line.
Check that last_sess actually changed before attempting to set it, as it
should only change once every second, that will avoid a lot of atomic
writes on a busy cache line.
Reverse the default, to hide the version from stats by default, and add
a new keyword, "stats show-version", to enable them, as we don't want to
disclose the version by default, especially on public websites.
These actions allow setting, adding and deleting multiple headers from
the same action, without having to know the header names during parsing.
This is useful when doing things with SPOE.
The previous patch implemented the 'dns-check' option. This one replaces
it by a more generic 'challenge-ready' option, which allows the user to
chose the condition to validate the readiness of a challenge. It could
be 'cli', 'dns' or both.
When in dns-01 mode it's by default to 'cli' so the external tool used to
configure the TXT record can validate itself. If the tool does not
validate the TXT record, you can use 'cli,dns' so a DNS check would be
done after the CLI validated with 'challenge_ready'.
For an automated validation of the challenge, it should be set to 'dns',
this would check that the TXT record is right by itself.
When using the dns-01 challenge type, TXT record propagation across
DNS servers can take time. If the ACME server verifies the challenge
before the record is visible, the challenge fails and it's not possible
to trigger it again.
This patch introduces an optional DNS pre-check mechanism controlled
by two new configuration directives in the "acme" section:
- "dns-check on|off": enable DNS propagation verification before
notifying the ACME server (default: off)
- "dns-delay <time>": delay before querying DNS (default: 300s)
When enabled, three new states are inserted in the state machine
between AUTH and CHALLENGE:
- ACME_RSLV_WAIT: waits dns-delay seconds before starting
- ACME_RSLV_TRIGGER: starts an async TXT resolution for each
pending authorization using HAProxy's resolver infrastructure
- ACME_RSLV_READY: compares the resolved TXT record against the
expected token; retries from ACME_RSLV_WAIT if any record is
missing or does not match
The "acme_rslv" structure is implemented in acme_resolvers.c, it holds
the resolution for each domain. The "auth" structure which contains each
challenge to resolve contains an "acme_rslv" structure. Once
ACME_RSLV_TRIGGER leaves, the DNS tasks run on the same thread, and the
last DNS task which finishes will wake up acme_process().
Note that the resolution goes through the configured resolvers, not
through the authoritative name servers of the domain. The result may
therefore still be affected by DNS caching at the resolver level.
This patch adds support for TXT records. It allows to get the first
string of a TXT-record which is limited to 255 characters.
The rest of the record is ignored.
In order to improve our ability to distinguish operations that had
already started from others under high loads, it would be nice to know
if an application layer (stream) has started to work with an endpoint
or not. The use case typically is a frontend mux instantiating a stream
to instantly cancel it. Currently this info will take some time to be
detected and processed if the applcation's task takes time to wake up.
By flagging the sedesc with SE_FL_APP_STARTED the first time a the app
layer starts, the lower layers can know whether they're cancelling a
stream that has started to work or not, and act accordingly. For now
this is done unconditionally on the backend, and performed early in the
only two app layers that can be reached by a frontend: process_stream()
and process_hstream() (for haterm).
The SC created from a healthcheck is always a back SC. But SC_FL_ISBACK
flags was missing. Instead of passing it when sc_new_from_check() is called,
the function was simplified to set SC_FL_ISBACK flag systematically when a
SC is created from a healthcheck.
This patch should be backported as far as 2.6.
If server returns an auth with status valid it seems that client
needs to always skip it, CA can recycle authorizations, without
this change haproxy fails to obtain certificates in that case.
It is also something that is explicitly allowed and stated
in the dns-persist-01 draft RFC.
Note that it would be better to change how haproxy does status polling,
and implements the state machine, but that will take some thought
and time, this patch is a quick fix of the problem.
See:
https://github.com/letsencrypt/boulder/issues/2125https://github.com/letsencrypt/pebble/issues/133
This must be backported to 3.2 and later.
When picking a mux, pay attention to its MX_FL_FRAMED. If it is set,
then it means we explicitely want QUIC, so don't use that mux for any
protocol that is not QUIC.
When parsing the check address, store the associated proto too.
That way we can use the notation like quic4@address, and the right
protocol will be used. It is possible for checks to use a different
protocol than the server, ie we can have a QUIC server but want to run
TCP checks, so we can't just reuse whatever the server uses.
WIP: store the protocol in checks
Thanks to previous commits, it is possible to use small buffers at different
places: to store the request when a connection is queued or when L7 retries
are enabled, or for health-checks requests. However, there was no
configuration parameter to fine tune small buffer use.
It is now possible, thanks to the proxy option "use-small-buffers".
Documentation was updated accordingly.
When healthchecks were configured for a proxy, an enum-like was used to
sepcify the check's type. The idea was to reserve some values for futur
types of healthcheck. But it is overkill. I doubt we will ever have
something else than tcp and external checks. So corresponding PR_O2 flags
were slightly reviewed and a hole was filled.
Thanks to this change, some bits were released in options2 bitfield.
If support for small buffers is enabled, we now try to use them for
healthcheck requests. First, we take care the tcpcheck ruleset may use small
buffers. Send rules using LF strings or too large data are excluded. The
ability to use small buffers or not are set on the ruleset. All send rules
of the ruleset must be compatible. This info is then transfer to server's
healthchecks relying on this ruleset.
Then, when a healthcheck is running, when a send rule is evaluated, if
possible, we try to use small buffers. On error, the ability to use small
buffers is removed and we retry with a regular buffer. It means on the first
error, the support is disabled for the healthcheck and all other runs will
use regular buffers.
In the same way support for large chunks was added to properly work with
large buffers, we are now adding supports for small chunks because it is
possible to process small buffers.
So a dedicated memory pool is added to allocate small
chunks. alloc_small_trash_chunk() must be used to allocate a small
chunk. alloc_trash_chunk_sz() and free_trash_chunk() were uppdated to
support small chunks.
In addition, small trash buffers are also created, using the same mechanism
than for regular trash buffers. So three thread-local trash buffers are
created. get_small_trash_chunk() must be used to get a small trash buffer.
And get_trash_chunk_sz() was updated to also deal with small buffers.
htx_move_to_small_buffer()/htx_move_to_large_buffer() and
htx_copy_to_small_buffer()/htx_copy_to_large_buffer() functions can now be
used to move or copy blocks from a default buffer to a small or large
buffer. The destination buffer is allocated and then each blocks are
transferred into it.
These funtions relies in htx_xfer() function.
htx_xfer() function should replace htx_xfer_blks(). It will be a bit easier to
maintain and to use. The behavior of htx_xfer() can be changed by calling it
with specific flags:
* HTX_XFER_KEEP_SRC_BLKS: Blocks from the source message are just copied
* HTX_XFER_PARTIAL_HDRS_COPY: It is allowed to partially xfer headers or trailers
* HTX_XFER_HDRS_ONLY: only headers are xferred
By default (HTX_XFER_DEFAULT or 0), all blocks from the source message are moved
into to the destination mesage. So copied in the destination messageand removed
from the source message.
The caller must still define the maximum amount of data (including meta-data)
that can be xferred.
It is no longer necessary to specify a block type to stop the copy. Most of
time, with htx_xfer_blks(), this parameter was set to HTX_BLK_UNUSED. And
otherwise it was only specified to transfer headers.
It is important to not that the caller is responsible to verify the original
HTX message is well-formated. Especially, it must be sure headers part and
trailers part are complete (finished by EOH/EOT block).
For now, htx_xfer_blks() is not removed for compatiblity reason. But it is
deprecated.
Because small buffers were only used by QUIC streams, the pool used to alloc
these buffers was located in the quic code. However, their usage will be
extended to other parts. So, the small buffers pool was moved into the
dynbuf part.
Function proxy_check_errors() is used when configuration parsing is
over. This patch splits it in two newly named ones.
The first function is named proxy_check_http_errors(). It is responsible
to check for the validity of any "errorfiles" directive which could
reference non-existent http-errors section or code not defined in such
section. This function is now called via proxy_finalize().
The second function is named proxy_finalize_http_errors(). It converts
each conf_errors type used during parsing in a proper http_reply type
for runtime usage. This function is still called via post-proxy-check,
after proxy_finalize().
This patch does not bring any functional change. However, it will become
necessary to ensure http-errors can be used as expected with dynamic
backends.
In conf_errors struct, arbitrary integer values were used for both
<type> field and <status> array. This renders the code difficult to
follow.
Replaces these values with proper enums type. Two new types are defined
for each of these fields. The first one represents the directive type,
derived from the keyword used (errorfile vs errorfiles). This directly
represents which part of <info> union should be manipulated.
The second enum is used for errorfiles directive with a reference on a
http-errors section. It indicates whether or not if a status code should
be imported from this section, and if this import is explicit or
implicit.
There's always a risk that some tasks run multiple times if they wake
each other up. Now we include the loop counter in the task struct and
stop processing the queue it's in when meeting a task that has already
run. We only pick 16 bits since that's only what remains free in the
task common part, so from time to time (once every 65536) it will be
possible to wrongly match a task as having already run and stop evaluating
its queue, but it's rare enough that we don't care, because this will
be OK on the next iteration.
We anticipated that the do-log action should be expanded with optional
arguments at some point. Now that we heard of multiple use-cases
that could be achieved with do-log action, but that are limitated by the
fact that all do-log statements inherit from the implicit log-profile
defined on the logger, we need to provide a way for the user to specify
that custom log-profile that could be used per do-log actions individually
This is what we try to achieve in this commit, by leveraging the
prerequisite work performed by the last 2 commits.
The new field th_ctx->rq_tot_peak contains the computed peak run queue
length averaged over the last 512 calls. This is computed when entering
process_runnable_tasks. It will not take into account new tasks that are
created or woken up during this round nor those which are evicted, which
is the reason why we're using a peak measurement to increase chances to
observe transient high values. Tests have shown that 512 samples are good
to provide a relatively smooth average measurement while still fading
away in a matter of milliseconds at high loads. Since this value is
only updated once per round, it cannot be used as a statistic and
shouldn't be exposed, it's only for internal use (self-regulation).
Sometimes it's desirable to observe fading away peak values, where a new
value that is higher than the historical one instantly replaces it,
otherwise contributes to it. It is convenient when trying to observe
certain phenomenons like peak queue sizes. The new function
swrate_add_peak_local() does that to a private variable (no atomic ops
involved as it's not worth the cost since such use cases are typically
local).
When "set-dumpable" is set to "libs", in addition to marking the process
dumpable, haproxy also reads the binary and shared objects into memory as
a tar archive in a page-aligned location so that these files are easily
extractable from a future core dump. The goal here is to always have
access to the exact same binary and libs as those which caused the core
to happen. It's indeed very frequent to miss some of these, or to get
mismatching files due to a local update that didn't experience a reload,
or to get those of a host system instead of the container.
The in-memory tar file presents everything under a directory called
"core-%d" where %d corresponds to the PID of the worker process. In
order to ease the finding of these data in the core dump, the memory
area is contiguous and surrounded by PROT_NONE pages so that it appears
in its own segment in the core file. The total size used by this is a
few tens of MB, which is not a problem on large systems.
New function load_file_into_tar() concatenates a file into an in-memory
tar archive and grows its size. Only the base name and a provided prefix
are used to name the faile. If the file cannot be loaded, it's added as
size zero and permissions 0 to show that it failed to load. This will
be used to load post-mortem information so it needs to remain simple.
The purpose here is to create a tar file header in memory from a known
file name, prefix, size and mode. It will be used to prepare archives
of libs in use for improved debugging, but may probably be useful for
other purposes due to its simplicity.
We no longer rely on now_offset stored in the shm-stats-file. Instead
haproxy automatically computes the now_offset relative to the monotonic
clock and the shared global clock.
Indeed, the previous model based on static now_offset when monotonic
clock is available proved to be insufficient when used in
combination with shm-stats-file (that is when monotonic clock is shared
between multiple co-processes). In ideal situation co-processes would
correctly apply the offset to their local monotonic clock and end up
with consistent now_ns. But when restarting from an existing
shm-stats-file from a previous session (ie: prior to reboot), then the
local monotonic clock would no longer be consistent with the one used
to update the file previously, so applying a static offset would fail
to restore clock consistency.
For this specific issue, a workaround was brought by 09bf116
("BUG/MEDIUM: stats-file: detect and fix inconsistent shared clock when resuming from shm-stats-file")
but the solution implemented there was deemed too fragile, because there
is a 60sec window where the fix would fail to detect inconsistent clock
and would leave haproxy with a broken clock ranging from 0 to 60 seconds,
which can be huge..
By simply recomputing the now_offset each time we learn from another
process (through the shared map by reading global_now_ns), we simply
recompute our local offset (difference between OUR monotonic clock
and the SHARED one). Also, in clock_update_global_date(), we make
sure we always recompute the now_offset as now_ms may have been
updated from shared clock if shared clock was ahead of us.
Thanks to that new logic, interrupted processes, resumed processes,
processed started with shm-stats-file from previous session now
correctly recover from those various situations and multiple
co-processes with diverting clocks on startup end up converging to
the same values.
Since it is no longer relevant to save now_offset in the map, it was
removed but to prevent shm-stats-file incompatibility with previous
versions, 8-byte hole was forced, and we didn't bump the shm-stats-file
version on purpose.
This patch may be backported in 3.3 after a solid period of observation
to ensure we didn't break things.
We defer processing of the "-dt" options until after the configuration
file has been read. This will be useful if we ever allow trace sources
to be registered later, for instance with LUA.
No backport needed.
Keywords registered out of an initcall will have a TH_EX_CTX_CLI_KWL
execution context pointing to the keyword list. The report will indicate
the 5 first words of the first command of the list, e.g.:
exec_ctx: cli kwl starting with 'debug counters '
This should also work for CLI keywords registered in Lua.
Now CLI keywords registered via an initcall will be tracked during
execution, by keeping a link to their initcall location. "show threads"
now shows "exec_ctx: kw registered at @debug.c:3093" which indeed
corresponds to the initcall for the debugging commands.
Till now the CLI didn't know what keyword was being processed after it
was parsed. In order to report the execution context, we'll need to
store it. And this may even help for post-mortem analysis to know the
exact keyword being processed, so let's store the pointer in the cli_ctx
part of the appctx.
It allows to know when a thread is currnetly running inside an applet.
For example now "show threads" will show "applet '<CLI>'" for the thread
issuing this command.
It now appears almost everywhere due to callbacks (e.g. ssl_sock_io_cb).
Muxes also become visible now on memory profiling. A small test on h1+ssl
yields 838 lines of statistics. The number of buckets should definitely
be increased, and more grouping criteria should be added.
A performance test was conducted to observe the possible effect of
setting the execution context on each task switch, and it didn't change
at all, remaining at about 1.01 billion ctxsw/s on a 128-thread EPYC.
Most calls to mux ops were instrumented with a CALL_MUX_WITH_RET() or
CALL_MUX_NO_RET() macro in order to make the current thread's context
point to the called mux and be able to track its allocations. Only
a bunch of harmless mux_ctl() and ->subscribe/unsubscribe calls were
left untouched since useless. But destroy/detach/shut/init/snd_buf
and rcv_buf are now tracked.
It will not show allocations performed in IO callback via tasklet
wakeups however.
In order to ease reading of the output, cmp_memprof_ctx() knows about
muxes and sorts based on the .subscribe function address instead of
the mux_ops address so as to keep various callers grouped.
Doing this allows to report the allocations/releases performed by filters
when running with memory profiling enabled. The flt_conf pointer is kept
and the report shows the filter name.
A bit similar to what was done for sample fetch functions and converters,
we now store with each action keyword the location of the initcall when
they're registered this way. Since there are many functions only calling
a LIST_APPEND() (one per ruleset), we now implement a dedicated function
to store the context in all keywords before doing the append.
However that's not sufficient, because keywords are not mandatory for
actions, so we cannot safely rely on rule->kw. Thus we then set the
exec_ctx per rule when they are all scanned in check_action_rules(),
based on the keyword if it exists, otherwise we make a context from
the action_ptr function if it is set (it should).
Finally at all call points we now check rule->exec_ctx.
The purpose here is to be able to spot certain callbacks, such as the
SSL message callbacks, which are difficult to associate to anything.
Thus we introduce a new context type, TH_EX_CTX_FUNC, for which the
context is just the function pointed to by the void *pointer. One
difficulty with callbacks is that the allocation and release contexts
will likely be different, so the code should be properly structured
to allow proper tracking, either by instrumenting all calls, or by
making sure that the free calls are easy to spot in a report.
With the two new context types TH_EX_CTX_SMPF/CONV, we can now also
report contexts corresponding to direct calls to sample_register_fetches()
and sample_register_convs(). In this case, the first word of the keyword
list is reported.
Now keywords are registered with an exec_ctx and this one is passed
when calling ->process. The ctx is of type INITCALL when passed via
an initcall where we know the file name and line number.
This was tested with and extra "malloc(15)" added in smp_fetch_path()
which shows that it works:
$ socat /tmp/sock1 - <<< "show profiling memory"|grep via
Calls | Tot Bytes | Caller and method [via]
1893399 0 60592592 0| 0x78b2ec task_run_applet+0x3339c malloc(32) [via initcall @http_fetch.c:2416]
When the execution context is set to TH_EX_CTX_INITCALL, the pointer
points to a valid initcall, and the decoder will show "kw registered
at %s:%d" with file and line number of the initcall declaration. It's
up to the caller to make the initcall pointer point to the one that was
set during the initcall. The purpose here is to be able to preserve and
pass that knowledge of an initcall down the chain so that future calls
to functions registered via the initcall are still assigned to it.
The INITCALL macros will now store the file and line number where they
are declared into the initcall struct, and RUN_INITCALLS() will assign
them to the global caller_file and caller_line variables, and will even
set caller_initcall to the current initall so that at any instant such
functions know where their caller declared them. This will help with
error messages and traces where a bit of context will be welcome.
The new function chunk_append_thread_ctx() appends to a buffer the given
execution context based on its type and pointer. The goal is to easily
use it in profiling output and thread dumps. For now it only handles
TH_EX_CTX_NONE (which prints nothing) and TH_EX_CTX_OTHER (which indicates
"other ctx" followed by the pointer). It will be extended by new types as
they arrive.
This now allows to report the same function in multiple bins based on the
th_ctx's exec_ctx discriminant. It's also worth noting that the context is
not atomically committed, but this shouldn't be a problem since a single
entry can get it. In the worst case, a second thread trying to create the
same context in parallel would create a different bin just for this call,
which is harmless. The same situation already exists with the caller
pointer.
We have the struct made of a type and a pointer in the th_ctx and a
function to switch it for the current thread. Two macros are provided
to enclose a callee within a temporary context. For now only type OTHER
is supported (only a generic pointer).
The purpose here is to combine two pointers and a long argument instead
of having the caller perform the mixing. Also it's cleaner and more
efficient this was as the arg is mixed after the multiplications, and
modern processors are efficient at multiplying then adding.
We'll need to further extend the pointer hashing code to pass extra
parameters and to retrieve the dropped bits, so let's first split the
part that hashes the pointer from the part that reduces the hash to
the desired size.
Starting with OpenSSL 4.0, X509_get_subject_name(), X509_get_issuer_name(),
and X509_CRL_get_issuer() return a const-qualified X509_NAME pointer.
Similarly, X509_NAME_get_entry() returns a const X509_NAME_ENTRY *, and
X509_NAME_ENTRY_get_data() returns a const ASN1_STRING *.
Introduce the __X509_NAME_CONST__ macro (defined to 'const' for OpenSSL
>= 4.0.0, empty for WolfSSL and older OpenSSL version which lacks const
on these APIs) and use it to qualify X509_NAME * variables and the
parameters of the three DN helper functions ssl_sock_get_dn_entry(),
ssl_sock_get_dn_formatted(), and ssl_sock_get_dn_oneline(). This avoids
both const-qualifier warnings on OpenSSL 4.0 and discarded-qualifier
warnings on WolfSSL, without needing explicit casts at call sites.
In ssl_sock.c (ssl_get_client_ca_file) and ssl_gencert.c
(ssl_sock_do_create_cert), a __X509_NAME_CONST__ X509_NAME * variable was
being reused to store the result of X509_NAME_dup() and then passed to
mutating functions (X509_NAME_add_entry_by_txt, X509_NAME_free). Introduce
separate X509_NAME * variables (xn_dup, subject) to hold the mutable
duplicate.
Original patch from Alexandr Nedvedicky <sashan@openssl.org>:
https://www.mail-archive.com/haproxy@formilux.org/msg46696.html
wake_srv_chk() function is now only used by srv_chk_io_cb(), the
health-checl I/O callback function. So let's remove it. The code of the
function was moved in srv_chk_io_cb().
At we fail to create a mux, in conn_create_mux(), instead of calling the
app_ops .wake() callback function, we can directly call sc_conn_process().
At this stage, we know we are using an connection, so it is safe to do so.
At the end of task_run_applet() and task_process_applet(), instead of
calling the app_ops .wake() callback function, we can directly call
sc_applet_process(). At this stage, we know we are using an applet, so it is
safe to do so.
The 'jwt_tokenize' function that can be used to split a JWT token into
its subparts can either fully process the token (from beginning to end)
when we need to check its signature, or only partially when using the
jwt_header_query or jwt_member_query converters. In this case we relied
on the fact that the return value of the 'jwt_tokenize' function was not
checked because a '-1' was returned (which was not actually an error).
In order to make this logic more explicit, the 'jwt_tokenize' function
now has a way to warn the caller that the token was invalid (less
subparts than the specified 'item_num') or that the token was not
processed in full (enough subparts found without parsing the token all
the way).
The function will now only return 0 if we found strictly the same number
of subparts as 'item_num'.
Existing "compression" filter is a multi-purpose filter that will try
to compress both requests and responses according to "compression"
settings, such as "compression direction".
One of the pre-requisite work identified to implement decompression
filter is that we needed a way to manually define the sequence of
enabled filters to chain them in the proper order to make
compression and decompression chains work as expected in regard
to the intended use-case.
Due to the current nature of the "compression" filter this was not
possible, because the filter has a combined action as it will try
to compress both requests and responses, and as we are about to
implement "filter-sequence" directive, we will not be able to
change the order of execution of the compression filter between
requests and responses.
A possible solution we identified to solve this issue is to split the
existing "compression" filter into 2 distinct filters, one which is
request-oriented, "comp-req", and another one which is response-oriented
"comp-res". This is what we are doing in this commit. Compression logic
in itself is unchanged, "comp-req" will only aim to compress the request
while "comp-res" will try to compress the response. Both filters will
still be invoked on request and responses hooks, but they only do their
part of the job.
From now on, to compress both requests and responses, both filters have
to be enabled on the proxy. To preserve original behavior, the "compression"
filter is still supported, what it does is that it instantiates both
"comp-req" and "comp-res" filters implicitly, as the compression filter is
now effectively split into 2 separate filters under the hood.
When using "comp-res" and "comp-req" filters explicitly, the use of the
"compression direction" setting is not relevant anymore. Indeed, the
compression direction is assumed as soon as one or both filters are
enabled. Thus "compression direction" is kept as a legacy option in
order to configure the "compression" generic filter.
Documentation was updated.
Add a new option, "stats calculate-max-counters [on|off]".
It makes it possible to disable the calculation of max counters, as they
can have a performance cost.
Introduce COUNTERS_UPDATE_MAX(), and use it instead of using
HA_ATOMIC_UPDATE_MAX() directly.
For now it just calls HA_ATOMIC_UPDATE_MAX(), but will later be modified
so that we can disable max calculation.
This can be backported up to 2.8 if the usage of COUNTERS_UPDATE_MAX()
generates too many conflicts.
This function is no longer used. So it is not really an bug. But it is still
available and could be used by legacy applets. In that case, we must take
care to increment the stconn bytes_in value accordingly when input data are
inserted.
This patch must be backported to 3.3.
When the request or the response is received, the numerical value of the
message version is now saved. To do so, the field "vsn" was added in the
http_msg structure. It is an unsigned char. The 4 MSB bits are used for the
major digit and the 4 LSB bits for the minor one.
Of couse, the version must be valid. the HTX_SL_F_NOT_HTTP flag of the
start-line is used to be sure the version is valid. But because this flag is
quite new, we also take care the string representation of the version is 8
bytes length. 0 means the version is not valid.
Now, when the HTTP version format is not strictly valid, flags are set on
the h1 parser and the HTX start-line. H1_MF_NOT_HTTP is set on the H1 parser
and HTX_SL_F_NOT_HTTP is set on the HTX start-line. These flags were
introduced to avoid parsing again and again the version to know if it is a
valid version or not, escpecially because it is most of time valid.
htx_sl_vsn() function can now be used to retrieve the ist string
representing the HTTP version from a start-line passed as parameter. This
function takes care to return the right part of the start-line, depending on
its type (request or response).
This patch reworks the installation of app-ops layer by QUIC MUX.
Previously, app_ops field was stored directly into the quic_conn
structure. Then the MUX reused it directly during its qmux_init().
This patch removes app_ops field from quic_conn and replaces it with a
copy of the negotiated ALPN. By using quic_alpn_to_app_ops(), it ensures
it remains compatible with a known application layer.
On the MUX layer, qcc_install_app_ops() now uses the standard
conn_get_alpn() to retrieve the ALPN from the transport layer. This is
done via the newly defined <get_alpn> QUIC xprt callback.
This new architecture should be cleaner as it better highlights the
responsibility of each layers in the ALPN/app negotiation.
Extract the conversion from ALPN to qcc_app_ops type from quic_conn
source file into QUIC MUX. The newly created function is named
quic_alpn_to_app_ops(). This will serve as a central point to identify
which ALPNs are currently supported in our QUIC stack.
This patch is purely a small refactoring. It will be useful for the next
one which rework MUX app-ops layer init. The current cleanup allows
notably to remove H3/hq-interop headers from quic_conn source file.
The QUIC MUX layer is closed after its transport counterpart. This may
be necessary then to reject any new streams opened by the remote peer.
This operation is dependent however from the application protocol.
Previously, a function qc_h3_request_reject() was directly implemented
in quic_conn source file for use when HTTP/3 was previously negotiated.
However, this solution was not evolutive and broke layering.
This patch introduces a new proper separation with a <strm_reject>
callback defined in quic_conn structure. When set, it will be used to
preemptively close any new stream. QUIC MUX is responsible to set it
just before its closure.
No functional change. This patch is purely a refactoring with a better
architecture design. Especially, H3 specific code from transport layer
is now completely removed.
In most of haproxy code, ALPN is used as a signed char pointer. In QUIC
code instead, it is manipulated as unsigned.
Unifies this by using signed type in QUIC code. This allows to remove a
bunch of unnecessary casts.
The conversion of TASK_WOKEN_RES to a stream event was missing. Among other
things, this wakeup reason is used when a stream is dequeued. So it was
possible to skip the connection establishment if the stream was also woken
up for a timer reason. When this happened, the stream was blocked till the
queue timeout expiration.
Converting TASK_WOKEN_RES to STRM_EVT_RES fixes the issue.
This patch should fix the issue #3290. It must be backported as far as 3.2.
Define a new lock with label PROXIES_DEL_LOCK. Its purpose is to protect
operations performed on global lists or trees while a proxy is freed.
Currently, this lock is unneeded as proxies are only freed on
single-thread init or deinit. However, with the incoming dynamic backend
deletion, this operation will be also performed at runtime, outside of
thread isolation.
Implement be-removable argument to CLI wait. This is implemented via
be_check_for_deletion() invokation, also used by "del backend" handler.
The objective is to test whether a backend instance can be removed. If
this is not the case, the command may returns immediately if the target
proxy is incompatible with dynamic removal or if a user action is
required. Else, the command will wait until the temporary restriction is
lifted.
Define a new proxy flag PR_FL_NON_PURGEABLE. This is used to mark every
proxy instance explicitely referenced in the config. Such instances
cannot be deleted at runtime.
Static use_backend/default_backend rules are handled in
proxy_finalize(). Also, sample expression proxy references are protected
via smp_resolve_args().
Note that this last case also incidentally protects any proxies
referenced via a CLI "set var" expression. This should not be the case
as in this case variable value is instantly resolved so the proxy
reference is not needed anymore. This also affects dynamic servers.
Rename proxy conf <refcount> to <def_ref>. This field only serves for
defaults proxy instances. The objective is to avoid confusion with the
newly introduced <refcount> field used for dynamic backends.
As an optimization, it could be possible to remove <def_ref> and only
use <refcount> also for defaults proxies usage. However for now the
simplest solution is implemented.
This patch does not bring any functional change.
Implement refcount notion into proxy structure. The objective is to be
able to increment refcount on proxy to prevent its deletion temporarily.
This is similar to the server refcount : "del backend" is not blocked
and will remove the targetted instance from the global proxies_list.
However, the final free operation is delayed until the refcount is null.
As stated above, the API is similar to servers. Proxies are initialized
with a refcount of 1. Refcount can be incremented via proxy_take(). When
no longer useful, refcount is decremented via proxy_drop() which
replaces the older free_proxy(). Deinit is only performed once refcount
is null.
This commit also defines flag PR_FL_DELETED. It is set when a proxy
instance has been removed via a "del backend" CLI command. This should
serve as indication to modules which may still have a refcount on the
target proxy so that they can release it as soon as possible.
Note that this new refcount is completely ignored for a default proxy
instance. For them, proxy_take() is pure noop. Free is immediately
performed on first proxy_drop() invokation.
Define a new <px_watch> watcher member in stats applet context. It is
used to register the applet on a proxy when iterating over the proxies
list. <obj1> is automatically updated via the watcher interaction.
Watcher is first initialized prior to stats_dump_proxies() invocation.
This guarantees that stats dump is safe even if applet yields and a
backend is removed in parallel.
Define a new member watcher_list in proxy. It will be used to register
modules which iterate over the proxies list. This will ensure that the
operation is safe even if a backend is removed in parallel.
Add "del backend" handler which is restricted to admin level. Along with
it, a new function be_check_for_deletion() is used to test if the
backend is removable.
Correct documentation for srv_detach() which previously stated that this
function could be called for a server even if not stored in its proxy
list. In fact there is a BUG_ON() which detects this case.
Proxy flags member were of type char. This will soon enough not be
sufficient as new flags will be defined. As such, convert flags member
to unsigned int type.
Now we store and retrieve only counters for the current tgid when more
than one is supported. This allows to significantly reduce contention
on shared stats. The haterm utility saw its performance increase from
4.9 to 5.8M req/s in H1, and 6.0 to 7.6M for H2, both with 5 groups of
16 threads, showing that we don't necessarily need insane amounts of
groups.
Now thanks to new macro EXTRA_COUNTERS_AGGR() we can iterate over all
thread groups storages when returning the data for a given metric. This
remains convenient and mostly transparent. The caller continues to pass
the pointer to the metric in the first group, and offsets are calculated
for all other groups and data summed. For now all groups except the
first one contain only zeroes but reported values are nevertheless
correct.
The goal is to always retrieve the storage address of the first thread
group for the given module. This will be used to iterate over all thread
groups. For now it returns the same value as EXTRA_COUNTERS_GET().
In order to be able to properly allocate all storage and retrieve data
from there, we'll need to know how many thread groups are supposed to
access it. Let's store the number of thread groups at init time. If the
tgrp_step is zero, there's always only one tg though.
Now EXTRA_COUNTERS_ALLOC() takes this number of thread groups in argument
and stores it in the structure. It also allocates as many areas as needed,
incrementing the datap pointer by the step for each of them.
EXTRA_COUNTERS_FREE() uses this info to free all allocated areas.
EXTRA_COUNTERS_INIT() initializes all allocated areas, this is used
elsewhere to clear/preset counters, e.g. in proxy_stats_clear_counters().
It involves a memcpy() call for each array, which is normally preset to
something empty but might also be used to preset certain non-scalar
fields such as an instance name.
We'll need to permit any user to update its own tgroup's extra counters
instead of the global ones. For this we now store the per-tgroup step
between two consecutive data storages, for when they're stored in a
tgroup array. When shared (e.g. resolvers or listeners), we just store
zero to indicate that it doesn't scale with tgroups. For now only the
registration was handled, it's not used yet.
Servers, proxies, listeners and resolvers all use extra_counters. We'll
need to move the storage to per-tgroup for those where it matters. Now
we're relying on an external storage, and the data member of the struct
was replaced with a pointer to that pointer to data called datap. When
the counters are registered, these datap are set to point to relevant
locations. In the case of proxies and servers, it points to the first
tgrp's storage. For listeners and resolvers, it points to a local
storage. The rationale here is that listeners are limited to a single
group anyway, and that resolvers have a low enough load so that we do
not care about contention there.
Nothing should change for the user at this point.
We'll soon need to iterate over thread groups in the fill_stats() functions,
so let's first pass the extra_counters and stats_module pointers to the
fill_stats functions. They now call EXTRA_COUNTERS_GET() themselves with
these elements in order to retrieve the required pointer. Nothing else
changed, and it's getting even a bit more transparent for callers.
This doesn't change anything visible however.
A number of C files include stats.h or stats-t.h, many of which were
just to access the counters. Now those which really need counters rely
on counters.h or counters-t.h, which already reduces the amount of
preprocessed code to be built (~3000 lines or about 0.05%).
It was always difficult to find extra_counters when the rest of the
counters are now in counters-t.h. Let's move the types to counters-t.h
and the macros to counters.h. Stats include them since they're used
there. But some users could be cleaned from the stats definitions now.
There's something a bit awkward in the way stats counters are inherited
through the QUIC modules: quic_conn-t includes quic_stats-t.h, which
declares quic_stats_module as extern from a type that's not known from
this file. And anyway externs should not be exported from type defintions
since they're not part of the ABI itself.
This commit moves the declaration to quic_stats.h which now takes care
to include stats-t.h to get the definition of struct stats_module. The
few users who used to learn it through quic_conn-t.h now include it
explicitly. As a bonus this reduces the number of preprocessed lines
by 5000 (~0.1%).
By the way, it looks like struct stats_module could benefit from being
moved off stats-t.h since it's only used at places where the rest of
the stats is not needed. Maybe something to consider for a future
cleanup.
The QUIC mux requires "application operations" (app ops), which are a list
of callbacks associated with the application level (i.e., h3, h0.9) and
derived from the ALPN. For 0-RTT, when the session cache cannot be reused
before activation, the current code fails to reach the initialization of
these app ops, causing the mux to crash during its initialization.
To fix this, this patch restores the behavior of
ssl_sock_srv_try_reuse_sess(), whose purpose was to reuse sessions stored
in the session cache regardless of whether 0-RTT was enabled, prior to
this commit:
MEDIUM: quic-be: modify ssl_sock_srv_try_reuse_sess() to reuse backend
sessions (0-RTT)
With this patch, this function now does only one thing: attempt to reuse a
session, and that's it!
This patch allows ignoring whether a session was successfully reused from
the cache or not. This directly fixes the issue where app ops
initialization was skipped upon a session cache reuse failure. From a
functional standpoint, starting a mux without reusing the session cache
has no negative impact; the mux will start, but with no early data to
send.
Finally, there is the case where the ALPN is reset when the backend is
stopped. It is critical to continue locking read access to the ALPN to
secure shared access, which this patch does. It is indeed possible for the
server to be stopped between the call to connect_server() and
quic_reuse_srv_params(). But this cannot prevent the mux to start
without app ops. This is why a 'TODO' section was added, as a reminder that a
race condition regarding the ALPN reset still needs to be fixed.
Must be backported to 3.3
Some perf profiles occasionally show that reading the trace source's
state can take some time, which is not expected at all. It just happens
that the trace_source is not cache-aligned so depending on linkage, it
may share a cache line with a more active variable, thereby inducing a
slow down to all threads trying to read the variable.
Let's always mark it aligned to avoid this. For now the problem was not
observed again.
quic_conn is initialized with a pointer to its proxy counters. These
counters are then updated during the connection lifetime.
Counters pointer was incorrect for backend quic_conn, as it always
referenced frontend counters. For pure backend, no stats would be
updated. For listen instances, this resulted in incorrect stats
reporting.
Fix this by correctly set proxy counters based on the connection side.
This must be backported up to 3.3.
Auto SNI configuration is configured during check config validity.
However, nothing was implemented for dynamic servers.
Fix this by implementing auto SNI configuration during "add server" CLI
handler. Auto SNI configuration code is moved in a dedicated function
srv_configure_auto_sni() called both for static and dynamic servers.
Along with this, allows the keyword "no-sni-auto" on dynamic servers, so
that this process can be deactivated if wanted. Note that "sni-auto"
remains unavailable as it only makes sense with default-servers which
are never used for dynamic server creation.
This must be backported up to 3.3.
shm-stats-file heartbeat is derived from now_ms with an extra time added
to it, thus it should be handled using the same time as now_ms is.
Until now, we used to handle heartbeat using signed integer. This was not
found to cause severe harm but it could result in improper handling due
to early wrapping because of signedness for instance, so let's better fix
that before it becomes a real issue.
It should be backported in 3.3
Contrary to haproxy, httpterm does not support all the HTTP protocols.
Furthermore, it has become easier to handle inbound/outbound
connections / streams since the rework done at conn_stream level.
This patch implements httpterm HTTP server services into haproxy. To do
so, it proceeds the same way as for the TCP checks which use only one
stream connector, but on frontend side.
The makefile is modified to handle haterm.c in additions to all the C
files for haproxy to build new haterm program into haproxy, the haterm
server also instantiates a haterm stream (hstream struct) attached to a
stream connector for each incoming connection without backend stream
connector. This is the role of sc_new_from_endp() called by the muxes to
instantiate streams/hstreams.
As for stream_new(), hstream_new() instantiates a task named
process_hstream() (see haterm.c) which has the same role as
process_stream() but for haterm streams.
haterm into haproxy takes advantage of the HTTP muxes and HTX API to
support all the HTTP protocols supported by haproxy.
Add a pointer to function to proxies as ->stream_new_from_sc proxy
struct member to instantiate stream from connection as this is done by
all the muxes when they call sc_new_from_endp(). The default value for
this pointer is obviously stream_new() which is exported by this patch.
This patch provides the possibility to initialize haproxy without
configuration file. This may be identified by the new global and exported
<fileless_mode> and <fileless_cfg> variables which may be used to
provide a struct cfgfile to haproxy by others means than a physical
file (built in memory).
When enabled, this fileless mode skips all the configuration files
parsing.
Add definitions for haterm stream as arguments to be used by the TRACE API.
This will be used by the haterm module to come which will have to handle
hstream struct objects (in place of stream struct objects).
Add "generate-dummy" on/off type keyword to "load" directive to
automatically generate dummy certificates as this is done for ACME from
ckch_conf_load_pem_or_generate() function which is called if a "crt"
keyword is also provide for this directive.
Also implement "keytype" to specify the key type used for these
certificates. Only "RSA" or "ECDSA" is accepted. This patch also
implements "bits" keyword for the "load" directive to specify the
private key size used for RSA. For ECDSA, a new "curves" keyword is also
provided by this patch to specify the curves to be used for the EDCSA
private keys generation.
ckch_conf_load_pem_or_generate() is modified to use these parameters
provided by "keytype", "bits" and "curves" to generate the private key
with ssl_gen_EVP_PKEY() before generating the X509 certificate calling
ssl_gen_x509().
Move acme_EVP_PKEY_gen() implementation to ssl_gencrt.c and rename it to
ssl_EVP_PKEY_gen(). Also extract from acme_gen_tmp_x509() the generic
part to implement ssl_gen_x509() into ssl_gencrt.c.
To generate a self-signed expired certificate ssl_EVP_PKEY_gen() must be
used to generate the private key. Then, ssl_gen_x509() must be called
with the private key as argument. acme_gen_tmp_x509() is also modified
to called these two functions to generate a temporary certificate has
done before modifying this part.
Such an expired self-signed certificate should not be use on the field
but only during testing and development steps.
Add the ability to set connect, queue and tarpit timeouts from the
set-timeout action. This is especially useful when using set-dst to
dynamically connect to servers.
This patch also adds the relevant fe_/be_/cur_ sample fetches for these
timeouts.
b_is_default() and b_is_large() can now be used to know if a buffer is a
default buffer or a large one. _b_free() now relies on it.
These functions are also used when possible (stream_free(),
stream_release_buffers() and http_wait_for_msg_body()).
Thanks to previous patches, it is now possible to allocate a large buffer to
store the message payload in the context of the "wait-for-body" action. To
do so, "use-large-buffer" option must be set.
It means now it is no longer necessary to increase the regular buffer size
to be able to get message payloads of some requests or responses.
Because there is now a memory pool for large buffers, we must also add the
support for large chunks. So, if large buffers are configured, a dedicated
memory pool is created to allocate large chunks. alloc_large_trash_chunk()
must be used to allocate a large chunk. alloc_trash_chunk_sz() can be used to
allocate a chunk with the best size. However free_trash_chunk() remains the
only way to release a chunk, regular or large.
In addition, large trash buffers are also created, using the same mechanism
than for regular trash buffers. So three thread-local trash buffers are
created. get_large_trash_chunk() must be used to get a large trash buffer.
And get_trash_chunk_sz() may be used to get a trash buffer with the best
size.
Add the support for large bufers. A dedicated memory pool is added. The size
of these buffers must be explicitly configured by setting
"tune.bufsize.large" directive. If it is not set, the pool is not
created. In addition, if the size for large buffers is the same than for
regular buffer, the feature is automatically disable.
For now, large buffers remain unused.
First, an HTX flags was added to know when blocks are unordered. It may
happen when a header is added while part of the payload was already received
or when the start-line is replaced by an new one. In these cases, the blocks
indexes are in the right order but not the blocks payload. Knowing a message
is unordered can be useful to trigger a defragmentation, mainly to be able
to append data properly for instance.
Then, detection of fragmented messages was improved, especially when a
header or a start-line is replaced by a new one.
Finally, when data are added in a message and cannot be appended into the
previous DATA block because the message is not aligned, a defragmentation is
performed to realign the message and append data.
It is not a bug fix, because there is no way to hit the issue for now. But
there is nothing preventing a loop of synchronous sends in process_stream().
Indead, when a synchronous send is successfully performed, we restart the
SCs evaluation and at the end another synchronous send is attempted. So with
an endpoint consuming data bit by bit or with a filter fowarding few bytes
at each call, it is possible to loop for a while in process_stream().
Because it is not expected, we now limit the number of synchronous send per
wakeup to two calls. In a nominal case, it should never be more. This commit
is mandatory to be able to handle large buffers on channels
There is no reason to backport this commit except if the large buffers
support on channels are backported.
At many places, we rely on global.tune.bufsize value instead of using the buffer
size. For now, it is not a problem. But if we want to be able to deal with
buffers of different sizes, it is good to reduce as far as possible dependencies
on the global value. most of time, we can use b_size() or c_size()
functions. The main change is performed on the error snapshot where the buffer
size was added into the error_snapshot structure.
sc_have_buff(), sc_need_buff(), sc_have_room() and sc_need_room() are
related to the buffer's channel. So we can move them in sc_strm.h header
file. In addition, this will be mandatory for the next commit.
This reverts commit 235e8f1afd.
Prior to the above commit, snd_buf callback for QUIC MUX was able to
deal with data even after stream closure. The excess was simply
discarded, as no STREAM frame can be emitted after FIN/RESET_STREAM.
This code was later removed and replaced by a BUG_ON() to ensure snd_buf
is never called after stream closure.
However, this approach is too strict. Indeed, there is nothing in the
haproxy stream architecture which forbids this scheduling, in part
because QUIC MUX is the sole responsible of the stream closure. As such,
it is preferable to revert to the old code to prevent any triggering of
a BUG_ON() failure.
Note that nego_ff does not implement data draining if called after
stream closure. This will be done in a future patch.
Thanks to Mike Walker for his investigation on the subject.
This must be backported up to 2.8.
In the historical implementation, all filter related information where
stored at the stream level (using struct strm_flt * context), and filters
iteration was performed at the stream level also.
We identified that this was not ideal and would make the implementation of
future filters more complex since filters ordering should be handled in
a different order during request and response handling for decompression
for instance.
To make such thing possible, in this commit we migrate some channel
specific filter contexts in the channel directly (request or response),
and we implement 2 additional filter lists, one on the request channel
and another on the response channel. The historical stream filter list
is kept as-is because in some contexts only the stream is available and
we have to iterate on all filters. But for functions where we only are
interested in request side or response side filters, we now use dedicated
channel filters list instead.
The only overhead is that the "struct filter" was expanded by two "struct
list".
For now, no change of behavior is expected.
Multiple channel related functions have the same construction: they use
list_for_each_entry() to work on a given filter from the stream+channel
combination. In future commits we will try to use filter list from
dedicated channel list instead of the stream one, thus in this patch we
need as a prerequisite to implement and use the flt_list_{start,next} API
to iterate over filter list, giving the API the responsibility to iterate
over the correct list depending on the context, while the calling function
remains free to use the iteration construction it needs. This way we will
be able to easily change the way we iterate over filter list without
duplicating the code for requests and responses.
The documentation of @system-ca specifies that one can overwrite the
value provided by the SSL Library using SSL_CERT_DIR.
However it seems like X509_get_default_cert_dir() is not affected by
this environment variable, and X509_get_default_cert_dir_env() need to
be used in order to get the variable name, and get the value manually.
This could be backported in every stable branches. Note that older
branches don't have the memprintf in ssl_sock.c.
In continuity of previous patch, this one makes use of the new profiling
flags. For this, based on the global "profiling" setting, when switching
profiling on, we set or clear two flags on the thread context,
TH_FL_TASK_PROFILING_L and TH_FL_TASK_PROFILING_M to indicate whether
lock profiling and/or malloc profiling are desired when profiling is
enabled. These flags are checked along with TH_FL_TASK_PROFILING to
decide when to collect time around a lock or a malloc. And by default
we're back to the behavior of 3.2 in that neither lock nor malloc times
are collected anymore.
This is sufficient to see the CPU usage spent in the VDSO to significantly
drop from 22% to 2.2% on a highly loaded system.
This should be backported to 3.3 along with the previous patch.
Damien Claisse reported in issue #3257 a performance regression between
3.2 and 3.3 when task profiling is enabled, more precisely in relation
with the following patches were merged:
98cc815e3e ("MINOR: activity: collect time spent with a lock held for each task")
503084643f ("MINOR: activity: collect time spent waiting on a lock for each task")
9d8c2a888b ("MINOR: activity: collect CPU time spent on memory allocations for each task")
The issue mostly comes from the first patches. What happens is that the
local time is taken when entering and leaving each lock, which costs a
lot on a contended system. The problem here is the lack of finegrained
settings for lock and malloc profiling.
This patch introduces a better approach. The task profiler goes back to
its default behavior in on/auto modes, but the configuration now accepts
new extra options "lock", "no-lock", "memory", "no-memory" to precisely
indicate other timers to watch for each task when profiling turns on.
This is achieved by setting two new flags HA_PROF_TASKS_LOCK and
HA_PROF_TASKS_MEM in the global "profiling" variable.
This patch only parses the new values and assigns them to the global
variable from the config file for now. The doc was updated.
An issue was introduced in 3.0 with commit faa8c3e024 ("MEDIUM: lb-chash:
Deterministic node hashes based on server address"): the new server_key
field and lb_nodes entries initialization were not updated for servers
added at run time with "add server": server_key remains zero and the key
used in lb_node remains the one depending only on the server's ID.
This will cause trouble when adding new servers with consistent hashing,
because the hash-key will be ignored until the server's weight changes
and the key difference is detected, leading to its recalculation.
This is essentially caused by the poorly placed lb_nodes initialization
that is specific to lb-chash and had to be replicated in the code dealing
with server addition.
This commit solves the problem by adding a new ->server_init() function
in the lbprm proxy struct, that is called by the server addition code.
This also allows to abandon the complex check for LB algos that was
placed there for that purpose. For now only lb-chash provides such a
function, and calls it as well during initial setup. This way newly
added servers always use the correct key now.
While it should also theoretically have had an impact on servers added
with the "random" algorithm, it's unlikely that the difference between
proper server keys and those based on their ID could have had any visible
effect.
This patch should be backported as far as 3.0. The backport may be eased
by a preliminary backport of previous commit "CLEANUP: lb-chash: free
lb_nodes from chash's deinit(), not global", though this is not strictly
necessary if context is manually adjusted.
Implement proxy ID generation for dynamic backends. This is performed
through the already function existing proxy_get_next_id().
As an optimization, lookup will performed starting from a global
variable <dynpx_next_id>. It is initialized to the greatest ID assigned
after parsing, and updated each time a backend instance is created. When
backend deletion will be implemented, it could be lowered to the newly
available slot.
Add an optional "mode" argument to "add backend" CLI command. This
argument allows to specify if the backend is in TCP or HTTP mode.
By default, it is mandatory, unless the inherited default proxy already
explicitely specifies the mode. To differentiate if TCP mode is implicit
or explicit, a new proxy flag PR_FL_DEF_EXPLICIT_MODE is defined. It is
set for every defaults instances which explicitely defined their mode.
Move backend compatibility checks performed during 'add server' in a
dedicated function be_supports_dynamic_srv(). This should simplify
addition of future restriction.
This function will be reused when implementing backend creation at
runtime.
Define a new utility function str_to_proxy_mode() which is able to
convert a string into the corresponding proxy mode if possible. This new
function is used for the parsing of "mode" configuration proxy keyword.
This patch will be reused for dynamic backend implementation, in order
to parse a similar "mode" argument via a CLI handler.