The loop looking for existing ADD items to renew their last_seen must ignore
the items already renewed in the same loop. To do so, we rely on the
last_seen time. because it is now based on now_ms, it is safe.
Doing so avoid to match several time the same ADD item when the same IP
address is found in several ADD item. This reduces the number of extra DNS
resolutions.
This patch depends on "MINOR: resolvers: Use milliseconds for cached items
in resolver responses". Both may be backported as far as 2.2 if necessary.
The last time when an item was seen in a resolver responses is now stored in
milliseconds instead of seconds. This avoid some corner-cases at the
edges. This also simplifies time comparisons.
At startup, if a SRV resolution is set for a server, no DNS resolution is
created. We must wait the first SRV resolution to know if it must be
triggered. It is important to do so for two reasons.
First, during a "classical" startup, a server based on a SRV resolution has
no hostname. Thus the created DNS resolution is useless. Best waiting the
first SRV resolution. It is not really a bug at this stage, it is just
useless.
Second, in the same situation, if the server state is loaded from a file,
its hosname will be set a bit later. Thus, if there is no additionnal record
for this server, because there is already a DNS resolution, it inhibits any
new DNS resolution. But there is no hostname attached to the existing DNS
resolution. So no resolution is performed at all for this server.
To avoid any problem, it is fairly easier to handle this special case during
startup. But this means we must be prepared to have no "resolv_requester"
field for a server at runtime.
This patch must be backported as far as 2.2.
Another way to say it: "Safely unlink requester from a requester callbacks".
Requester callbacks must never try to unlink a requester from a resolution, for
the current requester or another one. First, these callback functions are called
in a loop on a request list, not necessarily safe. Thus unlink resolution at
this place, may be unsafe. And it is useless to try to make these loops safe
because, all this stuff is placed in a loop on a resolution list. Unlink a
requester may lead to release a resolution if it is the last requester.
However, the unkink is necessary because we cannot reset the server state
(hostname and IP) with some pending DNS resolution on it. So, to workaround
this issue, we introduce the "safe" unlink. It is only performed from a
requester callback. In this case, the unlink function never releases the
resolution, it only reset it if necessary. And when a resolution is found
with an empty requester list, it is released.
This patch depends on the following commits :
* MINOR: resolvers: Purge answer items when a SRV resolution triggers an error
* MINOR: resolvers: Use a function to remove answers attached to a resolution
* MINOR: resolvers: Directly call srvrq_update_srv_state() when possible
* MINOR: resolvers: Add function to change the srv status based on SRV resolution
All the series must be backported as far as 2.2. It fixes a regression
introduced by the commit b4badf720 ("BUG/MINOR: resolvers: new callback to
properly handle SRV record errors").
don't release resolution from requester cb
When the server status must be updated from the result of a SRV resolution,
we can directly call srvrq_update_srv_state(). It is simpler and this avoid
a test on the server DNS resolution.
This patch is mandatory for the next commit. It also rely on "MINOR:
resolvers: Directly call srvrq_update_srv_state() when possible".
srvrq_update_srv_status() update the server status based on result of SRV
resolution. For now, it is only used from snr_update_srv_status() when
appropriate.
When a SRV request trigger an error, if we decide to handle the error
because last_valid duration is expired, the answer list may be purged. All
items are considered as obsolete.
resolv_purge_resolution_answer_records() must be used to removed all answers
attached to a resolution. For now, it is only used when a resolution is
released.
When a ADD item attached to a SRV item is removed because it is obsolete, we
must trigger a DNS resolution to be sure the hostname still resolves or
not. There is no other way to be the entry is still valid. And we cannot set
the server in RMAINT immediatly, because a DNS server may be inconsitent and
may stop to add some additionnal records.
The opposite is also true. If a valid ADD item is still attached to a SRV
item, any DNS resolution must be stopped. There is no reason to perform
extra resolution in this case.
This patch must be backported as far as 2.2.
If no ADD item is found for a SRV item in a SRV response, a DNS resolution
is triggered. When it succeeds, we must be sure the SRV item is still
alive. Otherwise the DNS resolution must be ignored.
This patch depends on the commit "MINOR: resolvers: Move last_seen time of
an ADD into its corresponding SRV item". Both must be backported as far as
2.2.
This function search for a SRV answer item associated to a requester
whose type is server.
This is mainly useful to "link" a server to its SRV record when no
additional record were found to configure the IP address.
This patch is required by a bug fix.
For each ADD item found in a SRV response, we try to find a corresponding
ADD item already attached to an existing SRV item. If found, the ADD
last_seen time is updated, otherwise we try to find a SRV item with no ADD
to attached the new one.
However, the loop is buggy. Instead of comparing 2 ADD items, it compares
the new ADD item with the SRV item. Because of this bug, we are unable to
renew last_seen time of existing ADD.
This patch must be backported as far as 2.2.
when a server status is updated based on a SRV item, it is always set to UP,
regardless it has an IP address defined or not. For instance, if only a SRV
item is received, with no additional record, only the server hostname is
defined. We must wait to have an IP address to set the server as UP.
This patch must be backported as far as 2.2.
When a server is set in RMAINT becaues of a SRV resolution failure, the
server DNS resolution, if any, must be unlink first. It is mandatory to
handle the change in the context of a SRV resolution.
This patch must be backported as far as 2.2.
When a DNS resolution error is detected, in snr_resolution_error_cb(), the
server address must be reset only if the server status has changed. It this
case, it means the server is set to RMAINT. Thus the server address may by
reset.
This patch fixes a bug introduced by commit d127ffa9f ("BUG/MEDIUM:
resolvers: Reset address for unresolved servers"). It must be backported as
far as 2.0.
When an error is received for a DNS resolution, for instance a NXDOMAIN
error, the server must be considered to have no address when its status is
updated, not the opposite.
Concretly, because this parameter is not used on error path in
snr_update_srv_status(), there is no impact.
This patch must be backported as far as 1.8.
This reverts commit a331a1e8eb.
This commit fixes a real bug, but it also reveals some hidden bugs, mostly
because of some design issues. Thus, in itself, it create more problem than
it solves. So revert it for now. All known bugs will be addressed in next
commits.
This patch should be backported as far as 2.2.
This was introduced in previous commit 49c2b45c1 ("MINOR: cfgparse/server:
try to fix spelling mistakes on server lines"), the loop was changed but
the increment left. No backport is needed.
This adds support for action_suggest() in http-request, http-response
and http-after-response rulesets. For example:
parsing [/dev/stdin:2]: 'http-request' expects (...), but got 'del-hdr'. Did you mean 'del-header' maybe ?
action_suggest() will return a pointer to an action whose keyword more or
less ressembles the passed argument. It also accepts to be more tolerant
against prefixes (since actions taking arguments are handled as prefixes).
This will be used to suggest approaching words.
Just like with the server keywords, now's the turn of "bind" keywords.
The difference is that 100% of the bind keywords are registered, thus
we do not need the list of extra keywords.
There are multiple bind line parsers today, all were updated:
- peers
- log
- dgram-bind
- cli
$ printf "listen f\nbind :8000 tcut\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/101358 (25146) : haproxy version is 2.4-dev11-7b8787-26
[NOTICE] 070/101358 (25146) : path to executable is ./haproxy
[ALERT] 070/101358 (25146) : parsing [/dev/stdin:2] : 'bind :8000' unknown keyword 'tcut'; did you mean 'tcp-ut' maybe ?
[ALERT] 070/101358 (25146) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/101358 (25146) : Fatal errors found in configuration.
Let's apply the fuzzy match to server keywords so that we can avoid
dumping the huge list of supported keywords each time there is a spelling
mistake, and suggest proper spelling instead:
$ printf "listen f\nserver s 0 sendpx-v2\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/095718 (24152) : haproxy version is 2.4-dev11-caa6e3-25
[NOTICE] 070/095718 (24152) : path to executable is ./haproxy
[ALERT] 070/095718 (24152) : parsing [/dev/stdin:2] : 'server s' unknown keyword 'sendpx-v2'; did you mean 'send-proxy-v2' maybe ?
[ALERT] 070/095718 (24152) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/095718 (24152) : Fatal errors found in configuration.
The global section also knows a large number of keywords that are not
referenced in any list, so this needed them to be specifically listed.
It becomes particularly handy now because some tunables are never easy
to remember, but now it works remarkably well:
$ printf "global\nsched.queue_depth\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/093007 (23457) : haproxy version is 2.4-dev11-dd8ee5-24
[NOTICE] 070/093007 (23457) : path to executable is ./haproxy
[ALERT] 070/093007 (23457) : parsing [/dev/stdin:2] : unknown keyword 'sched.queue_depth' in 'global' section; did you mean 'tune.runqueue-depth' maybe ?
[ALERT] 070/093007 (23457) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/093007 (23457) : Fatal errors found in configuration.
Let's start by the largest keyword list, the listeners. Many keywords were
still not part of a list, so a common_kw_list array was added to list the
not enumerated ones. Now for example, typing "tmout" properly suggests
"timeout":
$ printf "frontend f\ntmout client 10s\n" | ./haproxy -c -f /dev/stdin
[NOTICE] 070/091355 (22545) : haproxy version is 2.4-dev11-3b728a-21
[NOTICE] 070/091355 (22545) : path to executable is ./haproxy
[ALERT] 070/091355 (22545) : parsing [/dev/stdin:2] : unknown keyword 'tmout' in 'frontend' section; did you mean 'timeout' maybe ?
[ALERT] 070/091355 (22545) : Error(s) found in configuration file : /dev/stdin
[ALERT] 070/091355 (22545) : Fatal errors found in configuration.
Instead of just reporting "unknown keyword", let's provide a function which
will look through a list of registered keywords for a similar-looking word
to the one that wasn't matched. This will help callers suggest correct
spelling. Also, given that a large part of the config parser still relies
on a long chain of strcmp(), we'll need to be able to pass extra candidates.
Thus the function supports an optional extra list for this purpose.
This introduces two functions, one which creates a fingerprint of a word,
and one which computes a distance between two words fingerprints. The
fingerprint is made by counting the transitions between one character and
another one. Here we consider the 26 alphabetic letters regardless of
their case, then any digit as a digit, and anything else as "other". We
also consider the first and last locations as transitions from begin to
first char, and last char to end. The distance is simply the sum of the
squares of the differences between two fingerprints. This way, doubling/
missing a letter has the same cost, however some repeated transitions
such as "e"->"r" like in "server" are very unlikely to match against
situations where they do not exist. This is a naive approach but it seems
to work sufficiently well for now. It may be refined in the future if
needed.
The error message for http-request and http-response starts with a comma
that very likely is a leftover from a previous list construct. Let's remove
it: "'http-request' expects , 'wait-for-handshake', 'use-service' ...".
The error message after "http-response set-var" isn't very clear:
[ALERT] 070/115043 (30526) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-var' rule : invalid variable 'set-var'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
Let's change it to this instead:
[ALERT] 070/115608 (30799) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-var' rule : invalid or incomplete action 'set-var'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
With a wrong action name, it also works better (it's handled as a prefix
due to the opening parenthesis):
[ALERT] 070/115608 (30799) : parsing [/dev/stdin:2] : error detected in proxy 'f' while parsing 'http-response set-varxxx' rule : invalid or incomplete action 'set-varxxx'. Expects 'set-var(<var-name>)' or 'unset-var(<var-name>)'.
The tcp-request error message only mentions "accept", "reject" and
track-sc*, but there are a few other ones that were missing, so let's
add them.
This could be backported, though it's not likely that it will help anyone
with an existing config.
The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor
apply_server_state() to make it more readable") also had a copy-paste
error resulting in using global.server_state_file instead of the
function's argument, which easily crashes with a conf having a
state file in a backend and no global state file.
In addition, let's simplify the code and get rid of strcpy() which
almost certainly will break the build on OpenBSD.
This was introduced in 2.4-dev10, no backport is needed.
The refactoring in commit 131b07be3 ("MEDIUM: server: Refactor
apply_server_state() to make it more readable") made the global
server_state_base be dereferenced before being checked, resulting
in a crash on certain files.
This happened in 2.4-dev10, no backport is needed.
When a "tcp-check" or a "http-check" rule is parsed, we try to get the
previous rule in the ruleset to get its index. We must take care to reset
the pointer on this rule in case an error is triggered later on the
parsing. Otherwise, the same rule may be released twice. For instance, it
happens with such line :
http-check meth GET uri / ## note there is no "send" parameter
This patch must be backported as far as 2.2.
If an agent-check is configured for a server, When the response is parsed,
the .health threshold of the agent must be updated on up/down/stopped/fail
command and not the threshold of the health-check. Otherwise, the
agent-check will compete with the health-check and may mark a DOWN server as
UP.
This patch should fix the issue #1176. It must be backported as far as 2.2.
CF_FL_ANALYZE flag is used to know a channel is filtered. It is important to
synchronize request and response channels when the filtering ends.
However, it is possible to call all request analyzers before starting the
filtering on the response channel. This means flt_end_analyze() may be
called for the request channel before flt_start_analyze() on the response
channel. Thus because CF_FL_ANALYZE flag is not set on the response channel,
we consider the filtering is finished on both sides. The consequence is that
flt_end_analyze() is not called for the response and backend filters are
unregistered before their execution on the response channel.
It is possible to encounter this bug on TCP frontend or CONNECT request on
HTTP frontend if the client shutdown is reveiced with the first read.
To fix this bug, CF_FL_ANALYZE is set when filters are attached to the
stream. It means, on the request channel when the stream is created, in
flt_stream_start(). And on both channels when the backend is set, in
flt_set_stream_backend().
This patch must be backported as far as 1.7.
Setting multiple http-request track-scX rules generates entries
which never expires.
If there was already an entry registered by a previous http rule
'stream_track_stkctr(&s->stkctr[rule->action], t, ts)' didn't
register the new 'ts' into the stkctr. And function is left
with no reference on 'ts' whereas refcount had been increased
by the '_get_entry'
The patch applies the same policy as the one showed on tcp track
rules and if there is successive rules the track counter keep the
first entry registered in the counter and nothing more is computed.
After validation this should be backported in all versions.
The recent default runqueue size reduction appeared to have significantly
lowered performance on low-thread count configs. Testing various values
runqueue values on different workloads under thread counts ranging from
1 to 64, it appeared that lower values are more optimal for high thread
counts and conversely. It could even be drawn that the optimal value for
various workloads sits around 280/sqrt(nbthread), and probably has to do
with both the L3 cache usage and how to optimally interlace the threads'
activity to minimize contention. This is much easier to optimally
configure, so let's do this by default now.
Instead of setting a hard-limit on runqueue-depth and keeping it short
to maintain fairness, let's allow the scheduler to automatically cut
the existing one in two equal halves if its size is between the configured
size and its double. This will allow to increase the default value while
keeping a low latency.
Emeric found that SSL+keepalive traffic had dropped quite a bit in the
recent changes, which could be bisected to recent commit 9205ab31d
("MINOR: ssl: mark the SSL handshake tasklet as heavy"). Indeed, a
first incarnation of this commit made use of the TASK_SELF_WAKING
flag but the last version directly used TASK_HEAVY, but it would still
continue to remove the already absent TASK_SELF_WAKING one instead of
TASK_HEAVY. As such, the SSL traffic remained processed with low
granularity.
No backport is needed as this is only 2.4.
The tests were made on slz and the zlib parsers for memlevel and windowsize
managed to escape the change made by commit 018251667 ("CLEANUP: config: make
the cfg_keyword parsers take a const for the defproxy"). This is now fixed.
These are some leftovers from the ancient code where they were still
called sessions, but these areas in the code remain confusing due to
this naming. They were now called "strm" which will not even affect
indenting nor alignment.
When implementing a client applet, a NULL dereference was encountered on
the error path which increment the counters.
Indeed, the counters incremented are the one in the listener which does
not exist in the case of client applets, so in sess->listener->counters,
listener is NULL.
This patch fixes the access to the listener structure when accessing
from a sesssion, most of the access are the counters in error paths.
Must be backported as far as 1.8.
The default proxy was passed as a variable to all parsers instead of a
const, which is not without risk, especially when some timeout parsers used
to make some int pointers point to the default values for comparisons. We
want to be certain that none of these parsers will modify the defaults
sections by accident, so it's important to mark this proxy as const.
This patch touches all occurrences found (89).
grq_total was incremented when picking tasks from the global run queue,
but this variable was not defined with threads disabled, and the code
was optimized away at -O2. No backport is needed.
This commit is a fix/complement to the following one :
08d87b3f49
BUG/MEDIUM: backend: never reuse a connection for tcp mode
It fixes the check for the early insertion of backend connections in
the reuse lists if the backend mode is HTTP.
The impact of this bug seems limited because :
- in tcp mode, no insertion is done in the avail list as mux_pt does not
support multiple streams.
- in http mode, muxes are also responsible to insert backend connections
in lists in their detach functions. Prior to this fix the reuse rate
could be slightly inferior.
It can be backported to 2.3.
Currently, there seems to be no way to have the transport layer ready
but not the mux in the function connect_server. Add a BUG_ON to report
if this implicit condition is not true anymore.
This should fix coverity report from github issue #1120.
The actconns list creates massive contention on low server counts because
it's in fact a list of streams using a server, all threads compete on the
list's head and it's still possible to see some watchdog panics on 48
threads under extreme contention with 47 threads trying to add and one
thread trying to delete.
Moving this list per thread is trivial because it's only used by
srv_shutdown_streams(), which simply required to iterate over the list.
The field was renamed to "streams" as it's really a list of streams
rather than a list of connections.
There are multiple per-thread lists in the listeners, which isn't the
most efficient in terms of cache, and doesn't easily allow to store all
the per-thread stuff.
Now we introduce an srv_per_thread structure which the servers will have an
array of, and place the idle/safe/avail conns tree heads into. Overall this
was a fairly mechanical change, and the array is now always initialized for
all servers since we'll put more stuff there. It's worth noting that the Lua
code still has to deal with its own deinit by itself despite being in a
global list, because its server is not dynamically allocated.
Till now servers were only initialized as part of the proxy setup loop,
which doesn't cover peers, tcp log, dns, lua etc. Let's move this part
out of this loop and instead iterate over all registered servers. This
way we're certain to visit them all.
The patch looks big but it's just a move of a large block with the
corresponding reindent (as can be checked with diff -b). It relies
on the two previous ones ("MINOR: server: add a global list of all
known servers and" and "CLEANUP: lua: set a dummy file name and line
number on the dummy servers").
It's a real pain not to have access to the list of all registered servers,
because whenever there is a need to late adjust their configuration, only
those attached to regular proxies are seen, but not the peers, lua, logs
nor DNS.
What this patch does is that new_server() will automatically add the newly
created server to a global list, and it does so as well for the 1 or 2
statically allocated servers created for Lua. This way it will be possible
to iterate over all of them.
The "socket_tcp" and "socket_ssl" servers had no config file name nor
line number, but this is sometimes annoying during debugging or later
in error messages, while all other places using new_server() or
parse_server() make sure to have a valid file:line set. Let's set
something to address this.
Currently the SSL layer checks the validity of its tasklet's context just
in case it would have been stolen, had the connection been idle. Now it
will be able to be notified by the mux when this situation happens so as
not to have to grab the idle connection lock on each pass. This reuses the
TASK_F_USR1 flag just as the muxes do.
These functions are used on the mux layer to indicate that the connection
is becoming idle and that the xprt ought to be careful before checking the
context or that it's not idle anymore and that the context is safe. The
purpose is to allow a mux which is going to release a connection to tell
the xprt to be careful when touching it. At the moment, the xprt are
always careful and that's costly so we want to have the ability to relax
this a bit.
No xprt layer uses this yet.
The muxes are touching the idle_conns_lock all the time now because
they need to be careful that no other thread has stolen their tasklet's
context.
This patch changes this a little bit by setting the TASK_F_USR1 flag on
the tasklet before marking a connection idle, and removing it once it's
not idle anymore. Thanks to this we have the guarantee that a tasklet
without this flag cannot be present in an idle list and does not need
to go through this costly lock. This is especially true for front
connections.
This flag will be usable by any application. It will be preserved across
wakeups so the application can use it to do various stuff. Some I/O
handlers will soon benefit from this.
It's been too short for quite a while now and is now full. It's still
time to extend it to 32-bits since we have room for this without
wasting any space, so we now gained 16 new bits for future flags.
The values were not reassigned just in case there would be a few
hidden u16 or short somewhere in which these flags are placed (as
it used to be the case with stream->pending_events).
The patch is tagged MEDIUM because this required to update the task's
process() prototype to use an int instead of a short, that's quite a
bunch of places.
It's cleaner to use a flag from the task's state to detect a tasklet
and it's even cheaper. One of the best benefits is that this will
allow to get the nice field out of the common part since the tasklet
doesn't need it anymore. This commit uses the last task bit available
but that's temporary as the purpose of the change is to extend this.
The PRNG used by the "random" LB algorithm was the central one which tries
hard to produce "correct" (i.e. hardly predictable) values suitable for use
in UUIDs or cookies. It's much too expensive for pure load balancing where
a cheaper thread-local PRNG is sufficient, and the current PRNG is part of
the hot places when running with many threads.
Let's switch to the stastistical PRNG instead, it's thread-local, very
fast, and with a period of (2^32)-1 which is more than enough to decide
on a server.
We frequently need to access a simple and fast PRNG for statistical
purposes. The debug_prng() function did exactly this using a xorshift
generator but its use was limited to debug only. Let's move this to
tools.h and tools.c to make it accessible everywhere. Since it needs to
be fast, its state is thread-local. An initialization function starts a
different initial value for each thread for better distribution.
In conn_backend_get() we can cause some extreme contention due to the
idle_conns_lock. Indeed, even though it's per-thread, it still causes
high contention when running with many threads. The reason is that all
threads which do not have any idle connections are quickly skipped,
till the point where there are still some, so the first reaching that
point will grab the lock and the other ones wait behind. From this
point, all threads are synchronized waiting on the same lock, and
will follow the leader in small jumps, all hindering each other.
Here instead of doing this we're using a trylock. This way when a thread
is already checking a list, other ones will continue to next thread. In
the worst case, a high contention will lead to a few new connections to
be set up, but this may actually be what is required to avoid contention
in the first place. With this change, the contention has mostly
disappeared on this lock (it's still present in muxes and transport
layers due to the takeover).
Surprisingly, checking for emptiness of the tree root before taking
the lock didn't address any contention.
A few improvements are still possible and desirable here. The first
one would be to avoid seeing all threads jump to the next one. We
could have each thread use a different prime number as the increment
so as to spread them across the entire table instead of keeping them
synchronized. The second one is that the lock in the muck layers
shouldn't be needed to check for the tasklet's context availability.
Using abort() occasionally results in unexploitable core due to issues
rewinding the stack. Let's use ABORT_NOW() which in addition to crashing
much closer to the call point also has the benefit of showing the call
trace.
We've reached a point where the global pools represent a significant
bottleneck with threads. On a 64-core machine, the performance was
divided by 8 between 32 and 64 H2 connections only because there were
not enough entries in the local caches to avoid picking from the global
pools, and the contention on the list there was very high. It becomes
obvious that we need to have an array of lists, but that will require
more changes.
In parallel, standard memory allocators have improved, with tcmalloc
and jemalloc finding their ways through mainstream systems, and glibc
having upgraded to a thread-aware ptmalloc variant, keeping this level
of contention here isn't justified anymore when we have both the local
per-thread pool caches and a fast process-wide allocator.
For these reasons, this patch introduces a new compile time setting
CONFIG_HAP_NO_GLOBAL_POOLS which is set by default when threads are
enabled with thread local pool caches, and we know we have a fast
thread-aware memory allocator (currently set for glibc>=2.26). In this
case we entirely bypass the global pool and directly use the standard
memory allocator when missing objects from the local pools. It is also
possible to force it at compile time when a good allocator is used with
another setup.
It is still possible to re-enable the global pools using
CONFIG_HAP_GLOBAL_POOLS, if a corner case is discovered regarding the
operating system's default allocator, or when building with a recent
libc but a different allocator which provides other benefits but does
not scale well with threads.
Errors reported by ssl_sock_dump_errors() to stderr would only report the
16 lower bits of the file descriptor because it used to be casted to ushort.
This can be backported to all versions but has really no importance in
practice since this is never seen.
Refactoring performed with the following Coccinelle patch:
@@
char *s;
@@
(
- ist2(s, strlen(s))
+ ist(s)
|
- ist2(strdup(s), strlen(s))
+ ist(strdup(s))
)
Note that this replacement is safe even in the strdup() case, because `ist()`
will not call `strlen()` on a `NULL` pointer. Instead is inserts a length of
`0`, effectively resulting in `IST_NULL`.
When dns_connect_nameserver() is called, the nameserver has always a dgram
field properly defined. The caller, dns_send_nameserver(), already performed
the appropriate verification.
When a DNS session is created, the call to ring_attach() never fails. The
ring is freshly initialized and there is other watcher on it. Thus, the call
always succeeds.
Instead of catching an error that must never happen, we use the DISGUISE()
macro to make static analyzers happy.
Recent changes on the server-state file loading have introduced a
regression. HAproxy crashes if a backend with no server-state file is
disabled in the configuration. Indeed, configuration of such backends is not
finalized. Thus many fields are not defined.
To fix the bug, disabled backends must be ignored. In addition a BUG_ON()
has been added to verify the proxy mode regarding the server-state file. It
must be specified (none, global or local) for enabled backends.
No backport needed.
hlua_pushstrippedstring() function strips leading and trailing LWS
characters. But the result length it too short by 1 byte. Thus the last
non-LWS character is stripped. Note that a string containing only LWS
characters resulting to a stipped string with an invalid length (-1). This
leads to a lua runtime error.
This bug was reported in the issue #1155. It must be backported as far as
1.7.
If dispatch mode or transparent backend is used, the backend connection
target is a proxy instead of a server. In these cases, the reuse of
backend connections is not consistent.
With the default behavior, no reuse is done and every new request uses a
new connection. However, if http-reuse is set to never, the connection
are stored by the mux in the session and can be reused for future
requests in the same session.
As no server is used for these connections, no reuse can be made outside
of the session, similarly to http-reuse never mode. A different
http-reuse config value should not have an impact. To achieve this, mark
these connections as private to have a defined behavior.
For this feature to properly work, the connection hash has been slightly
adjusted. The server pointer as an input as been replaced by a generic
target pointer to refer to the server or proxy instance. The hash is
always calculated on connect_server even if the connection target is not
a server. This also requires to allocate the connection hash node for
every backend connections, not just the one with a server target.
Fix a leak in connect_server which happens when a connection is reused
and a bind_addr was allocated because transparent mode is active. The
connection has already an allocated bind_addr so free the newly
allocated one.
No backport needed.
That comma should've been a semicolon. Fortunately, as it is now there
is no impact thanks to operators precedence, and all expressions are
properly evaluated. But this is troubling and the risk is high to
turn it into an effective bug with a minor change.
Introduced in b8ce8905cf which first
appeared in 2.1-dev3. This fix must be backported to 2.1+.
Since this commit:
144289b45 ("REORG: move init_default_instance() to proxy.c and pass it the defproxy pointer")
as quic_transport_params_init() has been moved from cfgparse.c to proxy.c this
latter source file must include xprt_quic.h header.
Should fix#1153 issue.
When the processing stage is finished for a SPOE applet, before returning it
into the idle list, we check if the assigned server appears as full or if
there are some pending connections on the backend or the assigned server. If
yes, it means we reach a maxconn and we close the applet to free a
slot. Otherwise, the applet can be reused. This test is only performed if
there are more than one thread.
It is important to close SPOE applets when there are pending connections for
multithreaded instances because connections with the SPOE agents are
persistent and local to a thread (applets are local to a thread). If a
maxconn is configured, some threads may take all available slots for a
while, leaving remaining threads without any free slot to process SPOE
messages. It is especially true if the maxconn is low.
This patch should fix the issue #705. It must be backported as far as
1.8. However, the code in 1.8 is quite different, a test must be performed
to be sure it works well.
When the selected server has no address, the destination address of the
client is used. However, for now, only the address is set, not the
family. Thus depending on how the server is configured and the client's
destination address, the server address family may be wrong.
For instance, with such server :
server srv 0.0.0.0:0
The server address family is AF_INET. The server connection will fail if a
client is asking for an IPv6 destination.
To fix the bug, we take care to set the rigth family, the family of the
client destination address.
This patch should fix the issue #202. It must be backported to all stable
versions.
If an IPv4 is set via a TCP/HTTP set-dst rule, the original port must be
preserved or set to 0 if the previous family was neither AF_INET nor
AF_INET6. The first case is not an issue because the port remains the
same. But if the previous family was, for instance, AF_UNIX, the port is not
set to 0 and have an undefined value.
This patch must be backported as far as 1.7.
There was a free(ptr) followed by ptr=malloc(ptr, len), which is the
equivalent of ptr = realloc(ptr, len) but slower and less clean. Let's
replace this.
In ssl_sock_free_srv_ctx() there are some calls to free() which are not
followed by a zeroing of the pointers. For now this function is only used
during deinit but it could be used at run time in the near future, so
better secure this.
In sample_store(), depending on the new sample types, the area pointer
was not always zeroed after being freed. Let's make sure it's always the
case to avoid the risk of dangling pointers being misused.
A few occurrences of calls to free() to free a section name,
peers name or server name were using casts and didn't include
the trailing free, let's switch them to ha_free().
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows to place some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce compile-time error if trying to
free something that is not a regular pointer (e.g. a function).
The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.
178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:
@ rule @
expression E;
@@
- free(E);
- E = NULL;
+ ha_free(&E);
It was verified that the resulting code is the same, more or less a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.
A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
A network may be specified to avoid header addition for "forwardfor" and
"orignialto" option via the "except" parameter. However, only IPv4
networks/addresses are supported. This patch adds the support of IPv6.
To do so, the net_addr structure is used to store the parameter value in the
proxy structure. And ipcmp2net() function is used to perform the comparison.
This patch should fix the issue #1145. It depends on the following commit:
* c6ce0ab MINOR: tools: Add function to compare an address to a network address
* 5587287 MINOR: tools: Add net_addr structure describing a network addess
ipcmp2net() function may be used to compare an addres (struct
sockaddr_storage) to a network address (struct net_addr). Among other
things, this function will be used to add support of IPv6 for "except"
parameter of "forwardfor" and "originalto" options.
When an except parameter is used for originalto option, only the destination
address must be evaluated. Especially, the address family of the destination
must be tested and not the source one.
This patch must be backported to all stable versions. However be careful,
depending the versions the code may be slightly different.
The preliminary approach to dealing with heavy tasks forced us to quit
the poller after meeting one. Now instead we process at most one per poll
loop and ignore the next ones, so that we get more bandwidth to process
all other classes.
Doing so further reduced the induced HTTP request latency at 100k req/s
under the stress of 1000 concurrent SSL handshakes in the following
proportions:
| default | low-latency
---------+------------+--------------
before | 2.75 ms | 2.0 ms
after | 1.38 ms | 0.98 ms
In both cases, the latency is roughly halved. It's worth noting that
both values are now exactly 10 times better than in 2.4-dev9. Even the
percentiles have much improved. For 16 HTTP connections (1 per thread)
competing with 1000 SSL handshakes, we're seeing these long-tail
latencies (in milliseconds) :
| 99.5% | 99.9% | 100%
-----------+---------+---------+--------
2.4-dev9 | 48.4 | 58.1 | 78.5
previous | 6.2 | 11.4 | 67.8
this patch | 2.8 | 2.9 | 6.1
The task latency profiling report now shows this in default mode:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
si_cs_io_cb 3061966 2.224s 726.0ns 42.03s 13.72us
h1_io_cb 3061960 6.418s 2.096us 18.76m 367.6us
process_stream 3059982 9.137s 2.985us 15.52m 304.3us
ssl_sock_io_cb 602657 4.265m 424.7us 4.736h 28.29ms
h1_timeout_task 202973 - - 6.254s 30.81us
accept_queue_process 135547 1.179s 8.699us 16.29s 120.1us
srv_cleanup_toremove_conns 81 15.64ms 193.1us 30.87ms 381.1us
task_run_applet 10 758.7us 75.87us 51.77us 5.176us
srv_cleanup_idle_conns 4 375.3us 93.83us 54.52us 13.63us
And this in low-latency mode, showing that both si_cs_io_cb() and process_stream()
have significantly benefitted from the improvement, with values 50 to 200 times
smaller than 2.4-dev9:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
h1_io_cb 6407006 11.86s 1.851us 31.14m 291.6us
process_stream 6403890 18.40s 2.873us 2.134m 20.00us
si_cs_io_cb 6403866 4.139s 646.0ns 1.773m 16.61us
ssl_sock_io_cb 894326 6.407m 429.9us 7.326h 29.49ms
h1_timeout_task 301189 - - 8.440s 28.02us
accept_queue_process 211989 1.691s 7.977us 21.48s 101.3us
srv_cleanup_toremove_conns 220 23.46ms 106.7us 65.61ms 298.2us
task_run_applet 16 1.219ms 76.17us 181.7us 11.36us
srv_cleanup_idle_conns 12 713.3us 59.44us 168.4us 14.03us
The changes are slightly more invasive than previous ones and depend on
recent patches so they are not likely well suited for backporting.
Instead of placing heavy tasklets into the TL_BULK queue, we now place
them into the TL_HEAVY one, which is assigned a default weight of ~1%
load at once. This way heavy tasks will not block TL_BULK anymore.
This class will be used exclusively for heavy processing tasklets. It
will be cleaner than mixing them with the bulk ones. For now it's
allocated ~1% of the CPU bandwidth.
The largest part of the patch consists in re-arranging the fields in the
task_per_thread structure to preserve a clean alignment with one more
list head. Since we're now forced to increase the struct past a second
cache line, it now uses 4 cache lines (for easy multiplying) with the
first two ones being exclusively used by local operations and the third
one mostly by atomic operations. Interestingly, this better arrangement
causes less stress and reduced the response time by 8 microseconds at
1 million requests per second.
A potential null pointer dereference was reported with an old gcc
version (6.5)
src/ssl_ckch.c: In function 'cli_parse_set_cert':
src/ssl_ckch.c:844:7: error: potential null pointer dereference [-Werror=null-dereference]
if (!ssl_sock_copy_cert_key_and_chain(src->ckch, dst->ckch))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ckch.c:844:7: error: potential null pointer dereference [-Werror=null-dereference]
src/ssl_ckch.c: In function 'ckchs_dup':
src/ssl_ckch.c:844:7: error: potential null pointer dereference [-Werror=null-dereference]
if (!ssl_sock_copy_cert_key_and_chain(src->ckch, dst->ckch))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ckch.c:844:7: error: potential null pointer dereference [-Werror=null-dereference]
This could happen if ckch_store_new() fails to allocate memory and returns NULL.
This patch must be backported with 8f71298 since it was wrongly fixed and
the bug could happen.
Must be backported as far as 2.2.
These function names are unbearably long, they don't even fit into the
screen in "show profiling", let's trim the "_connections" to "_conns",
which happens to match the name of the lists there.
There's a fairness issue between SSL and clear text. A full end-to-end
cleartext connection can require up to ~7.7 wakeups on average, plus 3.3
for the SSL tasklet, one of which is particularly expensive. So if we
accept to process many handshakes taking 1ms each, we significantly
increase the processing time of regular tasks just by adding an extra
delay between their calls. Ideally in order to be fair we should have a
1:18 call ratio, but this requires a bit more accounting. With very little
effort we can mark the SSL handshake tasklet as TASK_HEAVY until the
handshake completes, and remove it once done.
Doing so reduces from 14 to 3.0 ms the total response time experienced
by HTTP clients running in parallel to 1000 SSL clients doing full
handshakes in loops. Better, when tune.sched.low-latency is set to "on",
the latency further drops to 1.8 ms.
The tasks latency distribution explain pretty well what is happening:
Without the patch:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
ssl_sock_io_cb 2785375 19.35m 416.9us 5.401h 6.980ms
h1_io_cb 1868949 9.853s 5.271us 4.829h 9.302ms
process_stream 1864066 7.582s 4.067us 2.058h 3.974ms
si_cs_io_cb 1733808 1.932s 1.114us 26.83m 928.5us
h1_timeout_task 935760 - - 1.033h 3.975ms
accept_queue_process 303606 4.627s 15.24us 16.65m 3.291ms
srv_cleanup_toremove_connections452 64.31ms 142.3us 2.447s 5.415ms
task_run_applet 47 5.149ms 109.6us 57.09ms 1.215ms
srv_cleanup_idle_connections 34 2.210ms 65.00us 87.49ms 2.573ms
With the patch:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
ssl_sock_io_cb 3000365 21.08m 421.6us 20.30h 24.36ms
h1_io_cb 2031932 9.278s 4.565us 46.70m 1.379ms
process_stream 2010682 7.391s 3.675us 22.83m 681.2us
si_cs_io_cb 1702070 1.571s 922.0ns 8.732m 307.8us
h1_timeout_task 1009594 - - 17.63m 1.048ms
accept_queue_process 339595 4.792s 14.11us 3.714m 656.2us
srv_cleanup_toremove_connections779 75.42ms 96.81us 438.3ms 562.6us
srv_cleanup_idle_connections 48 2.498ms 52.05us 178.1us 3.709us
task_run_applet 17 1.738ms 102.3us 11.29ms 663.9us
other 1 947.8us 947.8us 202.6us 202.6us
=> h1_io_cb() and process_stream() are divided by 6 while ssl_sock_io_cb() is
multipled by 4
And with low-latency on:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
ssl_sock_io_cb 3000565 20.96m 419.1us 20.74h 24.89ms
h1_io_cb 2019702 9.294s 4.601us 49.22m 1.462ms
process_stream 2009755 6.570s 3.269us 1.493m 44.57us
si_cs_io_cb 1997820 1.566s 783.0ns 2.985m 89.66us
h1_timeout_task 1009742 - - 1.647m 97.86us
accept_queue_process 494509 4.697s 9.498us 1.240m 150.4us
srv_cleanup_toremove_connections1120 92.32ms 82.43us 463.0ms 413.4us
srv_cleanup_idle_connections 70 2.703ms 38.61us 204.5us 2.921us
task_run_applet 13 1.303ms 100.3us 85.12us 6.548us
=> process_stream() is divided by 100 while ssl_sock_io_cb() is
multipled by 4
Interestingly, the total HTTPS response time doesn't increase and even very
slightly decreases, with an overall ~1% higher request rate. The net effect
here is a redistribution of the CPU resources between internal tasks, and
in the case of SSL, handshakes wait bit more but everything after completes
faster.
This was made simple enough to be backportable if it helps some users
suffering from high latencies in mixed traffic.
While the scheduler is priority-aware and class-aware, and consistently
tries to maintain fairness between all classes, it doesn't make use of a
fine execution budget to compensate for high-latency tasks such as TLS
handshakes. This can result in many subsequent calls adding multiple
milliseconds of latency between the various steps of other tasklets that
don't even depend on this.
An ideal solution would be to add a 4th queue, have all tasks announce
their estimated cost upfront and let the scheduler maintain an auto-
refilling budget to pick from the most suitable queue.
But it turns out that a very simplified version of this already provides
impressive gains with very tiny changes and could easily be backported.
The principle is to reserve a new task flag "TASK_HEAVY" that indicates
that a task is expected to take a lot of time without yielding (e.g. an
SSL handshake typically takes 700 microseconds of crypto computation).
When the scheduler sees this flag when queuing a tasklet, it will place
it into the bulk queue. And during dequeuing, we accept only one of
these in a full round. This means that the first one will be accepted,
will not prevent other lower priority tasks from running, but if a new
one arrives, then the queue stops here and goes back to the polling.
This will allow to collect more important updates for other tasks that
will be batched before the next call of a heavy task.
Preliminary tests consisting in placing this flag on the SSL handshake
tasklet show that response times under SSL stress fell from 14 ms
before the patch to 3.0 ms with the patch, and even 1.8 ms if
tune.sched.low-latency is set to "on".
Only the first 3 characters are compared for ';no-maint' suffix in
http_handle_stats. Fix it by doing a full match over the entire suffix.
As a side effect, the ';norefresh' suffix matched the inaccurate
comparison, so the maintenance servers were always hidden on the stats
page in this case.
no-maint suffix is present since commit
3e32036701
MINOR: stats: also support a "no-maint" show stat modifier
It should be backported up to 2.3.
This fixes github issue #1147.
In H1, H2 and FCGI muxes, in the show_fd function, there is duplicated test on
the stream's subs field.
This patch fixes the issue #1142. It may be backported as far as 2.2.
Some static functions are now exported and renamed to follow the same
pattern of other exported functions. Here is the list :
* update_server_fqdn: Renamed to srv_update_fqdn and exported
* update_server_check_addr_port: renamed to srv_update_check_addr_port and exported
* update_server_agent_addr_port: renamed to srv_update_agent_addr_port and exported
* update_server_addr: renamed to srv_update_addr
* update_server_addr_potr: renamed to srv_update_addr_port
* srv_prepare_for_resolution: exported
This change is mandatory to move all functions dealing with the server-state
files in a separate file.
This change is not huge but may have a visible impact for users. Now, if a
line of a server-state file is corrupted, the whole file is ignored. A
warning is emitted with the corrupted line number.
In fact, there is no way to recover from a corrupted line. A line is
considered as corrupted if it is too long (truncated line) or if it contains
the wrong number of arguments. In both cases, it means the file was forged
(or at least manually edited). It is safer to ignore it.
Note for now, memory allocation errors are not reported and the
corresponding line is silently ignored.
Now, srv_state_parse_and_store_line() function is used to parse and store a
line in a tree. It is used for global and local server-state files. This
significatly simplies the apply_server_state() function.
Just like for the global server-state file, the line of a local server-state
file are now stored in a tree. This way, the file is fully parsed before
loading the servers state. And with this change, global and local
server-state files are now handled the same way. This will be the
opportunity to factorize the code. It is also a good way to validate the
file before loading any server state.
The loop on the servers of a proxy to load the server states was moved in
the function srv_state_px_update(). This simplify a bit the
apply_server_state() function. It is aslo mandatory to simplify the loading
of local server-state file.
When a server for a given backend is found in the tree containing all lines
of the global server-state file, the node is removed from the tree. It is
useless to keep it longer. It is a small improvement, but it may also be
usefull to track the orphan lines (not used for now).
Parsed parameters are now stored in the tree of server-state lines. This
way, a line from the global server-state file is only parsed once. Before,
it was parsed a first time to store it in the tree and one more time to load
the server state. To do so, the server-state line object must be allocated
before parsing a line. This means its size must no longer depend on the
length of first parsed parameters (backend and server names). Thus the node
type was changed to use a hashed key instead of a string.
Now, we read a full line and expects to found an integer only on it. And if
the line is empty or truncated, an error is returned. If the version is not
valid, an error is also returned. This way, the first line is no longer
partially read.
There is no reason to use a global variable to store the lines of the global
server-state file. This tree is only used during the file parsing, as a line
cache. Now the eb-tree is declared as a local variable in the
apply_server_state() function.
The structure used to store a server-state line in an eb-tree has a too
generic name. Instead of state_line, the structure is renamed as
server_state_line.
<state_line.name_name> field is a node in an eb-tree. Thus, instead of
"name_name", we now use "node" to name this field. If is a more explicit
name and not too strange.
The apply_server_state() function is really hard to read. Thus it was
refactored to be more maintainable. First, an helper function is used to get
the server-state file path. Some useless variables were removed and most of
other variables were renamed to be more readable. The error messages are now
prefixed to know the context (global vs per-proxy). Finally, the loop on the
proxies list was simplified.
This patch may seem a bit huge, but the changes are not so important.
There is no reason to fill two parameter arrays in srv_state_parse_line()
function. Now, only one array is used. The 4th first entries are just
skipped when srv_update_state() is called.
The srv_state_parse_line() function was rewritten to be more strict. First
of all, it is possible to make the difference between an ignored line and an
malformed one. Then, only blank characters (spaces and tabs) are now allowed
as field separator. An error is reported for truncated lines or for lines
with an unexpected number of arguments regarding the provided version.
However, for now, errors are ignored by the caller, invalid lines are just
skipped.
First, we don't want to measure wakeup times if the call date had not
been set before profiling was enabled at run time. And second, we may
only collect the value before clearing the TASK_IN_LIST bit, otherwise
another wakeup might happen on another thread and replace the call date
we're about to use, hence artificially lower the wakeup times.
It is extremely useful to be able to observe the wakeup latency of some
important I/O operations, so let's accept to inflate the tasklet struct
by 8 extra bytes when DEBUG_TASK is set. With just this we have enough
to get live reports like this:
$ socat - /tmp/sock1 <<< "show profiling"
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
si_cs_io_cb 8099492 4.833s 596.0ns 8.974m 66.48us
h1_io_cb 7460365 11.55s 1.548us 2.477m 19.92us
process_stream 7383828 22.79s 3.086us 18.39m 149.5us
h1_timeout_task 4157 - - 348.4ms 83.81us
srv_cleanup_toremove_connections751 39.70ms 52.86us 10.54ms 14.04us
srv_cleanup_idle_connections 21 1.405ms 66.89us 30.82us 1.467us
task_run_applet 16 1.058ms 66.13us 446.2us 27.89us
accept_queue_process 7 34.53us 4.933us 333.1us 47.58us
Instead of decrementing grq_total once per task picked from the global
run queue, let's do it at once after the loop like we do for other
counters. This simplifies the code everywhere. It is not expected to
bring noticeable improvements however, since global tasks tend to be
less common nowadays.
Now we don't need to decrement rq_total when we pick a tack in the tree
to immediately increment it again after installing it into the local
list. Instead, we simply add to the local queue count the number of
globally picked tasks. Avoiding this shows ~0.5% performance gains at
1Mreq/s (2M task switches/s).
As indicated in previous commit, this function tries to guess which tree
the task is in to figure what counters to update, while we already have
that info in the caller. Let's just pick the relevant parts to place them
in the caller.
In process_runnable_tasks() we're still calling __task_unlink_rq() to
pick a task, and this function tries to guess where to pick the task
from and which counter to update while the caller's context already
has everything. Worse, the number of local tasks is decremented then
recredited, doubling the operations. In order to avoid this we first
need to keep separate counters for local and global tasks that were
picked. This is what this patch does.
The function htx_reserve_max_data() should be used to get an HTX DATA block
with the max possible size. A current block may be extended or a new one
created, depending on the HTX message state. But the idea is to let the
caller to copy a bunch of data without requesting many new blocks. It is its
responsibility to resize the block at the end, to set the final block size.
This function will be used to parse messages with small chunks. Indeed, we
can have more than 2700 1-byte chunks in a 16Kb of input data. So it is easy
to understand how this function may help to improve the parsing of chunk
messages.
If the DNS resolution failed for a server, its ip address must be
removed. Otherwise, the server is stopped but keeps its ip. This may be
confusing when the servers state are retrieved on the CLI and it may lead to
undefined behavior if HAproxy is configured to load its servers state from a
file.
This patch should be backported as far as 2.0.
When a SRV record expires, the ip/port assigned to the associated server are
now removed. Otherwise, the server is stopped but keeps its ip/port while
the server hostname is removed. It is confusing when the servers state are
retrieve on the CLI and may be a problem if saved in a server-state
file. Because the reload may fail because of this inconsistency.
Here is an example:
* Declare a server template in a backend, using the resolver <dns>
server-template test 2 _http._tcp.example.com resolvers dns check
* 2 SRV records are announced with the corresponding additional
records. Thus, 2 servers are filled. Here is the "show servers state"
output :
2 frt 1 test1 192.168.1.1 2 64 0 1 2 15 3 4 6 0 0 0 http1.example.com 8001 _http._tcp.example.com 0 0 - - 0
2 frt 2 test2 192.168.1.2 2 64 0 1 1 15 3 4 6 0 0 0 http2.example.com 8002 _http._tcp.example.com 0 0 - - 0
* Then, one additional record is removed (or a SRV record is removed, the
result is the same). Here is the new "show servers state" output :
2 frt 1 test1 192.168.1.1 2 64 0 1 38 15 3 4 6 0 0 0 http1.example.com 8001 _http._tcp.example.com 0 0 - - 0
2 frt 2 test2 192.168.1.2 0 96 0 1 19 15 3 0 14 0 0 0 - 8002 _http._tcp.example.com 0 0 - - 0
On reload, if a server-state file is used, this leads to undefined behaviors
depending on the configuration.
This patch should be backported as far as 2.0.
When a SRV record was created, it used to register the regular server name
resolution callbacks. That said, SRV records and regular server name
resolution don't work the same way, furthermore on error management.
This patch introduces a new call back to manage DNS errors related to
the SRV queries.
this fixes github issue #50.
Backport status: 2.3, 2.2, 2.1, 2.0
If no additional record is associated to a SRV record, its TTL must not be
renewed. Otherwise the entry never expires. Thus once announced a first
time, the entry remains blocked on the same IP/port except if a new announce
replaces the old one.
Now, the TTL is updated if a SRV record is received while a matching
existing one is found with an additional record or when an new additional
record is assigned to an existing SRV record.
This patch should be backported as far as 2.2.
At the end of resolv_validate_dns_response(), if a received additionnal
record is not assigned to an existing server record, it is released. But the
condition to do so is buggy. If "answer_record" (the received AR) is not
assigned, "tmp_record" is not a valid record object. It is just a dummy
record "representing" the head of the record list.
Now, the condition is far cleaner. This patch must be backported as far as
2.2.
This function has become large with the multi-queue scheduler. We need
to keep the fast path and the debugging parts inlined, but the rest now
moves to task.c just like was done for task_wakeup(). This has reduced
the code size by 6kB due to less inlining of large parts that are always
context-dependent, and as a side effect, has increased the overall
performance by 1%.
The nb_tasks counter was still global and gets incremented and decremented
for each task_new()/task_free(), and was read in process_runnable_tasks().
But it's only used for stats reporting, so doing this this often is
pointless and expensive. Let's move it to the task_per_thread struct and
have the stats sum it when needed.
The test in __task_wakeup() to figure if the remote threads are sleeping
doesn't make sense outside of the global runqueue test, since there are
only two possibilities here: local runqueue or global runqueue, hence a
sleeping thread is another one and can only happen when sending to the
global run queue. Let's move the test inside the "if" block.
Historically we used to call __task_wakeup() with a known tree root but
this is not the case and the code has remained needlessly complicated
with the root calculation in task_wakeup() passed in argument to
__task_wakeup() which compares it again.
Let's get rid of this and just move the detection code there. This
eliminates some ifdefs and allows to simplify the test conditions quite
a bit.
This one is systematically misunderstood due to its unclear name. It
is in fact the number of tasks in the local tasklet list. Let's call
it "tasks_in_list" to remove some of the confusion.
This one is exclusively used as a boolean nowadays and is non-zero only
when the thread-local run queue is not empty. Better check the root tree's
pointer and avoid updating this counter all the time.
This counter is solely used for reporting in the stats and is the hottest
thread contention point to date. Moving it to the scheduler and having a
separate one for the global run queue dramatically improves the performance,
showing a 12% boost on the request rate on 16 threads!
In addition, the thread debugging output which used to rely on rqueue_size
was not totally accurate as it would only report task counts. Now we can
return the exact thread's run queue length.
It is also interesting to note that there are still a few other task/tasklet
counters in the scheduler that are not efficiently updated because some cover
a single area and others cover multiple areas. It looks like having a distinct
counter for each of the following entries would help and would keep the code
a bit cleaner:
- global run queue (tree)
- per-thread run queue (tree)
- per-thread shared tasklets list
- per-thread local lists
Maybe even splitting the shared tasklets lists between pure tasklets and
tasks instead of having the whole and tasks would simplify the code because
there remain a number of places where several counters have to be updated.
dns_session_release() only uses its struct dns_stream_server to access
the lock, so a warning is emitted when threads are disabled. Let's mark
it __maybe_unused.
The lock was still used exclusively to deal with the concurrency between
the "show sess" release handler and a stream_new() or stream_free() on
another thread. All other accesses made by "show sess" are already done
under thread isolation. The release handler only requires to unlink its
node when stopping in the middle of a dump (error, timeout etc). Let's
just isolate the thread to deal with this case so that it's compatible
with the dump conditions, and remove all remaining locking on the streams.
This effectively kills the streams lock. The measured gain here is around
1.6% with 4 threads (374krps -> 380k).
The global streams list is exclusively used for "show sess", to look up
a stream to shut down, and for the hard-stop. Having all of them in a
single list is extremely expensive in terms of locking when using threads,
with performance losses as high as 7% having been observed just due to
this.
This patch makes the list per-thread, since there's no need to have a
global one in this situation. All call places just iterate over all
threads. The most "invasive" changes was in "show sess" where the end
of list needs to go back to the beginning of next thread's list until
the last thread is seen. For now the lock was maintained to keep the
code auditable but a next commit should get rid of it.
The observed performance gain here with only 4 threads is already 7%
(350krps -> 374krps).
Instead of placing the current stream at the end of the stream list when
issuing a "show sess" on the CLI as was done in 2.2 with commit c6e7a1b8e
("MINOR: cli: make "show sess" stop at the last known session"), now we
compare the listed stream's epoch with the dumping stream's and stop on
more recent ones.
This way we're certain to always only dump known streams at the moment we
issue the dump command without having to modify the list. In theory we
could miss some streams if more than 2^31 "show sess" requests are issued
while an old stream remains present, but that's 68 years at 1 "show sess"
per second and it's unlikely we'll keep a process, let alone a stream, that
long.
It could be verified that the count of dumped streams still matches the
one before this change.
The "show sess" CLI command currently lists all streams and needs to
stop at a given position to avoid dumping forever. Since 2.2 with
commit c6e7a1b8e ("MINOR: cli: make "show sess" stop at the last known
session"), a hack consists in unlinking the stream running the applet
and linking it again at the current end of the list, in order to serve
as a delimiter. But this forces the stream list to be global, which
affects scalability.
This patch introduces an epoch, which is a global 32-bit counter that
is incremented by the "show sess" command, and which is copied by newly
created streams. This way any stream can know whether any other one is
newer or older than itself.
For now it's only stored and not exploited.
The hard-stop event didn't wake threads up. In the past it wasn't an issue
as the poll timeout was limited to 1 second, but since commit 4f59d3861
("MINOR: time: increase the minimum wakeup interval to 60s") it has become
a problem because old processes can remain live for up to one minute after
the hard-stop-after delay. Let's just wake them up.
This may be backported to older releases, though before 2.4 the extra
delay was only one second.
There's no locking around the lookup of a stream nor its shutdown
when issuing "shutdown sessions" over the CLI so the risk of crashing
the process is particularly high.
Let's use a thread_isolate() there which is suitable for this task, and
there are not that many alternatives.
This must be backported to 1.8.
When setting hard-stop-after, hard_stop() is called at the end to kill
last pending streams. Unfortunately there's no locking there while
walking over the streams list nor when shutting them down, so it's
very likely that some old processes have been crashing or gone wild
due to this. Let's use a thread_isolate() call for this as we don't
have much other choice (and it happens once in the process' life,
that's OK).
This must be backported to 1.8.
This patch adds a lock to functions vars_get_by_name() and
vars_get_by_desc() to protect accesses to the list of variables.
After the variable is fetched, a sample data is duplicated by using
smp_dup() because the variable may be modified by another thread.
This should be backported to all versions supporting vars along with
"BUG/MINOR: sample: secure convs that accept base64 string and var name
as args" which this patch depends on.
This patch adds a few improvements in order to secure the use of
converters that accept base64 string and variable name as arguments.
The first change is within related function sample_conv_var2smp_str()
which now flags the sample as SMP_F_CONST if the argument is of type
ARGT_STR. This makes the sample more safe for later use.
A new function sample_check_arg_base64() is added. It checks an argument
and fills it with a variable type if the argument string contains a
valid variable name. If failed, it tries to perform a base64 decode
operation on a non-empty string, and fills the argument with the decoded
content which can be used later, without any additional base64dec()
function calls during runtime. This means that haproxy configuration
check may fail if variable lookup fails and an invalid base64 encoded
string is specified as an argument for such converters.
Both converters, "aes_gcm_dec" and "hmac", now use alloc_trash_chunk()
in order to allocate additional buffers for various conversions, and
avoid the use of a pre-allocated trash chunks directly (usually returned
by get_trash_chunk()). The function sample_check_arg_base64() is used
for both converters in order to check their arguments specified within
the haproxy configuration.
This patch should be backported as far as 2.0. However, it is important
to keep in mind a few things. The "hmac" converter is only available
starting with 2.2. In versions prior to 2.2, the "aes_gcm_dec" converter
and sample_conv_var2smp_str() are implemented in src/ssl_sock.c. Thus
the patch will have to be adapted on these versions.
Note that this patch is required for a subsequent, more important fix.
A potential null pointer dereference was reported with an old gcc
version (6.5)
src/ssl_ckch.c: In function 'cli_parse_set_cert':
src/ssl_ckch.c:838:7: error: potential null pointer dereference [-Werror=null-dereference]
if (!ssl_sock_copy_cert_key_and_chain(src->ckch, dst->ckch))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ckch.c:838:7: error: potential null pointer dereference [-Werror=null-dereference]
src/ssl_ckch.c: In function 'ckchs_dup':
src/ssl_ckch.c:838:7: error: potential null pointer dereference [-Werror=null-dereference]
if (!ssl_sock_copy_cert_key_and_chain(src->ckch, dst->ckch))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/ssl_ckch.c:838:7: error: potential null pointer dereference [-Werror=null-dereference]
cc1: all warnings being treated as errors
This case does not actually happen but it's better to fix the ckch API
with a NULL check.
Could be backported as far as 2.1.
RAND_keep_random_devices_open is OpenSSL specific function, not
implemented in LibreSSL and BoringSSL. Let us define guard
HAVE_SSL_RAND_KEEP_RANDOM_DEVICES_OPEN in include/haproxy/openssl-compat.h
That guard does not depend anymore on HA_OPENSSL_VERSION
The runqueue_ticks counts the number of task wakeups and is used to
position new tasks in the run queue, but since we've had per-thread
run queues, the values there are not very relevant anymore and the
nice value doesn't apply well if some threads are more loaded than
others. In addition, letting all threads compete over a shared counter
is not smart as this may cause some excessive contention.
Let's move this index close to the run queues themselves, i.e. one per
thread and a global one. In addition to improving fairness, this has
increased global performance by 2% on 16 threads thanks to the lower
contention on rqueue_ticks.
Fairness issues were not observed, but if any were to be, this patch
could be backported as far as 2.0 to address them.
Historically this function would try to wake the most accurate number of
process_stream() waiters. But since the introduction of filters which could
also require buffers (e.g. for compression), things started not to be as
accurate anymore. Nowadays muxes and transport layers also use buffers, so
the runqueue size has nothing to do anymore with the number of supposed
users to come.
In addition to this, the threshold was compared to the number of free buffer
calculated as allocated minus used, but this didn't work anymore with local
pools since these counts are not updated upon alloc/free!
Let's clean this up and pass the number of released buffers instead, and
consider that each waiter successfully called counts as one buffer. This
is not rocket science and will not suddenly fix everything, but at least
it cannot be as wrong as it is today.
This could have been marked as a bug given that the current situation is
totally broken regarding this, but this probably doesn't completely fix
it, it only goes in a better direction. It is possible however that it
makes sense in the future to backport this as part of a larger series if
the situation significantly improves.
The buffer wait queue used to be global historically but this doest not
make any sense anymore given that the most common use case is to have
thread-local pools. Thus there's no point waking up waiters of other
threads after releasing an entry, as they won't benefit from it.
Let's move the queue head to the thread_info structure and use
ti->buffer_wq from now on.
When a server-state line is parsed, a test is performed to be sure there is
enough but not too much fields. However the test is buggy. The bug was
introduced in the commit ea2cdf55e ("MEDIUM: server: Don't introduce a new
server-state file version").
No backport needed.
This revert the commit 63e6cba12 ("MEDIUM: server: add server-states version
2"), but keeping all recent features added to the server-sate file. Instead
of adding a 2nd version for the server-state file format to handle the 5 new
fields added during the 2.4 development, these fields are considered as
optionnal during the parsing. So it is possible to load a server-state file
from HAProxy 2.3. However, from 2.4, these new fields are always dumped in
the server-state file. But it should not be a problem to load it on the 2.3.
This patch seems a bit huge but the diff ignoring the space is much smaller.
The version 2 of the server-state file format is reserved for a real
refactoring to address all issues of the current format.
If a line of a server-state file has too many fields, the last one is not
cut on the first following space, as all other fileds. It contains all the
end of the line. It is not the expected behavior. So, now, we cut it on the
next following space, if any. The parsing loop was slighly rewritten.
Note that for now there is no error reported if the line is too long.
This patch may be backported at least as far as 2.1. On 2.0 and prior the
code is not the same. The line parsing is inlined in apply_server_state()
function.
Same static arrays of parameters are used to parse all server-state
lines. Thus it is important to reinit them to be sure to not get params from
the previous line, eventually from the previous loaded file.
This patch should be backported to all stable branches. However, in 2.0 and
prior, the parsing of server-state lines are inlined in apply_server_state()
function. Thus the patch will have to be adapted on these versions.
When a HTTP return action is triggered, HAProxy is responsible to return the
response, based on the configured status code. On the request side, there is
no problem because there is no server response to replace. But on the
response side, we must take care to override the server response status
code, if any, to be sure to use the rigth status code to get the http reply
message.
In short, we must always set the configured status code of the HTTP return
action before returning the http reply to be sure to get the right reply,
the one base on the http return action status code and not a reply based on
the server response status code..
This patch should fix the issue #1139. It must be backported as far as 2.2.
If a SPOE filter is configured to send its logs to a ring buffer, the
corresponding sink must be resolved during the configuration post
parsing. Otherwise, the sink is undefined when a log message is emitted,
crashing HAProxy.
This patch must be backported as far as 2.2.
Remove ebmb_node entry from struct connection and create a dedicated
struct conn_hash_node. struct connection contains now only a pointer to
a conn_hash_node, allocated only for connections where target is of type
OBJ_TYPE_SERVER. This will reduce memory footprints for every
connections that does not need http-reuse such as frontend connections.
In h2_process there was two parts where the connection was removed from
the idle trees, without first checking if the connection is a backend
side.
This should not produce a crash as the node is properly zeroed on
conn_init. However, it is better to explicit the test as it is done on
all other places. Besides it will be mandatory if the node part is
dynamically allocated only for backend connections.
The maximum number of connections accepted at once by a thread for a single
listener used to default to 64 divided by the number of processes but the
tasklet-based model is much more scalable and benefits from smaller values.
Experimentation has shown that 4 gives the highest accept rate for all
thread values, and that 3 and 5 come very close, as shown below (HTTP/1
connections forwarded per second at multi-accept 4 and 64):
ac\thr| 1 2 4 8 16
------+------------------------------
4| 80k 106k 168k 270k 336k
64| 63k 89k 145k 230k 274k
Some tests were also conducted on SSL and absolutely no change was observed.
The value was placed into a define because it used to be spread all over the
code.
It might be useful at some point to backport this to 2.3 and 2.2 to help
those who observed some performance regressions from 1.6.
SCTL (signed certificate timestamp list) specified in RFC6962
was implemented in c74ce24cd22e8c683ba0e5353c0762f8616e597d, let
us introduce macro HAVE_SSL_SCTL for the HAVE_SSL_SCTL sake,
which in turn is based on SN_ct_cert_scts, which comes in the same commit
we previously forgot to add `agent-*` commands.
Take this opportunity to rewrite the help string in a simpler way for
readability (mainly removing simple quotes)
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Due to the two-phase server reservation, there are 3 calls to
fwlc_srv_reposition() per request, one during assign_server() to reserve
the slot, one in connect_server() to commit it, and one in process_stream()
to release it. However only one of the first two will change the key, so
it's needlessly costly to take the lock, remove a server and insert it
again at the same place when we can already figure we ought not to even
have taken the lock.
Furthermore, even when the server needs to move, there can be quite some
contention on the lbprm lock forcing the thread to wait. During this time
the served and nbpend server values might have changed, just like the
lb_node.key itself. Thus we measure the values again under the lock
before updating the tree. Measurements have shown that under contention
with 16 servers and 16 threads, 50% of the updates can be avoided there.
This patch makes the function compute the new key and compare it to
the current one before deciding to move the entry (and does it again
under the lock forthe second test).
This removes between 40 and 50% of the tree updates depending on the
thread contention and the number of servers. The performance gain due
to this (on 16 threads) was:
16 servers: 415 krps -> 440 krps (6%, contention on lbprm)
4 servers: 554 krps -> 714 krps (+29%, little contention)
One point worth thinking about is that it's not logic to update the
tree 2-3 times per request while it's only read once. half to 2/3 of
these updates are not needed. An experiment consisting in subscribing
the server to a list and letting the readers reinsert them on the fly
showed further degradation instead of an improvement.
A better approach would probably consist in avoinding writes to shared
cache lines by having the leastconn nodes distinct from the servers,
with one node per value, and having them hold an mt-list with all the
servers having that number of connections. The connection count tree
would then be read-mostly instead of facing heavy writes, and most
write operations would be performed on 1-3 list heads which are way
cheaper to migrate than a tree node, and do not require updating the
last two updated neighbors' cache lines.
The operations are only an insert and a delete into the LB tree, which
doesn't require the server's lock at all as the lbprm lock is already
held. Let's drop it. Just for the sake of cleanness, given that the
served and nbpend values used to be atomically updated, we'll use an
atomic load to read them.
The operations are only an insert and a delete into the LB tree, which
doesn't require the server's lock at all as the lbprm lock is already
held. Let's drop it.
The two algos defining these functions (first and leastconn) do not need the
server's lock. However it's already present in pendconn_process_next_strm()
so the API must be updated so that the functions may take it if needed and
that the callers indicate whether they already own it.
As such, the call places (backend.c and stream.c) now do not take it
anymore, queue.c was unchanged since it's already held, and both "first"
and "leastconn" were updated to take it if not already held.
A quick test on the "first" algo showed a jump from 432 to 565k rps by
just dropping the lock in stream.c!
The remaining contention on the server lock solely comes from
sess_change_server() which takes the lock to add and remove a
stream from the server's actconn list. This is both expensive
and pointless since we have mt-lists, and this list is only
used by the CLI's "shutdown server sessions" command!
Let's migrate to an mt-list and remove the need for this costly
lock. By doing so, the request rate increased by ~1.8%.
The server lock was taken preventively for anything in health_adjust(),
including the static config checks needed to detect that the lock was not
needed, while the function is always called on the response path to update
a server's status. This was responsible for huge contention causing a
performance drop of about 17% on 16 threads. Let's move the lock only
where it should be, i.e. inside the function around the critical sections
only. By doing this, a 16-thread process jumped back from 575 to 675 krps.
This should be backported to 2.3 as the situation degraded there, and
maybe later to 2.2.
There's an issue when a server state changes, we use an integer comparison
to decide whether or not to reschedule a test instead of using a wrapping
timer comparison. This will cause some health-checks not to be immediately
triggered half of the time, and some unneeded calls to task_queue() to be
performed in other cases.
This bug has always been there as it was introduced with the commit that
added the feature, 97f07b832 ("[MEDIUM] Decrease server health based on
http responses / events, version 3"). This may be backported everywhere.
conn_hash_prehash does not need a nul-terminated string, thus it is only
needed to test if the sni sample is not null before using it as
connection hash input.
Moreover, a bug could be introduced between smp_make_safe and
ssl_sock_set_servername call. Indeed, smp_make_safe may call smp_dup
which duplicates the sample in the trash buffer. If another function
manipulates the trash buffer before the call to ssl_sock_set_servername,
the sni sample might be erased. Currently, no function seems to do that
except make_proxy_line in case proxy protocol is used simultaneously
with the sni on the server.
This does not need to be backported.
In session_count_new() the tracked counter was still incremented with
a "++" outside of any lock, resulting in occasional slightly off values
such as the following:
# table: foo, type: string, size:1000, used:1
0xb2a398: key=127.1.2.3 use=0 exp=86398318 sess_cnt=999959 http_req_cnt=1000004
Now with the correct atomic increment:
# table: foo, type: string, size:1000, used:1
0x7f82a4026d38: key=127.1.2.3 use=0 exp=86399294 sess_cnt=1000004 http_req_cnt=1000004
This can be backported to 1.8.
It seems that fd_delete perform the close of the file descriptor
Se we must not close the fd once again after that.
This should fix issues #1128, #1130 and #1131
This patch fix a case which should never happen writing
in output channel since we check available room before
This patch should fix github issue #1132
This patch fix returns code in case of dns_connect_server is called
on unsupported type (which should not happen). Doing this we have
the warranty that after a return 0 the fd is never -1.
This patch should fix github issues #1127, #1128 and #1130
This patch adds a missing test in dns_session_io_handler, getting
the query id from the buffer of the ring. An error should never
happen since messages are completely added atomically.
This bug should fix github issue #1133
move listen status to a helper, defining both status enum and string
definition.
this will be helpful to be reused in prometheus code. It also removes
this hard-to-read nested ternary.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
From this patch it should be possible to add support for listen stats in
prometheus.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The RMAINT admin state is dynamic and should be remove from the
srv_admin_state parameter when a server state is loaded from a server-state
file. Otherwise an erorr is reported, the server-state line is ignored and
the server state is not updated.
This patch should fix the issue #576. It must be backported as far as 1.8.
This patch introduce the new line "server" to set a TCP
nameserver in a "resolvers" section:
server <name> <address> [param*]
Used to configure a DNS TCP or stream server. This supports for all
"server" parameters found in 5.2 paragraph. Some of these parameters
are irrelevant for DNS resolving. Note: currently 4 queries are pipelined
on the same connections. A batch of idle connections are removed every
5 seconds. "maxconn" can be configured to limit the amount of those
concurrent connections and TLS should also usable if the server supports
. The current implementation limits to 4 pipelined
The name of the line in configuration is open to discussion
and could be changed before the next release.
This patch introduce the "dns_stream_nameserver" to use DNS over
TCP on strict nameservers. For the upper layer it is analog to
the api used with udp nameservers except that the user que switch
the name server in "stream" mode at the init using "dns_stream_init".
The fallback from UDP to TCP is not handled and this is not the
purpose of this feature. This is done to choose the transport layer
during the initialization.
Currently there is a hardcoded limit of 4 pipelined transactions
per TCP connections. A batch of idle connections is expired every 5s.
This code is designed to support a maximum DNS message size on TCP: 64k.
Note: this code won't perform retry on unanswered queries this
should be handled by the upper layer
This patch splits current dns.c into two files:
The first dns.c contains code related to DNS message exchange over UDP
and in future other TCP. We try to remove depencies to resolving
to make it usable by other stuff as DNS load balancing.
The new resolvers.c inherit of the code specific to the actual
resolvers.
Note:
It was really difficult to obtain a clean diff dur to the amount
of moved code.
Note2:
Counters and stuff related to stats is not cleany separated because
currently counters for both layers are merged and hard to separate
for now.
This patch splits recv and send functions in two layers. the
lowest is responsible of DNS message transactions over
the network. Doing this we could use DNS message layer
for something else than resolving. Load balancing for instance.
This patch also re-works the way to init a nameserver and
introduce the new struct dns_dgram_server to prepare the arrival
of dns_stream_server and the support of DNS over TCP.
The way to retry a send failure of a request because of EAGAIN
was re-worked. Previously there was no control and all "pending"
queries were re-played each time it reaches a EAGAIN. This
patch introduce a ring to stack messages in case of sent
failure. This patch is emptied if poller shows that the
socket is ready again to push messages.
Counters are currently stored into lowlevel nameservers struct but
most of them are resolving layer data and increased in the upper layer
So this patch renames the prototype used to allocate/dump them with prefix
'resolv' waiting for a clean split.
Some types are specific to resolver code and a renamed using
the 'resolv' prefix instead 'dns'.
-struct dns_query_item {
+struct resolv_query_item {
-struct dns_answer_item {
+struct resolv_answer_item {
-struct dns_response_packet {
+struct resolv_response {
Resolv callbacks are also updated to rely on counters and not on
nameservers.
"show stat domain dns" will now show the parent id (i.e. resolvers
section name).
FreeBSD has a kernel feature (accf) and a sockopt flag similar to the
Linux's TCP_DEFER_ACCEPT to filter incoming data upon ACK. The main
difference is the filter needs to be placed when the socket actually
listens.
Very often, especially since reg-tests, it would be desirable to be able
to conditionally comment out a config block, such as removing an SSL
binding when SSL is disabled, or enabling HTX only for certain versions,
etc.
This patch introduces a very simple nested block management which takes
".if", ".elif", ".else" and ".endif" directives to take or ignore a block.
For now the conditions are limited to empty string or "0" for false versus
a non-nul integer for true, which already suffices to test environment
variables. Still, it needs to be a bit more advanced with defines, versions
etc.
A set of ".notice", ".warning" and ".alert" statements are provided to
emit messages, often in order to provide advice about how to fix certain
conditions.
The "show peers" output has become huge due to the dictionaries making it
less readable. Now this feature has reached a certain level of maturity
which doesn't warrant to dump it all the time, given that it was essentially
needed by developers. Let's make it optional, and disabled by default, only
when "show peers dict" is requested. The default output reminds about the
command. The output has been divided by 5 :
$ socat - /tmp/sock1 <<< "show peers dict" | wc -l
125
$ socat - /tmp/sock1 <<< "show peers" | wc -l
26
It could be useful to backport this to recent stable versions.
This variable is now only used to point on the local server-state file. When
the server-state is global, it is unused. So, we now use "localfilepath"
instead. Thus, the "filepath" variable can safely be removed.
When a local server-state file is loaded, if its name is too long, the error
is not properly handled, resulting to a call to fopen() with the "filepath"
variable set to NULL. To fix the bug, when this error occurs, we jump to the
next proxy, via a "continue" statement. And we take case to set "filepath"
variable after the error handling to be sure.
This patch should fix the issue #1111. It must be backported as far as 1.6.
When a connect rule is evaluated a test is performed on the "port" variable
while it is set to 0 just on the line just above. Just remove this useless
test to make ccpcheck happy.
This patch fixes the issue #1113.
Now it becomes possible to specify "from foo" on a frontend/listen/backend
or even on a "defaults" line, to mention that defaults section "foo" needs
to be used to preset the proxy's settings.
When not set, the last section remains used. In case the designated name
is found at multiple places, it is rejected and an error indicates two
occurrences of the same name. Similarly, if the section name is found,
its name must only use valid characters. This allows multiple named
defaults section to continue to coexist without the risk that they will
cause trouble by accident.
When it comes to "defaults" relying on another defaults, what happens is
just that a new defaults section is created from the designated one. This
will make it possible for example to reuse some settings such as log-format
like below:
defaults tcp-clear
log stdout local0 info
log-format "%ci:%cp/%b/%si:%sp %ST %ts %U/%B %{+Q}r"
defaults tcp-ssl
log stdout local0 info
log-format "%ci:%cp/%b/%si:%sp %ST %ts %U/%B %{+Q}r ssl=%sslv"
defaults http-clear from tcp-clear
mode http
defaults http-ssl from tcp-ssl
mode http
frontend fe1 from http-clear
bind :8001
frontend fe2 from http-ssl
bind :8002
A small corner case remains in the error detection, if a second defaults
section appears with the same name after the point where it was used, and
nobody references it, the duplicate will not be detected. This could be
addressed by performing the syntactic checks in check_config_validity(),
and by postponing the freeing of the defaults, after tagging a defaults
section as explicitly looked up by another section. This doesn't seem
that important at the moment though.
Now default proxies are stored into a dedicated tree, sorted by name.
Only unnamed entries are not kept upon new section creation. The very
first call to cfg_parse_listen() will automatically allocate a dummy
defaults section which corresponds to the previous static one, since
the code requires to have one at a few places.
The first immediately visible benefit is that it allows to reuse
alloc_new_proxy() to allocate a defaults section instead of doing it by
hand. And the secret goal is to allow to keep multiple named defaults
section in memory to reuse them from various proxies.
Now we'll have a tree of named defaults sections. The regular insertion
and lookup functions take care of the capability in order to select the
appropriate tree. A new function proxy_destroy_defaults() removes a
proxy from this tree and frees it entirely.
In order to make the default proxy configurable, we'll need to have a
pointer to it which might differ from &defproxy. cfg_parse_listen()
now gets curr_defproxy for this.
We don't want to expose this one anymore as we'll soon keep multiple
default proxies. Let's move it inside the parser which is the only
place which still uses it, and initialize it on the fly once needed
instead of doing it at boot time.
The default proxy was passed as a variable, which in addition to being
a PITA to deal with in the config parser, doesn't feel safe to use when
it ought to be const.
This will only affect new code so no backport is needed.
The default proxy was passed as a variable, which in addition to being
a PITA to deal with in the config parser, doesn't feel safe to use when
it ought to be const.
This will only affect new code so no backport is needed.
The default proxy was passed as a variable, which in addition to being
a PITA to deal with in the config parser, doesn't feel safe to use when
it ought to be const.
This will only affect new code so no backport is needed.
In proxy_free_defaults(); none of the free() calls was followed by a
pointer reset. Not only it's hard to figure if one of them is duplicated,
but this code started to call other functions which might or might not
rely on such just freed pointers. Let's reset them as they should be to
make sure there will never be any case of use-after-free. The 3 functions
called there were inspected and are all unaffected by this so this remains
safe to do right now.
This used to be open-coded in cfgparse-listen.c when facing a "defaults"
keyword. Let's move this into proxy_free_defaults(). This code is ugly and
doesn't even reset the just freed pointers. Let's not change this yet.
This code should probably be merged with a generic proxy deinit function
called from deinit(). However there's a catch on uri_auth which cannot be
freed because it might be used by one or several proxies. We definitely
need refcounts there!
The proxy initialization code relies on three phases, allocation,
pre-initialization, and assignments from defaults. This last part is
entirely taken from the defaults proxy when arguments are set. This
sensibly complexifies the initialization code as it requires to always
have a default proxy.
This patch instead first applies the original default settings on a
proxy, and then uses those from a default proxy only if one such is
used. This will allow to initialize a proxy out of any default proxy
while still using valid defaults. A careful inspection of the function
showed that only 4 fields used to be set regardless of the default
proxy, and those were moved to init_new_proxy() where they ought to
have been in the first place.
This new function takes over the old open-coding that used to be done
for too long in cfg_parse_listen() and it now does everything at once
in a proxy-centric function. The function does all the job of allocating
the structure, initializing it, presetting its defaults from the default
proxy and checking for errors. The code was almost unchanged except for
defproxy being passed as a pointer, and the error message being passed
using memprintf().
This change will be needed to ease reuse of multiple default proxies,
or to create dynamic backends in a distant future.
init_default_instance() was still left in cfgparse.c which is not the
best place to pre-initialize a proxy. Let's place it in proxy.c just
after init_new_proxy(), take this opportunity for renaming it to
proxy_preset_defaults() and taking out init_new_proxy() from it, and
let's pass it the pointer to the default proxy to be initialized instead
of implicitly assuming defproxy. We'll soon be able to exploit this.
Only two call places had to be updated.
This is just an API bug but it's annoying when trying to tidy the code.
The source list passed in argument must be a const and not a variable,
as it's typically the list head from a default proxy and must obviously
not be modified by the function. No backport is needed as it only impacts
new code.
This is just an API bug but it's annoying when trying to tidy the code.
The default proxy passed in argument must be a const and not a variable.
No backport is needed as it only impacts new code.
The very old error message indicating that a proxy name is mandatory
still had a reference to the optional addr:port argument while this one
is explicitly rejected a few lines later since at least 1.9.
This is harmless but confusing. This can be backported to 2.0.
In 2.1, commit ee4f5f83d ("MINOR: stats: get rid of the ST_CONVDONE flag")
introduced a subtle bug. By testing curproxy against defproxy in
check_config_validity(), it tried to eliminate the need for a flag
to indicate that stats authentication rules were already compiled,
but by doing so it left the issue opened for the case where a new
defaults section appears after the two proxies sharing the first
one:
defaults
mode http
stats auth foo:bar
listen l1
bind :8080
listen l2
bind :8181
defaults
# just to break above
This config results in:
[ALERT] 042/113725 (3121) : proxy 'f2': stats 'auth'/'realm' and 'http-request' can't be used at the same time.
[ALERT] 042/113725 (3121) : Fatal errors found in configuration.
Removing the last defaults remains OK. It turns out that the cleanups
that followed that patch render it useless, so the best fix is to revert
the change (with the up-to-date flags instead). The flag was marked as
belonging to the config. It's not exact but it's the closest to the
reality, as it's not there to configure the behavior but ti mention
that the config parser did its job.
This could be backported as far as 2.1, but in practice it looks like
nobody ever hit it.
Since commit 1.3.14 with commit 1fa3126ec ("[MEDIUM] introduce separation
between contimeout, and tarpit + queue"), check_config_validity() looks
at the last defaults section to update all proxies' queue and tarpit
timeouts if they were not set!
This was apparently an attempt to properly set them on the fallback values,
except that the fallback values were taken from the default proxy before
looking at the current proxy itself. The worst part of it is that it might
have randomly worked by accident for some configurations when there was a
single defaults section, but has certainly caused too short queue
expirations once another defaults section was added later in the file with
these explicitly defined.
Let's remove the defproxy part and keep only the curproxy ones. This could
be backported everywhere, the bug has been there for 13 years.
Since the beginning, this directive is documented to accept an optional file
name. But it should also be possible to use it without any argument to use
the backend name as file name. However, when no argument is provided, an
error is reported during the configuration parsing requesting an argument, a
file name or "use-backend-name". And This last special argument is not
documented.
So, to respect the documentation and to avoid configuration breakages, all
modes are now supported. If this directive is called with no argument or
with "use-backend-name", the backend name is use as file name for the
server-state file. Otherwise, the provided string is used.
In addition, we take care to release any previously allocated file name in
case this directive is defines multiple times in the same backend. And an
error is reported if more than one argument are defined. Finally, the
documentation is updated accordingly. Sections supporting this directive are
also mentioned.
This patch should be backported as far as 1.6.
server health checks and agent parameters are written the same way as
others to be able to enahcne code reuse: basically we make use of
parsing and assignment at the same place. It makes it difficult for
error handling to know whether srv object was modified partially or not.
The problem was already present with SRV resolution though.
I was a bit puzzled about the approach to take to be honest, and I did
not wanted to go into a full refactor, so I assumed it was ok to simply
notify whether the line was failed or partially applied.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
logical followup from cli commands addition, so that the state server
file stays compatible with the changes made at runtime; use previously
added helper to load server attributes.
also alloc a specific chunk to avoid mixing with other called functions
using it
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Even if it is possibly too much work for the current usage, it makes
sure we don't break states file from v2.3 to v2.4; indeed, since v2.3,
we introduced two new fields, so we put them aside to guarantee we can
easily reload from a version 1.
The diff seems huge but there is no specific change apart from:
- introduce v2 where it is needed (parsing, update)
- move away from switch/case in update to be able to reuse code
- move srv lock to the whole function to make it easier
this patch confirm how painful it is to maintain this functionality.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this patch allows to set agent port at runtime. In order to align with
both `addr` and `check-addr` commands, also add the possibility to
optionnaly set port on `agent-addr` command. This led to a small
refactor in order to use the same function for both `agent-addr` and
`agent-port` commands.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
this patch allows to set server health check address at runtime. In
order to align with `addr` command, also allow to set port optionnaly.
This led to a small refactor in order to use the same function for both
`check-addr` and `check-port` commands.
for `check-port`, we however don't permit the change anymore if checks
are not enabled on the server.
This command becomes more and more useful for people having a consul
like architecture:
- the backend server is located on a container with its own IP
- the health checks are done the consul instance located on the host
with the host IP
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Use the proxy protocol frame if proxy protocol is activated on the
server line. Do not add anymore these connections in the private list.
If some requests are made with the same proxy fields, they can reuse
the idle connection.
The reg-tests proxy_protocol_send_unique_id must be adapted has it
relied on the side effect behavior that every requests from a same
connection reused a private server connection. Now, a new connection is
created as expected if the proxy protocol fields differ.
The source address is used as an input to the the server connection hash. The
address and port are used as separate hash inputs. Do not add anymore these
connections in the private list.
This parameter is set only if used in the transparent-proxy mode.
The destination address is used as an input to the server connection hash. The
address and port are used as separated hash inputs. Note that they are not used
when statically specified on the server line. This is only useful for dynamic
destination address.
This is typically used when the server address is dynamically set via the
set-dst action. The address and port are separated hash parameters.
Most notably, it should fixed set-dst use case (cf github issue #947).
Change the API of the function used to allocate the stream target
address. This is done in order to be able to allocate the destination
address and use it to reuse a connection sharing with the same address.
In particular, the flag stream SF_ADDR_SET is now set outside of the
function.
The sni parameter is an input to the server connection hash. Do not add
anymore connections with dynamic sni in the private list. Thus, it is
now possible to reuse a server connection if they use the same sni.
Compare the connection hash when reusing a connection from the session.
This ensures that a private connection is reused only if it shares the
same set of parameters.
The pointer of the target server is used as a first parameter for the
server connection hash calcul. This prevents the hash to be null when no
specific parameters are present, and can serve as a simple defense
against an attacker trying to reuse a non-conform connection.
This is a preliminary work for the calcul of the backend connection
hash. A structure conn_hash_params is the input for the operation,
containing the various specific parameters of a connection.
The high bits of the hash will reflect the parameters present as input.
A set of macros is written to manipulate the connection hash and extract
the parameters/payload.
With http-reuse always, if no matching safe connection is found, check
in idle tree for a matching one. This is needed because now idle
connections can be differentiated from each other.
If only the safe tree was checked because not empty, but did not contain
a matching connection, we could miss matching entry in idle tree.
If no matching connection is found on available, check on idle/safe
trees for a matching one. This is needed because now idle connections
can be differentiated from each other.
If only the available list was checked because not empty, but did not
contain a matching connection, we could miss matching entries in idle or
safe trees.
The server idle/safe/available connection lists are replaced with ebmb-
trees. This is used to store backend connections, with the new field
connection hash as the key. The hash is a 8-bytes size field, used to
reflect specific connection parameters.
This is a preliminary work to be able to reuse connection with SNI,
explicit src/dst address or PROXY protocol.
This is a preparation work for connection reuse with sni/proxy
protocol/specific src-dst addresses.
Protect every access to idle conn lists with a lock. This is currently
strictly not needed because the access to the list are made with atomic
operations. However, to be able to reuse connection with specific
parameters, the list storage will be converted to eb-trees. As this
structure does not have atomic operation, it is mandatory to protect it
with a lock.
For this, the takeover lock is reused. Its role was to protect during
connection takeover. As it is now extended to general idle conns usage,
it is renamed to idle_conns_lock. A new lock section is also
instantiated named IDLE_CONNS_LOCK to isolate its impact on performance.
The wrong lock seems to be held when trying to remove another thread
connection if max fd limit has been reached (locking the current thread
instead of the target thread lock).
This could be backported up to 2.0.
This patch removes unecessary tests on p or pp pointers in
pendconn_process_next_strm() function. This should make cppcheck happy and
avoid false report of null pointer dereference.
This patch should fix the issue #1036.
this is pure cleanup, no need to backport
2116 if ((end - 1) == (payload + strlen(PAYLOAD_PATTERN))) {
2117 /* if the payload pattern is at the end */
2118 s->pcli_flags |= PCLI_F_PAYLOAD;
CID 1399833 (#1 of 1): Unused value (UNUSED_VALUE)assigned_value: Assigning value from reql to ret here, but that stored value is overwritten before it can be used.
2119 ret = reql;
2120 }
This patch fixes the issue #1048.
When an invalid character is found during parsing in parse_dotted_uints()
function, the allocated array of uint must be released. This patch fixes a
memory leak on error path during the configuration parsing.
This patch should fix the issue #1106. It should be backported as far as
2.0. Note that, for 2.1 and 2.0, the function is in src/standard.c
In H1, H2 and FCGI muxes, b_realign_if_empty() is called to reset the head
of an empty buffer before setting it a specific value to permit the
zero-copy. Thus, we can remove call to b_realign_if_empty().
When a message is sent, an extra check is performed when the parser is
switch to MSG_DONE state to be sure the EOM flag is really set. This flag is
quite new and replaces the EOM block. Thus, this test is a safeguard waiting
for a proper refactoring of the outgoing side.
In the H2 mux, when a empty DATA frame is used to finish a message, just to
set the ES flag, we now only set the EOM flag on the HTX message. However,
if the HTX message is empty, this event will not be properly handled on the
other side because there is no effective data to handle. Thus, it is
interpreted as an abort by the H1 mux.
It is in part caused by the current H1 mux design but also because there is
no way to emit empty HTX block (NOOP HTX block) or to wakeup a mux for send
when there is no data to finish some internal processing.
Thus, for now, to work around this limitation, an EOT HTX block is added by
the H2 mux if a EOM flag is added on an empty HTX message. This case is only
possible when an empty DATA frame with the ES flag is received.
This fix is specific for 2.4. No backport needed.
In HTTP/2, we may have trailers for messages with a Content-length
header. Thus, when the H2 mux receives a HEADERS frame at the end of a
message, it always emits TLR and EOT HTX blocks. On the H1 mux, if this
happens, these blocks are just skipped because we cannot emit trailers for a
non-chunked message. But the EOT HTX block must not be blindly
ignored. Indeed, there is no longer EOM HTX block to mark the end of the
message. Thus the EOT block, when found, is the end of the message. So we
must handle it to swith in MSG_DONE state.
This fix is specific for 2.4. No backport needed.
When payload is received for a bodyless response, for instance a response to
a HEAD request, it is silently skipped. Unfortunately, when this happens,
the end of the message is not properly handled. The response remains in the
MSG_DATA state (or MSG_TRAILERS if the message is chunked). In addition,
when a zero-copy is possible, the data are not removed from the channel
buffer and the H1 connection is killed because an error is then triggered.
To fix the bug, the zero-copy is disabled for bodyless responses. It is not
a problem because there is no copy at all. And the last block (DATA or EOT)
is now properly handled.
This bug was introduced by the commit e5596bf53 ("MEDIUM: mux-h1: Don't emit
any payload for bodyless responses").
This fix is specific for 2.4. No backport needed.
During the message parsing, if in MSG_DONE state, the CS_FL_EOI flag must
always be set on the conn-stream if following conditions are met :
* It is a response or
* It is a request but not a protocol upgrade nor a CONNECT.
For now, there is no test on the message type (request or response). Thus
the CS_FL_EOI flag is not set for a response with a "Connection: upgrade"
header but not a 101 response.
This bug was introduced by the commit 3e1748bbf ("BUG/MINOR: mux-h1: Don't
set CS_FL_EOI too early for protocol upgrade requests"). It was backported
as far as 2.0. Thus, this patch must also be backported as far as 2.0.
If internal error is reported by the mux during HTTP request parsing, the
HTTP error counter should not be incremented. It should only be incremented
on parsing error to reflect errors caused by clients.
This patch must be backported as far as 2.0. During the backport, the same
must be performed for 408-request-time-out errors.
The HTTP error counter reflects the number of errors caused by
clients. Thus, In the H1 mux, it should only be increment on parsing errors.
This fix is specific for 2.4. No backport needed.
Historically we've been counting lots of client-triggered events in stick
tables to help detect misbehaving ones, but we've been missing the same on
the server side, and there's been repeated requests for being able to count
the server errors per URL in order to precisely monitor the quality of
service or even to avoid routing requests to certain dead services, which
is also called "circuit breaking" nowadays.
This commit introduces http_fail_cnt and http_fail_rate, which work like
http_err_cnt and http_err_rate in that they respectively count events and
their frequency, but they only consider server-side issues such as network
errors, unparsable and truncated responses, and 5xx status codes other
than 501 and 505 (since these ones are usually triggered by the client).
Note that retryable errors are purposely not accounted for, so that only
what the client really sees is considered.
With this it becomes very simple to put some protective measures in place
to perform a redirect or return an excuse page when the error rate goes
beyond a certain threshold for a given URL, and give more chances to the
server to recover from this condition. Typically it could look like this
to bypass a URL causing more than 10 requests per second:
stick-table type string len 80 size 4k expire 1m store http_fail_rate(1m)
http-request track-sc0 base # track host+path, ignore query string
http-request return status 503 content-type text/html \
lf-file excuse.html if { sc0_http_fail_rate gt 10 }
A more advanced mechanism using gpt0 could even implement high/low rates
to disable/enable the service.
Reg-test converteers_ref_cnt_never_dec.vtc was updated to test it.
The sleep time calculation in next_event_delay() was wrong because it
was dividing 999 by the number of pending events, and was directly
responsible for an observation made a long time ago that listeners
would eat all the CPU when hammered while globally rate-limited,
because the more the queued events, the least it would wait, and would
ignore the configured frequency to compute the delay.
This was addressed in various ways in listeners through the switch to
the FULL state and the wakeup of manage_global_listener_queue() that
avoids this fast loop, but the calculation made there remained wrong
nevertheless. It's even visible with this patch that the accept
frequency is much more accurate at low values now; for example,
configuring a maxconrate of 10 would give between 8.99 and 11.0 cps
before this patch and between 9.99 and 10.0 with it.
Better fix it now in case it's reused anywhere else and causes confusion
again. It maybe be backported but is probably not worth it.
When adding the server side support for certificate update over the CLI
we encountered a design problem with the SSL session cache which was not
locked.
Indeed, once a certificate is updated we need to flush the cache, but we
also need to ensure that the cache is not used during the update.
To prevent the use of the cache during an update, this patch introduce a
rwlock for the SSL server session cache.
In the SSL session part this patch only lock in read, even if it writes.
The reason behind this, is that in the session part, there is one cache
storage per thread so it is not a problem to write in the cache from
several threads. The problem is only when trying to write in the cache
from the CLI (which could be on any thread) when a session is trying to
access the cache. So there is a write lock in the CLI part to prevent
simultaneous access by a session and the CLI.
This patch also remove the thread_isolate attempt which is eating too
much CPU time and was not protecting from the use of a free ptr in the
session.
both SSL_CTX_set_msg_callback and SSL_CTRL_SET_MSG_CALLBACK defined since
ea262260469e49149cb10b25a87dfd6ad3fbb4ba, we can safely switch to that guard
instead of OpenSSL version
Because of a buggy tests when processing the EOH HTX block, an extra CRLF is
added for empty chunked messages. This bug was introduced by the commit
d1ac2b90c ("MAJOR: htx: Remove the EOM block type and use HTX_FL_EOM
instead").
This fix is specific for 2.4. No backport needed.
special guard macros HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT was defined earlier
exactly for guarding SSL_CTX_add_server_custom_ext, let us use it wherever
appropriate
HAVE_SSL_CTX_ADD_SERVER_CUSTOM_EXT was introduced in ec60909871
however it was defined as HAVE_SL_CTX_ADD_SERVER_CUSTOM_EXT (missing "S")
let us fix typo
The demux loop could quit on missing data but the H2_CF_END_REACHED flag
would not be set in this case. This fixes a remaining situation where
previous commit f09612289 ("BUG/MEDIUM: mux-h2: handle remaining read0
cases") could not be sufficient and still leave CLOSE_WAIT. It's harder
to reproduce but was still observed in prod.
Now we quit via the end of the loop which already takes care of shutr.
This should be backported along with the patch above as far as 2.0.
If allocating a connection object failed right after a successful accept
on a listener, the new file descriptor was not properly closed.
This fixes GitHub issue #905.
It can be backported to 2.3.
During error files conversion to HTX message, in http_str_to_htx(), if a
file is empty, the corresponding buffer's area is initialized with a
malloc(0) and its size is set to 0. There is no problem here. The behaviour
is totally defined. But it is not really intuitive. Instead, we can simply
set the area to NULL.
This patch should fix the issue #1022.
Commit 3d4631fec ("BUG/MEDIUM: mux-h2: fix read0 handling on partial
frames") tried to address an issue introduced in commit aade4edc1 where
read0 wasn't properly handled in the middle of a frame. But the fix was
incomplete for two reasons:
- first, it would set H2_CF_RCVD_SHUT in h2_recv() after detecting
a read0 but the condition was guarded by h2_recv_allowed() which
explicitly excludes read0 ;
- second, h2_process would only call h2_process_demux() when there
were still data in the buffer, but closing after a short pause to
leave a buffer empty wouldn't be caught in this case.
This patch fixes this by properly taking care of the received shutdown
and by also waking up h2_process_demux() on an empty buffer if the demux
is not blocked.
Given the patches above were tagged for backporting to 2.0, this one
should be as well.
FD dumps are not always easy to match against netstat dumps, and often
require an lsof as a third dump. Let's emit the socket family, and the
local and remore ports when the FD is an IPv4/IPv6 socket, this will
significantly ease the matching.
The CO_FL_EARLY_SSL_HS flag was inconditionally set on the connection,
resulting in SSL_read_early_data() always being used first in handshake
calculations. While this seems to work well (probably that there are
fallback paths inside openssl), it's particularly confusing and makes
the debugging quite complicated. It possibly is not optimal by the way.
This flag ought to be set only when early_data is configured on the bind
line. Apparently there used to be a good reason for doing it this way in
1.8 times, but it really does not make sense anymore. It may be OK to
backport this to 2.3 if this helps with troubleshooting, but better not
go too far as it's unlikely to fix any real issue while it could introduce
some in old versions.
in the same manner of agentaddr, we now:
- permit to set agentport through `port` keyword, like it is the case
for agentaddr through `addr`
- set the priority on `agent-port` keyword when used
- add a flag to be able to test when the value is set like for agentaddr
it makes the behaviour between `addr` and `port` more consistent.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
small consistency problem with `addr` and `agent-addr` options:
for the both options, the last one parsed is always used to set the
agent-check addr. Thus these two lines don't have the same behavior:
server ... addr <addr1> agent-addr <addr2>
server ... agent-addr <addr2> addr <addr1>
After this patch `agent-addr` will always be the priority option over
`addr`. It means we test the flag before setting agentaddr.
We also fix all the places where we did not set the flag to be coherent
everywhere.
I was not really able to determine where this issue is coming from. So
it is probable we may backport it to all stable version where the agent
is supported.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
We can currently change the check-port using the cli command `set server
check-port` but there is a consistency issue when using server state.
This patch aims to fix this problem but will be also a good preparation
work to get rid of checkport flag, so we are able to know when checkport
was set by config.
I am fully aware this is not making github #953 moving forward, I
however think this might be acceptable while waiting for a proper
solution and resolve consistency problem faced with port settings.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
While trying to fix some consistency problem with the config file/cli
(e.g. check-port cli command does not set the flag), we realised
checkport flag was not necessarily needed. Indeed tcpcheck uses service
port as the last choice if check.port is zero. So we can assume if
check.port is zero, it means it was never set by the user, regardless if
it is by the cli or config file. In the longterm this will avoid to
introduce a new consistency issue if we forget to set the flag.
in the same manner of checkport flag, we don't really need checkaddr
flag. We can assume if checkaddr is not set, it means it was never set
by the user or config.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
When a server dns resolution is performed, there is no reason to set an
unconfigured check port with the server port. Because by default, if the
check port is not set, the server's one is used. Thus we can remove this
useless assignment. It is mandatory for next improvements.
When the server state is loaded from a server-state file, there is no reason
to set an unconfigured check port with the server port. Because by default,
if the check port is not set, the server's one is used. Thus we can remove
this useless assignment. It is mandatory for next improvements.
while reading `update_server_addr_port` I found out some things which
can be seen as incoherency. I hope I did not overlooked anything:
- one comment is stating check's address should be updated if it uses
the server one; however the condition checks if `SRV_F_CHECKADDR` is
set; this flag is set when a check address is set; result is that we
override the check address where I was not expecting it. In fact we
don't need to update anything here as server addr is used when check
addr is not set.
- same goes for check agent addr
- for port, it is a bit different, we update the check port if it is
unset. This is harmless because we also use server port if check port
is unset. However it creates some incoherency before/after using this
command, as check port should stay unset througout the life of the
process unless it is is set by `set server check-port` command.
quite hard to locate the origin of this this issue but the function was
introduced in commit d458adcc52 ("MINOR:
new update_server_addr_port() function to change both server's ADDR and
service PORT"). I was however not able to determine whether this is due
to a change of behavior along the years. So this patch can potentially
be backported up to v1.8 but we must be careful while doing so, as the
code has changed a lot. That being said, the bug being not very
impacting I would be fine keeping it for 2.4 only.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Flush the SSL session cache when updating a certificate which is used on a
server line. This prevent connections to be established with a cached
session which was using the previous SSL_CTX.
This patch also replace the ha_barrier with a thread_isolate() since there
are more operations to do. The reg-test was also updated to remove the
'no-ssl-reuse' keyword which is now uneeded.
Duplicate titles for the stats H2_ST_{OPEN,TOTAL}_{CONN,STREAM}. These
entries are used on csv for the heading.
This must be backported up to 2.3.
This fixes the github issue #1102.
As spotted in issue #822, we're having a problem with error detection in
the SSL layer. The problem is that on an overwhelmed machine, accepted
connections can start to pile up, each of them requiring a slow handshake,
and during all this time if the client aborts, the handshake will still be
calculated.
The error controls are properly placed, it's just that the SSL layer
reads records exactly of the advertised size, without having the ability
to encounter a pending connection error. As such if injecting many TLS
connections to a listener with a huge backlog, it's fairly possible to
meet this situation:
12:50:48.236056 accept4(8, {sa_family=AF_INET, sin_port=htons(62794), sin_addr=inet_addr("127.0.0.1")}, [128->16], SOCK_NONBLOCK) = 1109
12:50:48.236071 setsockopt(1109, SOL_TCP, TCP_NODELAY, [1], 4) = 0
(process other connections' handshakes)
12:50:48.257270 getsockopt(1109, SOL_SOCKET, SO_ERROR, [ECONNRESET], [4]) = 0
(proof that error was detectable there but this code was added for the PoC)
12:50:48.257297 recvfrom(1109, "\26\3\1\2\0", 5, 0, NULL, NULL) = 5
12:50:48.257310 recvfrom(1109, "\1\0\1\3"..., 512, 0, NULL, NULL) = 512
(handshake calculation taking 700us)
12:50:48.258004 sendto(1109, "\26\3\3\0z"..., 1421, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
12:50:48.258036 close(1109) = 0
The situation was amplified by the multi-queue accept code, as it resulted
in many incoming connections to be accepted long before they could be
handled. Prior to this they would have been accepted and the handshake
immediately started, which would have resulted in most of the connections
waiting in the the system's accept queue, and dying there when the client
aborted, thus the error would have been detected before even trying to
pass them to the handshake code.
As a result, with a listener running on a very large backlog, it's possible
to quickly accept tens of thousands of connections and waste time slowly
running their handshakes while they get replaced by other ones.
This patch adds an SO_ERROR check on the connection's FD before starting
the handshake. This is not pretty as it requires to access the FD, but it
does the job.
Some improvements should be made over the long term so that the transport
layers can report extra information with their ->rcv_buf() call, or at the
very least, implement a ->get_conn_status() function to report various
flags such as shutr, shutw, error at various stages, allowing an upper
layer to inquire for the relevance of engaging into a long operation if
it's known the connection is not usable anymore. An even simpler step
could probably consist in implementing this in the control layer.
This patch is simple enough to be backported as far as 2.0.
Many thanks to @ngaugler for his numerous tests with detailed feedback.
The "abort ssl cert" command is buggy and removes the current ckch store,
and instances, leading to SNI removal. It must only removes the new one.
This patch also adds a check in set_ssl_cert.vtc and
set_ssl_server_cert.vtc.
Must be backported as far as 2.2.
In order to unify prometheus and stats description, we need to remove
some field reference which are specific to stats implementation:
- `scur` in max current sessions (also reword current session)
- `rate` in max sessions
- `req_rate` in max requests
- `conn_rate` in max connections
Signed-off-by: William Dauchy <wdauchy@gmail.com>
In order to unify prometheus and stats description, we need to clarify
the description for pending connections.
- remove the BE reference in counters struct, as it is also used in
servers
- remove reference of `qcur` field in description as it is specific to
stats implemention
- try to reword cur and max pending connections description
Signed-off-by: William Dauchy <wdauchy@gmail.com>
The function get_check_status_result() can now be used to get the result
code (CHK_RES_*) corresponding to a check status (HCHK_STATUS_*). It will be
used by the Prometheus exporter when reporting the check status of a server.
During the call to thread_isolate(), some other threads might have
performed some task_wakeup() which will have a call date past the
one we retrieved. It could be avoided by taking the current date
once we're alone but this would significantly affect the latency
measurements by adding the isolation time. Instead we're now only
accounting positive times, so that late wakeups normally appear
with a zero latency.
No backport is needed, this is 2.4.
In h2s_bck_make_req_headers() function, in the loop on the HTX blocks, the
most common blocks, the headers, are now handled in first, before the
start-line. The same change was already performed on the response HEADERS
frames. Thus the code is more consistent now.
When a HEADERS frame is sent, it is always when an HTX start-line block is
found. Thus, in h2s_bck_make_req_headers() and h2s_frt_make_resp_headers()
functions, it is useless to tests the start-line. Instead of being too
defensive, we use BUG_ON() now because it must not happen and must be
handled as a bug.
This patch should fix the issue #1086.
The list is always defined by definition. Thus there is no reason to test
it. There is also plenty of checks on arguments types while it is already
validated during the configuration parsing. But one thing at a time.
This patch should fix the issue #1087.
The sample fetch functions must always be called with a valid argument
list. When called by hand, if there is no argument to pass, empty_arg_list must
be used.
In the stick-table code, there are some calls to smp_fetch_src() with NULL as
argument list. It is changed to use empty_arg_list instead. It is not really a
bug because smp_fetch_src() does not use the argument list. But it is an API
bug.
This patch may be backported to all stable branches as a cleanup.
h1_process_output() function is never called with no data to send (count ==
0). Thus, the first test on count, at the beginning of the function is
useless and may be removed. This way, by reading the code, it is obvious the
<chn_htx> variable is always defined.
This patch should fix the issue #1085.
This handler can take quite some time as it deletes a large number of
entries under a lock, let's export it so that it's immediately visible
in "show profiling".
The check I/O handler, process_chk_conn and server_warmup are often
present in complex backtraces as they're impacted by locking or I/O
issues. Let's export them so that they resolve cleanly.
This finally adds the long-awaited solution to inspect the run queues
and figure what is eating the CPU or causing latencies. We can even see
the experienced latencies when profiling is enabled. Example on a
saturated process:
> show tasks
Running tasks: 14983 (4 threads)
function places % lat_tot lat_avg
process_stream 4948 33.0 5.840m 70.82ms
h1_io_cb 2535 16.9 - -
main+0x9e670 2508 16.7 2.930m 70.10ms
ssl_sock_io_cb 2499 16.6 - -
si_cs_io_cb 2493 16.6 - -
If a user enables profiling by hand, it makes sense to reset the stats
counters to provide fresh new measurements. Therefore it's worth using
this as the standard method to reset counters.
"show profiling" will now dump the stats collected by the scheduler if
profiling was previously enabled. This will immediately make it obvious
what functions are responsible for others' high latencies or which ones
are suffering from others, and should help spot issues like undesired
wakeups.
Example:
Per-task CPU profiling : on # set profiling tasks {on|auto|off}
Tasks activity:
function calls cpu_tot cpu_avg lat_tot lat_avg
si_cs_io_cb 5569479 23.37s 4.196us - -
h1_io_cb 5558654 13.60s 2.446us - -
process_stream 250841 1.476s 5.882us 3.499s 13.95us
main+0x9e670 198 - - 5.526ms 27.91us
task_run_applet 17 1.509ms 88.77us 205.8us 12.11us
srv_cleanup_idle_connections 12 44.51us 3.708us 25.71us 2.142us
main+0x158c80 9 48.72us 5.413us - -
srv_cleanup_toremove_connections 5 165.1us 33.02us 123.6us 24.72us
Now when the profiling is enabled, the scheduler wlil update per-function
task-level statistics on number of calls, cpu usage and lateny, that could
later be checked using "show profiling". This will immediately make it
obvious what functions are responsible for others' high latencies or which
ones are suffering from others, and should help spot issues like undesired
wakeups. For now the stats are only collected but not reported (though they
are readable from sched_activity[] under gdb).
The new sched_activity structure will be used to collect task-level
activity based on the target function. The principle is to declare a
large enough array to make collisions rare (256 entries), and hash
the function pointer using a reduced XXH to decide where to store the
stats. On first computation an entry is definitely assigned to the
array and it's done atomically. A special entry (0) is used to store
collisions ("others"). The goal is to make it easy and inexpensive for
the scheduler code to use these to store #calls, cpu_time and lat_time
for each task.
In 2.0, commit d2d3348ac ("MINOR: activity: enable automatic profiling
turn on/off") introduced an automatic mode to enable/disable profiling.
The problem is that the automatic mode automatically changes to on/off,
which implied that the forced on/off modes aren't sticky anymore. It's
annoying when debugging because as soon as the load decreases, profiling
stops.
This makes a small change which ought to have been done first, which
consists in having two states for "auto" (auto-on, auto-off) to
distinguish them from the forced states. Setting to "auto" in the config
defaults to "auto-off" as before, and setting it on the CLI switches to
auto but keeps the current operating state.
This is simple enough to be backported to older releases if needed.
When reporting some values in debugging output we often need to have
some condensed, stable-length values. This function prints a duration
from nanosecond to years with at least 4 digits of accuracy using the
most suitable unit, always on 7 chars.
Do not consider reuse connection if available list is not allocated for
the target server. This will prevent a crash when using a standalone
server for an external purpose like socket_tcp/socket_ssl on hlua code.
For the idle/safe lists, they are considered allocated if
srv.max_idle_conns is not null.
Note that the hlua code is currently safe thanks to the additional
checks on proxy http mode and stream reuse policy not never. However,
this might not be sufficient for future code.
This patch should be backported in every branches containing the
following patch :
7f68d815af (2.4 tree)
REORG: backend: simplify conn_backend_get
This reverts commit 62e8aaa1bd.
While is works extremely well to address SSL handshake floods, it prevents
establishment of new connections during regular traffic above 50-60 Gbps,
because for an unknown reason the queue seems to have ~1.7 active tasks
per connection all the time, which makes no sense as these ought to be
waiting on subscribed events. It might uncover a deeper issue but at least
for now a different solution is needed. cf issue #822.
The test is trivial to run, just start a config with tune.runqueue-depth 10
and inject on 1GB objects with more than 10 connections. Try to connect to
the stats socket, it only works once, then the listeners are not dequeued.
In github issue #822, user @ngaugler reported some performance problems when
dealing with many concurrent SSL connections on restarts, after migrating
from 1.6 to 2.2, indicating a long time required to re-establish connections.
The Run_queue metric in the traces showed an abnormally high number of tasks
in the run queue, likely indicating we were accepting faster than we could
process. And this is indeed one of the differences between 1.6 and 2.2, the
accept I/O loop and the TLS handshakes are totally independent, so much that
they can even run on different threads. In 1.6 the SSL handshake was handled
almost immediately after the accept(), so this was limiting the input rate.
With large maxconn values, as long as there are incoming connections, new
I/Os are scheduled and many of them pass before the handshake, being tagged
for low latency processing.
The result is that handshakes get postponed, and are further postponed as
new connections are accepted. When they are finally able to be processed,
some of them fail as the client is gone, and the client had already queued
new ones. This causes an excess number of apparent connections and total
number of handshakes to be processed, just because we were accepting
connections on a temporarily saturated machine.
The solution is to temporarily pause new incoming connections when the
load already indicates that more tasks are already queued than will be
handled in a poll loop. The difficulty with this usually is to be able
to come back to re-enable the operation, but given that the metric is
the run queue, we just have to queue the global_listener_queue task so
that it gets picked by any thread once the run queues get flushed.
Before this patch, injecting with SSL reneg with 10000 concurrent
connections resulted in 350k tasks in the run queue, and a majority of
handshake timeouts noticed by the client. With the patch, the run queue
fluctuates between 1-3x runqueue-depth, the process is constantly busy, the
accept rate is maximized and clients observe no error anymore.
It would be desirable to backport this patch to 2.3 and 2.2 after some more
testing, provided the accept loop there is compatible.
The allowed chunk size was historically limited to 2GB to avoid risk of
overflow. This restriction is no longer necessary because the chunk size is
immediately stored into a 64bits integer after the parsing. Thus, it is now
possible to raise this limit. However to never fed possibly bogus values
from languages that use floats for their integers, we don't get more than 13
hexa-digit (2^52 - 1). 4 petabytes is probably enough !
This patch should fix the issue #1065. It may be backported as far as
2.1. For the 2.0, the legacy HTTP part must be reviewed. But there is
honestely no reason to do so.
A number of traces could be added or changed to report errors with
TRACE_ERROR. The goal is to be able to enable error tracing only to detect
anomalies.
A number of traces could be added or changed to report errors with
TRACE_ERROR. The goal is to be able to enable error tracing only to detect
anomalies.
In order to announce support for the Extended CONNECT h2 method by
haproxy, always send the ENABLE_CONNECT_PROTOCOL h2 settings. This new
setting has been described in the rfc 8441.
After receiving ENABLE_CONNECT_PROTOCOL, the client is free to use the
Extended CONNECT h2 method. This can notably be useful for the support
of websocket handshake on http/2.
Support for the rfc 8441 Bootstraping WebSockets with HTTP/2
Convert an Extended CONNECT HTTP/2 request into a htx representation.
The htx message uses the GET method with an Upgrade header field to be
fully compatible with the equivalent HTTP/1.1 Upgrade mechanism.
The Extended CONNECT is of the following form :
:method = CONNECT
:protocol = websocket
:scheme = https
:path = /chat
:authority = server.example.com
The new pseudo-header :protocol has been defined and is used to identify
an Extended CONNECT method. Contrary to standard CONNECT, Extended
CONNECT must have :scheme, :path and :authority defined.
Add the header Sec-Websocket-Key when generating a h1 handshake websocket
without this header. This is the case when doing h2-h1 conversion.
The key is randomly generated and base64 encoded. It is stored on the session
side to be able to verify response key and reject it if not valid.
Support for the rfc 8441 Bootstraping WebSockets with HTTP/2
Generate an HTTP/2 Extended CONNECT request from a htx Upgrade message.
This conversion is done when seeing the header Connection: Upgrade. A
CONNECT request is written with the :protocol pseudo-header set from the
Upgrade htx header value.
The protocol is saved in the h2s structure. This is needed on the
response side because the protocol is not present on HTTP/2 response but
is needed if the client side is using HTTP/1.1 with 101 status code.
Support for the rfc 8441 Bootstraping WebSockets with HTTP/2
Convert a 200 status reply from an Extended CONNECT request into a htx
representation. The htx message is set to 101 status code to be fully
compatible with the equivalent HTTP/1.1 Upgrade mechanism.
This conversion is only done if the stream flags H2_SF_EXT_CONNECT_SENT
has been set. This is true if an Extended CONNECT request has already
been seen on the stream.
Besides the 101 status, the additional headers Connection/Upgrade are
added to the htx message. The protocol is set from the value stored in
h2s. Typically it will be extracted from the client request. This is
only used if the client is using h1 as only the HTTP/1.1 101 Response
contains the Upgrade header.
This flag is used to signal that an Extended CONNECT has been sent by
the server mux on the current stream.
This will allow to convert the response to a 101 htx status message.
Add the Sec-Websocket-Accept header on a websocket handshake response.
This header may be missing if a h2 server is used with a h1 client.
The response key is calculated following the rfc6455. For this, the
handshake request key must be stored in the h1 session, as a new field
name ws_key. Note that this is only done if the message has been
prealably identified as a Websocket handshake request.
If a request is identified as a WebSocket handshake, it must contains a
websocket key header or else it can be reject, following the rfc6455.
A new flag H1_MF_UPG_WEBSOCKET is set on such messages. For the request
te be identified as a WebSocket handshake, it must contains the headers:
Connection: upgrade
Upgrade: websocket
This commit is a compagnon of
"MEDIUM: h1: generate WebSocket key on response if needed" and
"MEDIUM: h1: add a WebSocket key on handshake if needed".
Indeed, it ensures that a WebSocket key is added only from a http/2 side
and not for a http/1 bogus peer.
The code dealing with the copy of requests in the L7-buffer and the
retransmits during L7 retries has been moved in the HTTP analysers. The copy
is now performed in the REQ_HTTP_XFER_BODY analyser and the L7 retries is
performed in the RES_WAIT_HTTP analyser. This way, si_cs_recv() and
si_cs_send() don't care of it anymore. It is much more natural to deal with
L7 retry in HTTP analysers.
Some responses must not contain data. Reponses to HEAD requests and 204/304
responses. But there is no warranty that this will be really respected by
the senders or even if it is possible. For instance, the method may be
rewritten by an http-request rule (HEAD->GET). Thus, it is not really
possible to always strip these data from the response at the receive
stage. And the response may be emitted by an applet or an internal service
not strictly following the spec. All that to say that we may be prepared to
handle payload for bodyless responses on the sending path.
In addition, unlike the HTTP/1, it is not really clear that the trailers is
part of the payload or not. Thus, some clients may expect to have the
trailers, if any, in the response to a HEAD request. For instance, the GRPC
status is placed in a trailer and clients rely on it. But what happens for
204 responses then. Read the following thread for details :
https://lists.w3.org/Archives/Public/ietf-http-wg/2020OctDec/0040.html
So, thanks to previous patches, it is now possible to know on the sending
path if a response must be bodyless or not. So, for such responses, no DATA
frame is emitted, except eventually the last empty one carring the ES
flag. However, the TRAILERS frames are still emitted. The h2s_skip_data()
function is added to take care to remove HTX DATA blocks without emitting
any DATA frame expect the last one, if there is no trailers.
The H2 message flag H2_MSGF_BODYLESS_RSP is now used during the request or
the response parsing to notify the mux that, considering the parsed message,
the response is known to have no body. This happens during HEAD requests
parsing and during 204/304 responses parsing.
On the H2 multiplexer, the equivalent flag is set on H2 streams. Thus the
H2_SF_BODYLESS_RESP flag is set on a H2 stream if the H2_MSGF_BODYLESS_RSP
is found after a HEADERS frame parsing. Conversely, this flag is also set
when a HEADERS frame is emitted for HEAD requests and for 204/304 responses.
The H2_SF_BODYLESS_RESP flag will be used to ignore data payload from the
response but not the trailers.
No connection header must be added by the H1 mux in 1xx messages, including
101. Existing connection headers remains untouched, especially the "Connection:
upgrade" of 101 responses. This patch only avoids to add "Connection: close" or
"Connection: keep-alive" to 1xx responses.
204 and 1xx responses must not have any payload. Now, the H1 mux takes care
of that in last resort. But they also must not have any C-L or T-E
headers. Thus, if found on the sending path, these headers are ignored.
Some responses must not contain data. Reponses to HEAD requests and 204/304
xresponses. But there is no warranty that this will be really respected by
the senders or even if it is possible. For instance, the method may be
rewritten by an http-request rule (HEAD->GET). Thus, it is not really
possible to always strip the payload from the response at the receive
stage. And the response may be emitted by an applet or an internal service
not strictly following the spec. All that to say that we may be prepared to
handle payload for bodyless responses on the sending path.
So, thanks to previous patches, it is now possible to know on the sending
path if a response must be bodyless or not. So, for such responses, no
payload is emitted, all HTX blocks after the EOH are silently removed
(including the trailers).
In HTTP/1, responses to HEAD requests and 204/304 must not have payload. The
H1S_F_BODYLESS_RESP flag is not set on streams that should handle such
responses, on the client side and the server side.
On the client side, this flag is set when a HEAD request is parsed and when
a 204/304 response is emitted. On the server side, this happends when a HEAD
request is emitted or a 204/304 response is parsed.
The EOM block may be removed. The HTX_FL_EOM flags is enough. Most of time,
to know if the end of the message is reached, we just need to have an empty
HTX message with HTX_FL_EOM flag set. It may also be detected when the last
block of a message with HTX_FL_EOM flag is manipulated.
Removing EOM blocks simplifies the HTX message filling. Indeed, there is no
more edge problems when the message ends but there is no more space to write
the EOM block. However, some part are more tricky. Especially the
compression filter or the FCGI mux. The compression filter must finish the
compression on the last DATA block. Before it was performed on the EOM
block, an extra DATA block with the checksum was added. Now, we must detect
the last DATA block to be sure to finish the compression. The FCGI mux on
its part must be sure to reserve the space for the empty STDIN record on the
last DATA block while this record was inserted on the EOM block.
The H2 multiplexer is probably the part that benefits the most from this
change. Indeed, it is now fairly easier to known when to set the ES flag.
The HTX documentaion has been updated accordingly.
Tunnel management between the H1 and H2 multiplexers is a bit blurred. And
the HTX is not enough well defined on this point to make things clear. In
fact, Establishing a tunnel between an H2 client and an H1 server, or the
opposite is buggy because the both multiplexers don't handle the EOM block
the same way when a tunnel is established. In fact, the H2 multiplexer is
pretty strict and add an END_STREAM flag when an EOM block is found, while
the H1 multiplexer is more flexible.
The purpose of this patch is to make the EOM block usage pretty clear and to
fix the HTTP multiplexers to really handle HTTP tunnels in the right
way. Now, an EOM block is used to mark the end of an HTTP message,
semantically speaking. That means it may be followed by tunneled data. Thus,
CONNECT requests are now finished by an EOM block, just after the EOH block.
On the H1 multiplexer side, a tunnel is now only established on the response
path. So a CONNECT request remains in a DONE state waiting for the 2xx
response. On the H2 multiplexer side, a flag is used to know an HTTP tunnel
is requested, to not immediately add the END_STREAM flag on the EOM block.
All these changes are sensitives and not backportable because of recent
changes. The same problem exists on earlier versions and should be
addressed. But it will only be possible with a specific patchset.
This patch relies on the following ones :
* MEDIUM: mux-h1: Properly handle tunnel establishments and aborts
* MEDIUM: mux-h2: Close streams when processing data for an aborted tunnel
* MEDIUM: mux-h2: Block client data on server side waiting tunnel establishment
* MINOR: mux-h2: Add 2 flags to help to properly handle tunnel mode
* MINOR: mux-h1: Split H1C_F_WAIT_OPPOSITE flag to separate input/output sides
* MINOR: mux-h1/mux-fcgi: Don't set TUNNEL mode if payload length is unknown
In the same way than the H2 mux, we now bloc data sending on the server side
if a tunnel is not fully established. In addition, if some data are still
pending for a aborted tunnel, an error is triggered and the server
connection is closed.
To do so, we rely on the H1C_F_WAIT_INPUT flag to bloc the output
processing. This patch contributes to fix the tunnel mode between the H1 and
the H2 muxes.
In the previous patch ("MEDIUM: mux-h2: Block client data on server side
waiting tunnel establishment"), we added a way to block client data for not
fully established tunnel on the server side. This one closes the stream with
an ERR_CANCEL erorr if there are some pending tunneled data while the tunnel
was aborted. This may happen on the client side if a non-empty DATA frame or
an empty DATA frame without the ES flag is received. This may also happen on
the server side if there is a DATA htx block. However in this last case, we
first wait the response is fully forwarded.
This patch contributes to fix the tunnel mode between the H1 and the H2
muxes.
On the server side, when a tunnel is not fully established, we must block
tunneled data, waiting for the server response. It is mandatory because the
server may refuse the tunnel. This happens when a DATA htx block is
processed in tunnel mode (H2_SF_BODY_TUNNEL flag set) but before the
response HEADERS frame is received (H2_SF_HEADERS_RCVD flag no set). In this
case, the H2_SF_BLK_MBUSY flag is set to mark the stream as busy. This flag
is removed when the tunnel is fully established or aborted.
This patch contributes to fix the tunnel mode between the H1 and the H2
muxes.
H2_SF_BODY_TUNNEL and H2_SF_TUNNEL_ABRT flags are added to properly handle
the tunnel mode in the H2 mux. The first one is used to detect tunnel
establishment or fully established tunnel. The second one is used to abort a
tunnel attempt. It is the first commit having as a goal to fix tunnel
establishment between H1 and H2 muxes.
There is a subtlety in h2_rcv_buf(). CS_FL_EOS flag is added on the
conn-stream when ES is received on a tunneled stream. It really reflects the
conn-stream state and is mandatory for next commits.
The H1C_F_WAIT_OPPOSITE flag is now splitted in 2 flags, H1C_F_WAIT_INPUT
and H1C_F_WAIT_OUTPUT, depending on the side is waiting. The change is a
prerequisite to fix the tunnel mode management in HTTP muxes.
H1C_F_WAIT_INPUT must be used to bloc the output side and to wait for an
event from the input side. H1C_F_WAIT_OUTPUT does the opposite. It bloc the
input side and wait for an event from the output side.
Responses with no C-L and T-E headers are no longer switched in TUNNEL mode
and remains in DATA mode instead. The H1 and FCGI muxes are updated
accordingly. This change reflects the real message state. It is not a true
tunnel. Data received are still part of the message.
It is not a bug. However, this message may be backported after some
observation period (at least as far as 2.2).
As stated in the RFC7540, section 8.1.1, the HTTP/2 removes support for the
101 informational status code. Thus a PROTOCOL_ERROR is now returned to the
server if a 101-switching-protocols response is received. Thus, the server
connection is aborted.
This patch may be backported as far as 2.0.
A 101-switching-protocols response must contain a Connection header with the
Upgrade option. And this response must only be received from a server if the
client explicitly requested a protocol upgrade. Thus, the request must also
contain a Connection header with the Upgrade option. If not, a
502-bad-gateway response is returned to the client. This way, a tunnel is
only established if both sides are agree.
It is closer to what the RFC says, but it remains a bit flexible because
there is no check on the Upgrade header itself. However, that's probably
enough to ensure a tunnel is not established when not requested.
This one is not tagged as a bug. But it may be backported, at least to
2.3. It relies on :
* MINOR: htx/http-ana: Save info about Upgrade option in the Connection header
Add an HTX start-line flag and its counterpart into the HTTP message to
track the presence of the Upgrade option into the Connection header. This
way, without parsing the Connection header again, it will be easy to know if
a client asks for a protocol upgrade and if the server agrees to do so. It
will also be easy to perform some conformance checks when a
101-switching-protocols is received.
It is the second part and the most important of the fix.
Since the mux-h1 refactoring, and more specifically since the commit
c4bfa59f1 ("MAJOR: mux-h1: Create the client stream as later as possible"),
the upgrade from a TCP client connection to H1 is broken. Indeed, now the H1
mux is responsible to create the frontend conn-stream once the request
headers are fully received. But, to properly support TCP to H1 upgrades, we
must inherit from the existing conn-stream. To do so, if the conn-stream
already exists when the client H1 connection is created, we create a H1
stream in ST_ATTACHED state, but not ST_READY, and the conn-stream is
attached to it. Because the ST_READY state is not set, no data are xferred
to the data layer when h1_rcv_buf() is called and shutdowns are inhibited
except on client aborts. This way, the request is parsed the same way than
for a classical H1 connection. Once the request headers are fully received
and parsed, the data stream is upgraded and the ST_READY state is set.
A tricky case appears when an H2 upgrade is performed because the H2 preface
is matched. In this case, the conn-stream must be detached and destroyed
before switching to the H2 mux and releasing the current H1 mux. We must
also take care to detach and destroy the conn-stream when a timeout
occurres.
This patch relies on the following series of patches :
* BUG/MEDIUM: stream: Don't immediatly ack the TCP to H1 upgrades
* MEDIUM: http-ana: Do nothing in wait-for-request analyzer if not htx
* MINOR: stream: Add a function to validate TCP to H1 upgrades
* MEDIUM: mux-h1: Add ST_READY state for the H1 connections
* MINOR: mux-h1: Wake up instead of subscribe for reads after H1C creation
* MINOR: mux-h1: Try to wake up data layer first before calling its wake callback
* MINOR: stream-int: Take care of EOS in the SI wake callback function
* BUG/MINOR: stream: Don't update counters when TCP to H2 upgrades are performed
This fix is specific for 2.4. No backport needed.
Instead of switching the stream to HTX mode, the request channel is only
reset (the request buffer is xferred to the mux) and the SF_IGNORE flag is
set on the stream. This flag prevent any processing in case of abort. Once
the upgrade confirmed, the flag is removed, in stream_upgrade_from_cs().
It is only the first part of the fix. The next one ("BUG/MAJOR: mux-h1:
Properly handle TCP to H1 upgrades") is also required. Both rely on the
following series of patches :
* MEDIUM: http-ana: Do nothing in wait-for-request analyzer if not htx
* MINOR: stream: Add a function to validate TCP to H1 upgrades
* MEDIUM: mux-h1: Add ST_READY state for the H1 connections
* MINOR: mux-h1: Wake up instead of subscribe for reads after H1C creation
* MINOR: mux-h1: Try to wake up data layer first before calling its wake callback
* MINOR: stream-int: Take care of EOS in the SI wake callback function
* BUG/MINOR: stream: Don't update counters when TCP to H2 upgrades are performed
This fix is specific for 2.4. No backport needed.
If http_wait_for_request() analyzer is called with a non-htx stream, nothing
is performed and we return immediatly. For now, it is totally unexpected.
But it will be true during TCP to H1 upgrades, once fixed. Indeed, there
will be a transition period during these upgrades. First the mux will be
upgraded and the not the stream, and finally the stream will be upgraded by
the mux once ready. In the meantime, the stream will still be in raw
mode. Nothing will be performed in wait-for-request analyzer because it will
be the mux responsibility to handle errors.
This patch is required to fix the TCP to H1 upgrades.
TCP to H1 upgrades are buggy for now. When such upgrade is performed, a
crash is experienced. The bug is the result of the recent H1 mux
refactoring, and more specifically because of the commit c4bfa59f1 ("MAJOR:
mux-h1: Create the client stream as later as possible"). Indeed, now the H1
mux is responsible to create the frontend conn-stream once the request
headers are fully received. Thus the TCP to H1 upgrade is a problem because
the frontend conn-stream already exists.
To fix the bug, we must keep this conn-stream and the associate stream and
use it in the H1 mux. To do so, the upgrade will be performed in two
steps. First, the mux is upgraded from mux-pt to mux-h1. Then, the mux-h1
performs the stream upgrade, once the request headers are fully received and
parsed. To do so, stream_upgrade_from_cs() must be used. This function set
the SF_HTX flags to switch the stream to HTX mode, it removes the SF_IGNORE
flags and eventually it fills the request channel with some input data.
This patch is required to fix the TCP to H1 upgrades and is intimately
linked with the next commits.
An alive H1 connection may be in one of these 3 states :
* ST_IDLE : not active and is waiting to be reused (no h1s and no cs)
* ST_EMBRYONIC : active with a h1s but without any cs
* ST_ATTACHED : active with a h1s and a cs
ST_IDLE and ST_ATTACHED are possible for frontend and backend
connection. ST_EMBRYONIC is only possible on the client side, when we are
waiting for the request headers. The last one is the expected state for an
active connection processing data. These states are mutually exclusives.
Now, there is a new state, ST_READY. It may only be set if ST_ATTACHED is
also set and when the CS is considered as fully active. For now, ST_READY is
set in the same time of ST_ATTACHED. But it will be used to fix TCP to H1
upgrades. Idea is to have an H1 connection in ST_ATTACHED state but not
ST_READY yet and have more or less the same behavior than an H1 connection
in ST_EMBRYONIC state. And when the upgrade is fully achieved, the ST_READY
state may be set and the data layer may be notified accordingly.
So for now, this patch should not change anything. TCP to H1 upgrades are
still buggy. But it is mandatory to make it work properly.
When a H1 connection is created, we now wakeup the H1C tasklet if there are
some data in the input buffer. If not we only subscribe for reads.
This patch is required to fix the TCP to H1 upgrades.
Instead of calling the data layer wake callback function, we now first try
to wake it up. If the data layer is subscribed for receives or for sends,
its tasklet is woken up. The wake callback function is only called as the
last chance to notify the data layer.
Because si_cs_process() is also the SI wake callback function, it may be
called from the mux layer. Thus, in such cases, it is performed outside any
I/O event and si_cs_recv() is not called. If a read0 is reported by the mux,
via the CS_FL_EOS flag, the event is not handled, because only si_cs_recv()
take care of this flag for now.
It is not a bug, because this does not happens for now. All muxes set this
flag when the data layer retrieve data (via mux->rcv_buf()). But it is safer
to be prepared to handle it from the wake callback. And in fact, it will be
useful to fix the HTTP upgrades of TCP connections (especially TCP>H1>H2
upgrades).
To be sure to not handle the same event twice, it is only handled if the
shutr is not already set on the input channel.
The reuse of idle connections should only happen for a proxy with the
http mode. In case of a backend with the tcp mode, the reuse selection
and insertion in session list are skipped.
This behavior is present since commit :
MEDIUM: connection: Add private connections synchronously in session server list
It could also be further exagerated by :
MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking
It can be backported up to 2.3.
Use chunk_inistr() for a chunk initialisation in
ssl_sock_load_sctl_from_file() instead of a manual initialisation which
was not initialising head.
Fix issue #1073.
Must be backported as far as 2.2
The new ckch_inst_new_load_srv_store() function which mimics the
ckch_inst_new_load_store() function includes some dead code which was
used only in the former function.
Fix issue #1081.
The previous patch was pushed too quickly (399bf72f6 "BUG/MINOR: stats:
Remove a break preventing ST_F_QCUR to be set for servers"). It was not an
extra break but a misplaced break statement. Thus, now a break statement
must be added after filling the ST_F_MODE field in stats_fill_sv_stats().
No backport needed except if the above commit is backported.
There is an extra break statement wrongly placed in stats_fill_sv_stats()
function, just before filling the ST_F_QCUR field. It prevents this field to
be set to the right value for servers.
No backport needed except if commit 3a9a4992 ("MEDIUM: stats: allow to
select one field in `stats_fill_sv_stats`") is backported.
This patch makes things more consistent between the bind_conf functions
and the server ones:
- ssl_sock_load_srv_ckchs() loads the SSL_CTX in the server
(ssl_sock_load_ckchs() load the SNIs in the bind_conf)
- add the server parameter to ssl_sock_load_srv_ckchs()
- changes made to the ckch_inst are done in
ckch_inst_new_load_srv_store()
Since the server SSL_CTX is now stored in the ckch_inst, it is not
needed anymore to pass an SSL_CTX to ckch_inst_new_load_srv_store() and
ssl_sock_load_srv_ckchs().
The new feature allowing the change of server side certificates
introduced duplicated free code. Rework the code in
cli_io_handler_commit_cert() to be more consistent.
The client_crt member is not used anymore since the server's ssl context
initialization now behaves the same way as the bind lines one (using
ckch stores and instances).
When trying to update a backend certificate, we should find a
server-side ckch instance thanks to which we can rebuild a new ssl
context and a new ckch instance that replace the previous ones in the
server structure. This way any new ssl session will be built out of the
new ssl context and the newly updated certificate.
This resolves a subpart of GitHub issue #427 (the certificate part)
In order for the backend server's certificate to be hot-updatable, it
needs to fit into the implementation used for the "bind" certificates.
This patch follows the architecture implemented for the frontend
implementation and reuses its structures and general function calls
(adapted for the server side).
The ckch store logic is kept and a dedicated ckch instance is used (one
per server). The whole sni_ctx logic was not kept though because it is
not needed.
All the new functions added in this patch are basically server-side
copies of functions that already exist on the frontend side with all the
sni and bind_cond references removed.
The ckch_inst structure has a new 'is_server_instance' flag which is
used to distinguish regular instances from the server-side ones, and a
new pointer to the server's structure in case of backend instance.
Since the new server ckch instances are linked to a standard ckch_store,
a lookup in the ckch store table will succeed so the cli code used to
update bind certificates needs to be covered to manage those new server
side ckch instances.
Split the server's ssl context initialization into the general ssl
related initializations and the actual initialization of a single
SSL_CTX structure. This way the context's initialization will be
usable by itself from elsewhere.
Reorganize the conditions for the reuse of idle/safe connections :
- reduce code by using variable to store reuse mode and idle/safe conns
counts
- consider that idle/safe/avail lists are properly allocated if
max_idle_conns not null. An allocation failure prevents haproxy
startup.
It is only a problem on the response path because the request payload length
it always known. But when a filter is registered to analyze the response
payload, the filtering may hang if the server closes just after the headers.
The root cause of the bug comes from an attempt to allow the filters to not
immediately forward the headers if necessary. A filter may choose to hold
the headers by not forwarding any bytes of the payload. For a message with
no payload but a known payload length, there is always a EOM block to
forward. Thus holding the EOM block for bodyless messages is a good way to
also hold the headers. However, messages with an unknown payload length,
there is no EOM block finishing the message, but only a SHUTR flag on the
channel to mark the end of the stream. If there is no payload when it
happens, there is no payload at all to forward. In the filters API, it is
wrongly detected as a condition to not forward the headers.
Because it is not the most used feature and not the obvious one, this patch
introduces another way to hold the message headers at the begining of the
forwarding. A filter flag is added to explicitly says the headers should be
hold. A filter may choose to set the STRM_FLT_FL_HOLD_HTTP_HDRS flag and not
forwad anything to hold the headers. This flag is removed at each call, thus
it must always be explicitly set by filters. This flag is only evaluated if
no byte has ever been forwarded because the headers are forwarded with the
first byte of the payload.
reg-tests/filters/random-forwarding.vtc reg-test is updated to also test
responses with unknown payload length (with and without payload).
This patch must be backported as far as 2.0.
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
This patch follows what has already been done on frontend and backend
side.
From this patch it should be possible to remove most of the duplicate
code on prometheuse side for the server.
A few things to note though:
- state require prior calculation, so I moved that to a sort of helper
`stats_fill_be_stats_computestate`.
- all ST_F*TIME fields requires some minor compute, so I moved it at te
beginning of the function under a condition.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
This patch follows what has already been done on frontend side.
From this patch it should be possible to remove most of the duplicate
code on prometheuse side for the backend
A few things to note though:
- status and uweight field requires prior compute, so I moved that to a
sort of helper `stats_fill_be_stats_computesrv`.
- all ST_F*TIME fields requires some minor compute, so I moved it at te
beginning of the function under a condition.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
while working on backend/servers I realised I could have written that in
a better way and avoid one extra break. This is slightly improving
readiness.
also while being here, fix function declaration which was not 100%
accurate.
this patch does not change the behaviour of the code.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
In stats_fill_fe_stats(), some fields are conditionnal (ST_F_HRSP_* for
instance). But unlike unimplemented fields, for those fields, the <metric>
variable is used to fill the <stats> array, but it is not initialized. This
bug as no impact, because these fields are not used. But it is better to fix
it now to avoid future bugs.
To fix it, the metric is now defined and initialized into the for loop.
The bug was introduced by the commit 0ef54397 ("MEDIUM: stats: allow to
select one field in `stats_fill_fe_stats`"). No backport is needed except if
the above commit is backported. It fixes the issue #1063.
A regression was introduced by the commit 0ef54397b ("MEDIUM: stats: allow
to select one field in `stats_fill_fe_stats`"). stats_fill_fe_stats()
function fails on unimplemented metrics for frontends. However, not all
stats metrics are used by frontends. For instance ST_F_QCUR. As a
consequence, the frontends stats are always skipped.
To fix the bug, we just skip unimplemented metric for frontends. An error is
triggered only if a specific field is given and is unimplemented.
No backport is needed except if the above commit is backported.
Lua requires LLONG_MAX defined with __USE_ISOC99 which is set by
_GNU_SOURCE, not necessarely defined by default on old compiler/glibc.
$ make V=1 TARGET=linux-glibc-legacy USE_THREAD= USE_ACCEPT4= USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 USE_LUA=1
..
cc -Iinclude -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-strict-aliasing -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -DUSE_EPOLL -DUSE_NETFILTER -DUSE_PCRE -DUSE_POLL -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_OPENSSL -DUSE_LUA -DUSE_FUTEX -DUSE_ZLIB -DUSE_CPU_AFFINITY -DUSE_DL -DUSE_RT -DUSE_PRCTL -DUSE_THREAD_DUMP -I/usr/include/openssl101e/ -DUSE_PCRE -I/usr/include -DCONFIG_HAPROXY_VERSION=\"2.4-dev5-73246d-83\" -DCONFIG_HAPROXY_DATE=\"2021/01/21\" -c -o src/hlua.o src/hlua.c
In file included from /usr/local/include/lua.h:15,
from /usr/local/include/lauxlib.h:15,
from src/hlua.c:16:
/usr/local/include/luaconf.h:581:2: error: #error "Compiler does not support 'long long'. Use option '-DLUA_32BITS' or '-DLUA_C89_NUMBERS' (see file 'luaconf.h' for details)"
..
cc -Iinclude -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-strict-aliasing -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -DUSE_EPOLL -DUSE_NETFILTER -DUSE_PCRE -DUSE_POLL -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_OPENSSL -DUSE_LUA -DUSE_FUTEX -DUSE_ZLIB -DUSE_CPU_AFFINITY -DUSE_DL -DUSE_RT -DUSE_PRCTL -DUSE_THREAD_DUMP -I/usr/include/openssl101e/ -DUSE_PCRE -I/usr/include -DCONFIG_HAPROXY_VERSION=\"2.4-dev5-73246d-83\" -DCONFIG_HAPROXY_DATE=\"2021/01/21\" -c -o src/hlua_fcn.o src/hlua_fcn.c
In file included from /usr/local/include/lua.h:15,
from /usr/local/include/lauxlib.h:15,
from src/hlua_fcn.c:17:
/usr/local/include/luaconf.h:581:2: error: #error "Compiler does not support 'long long'. Use option '-DLUA_32BITS' or '-DLUA_C89_NUMBERS' (see file 'luaconf.h' for details)"
..
Cc: Thierry Fournier <tfournier@arpalert.org>
hlua_init() uses 'idx' only in openssl related code, while 'i' is used
in shared code and is safe to be reused. This commit replaces the use of
'idx' with 'i'
$ make V=1 TARGET=linux-glibc USE_LUA=1 USE_OPENSSL=
..
cc -Iinclude -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -DUSE_EPOLL -DUSE_NETFILTER -DUSE_POLL -DUSE_THREAD -DUSE_BACKTRACE -DUSE_TPROXY -DUSE_LINUX_TPROXY -DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_LUA -DUSE_FUTEX -DUSE_ACCEPT4 -DUSE_CPU_AFFINITY -DUSE_TFO -DUSE_NS -DUSE_DL -DUSE_RT -DUSE_PRCTL -DUSE_THREAD_DUMP -I/usr/include/lua5.3 -I/usr/include/lua5.3 -DCONFIG_HAPROXY_VERSION=\"2.4-dev5-37286a-78\" -DCONFIG_HAPROXY_DATE=\"2021/01/21\" -c -o src/hlua.o src/hlua.c
src/hlua.c: In function 'hlua_init':
src/hlua.c:9145:6: warning: unused variable 'idx' [-Wunused-variable]
9145 | int idx;
| ^~~
When writing commit a8459b28c ("MINOR: debug: create
ha_backtrace_to_stderr() to dump an instant backtrace") I just forgot
that some distros are a bit extremist about the syscall return values.
src/debug.c: In function `ha_backtrace_to_stderr':
src/debug.c:147:3: error: ignoring return value of `write', declared with attribute warn_unused_result [-Werror=unused-result]
write(2, b.area, b.data);
^~~~~~~~~~~~~~~~~~~~~~~~
CC src/h1_htx.o
Let's apply the usual tricks to shut them up. No backport is needed.
The dump state is now passed to the function so that the caller can adjust
the behavior. A new series of 4 values allow to stop *after* dumping main
instead of before it or any of the usual loops. This allows to also report
BUG_ON() that could happen very high in the call graph (e.g. startup, or
the scheduler itself) while still understanding what the call path was.
The purpose is to enable the dumping of a backtrace on BUG_ON(). While
it's very useful to know that a condition was met, very often some
caller context is missing to figure how the condition could happen.
From now on, on systems featuring backtrace, a backtrace of the calling
thread will also be dumped to stderr in addition to the unexpected
condition. This will help users of DEBUG_STRICT as they'll most often
find this backtrace in their logs even if they can't find their core
file.
A new "debug dev bug" expert-mode CLI command was added to test the
feature.
This function calls the ha_dump_backtrace() function with a locally
allocated buffer and sends the output slightly indented to fd #2. It's
meant to be used as an emergency backtrace dump.
The backtrace dumping code was located into the thread dump function
but it looks particularly convenient to be able to call it to produce
a dump in other situations, so let's move it to its own function and
make sure it's called last in the function so that we can benefit from
tail merging to save one entry.
In order to simplify the code and remove annoying ifdefs everywhere,
let's always export my_backtrace() and make it adapt to the situation
and return zero if not supported. A small update in the thread dump
function was needed to make sure we don't use its results if it fails
now.
Since commit aade4edc1 ("BUG/MEDIUM: mux-h2: Don't handle pending read0
too early on streams"), we've met a few cases where an early connection
close wouldn't be properly handled if some data were pending in a frame
header, because the test now considers the buffer's contents before
accepting to report the close, but given that frame headers or preface
are consumed at once, the buffer cannot make progress when it's stuck
at intermediary lengths.
In order to address this, this patch introduces two flags in the h2c
connection to store any reported shutdown and failed parsing. The idea
is that we cannot rely on conn_xprt_read0_pending() in the parser since
it wouldn't consider data pending in the buffer nor intermediary layers,
but we know for certain that after a read0 is reported by the transport
layer in presence of an RD_SH on the connection, no more progress will
be made there. This alone is not sufficient to decide to end processing,
we can only do this once these final data have been submitted to a parser.
Therefore, now when a parser fails on missing data, we check if a read0
has already been reported on this connection, and if so we set a new
END_REACHED flag on the connection to indicate a failure to process the
final data. The h2c_read0_pending() function now simply reports this
flag's status. This way we're certain that the input shutdown is only
considered after the demux attempted to parse the last frame.
Maybe over the long term the subscribe() API should be improved to
synchronously fail when trying to subscribe for an even that will not
happen. This may be an elegant solution that could possibly work across
multiple layers and even muxes, and be usable at a few specific places
where that's needed.
Given the patch above was backported as far as 2.0, this one should be
backported there as well. It is possible that the fcgi mux has the same
issue, but this was not analysed yet.
Thanks to Pierre Cheynier for providing detailed traces allowing to
quickly narrow the problem down, and to Olivier for his analysis.
When a TCP to H2 upgrade is performed, the SF_IGNORE flag is set on the
stream before killing it. This happens when a TCP/SSL client connection is
routed to a HTTP backend and the h2 alpn detected. The SF_IGNORE flag was
added for this purpose, to skip some processing when the stream is aborted
before a mux upgrade. Some counters updates were skipped this way. But some
others are still updated.
Now, all counters update at the end of process_stream(), before releasing
the stream, are ignored if SF_IGNORE flag is set. Note this stream is
aborted because we switch from a mono-stream to a multi-stream
multiplexer. It works differently for TCP to H1 upgrades.
This patch should be backported as far as 2.0 after some observation period.
use `stats_fill_fe_stats` when possible to avoid duplicating code; make
use of field selector to get the needed field only.
this should not introduce any difference of output.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
From this patch it should be possible to remove most of the duplicate
code on prometheuse side for the frontend.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Another patch in order to try to reconciliate haproxy stats and
prometheus. Here I'm adding a proper start time field in order to make
proper use of uptime field.
That being done we can move the calculation in `fill_info`
Signed-off-by: William Dauchy <wdauchy@gmail.com>
in order to prepare a possible merge of fields between haproxy stats and
prometheus, duplicate 3 fields:
INF_MEMMAX
INF_POOL_ALLOC
INF_POOL_USED
Those were specifically named in MB unit which is not what prometheus
recommends. We therefore used them but changed the unit while doing the
calculation. It created a specific case for that, up to the description.
This patch:
- removes some possible confusion, i.e. using MB field for bytes
- will permit an easier merge of fields such as description
First consequence for now, is that we can remove the calculation on
prometheus side and move it on `fill_info`.
Signed-off-by: William Dauchy <wdauchy@gmail.com>
If an HTTP protocol upgrade request with a payload is received, a
501-not-implemented error is now returned to the client. It is valid from
the RFC point of view but will be incompatible with the way the H2
websockets will be handled by HAProxy. And it is probably a very uncommon
way to do perform protocol upgrades.
With this patch, the H1 mux is now able to return 501-not-implemented errors
to client during the request parsing. However, no such errors are returned
for now.
The MUX_ES_NOTIMPL_ERR exit status is added to allow the multiplexers to
report errors about not implemented features. This will be used by the H1
mux to return 501-not-implemented errors.
Add the support for the 501-not-implemented status code with the
corresponding default message. The documentation is updated accordingly
because it is now part of status codes HAProxy may emit via an errorfile or
a deny/return HTTP action.
Just like the H1 muliplexer, when a new frontend H2 stream is created, the
rxbuf is xferred to the stream at the upper layer.
Originally, it is not a bug fix, but just an api standardization. And in
fact, it fixes a crash when a h2 stream is aborted after the request parsing
but before the first call to process_stream(). It crashes since the commit
8bebd2fe5 ("MEDIUM: http-ana: Don't process partial or empty request
anymore"). It is now totally unexpected to have an HTTP stream without a
valid request. But here the stream is unable to get the request because the
client connection was aborted. Passing it during the stream creation fixes
the bug. But the true problem is that the stream-interfaces are still
relying on the connection state while only the muxes should do so.
This fix is specific for 2.4. No backport needed.
When a tcpcheck ruleset uses multiple connections, the existing one must be
closed and destroyed before openning the new one. This part is handled in
the tcpcheck_main() function, when called from the wake callback function
(wake_srv_chk). But it is indeed a problem, because this function may be
called from the mux layer. This means a mux may call the wake callback
function of the data layer, which may release the connection and the mux. It
is easy to see how it is hazardous. And actually, depending on the
scheduling, it leads to crashes.
Thus, we must avoid to release the connection in the wake callback context,
and move this part in the check's process function instead. To do so, we
rely on the CHK_ST_CLOSE_CONN flags. When a connection must be replaced by a
new one, this flag is set on the check, in tcpcheck_main() function, and the
check's task is woken up. Then, the connection is really closed in
process_chk_conn() function.
This patch must be backported as far as 2.2, with some adaptations however
because the code is not exactly the same.
glibc < 2.10 requires _GNU_SOURCE in order to make use of strsignal(),
otherwise leading to SEGV at runtime.
$ make V=1 TARGET=linux-glibc-legacy USE_THREAD= USE_ACCEPT4=
..
src/mworker.c: In function 'mworker_catch_sigchld':
src/mworker.c:285: warning: implicit declaration of function 'strsignal'
src/mworker.c:285: warning: pointer/integer type mismatch in conditional expression
..
$ make V=1 reg-tests REGTESTS_TYPES=slow,default
..
###### Test case: reg-tests/mcli/mcli_start_progs.vtc ######
## test results in: "/tmp/haregtests-2021-01-19_15-18-07.n24989/vtc.29077.28f6153d"
---- h1 Bad exit status: 0x008b exit 0x0 signal 11 core 128
---- h1 Assert error in haproxy_wait(), src/vtc_haproxy.c line 792: Condition(*(&h->fds[1]) >= 0) not true. Errno=0 Success
..
$ gdb ./haproxy /tmp/core.0.haproxy.30270
..
Core was generated by `/root/haproxy/haproxy -d -W -S fd@8 -dM -f /tmp/haregtests-2021-01-19_15-18-07.'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002aaaab387a10 in strlen () from /lib64/libc.so.6
(gdb) bt
#0 0x00002aaaab387a10 in strlen () from /lib64/libc.so.6
#1 0x00002aaaab354b69 in vfprintf () from /lib64/libc.so.6
#2 0x00002aaaab37788a in vsnprintf () from /lib64/libc.so.6
#3 0x00000000004a76a3 in memvprintf (out=0x7fffedc680a0, format=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n", orig_args=0x7fffedc680d0)
at src/tools.c:3868
#4 0x00000000004bbd40 in print_message (label=0x58abed "ALERT", fmt=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n", argp=0x7fffedc680d0)
at src/log.c:1066
#5 0x00000000004bc07f in ha_alert (fmt=0x5a5d58 "Current worker #%d (%d) exited with code %d (%s)\n") at src/log.c:1109
#6 0x0000000000534b7b in mworker_catch_sigchld (sh=<value optimized out>) at src/mworker.c:293
#7 0x0000000000556af3 in __signal_process_queue () at src/signal.c:88
#8 0x00000000004f6216 in signal_process_queue () at include/haproxy/signal.h:39
#9 run_poll_loop () at src/haproxy.c:2859
#10 0x00000000004f63b7 in run_thread_poll_loop (data=<value optimized out>) at src/haproxy.c:3028
#11 0x00000000004faaac in main (argc=<value optimized out>, argv=0x7fffedc68498) at src/haproxy.c:904
See: https://man7.org/linux/man-pages/man3/strsignal.3.html
Must be backported as far as 2.0.
If a subscriber's tasklet was called more than one million times, if
the ssl_ctx's connection doesn't match the current one, or if the
connection appears closed in one direction while the SSL stack is
still subscribed, the FD is reported as suspicious. The close cases
may occasionally trigger a false positive during very short and rare
windows. Similarly the 1M calls will trigger after 16GB are transferred
over a given connection. These are rare enough events to be reported as
suspicious.
A file descriptor which maps to a connection but has more than one
thread in its mask, or an FD handle that doesn't correspond to the FD,
or wiht no mux context, or an FD with no thread in its mask, or with
more than 1 million events is flagged as suspicious.
Now the show_fd helpers at the transport and mux levels return an integer
which indicates whether or not the inspected entry looks suspicious. When
an entry is reported as suspicious, "show fd" will suffix it with an
exclamation mark ('!') in the dump, that is supposed to help detecting
them.
For now, helpers were adjusted to adapt to the new API but none of them
reports any suspicious entry yet.
When dumping a live h1 stream, also take the opportunity for reporting
the subscriber including the event, tasklet, handler and context. Example:
3030 : st=0x21(R:rA W:Ra) ev=0x04(heOpi) [Lc] tmask=0x4 umask=0x0 owner=0x7f97805c1f70 iocb=0x65b847(sock_conn_iocb) back=1 cflg=0x00002300 sv=s1/recv mux=H1 ctx=0x7f97805c21b0 h1c.flg=0x80000200 .sub=1 .ibuf=0@(nil)+0/0 .obuf=0@(nil)+0/0 h1s=0x7f97805c2380 h1s.flg=0x4010 .req.state=MSG_DATA .res.state=MSG_RPBEFORE .meth=POST status=0 .cs.flg=0x00000000 .cs.data=0x7f97805c1720 .subs=0x7f97805c1748(ev=1 tl=0x7f97805c1990 tl.calls=2 tl.ctx=0x7f97805c1720 tl.fct=si_cs_io_cb) xprt=RAW
In FD dumps it's often very important to figure what upper layer function
is going to be called. Let's export the few I/O callbacks that appear as
tasklet functions so that "show fd" can resolve them instead of printing
a pointer relative to main. For example:
1028 : st=0x21(R:rA W:Ra) ev=0x01(heopI) [lc] tmask=0x2 umask=0x2 owner=0x7f00b889b200 iocb=0x65b638(sock_conn_iocb) back=0 cflg=0x00001300 fe=recv mux=H2 ctx=0x7f00c8824de0 h2c.st0=FRH .err=0 .maxid=795 .lastid=-1 .flg=0x0000 .nbst=0 .nbcs=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=0 .orph_cnt=0 .sub=1 .dsi=795 .dbuf=0@(nil)+0/0 .msi=-1 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] xprt=SSL xprt_ctx=0x7f00c86d0750 xctx.st=0 .xprt=RAW .wait.ev=1 .subs=0x7f00c88252e0(ev=1 tl=0x7f00a07d1aa0 tl.calls=1047 tl.ctx=0x7f00c8824de0 tl.fct=h2_io_cb) .sent_early=0 .early_in=0
The SSL context contains a lot of important details that are currently
missing from debug outputs. Now that we detect ssl_sock, we can perform
some sanity checks, print the next xprt, the subscriber callback's context,
handler and number of calls. The process function is also resolved. This
now gives for example on an H2 connection:
1029 : st=0x21(R:rA W:Ra) ev=0x01(heopI) [lc] tmask=0x2 umask=0x2 owner=0x7fc714881700 iocb=0x65b528(sock_conn_iocb) back=0 cflg=0x00001300 fe=recv mux=H2 ctx=0x7fc734545e50 h2c.st0=FRH .err=0 .maxid=217 .lastid=-1 .flg=0x0000 .nbst=0 .nbcs=0 .fctl_cnt=0 .send_cnt=0 .tree_cnt=0 .orph_cnt=0 .sub=1 .dsi=217 .dbuf=0@(nil)+0/0 .msi=-1 .mbuf=[1..1|32],h=[0@(nil)+0/0],t=[0@(nil)+0/0] xprt=SSL xprt_ctx=0x7fc73478f230 xctx.st=0 .xprt=RAW .wait.ev=1 .subs=0x7fc734546350(ev=1 tl=0x7fc7346702e0 tl.calls=278 tl.ctx=0x7fc734545e50 tl.fct=main-0x144efa) .sent_early=0 .early_in=0
Just like we did for the muxes, now the transport layers will have the
ability to provide helpers to report more detailed information about their
internal context. When the helper is not known, the pointer continues to
be dumped as-is if it's not NULL. This way a transport with no context nor
dump function will not add a useless "xprt_ctx=(nil)" but the pointer will
be emitted if valid or if a helper is defined.
These ones are definitely missing from some dumps, let's report them! We
print the xprt's name instead of its useless pointer, as well as its ctx
when xprt is not NULL.
Over time the code has uglified, casting fdt.owner as a struct connection
for about everything. Let's have a const struct connection* there and take
this opportunity for passing all fields as const as well.
Additionally a misplaced closing parenthesis on the output was fixed.
When 0c439d895 ("BUILD: tools: make resolve_sym_name() return a const")
was written, the pointer argument ought to have been turned to const for
more flexibility. Let's do it now.
That was causing confusing outputs like this one whenan H2S is known:
1030 : ... last_h2s=0x2ed8390 .id=775 .st=HCR.flg=0x4001 .rxbuf=...
^^^^
This was introduced by commit ab2ec4540 in 2.1-dev2 so the fix can be
backported as far as 2.1.
This counter could be hugely incremented by the peer task responsible of managing
peer synchronizations and reconnections, for instance when a peer is not reachable
there is a period where the appctx is not created. If we receive stick-table
updates before the peer session (appctx) is instantiated, we reach the code
responsible of incrementing the "new_conn" counter.
With this patch we increment this counter only when we really instantiate a new
peer session thanks to peer_session_create().
May be backported as far as 2.0.
If a server varies on the accept-encoding header and it sends a response
with an encoding we do not know (see parse_encoding_value function), we
will not store it. This will prevent unexpected errors caused by
cache collisions that could happen in accept_encoding_hash_cmp.
commit 5a982a7165 ("MINOR:
contrib/prometheus-exporter: export build_info") is breaking lua
`core.get_info()`.
This patch makes sure build_info is correctly initialised in all cases.
Reviewed-by: William Dauchy <wdauchy@gmail.com>
Previous commit da2b0844f ("MINOR: peers: Add traces for peer control
messages.") introduced a build warning on some compiler versions after
the removal of variable "peers" in peer_send_msgs() because variable
"s" was used only to assign this one, and variable "si" to assign "s".
Let's remove both to fix the warning. No backport is needed.
V2 of this fix which includes a missing pointer initialization which was
causing a segfault in v1 (949a7f6459)
This bug happens when a service has multiple records on the same host
and the server provides the A/AAAA resolution in the response as AR
(Additional Records).
In such condition, the first occurence of the host will be taken from
the Additional section, while the second (and next ones) will be process
by an independent resolution task (like we used to do before 2.2).
This can lead to a situation where the "synchronisation" of the
resolution may diverge, like described in github issue #971.
Because of this behavior, HAProxy mixes various type of requests to
resolve the full list of servers: SRV+AR for all "first" occurences and
A/AAAA for all other occurences of an existing hostname.
IE: with the following type of response:
;; ANSWER SECTION:
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A2.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 86 A3.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A1.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 85 A3.tld.
;; ADDITIONAL SECTION:
A2.tld. 3600 IN A 192.168.0.2
A3.tld. 3600 IN A 192.168.0.3
A1.tld. 3600 IN A 192.168.0.1
A3.tld. 3600 IN A 192.168.0.3
the first A3 host is resolved using the Additional Section and the
second one through a dedicated A request.
When linking the SRV records to their respective Additional one, a
condition was missing (chek if said SRV record is already attached to an
Additional one), leading to stop processing SRV only when the target
SRV field matches the Additional record name. Hence only the first
occurence of a target was managed by an additional record.
This patch adds a condition in this loop to ensure the record being
parsed is not already linked to an Additional Record. If so, we can
carry on the parsing to find a possible next one with the same target
field value.
backport status: 2.2 and above
Display traces when sending/receiving peer control messages (synchronisation, heartbeat).
Add remaining traces when parsing malformed messages (acks, stick-table definitions)
or ignoring them.
Also add traces when releasing session or when reaching the PEER_SESS_ST_ERRPROTO
peer protocol state.
It's about the third time I get confused by these functions, half of
which manipulate the reference as a whole and those manipulating only
an entry. For me "pat_ref_commit" means committing the pattern reference,
not just an element, so let's rename it. A number of other ones should
really be renamed before 2.4 gets released :-/
There is no low level api to achieve same as Linux/FreeBSD, we rely
on CPUs available. Without this, the number of threads is just 1 for
Mac while having 8 cores in my M1.
Backporting to 2.1 should be enough if that's possible.
Signed-off-by: David CARLIER <devnexen@gmail.com>
An fatal error is now reported if a server is defined in a frontend
section. til now, a warning was just emitted and the server was ignored. The
warning was added in the 1.3.4 when the frontend/backend keywords were
introduced to allow a smooth transition and to not break existing
configs. It is old enough now to emit an fatal error in this case.
This patch is related to the issue #1043. It may be backported at least as
far as 2.2, and possibly to older versions. It relies on the previous commit
("MINOR: config: Add failifnotcap() to emit an alert on proxy capabilities").
The HAPROXY_CFGFILES env variable is built using a static trash chunk, via a
call to get_trash_chunk() function. This chunk is reserved during the whole
configuration parsing. It is far too large to guarantee it will not be
reused during the configuration parsing. And in fact, it happens in the lua
code since the commit f67442efd ("BUG/MINOR: lua: warn when registering
action, conv, sf, cli or applet multiple times"), when a lua script is
loaded.
To fix the bug, we now use a dynamic buffer instead. And we call memprintf()
function to handle both the allocation and the formatting. Allocation errors
at this stage are fatal.
This patch should fix the issue #1041. It must be backported as far as 2.0.
The strict-limits global option was introduced with commit 0fec3ab7b
("MINOR: init: always fail when setrlimit fails"). When used in
conjuction with master-worker, haproxy will not fail when a setrlimit
fails. This happens because we only exit() if master-worker isn't used.
This patch removes all tests for master-worker mode for all cases covered
by strict-limits scope.
This should be backported from 2.1 onward.
This should fix issue #1042.
Reviewed by William Dauchy <wdauchy@gmail.com>
If a server is defined in a frontend, thus a proxy without the backend
capability, the 'check' and 'agent-check' keywords are ignored. This way, no
check is performed on an ignored server. This avoids a segfault because some
part of the tcpchecks are not fully initialized (or released for frontends
during the post-check).
In addition, an test on the server's proxy capabilities is performed when
checks or agent-checks are initialized and nothing is performed for servers
attached to a non-backend proxy.
This patch should fix the issue #1043. It must be backported as far as 2.2.
If an errors occurs during the sample expression parsing, the alloced
sample_expr is not freed despite having its main pointer reset.
This fixes GitHub issue #1046.
It could be backported as far as 1.8.
This reverts commit 949a7f6459.
The first part of the patch introduces a bug. When a dns answer item is
allocated, its <ar_item> is only initialized at the end of the parsing, when
the item is added in the answer list. Thus, we must not try to release it
during the parsing.
The second part is also probably buggy. It fixes the issue #971 but reverts
a fix for the issue #841 (see commit fb0884c8297 "BUG/MEDIUM: dns: Don't
store additional records in a linked-list"). So it must be at least
revalidated.
This revert fixes a segfault reported in a comment of the issue #971. It
must be backported as far as 2.2.
like it is done in other places, check the return value of
`alloc_trash_chunk` before using it. This was detected by coverity.
this patch fixes commit 591fc3a330
("BUG/MINOR: sample: fix concat() converter's corruption with non-string
variables"
As a consequence, this patch should be backported as far as 2.0
this should fix github issue #1039
Signed-off-by: William Dauchy <wdauchy@gmail.com>
I was looking at writing a simple first test for prometheus but I
realised there is no proper way to exclude it if haproxy was not built
with prometheus plugin.
Today we have `REQUIRE_OPTIONS` in reg-tests which is based on `Feature
list` from `haproxy -vv`. Those options are coming from the Makefile
itself.
A plugin is build this way:
EXTRA_OBJS="contrib/prometheus-exporter/service-prometheus.o"
It does register service actions through `service_keywords_register`.
Those are listed through `list_services` in `haproxy -vv`.
To facilitate parsing, I slightly changed the output to a single line
and integrate it in regtests shell script so that we can now specify a
dependency while writing a reg-test for prometheus, e.g:
#REQUIRE_SERVICE=prometheus-exporter
#REQUIRE_SERVICES=prometheus-exporter,foo
There might be other ways to handle this, but that's the cleanest I
found; I understand people might be concerned by this output change in
`haproxy -vv` which goes from:
Available services :
foo
bar
to:
Available services : foo bar
Signed-off-by: William Dauchy <wdauchy@gmail.com>
- check functions are never called with a NULL args list, it is always
an array, so first check can be removed
- the expression parser guarantees that we can't have anything else,
because we mentioned json converter takes a mandatory string argument.
Thus test on `ARGT_STR` can be removed as well
- also add breaking line between enum and function declaration
In order to validate it, add a simple json test testing very simple
cases but can be improved in the future:
- default json converter without args
- json converter failing on error (utf8)
- json converter with error being removed (utf8s)
Signed-off-by: William Dauchy <wdauchy@gmail.com>
GitHub Issue #1037 Reported a memory leak in deinit() caused by an
allocation made in sa2str() that was stored in srv_set_addr_desc().
When destroying each server for a proxy in deinit, include freeing the
memory in the key of server->addr_node.
The leak was introduced in commit 92149f9a8 ("MEDIUM: stick-tables: Add
srvkey option to stick-table") which is not in any released version so
no backport is needed.
Cc: Tim Duesterhus <tim@bastelstu.be>
Patrick Hemmer reported that calling concat() with an integer variable
causes a %00 to appear at the beginning of the output. Looking at the
code, it's not surprising. The function uses get_trash_chunk() to get
one of the trashes, but can call casting functions which will also use
their trash in turn and will cycle back to ours, causing the trash to
be overwritten before being assigned to a sample.
By allocating the trash from a pool using alloc_trash_chunk(), we can
avoid this. However we must free it so the trash's contents must be
moved to a permanent trash buffer before returning. This is what's
achieved using smp_dup().
This should be backported as far as 2.0.
This is from the output of codespell. It's done at once over a bunch
of files and only affects comments, so there is nothing user-visible.
No backport needed.
During a configuration check valgrind reports:
==14425== 0 bytes in 106 blocks are definitely lost in loss record 1 of 107
==14425== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14425== by 0x4C2FDEF: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14425== by 0x443CFC: hlua_alloc (hlua.c:8662)
==14425== by 0x5F72B11: luaM_realloc_ (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x5F78089: luaH_free (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x5F707D3: sweeplist (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x5F710D0: luaC_freeallobjects (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x5F7715D: close_state (in /usr/lib/x86_64-linux-gnu/liblua5.3.so.0.0.0)
==14425== by 0x443D4C: hlua_deinit (hlua.c:9302)
==14425== by 0x543F88: deinit (haproxy.c:2742)
==14425== by 0x5448E7: deinit_and_exit (haproxy.c:2830)
==14425== by 0x5455D9: init (haproxy.c:2044)
This is due to Lua calling `hlua_alloc()` with `ptr = NULL` and `nsize = 0`.
While `realloc` is supposed to be equivalent `free()` if the size is `0` this
is only required for a non-NULL pointer. Apparently my allocator (or valgrind)
actually allocates a zero size area if the pointer is NULL, possibly taking up
some memory for management structures.
Fix this leak by specifically handling the case where both the pointer and the
size are `0`.
This bug appears to have been introduced with the introduction of the
multi-threaded Lua, thus this fix is specific for 2.4. No backport needed.
add base support for url encode following RFC3986, supporting `query`
type only.
- add test checking url_enc/url_dec/url_enc
- update documentation
- leave the door open for future changes
this should resolve github issue #941
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Released version 2.4-dev5 with the following main changes :
- BUG/MEDIUM: mux_h2: Add missing braces in h2_snd_buf()around trace+wakeup
- BUILD: hpack: hpack-tbl-t.h uses VAR_ARRAY but does not include compiler.h
- MINOR: time: increase the minimum wakeup interval to 60s
- MINOR: check: do not ignore a connection header for http-check send
- REGTESTS: complete http-check test
- CI: travis-ci: drop coverity scan builds
- MINOR: atomic: don't use ; to separate instruction on aarch64.
- IMPORT: xxhash: update to v0.8.0 that introduces stable XXH3 variant
- MEDIUM: xxhash: use the XXH3 functions to generate 64-bit hashes
- MEDIUM: xxhash: use the XXH_INLINE_ALL macro to inline all functions
- CLEANUP: xxhash: remove the unused src/xxhash.c
- MINOR: sample: add the xxh3 converter
- REGTESTS: add tests for the xxh3 converter
- MINOR: protocol: Create proto_quic QUIC protocol layer.
- MINOR: connection: Attach a "quic_conn" struct to "connection" struct.
- MINOR: quic: Redefine control layer callbacks which are QUIC specific.
- MINOR: ssl_sock: Initialize BIO and SSL objects outside of ssl_sock_init()
- MINOR: connection: Add a new xprt to connection.
- MINOR: ssl: Export definitions required by QUIC.
- MINOR: cfgparse: Do not modify the QUIC xprt when parsing "ssl".
- MINOR: tools: Add support for QUIC addresses parsing.
- MINOR: quic: Add definitions for QUIC protocol.
- MINOR: quic: Import C source code files for QUIC protocol.
- MINOR: listener: Add QUIC info to listeners and receivers.
- MINOR: server: Add QUIC definitions to servers.
- MINOR: ssl: SSL CTX initialization modifications for QUIC.
- MINOR: ssl: QUIC transport parameters parsing.
- MINOR: quic: QUIC socket management finalization.
- MINOR: cfgparse: QUIC default server transport parameters init.
- MINOR: quic: Enable the compilation of QUIC modules.
- MAJOR: quic: Make usage of ebtrees to store QUIC ACK ranges.
- MINOR: quic: Attempt to make trace more readable
- MINOR: quic: Make usage of the congestion control window.
- MINOR: quic: Flag RX packet as ack-eliciting from the generic parser.
- MINOR: quic: Code reordering to help in reviewing/modifying.
- MINOR: quic: Add traces to congestion avoidance NewReno callback.
- MINOR: quic: Display the SSL alert in ->ssl_send_alert() callback.
- MINOR: quic: Update the initial salt to that of draft-29.
- MINOR: quic: Add traces for in flght ack-eliciting packet counter.
- MINOR: quic: make a packet build fails when qc_build_frm() fails.
- MINOR: quic: Add traces for quic_packet_encrypt().
- MINOR: cache: Refactoring of secondary_key building functions
- MINOR: cache: Avoid storing responses whose secondary key was not correctly calculated
- BUG/MINOR: cache: Manage multiple headers in accept-encoding normalization
- MINOR: cache: Add specific secondary key comparison mechanism
- MINOR: http: Add helper functions to trim spaces and tabs
- MEDIUM: cache: Manage a subset of encodings in accept-encoding normalizer
- REGTESTS: cache: Simplify vary.vtc file
- REGTESTS: cache: Add a specific test for the accept-encoding normalizer
- MINOR: cache: Remove redundant test in http_action_req_cache_use
- MINOR: cache: Replace the "process-vary" option's expected values
- CI: GitHub Actions: enable daily Coverity scan
- BUG/MEDIUM: cache: Fix hash collision in `accept-encoding` handling for `Vary`
- MEDIUM: stick-tables: Add srvkey option to stick-table
- REGTESTS: add test for stickiness using "srvkey addr"
- BUILD: Makefile: disable -Warray-bounds until it's fixed in gcc 11
- BUG/MINOR: sink: Return an allocation failure in __sink_new if strdup() fails
- BUG/MINOR: lua: Fix memory leak error cases in hlua_config_prepend_path
- MINOR: lua: Use consistent error message 'memory allocation failed'
- CLEANUP: Compare the return value of `XXXcmp()` functions with zero
- CLEANUP: Apply the coccinelle patch for `XXXcmp()` on include/
- CLEANUP: Apply the coccinelle patch for `XXXcmp()` on contrib/
- MINOR: qpack: Add static header table definitions for QPACK.
- CLEANUP: qpack: Wrong comment about the draft for QPACK static header table.
- CLEANUP: quic: Remove useless QUIC event trace definitions.
- BUG/MINOR: quic: Possible CRYPTO frame building errors.
- MINOR: quic: Pass quic_conn struct to frame parsers.
- BUG/MINOR: quic: Wrong STREAM frames parsing.
- MINOR: quic: Drop packets with STREAM frames with wrong direction.
- CLEANUP: ssl: Remove useless loop in tlskeys_list_get_next()
- CLEANUP: ssl: Remove useless local variable in tlskeys_list_get_next()
- MINOR: ssl: make tlskeys_list_get_next() take a list element
- Revert "BUILD: Makefile: disable -Warray-bounds until it's fixed in gcc 11"
- BUG/MINOR: cfgparse: Fail if the strdup() for `rule->be.name` for `use_backend` fails
- CLEANUP: mworker: remove duplicate pointer tests in cfg_parse_program()
- CLEANUP: Reduce scope of `header_name` in http_action_store_cache()
- CLEANUP: Reduce scope of `hdr_age` in http_action_store_cache()
- CLEANUP: spoe: fix typo on `var_check_arg` comment
- BUG/MINOR: tcpcheck: Report a L7OK if the last evaluated rule is a send rule
- CI: github actions: build several popular "contrib" tools
- DOC: Improve the message printed when running `make` w/o `TARGET`
- BUG/MEDIUM: server: srv_set_addr_desc() crashes when a server has no address
- REGTESTS: add unresolvable servers to srvkey-addr
- BUG/MINOR: stats: Make stat_l variable used to dump a stat line thread local
- BUG/MINOR: quic: NULL pointer dereferences when building post handshake frames.
- SCRIPTS: improve announce-release to support different tag and versions
- SCRIPTS: make announce release support preparing announces before tag exists
- CLEANUP: assorted typo fixes in the code and comments
- BUG/MINOR: srv: do not init address if backend is disabled
- BUG/MINOR: srv: do not cleanup idle conns if pool max is null
- CLEANUP: assorted typo fixes in the code and comments
- CLEANUP: few extra typo and fixes over last one ("ot" -> "to")
If a server is configured to not have any idle conns, returns immediatly
from srv_cleanup_connections. This avoids a segfault when a server is
configured with pool-max-conn to 0.
This should be backported up to 2.2.
Do not proceed on init_addr if the backend of the server is marked as
disabled. When marked as disabled, the server is not fully initialized
and some operation must be avoided to prevent segfault. It is correct
because there is no way to activate a disabled backend.
This fixes the github issue #1031.
This should be backported to 2.2.
Since ee63d4bd6 ("MEDIUM: stats: integrate static proxies stats in new
stats"), all dumped stats for a given domain, the default ones and the
modules ones, are merged in a signle array to dump them in a generic way.
For this purpose, the stat_l global variable is allocated at startup to
store a line of stats before the dump, i.e. all stats of an entity
(frontend, backend, listener, server or dns nameserver). But this variable
is not thread safe. If stats are retrieved concurrently by several clients
on different threads, the same variable is used. This leads to corrupted
stats output.
To fix the bug, the stat_l variable is now thread local.
This patch should probably solve issues #972 and #992. It must be backported
to 2.3.
GitHub Issue #1026 reported a crash during configuration check for the
following example config:
backend 0
server 0 0
server 0 0
HAProxy crashed in srv_set_addr_desc() due to a NULL pointer dereference
caused by `sa2str` returning NULL for an `AF_UNSPEC` address (`0`).
Check to make sure the address key is non-null before using it for
comparison or inserting it into the tree.
The crash was introduced in commit 92149f9a8 ("MEDIUM: stick-tables: Add
srvkey option to stick-table") which not in any released version so no
backport is needed.
Cc: Tim Duesterhus <tim@bastelstu.be>
When all rules of a tcpcheck ruleset are successfully evaluated, the right
check status must always be reported. It is true if the last evaluated rule
is an expect or a connect rule. But not if it is a send rule. In this
situation, nothing more is done until the check timeout expiration and a
L7TOUT is reported instead of a L7OK.
Now, by default, when all rules were successfully evaluated, a L7OK is
reported. When the last evaluated rule is an expect or a connect, the
behavior remains unchanged.
This patch should fix the issue #1027. It must be backported as far as 2.2.
This variable is only needed deeply nested in a single location and clang's
static analyzer complains about a dead initialization. Reduce the scope to
satisfy clang and the human that reads the function.
As reported in issue #1017, there are two harmless duplicate tests in
cfg_parse_program(), one made of a "if" using the same condition as the
loop it's in, and the other one being a null test before a free. This
just removes them. No backport is needed.
This patch fixes GitHub issue #1024.
I could track the `strdup` back to commit
3a1f5fda10 which is 1.9-dev8. It's probably not
worth the effort to backport it across this refactoring.
This patch should be backported to 1.9+.
As reported in issue #1010, gcc-11 as of 2021-01-05 is overzealous in
its -Warray-bounds check as it considers that a cast of a global struct
accesses the entire struct even if only one specific element is accessed.
This instantly breaks all lists making use of container_of() to build
their iterators as soon as the starting point is known if the next
element is retrieved from the list head in a way that is visible to the
compiler's optimizer, because it decides that accessing the list's next
element dereferences the list as a larger struct (which it does not).
The temporary workaround consisted in disabling -Warray-bounds, but this
warning is traditionally quite effective at spotting real bugs, and we
actually have is a single occurrence of this issue in the whole code.
By changing the tlskeys_list_get_next() function to take a list element
as the starting point instead of the current element, we can avoid
the starting point issue but this requires to change all call places
to write hideous casts made of &((struct blah*)ref)->list. At the
moment we only have two such call places, the first one being used to
initialize the list (which is the one causing the warning) and which
is thus easy to simplify, and the second one for which we already have
an aliased pointer to the reference that is still valid at the call
place, and given the original pointer also remained unchanged, we can
safely use this alias, and this is safer than leaving a cast there.
Let's make this change now while it's still easy.
The generated code only changed in function cli_io_handler_tlskeys_files()
due to register allocation and the change of variable scope between the
old one and the new one.
`getnext` was only used to fill `ref` at the beginning of the function. Both
have the same type. Replace the parameter name by `ref` to remove the useless
local variable.
After having re-read the RFC, we noticed there are two bugs in the STREAM
frame parser. When the OFF bit (0x04) in the frame type is not set
we must set the offset to 0 (it was not set at all). When the LEN bit (0x02)
is not set we must extend the length of the data field to the end of the packet
(it was not set at all).
This is issue is due to the fact that when we call the function
responsible of building CRYPTO frames to fill a buffer, the Length
field of this packet did not take into an account the trailing 16 bytes for
the AEAD tag. Furthermore, the remaining <room> available in this buffer
was not decremented by the CRYPTO frame length, but only by the CRYPTO data length
of this frame.
This patch fixes GitHub issue #1023.
The function was introduced in commit 99c453d ("MEDIUM: ring: new
section ring to declare custom ring buffers."), which first appeared
in 2.2-dev9. The fix should be backported to 2.2+.
This allows using the address of the server rather than the name of the
server for keeping track of servers in a backend for stickiness.
The peers code was also extended to support feeding the dictionary using
this key instead of the name.
Fixes#814
This patch fixes GitHub Issue #988. Commit ce9e7b2521
was not sufficient, because it fell back to a hash comparison if the bitmap
of known encodings was not acceptable instead of directly returning the the
cached response is not compatible.
This patch also extends the reg-test to test the hash collision that was
mentioned in #988.
Vary handling is 2.4, no backport needed.
The accept-encoding normalizer now explicitely manages a subset of
encodings which will all have their own bit in the encoding bitmap
stored in the cache entry. This way two requests with the same primary
key will be served the same cache entry if they both explicitely accept
the stored response's encoding, even if their respective secondary keys
are not the same and do not match the stored response's one.
The actual hash of the accept-encoding will still be used if the
response's encoding is unmanaged.
The encoding matching and the encoding weight parsing are done for every
subpart of the accept-encoding values, and a bitmap of accepted
encodings is built for every request. It is then tested upon any stored
response that has the same primary key until one with an accepted
encoding is found.
The specific "identity" and "*" accept-encoding values are managed too.
When storing a response in the key, we also parse the content-encoding
header in order to only set the response's corresponding encoding's bit
in its cache_entry encoding bitmap.
This patch fixes GitHub issue #988.
It does not need to be backported.
The accept-encoding part of the secondary key (vary) was only built out
of the first occurrence of the header. So if a client had two
accept-encoding headers, gzip and br for instance, the key would have
been built out of the gzip string. So another client that only managed
gzip would have been sent the cached resource, even if it was a br resource.
The http_find_header function is now called directly by the normalizers
so that they can manage multiple headers if needed.
A request that has more than 16 encodings will be considered as an
illegitimate request and its response will not be stored.
This fixes GitHub issue #987.
It does not need any backport.
If any of the secondary hash normalizing functions raises an error, the
secondary hash will be unusable. In this case, the response will not be
stored anymore.
Add traces to have an idea why this function may fail. In fact
in never fails when the passed parameters are correct, especially the
lengths. This is not the case when a packet is not correctly built
before being encrypted.
Even if the size of frames built by qc_build_frm() are computed so that
not to overflow a buffer, do not rely on this and always makes a packet
build fails if we could not build a frame.
Also add traces to have an idea where qc_build_frm() fails.
Fixes a memory leak in qc_build_phdshk_apkt().
This salt is ued at leat up to draft-32. At this date ngtcp2 always
uses this salt even if it started the draft-33 development.
Note that when the salt is not correct, we cannot remove the header
protection. In this case the packet number length is wrong.
At least displays the SSL alert error code passed to ->ssl_send_alert()
QUIC BIO method and the SSL encryption level. This function is newly called
when using picoquic client with a recent version of BoringSSL (Nov 19 2020).
This is not the case with OpenSSL with 32 as QUIC draft implementation.
Reorder by increasing type the switch/case in qc_parse_pkt_frms()
which is the high level frame parser.
Add new STREAM_X frame types to support some tests with ngtcp2 client.
Add ->flags to the QUIC frame parser as this has been done for the builder so
that to flag RX packets as ack-eliciting at low level. This should also be
helpful to maintain the code if we have to add new flags to RX packets.
Remove the statements which does the same thing as higher level in
qc_parse_pkt_frms().
Remove ->ifcdata which was there to control the CRYPTO data sent to the
peer so that not to saturate its reception buffer. This was a sort
of flow control.
Add ->prep_in_flight counter to the QUIC path struct to control the
number of bytes prepared to be sent so that not to saturare the
congestion control window. This counter is increased each time a
packet was built. This has nothing to see with ->in_flight which
is the real in flight number of bytes which have really been sent.
We are olbiged to maintain two such counters to know how many bytes
of data we can prepared before sending them.
Modify traces consequently which were useful to diagnose issues about
the congestion control window usage.
As there is a lot of information in this protocol, this is not
easy to make the traces readable. We remove here a few of them and
shorten some line shortening the variable names.
This patch is there to initialize the default transport parameters for QUIC
as a preparation for one of the QUIC next steps to come: fully support QUIC
protocol for haproxy servers.
Implement ->accept_conn() callback for QUIC listener sockets.
Note that this patch also implements quic_session_accept() function
which is similar to session_accept_fd() without calling conn_complete_session()
at this time because we do not have any real QUIC mux.
Makes TLS/TCP and QUIC share the same CTX initializer so that not to modify the
caller which is an XPRT callback used both by the QUIC xprt and the SSL xprt over
TCP.
This patch adds QUIC structs to server struct so that to make the QUIC code
compile. Also initializes the ebtree to store the connections by connection
IDs.
This patch adds a quic_transport_params struct to bind_conf struct
used for the listeners. This is to store the QUIC transport parameters
for the listeners. Also initializes them when calling str2listener().
Before str2sa_range() it's too early to figure we're going to speak QUIC,
and after it's too late as listeners are already created. So it seems that
doing it in str2listener() when the protocol is discovered is the best
place.
Also adds two ebtrees to the underlying receivers to store the connection
by connections IDs (one for the original connection IDs, and another
one for the definitive connection IDs which really identify the connections.
However it doesn't seem normal that it is stored in the receiver nor the
listener. There should be a private context in the listener so that
protocols can store internal information. This element should in
fact be the listener handle.
Something still feels wrong, and probably we'll have to make QUIC and
SSL co-exist: a proof of this is that there's some explicit code in
bind_parse_ssl() to prevent the "ssl" keyword from replacing the xprt.
This patch imports all the C files for QUIC protocol implementation with few
modifications from 20200720-quic branch of quic-dev repository found at
https://github.com/haproxytech/quic-dev.
Traces were implemented to help with the development.
When parsing "ssl" keyword for TLS bindings, we must not use the same xprt as the one
for TLS/TCP connections. So, do not modify the QUIC xprt which will be initialized
when parsing QUIC addresses wich "ssl" bindings.
QUIC needs to initialize its BIO and SSL session the same way as for SSL over TCP
connections. It needs also to use the same ClientHello callback.
This patch only exports functions and variables shared between QUIC and SSL/TCP
connections.
This patch extraces the code which initializes the BIO and SSL session
objects so that to reuse it elsewhere later for QUIC conections which
only needs SSL and BIO objects at th TLS layer stack level to work.
We add src/quic_sock.c QUIC specific socket management functions as callbacks
for the control layer: ->accept_conn, ->default_iocb and ->rx_listening.
accept_conn() will have to be defined. The default I/O handler only recvfrom()
the datagrams received. Furthermore, ->rx_listening callback always returns 1 at
this time but should returns 0 when reloading the processus.
As QUIC is a connection oriented protocol, this file is almost a copy of
proto_tcp without TCP specific features. To suspend/resume a QUIC receiver
we proceed the same way as for proto_udp receivers.
With the recent updates to the listeners, we don't need a specific set of
quic*_add_listener() functions, the default ones are sufficient. The fields
declaration were reordered to make the various layers more visible like in
other protocols.
udp_suspend_receiver/udp_resume_receiver are up-to-date (the check for INHERITED
is present) and the code being UDP-specific, it's normal to use UDP here.
Note that in the future we might more reasily reference stacked layers so that
there's no more need for specifying the pointer here.
A new XXH3 variant of hash functions shows a noticeable improvement in
performance (especially on small data), and also brings 128-bit support,
better inlining and streaming capabilities.
Performance comparison is available here:
https://github.com/Cyan4973/xxHash/wiki/Performance-comparison
Allow the user to specify a custom Connection header for http-check
send. This is useful for example to implement a websocket upgrade check.
If no connection header has been set, a 'Connection: close' header is
automatically appended to allow the server to close the connection
immediately after the request/response.
Update the documentation related to http-check send.
This fixes the github issue #1009.
This is a regression in 7838a79ba ("MEDIUM: mux-h2/trace: add lots of traces
all over the code"). The issue was found using -Wmisleading-indentation.
This patch fixes GitHub issue #1015.
The impact of this bug is that it could in theory cause occasional delays
on some long responses for connections having otherwise no traffic.
This patch should be backported to 2.1+, the commit was first tagged in
v2.1-dev2.
This bug happens when a service has multiple records on the same host
and the server provides the A/AAAA resolution in the response as AR
(Additional Records).
In such condition, the first occurence of the host will be taken from
the Additional section, while the second (and next ones) will be process
by an independent resolution task (like we used to do before 2.2).
This can lead to a situation where the "synchronisation" of the
resolution may diverge, like described in github issue #971.
Because of this behavior, HAProxy mixes various type of requests to
resolve the full list of servers: SRV+AR for all "first" occurences and
A/AAAA for all other occurences of an existing hostname.
IE: with the following type of response:
;; ANSWER SECTION:
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A2.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 86 A3.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 80 A1.tld.
_http._tcp.be2.tld. 3600 IN SRV 5 500 85 A3.tld.
;; ADDITIONAL SECTION:
A2.tld. 3600 IN A 192.168.0.2
A3.tld. 3600 IN A 192.168.0.3
A1.tld. 3600 IN A 192.168.0.1
A3.tld. 3600 IN A 192.168.0.3
the first A3 host is resolved using the Additional Section and the
second one through a dedicated A request.
When linking the SRV records to their respective Additional one, a
condition was missing (chek if said SRV record is already attached to an
Additional one), leading to stop processing SRV only when the target
SRV field matches the Additional record name. Hence only the first
occurence of a target was managed by an additional record.
This patch adds a condition in this loop to ensure the record being
parsed is not already linked to an Additional Record. If so, we can
carry on the parsing to find a possible next one with the same target
field value.
backport status: 2.2 and above
SSL_CTX_get0_privatekey is openssl/boringssl specific function present
since openssl-1.0.2, let us define readable guard for it, not depending
on HA_OPENSSL_VERSION
Since commit 8a069eb9a ("MINOR: debug: add a trivial PRNG for scheduler
stress-tests"), 32-bit gcc 4.7 emits this warning when parsing the
initial seed for the debugger's RNG (2463534242):
src/debug.c:46:1: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
Let's mark it explicitly unsigned.
On frontend side, when a conn-stream is detached from a H1 connection, the
H1 stream is destroyed and if we already have some data to parse (a
pipelined request), we process these data immedialtely calling
h1_process(). Then we adjust the H1 connection timeout. But h1_process() may
fail and release the H1 connection. For instance, a parsing error may be
reported. Thus, when that happens, we must not use anymore the H1 connection
and exit.
This patch must be backported as far as the 2.2. This bug can impact the 2.3
and the 2.2, in theory, if h1 stream creation fails. But, concretly, it only
fails on the 2.4 because the requests are now parsed at this step.
When a channel is set in TUNNEL mode, we now always set the CF_NEVER_WAIT flag,
to be sure to never wait for sending data. It is important because in TUNNEL
mode, we have no idea if more data are expected or not. Setting this flag
prevent the MSG_MORE flag to be set on the connection.
It is only a problem with the HTX, since the 2.2. On previous versions, the
MSG_MORE flag is only set on the mux initiative. In fact, the problem arises
because there is an ambiguity in tunnel mode about the HTX_FL_EOI flag. In this
mode, from the mux point of view, while the SHUTR is not received more data are
expected. But from the channel point of view, we want to send data asap.
At short term, this fix is good enough and is valid anyway. But for the long
term more reliable solution must be found. At least, the to_forward field must
regain its original meaning.
This patch must be backported as far as 2.2.
When a protocol upgrade request is received, once parsed, it is waiting for
the response in the DONE state. But we must not set the flag CS_FL_EOI
because we don't know if a protocol upgrade will be performed or not.
Now, it is set on the response path, if both sides reached the DONE
state. If a protocol upgrade is finally performed, both side are switched in
TUNNEL state. Thus the CS_FL_EOI flag is not set.
If backported, this patch must be adapted because for now it relies on last
2.4-dev changes. It may be backported as far as 2.0.
As stated in the rfc7231, section 4.3.6, an HTTP tunnel via a CONNECT method
is successfully established if the server replies with any 2xx status
code. However, only 200 responses are considered as valid. With this patch,
any 2xx responses are now considered to estalish the tunnel.
This patch may be backported on demand to all stable versions and adapted
for the legacy HTTP. It works this way since a very long time and nobody
complains.
Due to the addition of the OpenTracing filter it is necessary to define
ARGC_OT enum. This value is used in the functions fmt_directive() and
smp_resolve_args().
The OpenTracing filter uses several internal HAProxy functions to work
with variables and therefore requires two static local HAProxy functions,
var_accounting_diff() and var_clear(), to be declared global.
In fact, the var_clear() function was not originally defined as static,
but it lacked a declaration.
This new option allows to tune the maximum number of simultaneous
entries with the same primary key in the cache (secondary entries).
When we try to store a response in the cache and there are already
max-secondary-entries living entries in the cache, the storage will
fail (but the response will still be sent to the client).
It defaults to 10 and does not have a maximum number.
The secondary entry counter cannot be updated without going over all the
items of a duplicates list periodically. In order to avoid doing it too
often and to impact the cache's performances, a timestamp is added to
the cache_entry. It will store the timestamp (with second precision) of
the last iteration over the list (actually the last call of the
clear_expired_duplicates function). This way, this function will not be
called more than once per second for a given duplicates list.
Add an arbitrary maximum number of secondary entries per primary hash
(10 for now) to the cache. This prevents the cache from being filled
with duplicates of the same resource.
This works thanks to an entry counter that is kept in one of the
duplicates of the list (the last one).
When an entry is added to the list, the ebtree's implementation ensures
that it will be added to the end of the existing list so the only thing
to do to keep the counter updated is to get the previous counter from
the second to last entry.
Likewise, when an entry is explicitely deleted, we update the counter
from the list's last item.
SSL_CTX_add_server_custom_ext is openssl specific function present
since openssl-1.0.2, let us define readable guard for it, not depending
on HA_OPENSSL_VERSION
The cache entries are now added into the tree even when they are not
complete yet. If we realized while trying to add a response's payload
that the shctx was full, the entry was disabled through the
disable_cache_entry function, which cleared the key field of the entry's
node, but without actually removing it from the tree. So the shctx row
could be stolen from the entry and the row's content be rewritten while
a lookup in the tree would still find a reference to the old entry. This
caused a random crash in case of cache saturation and row reuse.
This patch adds the missing removal of the node from the tree next to
the reset of the key in disable_cache_entry.
This bug was introduced by commit 3243447 ("MINOR: cache: Add entry
to the tree as soon as possible")
It does not need to be backported.
In issue #1004, it was reported that it is not possible to remove
correctly a certificate after updating it when it came from a crt-list.
Indeed the "commit ssl cert" command on the CLI does not update the list
of ckch_inst in the crtlist_entry. Because of this, the "del ssl
crt-list" command does not remove neither the instances nor the SNIs
because they were never linked to the crtlist_entry.
This patch fixes the issue by inserting the ckch_inst in the
crtlist_entry once generated.
Must be backported as far as 2.2.
When a frontend H1 connection timed out waiting for the next request, a 408
error message is returned to the client. It is performed into the H1C task
process function, h1_timeout_task(), and under the idle connection takeover
lock. If the 408 error message cannot be sent immediately, we wait for a
next retry. In this case, the lock must be released.
This bug was introduced by the commit c4bfa59f1d ("MAJOR: mux-h1: Create the
client stream as later as possible") and is specific to the 2.4-DEV. No
backport needed.
Depending on the context, the current eweight or the next one must be used
to reposition a server in the tree. When the server state is updated, for
instance its weight, the next eweight must be used because it is not yet
committed. However, when the server is used, on normal conditions, the
current eweight must be used.
In fact, it is only a bug on the 1.8. On newer versions, the changes on a
server are performed synchronously. But it is safer to rely on the right
eweight value to avoid any futur bugs.
On the 1.8, it is important to do so, because the server state is updated
and committed inside the rendez-vous point. Thus, the next server state may
be unsync with the current state for a short time, waiting all threads join
the rendez-vous point. It is especially a problem if the next eweight is set
to 0. Because otherwise, it must not be used to reposition the server in the
tree, leading to a divide by 0.
This patch must be backported as far as 1.8.
This changes the subscribe/unsubscribe functions to rely on the control
layer's check_events/ignore_events. At the moment only the socket version
of these functions is present so the code should basically be the same.
Right now the connection subscribe/unsubscribe code needs to manipulate
FDs, which is not compatible with QUIC. In practice what we need there
is to be able to either subscribe or wake up depending on readiness at
the moment of subscription.
This commit introduces two new functions at the control layer, which are
provided by the socket code, to check for FD readiness or subscribe to it
at the control layer. For now it's not used.
Now we don't touch the fd anymore there, instead we rely on the ->drain()
provided by the control layer. As such the function was renamed to
conn_ctrl_drain().
This is what we need to drain pending incoming data from an connection.
The code was taken from conn_sock_drain() without the connection-specific
stuff. It still takes a connection for now for API simplicity.
conn_fd_handler() is 100% specific to socket code. It's about time
it moves to sock.c which manipulates socket FDs. With it comes
conn_fd_check() which tests for the socket's readiness. The ugly
connection status check at the end of the iocb was moved to an inlined
function in connection.h so that if we need it for other socket layers
it's not too hard to reuse.
The code was really only moved and not changed at all.
The send() loop present in this function and the error handling is already
present in raw_sock_from_buf(). Let's rely on it instead and stop touching
the FD from this place. The send flag was changed to use a more agnostic
CO_SFL_*. The name was changed to "conn_ctrl_send()" to remind that it's
meant to be used to send at the lowest level.
Add cur_server_timeout and cur_tunnel_timeout.
These sample fetches return the current timeout value for a stream. This
is useful to retrieve the value of a timeout which was changed via a
set-timeout rule.
Prepare the possibility to register sample fetches on the stream.
This commit is necessary to implement sample fetches to retrieve the
current timeout values.
Add a new http-request action 'set-timeout [server/tunnel]'. This action
can be used to update the server or tunnel timeout of a stream. It takes
two parameters, the timeout name to update and the new timeout value.
This rule is only valid for a proxy with backend capabilities. The
timeout value cannot be null. A sample expression can also be used
instead of a plain value.
Allow the modification of the tunnel timeout on the stream side.
Use a new field in the stream for the tunnel timeout. It is initialized
by the tunnel timeout from backend unless it has already been set by a
set-timeout tunnel rule.
Allow the modification of the timeout server value on the stream side.
Do not apply the default backend server timeout in back_establish if it
is already defined. This is the case if a set-timeout server rule has
been executed.
parse_size_err() function is now more strict on the size format. The first
character must be a digit. Otherwise an error is returned. Thus "size k" is
now rejected.
This patch must be backported to all stable versions.
First, an error is now reported if the first character is not a digit. Thus,
"timeout client s" triggers an error now. Then 'u' is also rejected
now. 'us' is valid and should be used set the timer in microseconds. However
'u' alone is not a valid unit. It was just ignored before (default to
milliseconds). Now, it is an error. Finally, a warning is reported if the
end of the text is not reached after the timer parsing. This warning will
probably be switched to an error in a futur version.
This patch must be backported to all stable versions.
For HTTP expect rules, if the buffer is not empty, it is guarantee that all
responses headers are received, with the start-line. Thus, except for
payload matching, there is no reason to wait for more data from the moment
the htx message is not empty.
This patch may be backported as far as 2.2.
The check timeout is used to limit a health-check execution. By default
inter timeout is used. But when defined the check timeout is used. In this
case, the inter timeout (or connect timeout) is used for the connection
establishment only. And the check timeout for the health-check
execution. Thus, it must be set after a successfull connect. It means it is
rearm at the end of each connect rule.
This patch with the previous one (BUG/MINOR: http-check: Use right condition
to consider HTX message as full) should solve the issue #991. It must be
backported as far as 2.2. On the 2.3 and 2.2, there are 2 places were the
connection establishement is handled. The check timeout must be set on both.
When an HTTP expect rule is evaluated, we must know if more data is expected
or not to wait if the matching fails. If the whole response is received or
if the HTX message is full, we must not wait. In this context,
htx_free_data_space() must be used instead of htx_free_space(). The fisrt
one count down the block size. Otherwise at the edge, when only the block
size remains free (8 bytes), we may think there is some place for more data
while the mux is unable to add more block.
This bug explains the loop described on the GH issue #991. It should be
backported as far as 2.2.
This last call to conn_cond_update_polling() is now totally misleading as
the function only stops polling in case of unrecoverable connection error.
Let's open-code the test to make it more prominent and explain what we're
trying to do there. It's even almost certain this code is never executed
anymore, as the only remaining case should be a mux's wake function setting
CO_FL_ERROR without disabling the polling, but they need to be audited first
to make sure this is the case.
This was a leftover of the pre-mux v1.8-dev3 era. It makes no sense anymore
to try to disable polling on a connection we don't own, it's the mux's job
and it's properly done upon shutdowns and closes.
As explained in previous commit, the situation is absurd as we try to
cleanly drain pending data before impolitely shutting down, and it could
be counter productive on real muxes. Let's use cs_drain_and_close() instead.
When the shutr() requests CS_SHR_DRAIN and there's no particular shutr
implemented on the underlying transport layer, we must drain pending data.
This is what happens when cs_drain_and_close() is called. It is important
for TCP checks to drain large responses and close cleanly.
Not only it's become totally useless with muxes, in addition it's
dangerous to play with the mux's FD while shutting a stream down for
writes. It's already done *if necessary* by the cs_shutw() code at the
mux layer. Fortunately it doesn't seem to have any impact, most likely
the polling updates used to immediately revert this operation.
conn_stop_polling() in fact only calls fd_stop_both() after checking
that the ctrl layer is ready. It's the case in conn_fd_check() so
let's get rid of this next-to-last user of this function.
The duplicated entries (in case of vary) were not taken into account by
the "show cache" command. They are now dumped too.
A new "vary" column is added to the output. It contains the complete
seocndary key (in hex format).
QUIC will rely on UDP at the receiver level, and will need these functions
to suspend/resume the receivers. In the future, protocol chaining may
simplify this.
Currnetly conn_ctrl_init() does an fd_insert() and conn_ctrl_close() does an
fd_delete(). These are the two only short-term obstacles against using a
non-fd handle to set up a connection. Let's have pur these into the protocol
layer, along with the other connection-level stuff so that the generic
connection code uses them instead. This will allow to define new ones for
other protocols (e.g. QUIC).
Since we only support regular sockets at the moment, the code was placed
into sock.c and shared with proto_tcp, proto_uxst and proto_sockpair.
For the sake of an improved readability, let's group the protocol
field members according to where they're supposed to be defined:
- connection layer (note: for now even UDP needs one)
- binding layer
- address family
- socket layer
Nothing else was changed.
The various protocols were made static since there was no point in
exporting them in the past. Nowadays with QUIC relying on UDP we'll
significantly benefit from UDP being exported and more generally from
being able to declare some functions as being the same as other
protocols'.
In an ideal world it should not be these protocols which should be
exported, but the intermediary levels:
- socket layer (sock.c only right now), already exported as functions
but nothing structured at the moment ;
- family layer (sock_inet, sock_unix, sockpair etc): already structured
and exported
- binding layer (the part that relies on the receiver): currently fused
within the protocol
- connectiong layer (the part that manipulates connections): currently
fused within the protocol
- protocol (connection's control): shouldn't need to be exposed
ultimately once the elements above are in an easily sharable way.
This field used to be needed before commit 2b5e0d8b6 ("MEDIUM: proto_udp:
replace last AF_CUST_UDP* with AF_INET*") as it was used as a protocol
entry selector. Since this commit it's always equal to the socket family's
value so it's entirely redundant. Let's remove it now to simplify the
protocol definition a little bit.
At the end of stream_new(), once the input buffer is transfer to the request
channel, it must not be used anymore. The previous patch (16df178b6 "BUG/MEDIUM:
stream: Xfer the input buffer to a fully created stream") was pushed to quickly.
No backport needed.
The input buffer passed as argument to create a new stream must not be
transferred when the request channel is initialized because the channel
flags are not set at this stage. In addition, the API is a bit confusing
regarding the buffer owner when an error occurred. The caller remains the
owner, but reading the code it is not obvious.
So, first of all, to avoid any ambiguities, comments are added on the
calling chain to make it clear. The buffer owner is the caller if any error
occurred. And the ownership is transferred to the stream on success.
Then, to make things simple, the ownership is transferred at the end of
stream_new(), in case of success. And the input buffer is updated to point
on BUF_NULL. Thus, in all cases, if the caller try to release it calling
b_free() on it, it is not a problem. Of course, it remains the caller
responsibility to release it on error.
The patch fixes a bug introduced by the commit 26256f86e ("MINOR: stream:
Pass an optional input buffer when a stream is created"). No backport is
needed.
Since HAProxy 2.3, OpenSSL 1.1.1 is a requirement for using a
multi-certificate bundle in the configuration. This patch emits a fatal
error when HAProxy tries to load a bundle with an older version of
HAProxy.
This problem was encountered by an user in issue #990.
This must be backported in 2.3.
With the removal of the family-specific port setting, all protocol had
exactly the same implementation of ->add(). A generic one was created
with the name "default_add_listener" so that all other ones can now be
removed. The API was slightly adjusted so that the protocol and the
listener are passed instead of the listener and the port.
Note that all protocols continue to provide this ->add() method instead
of routinely calling default_add_listener() from create_listeners(). This
makes sure that any non-standard protocol will still be able to intercept
the listener addition if needed.
This could be backported to 2.3 along with the few previous patches on
listners as a pure code cleanup.
In create_listeners() we iterate over a port range and call the
protocol's ->add() function to add a new listener on the specified
port. Only tcp4/tcp6/udp4/udp6 support a port, the other ones ignore
it. Now that we can rely on the address family to properly set the
port, better do it this way directly from create_listeners() and
remove the family-specific case from the protocol layer.
At various places we need to set a port on an IPv4 or IPv6 address, and
it requires casts that are easy to get wrong. Let's add a new set_port()
helper to the address family to assist in this. It will be directly
accessible from the protocol and will make the operation seamless.
Right now this is only implemented for sock_inet as other families do
not need a port.
When a H1 message is parsed, if the parser state is switched to TUNNEL mode
just after the header parsing, the BODYLESS flag is set on the HTX
start-line. By transitivity, the corresponding flag is set on the message in
HTTP analysers. Thus it is possible to rely on it to not wait for the
request body.
CNT_LEN and TE_CHNK flags must be set on the message only when the
corresponding flag is set on the HTX start-line. Before, when the transfer
length was known XFER_LEN set), the HTTP_MSGF_TE_CHNK was the default. But
it is not appropriate. Now, it is only set if the message is chunked. Thus,
it is now possible to have a known transfer length without CNT_LEN or
TE_CHNK.
In addition, the BODYLESS flags may be set, independently on XFER_LEN one.
This flags is now unused. It was used in REQ_WAIT_HTTP analyser, when a
stream was waiting for a request, to set the keep-alive timeout or to avoid
to send HTTP errors to client.
It is now impossible to start the HTTP request processing in the stream
analysers with a partial or empty request message. The mux-h2 was already
waiting of the request headers before creating the stream. Now the mux-h1
does the same. All errors (aborts, timeout or invalid requests) waiting for
the request headers are now handled by the multiplexers. So there is no
reason to still handle them in the REQ_WAIT_HTTP (http_wait_for_request)
analyser.
To ensure there is no ambiguity, a BUG_ON() was added to exit if a partial
request is received in this analyser.
H1C_F_CS_* flags are renamed into H1C_F_ST_*. They reflect the connection
state. So "ST" is well suited. "CS" is confusing because it is also the
abbreviation for conn-stream.
In addition, H1C flags are reordered.
This is the reason for all previous patches. The conn-stream and the
associated stream are created as later as possible. It only concerns the
frontend connections. But it means the request headers, and possibly the
first data block, are received and parsed before the conn-stream
creation. To do so, an embryonic H1 stream, with no conn-stream, is
created. The result of this "early parsing" is stored in its rx buffer, used
to fill the request channel when the stream is created. During this step,
some HTTP errors may be returned by the mux. It must also handle
http-request/keep-alive timeouts. A significative change is about H1 to H2
upgrade. It happens very early now, and no H1 stream are created (and thus
of course no conn-stream).
The most important part of this patch is located to the h1_process()
function. Because it must trigger the parsing when there is no H1
stream. h1_recv() function has also been simplified.
For now, this part is unsued. But this patch adds functions to handle errors
on idle and embryonic H1 connections and send corresponding HTTP error
messages to the client (400, 408 or 500). Thanks to previous patches, these
functions take care to update the right stats counters, but also the
counters tracked by the session.
A field to store the HTTP error code has been added in the H1C structure. It
is used for error retransmits, if any, and to get it in http logs. It is
used to return the mux exit status code when the MUX_EXIT_STATUS ctl
parameter is requested.
When a log message is emitted from the session level, by a multiplexer,
there is no stream. Thus for HTTP session, there no status code and the
termination flags are not correctly set.
Thanks to previous patch, the HTTP status code is deduced from the mux exist
status, using the MUX_EXIT_STATE ctl param. This is only done for HTTP
frontends. If it is defined ( != 0), it is used to deduce the termination
flags.
The ctl param MUX_EXIT_STATUS can be request to get the exit status of a
multiplexer. For instance, it may be an HTTP status code or an H2 error. For
now, 0 is always returned. When the mux h1 will be able to return HTTP
errors itself, this ctl param will be used to get the HTTP status code from
the logs.
the mux_exit_status enum has been created to map internal mux exist status
to generic one. Thus there is 5 possible status for now: success, invalid
error, timeout error, internal error and unknown.
The cumulative numbers of http requests, http errors, bytes received and
sent and their respective rates for a tracked counters are now updated using
specific stream independent functions. These functions are used by the
stream but the aim is to allow the session to do so too. For now, there is
no reason to perform these updates from the session, except from the mux-h2
maybe. But, the mux-h1, on the frontend side, will be able to return some
errors to the client, before the stream creation. In this case, it will be
mandatory to update counters tracked at the session level.
An idle expiration date is added on the H1 connection with the function to
set it depending on connection state. First, there is no idle timeout on
backend connections, For idle frontend connections, the http-request or
keep-alive timeout are used depending on which timeout is defined and if it
is the first request or not. For embryonic connections, the http-request is
always used, if defined. For attached or shutted down connections, no idle
timeout is applied.
For now the idle expiration date is never set and the h1_set_idle_expiration
function remains unused.
When the conn-stream is detached for a H1 connection, there is no reason to
subscribe for reads or process pending input data if the connection is not
idle. Because, it means a shutdown is pending.
Conditions to set a timeout on the H1C task have been simplified or at least
changed to rely on H1 connection flags. Now, following rules are used :
* the shutdown timeout is applied on dead (not alive) or shutted down
connections.
* The client/server timeout is applied if there are still some pending
outgoing data.
* The client timeout is applied on alive frontend connections with no
conn-stream. It means on idle or embryionic frontend connections.
* For all other connections (backend or attached connections), no timeout
is applied. For frontend or backend attached connections, the timeout is
handled by the application layer. For idle backend connections, there is
no timeout.
We now only rely on one flag to notify a shutdown. The shutdown is performed
at the connection level when there are no more pending outgoing data. So, it
means it is performed immediately if the output buffer is empty. Otherwise
it is deferred after the outgoing data are sent.
This simplify a bit the mux because there is now only one flag to check.
Don't try to read more data if a parsing or a formatting error was reported
on the H1 stream. There is no reason to continue to process the messages for
the current connection in this case. If a parsing error occurs, it means the
input is invalid. If a formatting error occurs, it is an internal error and
it is probably safer to give up.
Mainly to make it easier to read. First of all, when a H1 connection is
still there, we check if the connection was stolen by another thread or
not. If yes we release the task and leave. Then we check if the task is
expired or not. Only expired tasks are considered. Finally, if a conn-stream
is still attached to the connection (H1C_F_CS_ATTACHED flag set), we
return. Otherwise, the task and the H1 connection are released.
Be prepared to have a H1 connection in one of the following states :
* A H1 connection waiting for a new message with no H1 stream.
H1C_F_CS_IDLE flag is set.
* A H1 connection processing a new message with a H1 stream but no
conn-stream attached. H1C_F_CS_EMBRYONIC flag is set
* A H1 connection with a H1 stream and a conn-stream attached.
H1C_F_CS_ATTACHED flag is set.
* A H1 connection with no H1 stream, waiting to be released. No flag is set.
These flags are mutually exclusives. When none is set, it means the
connection will be released ASAP, just remaining outgoing data must be sent
before. For now, the second state (H1C_F_CS_EMBRYONIC) is transient.
For now this buffer is not used. But it will be used to parse the headers,
and possibly the first block of data, when no stream is attached to the H1
connection. The aim is to use it to create the stream, thanks to recent
changes on the streams creation api.
Dedicated functions are now used to create frontend and backend H1
streams. h1c_frt_stream_new() is now used to create frontend H1 streams and
h1c_bck_stream_new() to create backend ones. Both rely on h1s_new() function
to allocate the stream itself. It is a bit easier to add specific processing
depending we are on the frontend or the backend side.
Instead of using H1S flags to report an error on the request or the
response, independently it is a parsing or a formatting error, we now use a
flag to report parsing errors and another one to report formatting
ones. This simplify the message parsing. It is also easier to figure out
what error happened when one of this flag is set. The side may be deduced
checking the H1C_F_IS_BACK flag.
Instead of using 2 flags on the H1 stream (H1S_F_BUF_FLUSH and
H1S_F_SPLICED_DATA), we now only use one flag on the H1 connection
(H1C_F_WANT_SPLICE) to notify we want to use splicing or we are using
splicing. This flag blocks the calls to rcv_buf() connection callback.
It is a bit easier to set the H1 connection capability to receive data in
its input buffer instead of relying on the H1 stream.
H1C_F_WAIT_OPPOSITE must be set on the H1 conenction to don't read more data
because we must be sync with the opposite side. This flag replaces the
H1C_F_IN_BUSY flag. Its name is a bit explicit. It is automatically set on
the backend side when the mux is created. It is safe to do so because at
this stage, the request has not yet been sent to the server. This way, in
h1_recv_allowed(), a test on this flag is enough to block the reads instead
of testing the H1 stream state on the backend side.
It is now possible to set the buffer used by the channel request buffer when
a stream is created. It may be useful if input data are already received,
instead of waiting the first call to the mux rcv_buf() callback. This change
is mandatory to support H1 connection with no stream attached.
For now, the multiplexers don't pass any buffer. BUF_NULL is thus used to
call stream_create_from_cs().
These info are only provided by the mux-h1. But, thanks to previous patches,
we can get them from the session directly. There is no need to retrieve them
from the mux anymore.
Since the idle duration provided by the session is always up-to-date, there
is no more reason to rely on the multiplexer cs_info to set it to the
stream.
When a log message is emitted from the session, using sess_log() function,
there is no stream available. In this case, instead of deducing the idle
duration from the accept date, we use the one provided by the session. 0 is
used if it is undefined (i.e set to -1).
These info are reset for the next transaction, if the connection is kept
alive. From the stream point of view, it should be the same a new
connection, except there is no handshake. Thus the handshake duration is set
to 0.
The idle duration between two streams is added to the session structure. It
is not necessarily pertinent on all protocols. In fact, it is only defined
for H1 connections. It is the duration between two H1 transactions. But the
.get_cs_info() callback function on the multiplexers only exists because
this duration is missing at the session level. So it is a simplification
opportunity for a really low cost.
To reduce the cost, a hole in the session structure is filled by moving
.srv_list field at the end of the structure.
IDLE frontend connections have no stream attached. The stream is only
created when new data are received, when the parsing of the next request
starts. Thus the keep-alive timeout, handled into the HTTP analysers, is not
considered while nothing is received. But this is especially when this
timeout must be considered. Concretely the http-keep-alive is ignored while
no data are received. Only the client timeout is used. It will only be
considered on incomplete requests, if the http-request timeout is not set.
To fix the bug, the http-keep-alive timeout must be handled at the mux
level, for IDLE frontend connection only.
This patch should fix the issue #984. It must be backported as far as
2.2. On prior versions, the stream is created earlier. So, it is not a
problem, except if this behavior changes of course (it was an optim of the
2.2, but don't remember the commit).
A copy-paste bug between {tcp,udp}{4,6}_add_listener() resulted in
using a struct sockaddr_in to set the TCP/UDP port while it ought to
be a struct sockaddr_in6. Fortunately, the port has the same offset
(2) in both so it was harmless. A cleaner way to proceed would be
to have a set_port function exported by the address family layer.
This needs to be backported to 2.3.
It seems to me that lua_close() must be called on all states at deinit
time, not just the first two ones. This is likely a remnant of commit
59f11be43 ("MEDIUM: lua-thread: Add the lua-load-per-thread directive").
There should likely be some memory leak reports when using Lua without
this fix, though none were observed for now.
No backport is needed as this was merged into 2.4-dev.
Lua dedicated TCP, HTTP and SSL socket and proxies must be initialized
once. Right now, they are initialized from the Lua init state, but since
commit 59f11be43 ("MEDIUM: lua-thread: Add the lua-load-per-thread
directive") this function is called one time per lua context. This
caused some fields to be cleared and overwritten, and pre-allocated
object to be lost. This is why the address sanitizer detected memory
leaks from the socket_ssl server initialization.
Let's move all the state-independent part of the function to the
hlua_init() function to avoid this.
No backport is needed, this is only 2.4-dev.
In case of successful unsafe method on a stored resource, the cached entry
must be invalidated (see RFC7234#4.4).
A "non-error response" is one with a 2xx (Successful) or 3xx (Redirection)
status code.
This implies that the primary hash must now be calculated on requests
that have an unsafe method (POST or PUT for instance) so that we can
disable the corresponding entries when we process the response.
The Cache-Control max-age and s-maxage directives should be followed by
a positive numerical value (see RFC 7234#5.2.1.1). According to the
specs, a sender "should not" generate a quoted-string value but we will
still accept this format.
When a response has an Age header (filled in by another cache on the
message's path) that is greater than its defined maximum age (extracted
either from cache-control directives or an expires header), it is
already stale and should not be cached.
The goal is to allow execution of one main lua state per thread.
This patch contains the main job. The lua init is done using these
steps:
- "lua-load-per-thread" loads the lua code in the first thread
- it creates the structs
- it stores loaded files
- the 1st step load is completed (execution of hlua_post_init)
and now, we known the number of threads
- we initilize lua states for all remaining threads
- for each one, we load the lua file
- for each one, we execute post-init
Once all is loaded, we control consistency of functions references.
The rules are:
- a function reference cannot be in the shared lua state and in
a per-thread lua state at the same time.
- if a function reference is declared in a per-thread lua state, it
must be declared in all per-thread lua states
The goal is to allow execution of one main lua state per thread.
The array introduces storage of one reference per thread, because each
lua state can have different reference id for a same function. A function
returns the preferred state id according to configuration and current
thread id.
The goal is to allow execution of one main lua state per thread.
"state_from" is a pointer to the parent lua state. "state_id"
is the index of the parent state id in the reference lua states
array. "state_id" is better because the lock is a "== 0" test
which is quick than pointer comparison. In other way, the state_id
index could index other things the the Lua state concerned. I
think to the function references.
The goal is to allow execution of one main lua state per thread.
This function will initialize the struct with other things than 0.
With this function helper, the initialization is centralized and
it prevents mistakes. This patch also keeps a reference to each
declared function in a list. It will be useful in next patches to
control consistency of declared references.
The goal is to allow execution of one main lua state per thread.
The array of states is initialized at the max number of thread +1.
We define the index 0 is the common state shared by all threads
and should be locked. Other index index are dedicated to each
one thread. The old gL now becomes hlua_states[0].
The goal is to allow execution of one main lua state per thread.
This patch opens the way to addition of a per-thread dedicated lua state.
By passing the hlua we can figure the original state that's been used
and decide to lock or not.
The goal is to allow execution of one main lua state per thread.
Stop using locks in init part, we will use only in parts where
the parent lua state is known, so we could take decision about lock
according with the lua parent state.
The goal is to allow execution of one main lua state per thread.
This commit introduces this variable in the core. Lua state initialized
by thread will have access to this variable, which reports the executing
thread. 0 indicates the shared thread. Programs which must be executed
only once can check for core.thread <= 1.
The goal is to allow execution of one main lua state per thread.
This function will be called for each initialized lua state, so
one per thread. The split transforms the lua state variable from
global to local.
The goal is to allow execution of one main lua state per thread.
This function will be called once per thread, using different Lua
states. This patch prepares the work.
The goal is to allow execution of one main lua state per thread.
The function hlua_ctx_init() now gets the original lua state from
its caller. This allows the initialisation of lua_thread (coroutines)
from any master lua state.
The parent lua state is stored in the hlua struct.
This patch is a temporary transition, it will be modified later.
The goal is to allow execution of one main lua state per thread.
This is a preparative work in order to init more than one stack
in the lua-thread objective.
The goal is to allow execution of one main lua state per thread.
Because this struct will be filled after the configuration parser, we
cannot copy the content. The actual state of the Haproxy code doesn't
justify this change, it is an update preparing next steps.
The goal is to no longer use "struct hlua" with global main lua_state.
The usage of the "struct hlua" is no longer required. This patch replaces
this struct by another one.
Now, the usage of runtime Lua phase is separated from the start lua phase.
The goal is to no longer use "struct hlua" with global main lua_state.
This patch returns NULL value when some code tries go get the hlua struct
associated with a task through hlua_gethlua(). This functions is useful
only during runtime because the struct hlua contains only runtime states.
Some Lua functions allowed to yield are called from init environment.
I'm not sure this is a good practice. Maybe it will be clever to
disallow calling this kind of functions.
The goal is no longer using "struct hlua" with global main lua_state.
if somewhere in the code, hlua_ctx_renew() is called with a global Lua
context, we have a serious bug. A crash is better than working with
this bug, so this patch remove a useless control.
In other way, this control were used during hlua_post_init() function.
The function hlua_post_init() used a call to the runtime hlua_ctx_resume()
function. This call no longer exists.
The goal is to no longer use "struct hlua" with global main lua_state.
The hlua_post_init() is executed during start phase, it does not require
yielding nor any advanced runtime error processing. Let's simplify this
by re-implementing the code using lower-level functions which directly
take a state and not an hlua anymore.
The goal is to no longer use "struct hlua" with global main lua_state
and directly take the state instead.
This patch removes the implicit dependency to this struct with
the function hlua_prepend_path()
Let's switch memory accounting to atomics so that the allocator function
may safely be used from concurrent Lua states.
Given that this function is extremely hot on the call path, we try to
optimize it for the most common case, which is:
- no limit
- there's enough memory
The accounting is what is particuarly expensive in threads since all
CPUs compete for a cache line, so when the limit is not used, we don't
want to use accounting. However we need to preserve it during the boot
phase until we may parse a "tune.lua.maxmem" value. For this, we turn
the unlimited "0" value to ~0 at the end of the boot phase to mark the
definite end of accounting. The function then detects this value and
directly jumps to realloc() in this case.
When the limit is enforced however, we use a CAS to check and reserve
our share of memory, and we roll back on failure. The CAS is used both
for increments and decrements so that a single operation is enough to
update the counters.
The function really has the semantics of a realloc() except that it
also passes the old size to help with accounting. No need to special
case the free or malloc, realloc does everything we need.
If the session is not established, the applet handler could leave
with the applet detached from the ring. At next call, the attach
counter will be decreased again causing unpredectable behavior.
This patch should be backported on branches >=2.2