Commit graph

4905 commits

Author SHA1 Message Date
Willy Tarreau
958ae26c35 REORG: atomic: reimplement pl_cpu_relax() from atomic-ops.h
There is some confusion here as we need to place some cpu_relax statements
in some loops where it's not easily possible to condition them on the use
of threads. That's what atomic.h already does. So let's take the various
pl_cpu_relax() implementations from there and place them in atomic.h under
the name __ha_cpu_relax() and let them adapt to the presence or absence of
threads and to the architecture (currently only x86 and aarch64 use a barrier
instruction), though it's very likely that arm would work well with a cache
flushing ISB instruction as well).

This time they were implemented as expressions returning 1 rather than
statements, in order to ease their placement as the loop condition or the
continuation expression inside "for" loops. We should probably do the same
with barriers and a few such other ones.
2021-03-05 08:30:08 +01:00
Amaury Denoyelle
8ede3db080 MINOR: backend: handle reuse for conns with no server as target
If dispatch mode or transparent backend is used, the backend connection
target is a proxy instead of a server. In these cases, the reuse of
backend connections is not consistent.

With the default behavior, no reuse is done and every new request uses a
new connection. However, if http-reuse is set to never, the connection
are stored by the mux in the session and can be reused for future
requests in the same session.

As no server is used for these connections, no reuse can be made outside
of the session, similarly to http-reuse never mode. A different
http-reuse config value should not have an impact. To achieve this, mark
these connections as private to have a defined behavior.

For this feature to properly work, the connection hash has been slightly
adjusted. The server pointer as an input as been replaced by a generic
target pointer to refer to the server or proxy instance. The hash is
always calculated on connect_server even if the connection target is not
a server. This also requires to allocate the connection hash node for
every backend connections, not just the one with a server target.
2021-03-03 11:31:19 +01:00
Frédéric Lécaille
b28812af7a BUILD: quic: Implicit conversion between SSL related enums.
Fix such compilation issues:

include/haproxy/quic_tls.h:157:10: error: implicit conversion from
'enum ssl_encryption_level_t' to 'enum quic_tls_enc_level'
[-Werror=enum-conversion]
  157 |   return ssl_encryption_application;
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
src/xprt_quic.c: In function 'quic_conn_enc_level_init':
src/xprt_quic.c:2358:13: error: implicit conversion from
'enum quic_tls_enc_level' to 'enum ssl_encryption_level_t'
[-Werror=enum-conversion]
 2358 |  qel->level = quic_to_ssl_enc_level(level);
      |             ^

Not detected by all the compilators.
2021-03-02 10:34:18 +01:00
Willy Tarreau
61cfdf4fd8 CLEANUP: tree-wide: replace free(x);x=NULL with ha_free(&x)
This makes the code more readable and less prone to copy-paste errors.
In addition, it allows to place some __builtin_constant_p() predicates
to trigger a link-time error in case the compiler knows that the freed
area is constant. It will also produce compile-time error if trying to
free something that is not a regular pointer (e.g. a function).

The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.

178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:

  @ rule @
  expression E;
  @@
  - free(E);
  - E = NULL;
  + ha_free(&E);

It was verified that the resulting code is the same, more or less a
handful of cases where the compiler optimized slightly differently
the temporary variable that holds the copy of the pointer.

A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
2021-02-26 21:21:09 +01:00
Christopher Faulet
29e9326f2f CLEANUP: hlua: Use net_addr structure internally to parse and compare addresses
hlua_addr structure may be replaced by net_addr structure to parse and
compare addresses. Both structures are similar.
2021-02-26 13:53:26 +01:00
Christopher Faulet
5d1def623a MEDIUM: http-ana: Add IPv6 support for forwardfor and orignialto options
A network may be specified to avoid header addition for "forwardfor" and
"orignialto" option via the "except" parameter. However, only IPv4
networks/addresses are supported. This patch adds the support of IPv6.

To do so, the net_addr structure is used to store the parameter value in the
proxy structure. And ipcmp2net() function is used to perform the comparison.

This patch should fix the issue #1145. It depends on the following commit:

  * c6ce0ab MINOR: tools: Add function to compare an address to a network address
  * 5587287 MINOR: tools: Add net_addr structure describing a network addess
2021-02-26 13:52:48 +01:00
Christopher Faulet
9553de7fec MINOR: tools: Add function to compare an address to a network address
ipcmp2net() function may be used to compare an addres (struct
sockaddr_storage) to a network address (struct net_addr). Among other
things, this function will be used to add support of IPv6 for "except"
parameter of "forwardfor" and "originalto" options.
2021-02-26 13:52:06 +01:00
Christopher Faulet
01f02a4d84 MINOR: tools: Add net_addr structure describing a network addess
The net_addr structure describes a IPv4 or IPv6 address. Its ip and mask are
represented. Among other things, this structure will be used to add support
of IPv6 for "except" parameter of "forwardfor" and "originalto" options.
2021-02-26 13:32:17 +01:00
Willy Tarreau
401135cee6 MINOR: task: add one extra tasklet class: TL_HEAVY
This class will be used exclusively for heavy processing tasklets. It
will be cleaner than mixing them with the bulk ones. For now it's
allocated ~1% of the CPU bandwidth.

The largest part of the patch consists in re-arranging the fields in the
task_per_thread structure to preserve a clean alignment with one more
list head. Since we're now forced to increase the struct past a second
cache line, it now uses 4 cache lines (for easy multiplying) with the
first two ones being exclusively used by local operations and the third
one mostly by atomic operations. Interestingly, this better arrangement
causes less stress and reduced the response time by 8 microseconds at
1 million requests per second.
2021-02-26 12:00:53 +01:00
Willy Tarreau
d8aa21a611 CLEANUP: server: rename srv_cleanup_{idle,toremove}_connections()
These function names are unbearably long, they don't even fit into the
screen in "show profiling", let's trim the "_connections" to "_conns",
which happens to match the name of the lists there.
2021-02-26 00:30:22 +01:00
Willy Tarreau
74dea8caea MINOR: task: limit the number of subsequent heavy tasks with flag TASK_HEAVY
While the scheduler is priority-aware and class-aware, and consistently
tries to maintain fairness between all classes, it doesn't make use of a
fine execution budget to compensate for high-latency tasks such as TLS
handshakes. This can result in many subsequent calls adding multiple
milliseconds of latency between the various steps of other tasklets that
don't even depend on this.

An ideal solution would be to add a 4th queue, have all tasks announce
their estimated cost upfront and let the scheduler maintain an auto-
refilling budget to pick from the most suitable queue.

But it turns out that a very simplified version of this already provides
impressive gains with very tiny changes and could easily be backported.
The principle is to reserve a new task flag "TASK_HEAVY" that indicates
that a task is expected to take a lot of time without yielding (e.g. an
SSL handshake typically takes 700 microseconds of crypto computation).
When the scheduler sees this flag when queuing a tasklet, it will place
it into the bulk queue. And during dequeuing, we accept only one of
these in a full round. This means that the first one will be accepted,
will not prevent other lower priority tasks from running, but if a new
one arrives, then the queue stops here and goes back to the polling.
This will allow to collect more important updates for other tasks that
will be batched before the next call of a heavy task.

Preliminary tests consisting in placing this flag on the SSL handshake
tasklet show that response times under SSL stress fell from 14 ms
before the patch to 3.0 ms with the patch, and even 1.8 ms if
tune.sched.low-latency is set to "on".
2021-02-26 00:25:51 +01:00
Christopher Faulet
69beaa91d5 REORG: server: Export and rename some functions updating server info
Some static functions are now exported and renamed to follow the same
pattern of other exported functions. Here is the list :

 * update_server_fqdn: Renamed to srv_update_fqdn and exported
 * update_server_check_addr_port: renamed to srv_update_check_addr_port and exported
 * update_server_agent_addr_port: renamed to srv_update_agent_addr_port and exported
 * update_server_addr: renamed to srv_update_addr
 * update_server_addr_potr: renamed to srv_update_addr_port
 * srv_prepare_for_resolution: exported

This change is mandatory to move all functions dealing with the server-state
files in a separate file.
2021-02-25 10:02:39 +01:00
Christopher Faulet
ecfb9b9109 MEDIUM: server: Store parsed params of a server-state line in the tree
Parsed parameters are now stored in the tree of server-state lines. This
way, a line from the global server-state file is only parsed once. Before,
it was parsed a first time to store it in the tree and one more time to load
the server state. To do so, the server-state line object must be allocated
before parsing a line. This means its size must no longer depend on the
length of first parsed parameters (backend and server names). Thus the node
type was changed to use a hashed key instead of a string.
2021-02-25 10:02:39 +01:00
Christopher Faulet
6d87c58fb4 CLEANUP: server: Rename state_line structure into server_state_line
The structure used to store a server-state line in an eb-tree has a too
generic name. Instead of state_line, the structure is renamed as
server_state_line.
2021-02-25 10:02:39 +01:00
Christopher Faulet
fcb53fbb58 CLEANUP: server: Rename state_line node to node instead of name_name
<state_line.name_name> field is a node in an eb-tree. Thus, instead of
"name_name", we now use "node" to name this field. If is a more explicit
name and not too strange.
2021-02-25 10:02:39 +01:00
Willy Tarreau
b2285de049 MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set
It is extremely useful to be able to observe the wakeup latency of some
important I/O operations, so let's accept to inflate the tasklet struct
by 8 extra bytes when DEBUG_TASK is set. With just this we have enough
to get live reports like this:

  $ socat - /tmp/sock1 <<< "show profiling"
  Per-task CPU profiling              : on      # set profiling tasks {on|auto|off}
  Tasks activity:
    function                      calls   cpu_tot   cpu_avg   lat_tot   lat_avg
    si_cs_io_cb                 8099492   4.833s    596.0ns   8.974m    66.48us
    h1_io_cb                    7460365   11.55s    1.548us   2.477m    19.92us
    process_stream              7383828   22.79s    3.086us   18.39m    149.5us
    h1_timeout_task                4157      -         -      348.4ms   83.81us
    srv_cleanup_toremove_connections751   39.70ms   52.86us   10.54ms   14.04us
    srv_cleanup_idle_connections     21   1.405ms   66.89us   30.82us   1.467us
    task_run_applet                  16   1.058ms   66.13us   446.2us   27.89us
    accept_queue_process              7   34.53us   4.933us   333.1us   47.58us
2021-02-25 09:44:16 +01:00
Willy Tarreau
45499c56d3 MINOR: task: make grq_total atomic to move it outside of the grq_lock
Instead of decrementing grq_total once per task picked from the global
run queue, let's do it at once after the loop like we do for other
counters. This simplifies the code everywhere. It is not expected to
bring noticeable improvements however, since global tasks tend to be
less common nowadays.
2021-02-25 09:44:16 +01:00
Willy Tarreau
c03fbeb358 CLEANUP: task: re-merge __task_unlink_rq() with task_unlink_rq()
There's no point keeping the two separate anymore, some tests are
duplicated for no reason.
2021-02-25 09:44:16 +01:00
Christopher Faulet
e071f0e6a4 MINOR: htx: Add function to reserve the max possible size for an HTX DATA block
The function htx_reserve_max_data() should be used to get an HTX DATA block
with the max possible size. A current block may be extended or a new one
created, depending on the HTX message state. But the idea is to let the
caller to copy a bunch of data without requesting many new blocks. It is its
responsibility to resize the block at the end, to set the final block size.

This function will be used to parse messages with small chunks. Indeed, we
can have more than 2700 1-byte chunks in a 16Kb of input data. So it is easy
to understand how this function may help to improve the parsing of chunk
messages.
2021-02-24 22:10:01 +01:00
Baptiste Assmann
b4badf720c BUG/MINOR: resolvers: new callback to properly handle SRV record errors
When a SRV record was created, it used to register the regular server name
resolution callbacks. That said, SRV records and regular server name
resolution don't work the same way, furthermore on error management.

This patch introduces a new call back to manage DNS errors related to
the SRV queries.

this fixes github issue #50.

Backport status: 2.3, 2.2, 2.1, 2.0
2021-02-24 21:58:45 +01:00
Willy Tarreau
5926e384e6 BUG/MINOR: fd: properly wait for !running_mask in fd_set_running_excl()
In fd_set_running_excl() we don't reset the old mask in the CAS loop,
so if we fail on the first round, we'll forcefully take the FD on the
next one.

In practice it's used bu fd_insert() and fd_delete() only, none of which
is supposed to be passed an FD which is still in use since in practice,
given that for now only listeners may be enabled on multiple threads at
once.

This can be backported to 2.2 but shouldn't result in fixing any user
visible bug for now.
2021-02-24 19:40:49 +01:00
Willy Tarreau
9c6dbf0eea CLEANUP: task: split the large tasklet_wakeup_on() function in two
This function has become large with the multi-queue scheduler. We need
to keep the fast path and the debugging parts inlined, but the rest now
moves to task.c just like was done for task_wakeup(). This has reduced
the code size by 6kB due to less inlining of large parts that are always
context-dependent, and as a side effect, has increased the overall
performance by 1%.
2021-02-24 17:55:58 +01:00
Willy Tarreau
955a11ebfa MINOR: task: move the allocated tasks counter to the per-thread struct
The nb_tasks counter was still global and gets incremented and decremented
for each task_new()/task_free(), and was read in process_runnable_tasks().
But it's only used for stats reporting, so doing this this often is
pointless and expensive. Let's move it to the task_per_thread struct and
have the stats sum it when needed.
2021-02-24 17:42:04 +01:00
Willy Tarreau
018564eaa2 CLEANUP: task: move the tree root detection from __task_wakeup() to task_wakeup()
Historically we used to call __task_wakeup() with a known tree root but
this is not the case and the code has remained needlessly complicated
with the root calculation in task_wakeup() passed in argument to
__task_wakeup() which compares it again.

Let's get rid of this and just move the detection code there. This
eliminates some ifdefs and allows to simplify the test conditions quite
a bit.
2021-02-24 17:42:04 +01:00
Willy Tarreau
1f3b1417b8 CLEANUP: tasks: use a less confusing name for task_list_size
This one is systematically misunderstood due to its unclear name. It
is in fact the number of tasks in the local tasklet list. Let's call
it "tasks_in_list" to remove some of the confusion.
2021-02-24 17:42:04 +01:00
Willy Tarreau
2c41d77ebc MINOR: tasks: do not maintain the rqueue_size counter anymore
This one is exclusively used as a boolean nowadays and is non-zero only
when the thread-local run queue is not empty. Better check the root tree's
pointer and avoid updating this counter all the time.
2021-02-24 17:42:04 +01:00
Willy Tarreau
9c7b8085f4 MEDIUM: task: remove the tasks_run_queue counter and have one per thread
This counter is solely used for reporting in the stats and is the hottest
thread contention point to date. Moving it to the scheduler and having a
separate one for the global run queue dramatically improves the performance,
showing a 12% boost on the request rate on 16 threads!

In addition, the thread debugging output which used to rely on rqueue_size
was not totally accurate as it would only report task counts. Now we can
return the exact thread's run queue length.

It is also interesting to note that there are still a few other task/tasklet
counters in the scheduler that are not efficiently updated because some cover
a single area and others cover multiple areas. It looks like having a distinct
counter for each of the following entries would help and would keep the code
a bit cleaner:
  - global run queue (tree)
  - per-thread run queue (tree)
  - per-thread shared tasklets list
  - per-thread local lists

Maybe even splitting the shared tasklets lists between pure tasklets and
tasks instead of having the whole and tasks would simplify the code because
there remain a number of places where several counters have to be updated.
2021-02-24 17:42:04 +01:00
Willy Tarreau
49de68520e MEDIUM: streams: do not use the streams lock anymore
The lock was still used exclusively to deal with the concurrency between
the "show sess" release handler and a stream_new() or stream_free() on
another thread. All other accesses made by "show sess" are already done
under thread isolation. The release handler only requires to unlink its
node when stopping in the middle of a dump (error, timeout etc). Let's
just isolate the thread to deal with this case so that it's compatible
with the dump conditions, and remove all remaining locking on the streams.

This effectively kills the streams lock. The measured gain here is around
1.6% with 4 threads (374krps -> 380k).
2021-02-24 13:54:50 +01:00
Willy Tarreau
a698eb6739 MINOR: streams: use one list per stream instead of a global one
The global streams list is exclusively used for "show sess", to look up
a stream to shut down, and for the hard-stop. Having all of them in a
single list is extremely expensive in terms of locking when using threads,
with performance losses as high as 7% having been observed just due to
this.

This patch makes the list per-thread, since there's no need to have a
global one in this situation. All call places just iterate over all
threads. The most "invasive" changes was in "show sess" where the end
of list needs to go back to the beginning of next thread's list until
the last thread is seen. For now the lock was maintained to keep the
code auditable but a next commit should get rid of it.

The observed performance gain here with only 4 threads is already 7%
(350krps -> 374krps).
2021-02-24 13:53:20 +01:00
Willy Tarreau
b981318c11 MINOR: stream: add an "epoch" to figure which streams appeared when
The "show sess" CLI command currently lists all streams and needs to
stop at a given position to avoid dumping forever. Since 2.2 with
commit c6e7a1b8e ("MINOR: cli: make "show sess" stop at the last known
session"), a hack consists in unlinking the stream running the applet
and linking it again at the current end of the list, in order to serve
as a delimiter. But this forces the stream list to be global, which
affects scalability.

This patch introduces an epoch, which is a global 32-bit counter that
is incremented by the "show sess" command, and which is copied by newly
created streams. This way any stream can know whether any other one is
newer or older than itself.

For now it's only stored and not exploited.
2021-02-24 12:12:51 +01:00
Ilya Shipitsin
98a9e1b873 BUILD: SSL: introduce fine guard for RAND_keep_random_devices_open
RAND_keep_random_devices_open is OpenSSL specific function, not
implemented in LibreSSL and BoringSSL. Let us define guard
HAVE_SSL_RAND_KEEP_RANDOM_DEVICES_OPEN in include/haproxy/openssl-compat.h
That guard does not depend anymore on HA_OPENSSL_VERSION
2021-02-22 10:35:23 +01:00
Willy Tarreau
c6ba9a0b9b MINOR: sched: have one runqueue ticks counter per thread
The runqueue_ticks counts the number of task wakeups and is used to
position new tasks in the run queue, but since we've had per-thread
run queues, the values there are not very relevant anymore and the
nice value doesn't apply well if some threads are more loaded than
others. In addition, letting all threads compete over a shared counter
is not smart as this may cause some excessive contention.

Let's move this index close to the run queues themselves, i.e. one per
thread and a global one. In addition to improving fairness, this has
increased global performance by 2% on 16 threads thanks to the lower
contention on rqueue_ticks.

Fairness issues were not observed, but if any were to be, this patch
could be backported as far as 2.0 to address them.
2021-02-20 13:03:37 +01:00
Willy Tarreau
4d77bbf856 MINOR: dynbuf: pass offer_buffers() the number of buffers instead of a threshold
Historically this function would try to wake the most accurate number of
process_stream() waiters. But since the introduction of filters which could
also require buffers (e.g. for compression), things started not to be as
accurate anymore. Nowadays muxes and transport layers also use buffers, so
the runqueue size has nothing to do anymore with the number of supposed
users to come.

In addition to this, the threshold was compared to the number of free buffer
calculated as allocated minus used, but this didn't work anymore with local
pools since these counts are not updated upon alloc/free!

Let's clean this up and pass the number of released buffers instead, and
consider that each waiter successfully called counts as one buffer. This
is not rocket science and will not suddenly fix everything, but at least
it cannot be as wrong as it is today.

This could have been marked as a bug given that the current situation is
totally broken regarding this, but this probably doesn't completely fix
it, it only goes in a better direction. It is possible however that it
makes sense in the future to backport this as part of a larger series if
the situation significantly improves.
2021-02-20 12:38:18 +01:00
Willy Tarreau
90f366b595 MINOR: dynbuf: use regular lists instead of mt_lists for buffer_wait
There's no point anymore in keeping mt_lists for the buffer_wait and
buffer_wq since it's thread-local now.
2021-02-20 12:38:18 +01:00
Willy Tarreau
e8e5091510 MINOR: dynbuf: make the buffer wait queue per thread
The buffer wait queue used to be global historically but this doest not
make any sense anymore given that the most common use case is to have
thread-local pools. Thus there's no point waking up waiters of other
threads after releasing an entry, as they won't benefit from it.

Let's move the queue head to the thread_info structure and use
ti->buffer_wq from now on.
2021-02-20 12:38:18 +01:00
Christopher Faulet
ea2cdf55e3 MEDIUM: server: Don't introduce a new server-state file version
This revert the commit 63e6cba12 ("MEDIUM: server: add server-states version
2"), but keeping all recent features added to the server-sate file. Instead
of adding a 2nd version for the server-state file format to handle the 5 new
fields added during the 2.4 development, these fields are considered as
optionnal during the parsing. So it is possible to load a server-state file
from HAProxy 2.3. However, from 2.4, these new fields are always dumped in
the server-state file. But it should not be a problem to load it on the 2.3.

This patch seems a bit huge but the diff ignoring the space is much smaller.

The version 2 of the server-state file format is reserved for a real
refactoring to address all issues of the current format.
2021-02-19 18:03:59 +01:00
Amaury Denoyelle
8990b010a0 MINOR: connection: allocate dynamically hash node for backend conns
Remove ebmb_node entry from struct connection and create a dedicated
struct conn_hash_node. struct connection contains now only a pointer to
a conn_hash_node, allocated only for connections where target is of type
OBJ_TYPE_SERVER. This will reduce memory footprints for every
connections that does not need http-reuse such as frontend connections.
2021-02-19 16:59:18 +01:00
Olivier Houchard
5567f41d0a BUG/MEDIUM: lists: Avoid an infinite loop in MT_LIST_TRY_ADDQ().
In MT_LIST_TRY_ADDQ(), deal with the "prev" field of the element before the
"next". If the element is the first in the list, then its next will
already have been locked when we locked list->prev->next, so locking it
again will fail, and we'll start over and over.

This should be backported to 2.3.
2021-02-19 16:47:20 +01:00
Willy Tarreau
66161326fd MINOR: listener: refine the default MAX_ACCEPT from 64 to 4
The maximum number of connections accepted at once by a thread for a single
listener used to default to 64 divided by the number of processes but the
tasklet-based model is much more scalable and benefits from smaller values.
Experimentation has shown that 4 gives the highest accept rate for all
thread values, and that 3 and 5 come very close, as shown below (HTTP/1
connections forwarded per second at multi-accept 4 and 64):

 ac\thr|    1     2    4     8     16
 ------+------------------------------
      4|   80k  106k  168k  270k  336k
     64|   63k   89k  145k  230k  274k

Some tests were also conducted on SSL and absolutely no change was observed.

The value was placed into a define because it used to be spread all over the
code.

It might be useful at some point to backport this to 2.3 and 2.2 to help
those who observed some performance regressions from 1.6.
2021-02-19 16:02:04 +01:00
Willy Tarreau
4327d0ac00 MINOR: tasks: refine the default run queue depth
Since a lot of internal callbacks were turned to tasklets, the runqueue
depth had not been readjusted from the default 200 which was initially
used to favor batched processing. But nowadays it appears too large
already based on the following tests conducted on a 8c16t machine with
a simple config involving "balance leastconn" and one server. The setup
always involved the two threads of a same CPU core except for 1 thread,
and the client was running over 1000 concurrent H1 connections. The
number of requests per second is reported for each (runqueue-depth,
nbthread) couple:

 rq\thr|    1     2     4     8    16
 ------+------------------------------
     32|  120k  159k  276k  477k  698k
     40|  122k  160k  276k  478k  722k
     48|  121k  159k  274k  482k  720k
     64|  121k  160k  274k  469k  710k
    200|  114k  150k  247k  415k  613k  <-- default

It's possible to save up to about 18% performance by lowering the
default value to 40. One possible explanation to this is that checking
I/Os more frequently allows to flush buffers faster and to smooth the
I/O wait time over multiple operations instead of alternating phases
of processing, waiting for locks and waiting for new I/Os.

The total round trip time also fell from 1.62ms to 1.40ms on average,
among which at least 0.5ms is attributed to the testing tools since
this is the minimum attainable on the loopback.

After some observation it would be nice to backport this to 2.3 and
2.2 which observe similar improvements, since some users have already
observed some perf regressions between 1.6 and 2.2.
2021-02-19 16:01:55 +01:00
Ilya Shipitsin
c47d676bd7 BUILD: ssl: introduce fine guard for OpenSSL specific SCTL functions
SCTL (signed certificate timestamp list) specified in RFC6962
was implemented in c74ce24cd22e8c683ba0e5353c0762f8616e597d, let
us introduce macro HAVE_SSL_SCTL for the HAVE_SSL_SCTL sake,
which in turn is based on SN_ct_cert_scts, which comes in the same commit
2021-02-18 15:55:50 +01:00
Christopher Faulet
8dd40fbde9 BUG/MINOR: sample: Always consider zero size string samples as unsafe
smp_is_safe() function is used to be sure a sample may be safely
modified. For string samples, a test is performed to verify if there is a
null-terminated byte. If not, one is added, if possible. It means if the
sample is not const and if there is some free space in the buffer, after
data. However, we must not try to read the null-terminated byte if the
string sample is too long (data >= size) or if the size is equal to
zero. This last test was not performed. Thus it was possible to consider a
string sample as safe by testing a byte outside the buffer.

Now, a zero size string sample is always considered as unsafe and is
duplicated when smp_make_safe() is called.

This patch must be backported in all stable versions.
2021-02-18 14:58:43 +01:00
Willy Tarreau
ca9f60c1ac MINOR: tasks/debug: add some extra controls of use-after-free in DEBUG_TASK
It's pretty easy to pre-initialize the index, change it on free() and check
it during the wakeup, so let's do this to ease detection of any accidental
task_wakeup() after a task_free() or tasklet_wakeup() after a tasklet_free().
If this would ever happen we'd then get a backtrace and a core now. The
index's parity is respected so that the call history remains exploitable.
2021-02-18 14:38:49 +01:00
Willy Tarreau
b23f04260b MINOR: tasks: add DEBUG_TASK to report caller info in a task
The idea is to know who woke a task up, by recording the last two
callers in a rotating mode. For now it's trivial with task_wakeup()
but tasklet_wakeup_on() will require quite some more changes.

This typically gives this from the debugger:

  (gdb) p t->debug
  $2 = {
    caller_file = {0x0, 0x8c0d80 "src/task.c"},
    caller_line = {0, 260},
    caller_idx = 1
  }

or this:

  (gdb) p t->debug
  $6 = {
    caller_file = {0x7fffe40329e0 "", 0x885feb "src/stream.c"},
    caller_line = {284, 284},
    caller_idx = 1
  }

But it also provides a trivial macro allowing to simply place a call in
a task/tasklet handler that needs to be observed:

   DEBUG_TASK_PRINT_CALLER(t);

Then starting haproxy this way would trivially yield such info:

  $ ./haproxy -db -f test.cfg | sort | uniq -c | sort -nr
   199992 h1_io_cb woken up from src/sock.c:797
    51764 h1_io_cb woken up from src/mux_h1.c:3634
       65 h1_io_cb woken up from src/connection.c:169
       45 h1_io_cb woken up from src/sock.c:777
2021-02-18 10:42:07 +01:00
Willy Tarreau
59b0fecfd9 MINOR: lb/api: let callers of take_conn/drop_conn tell if they have the lock
The two algos defining these functions (first and leastconn) do not need the
server's lock. However it's already present in pendconn_process_next_strm()
so the API must be updated so that the functions may take it if needed and
that the callers indicate whether they already own it.

As such, the call places (backend.c and stream.c) now do not take it
anymore, queue.c was unchanged since it's already held, and both "first"
and "leastconn" were updated to take it if not already held.

A quick test on the "first" algo showed a jump from 432 to 565k rps by
just dropping the lock in stream.c!
2021-02-18 10:06:45 +01:00
Willy Tarreau
b9ad30a8ad Revert "MINOR: threads: change lock_t to an unsigned int"
This reverts commit 8f1f177ed0.

Repeated tests have shown a small perforamnce degradation of ~1.8%
caused by this patch at high request rates on 16 threads. The exact
cause is not yet perfectly known but it probably stems in slower
accesses for non-64-bit aligned atomic accesses.
2021-02-18 10:06:45 +01:00
Willy Tarreau
751153e0f1 OPTIM: server: switch the actconn list to an mt-list
The remaining contention on the server lock solely comes from
sess_change_server() which takes the lock to add and remove a
stream from the server's actconn list. This is both expensive
and pointless since we have mt-lists, and this list is only
used by the CLI's "shutdown server sessions" command!

Let's migrate to an mt-list and remove the need for this costly
lock. By doing so, the request rate increased by ~1.8%.
2021-02-18 10:06:45 +01:00
Willy Tarreau
ccea3c54f4 DEBUG: thread: add 5 extra lock labels for statistics and debugging
Since OTHER_LOCK is commonly used it's become much more difficult to
profile lock contention by temporarily changing a lock label. Let's
add DEBUG1..5 to serve only for debugging. These ones must not be
used in committed code. We could decide to only define them when
DEBUG_THREAD is set but that would complicate attempts at measuring
performance with debugging turned off.
2021-02-18 10:06:45 +01:00
Willy Tarreau
4e9df2737d BUG/MEDIUM: checks: don't needlessly take the server lock in health_adjust()
The server lock was taken preventively for anything in health_adjust(),
including the static config checks needed to detect that the lock was not
needed, while the function is always called on the response path to update
a server's status. This was responsible for huge contention causing a
performance drop of about 17% on 16 threads. Let's move the lock only
where it should be, i.e. inside the function around the critical sections
only. By doing this, a 16-thread process jumped back from 575 to 675 krps.

This should be backported to 2.3 as the situation degraded there, and
maybe later to 2.2.
2021-02-18 10:06:45 +01:00
Amaury Denoyelle
36441f46c4 MINOR: connection: remove pointers for prehash in conn_hash_params
Replace unneeded pointers for sni/proxy prehash by plain data type.
The code is slightly cleaner.
2021-02-17 16:43:07 +01:00
Amaury Denoyelle
aba507334b BUG/MAJOR: connection: prevent double free if conn selected for removal
Always try to remove a connexion from its toremove_list in conn_free.
This prevents a double-free in case the connection is freed but was
already added in toremove_list.

This bug was easily reproduced by running 4-5 runs of inject on a
single-thread instance of haproxy :

$ inject -u 10000 -d 10 -G 127.0.0.1:20080

A crash would soon be triggered in srv_cleanup_toremove_connections.

This does not need to be backported.
2021-02-16 16:17:29 +01:00
William Dauchy
3679d0c794 MINOR: stats: add helper to get status string
move listen status to a helper, defining both status enum and string
definition.
this will be helpful to be reused in prometheus code. It also removes
this hard-to-read nested ternary.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-15 14:13:32 +01:00
William Dauchy
655e14ef17 MEDIUM: stats: allow to select one field in stats_fill_li_stats
prometheus approach requires to output all values for a given metric
name; meaning we iterate through all metrics, and then iterate in the
inner loop on all objects for this metric.
In order to allow more code reuse, adapt the stats API to be able to
select one field or fill them all otherwise.
From this patch it should be possible to add support for listen stats in
prometheus.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-15 14:13:32 +01:00
Emeric Brun
fd647d5f5f MEDIUM: dns: adds code to support pipelined DNS requests over TCP.
This patch introduce the "dns_stream_nameserver" to use DNS over
TCP on strict nameservers.  For the upper layer it is analog to
the api used with udp nameservers except that the user que switch
the name server in "stream" mode at the init using "dns_stream_init".

The fallback from UDP to TCP is not handled and this is not the
purpose of this feature. This is done to choose the transport layer
during the initialization.

Currently there is a hardcoded limit of 4 pipelined transactions
per TCP connections. A batch of idle connections is expired every 5s.
This code is designed to support a maximum DNS message size on TCP: 64k.

Note: this code won't perform retry on unanswered queries this
should be handled by the upper layer
2021-02-13 10:03:46 +01:00
Emeric Brun
c943799c86 MEDIUM: resolvers/dns: split dns.c into dns.c and resolvers.c
This patch splits current dns.c into two files:

The first dns.c contains code related to DNS message exchange over UDP
and in future other TCP. We try to remove depencies to resolving
to make it usable by other stuff as DNS load balancing.

The new resolvers.c inherit of the code specific to the actual
resolvers.

Note:
It was really difficult to obtain a clean diff dur to the amount
of moved code.

Note2:
Counters and stuff related to stats is not cleany separated because
currently counters for both layers are merged and hard to separate
for now.
2021-02-13 10:03:46 +01:00
Emeric Brun
d26a6237ad MEDIUM: resolvers: split resolving and dns message exchange layers.
This patch splits recv and send functions in two layers. the
lowest is responsible of DNS message transactions over
the network. Doing this we could use DNS message layer
for something else than resolving. Load balancing for instance.

This patch also re-works the way to init a nameserver and
introduce the new struct dns_dgram_server to prepare the arrival
of dns_stream_server and the support of DNS over TCP.

The way to retry a send failure of a request because of EAGAIN
was re-worked. Previously there was no control and all "pending"
queries were re-played each time it reaches a EAGAIN. This
patch introduce a ring to stack messages in case of sent
failure. This patch is emptied if poller shows that the
socket is ready again to push messages.
2021-02-13 09:51:10 +01:00
Emeric Brun
d3b4495f0d MINOR: resolvers: rework dns stats prototype because specific to resolvers
Counters are currently stored into lowlevel nameservers struct but
most of them are resolving layer data and increased in the upper layer
So this patch renames the prototype used to allocate/dump them with prefix
'resolv' waiting for a clean split.
2021-02-13 09:43:18 +01:00
Emeric Brun
6a2006ae37 MINOR: resolvers: replace nameserver's resolver ref by generic parent pointer
This will allow to use nameservers in something else than a resolver
section (load balancing for instance).
2021-02-13 09:43:18 +01:00
Emeric Brun
d30e9a1709 MINOR: resolvers: rework prototype suffixes to split resolving and dns.
A lot of prototypes in dns.h are specific to resolvers and must
be renamed to split resolving and DNS layers.
2021-02-13 09:43:18 +01:00
Emeric Brun
456de77bdb MINOR: resolvers: renames resolvers DNS_UPD_* returncodes to RSLV_UPD_*
This patch renames some #defines prefixes from DNS to RSLV.
2021-02-13 09:43:18 +01:00
Emeric Brun
30c766ebbc MINOR: resolvers: renames resolvers DNS_RESP_* errcodes RSLV_RESP_*
This patch renames some #defines prefixes from DNS to RSLV.
2021-02-13 09:43:18 +01:00
Emeric Brun
21fbeedf97 MINOR: resolvers: renames some dns prefixed types using resolv prefix.
@@ -119,8 +119,8 @@ struct act_rule {
-               } dns;                         /* dns resolution */
+               } resolv;                      /* resolving */

-struct dns_options {
+struct resolv_options {
2021-02-13 09:43:18 +01:00
Emeric Brun
08622d3c0a MINOR: resolvers: renames some resolvers specific types to not use dns prefix
This patch applies those changes on names:

-struct dns_resolution {
+struct resolv_resolution {

-struct dns_requester {
+struct resolv_requester {

-struct dns_srvrq {
+struct resolv_srvrq {

@@ -185,12 +185,12 @@ struct stream {

        struct {
-               struct dns_requester *dns_requester;
+               struct resolv_requester *requester;
...
-       } dns_ctx;
+       } resolv_ctx;
2021-02-13 09:43:18 +01:00
Emeric Brun
750fe79cd0 MINOR: resolvers: renames type dns_resolvers to resolvers.
It also renames 'dns_resolvers' head list to sec_resolvers
to avoid conflicts with local variables 'resolvers'.
2021-02-13 09:43:17 +01:00
Emeric Brun
85914e9d9b MINOR: resolvers: renames some resolvers internal types and removes dns prefix
Some types are specific to resolver code and a renamed using
the 'resolv' prefix instead 'dns'.

-struct dns_query_item {
+struct resolv_query_item {

-struct dns_answer_item {
+struct resolv_answer_item {

-struct dns_response_packet {
+struct resolv_response {
2021-02-13 09:43:17 +01:00
Emeric Brun
67f830d29d BUG/MINOR: resolvers: fix attribute packed struct for dns
This patch adds the attribute packed on struct dns_question
because it is directly memcpy to network building a response.

This patch also removes the commented line:
//     struct list options;       /* list of option records */

because it is also used directly using memcpy to build a request
and must not contain host data.
2021-02-13 09:43:17 +01:00
Emeric Brun
50c870e4de BUG/MINOR: dns: add missing sent counter and parent id to dns counters.
Resolv callbacks are also updated to rely on counters and not on
nameservers.
"show stat domain dns" will now show the parent id (i.e. resolvers
section name).
2021-02-13 09:43:17 +01:00
Emeric Brun
e14b98c08e MINOR: ring: adds new ring_init function.
Adds the new ring_init function to initialize
a pre-allocated ring struct using the given
memory area.
2021-02-13 09:43:17 +01:00
Willy Tarreau
49962b58d0 MINOR: peers/cli: do not dump the peers dictionaries by default on "show peers"
The "show peers" output has become huge due to the dictionaries making it
less readable. Now this feature has reached a certain level of maturity
which doesn't warrant to dump it all the time, given that it was essentially
needed by developers. Let's make it optional, and disabled by default, only
when "show peers dict" is requested. The default output reminds about the
command. The output has been divided by 5 :

  $ socat - /tmp/sock1  <<< "show peers dict" | wc -l
  125
  $ socat - /tmp/sock1  <<< "show peers" | wc -l
  26

It could be useful to backport this to recent stable versions.
2021-02-12 17:00:52 +01:00
Willy Tarreau
e90904d5a9 MEDIUM: proxy: store the default proxies in a tree by name
Now default proxies are stored into a dedicated tree, sorted by name.
Only unnamed entries are not kept upon new section creation. The very
first call to cfg_parse_listen() will automatically allocate a dummy
defaults section which corresponds to the previous static one, since
the code requires to have one at a few places.

The first immediately visible benefit is that it allows to reuse
alloc_new_proxy() to allocate a defaults section instead of doing it by
hand. And the secret goal is to allow to keep multiple named defaults
section in memory to reuse them from various proxies.
2021-02-12 16:23:46 +01:00
Willy Tarreau
0a0f6a7e4f MINOR: proxy: support storing defaults sections into their own tree
Now we'll have a tree of named defaults sections. The regular insertion
and lookup functions take care of the capability in order to select the
appropriate tree. A new function proxy_destroy_defaults() removes a
proxy from this tree and frees it entirely.
2021-02-12 16:23:46 +01:00
Willy Tarreau
80dc6fea59 MINOR: proxy: add a new capability PR_CAP_DEF
In order to more easily distinguish a default proxy from a standard one,
let's introduce a new capability PR_CAP_DEF.
2021-02-12 16:23:46 +01:00
Willy Tarreau
7d0c143185 MINOR: cfgparse: move defproxy to cfgparse-listen as a static
We don't want to expose this one anymore as we'll soon keep multiple
default proxies. Let's move it inside the parser which is the only
place which still uses it, and initialize it on the fly once needed
instead of doing it at boot time.
2021-02-12 16:23:46 +01:00
Willy Tarreau
bb8669ae28 BUG/MINOR: server: parse_server() must take a const for the defproxy
The default proxy was passed as a variable, which in addition to being
a PITA to deal with in the config parser, doesn't feel safe to use when
it ought to be const.

This will only affect new code so no backport is needed.
2021-02-12 16:23:46 +01:00
Willy Tarreau
54fa7e332a BUG/MINOR: tcpcheck: proxy_parse_*check*() must take a const for the defproxy
The default proxy was passed as a variable, which in addition to being
a PITA to deal with in the config parser, doesn't feel safe to use when
it ought to be const.

This will only affect new code so no backport is needed.
2021-02-12 16:23:46 +01:00
Willy Tarreau
220fd70694 BUG/MINOR: extcheck: proxy_parse_extcheck() must take a const for the defproxy
The default proxy was passed as a variable, which in addition to being
a PITA to deal with in the config parser, doesn't feel safe to use when
it ought to be const.

This will only affect new code so no backport is needed.
2021-02-12 16:23:46 +01:00
Willy Tarreau
a3320a0509 MINOR: proxy: move the defproxy freeing code to proxy.c
This used to be open-coded in cfgparse-listen.c when facing a "defaults"
keyword. Let's move this into proxy_free_defaults(). This code is ugly and
doesn't even reset the just freed pointers. Let's not change this yet.

This code should probably be merged with a generic proxy deinit function
called from deinit(). However there's a catch on uri_auth which cannot be
freed because it might be used by one or several proxies. We definitely
need refcounts there!
2021-02-12 16:23:46 +01:00
Willy Tarreau
7683893c70 REORG: proxy: centralize the proxy allocation code into alloc_new_proxy()
This new function takes over the old open-coding that used to be done
for too long in cfg_parse_listen() and it now does everything at once
in a proxy-centric function. The function does all the job of allocating
the structure, initializing it, presetting its defaults from the default
proxy and checking for errors. The code was almost unchanged except for
defproxy being passed as a pointer, and the error message being passed
using memprintf().

This change will be needed to ease reuse of multiple default proxies,
or to create dynamic backends in a distant future.
2021-02-12 16:23:46 +01:00
Willy Tarreau
144289b459 REORG: move init_default_instance() to proxy.c and pass it the defproxy pointer
init_default_instance() was still left in cfgparse.c which is not the
best place to pre-initialize a proxy. Let's place it in proxy.c just
after init_new_proxy(), take this opportunity for renaming it to
proxy_preset_defaults() and taking out init_new_proxy() from it, and
let's pass it the pointer to the default proxy to be initialized instead
of implicitly assuming defproxy. We'll soon be able to exploit this.
Only two call places had to be updated.
2021-02-12 16:23:46 +01:00
Willy Tarreau
168a414037 BUILD: proxy: add missing compression-t.h to proxy-t.h
struct comp is used in struct proxy but never declared prior to this
so depending on where proxy.h is included, touching the <comp> field
can break the build.
2021-02-12 16:23:46 +01:00
Willy Tarreau
09f2e77eb1 BUG/MINOR: tcpheck: the source list must be a const in dup_tcpcheck_var()
This is just an API bug but it's annoying when trying to tidy the code.
The source list passed in argument must be a const and not a variable,
as it's typically the list head from a default proxy and must obviously
not be modified by the function. No backport is needed as it only impacts
new code.
2021-02-12 16:23:46 +01:00
Willy Tarreau
016255a483 BUG/MINOR: http-htx: defpx must be a const in proxy_dup_default_conf_errors()
This is just an API bug but it's annoying when trying to tidy the code.
The default proxy passed in argument must be a const and not a variable.
No backport is needed as it only impacts new code.
2021-02-12 16:23:46 +01:00
Willy Tarreau
5bbc676608 BUG/MINOR: stats: revert the change on ST_CONVDONE
In 2.1, commit ee4f5f83d ("MINOR: stats: get rid of the ST_CONVDONE flag")
introduced a subtle bug. By testing curproxy against defproxy in
check_config_validity(), it tried to eliminate the need for a flag
to indicate that stats authentication rules were already compiled,
but by doing so it left the issue opened for the case where a new
defaults section appears after the two proxies sharing the first
one:

      defaults
          mode http
          stats auth foo:bar

      listen l1
          bind :8080

      listen l2
          bind :8181

      defaults
          # just to break above

This config results in:
  [ALERT] 042/113725 (3121) : proxy 'f2': stats 'auth'/'realm' and 'http-request' can't be used at the same time.
  [ALERT] 042/113725 (3121) : Fatal errors found in configuration.

Removing the last defaults remains OK. It turns out that the cleanups
that followed that patch render it useless, so the best fix is to revert
the change (with the up-to-date flags instead). The flag was marked as
belonging to the config. It's not exact but it's the closest to the
reality, as it's not there to configure the behavior but ti mention
that the config parser did its job.

This could be backported as far as 2.1, but in practice it looks like
nobody ever hit it.
2021-02-12 16:23:45 +01:00
William Dauchy
d1a7b85a40 MEDIUM: server: support {check,agent}_addr, agent_port in server state
logical followup from cli commands addition, so that the state server
file stays compatible with the changes made at runtime; use previously
added helper to load server attributes.

also alloc a specific chunk to avoid mixing with other called functions
using it

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-12 16:04:52 +01:00
William Dauchy
63e6cba12a MEDIUM: server: add server-states version 2
Even if it is possibly too much work for the current usage, it makes
sure we don't break states file from v2.3 to v2.4; indeed, since v2.3,
we introduced two new fields, so we put them aside to guarantee we can
easily reload from a version 1.
The diff seems huge but there is no specific change apart from:
- introduce v2 where it is needed (parsing, update)
- move away from switch/case in update to be able to reuse code
- move srv lock to the whole function to make it easier

this patch confirm how painful it is to maintain this functionality.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-12 16:04:52 +01:00
Amaury Denoyelle
1921d20fff MINOR: connection: use proxy protocol as parameter for srv conn hash
Use the proxy protocol frame if proxy protocol is activated on the
server line. Do not add anymore these connections in the private list.
If some requests are made with the same proxy fields, they can reuse
the idle connection.

The reg-tests proxy_protocol_send_unique_id must be adapted has it
relied on the side effect behavior that every requests from a same
connection reused a private server connection. Now, a new connection is
created as expected if the proxy protocol fields differ.
2021-02-12 12:54:04 +01:00
Amaury Denoyelle
d10a200f62 MINOR: connection: use src addr as parameter for srv conn hash
The source address is used as an input to the the server connection hash. The
address and port are used as separate hash inputs. Do not add anymore these
connections in the private list.

This parameter is set only if used in the transparent-proxy mode.
2021-02-12 12:54:04 +01:00
Amaury Denoyelle
01a287f1e5 MINOR: connection: use dst addr as parameter for srv conn hash
The destination address is used as an input to the server connection hash. The
address and port are used as separated hash inputs. Note that they are not used
when statically specified on the server line. This is only useful for dynamic
destination address.

This is typically used when the server address is dynamically set via the
set-dst action. The address and port are separated hash parameters.

Most notably, it should fixed set-dst use case (cf github issue #947).
2021-02-12 12:53:56 +01:00
Amaury Denoyelle
9b626e3c19 MINOR: connection: use sni as parameter for srv conn hash
The sni parameter is an input to the server connection hash. Do not add
anymore connections with dynamic sni in the private list. Thus, it is
now possible to reuse a server connection if they use the same sni.
2021-02-12 12:48:11 +01:00
Amaury Denoyelle
293dcc400e MINOR: backend: compare conn hash for session conn reuse
Compare the connection hash when reusing a connection from the session.
This ensures that a private connection is reused only if it shares the
same set of parameters.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
1a58aca84e MINOR: connection: use the srv pointer for the srv conn hash
The pointer of the target server is used as a first parameter for the
server connection hash calcul. This prevents the hash to be null when no
specific parameters are present, and can serve as a simple defense
against an attacker trying to reuse a non-conform connection.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
81c6f76d3e MINOR: connection: prepare hash calcul for server conns
This is a preliminary work for the calcul of the backend connection
hash. A structure conn_hash_params is the input for the operation,
containing the various specific parameters of a connection.

The high bits of the hash will reflect the parameters present as input.
A set of macros is written to manipulate the connection hash and extract
the parameters/payload.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
f232cb3e9b MEDIUM: connection: replace idle conn lists by eb trees
The server idle/safe/available connection lists are replaced with ebmb-
trees. This is used to store backend connections, with the new field
connection hash as the key. The hash is a 8-bytes size field, used to
reflect specific connection parameters.

This is a preliminary work to be able to reuse connection with SNI,
explicit src/dst address or PROXY protocol.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
5c7086f6b0 MEDIUM: connection: protect idle conn lists with locks
This is a preparation work for connection reuse with sni/proxy
protocol/specific src-dst addresses.

Protect every access to idle conn lists with a lock. This is currently
strictly not needed because the access to the list are made with atomic
operations. However, to be able to reuse connection with specific
parameters, the list storage will be converted to eb-trees. As this
structure does not have atomic operation, it is mandatory to protect it
with a lock.

For this, the takeover lock is reused. Its role was to protect during
connection takeover. As it is now extended to general idle conns usage,
it is renamed to idle_conns_lock. A new lock section is also
instantiated named IDLE_CONNS_LOCK to isolate its impact on performance.
2021-02-12 12:33:04 +01:00
William Dauchy
38cd986c54 BUG/MINOR: server: re-align state file fields number
Since commit 3169471964 ("MINOR: Add
server port field to server state file.") max_fields was not increased
on version number 1. So this patch aims to fix it. This should be
backported as far as v1.8, but the numbering should be adpated depending
on the version: simply increase the field by 1.

Signed-off-by: William Dauchy <wdauchy@gmail.com>
2021-02-10 16:25:42 +01:00
William Lallemand
7b41654495 MINOR: ssl: add SSL_SERVER_LOCK label in threads.h
Amaury reported that the commit 3ce6eed ("MEDIUM: ssl: add a rwlock for
SSL server session cache") introduced some warning during compilation:

    include/haproxy/thread.h|411 col 2| warning: enumeration value 'SSL_SERVER_LOCK' not handled in switch [-Wswitch]

This patch fix the issue by adding the right entry in the switch block.

Must be backported where 3ce6eed is backported. (2.4 only for now)
2021-02-10 16:17:19 +01:00
Willy Tarreau
826f3ab5e6 MINOR: stick-tables/counters: add http_fail_cnt and http_fail_rate data types
Historically we've been counting lots of client-triggered events in stick
tables to help detect misbehaving ones, but we've been missing the same on
the server side, and there's been repeated requests for being able to count
the server errors per URL in order to precisely monitor the quality of
service or even to avoid routing requests to certain dead services, which
is also called "circuit breaking" nowadays.

This commit introduces http_fail_cnt and http_fail_rate, which work like
http_err_cnt and http_err_rate in that they respectively count events and
their frequency, but they only consider server-side issues such as network
errors, unparsable and truncated responses, and 5xx status codes other
than 501 and 505 (since these ones are usually triggered by the client).
Note that retryable errors are purposely not accounted for, so that only
what the client really sees is considered.

With this it becomes very simple to put some protective measures in place
to perform a redirect or return an excuse page when the error rate goes
beyond a certain threshold for a given URL, and give more chances to the
server to recover from this condition. Typically it could look like this
to bypass a URL causing more than 10 requests per second:

  stick-table type string len 80 size 4k expire 1m store http_fail_rate(1m)
  http-request track-sc0 base       # track host+path, ignore query string
  http-request return status 503 content-type text/html \
      lf-file excuse.html if { sc0_http_fail_rate gt 10 }

A more advanced mechanism using gpt0 could even implement high/low rates
to disable/enable the service.

Reg-test converteers_ref_cnt_never_dec.vtc was updated to test it.
2021-02-10 12:27:01 +01:00
Willy Tarreau
e66ee1a651 BUG/MINOR: intops: fix mul32hi()'s off-by-one
mul32hi() multiples a constant a with a variable b from 0 to 0xffffffff
and shifts the result by 32 bits. It's visible that it's always impossible
to reach the constant a this way because the product always misses exactly
one unit of a to be preserved. And this cannot be corrected by the caller
either as adding one to the output will only shift the output range, and
it's not possible to pass 2^32 on the ratio <b>. The right approach is to
add "a" after the multiplication so that the input range is always
preserved for all ratio values from 0 to 0xffffffff:

     (a=0x00000000 * b=0x00000000 + a=0x00000000) >> 32 = 0x00000000
     (a=0x00000000 * b=0x00000001 + a=0x00000000) >> 32 = 0x00000000
     (a=0x00000000 * b=0xffffffff + a=0x00000000) >> 32 = 0x00000000
     (a=0x00000001 * b=0x00000000 + a=0x00000001) >> 32 = 0x00000000
     (a=0x00000001 * b=0x00000001 + a=0x00000001) >> 32 = 0x00000000
     (a=0x00000001 * b=0xffffffff + a=0x00000001) >> 32 = 0x00000001
     (a=0xffffffff * b=0x00000000 + a=0xffffffff) >> 32 = 0x00000000
     (a=0xffffffff * b=0x00000001 + a=0xffffffff) >> 32 = 0x00000001
     (a=0xffffffff * b=0xffffffff + a=0xffffffff) >> 32 = 0xffffffff

This is only used in freq_ctr calculations and the slightly lower value
is unlikely to have ever been noticed by anyone. This may be backported
though it is not important.
2021-02-09 17:52:50 +01:00
William Lallemand
3ce6eedb37 MEDIUM: ssl: add a rwlock for SSL server session cache
When adding the server side support for certificate update over the CLI
we encountered a design problem with the SSL session cache which was not
locked.

Indeed, once a certificate is updated we need to flush the cache, but we
also need to ensure that the cache is not used during the update.
To prevent the use of the cache during an update, this patch introduce a
rwlock for the SSL server session cache.

In the SSL session part this patch only lock in read, even if it writes.
The reason behind this, is that in the session part, there is one cache
storage per thread so it is not a problem to write in the cache from
several threads. The problem is only when trying to write in the cache
from the CLI (which could be on any thread) when a session is trying to
access the cache. So there is a write lock in the CLI part to prevent
simultaneous access by a session and the CLI.

This patch also remove the thread_isolate attempt which is eating too
much CPU time and was not protecting from the use of a free ptr in the
session.
2021-02-09 09:43:44 +01:00
Ilya Shipitsin
acf84595a7 CLEANUP: assorted typo fixes in the code and comments
This is 17th iteration of typo fixes
2021-02-08 10:49:08 +01:00