HAProxy - Load balancer
Find a file
Willy Tarreau 787dc20952 BUG/MEDIUM: lists: add missing store barrier on MT_LIST_BEHEAD()
When running multi-threaded tests on my NanoPI-Fire3 (8 A53 cores), I
managed to occasionally get either a bus error or a segfault in the
scheduler, but only when running at a high connection rate (injecting
on a tcp-request connection reject rule). The bug is rare and happens
around once per million connections. I could never reproduce it with
less than 4 threads nor on A72 cores.

Haproxy 2.1.0 would also fail there but not 2.1.7.

Every time the crash happened with the TL_URGENT task list corrupted,
though it was not immediately after the LIST_SPLICE() call, indicating
background activity survived the MT_LIST_BEHEAD() operation. This queue
is where the shared runqueue is transferred, and the shared runqueue
gets fast inter-thread tasklet wakeups from idle conn takeover and new
connections.

Comparing the MT_LIST_BEHEAD() and MT_LIST_DEL() implementations, it's
quite obvious that a few barriers are missing from the former, and these
will simply fail on weakly ordered caches.

Two store barriers were added before the break() on failure, to match
what is done on the normal path. Missing them almost always results in
a segfault which is quite rare but consistent (after ~3M connections).

The 3rd one before updating n->prev seems intuitively needed though I
coudln't make the code fail without it. It's present in MT_LIST_DEL so
better not be needlessly creative. The last one is the most important
one, and seems to be the one that helps a concurrent MT_LIST_ADDQ()
detect a late failure and try again. With this, the code survives at
least 30M connections.

Interestingly the exact same issue was addressed in 2.0-dev2 for MT_LIST_DEL
with commit 690d2ad4d ("BUG/MEDIUM: list: add missing store barriers when
updating elements and head").

This fix must be backported to 2.1 as MT_LIST_BEHEAD() is also used there.

It's only tagged as medium because it will only affect entry-level CPUs
like Cortex A53 (x86 are not affected), and requires load levels that are
very hard to achieve on such machines to trigger it. In practice it's
unlikely anyone will ever hit it.
2020-07-08 19:45:50 +02:00
.github CI: extend spellchecker whitelist 2020-06-26 11:26:52 +02:00
contrib CLEANUP: contrib/prometheus-exporter: typo fixes for ssl reuse metric 2020-07-07 23:15:03 +02:00
doc MINOR: config: make strict limits enabled by default 2020-07-07 16:52:35 +02:00
examples CLEANUP: assorted typo fixes in the code and comments 2020-06-26 11:27:28 +02:00
include BUG/MEDIUM: lists: add missing store barrier on MT_LIST_BEHEAD() 2020-07-08 19:45:50 +02:00
reg-tests BUG/MINOR: http-rules: Fix ACLs parsing for http deny rules 2020-06-30 09:32:03 +02:00
scripts CI: travis-ci: switch BoringSSL builds to ninja 2020-06-26 11:26:26 +02:00
src CLEANUP: Add static void hlua_deinit() 2020-07-07 16:52:35 +02:00
tests REORG: include: split mini-clist into haproxy/list and list-t.h 2020-06-11 10:18:56 +02:00
.cirrus.yml CI: cirrus-ci: exclude slow reg-tests 2020-07-04 06:58:14 +02:00
.gitignore DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
.travis.yml CI: travis-ci: switch BoringSSL builds to ninja 2020-06-26 11:26:26 +02:00
BRANCHES DOC: assorted typo fixes in the documentation 2020-03-09 14:45:58 +01:00
CHANGELOG [RELEASE] Released version 2.3-dev0 2020-07-07 16:39:18 +02:00
CONTRIBUTING DOC: assorted typo fixes in the documentation and Makefile 2020-03-06 10:49:55 +01:00
INSTALL MINOR: version: back to development, update status message 2020-07-07 16:38:51 +02:00
LICENSE LICENSE: add licence exception for OpenSSL 2012-09-07 13:52:26 +02:00
MAINTAINERS REORG: include: split hathreads into haproxy/thread.h and haproxy/thread-t.h 2020-06-11 10:18:56 +02:00
Makefile CLEANUP: makefile: update the outdated list of DEBUG_xxx options 2020-07-04 12:43:46 +02:00
README DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
ROADMAP DOC: update the outdated ROADMAP file 2019-06-15 21:59:54 +02:00
SUBVERS BUILD: use format tags in VERDATE and SUBVERS files 2013-12-10 11:22:49 +01:00
VERDATE [RELEASE] Released version 2.2.0 2020-07-07 16:33:14 +02:00
VERSION [RELEASE] Released version 2.3-dev0 2020-07-07 16:35:28 +02:00

The HAProxy documentation has been split into a number of different files for
ease of use.

Please refer to the following files depending on what you're looking for :

  - INSTALL for instructions on how to build and install HAProxy
  - BRANCHES to understand the project's life cycle and what version to use
  - LICENSE for the project's license
  - CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located into the doc/ directory :

  - doc/intro.txt for a quick introduction on HAProxy
  - doc/configuration.txt for the configuration's reference manual
  - doc/lua.txt for the Lua's reference manual
  - doc/SPOE.txt for how to use the SPOE engine
  - doc/network-namespaces.txt for how to use network namespaces under Linux
  - doc/management.txt for the management guide
  - doc/regression-testing.txt for how to use the regression testing suite
  - doc/peers.txt for the peers protocol reference
  - doc/coding-style.txt for how to adopt HAProxy's coding style
  - doc/internals for developer-specific documentation (not all up to date)