HAProxy - Load balancer
Find a file
Willy Tarreau fc50b9dd14 BUG/MAJOR: sched: protect task during removal from wait queue
The issue addressed by commit fbb934da9 ("BUG/MEDIUM: stick-table: fix
a race condition when updating the expiration task") is still present
when thread groups are enabled, but this time it lies in the scheduler.

What happens is that a task configured to run anywhere might already
have been queued into one group's wait queue. When updating a stick
table entry, sometimes the task will have to be dequeued and requeued.

For this a lock is taken on the current thread group's wait queue lock,
but while this is necessary for the queuing, it's not sufficient for
dequeuing since another thread might be in the process of expiring this
task under its own group's lock which is different. This is easy to test
using 3 stick tables with 1ms expiration, 3 track-sc rules and 4 thread
groups. The process crashes almost instantly under heavy traffic.

One approach could consist in storing the group number the task was
queued under in its descriptor (we don't need 32 bits to store the
thread id, it's possible to use one short for the tid and another
one for the tgrp). Sadly, no safe way to do this was figured, because
the race remains at the moment the thread group number is checked, as
it might be in the process of being changed by another thread. It seems
that a working approach could consist in always having it associated
with one group, and only allowing to change it under this group's lock,
so that any code trying to change it would have to iterately read it
and lock its group until the value matches, confirming it really holds
the correct lock. But this seems a bit complicated, particularly with
wait_expired_tasks() which already uses upgradable locks to switch from
read state to a write state.

Given that the shared tasks are not that common (stick-table expirations,
rate-limited listeners, maybe resolvers), it doesn't seem worth the extra
complexity for now. This patch takes a simpler and safer approach
consisting in switching back to a single wq_lock, but still keeping
separate wait queues. Given that shared wait queues are almost always
empty and that otherwise they're scanned under a read lock, the
contention remains manageable and most of the time the lock doesn't
even need to be taken since such tasks are not present in a group's
queue. In essence, this patch reverts half of the aforementionned
patch. This was tested and confirmed to work fine, without observing
any performance degradation under any workload. The performance with
8 groups on an EPYC 74F3 and 3 tables remains twice the one of a
single group, with the contention remaining on the table's lock first.

No backport is needed.
2022-11-22 09:10:08 +01:00
.github CI: emit the compiler's version in the build reports 2022-11-14 11:14:02 +01:00
addons BUILD: prometheus: use __fallthrough in promex_dump_metrics() and IO handler() 2022-11-14 11:14:02 +01:00
admin CLEANUP: assorted typo fixes in the code and comments 2022-10-30 17:17:56 +01:00
dev DEV: poll: indicate the FD's side in front of its value 2022-11-17 10:56:35 +01:00
doc MINOR: global: generate random cluster.secret if not defined 2022-11-21 16:41:34 +01:00
examples EXAMPLES: remove completely outdated acl-content-sw.cfg 2022-05-30 18:14:24 +02:00
include BUG/MAJOR: sched: protect task during removal from wait queue 2022-11-22 09:10:08 +01:00
reg-tests REG-TESTS: cache: Remove T-E header for 304-Not-Modified responses 2022-11-16 17:19:43 +01:00
scripts CLEANUP: assorted typo fixes in the code and comments 2022-10-30 17:17:56 +01:00
src BUG/MAJOR: sched: protect task during removal from wait queue 2022-11-22 09:10:08 +01:00
tests TESTS: add a unit test for one_among_mask() 2022-06-21 20:29:57 +02:00
.cirrus.yml CI: cirrus-ci: bump FreeBSD image to 13-1 2022-09-09 13:30:17 +02:00
.gitattributes MINOR: Configure the cpp userdiff driver for *.[ch] in .gitattributes 2021-02-22 18:17:57 +01:00
.gitignore CLEANUP: exclude udp-perturb with .gitignore 2022-09-16 15:47:04 +02:00
.mailmap DOC: update Tim's address in .mailmap 2021-09-16 09:14:14 +02:00
.travis.yml CI: travis-ci: temporarily disable arm64 builds 2021-08-07 07:28:15 +02:00
BRANCHES DOC: fix some spelling issues over multiple files 2021-01-08 14:53:47 +01:00
CHANGELOG [RELEASE] Released version 2.7-dev9 2022-11-18 17:48:49 +01:00
CONTRIBUTING CLEANUP: assorted typo fixes in the code and comments 2021-08-16 12:37:59 +02:00
INSTALL BUILD: Makefile: Add Lua 5.4 autodetect 2022-07-04 17:28:48 +02:00
LICENSE LICENSE: add licence exception for OpenSSL 2012-09-07 13:52:26 +02:00
MAINTAINERS DOC: add maintainers for QUIC and HTTP/3 2022-05-30 17:34:51 +02:00
Makefile BUILD: Makefile: enable USE_SHM_OPEN by default on freebsd 2022-11-18 15:24:23 +01:00
README DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
SUBVERS BUILD: use format tags in VERDATE and SUBVERS files 2013-12-10 11:22:49 +01:00
VERDATE [RELEASE] Released version 2.7-dev9 2022-11-18 17:48:49 +01:00
VERSION [RELEASE] Released version 2.7-dev9 2022-11-18 17:48:49 +01:00

The HAProxy documentation has been split into a number of different files for
ease of use.

Please refer to the following files depending on what you're looking for :

  - INSTALL for instructions on how to build and install HAProxy
  - BRANCHES to understand the project's life cycle and what version to use
  - LICENSE for the project's license
  - CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located into the doc/ directory :

  - doc/intro.txt for a quick introduction on HAProxy
  - doc/configuration.txt for the configuration's reference manual
  - doc/lua.txt for the Lua's reference manual
  - doc/SPOE.txt for how to use the SPOE engine
  - doc/network-namespaces.txt for how to use network namespaces under Linux
  - doc/management.txt for the management guide
  - doc/regression-testing.txt for how to use the regression testing suite
  - doc/peers.txt for the peers protocol reference
  - doc/coding-style.txt for how to adopt HAProxy's coding style
  - doc/internals for developer-specific documentation (not all up to date)