Commit graph

37 commits

Author SHA1 Message Date
Willy Tarreau
1229ef312d MINOR: wdt: do not rely on threads_to_dump anymore
This flag is not needed anymore as we're already marking the waiting
threads as harmless, thus the thread's bit is already covered by this
information. The variable was unexported.
2022-07-01 19:26:35 +02:00
Willy Tarreau
03f9b35114 MEDIUM: tinfo: add a dynamic thread-group context
The thread group info is not sufficient to represent a thread group's
current state as it's read-only. We also need something comparable to
the thread context to represent the aggregate state of the threads in
that group. This patch introduces ha_tgroup_ctx[] and tg_ctx for this.
It's indexed on the group id and must be cache-line aligned. The thread
masks that were global and that do not need to remain global were moved
there (want_rdv, harmless, idle).

Given that all the masks placed there now become group-specific, the
associated thread mask (tid_bit) now switches to the thread's local
bit (ltid_bit). Both are the same for nbtgroups 1 but will differ for
other values.

There's also a tg_ctx pointer in the thread so that it can be reached
from other threads.
2022-07-01 19:15:15 +02:00
Willy Tarreau
adc1f52c92 MINOR: wdt: use ltid_bit in wdt_handler()
Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we ought not use (1UL << thr) to get the group mask for
thread <thr>, but (ha_thread_info[thr].ltid_bit). wdt_handler() needs
this.
2022-07-01 19:15:14 +02:00
Willy Tarreau
e7475c8e79 MEDIUM: tasks/fd: replace sleeping_thread_mask with a TH_FL_SLEEPING flag
Every single place where sleeping_thread_mask was still used was to test
or set a single thread. We can now add a per-thread flag to indicate a
thread is sleeping, and remove this shared mask.

The wake_thread() function now always performs an atomic fetch-and-or
instead of a first load then an atomic OR. That's cleaner and more
reliable.

This is not easy to test, as broadcast FD events are rare. The good
way to test for this is to run a very low rate-limited frontend with
a listener that listens to the fewest possible threads (2), and to
send it only 1 connection at a time. The listener will periodically
pause and the wakeup task will sometimes wake up on a random thread
and will call wake_thread():

   frontend test
        bind :8888 maxconn 10 thread 1-2
        rate-limit sessions 5

Alternately, disabling/enabling a frontend in loops via the CLI also
broadcasts such events, but they're more difficult to observe since
this is causing connection failures.
2022-07-01 19:15:14 +02:00
Willy Tarreau
bdcd32598f MINOR: thread: only use atomic ops to touch the flags
The thread flags are touched a little bit by other threads, e.g. the STUCK
flag may be set by other ones, and they're watched a little bit. As such
we need to use atomic ops only to manipulate them. Most places were already
using them, but here we generalize the practice. Only ha_thread_dump() does
not change because it's run under isolation.
2022-07-01 19:15:14 +02:00
William Lallemand
ae053b30da BUG/MEDIUM: wdt: don't trigger the watchdog when p is unitialized
In wdt_handler(), does not try to trigger the watchdog if the
prev_cpu_time wasn't initialized.

This prevents an unexpected trigger of the watchdog when it wasn't
initialized yet. This case could happen in the master just after loading
the configuration. This would show a trace where the <diff> value is equal
to the <now> value in the trace, and the <poll> value would be 0.

For example:

    Thread 1 is about to kill the process.
    *>Thread 1 : id=0x0 act=1 glob=1 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
                  stuck=1 prof=0 harmless=0 wantrdv=0
                  cpu_ns: poll=0 now=6005541706 diff=6005541706
                  curr_task=0

Thanks to Christian Ruppert for repporting the problem.

Could be backported in every stable versions.
2022-05-13 11:28:08 +02:00
Willy Tarreau
a0b99536c8 REORG: thread/sched: move the thread_info flags to the thread_ctx
The TI_FL_STUCK flag is manipulated by the watchdog and scheduler
and describes the apparent life/death of a thread so it changes
all the time and it makes sense to move it to the thread's context
for an active thread.
2021-10-08 17:22:26 +02:00
Willy Tarreau
45c38e22bf REORG: thread/clock: move the clock parts of thread_info to thread_ctx
The "thread_info" name was initially chosen to store all info about
threads but since we now have a separate per-thread context, there is
no point keeping some of its elements in the thread_info struct.

As such, this patch moves prev_cpu_time, prev_mono_time and idle_pct to
thread_ctx, into the thread context, with the scheduler parts. Instead
of accessing them via "ti->" we now access them via "th_ctx->", which
makes more sense as they're totally dynamic, and will be required for
future evolutions. There's no room problem for now, the structure still
has 84 bytes available at the end.
2021-10-08 17:22:26 +02:00
Willy Tarreau
6414e4423c CLEANUP: wdt: do not remap SI_TKILL to SI_LWP, test the values directly
We used to remap SI_TKILL to SI_LWP when SI_TKILL was not available
(e.g. FreeBSD) but that's ugly and since we need this only in a single
switch/case block in wdt.c it's even simpler and cleaner to perform the
two tests there, so let's do this.
2021-10-08 17:22:26 +02:00
Willy Tarreau
b474f43816 MINOR: wdt: move wd_timer to wdt.c
The watchdog timer had no more reason for being shared with the struct
thread_info since the watchdog is the only user now. Let's remove it
from the struct and move it to a static array in wdt.c. This removes
some ifdefs and the need for the ugly mapping to empty_t that might be
subject to a cast to a long when compared to TIMER_INVALID. Now timer_t
is not known outside of wdt.c and clock.c anymore.
2021-10-08 17:22:26 +02:00
Willy Tarreau
2169498941 MINOR: clock: move the clock_ids to clock.c
This removes the knowledge of clockid_t from anywhere but clock.c, thus
eliminating a source of includes burden. The unused clock_id field was
removed from thread_info, and the definition setting of clockid_t was
removed from compat.h. The most visible change is that the function
now_cpu_time_thread() now takes the thread number instead of a tinfo
pointer.
2021-10-08 17:22:26 +02:00
Willy Tarreau
6cb0c391e7 REORG: clock/wdt: move wdt timer initialization to clock.c
The code that deals with timer creation for the WDT was moved to clock.c
and is called with the few relevant arguments. This removes the need for
awareness of clock_id from wdt.c and as such saves us from having to
share it outside. The timer_t is also known only from both ends but not
from the public API so that we don't have to create a fake timer_t
anymore on systems which do not support it (e.g. macos).
2021-10-08 17:22:26 +02:00
Willy Tarreau
5554264f31 REORG: time: move time-keeping code and variables to clock.c
There is currently a problem related to time keeping. We're mixing
the functions to perform calculations with the os-dependent code
needed to retrieve and adjust the local time.

This patch extracts from time.{c,h} the parts that are solely dedicated
to time keeping. These are the "now" or "before_poll" variables for
example, as well as the various now_*() functions that make use of
gettimeofday() and clock_gettime() to retrieve the current time.

The "tv_*" functions moved there were also more appropriately renamed
to "clock_*".

Other parts used to compute stolen time are in other files, they will
have to be picked next.
2021-10-08 17:22:26 +02:00
Willy Tarreau
19b18ad552 CLENAUP: wdt: use ha_tkill() instead of accessing pthread directly
Instead of calling pthread_kill() directly on the pthread_t let's
call ha_tkill() which does the same by itself. This will help isolate
pthread_t.
2021-10-07 01:41:14 +02:00
Willy Tarreau
9310f481ce CLEANUP: tree-wide: remove unneeded include time.h in ~20 files
20 files used to have haproxy/time.h included only for now_ms, and two
were missing it for other things but used to inherit from it via other
files.
2021-10-07 01:41:14 +02:00
Willy Tarreau
7f673c2cde BUILD: wdt: include signal-t.h
WDT_SIG is used there, thus signal-t.h is required. Currently it's
retrieved by accident through global.h.
2021-05-08 12:29:01 +02:00
Christopher Faulet
fc633b6eff CLEANUP: config: Return ERR_NONE from config callbacks instead of 0
Return ERR_NONE instead of 0 on success for all config callbacks that should
return ERR_* codes. There is no change because ERR_NONE is a macro equals to
0. But this makes the return value more explicit.
2020-11-13 16:26:10 +01:00
Willy Tarreau
36979d9ad5 REORG: include: move the error reporting functions to from log.h to errors.h
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().

This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
2020-06-11 10:18:59 +02:00
Willy Tarreau
aeed4a85d6 REORG: include: move log.h to haproxy/log{,-t}.h
The current state of the logging is a real mess. The main problem is
that almost all files include log.h just in order to have access to
the alert/warning functions like ha_alert() etc, and don't care about
logs. But log.h also deals with real logging as well as log-format and
depends on stream.h and various other things. As such it forces a few
heavy files like stream.h to be loaded early and to hide missing
dependencies depending where it's loaded. Among the missing ones is
syslog.h which was often automatically included resulting in no less
than 3 users missing it.

Among 76 users, only 5 could be removed, and probably 70 don't need the
full set of dependencies.

A good approach would consist in splitting that file in 3 parts:
  - one for error output ("errors" ?).
  - one for log_format processing
  - and one for actual logging.
2020-06-11 10:18:58 +02:00
Willy Tarreau
3727a8a083 REORG: include: move signal.h to haproxy/signal{,-t}.h
No change was necessary. Include from wdt.c was dropped since unneeded.
2020-06-11 10:18:58 +02:00
Willy Tarreau
f268ee8795 REORG: include: split global.h into haproxy/global{,-t}.h
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.

It was split in to, one which is the global struct definition and the
few macros and flags, and the rest containing the functions prototypes.

The UNIX_MAX_PATH definition was moved to compat.h.
2020-06-11 10:18:58 +02:00
Willy Tarreau
48fbcae07c REORG: tools: split common/standard.h into haproxy/tools{,-t}.h
And also rename standard.c to tools.c. The original split between
tools.h and standard.h dates from version 1.3-dev and was mostly an
accident. This patch moves the files back to what they were expected
to be, and takes care of not changing anything else. However this
time tools.h was split between functions and types, because it contains
a small number of commonly used macros and structures (e.g. name_desc)
which in turn cause the massive list of includes of tools.h to conflict
with the callers.

They remain the ugliest files of the whole project and definitely need
to be cleaned and split apart. A few types are defined there only for
functions provided there, and some parts are even OS-specific and should
move somewhere else, such as the symbol resolution code.
2020-06-11 10:18:57 +02:00
Willy Tarreau
3f567e4949 REORG: include: split hathreads into haproxy/thread.h and haproxy/thread-t.h
This splits the hathreads.h file into types+macros and functions. Given
that most users of this file used to include it only to get the definition
of THREAD_LOCAL and MAXTHREADS, the bare minimum was placed into thread-t.h
(i.e. types and macros).

All the thread management was left to haproxy/thread.h. It's worth noting
the drop of the trailing "s" in the name, to remove the permanent confusion
that arises between this one and the system implementation (no "s") and the
makefile's option (no "s").

For consistency, src/hathreads.c was also renamed thread.c.

A number of files were updated to only include thread-t which is the one
they really needed.

Some future improvements are possible like replacing empty inlined
functions with macros for the thread-less case, as building at -O0 disables
inlining and causes these ones to be emitted. But this really is cosmetic.
2020-06-11 10:18:56 +02:00
Willy Tarreau
2a83d60662 REORG: include: move debug.h from common/ to haproxy/
The debug file is cleaner now and does not depend on much anymore.
2020-06-11 10:18:56 +02:00
Willy Tarreau
4c7e4b7738 REORG: include: update all files to use haproxy/api.h or api-t.h if needed
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:

  - common/config.h
  - common/compat.h
  - common/compiler.h
  - common/defaults.h
  - common/initcall.h
  - common/tools.h

The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.

In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.

No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
2020-06-11 10:18:42 +02:00
Olivier Houchard
de01ea9878 MINOR: wdt: Move the definitions of WDTSIG and DEBUGSIG into types/signal.h.
Move the definition of WDTSIG and DEBUGSIG from wdt.c and debug.c into
types/signal.h, so that we can access them in another file.
We need those definition to avoid blocking those signals when running
__signal_process_queue().

This should be backported to 2.1, 2.0 and 1.9.
2020-03-18 13:07:19 +01:00
Willy Tarreau
0627815f70 BUILD: wdt: only test for SI_TKILL when compiled with thread support
SI_TKILL is not necessarily defined on older systems and is used only
with the pthread_kill() call a few lines below, so it should also be
subject to the USE_THREAD condition.
2020-03-10 09:26:17 +01:00
Willy Tarreau
e58114e0e5 MINOR: wdt: do not depend on USE_THREAD
There is no reason for restricting the use of the watchdog to threads
anymore, as it works perfectly without threads as well.
2020-03-04 12:02:27 +01:00
Willy Tarreau
d6f1966543 MEDIUM: wdt: fall back to CLOCK_REALTIME if CLOCK_THREAD_CPUTIME is not available
At least FreeBSD has a fully functional CLOCK_THREAD_CPUTIME but it
cannot create a timer on it. This is not a problem since our timer is
only used to measure each thread's usage using now_cpu_time_thread().
So by just replacing this clock with CLOCK_REALTIME we allow such
platforms to periodically call the wdt and check the thread's CPU usage.
The consequence is that even on a totally idle system there will still
be a few extra periodic wakeups, but the watchdog becomes usable there
as well.
2020-03-04 12:02:27 +01:00
Willy Tarreau
7259fa2b89 BUG/MINOR: wdt: do not return an error when the watchdog couldn't be enabled
On operating systems not supporting to create a timer on
POSIX_THREAD_CPUTIME we emit a warning but we return an error so the
process fails to start, which is absurd. Let's return a success once
the warning is emitted instead.

This may be backported to 2.1 and 2.0.
2020-03-04 12:02:27 +01:00
Willy Tarreau
c1563e5474 MINOR: wdt: always clear sigev_value to make valgrind happy
In issue #471 it was reported that valgrind sometimes complains about
timer_create() being called with uninitialized bytes. These are in fact
the bits from sigev_value.sival_ptr that are not part of sival_int that
are tagged as such, as valgrind has no way to know we're using the int
instead of the ptr in the union. It's cheap to initialize the field so
let's do it.
2020-02-26 14:05:20 +01:00
David Carlier
a92c5cec2d BUILD/MEDIUM: threads: rename thread_info struct to ha_thread_info
On Darwin, the thread_info name exists as a standard function thus
we need to rename our array to ha_thread_info to fix this conflict.
2019-10-17 07:15:17 +02:00
Willy Tarreau
a37cb1880c MINOR: wdt: also consider that waiting in the thread dumper is normal
It happens that upon looping threads the watchdog fires, starts a dump,
and other threads expire their budget while waiting for the other threads
to get dumped and trigger a watchdog event again, adding some confusion
to the traces. With this patch the situation becomes clearer as we export
the list of threads being dumped so that the watchdog can check it before
deciding to trigger. This way such threads in queue for being dumped are
not attempted to be reported in turn.

This should be backported to 2.0 as it helps understand stack traces.
2019-07-31 19:35:31 +02:00
Willy Tarreau
bc1b820606 BUILD: watchdog: condition it to USE_RT
It's needed on Linux to have access to timerfd_*, and on FreeBSD this
lib is needed as well, though not enabled in our default build. We can
see later if it's OK to enable it, for now let's fix the build issues.
2019-05-23 10:20:55 +02:00
Willy Tarreau
02255b24df BUILD: watchdog: use si_value.sival_int, not si_int for the timer's value
Bah, the linux manpage suggests to use si_int but it's a fake, it's only
a define on sigval.sival_int where sigval is defined as si_value. Let's
use si_value.sival_int, at least it builds on both Linux and FreeBSD. It's
likely that this code will have to be limited to a small subset of OSes
if it causes difficulties like this.
2019-05-23 08:36:29 +02:00
Willy Tarreau
823bda0eb7 BUILD: time: remove the test on _POSIX_C_SOURCE
It seems it's not defined on FreeBSD while it's mentioned on Linux that
clock_gettime() can be detected using this. Given that we also have the
test for _POSIX_TIMERS>0 that should cover it well enough. If it breaks
on other systems, we'll see.

Report was here :
    https://github.com/haproxy/haproxy/runs/133866993
2019-05-22 19:14:59 +02:00
Willy Tarreau
2bfefdbaef MAJOR: watchdog: implement a thread lockup detection mechanism
Since threads were introduced, we've naturally had a number of bugs
related to locking issues. In addition we've also got some issues
with corrupted lists in certain rare cases not necessarily involving
threads. Not only these events cause a lot of trouble to the production
as it is very hard to detect that the process is stuck in a loop and
doesn't deliver the service anymore, but it's often difficult (or too
late) to collect more debugging information.

The patch presented here implements a lockup detection mechanism, also
known as "watchdog". The principle is that (on systems supporting it),
each thread will have its own CPU timer which progresses as the thread
consumes CPU cycles, and when a deadline is met, a signal is delivered
(SIGALRM here since it doesn't interrupt gdb by default).

The thread handling this signal (which is not necessarily the one which
triggered the timer) figures the thread ID from the signal arguments and
checks if it's really stuck by looking at the time spent since last exit
from poll() and by checking that the thread's scheduler is still alive
(so that even when dealing with configuration issues resulting in insane
amount of tasks being called in turn, it is not possible to accidently
trigger it). Checking the scheduler's activity will usually result in a
second chance, thus doubling the detecting time.

In order not to incorrectly flag a thread as being the cause of the
lockup, the thread_harmless_mask is checked : a thread could very well
be spinning on itself waiting for all other threads to join (typically
what happens when issuing "show sess"). In this case, once all threads
but one (or two) have joined, all the innocent ones are marked harmless
and will not trigger the timer. Only the ones not reacting will.

The deadline is set to one second, which already appears impossible to
reach, especially since it's 1 second of CPU usage, not elapsed time
with the CPU being preempted by other threads/processes/hypervisor. In
practice due to the scheduler's health verification it takes up to two
seconds to decide to panic.

Once all conditions are met, the goal is to crash from the offending
thread. So if it's the current one, we call ha_panic() otherwise the
signal is bounced to the offending thread which deals with it. This
will result in all threads being woken up in turn to dump their context,
the whole state is emitted on stderr in hope that it can be logged, and
the process aborts, leaving a chance for a core to be dumped and for a
service manager to restart it.

An alternative mechanism could be implemented for systems unable to
wake up a thread once its CPU clock reaches a deadline (e.g. FreeBSD).
Instead of waking the timer each and every deadline, it is possible to
use a standard timer which is reset each time we leave poll(). Since
the signal handler rechecks the CPU consumption this will also work.
However a totally idle process may trigger it from time to time which
may or may not confuse some debugging sessions. The same is true for
alarm() which could be another option for systems not having such a
broad choice of timers (but it seems that in this case they will not
have per-thread CPU measurements available either).

The feature is currently implemented only when threads are enabled in
order to keep the code clean, since the main purpose is to detect and
address inter-thread deadlocks. But if it proves useful for other
situations this condition might be relaxed.
2019-05-22 11:50:48 +02:00