Commit graph

25 commits

Author SHA1 Message Date
Willy Tarreau
a4471ea56d MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID
This will be used to detect and fix incorrect setups which report
the same cluster ID for multiple L3 instances.

The arrangement of functions in this file is becoming a real problem.
Maybe we should move all this to cpu_topo for example, and better
distinguish OS-specific and generic code.
2025-03-14 18:30:31 +01:00
Willy Tarreau
a8acdbd9fd MINOR: cpu-topo: implement a sorting mechanism by CPU locality
Once we've kept only the CPUs we want, the next step will be to form
groups and these ones are based on locality. Thus we'll have to sort by
locality. For now the locality is only inferred by the index. No grouping
is made at this point. For this we add the "cpu_reorder_by_locality"
function with a locality-based comparison function.
2025-03-14 18:30:31 +01:00
Willy Tarreau
18133a054d MINOR: cpu-topo: implement a sorting mechanism for CPU index
CPU selection will be performed by sorting CPUs according to
various criteria. For dumps however, that's really not convenient
and we'll need to reorder the CPUs according to their index only.
This is what the new function cpu_reorder_by_index() does. It's
called  in thread_detect_count() before dumping the CPU topology.
2025-03-14 18:30:31 +01:00
Willy Tarreau
661d49a18a MINOR: cpu-topo: skip CPU properties that we've verified do not exist
A number of entries under /cpu/cpu%d only exist on certain kernel
versions, certain archs and/or with certain modules loaded. It's
pointless to insist on trying to read them all for all CPUs when
we've already verified they do not exist. Thus let's use stat()
the first time prior to checking some of them, and only try to
access them when they really exist. This almost completely
eliminates the large number of ENOENT that was visible in strace
during startup.
2025-03-14 18:30:31 +01:00
Willy Tarreau
baeea08dba MINOR: cpu-topo: skip identification of non-existing CPUs
There's no point trying to read all entries under /cpu/cpu%d when that
one does not exist, so let's just skip it in this case.
2025-03-14 18:30:31 +01:00
Willy Tarreau
8542c79f9d MINOR: cpu-topo: skip CPU detection when /sys/.../cpu does not exist
There's no point scanning all entries when /cpu doesn't exist in the
first place. Let's check once for it and skip the loop in this case.
2025-03-14 18:30:30 +01:00
Willy Tarreau
c5ddf4a5b2 MINOR: cpu-topo: boost the capacity of performance cores with cpufreq
Cpufreq alone isn't a good metric on heterogenous CPUs because efficient
cores can reach almost as high frequencies as performant ones. Tests have
shown that majoring performance cores by 50% gives a pretty accurate
estimate of the performance to expect on modern CPUs, and that counting
+33% per extra SMT thread is reasonable as well. We don't have the info
about the core's quality, but using the presence of SMT is a reasonable
approach in this case, given that efficiency cores will not use it.

As an example, using one thread of each of the 8 P-cores of an intel
i9-14900k gives 395k rps for a corrected total capacity of 69.3k, using
the 16 E-cores gives 40.5k for a total capacity of 70.4k, and using both
threads of 6 P-cores gives 41.1k for a total capacity of 69.6k. Thus the
3 same scores deliver the same performance in various combinations.
2025-03-14 18:30:30 +01:00
Willy Tarreau
e4aa13e786 MINOR: cpu-topo: use cpufreq before acpi cppc
The acpi_cppc method was found to take about 5ms per CPU on a 64-core
EPYC system, which is plain unacceptable as it delays the boot by half
a second. Let's use the less accurate cpufreq first, which should be
sufficient anyway since many systems do not have acpi_cppc. We'll only
fall back to acpi_cppc for systems without cpufreq. If it were to be
an issue over time, we could also automatically consider that all
threads of the same core or even of the same cluster run at the same
speed (when a cluster is known to be accurate).
2025-03-14 18:30:30 +01:00
Willy Tarreau
d11241b7ba MINOR: cpu-topo: fall back to nominal_perf and scaling_max_freq for the capacity
When cpu_capacity is not present, let's try to check acpi_cppc's
nominal_perf which is similar and commonly found on servers, then
scaling_max_freq (though that last one may vary a bit between CPUs
depending on die quality). That variation is not a problem since
we can absorb a ~5% variation without issue.

It was verified on an i9-14900 featuring 5.7-P, 6.0-P and 4.4-E GHz
that P-cores were not reordered and that E cores were placed last.
It was also OK on a W3-2345 with 4.3 to 4.5GHz.
2025-03-14 18:30:30 +01:00
Willy Tarreau
322c28cc19 MINOR: cpu-topo: refine cpu dump output to better show kept/dropped CPUs
It's becoming difficult to see which CPUs are going to be kept/dropped.
Let's just skip all offline CPUs, and indicate "keep" in front of those
that are going to be used, and "----" in front of the excluded ones. It
is way more readable this way.

Also let's just drop the array entry number, since it's always the same
as the CPU number and is only an internal representation anyway.
2025-03-14 18:30:30 +01:00
Willy Tarreau
68069e4b27 MINOR: cpu-topo: add "drop-cpu" and "only-cpu" to cpu-set
These allow respectively to disable binding to CPUs listed in a set, and
to disable binding to CPUs not in a set.
2025-03-14 18:30:30 +01:00
Willy Tarreau
cda4956d9c MINOR: cpu-topo: add a new "cpu-set" global directive to choose cpus
For now it's limited, it only supports "reset" to ask that any previous
"taskset" be ignored. The goal will be to later add more actions that
allow to symbolically define sets of cpus to bind to or to drop. This
also clears the cpu_mask_forced variable that is used to detect
that a taskset had been used.
2025-03-14 18:30:30 +01:00
Willy Tarreau
ac1db9db7d MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable
The function is not convenient because it doesn't allow us to undo the
startup changes, and depending on where it's being used, we don't know
whether the values read have already been altered (this is not the case
right now but it's going to evolve).

Let's just compute the status during cpu_detect_usable() and set a
variable accordingly. This way we'll always read the init value, and
if needed we can even afford to reset it. Also, placing it in cpu_topo.c
limits cross-file dependencies (e.g. threads without affinity etc).
2025-03-14 18:30:30 +01:00
Willy Tarreau
3a7cc676fa MINOR: cpu-topo: add NUMA node identification to CPUs on FreeBSD
With this patch we're also NUMA node IDs to each CPU when the info is
found. The code is highly inspired from the one in commit f5d48f8b3
("MEDIUM: cfgparse: numa detect topology on FreeBSD."), the difference
being that we're just setting the value in ha_cpu_topo[].
2025-03-14 18:30:30 +01:00
Willy Tarreau
f6154c079e MINOR: cpu-topo: add NUMA node identification to CPUs on Linux
With this patch we're also assigning NUMA node IDs to each CPU when one
is found. The code is highly inspired from the one in commit b56a7c89a
("MEDIUM: cfgparse: detect numa and set affinity if needed") that already
did the job, except that it could be simplified since we're just collecting
info to fill the ha_cpu_topo[] array.
2025-03-14 18:30:30 +01:00
Willy Tarreau
65612369e7 MINOR: cpu-topo: also store the sibling ID with SMT
The sibling ID was not reported because it's not directly accessible
but we don't care, what matters is that we assign numbers to all the
threads we find using the same CPU so that some strategies permit to
allocate one thread at a time if we want to use few threads with max
performance.
2025-03-14 18:30:30 +01:00
Willy Tarreau
7cb274439b MINOR: cpu-topo: add CPU topology detection for linux
This uses the publicly available information from /sys to figure the cache
and package arrangements between logical CPUs and fill ha_cpu_topo[], as
well as their SMT capabilities and relative capacity for those which expose
this. The functions clearly have to be OS-specific.
2025-03-14 18:30:30 +01:00
Willy Tarreau
12f3a2bbb7 MINOR: cpu-topo: try to detect offline cpus at boot
When possible, the offline CPUs are detected at boot and their OFFLINE
flag is set in the ha_cpu_topo[] array. When the detection is not
possible (e.g. not linux, /sys not mounted etc), we just mark none of
them as being offline, as we don't want to infer wrong info that could
hinder automatic CPU placement detection. When valid, we take this
opportunity for refining cpu_topo_lastcpu so that we don't need to
manipulate CPUs beyond this value.
2025-03-14 18:30:30 +01:00
Willy Tarreau
44881e5abf MINOR: cpu-topo: add detection of online CPUs on FreeBSD
On FreeBSD we can detect online CPUs at least by doing the bitwise-OR of
the CPUs of all domains, so we're using this and adding this detection
to ha_cpuset_detect_online(). If we find simpler later, we can always
rework it, but it's reasonably inexpensive since we only check existing
domains.
2025-03-14 18:30:30 +01:00
Willy Tarreau
8f72ce335a MINOR: cpu-topo: add detection of online CPUs on Linux
This adds a generic function ha_cpuset_detect_online() which for now
only supports linux via /sys. It fills a cpuset with the list of online
CPUs that were detected (or returns a failure).
2025-03-14 18:30:30 +01:00
Willy Tarreau
8c524c7c9d REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo
The cpuset files are normally used only for cpu manipulations. It happens
that the initial CPU binding detection was initially placed there since
there was no better place, but in practice, being OS-specific, it should
really be in cpu-topo. This simplifies cpuset which doesn't need to know
about the OS anymore.
2025-03-14 18:30:30 +01:00
Willy Tarreau
a6fdc3eaf0 MINOR: cpu-topo: update CPU topology from excluded CPUs at boot
Now before trying to resolve the thread assignment to groups, we detect
which CPUs are not bound at boot so that we can mark them with
HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we
can count later. Note that we purposely ignore cpu-map here as we
don't know how threads and groups will map to cpu-map entries, hence
which CPUs will really be used.

It's important to proceed this way so that when we have no info we
assume they're all available.
2025-03-14 18:30:30 +01:00
Willy Tarreau
bdb731172c MINOR: cpu-topo: add a function to dump CPU topology
The new function cpu_dump_topology() will centralize most debugging
calls, and it can make efforts of not dumping some possibly irrelevant
fields (e.g. non-existing cache levels).
2025-03-14 18:30:30 +01:00
Willy Tarreau
041462c4af MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus
We don't want to constantly deal with as many CPUs as a cpuset can hold,
so let's first try to trim the value to what the system claims to support
via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of
the cpuset size though. The value is stored globally so that we can
reuse it elsewhere after initialization.
2025-03-14 18:30:30 +01:00
Willy Tarreau
656cedad42 MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array.
This does the bare minimum to allocate and initialize a global
ha_cpu_topo array for the number of supported CPUs and release
it at deinit time.
2025-03-14 18:30:30 +01:00