RANDOM_CACHED is overloaded to refer both to entropy obtained from files
loaded by the boot loader, and entropy obtained via writes to
/dev/random. Introduce a new source, RANDOM_RANDOMDEV, to refer to the
latter. This is to enable treating RANDOM_CACHED as a special case in
the NIST health test implementation.
Update the default harvest_mask in rc.conf to include RANDOM_RANDOMDEV,
preserving the old behaviour of accepting writes to /dev/random.
Bump __FreeBSD_version for modules which register a pure source, since
all of their values have now shifted.
Reviewed by: cem
MFC after: 3 months
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D51155
This patch implements the noise source health tests described in chapter
four of NIST SP 800-90B[1]. The repetition count test and adaptive
proportion test both help identify cases where a noise source is stuck
and generating the same output too frequently. The tests are disabled
by default, but making an implementation available may help implementors
conform to FIPS validation requirements. This implementation aims to
comply with the requirements listed in section 4.3 of the document.
To enable health testing, set the kern.random.nist_healthtest_enabled
tunable to 1. Startup testing is implemented as specified in the
document: the first 1024 samples from a source are evaluated according
to the two tests, and they are discarded. The RANDOM_CACHED and
RANDOM_PURE_VMGENID sources are excluded from testing, as they are
effectively a one-time source of entropy, and statistical testing
doesn't seem to provide much use.
Since the first 1024 samples from entropy sources are discarded by the
implementation, it is possible that we might end up with insufficient
entropy during early boot if no boot-time entropy source (i.e.,
/entropy) is provided. If this is a problem, it could be remediated by
modifying the implementation to poll applicable sources (e.g., RDRAND)
to complete startup testing quickly, rather than relying on the random
kthread.
The entry point for the tests is random_harvest_healthtest(), intended
to be called from individual CSPRNG implementations in order to leverage
their locking context, e.g., the entropy pool lock in Fortuna. The
Fortuna implementation is modified to call this entry point, mainly to
demonstrate how the health tests can be integrated.
The tests operate on the entropy buffer plus the embedded timestamp,
treating them as a single value. We could alternately apply the tests
to the buffer and timestamp separately.
The main parameters for the tests themselves are H, the expected
min-entropy of samples, and alpha, the desired false positive error
rate. This implementation selects H=1 and alpha=2^{-34}; since each
sample includes a CPU cycle counter value, it seems reasonable to expect
at least one bit of entropy from among the low bits of the
high-frequency counter present on systems where FreeBSD is commonly
deployed, and the false positive rate was somewhat arbitrarily selected;
for more details see the comment in random_healthtest_init().
When a health test fails, a message is printed to the console and the
source is disabled. On-demand testing is also supported via the
kern.random.nist_healthtest_ondemand sysctl. This can be used be an
administrator to re-enable a disabled source, following the same startup
testing mentioned above.
[1] https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf
Reviewed by: cem
MFC after: 3 months
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D51154
The entropy queue stores entropy gathered from environmental sources.
Periodically (every 100ms currently), the random kthread will drain this
queue and mix it into the CSPRNG's entropy pool(s).
The old scheme uses a ring buffer with a mutex to serialize producers,
while the sole consumer, the random kthread, avoids using a mutex on the
basis that no serialization is needed since nothing else is updating the
consumer index. On platforms without total store ordering, however,
this isn't sufficient: when a producer inserts a queue entry and updates
`ring.in`, there is no guarantee that the consumer will see the updated
queue entry upon observing the updated producer index. That is, the
update to `ring.in` may be visible before the updated queue entry is
visible. As a result, we could end up mixing in zero'ed queue entries,
though this race is fairly unlikely in practice given how infrequently
the kthread runs.
The easiest way to fix this is to make the kthread acquire the mutex as
well, and hold it while processing queue entries. However, this might
result in a long hold time if there are many queue entries, and we
really want the hold times to be short, e.g., to avoid delaying
interrupt processing.
We could introduce a proper MPSC queue, but this is probably
overcomplicated for a consumer which runs at 10Hz.
Instead, define two buffers, always with one designated as the "active"
buffer. Producers queue entries in the active buffer, and the kthread
uses the mutex to atomically flip the two buffers, so it can process
entries from the inactive buffer without holding the mutex. This
requires more memory, but keeps mutex hold times short and lets us keep
the queue implementation very simple.
Reviewed by: cem
MFC after: 1 month
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D51112
Such annotations are obsolete, the compiler tells us when parameters are
unused. No functional change intended.
Reviewed by: cem
MFC after: 1 week
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D51114
Entropy queue entries always include the low 32 bits of a CPU cycle
count reading. Introduce a macro for this instead of hard-coding
get_cyclecount() calls everywhere; this is handy for testing purposes
since this way, random(4)'s use of the cycle counter (e.g., the number
of bits we use) can be changed in one place.
No functional change intended.
Reviewed by: cem, delphij
MFC after: 1 week
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D51113
They can't be used externally, so it makes no sense to have them in a
header. No functional change intended.
Reviewed by: cem
MFC after: 1 week
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D51111
Switch to using sys/stdarg.h for va_list type and va_* builtins.
Make an attempt to insert the include in a sensible place. Where
style(9) was followed this is easy, where it was ignored, aim for the
first block of sys/*.h headers and don't get too fussy or try to fix
other style bugs.
Reviewed by: imp
Exp-run by: antoine (PR 286274)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1595
The Arm True Random Number Generator Firmware Interface provides a way
to query the SMCCC firmware for up to 192 bits of entropy. Use it to
provide another source of randomness to the kernel.
Reviewed by: cem, markm
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D46989
Move RANDOM_FORTUNA_{NPOOLS,DEFPOOLSIZE} from fortuna.c to fortuna.h
and use RANDOM_FORTUNA_DEFPOOLSIZE in random_harvestq.c rather than
having a magic (albeit explained in a comment) number. The NPOOLS
value will be used in a later commit.
Reviewed by: cem
MFC after: 1 week
Sponsored by: Amazon
Differential Revision: https://reviews.freebsd.org/D46693
We have the mechanism in place to support encoding system registers
explicitly, so use that rather than requiring LLVM 13+, which breaks our
current set of GitHub CI builds.
Fixes: 9eecef0521 ("Add an Armv8 rndr random number provider")
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
Armv8.5 adds an optional random number generator. This is implemented
as two special registers one to read a random number, the other to
re-seed the entropy pool before reading a random number. Both registers
will set the condition flags to tell the caller they can't produce a
random number in a reasonable amount of time.
Without a signal to reseed the entropy pool use the latter register
to provide random numbers to the kernel pool. If at a later time we
had a way to tell the provider if it needs to reseed or not we could
use the former.
On an Amazon AWS Graviton3 VM this never failed, however this may not
be the case on low end CPUs so retry reading the random number 10 times
before returning an error.
Reviewed by: imp, delphij (csprng)
Sponsored by: The FreeBSD Foundation
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D35411
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
We periodically ingest entropy from pollable entropy sources, but only
8 bytes at a time and only occasionally enough to feed all of Fortuna's
pools once per second. This can result in Fortuna remaining unseeded
for a nontrivial amount of time when there is no entropy passed in from
the boot loader, even if RDRAND is available to quickly provide a large
amount of entropy.
Detect in random_sources_feed if we are not yet seeded, and increase the
amount of immediate entropy harvesting we perform, in order to "fill"
Fortuna's entropy pools and avoid having
random: randomdev_wait_until_seeded unblock wait
stall the boot process when entropy is available.
This speeds up the FreeBSD boot in the Firecracker VM by 2.3 seconds.
Approved by: csprng (delphij)
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35802
- Add the missing RANDOM_PURE_QUALCOMM description
- Make RANDOM_PURE_VMGENID consistent with the other pure sources
by including "PURE_" in the description.
Approved by: csprng (cem)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35412
All supported compilers (modern versions of GCC and clang) support
this.
Many places didn't have an #else so would just silently do the wrong
thing. Ancient versions of icc (the original motivation for this) are
no longer a compiler FreeBSD supports.
PR: 263102 (exp-run)
Reviewed by: brooks, imp
Differential Revision: https://reviews.freebsd.org/D34797
UEFI provides a protocol for accessing randomness. This is a good way
to gather early entropy, especially when there's no driver for the RNG
on the platform (as is the case on the Marvell Armada8k (MACCHIATObin)
for now).
If the entropy_efi_seed option is enabled in loader.conf (default: YES)
obtain 2048 bytes of entropy from UEFI and pass is to the kernel as a
"module" of name "efi_rng_seed" and type "boot_entropy_platform"; if
present, ingest it into the kernel RNG.
Submitted by: Greg V
Reviewed by: markm, kevans
Approved by: csprng (markm)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D20780
74cf7cae4d ("softclock: Use dedicated ithreads for running callouts.")
switched callouts away from the swi infrastructure. It turns out that
this was a major source of entropy in early boot, which we've now lost.
As a result, first boot on hardware without a 'fast' entropy source
would block waiting for fortuna to be seeded with little hope of
progressing without manual intervention.
Let's resolve it by explicitly harvesting entropy in callout_process()
if we've handled any callouts. cc/curthread/now seem to be reasonable
sources of entropy, so use those.
Discussed with: jhb (also proposed initial patch)
Reported by: many
Reviewed by: cem, markm (both csprng)
Differential Revision: https://reviews.freebsd.org/D34150
This was introduced in 2014 along with the comment (which has since
been deleted):
/* Introduce an annoying delay to stop swamping */
Modern cryptographic random number generators can ingest arbitrarily
large amounts of non-random (or even maliciously selected) input
without losing their security.
Depending on the number of "boot entropy files" present on the system,
this can speed up the boot process by up to 1 second.
Reviewed by: cem
MFC ater: 1 week
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32984
It seems that clang IAS erronously adds repz prefix which should not be
there. Cpu would try to store around %ecx bytes of random, while we
only expect a word.
PR: 259218
Reported and tested by: Dennis Clarke <dclarke@blastwave.org>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Previously, we were collecting at a base rate of:
64 bits x 32 pools x 10 Hz = 2.5 kB/s
This change drops it to closer to 64-ish bits per pool per second, to
work a little better with entropy providers in virtualized environments
without compromising the security goals of Fortuna.
Reviewed by: #csprng (cem, delphij, markm)
Differential Revision: https://reviews.freebsd.org/D32021
Refer to discussion in PR 230808 for a less incomplete discussion, but
the gist of this change is that we currently collect orders of magnitude
more entropy than we need.
The excess comes from bytes being read out of /dev/*random. The default
rate at which we collect entropy without the read_rate increase is
already more than we need to recover from a compromise of an internal
state.
Reviewed by: #csprng (cem, delphij, markm)
Differential Revision: https://reviews.freebsd.org/D32021
Push the root seed version to userspace through the VDSO page, if
the RANDOM_FENESTRASX algorithm is enabled. Otherwise, there is no
functional change. The mechanism can be disabled with
debug.fxrng_vdso_enable=0.
arc4random(3) obtains a pointer to the root seed version published by
the kernel in the shared page at allocation time. Like arc4random(9),
it maintains its own per-process copy of the seed version corresponding
to the root seed version at the time it last rekeyed. On read requests,
the process seed version is compared with the version published in the
shared page; if they do not match, arc4random(3) reseeds from the
kernel before providing generated output.
This change does not implement the FenestrasX concept of PCPU userspace
generators seeded from a per-process base generator. That change is
left for future discussion/work.
Reviewed by: kib (previous version)
Approved by: csprng (me -- only touching FXRNG here)
Differential Revision: https://reviews.freebsd.org/D22839
There is no functional change for the existing Fortuna random(4)
implementation, which remains the default in GENERIC.
In the FenestrasX model, when the root CSPRNG is reseeded from pools due to
an (infrequent) timer, child CSPRNGs can cheaply detect this condition and
reseed. To do so, they just need to track an additional 64-bit value in the
associated state, and compare it against the root seed version (generation)
on random reads.
This revision integrates arc4random(9) into that model without substantially
changing the design or implementation of arc4random(9). The motivation is
that arc4random(9) is immediately reseeded when the backing random(4)
implementation has additional entropy. This is arguably most important
during boot, when fenestrasX is reseeding at 1, 3, 9, 27, etc., second
intervals. Today, arc4random(9) has a hardcoded 300 second reseed window.
Without this mechanism, if arc4random(9) gets weak entropy during initial
seed (and arc4random(9) is used early in boot, so this is quite possible),
it may continue to emit poorly seeded output for 5 minutes. The FenestrasX
push-reseed scheme corrects consumers, like arc4random(9), as soon as
possible.
Reviewed by: markm
Approved by: csprng (markm)
Differential Revision: https://reviews.freebsd.org/D22838
Fortuna remains the default; no functional change to GENERIC.
Big picture:
- Scalable entropy generation with per-CPU, buffered local generators.
- "Push" system for reseeding child generators when root PRNG is
reseeded. (Design can be extended to arc4random(9) and userspace
generators.)
- Similar entropy pooling system to Fortuna, but starts with a single
pool to quickly bootstrap as much entropy as possible early on.
- Reseeding from pooled entropy based on time schedule. The time
interval starts small and grows exponentially until reaching a cap.
Again, the goal is to have the RNG state depend on as much entropy as
possible quickly, but still periodically incorporate new entropy for
the same reasons as Fortuna.
Notable design choices in this implementation that differ from those
specified in the whitepaper:
- Blake2B instead of SHA-2 512 for entropy pooling
- Chacha20 instead of AES-CTR DRBG
- Initial seeding. We support more platforms and not all of them use
loader(8). So we have to grab the initial entropy sources in kernel
mode instead, as much as possible. Fortuna didn't have any mechanism
for this aside from the special case of loader-provided previous-boot
entropy, so most of these sources remain TODO after this commit.
Reviewed by: markm
Approved by: csprng (markm)
Differential Revision: https://reviews.freebsd.org/D22837
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().
Suggested by: cem
Reviewed by: cem, delphij
Approved by: csprng (cem)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25435
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Reviewed by: cem
Approved by: csprng, kib (mentor, blanket)
Differential Revision: https://reviews.freebsd.org/D23841
The number is public and has no "entropy," but should be integrated quickly
on VM rewind events to avoid duplicate sequences.
Approved by: csprng(markm)
Differential Revision: https://reviews.freebsd.org/D22946
Allow loadable modules that provide random entropy source(s) to safely
unload. Prior to this change, no driver could ensure that their
random_source structure was not being used by random_harvestq.c for any
period of time after invoking random_source_deregister().
This change converts the source_list LIST to a ConcurrencyKit CK_LIST and
uses an epoch(9) to protect typical read accesses of the list. The existing
HARVEST_LOCK spin mutex is used to safely add and remove list entries.
random_source_deregister() uses epoch_wait() to ensure no concurrent
source_list readers are accessing a random_source before freeing the list
item and returning to the caller.
Callers can safely unload immediately after random_source_deregister()
returns.
Reviewed by: markj
Approved by: csprng(markm)
Discussed with: jhb
Differential Revision: https://reviews.freebsd.org/D22489
Simplify RANDOM_LOADABLE by removing the ability to unload a LOADABLE
random(4) implementation. This allows one-time random module selection
at boot, by loader(8). Swapping modules on the fly doesn't seem
especially useful.
This removes the need to hold a lock over the sleepable module calls
read_random and read_random_uio.
init/deinit have been pulled out of random_algorithm entirely. Algorithms
can run their own sysinits to initialize; deinit is removed entirely, as
algorithms can not be unloaded. Algorithms should initialize at
SI_SUB_RANDOM:SI_ORDER_SECOND. In LOADABLE systems, algorithms install
a pointer to their local random_algorithm context in p_random_alg_context at
that time.
Go ahead and const'ify random_algorithm objects; there is no need to mutate
them at runtime.
LOADABLE kernel NULL checks are removed from random_harvestq by ordering
random_harvestq initialization at SI_SUB_RANDOM:SI_ORDER_THIRD, after
algorithm init. Prior to random_harvestq init, hc_harvest_mask is zero and
no events are forwarded to algorithms; after random_harvestq init, the
relevant pointers will already have been installed.
Remove the bulk of random_infra shim wrappers and instead expose the bare
function pointers in sys/random.h. In LOADABLE systems, read_random(9) et
al are just thin shim macros around invoking the associated function
pointer. We do not provide a registration system but instead expect
LOADABLE modules to register themselves at SI_SUB_RANDOM:SI_ORDER_SECOND.
An example is provided in randomdev.c, as used in the random_fortuna.ko
module.
Approved by: csprng(markm)
Discussed with: gordon
Differential Revision: https://reviews.freebsd.org/D22512
The implementation was landed in r344913 and has had some bake time (at
least on my personal systems). There is some discussion of the motivation
for defaulting to this cipher as a PRF in the commit log for r344913.
As documented in that commit, administrators can retain the prior (AES-ICM)
mode of operation by setting the 'kern.random.use_chacha20_cipher' tunable
to 0 in loader.conf(5).
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22878
Flip the knob added in r349154 to "enabled." The commit message from that
revision and associated code comment describe the rationale, implementation,
and motivation for the new default in detail. I have dog-fooded this
configuration on my own systems for six months, for what that's worth.
For end-users: the result is just as secure. The benefit is a faster, more
responsive system when processes produce significant demand on random(4).
As mentioned in the earlier commit, the prior behavior may be restored by
setting the kern.random.fortuna.concurrent_read="0" knob in loader.conf(5).
This scales the random generation side of random(4) somewhat, although there
is still a global mutex being shared by all cores and rand_harvestq; the
situation is generally much better than it was before on small CPU systems,
but do not expect miracles on 256-core systems running 256-thread full-rate
random(4) read. Work is ongoing to address both the generation-side (in
more depth) and the harvest-side scaling problems.
Approved by: csprng(delphij, markm)
Tested by: markm
Differential Revision: https://reviews.freebsd.org/D22879
The internal datastructures do not need to be visible outside of
random_harvestq, and this helps ensure they are not misused.
No functional change.
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22485
There's no need to dynamically populate them; the SYSCTL_ macros take care
of load/unload appropriately already (and random_harvestq is 'standard' and
cannot be unloaded anyway).
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22484
Break random_harvestq_prime up into some logical subroutines. The goal
is that it becomes easier to add other early entropy sources.
While here, drop pre-12.0 compatibility logic. loader default configuration
should preload the file as expeced since 12.0.
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22482
On x86 platforms with the intrinsic, rdrand is a deterministic bit generator
(AES-CTR) seeded from an entropic source. On x86 platforms with rdseed, it
is something closer to the upstream entropic source. (There is more nuance;
a block diagram is provided in [1].)
On devices with rdrand and without rdseed, there is no good intrinsic for
acecssing the good entropic soure directly. However, the DRBG is guaranteed
to reseed every 8 kB on these platforms. As a conservative option, on such
hardware we can read an extra 7.99kB samples every time we want a sample
from an independent seed.
As one can imagine, this drastically slows the effective read rate of
RDRAND (a factor of 1024 on amd64 and 2048 on ia32). Microbenchmarks on AMD
Zen (has RDSEED) show an RDRAND rate of 25 MB/s and Intel Haswell (no
RDSEED) show RDRAND of 170 MB/s. This would reduce the read rate on Haswell
to ~170 kB/s (at 100% CPU). random(4)'s harvestq thread periodically
"feeds" from pure sources in amounts of 128-1024 bytes. On Haswell,
enabling this feature increases the CPU time of RDRAND in each "feed" from
approximately 0.7-6 µs to 0.7-6 ms.
Because there is some performance penalty to this more conservative option,
a knob is provided to enable the change. The change does not affect
platforms with RDSEED.
[1]: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide#inpage-nav-4-2
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22455
It is clearer to me to return success/error (true/false) instead of some
retry count linked to the inline assembly implementation.
No functional change.
Approved by: core(csprng) => csprng(markm)
Differential Revision: https://reviews.freebsd.org/D22454
Move fast entropy source registration to the earlier
SI_SUB_RANDOM:SI_ORDER_FOURTH and move random_harvestq_prime after that.
Relocate the registration routines out of the much later randomdev module
and into random_harvestq.
This is necessary for the fast random sources to actually register before we
perform random_harvestq_prime() early in the kernel boot.
No functional change.
Reviewed by: delphij, markjm
Approved by: secteam(delphij)
Differential Revision: https://reviews.freebsd.org/D21308
No functional change.
Add a verbose comment giving an example side-by-side comparison between the
prior and Concurrent modes of Fortuna, and why one should believe they
produce the same result.
The intent is to flip this on by default prior to 13.0, so testing is
encouraged. To enable, add the following to loader.conf:
kern.random.fortuna.concurrent_read="1"
The intent is also to flip the default blockcipher to the faster Chacha-20
prior to 13.0, so testing of that mode of operation is also appreciated.
To enable, add the following to loader.conf:
kern.random.use_chacha20_cipher="1"
Approved by: secteam(implicit)
In r349154, random device reads of size < 16 bytes (AES block size) were
accidentally broken to loop forever. Correct the loop condition for small
reads.
Reported by: pho
Reviewed by: delphij
Approved by: secteam(delphij)
Differential Revision: https://reviews.freebsd.org/D20686
Add experimental feature to increase concurrency in Fortuna. As this
diverges slightly from canonical Fortuna, and due to the security
sensitivity of random(4), it is off by default. To enable it, set the
tunable kern.random.fortuna.concurrent_read="1". The rest of this commit
message describes the behavior when enabled.
Readers continue to update shared Fortuna state under global mutex, as they
do in the status quo implementation of the algorithm, but shift the actual
PRF generation out from under the global lock. This massively reduces the
CPU time readers spend holding the global lock, allowing for increased
concurrency on SMP systems and less bullying of the harvestq kthread.
It is somewhat of a deviation from FS&K. I think the primary difference is
that the specific sequence of AES keys will differ if READ_RANDOM_UIO is
accessed concurrently (as the 2nd thread to take the mutex will no longer
receive a key derived from rekeying the first thread). However, I believe
the goals of rekeying AES are maintained: trivially, we continue to rekey
every 1MB for the statistical property; and each consumer gets a
forward-secret, independent AES key for their PRF.
Since Chacha doesn't need to rekey for sequences of any length, this change
makes no difference to the sequence of Chacha keys and PRF generated when
Chacha is used in place of AES.
On a GENERIC 4-thread VM (so, INVARIANTS/WITNESS, numbers not necessarily
representative), 3x concurrent AES performance jumped from ~55 MiB/s per
thread to ~197 MB/s per thread. Concurrent Chacha20 at 3 threads went from
roughly ~113 MB/s per thread to ~430 MB/s per thread.
Prior to this change, the system was extremely unresponsive with 3-4
concurrent random readers; each thread had high variance in latency and
throughput, depending on who got lucky and won the lock. "rand_harvestq"
thread CPU use was high (double digits), seemingly due to spinning on the
global lock.
After the change, concurrent random readers and the system in general are
much more responsive, and rand_harvestq CPU use dropped to basically zero.
Tests are added to the devrandom suite to ensure the uint128_add64 primitive
utilized by unlocked read functions to specification.
Reviewed by: markm
Approved by: secteam(delphij)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20313