TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record. As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.
If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order. For TLS 1.1
and later this is fine, but this can break for TLS 1.0.
To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.
- In ktls_enqueue(), check if a TLS record being queued is the next
record expected for a TLS 1.0 session. If not, it is placed in
sorted order in the pending_records queue in the TLS session.
If it is the next expected record, queue it for SW encryption like
normal. In addition, check if this new record (really a potential
batch of records) was holding up any previously queued records in
the pending_records queue. Any of those records that are now in
order are also placed on the queue for SW encryption.
- In ktls_destroy(), free any TLS records on the pending_records
queue. These mbufs are marked M_NOTREADY so were not freed when the
socket buffer was purged in sbdestroy(). Instead, they must be
freed explicitly.
Reviewed by: gallatin, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D32381
(cherry picked from commit 9f03d2c001)
I missed updating this counter when rebasing the changes in
9c64fc4029 after the switch to
COUNTER_U64_DEFINE_EARLY in 1755b2b989.
Fixes: 9c64fc4029 Add Chacha20-Poly1305 as a KTLS cipher suite.
Sponsored by: Netflix
(cherry picked from commit 90972f0402)
Chacha20-Poly1305 for TLS is an AEAD cipher suite for both TLS 1.2 and
TLS 1.3 (RFCs 7905 and 8446). For both versions, Chacha20 uses the
server and client IVs as implicit nonces xored with the record
sequence number to generate the per-record nonce matching the
construction used with AES-GCM for TLS 1.3.
Reviewed by: gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27839
(cherry picked from commit 9c64fc4029)
Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead. The pool of kprocs used for other AIO
requests is similarly created on first use.
Reviewed by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32468
(cherry picked from commit d1b6fef075)
This is how most SYSINITs are defined. Also annotate the dummy
parameter with __unused. No functional change intended.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 2287ced2f5)
Hyper-V wants to register its MSR-based timecounter during
SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be
available for DELAY(). So we cannot use MTX_SYSINIT to initialize the
timecounter lock.
PR: 259878
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 3339950117)
Change the 'period' argument to 'duration' and change its type to
sbintime_t so we can more easily express different durations.
Reviewed by: tsoome, glebius
Differential Revision: https://reviews.freebsd.org/D32619
(cherry picked from commit 072d5b98c4)
This define will later on be used by coming TLS RX hardware offload patches.
No functional change intended.
Reviewed by: jhb@
Sponsored by: NVIDIA Networking
(cherry picked from commit dd31400c3c)
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit a4667e09e6)
Callout c_time is always bigger or equal than the scheduled time. It
is also smaller than sbinuptime() and can't change while the callback
is running. So we reliably can use it instead of sbinuptime() here.
In case there was a race and the callout was rescheduled to the later
time, the callback will be called again.
According to profiles it saves ~5% of the timer interrupt time even
with fast TSC timecounter.
MFC after: 1 month
(cherry picked from commit 6df1359e55)
Similar to commit 3ead60236f ("Generalize bus_space(9) and atomic(9)
sanitizer interceptors"), use a more generic scheme for interposing
sanitizer implementations of routines like memcpy().
No functional change intended.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit ec8f1ea8d5)
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN. Lay a bit of groundwork for KASAN and KMSAN.
When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h. These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.
No functional change intended.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 3ead60236f)
KASAN hooks will not generate reports if panicstr != NULL, but then
there is a window after the initial panic() call where another report
may be raised. This can happen if a false positive occurs; to simplify
debugging of such problems, avoid recursing.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit ea3fbe0707)
It will be called during KLD unload to unpoison the redzones following
global variables. Otherwise, virtual address ranges previously used for
a KLD may be left tainted, triggering false positives when they are
recycled.
Reported by: pho
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 588c7a06df)
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA. This value is "alloc". Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map. Do so using the correct size.
Reported by: kp
Tested by: kp
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 9a7c2de364)
- Reuse some REDZONE bits to keep track of the requested and allocated
sizes, and use that to provide red zones.
- As in UMA, disable memory trashing to avoid unnecessary CPU overhead.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 06a53ecf24)
We cache mapped execve argument buffers to avoid the overhead of TLB
shootdowns. Mark them invalid when they are freed to the cache.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit f1c3adefd9)
vnodes are a bit special in that they may exist on per-CPU lists even
while free. Add a KASAN-only destructor that poisons regions of each
vnode that are not expected to be accessed after a free.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit b261bb4057)
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map. This region is the shadow map. The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid. Various kernel allocators call kasan_mark() to update
the shadow map.
Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN. UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.
The shadow map uses one byte per eight bytes in the kernel map. In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.
When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map. kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29417
(cherry picked from commit 6faf45b34b)
KASAN enables the use of LLVM's AddressSanitizer in the kernel. This
feature makes use of compiler instrumentation to validate memory
accesses in the kernel and detect several types of bugs, including
use-after-frees and out-of-bounds accesses. It is particularly
effective when combined with test suites or syzkaller. KASAN has high
CPU and memory usage overhead and so is not suited for production
environments.
The runtime and pmap maintain a shadow of the kernel map to store
information about the validity of memory mapped at a given kernel
address.
The runtime implements a number of functions defined by the compiler
ABI. These are prefixed by __asan. The compiler emits calls to
__asan_load*() and __asan_store*() around memory accesses, and the
runtime consults the shadow map to determine whether a given access is
valid.
kasan_mark() is called by various kernel allocators to update state in
the shadow map. Updates to those allocators will come in subsequent
commits.
The runtime also defines various interceptors. Some low-level routines
are implemented in assembly and are thus not amenable to compiler
instrumentation. To handle this, the runtime implements these routines
on behalf of the rest of the kernel. The sanitizer implementation
validates memory accesses manually before handing off to the real
implementation.
The sanitizer in a KASAN-configured kernel can be disabled by setting
the loader tunable debug.kasan.disable=1.
Obtained from: NetBSD
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 38da497a4d)
Timecounter registration is dynamic, i.e., there is no requirement that
timecounters must be registered during single-threaded boot. Loadable
drivers may in principle register timecounters (which can be switched to
automatically). Timecounters cannot be unregistered, though this could
be implemented.
Registered timecounters belong to a global linked list. Add a mutex to
synchronize insertions and the traversals done by (mpsafe) sysctl
handlers. No functional change intended.
Reviewed by: imp, kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 621fd9dcb2)
Add a SIG_FOREACH macro that can be used to iterate over a signal set.
This is a bit cleaner and more efficient than calling sig_ffs() in a
loop. The implementation is based on BIT_FOREACH_ISSET(), except
that the bitset limbs are always 32 bits wide, and signal sets are
1-indexed rather than 0-indexed like bitset(9) sets.
issignal() cannot really be modified to use SIG_FOREACH() directly.
Take this opportunity to split the function into two explicit loops.
I've always found this function hard to read and think that this change
is an improvement.
Remove sig_ffs(), nothing uses it now.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 81f2e9063d)