18528 commits

Author SHA1 Message Date
John Baldwin
94280c5811 ktls: Reject some invalid cipher suites.
- Reject AES-CBC cipher suites for TLS 1.0 and TLS 1.1 using auth
  algorithms other than SHA1-HMAC.

- Reject AES-GCM cipher suites for TLS versions older than 1.2.
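The two rules above amount to a small validity check on the triple (TLS version, cipher, auth algorithm). The following is a minimal sketch of that check under stated assumptions, not the ktls code. The enum names and values are hypothetical stand-ins for the kernel's TLS version and crypto constants. The TLS 1.3 branch folds in the separate AES-CBC/TLS 1.3 rejection from commit 0053fedc1b in this log. The sketch also assumes that TLS 1.2 CBC suites may use any HMAC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encodings; the kernel uses its own TLS version and
 * crypto algorithm constants. Minor versions follow the wire format. */
enum tls_minor { TLS_1_0 = 1, TLS_1_1 = 2, TLS_1_2 = 3, TLS_1_3 = 4 };
enum cipher { CIPHER_AES_CBC, CIPHER_AES_GCM };
enum auth { AUTH_NONE, AUTH_SHA1_HMAC, AUTH_SHA2_256_HMAC, AUTH_SHA2_384_HMAC };

static bool
cipher_suite_valid(enum tls_minor ver, enum cipher c, enum auth a)
{
	switch (c) {
	case CIPHER_AES_CBC:
		/* CBC suites do not exist in TLS 1.3. */
		if (ver > TLS_1_2)
			return false;
		/* TLS 1.0 and 1.1 CBC suites only pair with SHA1-HMAC. */
		if (ver < TLS_1_2 && a != AUTH_SHA1_HMAC)
			return false;
		/* CBC always needs a separate MAC. */
		return a != AUTH_NONE;
	case CIPHER_AES_GCM:
		/* AEAD suites require TLS 1.2 or later. */
		return ver >= TLS_1_2;
	}
	return false;
}
```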

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32842

(cherry picked from commit 900a28fe33)
2021-11-23 15:11:53 -08:00
John Baldwin
ba6b771d17 ktls: Ensure FIFO encryption order for TLS 1.0.
TLS 1.0 records are encrypted as one continuous CBC chain where the
last block of the previous record is used as the IV for the next
record.  As a result, TLS 1.0 records cannot be encrypted out of order
but must be encrypted as a FIFO.

If the later pages of a sendfile(2) request complete before the first
pages, then TLS records can be encrypted out of order.  For TLS 1.1
and later this is fine, but this can break for TLS 1.0.

To cope, add a queue in each TLS session to hold TLS records that
contain valid unencrypted data but are waiting for an earlier TLS
record to be encrypted first.

- In ktls_enqueue(), check if a TLS record being queued is the next
  record expected for a TLS 1.0 session.  If not, it is placed in
  sorted order in the pending_records queue in the TLS session.

  If it is the next expected record, queue it for SW encryption like
  normal.  In addition, check if this new record (really a potential
  batch of records) was holding up any previously queued records in
  the pending_records queue.  Any of those records that are now in
  order are also placed on the queue for SW encryption.

- In ktls_destroy(), free any TLS records on the pending_records
  queue.  These mbufs are marked M_NOTREADY, so they were not freed
  when the socket buffer was purged in sbdestroy() and must instead be
  freed explicitly.
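The queueing logic described above can be sketched as follows. This is a simplified userspace model, not the ktls implementation. The structure and function names are hypothetical, a singly linked list stands in for the pending_records queue, and "encryption" only logs the first sequence number and advances the expected one. Each entry carries a count because a queued entry may be a batch of records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified types; the kernel queues mbufs instead. */
struct tls_rec {
	uint64_t seqno;           /* sequence number of the first record */
	unsigned count;           /* number of records in this batch */
	struct tls_rec *next;
};

struct tls_session {
	uint64_t next_seqno;      /* next record expected for encryption */
	struct tls_rec *pending;  /* out-of-order batches, sorted by seqno */
	uint64_t encrypted[16];   /* log of dispatched batches (for testing) */
	size_t nencrypted;
};

/* Stand-in for handing a batch to the SW encryption worker. */
static void
encrypt_rec(struct tls_session *s, struct tls_rec *r)
{
	assert(s->nencrypted < 16);
	s->encrypted[s->nencrypted++] = r->seqno;
	s->next_seqno = r->seqno + r->count;
}

static void
tls10_enqueue(struct tls_session *s, struct tls_rec *r)
{
	if (r->seqno != s->next_seqno) {
		/* Not the next expected batch: insert in sorted order. */
		struct tls_rec **pp = &s->pending;
		while (*pp != NULL && (*pp)->seqno < r->seqno)
			pp = &(*pp)->next;
		r->next = *pp;
		*pp = r;
		return;
	}
	encrypt_rec(s, r);
	/* Release pending batches that this one was holding up. */
	while (s->pending != NULL && s->pending->seqno == s->next_seqno) {
		struct tls_rec *p = s->pending;
		s->pending = p->next;
		encrypt_rec(s, p);
	}
}
```

If records 3 and 2 arrive before the batch holding records 0 and 1, nothing is dispatched until that batch arrives. All three are then dispatched in sequence order.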

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D32381

(cherry picked from commit 9f03d2c001)
2021-11-23 15:11:44 -08:00
John Baldwin
0053fedc1b ktls: Reject attempts to enable AES-CBC with TLS 1.3.
AES-CBC cipher suites are not supported in TLS 1.3.

Reported by:	syzbot+ab501c50033ec01d53c6@syzkaller.appspotmail.com
Reviewed by:	tuexen, markj
Differential Revision:	https://reviews.freebsd.org/D32404

(cherry picked from commit a63752cce6)
2021-11-23 15:11:44 -08:00
John Baldwin
6afc00ed13 ktls: Use COUNTER_U64_DEFINE_EARLY for the ktls_toe_chacha20 counter.
I missed updating this counter when rebasing the changes in
9c64fc4029 after the switch to
COUNTER_U64_DEFINE_EARLY in 1755b2b989.

Fixes:		9c64fc4029 Add Chacha20-Poly1305 as a KTLS cipher suite.
Sponsored by:	Netflix

(cherry picked from commit 90972f0402)
2021-11-23 15:11:44 -08:00
John Baldwin
b7f27a60ac Add Chacha20-Poly1305 as a KTLS cipher suite.
Chacha20-Poly1305 for TLS is an AEAD cipher suite for both TLS 1.2 and
TLS 1.3 (RFCs 7905 and 8446).  For both versions, Chacha20 uses the
server and client IVs as implicit nonces and XORs them with the record
sequence number to generate the per-record nonce.  This matches the
construction used with AES-GCM in TLS 1.3.
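The per-record nonce construction can be illustrated with a short sketch that follows RFC 8446 section 5.3. The 64-bit record sequence number is encoded big-endian, left-padded with zeros to the 12-byte IV length, and XORed with the static IV. The function and constant names are hypothetical, not ktls identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TLS_NONCE_LEN 12  /* AEAD nonce length for AES-GCM and ChaCha20-Poly1305 */

/* Per-record nonce = static IV XOR (zero-padded big-endian seqno). */
static void
tls_record_nonce(const uint8_t iv[TLS_NONCE_LEN], uint64_t seqno,
    uint8_t nonce[TLS_NONCE_LEN])
{
	memcpy(nonce, iv, TLS_NONCE_LEN);
	/* The sequence number occupies the last 8 bytes, MSB first. */
	for (int i = 0; i < 8; i++)
		nonce[TLS_NONCE_LEN - 1 - i] ^= (uint8_t)(seqno >> (8 * i));
}
```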

Reviewed by:	gallatin
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27839

(cherry picked from commit 9c64fc4029)
2021-11-23 15:11:44 -08:00
John Baldwin
b07b1f890e Stop creating socket aio kprocs during boot.
Create the initial pool of kprocs on demand when the first socket AIO
request is submitted instead.  The pool of kprocs used for other AIO
requests is similarly created on first use.
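Creating the pool on first use is a run-once initialization. Below is a rough userspace analogue using POSIX pthread_once(). The kernel code does not use pthreads, and POOL_INITIAL, pool_create() and submit_request() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

#define POOL_INITIAL 4          /* hypothetical initial pool size */

static pthread_once_t pool_once = PTHREAD_ONCE_INIT;
static int pool_workers;        /* stands in for the pool of kprocs */
static int requests;

/* Runs exactly once, on the first request rather than at startup. */
static void
pool_create(void)
{
	pool_workers = POOL_INITIAL;
}

static void
submit_request(void)
{
	pthread_once(&pool_once, pool_create);
	requests++;
}
```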

Reviewed by:	asomers
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32468

(cherry picked from commit d1b6fef075)
2021-11-23 15:11:43 -08:00
Mark Johnston
35dfdb88ea unix: Remove a write-only local variable
Reported by:	clang
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 42188bb5c1)
2021-11-23 09:32:46 -05:00
Mark Johnston
d16fbc488e clock: Group the "clocks" SYSINIT with the function definition
This is how most SYSINITs are defined.  Also annotate the dummy
parameter with __unused.  No functional change intended.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 2287ced2f5)
2021-11-22 08:45:47 -05:00
Mark Johnston
686b143f37 timecounter: Initialize tc_lock earlier
Hyper-V wants to register its MSR-based timecounter during
SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be
available for DELAY().  So we cannot use MTX_SYSINIT to initialize the
timecounter lock.

PR:		259878
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 3339950117)
2021-11-22 08:44:49 -05:00
Konstantin Belousov
7dab2a7cf5 Kernel linkers: some style
(cherry picked from commit a7e4eb1422)
2021-11-21 02:27:44 +02:00
Warner Losh
87586bff11 sysbeep: Adjust interface to take a duration as a sbt
Change the 'period' argument to 'duration' and change its type to
sbintime_t so we can more easily express different durations.
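sbintime_t is a signed 64-bit fixed-point value with 32 fractional bits, so one second is 1 << 32 (SBT_1S). The sketch below shows a hypothetical helper that converts milliseconds to an sbintime_t duration. The typedef and SBT_1S mirror FreeBSD's <sys/time.h>, but ms_to_sbt() is not a kernel function. Splitting off whole seconds first keeps the intermediate product small.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins mirroring <sys/time.h>: 32.32 fixed-point seconds. */
typedef int64_t sbintime_t;
#define SBT_1S	((sbintime_t)1 << 32)

/* Hypothetical helper: milliseconds to sbintime_t. */
static sbintime_t
ms_to_sbt(int64_t ms)
{
	return ((ms / 1000) * SBT_1S + (ms % 1000) * SBT_1S / 1000);
}
```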

Reviewed by:	tsoome, glebius
Differential Revision:	https://reviews.freebsd.org/D32619

(cherry picked from commit 072d5b98c4)
2021-11-18 21:52:22 -07:00
Konstantin Belousov
19f2755d9e DEBUG_VFS_LOCKS: stop excluding devfs and doomed vnode from asserts
(cherry picked from commit d032cda0d0)
2021-11-19 06:25:29 +02:00
Konstantin Belousov
b9283ea323 Make locking assertions for VOP_FSYNC() and VOP_FDATASYNC() more correct
(cherry picked from commit 47b248ac65)
2021-11-19 06:25:29 +02:00
Konstantin Belousov
3a12ea648f freevnode(): lock the freeing vnode around destroy_vpollinfo()
(cherry picked from commit d1d675cb30)
2021-11-19 06:25:29 +02:00
Konstantin Belousov
4c04226222 getblk(): do not require devvp vnodes to be locked
(cherry picked from commit a7b4a54d2c)
2021-11-19 06:25:28 +02:00
Konstantin Belousov
5bd64640f7 start_init: use 'p'
(cherry picked from commit 8660813153)
2021-11-18 02:32:32 +02:00
Hans Petter Selasky
4a36455c41 Factor out flags preserved during mbuf demote into a separate define.
This define will be used by upcoming TLS RX hardware offload patches.

No functional change intended.

Reviewed by:	jhb@
Sponsored by:	NVIDIA Networking

(cherry picked from commit dd31400c3c)
2021-11-12 15:33:54 +01:00
Konstantin Belousov
9de9a33050 fexecve(2): allow O_PATH file descriptors opened without O_EXEC
(cherry picked from commit be10c0a910)
2021-11-06 04:12:33 +02:00
Konstantin Belousov
5291b294d3 proc_get_binpath(): provide syntactically correct value for unused NDINIT arg
(cherry picked from commit 7ac82c96fe)
2021-11-06 04:12:32 +02:00
Konstantin Belousov
392fbf5cce proc_get_binpath(): return empty string instead of NULL
(cherry picked from commit 02de91d740)
2021-11-06 04:12:32 +02:00
Konstantin Belousov
17aab23bf7 fexecve(2): restore the attempts to calculate the executable path
(cherry picked from commit e4ce23b238)
2021-11-06 04:12:32 +02:00
Konstantin Belousov
0303cc4be8 Extract proc_get_binpath() from sysctl_kern_proc_pathname()
(cherry picked from commit f34fc6ba06)
2021-11-06 04:12:32 +02:00
Konstantin Belousov
ea4e8e191c sysctl kern.proc.procname: report right hardlink name
PR:	248184

(cherry picked from commit ee92c8a842)
2021-11-06 04:12:32 +02:00
Konstantin Belousov
d39bd6d14d exec: store parent directory and hardlink name of the binary in struct proc
(cherry picked from commit 351d5f7fc5)
2021-11-06 04:12:32 +02:00
Konstantin Belousov
a69fb7452e exec: provide right hardlink name in AT_EXECPATH
PR:	248184

(cherry picked from commit 0c10648fbb)
2021-11-06 04:12:31 +02:00
Konstantin Belousov
b94df11d52 Make vn_fullpath_hardlink() externally callable
(cherry picked from commit 9a0bee9f6a)
2021-11-06 04:12:31 +02:00
Konstantin Belousov
1849361644 struct image_params: use bool type for boolean members
(cherry picked from commit 15bf81f354)
2021-11-06 04:12:31 +02:00
Konstantin Belousov
3b4baefca9 do_execve(): switch boolean locals to use bool type
(cherry picked from commit 9d58243fbc)
2021-11-06 04:12:31 +02:00
Konstantin Belousov
0b06c284ae kern_exec.c: style
(cherry picked from commit 143dba3a91)
2021-11-06 04:12:31 +02:00
Konstantin Belousov
3e322ded35 Unmap shared page manually before doing vm_map_remove() on exit or exec
(cherry picked from commit 1c69690319)
2021-11-04 02:56:39 +02:00
Sebastian Huber
b765d3da06 kern_tc.c: Scaling/large delta recalculation
(cherry picked from commit ae750fbac7)
2021-11-04 02:56:38 +02:00
Mark Johnston
66cb1858f4 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit a4667e09e6)
2021-11-03 13:39:36 -04:00
Mark Johnston
b5e5020260 rmslock: Update td_locks during lock and unlock operations
Reviewed by:	mjg
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 71f31d784e)
2021-11-03 09:15:05 -04:00
Mark Johnston
10d94487df kasan: Use vm_offset_t for the first parameter to kasan_shadow_map()
No functional change intended.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 20e3b9d8bd)
2021-11-02 18:17:58 -04:00
Alexander Motin
aac9d07f93 sleepqueue(9): Remove sbinuptime() from sleepq_timeout().
Callout c_time is always greater than or equal to the scheduled time.  It
is also smaller than sbinuptime() and can't change while the callback
is running.  So we can reliably use it instead of sbinuptime() here.
If a race caused the callout to be rescheduled to a later time, the
callback will be called again.

According to profiles, this saves ~5% of the timer interrupt time even
with the fast TSC timecounter.

MFC after:	1 month

(cherry picked from commit 6df1359e55)
2021-11-01 20:24:07 -04:00
Mark Johnston
3388bf06d7 Generalize sanitizer interceptors for memory and string routines
Similar to commit 3ead60236f ("Generalize bus_space(9) and atomic(9)
sanitizer interceptors"), use a more generic scheme for interposing
sanitizer implementations of routines like memcpy().

No functional change intended.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit ec8f1ea8d5)
2021-11-01 10:20:50 -04:00
Mark Johnston
bf0986b742 Generalize bus_space(9) and atomic(9) sanitizer interceptors
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN.  Lay a bit of groundwork for KASAN and KMSAN.

When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h.  These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.

No functional change intended.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 3ead60236f)
2021-11-01 10:16:39 -04:00
Mark Johnston
252b6ae3e6 KASAN: Disable checking before triggering a panic
KASAN hooks will not generate reports if panicstr != NULL, but then
there is a window after the initial panic() call where another report
may be raised.  This can happen if a false positive occurs; to simplify
debugging of such problems, avoid recursing.
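The recursion guard can be sketched as a global switch that the report path clears before panicking, so any report raised during the panic is ignored. This single-threaded sketch uses hypothetical names. fake_panic() stands in for panic() and, unlike panic(), it returns. It simulates a false positive by reporting again while "panicking".

```c
#include <assert.h>
#include <stdbool.h>

static bool san_enabled = true;  /* hypothetical global checking switch */
static int panics;               /* how many times "panic" was entered */

static void san_report(void);

/* Stand-in for panic(): a bad access during panic re-enters the report path. */
static void
fake_panic(void)
{
	panics++;
	san_report();  /* simulated false positive while panicking */
}

static void
san_report(void)
{
	if (!san_enabled)
		return;            /* checking already disabled: do not recurse */
	san_enabled = false;       /* disable checking before panicking */
	fake_panic();
}
```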

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit ea3fbe0707)
2021-11-01 10:07:45 -04:00
Mark Johnston
224a01a342 KASAN: Implement __asan_unregister_globals()
It will be called during KLD unload to unpoison the redzones following
global variables.  Otherwise, virtual address ranges previously used for
a KLD may be left tainted, triggering false positives when they are
recycled.

Reported by:	pho
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 588c7a06df)
2021-11-01 10:07:13 -04:00
Mark Johnston
28c338b342 realloc: Fix KASAN(9) shadow map updates
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA.  This value is "alloc".  Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map.  Do so using the correct size.

Reported by:	kp
Tested by:	kp
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 9a7c2de364)
2021-11-01 10:05:22 -04:00
Mark Johnston
9710b74dd0 malloc: Add state transitions for KASAN
- Reuse some REDZONE bits to keep track of the requested and allocated
  sizes, and use that to provide red zones.
- As in UMA, disable memory trashing to avoid unnecessary CPU overhead.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 06a53ecf24)
2021-11-01 10:03:36 -04:00
Mark Johnston
2748ecec95 execve: Mark exec argument buffers
We cache mapped execve argument buffers to avoid the overhead of TLB
shootdowns.  Mark them invalid when they are freed to the cache.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit f1c3adefd9)
2021-11-01 10:03:28 -04:00
Mark Johnston
75306778f1 vfs: Add KASAN state transitions for vnodes
vnodes are a bit special in that they may exist on per-CPU lists even
while free.  Add a KASAN-only destructor that poisons regions of each
vnode that are not expected to be accessed after a free.

Sponsored by:	The FreeBSD Foundation

(cherry picked from commit b261bb4057)
2021-11-01 10:03:19 -04:00
Mark Johnston
a3d4c8e21d amd64: Implement a KASAN shadow map
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map.  This region is the shadow map.  The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid.  Various kernel allocators call kasan_mark() to update
the shadow map.

Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN.  UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.

The shadow map uses one byte per eight bytes in the kernel map.  In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.

When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map.  kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.
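The shadow-map lookup can be modeled in a few lines. This sketch tracks a hypothetical 64-byte region with one shadow byte per 8-byte granule, as described above. It assumes the usual AddressSanitizer encoding: 0 means the whole granule is valid, 1 to 7 means only that many leading bytes are valid, and a negative value means the granule is poisoned. Offsets stand in for kernel addresses, and mark() is only loosely modeled on the role of kasan_mark().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3                 /* one shadow byte per 8 bytes */
#define GRANULE ((size_t)1 << SHADOW_SCALE_SHIFT)
#define MAP_SIZE 64                          /* hypothetical tracked region */
#define SHADOW_POISON ((int8_t)-1)           /* hypothetical poison code */

static int8_t shadow[MAP_SIZE / GRANULE];

static int8_t *
shadow_of(size_t off)
{
	return (&shadow[off >> SHADOW_SCALE_SHIFT]);
}

/* Mark [off, off + size) valid and poison the rest, up to off + total. */
static void
mark(size_t off, size_t size, size_t total)
{
	assert(off % GRANULE == 0 && total % GRANULE == 0);
	assert(size <= total && off + total <= MAP_SIZE);
	for (size_t i = 0; i < total; i += GRANULE) {
		size_t left = size > i ? size - i : 0;
		*shadow_of(off + i) = left >= GRANULE ? 0 :
		    left > 0 ? (int8_t)left : SHADOW_POISON;
	}
}

/* Check each byte of an access against its granule's shadow byte. */
static bool
access_valid(size_t off, size_t size)
{
	assert(off + size <= MAP_SIZE);
	for (size_t a = off; a < off + size; a++) {
		int8_t s = *shadow_of(a);
		if (s != 0 && (s < 0 || (int8_t)(a & (GRANULE - 1)) >= s))
			return (false);
	}
	return (true);
}
```

For example, after mark(0, 13, 24) a 5-byte read at offset 8 is valid, but a 2-byte read at offset 12 touches byte 13 in the red zone and is rejected.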

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29417

(cherry picked from commit 6faf45b34b)
2021-11-01 09:57:30 -04:00
Mark Johnston
48d2c7cc30 Add the KASAN runtime
KASAN enables the use of LLVM's AddressSanitizer in the kernel.  This
feature makes use of compiler instrumentation to validate memory
accesses in the kernel and detect several types of bugs, including
use-after-frees and out-of-bounds accesses.  It is particularly
effective when combined with test suites or syzkaller.  KASAN has high
CPU and memory usage overhead and so is not suited for production
environments.

The runtime and pmap maintain a shadow of the kernel map to store
information about the validity of memory mapped at a given kernel
address.

The runtime implements a number of functions defined by the compiler
ABI.  These are prefixed by __asan.  The compiler emits calls to
__asan_load*() and __asan_store*() around memory accesses, and the
runtime consults the shadow map to determine whether a given access is
valid.

kasan_mark() is called by various kernel allocators to update state in
the shadow map.  Updates to those allocators will come in subsequent
commits.

The runtime also defines various interceptors.  Some low-level routines
are implemented in assembly and are thus not amenable to compiler
instrumentation.  To handle this, the runtime implements these routines
on behalf of the rest of the kernel.  The sanitizer implementation
validates memory accesses manually before handing off to the real
implementation.
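An interceptor of the kind described above validates its arguments and then calls the real routine. In this sketch, a hypothetical san_check() consults a single tracked buffer instead of a shadow map and counts violations instead of reporting them. san_memcpy() is an illustrative name, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the shadow map: only the first
 * arena_valid bytes of arena are addressable. */
static unsigned char arena[32];
static size_t arena_valid = 16;
static int san_reports;            /* detected invalid accesses */

static void
san_check(const void *p, size_t size)
{
	uintptr_t a = (uintptr_t)p, base = (uintptr_t)arena;

	if (a < base || a >= base + sizeof(arena))
		return;                    /* untracked memory is not validated */
	if (a + size > base + arena_valid)
		san_reports++;             /* a real runtime would report/panic */
}

/* Interceptor: validate both buffers, then hand off to the real memcpy. */
static void *
san_memcpy(void *dst, const void *src, size_t len)
{
	san_check(src, len);
	san_check(dst, len);
	return (memcpy(dst, src, len));
}
```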

The sanitizer in a KASAN-configured kernel can be disabled by setting
the loader tunable debug.kasan.disable=1.

Obtained from:	NetBSD
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 38da497a4d)
2021-11-01 09:56:31 -04:00
Mark Johnston
bb5c81812f timecounter: Lock the timecounter list
Timecounter registration is dynamic, i.e., there is no requirement that
timecounters must be registered during single-threaded boot.  Loadable
drivers may in principle register timecounters (which can be switched to
automatically).  Timecounters cannot be unregistered, though this could
be implemented.

Registered timecounters belong to a global linked list.  Add a mutex to
synchronize insertions and the traversals done by (mpsafe) sysctl
handlers.  No functional change intended.
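The locking pattern amounts to taking one mutex around both list insertion and list traversal. Below is a userspace analogue that uses a POSIX mutex in place of mtx(9). struct tc, tc_register() and tc_count() are hypothetical simplifications of the timecounter code, and tc_lock only borrows its name from the later timecounter commit in this log.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical simplified timecounter entry. */
struct tc {
	const char *name;
	struct tc *next;
};

static struct tc *tc_list;
static pthread_mutex_t tc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Registration may happen at any time, e.g. from a loadable driver. */
static void
tc_register(struct tc *tc)
{
	pthread_mutex_lock(&tc_lock);
	tc->next = tc_list;
	tc_list = tc;
	pthread_mutex_unlock(&tc_lock);
}

/* Traversal, as a sysctl handler listing timecounters might do. */
static int
tc_count(void)
{
	int n = 0;

	pthread_mutex_lock(&tc_lock);
	for (struct tc *t = tc_list; t != NULL; t = t->next)
		n++;
	pthread_mutex_unlock(&tc_lock);
	return (n);
}
```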

Reviewed by:	imp, kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 621fd9dcb2)
2021-11-01 09:20:11 -04:00
Mark Johnston
943421bdf7 signal: Add SIG_FOREACH and refactor issignal()
Add a SIG_FOREACH macro that can be used to iterate over a signal set.
This is a bit cleaner and more efficient than calling sig_ffs() in a
loop.  The implementation is based on BIT_FOREACH_ISSET(), except
that the bitset limbs are always 32 bits wide, and signal sets are
1-indexed rather than 0-indexed like bitset(9) sets.
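The sketch below shows one way to iterate over a signal set word by word, with signal n stored at bit n - 1. SIGSET_WORDS, struct sigset_sketch and SIG_FOREACH_SKETCH are hypothetical. The four 32-bit words are assumed to mirror FreeBSD's sigset_t layout. As with any nested-loop macro, a break in the body exits only the inner loop.

```c
#include <assert.h>
#include <stdint.h>

#define SIGSET_WORDS 4   /* assumption: mirrors FreeBSD's _SIG_WORDS */

struct sigset_sketch {
	uint32_t bits[SIGSET_WORDS];
};

/* 1-based index of the lowest set bit; w must be nonzero. */
static int
sig_lowest_bit(uint32_t w)
{
	int n = 1;

	while ((w & 1) == 0) {
		w >>= 1;
		n++;
	}
	return (n);
}

/* Visit each signal in the set; clear the lowest bit after each visit. */
#define SIG_FOREACH_SKETCH(sig, set)					\
	for (int _i = 0; _i < SIGSET_WORDS; _i++)			\
		for (uint32_t _w = (set)->bits[_i];			\
		    _w != 0 && ((sig) = _i * 32 + sig_lowest_bit(_w), 1); \
		    _w &= _w - 1)
```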

issignal() cannot really be modified to use SIG_FOREACH() directly.
Take this opportunity to split the function into two explicit loops.
I've always found this function hard to read and think that this change
is an improvement.

Remove sig_ffs(), nothing uses it now.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 81f2e9063d)
2021-11-01 09:20:11 -04:00
Gordon Bergling
6ad1c6a826 jail(8): Fix a few common typos in source code comments
- s/phyiscal/physical/

(cherry picked from commit 70de1003da)
2021-10-30 09:48:43 +02:00
Konstantin Belousov
c3c880be15 uipc_shm: silence warnings about write-only variables in largepage code
(cherry picked from commit 3b5331dd8d)
2021-10-27 03:24:41 +03:00
Konstantin Belousov
17c83b7670 sig_ast_checksusp(): mark the local p as __diagused
(cherry picked from commit 3d2778515a)
2021-10-27 03:24:40 +03:00