When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
After a round of cleanups in late 2020, all definitions are
functionally identical.
This removes a rotted __aligned(8) on arm. It was added in
b7112ead32 and was intended to align the
args member so that 64-bit types (off_t, etc) could be safely read on
armeb compiled with clang. With the removal of armev, this is no
longer needed (armv7 requires that 32-bit aligned reads of 64-bit
values be supported and we enable such support on armv6). As further
evidence this is unnecessary, cleanups to struct syscall_args have
resulted in args being 32-bit aligned on 32-bit systems. The sole
effect is to bloat the struct by 4 bytes.
Reviewed by: kib, jhb, imp
Differential Revision: https://reviews.freebsd.org/D33308
The headers were mostly identical on amd64 and i386.
No functional change intended.
Reviewed by: cperciva, mav, imp, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33205
It was never implemented on powerpc or riscv and appears to have been
unused since it was added in 1998. No functional change intended.
Reviewed by: kp, glebius, cy
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33093
It is useful for quickly checking an address against the DMAP region.
These definitions exist already on arm64 and riscv.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D32962
We use pmap_invalidate_cpu_mask() to get the set of active CPUs. This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.
Avoid this copying by having smp_targeted_tlb_shootdown() make a local
copy of the active CPUs for the pmap, and drop the cpuset parameter,
simplifying callers. Also leverage the use of the non-destructive
CPU_FOREACH_ISSET to avoid unneeded copying within
smp_targeted_tlb_shootdown().
Reviewed by: alc, kib
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32792
This is in preference to simply filling the cpuset, and allows the
conditional in pmap_invalidate_cpu_mask() to be elided.
Also export pmap_invalidate_cpu_mask() outside of pmap.c for use in a
subsequent commit.
Suggested by: kib
Reviewed by: alc, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32792
When working on the ports these functions were slightly different, but
now there's no reason for them to be separate.
No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Similar to pmap_page_set_memattr() by setting MD page cache attribute
to the argument. Unlike pmap_page_set_memattr(), does not flush cache
for the direct mapping of the page.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
This reverts commit 0f6829488e.
Also it changes the type of md_usr_fpu_save struct mdthread member
to void *, which is what uncovered this trouble. Now the save area
is untyped, but since it is hidden behind accessors, it is not too
significant. Since apparently there are consumers affected outside
the tree, this hack is better than one from the reverted revision.
PR: 258678
Reported by: cy
Reviewed by: cy, kevans, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32060
For signal send, copyout from the user FPU save area directly.
For sigreturn, we are in sleepable context and can do temporal
allocation of the transient save area. We cannot copying from userspace
directly to user save area because XSAVE state needs to be validated,
also partial copyins can corrupt it.
Requested by: jhb
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
Instead do one more allocation at the thread creation time. This frees
a lot of space on the stack.
Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn(). This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.
A useful experiment now would be to reduce KSTACK_PAGES.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31954
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.
Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31416
Add a variant of 'rdtscp()' that retains and returns the 'IA32_TSC_AUX'
value read by 'rdtscp'.
Sponsored By: Juniper Networks, Inc.
Sponsored By: Klara, Inc.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D31415
Interrupt and exception handlers must call kmsan_intr_enter() prior to
calling any C code. This is because the KMSAN runtime maintains some
TLS in order to track initialization state of function parameters and
return values across function calls. Then, to ensure that this state is
kept consistent in the face of asynchronous kernel-mode excpeptions, the
runtime uses a stack of TLS blocks, and kmsan_intr_enter() and
kmsan_intr_leave() push and pop that stack, respectively.
Use these functions in amd64 interrupt and exception handlers. Note
that handlers for user->kernel transitions need not be annotated.
Also ensure that trap frames pushed by the CPU and by handlers are
marked as initialized before they are used.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31467
- During boot, allocate PDP pages for the shadow maps. The region above
KERNBASE is currently not shadowed.
- Create a dummy shadow for the vm page array. For now, this array is
not protected by the shadow map to help reduce kernel memory usage.
- Grow shadows when growing the kernel map.
- Increase the default kernel stack size when KMSAN is enabled. As with
KASAN, sanitizer instrumentation appears to create stack frames large
enough that the default value is not sufficient.
- Disable UMA's use of the direct map when KMSAN is configured. KMSAN
cannot validate the direct map.
- Disable unmapped I/O when KMSAN configured.
- Lower the limit on paging buffers when KMSAN is configured. Each
buffer has a static MAXPHYS-sized allocation of KVA, which in turn
eats 2*MAXPHYS of space in the shadow map.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295
KMSAN enables the use of LLVM's MemorySanitizer in the kernel. This
enables precise detection of uses of uninitialized memory. As with
KASAN, this feature has substantial runtime overhead and is intended to
be used as part of some automated testing regime.
The runtime maintains a pair of shadow maps. One is used to track the
state of memory in the kernel map at bit-granularity: a bit in the
kernel map is initialized when the corresponding shadow bit is clear,
and is uninitialized otherwise. The second shadow map stores
information about the origin of uninitialized regions of the kernel map,
simplifying debugging.
KMSAN relies on being able to intercept certain functions which cannot
be instrumented by the compiler. KMSAN thus implements interceptors
which manually update shadow state and in some cases explicitly check
for uninitialized bytes. For instance, all calls to copyout() are
subject to such checks.
The runtime exports several functions which can be used to verify the
shadow map for a given buffer. Helpers provide the same functionality
for a few structures commonly used for I/O, such as CAM CCBs, BIOs and
mbufs. These are handy when debugging a KMSAN report whose
proximate and root causes are far away from each other.
Obtained from: NetBSD
Sponsored by: The FreeBSD Foundation
KMSAN requires two shadow maps, each one-to-one with the kernel map.
Allocate regions of the kernels PML4 page for them. Add functions to
create mappings in the shadow map regions, these will be used by the
KMSAN runtime.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31295
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
Current expression checks that vm_page_alloc(9) never returns a page
belonging to the preload area. This is not true if something was freed
from there, for instance a preloaded module was unloaded, or ucode update
freed.
Only check that we never allow to allocate a page belonging to the kernel
proper, check against _end.
Reported and tested by: dhw
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
which is the place to put MD asserts about allocated pages.
On amd64, verify that allocated page does not belong to the kernel
(text, data) or early allocated pages.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
Allow any 2M aligned contiguous location below 4G for the staging
area location. It should still be mapped by loader at KERNBASE.
The assumption kernel makes about loader->kernel handoff with regard to
the MMU programming are explicitly listed at the beginning of hammer_time(),
where kernphys is calculated. Now kernphys is the variable instead of
symbol designating the physical address.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
KASAN and KCSAN implement interceptors for various primitive operations
that are not instrumented by the compiler. KMSAN requires them as well.
Rather than adding new cases for each sanitizer which requires
interceptors, implement the following protocol:
- When interceptor definitions are required, define
SAN_NEEDS_INTERCEPTORS and SANITIZER_INTERCEPTOR_PREFIX.
- In headers that declare functions which need to be intercepted by a
sanitizer runtime, use SANITIZER_INTERCEPTOR_PREFIX to provide
declarations.
- When SAN_RUNTIME is defined, do not redefine the names of intercepted
functions. This is typically the case in files which implement
sanitizer runtimes but is also needed in, for example, files which
define ifunc selectors for intercepted operations.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Stop using temporal page table with 1:1 mapping of low 1G populated over
the whole VA. Use 1:1 mapping of low 4G temporarily installed in the
normal kernel page table.
The features are:
- now there is one less step for startup asm to perform
- the startup code still needs to be at lower 1G because CPU starts in
real mode. But everything else can be located anywhere in low 4G
because it is accessed by non-paged 32bit protected mode. Note that
kernel page table root page is at low 4G, as well as the kernel itself.
- the page table pages can be allocated by normal allocator, there is
no need to carve them from the phys_avail segments at very early time.
The allocation of the page for startup code still requires some magic.
Pages are freed after APs are ignited.
- la57 startup for APs is less tricky, we directly load the final page
table and do not need to tweak the paging mode.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
PVHv1 was officially removed from Xen in 4.9, so just axe the related
code from FreeBSD.
Note FreeBSD supports PVHv2, which is the replacement for PVHv1.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228
- Initialize KASAN before executing SYSINITs.
- Add a GENERIC-KASAN kernel config, akin to GENERIC-KCSAN.
- Increase the kernel stack size if KASAN is enabled. Some of the
ASAN instrumentation increases stack usage and it's enough to
trigger stack overflows in ZFS.
- Mark the trapframe as valid in interrupt handlers if it is
assigned to td_intr_frame. Otherwise, an interrupt in a function
which creates a poisoned alloca region can trigger false positives.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29455
The idea behind KASAN is to use a region of memory to track the validity
of buffers in the kernel map. This region is the shadow map. The
compiler inserts calls to the KASAN runtime for every emitted load
and store, and the runtime uses the shadow map to decide whether the
access is valid. Various kernel allocators call kasan_mark() to update
the shadow map.
Since the shadow map tracks only accesses to the kernel map, accesses to
other kernel maps are not validated by KASAN. UMA_MD_SMALL_ALLOC is
disabled when KASAN is configured to reduce usage of the direct map.
Currently we have no mechanism to completely eliminate uses of the
direct map, so KASAN's coverage is not comprehensive.
The shadow map uses one byte per eight bytes in the kernel map. In
pmap_bootstrap() we create an initial set of page tables for the kernel
and preloaded data.
When pmap_growkernel() is called, we call kasan_shadow_map() to extend
the shadow map. kasan_shadow_map() uses pmap_kasan_enter() to allocate
memory for the shadow region and map it.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29417
This is intended to be used with memory mapped IO, e.g. from
bus_space_map with no flags, or pmap_mapdev.
Use this new memory type in the map request configured by
resource_init_map_request, and in pciconf.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29692
On some systems (e.g. Lenovo ThinkPad X240, Apple MacBookPro12,1)
the SMBIOS entry point is not found in the <0xFFFFF space.
Follow the SMBIOS spec and use the EFI Configuration Table for
locating the entry point on EFI systems.
Reviewed by: rpokala, dab
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29276
The remote protocol allows for implementations to report more specific
reasons for the break in execution back to the client [1]. This is
entirely optional, so it is only implemented for amd64, arm64, and i386
at the moment.
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/Stop-Reply-Packets.html
Reviewed by: jhb
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 51
Differential Revision: https://reviews.freebsd.org/D29174
Add wrappers around the dbreg interface that can be consumed by MI
kernel debugger code. The dbreg functions themselves are updated to
return error codes, not just -1. dbreg_set_watchpoint() is extended to
accept access bits as an argument.
Reviewed by: jhb, kib, markj
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D29155
Make it easy to define interceptors for new sanitizer runtimes, rather
than assuming KCSAN. Lay a bit of groundwork for KASAN and KMSAN.
When a sanitizer is compiled in, atomic(9) and bus_space(9) definitions
in atomic_san.h are used by default instead of the inline
implementations in the platform's atomic.h. These definitions are
implemented in the sanitizer runtime, which includes
machine/{atomic,bus}.h with SAN_RUNTIME defined to pull in the actual
implementations.
No functional change intended.
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
e4b8deb222 removed the last in-tree uses of PCPU_INC(). Its
potential benefit is also practically nonexistent. Non-x86
platforms already implement it as PCPU_ADD(..., 1), and according
to [0] there are no recent x86 processors for which the 'inc'
instruction provides a performance benefit over the equivalent
memory-operand form of the 'add' instruction. The only remaining
benefit of 'inc' is smaller instruction size, which in this case
is inconsequential given the limited number of per-CPU data consumers.
[0]: https://www.agner.org/optimize/instruction_tables.pdf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29308
We want to allow the UEFI firmware to enumerate and assign
addresses to PCI devices so we can boot from NVMe[1]. Address
assignment of PCI BARs is properly handled by the PCI emulation
code in general, but a few specific cases need additional support.
fbuf and passthru map additional objects into the guest physical
address space and so need to handle address updates. Here we add a
callback to emulated PCI devices to inform them of a BAR
configuration change. fbuf and passthru then watch for these BAR
changes and relocate the frame buffer memory segment and passthru
device mmio area respectively.
We also add new VM_MUNMAP_MEMSEG and VM_UNMAP_PPTDEV_MMIO ioctls
to vmm(4) to facilitate the unmapping needed for addres updates.
[1]: https://github.com/freebsd/uefi-edk2/pull/9/
Originally by: scottph
MFC After: 1 week
Sponsored by: Intel Corporation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D24066
This change converts most of the counters in the amd64 pmap from
global atomics to scalable counter(9) counters. Per discussion
with kib@, it also removes the handrolled per-CPU PCID save count
as it isn't considered generally useful.
The bulk of these counters remain guarded by PV_STATS, as it seems
unlikely that they will be useful outside of very specific debugging
scenarios. However, this change does add two new counters that
are available without PV_STATS. pt_page_count and pv_page_count
track the number of active physical-to-virtual list pages and page
table pages, respectively. These will be useful in evaluating
the memory footprint of pmap structures under various workloads,
which will help to guide future changes in this area.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28923
Other kernel sanitizers (KMSAN, KASAN) require interceptors as well, so
put these in a more generic place as a step towards importing the other
sanitizers.
No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29103
In some cases the DELAY implementation on amd64 can recurse on a spin
mutex in the i8254 early delay code. Detect when this is going to
happen and don't call delay in this case. It is safe to not delay here
with the only issue being KCSAN may not detect data races.
Reviewed by: kib
Tested by: arichardson
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28895