fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA. In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.
This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
it.
- Avoid redundant shadow stack initialization: thread_alloc()
initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
though vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().
No functional change intended.
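For reference, kasan_thread_alloc() amounts to unpoisoning the new thread's
kernel stack shadow.  A minimal sketch, assuming the usual kasan_mark()
interface and the td_kstack/td_kstack_pages fields (the committed body may
differ):

    void
    kasan_thread_alloc(struct thread *td)
    {
        /* Mark the whole kernel stack as addressable in the shadow map. */
        if (td->td_kstack != 0) {
            kasan_mark((void *)td->td_kstack, ptoa(td->td_kstack_pages),
                ptoa(td->td_kstack_pages), 0);
        }
    }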
Reviewed by: khng
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44891
(cherry picked from commit 800da341bc4a35f4b4d82d104b130825d9a42ffa)
Right now we have the vm.pageout_cpus_per_thread tunable, which controls
how many pageout threads are started per NUMA domain based on the domain's
CPU count, but after booting it is not possible to disable multi-threaded
scanning.
There is at least one workload where this mechanism doesn't work well;
let's make it possible to disable it without a reboot, to simplify
troubleshooting.
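The natural way to expose such a switch is a sysctl backed by a tunable.  A
sketch of the pattern only; the knob name and variable below are
hypothetical, not necessarily what this change adds:

    /* Hypothetical knob: CTLFLAG_RWTUN makes a boot-time tunable that
     * stays writable at runtime. */
    static bool pageout_mt_scan = true;
    SYSCTL_BOOL(_vm, OID_AUTO, pageout_mt_scan, CTLFLAG_RWTUN,
        &pageout_mt_scan, 0,
        "Use multiple threads to scan the paging queues");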
Reviewed by: dougm, kib
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D48377
(cherry picked from commit 55b343f4f9bc586eba5e26a2524a35f04dd60c65)
Readahead/behind pages are handled by the swap pager, but the get_pages
caller is responsible for putting fetched pages into queues (or wiring
them beforehand).
Note that the VM object lock prevents the newly queued page from being
immediately reclaimed in the window before it is marked dirty by
swap_pager_swapoff_object().
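A sketch of the caller-side pattern described above, assuming the standard
pager interface (simplified; the helper name is hypothetical and error
handling is omitted):

    static void
    fetch_and_queue(vm_object_t object, vm_page_t m)
    {
        int rv;

        VM_OBJECT_ASSERT_WLOCKED(object);
        rv = vm_pager_get_pages(object, &m, 1, NULL, NULL);
        if (rv == VM_PAGER_OK) {
            /* Queue the page so the page daemon can find it again. */
            vm_page_activate(m);    /* or vm_page_launder(m) */
            vm_page_xunbusy(m);
        }
    }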
Reported by: pho
Tested by: pho
Reviewed by: dougm, alc, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47526
(cherry picked from commit d11d407aee4835fd50811a5980125bb46748fa0b)
Pages in PQ_UNSWAPPABLE should be considered part of the laundry.
Otherwise, on systems with no swap, the total amount of memory visible
to tools like top(1) decreases.
It doesn't seem very useful to have a dedicated counter for unswappable
pages, and updating applications accordingly would be painful, so just
lump them in with laundry for now.
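A sketch of the accounting idea, assuming the standard per-domain page
queue counters (the real change adjusts the existing laundry reporting
rather than adding a helper):

    static u_int
    laundry_count(struct vm_domain *vmd)
    {
        /* Report unswappable pages as part of the laundry. */
        return (vmd->vmd_pagequeues[PQ_LAUNDRY].pq_cnt +
            vmd->vmd_pagequeues[PQ_UNSWAPPABLE].pq_cnt);
    }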
PR: 280846
Reviewed by: bnovkov, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47216
(cherry picked from commit 6a07e67fb7a8b5687a492d9d70a10651d5933ff5)
When releasing a page reference, we have logic for various cases, based
on the value of the counter. However, the implementation fails to account
for the possibility that the VPRC_BLOCKED flag is set; this flag is ORed
into the counter for short windows while mappings of a page are being
removed. If
the flag is set while the last reference is being released, we may fail
to add the page to a page queue when the last wiring reference is
released.
Fix the problem by performing comparisons with VPRC_BLOCKED masked off.
While here, add a related assertion.
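The gist of the fix, as a sketch (the helper name is hypothetical; the real
logic lives in vm_page.c's reference release path): strip VPRC_BLOCKED
before interpreting the counter.

    static bool
    is_last_reference(vm_page_t m)
    {
        u_int old;

        old = atomic_load_int(&m->ref_count);
        /* VPRC_BLOCKED is transient and is not a reference. */
        return ((old & ~VPRC_BLOCKED) == 1);
    }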
Reviewed by: dougm, kib
Tested by: pho
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46944
(cherry picked from commit c59166e5b4e8821556a3d23af7bd17ca556f2e22)
Take advantage of a nearby 2-byte hole to avoid growing the struct.
This way, only the offsets of "flags" and "pg_color" change. Bump
__FreeBSD_version since some out-of-tree kernel modules may access these
fields, though I haven't found any examples so far.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35905
(cherry picked from commit 9d52823bf1dfac237e58b5208299aaa5e2df42e9)
Make sure that the compiler loads the initial value only once.
Because atomic_fcmpset is used to load the value for subsequent
iterations, this is probably not needed, but we should not rely on that.
I verified that code generated for an amd64 GENERIC kernel does not
change.
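The pattern in question, sketched with a hypothetical helper: one explicit
load up front, after which atomic_fcmpset refreshes the snapshot on each
failed iteration.

    static void
    ref_acquire(vm_page_t m)
    {
        u_int old, new;

        old = atomic_load_int(&m->ref_count);   /* single explicit load */
        do {
            new = old + 1;
        } while (!atomic_fcmpset_int(&m->ref_count, &old, new));
    }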
Reviewed by: dougm, alc, kib
Tested by: pho
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46943
(cherry picked from commit d8b32da2354d2fd72ae017fd63affa3684786e1f)
Add a function like kva_alloc that allows us to specify the alignment
of the virtual address space returned.
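A usage sketch, assuming the new function is named kva_alloc_aligned() (the
caller and sizes are illustrative):

    static int
    alloc_buffer_kva(vm_offset_t *vap)
    {
        vm_offset_t va;

        /* Four pages of KVA, aligned to a four-page boundary. */
        va = kva_alloc_aligned(4 * PAGE_SIZE, 4 * PAGE_SIZE);
        if (va == 0)
            return (ENOMEM);
        *vap = va;
        return (0);
    }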
Reviewed by: alc, kib, markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42788
(cherry picked from commit 839999e7efdc980d5ada92ea93719c7e29765809)
The kernel_arena used in kva_alloc has the qcache disabled. vmem_alloc
will first try to use the qcache before falling back to vmem_xalloc.
Rather than trying to use the qcache via vmem_alloc, just call
vmem_xalloc directly.
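A sketch of the idea using the standard vmem interface (simplified; not the
literal change):

    vm_offset_t
    kva_alloc_sketch(vm_size_t size)
    {
        vmem_addr_t addr;

        size = round_page(size);
        /* Skip the (disabled) quantum cache and go straight to the arena. */
        if (vmem_xalloc(kernel_arena, size, 0, 0, 0, VMEM_ADDR_MIN,
            VMEM_ADDR_MAX, M_BESTFIT | M_NOWAIT, &addr) != 0)
            return (0);
        return (addr);
    }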
Reviewed by: alc, kib, markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42831
(cherry picked from commit 8daee410d2c13b4e8530b00e7877eeecf30bb064)
Fixes: bec000c9c1ef409989685bb03ff0532907befb4a
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 9c5d7e4a0c02bc45b61f565586da2abcc65d70fa)
Make the TLB shootdown function a pointer. By default, it still points to
the system function smp_targeted_tlb_shootdown(). This allows other
implementations to override it in the future.
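Conceptually, this is the replaceable-handler pattern sketched below; the
names and the signature here are illustrative, not the committed interface:

    typedef void tlb_shootdown_fn(cpuset_t mask, pmap_t pmap,
        vm_offset_t sva, vm_offset_t eva);

    static void
    native_tlb_shootdown(cpuset_t mask, pmap_t pmap, vm_offset_t sva,
        vm_offset_t eva)
    {
        /* ... send shootdown IPIs to the CPUs in 'mask' ... */
    }

    /* Default handler; an enlightened hypervisor interface could install
     * its own at attach time. */
    tlb_shootdown_fn *tlb_shootdown_handler = native_tlb_shootdown;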
Reviewed by: kib
Tested by: whu
Authored-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D45174
(cherry picked from commit bec000c9c1ef409989685bb03ff0532907befb4a)
I cannot find a time when the function was not named this.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45383
(cherry picked from commit deab57178f0b06eab56d7811674176985a8ea98d)
The swap pager itself allocates readahead pages, so should take care to
unbusy them after a read error, just as it does in the non-error case.
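Roughly, the error path needs something like the following sketch; the
helper and array names are hypothetical, and the real code walks the
readahead portion of the page run:

    static void
    unbusy_readahead(vm_page_t *ra, int rahead)
    {
        int i;

        /* Pager-allocated readahead pages must not stay busied after a
         * failed read. */
        for (i = 0; i < rahead; i++)
            vm_page_xunbusy(ra[i]);
    }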
PR: 277538
Reviewed by: olce, dougm, alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44646
(cherry picked from commit 4696650782e2e5cf7ae5823f1de04550c05b5b75)
Use u_long for memory accesses instead of uint32_t. In my tests on amd64
this reduces the time spent in those functions by ~30%, thanks to wider
64-bit accesses. i386 still uses 32-bit accesses.
MFC after: 1 month
(cherry picked from commit 7c566d6cfc7bfb913bad89d87386fa21dce8c2e6)
vm_phys_find_freelist_contig is called to search a list of max-sized
free page blocks and find one that, when joined with adjacent blocks
in memory, can satisfy a request for a memory allocation bigger than
any single max-sized free page block. In commit
fa8a6585c7, I defined this function in
order to offer two improvements: 1) reduce the worst-case search time,
and 2) allow solutions that include less-than max-sized free page
blocks at the front or back of the giant allocation. However, it turns
out that this change introduced an error, reported in Bug
274592. That error concerns failing to check segment boundaries. This
change fixes the error in vm_phys_find_freelist_contig and resolves
that bug. It also abandons improvement 2), because the value of that
improvement is small and because preserving it would require more
testing than I am able to do.
PR: 274592
Reported by: shafaisal.us@gmail.com
Reviewed by: alc, markj
Tested by: shafaisal.us@gmail.com
Fixes: fa8a6585c7 vm_phys: avoid waste in multipage allocation
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D42509
(cherry picked from commit 2a4897bd4e1bd8430d955abd3cf6675956bb9d61)
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).
First candidate consumer that comes to mind is 'struct thread', see next
commit.
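Conceptually, the requested alignment is the larger of the cache line size
and the power of two implied by the number of low pointer bits to keep
clear.  A sketch using standard macros; the struct and constant are
placeholders:

    /* Keep the low 5 bits of any 'struct foo *' clear while still getting
     * at least cache-line alignment. */
    #define FOO_PTR_LOW_BITS    5
    struct foo {
        int f_field;
    } __aligned(MAX(CACHE_LINE_SIZE, 1 << FOO_PTR_LOW_BITS));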
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42265
(cherry picked from commit 733e0abd2897289e2acf70f7c72e31a5a560394a)
New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.
Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (that was checking only for a power of
2).
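A sketch matching that description (the committed helper lives in the UMA
code and its assertion messages differ):

    static void
    check_align_mask(unsigned int mask)
    {
        KASSERT(mask <= INT_MAX,
            ("alignment mask %#x does not fit in an int", mask));
        KASSERT(powerof2(mask + 1),
            ("alignment %#x is not a power of two", mask + 1));
    }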
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42263
(cherry picked from commit 87090f5e5a7b927a2ab30878435f6dcba0705a1d)
In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignment
is internally stored as signed integers. Such big values do not make
sense anyway and indicate some programming error. A future commit will
introduce checks for this case and other ones.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42262
(cherry picked from commit 3d8f548b9e5772ff6890bdc01f7ba7b76203857d)
Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask. So suppress it and replace its use with a call to
uma_get_align_mask(). The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42259
(cherry picked from commit e557eafe7233f8231c1f5f5b098e4bab8e818645)
Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.
Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one. Rename the stem to something more explicit. Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.
Hide 'uma_align_cache' to ensure that it cannot be set in any other way
than by a call to uma_set_cache_align_mask(), which will perform sanity
checks in a further commit. While here, rename it to
'uma_cache_align_mask'.
This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42258
(cherry picked from commit dc8f7692fd1de628814f4eaf4a233dccf4c92199)
The loader tunable 'vm.numa.disabled' does not have a corresponding sysctl
MIB entry. Add one so that it can be retrieved, and `sysctl -T` will also
report it correctly.
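The usual pattern is to attach a read-only MIB entry to the tunable with
CTLFLAG_RDTUN.  A sketch; the node, variable, and description below are
assumptions, not the committed names:

    SYSCTL_NODE(_vm, OID_AUTO, numa, CTLFLAG_RD | CTLFLAG_MPSAFE, NULL,
        "NUMA options");
    static int numa_disabled;
    SYSCTL_INT(_vm_numa, OID_AUTO, disabled, CTLFLAG_RDTUN,
        &numa_disabled, 0, "NUMA allocation policies are disabled");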
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit c415cfc8be1b732a80f1ada6d52091e08eeb9ab5)
The loader tunable 'vm.pgcache_zone_max_pcpu' does not have a corresponding
sysctl MIB entry. Add one so that it can be retrieved, and `sysctl -T`
will also report it correctly.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42138
(cherry picked from commit a55fbda874db31b804490567c69502c891b6ff61)
When we disable swapping to a device, we scan the full VM object list
looking for objects with swap trie nodes that reference the device in
question. The pages corresponding to those nodes are paged in.
While paging in, we drop the VM object lock. Moreover, we do not hold a
reference for the object; swap_pager_swapoff_object() merely bumps the
paging-in-progress counter. vm_object_terminate() waits for this
counter to drain before proceeding and freeing pages.
However, swap_pager_swapoff_object() decrements the counter before
re-acquiring the VM object lock, which means that vm_object_terminate()
can race to acquire the lock and free the pages. Then,
swap_pager_swapoff_object() ends up unbusying a freed page. Fix the
problem by acquiring the lock before waking up sleepers.
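In outline, the fix keeps the wakeup under the object lock, roughly as in
this sketch (simplified):

    /* Re-take the object lock before dropping paging-in-progress, so that
     * vm_object_terminate() cannot free pages we still reference. */
    VM_OBJECT_WLOCK(object);
    vm_object_pip_wakeup(object);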
PR: 273610
Reported by: Graham Perrin <grahamperrin@gmail.com>
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42029
(cherry picked from commit e61568aeeec7667789e6c9d4837e074edecc990e)
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
a custom implementation is provided, since we maintain the bitmask of
active CPUs anyway.
Arm64 uses a somewhat naive iteration over CPUs, matching each CPU's
current vmspace's pmap against the argument. It is not guaranteed that
vmspace->pmap is the same as the active pmap, but the inaccuracy should be
tolerable.
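A sketch of the naive arm64-style iteration described above; the function
name and the exact predicate are illustrative:

    cpuset_t
    pmap_active_cpus_sketch(pmap_t pmap)
    {
        cpuset_t cpus;
        struct thread *td;
        int i;

        CPU_ZERO(&cpus);
        CPU_FOREACH(i) {
            td = pcpu_find(i)->pc_curthread;
            if (td != NULL && td->td_proc != NULL &&
                vmspace_pmap(td->td_proc->p_vmspace) == pmap)
                CPU_SET(i, &cpus);
        }
        return (cpus);
    }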
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32360