Commit graph

4831 commits

Mark Johnston
0db4588bbe thread: Simplify sanitizer integration with thread creation
fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA.  In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.

This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
  handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
  it.
- Avoid redundant shadow stack initialization: thread_alloc()
  initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
  though vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().

No functional change intended.

Reviewed by:	khng
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44891

(cherry picked from commit 800da341bc4a35f4b4d82d104b130825d9a42ffa)
2025-02-07 14:46:53 +00:00
Mark Johnston
67e54a07e9 vm_pageout: Add a chicken switch for multithreaded PQ_INACTIVE scanning
Right now we have the vm.pageout_cpus_per_thread tunable, which controls
how many CPUs each page-out worker thread serves in a NUMA domain, but
after booting, it's not possible to disable multi-threaded scanning.

There is at least one workload where this mechanism doesn't work well;
let's make it possible to disable it without a reboot, to simplify
troubleshooting.

Reviewed by:	dougm, kib
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D48377

(cherry picked from commit 55b343f4f9bc586eba5e26a2524a35f04dd60c65)
2025-01-23 13:58:07 +00:00
Mark Johnston
e16a2508e1 vm_pageout: Make vmd_oom a bool
No functional change intended.

Reviewed by:	dougm, kib
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D48376

(cherry picked from commit fe1165df4b776b14b21a04d2ef3fc4c46740c2f5)
2025-01-17 13:18:51 +00:00
Mark Johnston
b64e348055 buf: Add a runningbufclaim() helper
No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D47696

(cherry picked from commit 4efe531c9d50a803a28d001fab9cc3011eb1f587)
2024-12-06 14:51:09 +00:00
Mark Johnston
34182e597b swap_pager: Ensure that swapoff puts swapped-in pages in page queues
Readahead/behind pages are handled by the swap pager, but the get_pages
caller is responsible for putting fetched pages into queues (or wiring
them beforehand).

Note that the VM object lock prevents the newly queued page from being
immediately reclaimed in the window before it is marked dirty by
swap_pager_swapoff_object().

Reported by:	pho
Tested by:	pho
Reviewed by:	dougm, alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D47526

(cherry picked from commit d11d407aee4835fd50811a5980125bb46748fa0b)
2024-11-28 14:38:17 +00:00
Konstantin Belousov
92a9501b6b vm_object: do not assume that un_pager.devp.dev is cdev
PR:	282533

(cherry picked from commit 580340dbdaaf372867e9ed3dd257430982753e5e)
2024-11-13 01:19:18 +02:00
Konstantin Belousov
c57dc755fa device_pager: rename the un_pager.devp.dev field to handle
(cherry picked from commit f0c07fe3d0007a4499f81583a99598cd0a74d45b)
2024-11-13 01:19:18 +02:00
Mark Johnston
1d271ba05f vm_meter: Fix laundry accounting
Pages in PQ_UNSWAPPABLE should be considered part of the laundry.
Otherwise, on systems with no swap, the total amount of memory visible
to tools like top(1) decreases.

It doesn't seem very useful to have a dedicated counter for unswappable
pages, and updating applications accordingly would be painful, so just
lump them in with laundry for now.

PR:		280846
Reviewed by:	bnovkov, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47216

(cherry picked from commit 6a07e67fb7a8b5687a492d9d70a10651d5933ff5)
2024-10-29 13:04:25 +00:00
Mark Johnston
9b42b98638 vm_object: Report laundry pages in kinfo_vmobject
Reviewed by:	bnovkov, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47214

(cherry picked from commit a86373bc93ee1c850943e8585d0d426479378145)
2024-10-29 13:04:25 +00:00
Mark Johnston
b947b53f0f vm_page: Fix a logic bug in vm_page_unwire_managed()
When releasing a page reference, we have logic for various cases, based
on the value of the counter.  But, the implementation fails to take into
account the possibility that the VPRC_BLOCKED flag is set, which is ORed
into the counter for short windows when removing mappings of a page.  If
the flag is set while the last reference is being released, we may fail
to add the page to a page queue when the last wiring reference is
released.

Fix the problem by performing comparisons with VPRC_BLOCKED masked off.
While here, add a related assertion.

Reviewed by:	dougm, kib
Tested by:	pho
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D46944

(cherry picked from commit c59166e5b4e8821556a3d23af7bd17ca556f2e22)
2024-10-26 12:58:50 +00:00
Konstantin Belousov
dfe83ae4da sysctl vm.vm_objects: report cdev name for device-backed objects
(cherry picked from commit d9daa28c364d0b1189ab616d8d697b4c9f748038)
2024-10-15 18:03:59 +03:00
Konstantin Belousov
dd7b445698 sysctl vm.objects: report objects backing posix shm segments
(cherry picked from commit b0b18b57a55b429cf3f625883da5dcb541b14960)
2024-10-15 17:50:17 +03:00
Konstantin Belousov
ec35a9c65d posix shm: mark backing objects with SHM_POSIXSHM flag
(cherry picked from commit a10870ecea813042db7c41e906e1a5c5693f8a34)
2024-10-15 17:50:17 +03:00
Konstantin Belousov
987c8e9afa kinfo_{vmobject,vmentry}: move copy of paths into the vnode handling scope
(cherry picked from commit 71a66883b58f796baf2bf79a43a91c16a71673b3)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
4d5f771c43 kinfo_vmobject: report backing object of the SysV shm segments
(cherry picked from commit 6a3fbdc7e9c8323a1c13c4afcc65f89cb47911e6)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
1ef669ae41 vm_object: add OBJ_SYSVSHM flag to indicate SysV shm backing object
(cherry picked from commit f186252e0d6ef970a23c6af12ec34003df56055d)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
891664589b vm_object: reformat flags definitions
(cherry picked from commit 34935a6b3723422ef27ce4eb80fbe52c3dab12fc)
2024-10-15 17:50:15 +03:00
Mark Johnston
3464b209d6 vm_object: Fix the argument type to vm_object_set_flag()
Reported by:	kib
Fixes:		9d52823bf1df ("vm_object: Widen the flags field")

(cherry picked from commit 7f1dfd6c33dbbb6c1136e987de554c5c5a7d014d)
2024-10-15 13:45:42 +00:00
Mark Johnston
8a5a9dbf38 vm_object: Widen the flags field
Take advantage of a nearby 2-byte hole to avoid growing the struct.
This way, only the offsets of "flags" and "pg_color" change.  Bump
__FreeBSD_version since some out-of-tree kernel modules may access these
fields, though I haven't found any examples so far.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35905

(cherry picked from commit 9d52823bf1dfac237e58b5208299aaa5e2df42e9)
2024-10-15 13:45:28 +00:00
Mark Johnston
94e5ec7f86 vm_page: Use atomic loads for cmpset loops
Make sure that the compiler loads the initial value only once.
Because atomic_fcmpset is used to load the value for subsequent
iterations, this is probably not needed, but we should not rely on that.

I verified that code generated for an amd64 GENERIC kernel does not
change.

Reviewed by:	dougm, alc, kib
Tested by:	pho
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D46943

(cherry picked from commit d8b32da2354d2fd72ae017fd63affa3684786e1f)
2024-10-15 12:39:44 +00:00
Konstantin Belousov
b0e45fea61 vm_page_free_pages_toq(): return the count of freed pages
(cherry picked from commit 1784fb44498da8007fb8cd8ee5060894eb5fe1e6)
2024-10-05 10:08:56 +03:00
Konstantin Belousov
4cb8ec6c6f vm_map: add vm_map_find_locked(9)
(cherry picked from commit 0ecbb28ce351652b3a2dae271eedf1eb3aa65400)
2024-10-05 10:08:54 +03:00
Andrew Turner
bdbb0be043 vm: Add kva_alloc_aligned
Add a function like kva_alloc that allows us to specify the alignment
of the virtual address space returned.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42788

(cherry picked from commit 839999e7efdc980d5ada92ea93719c7e29765809)
2024-09-02 08:43:18 +00:00
Andrew Turner
3736b79f0f vm: Use vmem_xalloc in kva_alloc
The kernel_arena used in kva_alloc has the qcache disabled. vmem_alloc
will first try to use the qcache before falling back to vmem_xalloc.

Rather than trying to use the qcache in vmem_alloc just call
vmem_xalloc directly.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42831

(cherry picked from commit 8daee410d2c13b4e8530b00e7877eeecf30bb064)
2024-09-02 08:43:01 +00:00
Konstantin Belousov
049a256e9a vm_page: add vm_page_clearref() helper
(cherry picked from commit 45cde0e439188589ca2511f6fd76829cbf68267e)
2024-07-21 11:50:29 +03:00
Konstantin Belousov
7a3d7aec41 pmap: move the smp_targeted_tlb_shutdown pointer stuff to amd64 pmap.h
Fixes:	bec000c9c1ef409989685bb03ff0532907befb4a
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 9c5d7e4a0c02bc45b61f565586da2abcc65d70fa)
2024-07-01 13:07:38 +00:00
Souradeep Chakrabarti
840d8e0c30 amd64: add a func pointer to tlb shootdown function
Turn the TLB shootdown function into a function pointer. By default, it
still points to the system function smp_targeted_tlb_shootdown(). This
allows other implementations to override it in the future.

Reviewed by:	kib
Tested by:	whu
Authored-by:    Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D45174

(cherry picked from commit bec000c9c1ef409989685bb03ff0532907befb4a)
2024-07-01 13:03:02 +00:00
Mitchell Horne
227b486de4 Adjust comments referencing vm_mem_init()
I cannot find a time where the function was not named this.

Reviewed by:	kib, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45383

(cherry picked from commit deab57178f0b06eab56d7811674176985a8ea98d)
2024-06-06 11:23:01 -03:00
Konstantin Belousov
f0d6377a3e swap-like pagers: assert that writemapping decrease does not pass zero
(cherry picked from commit 6ada4e8a0ae901f0012015c8d277d80aad7d8f37)
2024-05-19 03:57:54 +03:00
Konstantin Belousov
4018bcdea8 cdev_pager_allocate(): ensure that the cdev_pager_ops ctr is called only once
PR:	278826

(cherry picked from commit e93404065177d6c909cd64bf7d74fe0d8df35edf)
2024-05-19 03:57:54 +03:00
Minsoo Choo
1f85f06276 vm_reserv_reclaim_contig: Return NULL not false
Reviewed by:	dougm, zlei
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44667

(cherry picked from commit 989a2cf19d053954f0bad28790114a374b05c9c1)
2024-04-17 10:33:26 -04:00
Mark Johnston
5128c8948b swap_pager: Unbusy readahead pages after an I/O error
The swap pager itself allocates readahead pages, so should take care to
unbusy them after a read error, just as it does in the non-error case.

PR:		277538
Reviewed by:	olce, dougm, alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44646

(cherry picked from commit 4696650782e2e5cf7ae5823f1de04550c05b5b75)
2024-04-15 10:05:13 -04:00
Konstantin Belousov
8788c3d3fa sysctl vm.objects/vm.swap_objects: do not fill vnode info if jailed
(cherry picked from commit 38f5f2a4af5daeec7f13d39cad1ff4dc90da52d8)
2024-01-24 15:04:07 +02:00
Konstantin Belousov
e23813c0fe vm/vm_object.c: minor cleanup
(cherry picked from commit 69748e62e82a1f5ef77fd3e1b0c9d7e6a89d22b2)
2024-01-20 02:32:20 +02:00
Konstantin Belousov
64e869e9b9 Add vnode_pager_clean_{a,}sync(9)
(cherry picked from commit b068bb09a1a82d9fef0e939ad6135443a959e290)
2024-01-18 02:51:33 +02:00
Konstantin Belousov
c4c138072a vnode_pager_generic_putpages(): rename maxblksz local to max_offset
(cherry picked from commit ed1a88a3116a59b4fd37912099a575b4c8f559dc)
2024-01-18 02:51:33 +02:00
Konstantin Belousov
8ecd7bfd6c vnode_pager_generic_putpages(): correctly handle clean block at EOF
PR:	276191

(cherry picked from commit bdb46c21a3e68d4395d6e0b6a205187e655532b0)
2024-01-18 02:51:32 +02:00
Alexander Motin
58f5c260a2 uma: Micro-optimize memory trashing
Use u_long for memory accesses instead of uint32_t.  In my tests on
amd64, this reduces time spent in those functions by ~30% thanks to
the wider 64-bit accesses.  i386 still uses 32-bit accesses.

MFC after:	1 month

(cherry picked from commit 7c566d6cfc7bfb913bad89d87386fa21dce8c2e6)
2023-12-08 21:32:43 -05:00
Doug Moore
210fce73ae vm_phys: fix freelist_contig
vm_phys_find_freelist_contig is called to search a list of max-sized
free page blocks and find one that, when joined with adjacent blocks
in memory, can satisfy a request for a memory allocation bigger than
any single max-sized free page block. In commit
fa8a6585c7, I defined this function in
order to offer two improvements: 1) reduce the worst-case search time,
and 2) allow solutions that include less-than max-sized free page
blocks at the front or back of the giant allocation. However, it turns
out that this change introduced an error, reported in Bug 274592:
the function failed to check segment boundaries. This change fixes
that error in vm_phys_find_freelist_contig, resolving the bug. It
also abandons improvement 2), because the value of that
improvement is small and because preserving it would require more
testing than I am able to do.

PR:		274592
Reported by:	shafaisal.us@gmail.com
Reviewed by:	alc, markj
Tested by:	shafaisal.us@gmail.com
Fixes:	fa8a6585c7 vm_phys: avoid waste in multipage allocation
MFC after:	10 days
Differential Revision:	https://reviews.freebsd.org/D42509

(cherry picked from commit 2a4897bd4e1bd8430d955abd3cf6675956bb9d61)
2023-11-24 19:19:05 -06:00
Olivier Certner
25e0e25afd uma: Permit specifying max of cache line and some custom alignment
To be used for structures for which we want to enforce that pointers to
them have some number of lower bits always set to 0, while still
ensuring we benefit from cache line alignment to avoid false sharing
between structures and fields within the structures (provided they are
properly ordered).

First candidate consumer that comes to mind is 'struct thread', see next
commit.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42265

(cherry picked from commit 733e0abd2897289e2acf70f7c72e31a5a560394a)
2023-11-16 10:07:18 -05:00
Olivier Certner
7deedba4e5 uma: New check_align_mask(): Validate alignments (INVARIANTS)
New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.

Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (that was checking only for a power of
2).

Reviewed by:            kib, markj
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42263

(cherry picked from commit 87090f5e5a7b927a2ab30878435f6dcba0705a1d)
2023-11-16 10:07:17 -05:00
Olivier Certner
690ca45aeb uma: Make the cache alignment mask unsigned
In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignment
is internally stored as signed integers.  Such big values do not make
sense anyway and indicate some programming error.  A future commit will
introduce checks for this case and other ones.

Reviewed by:            kib, markj
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42262

(cherry picked from commit 3d8f548b9e5772ff6890bdc01f7ba7b76203857d)
2023-11-16 10:07:16 -05:00
Olivier Certner
c98dded0c7 uma: UMA_ALIGN_CACHE: Resolve the proper value at use point
Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask.  So suppress it and replace its use with a call to
uma_get_align_mask().  The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42259

(cherry picked from commit e557eafe7233f8231c1f5f5b098e4bab8e818645)
2023-11-16 10:07:11 -05:00
Olivier Certner
4587326893 uma: Hide 'uma_align_cache'; Create/rename accessors
Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.

Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one.  Rename the stem to something more explicit.  Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.

Hide 'uma_align_cache' to ensure that it cannot be set in any other way
than by a call to uma_set_cache_align_mask(), which will perform sanity
checks in a further commit.  While here, rename it to
'uma_cache_align_mask'.

This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.

Reviewed by:            markj, kib
MFC after:              2 weeks
Sponsored by:           The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D42258

(cherry picked from commit dc8f7692fd1de628814f4eaf4a233dccf4c92199)
2023-11-16 10:07:07 -05:00
Zhenlei Huang
e26b7e8d02 vm_phys: Add corresponding sysctl knob for loader tunable
The loader tunable 'vm.numa.disabled' does not have corresponding sysctl
MIB entry. Add it so that it can be retrieved, and `sysctl -T` will also
report it correctly.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42138

(cherry picked from commit c415cfc8be1b732a80f1ada6d52091e08eeb9ab5)
2023-10-19 22:00:57 +08:00
Zhenlei Huang
cb5bc8a748 vm_page: Add corresponding sysctl knob for loader tunable
The loader tunable 'vm.pgcache_zone_max_pcpu' does not have corresponding
sysctl MIB entry. Add it so that it can be retrieved, and `sysctl -T`
will also report it correctly.

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42138

(cherry picked from commit a55fbda874db31b804490567c69502c891b6ff61)
2023-10-19 22:00:57 +08:00
Gordon Bergling
44e3ce37f2 uma.h: Fix a typo in a source code comment
- s/setable/settable/

(cherry picked from commit fc9f1d2c6391b1a4b133aab56ace625b72c9ea85)
2023-10-18 07:57:16 +02:00
Mark Johnston
aa229a59ad swap_pager: Fix a race in swap_pager_swapoff_object()
When we disable swapping to a device, we scan the full VM object list
looking for objects with swap trie nodes that reference the device in
question.  The pages corresponding to those nodes are paged in.

While paging in, we drop the VM object lock.  Moreover, we do not hold a
reference for the object; swap_pager_swapoff_object() merely bumps the
paging-in-progress counter.  vm_object_terminate() waits for this
counter to drain before proceeding and freeing pages.

However, swap_pager_swapoff_object() decrements the counter before
re-acquiring the VM object lock, which means that vm_object_terminate()
can race to acquire the lock and free the pages.  Then,
swap_pager_swapoff_object() ends up unbusying a freed page.  Fix the
problem by acquiring the lock before waking up sleepers.

PR:		273610
Reported by:	Graham Perrin <grahamperrin@gmail.com>
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D42029

(cherry picked from commit e61568aeeec7667789e6c9d4837e074edecc990e)
2023-10-08 20:41:35 -04:00
Konstantin Belousov
8882b7852a add pmap_active_cpus()
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.

Arm64 uses a somewhat naive iteration over CPUs, matching each CPU's
current vmspace's pmap against the argument. It is not guaranteed that
vmspace->pmap is the same as the active pmap, but the inaccuracy should
be tolerable.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32360
2023-08-23 03:02:21 +03:00
Konstantin Belousov
5f452214f2 vm_map.c: fix syntax
Fixes:	c718009884
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-08-18 16:37:16 +03:00