Commit graph

4844 commits

Jason A. Harmening
64f4e2bdf5 Avoid waiting on physical allocations that can't possibly be satisfied
- Change vm_page_reclaim_contig[_domain] to return an errno instead
  of a boolean.  0 indicates a successful reclaim, ENOMEM indicates
  lack of available memory to reclaim, with any other error (currently
  only ERANGE) indicating that reclamation is impossible for the
  specified address range.  Change all callers to only follow
  up with vm_page_wait* in the ENOMEM case.

- Introduce vm_domainset_iter_ignore(), which marks the specified
  domain as unavailable for further use by the iterator.  Use this
  function to ignore domains that can't possibly satisfy a physical
  allocation request.  Since WAITOK allocations run the iterators
  repeatedly, this avoids the possibility of infinitely spinning
  in domain iteration if no available domain can satisfy the
  allocation request.

PR:		274252
Reported by:	kevans
Tested by:	kevans
Reviewed by:	markj
Differential Revision: https://reviews.freebsd.org/D42706

(cherry picked from commit 2619c5ccfe1f7889f0241916bd17d06340142b05)

MFCed as a prerequisite for further MFC of VM domainset changes.  Based
on analysis, it would not hurt, and I have been using it in production
for months now.

Resolved the trivial conflict due to commit 718d1928f874 ("LinuxKPI:
make linux_alloc_pages() honor __GFP_NORETRY") having been MFCed before
this one.
2025-10-23 08:18:48 +02:00
Mark Johnston
97a057f168 vm_pageout: Disallow invalid values for act_scan_laundry_weight
PR:		234167
MFC after:	2 weeks
Approved by:	re (cperciva)

(cherry picked from commit d8b03c5904faff84656d3a84a25c2b37bcbf8075)
(cherry picked from commit 098e4ecd65492bd23f88f4358f0c6bde13a1e114)
2025-05-03 20:22:08 +00:00
Mark Johnston
21ea2ef51c vm_object: Make a comment more clear
Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D49675

(cherry picked from commit da05ca9ab655272569f4af99c86d2aff97a0d2ab)
2025-04-24 13:20:52 +00:00
Mark Johnston
afbe2bb9e9 vm_object: Fix handling of wired map entries in vm_object_split()
Suppose a vnode is mapped with PROT_READ and MAP_PRIVATE, mlock() is
called on the mapping, and then the vnode is truncated such that the
last page of the mapping becomes invalid.  The now-invalid page will be
unmapped, but stays resident in the VM object to preserve the invariant
that a range of pages mapped by a wired map entry is always resident.
This invariant is checked by vm_object_unwire(), for example.

Then, suppose that the mapping is upgraded to PROT_READ|PROT_WRITE.  We
will copy the invalid page into a new anonymous VM object.  If the
process then forks, vm_object_split() may then be called on the object.
Upon encountering an invalid page, rather than moving it into the
destination object, it is removed.  However, this is wrong when the
entry is wired, since the invalid page's wiring belongs to the map
entry; this behaviour also violates the invariant mentioned above.

Fix this by moving invalid pages into the destination object if the map
entry is wired.  In this case we must not dirty the page, so add a flag
to vm_page_iter_rename() to control this.

Reported by:	syzkaller
Reviewed by:	dougm, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D49443

(cherry picked from commit 43c1eb894a57ef30562a02708445c512610d4f02)
2025-04-18 13:53:55 +00:00
Olivier Certner
f3983aeb62 vm_page_startup(): Clarify memory lowest, highest and size computation
Change the comment before this block of code, and separate the latter
from the preceding one by an empty line.

Move the loop on phys_avail[] to compute the minimum and maximum memory
physical addresses closer to the initialization of 'low_avail' and
'high_avail', so that it's immediately clear why the loop starts at
2 (and remove the related comment).

While here, fuse the additional loop in the VM_PHYSSEG_DENSE case that
is used to compute the exact physical memory size.

This change suppresses one occurrence of detecting whether at least one
of VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE is defined at compile time, but
there is still another one in PHYS_TO_VM_PAGE().

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48632

(cherry picked from commit 16317a174a5288f0377f8d40421b5c7821d57ac2)
2025-04-08 15:38:22 +02:00
Olivier Certner
6230945b88 vm_phys_early_startup(): Panic if phys_avail[] is empty
Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48631

(cherry picked from commit 32e77bcdec5c034a9252876aa018f0bf34b36dbc)
2025-04-08 15:38:22 +02:00
Olivier Certner
25e25d6f1b vm_phys_avail_split(): Tolerate split requests at boundaries
Previously, such requests would lead to a panic.  The only caller so far
(vm_phys_early_startup()) actually faces the case where some address can
be one of the chunk's boundaries and has to test it by hand.  Moreover,
a later commit will introduce vm_phys_early_alloc_ex(), which will also
have to deal with such boundary cases.

Consequently, make this function handle boundaries by not splitting the
chunk and returning EJUSTRETURN instead of 0 to distinguish this case
from the "was split" result.

While here, expand the panic message when the address to split is not in
the passed chunk with available details.

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48630

(cherry picked from commit e1499bfff8b8c128d7b3d330f95e0c67d7c1fa77)
2025-04-08 15:38:21 +02:00
Olivier Certner
7658c4f8c7 vm_phys_avail_count(): Fix out-of-bounds accesses
On improper termination of phys_avail[] (two consecutive 0 starting at
an even index), this function would (unnecessarily) continue searching
for the termination markers even if the index was out of bounds.

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48629

(cherry picked from commit 291b7bf071e8b50f2b7877213b2d3307ae5d3e38)
2025-04-08 15:38:21 +02:00
Olivier Certner
f106887cb0 vm_phys: Check for overlap when adding a segment
Segments are passed by machine-dependent routines, so explicit checks
will make debugging much easier on very weird machines or when someone
is tweaking these machine-dependent routines.  Additionally, this
operation is not performance-sensitive.

For the same reasons, test that we don't reach the maximum number of
physical segments (the compile-time size of the internal storage) in
production kernels (replaces the existing KASSERT()).

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48628

(cherry picked from commit 8a14ddcc1d8e4384d8ad77c5536c916c6e9a7d65)
2025-04-08 15:38:21 +02:00
Olivier Certner
088dd40169 vm_phys_add_seg(): Check for bad segments, allow empty ones
A bad specification is if 'start' is strictly greater than 'end', or
bounds are not page aligned.

The latter was already tested under INVARIANTS, but now will be also on
production kernels.  The reason is that vm_phys_early_startup() pours
early segments into the final phys_segs[] array via vm_phys_add_seg(),
but vm_phys_early_add_seg() did not check their validity.  Checking
segments once and for all in vm_phys_add_seg() avoids duplicating
validity tests and is possible since early segments are not used before
being poured into phys_segs[].  Finally, vm_phys_add_seg() is not
performance critical.

Allow empty segments and discard them (silently, unless 'bootverbose' is
true), as vm_page_startup() was testing for this case before calling
vm_phys_add_seg(), and the same test was warranted in
vm_phys_early_startup() before it calls vm_phys_add_seg().  As a
consequence, remove the empty segment test from vm_page_startup().

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48627

(cherry picked from commit f30309abcce4cec891413da5cba2db92dd6ab0d7)
2025-04-08 15:38:21 +02:00
Olivier Certner
9dc47f536d vm_phys_avail_check(): Check index parity, fix panic messages
The passed index must be the start of a chunk in phys_avail[], so must
be even.  Test for that and print a separate panic message.

While here, fix panic messages: In one, the wrong chunk boundary was
printed, and in another, the desired but not the actual condition was
printed, possibly leading to confusion.

Reviewed by:    markj
MFC after:      1 week
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D48626

(cherry picked from commit 125ef4e041fed40fed2d00b0ddd90fa0eb7b6ac3)
2025-04-08 15:38:20 +02:00
Konstantin Belousov
e2af76d470 Add sysctl kern.proc.kqueue
(cherry picked from commit e60f608eb9cf3b38099948545934d699de9bbcea)
2025-04-07 04:28:20 +03:00
Mark Johnston
02324ae827 uma: Avoid excessive per-CPU draining
After commit 389a3fa693, uma_reclaim_domain(UMA_RECLAIM_DRAIN_CPU)
calls uma_zone_reclaim_domain(UMA_RECLAIM_DRAIN_CPU) twice on each zone
in addition to globally draining per-CPU caches. This was unintended
and is unnecessarily slow; in particular, draining per-CPU caches
requires binding to each CPU.

Stop draining per-CPU caches when visiting each zone, just do it once in
pcpu_cache_drain_safe() to minimize the amount of expensive sched_bind()
calls.

Fixes:		389a3fa693 ("uma: Add UMA_ZONE_UNMANAGED")
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	NetApp, Inc.
Reviewed by:	gallatin, kib
Differential Revision:	https://reviews.freebsd.org/D49349

(cherry picked from commit f506d5af50fccc37f5aa9fe090e9a0d5f05506c8)
2025-03-31 18:35:33 +00:00
Mark Johnston
0db4588bbe thread: Simplify sanitizer integration with thread creation
fork() may allocate a new thread in one of two ways: from UMA, or cached
in a freed proc that was just allocated from UMA.  In either case, KASAN
and KMSAN need to initialize some state; in particular they need to
initialize the shadow mapping of the new thread's stack.

This is done differently between KASAN and KMSAN, which is confusing.
This patch improves things a bit:
- Add a new thread_recycle() function, which moves all kernel stack
  handling out of kern_fork.c, since it doesn't really belong there.
- Then, thread_alloc_stack() has only one local caller, so just inline
  it.
- Avoid redundant shadow stack initialization: thread_alloc()
  initializes the KMSAN shadow stack (via kmsan_thread_alloc()) even
  through vm_thread_new() already did that.
- Add kasan_thread_alloc(), for consistency with kmsan_thread_alloc().

No functional change intended.

Reviewed by:	khng
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44891

(cherry picked from commit 800da341bc4a35f4b4d82d104b130825d9a42ffa)
2025-02-07 14:46:53 +00:00
Mark Johnston
67e54a07e9 vm_pageout: Add a chicken switch for multithreaded PQ_INACTIVE scanning
Right now we have the vm.pageout_cpus_per_thread tunable which controls
the number of threads to start up per CPU per NUMA domain, but after
booting, it's not possible to disable multi-threaded scanning.

There is at least one workload where this mechanism doesn't work well;
let's make it possible to disable it without a reboot, to simplify
troubleshooting.

Reviewed by:	dougm, kib
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D48377

(cherry picked from commit 55b343f4f9bc586eba5e26a2524a35f04dd60c65)
2025-01-23 13:58:07 +00:00
Mark Johnston
e16a2508e1 vm_pageout: Make vmd_oom a bool
No functional change intended.

Reviewed by:	dougm, kib
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D48376

(cherry picked from commit fe1165df4b776b14b21a04d2ef3fc4c46740c2f5)
2025-01-17 13:18:51 +00:00
Mark Johnston
b64e348055 buf: Add a runningbufclaim() helper
No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D47696

(cherry picked from commit 4efe531c9d50a803a28d001fab9cc3011eb1f587)
2024-12-06 14:51:09 +00:00
Mark Johnston
34182e597b swap_pager: Ensure that swapoff puts swapped-in pages in page queues
Readahead/behind pages are handled by the swap pager, but the get_pages
caller is responsible for putting fetched pages into queues (or wiring
them beforehand).

Note that the VM object lock prevents the newly queued page from being
immediately reclaimed in the window before it is marked dirty by
swap_pager_swapoff_object().

Reported by:	pho
Tested by:	pho
Reviewed by:	dougm, alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D47526

(cherry picked from commit d11d407aee4835fd50811a5980125bb46748fa0b)
2024-11-28 14:38:17 +00:00
Konstantin Belousov
92a9501b6b vm_object: do not assume that un_pager.devp.dev is cdev
PR:	282533

(cherry picked from commit 580340dbdaaf372867e9ed3dd257430982753e5e)
2024-11-13 01:19:18 +02:00
Konstantin Belousov
c57dc755fa device_pager: rename the un_pager.devp.dev field to handle
(cherry picked from commit f0c07fe3d0007a4499f81583a99598cd0a74d45b)
2024-11-13 01:19:18 +02:00
Mark Johnston
1d271ba05f vm_meter: Fix laundry accounting
Pages in PQ_UNSWAPPABLE should be considered part of the laundry.
Otherwise, on systems with no swap, the total amount of memory visible
to tools like top(1) decreases.

It doesn't seem very useful to have a dedicated counter for unswappable
pages, and updating applications accordingly would be painful, so just
lump them in with laundry for now.

PR:		280846
Reviewed by:	bnovkov, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47216

(cherry picked from commit 6a07e67fb7a8b5687a492d9d70a10651d5933ff5)
2024-10-29 13:04:25 +00:00
Mark Johnston
9b42b98638 vm_object: Report laundry pages in kinfo_vmobject
Reviewed by:	bnovkov, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47214

(cherry picked from commit a86373bc93ee1c850943e8585d0d426479378145)
2024-10-29 13:04:25 +00:00
Mark Johnston
b947b53f0f vm_page: Fix a logic bug in vm_page_unwire_managed()
When releasing a page reference, we have logic for various cases, based
on the value of the counter.  But, the implementation fails to take into
account the possibility that the VPRC_BLOCKED flag is set, which is ORed
into the counter for short windows when removing mappings of a page.  If
the flag is set while the last reference is being released, we may fail
to add the page to a page queue when the last wiring reference is
released.

Fix the problem by performing comparisons with VPRC_BLOCKED masked off.
While here, add a related assertion.

Reviewed by:	dougm, kib
Tested by:	pho
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D46944

(cherry picked from commit c59166e5b4e8821556a3d23af7bd17ca556f2e22)
2024-10-26 12:58:50 +00:00
Konstantin Belousov
dfe83ae4da sysctl vm.vm_objects: report cdev name for device-backed objects
(cherry picked from commit d9daa28c364d0b1189ab616d8d697b4c9f748038)
2024-10-15 18:03:59 +03:00
Konstantin Belousov
dd7b445698 sysctl vm.objects: report objects backing posix shm segments
(cherry picked from commit b0b18b57a55b429cf3f625883da5dcb541b14960)
2024-10-15 17:50:17 +03:00
Konstantin Belousov
ec35a9c65d posix shm: mark backing objects with SHM_POSIXSHM flag
(cherry picked from commit a10870ecea813042db7c41e906e1a5c5693f8a34)
2024-10-15 17:50:17 +03:00
Konstantin Belousov
987c8e9afa kinfo_{vmobject,vmentry}: move copy of paths into the vnode handling scope
(cherry picked from commit 71a66883b58f796baf2bf79a43a91c16a71673b3)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
4d5f771c43 kinfo_vmobject: report backing object of the SysV shm segments
(cherry picked from commit 6a3fbdc7e9c8323a1c13c4afcc65f89cb47911e6)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
1ef669ae41 vm_object: add OBJ_SYSVSHM flag to indicate SysV shm backing object
(cherry picked from commit f186252e0d6ef970a23c6af12ec34003df56055d)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
891664589b vm_object: reformat flags definitions
(cherry picked from commit 34935a6b3723422ef27ce4eb80fbe52c3dab12fc)
2024-10-15 17:50:15 +03:00
Mark Johnston
3464b209d6 vm_object: Fix the argument type to vm_object_set_flag()
Reported by:	kib
Fixes:		9d52823bf1df ("vm_object: Widen the flags field")

(cherry picked from commit 7f1dfd6c33dbbb6c1136e987de554c5c5a7d014d)
2024-10-15 13:45:42 +00:00
Mark Johnston
8a5a9dbf38 vm_object: Widen the flags field
Take advantage of a nearby 2-byte hole to avoid growing the struct.
This way, only the offsets of "flags" and "pg_color" change.  Bump
__FreeBSD_version since some out-of-tree kernel modules may access these
fields, though I haven't found any examples so far.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35905

(cherry picked from commit 9d52823bf1dfac237e58b5208299aaa5e2df42e9)
2024-10-15 13:45:28 +00:00
Mark Johnston
94e5ec7f86 vm_page: Use atomic loads for cmpset loops
Make sure that the compiler loads the initial value only once.
Because atomic_fcmpset is used to load the value for subsequent
iterations, this is probably not needed, but we should not rely on that.

I verified that code generated for an amd64 GENERIC kernel does not
change.

Reviewed by:	dougm, alc, kib
Tested by:	pho
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D46943

(cherry picked from commit d8b32da2354d2fd72ae017fd63affa3684786e1f)
2024-10-15 12:39:44 +00:00
Konstantin Belousov
b0e45fea61 vm_page_free_pages_toq(): return the count of freed pages
(cherry picked from commit 1784fb44498da8007fb8cd8ee5060894eb5fe1e6)
2024-10-05 10:08:56 +03:00
Konstantin Belousov
4cb8ec6c6f vm_map: add vm_map_find_locked(9)
(cherry picked from commit 0ecbb28ce351652b3a2dae271eedf1eb3aa65400)
2024-10-05 10:08:54 +03:00
Andrew Turner
bdbb0be043 vm: Add kva_alloc_aligned
Add a function like kva_alloc that allows us to specify the alignment
of the virtual address space returned.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42788

(cherry picked from commit 839999e7efdc980d5ada92ea93719c7e29765809)
2024-09-02 08:43:18 +00:00
Andrew Turner
3736b79f0f vm: Use vmem_xalloc in kva_alloc
The kernel_arena used in kva_alloc has the qcache disabled. vmem_alloc
will first try to use the qcache before falling back to vmem_xalloc.

Rather than trying to use the qcache in vmem_alloc just call
vmem_xalloc directly.

Reviewed by:	alc, kib, markj
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D42831

(cherry picked from commit 8daee410d2c13b4e8530b00e7877eeecf30bb064)
2024-09-02 08:43:01 +00:00
Konstantin Belousov
049a256e9a vm_page: add vm_page_clearref() helper
(cherry picked from commit 45cde0e439188589ca2511f6fd76829cbf68267e)
2024-07-21 11:50:29 +03:00
Konstantin Belousov
7a3d7aec41 pmap: move the smp_targeted_tlb_shutdown pointer stuff to amd64 pmap.h
Fixes:	bec000c9c1ef409989685bb03ff0532907befb4a
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 9c5d7e4a0c02bc45b61f565586da2abcc65d70fa)
2024-07-01 13:07:38 +00:00
Souradeep Chakrabarti
840d8e0c30 amd64: add a func pointer to tlb shootdown function
Make the TLB shootdown function a pointer.  By default, it still
points to the system function smp_targeted_tlb_shootdown().  This
allows other implementations to override it in the future.

Reviewed by:	kib
Tested by:	whu
Authored-by:    Souradeep Chakrabarti <schakrabarti@microsoft.com>
Co-Authored-by: Erni Sri Satya Vennela <ernis@microsoft.com>
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D45174

(cherry picked from commit bec000c9c1ef409989685bb03ff0532907befb4a)
2024-07-01 13:03:02 +00:00
Mitchell Horne
227b486de4 Adjust comments referencing vm_mem_init()
I cannot find a time where the function was not named this.

Reviewed by:	kib, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D45383

(cherry picked from commit deab57178f0b06eab56d7811674176985a8ea98d)
2024-06-06 11:23:01 -03:00
Konstantin Belousov
f0d6377a3e swap-like pagers: assert that writemapping decrease does not pass zero
(cherry picked from commit 6ada4e8a0ae901f0012015c8d277d80aad7d8f37)
2024-05-19 03:57:54 +03:00
Konstantin Belousov
4018bcdea8 cdev_pager_allocate(): ensure that the cdev_pager_ops ctor is called only once
PR:	278826

(cherry picked from commit e93404065177d6c909cd64bf7d74fe0d8df35edf)
2024-05-19 03:57:54 +03:00
Minsoo Choo
1f85f06276 vm_reserv_reclaim_contig: Return NULL not false
Reviewed by:	dougm, zlei
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44667

(cherry picked from commit 989a2cf19d053954f0bad28790114a374b05c9c1)
2024-04-17 10:33:26 -04:00
Mark Johnston
5128c8948b swap_pager: Unbusy readahead pages after an I/O error
The swap pager itself allocates readahead pages, so should take care to
unbusy them after a read error, just as it does in the non-error case.

PR:		277538
Reviewed by:	olce, dougm, alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D44646

(cherry picked from commit 4696650782e2e5cf7ae5823f1de04550c05b5b75)
2024-04-15 10:05:13 -04:00
Konstantin Belousov
8788c3d3fa sysctl vm.objects/vm.swap_objects: do not fill vnode info if jailed
(cherry picked from commit 38f5f2a4af5daeec7f13d39cad1ff4dc90da52d8)
2024-01-24 15:04:07 +02:00
Konstantin Belousov
e23813c0fe vm/vm_object.c: minor cleanup
(cherry picked from commit 69748e62e82a1f5ef77fd3e1b0c9d7e6a89d22b2)
2024-01-20 02:32:20 +02:00
Konstantin Belousov
64e869e9b9 Add vnode_pager_clean_{a,}sync(9)
(cherry picked from commit b068bb09a1a82d9fef0e939ad6135443a959e290)
2024-01-18 02:51:33 +02:00
Konstantin Belousov
c4c138072a vnode_pager_generic_putpages(): rename maxblksz local to max_offset
(cherry picked from commit ed1a88a3116a59b4fd37912099a575b4c8f559dc)
2024-01-18 02:51:33 +02:00
Konstantin Belousov
8ecd7bfd6c vnode_pager_generic_putpages(): correctly handle clean block at EOF
PR:	276191

(cherry picked from commit bdb46c21a3e68d4395d6e0b6a205187e655532b0)
2024-01-18 02:51:32 +02:00