Also, rename min_addr to default_addr, which better reflects what it
represents. The min_addr is not a minimum address in the same way that
max_addr is actually a maximum address that can be allocated. For
example, a non-zero hint can be less than min_addr and be allocated.
Reported by: dchagin
Reviewed by: dchagin, kib, markj
Fixes: d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision: https://reviews.freebsd.org/D41397
which requests to propagate lowest stack segment protection to the grow gap.
This seems to be required for Linux emulation.
Reported by: dchagin
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
mprotect(2) on the stack region needs to adjust guard stored protection,
so that e.g. enable executing on stack worked properly on stack growth.
Reported by: dchagin
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Restructure the first phase slightly, to facilitate further changes.
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Do not assume that protection is same as max_protection. Store both in
offset, packed in the same way as the prot syscall parameter.
Reviewed by: alc, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
The function returns the newly created entry.
Use vm_map_insert1() in stack grow code to avoid gap entry re-lookup.
The comment update for vm_map_try_merge_entries() was suggested by dougm.
Suggested by: alc
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
Only a part of the object may be mapped.
Noted by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41099
The resulting code is a bit more concise. No functional change
intended.
Reviewed by: alc, dougm, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41249
Fix the handling of address hints that are less than min_addr by
vm_map_find_min().
Reported by: dchagin
Reviewed by: kib
Fixes: d8e6f4946c "vm: Fix anonymous memory clustering under ASLR"
Differential Revision: https://reviews.freebsd.org/D41159
If mprotect(2) changed protection in the bottom of the currently grown
stack region, currently the changed protection would be used for the
stack grow on next fault. This is arguably unexpected.
Store the original protection for the entry at mmap(2) time in the
offset member of the gap vm_map_entry, and use it for protection of the
grown stack region.
PR: 272585
Reported by: John F. Carr <jfc@mit.edu>
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41089
By default, our ASLR implementation is supposed to cluster anonymous
memory allocations, unless the application's mmap(..., MAP_ANON, ...)
call included a non-zero address hint. Unfortunately, clustering
never occurred because kern_mmap() always replaced the given address
hint when it was zero. So, the ASLR implementation always believed
that a non-zero hint had been provided and randomized the mapping's
location in the address space. To fix this problem, I'm pushing down
the point at which we convert a hint of zero to the minimum allocatable
address from kern_mmap() to vm_map_find_min().
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40743
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact. Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.
For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().
Reported by: andrew
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39556
Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.
Approved by: mw(mentor)
Sponsored by: Stormshield
Obtained from: Semihalf
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35393
Now that OBJT_DEFAULT objects can't be instantiated, we can simplify
checks of the form object->type == OBJT_DEFAULT || (object->flags &
OBJ_SWAP) != 0. No functional change intended.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35788
vm_object_allocate_anon() automatically sets "charge" to 0 if no cred
reference is provided, so the caller doesn't need any conditional logic.
No functional change intended.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35781
Commit 4b8365d752 introduced the ability to dynamically register
VM object types, for use by tmpfs, which creates swap-backed objects.
As a part of this, checks for such objects changed from
object->type == OBJT_DEFAULT || object->type == OBJT_SWAP
to
object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0
In particular, objects of type OBJT_DEFAULT do not have OBJ_SWAP set;
the swap pager sets this flag when converting from OBJT_DEFAULT to
OBJT_SWAP.
A few of these checks are done without the object lock held. It turns
out that this can result in false negatives since the swap pager
converts objects like so:
object->type = OBJT_SWAP;
object->flags |= OBJ_SWAP;
Fix the problem by adding explicit tests for OBJT_SWAP objects in
unlocked checks.
PR: 258932
Fixes: 4b8365d752 ("Add OBJT_SWAP_TMPFS pager")
Reported by: bdrewery
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35470
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack. This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings. In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.
There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose. Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.
The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.
PR: 260303
Reviewed by: kib
Discussed with: emaste, mw
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Define simple functions for alignment and boundary checks and use them
everywhere instead of having slightly different implementations
scattered about. Define them in vm_extern.h and use them where
possible where vm_extern.h is included.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D33685
Calling setrlimit with stack gap enabled and with low values of stack
resource limit often caused the program to abort immediately after
exiting the syscall. This happened due to the fact that the resource
limit was calculated assuming that the stack started at sv_usrstack,
while with stack gap enabled the stack is moved by a random number
of bytes.
Save information about stack size in struct vmspace and adjust the
rlim_cur value. If the rlim_cur and stack gap is bigger than rlim_max,
then the value is truncated to rlim_max.
PR: 253208
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D31516
This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in the special ways. Replace (almost) all such places with
proper methods.
Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.
This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
This eliminates the staircase of conditions in vm_map_entry_set_vnode_text().
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
This prevents a situation where other thread modifies map entries
permissions between setting max_prot, then relocking, then setting prot,
confusing the operation outcome. E.g. you can get an error that is not
possible if operation is performed atomic.
Also enable setting rwx for max_prot even if map does not allow to set
effective rwx protection.
Reviewed by: brooks, markj (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28117
It is checked in vm_map_insert() and vm_map_protect() that PROT_WRITE |
PROT_EXEC are never specified together, if vm_map has MAP_WX flag set.
FreeBSD control flag allows specific binary to request WX exempt, and
there are per ABI boolean sysctls kern.elf{32,64}.allow_wx to enable/
disable globally.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28050
On platforms without a direct map[*], vm_map_insert() may in rare
situations need to allocate a kernel map entry in order to allocate
kernel map entries. This poses a problem similar to the one solved for
vmem boundary tags by vmem_bt_alloc(). In fact the kernel map case is a
bit more complicated since we must allocate entries with the kernel map
locked, whereas vmem can recurse into itself because boundary tags are
allocated up-front.
The solution is to add a custom slab allocator for kmapentzone which
allocates KVA directly from kernel_map, bypassing the kmem_* layer.
This avoids mutual recursion with the vmem btag allocator. Then, when
vm_map_insert() allocates a new kernel map entry, it avoids triggering
allocation of a new slab with M_NOVM until after the insertion is
complete. Instead, vm_map_insert() allocates from the reserve and sets
a flag in kernel_map to trigger re-population of the reserve just before
the map is unlocked. This places an implicit upper bound on the number
of kernel map entries that may be allocated before the kernel map lock
is released, but in general a bound of 1 suffices.
[*] This also comes up on amd64 with UMA_MD_SMALL_ALLOC undefined, a
configuration required by some kernel sanitizers.
Discussed with: kib, rlibby
Reported by: andrew
Tested by: pho (i386 and amd64 with !UMA_MD_SMALL_ALLOC)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26851
This is mostly mechanical except for vmspace_exit(). There, use the new
refcount_release_if_last() to avoid switching to vmspace0 unless other
processes are sharing the vmspace. In that case, upon switching to
vmspace0 we can unconditionally release the reference.
Remove the volatile qualifier from vm_refcnt now that accesses are
protected using refcount(9) KPIs.
Reviewed by: alc, kib, mmel
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27057
The entries and their clip boundaries must be aligned on supported
superpages sizes from pagesizes[]. vm_map operations return Mach
error KERN_INVALID_ARGUMENT, which is usually translated to EINVAL, if
it would require clip not at the boundary.
In other words, entries force preserving virtual addresses superpage
properties.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652
Today, the zone is only used to allocate a trio of kernel maps: the
kernel map itself, and the exec and pipe submaps. Maps for user
processes are dynamically allocated but are embedded in the vmspace
structure, which is allocated from its own zone. Make the
aforementioned kernel maps statically allocated and get rid of the zone.
While here, remove a stale comment above vmspace_alloc() and change the
names of locks initialized in vm_map_init() to match vmspace_zinit().
Reported by: alc
Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26052
In the most common case (fork+execve) this doesn't matter, but further
attempts to apply entropy would fail in (e.g.) a pre-fork server.
Reported by: Alfredo Mazzinghi
Reviewed by: kib, markj
Obtained from: CheriBSD
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D25966
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object. The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25329