opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-04 17:05:14 -04:00

Author	SHA1	Message	Date
Vladimir Kondratyev	5d4bf0578f	LinuxKPI: Implement ksize() function. In Linux, ksize() gets the actual amount of memory allocated for a given object. This commit adds malloc_usable_size() to FreeBSD KPI which does the same. It also maps LinuxKPI ksize() to newly created function. ksize() function is used by drm-kmod. Reviewed by: hselasky, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D26215	2020-08-29 19:26:31 +00:00
Eric van Gyzen	609de97e04	vm_pageout_scan_active: ensure ps_delta is initialized Reported by: Coverity Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26212	2020-08-28 19:59:02 +00:00
Eric van Gyzen	a2e194654f	memstat_kvm_uma: fix reading of uma_zone_domain structures Coverity flagged the scaling by sizeof(uzd). That is the type of the pointer, so the scaling was already done by pointer arithmetic. However, this was also passing a stack frame pointer to kvm_read, so it was doubly wrong. Move ZDOM_GET into the !_KERNEL section and use it in libmemstat. Reported by: Coverity Reviewed by: markj MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26213	2020-08-28 19:50:40 +00:00
Mark Johnston	aea9103e06	Use a large kmem arena import size on NUMA systems. This helps minimize internal fragmentation that occurs when 2MB imports are interleaved across NUMA domains. Virtually all KVA allocations on direct map platforms consume more than one page, so the fragmentation manifests as runs of 511 4KB page mappings in the kernel. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26050	2020-08-26 14:31:48 +00:00
Conrad Meyer	74f5530d7a	vm_pageout: Scale worker threads with CPUs Autoscale vm_pageout worker threads from r364129 with CPU count. The default is arbitrarily chosen to be 16 CPUs per worker thread, but can be adjusted with the vm.pageout_cpus_per_thread tunable. There will never be less than 1 thread per populated NUMA domain, and the previous arbitrary upper limit (at most ncpus/2 threads per NUMA domain) is preserved. Care is taken to gracefully handle asymmetric NUMA nodes, such as empty node systems (e.g., AMD 2990WX) and systems with nodes of varying size (e.g., some larger >20 core Intel Haswell/Broadwell Xeon). Reviewed by: kib, markj Sponsored by: Isilon Differential Revision: https://reviews.freebsd.org/D26152	2020-08-25 21:36:56 +00:00
Mark Johnston	411096d034	Permit vm_page_wire() to be called on pages not belonging to an object. For such pages ref_count is effectively a consumer-managed field, but there is no harm in calling vm_page_wire() on them. vm_page_unwire_noq() handles them as well. Relax the vm_page_wire() assertions to permit this case which is triggered by some out-of-tree code. [1] Also guard a conditional assertion with INVARIANTS. Otherwise the conditions are evaluated even though the result is unused. [2] Reported by: bz, cem [1], kib [2] Reviewed by: dougm, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26173	2020-08-25 13:45:06 +00:00
Matt Macy	9e5787d228	Merge OpenZFS support in to HEAD. The primary benefit is maintaining a completely shared code base with the community allowing FreeBSD to receive new features sooner and with less effort. I would advise against doing 'zpool upgrade' or creating indispensable pools using new features until this change has had a month+ to soak. Work on merging FreeBSD support in to what was at the time "ZFS on Linux" began in August 2018. I first publicly proposed transitioning FreeBSD to (new) OpenZFS on December 18th, 2018. FreeBSD support in OpenZFS was finally completed in December 2019. A CFT for downstreaming OpenZFS support in to FreeBSD was first issued on July 8th. All issues that were reported have been addressed or, for a couple of less critical matters there are pull requests in progress with OpenZFS. iXsystems has tested and dogfooded extensively internally. The TrueNAS 12 release is based on OpenZFS with some additional features that have not yet made it upstream. Improvements include: project quotas, encrypted datasets, allocation classes, vectorized raidz, vectorized checksums, various command line improvements, zstd compression. Thanks to those who have helped along the way: Ryan Moeller, Allan Jude, Zack Welch, and many others. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25872	2020-08-25 02:21:27 +00:00
Mateusz Guzik	feabaaf995	cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho	2020-08-24 08:57:02 +00:00
Andrew Gallatin	791dda877f	uma: record allocation failures due to zone limits The zone limit mechanism was recently reworked, and allocation failures due to limits being exceeded were inadvertently no longer being recorded. This would lead to, for example, mbuf allocation failures not being indicated in netstat -m or vmstat -z Reviewed by: markj Sponsored by: Netflix	2020-08-21 18:31:57 +00:00
Mateusz Guzik	7ad2a82da2	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.	2020-08-19 02:51:17 +00:00
Mark Johnston	b21b022a81	Revert r364310. Some of the resulting fallout in CAM does not appear straightforward to fix, so simply revert the commit for now in the absence of a better solution. Discussed with: mjg Reported by: dhw	2020-08-18 14:09:49 +00:00
Gleb Smirnoff	1921bb7b68	With INVARIANTS panic immediately if M_WAITOK is requested in a non-sleepable context. Previously only _sleep() would panic. This will catch misuse of M_WAITOK at development stage rather than at stress load stage. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D26027	2020-08-17 15:37:08 +00:00
Mark Johnston	7efe14cb99	Commit a missing piece of r364302. This had failed to apply due to a merge conflict. Reported by: Jenkins MFC with: r364302	2020-08-17 14:06:51 +00:00
Mark Johnston	7dd979dfef	Remove the VM map zone. Today, the zone is only used to allocate a trio of kernel maps: the kernel map itself, and the exec and pipe submaps. Maps for user processes are dynamically allocated but are embedded in the vmspace structure, which is allocated from its own zone. Make the aforementioned kernel maps statically allocated and get rid of the zone. While here, remove a stale comment above vmspace_alloc() and change the names of locks initialized in vm_map_init() to match vmspace_zinit(). Reported by: alc Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26052	2020-08-17 13:02:01 +00:00
Konstantin Belousov	ffae7ea935	vm_object: allow paging_in_progress to be acquired after object termination. The vm objects are type-stable, and can be accessed even after the last reference is dropped, or in case of vnode objects, after vgone() destroyed it as well. Stop asserting that pip == 0 after vm_object_terminate() waited for existing owners to drop it, we only want to drain them before setting OBJ_DEAD flag. Also stop asserting pip == 0 in object destructor. Update comments explaining the interaction between paging_in_progress and termination. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 20:57:02 +00:00
Konstantin Belousov	419e5698a0	Atomically update vm_object vnp_size, where atomic is available. This will be used later, where it matters on 32bit arches. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968	2020-08-16 20:52:24 +00:00
Mateusz Guzik	a92a971bbb	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)	2020-08-16 17:18:54 +00:00
Conrad Meyer	ea7b737a6f	vm_pageout: Correct threshold calculation on single-CPU systems Reported by: Michael Butler X-MFC-With: r364129	2020-08-14 18:48:48 +00:00
Conrad Meyer	b7883452d4	Back out unrelated change Reported by: kib, markj X-MFC-With: r364129	2020-08-12 00:21:30 +00:00
Conrad Meyer	0292c54bdb	Add support for multithreading the inactive queue pageout within a domain. In very high throughput workloads, the inactive scan can become overwhelmed as you have many cores producing pages and a single core freeing. Since Mark's introduction of batched pagequeue operations, we can now run multiple inactive threads working on independent batches. To avoid confusing the pid and other control algorithms, I (Jeff) do this in a mpi-like fan out and collect model that is driven from the primary page daemon. It decides whether the shortfall can be overcome with a single thread and if not dispatches multiple threads and waits for their results. The heuristic is based on timing the pageout activity and averaging a pages-per-second variable which is exponentially decayed. This is visible in sysctl and may be interesting for other purposes. I (Jeff) have verified that this does indeed double our paging throughput when used with two threads. With four we tend to run into other contention problems. For now I would like to commit this infrastructure with only a single thread enabled. The number of worker threads per domain can be controlled with the 'vm.pageout_threads_per_domain' tunable. Submitted by: jeff (earlier version) Discussed with: markj Tested by: pho Sponsored by: probably Netflix (based on contemporary commits) Differential Revision: https://reviews.freebsd.org/D21629	2020-08-11 20:37:45 +00:00
Mark Johnston	af32cefd7c	Check the UMA zone's full bucket cache before short-circuiting an alloc. The global "bucketdisable" flag indicates that we are in a low memory situation and should avoid allocating buckets. However, in the allocation path we were checking it before the full bucket cache and bailing even if the cache is non-empty. Defer the check so that we have a shot at allocating from the cache. This came up because M_NOWAIT allocations from the buf trie node zone must always succeed. In one scenario, all of the preallocated trie nodes were in the bucket list, and a new slab allocation could not succeed due to a memory shortage. The short-circuiting caused an allocation failure which triggered a panic. Reported by: pho Reviewed by: cem Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25980	2020-08-10 20:34:45 +00:00
Brooks Davis	9f9cc3f989	Preserve ASLR vm_map flags across fork In the most common case (fork+execve) this doesn't matter, but further attempts to apply entropy would fail in (e.g.) a pre-fork server. Reported by: Alfredo Mazzinghi Reviewed by: kib, markj Obtained from: CheriBSD MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D25966	2020-08-06 16:20:20 +00:00
Mark Johnston	efec381dd1	Remove most lingering references to the page lock in comments. Finish updating comments to reflect new locking protocols introduced over the past year. In particular, vm_page_lock is now effectively unused. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25868	2020-08-04 14:59:43 +00:00
Mark Johnston	96ad26eefb	Remove free_domain() and uma_zfree_domain(). These functions were introduced before UMA started ensuring that freed memory gets placed in domain-local caches. They no longer serve any purpose since UMA now provides their functionality by default. Remove them to simplyify the kernel memory allocator interfaces a bit. Reviewed by: cem, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25937	2020-08-04 13:58:36 +00:00
Mark Johnston	958d8f527c	Remove the volatile qualifier from busy_lock. Use atomic(9) to load the lock state. Some places were doing this already, so it was inconsistent. In initialization code, the lock state is still initialized with plain stores. Reviewed by: alc, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25861	2020-07-29 19:38:49 +00:00
Mark Johnston	f72e5be58a	vm_page_xbusy_claim(): Use atomics to update busy lock state. vm_page_xbusy_claim() could clobber the waiter bit. For its original use, kernel memory pages, this was not a problem since nothing would ever block on the busy lock for such pages. r363607 introduced a new use where this could in principle be a problem. Fix the problem by using atomic_cmpset to update the lock owner. Since this macro is defined only for INVARIANTS kernels the extra overhead doesn't seem prohibitive. Reported by: vangyzen Reviewed by: alc, kib, vangyzen Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25859	2020-07-28 19:50:39 +00:00
Mark Johnston	782ebde52e	vm_page_free_invalid(): Relax the xbusy assertion. vm_page_assert_xbusied() asserts that the busying thread is the current thread. For some uses of vm_page_free_invalid() (e.g., error handling in vnode_pager_generic_getpages_done()), this condition might not hold. Reported by: Jenkins via trasz Reviewed by: chs, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25828	2020-07-27 14:25:10 +00:00
Doug Moore	00fd73d2da	Fix an overflow bug in the blist allocator that needlessly capped max swap size by dividing a value, which was always a multiple of 64, by 64. Remove the code that reduced max swap size down to that cap. Eliminate the distinction between BLIST_BMAP_RADIX and BLIST_META_RADIX. Call them both BLIST_RADIX. Make improvments to the blist self-test code to silence compiler warnings and to test larger blists. Reported by: jmallett Reviewed by: alc Discussed with: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D25736	2020-07-25 18:29:10 +00:00
Mateusz Guzik	ee74412269	vm: fix swap reservation leak and clean up surrounding code The code did not subtract from the global counter if per-uid reservation failed. Cleanup highlights: - load overcommit once - move per-uid manipulation to dedicated routines - don't fetch wire count if requested size is below the limit - convert return type from int to bool - ifdef the routines with _KERNEL to keep vm.h compilable by userspace Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D25787	2020-07-24 13:23:32 +00:00
Mateusz Guzik	126a2470b9	vm: annotate swap_reserved with __exclusive_cache_line The counter keeps being updated all the time and variables read afterwards share the cacheline. Note this still fundamentally does not scale and needs to be replaced, in the meantime gets a bandaid. brk1_processes -t 52 ops/s: before: 8598298 after: 9098080	2020-07-23 08:42:16 +00:00
Chuck Silvers	1bd12a3bb2	Fix vnode_pager handling of read ahead/behind pages when a disk read fails. Rather than marking the read ahead/behind pages valid even though they were not initialized, free them using the new function vm_page_free_invalid(). Reviewed by: markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25430	2020-07-17 23:10:35 +00:00
Chuck Silvers	4dfa06e114	Add a new function vm_page_free_invalid() for freeing invalid pages that might be wired. If the page is wired then it cannot be freed now, but the thread that eventually unwires it will free it at that point. Reviewed by: markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25430	2020-07-17 23:09:36 +00:00
Chuck Silvers	c3dbadc1fd	Revert my change from r361855 in favor of a better fix. Reviewed by: markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25430	2020-07-17 23:08:01 +00:00
Mark Johnston	a7752896f0	Add vm_map_valid_range_KBI(). This is required for standalone module builds. Reported by: hselasky Reviewed by: dougm, hselasky, kib MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D25650	2020-07-13 16:39:27 +00:00
Scott Long	ffc568ba8b	Revert r362998, r326999 while a better compatibility strategy is devised.	2020-07-09 22:38:36 +00:00
Scott Long	b302c2e5c9	Migrate the feature of excluding RAM pages to use "excludelist" as its nomenclature. MFC after: 1 week	2020-07-07 20:33:11 +00:00
Conrad Meyer	8a64110e43	vm: Add missing WITNESS warnings for M_WAITOK allocation vm_map_clip_{end,start} and lookup_clip_start allocate memory M_WAITOK for !system_map vm_maps. Add WITNESS warning annotation for !system_map callers who may be holding non-sleepable locks. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D25283	2020-06-29 16:54:00 +00:00
Mark Johnston	8c277118d8	Fix UMA's first-touch policy on systems with empty domains. Suppose a thread is running on a CPU in a NUMA domain with no physical RAM. When an item is freed to a first-touch zone, it ends up in the cross-domain bucket. When the bucket is full, it gets placed in another domain's bucket queue. However, when allocating an item, UMA will always go to the keg upon a per-CPU cache miss because the empty domain's bucket queue will always be empty. This means that a non-empty domain's bucket queues can grow very rapidly on such systems. For example, it can easily cause mbuf allocation failures when the zone limit is reached. Change cache_alloc() to follow a round-robin policy when running on an empty domain. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25355	2020-06-28 21:35:04 +00:00
Konstantin Belousov	ee06cffcd2	vm_page_free_prep(): correct description of the required page and object state. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25482	2020-06-27 02:31:39 +00:00
Mark Johnston	84242cf68a	Call swap_pager_freespace() from vm_object_page_remove(). All vm_object_page_remove() callers, except linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when removing a range of pages from an object. The LinuxKPI case appears to be an unintentional omission that could result in leaked swap blocks, so unconditionally free swap space in vm_object_page_remove() to protect against similar bugs in the future. Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25329	2020-06-25 15:21:21 +00:00
Jeff Roberson	c8b0a88b8d	Clarify some language. Favor primary where both master and primary were used in conjunction with secondary.	2020-06-20 20:21:04 +00:00
Edward Tomasz Napierala	52c81be11a	Add linux_madvise(2) instead of having Linux apps call the native FreeBSD madvise(2) directly. While some of the flag values match, most don't. PR: kern/230160 Reported by: markj Reviewed by: markj Discussed with: brooks, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25272	2020-06-20 18:29:22 +00:00
Mark Johnston	cdd02f43b9	Revert r362360. This commit was simply wrong since two different objects are locked. Reported by: lwhsu, pho Pointy hat: markj	2020-06-19 11:04:49 +00:00
Mark Johnston	f034074034	Restore a check unintentionally dropped in r362361. MFC with: r362361	2020-06-19 04:18:20 +00:00
Mark Johnston	0f1e6ec591	Add a helper function for validating VA ranges. Functions which take untrusted user ranges must validate against the bounds of the map, and also check for wraparound. Instead of having the same logic duplicated in a number of places, add a function to check. Reviewed by: dougm, kib Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25328	2020-06-19 03:32:04 +00:00
Mark Johnston	61b006887e	Fix a double object unlock in vm_object_backing_collapse_wait(). Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25327	2020-06-19 03:31:46 +00:00
Conrad Meyer	a116b5d3e4	vm: Drop vm_map_clip_{start,end} macro wrappers No functional change. Reviewed by: dougm, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D25282	2020-06-16 22:53:56 +00:00
Eric van Gyzen	8cc8c5864a	Honor db_pager_quit in some vm_object ddb commands These can be rather verbose. MFC after: 2 weeks Sponsored by: Dell EMC Isilon	2020-06-12 21:53:08 +00:00
Mateusz Guzik	7ce3a31286	vm: rework swap_pager_status to execute in constant time The lock-protected iteration is trivially avoidable. This removes a serialisation point from Linux binaries (which end up calling here from the sysinfo syscall).	2020-06-09 14:16:18 +00:00
Chuck Silvers	bd7d64f548	Don't mark pages as valid if reading the contents from disk fails. Instead, just skip marking pages valid if the read fails. Future attempts to access such pages will notice that they are not marked valid and try to read them from disk again. Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25138	2020-06-06 00:47:59 +00:00

1 2 3 4 5 ...

4440 commits